Re: [HACKERS] Declarative partitioning - another take

2017-01-05 Thread Amit Langote

Hi Keith,

On 2017/01/06 2:16, Keith Fiske wrote:
> Could we get some clarification on the partition_bound_spec portion of the
> PARTITION OF clause? Just doing some testing it seems it's inclusive of the
> FROM value but exclusive of the TO value. I don't see mention of this in
> the docs as of commit 18fc5192a631441a73e6a3b911ecb14765140389 yesterday.
> It does mention that the values aren't allowed to overlap, but looking at
> the schema below, without the clarification of which side is
> inclusive/exclusive it seems confusing because 2016-08-01 is in both. Even
> the child table does not clarify this. Not sure if there's a way to do this
> in the \d+ display which would be ideal, but it should at least be
> mentioned in the docs.

I agree that needs highlighting.  I'm planning to write a doc patch for
that (among other documentation improvements).
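For the archives, the behavior Keith observed can be sketched in a few lines (illustrative only, not PostgreSQL's tuple-routing code): the FROM bound is inclusive and the TO bound is exclusive, so adjacent partitions may share a boundary value without overlapping.

```python
# Half-open range semantics: lower <= value < upper.  Names here are
# illustrative, not PostgreSQL internals.
from datetime import date

def in_partition(value, lower, upper):
    """Half-open interval test: FROM inclusive, TO exclusive."""
    return lower <= value < upper

july = (date(2016, 7, 1), date(2016, 8, 1))
august = (date(2016, 8, 1), date(2016, 9, 1))

# 2016-08-01 appears in both bound specs, but routes only to August.
assert not in_partition(date(2016, 8, 1), *july)
assert in_partition(date(2016, 8, 1), *august)
```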

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Group clear xid can leak semaphore count

2017-01-05 Thread Amit Kapila
On Fri, Jan 6, 2017 at 1:13 AM, Robert Haas  wrote:
> On Sat, Dec 31, 2016 at 12:44 AM, Amit Kapila  wrote:
>> During the review of Group update Clog patch [1], Dilip noticed an
>> issue with the patch where it can leak the semaphore count in one of
>> the corner cases.  I have checked and found that a similar issue exists
>> for Group clear xid (ProcArrayGroupClearXid) as well.  I think the
>> reason this problem has gone unnoticed till now is that it
>> can happen only in a rare scenario when the backend waiting for xid
>> clear is woken up due to some unrelated signal.  This problem didn't
>> exist in the original commit
>> (0e141c0fbb211bdd23783afa731e3eef95c9ad7a) of the patch, but later
>> while fixing some issues in the committed patch, it got introduced in
>> commit 4aec49899e5782247e134f94ce1c6ee926f88e1c. Patch to fix the
>> issue is attached.
>
> Yeah, that looks like a bug.  Thanks for the detailed analysis;
> committed and back-patched to 9.6.
>

Thanks for committing the fix.  I have updated the CF app entry for this patch.
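For the archives, the leak mechanism described upthread can be sketched with a plain counting semaphore (illustrative Python, not the actual ProcArrayGroupClearXid code):

```python
# Illustrative sketch of the leak: a waiter that leaves the wait loop
# because of an unrelated signal never consumes the wakeup the group
# leader posts for it, so the semaphore count is left elevated.
import threading

sem = threading.Semaphore(0)

# The backend gives up waiting early (woken by an unrelated signal)
# without absorbing the wakeup it is still signed up for...
# ...and the group leader posts the semaphore for it anyway:
sem.release()

# The leftover count is the leak: a later, unrelated acquire succeeds
# immediately even though nothing posted a wakeup for it.
assert sem.acquire(blocking=False)
```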

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] merging some features from plpgsql2 project

2017-01-05 Thread Pavel Stehule
2017-01-05 18:36 GMT+01:00 Merlin Moncure :

> On Thu, Jan 5, 2017 at 11:03 AM, Robert Haas 
> wrote:
> > Now, that's not to say we should never break backward compatibility.
> > Sometimes we should.  I think the problem with PL/pgsql is that many
> > of the compatibility breaks that people want are likely to lead to
> > subtle misbehavior rather than outright failure, or are not easy to
> > spot via a cursory look ("hmm, could that SELECT query ever return
> > more than one row?").
>
> The core issue is that developers tend to be very poor at estimating
> the impacts of changes; they look at things through the lens of the "new".
> Professional software development is quite expensive, and framework-
> (I'll lump the database and its various built-in features under that
> term) level changes are essentially throwing out some portion of our
> users' investments.  Even fairly innocent compatibility breaks can
> have major downstream impacts on our users, and it's always much worse
> than expected.  For example, nobody thought that changing the bytea
> text encoding format to hex would corrupt our users' data, but
> it did.
>
> TBH, the discussion should shift away from specific issues on
> compatibility and towards a specific set of standards and policies
> around how to do it and what kinds of technical justifications need to
> be made in advance.  Security problems for example could be argued as
> a valid reason to break user code, or poor adherence to the SQL
> standard which is in turn blocking other work.  Minus those kinds
> of considerations it's really just not worth doing, and there's no
> tricky strategy like playing with version numbers that can game that
> rule.  A formal deprecation policy might be a good start.
>
> The C language really should be considered the gold standard here.
> Changes did have to be made, like getting rid of the notoriously
> broken and insecure gets(), but they were made very, very slowly and
> unobtrusively.
>
> (I do think lpad should accept "any" FWIW) :-D
>

I fully agree - sometimes there is a fuzzy border between what is a
bug and what is an unhappily designed feature - probably a lot of useful
features can be taken and used wrongly.

Regards

Pavel




>
> merlin
>


[HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait

2017-01-05 Thread Ryan Murphy
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:        tested, passed

(Though I could not check "make installcheck-world" as passed because it failed 
1 test, I think it basically SHOULD pass - see my comment below.)

Patch looks good to me and does what we talked about, and Docs seem clear and 
correct.

I was able to build Postgres and run pg_ctl and observe that it waited by 
default for the 'start' action, which addresses my original concern.

`make` and `make install` went fine, and `make check` did as well, but `make 
installcheck-world` said (after a while):

===
 1 of 55 tests failed. 
===

The diff summary is here:

*** 
/home/my_secret_local_username/my/secret/path/to/postgres/src/interfaces/ecpg/test/expected/connect-test5.stderr
2016-08-23 10:00:53.0 -0500
--- 
/home/my_secret_local_username/my/secret/path/to/postgres/src/interfaces/ecpg/test/results/connect-test5.stderr
 2017-01-06 00:08:40.0 -0600
***
*** 36,42 
  [NO_PID]: sqlca: code: 0, state: 0
  [NO_PID]: ECPGconnect: opening database  on  port  
 for user regress_ecpg_user2
  [NO_PID]: sqlca: code: 0, state: 0
! [NO_PID]: ECPGconnect: could not open database: FATAL:  database 
"regress_ecpg_user2" does not exist
  
  [NO_PID]: sqlca: code: 0, state: 0
  [NO_PID]: ecpg_finish: connection main closed
--- 36,42 
  [NO_PID]: sqlca: code: 0, state: 0
  [NO_PID]: ECPGconnect: opening database  on  port  
 for user regress_ecpg_user2
  [NO_PID]: sqlca: code: 0, state: 0
! [NO_PID]: ECPGconnect: could not open database: FATAL:  database 
"my_secret_local_username" does not exist
  
  [NO_PID]: sqlca: code: 0, state: 0
  [NO_PID]: ecpg_finish: connection main closed
***
*** 73,79 
  [NO_PID]: sqlca: code: -220, state: 08003
  [NO_PID]: ECPGconnect: opening database  on  port  
 for user regress_ecpg_user2
  [NO_PID]: sqlca: code: 0, state: 0
! [NO_PID]: ECPGconnect: could not open database: FATAL:  database 
"regress_ecpg_user2" does not exist
  
  [NO_PID]: sqlca: code: 0, state: 0
  [NO_PID]: ecpg_finish: connection main closed
--- 73,79 
  [NO_PID]: sqlca: code: -220, state: 08003
  [NO_PID]: ECPGconnect: opening database  on  port  
 for user regress_ecpg_user2
  [NO_PID]: sqlca: code: 0, state: 0
! [NO_PID]: ECPGconnect: could not open database: FATAL:  database 
"my_secret_local_username" does not exist
  
  [NO_PID]: sqlca: code: 0, state: 0
  [NO_PID]: ecpg_finish: connection main closed

==


I don't actually believe this indicates a problem, though - I think perhaps 
there's a problem with this test, or with how I am running it.  The only diff 
was that when it (correctly) complained of a nonexistent database, it referred 
to the username I was logged in as, instead of the test database name 
"regress_ecpg_user2".  I don't think this has anything to do with the changes 
to pg_ctl.

I could be wrong though!  I am going to leave this as "Needs review" until 
someone more familiar with the project double-checks this.


Re: [HACKERS] pageinspect: Hash index support

2017-01-05 Thread Ashutosh Sharma
> The previous patch was using pageinspect--1.5.sql as a base, and then uses
> pageinspect--1.5-1.6.sql to upgrade to version 1.6.
>
> Removing pageinspect--1.5.sql and adding pageinspect--1.6.sql with the
> current interface means new installations will use pageinspect--1.6.sql
> directly, whereas existing installations will use the upgrade scripts.
>
> Hence I don't see a reason why we should keep pageinspect--1.5.sql around
> when we can provide a clear interface description in a pageinspect--1.6.sql.

okay, agreed.

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Cache Hash Index meta page.

2017-01-05 Thread Amit Kapila
On Thu, Jan 5, 2017 at 11:38 AM, Mithun Cy  wrote:
> On Wed, Jan 4, 2017 at 5:21 PM, Mithun Cy  wrote:
> I have rebased the patch to fix one compilation warning in
> _hash_doinsert, where the variable bucket was only used for asserting
> but was not declared as such.
>

Few more comments:
1.
 }
+
+
+/*
+ * _hash_getbucketbuf_from_hashkey() -- Get the bucket's buffer for the given

No need for two extra lines; one is sufficient and matches the nearby
coding pattern.

2.
+ * old bucket in case of bucket split, see @_hash_doinsert().

Do you see the pattern of using the @ symbol before a function name
anywhere else in the code comments?  In general, there is no harm in
using it, but maybe it is better to be consistent with usage at other
places.

3.
+ /*
+ * @_hash_getbucketbuf_from_hashkey we have verified the hasho_bucket.
+ * Should be safe to use further.
+ */
+ bucket = pageopaque->hasho_bucket;

  /*
  * If this bucket is in the process of being split, try to finish the
@@ -152,7 +107,9 @@ restart_insert:
  LockBuffer(buf, BUFFER_LOCK_UNLOCK);

  _hash_finish_split(rel, metabuf, buf, pageopaque->hasho_bucket,
-   maxbucket, highmask, lowmask);
+   usedmetap->hashm_maxbucket,

After this change, I think you can use bucket directly in
_hash_finish_split instead of pageopaque->hasho_bucket.

4.
@@ -154,8 +154,11 @@ _hash_readprev(IndexScanDesc scan,
  *bufp = InvalidBuffer;
  /* check for interrupts while we're not holding any buffer lock */
  CHECK_FOR_INTERRUPTS();
- if (BlockNumberIsValid(blkno))
+
+ /* If it is a bucket page there will not be a prevblkno. */
+ if (!((*opaquep)->hasho_flag & LH_BUCKET_PAGE))

Won't a check similar to the existing one (if (*bufp ==
so->hashso_bucket_buf || *bufp == so->hashso_split_bucket_buf)) just
above this code suffice?  If so, you can check it once and use the
result in both places.

5. The reader and insertion algorithms need to be updated in the README.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] increasing the default WAL segment size

2017-01-05 Thread Michael Paquier
On Thu, Jan 5, 2017 at 8:39 PM, Beena Emerson  wrote:
> On Tue, Jan 3, 2017 at 5:46 PM, Michael Paquier 
> wrote:
>> Actually, why not just having an equivalent of the SQL
>> command and be able to query parameter values?
>
> This patch only needed the wal_segment_size and hence I made this specific
> command.
> How often and why would we need other parameter values in the replication
> connection?
> Making it a more general command to fetch any parameter can be a separate
> topic. If it gets consensus, maybe it could be done and used here.

I concur that for this patch it may not be necessary. But let's not
paint ourselves into a corner when designing things. Being able to query
the value of parameters is something I think is actually useful for
cases where custom GUCs are loaded via the server's
shared_preload_libraries to do validation checks (one case is a
logical decoder on the backend, with a streaming receiver as the client
expecting the logical decoder to do a minimum). This would allow a
client to do checks using only a replication stream. Another case that
I have in mind is for utilities like pg_rewind: we have been
discussing being able to avoid requiring a superuser when querying the
target server. Having such a command would allow, for example, pg_rewind
to do a 'SHOW full_page_writes' without needing an extra
connection.
-- 
Michael




Re: [HACKERS] proposal: session server side variables

2017-01-05 Thread Pavel Stehule
2017-01-06 7:01 GMT+01:00 Pavel Stehule :

>
>
>>
>>>
>>> Thank you for your work on this topic.
>>>
>>> Unfortunately, there is significant disagreement on this topic between
>>> us. I see schema-based persistent metadata and catalog-based security as
>>> fundamental features. Editing a config file is not acceptable in any view.
>>>
>>
>> I generally agree with that. That said, it probably wouldn't be hard to
>> "register" GUCs during backend startup, based on what's in the catalog for
>> the database you're connecting to. There's certainly already a place in the
>> code to do this, since you can set per-database values for GUCs. That said,
>> IIRC GUCs are set up in such a way that you could just create a new stack
>> upon connection. Actually, I think that'd need to happen anyway, otherwise
>> these variables are going to look like GUCs even though they're not.
>
>
> Registration on every startup can be a little expensive - more practical
> is to initialize variables on demand, when they are used.
>

And if you have a catalog entry, then it is not necessary - self-managing
the variables in memory is not too complex.



>
> Regards
>
> Pavel
>
>>
>> --
>> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
>> Experts in Analytics, Data Architecture and PostgreSQL
>> Data in Trouble? Get it in Treble! http://BlueTreble.com
>> 855-TREBLE2 (855-873-2532)
>>
>


Re: [HACKERS] proposal: session server side variables

2017-01-05 Thread Pavel Stehule
>
>>
>> Thank you for your work on this topic.
>>
>> Unfortunately, there is significant disagreement on this topic between
>> us. I see schema-based persistent metadata and catalog-based security as
>> fundamental features. Editing a config file is not acceptable in any view.
>>
>
> I generally agree with that. That said, it probably wouldn't be hard to
> "register" GUCs during backend startup, based on what's in the catalog for
> the database you're connecting to. There's certainly already a place in the
> code to do this, since you can set per-database values for GUCs. That said,
> IIRC GUCs are set up in such a way that you could just create a new stack
> upon connection. Actually, I think that'd need to happen anyway, otherwise
> these variables are going to look like GUCs even though they're not.


Registration on every startup can be a little expensive - more practical is
to initialize variables on demand, when they are used.
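The on-demand approach can be sketched as follows (hypothetical names; this is only an illustration, not a proposed API):

```python
class SessionVariables:
    """Variables materialized lazily, on first use, rather than at connect time."""

    def __init__(self, catalog):
        self._catalog = catalog   # per-database declarations (name -> default)
        self._values = {}         # only the variables actually touched

    def get(self, name):
        if name not in self._values:
            # first reference: initialize from the catalog declaration
            self._values[name] = self._catalog[name]
        return self._values[name]

    def set(self, name, value):
        if name not in self._catalog:
            raise KeyError(name)  # only declared variables may be set
        self._values[name] = value

session = SessionVariables({"app.tenant_id": 0})
assert session.get("app.tenant_id") == 0      # initialized on demand
session.set("app.tenant_id", 42)
assert session.get("app.tenant_id") == 42
```

Nothing is registered at connection time; the cost is paid only for variables a session actually uses.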

Regards

Pavel

>
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 855-TREBLE2 (855-873-2532)
>


Re: [HACKERS] Logical decoding on standby

2017-01-05 Thread Michael Paquier
On Fri, Jan 6, 2017 at 1:07 PM, Craig Ringer  wrote:
> On 5 January 2017 at 13:12, Michael Paquier  wrote:
>> On Thu, Jan 5, 2017 at 10:21 AM, Craig Ringer  wrote:
>>> On 5 January 2017 at 09:19, Craig Ringer  wrote:
>>>
 so here's a rebased series on top of master. No other changes.
>>>
>>> Now with actual patches.
>>
>> Looking at the PostgresNode code in 0001...
>> +=pod $node->pg_recvlogical_upto(self, dbname, slot_name, endpos,
>> timeout_secs, ...)
>> +
>> This format is incorrect. I think that you also need to fix the
>> brackets for the do{} and the eval{} blocks.
>>
>> +push @cmd, '--endpos', $endpos if ($endpos);
>> endpos should be made a mandatory argument. If it is not defined that
>> would make the test calling this routine being stuck.
>
> Acknowledged and agreed. I'll fix both in the next revision.  I'm
> currently working on hot standby replies, but will return to this
> ASAP.

By the way, be sure to also fix the =pod blocks for the new
routines; perldoc needs both =pod and =item.
-- 
Michael




Re: [HACKERS] Microvacuum support for Hash Index

2017-01-05 Thread Ashutosh Sharma
Hi,

> using pgbench -M prepared -c 10 -j 10 -T 600 -f test.sql test
>
> crashes after a few minutes with
>
> TRAP: FailedAssertion("!(LWLockHeldByMeInMode(((LWLock*)
> (&(bufHdr)->content_lock)), LW_EXCLUSIVE))", File: "bufmgr.c", Line: 3781)

Attached v4 patch fixes this assertion failure.

> BTW, better rename 'hashkillitems' to '_hash_kill_items' to follow the
> naming convention in hash.h

okay, I have renamed 'hashkillitems' to '_hash_kill_items'. Please
check the attached v4 patch.

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com


microvacuum_hash_index_v4.patch
Description: invalid/octet-stream



Re: [HACKERS] Parallel bitmap heap scan

2017-01-05 Thread Amit Khandekar
Sorry for the delay in my next response. I still haven't exhaustively
gone through all changes, but meanwhile, below are some more points.

On 26 November 2016 at 18:18, Dilip Kumar  wrote:
> On Tue, Nov 22, 2016 at 9:05 AM, Dilip Kumar  wrote:
>> On Fri, Nov 18, 2016 at 9:59 AM, Amit Khandekar  
>> wrote:
>>
>> Thanks for the review..
>
> I have worked on these comments..
>>
>>> In pbms_is_leader() , I didn't clearly understand the significance of
>>> the for-loop. If it is a worker, it can call
>>> ConditionVariablePrepareToSleep() followed by
>>> ConditionVariableSleep(). Once it comes out of
>>> ConditionVariableSleep(), isn't it guaranteed that the leader has
>>> finished the bitmap ? If yes, then it looks like it is not necessary
>>> to again iterate and go back through the pbminfo->state checking.
>>> Also, with this, variable queuedSelf also might not be needed. But I
>>> might be missing something here.
>>
>> I have taken this logic from example posted on conditional variable thread[1]
>>
>> for (;;)
>> {
>> ConditionVariablePrepareToSleep(cv);
>> if (condition for which we are waiting is satisfied)
>> break;
>> ConditionVariableSleep();
>> }
>> ConditionVariableCancelSleep();
>>
>> [1] 
>> https://www.postgresql.org/message-id/CA%2BTgmoaj2aPti0yho7FeEf2qt-JgQPRWb0gci_o1Hfr%3DC56Xng%40mail.gmail.com

With the latest patches, this looks fine to me.

>>> tbm_iterate() call under SpinLock :
>>> For parallel tbm iteration, tbm_iterate() is called while SpinLock is
>>> held. Generally we try to keep code inside Spinlock call limited to a
>>> few lines, and that too without occurrence of a function call.
>>> Although tbm_iterate() code itself looks safe under a spinlock, I was
>>> checking if we can squeeze SpinlockAcquire() and SpinLockRelease()
>>> closer to each other. One thought is :  in tbm_iterate(), acquire the
>>> SpinLock before the while loop that iterates over lossy chunks. Then,
>>> if both chunk and per-page data remain, release spinlock just before
>>> returning (the first return stmt). And then just before scanning
>>> bitmap of an exact page, i.e. just after "if (iterator->spageptr <
>>> tbm->npages)", save the page handle, increment iterator->spageptr,
>>> release Spinlock, and then use the saved page handle to iterate over
>>> the page bitmap.
>>
>> The main reason to keep the spinlock out of this function is to avoid
>> changes inside it; also, this function takes a local iterator as input,
>> which has no spinlock associated with it. But that can be changed; we
>> can pass the shared iterator to it.
>>
>> I will think about this logic and try to update in next version.
> Still this issue is not addressed.
> The logic inside tbm_iterate uses the same variables, like spageptr, in
> multiple places. IMHO this complete logic needs to be done under one
> spinlock.

I think we both agree on the part that the mutex handle can be passed
to tbm_iterate() to do this. I am yet to give more thought on how
clumsy it will be if we add SpinlockRelease() calls at intermediate
places in the function rather than at the end.


>
>>
>>>
>>> prefetch_pages() call under Spinlock :
>>> Here again, prefetch_pages() is called while pbminfo->prefetch_mutex
>>> Spinlock is held. Effectively, heavy functions like PrefetchBuffer()
>>> would get called while under the Spinlock. These can even ereport().
>>> One option is to use mutex lock for this purpose. But I think that
>>> would slow things down. Moreover, the complete set of prefetch pages
>>> would be scanned by a single worker, and others might wait for this
>>> one. Instead, what I am thinking is: grab the pbminfo->prefetch_mutex
>>> Spinlock only while incrementing pbminfo->prefetch_pages. The rest
>>> part viz : iterating over the prefetch pages, and doing the
>>> PrefetchBuffer() need not be synchronised using this
>>> pgbinfo->prefetch_mutex Spinlock. pbms_parallel_iterate() already has
>>> its own iterator spinlock. Only thing is, workers may not do the
>>> actual PrefetchBuffer() sequentially. One of them might shoot ahead
>>> and prefetch 3-4 pages while the other is lagging with the
>>> sequentially lesser page number; but I believe this is fine, as long
>>> as they all prefetch all the required blocks.
>>
>> I agree with your point, will try to fix this as well.
>
> I have fixed this part.

This looks good now.
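The pattern agreed on here - hold prefetch_mutex only while claiming the next page, and do the slow prefetch outside it - can be sketched as follows (illustrative Python, not the patch's C code; pages may be prefetched out of order, which is fine as long as each page is prefetched exactly once):

```python
import threading

npages = 100
lock = threading.Lock()          # stands in for pbminfo->prefetch_mutex
state = {"next_page": 0}
prefetched = []

def worker():
    while True:
        # critical section: only the counter increment runs under the lock
        with lock:
            page = state["next_page"]
            if page >= npages:
                return
            state["next_page"] += 1
        # the slow work (PrefetchBuffer analogue) runs outside the lock,
        # so workers may issue prefetches out of order
        prefetched.append(page)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# every page was claimed and prefetched exactly once
assert sorted(prefetched) == list(range(npages))
```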


Further points below:



== nodeBitmapHeapscan.c ==


In BitmapHeapNext(), the following code is relevant only for tbm
returned from MultiExecProcNode().
if (!tbm || !IsA(tbm, TIDBitmap))
elog(ERROR, "unrecognized result from subplan");

So it should be moved to just below MultiExecProcNode(), so that it does
not get called when the tbm is created via tbm_create().

--

BitmapHeapScanState->is_leader field looks unnecessary. Instead, a
local variable is_leader in BitmapHeapNext() would serve the purpose.
This is because is_leader is used 

Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2017-01-05 Thread Vitaly Burovoy
On 1/4/17, Haribabu Kommi  wrote:
> On Tue, Nov 29, 2016 at 8:36 PM, Haribabu Kommi 
> wrote:
>> Updated patch attached with added cast function from macaddr8 to
>> macaddr.
>>
>> Currently there are no support for cross operators. Is this required
>> to be this patch only or can be handled later if required?
>>
>
> Updated patch attached to address the duplicate OID problem.
> There are no other functional changes to the previous patch.

Hello,

several thoughts about the patch:


Documentation:
1.
+ The remaining six input formats are not part of any standard.
Which ones (remaining six formats)?


2.
+ trunc(macaddr8) returns a MAC
+   address with the last 3 bytes set to zero.
This looks like a misprint or a copy-paste error.
The implementation and the standard says that the last five bytes are
set to zero and the first three are left as is.


3.
+ for lexicographical ordering
I'm not a native English speaker, but I'd say just "for ordering",
since there are no words involved; it is just a big number with a
special input/output format.


The code:
4.
+   if ((a == 0) && (b == 0) && (c == 0) && (d == 0)
+   && (e == 0) && (f == 0) && (g == 0) && (h == 0))
...
+   if ((a == 255) && (b == 255) && (c == 255) && (d == 255)
+   && (e == 255) && (f == 255) && (g == 255) && (h == 255))
The standard forbids these values from being used in real hardware, but
that is no reason to block them as input data.
Moreover, these values can be stored as a result of binary operations;
users can dump them but cannot restore them.


5.
+   if (!eight_byte_address)
+   {
+   result->d = 0xFF;
...

Don't you think the next version is simpler (all sscanf calls for macaddr6
skip "d" and "e")?

+   count = sscanf(str, "%x:%x:%x:%x:%x:%x%1s",
+  , , , , , , junk);
...
+   if (!eight_byte_address)
+   {
+   d = 0xFF;
+   e = 0xFE;
+   }
+   result->a = a;
+   result->b = b;
+   result->c = c;
+   result->d = d;
+   result->e = e;
+   result->f = f;
+   result->g = g;
+   result->h = h;

Also:
+   if (buf->len == 6)
+   {
+   addr->d = 0xFF;
+   addr->e = 0xFE;
+   addr->f = pq_getmsgbyte(buf);
+   addr->g = pq_getmsgbyte(buf);
+   addr->h = pq_getmsgbyte(buf);
+   }
+   else
+   {
+   addr->d = pq_getmsgbyte(buf);
+   addr->e = pq_getmsgbyte(buf);
+   addr->f = pq_getmsgbyte(buf);
+   addr->g = pq_getmsgbyte(buf);
+   addr->h = pq_getmsgbyte(buf);
+   }

can be written as:
+   if (buf->len == 6)
+   {
+   addr->d = 0xFF;
+   addr->e = 0xFE;
+   }
+   else
+   {
+   addr->d = pq_getmsgbyte(buf);
+   addr->e = pq_getmsgbyte(buf);
+   }
+   addr->f = pq_getmsgbyte(buf);
+   addr->g = pq_getmsgbyte(buf);
+   addr->h = pq_getmsgbyte(buf);

but that is only my point of view (one need not pay close attention to
the fact that only those two bytes are written differently, not the last
three); it is not an error.
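For context, the 6-byte-to-8-byte expansion both snippets implement - inserting FF:FE between the OUI and the device-specific bytes, as in the quoted code - can be sketched as:

```python
def macaddr_to_macaddr8(mac):
    """Expand a 6-byte MAC to 8 bytes by inserting ff:fe in the middle,
    as the quoted conversion code does."""
    octets = mac.split(":")
    assert len(octets) == 6
    return ":".join(octets[:3] + ["ff", "fe"] + octets[3:])

assert macaddr_to_macaddr8("08:00:2b:01:02:03") == "08:00:2b:ff:fe:01:02:03"
```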


6.
+errmsg("macaddr8 out of range to convert to 
macaddr")));
I think a hint should be added about which values are allowed to be converted to "macaddr".

7.
+DATA(insert (  829 7744123 i f ));
+DATA(insert (  774  829   4124 i f ));
It is nitpicking, but try to use tabs as in the surrounding code (a tab
between "774" and "829", and a space instead of a tab between "829" and "4124").

8.
+#define hibits(addr) \
+  ((unsigned long)(((addr)->a<<24)|((addr)->b<<16)|((addr)->c<<8)))
+
+#define lobits(addr) \
+  ((unsigned long)(((addr)->d<<16)|((addr)->e<<8)|((addr)->f)))
+
+#define lobits_extra(addr) \
+  ((unsigned long)((addr)->g<<8)|((addr)->h))

I mentioned that fitting all 4 bytes is a wrong idea[1]
> The macros "hibits" should be 3 octets long, not 4;

... but now I do not think so (there is no UB[2] because source and
destination are not signed).
Moreover, you've already filled in the topmost byte of "hibits" by shifting by 24.
If you use those two macros ("hibits" and "lobits") it allows to avoid
two extra comparisons in macaddr8_cmp_internal.
Version from the "macaddr64_poc.patch" is correct.


[1]https://www.postgresql.org/message-id/CAKOSWNng9_+=fvo6oz4tgv1kkhmom11ankihbokpuzki1ca...@mail.gmail.com
[2]http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1817.htm


-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] pg_stat_activity.waiting_start

2017-01-05 Thread Joel Jacobson
On Thu, Jan 5, 2017 at 4:59 PM, Bruce Momjian  wrote:
> Agreed.  No need to add overhead for short-lived locks because the
> milli-second values are going to be meaningless to users. I would be
> happy if we could find some weasel value for non-heavyweight locks.

To avoid a NULL value for waiting_start, and since non-heavyweight lock
waits don't exceed order-of-milliseconds, I think it would be
acceptable to just return now() whenever something wants to know
waiting_start, i.e. when something selects from pg_stat_activity.

The exact value would only be off from now() by orders of milliseconds
anyway, so one can argue it's not that important, as long as the
documentation is clear on that point.
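The proposed fallback can be sketched as (hypothetical, not actual pg_stat_activity code):

```python
from datetime import datetime, timedelta

def reported_waiting_start(recorded_start):
    """Return the recorded start for heavyweight locks; for lightweight
    waits (no recorded value), fall back to now() -- off by at most
    order-of-milliseconds, per the argument above."""
    return recorded_start if recorded_start is not None else datetime.now()

heavy = datetime(2017, 1, 5, 12, 0, 0)
assert reported_waiting_start(heavy) == heavy
# lightweight wait: the "weasel value" is now(), never NULL
assert abs(datetime.now() - reported_waiting_start(None)) < timedelta(seconds=1)
```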




Re: [HACKERS] Supporting huge pages on Windows

2017-01-05 Thread Amit Kapila
On Thu, Jan 5, 2017 at 8:42 AM, Tsunakawa, Takayuki
 wrote:
>> Your version of the patch looks better than the previous one.  Don't you
>> need to consider MEM_LARGE_PAGES in VirtualAllocEx call (refer
>> pgwin32_ReserveSharedMemoryRegion)?  At least that is what is mentioned
>> in MSDN [1].  Another point worth considering is that currently for
>> VirtualAllocEx() we use PAGE_READWRITE as flProtect value, shouldn't it
>> be same as used in CreateFileMapping() by patch.
>>
>>
>> [1] -
>> https://msdn.microsoft.com/en-us/library/windows/desktop/aa366720(v=vs
>> .85).aspx
>
> No, it's not necessary.  Please see PGSharedMemoryReAttach(), where 
> VirtualFree() is called to free the reserved address space and 
> MapViewOfFile() is then called to map the already created shared memory into that area.
>

PGSharedMemoryReAttach is called after the startup of new process
whereas pgwin32_ReserveSharedMemoryRegion is called before the new
process could actually start.  Basically,
pgwin32_ReserveSharedMemoryRegion is used to reserve shared memory for
each backend, so the VirtualAlloc call there should follow the spec for
huge pages.  If you have some reason for not doing so, it is not clear
from your reply; can you explain it in detail?

I have tried to test the v4 version of the patch and it always fails
with the error below after the call to AdjustTokenPrivileges:

+ if (GetLastError() != ERROR_SUCCESS)
+ {
+ if (GetLastError() == ERROR_NOT_ALL_ASSIGNED)
+ ereport(elevel,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("could not enable Lock pages in memory user right"),
+ errhint("Assign Lock pages in memory user right to the Windows user
account which runs PostgreSQL.")));

I have ensured that the user running PostgreSQL has the "Lock pages in
memory" right.  I have followed the MSDN tips [1] to do that
(restarting the machine after assigning the right).  I am using Win7.
Can you share the steps you followed to test, and your Windows machine
details?

+ if (!LookupPrivilegeValue(NULL, SE_LOCK_MEMORY_NAME, ))
+ {
+ CloseHandle(hToken);
+ ereport(elevel,
+ (errmsg("could not enable Lock pages in memory user right: error
code %lu", GetLastError()),
+ errdetail("Failed system call was LookupPrivilegeValue.")));
+ return FALSE;
+ }

The order of closing the handle and calling ereport differs here from
other places in the same function.  If there is no specific reason for
that, keep the order consistent.


[1] - 
https://msdn.microsoft.com/en-us/library/windows/desktop/ff961911(v=vs.85).aspx


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] logical decoding of two-phase transactions

2017-01-05 Thread Craig Ringer
On 5 January 2017 at 20:43, Stas Kelvich  wrote:

> Anyway, I can measure the WAL space overhead introduced by the GIDs inside
> commit records to know exactly what the cost of such an approach will be.

Sounds like a good idea, especially if you remove any attempt to work
with GIDs for !2PC commits at the same time.

I don't think I care about having access to the GID for the use case I
have in mind, since we'd actually be wanting to hijack a normal COMMIT
and internally transform it to PREPARE TRANSACTION, , COMMIT
PREPARED. But for the more general case of logical decoding of 2PC I
can see the utility of having the xact identifier.

If we presume we're only interested in logically decoding 2PC xacts
that are not yet COMMIT PREPAREd, can we not avoid the WAL overhead of
writing the GID by looking it up in our shmem state at decoding-time
for PREPARE TRANSACTION? If we can't find the prepared transaction in
TwoPhaseState we know to expect a following ROLLBACK PREPARED or
COMMIT PREPARED, so we shouldn't decode it at the PREPARE TRANSACTION
stage.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] proposal: session server side variables

2017-01-05 Thread Craig Ringer
On 6 January 2017 at 08:44, Jim Nasby  wrote:

>>(1) private/public visibility (as Oracle does with package vars).
>>this point is enough to implement the presented use case.

Agreed.

>>(2) typing (casting is a pain)

We already have typed GUCs and allow them to be user-defined. See
DefineCustomBoolVariable, DefineCustomIntVariable, etc.

What we lack is a way to declare and use typed dynamically
user-defined GUCs at runtime without a C extension.

We also lack user interface to load and store values without going via
their text representation; there's no current_setting_int4 etc.

So if we allow for now the idea that we'd extend the GUC model to do
this (which I'm not at all sure is a good thing) ... it's possible.

>>(5) have some "permanent" GUC type declarations (maybe editing the
>>config file does that already, by the way?)

We have that, but it currently requires a C extension.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-01-05 Thread Craig Ringer
On 5 January 2017 at 13:12, Michael Paquier  wrote:
> On Thu, Jan 5, 2017 at 10:21 AM, Craig Ringer  wrote:
>> On 5 January 2017 at 09:19, Craig Ringer  wrote:
>>
>>> so here's a rebased series on top of master. No other changes.
>>
>> Now with actual patches.
>
> Looking at the PostgresNode code in 0001...
> +=pod $node->pg_recvlogical_upto(self, dbname, slot_name, endpos,
> timeout_secs, ...)
> +
> This format is incorrect. I think that you also need to fix the
> brackets for the do{} and the eval{] blocks.
>
> +push @cmd, '--endpos', $endpos if ($endpos);
> endpos should be made a mandatory argument. If it is not defined that
> would make the test calling this routine being stuck.
> --
> Michael

Acknowledged and agreed. I'll fix both in the next revision.  I'm
currently working on hot standby replies, but will return to this
ASAP.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Faster methods for getting SPI results

2017-01-05 Thread Jim Nasby

On 12/28/16 3:14 AM, Craig Ringer wrote:

On 28 December 2016 at 12:32, Jim Nasby  wrote:

On 12/27/16 9:10 PM, Craig Ringer wrote:


On 28 December 2016 at 09:58, Jim Nasby  wrote:


I've looked at this some more, and ITSM that the only way to do this
without
some major surgery is to create a new type of Destination specifically
for
SPI that allows for the execution of an arbitrary C function for each
tuple
to be sent.



That sounds a lot more sensible than the prior proposals. Callback driven.



Are there other places this would be useful? I'm reluctant to write all of
this just to discover it doesn't help performance at all, but if it's useful
on its own I can just submit it as a stand-alone patch.


I don't have a use for it personally. In BDR and pglogical anything
that does work with nontrivial numbers of tuples uses lower level
scans anyway.

I expect anything that uses the SPI to run arbitrary user queries
could have a use for something like this though. Any PL, for one.


Just a quick update on this: I've gotten this working well enough in 
plpython to do some performance testing. This patch does change python 
results from being a list of dicts to a dict of lists, but I suspect the 
vast majority of the speed improvement is from not creating a tuplestore.


The attached sample (from OS X /usr/bin/sample) is interesting. The 
highlight is:



!   3398 SPI_execute_callback  (in postgres) + 163  [0x110125793]
! 3394 _SPI_execute_plan  (in postgres) + 1262  [0x1101253fe]
! : 2043 standard_ExecutorRun  (in postgres) + 288  [0x1100f9a40]
! : | 1990 ExecProcNode  (in postgres) + 250  [0x1100fd62a]


The top line is the entry into SPI from plpython. The bottom line is 
generate_series into a tuplestore and then reading from that tuplestore. 
Almost all the time being spent in standard_ExecutorRun is in 
PLy_CSreceive, which is appending values to a set of python lists as 
it's getting tuples.


The other noteworthy item in the sample is this:


535 list_dealloc  (in Python) + 116,103,...  [0x11982b1b4,0x11982b1a7,...]


that's how long it's taking python to free the 3 lists (each with 
999 python int objects).


In short (and at best*), this makes plpython just as fast at processing 
results as SELECT count(SELECT s, s, s FROM generate_series()).


The * on that is there's something odd going on where plpython starts 
out really fast at this, then gets 100% slower. I've reached out to some 
python folks about that. Even so, the overall results from a quick test 
on my laptop are (IMHO) impressive:


                  Old Code      New Code    Improvement
Pure SQL          2 sec         2 sec
plpython          12.7-14 sec   4-10 sec    ~1.3-3x
plpython - SQL    10.7-12 sec   2-8 sec     ~1.3-6x

Pure SQL is how long an equivalent query takes to run with just SQL.
plpython - SQL is simply the raw python times minus the pure SQL time.

I suspect other PL languages that have fairly fast object alloc and 
dealloc would see a similar benefit.


BTW, the patch currently breaks on nested calls to plpython, but I don't 
think that should change the performance much.


The test function:

CREATE OR REPLACE FUNCTION test_series(
iter int
) RETURNS int LANGUAGE plpythonu AS $body$
d = plpy.execute('SELECT s AS some_table_id, s AS some_field_name, s AS 
some_other_field_name FROM generate_series(1,{}) s'.format(iter) )
return len(d['some_table_id'])
$body$;
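
For readers unfamiliar with the shape change, here is a plain-Python sketch
of list-of-dicts vs. dict-of-lists results. The column names are just the
ones from the test function above; this is a stand-in, not the plpy API:

```python
# Illustration of the result-shape change: the patched plpython accumulates
# each column into its own list (one append per value) instead of building
# one dict per row.

rows = [(1, "a"), (2, "b"), (3, "c")]
colnames = ["some_table_id", "some_field_name"]

# old shape: list of dicts -- one dict allocated per row
list_of_dicts = [dict(zip(colnames, r)) for r in rows]

# new shape: dict of lists -- one list per column
dict_of_lists = {name: [] for name in colnames}
for r in rows:
    for name, val in zip(colnames, r):
        dict_of_lists[name].append(val)

print(list_of_dicts[0]["some_table_id"])    # row-wise access, old style
print(len(dict_of_lists["some_table_id"]))  # what test_series() returns
```

The per-row dict allocation (and its later deallocation, which is what shows
up under list_dealloc in the sample) is exactly what the columnar shape avoids.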

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 80fc4c4725..ace30d75f0 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -55,7 +55,8 @@ static void _SPI_prepare_oneshot_plan(const char *src, 
SPIPlanPtr plan);
 
 static int _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
  Snapshot snapshot, Snapshot 
crosscheck_snapshot,
- bool read_only, bool fire_triggers, uint64 
tcount);
+ bool read_only, bool fire_triggers, uint64 
tcount,
+ DestReceiver *callback);
 
 static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
@@ -320,7 +321,34 @@ SPI_execute(const char *src, bool read_only, long tcount)
 
res = _SPI_execute_plan(, NULL,
InvalidSnapshot, 
InvalidSnapshot,
-   read_only, true, 
tcount);
+   read_only, true, 
tcount, NULL);
+
+   _SPI_end_call(true);
+   return res;
+}
+int
+SPI_execute_callback(const char *src, bool read_only, long tcount,

[HACKERS] Hash support for grouping sets

2017-01-05 Thread Andrew Gierth
Herewith a patch for doing grouping sets via hashing or mixed hashing
and sorting.

The principal objective is to pick whatever combination of grouping sets
has an estimated size that fits in work_mem, and minimizes the number of
sorting passes we need to do over the data, and hash those.  (Yes, this
is a knapsack problem.)
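
As a rough illustration of that knapsack flavor (toy sizes and a brute-force
search, not the planner's actual cost model or algorithm):

```python
# Toy sketch: pick the largest subset of grouping sets whose estimated
# hash-table sizes fit in work_mem; the remaining sets fall back to
# sorted grouping.  Brute force for clarity; the size estimates are invented.
from itertools import combinations

def pick_hashed_sets(est_sizes, work_mem):
    names = list(est_sizes)
    for r in range(len(names), 0, -1):      # prefer hashing more sets
        for combo in combinations(names, r):
            if sum(est_sizes[s] for s in combo) <= work_mem:
                return combo
    return ()

sizes = {"(two)": 1_000, "(ten)": 5_000, "(hundred)": 50_000,
         "(thousand)": 500_000, "(unique1)": 5_000_000}
print(pick_hashed_sets(sizes, 64_000))  # small sets hashed, big ones sorted
```

(Maximizing the number of hashed sets is a stand-in objective here; the real
goal described above is minimizing the number of sorting passes.)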

This is achieved by adding another strategy to the Agg node; AGG_MIXED
means that both hashing and (sorted or plain) grouping happens.  In
addition, support for multiple hash tables in AGG_HASHED mode is added.

Sample plans look like this:

explain select two,ten from onek group by cube(two,ten);
  QUERY PLAN  
--
 MixedAggregate  (cost=0.00..89.33 rows=33 width=8)
   Hash Key: two, ten
   Hash Key: two
   Hash Key: ten
   Group Key: ()
   ->  Seq Scan on onek  (cost=0.00..79.00 rows=1000 width=8)

explain select two,ten from onek group by two, cube(ten,twenty);
  QUERY PLAN   
---
 HashAggregate  (cost=86.50..100.62 rows=162 width=12)
   Hash Key: two, ten, twenty
   Hash Key: two, ten
   Hash Key: two
   Hash Key: two, twenty
   ->  Seq Scan on onek  (cost=0.00..79.00 rows=1000 width=12)

set work_mem='64kB';
explain select count(*) from tenk1
  group by grouping sets (unique1,thousand,hundred,ten,two);
   QUERY PLAN   

 MixedAggregate  (cost=1535.39..3023.89 rows=2 width=28)
   Hash Key: two
   Hash Key: ten
   Hash Key: hundred
   Group Key: unique1
   Sort Key: thousand
 Group Key: thousand
   ->  Sort  (cost=1535.39..1560.39 rows=1 width=20)
 Sort Key: unique1
 ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=1 width=20)
(10 rows)

Columns with unhashable or unsortable data types are handled
appropriately, though obviously every individual grouping set must end
up containing compatible column types.

There is one current weakness which I don't see a good solution for: the
planner code still has to pick a single value for group_pathkeys before
planning the input path.  This means that we sometimes can't choose a
minimal set of sorts, because we may have chosen a sort order for a
grouping set that should have been hashed, for example:

explain select count(*) from tenk1
  group by grouping sets (two,unique1,thousand,hundred,ten);
   QUERY PLAN   

 MixedAggregate  (cost=1535.39..4126.28 rows=2 width=28)
   Hash Key: ten
   Hash Key: hundred
   Group Key: two
   Sort Key: unique1
 Group Key: unique1
   Sort Key: thousand
 Group Key: thousand
   ->  Sort  (cost=1535.39..1560.39 rows=1 width=20)
 Sort Key: two
 ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=1 width=20)

(3 sorts, vs. 2 in the previous example that listed the same grouping
sets in different order)

Current status of the work: probably still needs cleanup, maybe some
better regression tests, and Andres has expressed some concern over the
performance impact of additional complexity in advance_aggregates; it
would be useful if this could be tested.  But it should be reviewable
as-is.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c762fb0..6c787ea 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -972,6 +972,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		pname = "HashAggregate";
 		strategy = "Hashed";
 		break;
+	case AGG_MIXED:
+		pname = "MixedAggregate";
+		strategy = "Mixed";
+		break;
 	default:
 		pname = "Aggregate ???";
 		strategy = "???";
@@ -1900,6 +1904,19 @@ show_grouping_set_keys(PlanState *planstate,
 	ListCell   *lc;
 	List	   *gsets = aggnode->groupingSets;
 	AttrNumber *keycols = aggnode->grpColIdx;
+	const char *keyname;
+	const char *keysetname;
+
+	if (aggnode->aggstrategy == AGG_HASHED || aggnode->aggstrategy == AGG_MIXED)
+	{
+		keyname = "Hash Key";
+		keysetname = "Hash Keys";
+	}
+	else
+	{
+		keyname = "Group Key";
+		keysetname = "Group Keys";
+	}
 
 	ExplainOpenGroup("Grouping Set", NULL, true, es);
 
@@ -1914,7 +1931,7 @@ show_grouping_set_keys(PlanState *planstate,
 			es->indent++;
 	}
 
-	ExplainOpenGroup("Group Keys", "Group Keys", false, es);
+	ExplainOpenGroup(keysetname, keysetname, false, es);
 
 	foreach(lc, gsets)
 	{
@@ -1938,12 +1955,12 @@ show_grouping_set_keys(PlanState *planstate,
 		}
 
 		if (!result && es->format == EXPLAIN_FORMAT_TEXT)
-			ExplainPropertyText("Group Key", "()", es);
+			ExplainPropertyText(keyname, "()", es);
 		else
-			ExplainPropertyListNested("Group Key", result, es);
+			

Re: [HACKERS] Supporting huge pages on Windows

2017-01-05 Thread Tsunakawa, Takayuki
From: Thomas Munro [mailto:thomas.mu...@enterprisedb.com]
> In the Microsoft documentation I've seen, the privilege's name is always
> written as "Lock Pages in Memory" (note: "Pages" plural, and with initial
> capital letters).  It's quite hard to parse the sentence otherwise!  How
> about this?
> 
>  Huge pages are known as large pages on Windows.  To use them, you
> need to
>  assign the user right Lock Pages in Memory to the Windows user
> account
>  that runs PostgreSQL.
> 
> + ereport(elevel,
> + (errmsg("could not enable Lock pages in memory user right: error
> code %lu", GetLastError()),
> + errdetail("Failed system call was OpenProcessToken.")));
> 
> Same comment about capitalisation of the privilege name in this and other
> error messages.

Thanks, your sentences are surely better.  Fixed.

Regards
Takayuki Tsunakawa



win_large_pages_v5.patch



Re: [HACKERS] pg_stat_activity.waiting_start

2017-01-05 Thread Bruce Momjian
On Mon, Dec 26, 2016 at 03:36:39PM -0500, Tom Lane wrote:
> In practice, there should never be waits on LWLocks (much less spinlocks)
> that exceed order-of-milliseconds; if there are, either we chose the wrong
> lock type or the system is pretty broken in general.  So maybe it's
> sufficient if we provide a wait start time for heavyweight locks ...
> though that still seems kind of ugly.  (Also, I don't recall the existing
> code factorization there, but getting the start time into pg_stat_activity
> without an extra gettimeofday call might be hard.  As I said, there is
> one being done, but I'm not sure how accessible its result is.)

Agreed.  No need to add overhead for short-lived locks because the
millisecond values are going to be meaningless to users.  I would be
happy if we could find some weasel value for non-heavyweight locks.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Re: [BUGS][PATCH] BUG #14486: Inserting and selecting interval have different constraints

2017-01-05 Thread Vitaly Burovoy
On 1/5/17, Tom Lane  wrote:
> Vitaly Burovoy  writes:
>> On 1/5/17, Tom Lane  wrote:
>>> My point is that ideally, any value that can physically fit into struct
>>> Interval ought to be considered valid.  The fact that interval_out can't
>>> cope is a bug in interval_out, which ideally we would fix without
>>> artificially restricting the range of the datatype.
>
>> Am I correct that we are still limited by ECPG which is limited by the
>> system "tm" struct?
>
> I'm not really that concerned about whether ECPG can deal with enormous
> intervals.  If it bothers you, and you want to go fix it, more power to
> you --- but I think fixing the backend is much higher priority.

I really do not think that is possible, since it uses the system struct.
My point is that ECPG is a part of Postgres; we can't fix the server side
and leave the rest.

>> Also users who use a binary protocol can also use the "tm" struct and
>> can not expect overflow.
>
> If they store an enormous interval value, its really up to them not to
> choke on it when they read it back.  Not our problem.

Those who store a value and those who read it back can be different
groups of people.  For example, the second group may be library writers.
My point is that the interval type is big enough already, and limiting it
can protect people from errors.
Because in the end:
* either they have an error in their data, and the change will not
break their result (which is already wrong because of the wrong source data);
* or they still get an overflow and have to find a solution, with
that patch or without it;
* or they get a result in that 16% range between what fits as hours in
an "int" and as seconds in PG_INT64, and get a silently corrupted result
because of ECPG or a library.

A solution for the third case can be the same as for the second one
because these groups can be the same (just with different data).

>> The docs say[1] the highest value of the interval type is 178_000_000
>> years which is
>
> ... irrelevant really.  That's talking about the largest possible value of
> the "months" field, viz (2^31-1)/12.  Perhaps we ought to document the
> other field limits, but right now there's nothing there about how large
> the hours field can get.

No, that was a bad explanation of my point - currently there is nothing
documented about the "seconds" part of the "interval" type.
And we can simply declare a limit.  Users don't mind the reason for a
limitation; they will just plan a workaround if their data does not fit
within the limits, whatever they are.
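
For what it's worth, the "16%" figure above can be reproduced with some
quick arithmetic. This is an illustration only, assuming the stock
int64-microseconds time representation and int hours in struct tm:

```python
# Share of the interval type's int64-microsecond range whose hour count
# no longer fits in a C int (as used by struct tm / ECPG).
INT64_MAX = 2**63 - 1
INT32_MAX = 2**31 - 1

max_us = INT64_MAX                     # microseconds storable in an interval
max_us_tm = INT32_MAX * 3600 * 10**6   # microseconds once hours must fit an int

fraction = 1 - max_us_tm / max_us
print(f"{fraction:.0%}")               # -> 16%
```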

-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] T_Float morph to T_Integer after nodeRead

2017-01-05 Thread Kouhei Kaigai
> Kouhei Kaigai  writes:
> > Simplified description of what I did is:
> >   fval = makeFloat(psprintf("%.0f", plan_nrows));
> >   custom_scan->custom_private = list_make1(fval);
> 
> So don't do that.  The lexer would never produce T_Float for an
> integer-looking string, so I think it's out of scope for nodeRead() to be
> able to reconstitute such a thing.  You could use %.1f, perhaps.
> But actually, if you're worried about reconstituting exactly what you had,
> aren't you risking precision loss anyway?  Maybe something like
> psprintf("%.*e", DBL_DIG+3, plan_nrows) would be safer.
>
Ah, indeed, it is a good idea to avoid the problem.

Thanks,

PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei 





Re: [HACKERS] Indirect indexes

2017-01-05 Thread Bruce Momjian
On Fri, Dec 30, 2016 at 07:35:30PM -0300, Alvaro Herrera wrote:
> Attached is v4, which fixes a couple of relatively minor bugs.  There
> are still things to tackle before this is committable, but coding review
> of the new executor node would be welcome.
> 
> The big remaining item is still fitting the PK data in TIDs 6 bytes.
> I've been looking at reworking the btree code to allow for an arbitrary
> size; it doesn't look impossible although it's going to be rather
> invasive.  Also, vacuuming: my answer continues to be that the killtuple
> interface should be good enough, but it's possible to vacuum the index
> separately from vacuuming the table anyway if you do a full scan and
> check the PK tuples for each indirect tuple.
> 
> This patch implements killtuple: a scan that sees an indirect tuple not
> returning anything from the heap marks the tuple as LP_DEAD.  Later,
> when the page is about to be split, those tuples are removed.
> 
> I also have a note in the code about not inserting an indirect tuple
> when an identical one already exists.  This is a correctness issue: we
> return duplicated heap rows in certain cases.

I still question the value of this patch as it requires user
configuration vs. more aggressive HOT/WARM updates.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] T_Float morph to T_Integer after nodeRead

2017-01-05 Thread Tom Lane
Kouhei Kaigai  writes:
> Simplified description of what I did is:
>   fval = makeFloat(psprintf("%.0f", plan_nrows));
>   custom_scan->custom_private = list_make1(fval);

So don't do that.  The lexer would never produce T_Float for an
integer-looking string, so I think it's out of scope for nodeRead()
to be able to reconstitute such a thing.  You could use %.1f, perhaps.
But actually, if you're worried about reconstituting exactly what you
had, aren't you risking precision loss anyway?  Maybe something like
psprintf("%.*e", DBL_DIG+3, plan_nrows) would be safer.

regards, tom lane




Re: [HACKERS] Reporting planning time with EXPLAIN

2017-01-05 Thread Tom Lane
Andres Freund  writes:
> On 2016-12-28 10:29:48 -0500, Tom Lane wrote:
>> How about just saying that the existing TIMING option turns this on,

> I don't like this much - I'd like (as previously stated in [1]) to be
> able to have an actual EXPLAIN ANALYZE (COSTS off, TIMING OFF) in tests
> because that shows the number of loops, rechecks, etc.

Hmm ...

regression=# EXPLAIN (analyze, COSTS off, TIMING OFF) select * from tenk1;
  QUERY PLAN   
---
 Seq Scan on tenk1 (actual rows=1 loops=1)
 Planning time: 1.075 ms
 Execution time: 2.723 ms
(3 rows)

I see your point.  OK, that's a use case not within the scope of the
original proposal, but it's a reasonable argument for having a SUMMARY OFF
option.

regards, tom lane




Re: [HACKERS] DROP FUNCTION of multiple functions

2017-01-05 Thread Jim Nasby

Forwarding some comments I neglected to send to the list...

On 1/3/17 9:16 AM, Peter Eisentraut wrote:
> On 1/2/17 1:04 PM, Jim Nasby wrote:

On 12/31/16 10:17 AM, Peter Eisentraut wrote:

--- a/src/test/regress/expected/event_trigger.out
+++ b/src/test/regress/expected/event_trigger.out
@@ -80,9 +80,6 @@ create event trigger regress_event_trigger2 on
ddl_command_start
execute procedure test_event_trigger();
 -- OK
 comment on event trigger regress_event_trigger is 'test comment';
--- should fail, event triggers are not schema objects
-comment on event trigger wrong.regress_event_trigger is 'test comment';
-ERROR:  event trigger name cannot be qualified
 -- drop as non-superuser should fail

I'm guessing that's a spurious change...


>> Well, the test is no longer useful as it was before, because the parser
>> just rejects the command.
>
> Ah. I would have just left it in to make sure a sane error was still
> generated, so I was wondering if something else was going on.


+++ b/src/test/regress/expected/object_address.out
@@ -287,7 +287,7 @@ WARNING:  error for function of access
method,{eins,zwei,drei},{integer}: argume
 SELECT pg_get_object_address('language', '{one}', '{}');
 ERROR:  language "one" does not exist
 SELECT pg_get_object_address('language', '{one,two}', '{}');
-ERROR:  language name cannot be qualified
+ERROR:  name list length must be exactly 1

Arguably, in this usage, that should be array length, not list length.
IIRC there's a function that will map an object type to its name, so it
wouldn't be hard to restore the previous error message if you wanted to
go that route. pg_get_object_address() has to do a bunch of similar
error checks too; maybe there's a way to combine this stuff.

It's somewhat unfortunate that
pg_get_object_address()/pg_identify_object_as_address() won't match
these changes, but I don't see any good way to fix that.

BTW, it would also be damn nice if there was some form of wildcard
available for DROP FUNCTION (and maybe ALTER). If you've created several
overloaded versions of a function, making sure you change all of them
can be a PITA.

>> Yeah, that's the next patch after this. ;-)
>
> Yay :)
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)




[HACKERS] T_Float morph to T_Integer after nodeRead

2017-01-05 Thread Kouhei Kaigai
I noticed a strange behavior when T_Float value is serialized, then deserialized
on the worker process for cpu parallel execution.

Simplified description of what I did is:
  fval = makeFloat(psprintf("%.0f", plan_nrows));
  custom_scan->custom_private = list_make1(fval);

This string representation contains no dot, so the Float value was written
out as if it were an integer value, like "654321".

nodeRead() calls nodeTokenType() to determine the token type.
It treats a numeric token with no dot as an Integer value, even if it was
generated by makeFloat(). Then the worker process references this value
using floatVal() and gets a SEGV.

A workaround is to never use the "%.0f" format for makeFloat().
That may be sufficient because there are only a small number of makeFloat()
calls around the source tree. Or do we have some other systematic solution
to prevent a numeric cstring without a dot?
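
The pitfall is easy to reproduce outside the backend; here is a Python
stand-in for the psprintf() formats in question (the row-count value is
invented):

```python
# "%.0f" drops the decimal point, so a reader that classifies tokens
# lexically -- as nodeTokenType() does -- sees an integer, not a float.
plan_nrows = 654321.0

no_dot = "%.0f" % plan_nrows    # "654321": lexes as an Integer token
with_dot = "%.1f" % plan_nrows  # "654321.0": keeps the float marker
exact = "%.17e" % plan_nrows    # scientific notation, round-trips the double

print(no_dot, with_dot, exact)
print(float(exact) == plan_nrows)  # True: no precision lost
```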

Thanks,

PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei 






Re: [HACKERS] [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send

2017-01-05 Thread Jonathon Nelson
On Thu, Jan 5, 2017 at 1:01 PM, Andres Freund  wrote:

> Hi,
>
> On 2017-01-05 12:55:44 -0600, Jonathon Nelson wrote:
> > Attached please find a patch for PostgreSQL 9.4 which changes the maximum
> > amount of data that the wal sender will send at any point in time from
> the
> > hard-coded value of 128KiB to a user-controllable value up to 16MiB. It
> has
> > been primarily tested under 9.4 but there has been some testing with 9.5.
> >
> > In our lab environment and with a 16MiB setting, we saw substantially
> > better network utilization (almost 2x!), primarily over high bandwidth
> > delay product links.
>
> That's a bit odd - shouldn't the OS network stack take care of this in
> both cases?  I mean either is too big for TCP packets (including jumbo
> frames).  What type of OS and network is involved here?
>

In our test lab, we make use of multiple flavors of Linux. No jumbo frames.
We simulated anything from 0 to 160ms RTT (with varying degrees of jitter,
packet loss, etc.) using tc. Even with everything fairly clean, at 80ms RTT
there was a 2x improvement in performance.
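
The rough arithmetic behind that observation, as a sketch: this simplification
treats the send-chunk size as the effective in-flight limit and ignores TCP
windowing details, so the numbers are ceilings, not predictions.

```python
# If the walsender effectively gets one chunk in flight per round trip,
# throughput is capped near chunk_size / RTT regardless of link bandwidth.
def max_throughput_mbit(send_size_bytes, rtt_seconds):
    return send_size_bytes * 8 / rtt_seconds / 1_000_000

print(max_throughput_mbit(128 * 1024, 0.080))        # ~13.1 Mbit/s ceiling
print(max_throughput_mbit(16 * 1024 * 1024, 0.080))  # ~1677.7 Mbit/s ceiling
```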

-- 
Jon Nelson
Dyn / Principal Software Engineer


Re: [HACKERS] Reporting planning time with EXPLAIN

2017-01-05 Thread Andres Freund
On 2016-12-28 10:29:48 -0500, Tom Lane wrote:
> How about just saying that the existing TIMING option turns this on,
> if it's specified without ANALYZE?  Right now that combination draws
> an error:
>
>   regression=# explain (timing on) select 1;
>   ERROR:  EXPLAIN option TIMING requires ANALYZE
>
> so there's no existing usage that this would break.

I don't like this much - I'd like (as previously stated in [1]) to be
able to have an actual EXPLAIN ANALYZE (COSTS off, TIMING OFF) in tests
because that shows the number of loops, rechecks, etc.

Andres

[1] 
http://archives.postgresql.org/message-id/20140603180548.GU24145%40awork2.anarazel.de




Re: [HACKERS] proposal: session server side variables

2017-01-05 Thread Jim Nasby

On 1/5/17 4:59 AM, Pavel Stehule wrote:


 - Personnaly, I'm not convinced that a NEW type of session variable is
   a good thing as pg already has one, and two is one too many. I would
   find it more useful to enhance existing dynamic session variables
with,
   by order of importance:

   (1) private/public visibility (as Oracle does with package vars).
   this point is enough to implement the presented use case.

   (2) typing (casting is a pain)

   (3) improved syntax (set_config & current_setting is a pain)

Eventually, unrelated to the use case, but in line with your
motivations as I understand them:

   (4) add an option to make a GUC non transactional, iff there is
   a clear use case for that (maybe debug?).

   (5) have some "permanent" GUC type declarations (maybe editing the
   config file does that already, by the way?)


Thank you for your work on this topic.

Unfortunately, there is significant disagreement on this topic between
us. I see schema-based persistent metadata and catalog-based security as
fundamental features. Editing the config file is not acceptable in any view.


I generally agree with that. That said, it probably wouldn't be hard to 
"register" GUCs during backend startup, based on what's in the catalog 
for the database you're connecting to. There's certainly already a place 
in the code to do this, since you can set per-database values for GUCs. 
That said, IIRC GUCs are set up in such a way that you could just 
create a new stack upon connection. Actually, I think that'd need to 
happen anyway; otherwise these variables are going to look like GUCs 
even though they're not.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-01-05 Thread Michael Paquier
On Fri, Jan 6, 2017 at 1:31 AM, Vladimir Rusinov  wrote:
> Attaching a patch that renames all 'xlog' functions, keeping aliases for old
> ones (since it looks like majority vote is for keeping them).

OK.

> - OIDs - where do I get numbers from? I was kinda choosing them at random,
> unaware if there is some process for keeping track of them. Please point me
> if such thing exists and I'll change them.

You can use src/include/catalog/unused_oids to look at the OIDs not
yet assigned. Assigning them in logically increasing order may be a good
idea, but OID ordering is not really mandatory for the internal functions
and operators.

> - Documentation. I've added a new section with a simple table, keeping it
> somewhat neglected on purpose. We'll have old names in index and searchable
> on page, but nothing more. Is it sufficient?

I find what you have produced here quite clear. The table clearly states
the one-to-one mapping between the old and new entries.

> - New function names. I've used 'recovery' instead of 'xlog_replay'

I would not have touched this one. But if we go this way, let's
bike-shed and use pg_recovery_ as prefix for consistency.

> and used 'wal_file' instead of 'xlogfile'. Does it make sense?

This one yes.

For simplicity I would just have done a s/xlog/wal/ and kept the names
as consistent as possible with the old ones. But I am just commenting here.
You clearly put thought into the new names.

> - Release notes. I was unable to find a draft for 10.0. How do I make sure
> these renames are not forgotten?

That's part of the job of the release note writer, very likely Tom or
Bruce. The commit message should be explicit enough that the writer
does not have to dig into the code itself for the changes.
-- 
Michael




Re: [HACKERS] [COMMITTERS] pgsql: Fix possible crash reading pg_stat_activity.

2017-01-05 Thread Thomas Munro
On Fri, Jan 6, 2017 at 11:07 AM, Tom Lane  wrote:
> So, um, how do we know that backend A and backend B have the same idea
> about what tranche id 37 means?

[butting in]

In the particular case of dsa.c, the client has to supply a tranche ID
when creating the DSA area, and then the ID is recorded in
dsa_area_control (the area's header in either DSM or fixed shared
memory) and used when other backends attach.  How other backends get
their hands on the DSA area handle or pointer to attach is the problem
of the client code that's using DSA, but it doesn't have to worry
about sharing the tranche ID.

There are two cases to consider:

1.  There are well known tranche IDs defined in lwlock.h, like the one
used here.  As far as I can see, any tranche ID defined that way
should also be registered with a name in RegisterLWLockTranches in
every backend, and the name should be a pointer to constant data so
that it is always dereferenceable.  That is the case for the tranche
that's being discussed here as of this recent commit, and any backend
will be able to see the tranche name for that one.

2.  There are tranche IDs that are allocated at runtime.  An extension
that wants to create DSA areas or anything else that involves setting
up new LWLocks would probably need to allocate one with
LWLockNewTrancheId, and should ideally arrange to register a name in
all backends.  Sharing the tranche ID and name with other backends is
the extension's problem (it probably needs to put it the ID in shared
memory for other backends to find, like dsa.c does, and the name
probably needs to be a string constant).  If you look at
pg_stat_activity at a moment when some backend is waiting for an
LWLock with one of these dynamically allocated tranche IDs, and the
extension hasn't arranged to register the name for that tranche ID in
your backend, then pg_stat_activity will just show "extension" because
it has no better information.

To do better we'd probably need a whole registry and recycling
mechanism that tracks both IDs and names in shared memory as several
people have mentioned.  But the motivation to do that has reduced
significantly now that T_ID tracking has been dropped.  Previously,
you might have wanted to create and register a variable number of
tranche IDs in order to support, say, a bank of LWLocks used by an
executor node (because there might be more than one instance of that
executor node in a query plan, and they'd need distinct IDs in order for
you to be able to register their different base + stride for T_ID
purposes, if we still had that concept).  The infrastructure was
somewhat lacking for that.  Now that tranche IDs are only used to find
a name to show in pg_stat_activity, there is nothing stopping you from
using the same tranche ID in several places at the same time, it just
means that you can't tell which of them is involved when reading that
view.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] ALTER SYSTEM for pg_hba.conf

2017-01-05 Thread Peter Eisentraut
On 1/5/17 11:56 AM, Stephen Frost wrote:
> I've seen complaints about it and have seen people changing the
> permissions to be root/root on the .auto.conf file to disallow 'regular'
> superusers from doing ALTER SYSTEM.  It's not exactly elegant but it's a
> way to avoid the risk of someone messing with the system config without
> going through the CM system.

Good idea!  I have also had that issue.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Fix possible crash reading pg_stat_activity.

2017-01-05 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jan 5, 2017 at 4:33 PM, Tom Lane  wrote:
>> Better documentation seems required, but really the whole design seems
>> rather wacko.  Backends must agree on numeric tranche IDs, but every
>> backend has its own copy of the tranche name?  How do we even know what
>> agreement is?  And every one has to "register" every tranche ID for
>> itself?  Why in the world isn't registration done *once* and the tranche
>> name stored in shared memory?

> Well, with the original design, that wasn't feasible, because each
> backend had to store not only the name (which was constant across all
> backends) but also the array_base (which might not be, if the locks
> were stored in DSM) and array_stride.  Since commit
> 3761fe3c20bb040b15f0e8da58d824631da00caa, it would be much easier to
> do what you're proposing.  It's still not without difficulties,
> though.  There are 65,536 possible tranche IDs, and allocating an
> array of 64k pointers in shared memory would consume half a megabyte
> of shared memory the vast majority of which would normally be
> completely unused.  So I'm not very enthused about that solution, but
> you aren't the first person to propose it.

So, um, how do we know that backend A and backend B have the same idea
about what tranche id 37 means?

regards, tom lane




Re: [HACKERS] [COMMITTERS] pgsql: Fix possible crash reading pg_stat_activity.

2017-01-05 Thread Robert Haas
On Thu, Jan 5, 2017 at 4:33 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> I suspect you're going to tell me this all needs to be better
>> documented, which is probably a valid criticism.  Suggestions as to
>> where such documentation should be added - either as code comments or
>> in a README somewhere or in doc/src/sgml - will be gratefully
>> accepted.
>
> Better documentation seems required, but really the whole design seems
> rather wacko.  Backends must agree on numeric tranche IDs, but every
> backend has its own copy of the tranche name?  How do we even know what
> agreement is?  And every one has to "register" every tranche ID for
> itself?  Why in the world isn't registration done *once* and the tranche
> name stored in shared memory?

Well, with the original design, that wasn't feasible, because each
backend had to store not only the name (which was constant across all
backends) but also the array_base (which might not be, if the locks
were stored in DSM) and array_stride.  Since commit
3761fe3c20bb040b15f0e8da58d824631da00caa, it would be much easier to
do what you're proposing.  It's still not without difficulties,
though.  There are 65,536 possible tranche IDs, and allocating an
array of 64k pointers in shared memory would consume half a megabyte
of shared memory the vast majority of which would normally be
completely unused.  So I'm not very enthused about that solution, but
you aren't the first person to propose it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Replication/backup defaults

2017-01-05 Thread Michael Banck
On Mon, Jan 02, 2017 at 10:21:41AM +0100, Magnus Hagander wrote:
> On Mon, Jan 2, 2017 at 10:17 AM, Simon Riggs  wrote:
> > On 31 December 2016 at 15:00, Magnus Hagander  wrote:
> > > max_wal_senders=10
> > > max_replication_slots=20

[...]

> > > wal_level=replica
> >
> > This is more problematic because it changes behaviours.
> 
> You can't actually change the other two without changing wal_level.

That actually goes both ways: I recently saw a server fail to start
because we were experimenting with temporarily setting wal_level to
minimal for initial bulk loading, but did not reduce max_wal_senders
back to zero. So it failed at startup with 'FATAL:  WAL streaming
(max_wal_senders > 0) requires wal_level "replica" or "logical"'.

I don't want to hijack this thread, but I wonder whether the name
"*max*_wal_senders" really conveys that dependence on wal_level (there's
no comment to that end in the postgresql.conf sample), and/or whether
the admin should just be notified that WAL streaming is turned off
because wal_level < 'replica'.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer




Re: [HACKERS] Group clear xid can leak semaphore count

2017-01-05 Thread Robert Haas
On Thu, Jan 5, 2017 at 4:48 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> I think we have run into this kind of issue before.  I wonder if
>> there's any way to insert some kind of a guard - e.g. detect at
>> backend startup time that the semaphore has a non-zero value and fix
>> it, issuing a warning along the way...  maybe something like:
>
> See the PGSemaphoreReset near the end of InitProcess.

Oh ho.  So the worst consequence of this is a backend-lifetime leak,
not a cluster-lifetime leak.  That's good, at least.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Group clear xid can leak semaphore count

2017-01-05 Thread Tom Lane
Robert Haas  writes:
> I think we have run into this kind of issue before.  I wonder if
> there's any way to insert some kind of a guard - e.g. detect at
> backend startup time that the semaphore has a non-zero value and fix
> it, issuing a warning along the way...  maybe something like:

See the PGSemaphoreReset near the end of InitProcess.

regards, tom lane




Re: [HACKERS] Supporting huge pages on Windows

2017-01-05 Thread Thomas Munro
On Thu, Jan 5, 2017 at 4:12 PM, Tsunakawa, Takayuki wrote:
> [win_large_pages_v4.patch]

Just a small suggestion about the wording in this patch:

+This feature uses the large-page support on Windows. To use the large-page
+support, you need to assign Lock page in memory user right to the Windows
+user account which runs PostgreSQL.

In the Microsoft documentation I've seen, the privilege's name is
always written as "Lock Pages in Memory" (note: "Pages" plural, and
with initial capital letters).  It's quite hard to parse the sentence
otherwise!  How about this?

 Huge pages are known as large pages on Windows.  To use them, you need to
 assign the user right Lock Pages in Memory to the Windows user account
 that runs PostgreSQL.

+ ereport(elevel,
+ (errmsg("could not enable Lock pages in memory user right: error code %lu", GetLastError()),
+ errdetail("Failed system call was OpenProcessToken.")));

Same comment about capitalisation of the privilege name in this and
other error messages.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Re: [BUGS][PATCH] BUG #14486: Inserting and selecting interval have different constraints

2017-01-05 Thread Tom Lane
Vitaly Burovoy  writes:
> On 1/5/17, Tom Lane  wrote:
>> My point is that ideally, any value that can physically fit into struct
>> Interval ought to be considered valid.  The fact that interval_out can't
>> cope is a bug in interval_out, which ideally we would fix without
>> artificially restricting the range of the datatype.

> Am I correct that we are still limited by ECPG which is limited by the
> system "tm" struct?

I'm not really that concerned about whether ECPG can deal with enormous
intervals.  If it bothers you, and you want to go fix it, more power to
you --- but I think fixing the backend is much higher priority.

> Also users who use a binary protocol can also use the "tm" struct and
> can not expect overflow.

If they store an enormous interval value, it's really up to them not to
choke on it when they read it back.  Not our problem.

> The docs say[1] the highest value of the interval type is 178_000_000
> years which is

... irrelevant really.  That's talking about the largest possible value of
the "months" field, viz (2^31-1)/12.  Perhaps we ought to document the
other field limits, but right now there's nothing there about how large
the hours field can get.

regards, tom lane




Re: [HACKERS] [COMMITTERS] pgsql: Fix possible crash reading pg_stat_activity.

2017-01-05 Thread Tom Lane
Robert Haas  writes:
> I suspect you're going to tell me this all needs to be better
> documented, which is probably a valid criticism.  Suggestions as to
> where such documentation should be added - either as code comments or
> in a README somewhere or in doc/src/sgml - will be gratefully
> accepted.

Better documentation seems required, but really the whole design seems
rather wacko.  Backends must agree on numeric tranche IDs, but every
backend has its own copy of the tranche name?  How do we even know what
agreement is?  And every one has to "register" every tranche ID for
itself?  Why in the world isn't registration done *once* and the tranche
name stored in shared memory?

regards, tom lane




Re: [HACKERS] Replication/backup defaults

2017-01-05 Thread Tomas Vondra

On 01/05/2017 05:37 PM, Stephen Frost wrote:
> Tomas,
>
> * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
>> On 01/05/2017 02:23 PM, Magnus Hagander wrote:
>>> It's easy enough to construct a benchmark specifically to show the
>>> difference, but not of any actual "normal workload" for it. Typically
>>> the optimization applies to things like bulk loading, which is
>>> typically never done alone and does not lend itself to that type of
>>> benchmarking very easily.
>>
>> Not sure if I understand correctly what you're saying. You're saying
>> that although it'd be easy to construct a benchmark showing
>> significant performance impact, it won't represent a common
>> workload. Correct?
>
> I think he's saying that it's not very easy to construct a good example
> of typical bulk-loading workloads using just pgbench.  Bulk loading
> certainly happens with PG and I don't think we'll make very many friends
> if we break optimizations when wal_level is set to minimal like those
> you get using:
>
> BEGIN;
> CREATE TABLE x (c1 int);
> COPY x FROM STDIN;
> COMMIT;
>
> or:
>
> BEGIN;
> TRUNCATE x;
> COPY x FROM STDIN;
> COMMIT;
>
> Changing the wal_level from 'minimal' to 'replica' or 'logical' with
> such a benchmark is going to make the WAL go from next-to-nothing to
> size-of-database.


Sure, I do know how to construct such workloads - and it's trivial even 
with pgbench custom scripts. The question is whether such workloads are 
common or not.


Most importantly, no one is proposing to break the optimizations, but 
changing the defaults - users relying on the optimizations are free to 
switch back to wal_level=minimal if needed.


> One doesn't typically *just* do bulk loads, however,
> often it's a bulk load into a table and then the contents of that table
> are merged with another table or perhaps joined to it to produce some
> report or something along those lines.  In many of those cases, our
> more-recently added capability to have UNLOGGED tables will work, but
> not all (in particular, it can be very handy to load everything in using
> the above technique and then switch the wal_level to replica, which
> avoids having to have the bulk of the data sent through WAL, something
> you can't avoid if you want to turn an unlogged table into a logged
> one).

Ultimately, the question is whether the number of people running into
"Hey, I can't take pg_basebackup or set up a standby with the default
config!" is higher or lower than the number of people running into "Hey,
CREATE TABLE + COPY is slower now!"


I haven't seen many systems relying on such load optimizations, for a 
number of reasons:


1) The important/critical systems usually have replicas, so are 
inherently incompatible with wal_level=minimal.


2) The batch jobs usually don't truncate the main table, but load the 
increment into a temporary/unlogged table first, then merge it into the 
main one.


That is not to say there are no other cases benefiting from those 
optimizations, but we're talking about the default value - we're not 
removing the wal_level=minimal.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down

2017-01-05 Thread Ants Aasma
On 5 Jan 2017 2:54 a.m., "Craig Ringer"  wrote:

On 2 January 2017 at 22:24, Craig Ringer  wrote:
>
>
> On 2 Jan. 2017 20:20, "Simon Riggs"  wrote:
>
> On 21 December 2016 at 13:23, Simon Riggs  wrote:
>
>> Fix it up and I'll commit. Thanks for the report.
>
> I was hoping for some more effort from Ants to correct this.
>
> I'll commit Craig's new tests for hs feedback before this, so it can
> go in with a Tap test, so I imagine we're about a week away from
> commit on this.
>
>
> I posted a revised version of Ants's patch that passes the shell script
> test.
>
A TAP test would likely make sense though, I agree.


Ants, do you think you'll have a chance to convert your shell script
test into a TAP test in src/test/recovery?

Simon has said he would like to commit this fix. I'd personally be
happier if it had a test to go with it.

You could probably just add to src/test/recovery/t/001 which now
contains my additions for hot standby.


I'm travelling right now, but I should be able to give it a shot next week.

Regards,
Ants Aasma


Re: [HACKERS] [COMMITTERS] pgsql: Fix possible crash reading pg_stat_activity.

2017-01-05 Thread Robert Haas
On Thu, Jan 5, 2017 at 1:15 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> No, I think backend-lifetime is right.  The tranche registrations are
>> all backend-local state, so there's no problem with backend A
>> registering a string at one address and backend B registering a string
>> at a different address.  It's just important that neither of those
>> strings goes away before the corresponding backend does.
>
> Then I don't understand what's going on.  Isn't the crash you fixed
> because backend A was looking at the tranche containing the lock backend B
> was blocked on?  How can a tranche name local to backend B work?

I think the chain of events must be as follows:

1. Backend A executed a parallel query at least once.  During the
course of doing so, it registered the parallel_query_dsa tranche, but
after the execution of the parallel query completed, the tranche_name
pointer was no longer valid, because it pointed into a DSM segment
that got removed at the end of the query, rather than using
backend-lifespan memory as advocated by the header comment of
LWLockRegisterTranche.

2. Backend B began executing a parallel query and, somehow, ended up
blocking on one of the DSA LWLocks.  I'm not quite clear how this
could've happened, because I didn't think we had any committed code
that did DSA allocations yet, but maybe those locks are taken while
attaching to the DSA or something like that.  The details don't really
matter: somehow, B managed to advertise LWTRANCHE_PARALLEL_QUERY_DSA
in pg_stat_activity.

3. At exactly that moment, Backend A examined pg_stat_activity and saw
that tranche ID and tried to look it up, following the dangling
pointer left behind by step #1.  Boom.

If step #1 had not occurred, at the last step, backend A would have
seen the tranche ID as unregistered and it would not have crashed.
Instead, pg_stat_activity would have shown the wait event as
"extension" rather than crashing.  That's still wrong, of course,
though not as bad as crashing.  Basically, there are two problems
here:

- Every backend needs to register every built-in tranche ID at backend
startup.  If it doesn't, then it might read some other backend's wait
event as "extension" instead of the correct value.  The DSA code had
the mistaken idea that tranche IDs only need to be registered before
acquiring locks that use that tranche ID, which isn't true: we might
need to interpret a tranche ID that some OTHER backend has advertised
via pg_stat_activity.  A corollary is that an extension that uses a
tranche ID should register that tranche ID as early as possible (i.e.
_PG_init()) so that a backend which has loaded but not used an
extension can interpret that tranche ID if some other backend
advertises it in pg_stat_activity.

- Every backend needs to follow the directions and store the tranche
name in memory that won't go away before the backend itself.  If it
doesn't, an attempt to interpret a tranche ID read from
pg_stat_activity may follow a dangling pointer and crash, which is
what must have been happening here.

I suspect you're going to tell me this all needs to be better
documented, which is probably a valid criticism.  Suggestions as to
where such documentation should be added - either as code comments or
in a README somewhere or in doc/src/sgml - will be gratefully
accepted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-01-05 Thread Stephen Frost
Andres,

* Andres Freund (and...@anarazel.de) wrote:
> On 2017-01-04 09:38:42 -0500, Stephen Frost wrote:
> > * Andres Freund (and...@anarazel.de) wrote:
> > > On 2017-01-03 10:37:08 -0500, Stephen Frost wrote:
> > > > * Vladimir Rusinov (vrusi...@google.com) wrote:
> > > > > I think I +1 on this.
> > > > > I did a github search on these function names and there is a lot
> > > > > of code that uses them. E.g. there are 8.5k hits for
> > > > > pg_last_xlog_location; a lot of them are forks and copy-pastes,
> > > > > but still, that's quite a lot. Let's keep the aliases around for a
> > > > > couple of versions, after which hopefully a lot of the code will
> > > > > be updated.
> > > > 
> > > > And there's 12k hits for pg_xlog.
> > > 
> > > > If we do that, we'll just end up with exactly the same question about
> > > > removing them and the same amount of code breakage in a few years.  I
> > > > don't see how that is really helping anyone.
> > > 
> > > Meh^2.  The cost of having pg_xlog was that people lost their
> > > data. Hence their was motivation of changing things. The cost of having
> > > some function aliases is, what, a pg_proc line?  If we end up carrying
> > > them forever, so what?
> > 
> > I outlined exactly that in the part you didn't quote.
> 
> Sure, but my point was that you and several others essentially argue for
> breaking things gratuitously. And I think that's bad. That you can live
> with backward compatibility doesn't change that point.

I do not agree that we are arguing for breaking things gratuitously.
That's saying that there's no justification for such a change and I
think that's wrong- the prior names were driven off the directory name
and were, really, not well suited to what the functions were actually
for.  This change is to align the functions with the agreed-upon
directory name change and because the new name is distinctly more
accurate than the prior name.  As discussed previously, the term
'transaction log' is more closely related to what *clog* is than the
write-ahead log.

My feeling is that people who are arguing for the aliases are doing so
from a position of "it's easy for us to do, so we should," which I am
generally not in favor of when it comes to backwards compatibility
because it leaves things in an inconsistent state.  Tools which monitor
the progression of our write-ahead log (WAL) from the primary to standbys
will end up using a function that has "xlog" in the name, even though
that's an inconsistent and less than accurate name for what they're
actually doing, simply because "well, that part didn't break, so we
don't have to change it, even though these other things had to be
changed to refer to 'wal'".  That, in my opinion, isn't the right
compromise to be making.
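
(For concreteness, the aliases under discussion would amount to
something like the following — a hypothetical SQL sketch, not a
committed compatibility layer:

```sql
-- Keep the old name callable, delegating to the renamed function.
CREATE OR REPLACE FUNCTION pg_switch_xlog() RETURNS pg_lsn
    LANGUAGE sql AS 'SELECT pg_switch_wal()';
```

which is the "a pg_proc line" cost referred to above.)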

> > > > If we really feel that this is the only thing between 9.6 and 10 that'll
> > > > cause problems for some serious amount of code and we don't expect to
> > > > change the function APIs anytime in the near future then perhaps we
> > > > could keep aliases, *document* them, and treat them as full functions
> > > > just like the regular ones.
> > > 
> > > I think we've been far too cavalier lately about unnecessarily breaking
> > > admin and monitoring tools. There's been pg_stat_activity backward
> > > incompat changes in most of the last releases. It's a *PAIN* to develop
> > > monitoring / admin tools that work with a number of releases.  It's fine
> > > to cause that pain if there's some considerable benefit (e.g. not
> > > triggering data loss seems like a case for that, as imo is unifying
> > > configuration), but I don't see how that justifying breaking things
> > > gratuitously.  Just renaming well known functions for a minor bit of
> > > cleanliness seems not to survive a cost/benefit analysis.
> > 
> > I agree that we have been breaking monitoring tools on the regular for a
> > while as we continue to add capabilities and improve things, which is
> > part of the reason that I don't see this particular break as a very big
> > deal- serious monitoring tools are going to need to be updated anyway.
> 
> I have no problem with breaking things when necessary. I do have a
> problem when we do so willy-nilly when providing backward-compatibility
> would have been easy enough. E.g. Nearly all recent pg_stat_activity
> changes would have been doable just as well without breakage.

Would those changes have resulted in an inconsistent and difficult
system to understand and follow?  For pg_stat_activity, frankly, I don't
recall offhand which probably means they weren't a very big deal for me.

Changes to anything related to the 'pg_xlog' directory or the set of
functions that pgbackrest or our tools care about are a lot closer to me
and changes there represent serious time that I'll need to ask David and
Jeff to spend on updating pgbackrest and our tooling for, yet I've had
no push-back from them 

Re: [HACKERS] Group clear xid can leak semaphore count

2017-01-05 Thread Robert Haas
On Sat, Dec 31, 2016 at 12:44 AM, Amit Kapila  wrote:
> During the review of Group update Clog patch [1], Dilip noticed an
> issue with the patch where it can leak the semaphore count in one of
> the corner case.  I have checked and found that similar issue exists
> for Group clear xid (ProcArrayGroupClearXid) as well.  I think the
> reason why this problem is not noticed by anyone till now is that it
> can happen only in a rare scenario when the backend waiting for xid
> clear is woken up due to some unrelated signal.  This problem didn't
> exist in the original commit
> (0e141c0fbb211bdd23783afa731e3eef95c9ad7a) of the patch, but later
> while fixing some issues in the committed patch, it got introduced in
> commit 4aec49899e5782247e134f94ce1c6ee926f88e1c. Patch to fix the
> issue is attached.

Yeah, that looks like a bug.  Thanks for the detailed analysis;
committed and back-patched to 9.6.

I suppose in the worst case it's possible that we'd leak a semaphore
count and then every future time we enter a PGSemaphoreLock using that
PGPROC we have to eat up the leaked count (or counts) and then put it
(or them) back after we really wait.  That would suck.  But I wasn't
able to observe any leaks in a high-concurrency pgbench test on hydra,
so it's either very unlikely or requires some additional condition to
trigger the problem.

I think we have run into this kind of issue before.  I wonder if
there's any way to insert some kind of a guard - e.g. detect at
backend startup time that the semaphore has a non-zero value and fix
it, issuing a warning along the way...  maybe something like:

    int bogus = 0;

    while (sem_trywait(sem) == 0)
        ++bogus;
    if (bogus > 0)
        elog(WARNING, "uh oh");

I'm not sure if that would be prone to false positives though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [BUGS][PATCH] BUG #14486: Inserting and selecting interval have different constraints

2017-01-05 Thread Vitaly Burovoy
On 1/5/17, Tom Lane  wrote:
> Vitaly Burovoy  writes:
>> On 1/5/17, Tom Lane  wrote:
>>> We could think about replacing interval2tm's output format with some
>>> other struct that uses a TimeOffset for hours and so cannot overflow.
>>> I'm not sure though how far the effects would propagate; it might be
>>> more work than we want to put into this.
>
>> If values with overflow are already in a database, what do you expect
>> a new output function should fix?
>
> My point is that ideally, any value that can physically fit into struct
> Interval ought to be considered valid.  The fact that interval_out can't
> cope is a bug in interval_out, which ideally we would fix without
> artificially restricting the range of the datatype.
>
> Now, the problem with that of course is that it's not only interval_out
> but multiple other places.  But your proposed patch also requires touching
> nearly everything interval-related, so I'm not sure it has any advantage
> that way over the less restrictive answer.

OK, I try to limit values from
9223372036854775807/1000000/3600/24/365 = 292471.2 years
to
7730941132800000000/1000000/3600/24/365 = 245146.5 years

i.e. cut 47325 years or ~16%, but it is only for the interval's part
measured in seconds.

Am I correct that we are still limited by ECPG which is limited by the
system "tm" struct?
Also users who use a binary protocol can also use the "tm" struct and
can not expect overflow.

The docs say[1] the highest value of the interval type is 178_000_000
years, which is significantly bigger than both values (current and
limited) measured in seconds.
I don't think applying that limitation is a big deal.


I tried to find functions that should be touched and there are not so many of them:

-- internal functions:
interval_check_overflow(Interval *i)
tm2interval(struct pg_tm * tm, fsec_t fsec, Interval *span)
interval2tm(Interval span, struct pg_tm * tm, fsec_t *fsec)
intervaltypmodleastfield(int32 typmod)

-- output functions:
interval_out(PG_FUNCTION_ARGS)  -- skip output
interval_send(PG_FUNCTION_ARGS) -- skip output

-- main input/computing functions:
interval_in(PG_FUNCTION_ARGS)   -- forgot to cover
interval_recv(PG_FUNCTION_ARGS) -- covered
make_interval(PG_FUNCTION_ARGS) -- covered

AdjustIntervalForTypmod(Interval *interval, int32 typmod)  -- can lead
to overflow

interval_justify_interval(PG_FUNCTION_ARGS)  -- can lead to overflow
interval_justify_hours(PG_FUNCTION_ARGS)  -- can lead to overflow
interval_justify_days(PG_FUNCTION_ARGS)  -- can lead to overflow
interval_um(PG_FUNCTION_ARGS)  -- covered
interval_pl(PG_FUNCTION_ARGS)  -- covered
interval_mi(PG_FUNCTION_ARGS)  -- covered
interval_mul(PG_FUNCTION_ARGS) -- covered

-- can not lead to overflow
interval_transform(PG_FUNCTION_ARGS)
interval_div(PG_FUNCTION_ARGS)
interval_trunc(PG_FUNCTION_ARGS)
interval_part(PG_FUNCTION_ARGS)

-- based on other functions
interval_scale(PG_FUNCTION_ARGS) -- based on AdjustIntervalForTypmod
interval_accum(PG_FUNCTION_ARGS) -- based on interval_pl
interval_accum_inv(PG_FUNCTION_ARGS) -- based on interval_mi
interval_avg(PG_FUNCTION_ARGS)   -- based on interval_pl
interval_combine(PG_FUNCTION_ARGS)   -- based on interval_pl
mul_d_interval(PG_FUNCTION_ARGS) -- based on interval_mul

-- checking, not changing:
interval_finite(PG_FUNCTION_ARGS)
interval_smaller(PG_FUNCTION_ARGS)
interval_larger(PG_FUNCTION_ARGS)
interval_cmp_value(const Interval *interval)
interval_cmp_internal(Interval *interval1, Interval *interval2)
interval_eq(PG_FUNCTION_ARGS)
interval_ne(PG_FUNCTION_ARGS)
interval_lt(PG_FUNCTION_ARGS)
interval_gt(PG_FUNCTION_ARGS)
interval_le(PG_FUNCTION_ARGS)
interval_ge(PG_FUNCTION_ARGS)
interval_cmp(PG_FUNCTION_ARGS)
interval_hash(PG_FUNCTION_ARGS)

-- not applicable:
intervaltypmodin(PG_FUNCTION_ARGS)
intervaltypmodout(PG_FUNCTION_ARGS)
timestamp_pl_interval(PG_FUNCTION_ARGS)
timestamp_mi_interval(PG_FUNCTION_ARGS)
timestamptz_pl_interval(PG_FUNCTION_ARGS)
timestamptz_mi_interval(PG_FUNCTION_ARGS)


=
In fact, only six of them (not "touching nearly everything
interval-related") should be covered:

* interval_in
* interval_out (yes, the SAMESIGN there is useless)
* AdjustIntervalForTypmod
* interval_justify_interval
* interval_justify_hours
* interval_justify_days


[1]https://www.postgresql.org/docs/current/static/datatype-datetime.html
-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-01-05 Thread Andres Freund
On 2017-01-04 09:38:42 -0500, Stephen Frost wrote:
> Andres,
> 
> * Andres Freund (and...@anarazel.de) wrote:
> > On 2017-01-03 10:37:08 -0500, Stephen Frost wrote:
> > > * Vladimir Rusinov (vrusi...@google.com) wrote:
> > > > I think I +1 on this.
> > > > I've done a github search on these function names and there is a lot of
> > > > code that uses them.  E.g. there are 8.5k hits for
> > > > pg_last_xlog_location; a lot of them are forks and copy-pastes, but
> > > > still, that's quite a lot.  Let's keep the aliases around for a couple
> > > > of versions, after which hopefully a lot of the code will be updated.
> > > 
> > > And there's 12k hits for pg_xlog.
> > 
> > > If we do that, we'll just end up with exactly the same question about
> > > removing them and the same amount of code breakage in a few years.  I
> > > don't see how that is really helping anyone.
> > 
> > Meh^2.  The cost of having pg_xlog was that people lost their
> > data. Hence there was motivation for changing things. The cost of having
> > some function aliases is, what, a pg_proc line?  If we end up carrying
> > them forever, so what?
> 
> I outlined exactly that in the part you didn't quote.

Sure, but my point was that you and several others essentially argue for
breaking things gratuitously. And I think that's bad. That you can live
with backward compatibility doesn't change that point.


> > > If we really feel that this is the only thing between 9.6 and 10 that'll
> > > cause problems for some serious amount of code and we don't expect to
> > > change the function APIs anytime in the near future then perhaps we
> > > could keep aliases, *document* them, and treat them as full functions
> > > just like the regular ones.
> > 
> > I think we've been far too cavalier lately about unnecessarily breaking
> > admin and monitoring tools. There have been backward-incompatible
> > pg_stat_activity changes in most of the last releases. It's a *PAIN* to
> > develop monitoring / admin tools that work with a number of releases.
> > It's fine to cause that pain if there's some considerable benefit (e.g.
> > not triggering data loss seems like a case for that, as imo is unifying
> > configuration), but I don't see how that justifies breaking things
> > gratuitously.  Just renaming well-known functions for a minor bit of
> > cleanliness seems not to survive a cost/benefit analysis.
> 
> I agree that we have been breaking monitoring tools on the regular for a
> while as we continue to add capabilities and improve things, which is
> part of the reason that I don't see this particular break as a very big
> deal- serious monitoring tools are going to need to be updated anyway.

I have no problem with breaking things when necessary. I do have a
problem when we do so willy-nilly when providing backward-compatibility
would have been easy enough. E.g. Nearly all recent pg_stat_activity
changes would have been doable just as well without breakage.


> I don't see that changing any time soon either- we are woefully far
> behind in this area compared to other databases and we should be trying
> to encourage people to continue improving these areas, not making things
> unnecessarily difficult by requiring backwards-compatibility hacks.

By breaking stuff on a regular basis we're also showing that we're
behind.  Compatibility is a big topic too, and we force users to stay
behind longer and to upgrade less frequently.


Andres




Re: [HACKERS] [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send

2017-01-05 Thread Andres Freund
Hi,

On 2017-01-05 12:55:44 -0600, Jonathon Nelson wrote:
> Attached please find a patch for PostgreSQL 9.4 which changes the maximum
> amount of data that the wal sender will send at any point in time from the
> hard-coded value of 128KiB to a user-controllable value up to 16MiB. It has
> been primarily tested under 9.4 but there has been some testing with 9.5.
> 
> In our lab environment and with a 16MiB setting, we saw substantially
> better network utilization (almost 2x!), primarily over high bandwidth
> delay product links.

That's a bit odd - shouldn't the OS network stack take care of this in
both cases?  I mean either is too big for TCP packets (including jumbo
frames).  What type of OS and network is involved here?

Greetings,

Andres Freund




[HACKERS] [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send

2017-01-05 Thread Jonathon Nelson
Attached please find a patch for PostgreSQL 9.4 which changes the maximum
amount of data that the wal sender will send at any point in time from the
hard-coded value of 128KiB to a user-controllable value up to 16MiB. It has
been primarily tested under 9.4 but there has been some testing with 9.5.

In our lab environment and with a 16MiB setting, we saw substantially
better network utilization (almost 2x!), primarily over high bandwidth
delay product links.
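The sizing decision the patch makes configurable can be sketched in
isolation (names and constants are illustrative; the real code rounds
relative to WAL segment/page positions):

```c
#include <assert.h>
#include <stdint.h>

#define XLOG_BLCKSZ 8192		/* default WAL block size */

/*
 * Sketch of how XLogSendPhysical sizes one message: send at most
 * max_send bytes past startptr, and unless everything up to send_rqst
 * fits, round the end position back to a WAL block boundary so that a
 * WAL record is never split across two messages.
 */
static uint64_t
message_end(uint64_t startptr, uint64_t send_rqst, uint64_t max_send)
{
	uint64_t	endptr = startptr + max_send;

	if (send_rqst <= endptr)
		return send_rqst;		/* caught up: send everything remaining */
	return endptr - (endptr % XLOG_BLCKSZ);	/* round back to block boundary */
}
```

With the hard-coded 128kB cap, a high bandwidth-delay-product link needs
many round trips per segment; raising max_send lets each message carry more.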

-- 
Jon Nelson
Dyn / Principal Software Engineer
From 5ba24d84d880d756bec538e35c499811d88e2fc3 Mon Sep 17 00:00:00 2001
From: Jon Nelson 
Date: Wed, 7 Sep 2016 07:23:53 -0500
Subject: [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send

---
 src/backend/replication/walsender.c | 14 --
 src/backend/utils/misc/guc.c| 12 
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index b671c43..743d6c8 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -89,7 +89,7 @@
  * because signals are checked only between messages.  128kB (with
  * default 8k blocks) seems like a reasonable guess for now.
  */
-#define MAX_SEND_SIZE (XLOG_BLCKSZ * 16)
+int	max_wal_send_guc = 0;
 
 /* Array of WalSnds in shared memory */
 WalSndCtlData *WalSndCtl = NULL;
@@ -2181,7 +2181,7 @@ retry:
 /*
  * Send out the WAL in its normal physical/stored form.
  *
- * Read up to MAX_SEND_SIZE bytes of WAL that's been flushed to disk,
+ * Read up to max_wal_send bytes of WAL that's been flushed to disk,
  * but not yet sent to the client, and buffer it in the libpq output
  * buffer.
  *
@@ -2195,6 +2195,7 @@ XLogSendPhysical(void)
 	XLogRecPtr	startptr;
 	XLogRecPtr	endptr;
 	Size		nbytes;
+	int		max_wal_send = max_wal_send_guc * 1024;
 
 	if (streamingDoneSending)
 	{
@@ -2333,8 +2334,8 @@ XLogSendPhysical(void)
 
 	/*
 	 * Figure out how much to send in one message. If there's no more than
-	 * MAX_SEND_SIZE bytes to send, send everything. Otherwise send
-	 * MAX_SEND_SIZE bytes, but round back to logfile or page boundary.
+	 * max_wal_send bytes to send, send everything. Otherwise send
+	 * max_wal_send bytes, but round back to logfile or page boundary.
 	 *
 	 * The rounding is not only for performance reasons. Walreceiver relies on
 	 * the fact that we never split a WAL record across two messages. Since a
@@ -2344,7 +2345,7 @@ XLogSendPhysical(void)
 	 */
 	startptr = sentPtr;
 	endptr = startptr;
-	endptr += MAX_SEND_SIZE;
+	endptr += max_wal_send;
 
 	/* if we went beyond SendRqstPtr, back off */
 	if (SendRqstPtr <= endptr)
@@ -2363,7 +2364,8 @@ XLogSendPhysical(void)
 	}
 
 	nbytes = endptr - startptr;
-	Assert(nbytes <= MAX_SEND_SIZE);
+	Assert(nbytes <= max_wal_send);
+	elog(DEBUG2, "walsender sending WAL payload of %d bytes", (int) nbytes);
 
 	/*
 	 * OK to read and send the slice.
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a9f31ef..3a5018d 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -128,6 +128,7 @@ extern bool synchronize_seqscans;
 extern char *SSLCipherSuites;
 extern char *SSLECDHCurve;
 extern bool SSLPreferServerCiphers;
+extern int max_wal_send_guc;
 
 #ifdef TRACE_SORT
 extern bool trace_sort;
@@ -2145,6 +2146,17 @@ static struct config_int ConfigureNamesInt[] =
 	},
 
 	{
+		{"max_wal_send", PGC_SIGHUP, REPLICATION_SENDING,
+			gettext_noop("Sets the maximum WAL payload size for WAL replication."),
+			NULL,
+			GUC_UNIT_KB
+		},
+		&max_wal_send_guc,
+		128, 4, 16384,
+		NULL, NULL, NULL
+	},
+
+	{
 		{"commit_delay", PGC_SUSET, WAL_SETTINGS,
 			gettext_noop("Sets the delay in microseconds between transaction commit and "
 		 "flushing WAL to disk."),
-- 
2.10.2




Re: [HACKERS] ALTER SYSTEM for pg_hba.conf

2017-01-05 Thread Euler Taveira
On 04-01-2017 17:30, Tom Lane wrote:
> Simon Riggs  writes:
>> My next thought is ALTER SYSTEM support for pg_hba.conf, especially
>> since that would make it easier to do a formal test of Haribabu's
>> pg_hba view patch by adding each of the options one by one and then
>> juggling them.
> 
> It's quite unclear from this spec what you have in mind to control the
> entry order.  Also, I'd personally be -1 on inventing a pile of new SQL
> keywords for this.  Why not do it with a function, instead?  Or for extra
> credit, finish the pg_hba view work first and then make it an updatable
> view.
> 
Even if you made the view updatable, you would need a field to control
the order.  It has line_number, but a specific field would be desirable
(someone could add a blank or comment line between querying the view and
typing the update command).

Also, updating a .conf file in place was something vetoed in the ALTER
SYSTEM design, and I think that was a clever idea.  If we decide to mix
automated and hand editing, a rewrite on every change is the easier
path.  Unlike ALTER SYSTEM, I'm afraid we can't invent a
pg_hba.auto.conf, because (i) order matters and (ii) processing stops
when a rule matches.  That would limit the feature's usefulness.
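To make the ordering point concrete, here is a minimal standalone sketch
of first-match evaluation (the rule shape and names are illustrative, not
the real pg_hba.conf parser):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative rule: only a database column and an auth method. */
typedef struct
{
	const char *database;		/* "all" matches any database */
	const char *auth_method;
} HbaRule;

/*
 * First-match semantics: rules are scanned top to bottom and scanning
 * stops at the first match, so reordering two rules can change the
 * result.  Returns NULL when no rule matches.
 */
static const char *
hba_auth_method(const HbaRule *rules, size_t nrules, const char *database)
{
	for (size_t i = 0; i < nrules; i++)
	{
		if (strcmp(rules[i].database, "all") == 0 ||
			strcmp(rules[i].database, database) == 0)
			return rules[i].auth_method;
	}
	return NULL;
}
```

An append-only pg_hba.auto.conf would always lose to an earlier "all"
rule, which is exactly why the ALTER SYSTEM model doesn't carry over.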

If we don't invent new fields in pg_hba.conf, a function could be a
solution instead of new SQL syntax.  However, a new field could break
compatibility (unless we stick with a default value, which might not be
a good idea from the security point of view).


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento




Re: [HACKERS] [PATCH] Add GUCs for predicate lock promotion thresholds

2017-01-05 Thread Tom Lane
ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:
> ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:
>> One thing I don't like about this patch is that if a user has increased
>> max_pred_locks_per_transaction, they need to set
>> max_pred_locks_per_relation to half of that to retain the current
>> behaviour, or they'll suddenly find themselves with a lot more relation
>> locks.  If it's possible to make a GUC's default value dependent on the
>> value of another, that could be a solution.  Otherwise, the page lock
>> threshold GUC could be changed to be expressed as a fraction of
>> max_pred_locks_per_transaction, to keep the current behaviour.

> Another option would be to have a special sentinel (e.g. -1) which is
> the default, and keeps the current behaviour.

FWIW, interdependent GUC defaults are generally unworkable.
See commit a16d421ca and the sorry history that led up to it.
I think you could make it work as a fraction, though.

regards, tom lane




Re: [HACKERS] [COMMITTERS] pgsql: Fix possible crash reading pg_stat_activity.

2017-01-05 Thread Tom Lane
Robert Haas  writes:
> No, I think backend-lifetime is right.  The tranche registrations are
> all backend-local state, so there's no problem with backend A
> registering a string at one address and backend B registering a string
> at a different address.  It's just important that neither of those
> strings goes away before the corresponding backend does.

Then I don't understand what's going on.  Isn't the crash you fixed
because backend A was looking at the tranche containing the lock backend B
was blocked on?  How can a tranche name local to backend B work?

regards, tom lane




Re: [HACKERS] ALTER SYSTEM for pg_hba.conf

2017-01-05 Thread Joe Conway
On 01/05/2017 08:27 AM, Robert Haas wrote:
> There's also the question of whether opening up the ability to do
> this sort of thing from the SQL level is a security hazard,

It unquestionably is.

> but we've already gone fairly far down the path of assuming that
> there's not a tremendous amount of privilege separation between the
> operating system user account and the database superuser,

I think this is a very bad assumption.

> so maybe the answer is that as things stand it's not expanding the 
> vulnerability surface very much.

Perhaps as things currently stand this is true.

> One thing I'm kind of happy about is that, as far as I can see, there
> hasn't been much backlash against the existing ALTER SYSTEM, either
> from a security point of view or a user-confusion point of view.

Possibly only because there are workarounds possible using hooks and
extension code. Personally I think we should have an official way to
disable ALTER SYSTEM and I would like the same for pg_hba.conf related
functionality.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] [COMMITTERS] pgsql: Fix possible crash reading pg_stat_activity.

2017-01-05 Thread Robert Haas
On Thu, Jan 5, 2017 at 1:02 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> Now that you mention it, I think I mis-stated the problem in the
>> commit message: the problem is not if the tranche is unregistered, but
>> rather if it is registered but the pointer references an address that
>> is no longer valid.  Registering the tranche with a fixed string
>> rather than a pointer into a DSM segment that can go away fixes that.
>
> Got it.  That's fine then, but perhaps the comment on
> LWLockRegisterTranche needs to be reconsidered.  It's not good enough for
> the tranche name to be "backend lifetime", it has to be something valid in
> other backends as well.  That probably means either (1) compile-time
> constant in the core backend (not loadable modules!) or (2) allocated in
> the main shared-memory segment.

No, I think backend-lifetime is right.  The tranche registrations are
all backend-local state, so there's no problem with backend A
registering a string at one address and backend B registering a string
at a different address.  It's just important that neither of those
strings goes away before the corresponding backend does.

To put that another way, registration is performed separately in every backend.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Add GUCs for predicate lock promotion thresholds

2017-01-05 Thread Dagfinn Ilmari Mannsåker
ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:

> One thing I don't like about this patch is that if a user has increased
> max_pred_locks_per_transaction, they need to set
> max_pred_locks_per_relation to half of that to retain the current
> behaviour, or they'll suddenly find themselves with a lot more relation
> locks.  If it's possible to make a GUC's default value dependent on the
> value of another, that could be a solution.  Otherwise, the page lock
> threshold GUC could be changed to be expressed as a fraction of
> max_pred_locks_per_transaction, to keep the current behaviour.

Another option would be to have a special sentinel (e.g. -1) which is
the default, and keeps the current behaviour.
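A sketch of how the sentinel could be resolved (variable names are
illustrative, not the actual GUC variables):

```c
#include <assert.h>

/*
 * Resolve the effective per-relation promotion threshold: -1 (the
 * proposed sentinel default) keeps the current behaviour of
 * max_pred_locks_per_transaction / 2; any non-negative setting is used
 * as given.  Illustrative sketch only.
 */
static int
effective_pred_locks_per_relation(int guc_value, int max_per_xact)
{
	if (guc_value < 0)
		return max_per_xact / 2;	/* sentinel: derive from the other GUC */
	return guc_value;
}
```

The fraction-based alternative would instead store a multiplier and
compute the threshold from max_pred_locks_per_transaction at use time.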

-- 
"A disappointingly low fraction of the human race is,
 at any given time, on fire." - Stig Sandbeck Mathisen





Re: [HACKERS] [COMMITTERS] pgsql: Fix possible crash reading pg_stat_activity.

2017-01-05 Thread Tom Lane
Robert Haas  writes:
> Now that you mention it, I think I mis-stated the problem in the
> commit message: the problem is not if the tranche is unregistered, but
> rather if it is registered but the pointer references an address that
> is no longer valid.  Registering the tranche with a fixed string
> rather than a pointer into a DSM segment that can go away fixes that.

Got it.  That's fine then, but perhaps the comment on
LWLockRegisterTranche needs to be reconsidered.  It's not good enough for
the tranche name to be "backend lifetime", it has to be something valid in
other backends as well.  That probably means either (1) compile-time
constant in the core backend (not loadable modules!) or (2) allocated in
the main shared-memory segment.

regards, tom lane




Re: [HACKERS] [COMMITTERS] pgsql: Fix possible crash reading pg_stat_activity.

2017-01-05 Thread Robert Haas
On Thu, Jan 5, 2017 at 12:37 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> With the old code, a backend that read pg_stat_activity without ever
>> having executed a parallel query might see a backend in the midst of
>> executing one waiting on a DSA LWLock, resulting in a crash.  The
>> solution is for backends to register the tranche at startup time, not
>> the first time a parallel query is executed.
>
> While I have no objection to the patch as committed, I have to wonder
> if this isn't papering over the underlying problem rather than solving it.
> It seems like this direction means that there's no such thing as dynamic
> registration of LWLock tranches and we should just give up on that concept
> entirely.  If we do want to preserve the concept, don't we need to fix the
> pg_stat_activity code so it doesn't fail on tranches that aren't known
> locally?

It actually has such a safeguard already (see GetLWLockIdentifier).
Now that you mention it, I think I mis-stated the problem in the
commit message: the problem is not if the tranche is unregistered, but
rather if it is registered but the pointer references an address that
is no longer valid.  Registering the tranche with a fixed string
rather than a pointer into a DSM segment that can go away fixes that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-01-05 Thread Robert Haas
On Wed, Jan 4, 2017 at 3:21 AM, Masahiko Sawada  wrote:
> Hi and happy new year.
>
> The lazy vacuum calls lazy_cleanup_index to update statistics of
> indexes on a table, such as relpages and reltuples, at the end of
> lazy_scan_heap.  For all index types, lazy_cleanup_index scans all
> index pages.  That happens even if the table has not been updated at
> all since the previous vacuum.  The freeze map greatly reduces the
> execution time and cost of vacuuming the table if almost all of the
> table has been frozen, but it doesn't help with cleaning up indexes.
> If a very large static table has an index, then because the index
> cleanup is always called and always scans all index pages, it takes a
> long time to scan them, as reported[1].
>
> The attached patch introduces a new GUC parameter,
> vacuum_cleanup_index_scale_factor, which specifies the fraction of
> table pages containing dead tuples needed to trigger index cleanup.
> The default is 0.0, which means that index cleanup is not invoked if
> the table has not been updated.  In other words, if the table is
> completely frozen then lazy vacuum can skip the index scans as well.
> Increasing this value can reduce the total time of lazy vacuum, but
> the statistics and the free space map of the index are not updated.

Cool.  I'll look at this, but not until next CF.
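For reference, the trigger condition described above amounts to a simple
fraction test; a standalone sketch with illustrative names (not the
patch's actual code):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Skip index cleanup unless the fraction of heap pages containing dead
 * tuples exceeds vacuum_cleanup_index_scale_factor.  With the proposed
 * default of 0.0, cleanup is skipped only when no page has dead tuples,
 * i.e. the table was not updated since the last vacuum.
 */
static bool
should_cleanup_index(double pages_with_dead_tuples, double rel_pages,
					 double cleanup_scale_factor)
{
	if (rel_pages <= 0)
		return false;			/* empty relation: nothing to clean */
	return pages_with_dead_tuples / rel_pages > cleanup_scale_factor;
}
```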

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Fix possible crash reading pg_stat_activity.

2017-01-05 Thread Tom Lane
Robert Haas  writes:
> With the old code, a backend that read pg_stat_activity without ever
> having executed a parallel query might see a backend in the midst of
> executing one waiting on a DSA LWLock, resulting in a crash.  The
> solution is for backends to register the tranche at startup time, not
> the first time a parallel query is executed.

While I have no objection to the patch as committed, I have to wonder
if this isn't papering over the underlying problem rather than solving it.
It seems like this direction means that there's no such thing as dynamic
registration of LWLock tranches and we should just give up on that concept
entirely.  If we do want to preserve the concept, don't we need to fix the
pg_stat_activity code so it doesn't fail on tranches that aren't known
locally?

regards, tom lane




Re: [HACKERS] use strict in all Perl programs

2017-01-05 Thread Peter Eisentraut
On 12/31/16 1:34 AM, Michael Paquier wrote:
> On Sat, Dec 31, 2016 at 3:07 PM, Peter Eisentraut
>  wrote:
>> Here is a patch to add 'use strict' to all Perl programs (that I could
>> find), or move it to the right place where it was already there.  I
>> think that is a pretty standard thing to do nowadays.

committed that

> What about adding as well "use warnings"? That's standard in all the TAP 
> tests.

'use strict' can be statically checked using perl -c, but 'use warnings'
affects run-time behavior, so one would have to extensively test the
involved programs.  Some cursory checking already reveals that this is
going to need more investigation.  So in principle yes, but maybe later.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] merging some features from plpgsql2 project

2017-01-05 Thread Merlin Moncure
On Thu, Jan 5, 2017 at 11:03 AM, Robert Haas  wrote:
> Now, that's not to say we should never break backward compatibility.
> Sometimes we should.  I think the problem with PL/pgsql is that many
> of the compatibility breaks that people want are likely to lead to
> subtle misbehavior rather than outright failure, or are not easy to
> spot via a cursory look ("hmm, could that SELECT query ever return
> more than one row?").

The core issue is that developers tend to be very poor at estimating
the impacts of changes; they look at things through the lens of the
"new".  Professional software development is quite expensive, and
framework-level changes (I'll lump the database and its various
built-in features under that term) essentially throw out some portion
of our users' investments.  Even fairly innocent compatibility breaks
can have major downstream impacts on our users, and it's always much
worse than expected.  For example, nobody thought that changing the
bytea text encoding format to hex would corrupt our users' data, but
it did.

TBH, the discussion should shift away from specific compatibility
issues and towards a specific set of standards and policies around how
to do it and what kinds of technical justifications need to be made in
advance.  Security problems, for example, could be argued to be a
valid reason to break user code, as could poor adherence to the SQL
standard where it is in turn blocking other work.  Minus those kinds
of considerations it's really just not worth doing, and there's no
tricky strategy like playing with version numbers that can game that
rule.  A formal deprecation policy might be a good start.

The C language really should be considered the gold standard here.
Changes did have to be made, like getting rid of the notoriously
broken and insecure gets(), but they were made very, very slowly and
unobtrusively.

(I do think lpad should accept "any" FWIW) :-D

merlin




Re: [HACKERS] Microvacuum support for Hash Index

2017-01-05 Thread Jesper Pedersen

Hi Ashutosh,

On 01/04/2017 06:13 AM, Ashutosh Sharma wrote:

Attached is the v3 patch rebased on postgreSQL HEAD and WAL v7 patch.
It also takes care of all the previous comments from Jesper - [1].



With an --enable-cassert build (master / WAL v7 / MV v3) and

-- ddl.sql --
CREATE TABLE test AS SELECT generate_series(1, 10) AS id, 0 AS val;
CREATE INDEX IF NOT EXISTS idx_id ON test USING hash (id);
CREATE INDEX IF NOT EXISTS idx_val ON test USING hash (val);
ANALYZE;
-- ddl.sql --

-- test.sql --
\set id random(1,10)
\set val random(0,10)
BEGIN;
UPDATE test SET val = :val WHERE id = :id;
COMMIT;
-- test.sql --

using pgbench -M prepared -c 10 -j 10 -T 600 -f test.sql test

crashes after a few minutes with

TRAP: FailedAssertion("!(LWLockHeldByMeInMode(((LWLock*) 
(&(bufHdr)->content_lock)), LW_EXCLUSIVE))", File: "bufmgr.c", Line: 3781)


BTW, better rename 'hashkillitems' to '_hash_kill_items' to follow the 
naming convention in hash.h


Best regards,
 Jesper





Re: [HACKERS] ALTER SYSTEM for pg_hba.conf

2017-01-05 Thread Robert Haas
On Thu, Jan 5, 2017 at 12:28 PM, Stephen Frost  wrote:
> Generally speaking, an ALTER DATABASE is unlikely to make the cluster
> fail to start.  To be clear, I've only seen 1 or 2 cases and I'm not
> sure if, in those cases, they even fully understood how much can be
> changed through ALTER DATABASE or ALTER ROLE.

OK.

> My goal in those cases (and others where I come across installations
> with a lot of superusers) is typically to try and educate them as to
> just how close a superuser is to the unix user and recommend that they
> reconsider how they handle access privileges in the system (in
> particular, to try and get them to not have so many superusers and
> instead use other ways to give people access to what they need).

Makes sense.

> Of course, that tends to lead into things like "well, how do I make sure
> that user X has read rights on every table, always" or "how do I give
> someone the ability to terminate runaway queries that another user
> started."  We've made progress there, but there's more to do still.

I agree!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [sqlsmith] Crash reading pg_stat_activity

2017-01-05 Thread Robert Haas
On Wed, Jan 4, 2017 at 5:03 PM, Thomas Munro
 wrote:
> The way I proposed makes it a lot easier to work with dynamic names so
> you can differentiate variable numbers of areas; the names would have
> exactly the right extent and they'd get unregistered in each backend
> at just the right time.

Only from the perspective of the backend that's using DSA.  From the
perspective of some other backend reading pg_stat_activity, it's all
wrong.  There, you want the name to be registered as early as possible -
either at system startup time for builtin things or at module load
time for extensions.  With the way you have it, you'd only be able to
recognize a lock wait as being related to parallel_query_dsa if your
session had previously executed a parallel query.  That's clearly not
desirable.  With this approach, _PG_init can do LWLockRegisterTranche
and then if you stick the library into shared_preload_libraries or
session_preload_libraries *all* backends have that tranche whether
they use the library or not.  If the tranche registry were shared,
then your approach would be fine, but it isn't.
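To illustrate why registration must happen in every backend: the tranche
name table is process-local, so a name registered only in backend A does
not exist in backend B.  A standalone sketch of such a per-process
registry (constants and the fallback name are illustrative, loosely
modeled on LWLockRegisterTranche/GetLWLockIdentifier):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TRANCHES 16			/* illustrative bound */

/* Process-local table: each backend fills in its own copy. */
static const char *tranche_names[MAX_TRANCHES];

/* The registered string must stay valid for the backend's lifetime. */
static void
register_tranche(int tranche_id, const char *name)
{
	tranche_names[tranche_id] = name;
}

/* Fall back to a generic name when this process never registered the id. */
static const char *
tranche_name(int tranche_id)
{
	if (tranche_id < MAX_TRANCHES && tranche_names[tranche_id] != NULL)
		return tranche_names[tranche_id];
	return "extension";
}
```

Registering from _PG_init under shared_preload_libraries ensures every
backend runs the registration, so pg_stat_activity readers see the name.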

> On the other hand, I don't really like seeing
> lock tranche stuff leaking into other APIs like this, and using
> tranche IDs in any way other than allocating a small number of them at
> startup isn't really supported anyway, so +1 for doing it your way.

OK.

> At one stage I had an idea that that argument was naming the DSA area,
> not the lock tranche, and then the lock tranche happened to use a name
> that was built from that name, but that doesn't make any sense if it's
> optional depending on whether you already registered the lock
> tranche...
>
> - char lwlock_tranche_name[DSA_MAXLEN];
>
> You can remove the now-unused DSA_MAXLEN macro.

Ah, thanks.  Committed with that change.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] ALTER SYSTEM for pg_hba.conf

2017-01-05 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Jan 5, 2017 at 11:56 AM, Stephen Frost  wrote:
> >> One thing I'm kind of happy about is that, as far as I can see, there
> >> hasn't been much backlash against the existing ALTER SYSTEM, either
> >> from a security point of view or a user-confusion point of view.
> >
> > I've seen complaints about it and have seen people changing the
> > permissions to be root/root on the .auto.conf file to disallow 'regular'
> > superusers from doing ALTER SYSTEM.  It's not exactly elegant but it's a
> > way to avoid the risk of someone messing with the system config without
> > going through the CM system.
> 
> Hmm, OK.  They're not bothered by ALTER DATABASE the_one_everybody_uses?

Generally speaking, an ALTER DATABASE is unlikely to make the cluster
fail to start.  To be clear, I've only seen 1 or 2 cases and I'm not
sure if, in those cases, they even fully understood how much can be
changed through ALTER DATABASE or ALTER ROLE.

My goal in those cases (and others where I come across installations
with a lot of superusers) is typically to try and educate them as to
just how close a superuser is to the unix user and recommend that they
reconsider how they handle access privileges in the system (in
particular, to try and get them to not have so many superusers and
instead use other ways to give people access to what they need).

Of course, that tends to lead into things like "well, how do I make sure
that user X has read rights on every table, always" or "how do I give
someone the ability to terminate runaway queries that another user
started."  We've made progress there, but there's more to do still.

Thanks!

Stephen




Re: [HACKERS] generating fmgr prototypes automatically

2017-01-05 Thread Pavel Stehule
Hi

2017-01-04 21:09 GMT+01:00 Peter Eisentraut <
peter.eisentr...@2ndquadrant.com>:

> On 1/3/17 2:16 PM, Pavel Stehule wrote:
> > patch 0001 .. trivial cleaning
> > patch 0002 .. renaming lo_* to be_lo_* -- the prefix "be" is not what I
> > expect - maybe "pg" instead. More because the  be-fsstubs.h will be
> > holds only lo_read, lo_write on end
>
> It's the naming that the OpenSSL functions use, e.g., be_tls_init.
>
> > patch 0003 .. trivial, but doesn't mean so we have not regress tests for
> > these functions?
>
> OK, added tests.
>
> > patch 0004 .. bigger part - I miss a comments when there are a
> exceptions:
> >
> > extern Datum numeric_float8_no_overflow(PG_FUNCTION_ARGS);
> > extern Datum nextval(PG_FUNCTION_ARGS);
> > extern Datum fmgr_sql(PG_FUNCTION_ARGS);
> > extern Datum aggregate_dummy(PG_FUNCTION_ARGS);
>
> These functions and their special purpose are commented in the .c files,
> so I think that is covered OK.  I added an additional comment to the
> numeric_float8_no_overflow().
>
> > I found some obsolete definitions in c files
>
> OK, fixed those as well.  I think I initially only looked in .h files.
> That just underlines how inconsistent this is, e.g.,
>
> > pgstatfuncs.c
> >
> > /* bogus ... these externs should be in a header file */
>
> > namespace.c
> >
> > /* These don't really need to appear in any header file */
>
> New patch set attached.
>

I checked last set of patches and I didn't find any issue.

There are no problems with patching, compilation and all regress tests
passed.

I'll mark this patch as ready for committer.

Regards

Pavel





>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] Declarative partitioning - another take

2017-01-05 Thread Keith Fiske
Could we get some clarification on the partition_bound_spec portion of the
PARTITION OF clause? Just doing some testing it seems it's inclusive of the
FROM value but exclusive of the TO value. I don't see mention of this in
the docs as of commit 18fc5192a631441a73e6a3b911ecb14765140389 yesterday.
It does mention that the values aren't allowed to overlap, but looking at
the schema below, without the clarification of which side is
inclusive/exclusive it seems confusing because 2016-08-01 is in both. Even
the child table does not clarify this. Not sure if there's a way to do this
in the \d+ display which would be ideal, but it should at least be
mentioned in the docs.

keith@keith=# \d+ measurement
 Table "public.measurement"
  Column   |  Type   | Collation | Nullable | Default | Storage | Stats
target | Description
---+-+---+--+-+-+--+-
 logdate   | date|   | not null | | plain
|  |
 peaktemp  | integer |   |  | 1   | plain
|  |
 unitsales | integer |   |  | | plain
|  |
Partition key: RANGE (logdate)
Check constraints:
"measurement_peaktemp_check" CHECK (peaktemp > 0)
Partitions: measurement_y2016m07 FOR VALUES FROM ('2016-07-01') TO
('2016-08-01'),
measurement_y2016m08 FOR VALUES FROM ('2016-08-01') TO
('2016-09-01')

keith@keith=# \d+ measurement_y2016m07
 Table "public.measurement_y2016m07"
  Column   |  Type   | Collation | Nullable | Default | Storage | Stats
target | Description
---+-+---+--+-+-+--+-
 logdate   | date|   | not null | | plain
|  |
 peaktemp  | integer |   |  | 1   | plain
|  |
 unitsales | integer |   |  | 0   | plain
|  |
Partition of: measurement FOR VALUES FROM ('2016-07-01') TO ('2016-08-01')
Check constraints:
"measurement_peaktemp_check" CHECK (peaktemp > 0)


keith@keith=# insert into measurement (logdate) values ('2016-08-01');
INSERT 0 1
Time: 2.848 ms

keith@keith=# select * from measurement_y2016m07;
 logdate | peaktemp | unitsales
-+--+---
(0 rows)

Time: 0.273 ms
keith@keith=# select * from measurement_y2016m08;
  logdate   | peaktemp | unitsales
+--+---
 2016-08-01 |1 |«NULL»
(1 row)

Time: 0.272 ms

keith@keith=# drop table measurement_y2016m08;
DROP TABLE
Time: 5.919 ms
keith@keith=# select * from only measurement;
 logdate | peaktemp | unitsales
-+--+---
(0 rows)

Time: 0.307 ms
keith@keith=# insert into measurement (logdate) values ('2016-08-01');
ERROR:  no partition of relation "measurement" found for row
DETAIL:  Failing row contains (2016-08-01, 1, null).
Time: 0.622 ms


Re: [HACKERS] Replication/backup defaults

2017-01-05 Thread Andres Freund
On 2017-01-05 09:12:49 -0800, Andres Freund wrote:
> On 2017-01-05 18:08:36 +0100, Magnus Hagander wrote:
> > On Thu, Jan 5, 2017 at 6:01 PM, Andres Freund  wrote:
> > 
> > > On 2017-01-05 08:38:32 -0500, Peter Eisentraut wrote:
> > > > I also suggest making the defaults for both 20 instead of 10.  That
> > > > leaves enough room that almost nobody ever has to change them, whereas
> > > > 10 can be a bit tight for some not-outrageous installations (8 standbys
> > > > plus backup?).
> > >
> > > I'm afraid we need to add initdb integration / testing for those. I mean
> > > we have initdb test down to 10 connections to deal with limited
> > > resources...
> > >
> > 
> > If we make both 10 by default we should be OK though, no?
> 
> I'm a bit doubtful about that. On the other hand, we've increased
> max_parallel_workers without anybody complaining.

Err, I mean max_worker_processes.

Andres




Re: [HACKERS] Replication/backup defaults

2017-01-05 Thread Andres Freund
On 2017-01-05 18:08:36 +0100, Magnus Hagander wrote:
> On Thu, Jan 5, 2017 at 6:01 PM, Andres Freund  wrote:
> 
> > On 2017-01-05 08:38:32 -0500, Peter Eisentraut wrote:
> > > I also suggest making the defaults for both 20 instead of 10.  That
> > > leaves enough room that almost nobody ever has to change them, whereas
> > > 10 can be a bit tight for some not-outrageous installations (8 standbys
> > > plus backup?).
> >
> > I'm afraid we need to add initdb integration / testing for those. I mean
> > we have initdb test down to 10 connections to deal with limited
> > resources...
> >
> 
> If we make both 10 by default we should be OK though, no?

I'm a bit doubtful about that. On the other hand, we've increased
max_parallel_workers without anybody complaining.

Andres




Re: [HACKERS] ALTER SYSTEM for pg_hba.conf

2017-01-05 Thread Robert Haas
On Thu, Jan 5, 2017 at 11:56 AM, Stephen Frost  wrote:
> Greetings,
>
> If we keep it to superusers then we aren't changing anything, from my
> point of view at least.  That does bring up the question of if it'd be
> useful for a non-superuser to be able to control.  I'm on the fence
> about that at the moment.  Generally speaking, it's useful for
> non-superusers to be able to control access, but pg_hba is a bit special
> as it also controls the auth method and I'm not sure that is really
> something it makes sense for a non-superuser to hack around.
>
> However, the other bits that pg_hba allows (controlling access based on
> if it's an SSL connection, or based on the source IP) would be nice to
> provide alongside the 'CONNECT' GRANT privilege instead of only being
> able to do in pg_hba.
>
> In short, I'd rather we look at ways to minimize the need for users to
> interact with pg_hba.conf than make it easier to do.

That's an interesting point.

>> One thing I'm kind of happy about is that, as far as I can see, there
>> hasn't been much backlash against the existing ALTER SYSTEM, either
>> from a security point of view or a user-confusion point of view.
>
> I've seen complaints about it and have seen people changing the
> permissions to be root/root on the .auto.conf file to disallow 'regular'
> superusers from doing ALTER SYSTEM.  It's not exactly elegant but it's a
> way to avoid the risk of someone messing with the system config without
> going through the CM system.

Hmm, OK.  They're not bothered by ALTER DATABASE the_one_everybody_uses?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [BUGS][PATCH] BUG #14486: Inserting and selecting interval have different constraints

2017-01-05 Thread Tom Lane
Vitaly Burovoy  writes:
> On 1/5/17, Tom Lane  wrote:
>> We could think about replacing interval2tm's output format with some
>> other struct that uses a TimeOffset for hours and so cannot overflow.
>> I'm not sure though how far the effects would propagate; it might be
>> more work than we want to put into this.

> If values with overflow are already in a database, what do you expect
> a new output function should fix?

My point is that ideally, any value that can physically fit into struct
Interval ought to be considered valid.  The fact that interval_out can't
cope is a bug in interval_out, which ideally we would fix without
artificially restricting the range of the datatype.

Now, the problem with that of course is that it's not only interval_out
but multiple other places.  But your proposed patch also requires touching
nearly everything interval-related, so I'm not sure it has any advantage
that way over the less restrictive answer.

regards, tom lane




Re: [HACKERS] Replication/backup defaults

2017-01-05 Thread Magnus Hagander
On Thu, Jan 5, 2017 at 6:01 PM, Andres Freund  wrote:

> On 2017-01-05 08:38:32 -0500, Peter Eisentraut wrote:
> > I also suggest making the defaults for both 20 instead of 10.  That
> > leaves enough room that almost nobody ever has to change them, whereas
> > 10 can be a bit tight for some not-outrageous installations (8 standbys
> > plus backup?).
>
> I'm afraid we need to add initdb integration / testing for those. I mean
> we have initdb test down to 10 connections to deal with limited
> resources...
>

If we make both 10 by default we should be OK though, no?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Re: [BUGS][PATCH] BUG #14486: Inserting and selecting interval have different constraints

2017-01-05 Thread Vitaly Burovoy
On 1/5/17, Tom Lane  wrote:
> Vitaly Burovoy  writes:
>>> I've written a patch which fixes that bug (in attachment).
>>> Should it be registered in the CF?
>
>> Oops. Forgot to attach the patch. Fixed.
>
> I suspect that many of these SAMESIGN() tests you've added are not
> actually adequate/useful.  That's only sufficient when the output could be
> at most a factor of 2 out-of-range.  If it could overflow past the sign
> bit then you need to work harder.

I disagree. These SAMESIGN() tests cover addition, where the result
cannot be more than a factor of two out-of-range.
The multiplications are covered just before.
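To illustrate the point about addition: in two's-complement 64-bit arithmetic, the exact sum of two same-signed values overshoots the representable range by less than a factor of two, so a wrapped result flips sign, and a SAMESIGN check on the result is sufficient. A standalone sketch of the general idiom (this is not the actual patch code):

```c
#include <stdbool.h>
#include <stdint.h>

#define SAMESIGN(a, b)  (((a) < 0) == ((b) < 0))

/*
 * Wrapping addition with a SAMESIGN-style overflow check.  If the
 * addends differ in sign, the exact sum is always representable; if
 * they share a sign, an overflowed sum wraps to the opposite sign,
 * which SAMESIGN detects.  Multiplication needs a different guard,
 * since its result can wrap all the way past the sign bit.
 */
static bool
add_would_overflow(int64_t a, int64_t b, int64_t *sum)
{
    /* unsigned wraparound is well defined; cast back afterwards */
    uint64_t    wrapped = (uint64_t) a + (uint64_t) b;

    *sum = (int64_t) wrapped;
    return SAMESIGN(a, b) && !SAMESIGN(*sum, a);
}
```

(The cast of an out-of-range unsigned value back to int64_t is technically implementation-defined, but wraps on every platform PostgreSQL supports.)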

> (By the same token, the existing SAMESIGN test in interval2tm is
> wrong.)
>
> Possibly we should consider alternatives before plowing ahead in this
> direction, since adding guards to interval_in and interval computations
> doesn't help with oversize values that are already stored in a database.

We can do nothing about values that are already stored in a database.
But we can prevent them from appearing later.

> We could think about replacing interval2tm's output format with some
> other struct that uses a TimeOffset for hours and so cannot overflow.
> I'm not sure though how far the effects would propagate; it might be
> more work than we want to put into this.

If values with overflow are already in a database, what do you expect
a new output function should fix?

-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] merging some features from plpgsql2 project

2017-01-05 Thread Robert Haas
On Wed, Dec 28, 2016 at 2:25 PM, Jim Nasby  wrote:
> That's my whole point of why this needs to be settable at a global level: so
> that people with a lot of legacy code can set the OLD behavior at a global
> level, and deal with the old code over time.

This has the same problem being discussed nearby on the case-folding
thread, though: any extension or third-party tool has to either work
with every possible value, or else it has to require one particular
value and therefore not be usable if you need another value for some
other reason.

Now, that's not to say we should never break backward compatibility.
Sometimes we should.  I think the problem with PL/pgsql is that many
of the compatibility breaks that people want are likely to lead to
subtle misbehavior rather than outright failure, or are not easy to
spot via a cursory look ("hmm, could that SELECT query ever return
more than one row?").  Also, while everybody agrees that a bunch of
things should be changed and improved, not everybody agrees about
which ones, and sometimes person A desperately wants X changed while
person B desperately wants it changed in the other direction or left
alone.  If there were a set of changes that we could make all at once,
call the result plpgsql2 or nplpgsql or whatever, and make everybody
happy, that'd be fabulous, but we don't.  So we're left with doing
nothing, or having 2^n language variants controlled by GUCs or
pragmas, neither of which is appealing.

I think it would be a good idea to lock all the people who really care
about PL/pgsql in a room until they agree on what changes should be
made for the next version of the language.  If they don't agree
quickly enough, we can resort to the techniques described in
https://en.wikipedia.org/wiki/Papal_election,_1268%E2%80%9371

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Replication/backup defaults

2017-01-05 Thread Andres Freund
On 2017-01-05 08:38:32 -0500, Peter Eisentraut wrote:
> I also suggest making the defaults for both 20 instead of 10.  That
> leaves enough room that almost nobody ever has to change them, whereas
> 10 can be a bit tight for some not-outrageous installations (8 standbys
> plus backup?).

I'm afraid we need to add initdb integration / testing for those. I mean
we have initdb test down to 10 connections to deal with limited
resources...

Andres




Re: [HACKERS] ALTER SYSTEM for pg_hba.conf

2017-01-05 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Wed, Jan 4, 2017 at 3:30 PM, Tom Lane  wrote:
> > Simon Riggs  writes:
> >> My next thought is ALTER SYSTEM support for pg_hba.conf, especially
> >> since that would make it easier to do a formal test of Haribabu's
> >> pg_hba view patch by adding each of the options one by one and then
> >> juggling them.
> >
> > It's quite unclear from this spec what you have in mind to control the
> > entry order.  Also, I'd personally be -1 on inventing a pile of new SQL
> > keywords for this.  Why not do it with a function, instead?  Or for extra
> > credit, finish the pg_hba view work first and then make it an updatable
> > view.
> >
> >> and we can then have a nice simple
> >> ALTER SYSTEM ENABLE REMOTE ACCESS FOR REPLICATION USING md5;
> >
> > I am minus a lot more than 1 on inventing a new SQL statement every time
> > somebody thinks of a new way in which they'd like to frob pg_hba.conf.
> 
> Yeah.  I don't think that the idea of having SQL syntax to manipulate
> pg_hba.conf is a terrible one, but it'd probably require some thought
> to figure out exactly how to do it nicely - i.e. easy-to-use and not
> too many new keywords.  

Agreed.

> There's also the question of whether opening
> up the ability to do this sort of thing from the SQL level is a
> security hazard, but we've already gone fairly far down the path of
> assuming that there's not a tremendous amount of privilege separation
> between the operating system user account and the database superuser,
> so maybe the answer is that as things stand it's not expanding the
> vulnerability surface very much.

If we keep it to superusers then we aren't changing anything, from my
point of view at least.  That does bring up the question of if it'd be
useful for a non-superuser to be able to control.  I'm on the fence
about that at the moment.  Generally speaking, it's useful for
non-superusers to be able to control access, but pg_hba is a bit special
as it also controls the auth method and I'm not sure that is really
something it makes sense for a non-superuser to hack around.

However, the other bits that pg_hba allows (controlling access based on
if it's an SSL connection, or based on the source IP) would be nice to
provide alongside the 'CONNECT' GRANT privilege instead of only being
able to do in pg_hba.

In short, I'd rather we look at ways to minimize the need for users to
interact with pg_hba.conf than make it easier to do.

> One thing I'm kind of happy about is that, as far as I can see, there
> hasn't been much backlash against the existing ALTER SYSTEM, either
> from a security point of view or a user-confusion point of view.

I've seen complaints about it and have seen people changing the
permissions to be root/root on the .auto.conf file to disallow 'regular'
superusers from doing ALTER SYSTEM.  It's not exactly elegant but it's a
way to avoid the risk of someone messing with the system config without
going through the CM system.

> We
> (collectively) spent a lot of time worrying about that, and AFAICS it
> hasn't really been the case.  Now, I am not sure how many people are
> using it vs. other methods of setting cluster-wide configuration
> parameters, and there have been a handful of bug reports, but
> basically nobody's come back and said that they had a terrible,
> horrible, no-good, very-bad day as a result of it, which was a concern
> at the time.  So maybe the experience with a new variant would be
> similarly good.

I've not seen any reports of someone having a terrible day because of
it, yet.

Thanks!

Stephen




Re: [HACKERS] [PATCH] Reload SSL certificates on SIGHUP

2017-01-05 Thread Tom Lane
Robert Haas  writes:
> Of course, if there's some sort of commonly-used library out there for
> this sort of thing where we can just link against it and call whatever
> APIs it exposes, that might be a better alternative, or something to
> support in addition, but I don't really know whether there's any
> standardization in this area.

I was wondering if we could make use of ssh-agent.  But it seems to want
to hold the private key itself, so that you have to communicate with it
every time you need an operation done with the key.  I'm not sure what the
performance of that is like, and I am sure that the code would look a
whole lot different from the path where we hold the key locally.  It might
be workable if OpenSSL already incorporates library routines for talking
to ssh-agent, but I haven't looked.

regards, tom lane




Re: [HACKERS] Re: [BUGS][PATCH] BUG #14486: Inserting and selecting interval have different constraints

2017-01-05 Thread Tom Lane
Vitaly Burovoy  writes:
>> I've written a patch which fixes that bug (in attachment).
>> Should it be registered in the CF?

> Oops. Forgot to attach the patch. Fixed.

I suspect that many of these SAMESIGN() tests you've added are not
actually adequate/useful.  That's only sufficient when the output could be
at most a factor of 2 out-of-range.  If it could overflow past the sign
bit then you need to work harder.

(By the same token, the existing SAMESIGN test in interval2tm is
wrong.)

Possibly we should consider alternatives before plowing ahead in this
direction, since adding guards to interval_in and interval computations
doesn't help with oversize values that are already stored in a database.
We could think about replacing interval2tm's output format with some
other struct that uses a TimeOffset for hours and so cannot overflow.
I'm not sure though how far the effects would propagate; it might be
more work than we want to put into this.

regards, tom lane




Re: [HACKERS] Replication/backup defaults

2017-01-05 Thread Stephen Frost
Tomas,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> On 01/05/2017 02:23 PM, Magnus Hagander wrote:
> >It's easy enough to construct a benchmark specifically to show the
> >difference, but of any actual "normal workload" for it. Typically the
> >optimization applies to things like bulk loading, which typically never
> >done alone and does not lend itself to that type of benchmarking very
> >easily.
> 
> Not sure if I understand correctly what you're saying. You're saying
> that although it'd be easy to construct a benchmark showing
> significant performance impact, it won't represent a common
> workload. Correct?

I think he's saying that it's not very easy to construct a good example
of typical bulk-loading workloads using just pgbench.  Bulk loading
certainly happens with PG and I don't think we'll make very many friends
if we break optimizations when wal_level is set to minimal like those
you get using:

BEGIN;
CREATE TABLE x (c1 int);
COPY x FROM STDIN;
COMMIT;

or:

BEGIN;
TRUNCATE x;
COPY x FROM STDIN;
COMMIT;

Changing the wal_level from 'minimal' to 'replica' or 'logical' with
such a benchmark is going to make the WAL go from next-to-nothing to
size-of-database.  One doesn't typically *just* do bulk loads, however,
often it's a bulk load into a table and then the contents of that table
are merged with another table or perhaps joined to it to produce some
report or something along those lines.  In many of those cases, our
more-recently added capability to have UNLOGGED tables will work, but
not all (in particular, it can be very handy to load everything in using
the above technique and then switch the wal_level to replica, which
avoids having to have the bulk of the data sent through WAL, something
you can't avoid if you want to turn an unlogged table into a logged
one).

Thanks!

Stephen




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-01-05 Thread Vladimir Rusinov
Attaching a patch that renames all 'xlog' functions, keeping aliases for
old ones (since it looks like majority vote is for keeping them).

Following functions have been renamed:

Name                            | Replaced by
--------------------------------+--------------------------------
pg_current_xlog_flush_location  | pg_current_wal_flush_location
pg_current_xlog_insert_location | pg_current_wal_insert_location
pg_current_xlog_location        | pg_current_wal_location
pg_is_xlog_replay_paused        | pg_is_recovery_paused
pg_last_xlog_receive_location   | pg_last_wal_receive_location
pg_last_xlog_replay_location    | pg_last_wal_replay_location
pg_switch_xlog                  | pg_switch_wal
pg_xlog_location_diff           | pg_wal_location_diff
pg_xlog_replay_pause            | pg_pause_recovery
pg_xlog_replay_resume           | pg_resume_recovery
pg_xlogfile_name                | pg_wal_file_name
pg_xlogfile_name_offset         | pg_wal_file_name_offset


Questions/possible follow-up work:

- OIDs - where do I get numbers from? I was kinda choosing them at random,
unaware if there is some process for keeping track of them. Please point me
if such thing exists and I'll change them.
- Documentation. I've added a new section with a simple table, keeping it
somewhat neglected on purpose. We'll have old names in index and searchable
on page, but nothing more. Is it sufficient?
- New function names. I've used 'recovery' instead of 'xlog_replay' and
used 'wal_file' instead of 'xlogfile'. Does it make sense?
- Release notes. I was unable to find a draft for 10.0. How do I make sure
these renames are not forgotten?

-- 
Vladimir Rusinov
Storage SRE, Google Ireland

Google Ireland Ltd.,Gordon House, Barrow Street, Dublin 4, Ireland
Registered in Dublin, Ireland
Registration Number: 368047
From f7353e84a48eb6857a3f3acf6b6fe76675e11587 Mon Sep 17 00:00:00 2001
From: Vladimir Rusinov 
Date: Tue, 3 Jan 2017 12:17:26 +
Subject: [PATCH 1/1] Remove 'xlog' references from admin functions.

After 'pg_xlog' has been renamed to 'pg_wal' 'xlog' reference in function names is confusing.
This change renames 'xlog' in function names to 'wal', keeping old
function names as aliases.

Following functions have been renamed:

Name                            | Replaced by
--------------------------------+--------------------------------
pg_current_xlog_flush_location  | pg_current_wal_flush_location
pg_current_xlog_insert_location | pg_current_wal_insert_location
pg_current_xlog_location        | pg_current_wal_location
pg_is_xlog_replay_paused        | pg_is_recovery_paused
pg_last_xlog_receive_location   | pg_last_wal_receive_location
pg_last_xlog_replay_location    | pg_last_wal_replay_location
pg_switch_xlog                  | pg_switch_wal
pg_xlog_location_diff           | pg_wal_location_diff
pg_xlog_replay_pause            | pg_pause_recovery
pg_xlog_replay_resume           | pg_resume_recovery
pg_xlogfile_name                | pg_wal_file_name
pg_xlogfile_name_offset         | pg_wal_file_name_offset

Signed-off-by: Vladimir Rusinov 
---
 contrib/bloom/t/001_wal.pl |   2 +-
 contrib/test_decoding/expected/ddl.out |   2 +-
 contrib/test_decoding/sql/ddl.sql  |   2 +-
 doc/src/sgml/backup.sgml   |   6 +-
 doc/src/sgml/func.sgml | 225 +
 doc/src/sgml/high-availability.sgml|  12 +-
 doc/src/sgml/recovery-config.sgml  |   2 +-
 src/backend/access/transam/recovery.conf.sample|   7 +-
 src/backend/access/transam/xlog.c  |   2 +-
 src/backend/access/transam/xlogfuncs.c |  43 ++--
 src/backend/catalog/system_views.sql   |   3 +
 src/backend/po/de.po   |  12 +-
 src/backend/po/es.po   |  12 +-
 src/backend/po/fr.po   |  12 +-
 src/backend/po/id.po   |  12 +-
 src/backend/po/it.po   |  12 +-
 src/backend/po/ja.po   |  12 +-
 src/backend/po/pl.po   |  12 +-
 src/backend/po/pt_BR.po|  12 +-
 src/backend/po/ru.po   |  12 +-
 src/backend/po/zh_CN.po|  12 +-
 src/bin/pg_rewind/RewindTest.pm|   2 +-
 src/bin/pg_rewind/libpq_fetch.c|   4 +-
 src/include/access/xlog_fn.h   |  24 +--
 src/include/catalog/pg_proc.h  |  92 ++---
 src/test/modules/commit_ts/t/002_standby.pl|   8 +-
 src/test/modules/commit_ts/t/003_standby_2.pl  |   4 +-
 src/test/recovery/t/002_archiving.pl   |   6 +-
 src/test/recovery/t/003_recovery_targets.pl|  16 +-
 

Re: [HACKERS] ALTER SYSTEM for pg_hba.conf

2017-01-05 Thread Robert Haas
On Wed, Jan 4, 2017 at 3:30 PM, Tom Lane  wrote:
> Simon Riggs  writes:
>> My next thought is ALTER SYSTEM support for pg_hba.conf, especially
>> since that would make it easier to do a formal test of Haribabu's
>> pg_hba view patch by adding each of the options one by one and then
>> juggling them.
>
> It's quite unclear from this spec what you have in mind to control the
> entry order.  Also, I'd personally be -1 on inventing a pile of new SQL
> keywords for this.  Why not do it with a function, instead?  Or for extra
> credit, finish the pg_hba view work first and then make it an updatable
> view.
>
>> and we can then have a nice simple
>> ALTER SYSTEM ENABLE REMOTE ACCESS FOR REPLICATION USING md5;
>
> I am minus a lot more than 1 on inventing a new SQL statement every time
> somebody thinks of a new way in which they'd like to frob pg_hba.conf.

Yeah.  I don't think that the idea of having SQL syntax to manipulate
pg_hba.conf is a terrible one, but it'd probably require some thought
to figure out exactly how to do it nicely - i.e. easy-to-use and not
too many new keywords.  There's also the question of whether opening
up the ability to do this sort of thing from the SQL level is a
security hazard, but we've already gone fairly far down the path of
assuming that there's not a tremendous amount of privilege separation
between the operating system user account and the database superuser,
so maybe the answer is that as things stand it's not expanding the
vulnerability surface very much.

One thing I'm kind of happy about is that, as far as I can see, there
hasn't been much backlash against the existing ALTER SYSTEM, either
from a security point of view or a user-confusion point of view.  We
(collectively) spent a lot of time worrying about that, and AFAICS it
hasn't really been the case.  Now, I am not sure how many people are
using it vs. other methods of setting cluster-wide configuration
parameters, and there have been a handful of bug reports, but
basically nobody's come back and said that they had a terrible,
horrible, no-good, very-bad day as a result of it, which was a concern
at the time.  So maybe the experience with a new variant would be
similarly good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Replication/backup defaults

2017-01-05 Thread Tomas Vondra

On 01/05/2017 02:23 PM, Magnus Hagander wrote:



On Thu, Jan 5, 2017 at 12:44 AM, Tomas Vondra wrote:

On 01/03/2017 11:56 PM, Tomas Vondra wrote:

Hi,

...

I'll push results for larger ones once those tests complete
(possibly tomorrow).


I just pushed additional results (from the additional scales) to the
git repositories. On the larger (16/32-cores) machine with 2x
e5-2620, the results look like this

   scale   minimal   replica   logical
  ---------------------------------------
   100       23968     24393     24393
   1000      23412     23656     23794
   10000      5283      5320      5197

and on the smaller one (i5-2500k with 4 cores) I got this:

   scale   minimal   replica   logical
  ---------------------------------------
   50         5884      5896      5873
   400        5324      5342      5478
   1000       5341      5439      5425

The scales were chosen so that the smallest one fits into shared
buffers, the medium exceeds shared buffers but still fits into RAM,
and the largest scale exceeds RAM.

The results seem to confirm that for this workload (regular
pgbench), there's very little difference between the different WAL
levels. Actually, the 'replica' seems a tad faster than 'minimal',
but the difference may be easily due to noise.

I've also looked at the amount of WAL actually produced, by doing
pgbench runs throttled to the same throughput, and counting the
number of archived WAL segments & running pg_xlogdump. Interestingly
enough, those two metrics differ quite a bit - for example for scale
1000 (on the 32-core machine), the 2h runs produced these number of
WAL segments:

   minimal: 5515 (88.2GB)
   replica: 5587 (89.4GB)
   logical: 6058 (96.9GB)

so 'replica' adds ~1.3% and 'logical' ~9.8%. But per pg_xlogdump,
the WAL amounts are only 73.3GB, 73.9GB and 74.4GB - a difference of
only ~1.5% between minimal and logical. The values are also much
lower than raw WAL size, so I assume it's because pg_xlogdump
ignores some extra overhead, present in the segments. Moreover, the
sequential nature of WAL writes means even the +10% is not a big
deal (unless it results in saturating the bandwidth, but running on
>90% is a bad idea anyway).


If you are using log archiving, it also means your log archive grows by
10% (well, 8% assuming it was 9.8% on top of 0, not on top of replica).



... and that the standby has to chew through the additional 10% of WAL. 
We already have standbys that occasionally struggle to keep up with the 
master, and adding more load won't make them happy (even if just 10%).





My conclusion from these results is that using 'wal_level=replica'
by default seems fine. Perhaps even wal_level=logical would be OK,
but that's probably a too big step for 10.0.



I think it sounds like 'replica' is the safe default.

If we can make it possible to go replica<->logical without a restart,
that makes it easy enough to increase it if necessary, and the default
still applies to most people (most people take backups, most people
probably don't do logical replication).



My thoughts, exactly.




Any ideas how to construct a plausible workload where the
differences are significantly larger? Running the tests on non-SSD
storage might also be useful.


It's easy enough to construct a benchmark specifically to show the
difference, but I don't know of any actual "normal workload" for it.
Typically the optimization applies to things like bulk loading, which is
typically never done alone and does not lend itself to that type of
benchmarking very easily.



Not sure if I understand correctly what you're saying. You're saying 
that although it'd be easy to construct a benchmark showing significant 
performance impact, it won't represent a common workload. Correct?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PATCH] Reload SSL certificates on SIGHUP

2017-01-05 Thread Robert Haas
On Wed, Jan 4, 2017 at 11:49 AM, Stephen Frost  wrote:
>> systemd has support for getting passwords to services without tty.
>
> Oh, that's interesting, I wasn't aware of that.
>
>> So if someone is interested, there is some room for enhancement here.
>
> Agreed.

The first thing that pops into my head is that we could add a GUC
ssl_cert_passphrase_command whose contents get executed as a shell
command when we need a passphrase; that program is expected to emit
the passphrase and nothing else on standard output and then exit(0).
Blah blah logging blah blah failure handling.  That's not trivial to
implement if you want the postmaster to still be responsive while the
command is running, but I think it could be done.  (I'm not
volunteering.)

Of course, if there's some sort of commonly-used library out there for
this sort of thing where we can just link against it and call whatever
APIs it exposes, that might be a better alternative, or something to
support in addition, but I don't really know whether there's any
standardization in this area.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Re: [BUGS][PATCH] BUG #14486: Inserting and selecting interval have different constraints

2017-01-05 Thread Vitaly Burovoy
On 1/5/17, Vitaly Burovoy  wrote:
> On 1/4/17, Pantelis Theodosiou  wrote:
>> On Wed, Jan 4, 2017 at 3:03 PM,  wrote:
>>
>>> The following bug has been logged on the website:
>>>
>>> Bug reference:  14486
>>> Logged by:  Per Modin
>>> Email address:  web+postgre...@modin.io
>>> PostgreSQL version: 9.6.1
>>> Operating system:   Linux 303a92173594 4.8.15-moby #1 SMP Sat Dec 17 0
>>> Description:
>>>
>>> Found this bug in 9.4.8, tried it in docker towards psql 9.6.1 and it's
>>> in
>>> there as well. A minimum working example would be as follows:
>>>
>>> ```
>>> postgres=# CREATE TABLE tbl AS SELECT 9223372036854 * interval '1
>>> second'
>>> col; TABLE tbl;
>>> SELECT 1
>>> ERROR:  interval out of range
>>> ```
>>>
>>> ```
>>> postgres=# SELECT count(*) FROM tbl;
>>>  count
>>> ---
>>>  1
>>> (1 row)
>>> ```
>>>
>>> It seems that inserting and retrieving data have different constraints.
>>> As
>>> you
>>> can see from query 2, the data still gets inserted.
>>>
>>> ```
>>> postgres=# select version();
>>>  version
>>> 
>>> --
>>>  PostgreSQL 9.6.1 on x86_64-pc-linux-gnu, compiled by gcc (Debian
>>> 4.9.2-10)
>>> 4.9.2, 64-bit
>>> (1 row)
>>> ```
>>>
>>> Best regards,
>>> Per Modin
>>>
>>>
>> And these work without error:
>>
>> postgres=# select col - 9223372036854 * interval '1 second' from tbl ;
>>  ?column?
>> --
>>  00:00:00
>> (1 row)
>>
>> postgres=# select col from xx where col < interval '10 year' ;
>>  col
>> -
>> (0 rows)
>>
>
> Yes, it is a bug, but it is not a constraint; it is just different
> internal checks.
> Moreover, even if a function does not raise an error, the output can be
> wrong (pay attention to the duplicated '-' sign):
> postgres=# SELECT '-2147483647:59:59'::interval - '1s'::interval;
>   ?column?
> 
>  --2147483648:00:00
> (1 row)
>
> I've written a patch which fixes that bug (in attachment).
> Should it be registered in the CF?

Oops. Forgot to attach the patch. Fixed.

-- 
Best regards,
Vitaly Burovoy


0001-Add-check-for-overflow-to-interval-functions.patch
Description: Binary data



[HACKERS] Re: [BUGS][PATCH] BUG #14486: Inserting and selecting interval have different constraints

2017-01-05 Thread Vitaly Burovoy
On 1/4/17, Pantelis Theodosiou  wrote:
> On Wed, Jan 4, 2017 at 3:03 PM,  wrote:
>
>> The following bug has been logged on the website:
>>
>> Bug reference:  14486
>> Logged by:  Per Modin
>> Email address:  web+postgre...@modin.io
>> PostgreSQL version: 9.6.1
>> Operating system:   Linux 303a92173594 4.8.15-moby #1 SMP Sat Dec 17 0
>> Description:
>>
>> Found this bug in 9.4.8, tried it in docker towards psql 9.6.1 and it's
>> in
>> there as well. A minimum working example would be as follows:
>>
>> ```
>> postgres=# CREATE TABLE tbl AS SELECT 9223372036854 * interval '1 second'
>> col; TABLE tbl;
>> SELECT 1
>> ERROR:  interval out of range
>> ```
>>
>> ```
>> postgres=# SELECT count(*) FROM tbl;
>>  count
>> ---
>>  1
>> (1 row)
>> ```
>>
>> It seems that inserting and retrieving data have different constraints.
>> As
>> you
>> can see from query 2, the data still gets inserted.
>>
>> ```
>> postgres=# select version();
>>  version
>> 
>> --
>>  PostgreSQL 9.6.1 on x86_64-pc-linux-gnu, compiled by gcc (Debian
>> 4.9.2-10)
>> 4.9.2, 64-bit
>> (1 row)
>> ```
>>
>> Best regards,
>> Per Modin
>>
>>
> And these work without error:
>
> postgres=# select col - 9223372036854 * interval '1 second' from tbl ;
>  ?column?
> --
>  00:00:00
> (1 row)
>
> postgres=# select col from xx where col < interval '10 year' ;
>  col
> -
> (0 rows)
>

Yes, it is a bug, but it is not a constraint; it is just different
internal checks.
Moreover, even if a function does not raise an error, the output can be
wrong (pay attention to the duplicated '-' sign):
postgres=# SELECT '-2147483647:59:59'::interval - '1s'::interval;
  ?column?

 --2147483648:00:00
(1 row)

I've written a patch which fixes that bug (in attachment).
Should it be registered in the CF?
-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] UNDO and in-place update

2017-01-05 Thread Robert Haas
On Thu, Jan 5, 2017 at 6:51 AM, Amit Kapila  wrote:
> UNDO has to be kept till heap page is marked as all visible.  This is
> required to check the visibility of index.  Now, I think the page can
> be marked as all visible when we removed corresponding dead entries in
> heap.   I think the main point of discussion here is how vacuum will
> clean heap and index entries in this new system.  Another idea could
> be that we allow undo entries to be removed (we can allow heap entry
> to be removed) based on if they are visible or not and then while
> traversing index we cross check in heap if the entry can be removed.
> Basically,  check the TID and traverse undo contents if any to verify
> if the index entry can be removed.  I think this sounds to be more
> complicated and less efficient than what I suggested earlier.

In my proposal, when a tuple gets updated or deleted in the heap, we
go find the corresponding index entries and delete-mark them.  When an
index tuple is delete-marked, we have to cross-check it against the
heap, because the index tuple might not match the version of the heap
tuple that our snapshot can see.  Therefore, it's OK to discard the
heap page's UNDO before cleaning the indexes, because the index tuples
are already delete-marked and rechecks will therefore do the right
thing.

In short, I agree with Dilip.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] UNDO and in-place update

2017-01-05 Thread Robert Haas
On Wed, Jan 4, 2017 at 6:05 AM, Amit Kapila  wrote:
> Okay, so this optimization can work only after all the active
> transactions operating on a page are finished.  If that is true, in
> some cases such a design can consume a lot of CPU traversing all the
> tuples in a page for un-setting the bit, especially when such tuples
> are few.

I suppose.  I didn't think that cost was likely to be big enough to
worry about, but I might be wrong.  The worst case would be when you
modify one tuple on a page, let the transaction that did the
modification become all-visible, modify one tuple on the page again,
etc. and, at the same time, the page is entirely full of tuples.  So
you keep having to loop over all the bits to clear them (they are all
clear except one, but you don't know that) and then re-set just one of
them.  That's not free, but keep in mind that the existing system
would be forced to perform non-HOT updates in that situation, which
isn't free either.

Also, I'm thinking the bit could be stored in the line pointer rather
than the tuple, because with this design we don't need
LP_UNUSED/LP_NORMAL/LP_REDIRECT/LP_DEAD any more.  We could use one
bit to indicate dead or not-dead and the second bit to indicate
recently-modified or not-recently-modified.  With that approach,
clearing the bits only requires iterating over the line pointer array,
not the tuples themselves.

>> We don't necessarily need UNDO to clean up the indexes, although it
>> might be a good idea.  It would *work* to just periodically scan the
>> index for delete-marked items.  For each one, visit the heap and see
>> if there's a version of that tuple present in the heap or current UNDO
>> that matches the index entry.  If not, remove the index entry.
>
> I think this is somewhat similar to how we clean the index now and
> seems to be a workable design.  However, don't you think that it will
> have somewhat similar characteristics for index-bloat as we have now?

Yes, it would have similar characteristics.  Thus we might want to do better.

> OTOH for heap, it will help us to take away old versions away from the
> main heap, but still, we can't get rid of that space or reuse that
> space till we can clean corresponding index entries.

I don't think that's true.  If in-place update is ever allowed in
cases where indexed columns have been modified, then the index already
has to cope with the possibility that the heap tuple it can see
doesn't match the index.  And if it can cope with that, then why do we
have to remove the index entry before reusing the heap TID?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] increasing the default WAL segment size

2017-01-05 Thread Robert Haas
On Thu, Jan 5, 2017 at 6:39 AM, Beena Emerson  wrote:
> This patch only needed the wal_segment_size and hence I made this specific
> command.
> How often and why would we need other parameter values in the replication
> connection?
> Making it a more general command to fetch any parameter can be a separate
> topic. If it gets consensus, maybe it could be done and used here.

I think the idea of supporting SHOW here is better than adding a
special-purpose command just for the WAL size.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] generating fmgr prototypes automatically

2017-01-05 Thread Pavel Stehule
2017-01-04 22:17 GMT+01:00 Peter Eisentraut <peter.eisentr...@2ndquadrant.com>:

> On 1/4/17 3:35 PM, Pavel Stehule wrote:
> > On 1/3/17 2:16 PM, Pavel Stehule wrote:
> > > patch 0001 .. trivial cleaning
> > > patch 0002 .. renaming lo_* to be_lo_* -- the prefix "be" is not
> what I
> > > expect - maybe "pg" instead. More because the  be-fsstubs.h will be
> > > holds only lo_read, lo_write on end
> >
> > It's the naming that the OpenSSL functions use, e.g., be_tls_init.
> >
> >
> > lo_* functions are used by OpenSSL ?
>
> There are tls functions that exist in similar ways in the backend and in
> libpq, and the backend versions are called be_*.
>

The tls functions are not available via the V1 API. It still seems strange to me.

Regards

Pavel



>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] proposal: session server side variables

2017-01-05 Thread Fabien COELHO



Good. So we seem to agree that GUCs are transactional?


I'm surprised, I never knew this.


I must admit that it was also a (good) surprise for me.

The documentation says it:

"""
If SET (or equivalently SET SESSION) is issued within a transaction that 
is later aborted, the effects of the SET command disappear when the 
transaction is rolled back. Once the surrounding transaction is committed, 
the effects will persist until the end of the session, unless overridden 
by another SET.

"""

But I have not found anything clear about user-defined parameters.

--
Fabien.




Re: pg_authid.rolpassword format (was Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol)

2017-01-05 Thread Michael Paquier
On Thu, Jan 5, 2017 at 10:31 PM, Peter Eisentraut
 wrote:
> On 1/3/17 9:09 AM, Heikki Linnakangas wrote:
>> Since not everyone agrees with this approach, I split this patch into
>> two. The first patch refactors things, replacing the isMD5() function
>> with get_password_type(), without changing the representation of
>> pg_authid.rolpassword. That is hopefully uncontroversial. And the second
>> patch adds the "plain:" prefix, which not everyone agrees on.
>>
>> Barring objections I'm going to at least commit the first patch. I think
>> we should commit the second one too, but it's not as critical, and the
>> first patch matters more for the SCRAM patch, too.
>
> Is there currently anything to review here for the commit fest?

The patches sent here make sense as part of the SCRAM set:
https://www.postgresql.org/message-id/6831df67-7641-1a66-4985-268609a48...@iki.fi
I was just waiting for Heikki to fix the split of the patches before
moving on with an extra lookup though.
-- 
Michael




Re: [HACKERS] ALTER TABLE .. ALTER COLUMN .. ERROR: attribute .. has wrong type

2017-01-05 Thread Alvaro Herrera
Tom Lane wrote:

> We could probably fix the specific issue being seen here by passing the
> expression tree through a suitable attno remapping,

Here's a first attempt at fixing this.  It makes the test pass, but I
have the feeling that more complex ones might need more work.  Have to
leave for a bit now.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index e70d752..75fd45a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -7999,12 +7999,96 @@ ATPrepAlterColumnType(List **wqueue,
 	ReleaseSysCache(tuple);
 
 	/*
-	 * The recursion case is handled by ATSimpleRecursion.  However, if we are
-	 * told not to recurse, there had better not be any child tables; else the
-	 * alter would put them out of step.
+	 * Recurse manually, if necessary.  We cannot apply ATSimpleRecursion here
+	 * because we need to remap attribute numbers for each child.
+	 *
+	 * If we are told not to recurse, there had better not be any child
+	 * tables; else the alter would put them out of step.
 	 */
 	if (recurse)
-		ATSimpleRecursion(wqueue, rel, cmd, recurse, lockmode);
+	{
+		Oid			relid = RelationGetRelid(rel);
+		ListCell   *child;
+		List	   *children;
+
+		children = find_all_inheritors(relid, lockmode, NULL);
+
+		/*
+		 * find_all_inheritors does the recursive search of the inheritance
+		 * hierarchy, so all we have to do is process all of the relids in the
+		 * list that it returns.
+		 */
+		foreach(child, children)
+		{
+			Oid			childrelid = lfirst_oid(child);
+			Relation	childrel;
+			AttrNumber *attmap;
+			AttrNumber	parent_attno;
+			bool		found_whole_row;
+			TupleDesc	parentDesc;
+			TupleDesc	childDesc;
+
+			if (childrelid == relid)
+				continue;
+
+			/* find_all_inheritors already got lock */
+			childrel = relation_open(childrelid, NoLock);
+			CheckTableNotInUse(childrel, "ALTER TABLE");
+
+			/*
+			 * Build an attribute map for map_variable_attnos.  This is O(N^2)
+			 * on the number of attributes ...
+			 */
+			parentDesc = RelationGetDescr(rel);
+			childDesc = RelationGetDescr(childrel);
+			attmap = (AttrNumber *) palloc0(sizeof(AttrNumber) *
+											parentDesc->natts);
+			for (parent_attno = 1;
+				 parent_attno <= parentDesc->natts;
+				 parent_attno++)
+			{
+				bool		found = false;
+				AttrNumber	child_attno;
+
+				for (child_attno = 1;
+					 child_attno <= childDesc->natts;
+					 child_attno++)
+				{
+					if (strncmp(NameStr(parentDesc->attrs[parent_attno - 1]->attname),
+								NameStr(childDesc->attrs[child_attno - 1]->attname),
+								NAMEDATALEN) == 0)
+					{
+						attmap[parent_attno - 1] = child_attno;
+						found = true;
+						break;
+					}
+				}
+
+				/* should not happen */
+				if (!found)
+					elog(ERROR, "column \"%s\" not found in child table \"%s\"",
+						 NameStr(parentDesc->attrs[parent_attno - 1]->attname),
+						 RelationGetRelationName(childrel));
+			}
+
+			/*
+			 * Queue a command for this child, with remapped attnums.  Note
+			 * that ATPrepCmd creates a copy, so there's no need to do that
+			 * here.  XXX what about the entry for the parent table?
+*/

Re: [HACKERS] Replication/backup defaults

2017-01-05 Thread Peter Eisentraut
On 1/4/17 2:44 PM, Peter Eisentraut wrote:
> On 1/4/17 9:46 AM, Magnus Hagander wrote:
>> How about we default max_replication_slots to -1, which means to use the
>> same value as max_wal_senders?
> 
>> But you don't necessarily want to adjust them together, do you? They are
>> both capped by max_connections, but I don't think they have any other
>> direct relation between each other? 
> 
> I think the most usual case is that you use approximately one
> replication slot per wal sender slot.  So it would be a good default to
> make them equal.

Well, let's worry about that later.

I think everyone is now in agreement with your original change proposal.

My suggestion would be to make the defaults of max_wal_senders and
max_replication_slots the same, so we don't open an opportunity for
ongoing discussions about why they are different and how different they
should be.  Ideally, we would make max_replication_slots go away at some
point similar to my suggestion above.

I also suggest making the defaults for both 20 instead of 10.  That
leaves enough room that almost nobody ever has to change them, whereas
10 can be a bit tight for some not-outrageous installations (8 standbys
plus backup?).

Your patch looks OK, documentation changes look good.  I wouldn't be
surprised if we found another place or two that will need updating, but
that is not a big deal.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pg_authid.rolpassword format (was Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol)

2017-01-05 Thread Peter Eisentraut
On 1/3/17 9:09 AM, Heikki Linnakangas wrote:
> Since not everyone agrees with this approach, I split this patch into 
> two. The first patch refactors things, replacing the isMD5() function 
> with get_password_type(), without changing the representation of 
> pg_authid.rolpassword. That is hopefully uncontroversial. And the second 
> patch adds the "plain:" prefix, which not everyone agrees on.
> 
> Barring objections I'm going to at least commit the first patch. I think 
> we should commit the second one too, but it's not as critical, and the 
> first patch matters more for the SCRAM patch, too.

Is there currently anything to review here for the commit fest?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pageinspect: Hash index support

2017-01-05 Thread Jesper Pedersen

Hi Ashutosh,

On 01/05/2017 07:13 AM, Ashutosh Sharma wrote:

* Readded pageinspect--1.6.sql

In order to have the latest pageinspect interface in 1 file, as we need
something to install from.


I think there should be no problem even if we simply add the
pageinspect--1.5--1.6.sql file instead of removing 1.5.sql, because with
the latest changes to the CREATE EXTENSION infrastructure in PostgreSQL,
it automatically finds a lower-version script if it is unable to find the
default version script and then upgrades to the latest version.


Removing --1.5.sql otherwise would give

test=# CREATE EXTENSION "pageinspect";
ERROR:  extension "pageinspect" has no installation script nor update path
for version "1.6"


I didn't get this. Do you mean to say that if you add 1.6.sql and do
not remove 1.5.sql, you get this error when trying to add the pageinspect
module? I didn't get any such error. See below:

postgres=# create extension pageinspect WITH VERSION '1.6';
CREATE EXTENSION



The previous patch was using pageinspect--1.5.sql as a base, and then
used pageinspect--1.5--1.6.sql to upgrade to version 1.6.


Removing pageinspect--1.5.sql and adding pageinspect--1.6.sql with the
current interface will use pageinspect--1.6.sql directly, whereas
existing installations will use the upgrade scripts.


Hence I don't see a reason why we should keep pageinspect--1.5.sql 
around when we can provide a clear interface description in a 
pageinspect--1.6.sql.



Peter has already written a test case [1] based on your earlier patch
for supporting hash indexes with the pageinspect module. Once the latest
patch (v10) becomes stable, I will share a separate patch with a
test case for hash indexes. In fact, I will try to modify the already
existing patch by Peter.



Ok, sounds good.

Best regards,
 Jesper





Re: [HACKERS] Replication/backup defaults

2017-01-05 Thread Magnus Hagander
On Thu, Jan 5, 2017 at 12:44 AM, Tomas Vondra wrote:

> On 01/03/2017 11:56 PM, Tomas Vondra wrote:
>
>> Hi,
>>
>> ...
>
>> I'll push results for larger ones once those tests complete (possibly
>> tomorrow).
>>
>>
> I just pushed additional results (from the additional scales) to the git
> repositories. On the larger (16/32-cores) machine with 2x e5-2620, the
> results look like this
>
>    scale    minimal   replica   logical
>    ---------------------------------------
>    100        23968     24393     24393
>    1000       23412     23656     23794
>    10000       5283      5320      5197
>
> and on the smaller one (i5-2500k with 4 cores) I got this:
>
>    scale    minimal   replica   logical
>    ---------------------------------------
>    50          5884      5896      5873
>    400         5324      5342      5478
>    1000        5341      5439      5425
>
> The scales were chosen so that the smallest one fits into shared buffers,
> the medium exceeds shared buffers but still fits into RAM, and the largest
> scale exceeds RAM.
>
> The results seem to confirm that for this workload (regular pgbench),
> there's very little difference between the different WAL levels. Actually,
> the 'replica' seems a tad faster than 'minimal', but the difference may be
> easily due to noise.
>
> I've also looked at the amount of WAL actually produced, by doing pgbench
> runs throttled to the same throughput, and counting the number of archived
> WAL segments & running pg_xlogdump. Interestingly enough, those two metrics
> differ quite a bit - for example for scale 1000 (on the 32-core machine),
> the 2h runs produced these number of WAL segments:
>
>minimal: 5515 (88.2GB)
>replica: 5587 (89.4GB)
>logical: 6058 (96.9GB)
>
> so 'replica' adds ~1.3% and 'logical' ~9.8%. But per pg_xlogdump, the WAL
> amounts are only 73.3GB, 73.9GB and 74.4GB - a difference of only ~1.5%
> between minimal and logical. The values are also much lower than raw WAL
> size, so I assume it's because pg_xlogdump ignores some extra overhead,
> present in the segments. Moreover, the sequential nature of WAL writes
> means even the +10% is not a big deal (unless it results in saturating the
> bandwidth, but running on >90% is a bad idea anyway).
>

If you are using log archiving, it also means your log archive grows by 10%
(well, 8% assuming it was 9.8% on top of 0, not on top of replica).


>
> My conclusion from these results is that using 'wal_level=replica' by
> default seems fine. Perhaps even wal_level=logical would be OK, but that's
> probably a too big step for 10.0.
>


I think it sounds like 'replica' is the safe default.

If we can make it possible to go replica<->logical without a restart, that
makes it easy enough to increase it if necessary, and the default still
applies to most people (most people take backups, most people probably
don't do logical replication).



> Any ideas how to construct a plausible workload where the differences are
> significantly larger? Running the tests on non-SSD storage might also be
> useful.
>
>
It's easy enough to construct a benchmark specifically to show the
difference, but I don't know of any actual "normal workload" for it.
Typically the optimization applies to things like bulk loading, which is
typically never done alone and does not lend itself to that type of
benchmarking very easily.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] rewrite HeapSatisfiesHOTAndKey

2017-01-05 Thread Pavan Deolasee
On Thu, Jan 5, 2017 at 6:15 PM, Amit Kapila  wrote:

>
> Your test and results look good. What kind of m/c have you used to
> test this?


I ran it on my MacBook Pro, so nothing fancy. The code was compiled with
a simple ./configure and no special flags. The only non-default setting
was shared_buffers = 512MB, to ensure the table/index fits in memory.


>   Let me see if I or one of my colleague can do this and
> similar test on some high-end m/c.
>
>
Sure. That'll be helpful given the slightly unexpected result. Maybe
it's just noise, or we may see a different result on a high-end m/c.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


  1   2   >