Re: [HACKERS] Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows

2015-08-06 Thread Josh Berkus
On 08/06/2015 01:10 PM, Simon Riggs wrote:
 Given a user-stated probability P of accessing a block, and N total
 blocks, there are a few ways to implement block sampling.

 1. Test P for each block individually. This gives a range of possible
 results, with 0 blocks being a possible outcome, though one decreasing in
 probability as P increases for fixed N. This is the same way BERNOULLI
 works; we just do it for blocks rather than rows.

 2. We calculate P/N at the start of the scan and deliver this number of
 blocks by random selection from the N available blocks.

 At present we do (1), exactly as documented. (2) is slightly harder
 since we'd need to track which blocks have been selected already so we
 can use a random selection without replacement. On a table
 with an uneven distribution of rows this would still return a variable
 sample size, so it didn't seem worth changing.

Aha, thanks!

So, seems like this is just a doc issue? That is, we just need to
document that using SYSTEM on very small sample sizes may return
unexpected numbers of results ... and maybe also how the algorithm
actually works.

I agree that implementing (2) makes more sense as an additional
algorithm for someone to write in the future.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows

2015-08-06 Thread Alvaro Herrera
Petr Jelinek wrote:
 On 2015-08-06 22:25, Josh Berkus wrote:

 If there is no appropriate place, I'll just write a blog.
 
 There is a blog post on the 2ndQ blog page which tries to describe the sampling
 methods visually; not sure if it's more obvious from that or not. It's
 somewhat broken on planet though (only the title shows there).

http://blog.2ndquadrant.com/tablesample-in-postgresql-9-5/

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows

2015-08-06 Thread Simon Riggs
On 6 August 2015 at 20:14, Josh Berkus j...@agliodbs.com wrote:


 This table has around 185 rows per page.  As the sample size goes up,
 the number times I get zero rows goes down, but those results seem to
 still include data pages with zero rows.  For example, here's a series
 of results from a 0.04 sample against the million-row table.

 370
 370
 370
 555
 555
 185
 0
 925

 Since this is a synthetic table I just generated, it contains almost
 exactly 185 rows per data page for every data page.  So on a 0.04%
 sample, the variation between 370 rows and 555 rows (whether we have 2
 or 3 data pages) is expected, since 0.04% of 5406 data pages is 2.16 pages.

 The results of 0, 185 and 925 are not.  It really seems like SYSTEM is
 treating 0.04% as a maximum, but taking a random number of data pages
 somewhere around that maximum, using math which can choose numbers of
 pages far outside of the % requested by the user, and which includes 0.

 Speaking from a user perspective, SYSTEM seems broken to me.  I can't
 imagine using it for anything with that degree of variation in the
 number of results returned, especially if it's possible to return zero
 rows from a populated table.
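The arithmetic in the quoted report can be checked directly; a quick Python sketch (purely illustrative, figures taken from the message):

```python
rows_per_page = 185       # rows per data page in the synthetic table
total_pages = 5406        # data pages in the million-row table
sample_fraction = 0.0004  # the 0.04% sample

# Expected number of selected pages: 0.04% of 5406.
expected_pages = sample_fraction * total_pages
print(round(expected_pages, 2))  # 2.16

# Every SYSTEM result is a whole number of pages, so row counts come in
# multiples of 185; the observed series fits that exactly.
observed = [370, 370, 370, 555, 555, 185, 0, 925]
pages_seen = sorted({n // rows_per_page for n in observed})
print(pages_seen)  # [0, 1, 2, 3, 5]
```

So the observed values correspond to 0, 1, 2, 3, and 5 whole pages against an expectation of about 2.16 pages, which is the variation being discussed.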


The docs say...

The SYSTEM method does block-level sampling with each block having the
specified chance of being selected; all rows in each selected block are
returned.

The SQL standard suggests the parameter value should be an integer, which
would limit the minimum sample size to 1%. Allowing a non-integer sample
size is a PostgreSQL extension. There is no guidance on how to implement
SYSTEM in the standard.

Given a user-stated probability P of accessing a block, and N total
blocks, there are a few ways to implement block sampling.

1. Test P for each block individually. This gives a range of possible
results, with 0 blocks being a possible outcome, though one decreasing in
probability as P increases for fixed N. This is the same way BERNOULLI
works; we just do it for blocks rather than rows.

2. We calculate P/N at the start of the scan and deliver this number of
blocks by random selection from the N available blocks.

At present we do (1), exactly as documented. (2) is slightly harder since
we'd need to track which blocks have been selected already so we can use a
random selection without replacement. On a table with an uneven
distribution of rows this would still return a variable sample size, so it
didn't seem worth changing.
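For illustration, the two approaches can be contrasted with a toy simulation (hypothetical Python, not PostgreSQL's implementation; per the correction later in the thread, method (2) selects about P*N blocks):

```python
import random

def blocks_method_1(n_blocks, p, rng):
    # Method (1): test probability p for each block independently,
    # the way BERNOULLI tests each row.  The count is Binomial(N, p).
    return sum(1 for _ in range(n_blocks) if rng.random() < p)

def blocks_method_2(n_blocks, p, rng):
    # Method (2): fix the target count up front and pick that many
    # distinct blocks (random selection without replacement).
    k = round(p * n_blocks)
    return len(rng.sample(range(n_blocks), k))

rng = random.Random(42)
m1 = [blocks_method_1(5406, 0.0004, rng) for _ in range(1000)]
print(min(m1), max(m1))  # wide spread; zero blocks is a real outcome
print(blocks_method_2(5406, 0.0004, rng))  # always round(0.0004 * 5406) == 2
```

Method (1) yields a binomially distributed block count, including zero; method (2) always returns the same block count, though the row count would still vary on tables with uneven row distribution.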

This is exactly why TABLESAMPLE was designed to allow you to write your own
sampling algorithm, so you can do it anyway you want.

The context here is that SELECT count(*) from the test table takes 10ms.
The intended use case for this feature is against tables that would give
some kind of performance problem, which isn't the case here. On a table
100x larger, the sample is 1000 +/- 40, which seems more acceptable. The
variability of Bernoulli is much less, but then it is roughly 100x slower.
On a 1TB table, SYSTEM is going to be your only choice.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows

2015-08-06 Thread Josh Berkus
On 08/06/2015 01:14 PM, Josh Berkus wrote:
 On 08/06/2015 01:10 PM, Simon Riggs wrote:
 Given a user-stated probability P of accessing a block, and N total
 blocks, there are a few ways to implement block sampling.

 1. Test P for each block individually. This gives a range of possible
 results, with 0 blocks being a possible outcome, though one decreasing in
 probability as P increases for fixed N. This is the same way BERNOULLI
 works; we just do it for blocks rather than rows.

 2. We calculate P/N at the start of the scan and deliver this number of
 blocks by random selection from the N available blocks.

 At present we do (1), exactly as documented. (2) is slightly harder
 since we'd need to track which blocks have been selected already so we
 can use a random selection without replacement. On a table
 with an uneven distribution of rows this would still return a variable
 sample size, so it didn't seem worth changing.
 
 Aha, thanks!
 
 So, seems like this is just a doc issue? That is, we just need to
 document that using SYSTEM on very small sample sizes may return
 unexpected numbers of results ... and maybe also how the algorithm
 actually works.

Following up on this ... where is TABLESAMPLE documented other than in
the SELECT command?  Doc search on the website is having issues right
now.  I'm happy to write a doc patch.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Autonomous Transaction is back

2015-08-06 Thread Robert Haas
On Mon, Aug 3, 2015 at 9:09 AM, Merlin Moncure mmonc...@gmail.com wrote:
 hm.  OK, what's the behavior of:

 BEGIN
   UPDATE foo SET x = x + 1 WHERE foo_id = 1;

   BEGIN WITH AUTONOMOUS TRANSACTION
 UPDATE foo SET x = x + 1 WHERE foo_id = 1;
   END;

   RAISE EXCEPTION ...;
 EXCEPTION ...

 END;

Sure, so that case might need a little bit of special handling.  That
doesn't mean it's a good idea for heavyweight locks to conflict in
general.  I think you're going to find that implementing the latter is
an extremely unrewarding task, and that the benefits are seriously
negative.  For example, consider:

BEGIN
UPDATE foo SET x = x + 1 WHERE foo_id = 1;
BEGIN WITH AUTONOMOUS TRANSACTION
UPDATE foo SET x = x + 1 WHERE foo_id = 2;
END;
END;

Now, suppose that a concurrent session does LOCK TABLE foo after the
first UPDATE and before the second one.  That's now a soft deadlock.
But the only way the deadlock detector can see that is if the main
transaction and the autonomous transaction have separate PGPROC
entries, which is a design we explicitly rejected because it puts a
tight limit on the number of ATs that can be in progress and the level
to which those ATs can be nested.  But let's say you don't care, so we
go back to that design.  The deadlock detector will have to be taught
that the outer transaction can't help but wait for the inner
transaction, so we teach it that.  Now it can see that the only way to
resolve the deadlock without aborting any transactions is to reorder
the lock request from the autonomous transaction ahead of the
concurrent session that is seeking a full table lock.  So the
autonomous transaction acquires the lock without blocking after all.
You have exactly the same result that you would have had anyway but
with a phenomenal amount of additional code and complexity.
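The visibility problem described above can be sketched as a wait-for graph (a toy cycle check, nothing like the real deadlock detector): the cycle only becomes detectable once the implicit "outer waits for autonomous" edge is represented, which is what separate PGPROC entries plus the extra teaching would provide.

```python
def has_cycle(edges):
    """Detect a cycle in a directed wait-for graph with DFS."""
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, []).append(holder)

    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(node):
        color[node] = GRAY
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                return True          # back edge: a deadlock cycle
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(visit(n) for n in list(graph) if color.get(n, WHITE) == WHITE)

# Edges for the example: the AT's UPDATE queues behind the concurrent
# session's LOCK TABLE request, which in turn waits for the outer
# transaction's row lock on foo.
waits = [("autonomous", "concurrent"),
         ("concurrent", "outer")]
print(has_cycle(waits))                              # False: cycle invisible
print(has_cycle(waits + [("outer", "autonomous")]))  # True: implicit edge added
```

Without the implicit edge the detector sees no cycle at all, which is the point: making it see (and then resolve) the cycle buys nothing beyond what simply granting the lock would achieve.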

And for what?  In the original example, the way the deadlock is going
to be reported is like this:

ERROR: deadlock detected
DETAIL: Process 12345 waits for ShareLock on transaction 1000; blocked
by process 12345.
Process 12345 waits for ShareLock on transaction 1001; blocked by process 12345.

That is not a model of clarity.  On the other hand, if you just make a
rule that attempting to update or delete a tuple that an outer
transaction has already updated throws a bespoke error, you can do
something like this:

ERROR: tuple to be updated was already modified by a suspended outer transaction

...which has precedent in an existing message in trigger.c.
Similarly, if you try to drop a table that the outer transaction has
locked, the natural thing is for CheckTableNotInUse() to catch that
and report it this way:

ERROR: cannot DROP TABLE foo because it is being used by active
queries in this session

If you work hard enough, you can instead make that generate a deadlock
error message, but you're going to have to work pretty hard, and the
result is worse.

I'd really like to hear some more *specific* scenarios where it's
valuable for locks to conflict between the outer transaction and the
AT.  I grant that tuple updates are a case where the conflict has to
be detected somehow, but I don't accept that the lock manager is the
best way to do that, and I don't accept that there are a large number
of other cases that will need similar handling.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] 9.5 release notes

2015-08-06 Thread Alvaro Herrera
Bruce Momjian wrote:
 On Fri, Jun 19, 2015 at 08:21:19PM +0200, Andres Freund wrote:

  I think 647248e3708, 4fe384bd85, 4f85fde8, 59f71a0d0 should also be
 
 I couldn't look up 647248e3708, I got unknown revision or path not in
 the working tree.

a 6 is missing.  6647248e3708

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] 9.5 release notes

2015-08-06 Thread Peter Geoghegan
On Fri, Jun 26, 2015 at 3:39 PM, Peter Geoghegan p...@heroku.com wrote:
 I attach a compatibility note that is clearly needed; adding this is
 an open item of mine for 9.5. This concerns foreign data wrappers and
 UPSERT.

Can you look at this please, Bruce?


-- 
Peter Geoghegan




Re: [HACKERS] 9.5 release notes

2015-08-06 Thread Bruce Momjian
On Thu, Aug  6, 2015 at 06:44:40PM -0300, Alvaro Herrera wrote:
 Bruce Momjian wrote:
  On Fri, Jun 19, 2015 at 08:21:19PM +0200, Andres Freund wrote:
 
   I think 647248e3708, 4fe384bd85, 4f85fde8, 59f71a0d0 should also be
  
  I couldn't look up 647248e3708, I got unknown revision or path not in
  the working tree.
 
 a 6 is missing.  6647248e3708

Thanks.  It was this one:

Don't allow immediate interrupts during authentication anymore.

Falls into the same category, but if someone can write up an entry that
covers them all, I think it would make sense to add it.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Thinko in processing of SHM message size info?

2015-08-06 Thread Antonin Houska
Robert Haas robertmh...@gmail.com wrote:
 On Thu, Aug 6, 2015 at 1:24 PM, Robert Haas robertmh...@gmail.com wrote:
  On Thu, Aug 6, 2015 at 9:04 AM, Antonin Houska a...@cybertec.at wrote:
  Can anyone please explain why the following patch shouldn't be applied?
 
  diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
  index 126cb07..4cd52ac 100644
  --- a/src/backend/storage/ipc/shm_mq.c
  +++ b/src/backend/storage/ipc/shm_mq.c
  @@ -584,7 +584,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
   			if (mqh->mqh_partial_bytes + rb > sizeof(Size))
   				lengthbytes = sizeof(Size) - mqh->mqh_partial_bytes;
   			else
  -				lengthbytes = rb - mqh->mqh_partial_bytes;
  +				lengthbytes = rb;
   			memcpy(&mqh->mqh_buffer[mqh->mqh_partial_bytes], rawdata,
   				   lengthbytes);
   			mqh->mqh_partial_bytes += lengthbytes;
 
 
  I'm failing to understand why anything should be subtracted. Note that the
  previous iteration must have called shm_mq_inc_bytes_read(), so rb should
  not include anything of mqh->mqh_partial_bytes. Thanks.
 

  Hmm, I think you are correct.  This would matter in the case where the
  message length word was read in more than two chunks.  I don't *think*
  that's possible right now because I believe the only systems where
  MAXIMUM_ALIGNOF < sizeof(Size) are those with MAXIMUM_ALIGNOF == 4 and
  sizeof(Size) == 8.  However, if we had systems where MAXIMUM_ALIGNOF
  == 4 and sizeof(Size) == 16, or systems where MAXIMUM_ALIGNOF == 2 and
  sizeof(Size) == 8, this would be a live bug.

I must admit that I didn't think about the specific combinations of
MAXIMUM_ALIGNOF and sizeof(Size), and considered the problem rather rare
(maybe also because it can't happen on my workstation). But your next
consideration makes sense to me:

 Hmm, actually, maybe it is a live bug anyway, because the if statement
 tests > rather than >=.  If we've read 4 bytes and exactly 4 more
 bytes are available, we'd set lengthbytes to 0 instead of 4.  Oops.
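That corner case can be worked through in miniature (an illustrative Python model of the length-word arithmetic, assuming an 8-byte Size; not the actual shm_mq.c code):

```python
SIZEOF_SIZE = 8  # assumed sizeof(Size)

def lengthbytes_pre_patch(partial_bytes, rb):
    # Mirrors the old logic: the else branch subtracts the bytes of the
    # length word already consumed, even though rb excludes them.
    if partial_bytes + rb > SIZEOF_SIZE:       # note: '>', not '>='
        return SIZEOF_SIZE - partial_bytes
    return rb - partial_bytes

def lengthbytes_patched(partial_bytes, rb):
    # With the proposed fix: take all available bytes in the else branch.
    if partial_bytes + rb > SIZEOF_SIZE:
        return SIZEOF_SIZE - partial_bytes
    return rb

# 4 bytes of the length word read so far, exactly 4 more available:
print(lengthbytes_pre_patch(4, 4))  # 0 -- the word is never completed
print(lengthbytes_patched(4, 4))    # 4
```

With the old arithmetic, partial_bytes + rb == sizeof(Size) falls into the else branch and copies zero bytes, so the length word never completes; the patched version copies the remaining four bytes.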

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at




Re: [HACKERS] Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows

2015-08-06 Thread Simon Riggs
On 6 August 2015 at 21:14, Josh Berkus j...@agliodbs.com wrote:

 On 08/06/2015 01:10 PM, Simon Riggs wrote:
  Given a user-stated probability P of accessing a block, and N total
  blocks, there are a few ways to implement block sampling.

  1. Test P for each block individually. This gives a range of possible
  results, with 0 blocks being a possible outcome, though one decreasing in
  probability as P increases for fixed N. This is the same way BERNOULLI
  works; we just do it for blocks rather than rows.

  2. We calculate P/N at the start of the scan and deliver this number of
  blocks by random selection from the N available blocks.


(My mistake, that would be P*N)


  At present we do (1), exactly as documented. (2) is slightly harder
  since we'd need to track which blocks have been selected already so we
  can use a random selection without replacement. On a table
  with an uneven distribution of rows this would still return a variable
  sample size, so it didn't seem worth changing.

 Aha, thanks!

 So, seems like this is just a doc issue? That is, we just need to
 document that using SYSTEM on very small sample sizes may return
 unexpected numbers of results ... and maybe also how the algorithm
 actually works.


For me, the docs seem exactly correct. The mathematical implications of
that just aren't recorded explicitly.

I will try to reword or add something to make it clear that this can return
a variable number of blocks and thus produces a result with greater
variability in the number of rows returned.

It's documented on the SELECT page only; plus there is a whole new section
on writing tablesample functions.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows

2015-08-06 Thread Petr Jelinek

On 2015-08-06 22:25, Josh Berkus wrote:

On 08/06/2015 01:19 PM, Simon Riggs wrote:

For me, the docs seem exactly correct. The mathematical implications of
that just aren't recorded explicitly.


Well, for the SELECT page, all we need is the following (one changed
sentence):

The SYSTEM method is significantly faster than the BERNOULLI method when
small sampling percentages are specified, but it may return a
less-random sample of the table as a result of clustering effects, and
may return a highly variable number of results for very small sample sizes.



BTW, this was one of the motivations for making the tsm_system_rows contrib 
module; that one will give you an exact number of tuples while still doing 
page-level sampling. But since it does linear probing, it's only useful 
if you want a really small number of tuples, because it will always 
do random I/O even if you are scanning a large part of the table.




I will try to reword or add something to make it clear that this can
return a variable number of blocks and thus produces a result with
greater variability in the number of rows returned.

It's documented on the SELECT page only; plus there is a whole new
section on writing tablesample functions.


Seems like it would be nice to have more detailed user docs somewhere
which explain the sampling algos we have, especially if we get more in
the future.  Not sure where would be appropriate for that, though.

If there is no appropriate place, I'll just write a blog.



There is a blog post on the 2ndQ blog page which tries to describe the 
sampling methods visually; not sure if it's more obvious from that or 
not. It's somewhat broken on planet though (only the title shows there).


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] 9.5 release notes

2015-08-06 Thread Peter Geoghegan
On Thu, Aug 6, 2015 at 3:06 PM, Bruce Momjian br...@momjian.us wrote:
 Below performance improvement in the General Performance category is missing:

 Reduce btree scan overhead for < and > strategies

 For <, <=, > and >= strategies, mark the first scan key
 as already matched if scanning in an appropriate direction.
 If index tuple contains no nulls we can skip the first
 re-check for each tuple.

 While this is a nice 9.5 feature, it really is "btree is faster", and
 users usually don't need to know that, unless the change is so massive that
 they would change their use of the feature.

I think that Rajeev is entitled to be credited for his work,
especially as a less experienced contributor.

Sure, users are not likely to care too much about incremental progress
like this (however, I would be willing to bet that more users care
about this item than about existing items like "Make initdb issue a
warning about placing the data directory at the top of a file system
mount point"). IMV it is the role of the release notes to document
what went into a release fairly methodically. I often look at release
notes to determine what might have gone wrong in a part of the code
that I'm less familiar with, for example.

Users mostly only specifically care about one or two big ticket items,
and in any case are not likely to learn about them from the release
notes. The release notes seem shorter than previous years, and I don't
think it's because 9.5 is a smaller release.

-- 
Peter Geoghegan




Re: [HACKERS] 9.5 release notes

2015-08-06 Thread Bruce Momjian
On Thu, Aug  6, 2015 at 01:48:26PM -0400, Robert Haas wrote:
 On Thu, Aug 6, 2015 at 10:33 AM, Bruce Momjian br...@momjian.us wrote:
  On Sun, Jun 14, 2015 at 12:05:54PM +0100, Dean Rasheed wrote:
  On 11 June 2015 at 05:15, Bruce Momjian br...@momjian.us wrote:
   I have committed the first draft of the 9.5 release notes.  You can view
   the output here:
  
   http://momjian.us/pgsql_docs/release-9-5.html
  
 
  I think it's worth mentioning
  dcbf5948e12aa60b4d6ab65b6445897dfc971e01, probably under General
  Performance. It's an optimisation, and also a user-visible change to
  the way LEAKPROOF works. Perhaps something like
 
  Allow pushdown of non-leakproof functions into security barrier views
  if the function is not passed any arguments from the view.
 
  Previously only functions marked as LEAKPROOF could be pushed down
  into security barrier views.
 
  Sorry, just looking at this now.  Since RLS is new for 9.5, we wouldn't
  mention the RLS change in the release notes because is is part of the
  RLS new features, but we could mention the SB change --- the new text
  would be:
 
  Allow non-LEAKPROOF functions to be passed into security barrier 
  views
  if the function does not reference any table columns (Dean Rasheed)
 
  However, this is usually a level of detail that I do not cover in the
  release notes, so I need someone else to tell me it should be added.
 
 +1 for including it.  That's a security-relevant backward incompatibility.

Thanks, done.  I was not aware of the complexity of this issue.  Applied
patch attached.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/doc/src/sgml/release-9.5.sgml b/doc/src/sgml/release-9.5.sgml
new file mode 100644
index 722c8bd..2ed7b01
*** a/doc/src/sgml/release-9.5.sgml
--- b/doc/src/sgml/release-9.5.sgml
*************** FIXME: Add Andres
*** 412,417 ****
--- 412,428 ----
  
    <listitem>
  <!--
+ 2015-04-27 [dcbf594]  Stephe..: Improve qual pushdown for RLS and SB views
+ -->
+    <para>
+     Allow non-LEAKPROOF functions to be passed into security barrier
+     views if the function does not reference any table columns
+     (Dean Rasheed)
+    </para>
+   </listitem>
+ 
+   <listitem>
+ <!--
  2014-11-04 [5028f22] Heikki..: Switch to CRC-32C in WAL and other places.
  2015-02-10 [025c024] Heikki..: Speed up CRC calculation using slicing-by-8 alg..
  2015-04-14 [3dc2d62] Heikki..: Use Intel SSE 4.2 CRC instructions where availa..



Re: [HACKERS] 9.5 release notes

2015-08-06 Thread Bruce Momjian
On Fri, Jun 19, 2015 at 08:21:19PM +0200, Andres Freund wrote:
 Hi,
 
 On 2015-06-11 00:15:21 -0400, Bruce Momjian wrote:
  I have committed the first draft of the 9.5 release notes.  You can view
  the output here:
 
 So, I did a pass through master's state:
 
   <listitem>
    <para>
     Add <link linkend="BRIN">Block Range Indexes</> (<acronym>BRIN</>)
     (&Aacute;lvaro Herrera, Heikki Linnakangas, Emre Hasegeli)
    </para>

    <para>
     <acronym>BRIN</> indexes are very compact and store the min/max
     values for a range of heap blocks.
    </para>
   </listitem>
 
 Maybe we should mention that they're cheap to maintain?

I saw how you were able to get that idea into the release notes with
little additional text --- nice.  :-)  I couldn't figure out how to do
that.

   <listitem>
    <para>
     Improve in-memory hash performance (Tomas Vondra, Robert Haas)
    </para>
   </listitem>
 
 hash performance is pretty general, there's lot of places where we use
 hashing that aren't affected.

Well, how many of our users even know what a hash is, or when we use
hashing internally, and care where the speedup is, or will understand
it?

The guiding principle is that these release notes are written for our
users, not for us.  I don't see any way to give more details here
without being confusing or overly verbose.

   <listitem>
    <para>
     Improve concurrency of <link linkend="guc-shared-buffers">shared
     buffer</> replacement (Robert Haas, Amit Kapila)
    </para>
   </listitem>
 
 I think in the end that patch was enhanced to a significant degree by
 making it lockless in d72731a70450b. I think the three (?) involved
 patches should just be combined under one entry.

Did you address this already in
c0b0501925eacbf2d9c10cd231bf8a14e7c9ef4c?  I can't tell.

   <listitem>
    <para>
     Improve concurrent locking and buffer scan performance (Andres
     Freund, Kevin Grittner)
    </para>
   </listitem>
 
 If this is ab5194e6f, I don't think it makes sense to mention buffer
 scan - it's just any lwlock, and buffer locks aren't the primary
 benefit (ProcArrayLock, buffer mapping lock probably are that). I also
 don't think Kevin was involved?
 
 I think ed127002d8 and 4b4b680c should be mentioned in this section as
 well. 4b4b680c will considerably reduce the per backend memory usage for
 servers with large shared buffers.

I think you addressed this, right?

 <sect4>
  <title>Server Settings</title>

  <itemizedlist>

   <listitem>
    <para>
     Replace <varname>checkpoint_segments</> with <link
     linkend="guc-min-wal-size"><varname>min_wal_size</></> and
     <link linkend="guc-max-wal-size"><varname>max_wal_size</></>
     (Heikki Linnakangas)
    </para>

    <para>
     This allows the allocation of a large number of <acronym>WAL</>
     files without keeping them if they are not needed.
    </para>
   </listitem>
 
 Hm. This affects performance significantly, should we also list it there?

Uh, I saw this as more of a configuration improvement in that you don't
need to waste lots of WAL files if you don't need them.

   <listitem>
    <para>
     Add <acronym>GUC</> <link
     linkend="guc-wal-compression"><varname>wal_compression</></> to
     enable compression of full page images stored in <acronym>WAL</>
     (Rahila Syed, Michael Paquier)
    </para>
   </listitem>
 
 Also rather performance relevant?

Well, it is both, but we already have a configuration section, so it
seems most logical there.

   <listitem>
    <para>
     Archive <acronym>WAL</> files with suffix <literal>.partial</>
     during standby promotion (Heikki Linnakangas)
    </para>
   </listitem>
 
 This should be expanded, will mention to Heikki. Possibly also need to
 be mentioned in the backward incompat section.

Uh, it seemed obscure enough to not mention it as a backward
incompatibility.  I am a little afraid of briefly trying to explain why
we did it.

   <listitem>
    <para>
     Allow the <link linkend="pg-replication-origin-create">labeling</>
     of the origin of logical replication changes (Andres Freund)
    </para>

    <para>
     This helps with change tracking.
    </para>
   </listitem>
 
 I think it should be 'origin and progress'. The explanation should
 probably rather be 'This is helpful when implementing replication
 solutions' or something like it.

Done.

   <listitem>
    <para>
     Allow control of table <acronym>WAL</> logging after table creation
     with <link linkend="SQL-ALTERTABLE"><command>ALTER TABLE .. SET
     LOGGED / UNLOGGED</></> (Fabr&iacute;zio de Royes Mello)
    </para>
   </listitem>
 
 This sounds a bit confusing. Maybe "Allow converting a WAL logged table
 to an UNLOGGED one, and the other way round"?

I saw you used your wording, but I changed it to this:

        Allow changing of the <acronym>WAL</acronym>
        logging status of a table after creation with <link

Re: [HACKERS] Race conditions in shm_mq.c

2015-08-06 Thread Antonin Houska
Robert Haas robertmh...@gmail.com wrote:
 On Thu, Aug 6, 2015 at 2:38 PM, Robert Haas robertmh...@gmail.com wrote:
  On Thu, Aug 6, 2015 at 10:10 AM, Antonin Houska a...@cybertec.at wrote:
  During my experiments with parallel workers I sometimes saw the master and
  worker process blocked. The master uses a shm queue to send data to the
  worker, both sides nowait==false. I concluded that the following happened:
 
  The worker process set itself as a receiver on the queue after
  shm_mq_wait_internal() has completed its first check of ptr, so this
  function left the sender's procLatch in reset state. But before the
  procLatch was reset, the receiver still managed to read some data and set
  the sender's procLatch to signal the reading, and eventually called its
  (receiver's) WaitLatch().
 
  So the sender has effectively missed the receiver's notification and called
  WaitLatch() too (if the receiver already waits on its latch, it does not
  help for the sender to call shm_mq_notify_receiver(): the receiver won't do
  anything because there's no new data in the queue).
 
  Below is my patch proposal.
 
  Another good catch.  However, I would prefer to fix this without
  introducing a "continue", as I think that will make the control flow
  clearer.  Therefore, I propose the attached variant of your idea.
 
 Err, that doesn't work at all.  Have a look at this version instead.

This makes sense to me.

One advantage of "continue" was that I could apply the patch to my test code
(containing the appropriate sleep() calls, to simulate the race conditions)
with no conflicts and see the difference. The restructuring you do does not
allow for such mechanical testing, but it's clear enough.

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at




Re: [HACKERS] Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows

2015-08-06 Thread Josh Berkus
On 08/06/2015 12:45 PM, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 On 6 August 2015 at 20:14, Josh Berkus j...@agliodbs.com wrote:
 Speaking from a user perspective, SYSTEM seems broken to me.  I can't
 imagine using it for anything with that degree of variation in the
 number of results returned, especially if it's possible to return zero
 rows from a populated table.
 
 Please bear in mind you have requested a very small random sample of blocks.
 
 Indeed.  My expectation about it is that you'd get the requested number of
 rows *on average* over many tries (which is pretty much what Josh's
 results show).  Since what SYSTEM actually returns must be a multiple of
 the number of rows per page, if you make a request that's less than that
 number of rows, you must get zero rows some of the time.  Otherwise the
 sampling logic is cheating.

Except that I'm getting zero results when requesting rows which equate
to 2-3 pages, not just less than one page.  Please do read the full bug
report.

And why should the *number* of pages sampled be random?  That's not what
the documentation says that SYSTEM does.  But it's pretty clearly what's
happening; this table has exactly 185 rows per page on every page.  So
the only reason for there to be a variation in the number of rows
returned is that SYSTEM is varying the number of pages sampled.

On testing, the zero results don't go away until the number of pages
requested goes up to 5 (0.095%).  So that suggests that SYSTEM is varying
the number of pages retrieved by +/- 4.  Why?  What purpose does that serve?
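If each page is tested independently with probability p, as described upthread, zero-page results are expected even when 2-5 pages are requested; a quick check of the numbers (Python, assuming the 5406-page table and independent per-page tests):

```python
# Probability that a per-page Bernoulli sample selects zero pages:
# P(0) = (1 - p)^N for N pages, each kept with probability p.
N = 5406

for pct in (0.04, 0.095):
    p = pct / 100.0
    p_zero = (1.0 - p) ** N
    print(f"{pct}%: ~{p * N:.1f} pages expected, P(no pages) = {p_zero:.3f}")
```

At 0.04% roughly one request in nine selects no pages at all, while at 0.095% (about five expected pages) empty results drop below one percent, consistent with the observations above.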

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] 9.5 release notes

2015-08-06 Thread Bruce Momjian
On Fri, Jun 19, 2015 at 09:44:04PM +0200, Andres Freund wrote:
 Hi,
 
 On 2015-06-11 00:15:21 -0400, Bruce Momjian wrote:
  I have committed the first draft of the 9.5 release notes.  You can view
  the output here:
 
 I'm looking through all the commits, checking which I think should
 possibly be mentioned additionally:
 - 9f03ca91 - Speed up CREATE INDEX by avoiding copying index tuples

I can't see this information being useful to our general users.

 - 9da86753 - Reject duplicate column names in foreign key referenced-columns 
 lists.
   behavioural change, should be documented as intentional

Uh, well, I didn't think such uses were common, and we now issue a very
clear error message, so I didn't think it was worth mentioning.

 - 680513ab - Break out OpenSSL-specific code to separate files.
   should perhaps be mentioned in the developer section.

I usually only mention code changes that would affect extension authors,
or massive changes.

 - 0709b7ee7 - Change the spinlock primitives to function as compiler barriers.
   Significant behavioural change for developers.

Uh, again, not really for the group of developers I usually worry about
in the release notes.

 - 94028691 - Advance backend's advertised xmin more aggressively.
   Pretty good feature imo.

Yeah, but again, do our general users understand enough to even
understand the explanation, and even if they do, do they care?

 - 5028f22f6 - Switch to CRC-32C in WAL and other places.
   This might have compability impact in some external tools.

Agreed, text updated:

Speed up <acronym>CRC</acronym> (cyclic redundancy check) computations
and switch to CRC-32C (Abhijit Menon-Sen, Heikki Linnakangas)

 - 4f1b890b1 -
   Unsure if this should be mentioned, at least it's a fairly
   large change.

This is:

Merge the various forms of transaction commit & abort records.

Again, much too internal-focused for our users to care.

 - 27846f02 - Optimize locking a tuple already locked by another subxact
   Several users ran into this slowness, so I think we should mention the
   optimization.

I usually don't mention speedup of edge cases.  There are probably a
few people who care about it though, but this will be read by thousands
of people.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows

2015-08-06 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On 6 August 2015 at 20:14, Josh Berkus j...@agliodbs.com wrote:
 Speaking from a user perspective, SYSTEM seems broken to me.  I can't
 imagine using it for anything with that degree of variation in the
 number of results returned, especially if it's possible to return zero
 rows from a populated table.

 Please bear in mind you have requested a very small random sample of blocks.

Indeed.  My expectation about it is that you'd get the requested number of
rows *on average* over many tries (which is pretty much what Josh's
results show).  Since what SYSTEM actually returns must be a multiple of
the number of rows per page, if you make a request that's less than that
number of rows, you must get zero rows some of the time.  Otherwise the
sampling logic is cheating.

I do *not* think that we should force the sample to contain at least one
page, which is the only way that we could satisfy the complaint as stated.

Perhaps we need to adjust the documentation to make it clearer that
block-level sampling is not the thing to use if you want a sample that
doesn't amount to a reasonable number of blocks.  But I see absolutely
no evidence here that the sampling isn't behaving exactly as expected.
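A quick simulation (an illustrative sketch assuming uniform rows per page; the block count, row count, and probability are made-up values, not the actual sampling code) shows exactly this behavior: the requested number of rows on average, with occasional empty samples:

```python
import random

# Toy model of SYSTEM-style sampling: every block is kept independently with
# probability p, and each kept block contributes a fixed number of rows.
random.seed(42)
N_BLOCKS, ROWS_PER_BLOCK, p = 1000, 185, 0.001   # ~0.1% sample requested

trials = [
    sum(ROWS_PER_BLOCK for _ in range(N_BLOCKS) if random.random() < p)
    for _ in range(10_000)
]

expected = N_BLOCKS * ROWS_PER_BLOCK * p   # 185 rows on average
print(f"expected mean rows: {expected:.0f}")
print(f"observed mean rows: {sum(trials) / len(trials):.1f}")
print(f"empty samples:      {trials.count(0) / len(trials):.1%}")
```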

regards, tom lane




Re: [HACKERS] Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows

2015-08-06 Thread Petr Jelinek

On 2015-08-06 22:17, Josh Berkus wrote:

On 08/06/2015 01:14 PM, Josh Berkus wrote:

On 08/06/2015 01:10 PM, Simon Riggs wrote:

Given, user-stated probability of accessing a block of P and N total
blocks, there are a few ways to implement block sampling.

1. Test P for each block individually. This gives a range of possible
results, with 0 blocks being possible outcome, though decreasing in
probability as P increases for fixed N. This is the same way BERNOULLI
works, we just do it for blocks rather than rows.

2. We calculate P/N at start of scan and deliver this number of blocks by
random selection from N available blocks.

At present we do (1), exactly as documented. (2) is slightly harder
since we'd need to track which blocks have been selected already so we
can use a random selection with no replacement algorithm. On a table
with uneven distribution of rows this would still return a variable
sample size, so it didn't seem worth changing.
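The two strategies quoted above can be sketched as follows (a simplification with made-up sizes, not the PostgreSQL implementation; the quoted P/N is corrected to P*N later in the thread). Method (1) yields a variable block count, including zero, while method (2) always returns exactly round(P*N) blocks:

```python
import random

def method1_bernoulli(n_blocks: int, p: float, rng: random.Random) -> list:
    """Test p for each block independently; zero blocks is a possible outcome."""
    return [b for b in range(n_blocks) if rng.random() < p]

def method2_fixed_count(n_blocks: int, p: float, rng: random.Random) -> list:
    """Select exactly round(p * N) blocks at random, without replacement."""
    return rng.sample(range(n_blocks), round(p * n_blocks))

rng = random.Random(0)
sizes1 = {len(method1_bernoulli(100, 0.02, rng)) for _ in range(1000)}
sizes2 = {len(method2_fixed_count(100, 0.02, rng)) for _ in range(1000)}
print("method 1 sample sizes seen:", sorted(sizes1))  # varies; includes 0
print("method 2 sample sizes seen:", sorted(sizes2))  # always exactly 2
```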


Aha, thanks!

So, seems like this is just a doc issue? That is, we just need to
document that using SYSTEM on very small sample sizes may return
unexpected numbers of results ... and maybe also how the algorithm
actually works.




Yes, it's expected behavior at very small sample sizes, so a doc patch seems
the best fix.



Following up on this ... where is TABLESAMPLE documented other than in
the SELECT command?  Doc search on the website is having issues right
now.  I'm happy to write a doc patch.



The user documentation is only in SELECT page, the rest is API docs.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-08-06 Thread Ildus Kurbangaliev

On 08/05/2015 09:33 PM, Robert Haas wrote:

On Wed, Aug 5, 2015 at 1:10 PM, Ildus Kurbangaliev
i.kurbangal...@postgrespro.ru wrote:

About `memcpy`: the PgBackendStatus struct already has a bunch of multi-byte
variables, so it will not be consistent anyway if somebody wants to copy it
that way. On the other hand, two bytes in this case give less overhead because
we can avoid the offset calculations. And as I've mentioned before, the class
of wait will be useful when monitoring of waits is extended.

You're missing the point.  Those multi-byte fields have additional
synchronization requirements, as I explained in some detail in my
previous email. You can't just wave that away.

I see that now. Thank you for the point.

I've looked deeper, and I found PgBackendStatus to be an unsuitable place for
keeping information about low-level waits. PgBackendStatus is used to track
high-level information about a backend, which is why auxiliary processes don't
have one: they have no such information to expose. But when it comes to
low-level wait events, auxiliary processes are just as useful to monitor as
backends are. The WAL writer, checkpointer, bgwriter, etc. use LWLocks as
well, so it is unclear why they shouldn't be monitorable.

This is why I think we shouldn't place the wait event into PgBackendStatus. It
could be placed into PGPROC, or even into a separate data structure with a
different concurrency model that would be better suited to monitoring.

--
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Priority table or Cache table

2015-08-06 Thread Haribabu Kommi
On Mon, Jun 30, 2014 at 11:08 PM, Beena Emerson memissemer...@gmail.com wrote:

 I also ran the test script after making the same configuration changes that
 you have specified. I found that I was not able to get the same performance
 difference that you have reported.

 Following table lists the tps in each scenario and the % increase in
 performance.

 Threads   Head   Patched   Diff
 1         1669   1718       3%
 2         2844   3195      12%
 4         3909   4915      26%
 8         7332   8329      14%



Coming back to this old thread.

I just tried a new approach for this priority table: instead of an
entirely separate buffer pool, it uses some portion of shared buffers
for priority tables, controlled by a GUC variable
buffer_cache_ratio (0-75) that specifies what percentage of shared
buffers to use.

Syntax:

create table tbl(f1 int) with(buffer_cache=true);

Compared to the earlier approach, I thought this one would be easier to
implement. But during the performance run, it didn't show much improvement
in performance. Here are the test results.

 Threads   Head    Patched   Diff
 1         3123    3238      3.68%
 2         5997    6261      4.40%
 4        11102   11407      2.75%

I suspect this may be because of buffer lock contention, whereas in the
older approach of separate buffer pools, each pool had its own locks.
I will try to collect the profile output and analyze it.

Any better ideas?

Here I attached a proof of concept patch.

Regards,
Hari Babu
Fujitsu Australia


cache_table_poc.patch
Description: Binary data



Re: [HACKERS] Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows

2015-08-06 Thread Josh Berkus
On 08/06/2015 01:19 PM, Simon Riggs wrote:
 On 6 August 2015 at 21:14, Josh Berkus j...@agliodbs.com
 mailto:j...@agliodbs.com wrote:
 
 On 08/06/2015 01:10 PM, Simon Riggs wrote:
  Given, user-stated probability of accessing a block of P and N total
  blocks, there are a few ways to implement block sampling.
 
  1. Test P for each block individually. This gives a range of possible
  results, with 0 blocks being possible outcome, though decreasing in
  probability as P increases for fixed N. This is the same way BERNOULLI
  works, we just do it for blocks rather than rows.
 
  2. We calculate P/N at start of scan and deliver this number of blocks by
  random selection from N available blocks.
 
 
 (My mistake, that would be P*N) 
  
 
  At present we do (1), exactly as documented. (2) is slightly harder
  since we'd need to track which blocks have been selected already so we
  can use a random selection with no replacement algorithm. On a table
  with uneven distribution of rows this would still return a variable
  sample size, so it didn't seem worth changing.
 
 Aha, thanks!
 
 So, seems like this is just a doc issue? That is, we just need to
 document that using SYSTEM on very small sample sizes may return
 unexpected numbers of results ... and maybe also how the algorithm
 actually works.
 
 
 For me, the docs seem exactly correct. The mathematical implications of
 that just aren't recorded explicitly.

Well, for the SELECT page, all we need is the following (one changed
sentence):

The SYSTEM method is significantly faster than the BERNOULLI method when
small sampling percentages are specified, but it may return a
less-random sample of the table as a result of clustering effects, and
may return a highly variable number of results for very small sample sizes.

 
 I will try to reword or add something to make it clear that this can
 return a variable number of blocks and thus produces a result with
 greater variability in the number of rows returned.
 
 It's documented on the SELECT page only; plus there is a whole new
 section on writing tablesample functions.

Seems like it would be nice to have more detailed user docs somewhere
which explain the sampling algos we have, especially if we get more in
the future.  Not sure where would be appropriate for that, though.

If there is no appropriate place, I'll just write a blog.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] 9.5 release notes

2015-08-06 Thread Bruce Momjian
On Wed, Jun 24, 2015 at 11:34:27AM +0200, Andres Freund wrote:
 On 2015-06-23 21:08:36 -0400, Robert Haas wrote:
  On Tue, Jun 23, 2015 at 5:48 PM, Kevin Grittner kgri...@ymail.com wrote:
   Andres Freund and...@anarazel.de wrote:
<listitem>
 <para>
  Improve concurrent locking and buffer scan performance (Andres
  Freund, Kevin Grittner)
 </para>
</listitem>
  
   If this is ab5194e6f, I don't think it makes sense to mention buffer
   scan - it's just any lwlock, and buffer locks aren't the primary
   benefit (ProcArrayLock, buffer mapping lock probably are that). I also
  
   don't think Kevin was involved?
  
   It seems likely that 2ed5b87f9 was combined with something else in
   this reference.  By reducing buffer pins and buffer content locking
   during btree index scans it shows a slight performance gain in
   btree scans and avoids some blocking of btree index vacuuming.
 
 Oh. That's what it was combined with. I don't think it makes sense to
 throw these three items together into one note. Their benefit/risk
 potential is pretty different.

I believe Andres has made all these suggested adjustments.  If not,
please let me know.

  I think maybe we should separate that back out.  The list needs to be
  user-accessible, but if it's hard to understand what it's referring
  to, that's not good either.
 
 Yea. And if then Bruce goes and compares feature counts... :)

:-)

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] 9.5 release notes

2015-08-06 Thread Bruce Momjian
On Mon, Jun 22, 2015 at 09:14:01PM +, Rajeev rastogi wrote:
 On 11 June 2015 09:45, Bruce Momjian Wrote:
 
  
  I have committed the first draft of the 9.5 release notes.  You can
  view the output here:
  
  http://momjian.us/pgsql_docs/release-9-5.html
  
  and it will eventually appear here:
  
  http://www.postgresql.org/docs/devel/static/release.html
  
  I am ready to make suggested adjustments, though I am traveling for
  conferences for the next ten days so there might be a delay in my replies.
 
 Below performance improvement in the General Performance category is 
 missing:
 
 Reduce btree scan overhead for < and > strategies
 
 For <, <=, > and >= strategies, mark the first scan key
 as already matched if scanning in an appropriate direction.
 If index tuple contains no nulls we can skip the first
 re-check for each tuple.

While this is a nice 9.5 feature, it really is "btree is faster", and
users usually don't need to know that unless the change is massive enough
that they would change their use of the feature.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: Fwd: [HACKERS] 9.5 release notes

2015-08-06 Thread Bruce Momjian
On Thu, Aug  6, 2015 at 11:19:46AM -0700, Jeff Janes wrote:
 On Wed, Jun 10, 2015 at 9:15 PM, Bruce Momjian br...@momjian.us wrote:
 
 I have committed the first draft of the 9.5 release notes.  You can view
 the output here:
 
         http://momjian.us/pgsql_docs/release-9-5.html
 
 and it will eventually appear here:
 
         http://www.postgresql.org/docs/devel/static/release.html
 
 I am ready to make suggested adjustments, though I am traveling for
 conferences for the next ten days so there might be a delay in my replies.
 
 
 
 Improve the speed of sorting character and numeric fields
 
 Here, does numeric refer specifically to the type numeric, or to all 
 fields
 which are numeric in nature?
 
 If the former (which I think it is, as other numeric types are pass-by-value 
 to
 start with), it should probably get some sgml markup.

You are right, is only NUMERIC.  It was this commit:

commit abd94bcac4582903765be7be959d1dbc121df0d0
Author: Robert Haas rh...@postgresql.org
Date:   Thu Apr 2 14:02:55 2015 -0400

Use abbreviated keys for faster sorting of numeric datums.

Andrew Gierth, reviewed by Peter Geoghegan, with further tweaks by 
me.

I changed it to <type>NUMERIC</type>.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Reduce ProcArrayLock contention

2015-08-06 Thread Amit Kapila
On Thu, Aug 6, 2015 at 9:36 PM, Robert Haas robertmh...@gmail.com wrote:


 OK, committed.


Thank you.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Raising our compiler requirements for 9.6

2015-08-06 Thread Noah Misch
On Thu, Aug 06, 2015 at 05:34:50PM +0200, Andres Freund wrote:
 On 2015-08-06 12:29:15 -0300, Alvaro Herrera wrote:
  Andres Freund wrote:
   @@ -0,0 +1,56 @@
   +/*-
   + *
   + * lockdefs.h
   + *  Frontend exposed parts of postgres' low level lock mechanism
   + *
   + * The split between lockdefs.h and lock.h is not very principled.
  
  No kidding!
 
 Do you have a good suggestion about the split? I wanted to expose the
 minimal amount necessary, and those were the ones.

lock.h includes lwlock.h only for the benefit of the three LockHashPartition*
macros.  Those macros are pieces of the lock.c implementation that cross into
proc.c, not pieces of the lock.c public API.  I suggest instead making a
lock_internals.h for those macros and for anything else private to the various
files of the lock implementation.  For example, the PROCLOCK struct and the
functions that take arguments of that type would fit in lock_internals.h.
With that, indirect frontend lock.h inclusion will work fine.

Thanks,
nm




Re: [HACKERS] Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-08-06 Thread Peter Geoghegan
On Tue, Aug 4, 2015 at 1:24 AM, Heikki Linnakangas hlinn...@iki.fi wrote:
 Yeah, I don't think there's a big performance difference between the two
 approaches. I'm not wedded to either approach. Whichever approach we use, my
 main point was that it would be better to handle this in the logical tape
 abstraction. In my approach, you would have those in-memory tails attached
 to the last two tapes. In Peter's approach, you would have one tape that's
 completely in memory, backed by the array. In either case, the tapes would
 look like normal tapes to most of tuplesort.c. There would be no extra TSS
 state, it would be TSS_SORTEDONTAPE or TSS_FINALMERGE as usual.

TBH, it's not clear to me why making the logical tape abstraction
manage some fraction of memtuples has any advantage at all. The only
case that I can imagine where this could be useful is where a logical
tape does asynchronous I/O, and has the benefit of some portion of
memtuples (probably half) as scratch space (most I/O probably occurs
while tuplesort quicksorts the other half of memtuples). But that has
nothing to do with building a better abstraction.

The more expansive patch that I'm currently working on, that covers
every external sorting case -- the patch to *also* quicksort runs past
the first run, regardless of whether or not we can get away with a
quicksort with spillover -- is going very well. I haven't had a
solid block of time to work on it this week due to other commitments,
but it isn't very complicated. Performance benefits are considerable
even without saving any I/O. I can be as much as twice as fast for
merge sort sorts in some cases. So not quite as nice as quicksort
with spillover, but still a significant improvement considering
writing everything out is inevitable for the cases helped.

As I said before, the upcoming patch has tuplesort give up on
memtuples *ever* being a heap after the first run, whatever happens. I
just quicksort and dump in batches past the first run. Since I give up
on replacement selection sort only after the first run, I still have
most of the upside of replacement selection, but little of the big
downside of heap maintenance. This will result in smaller runs on
average past the first run. I can give you at least 3 very strong
arguments for why this is totally worth it in every case, but I'll
wait till I'm asked for them.  :-)
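For readers less familiar with the trade-off being discussed: replacement selection keeps a heap of in-memory tuples and repeatedly emits the smallest value that can still extend the current run, which on random input makes the first run roughly twice the memory size (the classic snowplow result). A toy sketch of that first-run behavior (made-up sizes, not tuplesort.c):

```python
import heapq
import random

def replacement_selection_first_run(values, memory):
    """Emit the first run; values that can't extend it are deferred to run 2."""
    it = iter(values)
    heap = [next(it) for _ in range(memory)]
    heapq.heapify(heap)
    run, deferred = [], []
    for v in it:
        smallest = heapq.heappop(heap)
        run.append(smallest)                 # emitted values are nondecreasing
        if v >= smallest:
            heapq.heappush(heap, v)          # can still join the current run
        else:
            deferred.append(v)               # must wait for the next run
        if not heap:
            break
    run.extend(sorted(heap))                 # drain if the input ran out first
    return run, deferred

random.seed(1)
data = [random.random() for _ in range(100_000)]
run, deferred = replacement_selection_first_run(data, memory=1000)
print(len(run))   # roughly 2 * memory on random input
```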

One useful saving made in this upcoming multi-tape-run patch is that
it never treats any non-current run as part of the heap beyond its run
number, even when currentRun is the first (run 0). So no comparisons
occur beyond the first run to maintain the heap invariant *even when
the first run is current*  -- tuples are simply appended that belong
to the second run (we only do an extra comparison to determine that
that's the run they belong in). So the second run (run 1) is not
trusted to be heapified by dumptuples(), and is quicksorted (either by
quicksort with spillover, along with much of the first run, or on
its own, when there are multiple conventional on-tape runs; it doesn't
matter which way it is quicksorted). From there on, every run is
quicksorted when memtuples fills, and written out entirely in memtuple
sized batches.

 The logical tape abstraction is currently too low-level for that. It's just
 a tape of bytes, and tuplesort.c determines where a tuple begins and ends.
 That needs to be changed so that the logical tape abstraction works
 tuple-at-a-time instead. For example, instead of LogicalTapeRead(N) which
 reads N bytes, you would have LogicalTapeReadTuple(), which reads next
 tuple, and returns its length and the tuple itself. But that would be quite
 sensible anyway.

Why would it be sensible? I honestly wonder why you want to do things
that way. What is the advantage of not having what I call the
"in-memory subrun" managed by a logical tape? It's already nothing
like any other type of run in several ways. Aside from being all
in-memory, it is often much larger. It's special in that it kind of
rejects the preliminary determination that some tuples within
memtuples need to be in a second, traditional, on-tape run (because we
can just quicksort everything and merge with the existing single
on-tape run). Also, we now need tuplesort_gettuple_common() to ask
READTUP() what to tell its caller about freeing memory that is
allocated in within tuplesort.c directly.

The memtuples array is already treated as an array, a heap, the head
of each tape that is merged, and maybe one other thing that I forgot
about offhand. The field SortTuple.tupindex has a total of 3
context-dependent meanings. Playing these kind of games with the
memtuples array is very much something that happens already.

More than anything else, I think that the new TSS_MEMTAPEMERGE state
is justified as a special case because quicksort with spillover is
legitimately a special case. Users will want to know how close they
were to an internal sort when looking at EXPLAIN ANALYZE and so on.
When cost_sort() is 

Re: [HACKERS] 9.5 release notes

2015-08-06 Thread Bruce Momjian
On Thu, Aug  6, 2015 at 06:42:38PM -0700, Peter Geoghegan wrote:
 On Thu, Aug 6, 2015 at 6:08 PM, Bruce Momjian br...@momjian.us wrote:
  I thought about this, and it is really an issue for FDW authors, not for
  end users, so I put this text in the Source Code changes section:
 
 I carefully considered where to put it, and chose the compatibility
 section based on the precedent of this 9.4 item:
 
 Foreign data wrappers that support updating foreign tables must
 consider the possible presence of AFTER ROW triggers (Noah Misch)
 
 I don't suppose it matters much, though. I can close out that open item now.

Oh, I think that 9.4 is in the wrong section, but good you researched it.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Autonomous Transaction is back

2015-08-06 Thread Merlin Moncure
On Thu, Aug 6, 2015 at 4:15 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Aug 3, 2015 at 9:09 AM, Merlin Moncure mmonc...@gmail.com wrote:
 hm.  OK, what's the behavior of:

 BEGIN
   UPDATE foo SET x = x + 1 WHERE foo_id = 1;

   BEGIN WITH AUTONOMOUS TRANSACTION
 UPDATE foo SET x = x + 1 WHERE foo_id = 1;
   END;

   RAISE EXCEPTION ...;
 EXCEPTION ...

 END;

 Sure, so that case might need a little bit of special handling.  That
 doesn't mean it's a good idea for heavyweight locks to conflict in
 general.  I think you're going to find that implementing the latter is
 an extremely unrewarding task, and that the benefits are seriously
 negative.  For example, consider:

 BEGIN
 UPDATE foo SET x = x + 1 WHERE foo_id = 1;
 BEGIN WITH AUTONOMOUS TRANSACTION
 UPDATE foo SET x = x + 1 WHERE foo_id = 2;
 END;
 END;

 Now, suppose that a concurrent session does LOCK TABLE foo after the
 first UPDATE and before the second one.  That's now a soft deadlock.
 But the only way the deadlock detector can see that is if the main
 transaction and the autonomous transaction have separate PGPROC
 entries, which is a design we explicitly rejected because it puts a
 tight limit on the number of ATs that can be in progress and the level
 to which those ATs can be nested.  But let's say you don't care, so we
 go back to that design.  The deadlock detector will have to be taught
 that the outer transaction can't help but wait for the inner
 transaction, so we teach it that.  Now it can see that the only way to
 resolve the deadlock without aborting any transactions is to reorder
 the lock request from the autonomous transaction ahead of the
 concurrent session that is seeking a full table lock.  So the
 autonomous transaction acquires the lock without blocking after all.
 You have exactly the same result that you would have had anyway but
 with a phenomenal amount of additional code and complexity.

 And for what?  In the original example, the way the deadlock is going
 to be reported is like this:

 ERROR: deadlock detected
 DETAIL: Process 12345 waits for ShareLock on transaction 1000; blocked
 by process 12345.
 Process 12345 waits for ShareLock on transaction 1001; blocked by process 
 12345.

 That is not a model of clarity.  On the other hand, if you just make a
 rule that attempting to update or delete a tuple that an outer
 transaction has already updated throws a bespoke error, you can do
 something like this:

 ERROR: tuple to be updated was already modified by a suspended outer 
 transaction

 ...which has precedent in an existing message in trigger.c.
 Similarly, if you try to drop a table that the outer transaction has
 locked, the natural thing is for CheckTableNotInUse() to catch that
 and report it this way:

 ERROR: cannot DROP TABLE foo because it is being used by active
 queries in this session

 If you work hard enough, you can instead make that generate a deadlock
 error message, but you're going to have to work pretty hard, and the
 result is worse.

 I'd really like to hear some more *specific* scenarios where it's
 valuable for locks to conflict between the outer transaction and the
 AT.  I grant that tuple updates are a case where the conflict has to
 be detected somehow, but I don't accept that the lock manager is the
 best way to do that, and I don't accept that there are a large number
 of other cases that will need similar handling.

I don't necessarily disagree with what you're saying, but it's not
clear to me what the proposed behavior is.  Since the AT can commit
before the outer, ISTM *any* ungranted lock requested by the AT but
held by the outer leads to either A: functional deadlock (regardless
of implementation details) or B: special behavior.  Deadlocks would
certainly require some acrobatics to detect and resolve due to the
fact that one party to the lock is not in fact blocked on a lock but
on the outer's execution state.  So maybe the right thing to do is to
simply ignore the problem and hang both transactions until timeout or
cancel; this isn't really much different vs. ghetto dblink style AT
that is done today in my experience.

merlin




Re: [HACKERS] 9.5 release notes

2015-08-06 Thread Peter Geoghegan
On Thu, Aug 6, 2015 at 3:39 PM, Bruce Momjian br...@momjian.us wrote:
 I am using the same criteria I have always used.  If you would like it
 changed, we need to discuss it at a macro level, not for individual
 cases where we feel someone didn't get enough _credit_.

I don't know how you can say that no *user* cares about Rajeev's
B-Tree contribution on the one hand, but on the other hand add items
about things like hash index crash-safety warnings, or an entire
section on obscure source code changes, with commentary about memory
ordering considerations, for example.

 And, I will restate this again, the release notes are not for _us_, or
 for _credit_ --- they are for our general users.  If you would like that
 changed, you need to make a case for that change.

Certainly, that's mostly true. But if it was entirely true, then no
individual would be named on the release notes at all. That would be a
mistake, but not because any ordinary user would care one bit.

I am not going to comment further on this matter.

-- 
Peter Geoghegan




Re: [HACKERS] 9.5 release notes

2015-08-06 Thread Bruce Momjian
On Mon, Jun 29, 2015 at 05:05:59PM -0400, Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  I'm working on integrating the suggestions I made last week to the
  release notes. Would anybody mind if I start to add commit ids in
  comments, similar to what Tom has done for minor releases, to news
  items?
 
 +1.  Helps confirm which items are meant to correspond to which commits.
 
 In case you didn't realize it already, the stuff I put into the minor
 release notes is from src/tools/git_changelog.  Dunno whether we need
 to use that exact format for major releases though; there's no need to
 identify branches in major release notes.

Gee, if I had known we wanted to do this for major releases I could have
done it at the time I created the 9.5 release notes.  It has to be
harder to do it after the fact.  Should I do it for 9.6?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] 9.5 release notes

2015-08-06 Thread Bruce Momjian
On Thu, Aug  6, 2015 at 03:32:43PM -0700, Peter Geoghegan wrote:
 On Thu, Aug 6, 2015 at 3:06 PM, Bruce Momjian br...@momjian.us wrote:
  Below performance improvement in the General Performance category is 
  missing:
 
  Reduce btree scan overhead for < and > strategies
 
  For <, <=, > and >= strategies, mark the first scan key
  as already matched if scanning in an appropriate direction.
  If index tuple contains no nulls we can skip the first
  re-check for each tuple.
 
 While this is a nice 9.5 feature, it really is "btree is faster", and
 users usually don't need to know that, unless the change is so massive that
 they would change their use of the feature.
 
 I think that Rajeev is entitled to be credited for his work,
 especially as a less experienced contributor.
 
 Sure, users are not likely to care too much about incremental progress
 like this (however, I would be willing to bet that more users care
 about this item than about existing items like "Make initdb issue a
 warning about placing the data directory at the top of a file system
 mount point"). IMV it is the role of the release notes to document
 what went into a release fairly methodically. I often look at release
 notes to determine what might have gone wrong in a part of the code
 that I'm less familiar with, for example.
 
 Users mostly only specifically care about one or two big ticket items,
 and in any case are not likely to learn about them from the release
 notes. The release notes seem shorter than previous years, and I don't
 think it's because 9.5 is a smaller release.

I am using the same criteria I have always used.  If you would like it
changed, we need to discuss it at a macro level, not for individual
cases where we feel someone didn't get enough _credit_.

And, I will restate this again, the release notes are not for _us_, or
for _credit_ --- they are for our general users.  If you would like that
changed, you need to make a case for that change.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c

2015-08-06 Thread Tom Lane
Andreas Seltenreich seltenre...@gmx.de writes:
 Tom Lane writes:
 On 08/01/2015 05:59 PM, Tom Lane wrote:
 Well, I certainly think all of these represent bugs:
 1 | ERROR:  could not find pathkey item to sort

 Hmm ... I see no error with these queries as of today's HEAD or
 back-branch tips.  I surmise that this was triggered by one of the other
 recently-fixed bugs, though the connection isn't obvious offhand.

 I still see this error in master as of b8cbe43, but the queries are
 indeed a pita to reproduce.  The one below is the only one so far that
 is robust against running ANALYZE on the regression db, and also
 reproduces when I run it as an EXTRA_TEST with make check.

Got that one, thanks!  But we've seen this error message before in all
sorts of weird situations, so I wouldn't assume that all the cases you've
come across had the same cause.

regards, tom lane




Re: [HACKERS] 9.5 release notes

2015-08-06 Thread Bruce Momjian
On Tue, Jun 30, 2015 at 12:12:16AM +0200, Andres Freund wrote:
 On 2015-06-29 17:58:54 -0400, Tom Lane wrote:
  Yeah we are.  The only places you'll find where we aren't formatting to 77
  or 78 columns or so are where it would require breaking SGML tags in weird
  places.
 
 Which isn't exactly infrequent...
 
 Anyway, how about:
 format:%cd [%h] %(8,trunc)%cN: %(49,trunc)%s
 
 (which you can configure as pretty.pgmajor or so in .gitconfig btw.)

Should we add this to src/tools/git_changelog?  It currently uses git
log --format=fuller --date=iso.
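
For reference, Andres's pretty format can be registered once in .gitconfig and then used by name; a sketch (the alias name pretty.pgmajor is taken from his earlier message):

```
[pretty]
	pgmajor = %cd [%h] %(8,trunc)%cN: %(49,trunc)%s
```

After that, something like "git log --pretty=pgmajor --date=short" produces the one-line-per-commit output discussed above.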

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] 9.5 release notes

2015-08-06 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 On Tue, Jun 30, 2015 at 12:12:16AM +0200, Andres Freund wrote:
 Anyway, how about:
 format:%cd [%h] %(8,trunc)%cN: %(49,trunc)%s
 (which you can configure as pretty.pgmajor or so in .gitconfig btw.)

 Should we add this to src/tools/git_changelog?  It currently uses git
 log --format=fuller --date=iso.

I don't think that format can easily be merged into what git_changelog
does: there would be no place to put branch or release annotations.
But it was a real PITA to add that format into the alpha2 release notes,
so we should probably think of a way to produce a suitable report.

regards, tom lane




Re: [HACKERS] 9.5 release notes

2015-08-06 Thread Bruce Momjian
On Fri, Jun 26, 2015 at 03:39:19PM -0700, Peter Geoghegan wrote:
 On Wed, Jun 10, 2015 at 9:15 PM, Bruce Momjian br...@momjian.us wrote:
  I am ready to make suggested adjustments
 
 I attach a compatibility note that is clearly needed; adding this is
 an open item of mine for 9.5. This concerns foreign data wrappers and
 UPSERT.

I thought about this, and it is really an issue for FDW authors, not for
end users, so I put this text in the Source Code changes section:

+<para>
+ Foreign tables can now take part in <command>INSERT ... ON CONFLICT
+ DO NOTHING</> queries (Peter Geoghegan, Heikki Linnakangas,
+ Andres Freund)
+</para>
+
+<para>
+ Foreign data wrappers must be modified to handle this.
+ <command>INSERT ... ON CONFLICT DO UPDATE</> is not supported on
+ foreign tables.
+</para>
+   </listitem>

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] 9.5 release notes

2015-08-06 Thread Peter Geoghegan
On Thu, Aug 6, 2015 at 6:08 PM, Bruce Momjian br...@momjian.us wrote:
 I thought about this, and it is really an issue for FDW authors, not for
 end users, so I put this text in the Source Code changes section:

I carefully considered where to put it, and chose the compatibility
section based on the precedent of this 9.4 item:

Foreign data wrappers that support updating foreign tables must
consider the possible presence of AFTER ROW triggers (Noah Misch)

I don't suppose it matters much, though. I can close out that open item now.

Thanks

-- 
Peter Geoghegan




Re: [HACKERS] [sqlsmith] Failed assertion in analyzejoins.c

2015-08-06 Thread Tom Lane
Andreas Seltenreich seltenre...@gmx.de writes:
 this one was in today's sqlsmith harvest.  It triggers an assertion in
 current master (c030dc4).

Fixed, thanks.

regards, tom lane




Re: [HACKERS] 9.5 release notes

2015-08-06 Thread Bruce Momjian
On Tue, Jun 30, 2015 at 09:00:44PM +0200, Andres Freund wrote:
 I've gone through the release notes and added comments referencing
 commits as discussed earlier. Additionally I've improved and added a
 bunch of items.
 
 Further stuff that came up while looking:
 * 2014-09-25 [b0d81ad] Heikki..: Add -D option to specify data directory to 
 pg_c..
   new options, should possibly be documented?

As I remember it was added just for consistency with other tools and is
not expected to be used because the data directory is required, so
documenting it seems like it would encourage nonsensical use.

 * 2014-10-02 [3acc10c9] Robert..: Increase the number of buffer mapping 
 partition..
   should we mention this? This has been patched by a number of
   downstream vendors and users, so it's probably worth calling out?

Uh, I am not sure why general users would care.

 * 2014-11-18 [606c012] Simon ..: Reduce btree scan overhead for < and > 
 strategies
   Worthy of a note in the performance section.

I commented on this already.

 * 2014-11-22 [eca2b9b] Andrew..: Rework echo_hidden for \sf and \ef from 
 commit ..
   Seems far too minor in comparison to other stuff left out.

This is a user-visible change.

 * 2014-11-07 [7516f52] Alvaro..: BRIN: Block Range Indexes
   Looks to me like that should just be Alvaro.

OK, other names removed.

 * 2014-11-22 [b62f94c] Tom Lane: Allow simplification of EXISTS() subqueries 
 con..
   Planner change, I think it's good to mention those.

I have talked to Tom about this in the past and unless the item has a
user-visible component, in general he doesn't think they should be
mentioned.  I added a bunch of planner items in the major release notes
years ago and he thought they should be removed, and they were.

 
 * 2014-11-28 [e384ed6] Tom Lane: Improve typcache: cache negative lookup 
 results..
   Should perhaps, together with other cache enhancing entries, be mentioned?

Uh, I am not seeing it.

 * 2014-12-08 [519b075] Simon ..: Use GetSystemTimeAsFileTime directly in win32
   2014-12-08 [8001fe6] Simon ..: Windows: use GetSystemTimePreciseAsFileTime 
 if ..
   Timer resolution isn't an unimportant thing for people using explain?

This all seemed very internals-only, e.g.:

On most Windows systems this change will actually have no significant
effect on timestamp resolution as the system timer tick is typically
between 1ms and 15ms depending on what timer resolution currently running
applications have requested. You can check this with clockres.exe from
sysinternals. Despite the platform limitation this change still permits
capture of finer timestamps where the system is capable of producing them
and it gets rid of an unnecessary syscall.

Was I wrong?

 * 2014-12-12 [7e354ab] Andrew..: Add several generator functions for jsonb 
 that ..
   2015-05-12 [c694701] Andrew..: Additional functions and operators for jsonb
   2015-05-31 [37def42] Andrew..: Rename jsonb_replace to jsonb_set and allow 
 it ..
   2014-12-12 [237a882] Andrew..: Add json_strip_nulls and jsonb_strip_nulls 
 fun..
   should probably be merged? Similar set of authors and too many similar 
 release note
   entries.

They all do different types of things, hence the different entries.

 * 2014-12-23 [d7ee82e] Alvaro..: Add SQL-callable pg_get_object_address
 * 2014-12-30 [a676201] Alvaro..: Add pg_identify_object_as_address
   should be merged.

OK, merged.

 * 2015-01-13 [4bad60e] Andres..: Allow latches to wait for socket writability 
 wi..
   2015-01-14 [59f71a0] Andres..: Add a default local latch for use in signal 
 han..
   2015-01-17 [ff44fba] Andres..: Replace walsender's latch with the general 
 shar..
   2015-02-03 [387da18] Andres..: Use a nonblocking socket for FE/BE 
 communicatio..
   2015-02-03 [4f85fde] Andres..: Introduce and use infrastructure for 
 interrupt ..
   2015-02-03 [4fe384b] Andres..: Process 'die' interrupts while 
 reading/writing ..
   2015-02-03 [6647248] Andres..: Don't allow immediate interrupts during 
 authent..
   2015-02-03 [675] Andres..: Move deadlock and other interrupt handling 
 in p..
   2015-02-13 [80788a4] Heikki..: Simplify waiting logic in reading from 
 /writin..

Again, I am willing to add a combined entry for these, but I can't
figure out accurate text for it.

 * 2015-01-17 [9402869] Heikki..: Advance backend's advertised xmin more 
 aggressi..
   This is a pretty big efficiency boon for systems with longer
   nontrivial transactions.

What is the user-visible behavior here?  More aggressive cleanup?

 * 2015-01-29 [ed12700] Andres..: Align buffer descriptors to cache line 
 boundari..
   Maybe worthwhile to mention?

Uh, I think you and I worked on that.  Odd I didn't list it, but I don't
think it fit in with any user-visible behavior, and was rare in that it
only happened on larger systems.

 * 2015-02-16 [9e3ad1a] Tom Lane: Use fast path in plpgsql's RETURN/RETURN 
 NEXT i..
   2015-02-28 [e524cbd] Tom Lane: Track typmods in 

Re: [HACKERS] 9.5 release notes

2015-08-06 Thread Bruce Momjian
On Thu, Aug  6, 2015 at 10:21:29PM -0400, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  On Tue, Jun 30, 2015 at 12:12:16AM +0200, Andres Freund wrote:
  Anyway, how about:
  format:%cd [%h] %(8,trunc)%cN: %(49,trunc)%s
  (which you can configure as pretty.pgmajor or so in .gitconfig btw.)
 
  Should we add this to src/tools/git_changelog?  It currently uses git
  log --format=fuller --date=iso.
 
 I don't think that format can easily be merged into what git_changelog
 does: there would be no place to put branch or release annotations.
 But it was a real PITA to add that format into the alpha2 release notes,
 so we should probably think of a way to produce a suitable report.

Agreed.  If it is already output by git_changelog it would be much
easier to add them to the release notes.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] [PATCH] libpq: Allow specifying multiple host names to try to connect to

2015-08-06 Thread Mikko Tiihonen
On Thu, Aug 6, 2015 at 03:15 AM, Michael Paquier michael.paqu...@gmail.com 
wrote:
On Wed, Aug 5, 2015 at 11:53 PM, Bruce Momjian br...@momjian.us wrote:
 On Wed, Jul  8, 2015 at 12:24:37PM -0400, Robbie Harwood wrote:
  You updated the documentation just for psql, but your change affects any
  libpq application; if we go forward with this patch we should update the
  documentation for libpq as well.
 
  This approach seems to work with the url style of conninfo
 
  For example
postgres://some-down-host.info,some-other-host.org:5435/test1
 
  seems to work as expected, but I don't like that syntax; I would rather see
  postgres://some-down-host.info:5435/test1,postgres://some-other-host.org:5435/test1
 
  This would be a more invasive change but I think the syntax is more 
  usable.

 I agree with this; it seems to me that it's more powerful to be able to
 specify complete urls for when they may differ.

 For the non-url case though, I don't see a clean way of doing this.  We
 could always, e.g., locally bind port specification to the closest host
 specification, but that seems nasty, and is still less powerful than
 passing urls (or we could just do the same for all parameters, but
 that's just a mess).

 Might it be reasonable to only allow the multi-host syntax in the
 url-style and not otherwise?

 First, I agree this is a very useful feature that we want.  Many NoSQL
 databases are promoting multi-host client libraries as HA, which is kind
 of humorous, and also makes sense because many NoSQL solutions are
 multi-host.
 I can see this feature benefitting us for clients to auto-failover
 without requiring a pooler or virtual IP reassignment, and also useful
 for read-only connections that want to connect to a read-only slave, but
 don't care which one.  The idea of randomly selecting a host from the
 list might be a future feature.

Yep. The JDBC driver is doing it as well.

I added similar support to the JDBC driver. Currently it supports the 
following tuning parameters, given a list of hostname/port combinations to 
connect to:
  targetServerType=any|master|slave|preferSlave
  loadBalanceHosts=false|true

For an example 2 node master,replica setup one would open write connections 
with host1,host2 & targetServerType=master
and read-only connections with host1,host2 & targetServerType=preferSlave.
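
As a sketch, the two connection strings described above would look like this as pgjdbc URLs (host names, ports, and database name are illustrative):

```
jdbc:postgresql://host1:5432,host2:5432/mydb?targetServerType=master
jdbc:postgresql://host1:5432,host2:5432/mydb?targetServerType=preferSlave&loadBalanceHosts=true
```

The driver tries the listed hosts in order (or randomly, with loadBalanceHosts=true) until it finds one matching the requested server type.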

 I realize this is libpq-feature-creep, but considering the complexities
 of a pooler and virtual IP address reassignment, I think adding this
 makes sense.  The fact that other DBs are doing it, including I think
 VMWare's libpq, supports the idea of adding this simple specification.

Because the feature at its simplest is a for loop in libpq, I would not think 
it much of a feature creep, especially since my original patch to libpq showed 
the loop has already been hidden in libpq for a long time; it just needed a 
special DNS record for the postgresql hosts that returned DNS records for all 
hosts.

Even if there are poolers in front of postgres, they can be set up in a much 
simpler and more reliable non-cluster mode when libpq can be given multiple 
pooler addresses to connect to.

-Mikko



Re: [HACKERS] Raising our compiler requirements for 9.6

2015-08-06 Thread Andres Freund
On 2015-08-05 23:18:08 -0400, Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  ... I'm going to reshuffle things in that direction tomorrow. I'll
  wait for other fallout first though. So far only gcc, xlc and clang (via
  gcc frontend) have run...
 
 In the department of other fallout, pademelon is not happy:
 
 cc -Ae -g +O0 -Wp,-H16384 -I../../../src/include -D_XOPEN_SOURCE_EXTENDED  
 -I/usr/local/libxml2-2.6.23/include/libxml2 -I/usr/local/include  -c -o 
 pg_resetxlog.o pg_resetxlog.c
 cc -Ae -g +O0 -Wp,-H16384 pg_resetxlog.o -L../../../src/port 
 -L../../../src/common -L/usr/local/libxml2-2.6.23/lib -L/usr/local/lib -Wl,+b 
 -Wl,'/home/bfarm/bf-data/HEAD/inst/lib'  -Wl,-z -lpgcommon -lpgport -lxnet 
 -lxml2 -lz -lreadline -ltermcap -lm  -o pg_resetxlog
 /usr/ccs/bin/ld: Unsatisfied symbols:
pg_atomic_clear_flag_impl (code)
pg_atomic_init_flag_impl (code)
pg_atomic_compare_exchange_u32_impl (code)
pg_atomic_fetch_add_u32_impl (code)
pg_atomic_test_set_flag_impl (code)
pg_atomic_init_u32_impl (code)
 make[3]: *** [pg_resetxlog] Error 1
 
 I'd have thought that port/atomics.c would be included in libpgport, but
 it seems not.

Given that it requires spinlocks for the fallback, which in turn may
require semaphores, that didn't seem like a good idea. Thus atomics.c is
in src/backend/port not src/port (just like other locking stuff like
spinlocks, semaphores etc).

 (But is pademelon really the only buildfarm critter relying
 on the fallback atomics implementation?)

I don't think so. At least that didn't use to be the case. My guess is
that less ancient compilers don't emit code for unreferenced functions
and this thus doesn't show up.

 Another possible angle is: what the heck does pg_resetxlog need with
 atomic ops, anyway?

It really doesn't. It's just fallout from indirectly including lwlock.h
which includes an atomic variable. The include path leading to it is

In file included from 
/home/andres/src/postgresql/src/include/storage/lwlock.h:19:0,
 from /home/andres/src/postgresql/src/include/storage/lock.h:18,
 from 
/home/andres/src/postgresql/src/include/access/tuptoaster.h:18,
 from 
/home/andres/src/postgresql/src/bin/pg_resetxlog/pg_resetxlog.c:49:
/home/andres/src/postgresql/src/include/port/atomics.h:41:2: error: #error 
THOU SHALL NOT REQUIRE ATOMICS
 #error THOU SHALL NOT REQUIRE ATOMICS

There are other paths (slot.h) as well if that were fixed.

 Should these things have a different, stub implementation for FRONTEND
 code?

I'm honestly not yet sure what the best approach here would be.

One approach is to avoid including lwlock.h/slot.h in frontend
code. That'll require some minor surgery and adding a couple includes,
but it doesn't look that bad.

Greetings,

Andres Freund




[HACKERS] Thinko in processing of SHM message size info?

2015-08-06 Thread Antonin Houska
Can anyone please explain why the following patch shouldn't be applied?

diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index 126cb07..4cd52ac 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -584,7 +584,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 			if (mqh->mqh_partial_bytes + rb > sizeof(Size))
 				lengthbytes = sizeof(Size) - mqh->mqh_partial_bytes;
 			else
-				lengthbytes = rb - mqh->mqh_partial_bytes;
+				lengthbytes = rb;
 			memcpy(&mqh->mqh_buffer[mqh->mqh_partial_bytes], rawdata,
 				   lengthbytes);
 			mqh->mqh_partial_bytes += lengthbytes;


I'm failing to understand why anything should be subtracted. Note that the
previous iteration must have called shm_mq_inc_bytes_read(), so rb should
not include anything of mqh->mqh_partial_bytes. Thanks.
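
To illustrate the point, here is a small Python model of the length-word assembly (a sketch; names are illustrative and sizeof(Size) is taken as 8). Since rb counts only the bytes just read, subtracting mqh->mqh_partial_bytes from it would under-count, and even go negative, whenever the header arrives split across reads:

```python
SIZEOF_SIZE = 8  # sizeof(Size) on a 64-bit build

def length_bytes_to_copy(partial_bytes, rb):
    """How many of the rb newly read bytes belong to the length word,
    mirroring the patched logic in shm_mq_receive()."""
    if partial_bytes + rb > SIZEOF_SIZE:
        # More than enough arrived: copy only what completes the header.
        return SIZEOF_SIZE - partial_bytes
    # Otherwise everything that just arrived belongs to the header.
    return rb

# Header split across two reads: 5 bytes, then 3 bytes.
assert length_bytes_to_copy(0, 5) == 5
assert length_bytes_to_copy(5, 3) == 3   # unpatched code: rb - partial = -2
# Second read overshoots into the payload: copy only the missing 3 bytes.
assert length_bytes_to_copy(5, 10) == 3
```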

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at




Re: [HACKERS] Removing unreferenced files

2015-08-06 Thread Joshua D. Drake


On 08/05/2015 11:44 AM, Ron Farrer wrote:


Initial questions that had no consensus in previous discussions:
1. Approach on file handling undecided
2. Startup vs standalone tool


I think it should be on startup and perhaps also have a function that 
will do it from user space. If this problem persists, we shouldn't 
expect users to go down to clean it up.



3. If startup: how to determine when to run

Outstanding problems (point-for-point with original 2005 post):
1. Does not work with non-standard tablespaces. The check will even
report all files are stale, etc.
2. Has issues with stale subdirs of a tablespace (subdirs corresponding
to a nonexistent database) [appears related to #1 because of maintenance
mode and not failing]
3. Assumes relfilenode is unique database-wide when it’s only safe
tablespace-wide
4. Does not examine table segment files such as “nnn.1” - it should
instead complain when “nnn” does not match a hash entry
5. It loads every value of relfilenode in pg_class into the hash table
without checking that it is meaningful or not - needs to check.
6. strtol vs strspn (or other) [not sure what the problem here is. If
errors are handled correctly this should not be an issue]
7. No checks for readdir failure [this should be easy to check for]


Ron,

Do you have suggestions on how to resolve any of the 1-7?



Other thoughts:
1. What to do if problem happens during drop table/index and the files
that should be removed are still there.. the DBA needs to know when this
happens somehow


Why? If the drop table/index happens, should it not just remove those files?


2.  What happened to pgfsck: was that a better approach? why was that
abandoned?



3.  What to do about stale files and missing files


IMO, unless there is a prevailing reason not to we should remove them.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing I'm offended is basically telling the world you can't
control your own emotions, so everyone else should do it for you.




Re: [HACKERS] Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows

2015-08-06 Thread Simon Riggs
On 6 August 2015 at 20:14, Josh Berkus j...@agliodbs.com wrote:


 The results of 0, 185 and 925 are not.  It really seems like SYSTEM is
 treating 0.04% as a maximum, but taking a random number of data pages
 somewhere around that maximum, using math which can choose numbers of
 pages far outside of the % requested by the user, and which includes 0.


Thanks for the report, I'll look at this now.


 Speaking from a user perspective, SYSTEM seems broken to me.  I can't
 imagine using it for anything with a that degree of variation in the
 number of results returned, especially if it's possible to return zero
 rows from a populated table.


Please bear in mind you have requested a very small random sample of blocks.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Doubt about AccessExclusiveLock in ALTER TABLE .. SET ( .. );

2015-08-06 Thread Fabrízio de Royes Mello
On Wed, Aug 5, 2015 at 9:21 PM, Michael Paquier michael.paqu...@gmail.com
wrote:

 On Thu, Aug 6, 2015 at 3:06 AM, Fabrízio de Royes Mello wrote:
  On Wed, Aug 5, 2015 at 9:31 AM, Robert Haas wrote:
  Agreed.  I think we're making a mountain out of a molehill here.  As
  long as the locks that are actually used are monotonic, just use  and
  stick a comment in there explaining that it could need adjustment if
  we use other lock levels in the future.  I presume all the lock-levels
  used for DDL are, and will always be, self-exclusive, so why all this
  hand-wringing?
 
 
  New version attached with suggested changes.

 Thanks!

 +# SET autovacuum_* options needs a ShareUpdateExclusiveLock
 +# so we mix reads with it to see what works or waits
 s/needs/need/ and I think you mean mixing writes, not reads.

 Those are minor things though, and from my point of view a committer
 can look at it.


Fixed. Thanks for your review.

Regards,

*** This work is funded by Zenvia Mobile Results (http://www.zenvia.com.br)

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 1c1c181..ad985cd 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -543,6 +543,10 @@ ALTER TABLE ALL IN TABLESPACE <replaceable class="PARAMETER">name</replaceable>
   of <command>ALTER TABLE</> that forces a table rewrite.
  </para>
 
+ <para>
+  Changing autovacuum storage parameters acquires a <literal>SHARE UPDATE EXCLUSIVE</literal> lock.
+ </para>
+
  <note>
   <para>
    While <command>CREATE TABLE</> allows <literal>OIDS</> to be specified
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 180f529..c39b878 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -57,7 +57,8 @@ static relopt_bool boolRelOpts[] =
 		{
 			"autovacuum_enabled",
 			"Enables autovacuum in this relation",
-			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
+			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST,
+			ShareUpdateExclusiveLock
 		},
 		true
 	},
@@ -65,7 +66,8 @@ static relopt_bool boolRelOpts[] =
 		{
 			"user_catalog_table",
 			"Declare a table as an additional catalog table, e.g. for the purpose of logical replication",
-			RELOPT_KIND_HEAP
+			RELOPT_KIND_HEAP,
+			AccessExclusiveLock
 		},
 		false
 	},
@@ -73,7 +75,8 @@ static relopt_bool boolRelOpts[] =
 		{
 			"fastupdate",
 			"Enables \"fast update\" feature for this GIN index",
-			RELOPT_KIND_GIN
+			RELOPT_KIND_GIN,
+			AccessExclusiveLock
 		},
 		true
 	},
@@ -81,7 +84,8 @@ static relopt_bool boolRelOpts[] =
 		{
 			"security_barrier",
 			"View acts as a row security barrier",
-			RELOPT_KIND_VIEW
+			RELOPT_KIND_VIEW,
+			AccessExclusiveLock
 		},
 		false
 	},
@@ -95,7 +99,8 @@ static relopt_int intRelOpts[] =
 		{
 			"fillfactor",
 			"Packs table pages only to this percentage",
-			RELOPT_KIND_HEAP
+			RELOPT_KIND_HEAP,
+			AccessExclusiveLock
 		},
 		HEAP_DEFAULT_FILLFACTOR, HEAP_MIN_FILLFACTOR, 100
 	},
@@ -103,7 +108,8 @@ static relopt_int intRelOpts[] =
 		{
 			"fillfactor",
 			"Packs btree index pages only to this percentage",
-			RELOPT_KIND_BTREE
+			RELOPT_KIND_BTREE,
+			AccessExclusiveLock
 		},
 		BTREE_DEFAULT_FILLFACTOR, BTREE_MIN_FILLFACTOR, 100
 	},
@@ -111,7 +117,8 @@ static relopt_int intRelOpts[] =
 		{
 			"fillfactor",
 			"Packs hash index pages only to this percentage",
-			RELOPT_KIND_HASH
+			RELOPT_KIND_HASH,
+			AccessExclusiveLock
 		},
 		HASH_DEFAULT_FILLFACTOR, HASH_MIN_FILLFACTOR, 100
 	},
@@ -119,7 +126,8 @@ static relopt_int intRelOpts[] =
 		{
 			"fillfactor",
 			"Packs gist index pages only to this percentage",
-			RELOPT_KIND_GIST
+			RELOPT_KIND_GIST,
+			AccessExclusiveLock
 		},
 		GIST_DEFAULT_FILLFACTOR, GIST_MIN_FILLFACTOR, 100
 	},
@@ -127,7 +135,8 @@ static relopt_int intRelOpts[] =
 		{
 			"fillfactor",
 			"Packs spgist index pages only to this percentage",
-			RELOPT_KIND_SPGIST
+			RELOPT_KIND_SPGIST,
+			AccessExclusiveLock
 		},
 		SPGIST_DEFAULT_FILLFACTOR, SPGIST_MIN_FILLFACTOR, 100
 	},
@@ -135,7 +144,8 @@ static relopt_int intRelOpts[] =
 		{
 			"autovacuum_vacuum_threshold",
 			"Minimum number of tuple updates or deletes prior to vacuum",
-			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
+			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST,
+			ShareUpdateExclusiveLock
 		},
 		-1, 0, INT_MAX
 	},
@@ -143,7 +153,8 @@ static relopt_int intRelOpts[] =
 		{
 			"autovacuum_analyze_threshold",
 			"Minimum number of tuple inserts, updates or deletes prior to analyze",
-			RELOPT_KIND_HEAP
+			RELOPT_KIND_HEAP,
+			ShareUpdateExclusiveLock
 		},
 		-1, 0, INT_MAX
 	},
@@ -151,7 +162,8 @@ static relopt_int intRelOpts[] =
 		{
 			"autovacuum_vacuum_cost_delay",
 			Vacuum cost delay 

Re: [HACKERS] Race conditions in shm_mq.c

2015-08-06 Thread Robert Haas
On Thu, Aug 6, 2015 at 10:10 AM, Antonin Houska a...@cybertec.at wrote:
 During my experiments with parallel workers I sometimes saw the master and
 worker process blocked. The master uses shm queue to send data to the worker,
 both sides nowait==false. I concluded that the following happened:

 The worker process set itself as a receiver on the queue after
 shm_mq_wait_internal() has completed its first check of ptr, so this
 function left sender's procLatch in reset state. But before the procLatch was
 reset, the receiver still managed to read some data and set sender's procLatch
 to signal the reading, and eventually called its (receiver's) WaitLatch().

 So sender has effectively missed the receiver's notification and called
 WaitLatch() too (if the receiver already waits on its latch, it does not help
 for sender to call shm_mq_notify_receiver(): receiver won't do anything
 because there's no new data in the queue).

 Below is my patch proposal.

Another good catch.  However, I would prefer to fix this without
introducing a continue as I think that will make the control flow
clearer.  Therefore, I propose the attached variant of your idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index e765cea..8e10822 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -777,7 +777,7 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 			return SHM_MQ_DETACHED;
 		}
 
-		if (available == 0)
+		if (available == 0 && !mqh->mqh_counterparty_attached)
 		{
 			shm_mq_result res;
 
@@ -805,6 +805,13 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 				mqh->mqh_counterparty_attached = true;
 			}
 
+			/*
+			 * The receiver may have read some data after attaching, so we
+			 * must not wait without rechecking the queue state.
+			 */
+		}
+		else if (available == 0)
+		{
 			/* Let the receiver know that we need them to read some data. */
 			res = shm_mq_notify_receiver(mq);
 			if (res != SHM_MQ_SUCCESS)



[HACKERS] Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows

2015-08-06 Thread Josh Berkus
Version: 9.5alpha2

Issue: when requesting small samples, SYSTEM often returns zero rows,
and sometimes returns unexpected numbers of rows.

Example:

create table thous ( id int, val text );
insert into thous select i, i::text || '-val'  from
generate_series(1,100000) as gs(i);
analyze;

This is a 100,000 row table, so a 0.01% sample should be 10 rows.  Since
10 rows is far less than what's on one data page, the documentation
suggests that I should get 1 data page worth of rows.   However:

postgres=# select * from thous tablesample system ( 0.01 );
 id | val
+-
(0 rows)

Time: 0.636 ms

... this query consistently returned 0 rows across 20 runs.

Ok, let's try a million-row table, which is the example we have in the docs.

postgres=# select * from mil tablesample system ( 0.01 );
 id | val
+-
(0 rows)

Hmmm?

On testing, the query against the million-row table returns 0 rows
around 50% of the time.

This table has around 185 rows per page.  As the sample size goes up,
the number of times I get zero rows goes down, but those results seem to
still include data pages with zero rows.  For example, here's a series
of results from a 0.04% sample against the million-row table.

370
370
370
555
555
185
0
925

Since this is a synthetic table I just generated, it contains almost
exactly 185 rows per data page for every data page.  So on a 0.04%
sample, the variation between 370 rows and 555 rows (whether we have 2
or 3 data pages) is expected, since 0.04% of 5406 data pages is 2.16 pages.

The results of 0, 185 and 925 are not.  It really seems like SYSTEM is
treating 0.04% as a maximum, taking a random number of data pages
somewhere around that maximum, using math that can choose page counts
far outside the percentage requested by the user, including 0.

Speaking from a user perspective, SYSTEM seems broken to me.  I can't
imagine using it for anything with that degree of variation in the
number of results returned, especially if it's possible to return zero
rows from a populated table.

BERNOULLI works as expected.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Thinko in processing of SHM message size info?

2015-08-06 Thread Robert Haas
On Thu, Aug 6, 2015 at 1:24 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Aug 6, 2015 at 9:04 AM, Antonin Houska a...@cybertec.at wrote:
 Can anyone please explain why the following patch shouldn't be applied?

 diff --git a/src/backend/storage/ipc/shm_mq.c 
 b/src/backend/storage/ipc/shm_mq.c
 index 126cb07..4cd52ac 100644
 --- a/src/backend/storage/ipc/shm_mq.c
 +++ b/src/backend/storage/ipc/shm_mq.c
 @@ -584,7 +584,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void 
 **datap, bool nowait)
 if (mqh->mqh_partial_bytes + rb > sizeof(Size))
 lengthbytes = sizeof(Size) - 
 mqh->mqh_partial_bytes;
 else
 -   lengthbytes = rb - mqh->mqh_partial_bytes;
 +   lengthbytes = rb;
 memcpy(&mqh->mqh_buffer[mqh->mqh_partial_bytes], 
 rawdata,
   lengthbytes);
 mqh->mqh_partial_bytes += lengthbytes;


 I'm failing to understand why anything should be subtracted. Note that the
 previous iteration must have called shm_mq_inc_bytes_read(), so rb should
 not include anything of mqh-mqh_partial_bytes. Thanks.

 Hmm, I think you are correct.  This would matter in the case where the
 message length word was read in more than two chunks.  I don't *think*
 that's possible right now because I believe the only systems where
MAXIMUM_ALIGNOF < sizeof(Size) are those with MAXIMUM_ALIGNOF == 4 and
 sizeof(Size) == 8.  However, if we had systems where MAXIMUM_ALIGNOF
 == 4 and sizeof(Size) == 16, or systems where MAXIMUM_ALIGNOF == 2 and
 sizeof(Size) == 8, this would be a live bug.

Hmm, actually, maybe it is a live bug anyway, because the if statement
tests > rather than >=.  If we've read 4 bytes and exactly 4 more
bytes are available, we'd set lengthbytes to 0 instead of 4.  Oops.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Thinko in processing of SHM message size info?

2015-08-06 Thread Robert Haas
On Thu, Aug 6, 2015 at 9:04 AM, Antonin Houska a...@cybertec.at wrote:
 Can anyone please explain why the following patch shouldn't be applied?

 diff --git a/src/backend/storage/ipc/shm_mq.c 
 b/src/backend/storage/ipc/shm_mq.c
 index 126cb07..4cd52ac 100644
 --- a/src/backend/storage/ipc/shm_mq.c
 +++ b/src/backend/storage/ipc/shm_mq.c
 @@ -584,7 +584,7 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void 
 **datap, bool nowait)
 if (mqh->mqh_partial_bytes + rb > sizeof(Size))
 lengthbytes = sizeof(Size) - 
 mqh->mqh_partial_bytes;
 else
 -   lengthbytes = rb - mqh->mqh_partial_bytes;
 +   lengthbytes = rb;
 memcpy(&mqh->mqh_buffer[mqh->mqh_partial_bytes], 
 rawdata,
   lengthbytes);
 mqh->mqh_partial_bytes += lengthbytes;


 I'm failing to understand why anything should be subtracted. Note that the
 previous iteration must have called shm_mq_inc_bytes_read(), so rb should
 not include anything of mqh-mqh_partial_bytes. Thanks.

Hmm, I think you are correct.  This would matter in the case where the
message length word was read in more than two chunks.  I don't *think*
that's possible right now because I believe the only systems where
MAXIMUM_ALIGNOF < sizeof(Size) are those with MAXIMUM_ALIGNOF == 4 and
sizeof(Size) == 8.  However, if we had systems where MAXIMUM_ALIGNOF
== 4 and sizeof(Size) == 16, or systems where MAXIMUM_ALIGNOF == 2 and
sizeof(Size) == 8, this would be a live bug.

Thanks for reviewing; I'll go push this change.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] 9.5 release notes

2015-08-06 Thread Robert Haas
On Thu, Aug 6, 2015 at 10:33 AM, Bruce Momjian br...@momjian.us wrote:
 On Sun, Jun 14, 2015 at 12:05:54PM +0100, Dean Rasheed wrote:
 On 11 June 2015 at 05:15, Bruce Momjian br...@momjian.us wrote:
  I have committed the first draft of the 9.5 release notes.  You can view
  the output here:
 
  http://momjian.us/pgsql_docs/release-9-5.html
 

 I think it's worth mentioning
 dcbf5948e12aa60b4d6ab65b6445897dfc971e01, probably under General
 Performance. It's an optimisation, and also a user-visible change to
 the way LEAKPROOF works. Perhaps something like

 Allow pushdown of non-leakproof functions into security barrier views
 if the function is not passed any arguments from the view.

 Previously only functions marked as LEAKPROOF could be pushed down
 into security barrier views.

 Sorry, just looking at this now.  Since RLS is new for 9.5, we wouldn't
 mention the RLS change in the release notes because it is part of the
 RLS new features, but we could mention the SB change --- the new text
 would be:

 Allow non-LEAKPROOF functions to be passed into security barrier views
 if the function does not reference any table columns (Dean Rasheed)

 However, this is usually a level of detail that I do not cover in the
 release notes, so I need someone else to tell me it should be added.

+1 for including it.  That's a security-relevant backward incompatibility.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] cache invalidation skip logic

2015-08-06 Thread Qingqing Zhou
In cache invalidation logic, we have the following comment:

/*
* Now that we have the lock, check for invalidation messages, so that we
* will update or flush any stale relcache entry before we try to use it.
* RangeVarGetRelid() specifically relies on us for this.  We can skip
* this in the not-uncommon case that we already had the same type of lock
* being requested, since then no one else could have modified the
* relcache entry in an undesirable way.  (In the case where our own xact
* modifies the rel, the relcache update happens via
* CommandCounterIncrement, not here.)
*/
if (res != LOCKACQUIRE_ALREADY_HELD)
   AcceptInvalidationMessages();

It is true that after we hold the lock nobody will further modify it, but
there could be some left-over invalidation messages we shall accept before we
can continue. This can be demonstrated with the following invalidation
sequence:
 {
1: inval A;
2: inval B;
...;
10: inval pg_class
}

After step 10, another session may encounter a lock and replay this sequence:

step 1:  RelationBuildDesc(A), it heap_open(pg_class),
 pg_class lock not acquired yet, so it acquires the lock and
 recursively replay the sequence, goto step 2.
step 2:
 RelationBuildDesc(B), it heap_open(pg_class),
 but this time we already have LOCKACQUIRE_ALREADY_HELD on
 pg_class, so we skip AcceptInvalidationMessages() and may access a
 stale pg_class entry, which is wrong.

Users may end up with a "could not open file ..." error.

Is above sequence possible?

Regards,
Qingqing




[HACKERS] [sqlsmith] Failed assertion in analyzejoins.c

2015-08-06 Thread Andreas Seltenreich
Hi,

this one was in today's sqlsmith harvest.  It triggers an assertion in
current master (c030dc4).

regards,
Andreas

-- TRAP: FailedAssertion(!(!bms_is_empty(phinfo-ph_eval_at)), File: 
analyzejoins.c, Line: 474)
select
  rel_141618057.srvfdw as c0,
  rel_141618057.srvversion as c1
from
  public.tv as rel_141617979
right join pg_catalog.pg_foreign_server as rel_141618057
  left join pg_catalog.pg_roles as rel_141618058
  on (rel_141618057.srvname = rel_141618058.rolname )
on (rel_141617979.type = rel_141618058.rolpassword )
where rel_141618057.srvfdw is not NULL;




Re: [HACKERS] Freeze avoidance of very large table.

2015-08-06 Thread Simon Riggs
On 5 August 2015 at 18:46, Alvaro Herrera alvhe...@2ndquadrant.com wrote:


 What do others think?


Wow, everything moves when you blink, eh? Sorry, I wasn't watching this.
Mainly because I was working on some other related thoughts, separate post
coming.

1. Most importantly, it needs to be somewhere where we can use the function
in a regression test. As I said before, I would not commit this without a
formal proof of correctness.

2. I'd also like to be able to make checks on this while we're in
production, to ensure we have no bugs. I was trying to learn from earlier
mistakes and make sure we are ready with diagnostic tools to allow run-time
checks and confirm everything is good. If people feel that means I've asked
for something in the wrong place, I am happy to skip that request and place
it wherever requested.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Race conditions in shm_mq.c

2015-08-06 Thread Robert Haas
On Thu, Aug 6, 2015 at 2:38 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Aug 6, 2015 at 10:10 AM, Antonin Houska a...@cybertec.at wrote:
 During my experiments with parallel workers I sometimes saw the master and
 worker process blocked. The master uses shm queue to send data to the worker,
 both sides nowait==false. I concluded that the following happened:

 The worker process set itself as a receiver on the queue after
 shm_mq_wait_internal() has completed its first check of ptr, so this
 function left sender's procLatch in reset state. But before the procLatch was
 reset, the receiver still managed to read some data and set sender's 
 procLatch
 to signal the reading, and eventually called its (receiver's) WaitLatch().

 So sender has effectively missed the receiver's notification and called
 WaitLatch() too (if the receiver already waits on its latch, it does not help
 for sender to call shm_mq_notify_receiver(): receiver won't do anything
 because there's no new data in the queue).

 Below is my patch proposal.

 Another good catch.  However, I would prefer to fix this without
 introducing a continue as I think that will make the control flow
 clearer.  Therefore, I propose the attached variant of your idea.

Err, that doesn't work at all.  Have a look at this version instead.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index e765cea..0e60dbc 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -777,33 +777,37 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 			return SHM_MQ_DETACHED;
 		}
 
-		if (available == 0)
+		if (available == 0 && !mqh->mqh_counterparty_attached)
 		{
-			shm_mq_result res;
-
 			/*
 			 * The queue is full, so if the receiver isn't yet known to be
 			 * attached, we must wait for that to happen.
 			 */
-			if (!mqh->mqh_counterparty_attached)
+			if (nowait)
 			{
-if (nowait)
+if (shm_mq_get_receiver(mq) == NULL)
 {
-	if (shm_mq_get_receiver(mq) == NULL)
-	{
-		*bytes_written = sent;
-		return SHM_MQ_WOULD_BLOCK;
-	}
-}
-else if (!shm_mq_wait_internal(mq, &mq->mq_receiver,
-			   mqh->mqh_handle))
-{
-	mq->mq_detached = true;
 	*bytes_written = sent;
-	return SHM_MQ_DETACHED;
+	return SHM_MQ_WOULD_BLOCK;
 }
-mqh->mqh_counterparty_attached = true;
 			}
+			else if (!shm_mq_wait_internal(mq, &mq->mq_receiver,
+		   mqh->mqh_handle))
+			{
+mq->mq_detached = true;
+*bytes_written = sent;
+return SHM_MQ_DETACHED;
+			}
+			mqh->mqh_counterparty_attached = true;
+
+			/*
+			 * The receiver may have read some data after attaching, so we
+			 * must not wait without rechecking the queue state.
+			 */
+		}
+		else if (available == 0)
+		{
+			shm_mq_result res;
 
 			/* Let the receiver know that we need them to read some data. */
 			res = shm_mq_notify_receiver(mq);



Fwd: [HACKERS] 9.5 release notes

2015-08-06 Thread Jeff Janes
On Wed, Jun 10, 2015 at 9:15 PM, Bruce Momjian br...@momjian.us wrote:

 I have committed the first draft of the 9.5 release notes.  You can view
 the output here:

 http://momjian.us/pgsql_docs/release-9-5.html

 and it will eventually appear here:

 http://www.postgresql.org/docs/devel/static/release.html

 I am ready to make suggested adjustments, though I am traveling for
 conferences for the next ten days so there might a delay in my replies.



Improve the speed of sorting character and numeric fields

Here, does numeric refer specifically to the type numeric, or to all
fields which are numeric in nature?

If the former (which I think it is, as other numeric types are
pass-by-value to start with), it should probably get some sgml markup.

(copy to list, accidentally sent just to Bruce initially)

Thanks,

Jeff


Re: [HACKERS] Raising our compiler requirements for 9.6

2015-08-06 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 One approach is to avoid including lwlock.h/slot.h in frontend
 code. That'll require some minor surgery and adding a couple includes,
 but it doesn't look that bad.

 Patch doing that attached.

This seems kinda messy.  Looking at the contents of lock.h, it seems like
getting rid of its dependency on lwlock.h is not really very appropriate,
because there is boatloads of other backend-only stuff in there.  Why is
any frontend code including lock.h at all?  If there is a valid reason,
should we refactor lock.h into two separate headers, one that is safe to
expose to frontends and one with the rest of the stuff?

Also, I think the reason for the #include is that lock.h has a couple
of macros that reference MainLWLockArray.  The fact that you can omit
the #include lwlock.h is therefore only accidental: if we had done those
as static inline functions, this wouldn't work at all.  So I think this
solution is not very future-proof either.

 As a consequence I think we should actually add a bunch of #ifdef
 FRONTEND #error checks in code we definitely do not want to included in
 frontend code.  The attached patch so far adds a check to atomics.h,
 lwlock.h and s_lock.h.

I agree with that idea anyway.

regards, tom lane




Re: [HACKERS] Performance improvement for joins where outer side is unique

2015-08-06 Thread Uriy Zhuravlev
On Wednesday 08 July 2015 12:29:38 David Rowley wrote:
 On 8 July 2015 at 02:00, Alexander Korotkov a.korot...@postgrespro.ru
 
 wrote:
  Patch doesn't apply to current master. Could you, please, rebase it?
 
 Attached. Thanks.
 
 Regards
 
 David Rowley
 
 --
  David Rowley   http://www.2ndQuadrant.com/
 http://www.2ndquadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Hello.
What is the current status of the patch?

-- 
YUriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




[HACKERS] Race conditions in shm_mq.c

2015-08-06 Thread Antonin Houska
During my experiments with parallel workers I sometimes saw the master and
worker process blocked. The master uses shm queue to send data to the worker,
both sides nowait==false. I concluded that the following happened:

The worker process set itself as a receiver on the queue after
shm_mq_wait_internal() has completed its first check of ptr, so this
function left sender's procLatch in reset state. But before the procLatch was
reset, the receiver still managed to read some data and set sender's procLatch
to signal the reading, and eventually called its (receiver's) WaitLatch().

So sender has effectively missed the receiver's notification and called
WaitLatch() too (if the receiver already waits on its latch, it does not help
for sender to call shm_mq_notify_receiver(): receiver won't do anything
because there's no new data in the queue).

Below is my patch proposal.

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index 126cb07..39ea673 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -803,6 +808,22 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 	return SHM_MQ_DETACHED;
 }
 mqh->mqh_counterparty_attached = true;
+
+/*
+ * Re-check if the queue is still full.
+ *
+ * While we were using our procLatch to detect receiver's
+ * connection, the receiver could have connected and started
+ * reading from it - that includes concurrent manipulation of
+ * the latch, in order to report on reading activity. Thus we
+ * could miss the information that some data has already been
+ * consumed, and cause a deadlock by calling SetLatch() below.
+ *
+ * (If the receiver starts waiting on its latch soon enough,
+ * our call of shm_mq_notify_receiver() will have no effect.)
+ */
+if (!nowait)
+	continue;
 			}
 
 			/* Let the receiver know that we need them to read some data. */



Re: [HACKERS] Raising our compiler requirements for 9.6

2015-08-06 Thread Robert Haas
On Thu, Aug 6, 2015 at 3:09 AM, Andres Freund and...@anarazel.de wrote:
 It really doesn't. It's just fallout from indirectly including lwlock.h
 which includes an atomic variable. The include path leading to it is

 In file included from 
 /home/andres/src/postgresql/src/include/storage/lwlock.h:19:0,
  from 
 /home/andres/src/postgresql/src/include/storage/lock.h:18,
  from 
 /home/andres/src/postgresql/src/include/access/tuptoaster.h:18,
  from 
 /home/andres/src/postgresql/src/bin/pg_resetxlog/pg_resetxlog.c:49:
 /home/andres/src/postgresql/src/include/port/atomics.h:41:2: error: #error 
 THOU SHALL NOT REQUIRE ATOMICS
  #error THOU SHALL NOT REQUIRE ATOMICS

Isn't that #include entirely superfluous?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Raising our compiler requirements for 9.6

2015-08-06 Thread Andres Freund
On 2015-08-06 10:29:39 -0400, Robert Haas wrote:
 On Thu, Aug 6, 2015 at 3:09 AM, Andres Freund and...@anarazel.de wrote:
  It really doesn't. It's just fallout from indirectly including lwlock.h
  which includes an atomic variable. The include path leading to it is
 
  In file included from 
  /home/andres/src/postgresql/src/include/storage/lwlock.h:19:0,
   from 
  /home/andres/src/postgresql/src/include/storage/lock.h:18,
   from 
  /home/andres/src/postgresql/src/include/access/tuptoaster.h:18,
   from 
  /home/andres/src/postgresql/src/bin/pg_resetxlog/pg_resetxlog.c:49:
  /home/andres/src/postgresql/src/include/port/atomics.h:41:2: error: #error 
  THOU SHALL NOT REQUIRE ATOMICS
   #error THOU SHALL NOT REQUIRE ATOMICS
 
 Isn't that #include entirely superfluous?

Which one?




Re: [HACKERS] Performance improvement for joins where outer side is unique

2015-08-06 Thread Erik Rijkers

On 2015-08-06 15:36, Uriy Zhuravlev wrote:

On Wednesday 08 July 2015 12:29:38 David Rowley wrote:
On 8 July 2015 at 02:00, Alexander Korotkov 
a.korot...@postgrespro.ru


wrote:
 Patch doesn't apply to current master. Could you, please, rebase it?

Attached. Thanks.

Regards

David Rowley

--
 David Rowley   http://www.2ndQuadrant.com/
http://www.2ndquadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Hello.
What is the current status of the patch?



[unique_joins_9e6d4e4_2015-07-08.patch]


FWIW, I just happened to be looking at the latest patch:

applies OK (against current HEAD)
compiles OK
make check fail:
 join ... FAILED
installs OK

I was just going to repeat David's 2 queries (hardware: x86_64, Linux 
2.6.18-402.el5)


-- with these tables:
create table t1 (id int primary key);
create table t2 (id int primary key);
insert into t1 select x.x from generate_series(1,100) x(x);
insert into t2 select x.x from generate_series(1,100) x(x);

create table s1 (id varchar(32) primary key);
insert into s1 select x.x::varchar from generate_series(1,100) x(x);
create table s2 (id varchar(32) primary key);
insert into s2 select x.x::varchar from generate_series(1,100) x(x);

vacuum analyze;


I do not see much difference between patched and unpatched
after running both David's statements (unijoin.sql and unijoin2.sql):

-- pgbench -f unijoin.sql -n -T 300 -P30 testdb
select count(t2.id) from t1 left outer join t2 on t1.id=t2.id;
tps = 2.323222 - unpatched
tps = 2.356906 - patched

-- pgbench -f unijoin2.sql -n -T 300 -P30 testdb
tps = 1.257656 - unpatched
tps = 1.225758 - patched


So as far as I can tell it does not look very promising.


Erik Rijkers





Re: [HACKERS] 9.5 release notes

2015-08-06 Thread Bruce Momjian
On Sun, Jun 14, 2015 at 12:05:54PM +0100, Dean Rasheed wrote:
 On 11 June 2015 at 05:15, Bruce Momjian br...@momjian.us wrote:
  I have committed the first draft of the 9.5 release notes.  You can view
  the output here:
 
  http://momjian.us/pgsql_docs/release-9-5.html
 
 
 I think it's worth mentioning
 dcbf5948e12aa60b4d6ab65b6445897dfc971e01, probably under General
 Performance. It's an optimisation, and also a user-visible change to
 the way LEAKPROOF works. Perhaps something like
 
 Allow pushdown of non-leakproof functions into security barrier views
 if the function is not passed any arguments from the view.
 
 Previously only functions marked as LEAKPROOF could be pushed down
 into security barrier views.

Sorry, just looking at this now.  Since RLS is new for 9.5, we wouldn't
mention the RLS change in the release notes because it is part of the
RLS new features, but we could mention the SB change --- the new text
would be:

Allow non-LEAKPROOF functions to be passed into security barrier views
if the function does not reference any table columns (Dean Rasheed)

However, this is usually a level of detail that I do not cover in the
release notes, so I need someone else to tell me it should be added.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Raising our compiler requirements for 9.6

2015-08-06 Thread Andres Freund
On 2015-08-06 10:27:52 -0400, Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  One approach is to avoid including lwlock.h/slot.h in frontend
  code. That'll require some minor surgery and adding a couple includes,
  but it doesn't look that bad.
 
  Patch doing that attached.
 
 This seems kinda messy.  Looking at the contents of lock.h, it seems like
 getting rid of its dependency on lwlock.h is not really very appropriate,
 because there is boatloads of other backend-only stuff in there.  Why is
 any frontend code including lock.h at all?  If there is a valid reason,
 should we refactor lock.h into two separate headers, one that is safe to
 expose to frontends and one with the rest of the stuff?

I think the primary reason for lock.h being included pretty widely is to
have the declaration of LOCKMODE. That's pretty widely used in headers
included by clients for various reasons. There's also a bit of fun
around xl_standby_locks.

I can have a go at trying to separate them, but I'm not convinced it's
going to look particularly pretty.

 Also, I think the reason for the #include is that lock.h has a couple
 of macros that reference MainLWLockArray.  The fact that you can omit
 the #include lwlock.h is therefore only accidental: if we had done those
 as static inline functions, this wouldn't work at all.  So I think this
 solution is not very future-proof either.

Yea, I saw that and concluded I'm not particularly bothered by it. Most
of the other similar macros are in lwlock.h itself, and if it became a
problem we could just move them alongside.

Greetings,

Andres Freund




Re: [HACKERS] Raising our compiler requirements for 9.6

2015-08-06 Thread Robert Haas
On Thu, Aug 6, 2015 at 10:31 AM, Andres Freund and...@anarazel.de wrote:
 On 2015-08-06 10:29:39 -0400, Robert Haas wrote:
 On Thu, Aug 6, 2015 at 3:09 AM, Andres Freund and...@anarazel.de wrote:
  It really doesn't. It's just fallout from indirectly including lwlock.h
  which includes an atomic variable. The include path leading to it is
 
  In file included from 
  /home/andres/src/postgresql/src/include/storage/lwlock.h:19:0,
   from 
  /home/andres/src/postgresql/src/include/storage/lock.h:18,
   from 
  /home/andres/src/postgresql/src/include/access/tuptoaster.h:18,
   from 
  /home/andres/src/postgresql/src/bin/pg_resetxlog/pg_resetxlog.c:49:
  /home/andres/src/postgresql/src/include/port/atomics.h:41:2: error: #error 
  THOU SHALL NOT REQUIRE ATOMICS
   #error THOU SHALL NOT REQUIRE ATOMICS

 Isn't that #include entirely superfluous?

 Which one?

Never mind, I'm confused.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] nodes/*funcs.c inconsistencies

2015-08-06 Thread Stephen Frost
Noah,

* Noah Misch (n...@leadboat.com) wrote:
 Rather than commit on an emergency basis in the few hours between these +1's
 and the wrap, I kept to my original schedule.  FYI, if it hadn't required
 emergency procedures (cancelling the day's plans so I could get to a notebook
 computer), I would have committed early based on the three +1s.

No worries, fully agreed that this was not an emergency in any way.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Raising our compiler requirements for 9.6

2015-08-06 Thread Alvaro Herrera
Andres Freund wrote:
 On 2015-08-06 10:27:52 -0400, Tom Lane wrote:
  Andres Freund and...@anarazel.de writes:
   One approach is to avoid including lwlock.h/slot.h in frontend
   code. That'll require some minor surgery and adding a couple includes,
   but it doesn't look that bad.
  
   Patch doing that attached.
  
  This seems kinda messy.  Looking at the contents of lock.h, it seems like
  getting rid of its dependency on lwlock.h is not really very appropriate,
  because there is boatloads of other backend-only stuff in there.  Why is
  any frontend code including lock.h at all?  If there is a valid reason,
  should we refactor lock.h into two separate headers, one that is safe to
  expose to frontends and one with the rest of the stuff?
 
 I think the primary reason for lock.h being included pretty widely is to
 have the declaration of LOCKMODE. That's pretty widely used in headers
 included by clients for various reasons. There's also a bit of fun
 around xl_standby_locks.

I think it is a good idea to split up LOCKMODE so that most headers do
not need to include lock.h at all; you will need to add an explicit
#include "storage/lock.h" to a lot of C files, but to me that's a good
thing.

See
http://doxygen.postgresql.org/lock_8h.html
Funnily enough, the "included by" graph is so large that my browser
(arguably a bit dated) fails to display it and shows a black rectangle
instead.  Chromium shows it, though it's illegible.

-- 
Álvaro Herrera   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Raising our compiler requirements for 9.6

2015-08-06 Thread Andres Freund
On 2015-08-06 12:05:24 -0300, Alvaro Herrera wrote:
 Andres Freund wrote:
  On 2015-08-06 10:27:52 -0400, Tom Lane wrote:
   Andres Freund and...@anarazel.de writes:
One approach is to avoid including lwlock.h/slot.h in frontend
code. That'll require some minor surgery and adding a couple includes,
but it doesn't look that bad.
   
Patch doing that attached.
   
   This seems kinda messy.  Looking at the contents of lock.h, it seems like
   getting rid of its dependency on lwlock.h is not really very appropriate,
   because there is boatloads of other backend-only stuff in there.  Why is
   any frontend code including lock.h at all?  If there is a valid reason,
   should we refactor lock.h into two separate headers, one that is safe to
   expose to frontends and one with the rest of the stuff?
  
  I think the primary reason for lock.h being included pretty widely is to
  have the declaration of LOCKMODE. That's pretty widely used in headers
  included by clients for various reasons. There's also a bit of fun
  around xl_standby_locks.
 
 I think it is a good idea to split up LOCKMODE so that most headers do
 not need to include lock.h at all; you will need to add an explicit
  #include "storage/lock.h" to a lot of C files, but to me that's a good
 thing.

I had to split off three things: LOCKMASK, the individual lock levels and
xl_standby_lock, to be able to prohibit lock.h from being included by frontend
code. lockdefs.h works for me; counter proposals?

There weren't any places that needed additional lock.h includes. But
hashfn.c somewhat hilariously missed utils/hsearch.h ;)

Regards,

Andres
From bc9dc1910058a3e4638576261f24cb12471e7b15 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Thu, 6 Aug 2015 12:53:56 +0200
Subject: [PATCH] Don't include low level locking code from frontend code.

To avoid that happening again, put some checks against it into
atomics.h, lock.h, lwlock.h and s_lock.h.
---
 src/backend/utils/hash/hashfn.c |  1 +
 src/include/access/genam.h  |  2 +-
 src/include/access/hash.h   |  2 +-
 src/include/access/tuptoaster.h |  2 +-
 src/include/catalog/objectaddress.h |  2 +-
 src/include/port/atomics.h  |  4 +++
 src/include/storage/lock.h  | 43 
 src/include/storage/lockdefs.h  | 56 +
 src/include/storage/lwlock.h|  4 +++
 src/include/storage/procarray.h |  1 +
 src/include/storage/s_lock.h|  4 +++
 src/include/storage/standby.h   |  2 +-
 12 files changed, 80 insertions(+), 43 deletions(-)
 create mode 100644 src/include/storage/lockdefs.h

diff --git a/src/backend/utils/hash/hashfn.c b/src/backend/utils/hash/hashfn.c
index 260d880..bdd438d 100644
--- a/src/backend/utils/hash/hashfn.c
+++ b/src/backend/utils/hash/hashfn.c
@@ -22,6 +22,7 @@
 #include "postgres.h"
 
 #include "access/hash.h"
+#include "utils/hsearch.h"
 
 
 /*
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index d86590a..d9d05a0 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -17,7 +17,7 @@
 #include "access/sdir.h"
 #include "access/skey.h"
 #include "nodes/tidbitmap.h"
-#include "storage/lock.h"
+#include "storage/lockdefs.h"
 #include "utils/relcache.h"
 #include "utils/snapshot.h"
 
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index 93cc8af..97cb859 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -24,7 +24,7 @@
 #include "fmgr.h"
 #include "lib/stringinfo.h"
 #include "storage/bufmgr.h"
-#include "storage/lock.h"
+#include "storage/lockdefs.h"
 #include "utils/relcache.h"
 
 /*
diff --git a/src/include/access/tuptoaster.h b/src/include/access/tuptoaster.h
index 7d18535..77f637e 100644
--- a/src/include/access/tuptoaster.h
+++ b/src/include/access/tuptoaster.h
@@ -14,8 +14,8 @@
 #define TUPTOASTER_H
 
 #include "access/htup_details.h"
+#include "storage/lockdefs.h"
 #include "utils/relcache.h"
-#include "storage/lock.h"
 
 /*
  * This enables de-toasting of index entries.  Needed until VACUUM is
diff --git a/src/include/catalog/objectaddress.h b/src/include/catalog/objectaddress.h
index 37808c0..0fc16ed 100644
--- a/src/include/catalog/objectaddress.h
+++ b/src/include/catalog/objectaddress.h
@@ -14,7 +14,7 @@
 #define OBJECTADDRESS_H
 
 #include "nodes/pg_list.h"
-#include "storage/lock.h"
+#include "storage/lockdefs.h"
 #include "utils/acl.h"
 #include "utils/relcache.h"
 
diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h
index bb87945..65cfa3f 100644
--- a/src/include/port/atomics.h
+++ b/src/include/port/atomics.h
@@ -37,6 +37,10 @@
 #ifndef ATOMICS_H
 #define ATOMICS_H
 
+#ifdef FRONTEND
+#error "atomics.h may not be included from frontend code"
+#endif
+
 #define INSIDE_ATOMICS_H
 
 #include <limits.h>
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index 96fe3a6..a9cd08c 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -14,6 +14,11 

Re: [HACKERS] raw output from copy

2015-08-06 Thread Pavel Stehule
Hi,

A psql-based implementation needs new infrastructure (more than a few lines).

Missing:

* binary mode support
* parametrized query support

I am not against it, but I proposed both of those points, and both were rejected.

So why not use the current infrastructure? Raw COPY is a trivial patch.
On 6 Aug 2015 at 0:09, Andrew Dunstan <and...@dunslane.net> wrote:


 On 08/05/2015 04:59 PM, Heikki Linnakangas wrote:

 On 07/27/2015 02:28 PM, Pavel Stehule wrote:

  2015-07-27 10:41 GMT+02:00 Heikki Linnakangas <hlinn...@iki.fi>:

 What about input? This is a whole new feature, but it would be nice to be
 able to pass the file contents as a query parameter. Something like:

 \P /tmp/foo binary
 INSERT INTO foo VALUES (?);


  The example of input is a strong reason not to do it via inserts. Just
  parsing some special ? symbol needs a lot of new code.


 Sorry, I meant $1 in place of the ?. No special parsing needed, psql can
 send the query to the server as is, with the parameters that are given by
 this new mechanism.

  In this case, I don't see any advantage of a psql-based solution. COPY is
  the standard interface for input/output from/to files, and it should be
  used there.


  I'm not too happy with the COPY approach, although I won't object if one
  of the other committers feels more comfortable with it. However, we don't
 seem to be making progress here, so I'm going to mark this as Returned with
 Feedback. I don't feel good about that either, because I don't actually
 have any great suggestions on how to move this forward. Which is a pity
 because this is a genuine problem for users.



 This is really only a psql problem, IMNSHO. Inserting and extracting
 binary data is pretty trivial for most users of client libraries (e.g. it's
 a couple of lines of code in a DBD::Pg program), but it's hard in psql.

 I do agree that the COPY approach feels more than a little klunky.

 cheers

 andrew





Re: [HACKERS] deparsing utility commands

2015-08-06 Thread Alvaro Herrera
Jim Nasby wrote:

 FWIW, my interest in this is largely due to the problems I've had with this
 in the variant type. In particular, using the same resolution rules for
 functions and operators. So I'm wondering if there's a bigger issue here.

I'd be glad to review your variant stuff, but I doubt it's the same
thing at play.  Please don't hijack this thread.  Deparse is doing some
pretty nasty games with the backend innards to make the deparsed
representation usable generally.  We were just missing a simple trick.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Raising our compiler requirements for 9.6

2015-08-06 Thread Andres Freund
On 2015-08-06 12:29:15 -0300, Alvaro Herrera wrote:
 Andres Freund wrote:
 
  I had to split off three things: LOCKMASK, the individual lock levels and
  xl_standby_lock, to be able to prohibit lock.h from being included by frontend
  code. lockdefs.h works for me; counter proposals?
  
  There weren't any places that needed additional lock.h includes.
 
 Ah, but that's because you cheated and didn't remove the include from
 namespace.h ...

Well, it's not included from frontend code, so I didn't see the need?
Going through all the backend code and replacing lock.h by lockdefs.h
and some other includes doesn't seem particularly beneficial to me.

FWIW, removing it from namespace.h is relatively easy. It starts to get
a lot more noisy when you want to touch heapam.h.

  diff --git a/src/include/storage/lockdefs.h b/src/include/storage/lockdefs.h
  new file mode 100644
  index 000..bfbcdba
  --- /dev/null
  +++ b/src/include/storage/lockdefs.h
  @@ -0,0 +1,56 @@
  +/*-
  + *
  + * lockdefs.h
  + *Frontend exposed parts of postgres' low level lock mechanism
  + *
  + * The split between lockdefs.h and lock.h is not very principled.
 
 No kidding!

Do you have a good suggestion about the split? I wanted to expose the
minimal amount necessary, and those were the ones.





[HACKERS] obsolete comments for InitializeMaxBackends

2015-08-06 Thread Robert Haas
While hacking today, I realized that a couple of comments related to
InitializeMaxBackends are obsolete.  Originally, the number of
background workers was determined just after processing
shared_preload_libraries, but I changed that in commit
6bc8ef0b7f1f1df3998745a66e1790e27424aa0c with the introduction of
max_worker_processes.  The attached patch fixes up the comments so
that they no longer offer up an obsolete rationale for what the code
does.  Assuming nobody minds, I plan to commit this, and, since the
comments are actually wrong and might confuse someone, back-patch this
change as far as 9.4, where the above-mentioned commit first appeared.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 000524d..0c8bc8f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -923,8 +923,9 @@ PostmasterMain(int argc, char *argv[])
 	process_shared_preload_libraries();
 
 	/*
-	 * Now that loadable modules have had their chance to register background
-	 * workers, calculate MaxBackends.
+	 * Calculate the maximum possible number of backends.  This must happen
+	 * after we have finished initializing GUCs and before shared memory size
+	 * is determined.
 	 */
 	InitializeMaxBackends();
 
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 7b19714..c99e7c0 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -482,9 +482,8 @@ pg_split_opts(char **argv, int *argcp, const char *optstr)
 /*
  * Initialize MaxBackends value from config options.
  *
- * This must be called after modules have had the chance to register background
- * workers in shared_preload_libraries, and before shared memory size is
- * determined.
+ * This must happen after we finish setting the initial values for GUCs,
+ * and before shared memory size is determined.
  *
  * Note that in EXEC_BACKEND environment, the value is passed down from
  * postmaster to subprocesses via BackendParameters in SubPostmasterMain; only



Re: [HACKERS] deparsing utility commands

2015-08-06 Thread Jim Nasby

On 8/5/15 9:55 PM, Alvaro Herrera wrote:

Jim Nasby wrote:

On 7/31/15 8:45 AM, Shulgin, Oleksandr wrote:



STATEMENT:  create view v1 as select * from t1 ;
ERROR:  operator does not exist: pg_catalog.oid = pg_catalog.oid at
character 52
HINT:  No operator matches the given name and argument type(s). You
might need to add explicit type casts.
QUERY:  SELECT * FROM pg_catalog.pg_rewrite WHERE ev_class = $1 AND
rulename = $2
CONTEXT:  PL/pgSQL function test_ddl_deparse() line 1 at FOR over SELECT
rows



I'm not sure what test_ddl_deparse is doing, is that where the oid = oid is
coming from?

It might be enlightening to replace = with OPERATOR(pg_catalog.=) and see if
that works.


That worked, thanks!


Good, but why weren't operator resolution rules working correctly here 
to begin with?


FWIW, my interest in this is largely due to the problems I've had with 
this in the variant type. In particular, using the same resolution rules 
for functions and operators. So I'm wondering if there's a bigger issue 
here.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Raising our compiler requirements for 9.6

2015-08-06 Thread Alvaro Herrera
Andres Freund wrote:

 I had to split off three things: LOCKMASK, the individual lock levels and
 xl_standby_lock, to be able to prohibit lock.h from being included by frontend
 code. lockdefs.h works for me; counter proposals?
 
 There weren't any places that needed additional lock.h includes.

Ah, but that's because you cheated and didn't remove the include from
namespace.h ...

 But hashfn.c somewhat hilariously missed utils/hsearch.h ;)

hah.

 diff --git a/src/include/storage/lockdefs.h b/src/include/storage/lockdefs.h
 new file mode 100644
 index 000..bfbcdba
 --- /dev/null
 +++ b/src/include/storage/lockdefs.h
 @@ -0,0 +1,56 @@
 +/*-
 + *
 + * lockdefs.h
 + *  Frontend exposed parts of postgres' low level lock mechanism
 + *
 + * The split between lockdefs.h and lock.h is not very principled.

No kidding!

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Freeze avoidance of very large table.

2015-08-06 Thread Jim Nasby

On 8/5/15 1:47 PM, Petr Jelinek wrote:

On 2015-08-05 20:09, Alvaro Herrera wrote:

Josh Berkus wrote:

On 08/05/2015 10:46 AM, Alvaro Herrera wrote:

1. Add the functions as a builtins.
This is what the current patch does.  Simon seems to prefer this,
because he wants the function to be always available in production;
but I don't like this option because adding functions as builtins
makes it impossible to move later to extensions.
Bruce doesn't like this option either.


Why would we want to move them later to extensions?


Because it's not nice to have random stuff as builtins.



Extensions have one nice property, they provide namespacing so not
everything has to be in pg_catalog, which already has about a gazillion
functions. It's nice to have stuff you don't need for day to day
operations separate but still available (which is why src/extensions is
better than contrib).


They also provide a level of control over what is and isn't installed in 
a cluster. Personally, I'd prefer that most users not even be aware of 
the existence of things like pageinspect.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Raising our compiler requirements for 9.6

2015-08-06 Thread Alvaro Herrera
Andres Freund wrote:
 On 2015-08-06 12:29:15 -0300, Alvaro Herrera wrote:

  Ah, but that's because you cheated and didn't remove the include from
  namespace.h ...
 
 Well, it's not included from frontend code, so I didn't see the need?
 Going through all the backend code and replacing lock.h by lockdefs.h
 and some other includes doesn't seem particularly beneficial to me.
 
 FWIW, removing it from namespace.h is relatively easy. It starts to get
 a lot more noisy when you want to touch heapam.h.

Ah, heapam.h is the granddaddy of all header messes, isn't it?  (Actually
it's execnodes.h or something like that.)


   diff --git a/src/include/storage/lockdefs.h 
   b/src/include/storage/lockdefs.h
   new file mode 100644
   index 000..bfbcdba
   --- /dev/null
   +++ b/src/include/storage/lockdefs.h
   @@ -0,0 +1,56 @@
   +/*-
   + *
   + * lockdefs.h
   + *  Frontend exposed parts of postgres' low level lock mechanism
   + *
   + * The split between lockdefs.h and lock.h is not very principled.
  
  No kidding!
 
 Do you have a good suggestion about the split? I wanted to expose the
 minimal amount necessary, and those were the ones.

Nope, nothing useful, sorry.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Reduce ProcArrayLock contention

2015-08-06 Thread Robert Haas
On Wed, Aug 5, 2015 at 10:59 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
 On Tue, Aug 4, 2015 at 8:59 PM, Robert Haas <robertmh...@gmail.com> wrote:

 I'm not entirely happy with the name nextClearXidElem but apart from
 that I'm fairly happy with this version.  We should probably test it
 to make sure I haven't broken anything;


 I have verified the patch and it is fine.  I have tested it via manual
 tests; for long pgbench tests, results are quite similar to previous
 versions of patch.

 A few changes I have made in the patch:

 1.
 +static void
 +ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid)
 +{
 + volatile PROC_HDR *procglobal = ProcGlobal;
 + uint32 nextidx;
 + uint32 wakeidx;
 + int extraWaits = -1;
 +
 + /* We should definitely have an XID to clear. */
 + Assert(TransactionIdIsValid(pgxact->xid));

 Here the Assert is using pgxact, which is wrong.

 2. Made ProcArrayEndTransactionInternal an inline function, as
 suggested by you.

OK, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

