[HACKERS] Review of GetUserId() Usage

2014-09-23 Thread Stephen Frost
Greetings,

  While looking through the GetUserId() usage in the backend, I couldn't
  help but notice a few rather questionable cases that, in my view,
  should be cleaned up.

  As a reminder, GetUserId() returns the OID of the user we are
  generally operating as (eg: whatever the 'role' is, as GetUserId()
  respects SET ROLE).  It does NOT include roles which we currently have
  the privileges of (that would be has_privs_of_role()), nor roles which
  we could SET ROLE to (that's is_member_of_role, or check_is_... if you
  want to just error out in failure cases).
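The distinction between these three checks can be sketched with a toy model (purely illustrative; the struct layout and function names below are invented, not the real catalog code, which walks pg_auth_members):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Invented model of role membership; real code walks pg_auth_members. */
typedef struct Role
{
	int			oid;
	struct Role *member_of;		/* one parent role, for simplicity */
	bool		inherit;		/* rolinherit: privileges flow automatically */
} Role;

static int	current_user_oid;	/* models the SET ROLE state */

/* GetUserId(): the current role only, no membership traversal */
static int
model_GetUserId(void)
{
	return current_user_oid;
}

/* is_member_of_role(): membership counts whether or not it is inherited */
static bool
model_is_member_of_role(const Role *r, int target_oid)
{
	for (; r != NULL; r = r->member_of)
		if (r->oid == target_oid)
			return true;
	return false;
}

/* has_privs_of_role(): only follow memberships marked INHERIT */
static bool
model_has_privs_of_role(const Role *r, int target_oid)
{
	while (r != NULL)
	{
		if (r->oid == target_oid)
			return true;
		if (!r->inherit)
			break;				/* would need an explicit SET ROLE */
		r = r->member_of;
	}
	return false;
}
```

In this model, a NOINHERIT member of admin passes the is_member check (it could SET ROLE admin) but fails the has_privs check; the complaint below is that the pgstat code uses the strictest check, plain GetUserId(), where has_privs_of_role() would match user expectations.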

  On to the list-

  pgstat_get_backend_current_activity()
Used to decide if the current activity string should be returned or
not.  In my view, this is a clear case which should be addressed
through has_privs_of_role() instead of requiring the user to
SET ROLE to each role they are an inherited member of to query for
what the other sessions are doing.

  pg_signal_backend()
Used to decide if pg_terminate_backend and pg_cancel_backend are
allowed.  Another case which should be changed over to
has_privs_of_role(), in my view.  Requiring the user to SET ROLE for
roles which they are an inherited member of is confusing as it's
generally not required.

  pg_stat_get_activity()
Used to decide if the state information should be shared.
My opinion is the same as above- use has_privs_of_role().
There are a number of other functions in pgstatfuncs.c with similar
issues (eg: pg_stat_get_backend_activity(),
pg_stat_get_backend_client_port(), and others).

  Changing these would make things easier for some of our users, I'm
sure.

Thanks!

Stephen




Re: [HACKERS] Commitfest status

2014-09-23 Thread Heikki Linnakangas

On 09/20/2014 06:54 AM, Michael Paquier wrote:

CF3 has actually been over for a couple of days,


There are different opinions on when a commitfest is over. In my 
opinion, the point of a commitfest is that every patch that someone 
submits gets enough review so that the patch author knows what he needs 
to do next. It's not determined by a date, but by progress.



wouldn't it be better to
bounce back patches marked as waiting on author and work on the rest
needing review?


Yep, it's time to do that.

I have now marked those patches that have been in Waiting on Author 
state, but have already been reviewed to some extent, as Returned with 
Feedback.


I kept two patches:

* Flush buffers belonging to unlogged tables, and
* Function returning the timestamp of last transaction

The first one is a bug-fix, and the second one is stalled by a bug-fix 
that hasn't been applied yet. We should deal with them ASAP.



There are still plenty of patches in Needs review state. We got below 
20 at one point, but are back to 24 now. Reviewers: Please *review a 
patch*! We need to get closure on every patch.


Patch authors: Nag the reviewer of your patch. If that doesn't help, 
contact other people who you think would be qualified to review your 
patch, and ask them nicely to review your patch.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RLS feature has been committed

2014-09-23 Thread Heikki Linnakangas

On 09/23/2014 09:15 AM, Peter Geoghegan wrote:

On Mon, Sep 22, 2014 at 8:25 PM, Robert Haas robertmh...@gmail.com wrote:

Should RLS be reverted, and revisited in a future CF?


IMHO, that would be entirely appropriate.


That seems pretty straightforward, then. I think that it will have to
be reverted.


OTOH, if the patch is actually OK as it was committed, there's no point 
reverting it only to commit it again later. At the end of the day, the 
important thing is that the patch gets sufficient review. Clearly 
Stephen thinks that it did, while you and Andres do not.


To make sure this doesn't just slip by without sufficient review, I'll 
add this to the next commitfest. It's a bit unusual to have a commitfest 
entry for something that's already been committed, but that's fine.


- Heikki





[HACKERS] Extending COPY TO

2014-09-23 Thread Andrea Riciputi
Hi all,
it’s my first time here, so please let me know if I’m doing something wrong. 
I’m a developer, heavy PG user, but I’ve never hacked it before. Last week at 
work we had to produce a quite big CSV data file which should be used as input 
by another piece of software.

Since the file must be produced on a daily basis, is big, and it contains data 
stored in our PG database, letting PG produce the file itself seemed the right 
approach. Unfortunately the target software is, let say, “legacy” software and 
can only accept CRLF as EOL character. Since our PG is installed on a Linux 
server COPY TO results in a CSV file with LF as EOL, forcing us to pass the 
file a second time to convert EOL, which is inconvenient.

My idea was to extend the COPY TO command to accept an EOL option as it already 
does with the DELIMITER option. To keep it simple we can limit the EOL choice 
to CR, LF or CRLF to avoid unusual output, and also keep the current behaviour 
when no EOL option is given. I was also wondering whether an EOL option could 
also be useful for the text output format.
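For what it's worth, the core of the option could be as small as the following sketch (hypothetical names; the real copy.c option parsing and output path look different):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical EOL option for COPY TO, limited to the three proposed values. */
typedef enum CopyEolType
{
	EOL_DEFAULT,				/* keep current behaviour */
	EOL_LF,
	EOL_CR,
	EOL_CRLF
} CopyEolType;

static const char *
copy_eol_string(CopyEolType eol)
{
	switch (eol)
	{
		case EOL_CR:
			return "\r";
		case EOL_CRLF:
			return "\r\n";
		case EOL_LF:
		case EOL_DEFAULT:
		default:
			return "\n";		/* what COPY TO emits on Unix today */
	}
}
```

The output side would then emit `copy_eol_string(opts.eol)` wherever it currently hard-codes the newline.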

I spent the weekend reading the COPY command source code and its grammar 
definition and I think I can patch it by myself, submit the patch here and wait 
for your review. However before starting this in my spare time I wanted to know 
if you, as the PG hackers community, would be against a similar proposal for 
any reason, and if so why.

Thanks in advance,
Andrea



Re: [HACKERS] RLS feature has been committed

2014-09-23 Thread Peter Geoghegan
On Mon, Sep 22, 2014 at 11:45 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 To make sure this doesn't just slip by without sufficient review, I'll add
 this to the next commitfest. It's a bit unusual to have a commitfest entry
 for something that's already been committed, but that's fine.

That seems sensible.


-- 
Peter Geoghegan




Re: [HACKERS] Should we excise the remnants of borland cc support?

2014-09-23 Thread Magnus Hagander
On Sep 23, 2014 2:51 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Andres Freund and...@2ndquadrant.com writes:
  On 2014-09-20 10:03:43 -0400, Andrew Dunstan wrote:
  I thought the Borland stuff was there only so we could build client
  libraries for use with things like Delphi.

  FWIW, I got offlist reports from two people not subscribed to the list that
  they simply use the normal libpq DLL from Delphi, copying it from pgAdmin
  or the PG installer.

 Whether or not it's really needed to preserve the ability to build libpq
 with borland, I'm just about certain that it's never worked to build the
 backend with borland (thus explaining the lack of buildfarm members).
 So it should be safe enough to strip support appearing in backend-only
 header files.


The backend has never built with borland.

I'm pretty sure I suggested we drop borland support completely a few years
ago, but people felt it wasn't costing enough to warrant a drop at the time.
Things may have changed now, but even without that we can definitely drop
the backend side of things.

/Magnus


Re: [HACKERS] RLS feature has been committed

2014-09-23 Thread Heikki Linnakangas

Some random comments after a quick read-through of the patch:

* Missing documentation. For a major feature like this, reference pages 
for the CREATE/DROP POLICY statements are not sufficient. We'll need a 
whole new chapter for this.


* In CREATE POLICY, the USING and WITH CHECK keywords are not very 
descriptive. I kind of understand WITH CHECK - it's applied to new rows 
like a CHECK constraint - but USING seems rather arbitrary and WITH 
CHECK isn't all that clear either. Perhaps ON SELECT CHECK and ON 
UPDATE CHECK or something like that would be better. I guess USING 
makes sense when that's the only expression given, so that it applies to 
both SELECTs and UPDATEs. But when a WITH CHECK expression is given 
together with USING, it gets confusing.



+	if (is_from)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("COPY FROM not supported with row security."),
+				 errhint("Use direct INSERT statements instead.")));
+


Why is that not implemented? I think it should be.

* In src/backend/commands/createas.c:


@@ -420,6 +421,19 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	ExecCheckRTPerms(list_make1(rte), true);
 
 	/*
+	 * Make sure the constructed table does not have RLS enabled.
+	 *
+	 * check_enable_rls() will ereport(ERROR) itself if the user has requested
+	 * something invalid, and otherwise will return RLS_ENABLED if RLS should
+	 * be enabled here.  We don't actually support that currently, so throw
+	 * our own ereport(ERROR) if that happens.
+	 */
+	if (check_enable_rls(intoRelationId, InvalidOid) == RLS_ENABLED)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 (errmsg("policies not yet implemented for this command"))));
+
+	/*
 	 * Tentatively mark the target as populated, if it's a matview and we're
 	 * going to fill it; otherwise, no change needed.
 	 */


Hmm. So, if a table we just created with CREATE TABLE AS has row-level 
security enabled, we throw an error? Can that actually happen? AFAICS a 
freshly-created table can't have row-level security enabled. Or is the 
row-level security enabled/disabled status copied from the source table?


* Row-level security is not allowed for foreign tables. Why not? It's 
perhaps not a very useful feature, but I don't see any immediate reason 
to forbid it either. How about views?


* The new pg_class.relhasrowsecurity column is not updated lazily like 
most other relhas* columns. That's intentional and documented, but 
perhaps it would've been better to name the column differently, to avoid 
confusion. Maybe just relrowsecurity?


* In rewrite/rowsecurity:


+ * Policies can be defined for specific roles, specific commands, or provided
+ * by an extension.


How do you define a policy for a specific role? There's a hook for the 
extensions in there, but I didn't see any documentation mentioning that 
an extension might provide policies, nor any developer documentation on 
how to use the hook.


* In pg_backup_archiver.c:


/*
 * Enable row-security if necessary.
 */
if (!ropt->enable_row_security)
	ahprintf(AH, "SET row_security = off;\n");
else
	ahprintf(AH, "SET row_security = on;\n");


That makes pg_restore throw an error if you try to restore to a 
pre-9.5 server. I'm not sure if using a newer pg_restore against an 
older server is supported in general, but without the above it works in 
simple cases, at least. It would be trivi


* The new --enable-row-security option to pg_restore is not documented 
in the user manual.


* in src/bin/psql/describe.c:


@@ -2529,6 +2640,11 @@ describeRoles(const char *pattern, bool verbose)
 		appendPQExpBufferStr(buf, "\n, r.rolreplication");
 	}
 
+	if (pset.sversion >= 90500)
+	{
+		appendPQExpBufferStr(buf, "\n, r.rolbypassrls");
+	}
+
 	appendPQExpBufferStr(buf, "\nFROM pg_catalog.pg_roles r\n");


The rolbypassrls column isn't actually used for anything in this function.


In addition to the above, attached is a patch with some trivial fixes.

- Heikki

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 76d6405..bae08ee 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1943,6 +1943,7 @@
  <row>
   <entry><structfield>relhasrowsecurity</structfield></entry>
   <entry><type>bool</type></entry>
+  <entry></entry>
   <entry>
    True if table has row-security enabled; see
    <link linkend="catalog-pg-rowsecurity"><structname>pg_rowsecurity</structname></link> catalog
diff 

Re: [HACKERS] better atomics - v0.6

2014-09-23 Thread Oskari Saarenmaa

23.09.2014, 00:01, Andres Freund kirjoitti:

I've finally managed to incorporate (all the?) feedback I got for
0.5. Imo the current version looks pretty good.

Most notable changes:
* Lots of comment improvements
* code moved out of storage/ into port/
* separated the s_lock.h changes into its own commit
* merged i386/amd64 into one file
* fixed lots of the little details Amit noticed
* fixed lots of XXX/FIXMEs
* rebased to today's master
* tested various gcc/msvc versions
* extended the regression tests
* ...

The patches:
0001: The actual atomics API


I tried building PG on Solaris 10/Sparc using GCC 4.9.0 (buildfarm 
animal dingo) with this patch but regression tests failed due to:


/export/home/os/postgresql/src/test/regress/regress.so: symbol 
pg_write_barrier_impl: referenced symbol not found


which turns out to be caused by a leftover PG_ prefix in ifdefs for 
HAVE_GCC__ATOMIC_INT64_CAS.  Removing the PG_ prefix fixed the build and 
regression tests.  Attached a patch to strip the invalid prefix.
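For reference, these are the GCC `__atomic` builtins whose availability the (corrected) HAVE_GCC__ATOMIC_INT64_CAS symbol signals; a minimal standalone demonstration (needs GCC or Clang; the function names here are just for illustration):

```c
#include <assert.h>
#include <stdint.h>

static uint64_t counter;

/* The fence pg_write_barrier_impl() expands to under this guard
 * (release semantics include write barrier semantics). */
static uint64_t
bump(void)
{
	__atomic_thread_fence(__ATOMIC_RELEASE);
	return __atomic_add_fetch(&counter, 1, __ATOMIC_SEQ_CST);
}

/* The 64-bit compare-and-swap the configure probe tests for */
static int
cas64(uint64_t *ptr, uint64_t expected, uint64_t newval)
{
	return __atomic_compare_exchange_n(ptr, &expected, newval, 0,
									   __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```

With the stray PG_ prefix in the ifdefs, none of these expansions were selected, which is why the `pg_write_barrier_impl` symbol ended up unresolved at load time.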



0002: Implement s_lock.h support ontop the atomics API. Existing
   implementations continue to be used unless
   FORCE_ATOMICS_BASED_SPINLOCKS is defined


Applied this and built PG with and without FORCE_ATOMICS_BASED_SPINLOCKS 
- both builds passed regression tests.



0003-0005: Not proposed for review here. Just included because code
  actually using the atomics makes testing them easier.


I'll look at these patches later.

/ Oskari
From 207b02cd7ee6d38b92b51195a951713639f0d738 Mon Sep 17 00:00:00 2001
From: Oskari Saarenmaa o...@ohmu.fi
Date: Tue, 23 Sep 2014 13:28:51 +0300
Subject: [PATCH] atomics: fix ifdefs for gcc atomics

---
 src/include/port/atomics/generic-gcc.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/include/port/atomics/generic-gcc.h b/src/include/port/atomics/generic-gcc.h
index 9b0c636..f82af01 100644
--- a/src/include/port/atomics/generic-gcc.h
+++ b/src/include/port/atomics/generic-gcc.h
@@ -40,19 +40,19 @@
  * definitions where possible, and use this only as a fallback.
  */
 #if !defined(pg_memory_barrier_impl)
-#	if defined(HAVE_GCC__PG_ATOMIC_INT64_CAS)
+#	if defined(HAVE_GCC__ATOMIC_INT64_CAS)
 #		define pg_memory_barrier_impl()		__atomic_thread_fence(__ATOMIC_SEQ_CST)
 #	elif (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 1))
 #		define pg_memory_barrier_impl()		__sync_synchronize()
 #	endif
 #endif /* !defined(pg_memory_barrier_impl) */
 
-#if !defined(pg_read_barrier_impl) && defined(HAVE_GCC__PG_ATOMIC_INT64_CAS)
+#if !defined(pg_read_barrier_impl) && defined(HAVE_GCC__ATOMIC_INT64_CAS)
 /* acquire semantics include read barrier semantics */
 #		define pg_read_barrier_impl()		__atomic_thread_fence(__ATOMIC_ACQUIRE)
 #endif
 
-#if !defined(pg_write_barrier_impl) && defined(HAVE_GCC__PG_ATOMIC_INT64_CAS)
+#if !defined(pg_write_barrier_impl) && defined(HAVE_GCC__ATOMIC_INT64_CAS)
 /* release semantics include write barrier semantics */
 #		define pg_write_barrier_impl()		__atomic_thread_fence(__ATOMIC_RELEASE)
 #endif
@@ -101,7 +101,7 @@ typedef struct pg_atomic_uint64
 	volatile uint64 value;
 } pg_atomic_uint64;
 
-#endif /* defined(HAVE_GCC__PG_ATOMIC_INT64_CAS) || defined(HAVE_GCC__SYNC_INT64_CAS) */
+#endif /* defined(HAVE_GCC__ATOMIC_INT64_CAS) || defined(HAVE_GCC__SYNC_INT64_CAS) */
 
 /*
  * Implementation follows. Inlined or directly included from atomics.c
-- 
1.8.4.1




Re: [HACKERS] better atomics - v0.6

2014-09-23 Thread Andres Freund
On 2014-09-23 13:50:28 +0300, Oskari Saarenmaa wrote:
 23.09.2014, 00:01, Andres Freund kirjoitti:
 I've finally managed to incorporate (all the?) feedback I got for
 0.5. Imo the current version looks pretty good.
 
 Most notable changes:
 * Lots of comment improvements
 * code moved out of storage/ into port/
 * separated the s_lock.h changes into its own commit
 * merged i386/amd64 into one file
 * fixed lots of the little details Amit noticed
 * fixed lots of XXX/FIXMEs
 * rebased to today's master
 * tested various gcc/msvc versions
 * extended the regression tests
 * ...
 
 The patches:
 0001: The actual atomics API
 
 I tried building PG on Solaris 10/Sparc using GCC 4.9.0 (buildfarm animal
 dingo) with this patch but regression tests failed due to:
 
 /export/home/os/postgresql/src/test/regress/regress.so: symbol
 pg_write_barrier_impl: referenced symbol not found
 
 which turns out to be caused by a leftover PG_ prefix in ifdefs for
 HAVE_GCC__ATOMIC_INT64_CAS.  Removing the PG_ prefix fixed the build and
 regression tests.  Attached a patch to strip the invalid prefix.

Gah. Stupid last minute changes... Thanks for diagnosing. Will
integrate.

 0002: Implement s_lock.h support ontop the atomics API. Existing
implementations continue to be used unless
FORCE_ATOMICS_BASED_SPINLOCKS is defined
 
 Applied this and built PG with and without FORCE_ATOMICS_BASED_SPINLOCKS -
 both builds passed regression tests.

Cool.

 0003-0005: Not proposed for review here. Just included because code
   actually using the atomics make testing them easier.
 
 I'll look at these patches later.

Cool. Although I'm not proposing them to be integrated as-is...

Greetings,

Andres Freund




Re: [HACKERS] better atomics - v0.6

2014-09-23 Thread Andres Freund
Hi,

On 2014-09-23 13:50:28 +0300, Oskari Saarenmaa wrote:
 23.09.2014, 00:01, Andres Freund kirjoitti:
 I've finally managed to incorporate (all the?) feedback I got for
 0.5. Imo the current version looks pretty good.
 
 Most notable changes:
 * Lots of comment improvements
 * code moved out of storage/ into port/
 * separated the s_lock.h changes into its own commit
 * merged i386/amd64 into one file
 * fixed lots of the little details Amit noticed
 * fixed lots of XXX/FIXMEs
 * rebased to today's master
 * tested various gcc/msvc versions
 * extended the regression tests
 * ...
 
 The patches:
 0001: The actual atomics API
 
 I tried building PG on Solaris 10/Sparc using GCC 4.9.0 (buildfarm animal
 dingo) with this patch but regression tests failed due to:

Btw, if you could try sun studio it'd be great. I wrote the support for
it blindly, and I'd be surprised if I got it right on the first try.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] RLS feature has been committed

2014-09-23 Thread Stephen Frost
Heikki,

* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:
 Some random comments after a quick read-through of the patch:

Glad you were able to find a bit of time to take a look, thanks!

 * Missing documentation. For a major feature like this, reference
 pages for the CREATE/DROP POLICY statements are not sufficient.
 We'll need a whole new chapter for this.

Peter mentioned this too and I've been working on adding documentation.
The general capability, at least in my view, is pretty clear but I agree
that examples and adding more about the trade-offs would be useful.  Do
you have any opinion regarding additional documentation for updatable
security-barrier views?  I'm trying to work out a way to share this new
documentation with that since there is overlap as they share a lot of
the caveats, given that they use the same code underneath.

 * In CREATE POLICY, the USING and WITH CHECK keywords are not
 very descriptive. I kind of understand WITH CHECK - it's applied to
 new rows like a CHECK constraint - but USING seems rather arbitrary
 and WITH CHECK isn't all that clear either. Perhaps ON SELECT
 CHECK and ON UPDATE CHECK or something like that would be better.

Right, WITH CHECK matches up with the SQL standard 'WITH CHECK OPTION'.

 I guess USING makes sense when that's the only expression given, so
 that it applies to both SELECTs and UPDATEs. But when a WITH CHECK
 expression is given together with USING, it gets confusing.

Right, USING was there in the original patch back in January (actually,
it might be farther back, there were versions of the patch prior to
that) while WITH CHECK was added more recently.

 +	if (is_from)
 +		ereport(ERROR,
 +				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 +				 errmsg("COPY FROM not supported with row security."),
 +				 errhint("Use direct INSERT statements instead.")));
 +
 
 Why is that not implemented? I think it should be.

It's certainly an item that I was planning to look at addressing, though
only for completeness.  COPY's focus is providing a faster interface and
the RLS checks would end up dwarfing the overall time and making COPY
not actually be usefully faster than INSERT.

 * In src/backend/commands/createas.c:
 
 @@ -420,6 +421,19 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	ExecCheckRTPerms(list_make1(rte), true);
 
 	/*
 +	 * Make sure the constructed table does not have RLS enabled.
 +	 *
 +	 * check_enable_rls() will ereport(ERROR) itself if the user has requested
 +	 * something invalid, and otherwise will return RLS_ENABLED if RLS should
 +	 * be enabled here.  We don't actually support that currently, so throw
 +	 * our own ereport(ERROR) if that happens.
 +	 */
 +	if (check_enable_rls(intoRelationId, InvalidOid) == RLS_ENABLED)
 +		ereport(ERROR,
 +				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 +				 (errmsg("policies not yet implemented for this command"))));
 +
 +	/*
 	 * Tentatively mark the target as populated, if it's a matview and we're
 	 * going to fill it; otherwise, no change needed.
 	 */
 
 Hmm. So, if a table we just created with CREATE TABLE AS has
 row-level security enabled, we throw an error? Can that actually
 happen? 

I don't believe it actually can happen, but I wanted to be sure that the
case was covered in the event that support is added.  Essentially, I
carefully went through looking at all of the existing ExecCheckRTPerms()
calls and made sure RLS was handled in all of those cases, in one way or
another.  As I expect you noticed, I also updated the comments in
ExecCheckRTPerms() to reflect that RLS needs to be addressed as well as
normal checks when returning tuples or adding new tuples.

 AFAICS a freshly-created table can't have
 row-level security enabled. Or is the row-level security
 enabled/disabled status copied from the source table?

It's not copied, no.  That didn't make a lot of sense to me as we don't
even have an option to copy permissions.

 * Row-level security is not allowed for foreign tables. Why not?
 It's perhaps not a very useful feature, but I don't see any
 immediate reason to forbid it either. How about views?

The question about views came up but with views you could simply add the
quals into the view definition instead of combining views and RLS.
Still, I'm not against it, but the patch certainly seemed large enough
to me that it made sense to move forward with the basic, but complete,
implementation for tables.  Foreign tables fell into the same bucket, but
perhaps they shouldn't have.  I'll definitely take a look at what the
level of complexity there is.

 * The new pg_class.relhasrowsecurity column is not 

Re: [HACKERS] RLS feature has been committed

2014-09-23 Thread Dave Page
On Tue, Sep 23, 2014 at 4:25 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Sep 22, 2014 at 7:22 PM, Peter Geoghegan p...@heroku.com wrote:
 On Mon, Sep 22, 2014 at 4:02 PM, Andres Freund and...@anarazel.de wrote:
 This patch has been pushed in a clear violation of established policy.

 Fundamental pieces of the patch have changed *after* the commitfest
 started. And there wasn't a recent patch in the commitfest either - the
 entry was moved over from the last round without a new patch.  It didn't
 receive independent review (Robert explicitly said his wasn't a full
 review).  It wasn't marked ready for committer.  The intention to commit
 wasn't announced publically.  There were *clear* and unaddressed
 objections to committing the patch as is, by a committer (Robert)
 nonetheless.

 I have no reason to doubt your version of events here

 Fortunately, you don't have to take anything on faith.  This is a
 public mailing list, and the exact sequence of events is open to
 inspection by anyone who cares to take a few minutes to do so.  You
 can easily verify whether my statement that I asked Stephen twice to
 hold off committing it is correct or not; and you can also verify the
 rest of the history that Andres and I recounted.  This is all there in
 black and white.

Just to be clear here, the *only* issue we should even be discussing
is whether the patch should or should not have been committed in the
face of those objections. As Josh has also noted, the commitfest
process was never meant to constrain what committers do or when they
do it with their own patches or ones they've worked heavily on. They
are there as a backstop to make sure that regardless of what the
committers are doing day to day, patch authors know that their patch
is expected to receive some review within N weeks.


-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Index scan optimization

2014-09-23 Thread Rajeev rastogi
On 22 September 2014 19:17, Heikki Linnakangas wrote:
 
 On 09/22/2014 04:45 PM, Tom Lane wrote:
  Heikki Linnakangas hlinnakan...@vmware.com writes:
  On 09/22/2014 07:47 AM, Rajeev rastogi wrote:
  So my proposal is to skip the condition check on the first scan key
 condition for every tuple.
 
   The same happens in a single-column case. If you have a query like
   SELECT * FROM tbl2 where id2 > 'a', once you've found the start
   position of the scan, you know that all the rows that follow match
  too.
 
  ... unless you're doing a backwards scan.
 
 Sure. And you have to still check for NULLs. Have to get the details
 right..

I have finished implementation of the discussed optimization.
I got a performance improvement of around 30% on the schema and data shared 
in an earlier mail.

I also tested the index scan cases where our optimization does not apply, and 
observed that there is no effect on those queries from this change.

Change details:
I have added a new flag, SK_BT_MATCHED, as part of sk_flags (ScanKey 
structure), using the value 0x0004, which was unused.
Inside the function _bt_first, once we have finished finding the start scan 
position based on the first key, I append the SK_BT_MATCHED flag to the 
first key.
Then in the function _bt_checkkeys, during key comparison, I check whether 
the key has the SK_BT_MATCHED flag set; if so, there is no need for further 
comparison. But if the tuple has a NULL value, then even if this flag is 
set, we continue with the comparison (this handles Heikki's point about 
checking NULLs).
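As a toy model of that control flow (not the real nbtree code; the names and scan-key layout are simplified):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_BT_MATCHED 0x0004	/* the proposed, previously unused flag bit */

typedef struct ScanKeyModel
{
	int			sk_flags;
	int			bound;			/* models a first key of the form "col > bound" */
} ScanKeyModel;

/*
 * Models _bt_checkkeys for the first key: a NULL (passed here as a null
 * pointer) is always re-checked and never matches; otherwise, once
 * _bt_first has marked the key as matched, the comparison is skipped.
 */
static bool
first_key_matches(const ScanKeyModel *key, const int *value)
{
	if (value == NULL)
		return false;
	if (key->sk_flags & SK_BT_MATCHED)
		return true;			/* trust the scan start position */
	return *value > key->bound;
}
```

Once the flag is set, subsequent calls return without comparing, which is where the saved work comes from; the NULL case still falls through to the normal path.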

I will add this patch to the next CommitFest.

Please let me know your feedback. 

Thanks and Regards,
Kumar Rajeev Rastogi







[HACKERS] proposal: adding a GUC for BAS_BULKREAD strategy

2014-09-23 Thread didier
Hi,

Currently the value is hard-coded to NBuffers / 4, but ISTM that with
bigger shared_buffers it's too much; i.e. even with a DB 10 to 20 times
the memory size, there are a lot of tables under this limit, and nightly
batch reports are trashing the shared buffers cache as if there's no
tomorrow.


regards,




Re: [HACKERS] proposal: adding a GUC for BAS_BULKREAD strategy

2014-09-23 Thread Andres Freund
Hi,

On 2014-09-23 14:47:33 +0200, didier wrote:
 Currently the value is hard-coded to NBuffers / 4, but ISTM that with
 bigger shared_buffers it's too much; i.e. even with a DB 10 to 20 times
 the memory size, there are a lot of tables under this limit, and nightly
 batch reports are trashing the shared buffers cache as if there's no
 tomorrow.

I'd like the ability to tune this as well. Even though I more often want
to entirely disable it, rather than make it more aggressive. For
workloads where the majority of the read data fits into shared_buffers
and sequential scans over large relations are something happening
frequently, the current strategy *sucks* because the large table will
pretty much never get cached.
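The check being discussed is roughly the following (a sketch; the hard-coded test lives in the heapam scan setup, and the tunable divisor shown here is the hypothetical GUC):

```c
#include <assert.h>
#include <stdbool.h>

static long NBuffers = 16384;	/* shared_buffers, in 8kB pages */

/*
 * Sketch of the heuristic: use the BAS_BULKREAD ring only for relations
 * bigger than NBuffers / divisor.  A GUC could expose the divisor; a
 * non-positive value models "never use the ring" (disabling it entirely,
 * as wished above).
 */
static bool
use_bulkread_strategy(long rel_pages, long divisor)
{
	if (divisor <= 0)
		return false;
	return rel_pages > NBuffers / divisor;
}
```

With the current fixed divisor of 4, a relation only avoids the ring (and thus can populate shared_buffers) when it is smaller than a quarter of shared_buffers, which is the behaviour both posts above want to tune.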

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] RLS feature has been committed

2014-09-23 Thread Andres Freund
On 2014-09-23 13:23:32 +0100, Dave Page wrote:
 Just to be clear here, the *only* issue we should even be discussing
 is whether the patch should or should not have been committed in the
 face of those objections. As Josh has also noted, the commitfest
 process was never meant to constrain what committers do or when they
 do it with their own patches or ones they've worked heavily on. They
 are there as a backstop to make sure that regardless of what the
 committers are doing day to day, patch authors know that their patch
 is expected to receive some review within N weeks.

FWIW, while not really at the core of the problem here, I don't think
this is entirely true anymore.

We certainly seem to expect bigger feature patches to go through the
commitfest process to some degree. Just look at the discussions about
*committers'* patches being committed or not at each cycle's last
commitfest. Every single time, which CF they were submitted to plays a
rather prominent role in the discussion.

Also look at committers like Robert that *do* feel constrained about
when to commit or even expect review for submitted patches.

I think it's obvious that a committer doesn't need to wait till some
later commitfest to commit patches that have since gotten enough review
or are uncontroversial. Neither is the case here.

I also think committers need to be much more careful when committing
patches which they (or their employer) appear to have a business
interest in. Rushing ahead to commit the patch of somebody 'unrelated'
leaves a completely different taste than committing your colleague's
patch, *INDEPENDENT* of the actual reasons and the state of the patch.

Perhaps we can use this as a chance to make the review process a bit
better defined. There certainly have been a few patches by committers
over the last years that have had a noticeable negative impact, some of
which might have been caught by more diligent review.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] RLS feature has been committed

2014-09-23 Thread Andres Freund
On 2014-09-22 21:38:17 -0700, David G Johnston wrote:
 Robert Haas wrote
  It's difficult to imagine a more flagrant violation of process than
  committing a patch without any warning and without even *commenting*
  on the fact that clear objections to commit were made on a public
  mailing list.  If that is allowed to stand, what can we assume other
  than that Stephen, at least, has a blank check to change anything he
  wants, any time he wants, with no veto possible from anyone else?
 
 I'm of a mind to agree that this shouldn't have been committed...but I'm not
 seeing where Stephen has done sufficient wrong to justify crucifixion. 

I've not seen much in the way of 'crucifixion' before this email. And I
explicitly *don't* think it's warranted. Also it's not happening.

 At this point my hindsight says a strictly declaratory statement of reasons
 this is not ready combined with reverting the patch would have been
 sufficient; or even just a "I am going to revert this for these reasons"
 post.  The difference between building support for a revert and gathering a
 mob is a pretty thin line.

The reason it's being discussed is to find a way to align the different
views about when to commit stuff. The primary goal is *not* to revert
the commit or anything but to make sure we're not slipping into
procedures we all would regret. Which *really* can happen very
easily. We're all humans and most of us have more than enough to do.

 Though I guess if you indeed feel that his actions were truly heinous you
 should also then put forth the proposal that his ability to commit be
 revoked.

I think *you* are escalating this to something unwarranted here by the
way you're painting the discussion.

 If you're not willing to go to that extent then, unless you know
 Stephen personally, I'd not assume that public flogging is the best way to
 get him to not mess up in the future; the honest and cordial dialog
 about cause/effect is likely to be more productive and less
 self-destructive.  Though, to each their own style.

All the people involved know Stephen personally. There's also no public
flogging.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] RLS feature has been committed

2014-09-23 Thread Robert Haas
On Tue, Sep 23, 2014 at 9:00 AM, Andres Freund and...@anarazel.de wrote:
 I think it's obvious that a committer doesn't need to wait till some
 later commitfest to commit patches that have since gotten enough review
 or are uncontroversial. Neither is the case here.

Right.  I mean, the occasionally-floated notion that committers can't
commit things when there's no CommitFest ongoing is obviously flat
wrong, as a quick look at the commit log will demonstrate.

On the flip side, it's unreasonable to expect that as soon as a
committer posts 7000-line patch, everyone who possibly has an
objection to that patch will drop everything that they are doing to
object to anything they don't like about it.  It can easily take a
week to review such a patch thoroughly even if you start the minute it
hits the list.

Many committers, including me, have held off on committing patches for
many months to make sure that the community got time to review them,
even though nobody explicitly objected.  This patch was committed much
faster than that and over an explicit objection.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Review of GetUserId() Usage

2014-09-23 Thread Alvaro Herrera
Stephen Frost wrote:

   pgstat_get_backend_current_activity()
 Used to decide if the current activity string should be returned or
 not.  In my view, this is a clear case which should be addressed
 through has_privs_of_role() instead of requiring the user to
 SET ROLE to each role they are an inherited member of to query for
 what the other sessions are doing.
 
   pg_signal_backend()
 Used to decide if pg_terminate_backend and pg_cancel_backend are
 allowed.  Another case which should be changed over to
 has_privs_of_role(), in my view.  Requiring the user to SET ROLE for
 roles which they are an inherited member of is confusing as it's
 generally not required.
 
   pg_stat_get_activity()
 Used to decide if the state information should be shared.
 My opinion is the same as above- use has_privs_of_role().
 There are a number of other functions in pgstatfuncs.c with similar
 issues (eg: pg_stat_get_backend_activity(),
 pg_stat_get_backend_client_port(), and others).

I think the case for pgstat_get_backend_current_activity() and
pg_stat_get_activity and the other pgstatfuncs.c callers is easy to make
and seems acceptable to me; but I would leave pg_signal_backend out of
that discussion, because it has a potentially harmful side effect.  By
requiring SET ROLE you add an extra layer of protection against
mistakes.  (Hopefully, pg_signal_backend() is not a routine thing on
well-run systems, which means it involves human intervention, and
therefore the room for error isn't insignificant.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] RLS feature has been committed

2014-09-23 Thread Robert Haas
On Tue, Sep 23, 2014 at 12:38 AM, David G Johnston
david.g.johns...@gmail.com wrote:
 I'm of a mind to agree that this shouldn't have been committed...but I'm not
 seeing where Stephen has done sufficient wrong to justify crucifixion.
 Especially since everything is being done publicly and you are one of the
 many people in the position to flex a veto by reverting the patch.

I'd be interested to see what the reaction would be if I reverted this
patch out of the blue.  I suspect it would not be positive.  More
generally, I don't want the PostgreSQL source code repository to be
ground zero for a revert war.

 Subsequent, possibly private, discussion between you and Stephen could then
 occur before making any conclusions you form public so that others can learn
 from the experience and ponder whether anything could be changed to mitigate
 such situations in the future.

I'd be happy to discuss this with Stephen, either in person, by phone,
or over public or private email.  Unfortunately, although he's posted
many other emails to this list over the last couple of days, he hasn't
chosen to respond, publicly or privately, to my statement that he must
have read the email in which I asked him to hold off committing
(because he addressed technical feedback from that same email) and
that he went ahead and did it anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] add modulo (%) operator to pgbench

2014-09-23 Thread Fabien COELHO


Hello Stephen,


But this is not convincing. Adding a unary function with a clean
syntax indeed requires doing something with the parser.


Based on the discussion so far, it sounds like you're coming around to
agree with Robert (as I'm leaning towards also) that we'd be better off
building a real syntax here instead.


Not exactly.

My actual opinion is that it is really an orthogonal issue. ISTM that 
similar code would be required somehow for the executor part of a 
hypothetical real syntax if it were to support modulo.


If so, do you anticipate having a patch to do so in the next few days, 
or...?


Developing a full expression syntax is a significant project that I'm not 
planning to undertake in the short or medium term, especially as I'm not 
convinced that it is worth it: it would involve many changes, and I'm 
afraid that the likelihood of the patch being rejected on some grounds is 
high.


So my opinion is that this small modulo operator patch is both useful and 
harmless, so it should be committed. If someday there is a nice real 
syntax implemented, that will be great as well.


--
Fabien.




Re: [HACKERS] RLS feature has been committed

2014-09-23 Thread David Johnston
On Tue, Sep 23, 2014 at 9:09 AM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2014-09-22 21:38:17 -0700, David G Johnston wrote:
  Robert Haas wrote
   It's difficult to imagine a more flagrant violation of process than
   committing a patch without any warning and without even *commenting*
   on the fact that clear objections to commit were made on a public
   mailing list.  If that is allowed to stand, what can we assume other
   than that Stephen, at least, has a blank check to change anything he
   wants, any time he wants, with no veto possible from anyone else?
 
  I'm of a mind to agree that this shouldn't have been committed...but I'm
 not
  seeing where Stephen has done sufficient wrong to justify crucifixion.

 I've not seen much in the way of 'crucifixion' before this email. And I
 explicitly *don't* think it's warranted. Also it's not happening.


I maybe got a little carried away with my hyperbole...



  At this point my hindsight says a strictly declaratory statement of
 reasons
  this is not ready combined with reverting the patch would have been
  sufficient; or even just a I am going to revert this for these reasons
  post.  The difference between building support for a revert and
 gathering a
  mob is a pretty thin line.

 The reason it's being discussed is to find a way to align the different
 views about when to commit stuff. The primary goal is *not* to revert
 the commit or anything but to make sure we're not slipping into
 procedures we all would regret. Which *really* can happen very
 easily. We're all humans and most of us have more than enough to do.


So, the second option then... and "I'm sorry but this should never have
been committed" tends to cause one to think it should therefore be reverted.


  Though I guess if you indeed feel that his actions were truly heinous you
  should also then put forth the proposal that his ability to commit be
  revoked.

 I think *you* are escalating this to something unwarranted here by the
 way you're painting the discussion.


Not everyone who reads -hackers knows all the people involved personally.
I had an initial reaction to these e-mails that I thought I would share,
nothing more.  I'm not going to quote the different comments that led me to
my feeling that the response to this was disproportionate to the offense
but after a first pass - which is all many people would do - that is what I
came away with.  Though you could say I fell into the very same trap by
reacting off my first impression...

David J.


Re: [HACKERS] tick buildfarm failure

2014-09-23 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
   To be a bit more clear- why is it safe to change the contents if the
   equal() function returns 'false'?  I'm guessing the answer is
   locking- whereby such a change would imply a lock was acquired on
   the relation by another backend, which shouldn't be possible if we're
   currently working with it, but wanted to double-check that.

Right.  If that were to fail, it would be the fault of something else
not having gotten sufficient lock for whatever it had been doing: either
lack of exclusive lock for an RLS update, or lack of an access lock to
examine the cached data.  The reason we want to make the equal() check
rather than just summarily replace the data is that code that *does*
hold AccessShare might reasonably expect the data to not move while it's
looking at it.
  
   I also wonder if we might be better off with a way to identify that
   nothing has actually been changed due to the locks we hold and avoid
   rebuilding the relation cache entry entirely..

I am entirely disinclined to reinvent relcache as part of the RLS patch.
You've got enough problems.

regards, tom lane




Re: [HACKERS] add modulo (%) operator to pgbench

2014-09-23 Thread Tom Lane
Fabien COELHO coe...@cri.ensmp.fr writes:
 So my opinion is that this small modulo operator patch is both useful and 
 harmless, so it should be committed.

You've really failed to make that case --- in particular, AFAICS there is
not even consensus on the exact semantics that the operator should have.
So I'm inclined to reject rather than put in something that may cause
surprises down the road.  The usefulness doesn't seem great enough to
take that risk.

The way forward, if we think there is enough value in it (I'm not
sure there is), would be to build enough expression infrastructure
so that we could inexpensively add both operators and functions.
Then we could add a modulo operator with whatever semantics seem
most popular, and some function(s) for the other semantics, and
there would not be so much riding on choosing the right semantics.

regards, tom lane




Re: [HACKERS] tick buildfarm failure

2014-09-23 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Stephen Frost sfr...@snowman.net writes:
To be a bit more clear- why is it safe to change the contents if the
equal() function returns 'false'?  I'm guessing the answer is
locking- whereby such a change would imply a lock was acquired on
the relation by another backend, which shouldn't be possible if we're
currently working with it, but wanted to double-check that.
 
 Right.  If that were to fail, it would be the fault of something else
 not having gotten sufficient lock for whatever it had been doing: either
 lack of exclusive lock for an RLS update, or lack of an access lock to
 examine the cached data.  The reason we want to make the equal() check
 rather than just summarily replace the data is that code that *does*
 hold AccessShare might reasonably expect the data to not move while it's
 looking at it.

Ok, great, thanks for the confirmation.  Yes- the RLS code wasn't
anticipating a change happening underneath it such as this (as other
places didn't appear worried about same, I didn't expect to see it be an
issue).  No problem, I'll definitely address it.

I also wonder if we might be better off with a way to identify that
nothing has actually been changed due to the locks we hold and avoid
rebuilding the relation cache entry entirely..
 
 I am entirely disinclined to reinvent relcache as part of the RLS patch.

Apologies- I hadn't intended to even suggest that, but rather to speculate
about this possibility for future improvements.

Thanks again!

Stephen




Re: [HACKERS] add modulo (%) operator to pgbench

2014-09-23 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Fabien COELHO coe...@cri.ensmp.fr writes:
  So my opinion is that this small modulo operator patch is both useful and 
  harmless, so it should be committed.
 
 You've really failed to make that case --- in particular, AFAICS there is
 not even consensus on the exact semantics that the operator should have.
 So I'm inclined to reject rather than put in something that may cause
 surprises down the road.  The usefulness doesn't seem great enough to
 take that risk.

Agreed.

 The way forward, if we think there is enough value in it (I'm not
 sure there is), would be to build enough expression infrastructure
 so that we could inexpensively add both operators and functions.
 Then we could add a modulo operator with whatever semantics seem
 most popular, and some function(s) for the other semantics, and
 there would not be so much riding on choosing the right semantics.

Indeed, and there's plenty of time to make it happen for 9.5.
Personally, I'd really like to see it happen, as I feel it'd help with
the performance farm goal which has been discussed many times over.

Fabien, I'd ask that you not be discouraged by this and continue to work
with pgbench on such an improvement, if you're able to take it
on with your other commitments.  I do see value in it and I feel it
will help reproducibility, a key and important aspect of performance
analysis, much more so than just hacking a local copy of pgbench would.

Thanks!

Stephen




[HACKERS] Refactoring code for sync node detection (was: Support for N synchronous standby servers)

2014-09-23 Thread Michael Paquier
On Sat, Sep 20, 2014 at 1:16 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Fri, Sep 19, 2014 at 12:18 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Sep 16, 2014 at 2:19 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 - A patch refactoring code for pg_stat_get_wal_senders and
 SyncRepReleaseWaiters as there is in either case duplicated code in
 this area to select the synchronous node as the one connected with
 lowest priority

 A strong +1 for this idea.  I have never liked that, and cleaning it
 up seems eminently sensible.

 Interestingly, the syncrep code has in some of its code paths the idea
 that a synchronous node is unique, while other code paths assume that
 there can be multiple synchronous nodes. If that's fine I think that
 it would be better to just make the code multiple-sync node aware, by
 having a single function call in walsender.c and syncrep.c that
 returns an integer array of WAL sender positions (WalSndCtl). as that
 seems more extensible long-term. Well for now the array would have a
 single element, being the WAL sender with lowest priority > 0. Feel
 free to protest about that approach though :)
Please find attached a patch refactoring this code. Looking once again
at it, I have taken the approach that minimizes the footprint of the
refactoring on current code, by simply adding a function called
SyncRepGetSynchronousNode in syncrep.c that returns to the caller the
position in the WAL sender array of the node considered synchronous,
or -1 if no synchronous node is found.

I'll add it to the next commit fest.

Regards,
-- 
Michael
From 6f66020b3c5fc0866b63bb12cf12c582be14f7d0 Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Tue, 23 Sep 2014 22:57:00 +0900
Subject: [PATCH] Refactor code to detect synchronous node in WAL sender array

This patch is made to remove code duplication in walsender.c and syncrep.c
in order to detect what is the node with the lowest strictly-positive
priority, facilitating maintenance of this code.
---
 src/backend/replication/syncrep.c   | 89 ++---
 src/backend/replication/walsender.c | 34 +-
 src/include/replication/syncrep.h   |  1 +
 3 files changed, 74 insertions(+), 50 deletions(-)

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index aa54bfb..e0b1034 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -5,7 +5,7 @@
  * Synchronous replication is new as of PostgreSQL 9.1.
  *
  * If requested, transaction commits wait until their commit LSN is
- * acknowledged by the sync standby.
+ * acknowledged by the synchronous standby.
  *
  * This module contains the code for waiting and release of backends.
  * All code in this module executes on the primary. The core streaming
@@ -357,6 +357,60 @@ SyncRepInitConfig(void)
 	}
 }
 
+
+/*
+ * Obtain position of synchronous standby in the array referencing all
+ * the WAL senders, or -1 if no synchronous node can be found. The caller
+ * of this function should take a lock on SyncRepLock.
+ */
+int
+SyncRepGetSynchronousNode(void)
+{
+	int		sync_node = -1;
+	int		priority = 0;
+	int		i;
+
+	/* Scan WAL senders and find synchronous node if any */
+	for (i = 0; i < max_wal_senders; i++)
+	{
+		/* Use volatile pointer to prevent code rearrangement */
+		volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
+
+		/* Process to next if not active */
+		if (walsnd->pid == 0)
+			continue;
+
+		/* Process to next if not streaming */
+		if (walsnd->state != WALSNDSTATE_STREAMING)
+			continue;
+
+		/* Process to next one if asynchronous */
+		if (walsnd->sync_standby_priority == 0)
+			continue;
+
+		/* Process to next one if priority conditions not satisfied */
+		if (priority != 0 &&
+			priority >= walsnd->sync_standby_priority)
+			continue;
+
+		/* Process to next one if flush position is invalid */
+		if (XLogRecPtrIsInvalid(walsnd->flush))
+			continue;
+
+		/*
+		 * We have a potential synchronous candidate, choose it as the
+		 * synchronous node for the time being before going through the
+		 * other nodes listed in the WAL sender array.
+		 */
+		sync_node = i;
+
+		/* Update priority to current value of WAL sender */
+		priority = walsnd->sync_standby_priority;
+	}
+
+	return sync_node;
+}
+
 /*
  * Update the LSNs on each queue based upon our latest state. This
  * implements a simple policy of first-valid-standby-releases-waiter.
@@ -369,10 +423,9 @@ SyncRepReleaseWaiters(void)
 {
 	volatile WalSndCtlData *walsndctl = WalSndCtl;
 	volatile WalSnd *syncWalSnd = NULL;
+	int			sync_node;
 	int			numwrite = 0;
 	int			numflush = 0;
-	int			priority = 0;
-	int			i;
 
 	/*
 	 * If this WALSender is serving a standby that is not on the list of
@@ -388,32 +441,14 @@ SyncRepReleaseWaiters(void)
 	/*
 	 * We're a potential sync standby. Release waiters if we are the highest
 	 * priority standby. If there are multiple standbys with same priorities
-	 * then we use the first 

Re: [HACKERS] RLS feature has been committed

2014-09-23 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
 I'd be happy to discuss this with Stephen, either in person, by phone,
 or over public or private email.

Please understand that I'm not ignoring you, the conversation, or the
concern.  I was asked (by other community members) to not comment on the
thread and I have been trying to hold to that.

Posted to the list so others are aware.

I'd be more than happy to discuss it over the phone, or in person, as I
offered to do.

Thanks,

Stephen




Re: [HACKERS] proposal: rounding up time value less than its unit.

2014-09-23 Thread Robert Haas
On Tue, Sep 23, 2014 at 1:23 AM, Stephen Frost sfr...@snowman.net wrote:
 * Tom Lane (t...@sss.pgh.pa.us) wrote:
 Stephen Frost sfr...@snowman.net writes:
  To clarify- we'll simply swap from (essentially) floor() to ceil() for
  handling all GUC_with_unit to internal_unit conversions, document that,
  and note it in the release notes as a possible behavior change and move
  on.

 Worksforme.

 Great, thanks.  I'll wait a couple days to see if anyone else wants to
 voice a concern.

 Tomonari, don't worry about updating the patch (unless you really want
 to) as I suspect I'll be changing around the wording anyway.  No
 offense, please, but I suspect English isn't your first language and
 it'll be simpler for me if I just handle it.  Thanks a lot for the bug
 report and discussion and I'll be sure to credit you for it in the
 commit.

Three people have voted for making it an *error* to supply a value
that needs to be rounded, instead of changing the rounding behavior.
There are not three votes for any other proposal.  (This may still be
an improvement over the current behavior, though.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] proposal: rounding up time value less than its unit.

2014-09-23 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Three people have voted for making it an *error* to supply a value
 that needs to be rounded, instead of changing the rounding behavior.

Votes or no votes, that's a horrible idea; it breaks the design goal
that users shouldn't need to remember the precise unit size when making
postgresql.conf entries.

And I'm not sure what votes you're counting, anyway.  People's opinions
have changed as the discussion proceeded ...

regards, tom lane




Re: [HACKERS] Scaling shared buffer eviction

2014-09-23 Thread Robert Haas
On Fri, Sep 19, 2014 at 7:21 AM, Amit Kapila amit.kapil...@gmail.com
wrote:

 Specific numbers of both the configurations for which I have
 posted data in previous mail are as follows:

 Scale Factor - 800
 Shared_Buffers - 12286MB (Total db size is 12288MB)
 Client and Thread Count = 64
 buffers_touched_freelist - count of buffers that backends found touched
 after
 popping from freelist.
 buffers_backend_clocksweep - count of buffer allocations not satisfied
 from freelist

  buffers_alloc               1531023
  buffers_backend_clocksweep  0
  buffers_touched_freelist    0



I didn't believe these numbers, so I did some testing.  I used the same
configuration you mention here, scale factor = 800, shared_buffers = 12286
MB, and I also saw buffers_backend_clocksweep = 0.  I didn't see
buffers_touched_freelist showing up anywhere, so I don't know whether that
would have been zero or not.  Then I tried reducing the high watermark for
the freelist from 2000 buffers to 25 buffers, and
buffers_backend_clocksweep was *still* 0.  At that point I started to smell
a rat.  It turns out that, with this test configuration, there's no buffer
allocation going on at all.  Everything fits in shared_buffers, or it did
on my test.  I had to reduce shared_buffers down to 10491800kB before I got
any significant buffer eviction.

At that level, a 100-buffer high watermark wasn't sufficient to prevent the
freelist from occasionally going empty.  A 2000-buffer high water mark was
by and large sufficient, although I was able to see small numbers of
buffers being allocated via clocksweep right at the very beginning of the
test, I guess before the reclaimer really got cranking.  So the watermarks
seem to be broadly in the right ballpark, but I think the statistics
reporting needs improving.  We need an easy way to measure the amount of
work that bgreclaimer is actually doing.

I suggest we count these things:

1. The number of buffers the reclaimer has put back on the free list.
2. The number of times a backend has run the clocksweep.
3. The number of buffers past which the reclaimer has advanced the clock
sweep (i.e. the number of buffers it had to examine in order to reclaim the
number counted by #1).
4. The number of buffers past which a backend has advanced the clocksweep
(i.e. the number of buffers it had to examine in order to perform the
number of allocations counted by #2).
5. The number of buffers allocated from the freelist which the backend did
not use because they'd been touched (what you're calling
buffers_touched_freelist).

It's hard to come up with good names for all of these things that are
consistent with the somewhat wonky existing names.  Here's an attempt:

1. bgreclaim_freelist
2. buffers_alloc_clocksweep (you've got buffers_backend_clocksweep, but I
think we want to make it more parallel with buffers_alloc, which is the
number of buffers allocated, not buffers_backend, the number of buffers
*written* by a backend)
3. clocksweep_bgreclaim
4. clocksweep_backend
5. freelist_touched

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] proposal: rounding up time value less than its unit.

2014-09-23 Thread Robert Haas
On Tue, Sep 23, 2014 at 10:29 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Three people have voted for making it an *error* to supply a value
 that needs to be rounded, instead of changing the rounding behavior.

 Votes or no votes, that's a horrible idea; it breaks the design goal
 that users shouldn't need to remember the precise unit size when making
 postgresql.conf entries.

Not at all.  You can still supply the value in another unit as long as
it converts exactly.  If it doesn't, shouldn't you care about that?

 And I'm not sure what votes you're counting, anyway.  People's opinions
 have changed as the discussion proceeded ...

David Johnston, Peter Eisentraut, myself.  I don't see any indication
that any of those three people have reversed their opinion at any
point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Scaling shared buffer eviction

2014-09-23 Thread Andres Freund
Hi,

On 2014-09-23 10:31:24 -0400, Robert Haas wrote:
 I suggest we count these things:
 
 1. The number of buffers the reclaimer has put back on the free list.
 2. The number of times a backend has run the clocksweep.
 3. The number of buffers past which the reclaimer has advanced the clock
 sweep (i.e. the number of buffers it had to examine in order to reclaim the
 number counted by #1).

 4. The number of buffers past which a backend has advanced the clocksweep
 (i.e. the number of buffers it had to examine in order to allocate the
 number of buffers count by #3).

 5. The number of buffers allocated from the freelist which the backend did
 not use because they'd been touched (what you're calling
 buffers_touched_freelist).

Sounds good.

 It's hard to come up with good names for all of these things that are
 consistent with the somewhat wonky existing names.  Here's an attempt:
 
 1. bgreclaim_freelist

bgreclaim_alloc_clocksweep?

 2. buffers_alloc_clocksweep (you've got buffers_backend_clocksweep, but I
 think we want to make it more parallel with buffers_alloc, which is the
 number of buffers allocated, not buffers_backend, the number of buffers
 *written* by a backend)
 3. clocksweep_bgreclaim
 4. clocksweep_backend

I think bgreclaim/backend should always be either a prefix or a
postfix, but not a prefix in some variables and a postfix in others.

 5. freelist_touched

I wonder if we shouldn't move all this to a new view, instead of
stuffing it somewhere where it really doesn't belong. pg_stat_buffers or
something like it.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Scaling shared buffer eviction

2014-09-23 Thread Robert Haas
On Tue, Sep 23, 2014 at 10:31 AM, Robert Haas robertmh...@gmail.com wrote:
 [ review ]

Oh, by the way, I noticed that this patch breaks pg_buffercache.  If
we're going to have 128 lock partitions, we need to bump
MAX_SIMUL_LWLOCKS.

But this gets at another point: the way we're benchmarking this right
now, we're really conflating the effects of three different things:

1. Changing the locking regimen around the freelist and clocksweep.
2. Adding a bgreclaimer process.
3. Raising the number of buffer locking partitions.

I think it's pretty clear that #1 and #2 are a good idea.  #3 is a
mixed bag, and it might account for the regressions you saw on some
test cases.  Increasing the number of buffer mapping locks means that
those locks take up more cache lines, which could slow things down in
cases where there's no reduction in contention.  It also means that
the chances of an allocated buffer ending up in the same buffer
mapping lock partition are 1/128 instead of 1/16, which means about
5.4 additional lwlock acquire/release cycles per 100 allocations.
That's not a ton, but it's not free either.
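That back-of-the-envelope figure can be checked directly. A quick sketch of the arithmetic (the partition counts 16 and 128 are the ones under discussion; the uniform-assignment assumption is mine):

```python
# Probability that an allocated buffer lands in a given buffer mapping
# lock partition, assuming buffers hash uniformly across partitions.
def same_partition_prob(num_partitions: int) -> float:
    return 1.0 / num_partitions

# When two lookups hit the same partition, one lwlock acquire/release
# cycle can be skipped; raising the partition count from 16 to 128
# makes that saving rarer.
extra_cycles_per_100 = 100 * (same_partition_prob(16) - same_partition_prob(128))
print(round(extra_cycles_per_100, 2))  # 5.47, i.e. "about 5.4" extra cycles per 100
```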

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RLS feature has been committed

2014-09-23 Thread Dave Page
On Tue, Sep 23, 2014 at 2:00 PM, Andres Freund and...@anarazel.de wrote:
 On 2014-09-23 13:23:32 +0100, Dave Page wrote:
 Just to be clear here, the *only* issue we should even be discussing
 is whether the patch should or should not have been committed in the
 face of those objections. As Josh has also noted, the commitfest
 process was never meant to constrain what committers do or when they
 do it with their own patches or ones they've worked heavily on. They
 are there as a backstop to make sure that regardless of what the
 committers are doing day to day, patch authors know that their patch
 is expected to receive some review within N weeks.

 FWIW, while not really at the core of the problem here, I don't think
 this is entirely true anymore.

I'm not aware that we've made any such changes since the process was
originally developed. The fact that developers may constrain their own
review/commit work to certain periods is a personal choice, not policy
or requirement.

 We certainly seem to expect bigger feature patches to go through the
 commitfest process to some degree. Just look at the discussions about
 *committers'* patches being committed or not at each cycle's last
 commitfest. Every single time, the point at which they were submitted
 and which CF they target play a rather prominent role in the discussion.

They should be tracked on the app certainly, but that doesn't prevent
review/commits being made outside of the commitfest.

 Also look at committers like Robert that *do* feel constrained about
 when to commit or even expect review for submitted patches.

Regardless of what Robert may feel, review should only generally be
*expected* during a commitfest, but it can be done at any time.
Committers are free to commit at any time. The process was never
intended to restrict what committers do or when - in fact when I
introduced the process to -hackers, it was specifically worded to say
that developers are strongly encouraged to take part in the commitfest
reviews, but not forced to, and may continue to work on their patches
as they see fit.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RLS feature has been committed

2014-09-23 Thread Robert Haas
On Tue, Sep 23, 2014 at 11:16 AM, Dave Page dp...@pgadmin.org wrote:
 Regardless of what Robert may feel, review should only generally be
 *expected* during a commitfest, but it can be done at any time.
 Committers are free to commit at any time. The process was never
 intended to restrict what committers do or when - in fact when I
 introduced the process to -hackers, it was specifically worded to say
 that developers are strongly encouraged to take part in the commitfest
 reviews, but not forced to, and may continue to work on their patches
 as they see fit.

So, just to be clear, you're saying that committers are free to commit
things even when other community members (who may themselves be
committers) ask them not to?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RLS feature has been committed

2014-09-23 Thread Dave Page
On Tue, Sep 23, 2014 at 4:19 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Sep 23, 2014 at 11:16 AM, Dave Page dp...@pgadmin.org wrote:
 Regardless of what Robert may feel, review should only generally be
 *expected* during a commitfest, but it can be done at any time.
 Committers are free to commit at any time. The process was never
 intended to restrict what committers do or when - in fact when I
 introduced the process to -hackers, it was specifically worded to say
 that developers are strongly encouraged to take part in the commitfest
 reviews, but not forced to, and may continue to work on their patches
 as they see fit.

 So, just to be clear, you're saying that committers are free to commit
 things even when other community members (who may themselves be
 committers) ask them not to?

At what point did I say anything like that? Do not twist my words.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] proposal: rounding up time value less than its unit.

2014-09-23 Thread Andrew Dunstan


On 09/23/2014 10:29 AM, Tom Lane wrote:

Robert Haas robertmh...@gmail.com writes:

Three people have voted for making it an *error* to supply a value
that needs to be rounded, instead of changing the rounding behavior.

Votes or no votes, that's a horrible idea; it breaks the design goal
that users shouldn't need to remember the precise unit size when making
postgresql.conf entries.

And I'm not sure what votes you're counting, anyway.  People's opinions
have changed as the discussion proceeded ...





I don't think I've weighed in on this, and yes, I'm very late to the 
party. The round away from zero suggestion seemed the simplest and 
most easily understood rule to me.


As Tom, I think, remarked, if that seems silly because 1 second gets 
rounded up to 1 hour or whatever, then we've chosen the wrong units in 
the first place.


cheers

andrew




Re: [HACKERS] RLS feature has been committed

2014-09-23 Thread Andres Freund
On 2014-09-23 16:16:18 +0100, Dave Page wrote:
 On Tue, Sep 23, 2014 at 2:00 PM, Andres Freund and...@anarazel.de wrote:
  On 2014-09-23 13:23:32 +0100, Dave Page wrote:
  Just to be clear here, the *only* issue we should even be discussing
  is whether the patch should or should not have been committed in the
  face of those objections. As Josh has also noted, the commitfest
  process was never meant to constrain what committers do or when they
  do it with their own patches or ones they've worked heavily on. They
  are there as a backstop to make sure that regardless of what the
  committers are doing day to day, patch authors know that their patch
  is expected to receive some review within N weeks.
 
  FWIW, while not really at the core of the problem here, I don't think
  this is entirely true anymore.
 
 I'm not aware that we've made any such changes since the process was
 originally developed. The fact that developers may constrain their own
 review/commit work to certain periods is a personal choice, not policy
 or requirement.

I do think that it has widely led to a bit more formal expectation of
committers' patches being reviewed by others.

  We certainly seem to expect bigger feature patches to go through the
  commitfest process to some degree. Just look at the discussions about
  *committers'* patches being committed or not at each cycle's last
  commitfest. Every single time, the point at which they were submitted
  and which CF they target play a rather prominent role in the discussion.
 
 They should be tracked on the app certainly, but that doesn't prevent
 review/commits being made outside of the commitfest.

And I've explicitly stated that I don't believe that they should be.

  Also look at committers like Robert that *do* feel constrained about
  when to commit or even expect review for submitted patches.
 
 Regardless of what Robert may feel, review should only generally be
 *expected* during a commitfest, but it can be done at any time.

I think you're misunderstanding my point here. I would never ever
protest against *more* reviews. Stephen quoted the delay of getting
review as a reason for committing the patch at the point he did. And
that seems unwarranted, because the current form of the patch (which is
significantly different!) was only posted in the middle of the
commitfest.

The reason I mentioned the commitfest is that he couldn't expect a
review in the couple of days in which the patch existed in its current
form - because other reviewers were still trying to keep up (and
failing!) with the stuff that was submitted to the commitfest.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] RLS feature has been committed

2014-09-23 Thread Andrew Dunstan


On 09/23/2014 11:19 AM, Robert Haas wrote:

On Tue, Sep 23, 2014 at 11:16 AM, Dave Page dp...@pgadmin.org wrote:

Regardless of what Robert may feel, review should only generally be
*expected* during a commitfest, but it can be done at any time.
Committers are free to commit at any time. The process was never
intended to restrict what committers do or when - in fact when I
introduced the process to -hackers, it was specifically worded to say
that developers are strongly encouraged to take part in the commitfest
reviews, but not forced to, and may continue to work on their patches
as they see fit.

So, just to be clear, you're saying that committers are free to commit
things even when other community members (who may themselves be
committers) ask them not to?



I don't think anyone is suggesting that.

cheers

andrew




Re: [HACKERS] RLS feature has been committed

2014-09-23 Thread Robert Haas
On Tue, Sep 23, 2014 at 11:23 AM, Dave Page dp...@pgadmin.org wrote:
 On Tue, Sep 23, 2014 at 4:19 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Sep 23, 2014 at 11:16 AM, Dave Page dp...@pgadmin.org wrote:
 Regardless of what Robert may feel, review should only generally be
 *expected* during a commitfest, but it can be done at any time.
 Committers are free to commit at any time. The process was never
 intended to restrict what committers do or when - in fact when I
 introduced the process to -hackers, it was specifically worded to say
 that developers are strongly encouraged to take part in the commitfest
 reviews, but not forced to, and may continue to work on their patches
 as they see fit.

 So, just to be clear, you're saying that committers are free to commit
 things even when other community members (who may themselves be
 committers) ask them not to?

 At what point did I say anything like that? Do not twist my words.

Well, if that's not what you're saying, then good, because I sure as
heck don't think that.  I think it's the role of a committer to commit
things that are generally agreed to be good ideas, not things that
that particular committer personally thinks are a good idea regardless
of the opinions of others.  The committer is entitled to weigh their
own opinion more heavily than the opinions of others because, hey,
that's why they have the commit bit and the other people don't.   But
they're not entitled to run roughshod over contrary opinions.

Now, the way that CommitFests come into it is that people can't do two
things at once.  It's completely reasonable, in my opinion, for me to
say, hey look, I really want to review your patch but I don't have
time to do that right now because we're in the middle of a CommitFest;
please therefore submit it to the next CommitFest.  And if I say that,
the person should either (1) do it or (2) disagree with the necessity
of doing it not (3) commit anyway.

Whether you've been paying attention or not, most committer patches
ARE submitted to CommitFest and go through the exact same review
process as patches from non-committers.  That's a good thing.  It
contributes to our delivery of quality software, and it reduces
arguments about whether something was rushed through.  There are
appropriate times to leapfrog that process, when it's clear that the
patch is uncontroversial and that delay is merely dithering.  But this
is not one of those times.  It's a giant patch implementing a complex
feature for which more review time was explicitly requested.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] proposal: rounding up time value less than its unit.

2014-09-23 Thread David Johnston
On Tue, Sep 23, 2014 at 1:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 David Johnston david.g.johns...@gmail.com writes:
  My original concern was things that are rounded to zero now will not be
 in
  9.5 if the non-error solution is chosen.  The risk profile is extremely
  small but it is not theoretically zero.

 This is exactly the position I was characterizing as an excessively
 narrow-minded attachment to backwards compatibility.  We are trying to
 make the behavior better (as in less confusing), not guarantee that it's
 exactly the same.


I am going to assume that the feature designers were focused on wanting to
avoid:

1000 * 60 * 5

to get a 5-minute value set on a millisecond unit parameter.

The designer of the variable, in choosing a unit, has specified the minimum
value that they consider sane.  Attempting to record an insane value should
throw an error.

I do not support throwing an error on all attempts to round but specifying
a value less than 1 in the variable's unit should not be allowed.  If such
a value is proposed the user either made an error OR they misunderstand the
variable they are using.  In either case telling them of their error is
more friendly than allowing them to discover the problem on their own.
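As a concrete illustration of that proposal, here is a hypothetical sketch (not the actual GUC code; the function and unit table are invented) of a parser for a millisecond-unit setting that accepts coarser units but rejects non-zero values smaller than the base unit:

```python
import math

# Hypothetical unit table for a setting whose base unit is milliseconds.
UNIT_MS = {"ms": 1, "s": 1000, "min": 60_000, "h": 3_600_000}

def parse_ms_setting(value: float, unit: str) -> int:
    ms = value * UNIT_MS[unit]
    if 0 < abs(ms) < 1:
        # The proposal above: a non-zero value below the variable's unit
        # is almost certainly a user error, so report it instead of
        # silently rounding it to zero.
        raise ValueError("value is less than the parameter's unit (1 ms)")
    # Otherwise round away from zero, per the other branch of the thread.
    return math.ceil(ms) if ms >= 0 else math.floor(ms)

print(parse_ms_setting(5, "min"))  # 300000 -- no need to write 1000 * 60 * 5
```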

If we are only allowed to change the behavior by
 throwing errors in cases where we previously didn't, then we are
 voluntarily donning a straitjacket that will pretty much ensure Postgres
 doesn't improve any further.


I'm not proposing project-level policy here.

David J.


Re: [HACKERS] proposal: rounding up time value less than its unit.

2014-09-23 Thread Heikki Linnakangas

On 09/23/2014 06:23 PM, Andrew Dunstan wrote:


On 09/23/2014 10:29 AM, Tom Lane wrote:

Robert Haas robertmh...@gmail.com writes:

Three people have voted for making it an *error* to supply a value
that needs to be rounded, instead of changing the rounding behavior.

Votes or no votes, that's a horrible idea; it breaks the design goal
that users shouldn't need to remember the precise unit size when making
postgresql.conf entries.

And I'm not sure what votes you're counting, anyway.  People's opinions
have changed as the discussion proceeded ...


I don't think I've weighed in on this, and yes, I'm very late to the
party. The round away from zero suggestion seemed the simplest and
most easily understood rule to me.


+1, rounding up seems better to me as well. Although throwing an error 
wouldn't be that bad either.


- Heikki





[HACKERS] JsonbValue to Jsonb conversion

2014-09-23 Thread Dmitry Dolgov
Hi all,

I'm facing some trouble with the jsonb implementation, and I hope
I'll get a little advice =)
If I understand correctly, an abstract function for jsonb modification
should have the following stages:

Jsonb -> JsonbValue -> Modification -> JsonbValue -> Jsonb

One can convert a *JsonbValue* to *Jsonb* only with the *JsonbValueToJsonb*
function. So, my question is: can a *JsonbValue* that contains a few
*jbvBinary* elements be converted to *Jsonb* by this function? It would be
very useful if you want to modify only a small part of your JsonbValue (e.g.
replace the value of some key). But when I try to do this, an exception
"unknown type of jsonb container" appears. Maybe I missed something? Or is
there another approach to this conversion?


Re: [HACKERS] Hide 'Execution time' in EXPLAIN (COSTS OFF)

2014-09-23 Thread Robert Haas
On Sat, Sep 20, 2014 at 4:13 PM, Christoph Berg c...@df7cb.de wrote:
 there's another problem in this area: 9.4 adds Planning time to the
 EXPLAIN output. If you don't want to see that, you need to use (costs
 off), but this, well, also disables the costs. If you are running
 regression tests to actually test the costs, you've lost in 9.4.

 This problem just emerged in the Multicorn FDW where the regression
 tests were monitoring the costs, but in 9.4 (costs off) kills that.

 https://github.com/Kozea/Multicorn/pull/7

 Can we have EXPLAIN (timing off) in 9.4+ hide the Planning time
 line? That would even be backwards compatible with 9.x where it would
 be a no-op.

I don't think that'll work because:

/* check that timing is used with EXPLAIN ANALYZE */
if (es.timing && !es.analyze)
    ereport(ERROR,
            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
             errmsg("EXPLAIN option TIMING requires ANALYZE")));

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] delta relations in AFTER triggers

2014-09-23 Thread Heikki Linnakangas

On 09/15/2014 05:25 PM, Kevin Grittner wrote:

Tom Lane t...@sss.pgh.pa.us wrote:

Heikki Linnakangas hlinnakan...@vmware.com writes:


On 08/30/2014 12:15 AM, Kevin Grittner wrote:

If we were to go with the hooks as you propose, we would still need
to take the information from TriggerData and put it somewhere else
for the hook to reference.



Sure.


FWIW, I agree with Heikki on this point.  It makes a lot more sense for
the parser to provide hooks comparable to the existing hooks for resolving
column refs, and it's not apparent that loading such functionality into
SPI is sane at all.

OTOH, I agree with Kevin that the things we're talking about are
lightweight relations not variables.


Try as I might, I was unable to find any sensible way to use hooks.
If the issue was only the parsing, the route was fairly obvious,
but the execution side needs to access the correct tuplestore(s)
for each run, too -- so the sort of information provided by
relcache needed to be passed in to based on the context within the
process (i.e., if you have nested triggers firing, each needs to
use a different tuplestore with the same name in the same
function, even though it's using the same plan).  On both sides it
seemed easier to pass things through the same sort of techniques as
normal parameters; I couldn't find a way to use hooks that didn't
just make things uglier.


Hmph. You didn't actually use the same sort of techniques we use for 
normal parameters. You invented a new TsrCache registry, which marries 
the metadata for planning with the tuplestore itself. That's quite 
different from the way we deal with parameters (TsrCache is a misnomer, 
BTW; it's not a cache, but the primary source of information for the 
planner). And instead of passing parameters to the SPI calls 
individually, you invented SPI_register_tuplestore which affects all 
subsequent SPI calls.


To recap, this is how normal parameters work:

In the parse stage, you pass a ParserSetupHook function pointer to the 
parser. The parser calls the callback, which sets up more hooks in the 
ParseState struct: a column-ref hook and/or a param ref hook. The parser 
then proceeds to parse the query, and whenever it sees a reference to a 
column that it doesn't recognize, or a $n style parameter marker, it 
calls the column-ref or param-ref hook.


The column- or param-ref hook can return a Param node, indicating that 
the parameter's value will be supplied later, at execution time. The 
Param node contains a numeric ID for the parameter, and the type OID and 
other information needed to complete the parsing.


At execution time, you pass a ParamListInfo struct to the executor. It 
contains values for all the parameters. Alternatively, the values can be 
supplied lazily, by providing a param-fetch hook in the ParamListInfo 
struct. Whenever the executor needs the value of a parameter, and the 
ParamListInfo struct doesn't contain it, it calls the paramFetch hook 
which should fill it in ParamListInfo.
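The eager-versus-lazy split in that recap can be modeled outside PostgreSQL; the class and names below are invented for illustration and are not the real ParamListInfo API:

```python
# Toy model of ParamListInfo-style parameter lookup with an optional
# paramFetch-like hook for lazily supplied values.
class ParamList:
    def __init__(self, values=None, param_fetch=None):
        self.values = dict(values or {})  # eagerly supplied parameter values
        self.param_fetch = param_fetch    # lazy supplier, called on demand

    def get(self, param_id):
        if param_id not in self.values and self.param_fetch is not None:
            # The executor needs a value the struct doesn't contain:
            # call the hook, which fills it in.
            self.values[param_id] = self.param_fetch(param_id)
        return self.values[param_id]

params = ParamList(values={1: "eager"}, param_fetch=lambda pid: f"lazy-{pid}")
print(params.get(1), params.get(2))  # eager lazy-2
```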


Now, how do we make the tuplestores work similarly? Here's what I think 
we should do:


Add a new p_tableref_hook function pointer, similar to p_paramref_hook. 
Whenever the parser sees a RangeVar that it doesn't recognize (or 
actually, I think it should call it *before* resolving regular tables, 
but let's ignore that for now), it calls the p_tableref_hook. It can 
return a new RelationParam node (similar to regular Param), which 
contains a numeric ID for the table/tuplestore, as well as its tuple 
descriptor.


For the execution phase, add a new array of Tuplestorestates to 
ParamListInfo. Similar to the existing array of ParamExternalDatas.


The next question is how to pass the new hooks and tuplestores through 
the SPI interface. For prepared plans, the current SPI_prepare_params + 
SPI_execute_plan_with_paramlist functions work fine. However, there 
doesn't seem to be any way to do one-shot queries with a ParserSetupHook 
and ParamListInfo. That seems like an oversight, and would be nice to 
address that anyway.


PS. the copy/read/write functions for TuplestoreRelation in the patch 
are broken; TuplestoreRelation is not a subclass of Plan. (But if you go 
with the approach I'm advocating for, TuplestoreRelation in its current 
form will be gone)


- Heikki




Re: [HACKERS] delta relations in AFTER triggers

2014-09-23 Thread Robert Haas
On Tue, Sep 23, 2014 at 12:46 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 09/15/2014 05:25 PM, Kevin Grittner wrote:
 Now, how do we make the tuplestores work similarly? Here's what I think we
 should do:

 Add a new p_tableref_hook function pointer, similar to p_paramref_hook.
 Whenever the parser sees a RangeVar that it doesn't recognize (or actually,
 I think it should call it *before* resolving regular tables, but let's
 ignore that for now), it calls the p_tableref_hook. It can return a new
 RelationParam node (similar to regular Param), which contains a numeric ID
 for the table/tuplestore, as well as its tuple descriptor.

 For the execution phase, add a new array of Tuplestorestates to
 ParamListInfo. Similar to the existing array of ParamExternalDatas.

I haven't been following this issue closely, but this sounds like a
really nice design.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-09-23 Thread Florian Weimer
* Ants Aasma:

 CRC has exactly one hardware implementation in general purpose CPUs

I'm pretty sure that's not true.  Many general purpose CPUs have CRC
circuitry, and there must be some which also expose it as
instructions.

 and Intel has a patent on the techniques they used to implement
 it. The fact that AMD hasn't yet implemented this instruction shows
 that this patent is non-trivial to work around.

I think you're jumping to conclusions.  Intel and AMD have various
cross-licensing deals.  AMD faces other constraints which can make
implementing the instruction difficult.




Re: [HACKERS] Hide 'Execution time' in EXPLAIN (COSTS OFF)

2014-09-23 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sat, Sep 20, 2014 at 4:13 PM, Christoph Berg c...@df7cb.de wrote:
 Can we have EXPLAIN (timing off) in 9.4+ hide the Planning time
 line? That would even be backwards compatible with 9.x where it would
 be a no-op.

 I don't think that'll work because:

 /* check that timing is used with EXPLAIN ANALYZE */
 if (es.timing && !es.analyze)
     ereport(ERROR,
             (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
              errmsg("EXPLAIN option TIMING requires ANALYZE")));

It looks to me like that would complain about EXPLAIN (TIMING ON),
not the case Christoph is suggesting.  What he proposes seems a bit
odd and non-orthogonal, but we could make the code do it if we wanted.

regards, tom lane




Re: [HACKERS] delta relations in AFTER triggers

2014-09-23 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Sep 23, 2014 at 12:46 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 Now, how do we make the tuplestores work similarly? Here's what I think we
 should do:
 
 Add a new p_tableref_hook function pointer, similar to p_paramref_hook.
 Whenever the parser sees a RangeVar that it doesn't recognize (or actually,
 I think it should call it *before* resolving regular tables, but let's
 ignore that for now), it calls the p_tableref_hook. It can return a new
 RelationParam node (similar to regular Param), which contains a numeric ID
 for the table/tuplestore, as well as its tuple descriptor.
 
 For the execution phase, add a new array of Tuplestorestates to
 ParamListInfo. Similar to the existing array of ParamExternalDatas.

 I haven't been following this issue closely, but this sounds like a
 really nice design.

I'm on board with the parser hooks part of that.  I don't especially agree
with the idea of a new sub-structure for ParamListInfo: if you do that you
will need a whole bunch of new boilerplate infrastructure to allocate,
copy, and generally manage that structure, for darn little gain.  What I'd
suggest is just saying that some Params might have type INTERNAL with
Datum values that are pointers to tuplestores; then all you need to do is
remember which Param number has been assigned to the particular tuplestore
you want.  There is already precedent for that in the recursive CTE code,
IIRC.

regards, tom lane




Re: [HACKERS] Hide 'Execution time' in EXPLAIN (COSTS OFF)

2014-09-23 Thread Robert Haas
On Tue, Sep 23, 2014 at 1:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sat, Sep 20, 2014 at 4:13 PM, Christoph Berg c...@df7cb.de wrote:
 Can we have EXPLAIN (timing off) in 9.4+ hide the Planning time
 line? That would even be backwards compatible with 9.x where it would
 be a no-op.

 I don't think that'll work because:

 /* check that timing is used with EXPLAIN ANALYZE */
 if (es.timing && !es.analyze)
     ereport(ERROR,
             (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
              errmsg("EXPLAIN option TIMING requires ANALYZE")));

 It looks to me like that would complain about EXPLAIN (TIMING ON),
 not the case Christoph is suggesting.  What he proposes seems a bit
 odd and non-orthogonal, but we could make the code do it if we wanted.

Ah, right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] add modulo (%) operator to pgbench

2014-09-23 Thread Fabien COELHO



So my opinion is that this small modulo operator patch is both useful and
harmless, so it should be committed.


You've really failed to make that case --- in particular, AFAICS there is
not even consensus on the exact semantics that the operator should have.


There is. Basically, any definition that yields a positive remainder when
the divisor is positive is fine. The only one to avoid is the
dividend-signed version, which happens to be the one used by C and SQL, a
very unfortunate choice in both cases as soon as negative numbers are
involved.
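The difference between the two definitions is easy to demonstrate; Python's % is already the divisor-signed (floored) variant, while C and SQL truncate toward zero:

```python
# Truncated remainder, as in C and SQL: the result takes the dividend's sign.
def trunc_mod(a: int, b: int) -> int:
    return a - b * int(a / b)  # int() truncates toward zero

print(trunc_mod(-7, 5))  # -2 -- the C/SQL answer, negative remainder
print(-7 % 5)            # 3  -- floored modulo: positive for a positive divisor
```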



So I'm inclined to reject rather than put in something that may cause
surprises down the road.  The usefulness doesn't seem great enough to
take that risk.


If you reject it, you can also remove the gaussian and exponential random
distributions, which are near useless without a means to add a minimal
pseudo-random stage, for which (x * something) % size is a reasonable
approximation; hence the modulo submission.



The way forward, if we think there is enough value in it (I'm not
sure there is), would be to build enough expression infrastructure
so that we could inexpensively add both operators and functions.


The modulo patch is basically 10 lines of code, I would not call that 
expensive...


As I explained earlier, it would NOT be any different with an expression
infrastructure: you would have to add a lex rule for the operator, then a
yacc rule for the construction; the operator would need to be represented
in a data structure; and the executor would have to handle the case,
including errors. There is no way this would be less than 10 lines of
code. It would basically include the very same lines for the executor
part.



Then we could add a modulo operator with whatever semantics seem
most popular, and some function(s) for the other semantics, and
there would not be so much riding on choosing the right semantics.


The semantics are clear. I just chose the wrong one in the first patch :-)

--
Fabien.




[HACKERS] Replication identifiers, take 3

2014-09-23 Thread Andres Freund
Hi,

I've previously started two threads about replication identifiers. Check
http://archives.postgresql.org/message-id/20131114172632.GE7522%40alap2.anarazel.de
and
http://archives.postgresql.org/message-id/20131211153833.GB25227%40awork2.anarazel.de
.

They've also been discussed in the course of another thread:
http://archives.postgresql.org/message-id/20140617165011.GA3115%40awork2.anarazel.de


As the topic has garnered some heat and confusion, I thought it'd be
worthwhile to start afresh with an explanation of why I think they're
useful.

I don't really want to discuss implementation specifics for now,
but rather (details of) the concept. Once we've hashed those out,
I'll adapt the existing patch to match them.

There are three primary use cases for replication identifiers:
1) The ability to monitor how far replication has progressed in a
   crashsafe manner, to allow it to restart at the right point after
   errors/crashes.
2) Efficiently identify the origin of individual changes and
   transactions. In multimaster and some cascading scenarios it is
   necessary to do so to avoid sending out replayed changes again.
3) The ability to efficiently filter replayed changes out of logical
   decoding. It's currently possible to filter changes from inside the
   output plugin, but it's much more efficient to filter them out
   before decoding.

== Logical Decoding Background ==

To understand the need for 1) it's important to roughly understand how
logical decoding/walsender streams changes and handles feedback from
the receiving side. A walsender performing logical decoding
*continuously* sends out transactions. As long as there are new local
changes (i.e. unprocessed WAL) and the network buffers aren't full, it
will send changes *without* waiting for the client. Everything else
would lead to horrible latency.

Because it sends data without waiting for the client to have processed
it, the sender obviously can't remove resources that are needed to
stream the data out again. The client or network connection could
crash, after all.

To let the sender know when it can remove resources the receiver
regularly sends back 'feedback' messages acknowledging up to where
changes have been safely received. Whenever such a feedback message
arrives the sender can release resources that aren't needed to decode
the changes below that horizon.

When the receiver asks the server to stream changes out, it tells the
sender at which LSN it should start sending changes. All
*transactions* that *commit* after that LSN are sent out. Possibly
again.
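The streaming and feedback behaviour described above can be modelled roughly like this (a toy sketch; all names are invented, and durability is hand-waved):

```python
class ToyWalSender:
    """Toy model: stream without waiting, release resources on feedback."""
    def __init__(self, commits):
        self.commits = commits      # list of (commit_lsn, txn) pairs
        self.confirmed = 0          # horizon acked by the receiver

    def stream_from(self, start_lsn):
        # All transactions committing *after* start_lsn are sent out,
        # possibly again after a reconnect.
        return [txn for lsn, txn in self.commits if lsn > start_lsn]

    def feedback(self, lsn):
        # Receiver acks up to lsn; resources below it can be released.
        self.confirmed = max(self.confirmed, lsn)
```

After a crash the receiver simply calls stream_from() again with the last LSN it durably applied.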

== Crashsafe apply ==

Based on those explanations, when building a logical replication
solution on top of logical decoding, one must remember the latest
*remote* LSN that has already been replayed, so that, when the apply
process or the whole database crashes, it is possible to ask for all
changes since the last transaction that was successfully applied.

The trivial solution here is to have a table (remote_node,
last_replayed_lsn) and update it for every replayed
transaction. Unfortunately that doesn't perform very well because that
table quickly gets heavily bloated. It's also hard to avoid page level
contention when replaying transactions from multiple remote
nodes. Additionally, these changes have to be filtered out when
replicating them in a cascading fashion.

To do this more efficiently, there needs to be a crashsafe way to
record the latest successfully replayed remote transaction.
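In other words, the replay progress has to be recorded atomically with the apply itself; schematically (hypothetical names, durability hand-waved):

```python
class ToyApplyWorker:
    def __init__(self):
        self.progress = {}   # remote_node -> last replayed remote LSN

    def replay(self, remote_node, remote_lsn, apply_changes):
        # In the proposed design the remote LSN rides in the local
        # commit record, so data and progress commit atomically.
        apply_changes()
        self.progress[remote_node] = remote_lsn

    def restart_lsn(self, remote_node):
        # After a crash, ask the sender to resume from here.
        return self.progress.get(remote_node, 0)
```

The point of attaching the remote LSN to the commit record, rather than updating a heap table, is exactly that this update cannot bloat anything and cannot be torn from the transaction it describes.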


== Identify the origin of changes ==

Say you're building a replication solution that allows inserts into the
same table on two nodes. Ignoring conflict resolution and similar fun,
one needs to prevent the same change from being replayed over and
over. In logical replication the changes to the heap have to be WAL
logged, and thus the *replay* of changes from a remote node produces
WAL which will then be decoded again.

To avoid that it's very useful to tag individual changes/transactions
with their 'origin'. I.e. mark changes that have been directly
triggered by the user sending SQL as originating 'locally' and changes
originating from replaying another node's changes as originating
somewhere else.

If that origin is exposed to logical decoding output plugins they can
easily check whether to stream out the changes/transactions or not.


It is possible to do this by adding extra columns to every table and
storing the origin of a row there, but that a) permanently consumes
storage and b) makes things much more invasive.
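The origin check an output plugin would perform is essentially the following (a sketch; the field names and set-based representation are invented for illustration):

```python
LOCAL_ORIGIN = 0

def changes_to_stream(changes, peer_origins):
    # Forward local changes and changes from third-party origins, but
    # not changes that originally came from the peer we stream to --
    # otherwise that peer would replay its own changes again.
    return [c for c in changes if c["origin"] not in peer_origins]
```

Doing this filtering before decoding, rather than inside the output plugin, is what use case 3) in the list above is about.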


== Proposed solution ==

These two fundamental problems happen to have overlapping
requirements.

A rather efficient solution for 1) is to attach the 'origin node' and
the remote commit LSN to every local commit record produced by
replay. That allows maintaining a shared memory table (remote_node,
local_lsn, remote_lsn).

During replay, that table is kept up to date in sync with transaction
commits. If updated within the transaction commit's critical section
it's guaranteed to be correct, 

Re: [HACKERS] RLS feature has been committed

2014-09-23 Thread Gregory Smith

On 9/23/14, 9:00 AM, Andres Freund wrote:
I also think committers need to be much more careful when committing 
patches which they (or their employer) appear to have a business 
interest in. Rushing ahead to commit the patch of somebody 'unrelated' 
leaves a completely different taste than committing your colleague's 
patch. *INDEPENDENT* of the actual reasons and the state of the patch.


I haven't been doing much personal development work around here lately, 
but I did more than a little of the planning on how to handle this as 
responsibly as I felt it deserved.  I think this is worth talking about 
a little bit because this whole topic, how to deal with a funded project 
where the committer is also a contributor, isn't something a lot of 
people have visibility into. (Not talking about you, Andres, you know 
the deal)


This commit didn't happen until I first succeeded in getting Stephen 
Frost fully funded as a community PostgreSQL contributor (for this job, 
as well as others with a broader community appeal).  Everyone attached 
to the budget side very explicitly understands that includes an 
open-ended responsibility to the community for addressing fall-out from 
RLS, going from unforeseen issues to revisiting the things left on the 
known open items list.  It's hard to reach perfect before commit; 
eventually you just have to shoot and see what happens.


Just to be really safe, I also kicked off training a whole new 
PostgreSQL contributor (Adam Brightwell) five months ago, so that by the 
time we got to here he's also making his own contributions to the 
security code of the database.  And that's included exercises tracking 
down bugs in the early RLS POC deployments already going on here at 
Crunchy, so that Stephen isn't the sole Postgres community expertise 
bottleneck on the code even at this company.  (I am NOT on the hook for 
fixing RLS bugs)


The decision on when to commit was strictly Stephen's.  During our last 
chat, we agreed this seemed like an ideal time of year in the 
development cycle for such a thing to land though.  9.5 is a fresh 
branch, and there is plenty of time to clean up and/or revert if that's 
the unfortunate end for anything going in today.  Plus the ongoing 
CommitFest schedule means everyone in the community already is arranging 
to provide review resources needed in the near future pipeline.


I can understand that Robert feels a procedural sting and/or slight due 
to the exact sequence of events here, and I'm staying out of that.  But 
in general, I don't know what we could have done much better to be a 
responsible contributor, to finally push this long in development 
feature over the finish line.  A description like "rushing ahead" would 
feel inappropriate to me for this one, given how much care went into 
timing even roughly when this particular commit should happen.  The time 
of year in particular was very specifically aimed at for minimal 
disruption, based on both major version release dates and the related 
community development cycle around them.


--
Greg Smith greg.sm...@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-09-23 Thread Alvaro Herrera
Alvaro Herrera wrote:

 Heikki Linnakangas wrote:

  If you add a new datatype, and define b-tree operators for it, what
  is required to create a minmax opclass for it? Would it be possible
  to generalize the functions in brin_minmax.c so that they can be
  reused for any datatype (with b-tree operators) without writing any
  new C code? I think we're almost there; the only thing that differs
  between each data type is the opcinfo function. Let's pass the type
  OID as argument to the opcinfo function. You could then have just a
  single minmax_opcinfo function, instead of the macro to generate a
  separate function for each built-in datatype.
 
 Yeah, that's how I had that initially.  I changed it to what it's now as
 part of a plan to enable building cross-type opclasses, so you could
 have WHERE int8col=42 without requiring a cast of the constant to type
 int8.  This might have been a thinko, because AFAICS it's possible to
 build them with a constant opcinfo as well (I changed several other
 things to support this, as described in a previous email.)  I will look
 into this later.

I found out that we don't really throw errors in such cases anymore; we
insert casts instead.  Maybe there's a performance argument that it
might be better to use existing cross-type operators than casting, but
justifying this work just turned a lot harder.  Here's a patch that
reverts opcinfo into a generic function that receives the type OID.

I will look into adding some testing mechanism for the union support
proc; with that I will just consider the patch ready for commit and will
push.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
*** a/src/backend/access/brin/brin.c
--- b/src/backend/access/brin/brin.c
***
*** 886,894  brin_build_desc(Relation rel)
  
  		opcInfoFn = index_getprocinfo(rel, keyno + 1, BRIN_PROCNUM_OPCINFO);
  
- 		/* actually FunctionCall0 but we don't have that */
  		opcinfo[keyno] = (BrinOpcInfo *)
! 			DatumGetPointer(FunctionCall1(opcInfoFn, InvalidOid));
  		totalstored += opcinfo[keyno]->oi_nstored;
  	}
  
--- 886,894 
  
  		opcInfoFn = index_getprocinfo(rel, keyno + 1, BRIN_PROCNUM_OPCINFO);
  
  		opcinfo[keyno] = (BrinOpcInfo *)
! 			DatumGetPointer(FunctionCall1(opcInfoFn,
! 		  tupdesc->attrs[keyno]->atttypid));
  		totalstored += opcinfo[keyno]->oi_nstored;
  	}
  
*** a/src/backend/access/brin/brin_minmax.c
--- b/src/backend/access/brin/brin_minmax.c
***
*** 50,87  typedef struct MinmaxOpaque
  	bool		inited[MINMAX_NUM_PROCNUMS];
  } MinmaxOpaque;
  
! #define OPCINFO(typname, typoid)			\
! PG_FUNCTION_INFO_V1(minmaxOpcInfo_##typname);\
! Datum		\
! minmaxOpcInfo_##typname(PG_FUNCTION_ARGS)	\
! {			\
! 	BrinOpcInfo *result;	\
! 			\
! 	/*		\
! 	 * opaque->operators is initialized lazily, as indicated by 'inited'	\
! 	 * which is initialized to all false by palloc0.\
! 	 */		\
! 			\
! 	result = palloc0(MAXALIGN(SizeofBrinOpcInfo(2)) +		\
! 	 sizeof(MinmaxOpaque));	\
! 	result->oi_nstored = 2;	\
! 	result->oi_opaque = (MinmaxOpaque *)	\
! 		MAXALIGN((char *) result + SizeofBrinOpcInfo(2));	\
! 	result->oi_typids[0] = typoid;			\
! 	result->oi_typids[1] = typoid;			\
! 			\
! 	PG_RETURN_POINTER(result);\
! }
  
! OPCINFO(int4, INT4OID)
! OPCINFO(numeric, NUMERICOID)
! OPCINFO(text, TEXTOID)
! OPCINFO(time, TIMEOID)
! OPCINFO(timetz, TIMETZOID)
! OPCINFO(timestamp, TIMESTAMPOID)
! OPCINFO(timestamptz, TIMESTAMPTZOID)
! OPCINFO(date, DATEOID)
! OPCINFO(char, CHAROID)
  
  /*
   * Examine the given index tuple (which contains partial status of a certain
--- 50,77 
  	bool		inited[MINMAX_NUM_PROCNUMS];
  } MinmaxOpaque;
  
! PG_FUNCTION_INFO_V1(minmaxOpcInfo);
! Datum
! minmaxOpcInfo(PG_FUNCTION_ARGS)
! {
! 	Oid		typoid = PG_GETARG_OID(0);
! 	BrinOpcInfo *result;
! 
! 	/*
! 	 * opaque->operators is initialized lazily, as indicated by 'inited'
! 	 * which is initialized to all false by palloc0.
! 	 */
  
! 	result = palloc0(MAXALIGN(SizeofBrinOpcInfo(2)) +
! 	 sizeof(MinmaxOpaque));
! 	result->oi_nstored = 2;
! 	result->oi_opaque = (MinmaxOpaque *)
! 		MAXALIGN((char *) result + SizeofBrinOpcInfo(2));
! 	result->oi_typids[0] = typoid;
! 	result->oi_typids[1] = typoid;
! 
! 	PG_RETURN_POINTER(result);
! }
  
  /*
   * Examine the given index tuple (which contains partial status of a certain
*** a/src/include/catalog/pg_amproc.h
--- b/src/include/catalog/pg_amproc.h
***
*** 436,514  DATA(insert (	4017   25 25 5 4031 ));
  DATA(insert (   4054   23 23 1 3383 ));
  DATA(insert (   4054   23 23 2 3384 ));
  DATA(insert (   4054   23 23 3 3385 ));
! DATA(insert (   4054   23 23 4 3394 ));
  DATA(insert (   4054   23 23 5   66 ));
  

Re: [HACKERS] proposal: rounding up time value less than its unit.

2014-09-23 Thread Greg Stark
Fwiw I agree with TL2. The simplest, least surprising behaviour to explain
to users is to say we round to nearest and if that means we rounded to zero
(or another special value) we throw an error. We could list the minimum
value in pg_settings and maybe in documentation.

-- 
greg


Re: [HACKERS] add modulo (%) operator to pgbench

2014-09-23 Thread Heikki Linnakangas

On 09/23/2014 09:15 PM, Fabien COELHO wrote:

So I'm inclined to reject rather than put in something that may cause
surprises down the road.  The usefulness doesn't seem great enough to
take that risk.


Marked as Returned with feedback.


If you reject it, you can also remove the gaussian and exponential random
distribution, which is near useless without a means to add a minimal
pseudo-random stage, for which (x * something) % size is a reasonable
approximation, hence the modulo submission.


I'm confused. The gaussian and exponential patch was already committed. 
Are you saying that it's not actually useful, and should be reverted? 
That doesn't make sense to me; gaussian- and exponential-distributed 
values seem quite useful to me in test scripts.


I don't understand what that pseudo-random stage you're talking about 
is. Can you elaborate?
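Presumably the pseudo-random stage Fabien refers to is a multiplicative scatter of the form he quotes, something like the following (the constant and names are purely illustrative, not from any patch):

```python
def scatter(x, size, mult=2654435761):
    # Spreads clustered values (e.g. gaussian-distributed ids) across
    # the keyspace; it is a permutation of 0..size-1 whenever mult and
    # size are coprime, so no ids are lost or duplicated.
    return (x * mult) % size
```

This turns "mostly ids near the mean" into accesses spread over the whole table while keeping the skewed access *frequency* distribution.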


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2014-09-23 Thread Heikki Linnakangas

On 09/23/2014 08:51 PM, Tom Lane wrote:

Robert Haas robertmh...@gmail.com writes:

On Tue, Sep 23, 2014 at 12:46 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

Now, how do we make the tuplestores work similarly? Here's what I think we
should do:

Add a new p_tableref_hook function pointer, similar to p_paramref_hook.
Whenever the parser sees a RangeVar that it doesn't recognize (or actually,
I think it should call it *before* resolving regular tables, but let's
ignore that for now), it calls the p_tableref_hook. It can return a new
RelationParam node (similar to regular Param), which contains a numeric ID
for the table/tuplestore, as well as its tuple descriptor.

For the execution phase, add a new array of Tuplestorestates to
ParamListInfo. Similar to the existing array of ParamExternalDatas.



I haven't been following this issue closely, but this sounds like a
really nice design.


I'm on board with the parser hooks part of that.  I don't especially agree
with the idea of a new sub-structure for ParamListInfo: if you do that you
will need a whole bunch of new boilerplate infrastructure to allocate,
copy, and generally manage that structure, for darn little gain.  What I'd
suggest is just saying that some Params might have type INTERNAL with
Datum values that are pointers to tuplestores; then all you need to do is
remember which Param number has been assigned to the particular tuplestore
you want.  There is already precedent for that in the recursive CTE code,
IIRC.


Works for me.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RLS feature has been committed

2014-09-23 Thread Josh Berkus
On 09/22/2014 08:23 PM, Amit Kapila wrote:
 Who decides if the patch is adequately reviewed?
 
 Author, Committer or Reviewer?  In CF, that is comparatively clear
 that once Reviewer is satisfied, he marks the patch as
 Ready For Committer and then Committer picks up and if he is satisfied
 with the review and quality of patch, then he commits the patch or if the
 Committer himself is reviewer than in many cases once he is satisfied,
 he commits it.

Well, outside of the CF process, it becomes up to the committer to get
adequate review of the patch so it can be committed, and adequate
review in one of those cases generally means another committer who
didn't work on the patch previously.  It's also standard for our
committers, up to and including Tom Lane, to say things like "I'm going
to commit this in 24 hours if nobody objects further."

 Now in this case, if I understand correctly the story is not
 so straightforward.  It seems to me Robert as the Reviewer was not
 completely satisfied where as Stephen as the Committer was pretty
 much okay with the patch so he went ahead and commits it.

That's certainly what it looks like to me, and on that basis Stephen
should have held back the patch until he got reviewer OK.

Fortunately, since we use Git and not CVS, reverting patches isn't the
trauma it once was.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hide 'Execution time' in EXPLAIN (COSTS OFF)

2014-09-23 Thread Christoph Berg
Re: Tom Lane 2014-09-23 15155.1411493...@sss.pgh.pa.us
 Robert Haas robertmh...@gmail.com writes:
  On Sat, Sep 20, 2014 at 4:13 PM, Christoph Berg c...@df7cb.de wrote:
  Can we have EXPLAIN (timing off) in 9.4+ hide the Planning time
  line? That would even be backwards compatible with 9.x where it would
  be a no-op.
 
  I don't think that'll work because:
 
  /* check that timing is used with EXPLAIN ANALYZE */
  if (es.timing && !es.analyze)
  ereport(ERROR,
  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
   errmsg("EXPLAIN option TIMING "
  "requires ANALYZE")));
 
 It looks to me like that would complain about EXPLAIN (TIMING ON),
 not the case Christoph is suggesting.  What he proposes seems a bit
 odd and non-orthogonal, but we could make the code do it if we wanted.

I don't think this warrants a new flag, and TIMING OFF seems to be the
right naming for it. (In fact it was the first I tried, and I was
cursing quite a bit over the lack of configurability until I realized
that COSTS OFF disabled the planning time display as well.) It might
be a bit odd, but it's easy to remember.

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling shared buffer eviction

2014-09-23 Thread Robert Haas
On Tue, Sep 23, 2014 at 10:55 AM, Robert Haas robertmh...@gmail.com wrote:
 But this gets at another point: the way we're benchmarking this right
 now, we're really conflating the effects of three different things:

 1. Changing the locking regimen around the freelist and clocksweep.
 2. Adding a bgreclaimer process.
 3. Raising the number of buffer locking partitions.

I did some more experimentation on this.  Attached is a patch that
JUST does #1, and, as previously suggested, it uses a single spinlock
instead of using two of them that are probably in the same cacheline.
Without this patch, on a 32-client, read-only pgbench at scale factor
1000 and shared_buffers=8GB, perf -e cs -g -a has this to say about
LWLock-inspired preemption:

 - LWLockAcquire
+ 68.41% ReadBuffer_common
+ 31.59% StrategyGetBuffer

With the patch, unsurprisingly, StrategyGetBuffer disappears and the
only lwlocks that are causing context-switches are the individual
buffer locks.  But are we suffering spinlock contention instead as a
result?   Yes, but not much.  s_lock is at 0.41% in the corresponding
profile, and only 6.83% of those calls are from the patched
StrategyGetBuffer.  In a similar profile of master, s_lock is at
0.31%.  So there's a bit of additional s_lock contention, but I think
it's considerably less than the contention over the lwlock it's
replacing, because the spinlock is only held for the minimal amount of
time needed, whereas the lwlock could be held across taking and
releasing many individual buffer locks.
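The pattern being described — hold the lock only for the handful of instructions that advance shared state, never across buffer locking or I/O — looks roughly like this (a sketch, using a plain mutex where the backend uses a spinlock):

```python
import threading

class ToyClockHand:
    def __init__(self, nbuffers):
        self.lock = threading.Lock()
        self.next_victim = 0
        self.nbuffers = nbuffers

    def advance(self):
        # The only work done under the lock is the increment; buffer
        # inspection and pinning happen after the lock is released.
        with self.lock:
            victim = self.next_victim
            self.next_victim = (self.next_victim + 1) % self.nbuffers
        return victim
```

Because the critical section is a few instructions, contention stays low even with many backends hammering it, which matches the small s_lock numbers above.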

TPS results are a little higher with the patch - these are alternating
5 minute runs:

master tps = 176010.647944 (including connections establishing)
master tps = 176615.291149 (including connections establishing)
master tps = 175648.370487 (including connections establishing)
reduce-replacement-locking tps = 177888.734320 (including connections
establishing)
reduce-replacement-locking tps = 177797.842410 (including connections
establishing)
reduce-replacement-locking tps = 177894.822656 (including connections
establishing)

The picture is similar at 64 clients, but the benefit is a little more:

master tps = 179037.231597 (including connections establishing)
master tps = 180500.937068 (including connections establishing)
master tps = 181565.706514 (including connections establishing)
reduce-replacement-locking tps = 185741.503425 (including connections
establishing)
reduce-replacement-locking tps = 188598.728062 (including connections
establishing)
reduce-replacement-locking tps = 187340.977277 (including connections
establishing)

What's interesting is that I can't see in the perf output any real
sign that the buffer mapping locks are slowing things down, but they
clearly are, because when I take this patch and also boost
NUM_BUFFER_PARTITIONS to 128, the performance goes up:

reduce-replacement-locking-128 tps = 251001.812843 (including
connections establishing)
reduce-replacement-locking-128 tps = 247368.925927 (including
connections establishing)
reduce-replacement-locking-128 tps = 250775.304177 (including
connections establishing)

The performance also goes up if I do that on master, but the effect is
substantially less:

master-128 tps = 219301.492902 (including connections establishing)
master-128 tps = 219786.249076 (including connections establishing)
master-128 tps = 219821.220271 (including connections establishing)

I think this shows pretty clearly that, even without the bgreclaimer,
there's a lot of merit in getting rid of BufFreelistLock and using a
spinlock held for the absolutely minimal number of instructions
instead.  There's already some benefit without doing anything about
the buffer mapping locks, and we'll get a lot more benefit once that
issue is addressed.  I think we need to do some serious study to see
whether bgreclaimer is even necessary, because it looks like this
change alone, which is much simpler, takes a huge bite out of the
scalability problem.

I welcome further testing and comments, but my current inclination is
to go ahead and push the attached patch.  To my knowledge, nobody has
at any point objected to this aspect of what's being proposed, and it
looks safe to me and seems to be a clear win.  We can then deal with
the other issues on their own merits.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
commit df4077cda2eae3eb4a5cf387da0c1e7616e73204
Author: Robert Haas rh...@postgresql.org
Date:   Mon Sep 22 16:42:14 2014 -0400

Remove volatile qualifiers from lwlock.c.

Now that spinlocks (hopefully!) act as compiler barriers, as of commit
0709b7ee72e4bc71ad07b7120acd117265ab51d0, this should be safe.  This
serves as a demonstration of the new coding style, and may be optimized
better on some machines as well.

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 7c96da5..66fb2e4 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ 

Re: [HACKERS] RLS feature has been committed

2014-09-23 Thread Stephen Frost
All,

  Robert and I had a great discussion on the phone where we were both
  able to voice our concerns and feelings about RLS being pushed.  By
  way of summary, we agree that it was pushed ahead of its time and that
  it should have waited for at least another review, which likely would
  have happened even prior to the October commitfest.

  I've apologized for jumping the gun on pushing it, it wasn't the best
  move for the project as a whole and I realize that now.  Having other
  committers review and +1 a large patch before moving forward with it
  is important, even if there is already consensus on the overall
  design, as it will help minimize the bugs and issues in PostgreSQL.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] proposal: rounding up time value less than its unit.

2014-09-23 Thread David Johnston
On Tue, Sep 23, 2014 at 3:05 PM, Greg Stark st...@mit.edu wrote:

 Fwiw I agree with TL2. The simplest, least surprising behaviour to explain
 to users is to say we round to nearest and if that means we rounded to zero
 (or another special value) we throw an error. We could list the minimum
 value in pg_settings and maybe in documentation.

I'm not sure TL2 would agree that you are agreeing with him...

To the other point, the minimum unit-less value is 1.  The unit that is
applied is already listed in pg_settings and the documentation.  While 0
is allowed, it is more of a flag than a value.

David J.


Re: [HACKERS] Scaling shared buffer eviction

2014-09-23 Thread Robert Haas
On Tue, Sep 23, 2014 at 4:29 PM, Robert Haas robertmh...@gmail.com wrote:
 I did some more experimentation on this.  Attached is a patch that
 JUST does #1, and, ...

...and that was the wrong patch.  Thanks to Heikki for pointing that out.

Second try.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/storage/buffer/README b/src/backend/storage/buffer/README
index 1fd38d0..a4ebbcc 100644
--- a/src/backend/storage/buffer/README
+++ b/src/backend/storage/buffer/README
@@ -125,14 +125,12 @@ bits of the tag's hash value.  The rules stated above apply to each partition
 independently.  If it is necessary to lock more than one partition at a time,
 they must be locked in partition-number order to avoid risk of deadlock.
 
-* A separate system-wide LWLock, the BufFreelistLock, provides mutual
+* A separate system-wide spinlock, buffer_strategy_lock, provides mutual
 exclusion for operations that access the buffer free list or select
-buffers for replacement.  This is always taken in exclusive mode since
-there are no read-only operations on those data structures.  The buffer
-management policy is designed so that BufFreelistLock need not be taken
-except in paths that will require I/O, and thus will be slow anyway.
-(Details appear below.)  It is never necessary to hold the BufMappingLock
-and the BufFreelistLock at the same time.
+buffers for replacement.  A spinlock is used here rather than a lightweight
+lock for efficiency; no other locks of any sort should be acquired while
+buffer_strategy_lock is held.  This is essential to allow buffer replacement
+to happen in multiple backends with reasonable concurrency.
 
 * Each buffer header contains a spinlock that must be taken when examining
 or changing fields of that buffer header.  This allows operations such as
@@ -165,7 +163,7 @@ consider their pages unlikely to be needed soon; however, the current
 algorithm never does that.  The list is singly-linked using fields in the
 buffer headers; we maintain head and tail pointers in global variables.
 (Note: although the list links are in the buffer headers, they are
-considered to be protected by the BufFreelistLock, not the buffer-header
+considered to be protected by the buffer_strategy_lock, not the buffer-header
 spinlocks.)  To choose a victim buffer to recycle when there are no free
 buffers available, we use a simple clock-sweep algorithm, which avoids the
 need to take system-wide locks during common operations.  It works like
@@ -178,25 +176,26 @@ buffer reference count, so it's nearly free.)
 
 The clock hand is a buffer index, nextVictimBuffer, that moves circularly
 through all the available buffers.  nextVictimBuffer is protected by the
-BufFreelistLock.
+buffer_strategy_lock.
 
 The algorithm for a process that needs to obtain a victim buffer is:
 
-1. Obtain BufFreelistLock.
+1. Obtain buffer_strategy_lock.
 
-2. If buffer free list is nonempty, remove its head buffer.  If the buffer
-is pinned or has a nonzero usage count, it cannot be used; ignore it and
-return to the start of step 2.  Otherwise, pin the buffer, release
-BufFreelistLock, and return the buffer.
+2. If buffer free list is nonempty, remove its head buffer.  Release
+buffer_strategy_lock.  If the buffer is pinned or has a nonzero usage count,
+it cannot be used; ignore it and go back to step 1.  Otherwise, pin the buffer,
+and return it.
 
-3. Otherwise, select the buffer pointed to by nextVictimBuffer, and
-circularly advance nextVictimBuffer for next time.
+3. Otherwise, the buffer free list is empty.  Select the buffer pointed to by
+nextVictimBuffer, and circularly advance nextVictimBuffer for next time.
+Release buffer_strategy_lock.
 
 4. If the selected buffer is pinned or has a nonzero usage count, it cannot
-be used.  Decrement its usage count (if nonzero) and return to step 3 to
-examine the next buffer.
+be used.  Decrement its usage count (if nonzero), reacquire
+buffer_strategy_lock, and return to step 3 to examine the next buffer.
 
-5. Pin the selected buffer, release BufFreelistLock, and return the buffer.
+5. Pin the selected buffer, and return.
 
 (Note that if the selected buffer is dirty, we will have to write it out
 before we can recycle it; if someone else pins the buffer meanwhile we will
@@ -259,7 +258,7 @@ dirty and not pinned nor marked with a positive usage count.  It pins,
 writes, and releases any such buffer.
 
 If we can assume that reading nextVictimBuffer is an atomic action, then
-the writer doesn't even need to take the BufFreelistLock in order to look
+the writer doesn't even need to take buffer_strategy_lock in order to look
 for buffers to write; it needs only to spinlock each buffer header for long
 enough to check the dirtybit.  Even without that assumption, the writer
 only needs to take the lock long enough to read the variable value, not
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 
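For reference, the victim-selection steps from the README hunk above can be simulated as follows (a toy model, not backend code; pinning is simplified to a set, and locking is omitted):

```python
class ToyBufferPool:
    def __init__(self, usage_counts, freelist=()):
        self.usage = list(usage_counts)   # usage count per buffer
        self.freelist = list(freelist)    # known-free buffer indexes
        self.pinned = set()
        self.next_victim = 0              # the clock hand

    def get_victim(self):
        # Steps 1-2: try the free list first.
        while self.freelist:
            buf = self.freelist.pop(0)
            if buf not in self.pinned and self.usage[buf] == 0:
                self.pinned.add(buf)
                return buf
        # Steps 3-5: clock sweep, decaying usage counts as we go.
        while True:
            buf = self.next_victim
            self.next_victim = (self.next_victim + 1) % len(self.usage)
            if buf in self.pinned:
                continue
            if self.usage[buf] == 0:
                self.pinned.add(buf)
                return buf
            self.usage[buf] -= 1          # step 4
```

In the patched backend, buffer_strategy_lock is held only around the free-list pop and the hand advance, and released before the buffer is examined and pinned.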

Re: [HACKERS] proposal: rounding up time value less than its unit.

2014-09-23 Thread Peter Eisentraut
On 9/23/14 10:29 AM, Tom Lane wrote:
 Votes or no votes, that's a horrible idea; it breaks the design goal
 that users shouldn't need to remember the precise unit size when making
 postgresql.conf entries.

I think this is not historically correct.

The original motivation was that you had to remember what the units on

shared_buffers = 37

were, and that it was different units than

work_mem = 37


That's independent of the question whether

shared_buffers = 250kB

might be rejected or not.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: rounding up time value less than its unit.

2014-09-23 Thread Peter Eisentraut
On 9/23/14 1:13 AM, Stephen Frost wrote:
 To clarify- we'll simply swap from (essentially) floor() to ceil() for
 handling all GUC_with_unit to internal_unit conversions, document that,
 and note it in the release notes as a possible behavior change and move
 on.

That'll probably work, as long as we don't invent any other-than-zero
values that are somehow treated as special.

One might be able to construct a case where something gets rounded to -1
when it didn't before or the other way around, but that's clearly a
silly case.
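The floor-to-ceil change being discussed amounts to switching the integer conversion, e.g. (a sketch; all GUC machinery omitted, integer arithmetic used to avoid float rounding):

```python
def convert_setting(value, from_unit, to_unit):
    # Round *up* so a small nonzero setting can never silently collapse
    # to 0, which for many GUCs is a special "disabled" value.
    return (value * from_unit + to_unit - 1) // to_unit
```

For example, 250kB expressed in 8kB pages becomes 32 rather than 31, and a 500ms value for a GUC whose unit is seconds becomes 1 rather than 0.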



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2014-09-23 Thread Kevin Grittner
Thanks for reviewing this.  I will spend some time looking at your
recommendations in detail and seeing what it would take to
implement them, and whether I agree that is better; but I wanted to
point out a couple things regarding the SPI interface where I'm not
sure you realize what's currently being done.  I may want to argue
some of the rest if I don't agree after more detailed review; this
is just what jumps out on a first pass.

Heikki Linnakangas hlinnakan...@vmware.com wrote:

 instead of passing parameters to the SPI calls individually, you
 invented SPI_register_tuplestore which affects all subsequent SPI
 calls.

All subsequent SPI calls on that particular SPI connection until it
is closed, except for any tuplestores that are later unregistered.
Nested SPI connections do not automatically inherit the named
tuplestores; whatever code opens an SPI connection would need to
register them for the new context, if desired.  This seemed to me
to provide minimal disruption to the existing SPI callers who might
want to use this.

 The next question is how to pass the new hooks and tuplestores
 through the SPI interface. For prepared plans, the current
 SPI_prepare_params + SPI_execute_plan_with_paramlist functions
 work fine.

They work fine, I guess, in the *one* place they are used.
SPI_prepare_params() is called exactly *once* from plpgsql's
pl_exec.c, and SPI_execute_plan_with_paramlist() is called twice
from the same file.  There are no other calls to either from
anywhere else in the code base.

 However, there doesn't seem to be any way to do one-shot queries
 with a ParserSetupHook and ParamListInfo. That seems like an
 oversight, and it would be nice to address that anyway.

There are dozens of SPI_prepare* and SPI_exec* calls scattered all
over the backend, pl, and contrib code which appear to get along
fine without that.  Partly it may be because it involves something
of a modularity violation; the comment for the function used for
this call (where it *is* used) says:

/*
 * plpgsql_parser_setup set up parser hooks for dynamic parameters
 *
 * Note: this routine, and the hook functions it prepares for, are logically
 * part of plpgsql parsing.  But they actually run during function execution,
 * when we are ready to evaluate a SQL query or expression that has not
 * previously been parsed and planned.
 */

Can you clarify what benefit you see to modifying the SPI API the
way you suggest, and what impact it might have on existing calling
code?

 PS. the copy/read/write functions for TuplestoreRelation in the
 patch are broken; TuplestoreRelation is not a subclass of Plan.

I'm not sure what you mean by broken -- could you elaborate?  It
is, in a lot of ways, parallel to the CommonTableExpr defined a
little above it in the parsenodes.h file -- a relation which can
only be referenced by unqualified name.  It is after CTEs in
search order, and before anything else.  Having such a node in the
tree after parse analysis allows a number of special cases to be
handled pretty much the same as they are for CTEs, which seems like
a good thing to me.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] better atomics - v0.6

2014-09-23 Thread Oskari Saarenmaa

23.09.2014, 15:18, Andres Freund wrote:

On 2014-09-23 13:50:28 +0300, Oskari Saarenmaa wrote:

23.09.2014, 00:01, Andres Freund wrote:

The patches:
0001: The actual atomics API


I tried building PG on Solaris 10/Sparc using GCC 4.9.0 (buildfarm animal
dingo) with this patch but regression tests failed due to:


Btw, if you could try sun studio it'd be great. I wrote the support for
it blindly, and I'd be surprised if I got it right on the first try.


I just installed Solaris Studio 12.3 and tried compiling this:

../../../../src/include/port/atomics/generic-sunpro.h, line 54: return value type mismatch
../../../../src/include/port/atomics/generic-sunpro.h, line 77: return value type mismatch
../../../../src/include/port/atomics/generic-sunpro.h, line 79: #if-less #endif
../../../../src/include/port/atomics/generic-sunpro.h, line 81: #if-less #endif


atomic_add_64 and atomic_add_32 don't return anything (the 
atomic_add_*_nv variants return the new value) and there were a few 
extra #endifs.  Regression tests pass after applying the attached patch 
which defines PG_HAS_ATOMIC_ADD_FETCH_U32.


There doesn't seem to be a fallback for defining pg_atomic_fetch_add
based on pg_atomic_add_fetch, so pg_atomic_fetch_add now gets implemented
using pg_atomic_compare_exchange.


Also, it's not possible to compile PG with FORCE_ATOMICS_BASED_SPINLOCKS 
with these patches:


../../../../src/include/storage/s_lock.h, line 868: #error: PostgreSQL does not have native spinlock support on this platform


atomics/generic.h would implement atomic flags using operations exposed 
by atomics/generic-sunpro.h, but atomics/fallback.h is included before 
it and it defines functions for flag operations which s_lock.h doesn't 
want to use.


/ Oskari
From 42a5bbbab0c8f42c6014ebe12c9963b371168866 Mon Sep 17 00:00:00 2001
From: Oskari Saarenmaa o...@ohmu.fi
Date: Tue, 23 Sep 2014 23:53:46 +0300
Subject: [PATCH] atomics: fix atomic add for sunpro and drop invalid #endifs

Solaris Studio compiler has an atomic add operation that returns the new
value, the one with no _nv suffix doesn't return anything.
---
 src/include/port/atomics/generic-sunpro.h | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/src/include/port/atomics/generic-sunpro.h b/src/include/port/atomics/generic-sunpro.h
index 10ac70d..fedc099 100644
--- a/src/include/port/atomics/generic-sunpro.h
+++ b/src/include/port/atomics/generic-sunpro.h
@@ -47,14 +47,12 @@ pg_atomic_compare_exchange_u32_impl(volatile pg_atomic_uint32 *ptr,
 	return ret;
 }
 
-#define PG_HAS_ATOMIC_FETCH_ADD_U32
+#define PG_HAS_ATOMIC_ADD_FETCH_U32
 STATIC_IF_INLINE uint32
-pg_atomic_fetch_add_u32_impl(volatile pg_atomic_uint32 *ptr, int32 add_)
+pg_atomic_add_fetch_u32_impl(volatile pg_atomic_uint32 *ptr, int32 add_)
 {
-	return atomic_add_32(&ptr->value, add_);
+	return atomic_add_32_nv(&ptr->value, add_);
 }
-#endif
-
 
 #define PG_HAS_ATOMIC_COMPARE_EXCHANGE_U64
 static inline bool
@@ -70,12 +68,11 @@ pg_atomic_compare_exchange_u64_impl(volatile pg_atomic_uint64 *ptr,
 	return ret;
 }
 
-#define PG_HAS_ATOMIC_FETCH_ADD_U64
+#define PG_HAS_ATOMIC_ADD_FETCH_U64
 STATIC_IF_INLINE uint64
-pg_atomic_fetch_add_u64_impl(volatile pg_atomic_uint64 *ptr, int64 add_)
+pg_atomic_add_fetch_u64_impl(volatile pg_atomic_uint64 *ptr, int64 add_)
 {
-	return atomic_add_64(&ptr->value, add_);
+	return atomic_add_64_nv(&ptr->value, add_);
 }
-#endif
 
 #endif
-- 
1.8.4.1




Re: [HACKERS] proposal: rounding up time value less than its unit.

2014-09-23 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On 9/23/14 10:29 AM, Tom Lane wrote:
 Votes or no votes, that's a horrible idea; it breaks the design goal
 that users shouldn't need to remember the precise unit size when making
 postgresql.conf entries.

 I think this is not historically correct.

 The original motivation was that you had to remember what the units on
 shared_buffers = 37
 were, and that it was different units than
 work_mem = 37

Right, but the issue of not requiring the specified value to be an exact
multiple of the underlying unit did come up in the early discussion,
and we were pretty clear that we didn't want to throw an error for that:

http://www.postgresql.org/message-id/flat/29818.1153976...@sss.pgh.pa.us#29818.1153976...@sss.pgh.pa.us

regards, tom lane




Re: [HACKERS] Scaling shared buffer eviction

2014-09-23 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Sep 23, 2014 at 4:29 PM, Robert Haas robertmh...@gmail.com wrote:
 I did some more experimentation on this.  Attached is a patch that
 JUST does #1, and, ...

 ...and that was the wrong patch.  Thanks to Heikki for pointing that out.
 Second try.

But the results you gave in the previous message were correctly
attributed?

regards, tom lane




Re: [HACKERS] Scaling shared buffer eviction

2014-09-23 Thread Robert Haas
On Tue, Sep 23, 2014 at 5:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Sep 23, 2014 at 4:29 PM, Robert Haas robertmh...@gmail.com wrote:
 I did some more experimentation on this.  Attached is a patch that
 JUST does #1, and, ...

 ...and that was the wrong patch.  Thanks to Heikki for pointing that out.
 Second try.

 But the results you gave in the previous message were correctly
 attributed?

The patch I attached the first time was just the last commit in the
git repository where I wrote the patch, rather than the changes that I
made on top of that commit.  So, yes, the results from the previous
message are with the patch attached to the follow-up.  I just typed
the wrong git command when attempting to extract that patch to attach
it to the email.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Scaling shared buffer eviction

2014-09-23 Thread Gregory Smith

On 9/23/14, 10:31 AM, Robert Haas wrote:

I suggest we count these things:

1. The number of buffers the reclaimer has put back on the free list.
2. The number of times a backend has run the clocksweep.
3. The number of buffers past which the reclaimer has advanced the 
clock sweep (i.e. the number of buffers it had to examine in order to 
reclaim the number counted by #1).
4. The number of buffers past which a backend has advanced the 
clocksweep (i.e. the number of buffers it had to examine in order to 
allocate the number of buffers counted by #3).
5. The number of buffers allocated from the freelist which the backend 
did not use because they'd been touched (what you're calling 
buffers_touched_freelist).


All sound reasonable.  To avoid wasting time here, I think it's only 
worth doing all of these as DEBUG level messages for now.  Then only go 
through the overhead of exposing the ones that actually seem relevant.  
That's what I did for the 8.3 work, and most of that data at this level 
was barely relevant to anyone but me then or since. We don't want the 
system views to include so much irrelevant trivia that finding the 
important parts becomes overwhelming.


I'd like to see that level of instrumentation--just the debug level 
messages--used to quantify the benchmarks that people are running 
already, to prove they are testing what they think they are.  That would 
have caught the test error you already stumbled on for example.  Simple 
paranoia says there may be more issues like that hidden in here 
somewhere, and this set you've identified should find any more of them 
around.


If all that matches up so the numbers for the new counters seem sane, I 
think that's enough to commit the first basic improvement here.  Then 
make a second pass over exposing the useful internals that let everyone 
quantify either the improvement or things that may need to be tunable.


--
Greg Smith greg.sm...@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/




Re: [HACKERS] better atomics - v0.6

2014-09-23 Thread Andres Freund
On 2014-09-24 00:27:25 +0300, Oskari Saarenmaa wrote:
 23.09.2014, 15:18, Andres Freund wrote:
 On 2014-09-23 13:50:28 +0300, Oskari Saarenmaa wrote:
 23.09.2014, 00:01, Andres Freund wrote:
 The patches:
 0001: The actual atomics API
 
 I tried building PG on Solaris 10/Sparc using GCC 4.9.0 (buildfarm animal
 dingo) with this patch but regression tests failed due to:
 
 Btw, if you could try sun studio it'd be great. I wrote the support for
 it blindly, and I'd be surprised if I got it right on the first try.
 
 I just installed Solaris Studio 12.3 and tried compiling this:

Cool.

 ../../../../src/include/port/atomics/generic-sunpro.h, line 54: return value type mismatch
 ../../../../src/include/port/atomics/generic-sunpro.h, line 77: return value type mismatch
 ../../../../src/include/port/atomics/generic-sunpro.h, line 79: #if-less #endif
 ../../../../src/include/port/atomics/generic-sunpro.h, line 81: #if-less #endif
 
 atomic_add_64 and atomic_add_32 don't return anything (the atomic_add_*_nv
 variants return the new value) and there were a few extra #endifs.
 Regression tests pass after applying the attached patch which defines
 PG_HAS_ATOMIC_ADD_FETCH_U32.

Thanks for the fixes!

Hm. I think then it's better to simply not implement addition and rely
on cmpxchg.

 Also, it's not possible to compile PG with FORCE_ATOMICS_BASED_SPINLOCKS
 with these patches:
 
 ../../../../src/include/storage/s_lock.h, line 868: #error: PostgreSQL does not have native spinlock support on this platform
 
 atomics/generic.h would implement atomic flags using operations exposed by
 atomics/generic-sunpro.h, but atomics/fallback.h is included before it and
 it defines functions for flag operations which s_lock.h doesn't want to use.

Right. It should check not just for flag support, but also for 32bit
atomics. Easily fixable.

I'll send a new version with your fixes incorporated tomorrow.

Thanks for looking into this! Very helpful.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Scaling shared buffer eviction

2014-09-23 Thread Andres Freund
On 2014-09-23 16:29:16 -0400, Robert Haas wrote:
 On Tue, Sep 23, 2014 at 10:55 AM, Robert Haas robertmh...@gmail.com wrote:
  But this gets at another point: the way we're benchmarking this right
  now, we're really conflating the effects of three different things:
 
  1. Changing the locking regimen around the freelist and clocksweep.
  2. Adding a bgreclaimer process.
  3. Raising the number of buffer locking partitions.
 
 I did some more experimentation on this.  Attached is a patch that
 JUST does #1, and, as previously suggested, it uses a single spinlock
 instead of using two of them that are probably in the same cacheline.
 Without this patch, on a 32-client, read-only pgbench at scale factor
 1000 and shared_buffers=8GB, perf -e cs -g -a has this to say about
 LWLock-inspired preemption:
 
  - LWLockAcquire
 + 68.41% ReadBuffer_common
 + 31.59% StrategyGetBuffer
 
 With the patch, unsurprisingly, StrategyGetBuffer disappears and the
 only lwlocks that are causing context-switches are the individual
 buffer locks.  But are we suffering spinlock contention instead as a
 result?   Yes, but not much.  s_lock is at 0.41% in the corresponding
 profile, and only 6.83% of those calls are from the patched
 StrategyGetBuffer.  In a similar profile of master, s_lock is at
 0.31%.  So there's a bit of additional s_lock contention, but I think
 it's considerably less than the contention over the lwlock it's
 replacing, because the spinlock is only held for the minimal amount of
 time needed, whereas the lwlock could be held across taking and
 releasing many individual buffer locks.

Am I understanding you correctly that you also measured context switches
for spinlocks? If so, I don't think that's a valid comparison. LWLocks
explicitly yield the CPU as soon as there's any contention while
spinlocks will, well, spin. Sure they also go to sleep, but it'll take
longer.

It's also worthwile to remember in such comparisons that lots of the
cost of spinlocks will be in the calling routines, not s_lock() - the
first TAS() is inlined into it. And that's the one that'll incur cache
misses and such...

Note that I'm explicitly *not* doubting the use of a spinlock
itself. Given the short acquisition times and the exclusive use of
exclusive acquisition, a spinlock makes more sense. The lwlock's spinlock
alone will have about as much contention.

I think it might be possible to construct some cases where the spinlock
performs worse than the lwlock. But I think those will be clearly in the
minority. And at least some of those will be fixed by bgwriter. As an
example consider a single backend COPY ... FROM of large files with a
relatively large s_b. That's a seriously bad workload for the current
victim buffer selection because frequently most of the needs to be
searched for a buffer with usagecount 0. And this patch will double the
amount of atomic ops during that.

Let me try to quantify that.

 What's interesting is that I can't see in the perf output any real
 sign that the buffer mapping locks are slowing things down, but they
 clearly are, because when I take this patch and also boost
 NUM_BUFFER_PARTITIONS to 128, the performance goes up:

What events did you profile?

 I welcome further testing and comments, but my current inclination is
 to go ahead and push the attached patch.  To my knowledge, nobody has
 at any point objected to this aspect of what's being proposed, and it
 looks safe to me and seems to be a clear win.  We can then deal with
 the other issues on their own merits.

I've taken a look at this, and all the stuff I saw that I disliked was
there before this patch. So +1 for going ahead.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Scaling shared buffer eviction

2014-09-23 Thread Robert Haas
On Tue, Sep 23, 2014 at 6:02 PM, Gregory Smith gregsmithpg...@gmail.com wrote:
 On 9/23/14, 10:31 AM, Robert Haas wrote:
 I suggest we count these things:

 1. The number of buffers the reclaimer has put back on the free list.
 2. The number of times a backend has run the clocksweep.
 3. The number of buffers past which the reclaimer has advanced the clock
 sweep (i.e. the number of buffers it had to examine in order to reclaim the
 number counted by #1).
 4. The number of buffers past which a backend has advanced the clocksweep
 (i.e. the number of buffers it had to examine in order to allocate the
 number of buffers counted by #3).
 5. The number of buffers allocated from the freelist which the backend did
 not use because they'd been touched (what you're calling
 buffers_touched_freelist).

 All sound reasonable.  To avoid wasting time here, I think it's only worth
 doing all of these as DEBUG level messages for now.  Then only go through
 the overhead of exposing the ones that actually seem relevant.  That's what
 I did for the 8.3 work, and most of that data at this level was barely
 relevant to anyone but me then or since. We don't want the system views to
 include so much irrelevant trivia that finding the important parts becomes
 overwhelming.

I think we expose far too little information in our system views.
Just to take one example, we expose no useful information about lwlock
acquire or release, but a lot of real-world performance problems are
caused by lwlock contention.  There are of course difficulties in
exposing huge numbers of counters, but we're not talking about many
here, so I'd lean toward exposing them in the final patch if they seem
at all useful.

 I'd like to see that level of instrumentation--just the debug level
 messages--used to quantify the benchmarks that people are running already,
 to prove they are testing what they think they are.  That would have caught
 the test error you already stumbled on for example.  Simple paranoia says
 there may be more issues like that hidden in here somewhere, and this set
 you've identified should find any more of them around.

Right.

 If all that matches up so the numbers for the new counters seem sane, I
 think that's enough to commit the first basic improvement here.  Then make a
 second pass over exposing the useful internals that let everyone quantify
 either the improvement or things that may need to be tunable.

Well, I posted a patch a bit ago that I think is the first basic
improvement - and none of these counters are relevant to that.  It
doesn't add a new background process or anything; it just does pretty
much the same thing we do now with less-painful locking.  There are no
water marks to worry about, or tunable thresholds, or anything; and
because it's so much simpler, it's far easier to reason about than the
full patch, which is why I feel quite confident pressing on toward a
commit.

Once that is in, I think we should revisit the idea of a bgreclaimer
process, and see how much more that improves things - if at all - on
top of what that basic patch already does.  For that we'll need these
counters, and maybe others.  But let's make that phase two.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Scaling shared buffer eviction

2014-09-23 Thread Robert Haas
On Tue, Sep 23, 2014 at 6:54 PM, Andres Freund and...@2ndquadrant.com wrote:
 Am I understanding you correctly that you also measured context switches
 for spinlocks? If so, I don't think that's a valid comparison. LWLocks
 explicitly yield the CPU as soon as there's any contention while
 spinlocks will, well, spin. Sure they also go to sleep, but it'll take
 longer.

No, I measured CPU consumption attributable to s_lock.

(I checked context-switches, too.)

 It's also worthwile to remember in such comparisons that lots of the
 cost of spinlocks will be in the calling routines, not s_lock() - the
 first TAS() is inlined into it. And that's the one that'll incur cache
 misses and such...

True.  I can check that - I did not.

 Note that I'm explicitly *not* doubting the use of a spinlock
 itself. Given the short acquisition times and the exclusive use of
 exclusive acquisition, a spinlock makes more sense. The lwlock's spinlock
 alone will have about as much contention.

Right.

 I think it might be possible to construct some cases where the spinlock
 performs worse than the lwlock. But I think those will be clearly in the
 minority. And at least some of those will be fixed by bgwriter. As an
 example consider a single backend COPY ... FROM of large files with a
 relatively large s_b. That's a seriously bad workload for the current
 victim buffer selection because frequently most of the buffer pool needs to be
 searched for a buffer with usagecount 0. And this patch will double the
 amount of atomic ops during that.

It will actually be far worse than that, because we'll acquire and
release the spinlock for every buffer over which we advance the clock
sweep, instead of just once for the whole thing.  That's reason to
hope that a smart bgreclaimer process may be a good improvement on top
of this.  It can do things like advance the clock sweep hand 16
buffers at a time and then sweep them all after-the-fact, reducing
contention on the mutex by an order-of-magnitude, if that turns out to
be an important consideration.

But I think it's right to view that as something we need to test vs.
the baseline established by this patch.  What's clear today is that
workloads which stress buffer-eviction fall to pieces, because the
entire buffer-eviction process is essentially single-threaded.  One
process can't begin evicting a buffer until another has finished doing
so.  This lets multiple backends do that at the same time.  We may
find cases where that leads to an unpleasant amount of contention, but
since we have several ideas for how to mitigate that as needs be, I
think it's OK to go ahead.  The testing we're doing on the combined
patch is conflating the effects the new locking regimen with however
the bgreclaimer affects things, and it's very clear to me now that we
need to make sure those are clearly separate.

 Let me try to quantify that.

Please do.

 What's interesting is that I can't see in the perf output any real
 sign that the buffer mapping locks are slowing things down, but they
 clearly are, because when I take this patch and also boost
 NUM_BUFFER_PARTITIONS to 128, the performance goes up:

 What events did you profile?

cs.

 I welcome further testing and comments, but my current inclination is
 to go ahead and push the attached patch.  To my knowledge, nobody has
 at any point objected to this aspect of what's being proposed, and it
 looks safe to me and seems to be a clear win.  We can then deal with
 the other issues on their own merits.

 I've taken a look at this, and all the stuff I saw that I disliked was
 there before this patch. So +1 for going ahead.

Cool.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-09-23 Thread Robert Haas
On Tue, Sep 23, 2014 at 3:04 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Alvaro Herrera wrote:
 Heikki Linnakangas wrote:
  If you add a new datatype, and define b-tree operators for it, what
  is required to create a minmax opclass for it? Would it be possible
  to generalize the functions in brin_minmax.c so that they can be
  reused for any datatype (with b-tree operators) without writing any
  new C code? I think we're almost there; the only thing that differs
  between each data type is the opcinfo function. Let's pass the type
  OID as argument to the opcinfo function. You could then have just a
  single minmax_opcinfo function, instead of the macro to generate a
  separate function for each built-in datatype.

 Yeah, that's how I had that initially.  I changed it to what it's now as
 part of a plan to enable building cross-type opclasses, so you could
 have WHERE int8col=42 without requiring a cast of the constant to type
 int8.  This might have been a thinko, because AFAICS it's possible to
 build them with a constant opcinfo as well (I changed several other
 things to support this, as described in a previous email.)  I will look
 into this later.

 I found out that we don't really throw errors in such cases anymore; we
 insert casts instead.  Maybe there's a performance argument that it
 might be better to use existing cross-type operators than casting, but
 justifying this work just turned a lot harder.  Here's a patch that
 reverts opcinfo into a generic function that receives the type OID.

 I will look into adding some testing mechanism for the union support
 proc; with that I will just consider the patch ready for commit and will
 push.

With all respect, I think this is a bad idea.  I know you've put a lot
of energy into this patch and I'm confident it's made a lot of
progress.  But as with Stephen's patch, the final form deserves a
thorough round of looking over by someone else before it goes in.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-09-23 Thread Michael Paquier
On Wed, Sep 24, 2014 at 8:23 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Sep 23, 2014 at 3:04 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
 Alvaro Herrera wrote:
 I will look into adding some testing mechanism for the union support
 proc; with that I will just consider the patch ready for commit and will
 push.

 With all respect, I think this is a bad idea.  I know you've put a lot
 of energy into this patch and I'm confident it's made a lot of
 progress.  But as with Stephen's patch, the final form deserves a
 thorough round of looking over by someone else before it goes in.

Would this person be an extra committer or a simple reviewer? It
would give more assurance if such huge patches (a couple of thousand
lines) got an extra +1 from another committer, proving that the code
has been reviewed by people well-experienced with backend code. Since
this would put more pressure on committers, an extra external pair of
eyes, even a non-committer, say a seasoned reviewer, would be fine
IMO.
-- 
Michael




[HACKERS] “Core” function in Postgres

2014-09-23 Thread Mingzhe Li
Hi experts,

I want to know what the core function used in the Postgres server is. I am
looking for something corresponding to main() in a simple C program. I want
to know the file path and the function name. I am using Postgres 9.3.5,
though I assume the core function will be unchanged between different
revisions.

Please let me know if you are confused by my question.

Thanks

Zack

PS: I have the same post on stackoverflow. Since no one answered there, I
just reposted it here.


Re: [HACKERS] Scaling shared buffer eviction

2014-09-23 Thread Andres Freund
On 2014-09-23 19:21:10 -0400, Robert Haas wrote:
 On Tue, Sep 23, 2014 at 6:54 PM, Andres Freund and...@2ndquadrant.com wrote:
  I think it might be possible to construct some cases where the spinlock
  performs worse than the lwlock. But I think those will be clearly in the
  minority. And at least some of those will be fixed by bgwriter.

Err, this should read bgreclaimer, not bgwriter.

  As an
  example consider a single backend COPY ... FROM of large files with a
  relatively large s_b. That's a seriously bad workload for the current
  victim buffer selection because frequently most of the buffer pool needs to be
  searched for a buffer with usagecount 0. And this patch will double the
  amount of atomic ops during that.
 
 It will actually be far worse than that, because we'll acquire and
 release the spinlock for every buffer over which we advance the clock
 sweep, instead of just once for the whole thing.

I said double, because we already acquire the buffer header's spinlock
every tick.

 That's reason to hope that a smart bgreclaimer process may be a good
 improvement on top of this.

Right. That's what I was trying to say with bgreclaimer above.

 But I think it's right to view that as something we need to test vs.
 the baseline established by this patch.

Agreed. I think the possible downsides are at the very fringe of already
bad cases. That's why I agreed that you should go ahead.

  Let me try to quantify that.
 
 Please do.

I've managed to find a ~1.5% performance regression. But the setup was
plain absurd. COPY ... FROM /tmp/... BINARY; of large bytea datums into
a fillfactor 10 table with the column set to PLAIN storage. With the
resulting table size chosen so it's considerably bigger than s_b, but
smaller than the dirty writeback limit of the kernel.

That's perfectly reasonable.

I can think of a couple other cases, but they're all similarly absurd.

  What's interesting is that I can't see in the perf output any real
  sign that the buffer mapping locks are slowing things down, but they
  clearly are, because when I take this patch and also boost
  NUM_BUFFER_PARTITIONS to 128, the performance goes up:
 
  What events did you profile?
 
 cs.

Ah. My guess is that most of the time will probably actually be spent in
the lwlock's spinlock, not the lwlock putting itself to sleep.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [HACKERS] “Core” function in Postgres

2014-09-23 Thread Peter Geoghegan
On Tue, Sep 23, 2014 at 4:29 PM, Mingzhe Li mingzhe0...@gmail.com wrote:
 I want to know what the core function used in the Postgres server is. I am
 looking for something corresponding to main() in a simple C program. I want
 to know the file path and the function name. I am using Postgres 9.3.5;
 however, I assume the core function will be unchanged between different
 revisions.

 Please let me know if you are confused by my question.

I think that the tcop is the closest thing to what you're looking for
(which includes postgres.c/PostgresMain()). There is a very simple
stub entry point (main(int argc, char *argv[])) within main.c, too,
which is the real entry point.

Why not just set some breakpoints in a place that seems interesting
from within GDB, and inspect the call stack? That can be a useful
technique for gaining understanding of the structure of complicated
codebases that you're totally unfamiliar with.

-- 
Peter Geoghegan




Re: [HACKERS] JsonbValue to Jsonb conversion

2014-09-23 Thread Andrew Dunstan


On 09/23/2014 12:23 PM, Dmitry Dolgov wrote:

Hi all,

I'm faced with some troubles with the jsonb implementation, and I 
hope I'll get a little advice =)
If I understand correctly, an abstract function for jsonb modification 
should have the following stages:


Jsonb -> JsonbValue -> Modification -> JsonbValue -> Jsonb

One can convert a *JsonbValue* to *Jsonb* only via the 
*JsonbValueToJsonb* function. So, my question is: can a *JsonbValue* 
that contains a few *jbvBinary* elements be converted to *Jsonb* by this 
function? It would be very useful if you want to modify only a small 
part of your JsonbValue (e.g. replace the value of some key). But when 
I try to do this, an "unknown type of jsonb container" exception 
appears. Maybe I missed something? Or is there another approach to 
this conversion?



If you can come up with a way of handling the jbvBinary values then by 
all means send a patch.


But this problem is fairly easily worked around by using an iterator 
over the binary value. The attached patch, which is work in progress for 
adding in the currently missing json functions for jsonb, contains a 
sort of example of doing this in jsonb_agg_transfn.


cheers

andrew

diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index 2fd87fc..82dae72 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -12,11 +12,20 @@
  */
 #include "postgres.h"
 
+#include "miscadmin.h"
+#include "access/htup_details.h"
+#include "access/transam.h"
+#include "catalog/pg_cast.h"
+#include "catalog/pg_type.h"
 #include "libpq/pqformat.h"
 #include "utils/builtins.h"
+#include "utils/datetime.h"
+#include "utils/lsyscache.h"
 #include "utils/json.h"
 #include "utils/jsonapi.h"
 #include "utils/jsonb.h"
+#include "utils/syscache.h"
+#include "utils/typcache.h"
 
 typedef struct JsonbInState
 {
@@ -24,6 +33,23 @@ typedef struct JsonbInState
 	JsonbValue *res;
 } JsonbInState;
 
+/* unlike with json categories, we need to treat json and jsonb differently */
+typedef enum					/* type categories for datum_to_jsonb */
+{
+	JSONBTYPE_NULL,				/* null, so we didn't bother to identify */
+	JSONBTYPE_BOOL,				/* boolean (built-in types only) */
+	JSONBTYPE_NUMERIC,			/* numeric (ditto) */
+	JSONBTYPE_TIMESTAMP,		/* we use special formatting for timestamp */
+	JSONBTYPE_TIMESTAMPTZ,		/* ... and timestamptz */
+	JSONBTYPE_JSON,				/* JSON */
+	JSONBTYPE_JSONB,			/* JSONB */
+	JSONBTYPE_ARRAY,			/* array */
+	JSONBTYPE_COMPOSITE,		/* composite */
+	JSONBTYPE_JSONCAST,			/* something with an explicit cast to JSON */
+	JSONBTYPE_JSONBCAST,		/* something with an explicit cast to JSONB */
+	JSONBTYPE_OTHER				/* all else */
+} JsonbTypeCategory;
+
 static inline Datum jsonb_from_cstring(char *json, int len);
 static size_t checkStringLen(size_t len);
 static void jsonb_in_object_start(void *pstate);
@@ -33,6 +59,22 @@ static void jsonb_in_array_end(void *pstate);
 static void jsonb_in_object_field_start(void *pstate, char *fname, bool isnull);
 static void jsonb_put_escaped_value(StringInfo out, JsonbValue *scalarVal);
 static void jsonb_in_scalar(void *pstate, char *token, JsonTokenType tokentype);
+static void jsonb_categorize_type(Oid typoid,
+  JsonbTypeCategory *tcategory,
+  Oid *outfuncoid);
+static void composite_to_jsonb(Datum composite, JsonbInState *result);
+static void array_dim_to_jsonb(JsonbInState *result, int dim, int ndims, int *dims,
+			  Datum *vals, bool *nulls, int *valcount,
+			  JsonbTypeCategory tcategory, Oid outfuncoid);
+static void array_to_jsonb_internal(Datum array, JsonbInState *result);
+static void jsonb_categorize_type(Oid typoid,
+  JsonbTypeCategory *tcategory,
+  Oid *outfuncoid);
+static void datum_to_jsonb(Datum val, bool is_null, JsonbInState *result,
+		   JsonbTypeCategory tcategory, Oid outfuncoid,
+		   bool key_scalar);
+static void add_jsonb(Datum val, bool is_null, JsonbInState *result,
+	  Oid val_type, bool key_scalar);
 
 /*
  * jsonb type input function
@@ -462,3 +504,1070 @@ JsonbToCString(StringInfo out, JsonbContainer *in, int estimated_len)
 
	return out->data;
 }
+
+
+/*
+ * Determine how we want to render values of a given type in datum_to_jsonb.
+ *
+ * Given the datatype OID, return its JsonbTypeCategory, as well as the type's
+ * output function OID.  If the returned category is JSONBTYPE_JSONCAST, we
+ * return the OID of the type->JSON cast function instead.
+ */
+static void
+jsonb_categorize_type(Oid typoid,
+	  JsonbTypeCategory *tcategory,
+	  Oid *outfuncoid)
+{
+	bool		typisvarlena;
+
+	/* Look through any domain */
+	typoid = getBaseType(typoid);
+
+	/* We'll usually need to return the type output function */
+	getTypeOutputInfo(typoid, outfuncoid, typisvarlena);
+
+	/* Check for known types */
+	switch (typoid)
+	{
+		case BOOLOID:
+			*tcategory = JSONBTYPE_BOOL;
+			break;
+
+		case INT2OID:
+		case INT4OID:
+		case INT8OID:
+		case FLOAT4OID:
+		case FLOAT8OID:
+		case NUMERICOID:
+	

Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-23 Thread Peter Geoghegan
On Fri, Sep 19, 2014 at 5:40 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 I think we should bite the bullet and break compatibility with 9.4beta2
 format, even if we go with my patch. In a jsonb object, it makes sense to
 store all the keys first, like Tom did, because of cache benefits, and the
 future possibility to do smart EXTERNAL access. Also, even if we can make
 the on-disk format compatible, it's weird that you can get different runtime
 behavior with datums created with a beta version. Seems more clear to just
 require a pg_dump + restore.

I vote for going with your patch, and breaking compatibility for the
reasons stated here (though I'm skeptical of the claims about cache
benefits, FWIW).

-- 
Peter Geoghegan




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-23 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes:
 On Fri, Sep 19, 2014 at 5:40 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 I think we should bite the bullet and break compatibility with 9.4beta2
 format, even if we go with my patch. In a jsonb object, it makes sense to
 store all the keys first, like Tom did, because of cache benefits, and the
 future possibility to do smart EXTERNAL access. Also, even if we can make
 the on-disk format compatible, it's weird that you can get different runtime
 behavior with datums created with a beta version. Seems more clear to just
 require a pg_dump + restore.

 I vote for going with your patch, and breaking compatibility for the
 reasons stated here (though I'm skeptical of the claims about cache
 benefits, FWIW).

I'm also skeptical of that, but I think the potential for smart EXTERNAL
access is a valid consideration.

I've not had time to read Heikki's updated patch yet --- has anyone
else compared the two patches for code readability?  If they're fairly
close on that score, then I'd agree his approach is the best solution.
(I will look at his code, but I'm not sure I'm the most unbiased
observer.)

regards, tom lane




Re: [HACKERS] Core function in Postgres

2014-09-23 Thread Gregory Smith

On 9/23/14, 8:02 PM, Michael Paquier wrote:
pgsql-hackers is as well a mailing list where people have technical 
discussions about features and patches, hence your question would be 
more adapted for pgsql-novice or pgsql-general.


Let's be fair and get the details perfect if we're going to advise 
people on list policy, and that's not quite right if you compare against 
http://www.postgresql.org/list/


The official description says you consider trying elsewhere first, and 
"If your question cannot be answered by people in the other lists, and it 
is likely that only a developer will know the answer, you may re-post 
your question in this list."


From that, it's completely reasonable that Mingzhe guessed "how main() 
is used in PostgreSQL" is something only a developer would know the 
answer to, and therefore should go here instead of pgsql-general.


My feedback would be to point out that pgsql-general is also a list 
where many of the developers monitor traffic, and that's the right 
place to post first for this sort of question.  The mailing list page 
does not make that completely obvious though, so this is an 
understandable thing to be unsure about.  This is definitely not a 
pgsql-novice question.


The mailing list page could probably use an explicit clarification that 
pgsql-general *is* an appropriate venue for simpler questions that only 
a developer will know the answer to.


--
Greg Smith greg.sm...@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/




Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-09-23 Thread Alvaro Herrera
Robert Haas wrote:

 With all respect, I think this is a bad idea.  I know you've put a lot
 of energy into this patch and I'm confident it's made a lot of
 progress.  But as with Stephen's patch, the final form deserves a
 thorough round of looking over by someone else before it goes in.

As you can see in the thread, Heikki's put a lot of review effort into
it (including important code contributions); I don't feel I'm rushing it
at this point.  If you or somebody else want to give it a look, I have
no problem waiting a bit longer.  I don't want to delay indefinitely,
though, because I think it's better shipped early in the release cycle
than later, to allow for further refinements and easier testing by other
interested parties.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] “Core” function in Postgres

2014-09-23 Thread Mark Kirkwood

On 24/09/14 11:29, Mingzhe Li wrote:

Hi experts,

I want to know what the core function used in the Postgres server is. I am
looking for something corresponding to main() in a simple C program. I
want to know the file path and the function name. I am using Postgres
9.3.5; however, I assume the core function will be unchanged between
different revisions.



I suspect you want to start looking at
PostgresMain in src/backend/tcop/postgres.c

and ServerLoop in src/backend/postmaster/postmaster.c

Regards

Mark

P.s: FWIW I think this *is* the right list to ask this type of question...




Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-09-23 Thread Robert Haas
On Tue, Sep 23, 2014 at 7:35 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Would this person be an extra committer or a simple reviewer? It
 would give more assurance if such huge patches (a couple of thousand
 lines) got an extra +1 from another committer, proving that the code
 has been reviewed by people well experienced with backend code. Now, as
 this would put more pressure on committers, an extra external pair of
 eyes, be it a non-committer but, say, a seasoned reviewer, would be
 fine IMO.

If you're volunteering, I certainly wouldn't say no.  The more the
merrier.  Same with anyone else.  Since Heikki looked at it before, I
also think it would be appropriate to give him a bit of time to see if
he feels satisfied with it now - nobody on this project has more
experience with indexing than he does, but he may not have the time,
and even if he does, someone else might spot something he misses.

Alvaro's quite right to point out that there is no sense in waiting a
long time for a review that isn't coming.  That just backs everything
up against the end of the release cycle to no benefit.  But if there's
review available from experienced people within the community, taking
advantage of that now might find things that could be much harder to
fix later.  That's a win for everybody.  And it's not like we're
pressed up against the end of the cycle, nor is it as if this feature
has been through endless rounds of review already.  It's certainly had
some, and it's gotten better as a result.  But it's also changed a lot
in the process.

And much of the review to date has been high-level design review, like
"how should the opclasses look?" and "what should we call this thing
anyway?".  Going through it for logic errors, documentation
shortcomings, silly thinkos, etc. has not been done too much, I think,
and definitely not on the latest version.  So, some of that might not
be out of place.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-09-23 Thread Robert Haas
On Tue, Sep 23, 2014 at 9:23 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Robert Haas wrote:
 With all respect, I think this is a bad idea.  I know you've put a lot
 of energy into this patch and I'm confident it's made a lot of
 progress.  But as with Stephen's patch, the final form deserves a
 thorough round of looking over by someone else before it goes in.

 As you can see in the thread, Heikki's put a lot of review effort into
 it (including important code contributions); I don't feel I'm rushing it
 at this point.

Yeah, I was really glad Heikki looked at it.  That seemed good.

 If you or somebody else want to give it a look, I have
 no problem waiting a bit longer.  I don't want to delay indefinitely,
 though, because I think it's better shipped early in the release cycle
 than later, to allow for further refinements and easier testing by other
 interested parties.

I agree with that.  I'd like to look at it, and I will if I get time,
but as I said elsewhere, I also think it's appropriate to give a
little time around the final version of any big, complex patch just
because people may have thoughts, and they may not have time to
deliver those thoughts the minute the patch hits the list.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Scaling shared buffer eviction

2014-09-23 Thread Robert Haas
On Tue, Sep 23, 2014 at 7:42 PM, Andres Freund and...@2ndquadrant.com wrote:
 It will actually be far worse than that, because we'll acquire and
 release the spinlock for every buffer over which we advance the clock
 sweep, instead of just once for the whole thing.

 I said double, because we already acquire the buffer header's spinlock
 every tick.

Oh, good point.

  Let me try to quantify that.

 Please do.

 I've managed to find a ~1.5% performance regression. But the setup was
 plain absurd. COPY ... FROM /tmp/... BINARY; of large bytea datums into
 a fillfactor 10 table with the column set to PLAIN storage. With the
 resulting table size chosen so it's considerably bigger than s_b, but
 smaller than the dirty writeback limit of the kernel.

 That's perfectly reasonable.

 I can think of a couple other cases, but they're all similarly absurd.

Well, it's not insane to worry about such things, but if you can only
manage 1.5% on such an extreme case, I'm encouraged.  This is killing
us on OLTP workloads, and fixing that is a lot more important than a
couple percent on an extreme case.

 Ah. My guess is that most of the time will probably actually be spent in
 the lwlock's spinlock, not the lwlock putting itself to sleep.

Ah, OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index

2014-09-23 Thread Etsuro Fujita

(2014/09/13 2:42), Fujii Masao wrote:

On Wed, Sep 10, 2014 at 10:37 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:

Fujii Masao wrote:

On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:



PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting.
Wouldn't it be easy-to-use to have only one parameter,
PENDING_LIST_CLEANUP_SIZE?  How about setting PENDING_LIST_CLEANUP_SIZE to
work_mem as the default value when running the CREATE INDEX command?


That's an idea. But there might be some users who want to change
the cleanup size per session like they can do by setting work_mem,
and your idea would prevent them from doing that...

So what about introducing pending_list_cleanup_size also as a GUC?
That is, users can set the threshold by using either the reloption or
the GUC.


Yes, I think having both a GUC and a reloption makes sense -- the GUC
applies to all indexes, and can be tweaked for individual indexes using
the reloption.


Agreed.


I'm not sure about the idea of being able to change it per session,
though.  Do you mean that you would like insert processes use a very
large value so that they can just append new values to the pending list,
and have vacuum use a small value so that it cleans up as soon as it
runs?  Two things: 1. we could have an autovacuum_ reloption which
only changes what autovacuum does; 2. we could have autovacuum run
index cleanup actions separately from actual vacuuming.


Yes, I was thinking something like that. But if autovacuum
has already been able to handle that, it's nice. Anyway,
as you pointed out, it's better to have both GUC and reloption
for the cleanup size of pending list.


OK, I'd vote for your idea of having both the GUC and the reloption. 
So, I think the patch needs to be updated.  Fujii-san, what plan do you 
have for the patch?


Sorry for the delay.

Thanks,

Best regards,
Etsuro Fujita




Re: [HACKERS] proposal: rounding up time value less than its unit.

2014-09-23 Thread Gregory Smith

On 9/23/14, 1:21 AM, David Johnston wrote:
This patch should fix the round-to-zero issue.  If someone wants to 
get rid of zero as a special case let them supply a separate patch for 
that improvement.


I am going to wander into this fresh after just reading everything once 
(but with more than a little practice with real-world GUC mucking) and 
say that, no, it should not even do that.  The original complaint here 
is real and can be straightforward to fix, but I don't like any of the 
proposals so far.


My suggestion:  fix the one really bad one in 9.5 with an adjustment to 
the units of log_rotation_age.  That's a small commit that should credit 
Tomonari Katsumata for discovering the issue and even suggesting a fix 
(that we don't use), plus a release note; sample draft below.  Stop there.


Let someone with a broader objection take on the fact that zero (and -1) 
values have special properties, and that sucks, as a fully reasoned 
redesign.  I have a larger idea for minimizing rounding issues of 
timestamps in particular at the bottom of this too.  I wouldn't dare 
take on changes to rounding of integers without sorting out the special 
flag value issue first, because the variety of those in the database is 
large compared to the time ones.


== log_rotation_age ==

Back to where this started at 
http://www.postgresql.org/message-id/53992ff8.2060...@po.ntts.co.jp : 
log_rotation_age would be disabled if we set it less than one minute, 
with this example of a surprising behavior:


log_rotation_age = 10s

postgres=# show log_rotation_age ;
 log_rotation_age
------------------
 0

That's bad and the GUC system is not being friendly; fully agreed. But 
this is just one where the resolution for the parameter is poor.  
log_rotation_age is the *only* GUC with a time interval that's set in 
minutes:


postgres=# SELECT name, unit FROM pg_settings WHERE unit ='min';
       name       | unit
------------------+------
 log_rotation_age | min

For comparison, there are 10 GUCs set in seconds and 13 in ms in HEAD.

We could just change the units for log_rotation_age to seconds, then let 
the person who asks for a 10s rotation time suffer for that decision 
only with many log files.  The person who instead chooses 10ms may find 
log rotation disabled altogether because that rounds to zero, but now we 
are not talking about surprises on something that seems sane--that's a 
fully unreasonable user request.


Following that style of fix all the way through to the sort of release 
notes needed would give something like this:


log_rotation_age is now set in units of seconds instead of minutes. 
Earlier installations of PostgreSQL that set this value directly, to a 
value in minutes, should change that setting to use a time unit before 
migrating to this version.


And we could be done for now.

== Flag values and error handling ==

I would like to see using 0 and -1 as special values in GUCs overhauled 
one day, with full disregard for backward compatibility as a 
straightjacket on doing the right thing.  This entire class of behavior 
is annoying.  But I am skeptical anything less than such an overhaul 
will really be useful.  To me it's not worth adding new code, 
documentation, and associated release notes to guide migration all to 
try and describe new rounding trivia--which I don't see as better nor 
worse than the trade-offs of what happens today.


I can't take the idea of throwing errors for something this minor 
seriously at all.  There are a lot more precedents for spreading 
settings around in a few places now, from include_dir to 
postgresql.auto.conf.  There are so many ways to touch a config and 
not notice the error message now, I really don't want to see any more 
situations appear where trying to change a setting will result in a 
broken file added into that mix.  None of this broken-units-due-to-
rounding stuff that I've found, now that I went out of my way looking 
for some, even comes close to rising to that level of serious to me.  
Only this log rotation one is so badly out of line that it will act 
quite badly.


== Overhauling all time unit GUCs ==

Here's the complete list of potential ugly time settings to review in 
the future, if my suggestion of only fixing log_rotation_age were followed:


gsmith=# SELECT name,setting, unit, min_val FROM pg_settings where unit 
in ('s','ms') and (min_val::integer)=0 order by unit,min_val,name;

             name             | setting | unit | min_val
------------------------------+---------+------+---------
 autovacuum_vacuum_cost_delay | 20  | ms   | -1
 log_autovacuum_min_duration  | -1  | ms   | -1
 log_min_duration_statement   | -1  | ms   | -1
 max_standby_archive_delay| 3   | ms   | -1
 max_standby_streaming_delay  | 3   | ms   | -1
 lock_timeout | 0   | ms   | 0
 statement_timeout| 0   | ms   | 0
 vacuum_cost_delay| 0   | ms   | 0
 wal_receiver_timeout | 6   

Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-09-23 Thread Peter Geoghegan
On Wed, Aug 27, 2014 at 7:43 PM, Peter Geoghegan p...@heroku.com wrote:
 Omission
 ===

 The patch currently lacks a way of referencing datums rejected for
 insertion when updating.

Attached revision of the patch set (which I'll call v1.2) adds this
capability in a separate commit. It now becomes possible to add a
CONFLICTING expression within the ON CONFLICT UPDATE targetlist or
predicate. Example use:


postgres=# CREATE TABLE upsert(key int4 PRIMARY KEY, val text);
CREATE TABLE
postgres=# INSERT INTO upsert VALUES(1, 'Giraffe');
INSERT 0 1
postgres=# SELECT * FROM upsert;
 key |   val
-----+---------
   1 | Giraffe
(1 row)

postgres=# INSERT INTO upsert VALUES(1, 'Bear'), (2, 'Lion') ON
CONFLICT UPDATE SET val = CONFLICTING(val);
INSERT 0 1
postgres=# SELECT * FROM upsert;
 key | val
-----+------
   1 | Bear
   2 | Lion
(2 rows)



Note that the effects of BEFORE INSERT triggers are carried here,
which I slightly favor over the alternative of not having it work that
way.

I've also expanded upon my explanation for the structure of the query
tree and plan within (revised/rebased versions of) earlier commits. I
am clearer on why there is a special subquery planning step for the
auxiliary UPDATE, rather than making the UPDATE directly accessible as
a subquery within the post-parse-analysis query tree. Basically, the
optimizer has no basis for understanding that a DML sublink isn't
optimizable. It'll try to pull-up the subquery and so on, which of
course does not and cannot work. Whereas treating it as an
independently planned subquery of the top-level query, kind of like a
data-modifying CTE makes sense (with such CTEs, the executor is
prepared for the possibility that not all rows will be pulled up - so
there too, the executor drives execution more directly than makes
sense when not dealing with DML: it finishes off the data-modifying
CTE's DML for any still-unconsumed tuples, within
ExecPostprocessPlan()).

It's certainly possible that a more unified representation makes sense
(i.e. one ModifyTable plan, likely still having seperate INSERT/UPDATE
representations at earlier stages of query processing), but that would
require serious refactoring of the representation of ModifyTable
operations -- just for example, consider the need for a
unified-though-separate targetlist, one for the INSERT part, the other
for the UPDATE part. For now, I continue to find it very convenient to
represent the UPDATE as a selectively executed, auxiliary, distinct
ModifyTable plan, rather than adding a subquery rangetable directly
during parse analysis.

There is another significant change. In this revision, I am at least
honest about the plan representation within EXPLAIN:


postgres=# EXPLAIN ANALYZE INSERT INTO upsert VALUES(1, 'Bear'), (2,
'Lion') ON CONFLICT UPDATE SET val = CONFLICTING(val);
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Insert on upsert  (cost=0.00..0.03 rows=2 width=36) (actual time=0.115..0.115 rows=0 loops=1)
   ->  Values Scan on *VALUES*  (cost=0.00..0.03 rows=2 width=36) (actual time=0.003..0.005 rows=2 loops=1)
   ->  Conflict Update on upsert  (cost=0.00..22.30 rows=1230 width=36) (actual time=0.042..0.051 rows=0 loops=1)
         ->  Seq Scan on upsert  (cost=0.00..22.30 rows=1230 width=36) (never executed)
 Planning time: 0.065 ms
 Execution time: 0.158 ms
(6 rows)

postgres=# EXPLAIN ANALYZE INSERT INTO upsert VALUES(1, 'Bear'), (2,
'Lion') ON CONFLICT UPDATE SET val = CONFLICTING(val) where key = 2;
                                          QUERY PLAN
--------------------------------------------------------------------------------------------------
 Insert on upsert  (cost=0.00..0.03 rows=2 width=36) (actual time=0.075..0.075 rows=0 loops=1)
   ->  Values Scan on *VALUES*  (cost=0.00..0.03 rows=2 width=36) (actual time=0.001..0.002 rows=2 loops=1)
   ->  Conflict Update on upsert  (cost=4.16..8.17 rows=1 width=36) (actual time=0.012..0.026 rows=0 loops=1)
         ->  Bitmap Heap Scan on upsert  (cost=4.16..8.17 rows=1 width=36) (never executed)
               Recheck Cond: (key = 2)
               ->  Bitmap Index Scan on upsert_pkey  (cost=0.00..4.16 rows=1 width=0) (never executed)
                     Index Cond: (key = 2)
 Planning time: 0.090 ms
 Execution time: 0.125 ms
(9 rows)



The second query gets a bitmap scan because plain index scans have
been disabled for the UPDATE (a temporary kludge), since index-only
scans can break things - IndexOnlyRecheck() throws an error. Not quite
sure why the optimizer doesn't care about resjunk for the UPDATE,
which is presumably why in general regular updates never use
index-only scans. Since I think the actual auxiliary plan generation
needs work, so as to not have uselessly complicated plans, I didn't
try too hard to figure that out. This plan structure is not
acceptable, of course, but maybe 

Re: [HACKERS] LIMIT for UPDATE and DELETE

2014-09-23 Thread Etsuro Fujita

(2014/09/17 1:58), Robert Haas wrote:

On Tue, Sep 16, 2014 at 11:31 AM, Kevin Grittner kgri...@ymail.com wrote:

Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:

(2014/08/15 6:18), Rukh Meski wrote:

Based on the feedback on my previous patch, I've separated only the
LIMIT part into its own feature.  This version plays nicely with
inheritance.  The intended use is splitting up big UPDATEs and DELETEs
into batches more easily and efficiently.


IIUC, the patch doesn't support OFFSET with UPDATE/DELETE ... LIMIT.  Is
that OK?  When we support ORDER BY ... LIMIT/OFFSET, we will also be
allowing for OFFSET with UPDATE/DELETE ... LIMIT.  So, ISTM it would be
better for the patch to support OFFSET at this point.  No?


Without ORDER BY you really would have no idea *which* rows the
OFFSET would be skipping.  Even more dangerously, you might *think*
you do, and get a surprise when you see the results (if, for
example, a seqscan starts at a point other than the start of the
heap, due to a concurrent seqscan from an unrelated query).  It
might be better not to provide an illusion of a degree of control
you don't have, especially for UPDATE and DELETE operations.


Fair point, but I'd lean toward including it.  I think we all agree
the end goal is ORDER BY .. LIMIT, and there OFFSET certainly has
meaning.  If we don't include it now, we'll just end up adding it
later.  It makes for fewer patches, and fewer changes for users, if we
do it all at once.


I agree with Robert.

Rukh, what do you think as an author?

Thanks,

Best regards,
Etsuro Fujita




Re: [HACKERS] proposal: rounding up time value less than its unit.

2014-09-23 Thread David Johnston
On Tue, Sep 23, 2014 at 10:11 PM, Gregory Smith gregsmithpg...@gmail.com
wrote:

 On 9/23/14, 1:21 AM, David Johnston wrote:

 This patch should fix the round-to-zero issue.  If someone wants to get
 rid of zero as a special case let them supply a separate patch for that
 improvement.


 I am going to wander into this fresh after just reading everything once
 (but with more than a little practice with real-world GUC mucking) and say
 that, no, it should not even do that.  The original complaint here is real
 and can be straightforward to fix, but I don't like any of the proposals so
 far.

 My suggestion:  fix the one really bad one in 9.5 with an adjustment to
 the units of log_rotation_age.  That's a small commit that should thank
 Tomonari Katsumata for discovering the issue and even suggesting a fix
 (that we don't use) and a release note; sample draft below.  Stop there.


+1

I'm not convinced minute is wrong but it does imply a level of
user-friendliness (or over-parenting) that we can do away with.



 We could just change the units for log_rotation_age to seconds, then let
 the person who asks for a 10s rotation time suffer for that decision only
 with many log files.  The person who instead chooses 10ms may find log
 rotation disabled altogether because that rounds to zero, but now we are
 not talking about surprises on something that seems sane--that's a fully
 unreasonable user request.


Given the following why not just pick ms for log_rotation_age now?



 Following that style of fix all the way through to the sort of release
 notes needed would give something like this:

 log_rotation_age is now set in units of seconds instead of minutes.
 Earlier installations of PostgreSQL that set this value directly, to a
 value in minutes, should change that setting to use a time unit before
 migrating to this version.


Are there any special considerations for people using pg_dump vs. those
using pg_upgrade?


 If I were feeling ambitious about breaking configurations with a long-term
 eye toward improvement, I'd be perfectly happy converting everything on
 this list to ms.  We live in 64 bit integer land now; who cares if the
 numbers are bigger?


 Then the rounding issues only impact sub-millisecond values, making all
 them squarely in nonsense setting land.  Users will be pushed very clearly
 to always use time units in their postgresql.conf files instead of guessing
 whether something is set in ms vs. seconds.  Seeing the reaction to a units
 change on log_rotation_age might be a useful test for how that sort of
 change plays out in the real world.


If we are going to go that far, why not simply modify the GUC input routine
to require unit-values on these particular parameters?  Direct manipulation
of pg_settings probably would need some attention but everything else could
reject integers and non-unit-specifying text.  Allow the same input routine
to accept the constants on|off|default and convert them internally into
whatever the given GUC requires and you get the UI benefits without mucking
around with the implementation details.  Modify pg_settings accordingly to
hide those now-irrelevant pieces.  For UPDATE a trigger can be used to
enforce the text-only input requirement.

As long as we do not make microseconds (µs) a valid time unit, it is
currently impossible to specify a fraction of a millisecond directly.
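
The round-to-zero hazard and the require-a-unit idea from this thread can be sketched as follows (illustrative Python, not PostgreSQL's actual GUC parser; the unit table and the require_unit flag are assumptions for illustration):

```python
# A minimal sketch of why integer unit conversion disables a setting:
# a value smaller than the GUC's base unit truncates to zero.

UNIT_MS = {"ms": 1, "s": 1000, "min": 60_000, "h": 3_600_000, "d": 86_400_000}

def parse_time_guc(raw, base_unit, require_unit=False):
    """Parse a time setting into the GUC's base unit (integer)."""
    raw = raw.strip()
    # Match the longest unit suffix first so "ms" is not mistaken for "s".
    suffix = next((u for u in sorted(UNIT_MS, key=len, reverse=True)
                   if raw.endswith(u)), None)
    if suffix is None:
        if require_unit:
            raise ValueError("unit required, got bare number: %r" % raw)
        suffix = base_unit            # bare numbers mean the base unit
        number = int(raw)
    else:
        number = int(raw[:-len(suffix)])
    ms = number * UNIT_MS[suffix]
    return ms // UNIT_MS[base_unit]   # integer truncation: the hazard

print(parse_time_guc("10ms", "min"))  # 0 -- silently disables rotation
print(parse_time_guc("60", "min"))    # 60 minutes today; 60 s if the base unit changed
```

With require_unit=True, a bare "60" is rejected outright, which is the hard-requirement end state discussed above.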

David J.


Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-09-23 Thread Alvaro Herrera
Robert Haas wrote:
 On Tue, Sep 23, 2014 at 9:23 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:

  If you or somebody else want to give it a look, I have
  no problem waiting a bit longer.  I don't want to delay indefinitely,
  though, because I think it's better shipped early in the release cycle
  than later, to allow for further refinements and easier testing by other
  interested parties.
 
 I agree with that.  I'd like to look at it, and I will if I get time,
 but as I said elsewhere, I also think it's appropriate to give a
 little time around the final version of any big, complex patch just
 because people may have thoughts, and they may not have time to
 deliver those thoughts the minute the patch hits the list.

Fair enough -- I'll keep it open for the time being.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] proposal: rounding up time value less than its unit.

2014-09-23 Thread Gregory Smith

On 9/23/14, 10:52 PM, David Johnston wrote:

Given the following why not just pick ms for log_rotation_age now?


Right now there are people out there who have configurations that look 
like this:


log_rotation_age=60

In order to get hourly rotation.  These users will suffer some trauma 
should they upgrade to a version where this parameter now means a new 
log file pops every 60 seconds instead.  If they didn't catch the 
warning in the release notes and it happens, I'm pretty comfortable 
they'll survive though, just with a bunch of cursing until it's 
resolved.  I'd even wager a beer that more than 10% of PostgreSQL 
installs that get hit won't even notice.


I'd prefer not to make that experience into one where they get a log 
file every 60ms though.  That's seriously bad.  I'm not opposed to 
making major changes like that, I just like them to be as part of 
updates big enough that people are unlikely to miss that something is 
different.  Just doing this log_rotation_age thing is small enough that 
I expect people to miss the change in the release notes.  That limits 
how much I think we can change the magnitude of an easy to miss setting 
with a large unit adjustment.


Are there any special considerations for people using pg_dump vs. 
those using pg_upgrade?


I don't know that there's any code in pg_upgrade aiming at this sort of 
job.  I'd prefer not to add "find parameters that have changed in 
meaning and flag them" to pg_upgrade's list of responsibilities.  It has enough 
problems to worry about, and we really don't do this very often.


If we are going to go that far why not simply modify the GUC input 
routine to require unit-values on these particular parameters?  Direct 
manipulation of pg_settings probably would need some attention but 
everything else could reject integers and non-unit-specifying text.


That would be fine by me as a long-term direction, but it's more of a 
two to three version release level of change.  To keep that from being 
terrible for users, we'd need to provide a transition release that 
helped people find the problem ones without units.  After that was in 
the wild for a while, we could have some confidence that enough 
people had flushed the issue out to turn it into a hard requirement.


The project went through this exact sort of exercise with the 
standard_conforming_strings GUC being the way we flagged the bad 
constructs for people.  That was a much harder job because it touched 
everyone's application code too.  But it took many years to play out.  
I'd be shocked if we could herd our flock of old time users fast enough 
to remove unitless GUC values from PostgreSQL in less than 3 years worth 
of new releases.


--
Greg Smith greg.sm...@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/




Re: [HACKERS] Scaling shared buffer eviction

2014-09-23 Thread Gregory Smith

On 9/23/14, 7:13 PM, Robert Haas wrote:
I think we expose far too little information in our system views. Just 
to take one example, we expose no useful information about lwlock 
acquire or release, but a lot of real-world performance problems are 
caused by lwlock contention.

I sent over a proposal for what I was calling Performance Events about a 
year ago.  The idea was to provide a place to save data about lock 
contention, weird checkpoint sync events, that sort of thing.  Replacing 
log parsing to get at log_lock_waits data was my top priority.  Once 
that's there, lwlocks was an obvious next target.  Presumably we just 
needed collection to be low enough overhead, and then we can go down to 
whatever shorter locks we want; the lower the overhead, the faster the 
event we can measure.


Sometimes the database will never be able to instrument some of its 
fastest events without blowing away the event itself.  We'll still have 
perf / dtrace / systemtap / etc. for those jobs.  But those are not the 
problems of the average Postgres DBA's typical day.


The data people need to solve this sort of thing in production can't 
always show up in counters.  You'll get evidence the problem is there, 
but you need more details to actually find the culprit.  Some info about 
the type of lock, tables and processes involved, maybe the query that's 
running, that sort of thing.  You can kind of half-ass the job if you 
make per-table counters for everything, but we really need more, both to 
serve our users and to compare well against what other databases provide 
for tools.  That's why I was trying to get the infrastructure to capture 
all that lock detail, without going through the existing logging system 
first.


Actually building Performance Events fell apart on the storage side:  
figuring out where to put it all without waiting for a log file to hit 
disk.  I wanted in-memory storage so clients don't wait for anything, 
then a potentially lossy persistence writer.  I thought I could get away 
with a fixed size buffer like pg_stat_statements uses.  That was 
optimistic.  Trying to do better got me lost in memory management land 
without making much progress.
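
The storage idea described above (never-blocking clients, fixed-size in-memory buffer, potentially lossy persistence) can be sketched like this; it is illustrative Python, not the proposed shared-memory implementation, and the event shape is hypothetical:

```python
# Clients append events into a fixed-size ring buffer and never block;
# when the buffer wraps, the oldest events are overwritten (the lossy
# part), and a separate writer drains to disk at its leisure.

from collections import deque

class EventBuffer:
    def __init__(self, capacity):
        self._buf = deque(maxlen=capacity)  # old events drop off when full
        self.dropped = 0

    def record(self, event):
        if len(self._buf) == self._buf.maxlen:
            self.dropped += 1               # count what the writer will miss
        self._buf.append(event)             # never waits, never fails

    def drain(self):
        """Called by the (slow, possibly lossy) persistence writer."""
        events = list(self._buf)
        self._buf.clear()
        return events

buf = EventBuffer(capacity=3)
for i in range(5):
    buf.record({"lock": "SomeLWLock", "waiter": i})
print(buf.dropped)   # 2 events were overwritten before any drain
```

The hard part, as noted above, is doing this with low enough overhead in shared memory; the pattern itself is the easy half.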


I think the work you've now done on dynamic shared memory gives the 
right shape of infrastructure that I could pull this off now.  I even 
have funding to work on it again, and it's actually the #2 thing I'd 
like to take on as I get energy for new feature development.  (#1 is the 
simple but time consuming job of adding block write counters, the lack 
of which is just killing me on some fast-growing installs)


I have a lot of unread messages on this list to sort through right now.  
I know I saw someone try to revive the idea of saving new sorts of 
performance log data again recently; can't seem to find it again right 
now.  That didn't seem like it went any farther than thinking about the 
specifications though.  Last time I jumped right over that and instead hit 
a wall on the one hard part of the implementation: low-overhead 
memory management for saving everything.


--
Greg Smith greg.sm...@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-23 Thread Jan Wieck

On 09/15/2014 09:46 PM, Craig Ringer wrote:

On 09/16/2014 07:44 AM, Peter Geoghegan wrote:

FWIW, I am slightly concerned about weighing use cases around very
large JSON documents too heavily. Having enormous jsonb documents just
isn't going to work out that well, but neither will equivalent designs
in popular document database systems for similar reasons. For example,
the maximum BSON document size supported by MongoDB is 16 megabytes,
and that seems to be something that their users don't care too much
about. Having 270 pairs in an object isn't unreasonable, but it isn't
going to be all that common either.


Also, at a certain size the fact that Pg must rewrite the whole document
for any change to it starts to introduce other practical challenges.

Anyway - this is looking like the change will go in, and with it a
catversion bump. Introduction of a jsonb version/flags byte might be
worthwhile at the same time. It seems likely that there'll be more room
for improvement in jsonb, possibly even down to using different formats
for different data.

Is it worth paying a byte per value to save on possible upgrade pain?



This comment seems to have drowned in the discussion.

If there indeed has to be a catversion bump in the process of this, then 
I agree with Craig.
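
What Craig's suggestion amounts to can be sketched as follows; the layout (struct format, version value) is hypothetical for illustration, not jsonb's actual on-disk header:

```python
# Lead every stored jsonb datum with one version/flags byte so a future
# on-disk format change can be detected per-value instead of forcing a
# dump/reload across a catversion bump.

import struct

JSONB_V1 = 0x01

def wrap(payload, version=JSONB_V1):
    """Prefix a serialized value with its format version byte."""
    return struct.pack("B", version) + payload

def unwrap(datum):
    """Check the version byte and return the payload."""
    (version,) = struct.unpack_from("B", datum)
    if version != JSONB_V1:
        raise ValueError("unknown jsonb format version %d" % version)
    return datum[1:]

stored = wrap(b'{"a":1}')
assert unwrap(stored) == b'{"a":1}'   # one byte of overhead per value
```

The trade-off under discussion is exactly this one byte per value versus the flexibility to change the format later without upgrade pain (or, per Peter's reply, relying on an already-reserved bit instead).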



Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-09-23 Thread Peter Geoghegan
On Tue, Sep 23, 2014 at 10:02 PM, Jan Wieck j...@wi3ck.info wrote:
 Is it worth paying a byte per value to save on possible upgrade pain?


 This comment seems to have drowned in the discussion.

 If there indeed has to be a catversion bump in the process of this, then I
 agree with Craig.

-1. We already have a reserved bit.


-- 
Peter Geoghegan



