[HACKERS] Review of GetUserId() Usage
Greetings, While looking through the GetUserId() usage in the backend, I couldn't help but notice a few rather questionable cases that, in my view, should be cleaned up. As a reminder, GetUserId() returns the OID of the user we are generally operating as (eg: whatever the 'role' is, as GetUserId() respects SET ROLE). It does NOT include roles which we currently have the privileges of (that would be has_privs_of_role()), nor roles which we could SET ROLE to (that's is_member_of_role, or check_is_... if you want to just error out in failure cases). On to the list- pgstat_get_backend_current_activity() Used to decide if the current activity string should be returned or not. In my view, this is a clear case which should be addressed through has_privs_of_role() instead of requiring the user to SET ROLE to each role they are an inheirited member of to query for what the other sessions are doing. pg_signal_backend() Used to decide if pg_terminate_backend and pg_cancel_backend are allowed. Another case which should be changed over to has_privs_of_role(), in my view. Requiring the user to SET ROLE for roles which they are an inheirited member of is confusing as it's generally not required. pg_stat_get_activity() Used to decide if the state information should be shared. My opinion is the same as above- use has_privs_of_role(). There are a number of other functions in pgstatfuncs.c with similar issues (eg: pg_stat_get_backend_activity(), pg_stat_get_backend_client_port(), and others). Changing these would make things easier for some of our users, I'm sure.. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Commitfest status
On 09/20/2014 06:54 AM, Michael Paquier wrote: CF3 is actually over for a couple of days, There are different opinions on when a commitfest is over. In my opinion, the point of a commitfest is that every patch that someone submits gets enough review so that the patch author knows what he needs to do next. It's not determined by a date, but by progress. wouldn't it be better to bounce back patches marked as waiting on author and work on the rest needing review? Yep, it's time to do that. I have now marked those patches that have been in Waiting on Author state, but have already been reviewed to some extent, as Returned with Feedback. I kept a two patches: * Flush buffers belonging to unlogged tables, and * Function returning the timestamp of last transaction The first one is a bug-fix, and the second one is stalled by a bug-fix that hasn't been applied yet. We should deal with them ASAP. There are still plenty of patches in Needs review state. We got below 20 at one point, but are back to 24 now. Reviewers: Please *review a patch*! We need to get closure to every patch. Patch authors: Nag the reviewer of your patch. If that doesn't help, contact other people who you think would be qualified to review your patch, and ask them nicely to review your patch. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RLS feature has been committed
On 09/23/2014 09:15 AM, Peter Geoghegan wrote: On Mon, Sep 22, 2014 at 8:25 PM, Robert Haas robertmh...@gmail.com wrote: Should RLS be reverted, and revisited in a future CF? IMHO, that would be entirely appropriate. That seems pretty straightforward, then. I think that it will have to be reverted. OTOH, if the patch is actually OK as it was committed, there's no point reverting it only to commit it again later. At the end of the day, the important thing is that the patch gets sufficient review. Clearly Stephen thinks that it did, while you and Andres do not. To make sure this doesn't just slip by without sufficient review, I'll add this to the next commitfest. It's a bit unusual to have a commitfest entry for something that's already been committed, but that's fine. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Extending COPY TO
Hi all, it’s my first time here, so please let me know if I’m doing something wrong. I’m a developer, heavy PG user, but I’ve never hacked it before. Last week at work we had to produce a quite big CSV data file which should be used as input by another piece of software. Since the file must be produced on a daily basis, is big, and it contains data stored in our PG database, letting PG produce the file itself seemed the right approach. Unfortunately the target software is, let say, “legacy” software and can only accept CRLF as EOL character. Since our PG is installed on a Linux server COPY TO results in a CSV file with LF as EOL, forcing us to pass the file a second time to convert EOL, which is inconvenient. My idea was to extend the COPY TO command to accept an EOL option as it already does with the DELIMITER option. To keep it simple we can limit the EOL choice to CR, LF or CRLF to avoid unusual output, and also keep the current behaviour when no EOL option is given. I was also wondering if an EOL option could be useful also for the text output format or not. I spent the weekend reading the COPY command source code and its grammar definition and I think I can patch it by myself, submit the patch here and wait for your review. However before starting this in my spare time I wanted to know if you, as the PG hackers community, would be against a similar proposal for any reason, and if so why. Thanks in advance, Andrea -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RLS feature has been committed
On Mon, Sep 22, 2014 at 11:45 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: To make sure this doesn't just slip by without sufficient review, I'll add this to the next commitfest. It's a bit unusual to have a commitfest entry for something that's already been committed, but that's fine. That seems sensible. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should we excise the remnants of borland cc support?
On Sep 23, 2014 2:51 AM, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-09-20 10:03:43 -0400, Andrew Dunstan wrote: I thought the Borland stuff was there only so we could build client libraries for use with things like Delphi. FWIW I got offlist reports of two not subscribed people that they simply use the normal libpq dll from delphi. Copying it from pgadmin or the pg installer. Whether or not it's really needed to preserve the ability to build libpq with borland, I'm just about certain that it's never worked to build the backend with borland (thus explaining the lack of buildfarm members). So it should be safe enough to strip support appearing in backend-only header files. The backend has never built with borland. I'm pretty sure I suggested we drop borland support completely a few years ago but people felt it wasnt costing enough to warrant a drop at the time. Things may have changed now, but even without that we can definitely drop the backend side of things. /Magnus
Re: [HACKERS] RLS feature has been committed
Some random comments after a quick read-through of the patch: * Missing documentation. For a major feature like this, reference pages for the CREATE/DROP POLICY statements are not sufficient. We'll need a whole new chapter for this. * In CREATE POLICY, the USING and WITH CHECK keywords are not very descriptive. I kind of understand WITH CHECK - it's applied to new rows like a CHECK constraint - but USING seems rather arbitrary and WITH CHECK isn't all that clear either. Perhaps ON SELECT CHECK and ON UPDATE CHECK or something like that would be better. I guess USING makes sense when that's the only expression given, so that it applies to both SELECTs and UPDATEs. But when a WITH CHECK expression is given together with USING, it gets confusing. + if (is_from) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), +errmsg(COPY FROM not supported with row security.), +errhint(Use direct INSERT statements instead.))); + Why is that not implemented? I think it should be. * In src/backend/commands/createas.c: @@ -420,6 +421,19 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo) ExecCheckRTPerms(list_make1(rte), true); /* +* Make sure the constructed table does not have RLS enabled. +* +* check_enable_rls() will ereport(ERROR) itself if the user has requested +* something invalid, and otherwise will return RLS_ENABLED if RLS should +* be enabled here. We don't actually support that currently, so throw +* our own ereport(ERROR) if that happens. +*/ + if (check_enable_rls(intoRelationId, InvalidOid) == RLS_ENABLED) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), +(errmsg(policies not yet implemented for this command; + + /* * Tentatively mark the target as populated, if it's a matview and we're * going to fill it; otherwise, no change needed. */ Hmm. So, if a table we just created with CREATE TABLE AS has row-level security enabled, we throw an error? Can that actually happen? AFAICS a freshly-created table shouldn't can't have row-level security enabled. Or is row-level security enabled/disabled status copied from the source table? * Row-level security is not allowed for foreign tables. Why not? It's perhaps not a very useful feature, but I don't see any immediate reason to forbid it either. How about views? * The new pg_class.relhasrowsecurity column is not updated lazily like most other relhas* columns. That's intentional and documented, but perhaps it would've been better to name the column differently, to avoid confusion. Maybe just relrowsecurity * In rewrite/rowsecurity: + * Policies can be defined for specific roles, specific commands, or provided + * by an extension. How do you define a policy for a specific role? There's a hook for the extensions in there, but I didn't see any documentation mentioning that an extension might provide policies, nor any developer documentation on how to use the hook. * In pg_backup_archiver.c: /* * Enable row-security if necessary. */ if (!ropt-enable_row_security) ahprintf(AH, SET row_security = off;\n); else ahprintf(AH, SET row_security = on;\n); That makes pg_restore to throw an error if you try to restore to a pre-9.5 server. I'm not sure if using a newer pg_restore against an older server is supported in general, but without the above it works in simple cases, at least. It would be trivi * The new --enable-row-security option to pg_restore is not documented in the user manual. * in src/bin/psql/describe.c: @@ -2529,6 +2640,11 @@ describeRoles(const char *pattern, bool verbose) appendPQExpBufferStr(buf, \n, r.rolreplication); } + if (pset.sversion = 90500) + { + appendPQExpBufferStr(buf, \n, r.rolbypassrls); + } + appendPQExpBufferStr(buf, \nFROM pg_catalog.pg_roles r\n); The rolbypassrls column isn't actually used for anything in this function. In addition to the above, attached is a patch with some trivial fixes. - Heikki diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 76d6405..bae08ee 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -1943,6 +1943,7 @@ row entrystructfieldrelhasrowsecurity/structfield/entry entrytypebool/type/entry + entry/entry entry True if table has row-security enabled; see link linkend=catalog-pg-rowsecuritystructnamepg_rowsecurity/structname/link catalog diff
Re: [HACKERS] better atomics - v0.6
23.09.2014, 00:01, Andres Freund kirjoitti: I've finally managed to incorporate (all the?) feedback I got for 0.5. Imo the current version looks pretty good. Most notable changes: * Lots of comment improvements * code moved out of storage/ into port/ * separated the s_lock.h changes into its own commit * merged i386/amd64 into one file * fixed lots of the little details Amit noticed * fixed lots of XXX/FIXMEs * rebased to today's master * tested various gcc/msvc versions * extended the regression tests * ... The patches: 0001: The actual atomics API I tried building PG on Solaris 10/Sparc using GCC 4.9.0 (buildfarm animal dingo) with this patch but regression tests failed due to: /export/home/os/postgresql/src/test/regress/regress.so: symbol pg_write_barrier_impl: referenced symbol not found which turns out to be caused by a leftover PG_ prefix in ifdefs for HAVE_GCC__ATOMIC_INT64_CAS. Removing the PG_ prefix fixed the build and regression tests. Attached a patch to strip the invalid prefix. 0002: Implement s_lock.h support ontop the atomics API. Existing implementations continue to be used unless FORCE_ATOMICS_BASED_SPINLOCKS is defined Applied this and built PG with and without FORCE_ATOMICS_BASED_SPINLOCKS - both builds passed regression tests. 0003-0005: Not proposed for review here. Just included because code actually using the atomics make testing them easier. I'll look at these patches later. / Oskari From 207b02cd7ee6d38b92b51195a951713639f0d738 Mon Sep 17 00:00:00 2001 From: Oskari Saarenmaa o...@ohmu.fi Date: Tue, 23 Sep 2014 13:28:51 +0300 Subject: [PATCH] atomics: fix ifdefs for gcc atomics --- src/include/port/atomics/generic-gcc.h | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/include/port/atomics/generic-gcc.h b/src/include/port/atomics/generic-gcc.h index 9b0c636..f82af01 100644 --- a/src/include/port/atomics/generic-gcc.h +++ b/src/include/port/atomics/generic-gcc.h @@ -40,19 +40,19 @@ * definitions where possible, and use this only as a fallback. */ #if !defined(pg_memory_barrier_impl) -# if defined(HAVE_GCC__PG_ATOMIC_INT64_CAS) +# if defined(HAVE_GCC__ATOMIC_INT64_CAS) # define pg_memory_barrier_impl() __atomic_thread_fence(__ATOMIC_SEQ_CST) # elif (__GNUC__ 4 || (__GNUC__ == 4 __GNUC_MINOR__ = 1)) # define pg_memory_barrier_impl() __sync_synchronize() # endif #endif /* !defined(pg_memory_barrier_impl) */ -#if !defined(pg_read_barrier_impl) defined(HAVE_GCC__PG_ATOMIC_INT64_CAS) +#if !defined(pg_read_barrier_impl) defined(HAVE_GCC__ATOMIC_INT64_CAS) /* acquire semantics include read barrier semantics */ # define pg_read_barrier_impl() __atomic_thread_fence(__ATOMIC_ACQUIRE) #endif -#if !defined(pg_write_barrier_impl) defined(HAVE_GCC__PG_ATOMIC_INT64_CAS) +#if !defined(pg_write_barrier_impl) defined(HAVE_GCC__ATOMIC_INT64_CAS) /* release semantics include write barrier semantics */ # define pg_write_barrier_impl() __atomic_thread_fence(__ATOMIC_RELEASE) #endif @@ -101,7 +101,7 @@ typedef struct pg_atomic_uint64 volatile uint64 value; } pg_atomic_uint64; -#endif /* defined(HAVE_GCC__PG_ATOMIC_INT64_CAS) || defined(HAVE_GCC__SYNC_INT64_CAS) */ +#endif /* defined(HAVE_GCC__ATOMIC_INT64_CAS) || defined(HAVE_GCC__SYNC_INT64_CAS) */ /* * Implementation follows. Inlined or directly included from atomics.c -- 1.8.4.1 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] better atomics - v0.6
On 2014-09-23 13:50:28 +0300, Oskari Saarenmaa wrote: 23.09.2014, 00:01, Andres Freund kirjoitti: I've finally managed to incorporate (all the?) feedback I got for 0.5. Imo the current version looks pretty good. Most notable changes: * Lots of comment improvements * code moved out of storage/ into port/ * separated the s_lock.h changes into its own commit * merged i386/amd64 into one file * fixed lots of the little details Amit noticed * fixed lots of XXX/FIXMEs * rebased to today's master * tested various gcc/msvc versions * extended the regression tests * ... The patches: 0001: The actual atomics API I tried building PG on Solaris 10/Sparc using GCC 4.9.0 (buildfarm animal dingo) with this patch but regression tests failed due to: /export/home/os/postgresql/src/test/regress/regress.so: symbol pg_write_barrier_impl: referenced symbol not found which turns out to be caused by a leftover PG_ prefix in ifdefs for HAVE_GCC__ATOMIC_INT64_CAS. Removing the PG_ prefix fixed the build and regression tests. Attached a patch to strip the invalid prefix. Gah. Stupid last minute changes... Thanks for diagnosing. Will integrate. 0002: Implement s_lock.h support ontop the atomics API. Existing implementations continue to be used unless FORCE_ATOMICS_BASED_SPINLOCKS is defined Applied this and built PG with and without FORCE_ATOMICS_BASED_SPINLOCKS - both builds passed regression tests. Cool. 0003-0005: Not proposed for review here. Just included because code actually using the atomics make testing them easier. I'll look at these patches later. Cool. Although I'm not proposing them to be integrated as-is... Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] better atomics - v0.6
Hi, On 2014-09-23 13:50:28 +0300, Oskari Saarenmaa wrote: 23.09.2014, 00:01, Andres Freund kirjoitti: I've finally managed to incorporate (all the?) feedback I got for 0.5. Imo the current version looks pretty good. Most notable changes: * Lots of comment improvements * code moved out of storage/ into port/ * separated the s_lock.h changes into its own commit * merged i386/amd64 into one file * fixed lots of the little details Amit noticed * fixed lots of XXX/FIXMEs * rebased to today's master * tested various gcc/msvc versions * extended the regression tests * ... The patches: 0001: The actual atomics API I tried building PG on Solaris 10/Sparc using GCC 4.9.0 (buildfarm animal dingo) with this patch but regression tests failed due to: Btw, if you could try sun studio it'd be great. I wrote the support for it blindly, and I'd be surprised if I got it right on the first try. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RLS feature has been committed
Heikki, * Heikki Linnakangas (hlinnakan...@vmware.com) wrote: Some random comments after a quick read-through of the patch: Glad you were able to find a bit of time to take a look, thanks! * Missing documentation. For a major feature like this, reference pages for the CREATE/DROP POLICY statements are not sufficient. We'll need a whole new chapter for this. Peter mentioned this too and I've been working on adding documentation. The general capability, at least in my view, is pretty clear but I agree that examples and adding more about the trade-offs would be useful. Do you have any opinion regarding additional documentation for updatable security-barrier views? I'm trying to work out a way to share this new documentation with that since there is overlap as they share a lot of the caveats, given that they use the same code underneath. * In CREATE POLICY, the USING and WITH CHECK keywords are not very descriptive. I kind of understand WITH CHECK - it's applied to new rows like a CHECK constraint - but USING seems rather arbitrary and WITH CHECK isn't all that clear either. Perhaps ON SELECT CHECK and ON UPDATE CHECK or something like that would be better. Right, WITH CHECK matches up with the SQL standard 'WITH CHECK OPTION'. I guess USING makes sense when that's the only expression given, so that it applies to both SELECTs and UPDATEs. But when a WITH CHECK expression is given together with USING, it gets confusing. Right, USING was there in the original patch back in January (actually, it might be farther back, there were versions of the patch prior to that) while WITH CHECK was added more recently. + if (is_from) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), +errmsg(COPY FROM not supported with row security.), +errhint(Use direct INSERT statements instead.))); + Why is that not implemented? I think it should be. It's certainly an item that I was planning to look at addressing, though only for completeness. COPY's focus is providing a faster interface and the RLS checks would end up dwarfing the overall time and making COPY not actually be usefully faster than INSERT. * In src/backend/commands/createas.c: @@ -420,6 +421,19 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo) ExecCheckRTPerms(list_make1(rte), true); /* +* Make sure the constructed table does not have RLS enabled. +* +* check_enable_rls() will ereport(ERROR) itself if the user has requested +* something invalid, and otherwise will return RLS_ENABLED if RLS should +* be enabled here. We don't actually support that currently, so throw +* our own ereport(ERROR) if that happens. +*/ + if (check_enable_rls(intoRelationId, InvalidOid) == RLS_ENABLED) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), +(errmsg(policies not yet implemented for this command; + + /* * Tentatively mark the target as populated, if it's a matview and we're * going to fill it; otherwise, no change needed. */ Hmm. So, if a table we just created with CREATE TABLE AS has row-level security enabled, we throw an error? Can that actually happen? I don't believe it actually can happen, but I wanted to be sure that the case was covered in the event that support is added. Essentially, I carefully went through looking at all of the existing ExecCheckRTPerms() calls and made sure RLS was handled in all of those cases, in one way or another. As I expect you noticed, I also updated the comments in ExecCheckRTPerms() to reflect that RLS needs to be addressed as well as normal checks when returning tuples or adding new tuples. AFAICS a freshly-created table shouldn't can't have row-level security enabled. Or is row-level security enabled/disabled status copied from the source table? It's not copied, no. That didn't make a lot of sense to me as we don't even have an option to copy permissions. * Row-level security is not allowed for foreign tables. Why not? It's perhaps not a very useful feature, but I don't see any immediate reason to forbid it either. How about views? The question about views came up but with views you could simply add the quals into the view definition instead of combining views and RLS. Still, I'm not against it, but the patch certainly seemed large enough to me that it made sense to move forward with the basic, but complete, implementation for tables. Foreign tables fell into the same bucket but perhaps it shouldn't have.. I'll definitely take a look at what the level of complexity there is. * The new pg_class.relhasrowsecurity column is not
Re: [HACKERS] RLS feature has been committed
On Tue, Sep 23, 2014 at 4:25 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Sep 22, 2014 at 7:22 PM, Peter Geoghegan p...@heroku.com wrote: On Mon, Sep 22, 2014 at 4:02 PM, Andres Freund and...@anarazel.de wrote: This patch has been pushed in a clear violation of established policy. Fundamental pieces of the patch have changed *after* the commitfest started. And there wasn't a recent patch in the commitfest either - the entry was moved over from the last round without a new patch. It didn't receive independent review (Robert explicitly said his wasn't a full review). It wasn't marked ready for committer. The intention to commit wasn't announced publically. There were *clear* and unaddressed objections to committing the patch as is, by a committer (Robert) nonetheless. I have no reason to doubt your version of events here Fortunately, you don't have to take anything on faith. This is a public mailing list, and the exact sequence of events is open to inspection by anyone who cares to take a few minutes to do so. You can easily verify whether my statement that I asked Stephen twice to hold off committing it is correct or not; and you can also verify the rest of the history that Andres and I recounted. This is all there in black and white. Just to be clear here, the *only* issue we should even be discussing is whether the patch should or should not have been committed in the face of those objections. As Josh has also noted, the commitfest process was never meant to constrain what committers do or when they do it with their own patches or ones they've worked heavily on. They are there as a backstop to make sure that regardless of what the committers are doing day to day, patch authors know that their patch is expected to receive some review within N weeks. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index scan optimization
On 22 September 2014 19:17, Heikki Linnakangas wrote: On 09/22/2014 04:45 PM, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: On 09/22/2014 07:47 AM, Rajeev rastogi wrote: So my proposal is to skip the condition check on the first scan key condition for every tuple. The same happens in a single-column case. If you have a query like SELECT * FROM tbl2 where id2 'a', once you've found the start position of the scan, you know that all the rows that follow match too. ... unless you're doing a backwards scan. Sure. And you have to still check for NULLs. Have to get the details right.. I have finished implementation of the discussed optimization. I got a performance improvement of around 30% on the schema and data shared in earlier mail. I also tested for the index scan case, where our optimization is not done and observed that there is no effect on those query because of this change. Change details: I have added a new flag as SK_BT_MATCHED as part of sk_flags (ScanKey structure), the value used for this 0x0004, which was unused. Inside the function _bt_first, once we finish finding the start scan position based on the first key, I am appending the flag SK_BT_MATCHED to the first key. Then in the function _bt_checkkeys, during the key comparison, I am checking if the key has SK_BT_MATCHED flag set, if yes then there is no need to further comparison. But if the tuple is having NULL value, then even if this flag is set, we will continue with further comparison (this handles the Heikki point of checking NULLs). I will add this patch to the next CommitFest. Please let me know your feedback. Thanks and Regards, Kumar Rajeev Rastogi index_scan_opt.patch Description: index_scan_opt.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] proposal: adding a GUC for BAS_BULKREAD strategy
Hi, Currently the value is hard code to NBuffers / 4 but ISTM that with bigger shared_buffer it's too much, ie even with a DB 10 to 20 time the memory size there's a lot of tables under this limit and nightly batch reports are trashing the shared buffers cache as if there's no tomorrow. regards, -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: adding a GUC for BAS_BULKREAD strategy
Hi, On 2014-09-23 14:47:33 +0200, didier wrote: Currently the value is hard code to NBuffers / 4 but ISTM that with bigger shared_buffer it's too much, ie even with a DB 10 to 20 time the memory size there's a lot of tables under this limit and nightly batch reports are trashing the shared buffers cache as if there's no tomorrow. I'd like the ability to tune this as well. Even though I more often want to entirely disable it, rather than make it more aggressive. For workloads where the majority of the read data fits into shared_buffers and sequential scans over large relations are something happening frequently, the current strategy *sucks* because the large table will pretty much never get cached. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RLS feature has been committed
On 2014-09-23 13:23:32 +0100, Dave Page wrote: Just to be clear here, the *only* issue we should even be discussing is whether the patch should or should not have been committed in the face of those objections. As Josh has also noted, the commitfest process was never meant to constrain what committers do or when they do it with their own patches or ones they've worked heavily on. They are there as a backstop to make sure that regardless of what the committers are doing day to day, patch authors know that their patch is expected to receive some review within N weeks. FWIW, while not really at the core of the problem here, I don't think this is entirely true anymore. We certainly seem to to expect bigger feature patches to go through the commitfest process to some degree. Just look at the discussions about *committers* patches being committed or not at each cycles last commitfest. Every single time the point in time they've been submitted to which CF plays a rather prominent role in the discussion. Also look at committers like Robert that *do* feel constrained about when to commit or even expect review for submitted patches. I think it's obvious that a committer doesn't need to wait till some later commitfest to commit patches that have since gotten enough review or are uncontroversial. Neither is the case here. I also think committers need to be much more careful when committing patches which they (or their employer) appear to have a business interest in. Rushing ahead to commit the patch of somebody 'unrelated' leaves a completely different taste than committing your colleagues patch. *INDEPENDENT* of the actual reasons and the state of the patch. Perhaps we can use this as a chance to make the review process a bit better defined. There certainly have been a few patches by commiters over the last years that have had a noticeable negative impact. Some of which might have been cought by more diligent review. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RLS feature has been committed
On 2014-09-22 21:38:17 -0700, David G Johnston wrote: Robert Haas wrote It's difficult to imagine a more flagrant violation of process than committing a patch without any warning and without even *commenting* on the fact that clear objections to commit were made on a public mailing list. If that is allowed to stand, what can we assume other than that Stephen, at least, has a blank check to change anything he wants, any time he wants, with no veto possible from anyone else? I'm of a mind to agree that this shouldn't have been committed...but I'm not seeing where Stephen has done sufficient wrong to justify crucifixion. I've not seen much in the way of 'crucifixion' before this email. And I explicitly *don't* think it's warranted. Also it's not happening. At this point my hindsight says a strictly declaratory statement of reasons this is not ready combined with reverting the patch would have been sufficient; or even just a I am going to revert this for these reasons post. The difference between building support for a revert and gathering a mob is a pretty thin line. The reason it's being discussed is to find a way to align the different views about when to commit stuff. The primary goal is *not* to revert the commit or anything but to make sure we're not slipping into procedures we all would regret. Which *really* can happen very easily. We're all humans and most of us have more than enough to do. Though I guess if you indeed feel that his actions were truly heinous you should also then put forth the proposal that his ability to commit be revoked. I think *you* are escalating this to something unwarranted here by the way you're painting the discussion. If your not willing to go to that extent then, unless you know Stephen personally, I'd not assume that public flogging is the best way to get him to not mess up in the future; but the honest and cordial dialog about cause/effect is likely to be more productive and less self-destructing. Though, to each their own style. All the people involved know Stephen personally. There's also no public flogging. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RLS feature has been committed
On Tue, Sep 23, 2014 at 9:00 AM, Andres Freund and...@anarazel.de wrote: I think it's obvious that a committer doesn't need to wait till some later commitfest to commit patches that have since gotten enough review or are uncontroversial. Neither is the case here. Right. I mean, the occasionally-floated notion that committers can't commit things when there's no CommitFest ongoing is obviously flat wrong, as a quick look at the commit log will demonstrate. On the flip side, it's unreasonable to expect that as soon as a committer posts 7000-line patch, everyone who possibly has an objection to that patch will drop everything that they are doing to object to anything they don't like about it. It can easily take a week to review such a patch thoroughly even if you start the minute it hits the list. Many committers, including me, have held off on committing patches for many months to make sure that the community got time to review them, even though nobody explicitly objected. This patch was committed much faster than that and over an explicit objection. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of GetUserId() Usage
Stephen Frost wrote: pgstat_get_backend_current_activity() Used to decide if the current activity string should be returned or not. In my view, this is a clear case which should be addressed through has_privs_of_role() instead of requiring the user to SET ROLE to each role they are an inheirited member of to query for what the other sessions are doing. pg_signal_backend() Used to decide if pg_terminate_backend and pg_cancel_backend are allowed. Another case which should be changed over to has_privs_of_role(), in my view. Requiring the user to SET ROLE for roles which they are an inheirited member of is confusing as it's generally not required. pg_stat_get_activity() Used to decide if the state information should be shared. My opinion is the same as above- use has_privs_of_role(). There are a number of other functions in pgstatfuncs.c with similar issues (eg: pg_stat_get_backend_activity(), pg_stat_get_backend_client_port(), and others). I think the case for pgstat_get_backend_current_activity() and pg_stat_get_activity and the other pgstatfuncs.c callers is easy to make and seems acceptable to me; but I would leave pg_signal_backend out of that discussion, because it has a potentially harmful side effect. By requiring SET ROLE you add an extra layer of protection against mistakes. (Hopefully, pg_signal_backend() is not a routine thing for well-run systems, which means human intervention, and therefore the room for error isn't insignificant.) -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RLS feature has been committed
On Tue, Sep 23, 2014 at 12:38 AM, David G Johnston david.g.johns...@gmail.com wrote: I'm of a mind to agree that this shouldn't have been committed...but I'm not seeing where Stephen has done sufficient wrong to justify crucifixion. Especially since everything is being done publicly and you are one of the many people in the position to flex a veto by reverting the patch. I'd be interested to see what the reaction would be if I reverted this patch out of the blue. I suspect it would not be positive. More generally, I don't want the PostgreSQL source code repository to ground zero for a revert war. Subsequent, possibly private, discussion between you and Stephen could then occur before making any conclusions you form public so that others can learn from the experience and ponder whether anything could be changed to mitigate such situations in the future. I'd be happy to discuss this with Stephen, either in person, by phone, or over public or private email. Unfortunately, although he's posted many other emails to this list over the last couple of days, he hasn't chosen to respond, publicly or privately, to my statement that he must have read the email in which I asked him to hold off committing (because he addressed technical feedback from that same email) and that he went ahead and did it anyway. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] add modulo (%) operator to pgbench
Hello Stephen, But this is not convincing. Adding a unary function with a clean syntax indeed requires doing something with the parser. Based on the discussion so far, it sounds like you're coming around to agree with Robert (as I'm leaning towards also) that we'd be better off building a real syntax here instead. Not exactly. My actual opinion is that it is really an orthogonal issue. ISTM that a similar code would be required somehow for the executor part of an hypothetical real syntax if it was to support modulo. If so, do you anticipate having a patch to do so in the next few days, or...? Developping a full expression syntax is a significant project that I'm not planing to undertake in the short or medium term, especially as I'm not convinced that it is worth it: It would involve many changes, and I'm afraid that the likelyhood of the patch being rejected on some ground is high. So my opinion is that this small modulo operator patch is both useful and harmless, so it should be committed. If someday there is a nice real syntax implemented, that will be great as well. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RLS feature has been committed
On Tue, Sep 23, 2014 at 9:09 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-09-22 21:38:17 -0700, David G Johnston wrote: Robert Haas wrote It's difficult to imagine a more flagrant violation of process than committing a patch without any warning and without even *commenting* on the fact that clear objections to commit were made on a public mailing list. If that is allowed to stand, what can we assume other than that Stephen, at least, has a blank check to change anything he wants, any time he wants, with no veto possible from anyone else? I'm of a mind to agree that this shouldn't have been committed...but I'm not seeing where Stephen has done sufficient wrong to justify crucifixion. I've not seen much in the way of 'crucifixion' before this email. And I explicitly *don't* think it's warranted. Also it's not happening. I maybe got a little carried away with my hyperbole... At this point my hindsight says a strictly declaratory statement of reasons this is not ready combined with reverting the patch would have been sufficient; or even just a I am going to revert this for these reasons post. The difference between building support for a revert and gathering a mob is a pretty thin line. The reason it's being discussed is to find a way to align the different views about when to commit stuff. The primary goal is *not* to revert the commit or anything but to make sure we're not slipping into procedures we all would regret. Which *really* can happen very easily. We're all humans and most of us have more than enough to do. So, the second option then...and I'm sorry but this should never have been committed tends to cause one to think it should therefore be reverted. Though I guess if you indeed feel that his actions were truly heinous you should also then put forth the proposal that his ability to commit be revoked. I think *you* are escalating this to something unwarranted here by the way you're painting the discussion. Not everyone who reads -hackers knows all the people involved personally. I had an initial reaction to these e-mails that I thought I would share, nothing more. I'm not going to quote the different comments that led me to my feeling that the response to this was disproportionate to the offense but after a first pass - which is all many people would do - that is what I came away with. Though you could say I fell into the very same trap by reacting off my first impression... David J.
Re: [HACKERS] tick buildfarm failure
Stephen Frost sfr...@snowman.net writes: To be a bit more clear- why is it safe to change the contents if the equal() function returns 'false'? I'm guessing the answer is locking- whereby such a change would imply a lock was acquired on the relation by another backend, which shouldn't be possible if we're currently working with it, but wanted to double-check that. Right. If that were to fail, it would be the fault of something else not having gotten sufficient lock for whatever it had been doing: either lack of exclusive lock for an RLS update, or lack of an access lock to examine the cached data. The reason we want to make the equal() check rather than just summarily replace the data is that code that *does* hold AccessShare might reasonably expect the data to not move while it's looking at it. I also wonder if we might be better off with a way to identify that nothing has actually been changed due to the locks we hold and avoid rebuilding the relation cache entry entirely.. I am entirely disinclined to reinvent relcache as part of the RLS patch. You've got enough problems. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] add modulo (%) operator to pgbench
Fabien COELHO coe...@cri.ensmp.fr writes: So my opinion is that this small modulo operator patch is both useful and harmless, so it should be committed. You've really failed to make that case --- in particular, AFAICS there is not even consensus on the exact semantics that the operator should have. So I'm inclined to reject rather than put in something that may cause surprises down the road. The usefulness doesn't seem great enough to take that risk. The way forward, if we think there is enough value in it (I'm not sure there is), would be to build enough expression infrastructure so that we could inexpensively add both operators and functions. Then we could add a modulo operator with whatever semantics seem most popular, and some function(s) for the other semantics, and there would not be so much riding on choosing the right semantics. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tick buildfarm failure
* Tom Lane (t...@sss.pgh.pa.us) wrote: Stephen Frost sfr...@snowman.net writes: To be a bit more clear- why is it safe to change the contents if the equal() function returns 'false'? I'm guessing the answer is locking- whereby such a change would imply a lock was acquired on the relation by another backend, which shouldn't be possible if we're currently working with it, but wanted to double-check that. Right. If that were to fail, it would be the fault of something else not having gotten sufficient lock for whatever it had been doing: either lack of exclusive lock for an RLS update, or lack of an access lock to examine the cached data. The reason we want to make the equal() check rather than just summarily replace the data is that code that *does* hold AccessShare might reasonably expect the data to not move while it's looking at it. Ok, great, thanks for the confirmation. Yes- the RLS code wasn't anticipating a change happening underneath it such as this (as other places didn't appear worried about same, I didn't expect to see it be an issue). No problem, I'll definitely address it. I also wonder if we might be better off with a way to identify that nothing has actually been changed due to the locks we hold and avoid rebuilding the relation cache entry entirely.. I am entirely disinclined to reinvent relcache as part of the RLS patch. Apologies- I hadn't intend to even suggest that but rather to speculate about this possibility for future improvements. Thanks again! Stephen signature.asc Description: Digital signature
Re: [HACKERS] add modulo (%) operator to pgbench
* Tom Lane (t...@sss.pgh.pa.us) wrote: Fabien COELHO coe...@cri.ensmp.fr writes: So my opinion is that this small modulo operator patch is both useful and harmless, so it should be committed. You've really failed to make that case --- in particular, AFAICS there is not even consensus on the exact semantics that the operator should have. So I'm inclined to reject rather than put in something that may cause surprises down the road. The usefulness doesn't seem great enough to take that risk. Agreed. The way forward, if we think there is enough value in it (I'm not sure there is), would be to build enough expression infrastructure so that we could inexpensively add both operators and functions. Then we could add a modulo operator with whatever semantics seem most popular, and some function(s) for the other semantics, and there would not be so much riding on choosing the right semantics. Indeed and there's plenty of time to make it happen for 9.5. Personally, I'd really like to see as I feel it'd help with the performance farm goal which has been discussed many times over. Fabien, I'd ask that you not be discouraged by this and continue to work with pgbench and work on such an improvement, if you're able to take it on with your other committments. I do see value in it and I feel it will help reproducability, a key and important aspect of performance analysis, much more so than just hacking a local copy of pgbench would. Thanks! Stephen signature.asc Description: Digital signature
[HACKERS] Refactoring code for sync node detection (was: Support for N synchronous standby servers)
On Sat, Sep 20, 2014 at 1:16 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Fri, Sep 19, 2014 at 12:18 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Sep 16, 2014 at 2:19 PM, Michael Paquier michael.paqu...@gmail.com wrote: - A patch refactoring code for pg_stat_get_wal_senders and SyncRepReleaseWaiters as there is in either case duplicated code in this area to select the synchronous node as the one connected with lowest priority A strong +1 for this idea. I have never liked that, and cleaning it up seems eminently sensible. Interestingly, the syncrep code has in some of its code paths the idea that a synchronous node is unique, while other code paths assume that there can be multiple synchronous nodes. If that's fine I think that it would be better to just make the code multiple-sync node aware, by having a single function call in walsender.c and syncrep.c that returns an integer array of WAL sender positions (WalSndCtl). as that seems more extensible long-term. Well for now the array would have a single element, being the WAL sender with lowest priority 0. Feel free to protest about that approach though :) Please find attached a patch refactoring this code. Looking once again at that I have taken the approach minimizing the footprint of refactoring on current code, by simply adding a function called SyncRepGetSynchronousNode in syncrep.c that returns to the caller a position in the WAL sender array to define the code considered as synchronous, and if no synchronous node is found. I'll add it to the next commit fest. Regards, -- Michael From 6f66020b3c5fc0866b63bb12cf12c582be14f7d0 Mon Sep 17 00:00:00 2001 From: Michael Paquier mich...@otacoo.com Date: Tue, 23 Sep 2014 22:57:00 +0900 Subject: [PATCH] Refactor code to detect synchronous node in WAL sender array This patch is made to remove code duplication in walsender.c and syncrep.c in order to detect what is the node with the lowest strictly-positive priority, facilitating maintenance of this code. --- src/backend/replication/syncrep.c | 89 ++--- src/backend/replication/walsender.c | 34 +- src/include/replication/syncrep.h | 1 + 3 files changed, 74 insertions(+), 50 deletions(-) diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c index aa54bfb..e0b1034 100644 --- a/src/backend/replication/syncrep.c +++ b/src/backend/replication/syncrep.c @@ -5,7 +5,7 @@ * Synchronous replication is new as of PostgreSQL 9.1. * * If requested, transaction commits wait until their commit LSN is - * acknowledged by the sync standby. + * acknowledged by the synchronous standby. * * This module contains the code for waiting and release of backends. * All code in this module executes on the primary. The core streaming @@ -357,6 +357,60 @@ SyncRepInitConfig(void) } } + +/* + * Obtain position of synchronous standby in the array referencing all + * the WAL senders, or -1 if no synchronous node can be found. The caller + * of this function should take a lock on SyncRepLock. + */ +int +SyncRepGetSynchronousNode(void) +{ + int sync_node = -1; + int priority = 0; + int i; + + /* Scan WAL senders and find synchronous node if any */ + for (i = 0; i max_wal_senders; i++) + { + /* Use volatile pointer to prevent code rearrangement */ + volatile WalSnd *walsnd = WalSndCtl-walsnds[i]; + + /* Process to next if not active */ + if (walsnd-pid == 0) + continue; + + /* Process to next if not streaming */ + if (walsnd-state != WALSNDSTATE_STREAMING) + continue; + + /* Process to next one if asynchronous */ + if (walsnd-sync_standby_priority == 0) + continue; + + /* Process to next one if priority conditions not satisfied */ + if (priority != 0 + priority = walsnd-sync_standby_priority) + continue; + + /* Process to next one if flush position is invalid */ + if (XLogRecPtrIsInvalid(walsnd-flush)) + continue; + + /* + * We have a potential synchronous candidate, choose it as the + * synchronous node for the time being before going through the + * other nodes listed in the WAL sender array. + */ + sync_node = i; + + /* Update priority to current value of WAL sender */ + priority = walsnd-sync_standby_priority; + } + + return sync_node; +} + /* * Update the LSNs on each queue based upon our latest state. This * implements a simple policy of first-valid-standby-releases-waiter. @@ -369,10 +423,9 @@ SyncRepReleaseWaiters(void) { volatile WalSndCtlData *walsndctl = WalSndCtl; volatile WalSnd *syncWalSnd = NULL; + int sync_node; int numwrite = 0; int numflush = 0; - int priority = 0; - int i; /* * If this WALSender is serving a standby that is not on the list of @@ -388,32 +441,14 @@ SyncRepReleaseWaiters(void) /* * We're a potential sync standby. Release waiters if we are the highest * priority standby. If there are multiple standbys with same priorities - * then we use the first
Re: [HACKERS] RLS feature has been committed
Robert, * Robert Haas (robertmh...@gmail.com) wrote: I'd be happy to discuss this with Stephen, either in person, by phone, or over public or private email. Please understand that I'm not ignoring you, the conversation, or the concern. I was asked (by other community members) to not comment on the thread and I have been trying to hold to that. Posted to the list so others are aware. I'd be more than happy to discuss it over the phone, or in person, as I offered to do. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] proposal: rounding up time value less than its unit.
On Tue, Sep 23, 2014 at 1:23 AM, Stephen Frost sfr...@snowman.net wrote: * Tom Lane (t...@sss.pgh.pa.us) wrote: Stephen Frost sfr...@snowman.net writes: To clarify- we'll simply swap from (essentially) floor() to ceil() for handling all GUC_with_unit to internal_unit conversions, document that, and note it in the release notes as a possible behavior change and move on. Worksforme. Great, thanks. I'll wait a couple days to see if anyone else wants to voice a concern. Tomonari, don't worry about updating the patch (unless you really want to) as I suspect I'll be changing around the wording anyway. No offense, please, but I suspect English isn't your first language and it'll be simpler for me if I just handle it. Thanks a lot for the bug report and discussion and I'll be sure to credit you for it in the commit. Three people have voted for making it an *error* to supply a value that needs to be rounded, instead of changing the rounding behavior. There are not three votes for any other proposal. (This may still be an improvement over the current behavior, though.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: rounding up time value less than its unit.
Robert Haas robertmh...@gmail.com writes: Three people have voted for making it an *error* to supply a value that needs to be rounded, instead of changing the rounding behavior. Votes or no votes, that's a horrible idea; it breaks the design goal that users shouldn't need to remember the precise unit size when making postgresql.conf entries. And I'm not sure what votes you're counting, anyway. People's opinions have changed as the discussion proceeded ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On Fri, Sep 19, 2014 at 7:21 AM, Amit Kapila amit.kapil...@gmail.com wrote: Specific numbers of both the configurations for which I have posted data in previous mail are as follows: Scale Factor - 800 Shared_Buffers - 12286MB (Total db size is 12288MB) Client and Thread Count = 64 buffers_touched_freelist - count of buffers that backends found touched after popping from freelist. buffers_backend_clocksweep - count of buffer allocations not satisfied from freelist buffers_alloc 1531023 buffers_backend_clocksweep 0 buffers_touched_freelist 0 I didn't believe these numbers, so I did some testing. I used the same configuration you mention here, scale factor = 800, shared_buffers = 12286 MB, and I also saw buffers_backend_clocksweep = 0. I didn't see buffers_touched_freelist showing up anywhere, so I don't know whether that would have been zero or not. Then I tried reducing the high watermark for the freelist from 2000 buffers to 25 buffers, and buffers_backend_clocksweep was *still* 0. At that point I started to smell a rat. It turns out that, with this test configuration, there's no buffer allocation going on at all. Everything fits in shared_buffers, or it did on my test. I had to reduce shared_buffers down to 10491800kB before I got any significant buffer eviction. At that level, a 100-buffer high watermark wasn't sufficient to prevent the freelist from occasionally going empty. A 2000-buffer high water mark was by and large sufficient, although I was able to see small numbers of buffers being allocated via clocksweep right at the very beginning of the test, I guess before the reclaimer really got cranking. So the watermarks seem to be broadly in the right ballpark, but I think the statistics reporting needs improving. We need an easy way to measure the amount of work that bgreclaimer is actually doing. I suggest we count these things: 1. The number of buffers the reclaimer has put back on the free list. 2. The number of times a backend has run the clocksweep. 3. The number of buffers past which the reclaimer has advanced the clock sweep (i.e. the number of buffers it had to examine in order to reclaim the number counted by #1). 4. The number of buffers past which a backend has advanced the clocksweep (i.e. the number of buffers it had to examine in order to allocate the number of buffers count by #3). 5. The number of buffers allocated from the freelist which the backend did not use because they'd been touched (what you're calling buffers_touched_freelist). It's hard to come up with good names for all of these things that are consistent with the somewhat wonky existing names. Here's an attempt: 1. bgreclaim_freelist 2. buffers_alloc_clocksweep (you've got buffers_backend_clocksweep, but I think we want to make it more parallel with buffers_alloc, which is the number of buffers allocated, not buffers_backend, the number of buffers *written* by a backend) 3. clocksweep_bgreclaim 4. clocksweep_backend 5. freelist_touched -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] proposal: rounding up time value less than its unit.
On Tue, Sep 23, 2014 at 10:29 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Three people have voted for making it an *error* to supply a value that needs to be rounded, instead of changing the rounding behavior. Votes or no votes, that's a horrible idea; it breaks the design goal that users shouldn't need to remember the precise unit size when making postgresql.conf entries. Not at all. You can still supply the value in another unit as long as it converts exactly. If it doesn't, shouldn't you care about that? And I'm not sure what votes you're counting, anyway. People's opinions have changed as the discussion proceeded ... David Johnston, Peter Eisentraut, myself. I don't see any indication that any of those three people have reversed their opinion at any point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
Hi, On 2014-09-23 10:31:24 -0400, Robert Haas wrote: I suggest we count these things: 1. The number of buffers the reclaimer has put back on the free list. 2. The number of times a backend has run the clocksweep. 3. The number of buffers past which the reclaimer has advanced the clock sweep (i.e. the number of buffers it had to examine in order to reclaim the number counted by #1). 4. The number of buffers past which a backend has advanced the clocksweep (i.e. the number of buffers it had to examine in order to allocate the number of buffers count by #3). 5. The number of buffers allocated from the freelist which the backend did not use because they'd been touched (what you're calling buffers_touched_freelist). Sounds good. It's hard to come up with good names for all of these things that are consistent with the somewhat wonky existing names. Here's an attempt: 1. bgreclaim_freelist bgreclaim_alloc_clocksweep? 2. buffers_alloc_clocksweep (you've got buffers_backend_clocksweep, but I think we want to make it more parallel with buffers_alloc, which is the number of buffers allocated, not buffers_backend, the number of buffers *written* by a backend) 3. clocksweep_bgreclaim 4. clocksweep_backend I think bgreclaim/backend should always be either be a prefix or a postfix. But not one in some variables and some in another. 5. freelist_touched I wonder if we shouldn't move all this to a new view, instead of stuffing it somewhere where it really doesn't belong. pg_stat_buffers or something like it. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On Tue, Sep 23, 2014 at 10:31 AM, Robert Haas robertmh...@gmail.com wrote: [ review ] Oh, by the way, I noticed that this patch breaks pg_buffercache. If we're going to have 128 lock partitions, we need to bump MAX_SIMUL_LWLOCKS. But this gets at another point: the way we're benchmarking this right now, we're really conflating the effects of three different things: 1. Changing the locking regimen around the freelist and clocksweep. 2. Adding a bgreclaimer process. 3. Raising the number of buffer locking partitions. I think it's pretty clear that #1 and #2 are a good idea. #3 is a mixed bag, and it might account for the regressions you saw on some test cases. Increasing the number of buffer mapping locks means that those locks take up more cache lines, which could slow things down in cases where there's no reduction in contention. It also means that the chances of an allocated buffer ending up in the same buffer mapping lock partition are 1/128 instead of 1/16, which means about 5.4 additional lwlock acquire/release cycles per 100 allocations. That's not a ton, but it's not free either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RLS feature has been committed
On Tue, Sep 23, 2014 at 2:00 PM, Andres Freund and...@anarazel.de wrote: On 2014-09-23 13:23:32 +0100, Dave Page wrote: Just to be clear here, the *only* issue we should even be discussing is whether the patch should or should not have been committed in the face of those objections. As Josh has also noted, the commitfest process was never meant to constrain what committers do or when they do it with their own patches or ones they've worked heavily on. They are there as a backstop to make sure that regardless of what the committers are doing day to day, patch authors know that their patch is expected to receive some review within N weeks. FWIW, while not really at the core of the problem here, I don't think this is entirely true anymore. I'm not aware that we've made any such changes since the process was originally developed. The fact that developers may constrain their own review/commit work to certain periods is a personal choice, not policy or requirement. We certainly seem to to expect bigger feature patches to go through the commitfest process to some degree. Just look at the discussions about *committers* patches being committed or not at each cycles last commitfest. Every single time the point in time they've been submitted to which CF plays a rather prominent role in the discussion. They should be tracked on the app certainly, but that doesn't prevent review/commits being made outside of the commitfest. Also look at committers like Robert that *do* feel constrained about when to commit or even expect review for submitted patches. Regardless of what Robert may feel, review should only generally be *expected* during a commitfest, but it can be done at any time. Committers are free to commit at any time. The process was never intended to restrict what committers do or when - in fact when I introduced the process to -hackers, it was specifically worded to say that developers are strongly encouraged to take part in the commitfest reviews, but not forced to, and may continue to work on their patches as they see fit. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RLS feature has been committed
On Tue, Sep 23, 2014 at 11:16 AM, Dave Page dp...@pgadmin.org wrote: Regardless of what Robert may feel, review should only generally be *expected* during a commitfest, but it can be done at any time. Committers are free to commit at any time. The process was never intended to restrict what committers do or when - in fact when I introduced the process to -hackers, it was specifically worded to say that developers are strongly encouraged to take part in the commitfest reviews, but not forced to, and may continue to work on their patches as they see fit. So, just to be clear, you're saying that committers are free to commit things even when other community members (who may themselves be committers) ask them not to? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RLS feature has been committed
On Tue, Sep 23, 2014 at 4:19 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Sep 23, 2014 at 11:16 AM, Dave Page dp...@pgadmin.org wrote: Regardless of what Robert may feel, review should only generally be *expected* during a commitfest, but it can be done at any time. Committers are free to commit at any time. The process was never intended to restrict what committers do or when - in fact when I introduced the process to -hackers, it was specifically worded to say that developers are strongly encouraged to take part in the commitfest reviews, but not forced to, and may continue to work on their patches as they see fit. So, just to be clear, you're saying that committers are free to commit things even when other community members (who may themselves be committers) ask them not to? At what point did I say anything like that? Do not twist my words. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: rounding up time value less than its unit.
On 09/23/2014 10:29 AM, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: Three people have voted for making it an *error* to supply a value that needs to be rounded, instead of changing the rounding behavior. Votes or no votes, that's a horrible idea; it breaks the design goal that users shouldn't need to remember the precise unit size when making postgresql.conf entries. And I'm not sure what votes you're counting, anyway. People's opinions have changed as the discussion proceeded ... I don't think I've weighed in on this, and yes, I'm very late to the party. The round away from zero suggestion seemed the simplest and most easily understood rule to me. As Tom, I think, remarked, if that seems silly because 1 second gets rounded up to 1 hour or whatever, then we've chosen the wrong units in the first place. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RLS feature has been committed
On 2014-09-23 16:16:18 +0100, Dave Page wrote: On Tue, Sep 23, 2014 at 2:00 PM, Andres Freund and...@anarazel.de wrote: On 2014-09-23 13:23:32 +0100, Dave Page wrote: Just to be clear here, the *only* issue we should even be discussing is whether the patch should or should not have been committed in the face of those objections. As Josh has also noted, the commitfest process was never meant to constrain what committers do or when they do it with their own patches or ones they've worked heavily on. They are there as a backstop to make sure that regardless of what the committers are doing day to day, patch authors know that their patch is expected to receive some review within N weeks. FWIW, while not really at the core of the problem here, I don't think this is entirely true anymore. I'm not aware that we've made any such changes since the process was originally developed. The fact that developers may constrain their own review/commit work to certain periods is a personal choice, not policy or requirement. I do think that it has widely lead to a bit more formal expectance of committers patches being reviewed by others. We certainly seem to to expect bigger feature patches to go through the commitfest process to some degree. Just look at the discussions about *committers* patches being committed or not at each cycles last commitfest. Every single time the point in time they've been submitted to which CF plays a rather prominent role in the discussion. They should be tracked on the app certainly, but that doesn't prevent review/commits being made outside of the commitfest. And I've explicitly stated that I don't believe that they should be. Also look at committers like Robert that *do* feel constrained about when to commit or even expect review for submitted patches. Regardless of what Robert may feel, review should only generally be *expected* during a commitfest, but it can be done at any time. I think you're misunderstanding my point here. I would never ever protest against *more* reviews. Stephen quoted the delay of getting review as a reason for committing the patch at the point he did. And that seems unwarranted, because the current form of the patch (which is significantly different!) was only posted in the middle of the commitfest. The reason I mentioned the commitfest is that he had couldn't expect a review in the couple of days in which the patch existed in the current form - because other reviewers were still trying to keep up (and failing!) with the stuff that was submitted to the commitest. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RLS feature has been committed
On 09/23/2014 11:19 AM, Robert Haas wrote: On Tue, Sep 23, 2014 at 11:16 AM, Dave Page dp...@pgadmin.org wrote: Regardless of what Robert may feel, review should only generally be *expected* during a commitfest, but it can be done at any time. Committers are free to commit at any time. The process was never intended to restrict what committers do or when - in fact when I introduced the process to -hackers, it was specifically worded to say that developers are strongly encouraged to take part in the commitfest reviews, but not forced to, and may continue to work on their patches as they see fit. So, just to be clear, you're saying that committers are free to commit things even when other community members (who may themselves be committers) ask them not to? I don't think anyone is suggesting that. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RLS feature has been committed
On Tue, Sep 23, 2014 at 11:23 AM, Dave Page dp...@pgadmin.org wrote: On Tue, Sep 23, 2014 at 4:19 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Sep 23, 2014 at 11:16 AM, Dave Page dp...@pgadmin.org wrote: Regardless of what Robert may feel, review should only generally be *expected* during a commitfest, but it can be done at any time. Committers are free to commit at any time. The process was never intended to restrict what committers do or when - in fact when I introduced the process to -hackers, it was specifically worded to say that developers are strongly encouraged to take part in the commitfest reviews, but not forced to, and may continue to work on their patches as they see fit. So, just to be clear, you're saying that committers are free to commit things even when other community members (who may themselves be committers) ask them not to? At what point did I say anything like that? Do not twist my words. Well, if that's not what you're saying, then good, because I sure as heck don't think that. I think it's the role of a committer to commit things that are generally agreed to be good ideas, not things that that particular committer personally thinks are a good idea regardless of the opinions of others. The committer is entitled to weigh their own opinion more heavily than the opinions of others because, hey, that's why they have the commit bit and the other people don't. But they're not entitled to run roughshod over contrary opinions. Now, the way that CommitFests come into it is that people can't do two things at once. It's completely reasonable, in my opinion, for me to say, hey look, I really want to review your patch but I don't have time to do that right now because we're in the middle of a CommitFest; please therefore submit it to the next CommitFest. And if I say that, the person should either (1) do it or (2) disagree with the necessity of doing it not (3) commit anyway. Whether you've been paying attention or not, most committer patches ARE submitted to CommitFest and go through the exact same review process as patches from non-committers. That's a good thing. It contributes to our delivery of quality software, and it reduces arguments about whether something was rushed through. There are appropriate times to leapfrog that process, when it's clear that the patch is uncontroversial and that delay is merely dithering. But this is not one of those times. It's a giant patch implementing a complex feature for which more review time was explicitly requested. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: rounding up time value less than its unit.
On Tue, Sep 23, 2014 at 1:30 AM, Tom Lane t...@sss.pgh.pa.us wrote: David Johnston david.g.johns...@gmail.com writes: My original concern was things that are rounded to zero now will not be in 9.5 if the non-error solution is chosen. The risk profile is extremely small but it is not theoretically zero. This is exactly the position I was characterizing as an excessively narrow-minded attachment to backwards compatibility. We are trying to make the behavior better (as in less confusing), not guarantee that it's exactly the same. I am going to assume that the feature designers were focused on wanting to avoid: 1000 * 60 * 5 to get a 5-minute value set on a millisecond unit parameter. The designer of the variable, in choosing a unit, has specified the minimum value that they consider sane. Attempting to record an insane value should throw an error. I do not support throwing an error on all attempts to round but specifying a value less than 1 in the variable's unit should not be allowed. If such a value is proposed the user either made an error OR they misunderstand the variable they are using. In either case telling them of their error is more friendly than allowing them to discover the problem on their own. If we are only allowed to change the behavior by throwing errors in cases where we previously didn't, then we are voluntarily donning a straitjacket that will pretty much ensure Postgres doesn't improve any further. I'm not proposing project-level policy here. David J.
Re: [HACKERS] proposal: rounding up time value less than its unit.
On 09/23/2014 06:23 PM, Andrew Dunstan wrote: On 09/23/2014 10:29 AM, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: Three people have voted for making it an *error* to supply a value that needs to be rounded, instead of changing the rounding behavior. Votes or no votes, that's a horrible idea; it breaks the design goal that users shouldn't need to remember the precise unit size when making postgresql.conf entries. And I'm not sure what votes you're counting, anyway. People's opinions have changed as the discussion proceeded ... I don't think I've weighed in on this, and yes, I'm very late to the party. The round away from zero suggestion seemed the simplest and most easily understood rule to me. +1, rounding up seems better to me as well. Although throwing an error wouldn't be that bad either. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] JsonbValue to Jsonb conversion
Hi all, I'm faced with some troubles about the jsonb implementation, and I hope I'll get little advice =) If I understand correctly, an abstract function for jsonb modification should have the following stages: Jsonb - JsonbValue - Modification - JsonbValue - Jsonb One can convert the *JsonbValue* to the *Jsonb* only by *JsonbValueToJsonb* function. So, my question is can be *JsonbValue*, that contains few *jbvBinary* elements, converted to *Jsonb* by this function? It will be very useful, if you want modify only small part of your JsonbValue (e.g. replace value of some key). But when I'm trying to do this, an exception unknown type of jsonb container appears. Maybe I missed something? Or is there another approach to do this conversion?
Re: [HACKERS] Hide 'Execution time' in EXPLAIN (COSTS OFF)
On Sat, Sep 20, 2014 at 4:13 PM, Christoph Berg c...@df7cb.de wrote: there's another problem in this area: 9.4 adds Planning time to the EXPLAIN output. If you don't want to see that, you need to use (costs off), but this, well, also disables the costs. If you are running regression tests to actually test the costs, you've lost in 9.4. This problem just emerged in the Multicorn FDW where the regression tests were monitoring the costs, but in 9.4 (costs off) kills that. https://github.com/Kozea/Multicorn/pull/7 Can we have EXPLAIN (timing off) in 9.4+ hide the Planning time line? That would even be backwards compatible with 9.x where it would be a no-op. I don't think that'll work becuase: /* check that timing is used with EXPLAIN ANALYZE */ if (es.timing !es.analyze) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg(EXPLAIN option TIMING requires ANALYZE))); -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
On 09/15/2014 05:25 PM, Kevin Grittner wrote: Tom Lane t...@sss.pgh.pa.us wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: On 08/30/2014 12:15 AM, Kevin Grittner wrote: If we were to go with the hooks as you propose, we would still need to take the information from TriggerData and put it somewhere else for the hook to reference. Sure. FWIW, I agree with Heikki on this point. It makes a lot more sense for the parser to provide hooks comparable to the existing hooks for resolving column refs, and it's not apparent that loading such functionality into SPI is sane at all. OTOH, I agree with Kevin that the things we're talking about are lightweight relations not variables. Try as I might, I was unable to find any sensible way to use hooks. If the issue was only the parsing, the route was fairly obvious, but the execution side needs to access the correct tuplestore(s) for each run, too -- so the sort of information provided by relcache needed to be passed in to based on the context within the process (i.e., if you have nested triggers firing, each needs to use a different tuplestore with the same name in the same function, even though it's using the same plan). On both sides it seemed easier to pass things through the same sort of techniques as normal parameters; I couldn't find a way to use hooks that didn't just make things uglier. Hmph. You didn't actually use the same sort of techniques we use for normal parameters. You invented a new TsrCache registry, which marries the metadata for planning with the tuplestore itself. That's quite different from the way we deal with parameters (TsrCache is a misnomer, BTW; it's not a cache, but the primary source of information for the planner). And instead of passing parameters to the SPI calls individually, you invented SPI_register_tuplestore which affects all subsequent SPI calls. To recap, this is how normal parameters work: In the parse stage, you pass a ParserSetupHook function pointer to the parser. The parser calls the callback, which sets up more hooks in the ParseState struct: a column-ref hook and/or a param ref hook. The parser then proceeds to parse the query, and whenever it sees a reference to a column that it doesn't recognize, or a $n style parameter marker, it calls the column-ref or param-ref hook. The column- or param-ref hook can return a Param node, indicating that the parameter's value will be supplied later, at execution time. The Param node contains a numeric ID for the parameter, and the type OID and other information needed to complete the parsing. At execution time, you pass a ParamListInfo struct to the executor. It contains values for all the parameters. Alternatively, the values can be supplied lazily, by providing a param-fetch hook in the ParamListInfo struct. Whenever the executor needs the value of a parameter, and the ParamListInfo struct doesn't contain it, it calls the paramFetch hook which should fill it in ParamListInfo. Now, how do we make the tuplestores work similarly? Here's what I think we should do: Add a new p_tableref_hook function pointer, similar to p_paramref_hook. Whenever the parser sees a RangeVar that it doesn't recognize (or actually, I think it should call it *before* resolving regular tables, but let's ignore that for now), it calls the p_tableref_hook. It can return a new RelationParam node (similar to regular Param), which contains a numeric ID for the table/tuplestore, as well as its tuple descriptor. For the execution phase, add a new array of Tuplestorestates to ParamListInfo. Similar to the existing array of ParamExternalDatas. The next question is how to pass the new hooks and tuplestores through the SPI interface. For prepared plans, the current SPI_prepare_params + SPI_execute_plan_with_paramlist functions work fine. However, there doesn't seem to be any way to do one-shot queries with a ParserSetupHook and ParamListInfo. That seems like an oversight, and would be nice to address that anyway. PS. the copy/read/write functions for TuplestoreRelation in the patch are broken; TuplestoreRelation is not a subclass of Plan. (But if you go with the approach I'm advocating for, TuplestoreRelation in its current form will be gone) - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
On Tue, Sep 23, 2014 at 12:46 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 09/15/2014 05:25 PM, Kevin Grittner wrote: Now, how do we make the tuplestores work similarly? Here's what I think we should do: Add a new p_tableref_hook function pointer, similar to p_paramref_hook. Whenever the parser sees a RangeVar that it doesn't recognize (or actually, I think it should call it *before* resolving regular tables, but let's ignore that for now), it calls the p_tableref_hook. It can return a new RelationParam node (similar to regular Param), which contains a numeric ID for the table/tuplestore, as well as its tuple descriptor. For the execution phase, add a new array of Tuplestorestates to ParamListInfo. Similar to the existing array of ParamExternalDatas. I haven't been following this issue closely, but this sounds like a really nice design. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
* Ants Aasma: CRC has exactly one hardware implementation in general purpose CPU's I'm pretty sure that's not true. Many general purpose CPUs have CRC circuity, and there must be some which also expose them as instructions. and Intel has a patent on the techniques they used to implement it. The fact that AMD hasn't yet implemented this instruction shows that this patent is non-trivial to work around. I think you're jumping to conclusions. Intel and AMD have various cross-licensing deals. AMD faces other constraints which can make implementing the instruction difficult. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hide 'Execution time' in EXPLAIN (COSTS OFF)
Robert Haas robertmh...@gmail.com writes: On Sat, Sep 20, 2014 at 4:13 PM, Christoph Berg c...@df7cb.de wrote: Can we have EXPLAIN (timing off) in 9.4+ hide the Planning time line? That would even be backwards compatible with 9.x where it would be a no-op. I don't think that'll work becuase: /* check that timing is used with EXPLAIN ANALYZE */ if (es.timing !es.analyze) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg(EXPLAIN option TIMING requires ANALYZE))); It looks to me like that would complain about EXPLAIN (TIMING ON), not the case Christoph is suggesting. What he proposes seems a bit odd and non-orthogonal, but we could make the code do it if we wanted. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
Robert Haas robertmh...@gmail.com writes: On Tue, Sep 23, 2014 at 12:46 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Now, how do we make the tuplestores work similarly? Here's what I think we should do: Add a new p_tableref_hook function pointer, similar to p_paramref_hook. Whenever the parser sees a RangeVar that it doesn't recognize (or actually, I think it should call it *before* resolving regular tables, but let's ignore that for now), it calls the p_tableref_hook. It can return a new RelationParam node (similar to regular Param), which contains a numeric ID for the table/tuplestore, as well as its tuple descriptor. For the execution phase, add a new array of Tuplestorestates to ParamListInfo. Similar to the existing array of ParamExternalDatas. I haven't been following this issue closely, but this sounds like a really nice design. I'm on board with the parser hooks part of that. I don't especially agree with the idea of a new sub-structure for ParamListInfo: if you do that you will need a whole bunch of new boilerplate infrastructure to allocate, copy, and generally manage that structure, for darn little gain. What I'd suggest is just saying that some Params might have type INTERNAL with Datum values that are pointers to tuplestores; then all you need to do is remember which Param number has been assigned to the particular tuplestore you want. There is already precedent for that in the recursive CTE code, IIRC. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hide 'Execution time' in EXPLAIN (COSTS OFF)
On Tue, Sep 23, 2014 at 1:32 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Sat, Sep 20, 2014 at 4:13 PM, Christoph Berg c...@df7cb.de wrote: Can we have EXPLAIN (timing off) in 9.4+ hide the Planning time line? That would even be backwards compatible with 9.x where it would be a no-op. I don't think that'll work becuase: /* check that timing is used with EXPLAIN ANALYZE */ if (es.timing !es.analyze) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg(EXPLAIN option TIMING requires ANALYZE))); It looks to me like that would complain about EXPLAIN (TIMING ON), not the case Christoph is suggesting. What he proposes seems a bit odd and non-orthogonal, but we could make the code do it if we wanted. Ah, right. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] add modulo (%) operator to pgbench
So my opinion is that this small modulo operator patch is both useful and harmless, so it should be committed. You've really failed to make that case --- in particular, AFAICS there is not even consensus on the exact semantics that the operator should have. There is. Basically whatever with a positive remainder when the divisor is positive is fine. The only one to avoid is the dividend signed version, which happen to be the one of C and SQL, a very unfortunate choice in both case as soon as you have negative numbers. So I'm inclined to reject rather than put in something that may cause surprises down the road. The usefulness doesn't seem great enough to take that risk. If you reject it, you can also remove the gaussian and exponential random distribution which is near useless without a mean to add a minimal pseudo-random stage, for which (x * something) % size is a reasonable approximation, hence the modulo submission. The way forward, if we think there is enough value in it (I'm not sure there is), would be to build enough expression infrastructure so that we could inexpensively add both operators and functions. The modulo patch is basically 10 lines of code, I would not call that expensive... As I explained earlier, it would NOT be any different with an expression infrastructure, as you would have to add a lex for the operator, then a yacc rule for the construction, the operator would need to be represented in a data structure, and the executor should be able to handle the case including errors... there is no way that this would be less that 10 lines of code. It would basically include the very same lines for the executor part. Then we could add a modulo operator with whatever semantics seem most popular, and some function(s) for the other semantics, and there would not be so much riding on choosing the right semantics. The semantics is clear. I just choose the wrong one in the first patch:-) -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Replication identifiers, take 3
Hi, I've previously started two threads about replication identifiers. Check http://archives.postgresql.org/message-id/20131114172632.GE7522%40alap2.anarazel.de and http://archives.postgresql.org/message-id/20131211153833.GB25227%40awork2.anarazel.de . The've also been discussed in the course of another thread: http://archives.postgresql.org/message-id/20140617165011.GA3115%40awork2.anarazel.de As the topic has garnered some heat and confusion I thought it'd be worthwile to start afresh with an explanation why I think they're useful. I don't really want to discuss about implementation specifics for now, but rather about (details of the) concept. Once we've hashed those out, I'll adapt the existing patch to match them. There are three primary use cases for replication identifiers: 1) The ability Monitor how for replication has progressed in a crashsafe manner to allow it to restart at the right point after errors/crashes. 2) Efficiently identify the origin of individual changes and transactions. In multimaster and some cascading scenarios it is necessary to do so to avoid sending out replayed changes again. 3) Allow to efficiently filter out replayed changes from logical decoding. It's currently possible to filter changes from inside the output plugin, but it's much more efficient to filter them out before decoding. == Logical Decoding Background == To understand the need for 1) it's important to roughly understand how logical decoding/walsender streams changes and handles feedback from the receiving side. A walsender performing logical decoding *continously* sends out transactions. As long as there's new local changes (i.e. unprocessed WAL) and the network buffers aren't full it will send changes. *Without* waiting for the client. Everything else would lead to horrible latency. Because it sends data without waiting for the client to have processed them it obviously can't remove resources that are needed to stream them out again. The client or network connection could crash after all. To let the sender know when it can remove resources the receiver regularly sends back 'feedback' messages acknowledging up to where changes have been safely received. Whenever such a feedback message arrives the sender can release resources that aren't needed to decode the changes below that horizon. When the receiver ask the server to stream changes out it tells the sender at which LSN it should start sending changes. All *transactions* that *commit* after that LSN are sent out. Possibly again. == Crashsafe apply == Based on those explanations, when building a logical replication solution on top of logical decoding, one must remember the latest *remote* LSN that already has been replayed. So that, when the apply process or the whole database crashes, it is possibly to ask for all changes since the last transaction that has been successfully applied. The trivial solution here is to have a table (remote_node, last_replayed_lsn) and update it for every replayed transaction. Unfortunately that doesn't perform very well because that table quickly gets heavily bloated. It's also hard to avoid page level contention when replaying transaction from multiple remote nodes. Additionally these changes have to be filtered out when replicating these changes in a cascading fashion. To do this more efficiently there needs to be a crashsafe way to associate the latest successfully replayed remote transaction. == Identify the origin of changes == Say you're building a replication solution that allows two nodes to insert into the same table on two nodes. Ignoring conflict resolution and similar fun, one needs to prevent the same change being replayed over and over. In logical replication the changes to the heap have to be WAL logged, and thus the *replay* of changes from a remote node produce WAL which then will be decoded again. To avoid that it's very useful to tag individual changes/transactions with their 'origin'. I.e. mark changes that have been directly triggered by the user sending SQL as originating 'locally' and changes originating from replaying another node's changes as originating somewhere else. If that origin is exposed to logical decoding output plugins they can easily check whether to stream out the changes/transactions or not. It is possible to do this by adding extra columns to every table and store the origin of a row in there, but that a) permanently needs storage b) makes things much more invasive. == Proposed solution == These two fundamental problems happen to have overlapping requirements. A rather efficient solution for 1) is to attach the 'origin node' and the remote commit LSN to every local commit record produced by replay. That allows to have a shared memory table (remote_node, local_lsn, remote_lsn). During replay that table is kept up2date in sync with transaction commits. If updated within the transaction commit's critical section it's guaranteed to be correct,
Re: [HACKERS] RLS feature has been committed
On 9/23/14, 9:00 AM, Andres Freund wrote: I also think committers need to be much more careful when committing patches which they (or their employer) appear to have a business interest in. Rushing ahead to commit the patch of somebody 'unrelated' leaves a completely different taste than committing your colleagues patch. *INDEPENDENT* of the actual reasons and the state of the patch. I haven't been doing much personal development work around here lately, but I did more than a little of the planning on how to handle this as responsibly as I felt it deserved. I think this is worth talking about a little bit because this whole topic, how to deal with a funded project where the committer is also a contributor, isn't something a lot of people have visibility into. (Not talking about you, Andres, you know the deal) This commit didn't happen until I first succeeded in getting Stephen Frost fully funded as a community PostgreSQL contributor (for this job, as well as others with a broader community appeal). Everyone attached to the budget side very explicitly understands that includes an open-ended responsibility to the community for addressing fall-out from RLS, going from unforeseen issues to revisiting the things left on the known open items list. It's hard to reach perfect before commit; eventually you just have to just shoot and see what happens. Just to be really safe, I also kicked off training a whole new PostgreSQL contributor (Adam Brightwell) five months ago, so that by the time we got to here he's also making his own contributions to the security code of the database. And that's included exercises tracking down bugs in the early RLS POC deployments already going on here at Crunchy, so that Stephen isn't the sole Postgres community expertise bottleneck on the code even at this company. (I am NOT on the hook for fixing RLS bugs) The decision on when to commit was strictly Stephen's. During our last chat, we agreed this seemed like an ideal time of year in the development cycle for such a thing to land though. 9.5 is a fresh branch, and there is plenty of time to clean up and/or revert if that's the unfortunate end for anything going in today. Plus the ongoing CommitFest schedule means everyone in the community already is arranging to provide review resources needed in the near future pipeline. I can understand that Robert feels a procedural sting and/or slight due to the exact sequence of events here, and I'm staying out of that. But in general, I don't know what we could have done much better to be a responsible contributor, to finally push this long in development feature over the finish line. A description like rushing ahead would feel inappropriate to me for this one, given how much care went into timing even roughly when this particular commit should happen. The time of year in particular was very specifically aimed at for minimal disruption, based on both major version release dates and the related community development cycle around them. -- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BRIN indexes - TRAP: BadArgument
Alvaro Herrera wrote: Heikki Linnakangas wrote: If you add a new datatype, and define b-tree operators for it, what is required to create a minmax opclass for it? Would it be possible to generalize the functions in brin_minmax.c so that they can be reused for any datatype (with b-tree operators) without writing any new C code? I think we're almost there; the only thing that differs between each data type is the opcinfo function. Let's pass the type OID as argument to the opcinfo function. You could then have just a single minmax_opcinfo function, instead of the macro to generate a separate function for each built-in datatype. Yeah, that's how I had that initially. I changed it to what it's now as part of a plan to enable building cross-type opclasses, so you could have WHERE int8col=42 without requiring a cast of the constant to type int8. This might have been a thinko, because AFAICS it's possible to build them with a constant opcinfo as well (I changed several other things to support this, as described in a previous email.) I will look into this later. I found out that we don't really throw errors in such cases anymore; we insert casts instead. Maybe there's a performance argument that it might be better to use existing cross-type operators than casting, but justifying this work just turned a lot harder. Here's a patch that reverts opcinfo into a generic function that receives the type OID. I will look into adding some testing mechanism for the union support proc; with that I will just consider the patch ready for commit and will push. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services *** a/src/backend/access/brin/brin.c --- b/src/backend/access/brin/brin.c *** *** 886,894 brin_build_desc(Relation rel) opcInfoFn = index_getprocinfo(rel, keyno + 1, BRIN_PROCNUM_OPCINFO); - /* actually FunctionCall0 but we don't have that */ opcinfo[keyno] = (BrinOpcInfo *) ! DatumGetPointer(FunctionCall1(opcInfoFn, InvalidOid)); totalstored += opcinfo[keyno]-oi_nstored; } --- 886,894 opcInfoFn = index_getprocinfo(rel, keyno + 1, BRIN_PROCNUM_OPCINFO); opcinfo[keyno] = (BrinOpcInfo *) ! DatumGetPointer(FunctionCall1(opcInfoFn, ! tupdesc-attrs[keyno]-atttypid)); totalstored += opcinfo[keyno]-oi_nstored; } *** a/src/backend/access/brin/brin_minmax.c --- b/src/backend/access/brin/brin_minmax.c *** *** 50,87 typedef struct MinmaxOpaque bool inited[MINMAX_NUM_PROCNUMS]; } MinmaxOpaque; ! #define OPCINFO(typname, typoid) \ ! PG_FUNCTION_INFO_V1(minmaxOpcInfo_##typname);\ ! Datum \ ! minmaxOpcInfo_##typname(PG_FUNCTION_ARGS) \ ! { \ ! BrinOpcInfo *result; \ ! \ ! /* \ ! * opaque-operators is initialized lazily, as indicated by 'inited' \ ! * which is initialized to all false by palloc0.\ ! */ \ ! \ ! result = palloc0(MAXALIGN(SizeofBrinOpcInfo(2)) + \ ! sizeof(MinmaxOpaque)); \ ! result-oi_nstored = 2; \ ! result-oi_opaque = (MinmaxOpaque *) \ ! MAXALIGN((char *) result + SizeofBrinOpcInfo(2)); \ ! result-oi_typids[0] = typoid; \ ! result-oi_typids[1] = typoid; \ ! \ ! PG_RETURN_POINTER(result);\ ! } ! OPCINFO(int4, INT4OID) ! OPCINFO(numeric, NUMERICOID) ! OPCINFO(text, TEXTOID) ! OPCINFO(time, TIMEOID) ! OPCINFO(timetz, TIMETZOID) ! OPCINFO(timestamp, TIMESTAMPOID) ! OPCINFO(timestamptz, TIMESTAMPTZOID) ! OPCINFO(date, DATEOID) ! OPCINFO(char, CHAROID) /* * Examine the given index tuple (which contains partial status of a certain --- 50,77 bool inited[MINMAX_NUM_PROCNUMS]; } MinmaxOpaque; ! PG_FUNCTION_INFO_V1(minmaxOpcInfo); ! Datum ! minmaxOpcInfo(PG_FUNCTION_ARGS) ! { ! Oid typoid = PG_GETARG_OID(0); ! BrinOpcInfo *result; ! ! /* ! * opaque-operators is initialized lazily, as indicated by 'inited' ! * which is initialized to all false by palloc0. ! */ ! result = palloc0(MAXALIGN(SizeofBrinOpcInfo(2)) + ! sizeof(MinmaxOpaque)); ! result-oi_nstored = 2; ! result-oi_opaque = (MinmaxOpaque *) ! MAXALIGN((char *) result + SizeofBrinOpcInfo(2)); ! result-oi_typids[0] = typoid; ! result-oi_typids[1] = typoid; ! ! PG_RETURN_POINTER(result); ! } /* * Examine the given index tuple (which contains partial status of a certain *** a/src/include/catalog/pg_amproc.h --- b/src/include/catalog/pg_amproc.h *** *** 436,514 DATA(insert ( 4017 25 25 5 4031 )); DATA(insert ( 4054 23 23 1 3383 )); DATA(insert ( 4054 23 23 2 3384 )); DATA(insert ( 4054 23 23 3 3385 )); ! DATA(insert ( 4054 23 23 4 3394 )); DATA(insert ( 4054 23 23 5 66 ));
Re: [HACKERS] proposal: rounding up time value less than its unit.
Fwiw I agree with TL2. The simplest, least surprising behaviour to explain to users is to say we round to nearest and if that means we rounded to zero (or another special value) we throw an error. We could list the minimum value in pg_settings and maybe in documentation. -- greg
Re: [HACKERS] add modulo (%) operator to pgbench
On 09/23/2014 09:15 PM, Fabien COELHO wrote: So I'm inclined to reject rather than put in something that may cause surprises down the road. The usefulness doesn't seem great enough to take that risk. Marked as Returned with feedback. If you reject it, you can also remove the gaussian and exponential random distribution which is near useless without a mean to add a minimal pseudo-random stage, for which (x * something) % size is a reasonable approximation, hence the modulo submission. I'm confused. The gaussian and exponential patch was already committed. Are you saying that it's not actually useful, and should be reverted? That doesn't make sense to me, gaussian and exponential distributed values seems quite useful to me in test scripts. I don't understand what that pseudo-random stage you're talking about is. Can you elaborate? - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
On 09/23/2014 08:51 PM, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Sep 23, 2014 at 12:46 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Now, how do we make the tuplestores work similarly? Here's what I think we should do: Add a new p_tableref_hook function pointer, similar to p_paramref_hook. Whenever the parser sees a RangeVar that it doesn't recognize (or actually, I think it should call it *before* resolving regular tables, but let's ignore that for now), it calls the p_tableref_hook. It can return a new RelationParam node (similar to regular Param), which contains a numeric ID for the table/tuplestore, as well as its tuple descriptor. For the execution phase, add a new array of Tuplestorestates to ParamListInfo. Similar to the existing array of ParamExternalDatas. I haven't been following this issue closely, but this sounds like a really nice design. I'm on board with the parser hooks part of that. I don't especially agree with the idea of a new sub-structure for ParamListInfo: if you do that you will need a whole bunch of new boilerplate infrastructure to allocate, copy, and generally manage that structure, for darn little gain. What I'd suggest is just saying that some Params might have type INTERNAL with Datum values that are pointers to tuplestores; then all you need to do is remember which Param number has been assigned to the particular tuplestore you want. There is already precedent for that in the recursive CTE code, IIRC. Works for me. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RLS feature has been committed
On 09/22/2014 08:23 PM, Amit Kapila wrote: Who decides if the patch is adequately reviewed? Author, Committer or Reviewer? In CF, that is comparatively clear that once Reviewer is satisfied, he marks the patch as Ready For Committer and then Committer picks up and if he is satisfied with the review and quality of patch, then he commits the patch or if the Committer himself is reviewer than in many cases once he is satisfied, he commits it. Well, outside of the CF process, it becomes up to the committer to get adequate review of the patch so it can be committed, and adequate review in one of those cases generally means another committer who didn't work on the patch previously. It's also standard for our committers, up to and including Tom Lane, to say things like I'm going to commit this in 24 hours if nobody objects further. Now in this case, if I understand correctly the story is not so straightforward. It seems to me Robert as the Reviewer was not completely satisfied where as Stephen as the Committer was pretty much okay with the patch so he went ahead and commits it. That's certainly what it looks like to me, and on that basis Stephen should have held back the patch until he got reviewer OK. Fortunately, since we use Git and not CVS, reverting patches isn't the trauma it once was. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hide 'Execution time' in EXPLAIN (COSTS OFF)
Re: Tom Lane 2014-09-23 15155.1411493...@sss.pgh.pa.us Robert Haas robertmh...@gmail.com writes: On Sat, Sep 20, 2014 at 4:13 PM, Christoph Berg c...@df7cb.de wrote: Can we have EXPLAIN (timing off) in 9.4+ hide the Planning time line? That would even be backwards compatible with 9.x where it would be a no-op. I don't think that'll work becuase: /* check that timing is used with EXPLAIN ANALYZE */ if (es.timing !es.analyze) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg(EXPLAIN option TIMING requires ANALYZE))); It looks to me like that would complain about EXPLAIN (TIMING ON), not the case Christoph is suggesting. What he proposes seems a bit odd and non-orthogonal, but we could make the code do it if we wanted. I don't think this warrants a new flag, and TIMING OFF seems to be the right naming for it. (In fact it was the first I tried, and I was cursing quite a bit over the lack of configurability until I realized that COSTS OFF disabled the planning time display as well.) It might be a bit odd, but it's easy to remember. Christoph -- c...@df7cb.de | http://www.df7cb.de/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On Tue, Sep 23, 2014 at 10:55 AM, Robert Haas robertmh...@gmail.com wrote: But this gets at another point: the way we're benchmarking this right now, we're really conflating the effects of three different things: 1. Changing the locking regimen around the freelist and clocksweep. 2. Adding a bgreclaimer process. 3. Raising the number of buffer locking partitions. I did some more experimentation on this. Attached is a patch that JUST does #1, and, as previously suggested, it uses a single spinlock instead of using two of them that are probably in the same cacheline. Without this patch, on a 32-client, read-only pgbench at scale factor 1000 and shared_buffers=8GB, perf -e cs -g -a has this to say about LWLock-inspired preemption: - LWLockAcquire + 68.41% ReadBuffer_common + 31.59% StrategyGetBuffer With the patch, unsurprisingly, StrategyGetBuffer disappears and the only lwlocks that are causing context-switches are the individual buffer locks. But are we suffering spinlock contention instead as a result? Yes, but not much. s_lock is at 0.41% in the corresponding profile, and only 6.83% of those calls are from the patched StrategyGetBuffer. In a similar profile of master, s_lock is at 0.31%. So there's a bit of additional s_lock contention, but I think it's considerably less than the contention over the lwlock it's replacing, because the spinlock is only held for the minimal amount of time needed, whereas the lwlock could be held across taking and releasing many individual buffer locks. TPS results are a little higher with the patch - these are alternating 5 minute runs: master tps = 176010.647944 (including connections establishing) master tps = 176615.291149 (including connections establishing) master tps = 175648.370487 (including connections establishing) reduce-replacement-locking tps = 177888.734320 (including connections establishing) reduce-replacement-locking tps = 177797.842410 (including connections establishing) reduce-replacement-locking tps = 177894.822656 (including connections establishing) The picture is similar at 64 clients, but the benefit is a little more: master tps = 179037.231597 (including connections establishing) master tps = 180500.937068 (including connections establishing) master tps = 181565.706514 (including connections establishing) reduce-replacement-locking tps = 185741.503425 (including connections establishing) reduce-replacement-locking tps = 188598.728062 (including connections establishing) reduce-replacement-locking tps = 187340.977277 (including connections establishing) What's interesting is that I can't see in the perf output any real sign that the buffer mapping locks are slowing things down, but they clearly are, because when I take this patch and also boost NUM_BUFFER_PARTITIONS to 128, the performance goes up: reduce-replacement-locking-128 tps = 251001.812843 (including connections establishing) reduce-replacement-locking-128 tps = 247368.925927 (including connections establishing) reduce-replacement-locking-128 tps = 250775.304177 (including connections establishing) The performance also goes up if I do that on master, but the effect is substantially less: master-128 tps = 219301.492902 (including connections establishing) master-128 tps = 219786.249076 (including connections establishing) master-128 tps = 219821.220271 (including connections establishing) I think this shows pretty clearly that, even without the bgreclaimer, there's a lot of merit in getting rid of BufFreelistLock and using a spinlock held for the absolutely minimal number of instructions instead. There's already some benefit without doing anything about the buffer mapping locks, and we'll get a lot more benefit once that issue is addressed. I think we need to do some serious study to see whether bgreclaimer is even necessary, because it looks like this change alone, which is much simpler, takes a huge bite out of the scalability problem. I welcome further testing and comments, but my current inclination is to go ahead and push the attached patch. To my knowledge, nobody has at any point objected to this aspect of what's being proposed, and it looks safe to me and seems to be a clear win. We can then deal with the other issues on their own merits. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company commit df4077cda2eae3eb4a5cf387da0c1e7616e73204 Author: Robert Haas rh...@postgresql.org Date: Mon Sep 22 16:42:14 2014 -0400 Remove volatile qualifiers from lwlock.c. Now that spinlocks (hopefully!) act as compiler barriers, as of commit 0709b7ee72e4bc71ad07b7120acd117265ab51d0, this should be safe. This serves as a demonstration of the new coding style, and may be optimized better on some machines as well. diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c index 7c96da5..66fb2e4 100644 --- a/src/backend/storage/lmgr/lwlock.c +++
Re: [HACKERS] RLS feature has been committed
All, Robert and I had a great discussion on the phone where we were both able to voice our concerns and feelings about RLS being pushed. By way of summary, we agree that it was pushed ahead of its time and that it should have waited for at least another review, which likely would have happened even prior to the October commitfest. I've apologized for jumping the gun on pushing it, it wasn't the best move for the project as a whole and I realize that now. Having other committers review and +1 a large patch before moving forward with it is important, even if there is already consensus on the overall design, as it will help minimize the bugs and issues in PostgreSQL. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] proposal: rounding up time value less than its unit.
On Tue, Sep 23, 2014 at 3:05 PM, Greg Stark st...@mit.edu wrote: Fwiw I agree with TL2. The simplest, least surprising behaviour to explain to users is to say we round to nearest and if that means we rounded to zero (or another special value) we throw an error. We could list the minimum value in pg_settings and maybe in documentation. I'm not sure TL2 would agree that you are agreeing with him... To the other point the minimum unit-less value is 1. The unit that is applied is already listed in pg_settings and the documentation. While 0 is allowed it is more of a flag than a value. David J.
Re: [HACKERS] Scaling shared buffer eviction
On Tue, Sep 23, 2014 at 4:29 PM, Robert Haas robertmh...@gmail.com wrote: I did some more experimentation on this. Attached is a patch that JUST does #1, and, ... ...and that was the wrong patch. Thanks to Heikki for point that out. Second try. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/src/backend/storage/buffer/README b/src/backend/storage/buffer/README index 1fd38d0..a4ebbcc 100644 --- a/src/backend/storage/buffer/README +++ b/src/backend/storage/buffer/README @@ -125,14 +125,12 @@ bits of the tag's hash value. The rules stated above apply to each partition independently. If it is necessary to lock more than one partition at a time, they must be locked in partition-number order to avoid risk of deadlock. -* A separate system-wide LWLock, the BufFreelistLock, provides mutual +* A separate system-wide spinlock, buffer_strategy_lock, provides mutual exclusion for operations that access the buffer free list or select -buffers for replacement. This is always taken in exclusive mode since -there are no read-only operations on those data structures. The buffer -management policy is designed so that BufFreelistLock need not be taken -except in paths that will require I/O, and thus will be slow anyway. -(Details appear below.) It is never necessary to hold the BufMappingLock -and the BufFreelistLock at the same time. +buffers for replacement. A spinlock is used here rather than a lightweight +lock for efficiency; no other locks of any sort should be acquired while +buffer_strategy_lock is held. This is essential to allow buffer replacement +to happen in multiple backends with reasonable concurrency. * Each buffer header contains a spinlock that must be taken when examining or changing fields of that buffer header. This allows operations such as @@ -165,7 +163,7 @@ consider their pages unlikely to be needed soon; however, the current algorithm never does that. The list is singly-linked using fields in the buffer headers; we maintain head and tail pointers in global variables. (Note: although the list links are in the buffer headers, they are -considered to be protected by the BufFreelistLock, not the buffer-header +considered to be protected by the buffer_strategy_lock, not the buffer-header spinlocks.) To choose a victim buffer to recycle when there are no free buffers available, we use a simple clock-sweep algorithm, which avoids the need to take system-wide locks during common operations. It works like @@ -178,25 +176,26 @@ buffer reference count, so it's nearly free.) The clock hand is a buffer index, nextVictimBuffer, that moves circularly through all the available buffers. nextVictimBuffer is protected by the -BufFreelistLock. +buffer_strategy_lock. The algorithm for a process that needs to obtain a victim buffer is: -1. Obtain BufFreelistLock. +1. Obtain buffer_strategy_lock. -2. If buffer free list is nonempty, remove its head buffer. If the buffer -is pinned or has a nonzero usage count, it cannot be used; ignore it and -return to the start of step 2. Otherwise, pin the buffer, release -BufFreelistLock, and return the buffer. +2. If buffer free list is nonempty, remove its head buffer. Release +buffer_strategy_lock. If the buffer is pinned or has a nonzero usage count, +it cannot be used; ignore it go back to step 1. Otherwise, pin the buffer, +and return it. -3. Otherwise, select the buffer pointed to by nextVictimBuffer, and -circularly advance nextVictimBuffer for next time. +3. Otherwise, the buffer free list is empty. Select the buffer pointed to by +nextVictimBuffer, and circularly advance nextVictimBuffer for next time. +Release buffer_strategy_lock. 4. If the selected buffer is pinned or has a nonzero usage count, it cannot -be used. Decrement its usage count (if nonzero) and return to step 3 to -examine the next buffer. +be used. Decrement its usage count (if nonzero), reacquire +buffer_strategy_lock, and return to step 3 to examine the next buffer. -5. Pin the selected buffer, release BufFreelistLock, and return the buffer. +5. Pin the selected buffer, and return. (Note that if the selected buffer is dirty, we will have to write it out before we can recycle it; if someone else pins the buffer meanwhile we will @@ -259,7 +258,7 @@ dirty and not pinned nor marked with a positive usage count. It pins, writes, and releases any such buffer. If we can assume that reading nextVictimBuffer is an atomic action, then -the writer doesn't even need to take the BufFreelistLock in order to look +the writer doesn't even need to take buffer_strategy_lock in order to look for buffers to write; it needs only to spinlock each buffer header for long enough to check the dirtybit. Even without that assumption, the writer only needs to take the lock long enough to read the variable value, not diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index
Re: [HACKERS] proposal: rounding up time value less than its unit.
On 9/23/14 10:29 AM, Tom Lane wrote: Votes or no votes, that's a horrible idea; it breaks the design goal that users shouldn't need to remember the precise unit size when making postgresql.conf entries. I think this is not historically correct. The original motivation was that you had to remember what the units on shared_buffers = 37 were, and that it was different units than work_mem = 37 That's independent of the question whether shared_buffers = 250kB might be rejected or not. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: rounding up time value less than its unit.
On 9/23/14 1:13 AM, Stephen Frost wrote: To clarify- we'll simply swap from (essentially) floor() to ceil() for handling all GUC_with_unit to internal_unit conversions, document that, and note it in the release notes as a possible behavior change and move on. That'll probably work, as long as we don't invent any other-than-zero values that are somehow treated special. One might be able to construct a case where something gets rounded to -1 when it didn't before or the other way around, but that's clearly a silly case. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
Thanks for reviewing this. I will spend some time looking at your recommendations in detail and seeing what it would take to implement them, and whether I agree that is better; but I wanted to point out a couple things regarding the SPI interface where I'm not sure you realize what's currently being done. I may want to argue some of the rest if I don't agree after more detailed review; this is just what jumps out on a first pass. Heikki Linnakangas hlinnakan...@vmware.com wrote: instead of passing parameters to the SPI calls individually, you invented SPI_register_tuplestore which affects all subsequent SPI calls. All subsequent SPI calls on that particular SPI connection until it is closed, except for any tuplestores are later unregistered. Nested SPI connections do not automatically inherit the named tuplestores; whatever code opens an SPI connection would need to register them for the new context, if desired. This seemed to me to provide minimal disruption to the existing SPI callers who might want to use this. The next question is how to pass the new hooks and tuplestores through the SPI interface. For prepared plans, the current SPI_prepare_params + SPI_execute_plan_with_paramlist functions work fine. They work fine, I guess, in the *one* place they are used. SPI_prepare_params() is called exactly *once* from plpgsql's pl_exec.c, and SPI_execute_plan_with_paramlist() is called twice from the same file. There are no other calls to either from anywhere else in the code base. However, there doesn't seem to be any way to do one-shot queries with a ParserSetupHook and ParamListInfo. That seems like an oversight, and would be nice to address that anyway. There are dozens of SPI_prepare* and SPI_exec* calls scattered all over the backend, pl, and contrib code which appear to get along fine without that. Partly it may be because it involves something of a modularity violation; the comment for the function used for this call (where it *is* used) says: /* * plpgsql_parser_setup set up parser hooks for dynamic parameters * * Note: this routine, and the hook functions it prepares for, are logically * part of plpgsql parsing. But they actually run during function execution, * when we are ready to evaluate a SQL query or expression that has not * previously been parsed and planned. */ Can you clarify what benefit you see to modifying the SPI API the way you suggest, and what impact it might have on existing calling code? PS. the copy/read/write functions for TuplestoreRelation in the patch are broken; TuplestoreRelation is not a subclass of Plan. I'm not sure what you mean by broken -- could you elaborate? It is, in a lot of ways, parallel to the CommonTableExpr defined a little above it in the parsenodes.h file -- a relation which can only be referenced by unqualified name. It is after CTEs in search order, and before anything else. Having such a node in the tree after parse analysis allows a number of special cases to be handled pretty much the same as they are for CTEs, which seems like a good thing to me. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] better atomics - v0.6
23.09.2014, 15:18, Andres Freund kirjoitti: On 2014-09-23 13:50:28 +0300, Oskari Saarenmaa wrote: 23.09.2014, 00:01, Andres Freund kirjoitti: The patches: 0001: The actual atomics API I tried building PG on Solaris 10/Sparc using GCC 4.9.0 (buildfarm animal dingo) with this patch but regression tests failed due to: Btw, if you could try sun studio it'd be great. I wrote the support for it blindly, and I'd be surprised if I got it right on the first try. I just installed Solaris Studio 12.3 and tried compiling this: ../../../../src/include/port/atomics/generic-sunpro.h, line 54: return value type mismatch ../../../../src/include/port/atomics/generic-sunpro.h, line 77: return value type mismatch ../../../../src/include/port/atomics/generic-sunpro.h, line 79: #if-less #endif ../../../../src/include/port/atomics/generic-sunpro.h, line 81: #if-less #endif atomic_add_64 and atomic_add_32 don't return anything (the atomic_add_*_nv variants return the new value) and there were a few extra #endifs. Regression tests pass after applying the attached patch which defines PG_HAS_ATOMIC_ADD_FETCH_U32. There doesn't seem to be a fallback for defining pg_atomic_fetch_add based on pg_atomic_add_fetch so pg_atomic_add_fetch now gets implemented using pg_atomic_compare_exchange. Also, it's not possible to compile PG with FORCE_ATOMICS_BASED_SPINLOCKS with these patches: ../../../../src/include/storage/s_lock.h, line 868: #error: PostgreSQL does not have native spinlock support on this platform atomics/generic.h would implement atomic flags using operations exposed by atomics/generic-sunpro.h, but atomics/fallback.h is included before it and it defines functions for flag operations which s_lock.h doesn't want to use. / Oskari From 42a5bbbab0c8f42c6014ebe12c9963b371168866 Mon Sep 17 00:00:00 2001 From: Oskari Saarenmaa o...@ohmu.fi Date: Tue, 23 Sep 2014 23:53:46 +0300 Subject: [PATCH] atomics: fix atomic add for sunpro and drop invalid #endifs Solaris Studio compiler has an atomic add operation that returns the new value, the one with no _nv suffix doesn't return anything. --- src/include/port/atomics/generic-sunpro.h | 15 ++- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/src/include/port/atomics/generic-sunpro.h b/src/include/port/atomics/generic-sunpro.h index 10ac70d..fedc099 100644 --- a/src/include/port/atomics/generic-sunpro.h +++ b/src/include/port/atomics/generic-sunpro.h @@ -47,14 +47,12 @@ pg_atomic_compare_exchange_u32_impl(volatile pg_atomic_uint32 *ptr, return ret; } -#define PG_HAS_ATOMIC_FETCH_ADD_U32 +#define PG_HAS_ATOMIC_ADD_FETCH_U32 STATIC_IF_INLINE uint32 -pg_atomic_fetch_add_u32_impl(volatile pg_atomic_uint32 *ptr, int32 add_) +pg_atomic_add_fetch_u32_impl(volatile pg_atomic_uint32 *ptr, int32 add_) { - return atomic_add_32(ptr-value, add_); + return atomic_add_32_nv(ptr-value, add_); } -#endif - #define PG_HAS_ATOMIC_COMPARE_EXCHANGE_U64 static inline bool @@ -70,12 +68,11 @@ pg_atomic_compare_exchange_u64_impl(volatile pg_atomic_uint64 *ptr, return ret; } -#define PG_HAS_ATOMIC_FETCH_ADD_U64 +#define PG_HAS_ATOMIC_ADD_FETCH_U64 STATIC_IF_INLINE uint64 -pg_atomic_fetch_add_u64_impl(volatile pg_atomic_uint64 *ptr, int64 add_) +pg_atomic_add_fetch_u64_impl(volatile pg_atomic_uint64 *ptr, int64 add_) { - return atomic_add_64(ptr-value, add_); + return atomic_add_64_nv(ptr-value, add_); } -#endif #endif -- 1.8.4.1 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: rounding up time value less than its unit.
Peter Eisentraut pete...@gmx.net writes: On 9/23/14 10:29 AM, Tom Lane wrote: Votes or no votes, that's a horrible idea; it breaks the design goal that users shouldn't need to remember the precise unit size when making postgresql.conf entries. I think this is not historically correct. The original motivation was that you had to remember what the units on shared_buffers = 37 were, and that it was different units than work_mem = 37 Right, but the issue of not requiring the specified value to be an exact multiple of the underlying unit did come up in the early discussion, and we were pretty clear that we didn't want to throw an error for that: http://www.postgresql.org/message-id/flat/29818.1153976...@sss.pgh.pa.us#29818.1153976...@sss.pgh.pa.us regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
Robert Haas robertmh...@gmail.com writes: On Tue, Sep 23, 2014 at 4:29 PM, Robert Haas robertmh...@gmail.com wrote: I did some more experimentation on this. Attached is a patch that JUST does #1, and, ... ...and that was the wrong patch. Thanks to Heikki for point that out. Second try. But the results you gave in the previous message were correctly attributed? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On Tue, Sep 23, 2014 at 5:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Sep 23, 2014 at 4:29 PM, Robert Haas robertmh...@gmail.com wrote: I did some more experimentation on this. Attached is a patch that JUST does #1, and, ... ...and that was the wrong patch. Thanks to Heikki for point that out. Second try. But the results you gave in the previous message were correctly attributed? The patch I attached the first time was just the last commit in the git repository where I wrote the patch, rather than the changes that I made on top of that commit. So, yes, the results from the previous message are with the patch attached to the follow-up. I just typed the wrong git command when attempting to extract that patch to attach it to the email. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On 9/23/14, 10:31 AM, Robert Haas wrote: I suggest we count these things: 1. The number of buffers the reclaimer has put back on the free list. 2. The number of times a backend has run the clocksweep. 3. The number of buffers past which the reclaimer has advanced the clock sweep (i.e. the number of buffers it had to examine in order to reclaim the number counted by #1). 4. The number of buffers past which a backend has advanced the clocksweep (i.e. the number of buffers it had to examine in order to allocate the number of buffers count by #3). 5. The number of buffers allocated from the freelist which the backend did not use because they'd been touched (what you're calling buffers_touched_freelist). All sound reasonable. To avoid wasting time here, I think it's only worth doing all of these as DEBUG level messages for now. Then only go through the overhead of exposing the ones that actually seem relevant. That's what I did for the 8.3 work, and most of that data at this level was barely relevant to anyone but me then or since. We don't want the system views to include so much irrelevant trivia that finding the important parts becomes overwhelming. I'd like to see that level of instrumentation--just the debug level messages--used to quantify the benchmarks that people are running already, to prove they are testing what they think they are. That would have caught the test error you already stumbled on for example. Simple paranoia says there may be more issues like that hidden in here somewhere, and this set you've identified should find any more of them around. If all that matches up so the numbers for the new counters seem sane, I think that's enough to commit the first basic improvement here. Then make a second pass over exposing the useful internals that let everyone quantify either the improvement or things that may need to be tunable. -- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] better atomics - v0.6
On 2014-09-24 00:27:25 +0300, Oskari Saarenmaa wrote: 23.09.2014, 15:18, Andres Freund kirjoitti: On 2014-09-23 13:50:28 +0300, Oskari Saarenmaa wrote: 23.09.2014, 00:01, Andres Freund kirjoitti: The patches: 0001: The actual atomics API I tried building PG on Solaris 10/Sparc using GCC 4.9.0 (buildfarm animal dingo) with this patch but regression tests failed due to: Btw, if you could try sun studio it'd be great. I wrote the support for it blindly, and I'd be surprised if I got it right on the first try. I just installed Solaris Studio 12.3 and tried compiling this: Cool. ../../../../src/include/port/atomics/generic-sunpro.h, line 54: return value type mismatch ../../../../src/include/port/atomics/generic-sunpro.h, line 77: return value type mismatch ../../../../src/include/port/atomics/generic-sunpro.h, line 79: #if-less #endif ../../../../src/include/port/atomics/generic-sunpro.h, line 81: #if-less #endif atomic_add_64 and atomic_add_32 don't return anything (the atomic_add_*_nv variants return the new value) and there were a few extra #endifs. Regression tests pass after applying the attached patch which defines PG_HAS_ATOMIC_ADD_FETCH_U32. Thanks for the fixes! Hm. I think then it's better to simply not implement addition and rely on cmpxchg. Also, it's not possible to compile PG with FORCE_ATOMICS_BASED_SPINLOCKS with these patches: ../../../../src/include/storage/s_lock.h, line 868: #error: PostgreSQL does not have native spinlock support on this platform atomics/generic.h would implement atomic flags using operations exposed by atomics/generic-sunpro.h, but atomics/fallback.h is included before it and it defines functions for flag operations which s_lock.h doesn't want to use. Right. It should check not just for flag support, but also for 32bit atomics. Easily fixable. I'll send a new version with your fixes incorporated tomorrow. Thanks for looking into this! Very helpful. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On 2014-09-23 16:29:16 -0400, Robert Haas wrote: On Tue, Sep 23, 2014 at 10:55 AM, Robert Haas robertmh...@gmail.com wrote: But this gets at another point: the way we're benchmarking this right now, we're really conflating the effects of three different things: 1. Changing the locking regimen around the freelist and clocksweep. 2. Adding a bgreclaimer process. 3. Raising the number of buffer locking partitions. I did some more experimentation on this. Attached is a patch that JUST does #1, and, as previously suggested, it uses a single spinlock instead of using two of them that are probably in the same cacheline. Without this patch, on a 32-client, read-only pgbench at scale factor 1000 and shared_buffers=8GB, perf -e cs -g -a has this to say about LWLock-inspired preemption: - LWLockAcquire + 68.41% ReadBuffer_common + 31.59% StrategyGetBuffer With the patch, unsurprisingly, StrategyGetBuffer disappears and the only lwlocks that are causing context-switches are the individual buffer locks. But are we suffering spinlock contention instead as a result? Yes, but not much. s_lock is at 0.41% in the corresponding profile, and only 6.83% of those calls are from the patched StrategyGetBuffer. In a similar profile of master, s_lock is at 0.31%. So there's a bit of additional s_lock contention, but I think it's considerably less than the contention over the lwlock it's replacing, because the spinlock is only held for the minimal amount of time needed, whereas the lwlock could be held across taking and releasing many individual buffer locks. Am I understanding you correctly that you also measured context switches for spinlocks? If so, I don't think that's a valid comparison. LWLocks explicitly yield the CPU as soon as there's any contention while spinlocks will, well, spin. Sure they also go to sleep, but it'll take longer. It's also worthwile to remember in such comparisons that lots of the cost of spinlocks will be in the calling routines, not s_lock() - the first TAS() is inlined into it. And that's the one that'll incur cache misses and such... Note that I'm explicitly *not* doubting the use of a spinlock itself. Given the short acquiration times and the exclusive use of exlusive acquiration a spinlock makes more sense. The lwlock's spinlock alone will have about as much contention. I think it might be possible to construct some cases where the spinlock performs worse than the lwlock. But I think those will be clearly in the minority. And at least some of those will be fixed by bgwriter. As an example consider a single backend COPY ... FROM of large files with a relatively large s_b. That's a seriously bad workload for the current victim buffer selection because frequently most of the needs to be searched for a buffer with usagecount 0. And this patch will double the amount of atomic ops during that. Let me try to quantify that. What's interesting is that I can't see in the perf output any real sign that the buffer mapping locks are slowing things down, but they clearly are, because when I take this patch and also boost NUM_BUFFER_PARTITIONS to 128, the performance goes up: What did events did you profile? I welcome further testing and comments, but my current inclination is to go ahead and push the attached patch. To my knowledge, nobody has at any point objected to this aspect of what's being proposed, and it looks safe to me and seems to be a clear win. We can then deal with the other issues on their own merits. I've took a look at this, and all the stuff I saw that I disliked were there before this patch. So +1 for going ahead. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On Tue, Sep 23, 2014 at 6:02 PM, Gregory Smith gregsmithpg...@gmail.com wrote: On 9/23/14, 10:31 AM, Robert Haas wrote: I suggest we count these things: 1. The number of buffers the reclaimer has put back on the free list. 2. The number of times a backend has run the clocksweep. 3. The number of buffers past which the reclaimer has advanced the clock sweep (i.e. the number of buffers it had to examine in order to reclaim the number counted by #1). 4. The number of buffers past which a backend has advanced the clocksweep (i.e. the number of buffers it had to examine in order to allocate the number of buffers count by #3). 5. The number of buffers allocated from the freelist which the backend did not use because they'd been touched (what you're calling buffers_touched_freelist). All sound reasonable. To avoid wasting time here, I think it's only worth doing all of these as DEBUG level messages for now. Then only go through the overhead of exposing the ones that actually seem relevant. That's what I did for the 8.3 work, and most of that data at this level was barely relevant to anyone but me then or since. We don't want the system views to include so much irrelevant trivia that finding the important parts becomes overwhelming. I think we expose far too little information in our system views. Just to take one example, we expose no useful information about lwlock acquire or release, but a lot of real-world performance problems are caused by lwlock contention. There are of course difficulties in exposing huge numbers of counters, but we're not talking about many here, so I'd lean toward exposing them in the final patch if they seem at all useful. I'd like to see that level of instrumentation--just the debug level messages--used to quantify the benchmarks that people are running already, to prove they are testing what they think they are. That would have caught the test error you already stumbled on for example. Simple paranoia says there may be more issues like that hidden in here somewhere, and this set you've identified should find any more of them around. Right. If all that matches up so the numbers for the new counters seem sane, I think that's enough to commit the first basic improvement here. Then make a second pass over exposing the useful internals that let everyone quantify either the improvement or things that may need to be tunable. Well, I posted a patch a bit ago that I think is the first basic improvement - and none of these counters are relevant to that. It doesn't add a new background process or anything; it just does pretty much the same thing we do now with less-painful locking. There are no water marks to worry about, or tunable thresholds, or anything; and because it's so much simpler, it's far easier to reason about than the full patch, which is why I feel quite confident pressing on toward a commit. Once that is in, I think we should revisit the idea of a bgreclaimer process, and see how much more that improves things - if at all - on top of what that basic patch already does. For that we'll need these counters, and maybe others. But let's make that phase two. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On Tue, Sep 23, 2014 at 6:54 PM, Andres Freund and...@2ndquadrant.com wrote: Am I understanding you correctly that you also measured context switches for spinlocks? If so, I don't think that's a valid comparison. LWLocks explicitly yield the CPU as soon as there's any contention while spinlocks will, well, spin. Sure they also go to sleep, but it'll take longer. No, I measured CPU consumption attributable to s_lock. (I checked context-switches, too.) It's also worthwile to remember in such comparisons that lots of the cost of spinlocks will be in the calling routines, not s_lock() - the first TAS() is inlined into it. And that's the one that'll incur cache misses and such... True. I can check that - I did not. Note that I'm explicitly *not* doubting the use of a spinlock itself. Given the short acquiration times and the exclusive use of exlusive acquiration a spinlock makes more sense. The lwlock's spinlock alone will have about as much contention. Right. I think it might be possible to construct some cases where the spinlock performs worse than the lwlock. But I think those will be clearly in the minority. And at least some of those will be fixed by bgwriter. As an example consider a single backend COPY ... FROM of large files with a relatively large s_b. That's a seriously bad workload for the current victim buffer selection because frequently most of the needs to be searched for a buffer with usagecount 0. And this patch will double the amount of atomic ops during that. It will actually be far worse than that, because we'll acquire and release the spinlock for every buffer over which we advance the clock sweep, instead of just once for the whole thing. That's reason to hope that a smart bgreclaimer process may be a good improvement on top of this. It can do things like advance the clock sweep hand 16 buffers at a time and then sweep them all after-the-fact, reducing contention on the mutex by an order-of-magnitude, if that turns out to be an important consideration. But I think it's right to view that as something we need to test vs. the baseline established by this patch. What's clear today is that workloads which stress buffer-eviction fall to pieces, because the entire buffer-eviction process is essentially single-threaded. One process can't begin evicting a buffer until another has finished doing so. This lets multiple backends do that at the same time. We may find cases where that leads to an unpleasant amount of contention, but since we have several ideas for how to mitigate that as needs be, I think it's OK to go ahead. The testing we're doing on the combined patch is conflating the effects the new locking regimen with however the bgreclaimer affects things, and it's very clear to me now that we need to make sure those are clearly separate. Let me try to quantify that. Please do. What's interesting is that I can't see in the perf output any real sign that the buffer mapping locks are slowing things down, but they clearly are, because when I take this patch and also boost NUM_BUFFER_PARTITIONS to 128, the performance goes up: What did events did you profile? cs. I welcome further testing and comments, but my current inclination is to go ahead and push the attached patch. To my knowledge, nobody has at any point objected to this aspect of what's being proposed, and it looks safe to me and seems to be a clear win. We can then deal with the other issues on their own merits. I've took a look at this, and all the stuff I saw that I disliked were there before this patch. So +1 for going ahead. Cool. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BRIN indexes - TRAP: BadArgument
On Tue, Sep 23, 2014 at 3:04 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Alvaro Herrera wrote: Heikki Linnakangas wrote: If you add a new datatype, and define b-tree operators for it, what is required to create a minmax opclass for it? Would it be possible to generalize the functions in brin_minmax.c so that they can be reused for any datatype (with b-tree operators) without writing any new C code? I think we're almost there; the only thing that differs between each data type is the opcinfo function. Let's pass the type OID as argument to the opcinfo function. You could then have just a single minmax_opcinfo function, instead of the macro to generate a separate function for each built-in datatype. Yeah, that's how I had that initially. I changed it to what it's now as part of a plan to enable building cross-type opclasses, so you could have WHERE int8col=42 without requiring a cast of the constant to type int8. This might have been a thinko, because AFAICS it's possible to build them with a constant opcinfo as well (I changed several other things to support this, as described in a previous email.) I will look into this later. I found out that we don't really throw errors in such cases anymore; we insert casts instead. Maybe there's a performance argument that it might be better to use existing cross-type operators than casting, but justifying this work just turned a lot harder. Here's a patch that reverts opcinfo into a generic function that receives the type OID. I will look into adding some testing mechanism for the union support proc; with that I will just consider the patch ready for commit and will push. With all respect, I think this is a bad idea. I know you've put a lot of energy into this patch and I'm confident it's made a lot of progress. But as with Stephen's patch, the final form deserves a thorough round of looking over by someone else before it goes in. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BRIN indexes - TRAP: BadArgument
On Wed, Sep 24, 2014 at 8:23 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Sep 23, 2014 at 3:04 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Alvaro Herrera wrote: I will look into adding some testing mechanism for the union support proc; with that I will just consider the patch ready for commit and will push. With all respect, I think this is a bad idea. I know you've put a lot of energy into this patch and I'm confident it's made a lot of progress. But as with Stephen's patch, the final form deserves a thorough round of looking over by someone else before it goes in. Would this person be it an extra committer or an simple reviewer? It would give more insurance if such huge patches (couple of thousands of lines) get an extra +1 from another committer, proving that the code has been reviewed by people well-experienced with backend code. Now as this would put more pressure in the hands of committers, an extra external pair of eyes, be it non-committer but let's say a seasoned reviewer would be fine IMO. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] “Core” function in Postgres
Hi experts, I want to know what's the core function used in Postgres server? I am looking for something corresponding to main() in a simple C program. I want to know the file path and the function name. I am using Postgres 9.3.5, however I assume the core function will be unchanged between different revisions. Please let me know if you are confused by my question. Thanks Zack PS: I have the same post on stackoverflow. Since no one answered there, I just report here.
Re: [HACKERS] Scaling shared buffer eviction
On 2014-09-23 19:21:10 -0400, Robert Haas wrote: On Tue, Sep 23, 2014 at 6:54 PM, Andres Freund and...@2ndquadrant.com wrote: I think it might be possible to construct some cases where the spinlock performs worse than the lwlock. But I think those will be clearly in the minority. And at least some of those will be fixed by bgwriter. Err, this should read bgreclaimer, not bgwriter. As an example consider a single backend COPY ... FROM of large files with a relatively large s_b. That's a seriously bad workload for the current victim buffer selection because frequently most of the needs to be searched for a buffer with usagecount 0. And this patch will double the amount of atomic ops during that. It will actually be far worse than that, because we'll acquire and release the spinlock for every buffer over which we advance the clock sweep, instead of just once for the whole thing. I said double, because we already acquire the buffer header's spinlock every tick. That's reason to hope that a smart bgreclaimer process may be a good improvement on top of this. Right. That's what I was trying to say with bgreclaimer above. But I think it's right to view that as something we need to test vs. the baseline established by this patch. Agreed. I think the possible downsides are at the very fringe of already bad cases. That's why I agreed that you should go ahead. Let me try to quantify that. Please do. I've managed to find a ~1.5% performance regression. But the setup was plain absurd. COPY ... FROM /tmp/... BINARY; of large bytea datums into a fillfactor 10 table with the column set to PLAIN storage. With the resulting table size chosen so it's considerably bigger than s_b, but smaller than the dirty writeback limit of the kernel. That's perfectly reasonable. I can think of a couple other cases, but they're all similarly absurd. What's interesting is that I can't see in the perf output any real sign that the buffer mapping locks are slowing things down, but they clearly are, because when I take this patch and also boost NUM_BUFFER_PARTITIONS to 128, the performance goes up: What did events did you profile? cs. Ah. My guess is that most of the time will probably actually be spent in the lwlock's spinlock, not the the lwlock putting itself to sleep. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [HACKERS] “Core” function in Postgres
On Tue, Sep 23, 2014 at 4:29 PM, Mingzhe Li mingzhe0...@gmail.com wrote: I want to know what's the core function used in Postgres server? I am looking for something corresponding to main() in a simple C program. I want to know the file path and the function name. I am using Postgres 9.3.5, however I assume the core function will be unchanged between different revisions. Please let me know if you are confused by my question. I think that the tcop is the closest thing to what you're looking for (which includes postgres.c/PostgresMain()). There is a very simple stub entry point (main(int argc, char *argv[])) within main.c, too, which is the real entry point. Why not just set some breakpoints in a place that seems interesting from within GDB, and inspect the call stack? That can be a useful technique for gaining understanding of the structure of complicated codebases that you're totally unfamiliar with. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JsonbValue to Jsonb conversion
On 09/23/2014 12:23 PM, Dmitry Dolgov wrote: Hi all, I'm faced with some troubles about the jsonb implementation, and I hope I'll get little advice =) If I understand correctly, an abstract function for jsonb modification should have the following stages: Jsonb - JsonbValue - Modification - JsonbValue - Jsonb One can convert the *JsonbValue* to the *Jsonb* only by *JsonbValueToJsonb* function. So, my question is can be *JsonbValue*, that contains few *jbvBinary* elements, converted to *Jsonb* by this function? It will be very useful, if you want modify only small part of your JsonbValue (e.g. replace value of some key). But when I'm trying to do this, an exception unknown type of jsonb container appears. Maybe I missed something? Or is there another approach to do this conversion? If you can come up with a way of handling the jbvBinary values then by all means send a patch. But this problem is fairly easily worked around by using an iterator over the binary value. The attached patch, which is work in progress for adding in the currently missing json functions for jsonb, contains a sort of example of doing this in jsonb_agg_transfn. cheers andrew diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c index 2fd87fc..82dae72 100644 --- a/src/backend/utils/adt/jsonb.c +++ b/src/backend/utils/adt/jsonb.c @@ -12,11 +12,20 @@ */ #include postgres.h +#include miscadmin.h +#include access/htup_details.h +#include access/transam.h +#include catalog/pg_cast.h +#include catalog/pg_type.h #include libpq/pqformat.h #include utils/builtins.h +#include utils/datetime.h +#include utils/lsyscache.h #include utils/json.h #include utils/jsonapi.h #include utils/jsonb.h +#include utils/syscache.h +#include utils/typcache.h typedef struct JsonbInState { @@ -24,6 +33,23 @@ typedef struct JsonbInState JsonbValue *res; } JsonbInState; +/* unlike with json categories, we need to treat json and jsonb differently */ +typedef enum /* type categories for datum_to_jsonb */ +{ + JSONBTYPE_NULL,/* null, so we didn't bother to identify */ + JSONBTYPE_BOOL,/* boolean (built-in types only) */ + JSONBTYPE_NUMERIC, /* numeric (ditto) */ + JSONBTYPE_TIMESTAMP,/* we use special formatting for timestamp */ + JSONBTYPE_TIMESTAMPTZ, /* ... and timestamptz */ + JSONBTYPE_JSON,/* JSON */ + JSONBTYPE_JSONB, /* JSONB */ + JSONBTYPE_ARRAY, /* array */ + JSONBTYPE_COMPOSITE, /* composite */ + JSONBTYPE_JSONCAST, /* something with an explicit cast to JSON */ + JSONBTYPE_JSONBCAST, /* something with an explicit cast to JSONB */ + JSONBTYPE_OTHER/* all else */ +} JsonbTypeCategory; + static inline Datum jsonb_from_cstring(char *json, int len); static size_t checkStringLen(size_t len); static void jsonb_in_object_start(void *pstate); @@ -33,6 +59,22 @@ static void jsonb_in_array_end(void *pstate); static void jsonb_in_object_field_start(void *pstate, char *fname, bool isnull); static void jsonb_put_escaped_value(StringInfo out, JsonbValue *scalarVal); static void jsonb_in_scalar(void *pstate, char *token, JsonTokenType tokentype); +static void jsonb_categorize_type(Oid typoid, + JsonbTypeCategory *tcategory, + Oid *outfuncoid); +static void composite_to_jsonb(Datum composite, JsonbInState *result); +static void array_dim_to_jsonb(JsonbInState *result, int dim, int ndims, int *dims, + Datum *vals, bool *nulls, int *valcount, + JsonbTypeCategory tcategory, Oid outfuncoid); +static void array_to_jsonb_internal(Datum array, JsonbInState *result); +static void jsonb_categorize_type(Oid typoid, + JsonbTypeCategory *tcategory, + Oid *outfuncoid); +static void datum_to_jsonb(Datum val, bool is_null, JsonbInState *result, + JsonbTypeCategory tcategory, Oid outfuncoid, + bool key_scalar); +static void add_jsonb(Datum val, bool is_null, JsonbInState *result, + Oid val_type, bool key_scalar); /* * jsonb type input function @@ -462,3 +504,1070 @@ JsonbToCString(StringInfo out, JsonbContainer *in, int estimated_len) return out-data; } + + +/* + * Determine how we want to render values of a given type in datum_to_jsonb. + * + * Given the datatype OID, return its JsonbTypeCategory, as well as the type's + * output function OID. If the returned category is JSONBTYPE_CAST, we + * return the OID of the type-JSON cast function instead. + */ +static void +jsonb_categorize_type(Oid typoid, + JsonbTypeCategory *tcategory, + Oid *outfuncoid) +{ + bool typisvarlena; + + /* Look through any domain */ + typoid = getBaseType(typoid); + + /* We'll usually need to return the type output function */ + getTypeOutputInfo(typoid, outfuncoid, typisvarlena); + + /* Check for known types */ + switch (typoid) + { + case BOOLOID: + *tcategory = JSONBTYPE_BOOL; + break; + + case INT2OID: + case INT4OID: + case INT8OID: + case FLOAT4OID: + case FLOAT8OID: + case NUMERICOID: +
Re: [HACKERS] jsonb format is pessimal for toast compression
On Fri, Sep 19, 2014 at 5:40 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I think we should bite the bullet and break compatibility with 9.4beta2 format, even if we go with my patch. In a jsonb object, it makes sense to store all the keys first, like Tom did, because of cache benefits, and the future possibility to do smart EXTERNAL access. Also, even if we can make the on-disk format compatible, it's weird that you can get different runtime behavior with datums created with a beta version. Seems more clear to just require a pg_dump + restore. I vote for going with your patch, and breaking compatibility for the reasons stated here (though I'm skeptical of the claims about cache benefits, FWIW). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb format is pessimal for toast compression
Peter Geoghegan p...@heroku.com writes: On Fri, Sep 19, 2014 at 5:40 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I think we should bite the bullet and break compatibility with 9.4beta2 format, even if we go with my patch. In a jsonb object, it makes sense to store all the keys first, like Tom did, because of cache benefits, and the future possibility to do smart EXTERNAL access. Also, even if we can make the on-disk format compatible, it's weird that you can get different runtime behavior with datums created with a beta version. Seems more clear to just require a pg_dump + restore. I vote for going with your patch, and breaking compatibility for the reasons stated here (though I'm skeptical of the claims about cache benefits, FWIW). I'm also skeptical of that, but I think the potential for smart EXTERNAL access is a valid consideration. I've not had time to read Heikki's updated patch yet --- has anyone else compared the two patches for code readability? If they're fairly close on that score, then I'd agree his approach is the best solution. (I will look at his code, but I'm not sure I'm the most unbiased observer.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core function in Postgres
On 9/23/14, 8:02 PM, Michael Paquier wrote: pgsql-hackers is as well a mailing list where people have technical discussions about features and patches, hence your question would be more adapted for pgsql-novice or pgsql-general. Let's be fair and get the details perfect if we're going to advise people on list policy, and that's not quite right if you compare against http://www.postgresql.org/list/ The official description says you consider trying elsewhere first, and If your question cannot be answered by people in the other lists, and it is likely that only a developer will know the answer, you may re-post your question in this list. From that, it's completely reasonable that Mingzhe guessed how main() is used in PostgreSQL is something only a developer would know the answer to, and therefore should go here instead of pgsql-general. My feedback would be to point out that that pgsql-general is also a list where many of the developers monitor traffic, and that's the right place to post first for this sort of question. The mailing list page does not make that completely obvious though, so this is an understandable thing to be unsure about. This is definitely not a pgsql-novice question. The mailing list page could probably use an explicit clarification that pgsql-general *is* an appropriate venue for simpler questions that only a developer will know the answer. -- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BRIN indexes - TRAP: BadArgument
Robert Haas wrote: With all respect, I think this is a bad idea. I know you've put a lot of energy into this patch and I'm confident it's made a lot of progress. But as with Stephen's patch, the final form deserves a thorough round of looking over by someone else before it goes in. As you can see in the thread, Heikki's put a lot of review effort into it (including important code contributions); I don't feel I'm rushing it at this point. If you or somebody else want to give it a look, I have no problem waiting a bit longer. I don't want to delay indefinitely, though, because I think it's better shipped early in the release cycle than later, to allow for further refinements and easier testing by other interested parties. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] “Core” function in Postgres
On 24/09/14 11:29, Mingzhe Li wrote: Hi experts, I want to know what's the core function used in Postgres server? I am looking for something corresponding to main() in a simple C program. I want to know the file path and the function name. I am using Postgres 9.3.5, however I assume the core function will be unchanged between different revisions. I suspect you want to start looking at PostgresMain in src/backend/tcop/postgres.c and ServerLoop in src/backend/postmaster/postmaster.c Regards Mark P.s: FWIW I think this *is* the right list to ask this type of question... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BRIN indexes - TRAP: BadArgument
On Tue, Sep 23, 2014 at 7:35 PM, Michael Paquier michael.paqu...@gmail.com wrote: Would this person be it an extra committer or an simple reviewer? It would give more insurance if such huge patches (couple of thousands of lines) get an extra +1 from another committer, proving that the code has been reviewed by people well-experienced with backend code. Now as this would put more pressure in the hands of committers, an extra external pair of eyes, be it non-committer but let's say a seasoned reviewer would be fine IMO. If you're volunteering, I certainly wouldn't say no. The more the merrier. Same with anyone else. Since Heikki looked at it before, I also think it would be appropriate to give him a bit of time to see if he feels satisfied with it now - nobody on this project has more experience with indexing than he does, but he may not have the time, and even if he does, someone else might spot something he misses. Alvaro's quite right to point out that there is no sense in waiting a long time for a review that isn't coming. That just backs everything up against the end of the release cycle to no benefit. But if there's review available from experienced people within the community, taking advantage of that now might find things that could be much harder to fix later. That's a win for everybody. And it's not like we're pressed up against the end of the cycle, nor is it as if this feature has been through endless rounds of review already. It's certainly had some, and it's gotten better as a result. But it's also changed a lot in the process. And much of the review to date has been high-level design review, like how should the opclasses look? and what should we call this thing anyway?. Going through it for logic errors, documentation shortcomings, silly thinkos, etc. has not been done too much, I think, and definitely not on the latest version. So, some of that might not be out of place. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BRIN indexes - TRAP: BadArgument
On Tue, Sep 23, 2014 at 9:23 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Robert Haas wrote: With all respect, I think this is a bad idea. I know you've put a lot of energy into this patch and I'm confident it's made a lot of progress. But as with Stephen's patch, the final form deserves a thorough round of looking over by someone else before it goes in. As you can see in the thread, Heikki's put a lot of review effort into it (including important code contributions); I don't feel I'm rushing it at this point. Yeah, I was really glad Heikki looked at it. That seemed good. If you or somebody else want to give it a look, I have no problem waiting a bit longer. I don't want to delay indefinitely, though, because I think it's better shipped early in the release cycle than later, to allow for further refinements and easier testing by other interested parties. I agree with that. I'd like to look at it, and I will if I get time, but as I said elsewhere, I also think it's appropriate to give a little time around the final version of any big, complex patch just because people may have thoughts, and they may not have time to deliver those thoughts the minute the patch hits the list. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On Tue, Sep 23, 2014 at 7:42 PM, Andres Freund and...@2ndquadrant.com wrote: It will actually be far worse than that, because we'll acquire and release the spinlock for every buffer over which we advance the clock sweep, instead of just once for the whole thing. I said double, because we already acquire the buffer header's spinlock every tick. Oh, good point. Let me try to quantify that. Please do. I've managed to find a ~1.5% performance regression. But the setup was plain absurd. COPY ... FROM /tmp/... BINARY; of large bytea datums into a fillfactor 10 table with the column set to PLAIN storage. With the resulting table size chosen so it's considerably bigger than s_b, but smaller than the dirty writeback limit of the kernel. That's perfectly reasonable. I can think of a couple other cases, but they're all similarly absurd. Well, it's not insane to worry about such things, but if you can only manage 1.5% on such an extreme case, I'm encouraged. This is killing us on OLTP workloads, and fixing that is a lot more important than a couple percent on an extreme case. Ah. My guess is that most of the time will probably actually be spent in the lwlock's spinlock, not the the lwlock putting itself to sleep. Ah, OK. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
(2014/09/13 2:42), Fujii Masao wrote: On Wed, Sep 10, 2014 at 10:37 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Fujii Masao wrote: On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting. Wouldn't it be easy-to-use to have only one parameter, PENDING_LIST_CLEANUP_SIZE? How about setting PENDING_LIST_CLEANUP_SIZE to work_mem as the default value when running the CREATE INDEX command? That's an idea. But there might be some users who want to change the cleanup size per session like they can do by setting work_mem, and your idea would prevent them from doing that... So what about introduing pending_list_cleanup_size also as GUC? That is, users can set the threshold by using either the reloption or GUC. Yes, I think having both a GUC and a reloption makes sense -- the GUC applies to all indexes, and can be tweaked for individual indexes using the reloption. Agreed. I'm not sure about the idea of being able to change it per session, though. Do you mean that you would like insert processes use a very large value so that they can just append new values to the pending list, and have vacuum use a small value so that it cleans up as soon as it runs? Two things: 1. we could have an autovacuum_ reloption which only changes what autovacuum does; 2. we could have autovacuum run index cleanup actions separately from actual vacuuming. Yes, I was thinking something like that. But if autovacuum has already been able to handle that, it's nice. Anyway, as you pointed out, it's better to have both GUC and reloption for the cleanup size of pending list. OK, I'd vote for your idea of having both the GUC and the reloption. So, I think the patch needs to be updated. Fujii-san, what plan do you have about the patch? Sorry for the delay. Thanks, Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: rounding up time value less than its unit.
On 9/23/14, 1:21 AM, David Johnston wrote: This patch should fix the round-to-zero issue. If someone wants to get rid of zero as a special case let them supply a separate patch for that improvement. I am going to wander into this fresh after just reading everything once (but with more than a little practice with real-world GUC mucking) and say that, no, it should not even do that. The original complaint here is real and can be straightforward to fix, but I don't like any of the proposals so far. My suggestion: fix the one really bad one in 9.5 with an adjustment to the units of log_rotation_age. That's a small commit that should thank Tomonari Katsumata for discovering the issue and even suggesting a fix (that we don't use) and a release note; sample draft below. Stop there. Let someone with a broader objection take on the fact that zero (and -1) values have special properties, and that sucks, as a fully reasoned redesign. I have a larger idea for minimizing rounding issues of timestamps in particular at the bottom of this too. I wouldn't dare take on changes to rounding of integers without sorting out the special flag value issue first, because the variety of those in the database is large compared to the time ones. == log_rotation_age == Back to where this started at http://www.postgresql.org/message-id/53992ff8.2060...@po.ntts.co.jp : log_rotation_age would be disabled if we set it less than one minute, with this example of a surprising behavior: log_rotation_age = 10s postgres=# show log_rotation_age ; log_rotation_age -- 0 That's bad and the GUC system is not being friendly; fully agreed. But this is just one where the resolution for the parameter is poor. log_rotation_age is the *only* GUC with a time interval that's set in minutes: postgres=# SELECT name, unit FROM pg_settings WHERE unit ='min'; name | unit --+-- log_rotation_age | min For comparison, there are 10 GUCs set in seconds and 13 in ms in HEAD. We could just change the units for log_rotation_age to seconds, then let the person who asks for a 10s rotation time suffer for that decision only with many log files. The person who instead chooses 10ms may find log rotation disabled altogether because that rounds to zero, but now we are not talking about surprises on something that seems sane--that's a fully unreasonable user request. Following that style of fix all the way through to the sort of release notes needed would give something like this: log_rotation_age is now set in units of seconds instead of minutes. Earlier installations of PostgreSQL that set this value directly, to a value in minutes, should change that setting to use a time unit before migrating to this version. And we could be done for now. == Flag values and error handling == I would like to see using 0 and -1 as special values in GUCs overhauled one day, with full disregard for backward compatibility as a straightjacket on doing the right thing. This entire class of behavior is annoying. But I am skeptical anything less than such an overhaul will really be useful. To me it's not worth adding new code, documentation, and associated release notes to guide migration all to try and describe new rounding trivia--which I don't see as better nor worse than the trade-offs of what happens today. I can't take the idea of throwing errors for something this minor seriously at all. There are a lot more precedents for spreading settings around in a few places now, from include_dir to postgresql.auto.conf. There are so many ways to touch a config now and not notice the error message now, I really don't want to see any more situations appear where trying to change a setting will result in a broken file added into that mix. None of this broken units due to rounding stuff that I've found, now that I went out of my way looking for some, even comes close to rising to that level of serious to me. Only this log rotation one is so badly out of line that it will act quite badly. == Overhauling all time unit GUCs == Here's the complete list of potential ugly time settings to review in the future, if my suggestion of only fixing log_rotation_age were followed: gsmith=# SELECT name,setting, unit, min_val FROM pg_settings where unit in ('s','ms') and (min_val::integer)=0 order by unit,min_val,name; name | setting | unit | min_val --+-+--+- autovacuum_vacuum_cost_delay | 20 | ms | -1 log_autovacuum_min_duration | -1 | ms | -1 log_min_duration_statement | -1 | ms | -1 max_standby_archive_delay| 3 | ms | -1 max_standby_streaming_delay | 3 | ms | -1 lock_timeout | 0 | ms | 0 statement_timeout| 0 | ms | 0 vacuum_cost_delay| 0 | ms | 0 wal_receiver_timeout | 6
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Wed, Aug 27, 2014 at 7:43 PM, Peter Geoghegan p...@heroku.com wrote: Omission === The patch currently lacks a way of referencing datums rejected for insertion when updating. Attached revision of the patch set (which I'll call v1.2) adds this capability in a separate commit. It now becomes possible to add a CONFLICTING expression within the ON CONFLICT UPDATE targetlist or predicate. Example use: postgres=# CREATE TABLE upsert(key int4 PRIMARY KEY, val text); CREATE TABLE postgres=# INSERT INTO upsert VALUES(1, 'Giraffe'); INSERT 0 1 postgres=# SELECT * FROM upsert; key | val -+- 1 | Giraffe (1 row) postgres=# INSERT INTO upsert VALUES(1, 'Bear'), (2, 'Lion') ON CONFLICT UPDATE SET val = CONFLICTING(val); INSERT 0 1 postgres=# SELECT * FROM upsert; key | val -+-- 1 | Bear 2 | Lion (2 rows) Note that the effects of BEFORE INSERT triggers are carried here, which I slightly favor over the alternative of not having it work that way. I've also expanded upon my explanation for the structure of the query tree and plan within (revised/rebased versions of) earlier commits. I am clearer on why there is a special subquery planning step for the auxiliary UPDATE, rather than making the UPDATE directly accessible as a subquery within the post-parse-analysis query tree. Basically, the optimizer has no basis for understanding that a DML sublink isn't optimizable. It'll try to pull-up the subquery and so on, which of course does not and cannot work. Whereas treating it as an independently planned subquery of the top-level query, kind of like a data-modifying CTE makes sense (with such CTEs, the executor is prepared for the possibility that not all rows will be pulled up - so there too, the executor drives execution more directly than makes sense when not dealing with DML: it finishes off the data-modifying CTE's DML for any still-unconsumed tuples, within ExecPostprocessPlan()). It's certainly possible that a more unified representation makes sense (i.e. one ModifyTable plan, likely still having seperate INSERT/UPDATE representations at earlier stages of query processing), but that would require serious refactoring of the representation of ModifyTable operations -- just for example, consider the need for a unified-though-separate targetlist, one for the INSERT part, the other for the UPDATE part. For now, I continue to find it very convenient to represent the UPDATE as a selectively executed, auxiliary, distinct ModifyTable plan, rather than adding a subquery rangetable directly during parse analysis. There is another significant change. In this revision, I am at least honest about the plan representation within EXPLAIN: postgres=# EXPLAIN ANALYZE INSERT INTO upsert VALUES(1, 'Bear'), (2, 'Lion') ON CONFLICT UPDATE SET val = CONFLICTING(val); QUERY PLAN -- Insert on upsert (cost=0.00..0.03 rows=2 width=36) (actual time=0.115..0.115 rows=0 loops=1) - Values Scan on *VALUES* (cost=0.00..0.03 rows=2 width=36) (actual time=0.003..0.005 rows=2 loops=1) - Conflict Update on upsert (cost=0.00..22.30 rows=1230 width=36) (actual time=0.042..0.051 rows=0 loops=1) - Seq Scan on upsert (cost=0.00..22.30 rows=1230 width=36) (never executed) Planning time: 0.065 ms Execution time: 0.158 ms (6 rows) postgres=# EXPLAIN ANALYZE INSERT INTO upsert VALUES(1, 'Bear'), (2, 'Lion') ON CONFLICT UPDATE SET val = CONFLICTING(val) where key = 2; QUERY PLAN -- Insert on upsert (cost=0.00..0.03 rows=2 width=36) (actual time=0.075..0.075 rows=0 loops=1) - Values Scan on *VALUES* (cost=0.00..0.03 rows=2 width=36) (actual time=0.001..0.002 rows=2 loops=1) - Conflict Update on upsert (cost=4.16..8.17 rows=1 width=36) (actual time=0.012..0.026 rows=0 loops=1) - Bitmap Heap Scan on upsert (cost=4.16..8.17 rows=1 width=36) (never executed) Recheck Cond: (key = 2) - Bitmap Index Scan on upsert_pkey (cost=0.00..4.16 rows=1 width=0) (never executed) Index Cond: (key = 2) Planning time: 0.090 ms Execution time: 0.125 ms (9 rows) The second query gets a bitmap scan because plain index scans have been disabled for the UPDATE (a temporary kludge), since index-only scans can break things - IndexOnlyRecheck() throws an error. Not quite sure why the optimizer doesn't care about resjunk for the UPDATE, which is presumably why in general regular updates never use index-only scans. Since I think the actual auxiliary plan generation needs work, so as to not have uselessly complicated plans, I didn't try too hard to figure that out. This plan structure is not acceptable, of course, but maybe
Re: [HACKERS] LIMIT for UPDATE and DELETE
(2014/09/17 1:58), Robert Haas wrote: On Tue, Sep 16, 2014 at 11:31 AM, Kevin Grittner kgri...@ymail.com wrote: Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/08/15 6:18), Rukh Meski wrote: Based on the feedback on my previous patch, I've separated only the LIMIT part into its own feature. This version plays nicely with inheritance. The intended use is splitting up big UPDATEs and DELETEs into batches more easily and efficiently. IIUC, the patch doesn't support OFFSET with UPDATE/DELETE ... LIMIT. Is that OK? When we support ORDER BY ... LIMIT/OFFSET, we will also be allowing for OFFSET with UPDATE/DELETE ... LIMIT. So, ISTM it would be better for the patch to support OFFSET at this point. No? Without ORDER BY you really would have no idea *which* rows the OFFSET would be skipping. Even more dangerously, you might *think* you do, and get a surprise when you see the results (if, for example, a seqscan starts at a point other than the start of the heap, due to a concurrent seqscan from an unrelated query). It might be better not to provide an illusion of a degree of control you don't have, especially for UPDATE and DELETE operations. Fair point, but I'd lean toward including it. I think we all agree the end goal is ORDER BY .. LIMIT, and there OFFSET certainly has meaning. If we don't include it now, we'll just end up adding it later. It makes for fewer patches, and fewer changes for users, if we do it all at once. I agree with Robert. Rukh, what do you think as an author? Thanks, Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: rounding up time value less than its unit.
On Tue, Sep 23, 2014 at 10:11 PM, Gregory Smith gregsmithpg...@gmail.com wrote: On 9/23/14, 1:21 AM, David Johnston wrote: This patch should fix the round-to-zero issue. If someone wants to get rid of zero as a special case let them supply a separate patch for that improvement. I am going to wander into this fresh after just reading everything once (but with more than a little practice with real-world GUC mucking) and say that, no, it should not even do that. The original complaint here is real and can be straightforward to fix, but I don't like any of the proposals so far. My suggestion: fix the one really bad one in 9.5 with an adjustment to the units of log_rotation_age. That's a small commit that should thank Tomonari Katsumata for discovering the issue and even suggesting a fix (that we don't use) and a release note; sample draft below. Stop there. +1 I'm not convinced minute is wrong but it does imply a level of user-friendliness (or over-parenting) that we can do away with. We could just change the units for log_rotation_age to seconds, then let the person who asks for a 10s rotation time suffer for that decision only with many log files. The person who instead chooses 10ms may find log rotation disabled altogether because that rounds to zero, but now we are not talking about surprises on something that seems sane--that's a fully unreasonable user request. Given the following why not just pick ms for log_rotation_age now? Following that style of fix all the way through to the sort of release notes needed would give something like this: log_rotation_age is now set in units of seconds instead of minutes. Earlier installations of PostgreSQL that set this value directly, to a value in minutes, should change that setting to use a time unit before migrating to this version. ? are there any special considerations for people using pg_dump vs. those using pg_upgrade? If I were feeling ambitious about breaking configurations with a long-term eye toward improvement, I'd be perfectly happy converting everything on this list to ms. We live in 64 bit integer land now; who cares if the numbers are bigger? Then the rounding issues only impact sub-millisecond values, making all them squarely in nonsense setting land. Users will be pushed very clearly to always use time units in their postgresql.conf files instead of guessing whether something is set in ms vs. seconds. Seeing the reaction to a units change on log_rotation_age might be a useful test for how that sort of change plays out in the real world. If we are going to go that far why not simply modify the GUC input routine to require unit-values on these particular parameters? Direct manipulation of pg_settings probably would need some attention but everything else could reject integers and non-unit-specifying text. Allow the same input routine to accept the constants on|off|default and convert them internally into whatever the given GUC requires and you get the UI benefits without mucking around with the implementation details. Modify pg_settings accordingly to hide those now-irrelevant pieces. For UPDATE a trigger can be used to enforce the text-only input requirement. As long as we do not make microsecond µs a valid time-unit it is impossible currently to directly specify a fraction of a millisecond. David J.
Re: [HACKERS] BRIN indexes - TRAP: BadArgument
Robert Haas wrote: On Tue, Sep 23, 2014 at 9:23 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: If you or somebody else want to give it a look, I have no problem waiting a bit longer. I don't want to delay indefinitely, though, because I think it's better shipped early in the release cycle than later, to allow for further refinements and easier testing by other interested parties. I agree with that. I'd like to look at it, and I will if I get time, but as I said elsewhere, I also think it's appropriate to give a little time around the final version of any big, complex patch just because people may have thoughts, and they may not have time to deliver those thoughts the minute the patch hits the list. Fair enough -- I'll keep it open for the time being. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: rounding up time value less than its unit.
On 9/23/14, 10:52 PM, David Johnston wrote: Given the following why not just pick ms for log_rotation_age now? Right now there are people out there who have configurations that look like this: log_rotation_age=60 In order to get hourly rotation. These users will suffer some trauma should they upgrade to a version where this parameter now means a new log file pops every 60 seconds instead. If they didn't catch the warning in the release notes and it happens, I'm pretty comfortable they'll survive though, just with a bunch of cursing until it's resolved. I'd even wager a beer that more than 10% of PostgreSQL installs that get hit won't even notice. I'd prefer not to make that experience into one where they get a log file every 60ms though. That's seriously bad. I'm not opposed to making major changes like that, I just like them to be as part of updates big enough that people are unlikely to miss that something is different. Just doing this log_rotation_age thing is small enough that I expect people to miss the change in the release notes. That limits how much I think we can change the magnitude of an easy to miss setting with a large unit adjustment. ? are there any special considerations for people using pg_dump vs. those using pg_upgrade? I don't know that there's any code in pg_upgrade aiming at this sort of job. I'd prefer not to add find parameters that have changed in meaning and flag them to pg_upgrade's job requirements. It has enough problems to worry about, and we really don't do this very often. If we are going to go that far why not simply modify the GUC input routine to require unit-values on these particular parameters? Direct manipulation of pg_settings probably would need some attention but everything else could reject integers and non-unit-specifying text. That would be fine by me as a long-term direction, but it's more of a two to three version release level of change. To keep that from being terrible for users, we'd need to provide a transition release that helped people find the problem ones without units. After that was in the wild for a while, then could we have some confidence that enough people had flushed the issue out enough to turn it into a hard requirement. The project went through this exact sort of exercise with the standard_conforming_strings GUC being the way we flagged the bad constructs for people. That was a much harder job because it touched everyone's application code too. But it took many years to play out. I'd be shocked if we could herd our flock of old time users fast enough to remove unitless GUC values from PostgreSQL in less than 3 years worth of new releases. -- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On 9/23/14, 7:13 PM, Robert Haas wrote: I think we expose far too little information in our system views. Just to take one example, we expose no useful information about lwlock acquire or release, but a lot of real-world performance problems are caused by lwlock contention. I sent over a proposal for what I was calling Performance Events about a year ago. The idea was to provide a place to save data about lock contention, weird checkpoint sync events, that sort of thing. Replacing log parsing to get at log_lock_waits data was my top priority. Once that's there, lwlocks was an obvious next target. Presumably we just needed collection to be low enough overhead, and then we can go down to whatever shorter locks we want; lower the overhead, faster the event we can measure. Sometimes the database will never be able to instrument some of its fastest events without blowing away the event itself. We'll still have perf / dtrace / systemtap / etc. for those jobs. But those are not the problems of the average Postgres DBA's typical day. The data people need to solve this sort of thing in production can't always show up in counters. You'll get evidence the problem is there, but you need more details to actually find the culprit. Some info about the type of lock, tables and processes involved, maybe the query that's running, that sort of thing. You can kind of half-ass the job if you make per-tables counter for everything, but we really need more, both to serve our users and to compare well against what other databases provide for tools. That's why I was trying to get the infrastructure to capture all that lock detail, without going through the existing logging system first. Actually building Performance Events fell apart on the storage side: figuring out where to put it all without waiting for a log file to hit disk. I wanted in-memory storage so clients don't wait for anything, then a potentially lossy persistence writer. I thought I could get away with a fixed size buffer like pg_stat_statements uses. That was optimistic. Trying to do better got me lost in memory management land without making much progress. I think the work you've now done on dynamic shared memory gives the right shape of infrastructure that I could pull this off now. I even have funding to work on it again, and it's actually the #2 thing I'd like to take on as I get energy for new feature development. (#1 is the simple but time consuming job of adding block write counters, the lack of which which is just killing me on some fast growing installs) I have a lot of unread messages on this list to sort through right now. I know I saw someone try to revive the idea of saving new sorts of performance log data again recently; can't seem to find it again right now. That didn't seem like it went any farther than thinking about the specifications though. The last time I jumped right over that and hit a wall with this one hard part of the implementation instead, low overhead memory management for saving everything. -- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb format is pessimal for toast compression
On 09/15/2014 09:46 PM, Craig Ringer wrote: On 09/16/2014 07:44 AM, Peter Geoghegan wrote: FWIW, I am slightly concerned about weighing use cases around very large JSON documents too heavily. Having enormous jsonb documents just isn't going to work out that well, but neither will equivalent designs in popular document database systems for similar reasons. For example, the maximum BSON document size supported by MongoDB is 16 megabytes, and that seems to be something that their users don't care too much about. Having 270 pairs in an object isn't unreasonable, but it isn't going to be all that common either. Also, at a certain size the fact that Pg must rewrite the whole document for any change to it starts to introduce other practical changes. Anyway - this is looking like the change will go in, and with it a catversion bump. Introduction of a jsonb version/flags byte might be worthwhile at the same time. It seems likely that there'll be more room for improvement in jsonb, possibly even down to using different formats for different data. Is it worth paying a byte per value to save on possible upgrade pain? This comment seems to have drowned in the discussion. If there indeed has to be a catversion bump in the process of this, then I agree with Craig. Jan -- Jan Wieck Senior Software Engineer http://slony.info -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb format is pessimal for toast compression
On Tue, Sep 23, 2014 at 10:02 PM, Jan Wieck j...@wi3ck.info wrote: Is it worth paying a byte per value to save on possible upgrade pain? This comment seems to have drowned in the discussion. If there indeed has to be a catversion bump in the process of this, then I agree with Craig. -1. We already have a reserved bit. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers