Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-15 Thread Michael Paquier
On Thu, Jul 16, 2015 at 12:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Michael Paquier michael.paqu...@gmail.com writes:
 On Thu, Jul 16, 2015 at 3:43 AM, Paul Ramsey pram...@cleverelephant.ca 
 wrote:
 Attached is a patch that implements the extension support discussed at
 PgCon this year during the FDW unconference session.

 ...

 Thinking a bit wider, why is this just limited to extensions?

 The basic issue here is "how can a user control which functions/operators
 can be sent for remote execution?".

Yep.

 While it's certainly true that
 sometimes you might want function-by-function control of that, Paul's
 point was that extension-level granularity would be extremely convenient
 for PostGIS, and probably for other extensions.  I don't see anything
 wrong with that --- and I don't think that we should insist that Paul's
 patch implement both cases.  Somebody else who really needs
 function-by-function control can do the dogwork of figuring out a
 reasonable API for that.

Well, you could for example pass a JSON string (that's fashionable
these days) that sets up a list of authorized objects per category
instead, like:
authorized_objects = {"functions":[foo_oid,foo2_oid],
"operators":[ope1_oid,ope2_oid]}

 Disclaimer 1: Paul and I discussed this back at PGCon, and I encouraged
 him to send in his patch.

 Disclaimer 2: I haven't read the patch and don't mean to vouch for any
 implementation details.  But the functional spec of "allow remote
 execution of functions belonging to named extensions" seems sane to me.

Well, I am just questioning the extensibility of the proposed
interface, not saying that this is a bad thing :)
-- 
Michael




Re: [HACKERS] [DESIGN] Incremental checksums

2015-07-15 Thread Amit Kapila
On Wed, Jul 15, 2015 at 9:13 PM, David Christensen da...@endpoint.com
wrote:


  On Jul 15, 2015, at 3:18 AM, Amit Kapila amit.kapil...@gmail.com
wrote:
 
 - pg_disable_checksums(void) => turn checksums off for a cluster.
Sets the state to "disabled", which means bg_worker will not do anything.

 - pg_request_checksum_cycle(void) => if checksums are enabled,
increment the data_checksum_cycle counter and set the state to "enabling".
  
 
  If the cluster is already enabled for checksums, then what is
  the need for any other action?

 You are assuming this is a one-way action.


No, I was confused by the state (enabling) this function will set.

 Requesting an explicit checksum cycle would be desirable in the case
where you want to proactively verify there is no cluster corruption to be
found.


Sure, but I think that is different from setting the state to "enabling".
In your proposal above, in the "enabling" state the cluster needs to write
checksums, whereas for such a feature you only need read validation.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-15 Thread dinesh kumar
Hi

On Wed, Jul 15, 2015 at 9:27 PM, Fujii Masao masao.fu...@gmail.com wrote:

 On Thu, Jul 16, 2015 at 5:18 AM, Rahila Syed rahilasye...@gmail.com
 wrote:
  Hello,
 
  Please find attached updated patch with an interface to calculate command
  progress in pgstat.c.

 Thanks for updating the patch!

 I got the following compiler warning.

 guc.c:2316: warning: initialization makes pointer from integer without a
 cast

 make check-world caused lots of failures in my environment.

 Yeah, I got a couple of warnings with plain make.


 The following query caused a segmentation fault.


It was a typo, I believe. I see the problem is with the GUC definition in
guc.c: there should be a NULL between gettext_noop() and GUC_UNIT_MS.
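For reference, integer GUC entries in guc.c's ConfigureNamesInt array follow the general shape below; the missing NULL is the long-description slot that has to come before the flags. The parameter name, backing variable, and values here are purely illustrative, not the patch's actual ones:

{
    {"track_progress_interval", PGC_SUSET, STATS_COLLECTOR,
        gettext_noop("Interval for reporting command progress."),
        NULL,                       /* long description: the missing NULL */
        GUC_UNIT_MS
    },
    &track_progress_interval,       /* C variable backing the GUC */
    5000, 0, INT_MAX,               /* boot value, min, max */
    NULL, NULL, NULL                /* check/assign/show hooks */
},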

Regards,
Dinesh
manojadinesh.blogspot.com

SELECT name FROM  (SELECT pg_catalog.lower(name) AS name FROM
 pg_catalog.pg_settings   UNION ALL SELECT 'session authorization'
 UNION ALL SELECT 'all') ss  WHERE substring(name,1,3)='tra';

 Regards,

 --
 Fujii Masao





Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-15 Thread Tom Lane
Michael Paquier michael.paqu...@gmail.com writes:
 On Thu, Jul 16, 2015 at 3:43 AM, Paul Ramsey pram...@cleverelephant.ca 
 wrote:
 Attached is a patch that implements the extension support discussed at
 PgCon this year during the FDW unconference session.

...

 Thinking a bit wider, why is this just limited to extensions?

The basic issue here is "how can a user control which functions/operators
can be sent for remote execution?".  While it's certainly true that
sometimes you might want function-by-function control of that, Paul's
point was that extension-level granularity would be extremely convenient
for PostGIS, and probably for other extensions.  I don't see anything
wrong with that --- and I don't think that we should insist that Paul's
patch implement both cases.  Somebody else who really needs
function-by-function control can do the dogwork of figuring out a
reasonable API for that.

Disclaimer 1: Paul and I discussed this back at PGCon, and I encouraged
him to send in his patch.

Disclaimer 2: I haven't read the patch and don't mean to vouch for any
implementation details.  But the functional spec of "allow remote
execution of functions belonging to named extensions" seems sane to me.
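For illustration only (this is not the submitted patch's code): a minimal sketch of the catalog lookup that such an extension-membership test boils down to, assuming the server's 'extensions' option has already been resolved into a list of extension OIDs. The function name and the List argument are invented for the sketch:

#include "postgres.h"
#include "catalog/dependency.h"
#include "catalog/pg_proc.h"
#include "nodes/pg_list.h"

/*
 * Return true if the function with OID "procid" belongs to one of the
 * extensions the foreign server was declared to support.
 */
static bool
proc_in_declared_extension(Oid procid, List *shippable_extension_oids)
{
	/* OID of the extension owning this function, or InvalidOid if none */
	Oid		extoid = getExtensionOfObject(ProcedureRelationId, procid);

	return OidIsValid(extoid) &&
		   list_member_oid(shippable_extension_oids, extoid);
}

An operator could be checked the same way, using OperatorRelationId instead.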

regards, tom lane




Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-15 Thread Fujii Masao
On Thu, Jul 16, 2015 at 5:18 AM, Rahila Syed rahilasye...@gmail.com wrote:
 Hello,

 Please find attached updated patch with an interface to calculate command
 progress in pgstat.c.

Thanks for updating the patch!

I got the following compiler warning.

guc.c:2316: warning: initialization makes pointer from integer without a cast

make check-world caused lots of failures in my environment.

The following query caused a segmentation fault.

SELECT name FROM  (SELECT pg_catalog.lower(name) AS name FROM
pg_catalog.pg_settings   UNION ALL SELECT 'session authorization'
UNION ALL SELECT 'all') ss  WHERE substring(name,1,3)='tra';

Regards,

-- 
Fujii Masao




Re: [HACKERS] optimizing vacuum truncation scans

2015-07-15 Thread Haribabu Kommi
On Wed, Jul 15, 2015 at 11:19 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Mon, Jun 29, 2015 at 11:24 AM, Jeff Janes jeff.ja...@gmail.com wrote:


 Attached is a patch that implements the vm scan for truncation.  It
 introduces a variable to hold the last blkno which was skipped during the
 forward portion.  Any blocks after both this blkno and after the last
 inspected nonempty page (which the code is already tracking) must have been
 observed to be empty by the current vacuum.  Any other process rendering the
 page nonempty is required to clear the vm bit, and no other process can set
 the bit again during the vacuum's lifetime.  So if the bit is still set, the
 page is still empty without needing to inspect it.


 One case where this patch can behave differently than the current
 code is when, before taking the Exclusive lock on the relation for truncation,
 some backend clears the vm bit and then inserts-deletes-prunes that
 page.  I think with the patch it will not consider truncating such pages,
 whereas the current code will allow truncating them since it re-verifies each
 page.  I think such a case would be rare and we might not want to bother
 about it, but it is still worth thinking about once.

Thanks for your review.

Corrected the code: instead of returning the blkno after a visibility map
check failure, the code just continues to verify the contents, as in the
earlier approach.


 Some minor points about patch:

 count_nondeletable_pages()
 {
 ..
 if (PageIsNew(page) || PageIsEmpty(page))
 {
 /* PageIsNew probably shouldn't happen... */
 UnlockReleaseBuffer(buf);
 continue;
 }
 ..
 }

 Why is vmbuffer not freed in the above loop?


In the above check, we are just continuing to the next blkno; the
vmbuffer is freed at the end of the loop.


 count_nondeletable_pages()
 {
 ..
 + /*
 + * Pages that were inspected and found to be empty by this vacuum run
 + * must still be empty if the vm bit is still set.  Only vacuum sets
 + * vm bits, and only one vacuum can exist in a table at one time.
 + */
 + trust_vm=vacrelstats->nonempty_pages>vacrelstats->skipped_pages ?
 + vacrelstats->nonempty_pages : vacrelstats->skipped_pages;

 ..
 }

 I think it is better to have spaces in the above assignment statement
 (near '=' and near '>')


corrected.

Here I attached updated patches,
1) without prefetch logic.
2) with combined vm and prefetch logic.

I marked the patch as ready for committer.


Regards,
Hari Babu
Fujitsu Australia


vac_trunc_trust_vm_v2.patch
Description: Binary data


vac_trunc_trust_vm_and_prefetch_v2.patch
Description: Binary data



Re: [HACKERS] Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file

2015-07-15 Thread Amit Kapila
On Fri, Jul 3, 2015 at 8:45 AM, Amit Kapila amit.kapil...@gmail.com wrote:

 On Thu, Jul 2, 2015 at 7:44 PM, Alvaro Herrera alvhe...@2ndquadrant.com
wrote:
 
   Can we reduce that to a single one?  Maybe the
  first one could be errdetail or something.
 

 I think it is better the other way (basically have the second one as errdetail).
 We already have one similar message in xlog.c that way.
 ereport(LOG,
 (errmsg("online backup mode canceled"),
 errdetail("\"%s\" was renamed to \"%s\".",
   BACKUP_LABEL_FILE, BACKUP_LABEL_OLD)));

 Attached patch consolidates errmsg into one message.

This can be tracked either in 9.5 Open Items or for next CF,
any opinions?

If nobody else has any opinion on this, I will add it to 9.5 Open Items
list.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-15 Thread Sameer Thakur-2
Hello,
Your current design completely misses the time taken to scan indexes, which
is significant.
I tried to address this issue in the attached patch.
The patch calculates index scan progress by measuring against scanned pages
in LVRelStats. It checks for a change in the current page being scanned and
increments the progress counter. When the counter reaches the scanned-pages
number in LVRelStats, progress is 100% complete. For now the progress is
emitted as a warning (so no config changes are needed to see progress).
Thoughts?
regards
Sameer IndexScanProgress.patch
http://postgresql.nabble.com/file/n5858109/IndexScanProgress.patch  






Re: [HACKERS] Parallel Seq Scan

2015-07-15 Thread Amit Kapila
On Wed, Jul 15, 2015 at 2:14 PM, Antonin Houska a...@cybertec.at wrote:

 Amit Kapila amit.kapil...@gmail.com wrote:
  Attached, find the rebased version of patch.

 [I haven't read this thread so far, sorry for possibly redundant comment.]

 I noticed that false is passed for the required_outer argument of
 create_partialseqscan_path(), while NULL seems to be cleaner in terms of C
 language.

 But in terms of semantics, I'm not sure this is correct anyway. Why does
 create_parallelscan_paths() not accept the actual rel->lateral_relids,
just
 like create_seqscan_path() does? (See set_plain_rel_pathlist().) If
there's
 reason for your approach, I think it's worth a comment.


Right, I think this is left over from the initial version where parallel
seq scan was supported just for a single table scan.  It should probably
do something similar to create_seqscan_path() and then pass the same down
to create_partialseqscan_path() and get_baserel_parampathinfo().

Thanks, I will fix this in next version of patch.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


[HACKERS] option for psql - short list non template database

2015-07-15 Thread Pavel Stehule
Hi

I use this pattern terribly often:

 psql -c "select datname from pg_database where not datistemplate and
datallowconn" postgres

What about introducing a new long option that does it?

psql -At -list --names --without-templates

Regards

Pavel


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-15 Thread Michael Paquier
On Thu, Jul 16, 2015 at 1:30 PM, Sameer Thakur-2 wrote:
 Thoughts?
 regards
 Sameer IndexScanProgress.patch
 http://postgresql.nabble.com/file/n5858109/IndexScanProgress.patch

I am not really willing to show up as the picky guy here, but could it
be possible to receive those patches as attached to emails instead of
having them referenced by URL? I imagine that you are directly using
the nabble interface.
-- 
Michael




Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-15 Thread Amit Langote
On 2015-07-16 PM 12:43, Tom Lane wrote:
 
 The basic issue here is "how can a user control which functions/operators
 can be sent for remote execution?".  While it's certainly true that
 sometimes you might want function-by-function control of that, Paul's
 point was that extension-level granularity would be extremely convenient
 for PostGIS, and probably for other extensions.

Perhaps I am just being paranoid, but is the extension version number at all
significant? I guess that if a function name is matched locally and declared
safe to send but doesn't really exist on the remote end, or does exist but has
different behavior, then that can't be expected to work or work correctly.
But it seems difficult to incorporate the version number into the chosen
approach of matching functions anyway.

Thanks,
Amit





Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-15 Thread Michael Paquier
On Thu, Jul 16, 2015 at 3:43 AM, Paul Ramsey pram...@cleverelephant.ca wrote:
 Attached is a patch that implements the extension support discussed at
 PgCon this year during the FDW unconference session. Highlights:

 * Pass extension operators and functions to the foreign server
 * Only send ops/funcs if the foreign server is declared to support the
 relevant extension, for example:

 CREATE SERVER foreign_server
 FOREIGN DATA WRAPPER postgres_fdw
 OPTIONS (host '127.0.0.1', port '5432', dbname 'my_db',
 extensions 'cube, seg');

 Github branch is here:
   https://github.com/pramsey/postgres/tree/fdw-extension-suppport

 Synthetic pull request for easy browsing/commenting is here:
   https://github.com/pramsey/postgres/pull/1


+
 /*
  * Returns true if given expr is safe to evaluate on the foreign server.
  */
@@ -177,7 +185,7 @@ is_foreign_expr(PlannerInfo *root,
 {
 foreign_glob_cxt glob_cxt;
 foreign_loc_cxt loc_cxt;
-
+
 /*
  * Check that the expression consists of nodes that are safe to execute
  * remotely.
@@ -207,6 +215,8 @@ is_foreign_expr(PlannerInfo *root,
 return true;
 }

+
+

This patch includes some diff noise, it would be better to remove that.

-if (!is_builtin(fe->funcid))
+if (!is_builtin(fe->funcid) &&
(!is_in_extension(fe->funcid, fpinfo)))
Extra parentheses are not needed.

+if ( (!is_builtin(oe->opno)) &&
(!is_in_extension(oe->opno, fpinfo)) )
... And this does not respect the project code format. See here for
more details for example:
http://www.postgresql.org/docs/devel/static/source.html

+    /* PostGIS metadata */
+    List       *extensions;
+    bool        use_postgis;
+    Oid         postgis_oid;
This addition in PgFdwRelationInfo is surprising. What's the point of
keeping use_postgis and postgis_oid, which are actually used nowhere?
(And you could rely on the fact that postgis_oid is InvalidOid to
determine if it is defined or not, btw). I have a hard time
understanding why this refers to PostGIS at all, as this is a
postgres_fdw feature.

 appendStringInfo(buf, "::%s",
- format_type_with_typemod(node->consttype,
-  node->consttypmod));
+ format_type_be_qualified(node->consttype));
What's the reason for this change?

Thinking a bit wider, why is this just limited to extensions? You may
as well have other objects defined locally and remotely, like functions
or operators that are not defined in extensions but as normal
objects. Hence I think that what you are looking for is not this
option, but a boolean option changing the behavior of the code paths
using is_builtin() in foreign_expr_walker() such that the sanity checks on
existing objects are not limited to FirstBootstrapObjectId but extend to
other objects in the catalogs. That's a risky option because it could
lead to inconsistencies among objects, so obviously the default is
false, but by documenting correctly the risks of using this option we
may be able to get something integrated (aka be sure that each object
is defined consistently across the servers queried or you'll have
surprising results!). In short, it seems to me that you are taking the
wrong approach.
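As a rough illustration of that alternative (the option name below is invented and exists nowhere today), the test inside deparse.c could be gated on a server-level boolean rather than on extension membership:

/*
 * Hypothetical sketch only: is_builtin() is postgres_fdw's existing
 * FirstBootstrapObjectId test; "trust_remote_objects" is an invented
 * per-server option, not part of the patch or of postgres_fdw.
 */
static bool
object_is_shippable(Oid objid, bool trust_remote_objects)
{
	if (is_builtin(objid))
		return true;			/* built-in objects are always safe to send */
	return trust_remote_objects;	/* otherwise only if the user opted in */
}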
-- 
Michael




Re: [HACKERS] assessing parallel-safety

2015-07-15 Thread Amit Kapila
On Thu, Jul 16, 2015 at 2:02 AM, Robert Haas robertmh...@gmail.com wrote:

 exec_stmt_execsql is called by exec_stmt_open and exec_stmt_forc.
 Those are cursor operations and thus - I think - parallelism can't be
 used there.

Right, but it also gets called from exec_stmt where a parallel-safe
statement could be passed to it.  So it seems to me that we
should enable parallelism for that code path.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-15 Thread Thakur, Sameer
Hello,
I am not really willing to show up as the picky guy here, but could it be 
possible to receive those patches as attached to emails instead of having them 
referenced by URL? I imagine that you are directly using the nabble interface.
I just configured a new mail client for Nabble and did not know how to use it within
an existing conversation.
Now I can send the patch attached!
Thanks
Sameer


__
Disclaimer: This email and any attachments are sent in strictest confidence
for the sole use of the addressee and may contain legally privileged,
confidential, and proprietary data. If you are not the intended recipient,
please advise the sender by replying promptly to this email and then delete
and destroy this email and any attachments without any further use, copying
or forwarding.


IndexScanProgress.patch
Description: IndexScanProgress.patch



Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-15 Thread Amit Langote
On 2015-07-16 AM 05:18, Rahila Syed wrote:
 
 GUC parameter 'pgstat_track_progress' is currently PGC_SUSET in line with
 'track_activities' GUC parameter.

Naming the GUC pgstat* seems a little inconsistent. It could be called,
say, track_maintenance_progress_interval/track_vacuum_progress_interval.
That way, it will look similar to existing track_* parameters:

#track_activities = on
#track_counts = on
#track_io_timing = off
#track_functions = none # none, pl, all
#track_activity_query_size = 1024   # (change requires restart)

Also, adding the new GUC to src/backend/utils/misc/postgresql.conf.sample
might be helpful.

Thanks,
Amit






Re: [HACKERS] security labels on databases are bad for dump restore

2015-07-15 Thread Andres Freund
On 2015-07-14 13:09:26 -0400, Adam Brightwell wrote:
 All,
 
  I won't have time to do anything about this anytime soon, but I think we
  should fix that at some point.  Shall I put this on the todo? Or do we
  want to create an 'open items' page that's not major version specific?
 
  I think adding it to the TODO would be great.

Done. It's rather telling that it took me a fair while to find a spot in
the todo list where it fits...

 I'd be willing to look/dive into this one further.

Cool.


One thing worth mentioning is that arguably the problem is caused by the
fact that we're here emitting database level information in pg_dump,
normally only done for dumpall.

Greetings,

Andres Freund




Re: [HACKERS] [DESIGN] Incremental checksums

2015-07-15 Thread Andres Freund
On 2015-07-15 12:48:40 +0530, Amit Kapila wrote:
 If, during the scan of a relation, the system crashes after doing the
 checksum for half of the blocks, then in the above scheme a restart
 would need to read all the blocks again, even though some of them were
 already checksummed in the previous cycle. This is okay if it happens
 for a few small or medium-size relations, but assume it happens when
 multiple large relations are in the same state (half their blocks
 checksummed) when the crash occurs; then it could lead to much more IO
 than required.

I don't think this is worth worrying about. If you crash frequently
enough for this to be a problem you should fix that.  Adding complexity
for such an uncommon case spreads the cost to many more people.




Re: [HACKERS] security labels on databases are bad for dump restore

2015-07-15 Thread Alvaro Herrera
Andres Freund wrote:

 One thing worth mentioning is that arguably the problem is caused by the
 fact that we're here emitting database level information in pg_dump,
 normally only done for dumpall.

... the reason for which is probably the lack of CURRENT_DATABASE as a
database specifier.  It might make sense to add the rest of
database-level information to pg_dump output, when we get that.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] security labels on databases are bad for dump restore

2015-07-15 Thread Andres Freund
On 2015-07-15 12:04:40 +0300, Alvaro Herrera wrote:
 Andres Freund wrote:
  One thing worth mentioning is that arguably the problem is caused by the
  fact that we're here emitting database level information in pg_dump,
  normally only done for dumpall.
 
 ... the reason for which is probably the lack of CURRENT_DATABASE as a
 database specifier.  It might make sense to add the rest of
 database-level information to pg_dump output, when we get that.

I'm not sure. I mean, it's not that an odd idea to assign a label to a
database and then restore data into it, and expect the explicitly
assigned label to survive.  Not sure if there actually is a good
behaviour either way here :/




Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-15 Thread Michael Paquier
On Wed, Jul 15, 2015 at 3:53 PM, Simon Riggs si...@2ndquadrant.com wrote:
 pg_replslot has persistent state. We are discussing permanent configuration
 data, for which I don't see the need to create an additional parallel
 infrastructure just to store a string, given the stated objection that the
 string is fairly long. AFAICS it's not even that long.

 ...

 JSON seems the most sensible format for the string. Inventing a new one
 doesn't make sense. Most important for me is the ability to programmatically
 manipulate/edit the config string, which would be harder with a new custom
 format.

 ...

 Group labels are essential.

OK, so this is leading us to the following points:
- Use a JSON object to define the quorum/priority groups for the sync state.
- Store it as a GUC, and use the check hook to validate its format,
which is what we have now with s_s_names
- Rely on SIGHUP to maintain an in-memory image of the quorum/priority
sync state
- Have the possibility to define group labels in this JSON blob, and
be able to use those labels in a quorum or priority sync definition.
- For backward-compatibility, use for example s_s_names = 'json' to
switch to the new system.

Also, as a first step of the implementation, do we actually need a set
of functions to manipulate the JSON blob? I mean, we could perhaps
have them in contrib/, but they do not seem mandatory as long as we
document correctly how to define a label group and define a quorum
or priority group, no?
-- 
Michael




Re: [HACKERS] Could be improved point of UPSERT

2015-07-15 Thread Yourfriend
For most of the cases I mentioned, we don't require a strictly gapless sequence
for the invoice ID; the essential requirement is uniqueness.
We just hope that there is no obvious gap in most situations.
From testing UPSERT, there are quite a few chances to generate a big
gap when UPSERTing multiple records.
However, the result of UPSERT is acceptable, and I do love this feature.
So, it's a suggestion only.

Anyway, thanks a lot for the detailed explanation.

Regards,

Daojing Zhou.




On Wed, Jul 15, 2015 at 3:23 PM, Peter Geoghegan p...@heroku.com wrote:

 On Wed, Jul 15, 2015 at 12:01 AM, Yourfriend doudou...@gmail.com wrote:
  for example, SO201507_1001, PO201503_1280, etc.
 
  As these IDs would be the most important attribute to the business, we
  hope there is no gap in the IDs.

 That's a requirement I've heard a number of times before. If you're
 relying on a sequence for this purpose, your application is already
 broken [1]. UPSERT need not be involved at all.

 [1] http://www.varlena.com/GeneralBits/130.php
 --
 Peter Geoghegan



Re: [HACKERS] Parallel Seq Scan

2015-07-15 Thread Antonin Houska
Amit Kapila amit.kapil...@gmail.com wrote:
 Attached, find the rebased version of patch.

[I haven't read this thread so far, sorry for possibly redundant comment.]

I noticed that false is passed for the required_outer argument of
create_partialseqscan_path(), while NULL seems to be cleaner in terms of C
language.

But in terms of semantics, I'm not sure this is correct anyway. Why does
create_parallelscan_paths() not accept the actual rel->lateral_relids, just
like create_seqscan_path() does? (See set_plain_rel_pathlist().) If there's
reason for your approach, I think it's worth a comment.


BTW, emacs shows whitespace on an otherwise empty line, parallelpath.c:57.

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at




Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-15 Thread Simon Riggs
On 7 July 2015 at 07:03, Michael Paquier michael.paqu...@gmail.com wrote:

 On Tue, Jul 7, 2015 at 2:19 PM, Josh Berkus j...@agliodbs.com wrote:
  On 07/06/2015 09:56 PM, Michael Paquier wrote:
  On Tue, Jul 7, 2015 at 12:51 PM, Josh Berkus j...@agliodbs.com wrote:
  On 07/06/2015 06:40 PM, Michael Paquier wrote:
  On Tue, Jul 7, 2015 at 2:56 AM, Josh Berkus j...@agliodbs.com
 wrote:
  pro-JSON:
 
  * standard syntax which is recognizable to sysadmins and devops.
  * can use JSON/JSONB functions with ALTER SYSTEM SET to easily make
  additions/deletions from the synch rep config.
  * can add group labels (see below)
 
  If we go this way, I think that managing a JSON blob with a GUC
  parameter is crazy, this is way longer in character size than a simple
  formula because of the key names. Hence, this JSON blob should be in a
  separate place than postgresql.conf not within the catalog tables,
  manageable using an SQL interface, and reloaded in backends using
  SIGHUP.
 
  I'm not following this at all.  What are you saying here?
 
  A JSON string is longer in terms of number of characters than a
  formula because it contains key names, and those key names are usually
  repeated several times, making it harder to read in a configuration
  file. So what I am saying that that we do not save it as a GUC, but as
  a separate metadata that can be accessed with a set of SQL functions
  to manipulate it.
 
  Where, though?  Someone already pointed out the issues with storing it
  in a system catalog, and adding an additional .conf file with a
  different format is too horrible to contemplate.

 Something like pg_syncinfo/ coupled with a LW lock, we already do
 something similar for replication slots with pg_replslot/.


-1 to pg_syncinfo/

pg_replslot has persistent state. We are discussing permanent configuration
data, for which I don't see the need to create an additional parallel
infrastructure just to store a string, given the stated objection that the
string is fairly long. AFAICS it's not even that long.

...

JSON seems the most sensible format for the string. Inventing a new one
doesn't make sense. Most important for me is the ability to
programmatically manipulate/edit the config string, which would be harder
with a new custom format.

...

Group labels are essential.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Could be improved point of UPSERT

2015-07-15 Thread Yourfriend
In my example, I just give each record a different ID to access it
efficiently.

In our business cases, we sometimes also use a prefix like
'SO' or 'PO' combined with the current year and month, and then a sequence,
to make an invoice ID,

for example, SO201507_1001, PO201503_1280, etc.

As these IDs would be the most important attribute to the business, we
hope there is no gap in the IDs.

Regards,

Daojing Zhou.



On Wed, Jul 15, 2015 at 2:33 AM, Peter Geoghegan p...@heroku.com wrote:

 On Sun, Jul 12, 2015 at 4:09 AM, Yourfriend doudou...@gmail.com wrote:
  Suggestion:  When a conflict was found for UPSERT, don't access the
  sequence, so users can have a reasonable list of ID.

 This is not technically feasible. What if the arbiter index is a serial PK?

 The same thing can happen when a transaction is aborted. SERIAL is not
 guaranteed to be gapless.
 --
 Peter Geoghegan



Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-15 Thread Simon Riggs
On 29 June 2015 at 18:40, Josh Berkus j...@agliodbs.com wrote:


 I'm in favor of a more robust and sophisticated synch rep.  But not if
 nobody not on this mailing list can configure it, and not if even we
 don't know what it will do in an actual failure situation.


That's the key point. Editing the config after a failure is a Failure of
Best Practice in an HA system.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-15 Thread Amit Kapila
On Fri, Jun 26, 2015 at 11:16 AM, Michael Paquier michael.paqu...@gmail.com
wrote:

 On Thu, Jun 25, 2015 at 8:32 PM, Simon Riggs  wrote:
  Let's start with a complex, fully described use case then work out how
to
  specify what we want.

 Well, one of the simplest cases where quorum commit and this
 feature would be useful is this, with 2 data centers:
 - on center 1, master A and standby B
 - on center 2, standby C and standby D
 With the current synchronous_standby_names, what we can do now is
 ensure that one node has acknowledged the commit of the master. For
 example synchronous_standby_names = 'B,C,D'. But you know that :)
 What this feature would allow us to do is, for example, to
 ensure that a node in data center 2 has acknowledged the commit of the
 master, meaning that even if data center 1 is completely lost for one
 reason or another, we have at least one node in center 2 that has lost
 no data at transaction commit.


I think the way to address this could be via SQL syntax, as that
will make users' lives easier.

Create Replication Setup Master A
Sync_Priority_Standby B Sync_Group_Any_Standby C,D
Sync_Group_Fixed_Standby 2,E,F,G

where
Sync_Priority_Standby - means same as current setting in
synchronous_standby_names

Sync_Group_Any_Standby - means if any one in the group has
acknowledged commit master can proceed

Sync_Group_Fixed_Standby - means a fixed number
(the first parameter following this option) of standbys from this
group should commit before the master can proceed.

The above syntax is just to explain the idea, but I think we can invent
better syntax if required.  We can define these as options in the syntax,
as we do in some other syntaxes, to avoid creating more keywords.
We need to ensure that all these option values are persisted.

 Now, regarding the way to express that, we need to use a concept of
 node group for each element of synchronous_standby_names. A group
 contains a set of elements, each element being a group or a single
 node. And for each group we need to know three things when a commit
 needs to be acknowledged:
 - Does my group need to acknowledge the commit?
 - If yes, how many elements in my group need to acknowledge it?
 - Does the order of my elements matter?


I think with above kind of syntax we can address all these points
and even if something is remaining it is easily extendable.

 That's where the micro-language idea makes sense to use.

The micro-language idea is good, but I think if we can provide some
syntax or SQL functions, then it can be convenient for users to
specify the replication topology.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: ctidscan as an example of custom-scan (Re: [HACKERS] [v9.5] Custom Plan API)

2015-07-15 Thread Kouhei Kaigai
 -Original Message-
 From: Michael Paquier [mailto:michael.paqu...@gmail.com]
 Sent: Wednesday, July 15, 2015 3:29 PM
 To: Kaigai Kouhei(海外 浩平)
 Cc: Tom Lane; Robert Haas; Alvaro Herrera; hlinnaka; Jim Nasby; Kohei KaiGai;
 PgHacker; Simon Riggs
 Subject: Re: ctidscan as an example of custom-scan (Re: [HACKERS] [v9.5] 
 Custom
 Plan API)
 
 On Wed, Jul 15, 2015 at 10:12 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
  We never guarantee interface compatibility across major version upgrades.
  If we add/modify an interface in v9.6, it is the duty of extension developers
  to follow the new version; this is not specific to custom-scan providers.
  If v9.6 adds support for fine-grained function cost estimation, I also have
  to follow that feature, but that is natural.
 
 Maintaining compatibility across major versions is a best effort, and
 even if we sometimes break things across major versions, sometimes
 even silently (take the example of 9.3 background workers that do not
 start with 9.4 as long as bgw_notify_pid is not set to 0), the
 approach usually taken is to have APIs stable and convenient, able to
 cover a maximum set of cases for a given set of plugins, and this
 serves well in the long term. Now it is true that we cannot assume
 either that the first version of a plugin API will be perfect forever and
 will be able to cover all the imagined test cases at the first shot, still
 I'd like to think that things are broken only when necessary and with
 good reasons. A set of APIs changed at each major release tends to be
 proof that research was lacking in the first version, and would surely
 demotivate its adoption by extension developers.

Yep, I also don't want to break interface compatibility without
something reasonable, and I also think that following up on new core
features is a good reason to enhance the interface in a future version.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com




Re: [HACKERS] Memory Accounting v11

2015-07-15 Thread Jeff Davis
On Tue, 2015-07-14 at 16:19 -0400, Robert Haas wrote:
 tuplesort.c does its own accounting, and TBH that seems like the right
 thing to do here, too.  The difficulty is, I think, that some
 transition functions use an internal data type for the transition
 state, which might not be a single palloc'd chunk.  But since we can't
 spill those aggregates to disk *anyway*, that doesn't really matter.

So would it be acceptable to just ignore the memory consumed by
"internal", or come up with some heuristic?

Regards,
Jeff Davis






Re: ctidscan as an example of custom-scan (Re: [HACKERS] [v9.5] Custom Plan API)

2015-07-15 Thread Michael Paquier
On Wed, Jul 15, 2015 at 10:12 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
 We never guarantee interface compatibility across major version upgrades.
 If we add/modify an interface in v9.6, it is the duty of extension developers
 to follow the new version; this is not specific to custom-scan providers.
 If v9.6 adds support for fine-grained function cost estimation, I also have
 to follow that feature, but that is natural.

Maintaining compatibility across major versions is a best effort, and
even if we sometimes break things across major versions, sometimes
even silently (take the example of 9.3 background workers that do not
start with 9.4 as long as bgw_notify_pid is not set to 0), the
approach usually taken is to have APIs stable and convenient, able to
cover a maximum set of cases for a given set of plugins, and this
serves well in the long term. Now it is true that we cannot assume
either that the first version of a plugin API will be perfect forever and
will be able to cover all the imagined test cases at the first shot, still
I'd like to think that things are broken only when necessary and with
good reasons. A set of APIs changed at each major release tends to be
proof that research was lacking in the first version, and would surely
demotivate its adoption by extension developers.
-- 
Michael




Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-15 Thread Simon Riggs
On 2 July 2015 at 19:50, Josh Berkus j...@agliodbs.com wrote:


 So there's two parts to this:

 1. I need to ensure that data is replicated to X places.

 2. I need to *know* which places data was synchronously replicated to
 when the master goes down.

 My entire point is that (1) alone is useless unless you also have (2).
 And do note that I'm talking about information on the replica, not on
 the master, since in any failure situation we don't have the old master
 around to check.


You might *think* you know, but given we are in this situation because of
an unexpected failure, it seems strange to specifically avoid checking
before you proceed.

Bacon not Aristotle.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] [DESIGN] Incremental checksums

2015-07-15 Thread Amit Kapila
On Tue, Jul 14, 2015 at 1:56 AM, David Christensen da...@endpoint.com
wrote:


 For any relation that it finds in the database which is not checksummed,
it starts an actual worker to handle the checksum process for this table.
Since the state of the cluster is already either enforcing or
revalidating, any block writes will get checksums added automatically, so
the only thing the bgworker needs to do is load each block in the relation
and explicitly mark as dirty (unless that's not required for FlushBuffer()
to do its thing).  After every block in the relation is visited this way
and checksummed, its pg_class record will have rellastchecksum updated.
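To make the quoted design concrete, the per-relation pass described above amounts to something like the following sketch (illustrative only, not the proposed patch; locking policy, WAL considerations, and throttling are all ignored here):

#include "postgres.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"

/*
 * Sketch of the bgworker's per-relation pass: read every block and mark
 * it dirty so that its next write out recomputes the checksum.
 */
static void
checksum_relation_pass(Relation rel, BufferAccessStrategy strategy)
{
	BlockNumber nblocks = RelationGetNumberOfBlocks(rel);
	BlockNumber blkno;

	for (blkno = 0; blkno < nblocks; blkno++)
	{
		Buffer		buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno,
											 RBM_NORMAL, strategy);

		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
		MarkBufferDirty(buf);
		UnlockReleaseBuffer(buf);
	}
}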


If, during the scan of a relation, the system crashes after doing the
checksum for half of the blocks, then in the above scheme a restart
would need to read all the blocks again, even though some of them were
already checksummed in the previous cycle. This is okay if it happens
for a few small or medium-size relations, but assume it happens when
multiple large relations are in the same state (half their blocks
checksummed) when the crash occurs; then it could lead to much more IO
than required.

 ** Function API:

 Interface to the functionality will be via the following Utility
functions:

   - pg_enable_checksums(void) => turn checksums on for a cluster.  Will
error if the state is anything but "disabled".  If this is the first time
this cluster has run this, this will initialize
ControlFile->data_checksum_version to the preferred built-in algorithm
(since there's only one currently, we just set it to 1).  This increments
the ControlFile->data_checksum_cycle variable, then sets the state to
"enabling", which means that the next time the bgworker checks if there is
anything to do it will see that state, scan all the databases'
datlastchecksum fields, and start kicking off the bgworker processes to
handle the checksumming of the actual relation files.

   - pg_disable_checksums(void) => turn checksums off for a cluster.  Sets
the state to "disabled", which means bg_worker will not do anything.

   - pg_request_checksum_cycle(void) => if checksums are enabled,
increment the data_checksum_cycle counter and set the state to "enabling".


If the cluster is already enabled for checksums, then what is
the need for any other action?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Could be improved point of UPSERT

2015-07-15 Thread Peter Geoghegan
On Wed, Jul 15, 2015 at 12:01 AM, Yourfriend doudou...@gmail.com wrote:
 for example, SO201507_1001, PO201503_1280, etc.

 As these IDs would be the most important attribute to the business, we
 hope there is no gap in the IDs.

That's a requirement I've heard a number of times before. If you're
relying on a sequence for this purpose, your application is already
broken [1]. UPSERT need not be involved at all.

[1] http://www.varlena.com/GeneralBits/130.php
-- 
Peter Geoghegan




Re: [HACKERS] Memory Accounting v11

2015-07-15 Thread Atri Sharma
On Wed, Jul 15, 2015 at 12:57 PM, Jeff Davis pg...@j-davis.com wrote:

 On Tue, 2015-07-14 at 16:19 -0400, Robert Haas wrote:
  tuplesort.c does its own accounting, and TBH that seems like the right
  thing to do here, too.  The difficulty is, I think, that some
  transition functions use an internal data type for the transition
  state, which might not be a single palloc'd chunk.  But since we can't
  spill those aggregates to disk *anyway*, that doesn't really matter.

 So would it be acceptable to just ignore the memory consumed by
 internal, or come up with some heuristic?

 Regards,
 Jeff Davis


I think a heuristic would be better suited here; ignoring memory
consumption for internal types means that we are not making the memory
accounting useful for a set of use cases.



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-07-15 Thread Andrew Dunstan


On 07/15/2015 10:52 AM, Robert Haas wrote:

On Mon, Jul 13, 2015 at 10:46 AM, Ryan Pedela rped...@datalanche.com wrote:

As far as large numbers in JSON, I think Postgres is doing the right thing
and should not be changed. It is Javascript that is stupid here, and I don't
think it is wise to add something to core just because one client does
stupid things with large numbers. In addition, ES7 is introducing value
types which will hopefully solve the large number problem in Javascript.

FWIW, I don't agree.  If it's not easy to read the JSON that
PostgreSQL generates using JavaScript, then a lot of people are just
going to give up on doing it, and IMO that would be sad.  Telling
people that they have to parse the JSON using some parser other than
the one built into their JavaScript engine, whack it around, and then
render it as text and parse it again is not really an acceptable
answer.  The reason why the logical decoding stuff allows multiple
output formats is because Andres, quite correctly, foresaw that
different people would need different output formats.  He could have
designed that system to output only one output format and just said,
everybody's got to read and parse this, but that would have been slow.
Instead, he tried to set things up so that you could get the output in
the format that was most convenient for your client, whatever that is.
On this thread, we're back-pedaling from that idea: sorry, you can get
JSON output, but if you want JSON output that will be properly
interpreted by your JSON parser, you can't have that.  Regardless of
the details of this particular patch, I can't endorse that approach.
If we want people to use our software, we need to meet them where they
are at, especially when we are only (IIUC) talking about inserting a
few extra quotation marks.



The question for me is where is the best place to transform the data. 
The approach taken was both invasive and broken. The approach I
suggested, reparsing and transforming it in the logical decoder, would 
be both fairly simple and completely noninvasive. If someone gives me a 
test for what is an acceptable number for JS processors, I bet I could 
write a transformation function in an hour or two, and in a hundred 
lines or so. I admit that I probably have more experience doing this 
than anyone else, but the parser API was designed to be fairly simple, 
and I believe it is.


cheers

andrew




Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-07-15 Thread Ryan Pedela
On Wed, Jul 15, 2015 at 11:10 AM, Ryan Pedela rped...@datalanche.com
wrote:

 On Wed, Jul 15, 2015 at 8:52 AM, Robert Haas robertmh...@gmail.com
 wrote:

 FWIW, I don't agree.  If it's not easy to read the JSON that
 PostgreSQL generates using JavaScript, then a lot of people are just
 going to give up on doing it, and IMO that would be sad.  Telling
 people that they have to parse the JSON using some parser other than
 the one built into their JavaScript engine, whack it around, and then
 render it as text and parse it again is not really an acceptable
 answer.


 The vast majority of Javascript users are going to be using Node.js when
 they connect to Postgres if only for security reasons. If they use Node,
 they will be using node-postgres [1] or something that builds on top of it.
 For int64 and numerics in a row, the default is to return a string, and
 there is a flag you can set to round returned numbers if you prefer. There
 is also a way to override the default parsing of each Postgres type [2]. So
 in the case of JSON using my json-bignum module [3], the code looks like
 this:

 var pgTypes = require('pg').types;
 var bignumJSON = require('json-bignum');

 types.setTypeParser(JSON_TYPE_OID, function (value) {
 return bignumJSON.parse(value);
 });

 types.setTypeParser(JSONB_TYPE_OID, function (value) {
 return bignumJSON.parse(value);
 });

 To me that code is super simple, and not a pain in the ass. In other words,
 it is not "Telling people that they have to parse the JSON using some
 parser other than the one built into their JavaScript engine, whack it
 around, and then render it as text and parse it again". Like I said
 previously, the situation with Javascript will hopefully be remedied in a
 few years with ES7 anyway.

 1. https://github.com/brianc/node-postgres
 2. https://github.com/brianc/node-pg-types
 3. https://github.com/datalanche/json-bignum

  On Wed, Jul 15, 2015 at 8:52 AM, Robert Haas robertmh...@gmail.com
  wrote:

 The reason why the logical decoding stuff allows multiple
 output formats is because Andres, quite correctly, foresaw that
 different people would need different output formats.  He could have
 designed that system to output only one output format and just said,
 everybody's got to read and parse this, but that would have been slow.
 Instead, he tried to set things up so that you could get the output in
 the format that was most convenient for your client, whatever that is.
 On this thread, we're back-pedaling from that idea: sorry, you can get
 JSON output, but if you want JSON output that will be properly
 interpreted by your JSON parser, you can't have that.  Regardless of
 the details of this particular patch, I can't endorse that approach.
 If we want people to use our software, we need to meet them where they
 are at, especially when we are only (IIUC) talking about inserting a
 few extra quotation marks.


 I would be okay with a generic way to specify output formats if there are
 many use cases beyond Javascript and JSON. I vaguely remember someone
 suggesting a FORMAT clause on CREATE TABLE which would specify how a
 particular column would output from a SELECT. For example, returning a date
 with a non-ISO format. I liked that idea. However if the only reason for
 different output formats is Javascript, that is silly. I have a very long
 list of feature requests that would probably only be beneficial to me or a
 handful of users. Should we implement them? No, of course not! If we did
 that Postgres would cease to be the best open-source database. You can't
 have the best product and say yes to everything. Feature creep is the enemy
 of quality. If Javascript is the sole reason for supporting multiple output
 formats, then that is the definition of feature creep in my opinion. If
 there are many use cases beyond Javascript and JSON, then that is different
 and a conversation worth having.


Bottom line: Large numbers are a pain to deal with in Javascript regardless
of where they come from or what format they are in. Adding code to Postgres
core will never change that.


[HACKERS] Retrieve the snapshot's LSN

2015-07-15 Thread Florent Guiliani
Hello everyone,

I would need to start a repeatable read transaction and retrieve the
corresponding LSN. I'm looking for pointers or ideas on how to achieve
this.

Andres F. suggested that I extend pg_export_snapshot() [1] and call
GetLatestSnapshot() [2] while reliably retrieving the current LSN.
Should I call GetXLogWriteRecPtr() [3] for that? What lock(s) could I
take to synchronize the two calls?
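As a point of reference, the naive shape of that suggestion is just the two calls back-to-back (a sketch only; it deliberately ignores the synchronization question above, so WAL can advance between the two calls):

#include "postgres.h"
#include "access/xlog.h"
#include "utils/snapmgr.h"

/*
 * Sketch: take the latest snapshot and read the current WAL write
 * position.  Nothing here keeps the two in sync -- that is exactly
 * the open question.
 */
static void
get_snapshot_and_lsn(Snapshot *snap, XLogRecPtr *lsn)
{
	*snap = GetLatestSnapshot();
	*lsn = GetXLogWriteRecPtr();
}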

Any other ideas?

A snapshot is exported when creating a logical replication slot [4],
and the corresponding LSN is also returned [5]. This is what I need,
except that I'd rather not create a replication slot each time I need
the snapshot.

During slot creation, the snapshot building and exporting code seems
highly coupled with the logical decoding stuff. It doesn't seem very
reusable for retrieving the snapshot's LSN outside of logical decoding.

Thank you for your help,

References:

[1] pg_export_snapshot()
https://github.com/postgres/postgres/blob/aa9eac45ea868e6ddabc4eb076d18be10ce84c6a/src/backend/utils/time/snapmgr.c#L

[2] GetLatestSnapshot()
https://github.com/postgres/postgres/blob/aa9eac45ea868e6ddabc4eb076d18be10ce84c6a/src/backend/utils/time/snapmgr.c#L259

[3] GetXLogWriteRecPtr()
https://github.com/postgres/postgres/blob/7b156c1e0746a46d083d7dbcd28afb303b3484ef/src/backend/access/transam/xlog.c#L10616

[4] Exported snapshot in logical replication slot creation
https://github.com/postgres/postgres/blob/aa9eac45ea868e6ddabc4eb076d18be10ce84c6a/src/backend/replication/walsender.c#L815

/* build initial snapshot, might take a while */
DecodingContextFindStartpoint(ctx);

/*
 * Export a plain (not of the snapbuild.c type) snapshot to the user
 * that can be imported into another session.
 */
 snapshot_name = SnapBuildExportSnapshot(ctx->snapshot_builder);


[5] Consistent point LSN in logical replication slot creation:
https://github.com/postgres/postgres/blob/aa9eac45ea868e6ddabc4eb076d18be10ce84c6a/src/backend/replication/walsender.c#L831

snprintf(xpos, sizeof(xpos), "%X/%X",
(uint32) (MyReplicationSlot->data.confirmed_flush >> 32),
(uint32) MyReplicationSlot->data.confirmed_flush);

...cut...

/* second field: LSN at which we became consistent */
pq_sendstring(&buf, "consistent_point"); /* col name */

...cut

/* consistent wal location */
pq_sendint(&buf, strlen(xpos), 4); /* col2 len */
pq_sendbytes(&buf, xpos, strlen(xpos));

--
Florent




Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-07-15 Thread Ryan Pedela
On Wed, Jul 15, 2015 at 8:52 AM, Robert Haas robertmh...@gmail.com wrote:

 FWIW, I don't agree.  If it's not easy to read the JSON that
 PostgreSQL generates using JavaScript, then a lot of people are just
 going to give up on doing it, and IMO that would be sad.  Telling
 people that they have to parse the JSON using some parser other than
 the one built into their JavaScript engine, whack it around, and then
 render it as text and parse it again is not really an acceptable
 answer.


The vast majority of Javascript users are going to be using Node.js when
they connect to Postgres if only for security reasons. If they use Node,
they will be using node-postgres [1] or something that builds on top of it.
For int64 and numerics in a row, the default is to return a string, and
there is a flag you can set to round returned numbers if you prefer. There
is also a way to override the default parsing of each Postgres type [2]. So
in the case of JSON using my json-bignum module [3], the code looks like
this:

var pgTypes = require('pg').types;
var bignumJSON = require('json-bignum');

types.setTypeParser(JSON_TYPE_OID, function (value) {
return bignumJSON.parse(value);
});

types.setTypeParser(JSONB_TYPE_OID, function (value) {
return bignumJSON.parse(value);
});

To me that code is super simple, and not a pain in the ass. In other words,
it is not "Telling people that they have to parse the JSON using some
parser other than the one built into their JavaScript engine, whack it
around, and then render it as text and parse it again". Like I said
previously, the situation with Javascript will hopefully be remedied in a
few years with ES7 anyway.

1. https://github.com/brianc/node-postgres
2. https://github.com/brianc/node-pg-types
3. https://github.com/datalanche/json-bignum

 On Wed, Jul 15, 2015 at 8:52 AM, Robert Haas robertmh...@gmail.com wrote:

 The reason why the logical decoding stuff allows multiple
 output formats is because Andres, quite correctly, foresaw that
 different people would need different output formats.  He could have
 designed that system to output only one output format and just said,
 everybody's got to read and parse this, but that would have been slow.
 Instead, he tried to set things up so that you could get the output in
 the format that was most convenient for your client, whatever that is.
 On this thread, we're back-pedaling from that idea: sorry, you can get
 JSON output, but if you want JSON output that will be properly
 interpreted by your JSON parser, you can't have that.  Regardless of
 the details of this particular patch, I can't endorse that approach.
 If we want people to use our software, we need to meet them where they
 are at, especially when we are only (IIUC) talking about inserting a
 few extra quotation marks.


I would be okay with a generic way to specify output formats if there are
many use cases beyond Javascript and JSON. I vaguely remember someone
suggesting a FORMAT clause on CREATE TABLE which would specify how a
particular column would output from a SELECT. For example, returning a date
with a non-ISO format. I liked that idea. However if the only reason for
different output formats is Javascript, that is silly. I have a very long
list of feature requests that would probably only be beneficial to me or a
handful of users. Should we implement them? No, of course not! If we did
that Postgres would cease to be the best open-source database. You can't
have the best product and say yes to everything. Feature creep is the enemy
of quality. If Javascript is the sole reason for supporting multiple output
formats, then that is the definition of feature creep in my opinion. If
there are many use cases beyond Javascript and JSON, then that is different
and a conversation worth having.


[HACKERS] [PATCH] postgres_fdw extension support

2015-07-15 Thread Paul Ramsey
Hi all,

Attached is a patch that implements the extension support discussed at
PgCon this year during the FDW unconference session. Highlights:

* Pass extension operators and functions to the foreign server
* Only send ops/funcs if the foreign server is declared to support the
relevant extension, for example:

CREATE SERVER foreign_server
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host '127.0.0.1', port '5432', dbname 'my_db',
extensions 'cube, seg');

Github branch is here:
  https://github.com/pramsey/postgres/tree/fdw-extension-suppport

Synthetic pull request for easy browsing/commenting is here:
  https://github.com/pramsey/postgres/pull/1

Thanks!

Paul
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 81cb2b4..bbe3c9d 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -34,11 +34,15 @@
 
 #include "postgres_fdw.h"
 
+#include "access/genam.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "access/transam.h"
+#include "catalog/dependency.h"
+#include "catalog/indexing.h"
 #include "catalog/pg_collation.h"
+#include "catalog/pg_depend.h"
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_proc.h"
@@ -49,8 +53,10 @@
 #include "optimizer/var.h"
 #include "parser/parsetree.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
+#include "utils/snapmgr.h"
 #include "utils/syscache.h"
 
 
@@ -136,6 +142,7 @@ static void printRemoteParam(int paramindex, Oid paramtype, int32 paramtypmod,
 deparse_expr_cxt *context);
 static void printRemotePlaceholder(Oid paramtype, int32 paramtypmod,
   deparse_expr_cxt *context);
+static bool is_in_extension(Oid procid, PgFdwRelationInfo *fpinfo);
 
 
 /*
@@ -167,6 +174,7 @@ classifyConditions(PlannerInfo *root,
}
 }
 
+
 /*
  * Returns true if given expr is safe to evaluate on the foreign server.
  */
@@ -177,7 +185,7 @@ is_foreign_expr(PlannerInfo *root,
 {
foreign_glob_cxt glob_cxt;
foreign_loc_cxt loc_cxt;
-
+   
/*
 * Check that the expression consists of nodes that are safe to execute
 * remotely.
@@ -207,6 +215,8 @@ is_foreign_expr(PlannerInfo *root,
return true;
 }
 
+
+
 /*
  * Check if expression is safe to execute remotely, and return true if so.
  *
@@ -229,6 +239,9 @@ foreign_expr_walker(Node *node,
Oid collation;
FDWCollateState state;
 
+   /* Access extension metadata from fpinfo on baserel */
+   PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) (glob_cxt->foreignrel->fdw_private);
+
/* Need do nothing for empty subexpressions */
if (node == NULL)
return true;
@@ -361,7 +374,7 @@ foreign_expr_walker(Node *node,
 * can't be sent to remote because it might have incompatible
 * semantics on remote side.
 */
-   if (!is_builtin(fe->funcid))
+   if (!is_builtin(fe->funcid) && (!is_in_extension(fe->funcid, fpinfo)))
return false;
 
/*
@@ -407,7 +420,7 @@ foreign_expr_walker(Node *node,
 * (If the operator is, surely its underlying function is
 * too.)
 */
-   if (!is_builtin(oe->opno))
+   if ( (!is_builtin(oe->opno)) && (!is_in_extension(oe->opno, fpinfo)) )
return false;
 
/*
@@ -445,7 +458,7 @@ foreign_expr_walker(Node *node,
/*
 * Again, only built-in operators can be sent to remote.
 */
-   if (!is_builtin(oe->opno))
+   if (!is_builtin(oe->opno) && (!is_in_extension(oe->opno, fpinfo)))
return false;
 
/*
@@ -591,7 +604,7 @@ foreign_expr_walker(Node *node,
 * If result type of given expression is not built-in, it can't be sent to
 * remote because it might have incompatible semantics on remote side.
 */
-   if (check_type && !is_builtin(exprType(node)))
+   if (check_type && !is_builtin(exprType(node)) && (!is_in_extension(exprType(node), fpinfo)) )
return false;
 
/*
@@ -643,6 +656,8 @@ foreign_expr_walker(Node *node,
return true;
 }
 
+
+
 /*
  * Return true if given object is one of PostgreSQL's built-in objects.
  *
@@ -669,6 +684,67 @@ is_builtin(Oid oid)
 
 
 /*
+ * Returns true if given operator/function is part of an extension declared in the
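
The new check boils down to asking whether the function/operator/type belongs
to one of the extensions listed in the server's "extensions" option. As a
rough illustration only (not necessarily how the patch implements it), such a
membership test could be written with the existing getExtensionOfObject()
helper, assuming a list of extension OIDs parsed from the option:

/*
 * Sketch: is the object a member of one of the allowed extensions?
 * classId is the system catalog the object lives in (pg_proc, pg_operator,
 * pg_type, ...); extension_oids is a hypothetical parameter holding the
 * OIDs parsed from the server's "extensions" option.
 */
static bool
object_is_shippable(Oid classId, Oid objectId, List *extension_oids)
{
	Oid			extoid = getExtensionOfObject(classId, objectId);

	return OidIsValid(extoid) && list_member_oid(extension_oids, extoid);
}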

Re: [HACKERS] Memory Accounting v11

2015-07-15 Thread Jeff Davis
On Wed, 2015-07-15 at 12:59 +0530, Atri Sharma wrote:

 
 I think a heuristic would be more suited here and ignoring memory
 consumption for internal types means that we are not making the memory
 accounting useful for a set of usecases. 
 
OK, I will drop this patch and proceed with the HashAgg patch, with a
heuristic for internal types.

Regards,
Jeff Davis





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Implementation of global temporary tables?

2015-07-15 Thread Andrew Dunstan


On 07/15/2015 07:58 AM, Simon Riggs wrote:



For me the design summary is this

* CREATE GLOBAL TEMP TABLE creates catalog entries like a normal table 
but with different relkind
* When we see a request to INSERT, DEL, UPD, SEL from the temp table, 
if it does not exist we create it as a TEMP table of the same name, 
using the Global's pg_class entry as a template


That meets the SQL Standard and doesn't contain any visibility 
problems or need for new internals.


The purpose of this feature is to automatically create a temp table 
with the same definition whenever needed. The discussion of bloat is 
just wrong. We create exactly the same amount of bloat as if we had 
typed CREATE TEMP TABLE. Optimising temp table entries in the catalog 
is another, separate patch, if we care.





Sounds fine in general. I'm a bit curious to know what are the locking 
implications of vivifying the table on access.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-15 Thread Alvaro Herrera
Simon Riggs wrote:

 JSON seems the most sensible format for the string. Inventing a new one
 doesn't make sense. Most important for me is the ability to
 programmatically manipulate/edit the config string, which would be harder
 with a new custom format.

Do we need to keep the value consistent across all the servers in the
flock?  If not, is the behavior halfway sane upon failover?

If we need the DBA to keep the value in sync manually, that's going to
be a recipe for trouble.  Which is going to bite particularly hard
during those stressing moments when disaster strikes and things have to
be done in emergency mode.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Implementation of global temporary tables?

2015-07-15 Thread Simon Riggs
On 14 July 2015 at 23:20, Jim Nasby jim.na...@bluetreble.com wrote:

 On 7/9/15 12:45 AM, Pavel Stehule wrote:


 2015-07-09 7:32 GMT+02:00 Zhaomo Yang zhy...@cs.ucsd.edu
 mailto:zhy...@cs.ucsd.edu:

   I am not sure, if it is not useless work.

 I don't understand why an implementation taking approach 2.a would
 be useless. As I said, its performance will be no worse than current
 temp tables and it will provide a lot of convenience to users who
 need to create temp tables in every session.


 Surely it should be step forward. But you will to have to solve lot of
 problems with duplicated tables in system catalogue, and still it
 doesn't solve the main problem with temporary tables - the bloating
 catalogue - and related performance degradation.


 That being the main problem is strictly a matter of opinion based on
 your experience. Many people don't have a performance problem today, but do
 have to deal with all the pain of handling this manually (as well as all
 the limitations that go with that).

 If it's easy to fix the bloat problem at the same time as adding GLOBAL
 TEMP then great! But there's no reason to reject this just because it
 doesn't fix that issue.


Agreed

There are some good arguments for why we need this feature.

Pavel's original description of how to do this seem valid, and from the
link Tom agreed in 2009.

For me the design summary is this

* CREATE GLOBAL TEMP TABLE creates catalog entries like a normal table but
with different relkind
* When we see a request to INSERT, DEL, UPD, SEL from the temp table, if it
does not exist we create it as a TEMP table of the same name, using the
Global's pg_class entry as a template

That meets the SQL Standard and doesn't contain any visibility problems or
need for new internals.

The purpose of this feature is to automatically create a temp table with
the same definition whenever needed. The discussion of bloat is just
wrong. We create exactly the same amount of bloat as if we had typed CREATE
TEMP TABLE. Optimising temp table entries in the catalog is another,
separate patch, if we care.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-15 Thread Simon Riggs
On 15 July 2015 at 10:03, Michael Paquier michael.paqu...@gmail.com wrote:


 OK, so this is leading us to the following points:
 - Use a JSON object to define the quorum/priority groups for the sync
 state.
 - Store it as a GUC, and use the check hook to validate its format,
 which is what we have now with s_s_names
 - Rely on SIGHUP to maintain an in-memory image of the quorum/priority
 sync state
 - Have the possibility to define group labels in this JSON blob, and
 be able to use those labels in a quorum or priority sync definition.


+1


 - For backward-compatibility, use for example s_s_names = 'json' to
 switch to the new system.


Seems easy enough to check whether it has a leading { and then treat it as
an attempt to use JSON (which may fail), otherwise use the old syntax.
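
Something along these lines in the GUC check hook would do it (a sketch only;
the hook signature is the standard GUC one, but the two validators are
hypothetical placeholders for the JSON and list-syntax parsers):

static bool
check_synchronous_standby_names(char **newval, void **extra, GucSource source)
{
	const char *p = *newval;

	while (*p == ' ' || *p == '\t')
		p++;							/* skip leading whitespace */

	if (*p == '{')
		return validate_sync_rep_json(*newval);		/* new JSON format */

	return validate_sync_rep_name_list(*newval);	/* existing s_s_names syntax */
}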


 Also, as a first step of the implementation, do we actually need a set
 of functions to manipulate the JSON blob. I mean, we could perhaps
 have them in contrib/ but they do not seem mandatory as long as we
 document correctly how to document a label group and define a quorum
 or priority group, no?


Agreed, no specific functions needed to manipulate this field.

If we lack the means to manipulate JSON in SQL that can be solved outside
of the scope of this patch, because its just JSON.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Implementation of global temporary tables?

2015-07-15 Thread Pavel Stehule
2015-07-15 13:58 GMT+02:00 Simon Riggs si...@2ndquadrant.com:

 On 14 July 2015 at 23:20, Jim Nasby jim.na...@bluetreble.com wrote:

 On 7/9/15 12:45 AM, Pavel Stehule wrote:


 2015-07-09 7:32 GMT+02:00 Zhaomo Yang zhy...@cs.ucsd.edu
 mailto:zhy...@cs.ucsd.edu:

   I am not sure, if it is not useless work.

 I don't understand why an implementation taking approach 2.a would
 be useless. As I said, its performance will be no worse than current
 temp tables and it will provide a lot of convenience to users who
 need to create temp tables in every session.


 Surely it should be step forward. But you will to have to solve lot of
 problems with duplicated tables in system catalogue, and still it
 doesn't solve the main problem with temporary tables - the bloating
 catalogue - and related performance degradation.


 That being the main problem is strictly a matter of opinion based on
 your experience. Many people don't have a performance problem today, but do
 have to deal with all the pain of handling this manually (as well as all
 the limitations that go with that).

 If it's easy to fix the bloat problem at the same time as adding GLOBAL
 TEMP then great! But there's no reason to reject this just because it
 doesn't fix that issue.


 Agreed

 There are some good arguments for why we need this feature.

 Pavel's original description of how to do this seem valid, and from the
 link Tom agreed in 2009.

 For me the design summary is this

 * CREATE GLOBAL TEMP TABLE creates catalog entries like a normal table but
 with different relkind
 * When we see a request to INSERT, DEL, UPD, SEL from the temp table, if
 it does not exist we create it as a TEMP table of the same name, using the
 Global's pg_class entry as a template

 That meets the SQL Standard and doesn't contain any visibility problems or
 need for new internals.

 The purpose of this feature is to automatically create a temp table with
 the same definition whenever needed. The discussion of bloat is just
 wrong. We create exactly the same amount of bloat as if we had typed CREATE
 TEMP TABLE. Optimising temp table entries in the catalog is another,
 separate patch, if we care.


The optimization of local temp tables is a little bit harder - you cannot
share pg_class and pg_attribute - although some memory entries can be
shared.

Regards

Pavel



 --
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Memory Accounting v11

2015-07-15 Thread Robert Haas
On Tue, Jul 14, 2015 at 9:14 PM, Tomas Vondra
tomas.von...@2ndquadrant.com wrote:
 Firstly, do we really have good benchmarks and measurements? I really doubt
 that. We do have some numbers for REINDEX, where we observed 0.5-1%
 regression on noisy results from a Power machine (and we've been unable to
 reproduce that on x86). I don't think that's a representative benchmark, and
 I'd like to see more thorough measurements. And I agreed to do this, once
 Jeff comes up with a new version of the patch.

 Secondly, the question is whether the performance is impacted more by the
 additional instructions, or by other things - say, random padding, as was
 explained by Andrew Gierth in [1].

 I don't know whether that's happening in this patch, but if it is, it seems
 rather strange to use this against this patch and not the others (because
 there surely will be other patches causing similar issues).

It matters, at least to me, an awful lot where we're adding lines of
code.  If you want to add modest amounts of additional code to CREATE
TABLE or CHECKPOINT or something like that, I really don't care,
because that stuff doesn't execute frequently enough to really matter
to performance even if we add a significant chunk of overhead, and
we're doing other expensive stuff at the same time, like fsync.  The
same can't be said about functions like LWLockAcquire and
AllocSetAlloc that routinely show up at the top of CPU profiles.

I agree that there is room to question whether the benchmarks I did
are sufficient reason to think that the abstract concern that putting
more code into a function might make it slower translates into a
measurable drop in performance in practice.  But I think when it comes
to these very hot code paths, extreme conservatism is warranted.  We
can agree to disagree about that.

 tuplesort.c does its own accounting, and TBH that seems like the right
 thing to do here, too.  The difficulty is, I think, that some
 transition functions use an internal data type for the transition
 state, which might not be a single palloc'd chunk.  But since we can't
 spill those aggregates to disk *anyway*, that doesn't really matter.
 If the transition is a varlena or a fixed-length type, we can know how
 much space it's consuming without hooking into the memory context
 framework.

 I respectfully disagree. Our current inability to dump/load the state has
 little to do with how we measure consumed memory, IMHO.

 It's true that we do have two kinds of aggregates, depending on the nature
 of the aggregate state:

 (a) fixed-size state (data types passed by value, variable length types
 that do not grow once allocated, ...)

 (b) continuously growing state (as in string_agg/array_agg)

 Jeff's HashAgg patch already fixes (a) and can fix (b) once we get a
 solution for dump/load of the aggregate stats - which we need to implement
 anyway for parallel aggregate.

 I know there was a proposal to force all aggregates to use regular data
 types as aggregate stats, but I can't see how that could work without a
 significant performance penalty. For example array_agg() is using internal
 to pass ArrayBuildState - how do you turn that to a regular data type
 without effectively serializing/deserializing the whole array on every
 transition?

That is a good point.  Tom suggested that his new expanded-format
stuff might be able to be adapted to the purpose, but I have no idea
how possible that really is.

 And even if we come up with a solution for array_agg, do we really believe
 it's possible to do for all custom aggregates? Maybe I'm missing something
 but I doubt that. ISTM designing ephemeral data structure allows tweaks that
 are impossible otherwise.

 What might be possible is extending the aggregate API with another custom
 function returning size of the aggregate state. So when defining an
 aggregate using 'internal' for aggregate state, you'd specify transfunc,
 finalfunc and sizefunc. That seems doable, I guess.

And infunc and outfunc.  If we don't use the expanded-format stuff for
aggregates with a type-internal transition state, we won't be able to
use input and output functions to serialize and deserialize them,
either.
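
(Just to make the sizefunc idea concrete: for an array_agg-style state it
could be as small as the sketch below. No such aggregate option exists today,
and MemoryContextMemAllocated() here is only a stand-in for whatever
"how much does this context hold" call would back it.)

Datum
array_agg_transsize(PG_FUNCTION_ARGS)
{
	ArrayBuildState *astate = (ArrayBuildState *) PG_GETARG_POINTER(0);
	int64		size = 0;

	/* ArrayBuildState keeps its arrays in its own memory context */
	if (astate != NULL)
		size = (int64) MemoryContextMemAllocated(astate->mcontext, true);

	PG_RETURN_INT64(size);
}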

 I find the memory accounting as a way more elegant solution, though.

I think we're just going to have to agree to disagree on that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Memory Accounting v11

2015-07-15 Thread Robert Haas
On Wed, Jul 15, 2015 at 3:27 AM, Jeff Davis pg...@j-davis.com wrote:
 On Tue, 2015-07-14 at 16:19 -0400, Robert Haas wrote:
 tuplesort.c does its own accounting, and TBH that seems like the right
 thing to do here, too.  The difficulty is, I think, that some
 transition functions use an internal data type for the transition
 state, which might not be a single palloc'd chunk.  But since we can't
 spill those aggregates to disk *anyway*, that doesn't really matter.

 So would it be acceptable to just ignore the memory consumed by
 internal, or come up with some heuristic?

Either one would be OK with me.  I'd probably ignore that for the
first version of the patch and just let the hash table grow without
bound in that case.  Then in a later patch I'd introduce additional
infrastructure to deal with that part of the problem.  But if
something else works out well while coding it up, I'd be OK with that,
too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch : Allow toast tables to be moved to a different tablespace

2015-07-15 Thread Robert Haas
On Tue, Jul 14, 2015 at 5:57 PM, Jim Nasby jim.na...@bluetreble.com wrote:
 On 7/7/15 7:07 AM, Andres Freund wrote:
 On 2015-07-03 18:03:58 -0400, Tom Lane wrote:
 I have just looked through this thread, and TBH I think we should reject
 this patch altogether --- not RWF, but no we don't want this.  The
 use-case remains hypothetical: no performance numbers showing a
 real-world
 benefit have been exhibited AFAICS.


 It's not that hard to imagine a performance benefit though? If the
 toasted column is accessed infrequently/just after filtering on other
 columns (not uncommon) it could very well be beneficial to put the main
 table on fast storage (SSD) and the toast table on slow storage
 (spinning rust).

 I've seen humongous toast tables that are barely ever read for tables
 that are constantly accessed, a couple times in the field.

 +1. I know of one case where the heap was ~8GB and TOAST was over 400GB.

Yeah, I think that's a good argument for this.  I have to admit that
I'm a bit fatigued by this thread - it started out as a "learn
PostgreSQL" project, and we iterated through a few design problems
that made me kind of worried about what sort of state the patch was in
- and now this patch is more than 4 years old.  But if some committer
still has the energy to go through it in detail and make sure that all
of the problems have been fixed and that the patch is, as Andres
says, in good shape, then I don't see why we shouldn't take it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Function to get size of asynchronous notification queue

2015-07-15 Thread Gurjeet Singh
On Thu, Jun 25, 2015 at 8:43 PM, Brendan Jurd dire...@gmail.com wrote:

 On Fri, 26 Jun 2015 at 06:03 Gurjeet Singh gurj...@singh.im wrote:


 s/proportion/fraction/


 I think of these as synonymous -- do you have any particular reason to
 prefer fraction?  I don't feel strongly about it either way, so I'm quite
 happy to go with fraction if folks find that more expressive.


It just feels better to me in this context.

If the number of times used in Postgres code is any measure, 'fraction'
wins hands down: proportion : 33, fraction: 620.

I don't feel strongly about it, either. I can leave it up to the committer
to decide.





 + * The caller must hold (at least) shared AsyncQueueLock.
 
 A possibly better wording: "The caller must hold AsyncQueueLock in (at
 least) shared mode."


 Yes, that is more accurate.


OK.





 Unnecessary whitespace changes in pg_proc.h for existing functions.


 I did group the asynchronous notification functions together, which seemed
 reasonable as there are now three of them, and changed the tabbing between
 the function name and namespace ID to match, as is done elsewhere in
 pg_proc.h.  I think those changes improve readability, but again I don't
 feel strongly about it.


Fair enough.



 +DESCR("get the current usage of the asynchronous notification queue");
 
 A possibly better wording: "get the fraction of the asynchronous
 notification queue currently in use"


 I have no objections to your wording.


OK. Please send a new patch with the changes you agree to, and I can mark
it ready for committer.

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/


Re: [HACKERS] Grouping Sets: Fix unrecognized node type bug

2015-07-15 Thread Andrew Gierth
 Jeevan == Jeevan Chalke jeevan.cha...@enterprisedb.com writes:

 Jeevan Hi,
 Jeevan It looks like we do support nested GROUPING SETS, I mean Sets
 Jeevan within Sets, not other types.  However this nesting is broken.

Good catch, but I'm not yet sure your fix is correct; I'll need to look
into that.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] Comment fix for miscinit.c

2015-07-15 Thread David Christensen
Quick comment fix for edit issue.



0001-Fix-comment.patch
Description: Binary data



--
David Christensen
PostgreSQL Team Manager
End Point Corporation
da...@endpoint.com
785-727-1171






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-15 Thread Rahila Syed
Hello,

Please find attached an updated patch with an interface to calculate command
progress in pgstat.c. This interface currently implements VACUUM progress
tracking.
A column named percent_complete has been added to pg_stat_activity to
report progress.

VACUUM calls the progress calculation interface periodically, at an interval
specified by the pgstat_track_progress GUC in ms.
Progress calculation can be disabled by setting pgstat_track_progress to -1.

Remaining_time for VACUUM is not included in the current patch, to avoid
cluttering pg_stat_activity with too many columns.
But the estimate as seen in the previous implementation seems reasonable
enough to be included in the progress information, maybe as a dedicated view
for vacuum progress information.

The GUC parameter 'pgstat_track_progress' is currently PGC_SUSET, in line with
the 'track_activities' GUC parameter.  Although IMO, pgstat_track_progress can
be made PGC_USERSET in order to give any user the flexibility to
enable/disable progress calculation, provided progress is tracked only if
the track_activities GUC parameter is enabled.

In this patch, index scans are not taken into account for progress
calculation as of now.
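
The reporting itself follows a simple rate-limited pattern, roughly like the
sketch below (the GUC name follows the description above; pgstat_report_progress()
is only a placeholder for the reporting call the patch provides):

static TimestampTz last_report = 0;

static void
maybe_report_progress(BlockNumber scanned, BlockNumber total_blocks)
{
	TimestampTz now = GetCurrentTimestamp();

	if (pgstat_track_progress < 0 || total_blocks == 0)
		return;					/* reporting disabled, or nothing to report */

	if (TimestampDifferenceExceeds(last_report, now, pgstat_track_progress))
	{
		/* placeholder call: report percentage of heap blocks processed */
		pgstat_report_progress((double) scanned * 100.0 / total_blocks);
		last_report = now;
	}
}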

Thank you,
Rahila Syed.


Vacuum_progress_checker_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Memory Accounting v11

2015-07-15 Thread Tomas Vondra

Hi,

On 07/15/2015 09:21 PM, Robert Haas wrote:

On Tue, Jul 14, 2015 at 9:14 PM, Tomas Vondra
tomas.von...@2ndquadrant.com wrote:

Firstly, do we really have good benchmarks and measurements? I really doubt
that. We do have some numbers for REINDEX, where we observed 0.5-1%
regression on noisy results from a Power machine (and we've been unable to
reproduce that on x86). I don't think that's a representative benchmark, and
I'd like to see more thorough measurements. And I agreed to do this, once
Jeff comes up with a new version of the patch.

Secondly, the question is whether the performance is impacted more by the
additional instructions, or by other things - say, random padding, as was
explained by Andrew Gierth in [1].

I don't know whether that's happening in this patch, but if it is,
it seems rather strange to use this against this patch and not the
others (because there surely will be other patches causing similar
issues).


It matters, at least to me, an awful lot where we're adding lines of
code. If you want to add modest amounts of additional code to CREATE
TABLE or CHECKPOINT or something like that, I really don't care,
because that stuff doesn't execute frequently enough to really
matter to performance even if we add a significant chunk of overhead,
and we're doing other expensive stuff at the same time, like fsync.
The same can't be said about functions like LWLockAcquire and
AllocSetAlloc that routinely show up at the top of CPU profiles.

I agree that there is room to question whether the benchmarks I did
are sufficient reason to think that the abstract concern that putting
more code into a function might make it slower translates into a
measurable drop in performance in practice. But I think when it comes
to these very hot code paths, extreme conservatism is warranted. We
can agree to disagree about that.


No, that is not what I tried to say. I certainly agree that we need to 
pay attention when adding stuff hot paths - there's no disagreement 
about this.


The problem with random padding is that adding code somewhere may 
introduce padding that affects other pieces of code. That is essentially 
the point of Andrew's explanation that I linked in my previous message.


And the question is - are the differences we've measured really due to 
code added to the hot path, or something introduced by random padding 
due to some other changes (possibly in a code that was not even executed 
during the test)?


I don't know how significant impact this may have in this case, or how 
serious this is in general, but we're talking about 0.5-1% difference on 
a noisy benchmark. And if such cases of random padding really are a 
problem in practice, we've certainly missed plenty of other patches with 
the same impact.


Because effectively what Jeff's last patch did was adding a single int64 
counter to MemoryContextData structure, and incrementing it for each 
allocated block (not chunk). I can't really imagine a solution making it 
cheaper, because there are very few faster operations. Even opt-in 
won't make this much faster, because you'll have to check a flag and 
you'll need two fields (flag + counter).


Of course, this assumes local counter (i.e. not updating counters the 
parent contexts), but I claim that's OK. I've been previously pushing 
for eagerly updating all the parent contexts, so that finding out the 
allocated memory is fast even when there are many child contexts - a 
prime example was array_agg() before I fixed it. But I changed my mind 
on this and now say screw them. I claim that aggregates using a 
separate memory context for each group are a lost case - they already 
suffer a significant overhead, and should be fixed just like we fixed 
array_agg().


That was effectively the outcome of pgcon discussions - to use the 
simple int64 counter, do the accounting for all contexts, and update 
only the local counter. For cases with many child contexts finding out 
the amount of allocated memory won't be cheap, but well - there's not 
much we can do about that.
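
To make the shape of that concrete, the whole mechanism is roughly the
following (an illustrative sketch of the scheme described above, assuming the
usual postgres.h typedefs; the field and function names are not the actual
patch's):

/* trimmed-down illustration, not the real declarations */
typedef struct MemoryContextData
{
	struct MemoryContextData *parent;	/* unchanged */
	int64		mem_allocated;			/* NEW: bytes in blocks owned by this context */
	/* ... all the other existing fields ... */
} MemoryContextData;

/* called from aset.c wherever a block (not a chunk) is malloc'd or free'd */
static inline void
update_allocation(MemoryContextData *context, int64 delta)
{
	context->mem_allocated += delta;	/* local counter only; parents untouched */
}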



I know there was a proposal to force all aggregates to use regular
data types as aggregate stats, but I can't see how that could work
without a significant performance penalty. For example array_agg()
is using internal to pass ArrayBuildState - how do you turn that to
a regular data type without effectively serializing/deserializing
the whole array on every transition?


That is a good point. Tom suggested that his new expanded-format
stuff might be able to be adapted to the purpose, but I have no idea
 how possible that really is.


Me neither, and maybe there's a good solution for that, making my 
concerns unfounded.



What might be possible is extending the aggregate API with another
custom function returning size of the aggregate state. So when
defining an aggregate using 'internal' for aggregate state, you'd
specify transfunc, finalfunc and sizefunc. That seems doable, I
guess.


And infunc and outfunc.  If we don't use 

Re: [HACKERS] Memory Accounting v11

2015-07-15 Thread Tomas Vondra

Hi,

On 07/15/2015 07:07 PM, Jeff Davis wrote:

On Wed, 2015-07-15 at 12:59 +0530, Atri Sharma wrote:



I think a heuristic would be more suited here and ignoring memory
consumption for internal types means that we are not making the
memory accounting useful for a set of usecases.


OK, I will drop this patch and proceed with the HashAgg patch, with
a heuristic for internal types.


Could someone briefly explain what heuristics are we talking about?


--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-15 Thread Simon Riggs
On 15 July 2015 at 12:25, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 Simon Riggs wrote:

  JSON seems the most sensible format for the string. Inventing a new one
  doesn't make sense. Most important for me is the ability to
  programmatically manipulate/edit the config string, which would be harder
  with a new custom format.

 Do we need to keep the value consistent across all the servers in the
 flock?  If not, is the behavior halfway sane upon failover?


Mostly, yes. Which means it doesn't change much, so config data is OK.


 If we need the DBA to keep the value in sync manually, that's going to
 be a recipe for trouble.  Which is going to bite particularly hard
 during those stressing moments when disaster strikes and things have to
 be done in emergency mode.


Manual config itself is the recipe for trouble, not this particular
setting. There are already many other settings that need to be the same on
all nodes for example. Nothing here changes that. This is just an
enhancement of the current technology.

For the future, a richer mechanism for defining nodes and their associated
metadata is needed for logical replication and clustering. That is not what
is being discussed here though, nor should we begin!

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Making ParameterStatus available for more parameter types?

2015-07-15 Thread Robert Haas
On Sun, Jul 12, 2015 at 5:30 AM, Shay Rojansky r...@roji.org wrote:
 The ParameterStatus message is currently sent for a hard-wired set of
 parameters
 (http://www.postgresql.org/docs/current/static/protocol-flow.html#PROTOCOL-ASYNC).

 Just wanted to let you know that making this more flexible would be a great
 help in driver implementation. Npgsql maintains its own view of the current
 statement_timeout parameter for various reasons; as long as the standard
 ADO.NET API is used to change the timeout everything is OK, but if the user
 ever manually sets statement_timeout things can become quite messy. Being
 able to subscribe to more parameter changes would help in this case. In the
 very short term simply adding statement_timeout to the hard-wired set would
 help me as well.

Requests to add more stuff to ParameterStatus messages are becoming a
somewhat regular thing around here.  Your idea of giving the client
the ability to subscribe to the stuff it cares about might be a good
way forward, because sending more and more things all the time adds
overhead for all clients, even those that don't care.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Implementation of global temporary tables?

2015-07-15 Thread Simon Riggs
On 15 July 2015 at 13:26, Andrew Dunstan and...@dunslane.net wrote:


 On 07/15/2015 07:58 AM, Simon Riggs wrote:


 For me the design summary is this

 * CREATE GLOBAL TEMP TABLE creates catalog entries like a normal table
 but with different relkind
 * When we see a request to INSERT, DEL, UPD, SEL from the temp table, if
 it does not exist we create it as a TEMP table of the same name, using the
 Global's pg_class entry as a template

 That meets the SQL Standard and doesn't contain any visibility problems
 or need for new internals.

 The purpose of this feature is to automatically create a temp table with
 the same definition whenever needed. The discussion of bloat is just
 wrong. We create exactly the same amount of bloat as if we had typed CREATE
 TEMP TABLE. Optimising temp table entries in the catalog is another,
 separate patch, if we care.



 Sounds fine in general. I'm a bit curious to know what are the locking
 implications of vivifying the table on access.


We would lock the Global Temp Table at the same lock level for the
activity, just as we do for INSERT, SELECT etc.. That prevents concurrent
DDL like DROP or ALTER on the Global Temp Table.

The Local temp table created is a copy of the Global Temp Table. The Local
temp table is only locally locked, so no worries.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Implementation of global temporary tables?

2015-07-15 Thread Pavel Stehule
2015-07-15 15:53 GMT+02:00 Zhaomo Yang zhy...@cs.ucsd.edu:

   there is other question - what is effect of ALTER TABLE of global temp
 table on
   instances of this table in active sessions?

 As I said, we need to first agree on the behaviors of the existing
 commands. I can think of two options now for ALTER TABLE: 1) only allow
 ALTER TABLE when there is no other active sessions (this is how Oracle
 deals with it.) 2) handle it as if session copies inherit from the global
 copy and ALTER TABLE executes on the global copy.


There are two possible kinds of GLOBAL TEMP tables - session related and
transaction related. Transaction related tables have a very short life - and @1
needs an outage, while @2 requires stronger locks and can be slow and less
effective - because a) some changes can be invisible in other transactions
(depending on isolation levels), b) the structure can be changed, but function
code not (regardless of isolation levels) - so it can be inconsistent, and c)
why change a table if it will be dropped in the next millisecond? For this
case, behaving like PL functions can be very practical ~ a third option for
ALTER TABLE.

Regards

Pavel


 Thanks,
 Zhaomo



Re: [HACKERS] Implementation of global temporary tables?

2015-07-15 Thread Simon Riggs
On 15 July 2015 at 16:44, Andres Freund and...@anarazel.de wrote:

 On 2015-07-15 16:36:12 +0100, Simon Riggs wrote:
  On 15 July 2015 at 16:28, Andres Freund and...@anarazel.de wrote:
   I think that's generally a fair point. But here we're discussing to add
   a fair amount of wrinkles with the copy approach. The fact alone that
   the oid is different will have some ugly consequences.
  
 
  Why? We are creating a local temp table LIKE the global temp table. That
 is
  already a supported operation. So there is no different oid.

 Then your locking against ALTER, DROP etc. isn't going to work.


There would be two objects, both locked. The temp table is just nice and
simple. No problem.

Your optimization may work; I hope it does. My approach definitely will. So
we could choose either.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Implementation of global temporary tables?

2015-07-15 Thread Pavel Stehule
2015-07-15 15:21 GMT+02:00 Zhaomo Yang zhy...@cs.ucsd.edu:

  Sounds fine in general. I'm a bit curious to know what are the locking
 implications of  vivifying the table on access.

 The locking implications depend on how we interpret the existing commands
 in the context of global temp tables and I think we should discuss and
 agree on the behaviors of the commands with global temp tables, but I think
 in general we can follow these rules:

 If the command executes on the global temp table's metadata, for example
 an ALTER TABLE command, then we lock the global copy at the same level as
 we do a regular table.


there is other question - what is effect of ALTER TABLE of global temp
table on instances of this table in active sessions?



 If the command executes on the global temp table's data (which is actually
 stored in the session copy), for example a DML command, then the global
 copy is locked at the AccessShareLock level to prevent concurrent
 modifications to the global temp table's definition from other sessions.

 Thanks,
 Zhaomo

 On Wed, Jul 15, 2015 at 4:26 AM, Andrew Dunstan and...@dunslane.net
 wrote:


 On 07/15/2015 07:58 AM, Simon Riggs wrote:


 For me the design summary is this

 * CREATE GLOBAL TEMP TABLE creates catalog entries like a normal table
 but with different relkind
 * When we see a request to INSERT, DEL, UPD, SEL from the temp table, if
 it does not exist we create it as a TEMP table of the same name, using the
 Global's pg_class entry as a template

 That meets the SQL Standard and doesn't contain any visibility problems
 or need for new internals.

 The purpose of this feature is to automatically create a temp table with
 the same definition whenever needed. The discussion of bloat is just
 wrong. We create exactly the same amount of bloat as if we had typed CREATE
 TEMP TABLE. Optimising temp table entries in the catalog is another,
 separate patch, if we care.



 Sounds fine in general. I'm a bit curious to know what are the locking
 implications of vivifying the table on access.

 cheers

 andrew





[HACKERS] Grouping Sets: Fix unrecognized node type bug

2015-07-15 Thread Jeevan Chalke
Hi,

It looks like we do support nested GROUPING SETS, I mean Sets within
Sets, not other types.  However this nesting is broken.

Here is a simple example where I would expect three rows in the
result.  But unfortunately it is giving an "unrecognized node type"
error, which is something weird and needs a fix.

postgres=# create table gstest2 (a integer, b integer, c integer);
postgres=# insert into gstest2 values (1,1,1), (1,1,1), (1,1,1),
(1,1,1), (1,1,1), (1,1,1), (1,1,2), (1,2,2), (2,2,2);
postgres=# select sum(c) from gstest2
  group by grouping sets((), grouping sets((), grouping sets(())))
  order by 1 desc;
ERROR:  unrecognized node type: 926


I spent much time understanding the cause and was looking into the
transformGroupingSet() and transformGroupClauseList() functions.
I tried fixing the "unrecognized node type: 926" error there,
but then it fails with "unrecognized node type: 656".

Later I realized that we actually have an issue while
flattening grouping sets.  If we have nested grouping sets like
above, then we get a GroupingSet node inside the list, and
transformGroupClauseList() does not expect that and ends up with
this error.

I have tried fixing this issue in flatten_grouping_sets(): after
flattening a grouping sets node, we need to concat the result with
the existing list and should not append.  This alone does not
solve the issue, as we need a list when we have a ROW expression.
Thus there, if not at top level, I am creating a list now.

Attached patch with few testcases too.

Please have a look.

Thanks

-- 
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index e90e1d6..31d4331 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -1779,8 +1779,19 @@ flatten_grouping_sets(Node *expr, bool toplevel, bool *hasGroupingSets)
 RowExpr*r = (RowExpr *) expr;
 
 if (r->row_format == COERCE_IMPLICIT_CAST)
-	return flatten_grouping_sets((Node *) r->args,
- false, NULL);
+{
+	Node   *n1 = flatten_grouping_sets((Node *) r->args,
+	   false, NULL);
+
+	/*
+	 * Make a list for row expression if toplevel is false,
+	 * return flatten list otherwise
+	 */
+	if (toplevel)
+		return (Node *) n1;
+	else
+		return (Node *) list_make1(n1);
+}
 			}
 			break;
 		case T_GroupingSet:
@@ -1805,7 +1816,10 @@ flatten_grouping_sets(Node *expr, bool toplevel, bool *hasGroupingSets)
 {
 	Node	   *n2 = flatten_grouping_sets(lfirst(l2), false, NULL);
 
-	result_set = lappend(result_set, n2);
+	if (IsA(n2, List))
+		result_set = list_concat(result_set, (List *) n2);
+	else
+		result_set = lappend(result_set, n2);
 }
 
 /*
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index adb39b3..5c47717 100644
--- a/src/test/regress/expected/groupingsets.out
+++ b/src/test/regress/expected/groupingsets.out
@@ -145,6 +145,112 @@ select a, b, sum(c), sum(sum(c)) over (order by a,b) as rsum
|   |  12 |   36
 (6 rows)
 
+-- nesting with grouping sets
+select sum(c) from gstest2
+  group by grouping sets((), grouping sets((), grouping sets(())))
+  order by 1 desc;
+ sum 
+-
+  12
+  12
+  12
+(3 rows)
+
+select sum(c) from gstest2
+  group by grouping sets((), grouping sets((), grouping sets(((a, b)))))
+  order by 1 desc;
+ sum 
+-
+  12
+  12
+   8
+   2
+   2
+(5 rows)
+
+select sum(c) from gstest2
+  group by grouping sets(grouping sets(rollup(c), grouping sets(cube(c))))
+  order by 1 desc;
+ sum 
+-
+  12
+  12
+   6
+   6
+   6
+   6
+(6 rows)
+
+select sum(c) from gstest2
+  group by grouping sets(a, grouping sets(a, cube(b)))
+  order by 1 desc;
+ sum 
+-
+  12
+  10
+  10
+   8
+   4
+   2
+   2
+(7 rows)
+
+select sum(c) from gstest2
+  group by grouping sets(grouping sets((a, (b))))
+  order by 1 desc;
+ sum 
+-
+   8
+   2
+   2
+(3 rows)
+
+select sum(c) from gstest2
+  group by grouping sets(grouping sets((a, b)))
+  order by 1 desc;
+ sum 
+-
+   8
+   2
+   2
+(3 rows)
+
+select sum(c) from gstest2
+  group by grouping sets(grouping sets(a, grouping sets(a), a))
+  order by 1 desc;
+ sum 
+-
+  10
+  10
+  10
+   2
+   2
+   2
+(6 rows)
+
+select sum(c) from gstest2
+  group by grouping sets(grouping sets(a, grouping sets(a, grouping sets(a), ((a)), a, grouping sets(a), (a)), a))
+  order by 1 desc;
+ sum 
+-
+  10
+  10
+  10
+  10
+  10
+  10
+  10
+  10
+   2
+   2
+   2
+   2
+   2
+   2
+   2
+   2
+(16 rows)
+
 -- empty input: first is 0 rows, second 1, third 3 etc.
 select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),a);
  a | b | sum | count 
diff --git a/src/test/regress/sql/groupingsets.sql b/src/test/regress/sql/groupingsets.sql
index 0883afd..e478d34 100644
--- 

Re: [HACKERS] optimizing vacuum truncation scans

2015-07-15 Thread Amit Kapila
On Mon, Jun 29, 2015 at 11:24 AM, Jeff Janes jeff.ja...@gmail.com wrote:


 Attached is a patch that implements the vm scan for truncation.  It
introduces a variable to hold the last blkno which was skipped during the
forward portion.  Any blocks after both this blkno and after the last
inspected nonempty page (which the code is already tracking) must have been
observed to be empty by the current vacuum.  Any other process rendering
the page nonempty are required to clear the vm bit, and no other process
can set the bit again during the vacuum's lifetime.  So if the bit is still
set, the page is still empty without needing to inspect it.


One case where this patch can behave differently than the current
code is when, before taking the Exclusive lock on the relation for truncation,
some backend clears the vm bit and then inserts-deletes-prunes that
page.  I think with the patch it will not consider truncating such pages, whereas
the current code will allow truncating them as it re-verifies each page.  I think
such a case would be rare and we might not want to bother about it,
but it is still worth thinking about once.

Some minor points about patch:

count_nondeletable_pages()
{
..
if (PageIsNew(page) || PageIsEmpty(page))
{
/* PageIsNew probably shouldn't happen... */
UnlockReleaseBuffer(buf);
continue;
}
..
}

Why is vmbuffer not freed in the above loop?
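
For instance, the skip path could drop the pin along these lines (just a
sketch, assuming vmbuffer was pinned earlier in the loop by the
visibility-map calls):

		if (PageIsNew(page) || PageIsEmpty(page))
		{
			/* PageIsNew probably shouldn't happen... */
			UnlockReleaseBuffer(buf);
			if (BufferIsValid(vmbuffer))
			{
				/* release the visibility-map pin before skipping the page */
				ReleaseBuffer(vmbuffer);
				vmbuffer = InvalidBuffer;
			}
			continue;
		}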


count_nondeletable_pages()
{
..
+ /*
+  * Pages that were inspected and found to be empty by this vacuum run
+  * must still be empty if the vm bit is still set.  Only vacuum sets
+  * vm bits, and only one vacuum can exist in a table at one time.
+  */
+ trust_vm = vacrelstats->nonempty_pages > vacrelstats->skipped_pages ?
+ vacrelstats->nonempty_pages : vacrelstats->skipped_pages;

..
}

I think it is better to have spaces in the above assignment statement
(near '=' and near '>').


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Implementation of global temporary tables?

2015-07-15 Thread Zhaomo Yang
  there is other question - what is effect of ALTER TABLE of global temp
table on
  instances of this table in active sessions?

As I said, we need to first agree on the behaviors of the existing
commands. I can think of two options now for ALTER TABLE: 1) only allow
ALTER TABLE when there is no other active sessions (this is how Oracle
deals with it.) 2) handle it as if session copies inherit from the global
copy and ALTER TABLE executes on the global copy.

Thanks,
Zhaomo


Re: [HACKERS] Implementation of global temporary tables?

2015-07-15 Thread Simon Riggs
On 15 July 2015 at 15:57, Andres Freund and...@anarazel.de wrote:

 On 2015-07-15 16:52:49 +0200, Andres Freund wrote:
  Why do we need to create that copy? We can just use the relfilenode in
  all backends by having the backendid in the filename? Yes, there's a
  some amount of additional code needed, but it's not that much?  I
  actually think it might end up being less additional code than having a
  copy, because with the copy you'll have two different oids for global
  entry and the local copy.

 Hm, yes. Brainfart. Transaction table rewrites/truncations need to
 change the relfilenode.

 To fix We could add a backend local mapping table from global temp table
 id to the backend local relfilenode. The code to lookup the relfilenode
 is already mostly isolated.


It may be possible to do this, though I'm sure there's a wrinkle somewhere.
But there doesn't seem to be a need to overload the main feature request
with additional requirements. Doing that is just scope creep that prevents
us getting features out. Nice, simple patches from newer developers. Later
tuning and tweaking from more expert community members.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Implementation of global temporary tables?

2015-07-15 Thread Zhaomo Yang
 Sounds fine in general. I'm a bit curious to know what are the locking
implications of  vivifying the table on access.

The locking implications depend on how we interpret the existing commands
in the context of global temp tables and I think we should discuss and
agree on the behaviors of the commands with global temp tables, but I think
in general we can follow these rules:

If the command executes on the global temp table's metadata, for example an
ALTER TABLE command, then we lock the global copy at the same level as we
do a regular table.

If the command executes on the global temp table's data (which is actually
stored in the session copy), for example a DML command, then the global
copy is locked at the AccessShareLock level to prevent concurrent
modifications to the global temp table's definition from other sessions.

Thanks,
Zhaomo

On Wed, Jul 15, 2015 at 4:26 AM, Andrew Dunstan and...@dunslane.net wrote:


 On 07/15/2015 07:58 AM, Simon Riggs wrote:


 For me the design summary is this

 * CREATE GLOBAL TEMP TABLE creates catalog entries like a normal table
 but with different relkind
 * When we see a request to INSERT, DEL, UPD, SEL from the temp table, if
 it does not exist we create it as a TEMP table of the same name, using the
 Global's pg_class entry as a template

 That meets the SQL Standard and doesn't contain any visibility problems
 or need for new internals.

 The purpose of this feature is to automatically create a temp table with
 the same definition whenever needed. The discussion of bloat is just
 wrong. We create exactly the same amount of bloat as if we had typed CREATE
 TEMP TABLE. Optimising temp table entries in the catalog is another,
 separate patch, if we care.



 Sounds fine in general. I'm a bit curious to know what are the locking
 implications of vivifying the table on access.

 cheers

 andrew



Re: [HACKERS] Implementation of global temporary tables?

2015-07-15 Thread Andres Freund
On 2015-07-15 16:24:52 +0100, Simon Riggs wrote:
 It may be possible to do this, though I'm sure there's a wrinkle somewhere.
 But there doesn't seem to be a need to overload the main feature request
 with additional requirements. Doing that is just scope creep that prevents
 us getting features out. Nice, simple patches from newer developers. Later
 tuning and tweaking from more expert community members.

I think that's generally a fair point. But here we're discussing to add
a fair amount of wrinkles with the copy approach. The fact alone that
the oid is different will have some ugly consequences.

So we add complexity, just to shift it into different places later? I'm
not sure that's a good idea.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Implementation of global temporary tables?

2015-07-15 Thread Simon Riggs
On 15 July 2015 at 16:28, Andres Freund and...@anarazel.de wrote:

 On 2015-07-15 16:24:52 +0100, Simon Riggs wrote:
  It may be possible to do this, though I'm sure there's a wrinkle
 somewhere.
  But there doesn't seem to be a need to overload the main feature request
  with additional requirements. Doing that is just scope creep that
 prevents
  us getting features out. Nice, simple patches from newer developers.
 Later
  tuning and tweaking from more expert community members.

 I think that's generally a fair point. But here we're discussing to add
 a fair amount of wrinkles with the copy approach. The fact alone that
 the oid is different will have some ugly consequences.


Why? We are creating a local temp table LIKE the global temp table. That is
already a supported operation. So there is no different oid.


 So we add complexity, just to shift it into different places later? I'm
 not sure that's a good idea.


There's no complexity in a simple temp table like. We can do this now with
triggers.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Implementation of global temporary tables?

2015-07-15 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 On 2015-07-15 16:24:52 +0100, Simon Riggs wrote:
 It may be possible to do this, though I'm sure there's a wrinkle somewhere.
 But there doesn't seem to be a need to overload the main feature request
 with additional requirements. Doing that is just scope creep that prevents
 us getting features out. Nice, simple patches from newer developers. Later
 tuning and tweaking from more expert community members.

 I think that's generally a fair point. But here we're discussing to add
 a fair amount of wrinkles with the copy approach. The fact alone that
 the oid is different will have some ugly consequences.

 So we add complexity, just to shift it into different places later? I'm
 not sure that's a good idea.

With all due respect, there are features that are beyond the abilities of
some newer developers, and reducing the scope isn't a good way to fix
that.  It just leaves a bigger mess to be cleaned up later.

I think Andres' idea of a per-backend filenode mapping table might work.
The existing relfilenode mapper solves a somewhat related problem, namely
how do you replace the filenode for shared system catalogs whose pg_class
entries can't be changed.
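
To illustrate the shape of the idea (purely a sketch with made-up names,
assuming dynahash and the existing RelFileNode struct), such a backend-local
map could be as simple as:

typedef struct GlobalTempRelEntry
{
	Oid			relid;			/* key: the global temp table's pg_class OID */
	RelFileNode rnode;			/* this backend's private storage for it */
} GlobalTempRelEntry;

static HTAB *global_temp_rel_map = NULL;

static void
init_global_temp_rel_map(void)
{
	HASHCTL		ctl;

	MemSet(&ctl, 0, sizeof(ctl));
	ctl.keysize = sizeof(Oid);
	ctl.entrysize = sizeof(GlobalTempRelEntry);
	global_temp_rel_map = hash_create("global temp relfilenode map",
									  16, &ctl,
									  HASH_ELEM | HASH_BLOBS);
}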

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Implementation of global temporary tables?

2015-07-15 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On 15 July 2015 at 16:28, Andres Freund and...@anarazel.de wrote:
 I think that's generally a fair point. But here we're discussing to add
 a fair amount of wrinkles with the copy approach. The fact alone that
 the oid is different will have some ugly consequences.

 Why? We are creating a local temp table LIKE the global temp table. That is
 already a supported operation. So there is no different oid.

You're presuming a specific implementation decision, one that has not been
made yet, and isn't all that attractive because of the catalog bloat issues.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [DESIGN] Incremental checksums

2015-07-15 Thread David Christensen

 On Jul 15, 2015, at 3:18 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 
   - pg_disable_checksums(void) => turn checksums off for a cluster.  Sets 
  the state to disabled, which means bg_worker will not do anything.
 
   - pg_request_checksum_cycle(void) => if checksums are enabled, 
  increment the data_checksum_cycle counter and set the state to enabling.
 
 
 If the cluster is already enabled for checksums, then what is
 the need for any other action?

You are assuming this is a one-way action.  Some people may decide that
checksums end up taking too much overhead or similar, so we should support
disabling this feature; with the proposed patch the disable action is
fairly trivial to handle.

Requesting an explicit checksum cycle would be desirable in the case where you 
want to proactively verify there is no cluster corruption to be found.
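
Spelled out, the calling pattern being described here is roughly the
following (both functions are part of this proposal, not of any released
server):

-- Ask the background worker to recompute (and thereby verify) every
-- block's checksum in a new cycle:
SELECT pg_request_checksum_cycle();

-- Or back the cluster out of checksums entirely:
SELECT pg_disable_checksums();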

David
--
David Christensen
PostgreSQL Team Manager
End Point Corporation
da...@endpoint.com
785-727-1171


Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-15 Thread Heikki Linnakangas

On 06/30/2015 11:24 PM, Andres Freund wrote:

On 2015-06-30 22:19:02 +0300, Heikki Linnakangas wrote:

Hm. Right. A recheck of the value after the queuing should be sufficient
to fix? That's how we deal with the exact same scenarios for the normal
lock state, so that shouldn't be very hard to add.


Yeah. It's probably more efficient to not release the spinlock between
checking the value and LWLockQueueSelf() though.


Right now LWLockQueueSelf() takes spinlocks etc itself, and is used that
way in a bunch of callsites... So that'd be harder.  Additionally I'm
planning to get rid of the spinlocks around queuing (they show up as
bottlenecks in contended workloads), so it seems more future proof not
to assume that either way.  On top of that I think we should, when
available (or using the same type of fallback as for 32bit?), use 64 bit
atomics for the var anyway.


I'll take a stab at fixing this tomorrow.


Thanks! Do you mean both or just the second issue?


Both. Here's the patch.

Previously, LWLockAcquireWithVar set the variable associated with the 
lock atomically with acquiring it. Before the lwlock-scalability 
changes, that was straightforward because you held the spinlock anyway, 
but it's a lot harder/expensive now. So I changed the way acquiring a 
lock with a variable works. There is now a separate flag, 
LW_FLAG_VAR_SET, which indicates that the current lock holder has 
updated the variable. The LWLockAcquireWithVar function is gone - you 
now just use LWLockAcquire(), which always clears the LW_FLAG_VAR_SET 
flag, and you can call LWLockUpdateVar() after that if you want to set 
the variable immediately. LWLockWaitForVar() always waits if the flag is 
not set, i.e. it will not return regardless of the variable's value, if 
the current lock-holder has not updated it yet.


This passes make check, but I haven't done any testing beyond that. Does 
this look sane to you?


- Heikki

From ce90bd9a1e2c8b5783ebeea83594da7d3a1c63de Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas heikki.linnakangas@iki.fi
Date: Wed, 15 Jul 2015 18:33:28 +0300
Subject: [PATCH 1/1] Fix LWLock variable support, broken by the lwlock
 scalability patch

---
 src/backend/access/transam/xlog.c |  15 ++---
 src/backend/storage/lmgr/lwlock.c | 135 ++
 src/include/storage/lwlock.h  |   1 -
 3 files changed, 83 insertions(+), 68 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1dd31b3..2f34346 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1408,9 +1408,7 @@ WALInsertLockAcquire(void)
 	 * The insertingAt value is initially set to 0, as we don't know our
 	 * insert location yet.
 	 */
-	immed = LWLockAcquireWithVar(WALInsertLocks[MyLockNo].l.lock,
- WALInsertLocks[MyLockNo].l.insertingAt,
- 0);
+	immed = LWLockAcquire(WALInsertLocks[MyLockNo].l.lock, LW_EXCLUSIVE);
 	if (!immed)
 	{
 		/*
@@ -1442,13 +1440,12 @@ WALInsertLockAcquireExclusive(void)
 	 */
 	for (i = 0; i < NUM_XLOGINSERT_LOCKS - 1; i++)
 	{
-		LWLockAcquireWithVar(WALInsertLocks[i].l.lock,
-			 WALInsertLocks[i].l.insertingAt,
-			 PG_UINT64_MAX);
+		LWLockAcquire(WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
+		LWLockUpdateVar(WALInsertLocks[i].l.lock,
+		WALInsertLocks[i].l.insertingAt,
+		PG_UINT64_MAX);
 	}
-	LWLockAcquireWithVar(WALInsertLocks[i].l.lock,
-		 WALInsertLocks[i].l.insertingAt,
-		 0);
+	LWLockAcquire(WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
 
 	holdingAllLocks = true;
 }
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 46cab49..92a1aef 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -11,12 +11,12 @@
  * LWLocks to protect its shared state.
  *
  * In addition to exclusive and shared modes, lightweight locks can be used
- * to wait until a variable changes value.  The variable is initially set
- * when the lock is acquired with LWLockAcquireWithVar, and can be updated
- * without releasing the lock by calling LWLockUpdateVar.  LWLockWaitForVar
- * waits for the variable to be updated, or until the lock is free.  The
- * meaning of the variable is up to the caller, the lightweight lock code
- * just assigns and compares it.
+ * to wait until a variable changes value.  The variable is initially put
+ * in a not-set state when the lock is acquired with LWLockAcquire, and
+ * can be updated without releasing the lock by calling LWLockUpdateVar.
+ * LWLockWaitForVar waits for the variable to be updated, or until the lock
+ * is free.  The meaning of the variable is up to the caller, the lightweight
+ * lock code just assigns and compares it.
  *
  * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -99,6 +99,7 @@ extern slock_t *ShmemLock;
 
 #define LW_FLAG_HAS_WAITERS			((uint32) 1 << 30)
 #define 

Re: [HACKERS] Implementation of global temporary tables?

2015-07-15 Thread Andres Freund
On 2015-07-15 16:36:12 +0100, Simon Riggs wrote:
 On 15 July 2015 at 16:28, Andres Freund and...@anarazel.de wrote:
  I think that's generally a fair point. But here we're discussing adding
  a fair amount of wrinkles with the copy approach. The fact alone that
  the oid is different will have some ugly consequences.
 
 
 Why? We are creating a local temp table LIKE the global temp table. That is
 already a supported operation. So there is no different oid.

Then your locking against ALTER, DROP etc. isn't going to work.




Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-07-15 Thread Robert Haas
On Mon, Jul 13, 2015 at 10:46 AM, Ryan Pedela rped...@datalanche.com wrote:
 As far as large numbers in JSON, I think Postgres is doing the right thing
 and should not be changed. It is Javascript that is stupid here, and I don't
 think it is wise to add something to core just because one client does
 stupid things with large numbers. In addition, ES7 is introducing value
 types which will hopefully solve the large number problem in Javascript.

FWIW, I don't agree.  If it's not easy to read the JSON that
PostgreSQL generates using JavaScript, then a lot of people are just
going to give up on doing it, and IMO that would be sad.  Telling
people that they have to parse the JSON using some parser other than
the one built into their JavaScript engine, whack it around, and then
render it as text and parse it again is not really an acceptable
answer.  The reason why the logical decoding stuff allows multiple
output formats is because Andres, quite correctly, foresaw that
different people would need different output formats.  He could have
designed that system to output only one output format and just said,
everybody's got to read and parse this, but that would have been slow.
Instead, he tried to set things up so that you could get the output in
the format that was most convenient for your client, whatever that is.
On this thread, we're back-pedaling from that idea: sorry, you can get
JSON output, but if you want JSON output that will be properly
interpreted by your JSON parser, you can't have that.  Regardless of
the details of this particular patch, I can't endorse that approach.
If we want people to use our software, we need to meet them where they
are at, especially when we are only (IIUC) talking about inserting a
few extra quotation marks.
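
To spell out the concrete case: 2^53 + 1 fits comfortably in a bigint or
numeric, but not in the IEEE double that JSON.parse() hands back, so a
JavaScript client silently loses the value unless it arrives quoted (the
quoted form below is what is being argued for, not current behaviour):

SELECT to_json(9007199254740993::bigint);
-- emits the bare number 9007199254740993
-- JavaScript: JSON.parse('9007199254740993') => 9007199254740992
-- whereas "9007199254740993" survives intact and lets the client decide
-- how to parse it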

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Implementation of global temporary tables?

2015-07-15 Thread Andres Freund
On 2015-07-15 12:58:51 +0100, Simon Riggs wrote:
 On 14 July 2015 at 23:20, Jim Nasby jim.na...@bluetreble.com wrote:
 Pavel's original description of how to do this seems valid, and from the
 link Tom agreed in 2009.
 
 For me the design summary is this
 
 * CREATE GLOBAL TEMP TABLE creates catalog entries like a normal table but
 with different relkind
 * When we see a request to INSERT, DEL, UPD, SEL from the temp table, if it
 does not exist we create it as a TEMP table of the same name, using the
 Global's pg_class entry as a template

Why do we need to create that copy? We can just use the relfilenode in
all backends by having the backendid in the filename? Yes, there's some
amount of additional code needed, but it's not that much?  I actually
think it might end up being less additional code than having a copy,
because with the copy you'll have two different oids for the global
entry and the local copy.

Regards,

Andres




Re: [HACKERS] Implementation of global temporary tables?

2015-07-15 Thread Andres Freund
On 2015-07-15 16:52:49 +0200, Andres Freund wrote:
 Why do we need to create that copy? We can just use the relfilenode in
 all backends by having the backendid in the filename? Yes, there's some
 amount of additional code needed, but it's not that much?  I actually
 think it might end up being less additional code than having a copy,
 because with the copy you'll have two different oids for the global
 entry and the local copy.

Hm, yes. Brainfart. Transaction table rewrites/truncations need to
change the relfilenode.

To fix that, we could add a backend-local mapping table from global temp
table OID to the backend-local relfilenode. The code to look up the
relfilenode is already mostly isolated.




Re: [HACKERS] First Aggregate Funtion?

2015-07-15 Thread Robert Haas
On Tue, Jul 14, 2015 at 9:23 AM, sudalai sudala...@gmail.com wrote:
 The above implementation of the first aggregate returns the first non-NULL
 item value.

 To get the *first row item value* for a column, use the implementation below.

 -- create a function that pushes at most two elements onto a given array
 -- (the first row's value ends up at index 2 of the array)
 CREATE OR REPLACE  FUNCTION two_value_holder(anyarray, anyelement)
 returns anyarray as  $$
  select case when array_length($1,1) < 2 then array_append($1,$2) else
 $1 end ;
 $$ language sql immutable;

 -- create a function that returns the second element of an array
 CREATE  OR replace FUNCTION second_element (ANYARRAY)
 RETURNS ANYELEMENT AS $$ SELECT $1[2]; $$ LANGUAGE SQL;

 -- create the first() aggregate, which returns the first row's item value
 CREATE AGGREGATE first(anyelement)(
 SFUNC=two_value_holder,
 STYPE=ANYARRAY,
 INITCOND='{NULL}',
 FINALFUNC=second_element
 );
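
 -- usage sketch (table and column names are hypothetical); note that
 -- without an ORDER BY inside the call there is no well-defined first row:
 SELECT first(val ORDER BY id) FROM t;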

 I hope this works.

I don't think so, because arrays can contain duplicates.

rhaas=# select coalesce(first(x.column1), 'wrong') from (values
(null), ('correct')) x;
 coalesce
--
 wrong
(1 row)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: ctidscan as an example of custom-scan (Re: [HACKERS] [v9.5] Custom Plan API)

2015-07-15 Thread Robert Haas
On Tue, Jul 14, 2015 at 6:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Both you and Andres have articulated the concern that CustomScan isn't
 actually useful, but I still don't really understand why not.

 I'm curious, for example, whether CustomScan would have been
 sufficient to build TABLESAMPLE, and if not, why not.  Obviously the
 syntax has to be in core,

 ... so you just made the point ...

If you're going to set the bar that high, there's no point in anyone
ever trying to add extensibility anywhere, because there will always
be some other place where there isn't enough extensibility to do
everything that someone might want.  The fact that the
parser isn't extensible doesn't make custom scans useless any more
than the fact that the non-extensibility of WAL-logging makes pg_am
useless.

 but why couldn't the syntax just call an
 extension-provided callback that returns a custom scan, instead of
 having a node just for TABLESAMPLE?

 Because that only works for small values of work.  As far as TABLESAMPLE
 goes, I intend to hold Simon's feet to the fire until there's a less
 cheesy answer to the problem of scan reproducibility.  Assuming we're
 going to allow sample methods that can't meet the reproducibility
 requirement, we need something to prevent them from producing visibly
 broken query results.  Ideally, the planner would avoid putting such a
 scan on the inside of a nestloop.  A CustomScan-based implementation could
 not possibly arrange such a thing; we'd have to teach the core planner
 about the concern.

Well, I think it would be better to change the tablesample methods as
you previously proposed, so that they are actually stable.  But if
that's not possible, then when you (or someone) makes the core planner
able to consider such matters, you could add a CUSTOM_PATH_UNSTABLE
flag that a custom path can use to notify the core system about the
problem.  Then tablesample methods that are unstable can set the flag
and those that are stable are not penalized.  Presumably we'll end up
with such a flag in the tablesample-path anyway, and a separate flag
in every kind of future scan that needs similar consideration.  If we
put it in some piece of infrastructure (like the custom-path stuff)
that is reusable, we could avoid reinventing the wheel every time.

 Or, taking the example of a GpuScan node, it's essentially impossible
 to persuade the planner to delegate any expensive function calculations,
 aggregates, etc to such a node; much less teach it that that way is cheaper
 than doing such things the usual way.  So yeah, KaiGai-san may have a
 module that does a few things with a GPU, but it's far from doing all or
 even very much of what one would want.

It's true that there are things he can't do this way, but without the
custom-scan stuff, he'd be able to do even less.

 Now, as part of the upper-planner-rewrite business that I keep hoping to
 get to when I'm not riding herd on bad patches, it's likely that we might
 have enough new infrastructure soon that that particular problem could
 be solved.  But there would just be another problem after that; a likely
 example is not having adequate statistics, or sufficiently fine-grained
 function cost estimates, to be able to make valid choices once there's
 more than one way to do such calculations.  (I'm not really impressed by
 the GPU is *always* faster approaches.)  Significant improvements of
 that sort are going to take core-code changes.

Probably, but giving people the ability to experiment outside core,
even if it's limited, often leads to good things.  And when it
doesn't, it doesn't really cost us anything.

 Even worse, if there do get to be any popular custom-scan extensions,
 we'll break them anytime we make any nontrivial planner changes, because
 there is no arms-length API there.  A trivial example is that even adding
 or changing any fields in struct Path will necessarily break custom scan
 providers, because they build Paths for themselves with no interposed API.

I agree that will happen, but I don't care.  This happens all the
time, and every person other than yourself who has commented on this
issue has said that they'd rather have an API exposed that will later
get whacked around without warning than have no API exposed at all,
not just with respect to custom-paths, but with a whole variety of
other things as well, like sticking PGDLLIMPORT on all of the
variables that back GUCs.  It is really far more tedious to try to
work around the lack of a PGDLLIMPORT marking (which causes a huge
annoyance now) than to adjust the code if and when somebody changes
something about the GUC (which causes a small annoyance later, and
only if there actually are changes).

 In large part this is the same as my core concern about the TABLESAMPLE
 patch: exposing dubiously-designed APIs is soon going to force us to make
 choices between breaking those APIs or not being able to make 

Re: ctidscan as an example of custom-scan (Re: [HACKERS] [v9.5] Custom Plan API)

2015-07-15 Thread Robert Haas
On Wed, Jul 15, 2015 at 2:28 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Wed, Jul 15, 2015 at 10:12 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
 We never guarantee interface compatibility across a major version upgrade.
 If we add or modify an interface in v9.6, it is the duty of extension
 developers to follow the new version, and that is not specific to custom-scan
 providers. If v9.6 adds support for fine-grained function cost estimation, I
 also have to follow that feature, but that is natural.

 Maintaining compatibility across major versions is best-effort. Even though
 we sometimes break things across major versions, sometimes even silently
 (take the example of 9.3 background workers that do not start on 9.4 unless
 bgw_notify_pid is set to 0), the usual approach is to keep APIs stable and
 convenient, able to cover a maximum set of cases for a given set of plugins,
 and this serves well in the long term. Now, it is true that we cannot assume
 that the first version of a plugin API will be perfect forever, or that it
 will cover all the imagined use cases at the first shot; still, I'd like to
 think that things are broken only when necessary and with good reasons. A set
 of APIs that changes at each major release tends to be proof that research
 was lacking in the first version, and it would surely demotivate adoption by
 extension developers.

Maybe.  If those changes are demanded by extension developers who are
trying to do useful stuff with the APIs and finding problems, then
changing things will probably make those extension developers happy.
If the changes are necessitated by core improvements, then we might
get some complaints if those core improvements are perceived to have
small value compared to the API break.  But I think complaints of that
kind are very rare.  The only thing I can think of that falls into
approximately that category is a gripe about the replacement of
relistemp by relpersistence.  But I think that was more motivated by
the way it broke SQL monitoring queries than by the way it broke C
code.  There may be other more recent examples; but that's the only
one I can think of offhand.  I just don't see it as a big issue.
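
For readers who don't remember that change, the breakage looked roughly
like this:

-- pre-9.1 monitoring query:
SELECT relname FROM pg_class WHERE relistemp;

-- after relistemp was replaced by relpersistence in 9.1:
SELECT relname FROM pg_class WHERE relpersistence = 't';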

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

