Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-03-03 Thread Peter Geoghegan
On Tue, Mar 3, 2015 at 12:05 AM, Heikki Linnakangas hlinn...@iki.fi wrote:
 My experimental branch works just fine (with a variant jjanes_upsert
 with subxact looping), until I need to restart an update after a
 failed heap_update() that still returned HeapTupleMayBeUpdated
 (having super deleted within an ExecUpdate() call). There is no good
 way to do that for ExecUpdate() that I can see, because an existing,
 visible row is affected (unlike with ExecInsert()). Even if it was
 possible, it would be hugely invasive to already very complicated code
 paths.

 Ah, so we can't easily use super-deletion to back out an UPDATE. I had not
 considered that.

Yeah. When I got into considering making EvalPlanQualFetch() look at
speculative tokens, it became abundantly clear that that code would
never be committed, even if I could make it work.

 I continue to believe that the best way forward is to incrementally
 commit the work by committing ON CONFLICT IGNORE first. That way,
 speculative tokens will remain strictly the concern of UPSERTers or
 sessions doing INSERT ... ON CONFLICT IGNORE.


 Ok, let's try that. Can you cut a patch that does just ON CONFLICT IGNORE,
 please?

Of course. I'll have that for your shortly.

Thanks
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-03-03 Thread Heikki Linnakangas

On 03/02/2015 11:21 PM, Peter Geoghegan wrote:

On Mon, Mar 2, 2015 at 12:15 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

Hmm. I used a b-tree to estimate the effect that the locking would have in
the UPSERT case, for UPSERT into a table with a b-tree index. But you're
right that for the question of whether this is acceptable for the case of
regular insert into a table with exclusion constraints, other indexams are
more interesting. In a very quick test, the overhead with a single GiST
index seems to be about 5%. IMHO that's still a noticeable slowdown, so
let's try to find a way to eliminate it.


Honestly, I don't know why you're so optimistic that this can be
fixed, even with that heavyweight lock (for regular inserters +
exclusion constraints).


We already concluded that it can be fixed, with the heavyweight lock. 
See http://www.postgresql.org/message-id/54f4a0e0.4070...@iki.fi. Do you 
see some new problem with that that hasn't been discussed yet? To 
eliminate the heavy-weight lock, we'll need some new ideas, but it 
doesn't seem that hard.



My experimental branch, which I showed you privately shows big
problems when I simulate upserting with exclusion constraints (so we
insert first, handle exclusion violations using the traditional upsert
subxact pattern that does work with B-Trees). It's much harder to back
out of a heap_update() than it is to back out of a heap_insert(),
since we might well be updated a tuple visible to some other MVCC
snapshot. Whereas we can super delete a tuple knowing that only a
dirty snapshot might have seen it, which bounds the complexity (only
wait sites for B-Trees + exclusion constraints need to worry about
speculative insertion tokens and so on).

My experimental branch works just fine (with a variant jjanes_upsert
with subxact looping), until I need to restart an update after a
failed heap_update() that still returned HeapTupleMayBeUpdated
(having super deleted within an ExecUpdate() call). There is no good
way to do that for ExecUpdate() that I can see, because an existing,
visible row is affected (unlike with ExecInsert()). Even if it was
possible, it would be hugely invasive to already very complicated code
paths.


Ah, so we can't easily use super-deletion to back out an UPDATE. I had 
not considered that.



I continue to believe that the best way forward is to incrementally
commit the work by committing ON CONFLICT IGNORE first. That way,
speculative tokens will remain strictly the concern of UPSERTers or
sessions doing INSERT ... ON CONFLICT IGNORE.


Ok, let's try that. Can you cut a patch that does just ON CONFLICT 
IGNORE, please?


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Join push-down support for foreign tables

2015-03-03 Thread Shigeru Hanada
2015-03-02 23:07 GMT+09:00 Kouhei Kaigai kai...@ak.jp.nec.com:
 I seem to be getting a problem with whole-row references:

 # SELECT p.name, c.country, e.pet_name, p FROM pets e INNER JOIN people p on
 e.person_id = p.id inner join countries c on p.country_id = c.id;
 ERROR:  table r has 3 columns available but 4 columns specified
 CONTEXT:  Remote SQL command: SELECT r.a_0, r.a_1, r.a_2, l.a_1 FROM (SELECT 
 id,
 country FROM public.countries) l (a_0, a_1) INNER JOIN (SELECT id, name,
 country_id FROM public.people) r (a_0, a_1, a_2, a_3)  ON ((r.a_3 = l.a_0))

 In this case, the 4th target-entry should be l, not l.a_1.

Actually.  I fixed that part.

 And the error message could be somewhat confusing.  This mentions table r, 
 but
 there's no such table or alias in my actual query.

 However, do we have a mechanical/simple way to distinguish the cases when
 we need relation alias from the case when we don't need it?
 Like a self-join cases, we has to construct a remote query even if same
 table is referenced multiple times in a query. Do you have a good idea?

I'd like to vote for keeping current aliasing style, use l and r
for join source relations, and use a_0, a_1, ... for each column of
them.

-- 
Shigeru HANADA


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Join push-down support for foreign tables

2015-03-03 Thread Shigeru Hanada
Thanks for reviewing my patch.

2015-03-02 22:50 GMT+09:00 Thom Brown t...@linux.com:
 I seem to be getting a problem with whole-row references:

 # SELECT p.name, c.country, e.pet_name, p FROM pets e INNER JOIN people p on
 e.person_id = p.id inner join countries c on p.country_id = c.id;
 ERROR:  table r has 3 columns available but 4 columns specified
 CONTEXT:  Remote SQL command: SELECT r.a_0, r.a_1, r.a_2, l.a_1 FROM (SELECT
 id, country FROM public.countries) l (a_0, a_1) INNER JOIN (SELECT id, name,
 country_id FROM public.people) r (a_0, a_1, a_2, a_3)  ON ((r.a_3 = l.a_0))

 And the error message could be somewhat confusing.  This mentions table r,
 but there's no such table or alias in my actual query.

Your concern would not be limited to the feature I'm proposing,
because fdw_options about object name also introduce such mismatch
between the query user constructed and another one which postgres_fdw
constructed for remote execution.  Currently we put CONTEXT line for
such purpose, but it might hard to understand soon only from error
messages.  So I'd like to add section about query debugging for
postgres_fdw.

 Another issue:

 # EXPLAIN VERBOSE SELECT NULL FROM (SELECT people.id FROM people INNER JOIN
 countries ON people.country_id = countries.id LIMIT 3) x;
 ERROR:  could not open relation with OID 0

Good catch.  In my quick trial, removing LIMIT3 avoids this error.
I'll check it right now.


-- 
Shigeru HANADA


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-03-03 Thread Shigeru Hanada
Kaigai-san,

The v6 patch was cleanly applied on master branch.  I'll rebase my
patch onto it, but before that I have a comment about name of the new
FDW API handler GetForeignJoinPath.

Obviously FDW can add multiple paths at a time, like GetForeignPaths,
so IMO it should be renamed to GetForeignJoinPaths, with plural form.

In addition to that, new member of RelOptInfo, fdw_handler, should be
initialized explicitly in build_simple_rel.

Please see attached a patch for these changes.

I'll review the v6 path afterwards.


2015-03-03 20:20 GMT+09:00 Kouhei Kaigai kai...@ak.jp.nec.com:
 Sorry, I misoperated on patch creation.
 Attached one is the correct version.
 --
 NEC OSS Promotion Center / PG-Strom Project
 KaiGai Kohei kai...@ak.jp.nec.com


 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai
 Sent: Tuesday, March 03, 2015 6:31 PM
 To: Kaigai Kouhei(海外 浩平); Robert Haas
 Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
 Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

 The attached version of custom/foreign-join interface patch
 fixes up the problem reported on the join-pushdown support
 thread.

 The previous version referenced *_ps_tlist on setrefs.c, to
 check whether the Custom/ForeignScan node is associated with
 a particular base relation, or not.
 This logic considered above nodes performs base relation scan,
 if *_ps_tlist is valid. However, it was incorrect in case when
 underlying pseudo-scan relation has empty targetlist.
 Instead of the previous logic, it shall be revised to check
 scanrelid itself. If zero, it means Custom/ForeignScan node is
 not associated with a particular base relation, thus, its slot
 descriptor for scan shall be constructed based on *_ps_tlist.


 Also, I noticed a potential problem if CSP/FDW driver want to
 displays expression nodes using deparse_expression() but
 varnode within this expression does not appear in the *_ps_tlist.
 For example, a remote query below shall return rows with two
 columns.

   SELECT atext, btext FROM tbl_a, tbl_b WHERE aid = bid;

 Thus, ForeignScan will perform like as a scan on relation with
 two columns, and FDW driver will set two TargetEntry on the
 fdw_ps_tlist. If FDW is designed to keep the join condition
 (aid = bid) using expression node form, it is expected to be
 saved on custom/fdw_expr variable, then setrefs.c rewrites the
 varnode according to *_ps_tlist.
 It means, we also have to add *_ps_tlist both of aid and bid
 to avoid failure on variable lookup. However, these additional
 entries changes the definition of the slot descriptor.
 So, I adjusted ExecInitForeignScan and ExecInitCustomScan to
 use ExecCleanTypeFromTL(), not ExecTypeFromTL(), when it construct
 the slot descriptor based on the *_ps_tlist.
 It expects CSP/FDW drivers to add target-entries with resjunk=true,
 if it wants to have additional entries for variable lookups on
 EXPLAIN command.

 Fortunately or unfortunately, postgres_fdw keeps its remote query
 in cstring form, so it does not need to add junk entries on the
 fdw_ps_tlist.

 Thanks,
 --
 NEC OSS Promotion Center / PG-Strom Project
 KaiGai Kohei kai...@ak.jp.nec.com


  -Original Message-
  From: pgsql-hackers-ow...@postgresql.org
  [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai
  Sent: Sunday, February 15, 2015 11:01 PM
  To: Kaigai Kouhei(海外 浩平); Robert Haas
  Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
  Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan 
  API)
 
  The attached patch is a rebased version of join replacement with
  foreign-/custom-scan. Here is no feature updates at this moment
  but SGML documentation is added (according to Michael's comment).
 
  This infrastructure allows foreign-data-wrapper and custom-scan-
  provider to add alternative scan paths towards relations join.
  From viewpoint of the executor, it looks like a scan on a pseudo-
  relation that is materialized from multiple relations, even though
  FDW/CSP internally processes relations join with their own logic.
 
  Its basic idea is, (1) scanrelid==0 indicates this foreign/custom
  scan node runs on a pseudo relation and (2) fdw_ps_tlist and
  custom_ps_tlist introduce the definition of the pseudo relation,
  because it is not associated with a tangible relation unlike
  simple scan case, thus planner cannot know the expected record
  type to be returned without these additional information.
  These two enhancement enables extensions to process relations
  join internally, and to perform as like existing scan node from
  viewpoint of the core backend.
 
  Also, as an aside. I had a discussion with Hanada-san about this
  interface off-list. He had an idea to keep create_plan_recurse()
  static, using a special list field in CustomPath structure to
  chain underlying Path node. If core backend translate the Path
  node 

Re: [HACKERS] autogenerated column names + views are a dump hazard

2015-03-03 Thread Andres Freund
On 2015-03-02 16:32:53 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  The easiest way to solve this would teach ruleutils.c to simply always
  attach AS clauses for auto-generated columnnames. Won't look too pretty
  though. Does somebody have a better idea?
 
 No, it would look awful :-(.

Hm, so I looked into it, and I think the problem is actually restricted
to columns where the typename that FigureColname() assigns is different
from the one that will result after ger_rule_expr()/get_const_expr()'s
implicit cast is added.

For this case it seems easiest if we'd make get_rule_expr() (and some of
its helpers) return whether an implicit cast has been added. Whenever an
implicit cast is added in the target list we should add an alias - as
the generated column name will be different from before. That's a bit
invasive, but doesn't seem too bad.  A quick hack doing so shows that
there's no changes in the regression output when doing it for consts.


 But I think we might have enough
 infrastructure in there to notice whether the column is actually
 referenced or not.  So we could label the columns only if necessary,
 which would at least limit the ugliness.

I don't think looking for referenced columns is actually sufficient,
unless you define referenced pretty widely at least.

Before the dump CREATE VIEW ... AS SELECT '2' ORDER BY 1; will yield a
'?column?'  columnname. After a dump/restore it'll be 'text'. That
doesn't seem desireable.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Join push-down support for foreign tables

2015-03-03 Thread Shigeru Hanada
I rebased join push-down patch onto Kaigai-san's Custom/Foreign Join
v6 patch.  I posted some comments to v6 patch in this post:

http://www.postgresql.org/message-id/CAEZqfEcNvjqq-P=jxnw1pb4t9wvpcporcn7g6cc46jgub7d...@mail.gmail.com

Before applying my v3 patch, please apply Kaigai-san's v6 patch and my
mod_cjv6.patch.
Sorry for complex patch combination.  Those patches will be arranged
soon by Kaigai-san and me.

I fixed the issues pointed out by Thom and Kohei, but still the patch
has an issue about joins underlying UPDATE or DELETE.  Now I'm working
on fixing this issue.  Besides this issue, existing regression test
passed.

2015-03-03 19:48 GMT+09:00 Kouhei Kaigai kai...@ak.jp.nec.com:
  * Bug reported by Thom Brown
  -
  # EXPLAIN VERBOSE SELECT NULL FROM (SELECT people.id FROM people INNER JOIN
 countries ON people.country_id = countries.id LIMIT 3) x;
  ERROR:  could not open relation with OID 0
 
  Sorry, it was a problem caused by my portion. The patched setrefs.c
  checks fdw_/custom_ps_tlist to determine whether Foreign/CustomScan
  node is associated with a certain base relation. If *_ps_tlist is
  valid, it also expects scanrelid == 0.
  However, things we should check is incorrect. We may have a case
  with empty *_ps_tlist if remote join expects no columns.
  So, I adjusted the condition to check scanrelid instead.

 Is this issue fixed by v5 custom/foreign join patch?

 Yes, please rebase it.

 --
 NEC OSS Promotion Center / PG-Strom Project
 KaiGai Kohei kai...@ak.jp.nec.com




-- 
Shigeru HANADA


foreign_join_v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch : Allow toast tables to be moved to a different tablespace

2015-03-03 Thread Julien Tachoires
Hi,

Sorry for the delay, I missed this thread.
Here is a new version of this patch considering Andreas' comments.

On 30/12/2014 03:48, Andreas Karlsson wrote:
 - A test fails in create_view.out. I looked some into it and did not see 
 how this could happen.
 
  *** 
 /home/andreas/dev/postgresql/src/test/regress/expected/create_view.out 
 2014-08-09 01:35:50.008886776 +0200
  --- 
 /home/andreas/dev/postgresql/src/test/regress/results/create_view.out 
 2014-12-30 00:41:17.966525339 +0100
  ***
  *** 283,289 ***
  relname   | relkind |reloptions
+-+--
 mysecview1 | v   |
  !  mysecview2 | v   |
 mysecview3 | v   | {security_barrier=true}
 mysecview4 | v   | {security_barrier=false}
(4 rows)
  --- 283,289 
  relname   | relkind |reloptions
+-+--
 mysecview1 | v   |
  !  mysecview2 | v   | {security_barrier=true}
 mysecview3 | v   | {security_barrier=true}
 mysecview4 | v   | {security_barrier=false}
(4 rows)

I can't reproduce this issue.

 - pg_dump is broken
 
  pg_dump: [archiver (db)] query failed: ERROR:  syntax error at or 
 near (
  LINE 1: ...nest(tc.reloptions) x), ', ') AS toast_reloptions 
 (SELECT sp...

Fixed.

 - ALTER INDEX foo_idx SET TABLE TABLESPACE ... should not be allowed, 
 currently it changes the tablespace of the index. I do not think ALTER 
 INDEX foo_idx SET (TABLE|TOAST) TABLESPACE ... should even be allowed 
 in the grammar.

Fixed.

 - You should add a link to 
 http://www.postgresql.org/docs/current/interactive/storage-toast.html to 
 the manual page of ALTER TABLE.

Added.

 - I do not like how \d handles the toast tablespace. Having TOAST in 
 pg_default and the table in another space looks the same as if there was 
 no TOAST table at all. I think we should always print both tablespaces 
 if either of them are not pg_default.

If we do it that way, we should always show the tablespace even if it's
pg_default in any case to be consistent. I'm pretty sure that we don't
want that.

 - Would it be interesting to add syntax for moving the toast index to a 
 separate tablespace?

-1, we just want to be able to move TOAST heap, which is supposed to
contain a lot of data and we want to move on a different storage device
/ filesystem.

 - There is no warning if you set the toast table space of a table 
 lacking toast. I think there should be one.

A notice is raised now in that case.

 - No support for materialized views as pointed out by Alex.

Support, documentation and regression tests added.

 - I think the code would be cleaner if ATPrepSetTableSpace and 
 ATPrepSetToastTableSpace were merged into one function or split into 
 two, one setting the toast and one setting the tablespace of the table 
 and one setting it on the TOAST table.

Done.

 - I think a couple of tests for the error cases would be nice.

2 more tests added.

 - Missing periods on the ALTER TABLE manual page after See also CREATE 
 TABLESPACE (in two places).
 
 - Missing period last in the new paragraph added to the storage manual page.

I don't understand those 2 last points ?

 - Double spaces in src/backend/catalog/toasting.c after if (new_toast.

Fixed.

 - The comment and if this is not a TOAST re-creation case (through 
 CLUSTER). incorrectly implies that CLUSTER is the only case where the 
 TOAST table is recreated.

Fixed.

 - The local variables ToastTableSpace and ToastRel do not follow the 
 naming conventions.

Fixed (I hope so).

 - Remove the parentheses around (is_system_catalog).

Fixed.

 - Why was the Save info for Phase 3 to do the real work comment 
 changed to XXX: Save info for Phase 3 to do the real work?

Fixed.

 - Incorrect indentation for new code added last in ATExecSetTableSpace.

Fixed.

 - The patch adds commented out code in src/include/commands/tablecmds.h.

Fixed.

Thank you for your review.

--
Julien
diff --git a/doc/src/sgml/ref/alter_materialized_view.sgml b/doc/src/sgml/ref/alter_materialized_view.sgml
index b0759fc..321ffc9 100644
--- a/doc/src/sgml/ref/alter_materialized_view.sgml
+++ b/doc/src/sgml/ref/alter_materialized_view.sgml
@@ -44,6 +44,8 @@ ALTER MATERIALIZED VIEW ALL IN TABLESPACE replaceable class=parametername/r
 RESET ( replaceable class=PARAMETERstorage_parameter/replaceable [, ... ] )
 OWNER TO replaceable class=PARAMETERnew_owner/replaceable
 SET TABLESPACE replaceable class=PARAMETERnew_tablespace/replaceable
+SET TABLE TABLESPACE replaceable class=PARAMETERnew_tablespace/replaceable
+SET TOAST TABLESPACE replaceable class=PARAMETERnew_tablespace/replaceable
 /synopsis
  /refsynopsisdiv
 
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index b3a4970..777ba57 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ 

Re: [HACKERS] Idea: closing the loop for pg_ctl reload

2015-03-03 Thread Andres Freund
On 2015-03-03 15:21:24 +, Greg Stark wrote:
 Fwiw this concerns me slightly. I'm sure a lot of people are doing
 things like kill -HUP `cat .../postmaster.pid` or the equivalent.

postmaster.pid already contains considerably more than just the pid. e.g.
4071
/srv/dev/pgdev-master
1425396089
5440
/tmp
localhost
  5440001  82345992

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idea: closing the loop for pg_ctl reload

2015-03-03 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2015-03-03 15:21:24 +, Greg Stark wrote:
 Fwiw this concerns me slightly. I'm sure a lot of people are doing
 things like kill -HUP `cat .../postmaster.pid` or the equivalent.

 postmaster.pid already contains considerably more than just the pid. e.g.

Yeah, that ship sailed long ago.  I'm sure anyone who's doing this is
using head -1 not just cat, and what I suggest wouldn't break that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Comparing primary/HS standby in tests

2015-03-03 Thread Andres Freund
Hi,

I've regularly wished we had automated tests that setup HS and then
compare primary/standby at the end to verify replay worked
correctly.

Heikki's page comparison tools deals with some of that verification, but
it's really quite expensive and doesn't care about runtime only
differences. I.e. it doesn't test HS at all.

I every now and then run installcheck against a primary, verify that
replay works without errors, and then compare pg_dumpall from both
clusters. Unfortunately that currently requires hand inspection of
dumps, there are differences like:
-SELECT pg_catalog.setval('default_seq', 1, true);
+SELECT pg_catalog.setval('default_seq', 33, true);

The reason these differences is that the primary increases the
sequence's last_value by 1, but temporarily sets it to +SEQ_LOG_VALS
before XLogInsert(). So the two differ.

Does anybody have a good idea how to get rid of that difference? One way
to do that would be to log the value the standby is sure to have - but
that's not entirely trivial.

I'd very much like to add a automated test like this to the tree, but I
don't see wa way to do that sanely without a comparison tool...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-03 Thread Syed, Rahila
Hello,

It would be good to get those problems fixed first. Could you send an updated 
patch?

Please find attached updated patch with WAL replay error fixed. The patch 
follows chunk ID approach of xlog format.

Following are brief measurement numbers. 
  WAL
FPW compression on   122.032 MB

FPW compression off   155.239 MB

HEAD  155.236 MB


Thank you,
Rahila Syed


__
Disclaimer: This email and any attachments are sent in strictest confidence
for the sole use of the addressee and may contain legally privileged,
confidential, and proprietary data. If you are not the intended recipient,
please advise the sender by replying promptly to this email and then delete
and destroy this email and any attachments without any further use, copying
or forwarding.

Support-compression-of-full-page-writes_v22.patch
Description: Support-compression-of-full-page-writes_v22.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission

2015-03-03 Thread Greg Stark
On Mon, Jun 30, 2014 at 8:06 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote:
 After sleeping on it, I realised that the code would return '{all}' for
 'all' in pg_hba.conf, but '{all}' for 'all'. So it's not exactly
 ambiguous, but I don't think it's especially useful for callers.

Hm. Nope, it doesn't. It just says {all} regardless of whether all
is quoted or not.

This makes sense, the rules for when to quote things in pg_hba and
when to quote things for arrays are separate. And we definitely don't
want to start adding quotes to every token that the parser noted was
quoted because then you'll start seeing things like {\all\} and
{\database with space\} which won't make it any easier to match
things.

I'm not sure adding a separate column is really the solution either.
You can have things like all,databasename (which is the keyword all
not a database all). Or more likely something like sameuser,bob or
even things like replication,all.

I'm thinking leaving well enough alone is probably best. It's not
perfect but if a user does have a database named all or
replication or a user named sameuser or samerole then it's not
like pg_hba_settings crashes or anything, it just produces information
that's hard to interpret and the response might just be don't do
that.

The only other option I was pondering was using a jsonb instead of an
array. That would give us more flexibility and we could have a json
array that contained strings and objects representing keywords. But
most hba files consist *mostly* of singleton keywords,  so the result
might be kind of cumbersome.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idea: closing the loop for pg_ctl reload

2015-03-03 Thread Greg Stark
On Fri, Feb 20, 2015 at 1:26 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 1. Extend the definition of the postmaster.pid file to add another
 line, which will contain the time of the last postmaster configuration
 load attempt (might as well be a numeric Unix-style timestamp) and
 a boolean indication of whether that attempt succeeded or not.


Fwiw this concerns me slightly. I'm sure a lot of people are doing
things like kill -HUP `cat .../postmaster.pid` or the equivalent.

We do have the external_pid_file GUC so perhaps this just means we
should warn people in the release notes or somewhere and point them at
that GUC.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Weirdly pesimistic estimates in optimizer

2015-03-03 Thread David Kubečka
Hi all,

I have encountered a performance problem with relatively simple query,
which I think is caused by the overly pesimistic estimates in optimizer. I
have originally run into this issue on a table few GBs large, but it can be
reproduced with much smaller table as follows:

-- Setup main fact table
create table facts (fk int, f numeric);
insert into facts select (random()*1)::int, random()*1 from
generate_series(1,100) g(i);
create index on facts (fk);
vacuum analyze facts;

-- Pick 100 values of 'fk' which cover roughly 1/100 rows from the 'facts'
table and select all corresponding values
create table random_fk_dupl as select fk from facts where fk in (select
(random()*1)::int from generate_series(1,100) g(i));
vacuum analyze random_fk_dupl;

-- Create a table with unique values from 'random_fk_dupl'
create table random_fk_uniq as select distinct fk from random_fk_dupl;
vacuum analyze random_fk_uniq;

(The motivation for doing this is that I have a persistent table/cache with
aggregated values and I want to update this table whenever new data are
loaded to 'facts' by processing only loaded data. This is very rough
description but it isn't needed to go into more detail for the sake of the
argument.)

Now the main query:

david=# explain analyze select fk, sum(f) from facts where fk in (select fk
from random_fk_dupl) group by 1;
  QUERY
PLAN
--
 HashAggregate  (cost=892.82..1017.34 rows=9962 width=15) (actual
time=22.752..22.791 rows=100 loops=1)
   Group Key: facts.fk
   -  Nested Loop  (cost=167.01..842.63 rows=10038 width=15) (actual
time=2.167..16.739 rows=9807 loops=1)
 -  HashAggregate  (cost=166.59..167.59 rows=100 width=4) (actual
time=2.153..2.193 rows=100 loops=1)
   Group Key: random_fk_dupl.fk
   -  Seq Scan on random_fk_dupl  (cost=0.00..142.07 rows=9807
width=4) (actual time=0.003..0.688 rows=9807 loops=1)
 -  Index Scan using facts_fk_idx on facts  (cost=0.42..5.75
rows=100 width=15) (actual time=0.009..0.117 rows=98 loops=100)
   Index Cond: (fk = random_fk_dupl.fk)
 Planning time: 0.372 ms
 Execution time: 22.842 ms

This is the plan I would expect given the estimates (quite good except the
last aggregation) for this query and it exactly corresponds to what's
written in the README of optimizer:

{quote}
The naive way to join two relations using a clause like WHERE A.X = B.Y
is to generate a nestloop plan like this:

NestLoop
Filter: A.X = B.Y
- Seq Scan on A
- Seq Scan on B

We can make this better by using a merge or hash join, but it still
requires scanning all of both input relations.  If A is very small and B is
very large, but there is an index on B.Y, it can be enormously better to do
something like this:

NestLoop
- Seq Scan on A
- Index Scan using B_Y_IDX on B
Index Condition: B.Y = A.X

Here, we are expecting that for each row scanned from A, the nestloop
plan node will pass down the current value of A.X into the scan of B.
That allows the indexscan to treat A.X as a constant for any one
invocation, and thereby use it as an index key.  This is the only plan type
that can avoid fetching all of B, and for small numbers of rows coming from
A, that will dominate every other consideration.  (As A gets larger, this
gets less attractive, and eventually a merge or hash join will win instead.
So we have to cost out all the alternatives to decide what to do.)
{quote}

So far so good. Now let's use 'random_fk_uniq' instead of 'random_fk_dupl'.
I would expect almost the same plan (with just cheaper inner
HashAggregate), because the optimizer knows that thar there are (at most)
100 values in 'random_fk_uniq' so nested loop is clearly the best choice.
Instead we get this:

david=# explain analyze select fk, sum(f) from facts where fk in (select fk
from random_fk_uniq) group by 1;
  QUERY
PLAN
--
 HashAggregate  (cost=18198.11..18322.64 rows=9962 width=15) (actual
time=163.738..163.773 rows=100 loops=1)
   Group Key: facts.fk
   -  Hash Semi Join  (cost=3.25..18147.92 rows=10038 width=15) (actual
time=0.298..160.271 rows=9807 loops=1)
 Hash Cond: (facts.fk = random_fk_uniq.fk)
 -  Seq Scan on facts  (cost=0.00..15408.00 rows=100 width=15)
(actual time=0.010..66.524 rows=100 loops=1)
 -  Hash  (cost=2.00..2.00 rows=100 width=4) (actual
time=0.079..0.079 rows=100 loops=1)
   Buckets: 1024  Batches: 1  Memory Usage: 4kB
   -  Seq Scan on random_fk_uniq  (cost=0.00..2.00 rows=100
width=4) (actual time=0.003..0.028 rows=100 loops=1)
 Planning time: 

[HACKERS] Weirdly pesimistic estimates in optimizer

2015-03-03 Thread David Kubečka
Hi all,

I have encountered a performance problem with relatively simple query,
which I think is caused by the overly pesimistic estimates in optimizer. I
have originally run into this issue on a table few GBs large, but it can be
reproduced with much smaller table as follows:

-- Setup main fact table
create table facts (fk int, f numeric);
insert into facts select (random()*1)::int, random()*1 from
generate_series(1,100) g(i);
create index on facts (fk);
vacuum analyze facts;

-- Pick 100 values of 'fk' which cover roughly 1/100 rows from the 'facts'
table and select all corresponding values
create table random_fk_dupl as select fk from facts where fk in (select
(random()*1)::int from generate_series(1,100) g(i));
vacuum analyze random_fk_dupl;

-- Create a table with unique values from 'random_fk_dupl'
create table random_fk_uniq as select distinct fk from random_fk_dupl;
vacuum analyze random_fk_uniq;

(The motivation for doing this is that I have a persistent table/cache with
aggregated values and I want to update this table whenever new data are
loaded to 'facts' by processing only loaded data. This is very rough
description but it isn't needed to go into more detail for the sake of the
argument.)

Now the main query:

david=# explain analyze select fk, sum(f) from facts where fk in (select fk
from random_fk_dupl) group by 1;
  QUERY
PLAN
--
 HashAggregate  (cost=892.82..1017.34 rows=9962 width=15) (actual
time=22.752..22.791 rows=100 loops=1)
   Group Key: facts.fk
   -  Nested Loop  (cost=167.01..842.63 rows=10038 width=15) (actual
time=2.167..16.739 rows=9807 loops=1)
 -  HashAggregate  (cost=166.59..167.59 rows=100 width=4) (actual
time=2.153..2.193 rows=100 loops=1)
   Group Key: random_fk_dupl.fk
   -  Seq Scan on random_fk_dupl  (cost=0.00..142.07 rows=9807
width=4) (actual time=0.003..0.688 rows=9807 loops=1)
 -  Index Scan using facts_fk_idx on facts  (cost=0.42..5.75
rows=100 width=15) (actual time=0.009..0.117 rows=98 loops=100)
   Index Cond: (fk = random_fk_dupl.fk)
 Planning time: 0.372 ms
 Execution time: 22.842 ms

This is the plan I would expect given the estimates (quite good except the
last aggregation) for this query and it exactly corresponds to what's
written in the README of optimizer:

{quote}
The naive way to join two relations using a clause like WHERE A.X = B.Y
is to generate a nestloop plan like this:

NestLoop
Filter: A.X = B.Y
- Seq Scan on A
- Seq Scan on B

We can make this better by using a merge or hash join, but it still
requires scanning all of both input relations.  If A is very small and B is
very large, but there is an index on B.Y, it can be enormously better to do
something like this:

NestLoop
- Seq Scan on A
- Index Scan using B_Y_IDX on B
Index Condition: B.Y = A.X

Here, we are expecting that for each row scanned from A, the nestloop
plan node will pass down the current value of A.X into the scan of B.
That allows the indexscan to treat A.X as a constant for any one
invocation, and thereby use it as an index key.  This is the only plan type
that can avoid fetching all of B, and for small numbers of rows coming from
A, that will dominate every other consideration.  (As A gets larger, this
gets less attractive, and eventually a merge or hash join will win instead.
So we have to cost out all the alternatives to decide what to do.)
{quote}

So far so good. Now let's use 'random_fk_uniq' instead of 'random_fk_dupl'.
I would expect almost the same plan (with just cheaper inner
HashAggregate), because the optimizer knows that thar there are (at most)
100 values in 'random_fk_uniq' so nested loop is clearly the best choice.
Instead we get this:

david=# explain analyze select fk, sum(f) from facts where fk in (select fk
from random_fk_uniq) group by 1;
  QUERY
PLAN
--
 HashAggregate  (cost=18198.11..18322.64 rows=9962 width=15) (actual
time=163.738..163.773 rows=100 loops=1)
   Group Key: facts.fk
   -  Hash Semi Join  (cost=3.25..18147.92 rows=10038 width=15) (actual
time=0.298..160.271 rows=9807 loops=1)
 Hash Cond: (facts.fk = random_fk_uniq.fk)
 -  Seq Scan on facts  (cost=0.00..15408.00 rows=100 width=15)
(actual time=0.010..66.524 rows=100 loops=1)
 -  Hash  (cost=2.00..2.00 rows=100 width=4) (actual
time=0.079..0.079 rows=100 loops=1)
   Buckets: 1024  Batches: 1  Memory Usage: 4kB
   -  Seq Scan on random_fk_uniq  (cost=0.00..2.00 rows=100
width=4) (actual time=0.003..0.028 rows=100 loops=1)
 Planning time: 

Re: [HACKERS] autogenerated column names + views are a dump hazard

2015-03-03 Thread Andres Freund
On 2015-03-03 09:39:03 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  For this case it seems easiest if we'd make get_rule_expr() (and some of
  its helpers) return whether an implicit cast has been added.

 Aside from being pretty ugly, that doesn't seem particularly
 bulletproof.

Right.

 It might deal all right with this specific manifestation, but now that
 we've seen the problem, who's to say that there aren't other ways to get
 bit? Or that some innocent future change to FigureColname() might not
 create a new one?  It's not like the behavior of that function was handed
 down on stone tablets --- we do change it from time to time.

Sure it'd not be a guarantee, but what is?

 I wondered whether there was a way to directly test whether
 FigureColname() gives a different result from what we have recorded
 as the effective column name.  Unfortunately, it wants a raw parse tree
 whereas what we have is an analyzed parse tree, so there's no easy way
 to apply it and see.

Additionally the casts added by get_rule_expr() show up in neither tree,
so that'd not in itself help.

 Now, we do have the deparsed text of the column expression at hand,
 so in principle you could run that back through the grammar to get a raw
 parse tree, and then apply FigureColname() to that.  Not sure what that
 would be like from a performance standpoint, but it would provide a pretty
 robust fix for this class of problem.

Yea, I've wondered about that as well. Given that this is pretty much
only run during deparsing I don't think the overhead would be relevant.

 And on the third hand ... doing that would really only be robust as long
 as you assume that the output will be read by a server using exactly the
 same FigureColname() logic as what we are using.  So maybe the whole idea
 is a bad one, and we should just bite the bullet and print AS clauses
 always.

I think this is the way to go though. There's different extremes we can
go to though - the easiest is to simply remove the attname = ?column?
assignment from get_target_list(). That way plain Var references (aside
from whole row vars/subplans and such that get_variable() deals with)
don't get a forced alias, but everything else does.  That seems like a
good compromise between readability and safety. get_rule_expr() deals
with most of the nasty stuff.

I did this, and the noise in the regression test output isn't too
bad. Primarily that a fair number of of EXISTS(SELECT 1 .. ) grow an
alias.

 Or we could consider printing them always if the pretty flag isn't
 on.  IIRC, pg_dump doesn't enable that.

Not a fan of that, I'd rather not output queries that can't be executed.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade and rsync

2015-03-03 Thread Bruce Momjian
On Tue, Mar  3, 2015 at 04:55:56PM +0300, Vladimir Borodin wrote:
 OK, hmmm.  Thanks for testing.  It feels like you didn't have your new
 master set up for streaming replication when you ran pg_upgrade.  Is
 that correct?  Should I specify that specifically in the instructions?
 
 
 After running pg_upgrade I do put in new PGDATA on master old pg_hba.conf and
 postgresql.conf with wal_level = hot_standby. The full content of
 postgresql.conf could be seen here - http://pastie.org/9995902. Then I do 
 rsync
 to replica, put recovery.conf and try to start both - first master, then
 replica. If I turn off hot_standby in replica configuration, it starts. What 
 am
 I doing wrong?

After running initdb to create the new master, but before running
pg_upgrade, modify the new master's postgresql.conf and change wal_level
= hot_standby.  (Don't modify pg_hba.conf at this stage.)

I didn't think that was necessary, but this might be some 9.3-specific
problem, but let's get it working first.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and logical decoding

2015-03-03 Thread Andres Freund
On 2015-02-25 14:04:55 -0800, Peter Geoghegan wrote:
 On Wed, Feb 25, 2015 at 3:26 AM, Andres Freund and...@2ndquadrant.com wrote:
  I'm pretty sure this will entirely fail if you have a transaction that's
  large enough to spill to disk. Calling ReorderBufferIterTXNNext() will
  reuse the memory for the in memory changes.
 
  It's also borked because it skips a large number of checks for the next
  change. You're entirely disregarding what the routine does otherwise -
  it could be a toast record, it could be a change to a system table, it
  could be a new snapshot...
 
  Also, this seems awfully fragile in th presence of toast rows and such?
  Can we be entirely sure that the next WAL record logged by that session
  would be the internal super deletion?
 
 toast_save_datum() is called with a heap_insert() call before heap
 insertion for the tuple proper. We're relying on the assumption that
 if there is no immediate super deletion record, things are fine. We
 cannot speculatively insert into catalogs, BTW.

I fail to see what prohibits e.g. another catalog modifying transaction
to commit inbetween and distribute a new snapshot.

 Why should we see a REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID case
 next, in the case that we need to care about (when the tuple was super
 deleted immediately afterwards)?

It's irrelevant whether you care about it or not. Your
ReorderBufferIterTXNNext() consumes a change that needs to actually be
processed. It could very well be something important like a new
snapshot.

 As for the idea of writing WAL separately, I don't think it's worth
 it. We'd need to go exclusive lock the buffer again, which seems like
 a non-starter.

Meh^2. That won't be the most significant cost of an UPSERT.

 While what I have here *is* very ugly, I see no better alternative. By
 doing what you suggest, we'd be finding a special case way of safely
 violating the general WAL log-before-data rule.

No, it won't. We don't WAL log modifications that don't need to persist
in a bunch of places. It'd be perfectly fine to start upsert with a
'acquiring a insertion lock' record that pretty much only contains the
item pointer (and a FPI if necessary to prevent torn pages). And then
only log the full new record once it's sure that the whole thing needs
to survive.  Redo than can simply clean out the size touched by the
insertion lock.

 Performance aside, that seems like a worse, less isolated kludge...no?

No. I'm not sure it'll come up significantly better, but I don't see it
coming up much worse. WAL logging where the result of the WAL logged
action essentially can't be determined until many records later aren't
pretty.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade and rsync

2015-03-03 Thread Vladimir Borodin

 2 марта 2015 г., в 21:28, Bruce Momjian br...@momjian.us написал(а):
 
 On Tue, Feb 24, 2015 at 12:13:17PM +0300, Vladimir Borodin wrote:
 
20 февр. 2015 г., в 18:21, Bruce Momjian br...@momjian.us написал(а):
 
On Fri, Feb 20, 2015 at 09:45:08AM -0500, Bruce Momjian wrote:
 
#3 bothered me as well because it was not specific enough.  I like
what
you've added to clarify the procedure.
 
 
Good.  It took me a while to understand why they have to be in sync 
 ---
because we are using rsync in size-only-comparison mode, if they are
not
in sync we might update some files whose sizes changed, but not 
 others,
and the old slave would be broken.  The new slave is going to get all
new files or hard links for user files, so it would be fine, but we
should be able to fall back to the old slaves, and having them in sync
allows that.
 
 
Also, since there was concern about the instructions, I am thinking of
applying the patch only to head for 9.5, and then blog about it if
people want to test it.
 
 
 Am I right that if you are using hot standby with both streaming replication
 and WAL shipping you do still need to take full backup of master after using
 pg_upgrade?
 
 No, you would not need to take a full backup if you use these instructions.

Although it would be applied to documentation for 9.5 only, are these 
instructions applicable for upgrading from 9.3.6 to 9.4.1?

Following the instructions from patch I’ve got following errors in 
postgresql.log of replica after trying to start it with hot_standby = on:

 2015-02-24 11:47:22.861 MSK WARNING:  WAL was generated with 
wal_level=minimal, data may be missing
 2015-02-24 11:47:22.861 MSK HINT:  This happens if you temporarily set 
wal_level=minimal without taking a new base backup.
 2015-02-24 11:47:22.861 MSK FATAL:  hot standby is not possible because 
wal_level was not set to hot_standby or higher on the master server
 2015-02-24 11:47:22.861 MSK HINT:  Either set wal_level to hot_standby on 
the master, or turn off hot_standby here.
 2015-02-24 11:47:22.862 MSK LOG:  startup process (PID 28093) exited with 
exit code 1
 2015-02-24 11:47:22.862 MSK LOG:  aborting startup due to startup process 
failure

 
 -- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com
 
  + Everyone has their own god. +


--
Да пребудет с вами сила…
https://simply.name/ru



Re: [HACKERS] Join push-down support for foreign tables

2015-03-03 Thread Shigeru Hanada
Thanks for the detailed comments.

2015-03-03 18:01 GMT+09:00 Kouhei Kaigai kai...@ak.jp.nec.com:
 Hanada-san,

 I checked the patch, below is the random comments from my side.

 * Context variables
 ---
 Sorry, I might give you a wrong suggestion.
 The foreign_glob_cxt and deparse_expr_cxt were re-defined as follows:

 typedef struct foreign_glob_cxt
  {
 PlannerInfo *root;  /* global planner state */
 -   RelOptInfo *foreignrel; /* the foreign relation we are 
 planning
 +   RelOptInfo *outerrel;   /* the foreign relation, or outer 
 child
 +   RelOptInfo *innerrel;   /* inner child, only set for join */
  } foreign_glob_cxt;

  /*
 @@ -86,9 +89,12 @@ typedef struct foreign_loc_cxt
  typedef struct deparse_expr_cxt
  {
 PlannerInfo *root;  /* global planner state */
 -   RelOptInfo *foreignrel; /* the foreign relation we are 
 planning
 +   RelOptInfo *outerrel;   /* the foreign relation, or outer 
 child
 +   RelOptInfo *innerrel;   /* inner child, only set for join */
 StringInfo  buf;/* output buffer to append to 
 */
 List  **params_list;/* exprs that will become remote 
 Params
 +   ForeignScan *outerplan; /* outer child's ForeignScan node */
 +   ForeignScan *innerplan; /* inner child's ForeignScan node */
  } deparse_expr_cxt;

 However, the outerrel does not need to have double-meaning.

 RelOptInfo-reloptkind gives us information whether the target
 relation is base-relation or join-relation.
 So, foreign_expr_walker() can be implemented as follows:
 if (bms_is_member(var-varno, glob_cxt-foreignrel-relids) 
 var-varlevelsup == 0)
 {
 :
 also, deparseVar() can checks relation type using:
 if (context-foreignrel-reloptkind == RELOPT_JOINREL)
 {
 deparseJoinVar(...);

 In addition, what we need in deparse_expr_cxt are target-list of
 outer and inner relation in deparse_expr_cxt.
 How about to add inner_tlist/outer_tlist instead of innerplan and
 outerplan in deparse_expr_cxt?
 The deparseJoinVar() references these fields, but only target list.

Ah, I've totally misunderstood your suggestion.  Now I reverted my
changes and use target lists to know whether the var came from either
of the relations.


 * GetForeignJoinPath method of FDW
 --
 It should be portion of the interface patch, so I added these
 enhancement of FDW APIs with documentation updates.
 Please see the newer version of foreign/custom-join interface patch.

Agreed.

 * enable_foreignjoin parameter
 --
 I'm uncertain whether we should have this GUC parameter that affects
 to all FDW implementation. Rather than a built-in one, my preference
 is an individual GUC variable defined with DefineCustomBoolVariable(),
 by postgres_fdw.
 Pros: It offers user more flexible configuration.
 Cons: Each FDW has to implement this GUC by itself?

Hum...
In a sense, I added this GUC parameter for debugging purpose.  As you
pointed out, users might want to control join push-down feature
per-FDW.  I'd like to hear others' opinion.


 * syntax violated query if empty targetlist
 ---
 At least NULL shall be injected if no columns are referenced.
 Also, add a dummy entry to fdw_ps_tlist to fit slot tuple descriptor.

 postgres=# explain verbose select NULL from ft1,ft2 where aid=bid;
  QUERY PLAN
 ---
  Foreign Scan  (cost=100.00..129.25 rows=2925 width=0)
Output: NULL::unknown
Remote SQL: SELECT  FROM (SELECT bid, NULL FROM public.t2) l (a_0, a_1) 
 INNER
  JOIN (SELECT aid, NULL FROM public.t1) r (a_0, a_1)  ON ((r.a_0 = l.a_0))

Fixed.

 * Bug reported by Thom Brown
 -
 # EXPLAIN VERBOSE SELECT NULL FROM (SELECT people.id FROM people INNER JOIN 
 countries ON people.country_id = countries.id LIMIT 3) x;
 ERROR:  could not open relation with OID 0

 Sorry, it was a problem caused by my portion. The patched setrefs.c
 checks fdw_/custom_ps_tlist to determine whether Foreign/CustomScan
 node is associated with a certain base relation. If *_ps_tlist is
 valid, it also expects scanrelid == 0.
 However, things we should check is incorrect. We may have a case
 with empty *_ps_tlist if remote join expects no columns.
 So, I adjusted the condition to check scanrelid instead.

Is this issue fixed by v5 custom/foreign join patch?

 * make_tuple_from_result_row() call
 
 The 4th argument (newly added) is referenced without NULL checks,
 however, store_returning_result() and analyze_row_processor() put
 NULL on this argument, then it leads segmentation fault.
 RelationGetDescr(fmstate-rel or astate-rel) should be given on
 the caller.

Fixed.



Re: [HACKERS] Patch: raise default for max_wal_segments to 1GB

2015-03-03 Thread Fujii Masao
On Tue, Mar 3, 2015 at 4:25 PM, Heikki Linnakangas hlinn...@iki.fi wrote:
 On 03/03/2015 01:43 AM, Andres Freund wrote:

 On 2015-03-02 15:40:27 -0800, Josh Berkus wrote:

 ! #max_wal_size = 1GB   # in logfile segments


 Independent of the rest of the changes, the in logfile segments bit
 should probably be changed.


 The base unit is still logfile segments, so if you write just
 max_wal_size = 10 without a unit, that means 160MB. That's what the
 comment means.

 Is it needed? Many settings have a similar comment, but most don't. Most of
 the ones that do have a magic value 0, -1 as default, so that it's not
 obvious from the default what the unit is. For example:

 #tcp_keepalives_idle = 0# TCP_KEEPIDLE, in seconds;
 # 0 selects the system default
 #tcp_keepalives_interval = 0# TCP_KEEPINTVL, in seconds;
 # 0 selects the system default
 #temp_file_limit = -1   # limits per-session temp file space
 # in kB, or -1 for no limit
 #commit_delay = 0   # range 0-10, in microseconds
 #log_temp_files = -1# log temporary files equal or
 larger
 # than the specified size in
 kilobytes;
 # -1 disables, 0 logs all temp files
 #statement_timeout = 0  # in milliseconds, 0 is disabled
 #lock_timeout = 0   # in milliseconds, 0 is disabled
 #wal_keep_segments = 0  # in logfile segments, 16MB each; 0 disables

 But there are also:

 #wal_receiver_timeout = 60s # time that receiver waits for
 # communication from master
 # in milliseconds; 0 disables
 #autovacuum_vacuum_cost_delay = 20ms# default vacuum cost delay for
 # autovacuum, in milliseconds;
 # -1 means use vacuum_cost_delay


 I propose that we remove the comment from max_wal_size, and also remove the
 in milliseconds from wal_receiver_timeout and
 autovacuum_vacuum_cost_delay.

Seems OK to me. BTW, we can also remove in milliseconds from
wal_sender_timeout.

 (and we should also change wal_keep_segments to accept MB/GB, as discussed
 already)

+1

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New CF app deployment

2015-03-03 Thread Bruce Momjian
On Sun, Feb 22, 2015 at 03:12:12PM -0500, Magnus Hagander wrote:
                  # Attempt to identify the file using magic information
                  mtype = mag.buffer(contents)
                  if mtype.startswith('text/x-diff'):
                          a.ispatch = True
                  else:
                          a.ispatch = False
...
 
 I think the old system where the patch submitter declared, this message
 contains my patch, is the only one that will work.
 
 
 
 Would you suggest removing the automated system completely, or keep it around
 and just make it possible to override it (either by removing the note that
 something is a patch, or by making something that's not listed as a patch
 become marked as such)?

One counter-idea would be to assume every attachment is a patch _unless_
the attachment type matches a pattern that identifies it as not a patch.

However, I agree with Tom that we should go a little longer before
changing it.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerated column names + views are a dump hazard

2015-03-03 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2015-03-03 09:39:03 -0500, Tom Lane wrote:
 And on the third hand ... doing that would really only be robust as long
 as you assume that the output will be read by a server using exactly the
 same FigureColname() logic as what we are using.  So maybe the whole idea
 is a bad one, and we should just bite the bullet and print AS clauses
 always.

 I think this is the way to go though. There's different extremes we can
 go to though - the easiest is to simply remove the attname = ?column?
 assignment from get_target_list(). That way plain Var references (aside
 from whole row vars/subplans and such that get_variable() deals with)
 don't get a forced alias, but everything else does.  That seems like a
 good compromise between readability and safety. get_rule_expr() deals
 with most of the nasty stuff.

I wasn't aware that there was any room for compromise on the safety
aspect.  If pg_dump gets this wrong, that means pg_upgrade is broken,
for example.  That's why I was thinking that relying on the pretty
flag might be a reasonable thing.  pg_dump would be unconditionally
right, and we'd not uglify EXPLAIN or \d+ output.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New CF app deployment

2015-03-03 Thread Bruce Momjian
On Tue, Mar  3, 2015 at 10:58:28AM -0500, Bruce Momjian wrote:
  Would you suggest removing the automated system completely, or keep it 
  around
  and just make it possible to override it (either by removing the note that
  something is a patch, or by making something that's not listed as a patch
  become marked as such)?
 
 One counter-idea would be to assume every attachment is a patch _unless_
 the attachment type matches a pattern that identifies it as not a patch.
 
 However, I agree with Tom that we should go a little longer before
 changing it.

Also, can we look inside the attachment to see if it starts with
'diff' followed by a space?
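
As a rough sketch (not the CF app's code; the prefix list is a guess, and the
attachment is assumed to be text), the check could look like this:

    def starts_like_diff(contents):
        # Unified/context diffs and git patches typically begin with one of
        # these prefixes; anything else is treated as "not a patch".
        head = contents.lstrip()[:16]
        return head.startswith(('diff ', 'Index: ', '--- ', 'From '))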

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Maximum number of WAL files in the pg_xlog directory

2015-03-03 Thread Bruce Momjian
On Tue, Oct 14, 2014 at 01:21:53PM -0400, Bruce Momjian wrote:
 On Tue, Oct 14, 2014 at 09:20:22AM -0700, Jeff Janes wrote:
  On Mon, Oct 13, 2014 at 12:11 PM, Bruce Momjian br...@momjian.us wrote:
  
  
  I looked into this, and came up with more questions.  Why is
  checkpoint_completion_target involved in the total number of WAL
  segments?  If checkpoint_completion_target is 0.5 (the default), the
  calculation is:
  
          (2 + 0.5) * checkpoint_segments + 1
  
  while if it is 0.9, it is:
  
          (2 + 0.9) * checkpoint_segments + 1
  
  Is this trying to estimate how many WAL files are going to be created
  during the checkpoint?  If so, wouldn't it be (1 +
  checkpoint_completion_target), not 2 +.  My logic is you have the old
  WAL files being checkpointed (that's the 1), plus you have new WAL
  files being created during the checkpoint, which would be
  checkpoint_completion_target * checkpoint_segments, plus one for the
  current WAL file.
  
  
  WAL is not eligible to be recycled until there have been 2 successful
  checkpoints.
  
  So at the end of a checkpoint, you have 1 cycle of WAL which has just become
  eligible for recycling,
  1 cycle of WAL which is now expendable but which is kept anyway, and
  checkpoint_completion_target worth of WAL which has occurred while the
  checkpoint was occurring and is still needed for crash recovery.
 
 OK, so based on this analysis, what is the right calculation?  This?
 
   (1 + checkpoint_completion_target) * checkpoint_segments + 1 +
   max(wal_keep_segments, checkpoint_segments)

Now that we have min_wal_size and max_wal_size in 9.5, I don't see any
value to figuring out the proper formula for backpatching.
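
For what it's worth, plugging 9.4-era defaults into the two estimates above
(a throwaway illustration; the parameter values are just the defaults, and
the result is in 16MB segments):

    checkpoint_segments = 3
    checkpoint_completion_target = 0.5
    wal_keep_segments = 0

    # Estimate as currently documented.
    old_estimate = (2 + checkpoint_completion_target) * checkpoint_segments + 1

    # Estimate proposed in the quoted text above.
    new_estimate = ((1 + checkpoint_completion_target) * checkpoint_segments + 1
                    + max(wal_keep_segments, checkpoint_segments))

    print(old_estimate, new_estimate)  # 8.5 8.5 -- the two coincide with defaults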

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerated column names + views are a dump hazard

2015-03-03 Thread Andres Freund
On 2015-03-03 11:09:29 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2015-03-03 09:39:03 -0500, Tom Lane wrote:
  I think this is the way to go though. There's different extremes we can
  go to though - the easiest is to simply remove the attname = ?column?
  assignment from get_target_list(). That way plain Var references (aside
  from whole row vars/subplans and such that get_variable() deals with)
  don't get a forced alias, but everything else does.  That seems like a
  good compromise between readability and safety. get_rule_expr() deals
  with most of the nasty stuff.
 
 I wasn't aware that there was any room for compromise on the safety
 aspect.

Meh.

I think that *not* printing an alias for a plain column reference is
reasonable and doesn't present a relevant risk even in the face of
future changes. foo.bar AS bar is just a bit too redundant imo.

 That's why I was thinking that relying on the pretty flag might be a
 reasonable thing.  pg_dump would be unconditionally right, and we'd
 not uglify EXPLAIN or \d+ output.

I don't think EXPLAIN VERBOSE currently is affected - it doesn't seem to
show aliases/generated column names at all.

I personally do occasionally copy & paste from \d+ output, so I'd
rather have slightly more verbose output rather than wrong output.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Join push-down support for foreign tables

2015-03-03 Thread Thom Brown
On 3 March 2015 at 12:34, Shigeru Hanada shigeru.han...@gmail.com wrote:
 I rebased join push-down patch onto Kaigai-san's Custom/Foreign Join
 v6 patch.  I posted some comments to v6 patch in this post:

 http://www.postgresql.org/message-id/CAEZqfEcNvjqq-P=jxnw1pb4t9wvpcporcn7g6cc46jgub7d...@mail.gmail.com

 Before applying my v3 patch, please apply Kaigai-san's v6 patch and my
 mod_cjv6.patch.
 Sorry for the complex patch combination.  Those patches will be arranged
 soon by Kaigai-san and me.

 I fixed the issues pointed out by Thom and Kohei, but still the patch
 has an issue about joins underlying UPDATE or DELETE.  Now I'm working
 on fixing this issue.  Besides this issue, existing regression test
 passed.

Re-tested the broken query and it works for me now.

-- 
Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade and rsync

2015-03-03 Thread Bruce Momjian
On Tue, Mar  3, 2015 at 08:38:50AM -0500, Bruce Momjian wrote:
   2015-02-24 11:47:22.861 MSK WARNING:  WAL was generated with wal_level=
  minimal, data may be missing
   2015-02-24 11:47:22.861 MSK HINT:  This happens if you temporarily set
  wal_level=minimal without taking a new base backup.
   2015-02-24 11:47:22.861 MSK FATAL:  hot standby is not possible because
  wal_level was not set to hot_standby or higher on the master server
   2015-02-24 11:47:22.861 MSK HINT:  Either set wal_level to hot_standby 
  on
  the master, or turn off hot_standby here.
   2015-02-24 11:47:22.862 MSK LOG:  startup process (PID 28093) exited with
  exit code 1
   2015-02-24 11:47:22.862 MSK LOG:  aborting startup due to startup process
  failure
 
 OK, hmmm.  Thanks for testing.  It feels like you didn't have your new
 master set up for streaming replication when you ran pg_upgrade.  Is
 that correct?  Should I specify that specifically in the instructions?

Actually, I think you are on to something that needs to be documented. 
Because the old and new clusters might be using the same port number,
you can't configure the new master to use streaming replication because
you can't be shipping those logs to the old standby.  Yikes.  OK, I
think we need to document that you need to set wal_level=hot_standby on
the new master, but not set up streaming.  Once you are done with the
upgrade, you should configure streaming.

If this fixes the problem, I will generate an updated documentation
patch.  Please let me know.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade and rsync

2015-03-03 Thread Bruce Momjian
On Tue, Mar  3, 2015 at 11:38:58AM +0300, Vladimir Borodin wrote:
 No, you would not need to take a full backup if you use these 
 instructions.
 
 
 Although it would be applied to documentation for 9.5 only, are these
 instructions applicable for upgrading from 9.3.6 to 9.4.1?

Yes.  They work all the way back to 9.0.

 Following the instructions from patch I’ve got following errors in
 postgresql.log of replica after trying to start it with hot_standby = on:
 
  2015-02-24 11:47:22.861 MSK WARNING:  WAL was generated with wal_level=
 minimal, data may be missing
  2015-02-24 11:47:22.861 MSK HINT:  This happens if you temporarily set
 wal_level=minimal without taking a new base backup.
  2015-02-24 11:47:22.861 MSK FATAL:  hot standby is not possible because
 wal_level was not set to hot_standby or higher on the master server
  2015-02-24 11:47:22.861 MSK HINT:  Either set wal_level to hot_standby on
 the master, or turn off hot_standby here.
  2015-02-24 11:47:22.862 MSK LOG:  startup process (PID 28093) exited with
 exit code 1
  2015-02-24 11:47:22.862 MSK LOG:  aborting startup due to startup process
 failure

OK, hmmm.  Thanks for testing.  It feels like you didn't have your new
master set up for streaming replication when you ran pg_upgrade.  Is
that correct?  Should I specify that specifically in the instructions?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idea: closing the loop for pg_ctl reload

2015-03-03 Thread Jim Nasby

On 3/3/15 9:26 AM, Andres Freund wrote:

On 2015-03-03 15:21:24 +, Greg Stark wrote:

Fwiw this concerns me slightly. I'm sure a lot of people are doing
things like kill -HUP `cat .../postmaster.pid` or the equivalent.


postmaster.pid already contains considerably more than just the pid. e.g.
4071
/srv/dev/pgdev-master
1425396089
5440
/tmp
localhost
   5440001  82345992


If we already have all this extra stuff, why not include an actual error 
message then, or at least the first line of an error (or maybe just swap 
any newlines with spaces)?

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idea: closing the loop for pg_ctl reload

2015-03-03 Thread Jan de Visser
On March 3, 2015 10:29:43 AM Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2015-03-03 15:21:24 +, Greg Stark wrote:
  Fwiw this concerns me slightly. I'm sure a lot of people are doing
  things like kill -HUP `cat .../postmaster.pid` or the equivalent.
  
  postmaster.pid already contains considerably more than just the pid. e.g.
 
 Yeah, that ship sailed long ago.  I'm sure anyone who's doing this is
 using head -1 not just cat, and what I suggest wouldn't break that.
 
   regards, tom lane

And what I've implemented doesn't either. The extra line is added to the back 
of the file.
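
A minimal sketch (the data directory path is assumed) of why that stays
compatible with scripts that only read the first line for the PID:

    # Only the first line of postmaster.pid carries the PID; a status line
    # appended at the end of the file is simply never read here.
    with open('/srv/dev/pgdev-master/postmaster.pid') as f:  # path assumed
        pid = int(f.readline().strip())
    print(pid)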


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idea: closing the loop for pg_ctl reload

2015-03-03 Thread Jan de Visser
On March 3, 2015 11:09:29 AM Jim Nasby wrote:
 On 3/3/15 9:26 AM, Andres Freund wrote:
  On 2015-03-03 15:21:24 +, Greg Stark wrote:
  Fwiw this concerns me slightly. I'm sure a lot of people are doing
  things like kill -HUP `cat .../postmaster.pid` or the equivalent.
  
  postmaster.pid already contains considerably more than just the pid. e.g.
  4071
  /srv/dev/pgdev-master
  1425396089
  5440
  /tmp
  localhost
  
 5440001  82345992
 
 If we already have all this extra stuff, why not include an actual error
 message then, or at least the first line of an error (or maybe just swap
 any newlines with spaces)?

Not impossible. I can play around with that and see if it's as straightforward 
as I think it is.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical column ordering

2015-03-03 Thread Bruce Momjian
On Thu, Feb 26, 2015 at 01:55:44PM -0800, Josh Berkus wrote:
 On 02/26/2015 01:54 PM, Alvaro Herrera wrote:
  This patch decouples these three things so that they
  can be changed freely -- but provides no user interface to do so.  I think
  that trying to only decouple the thing we currently have in two pieces,
  and then have a subsequent patch to decouple again, is additional
  conceptual complexity for no gain.
 
 Oh, I didn't realize there weren't commands to change the LCO.  Without
 at least SQL syntax for LCO, I don't see why we'd take it; this sounds
 more like a WIP patch.

FYI, pg_upgrade is going to need pg_dump --binary-upgrade to output the
columns in physical order with some logical ordering information, i.e.
pg_upgrade cannot be passed only logical ordering from pg_dump.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical column ordering

2015-03-03 Thread Jim Nasby

On 3/3/15 11:23 AM, Bruce Momjian wrote:

On Thu, Feb 26, 2015 at 01:55:44PM -0800, Josh Berkus wrote:

On 02/26/2015 01:54 PM, Alvaro Herrera wrote:

This patch decouples these three things so that they
can be changed freely -- but provides no user interface to do so.  I think
that trying to only decouple the thing we currently have in two pieces,
and then have a subsequent patch to decouple again, is additional
conceptual complexity for no gain.


Oh, I didn't realize there weren't commands to change the LCO.  Without
at least SQL syntax for LCO, I don't see why we'd take it; this sounds
more like a WIP patch.


FYI, pg_upgrade is going to need pg_dump --binary-upgrade to output the
columns in physical order with some logical ordering information, i.e.
pg_upgrade cannot be passed only logical ordering from pg_dump.


Wouldn't it need attno info too, so all 3 orderings?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical column ordering

2015-03-03 Thread Bruce Momjian
On Tue, Mar  3, 2015 at 11:24:38AM -0600, Jim Nasby wrote:
 On 3/3/15 11:23 AM, Bruce Momjian wrote:
 On Thu, Feb 26, 2015 at 01:55:44PM -0800, Josh Berkus wrote:
 On 02/26/2015 01:54 PM, Alvaro Herrera wrote:
 This patch decouples these three things so that they
 can be changed freely -- but provides no user interface to do so.  I think
 that trying to only decouple the thing we currently have in two pieces,
 and then have a subsequent patch to decouple again, is additional
 conceptual complexity for no gain.
 
 Oh, I didn't realize there weren't commands to change the LCO.  Without
 at least SQL syntax for LCO, I don't see why we'd take it; this sounds
 more like a WIP patch.
 
 FYI, pg_upgrade is going to need pg_dump --binary-upgrade to output the
 columns in physical order with some logical ordering information, i.e.
 pg_upgrade cannot be passed only logical ordering from pg_dump.
 
 Wouldn't it need attno info too, so all 3 orderings?

Uh, what is the third ordering?  Physical, logical, and ?  It already
gets information about dropped columns, if that is the third one.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idea: closing the loop for pg_ctl reload

2015-03-03 Thread Jim Nasby

On 3/3/15 11:15 AM, Jan de Visser wrote:

On March 3, 2015 11:09:29 AM Jim Nasby wrote:

On 3/3/15 9:26 AM, Andres Freund wrote:

On 2015-03-03 15:21:24 +, Greg Stark wrote:

Fwiw this concerns me slightly. I'm sure a lot of people are doing
things like kill -HUP `cat .../postmaster.pid` or the equivalent.


postmaster.pid already contains considerably more than just the pid. e.g.
4071
/srv/dev/pgdev-master
1425396089
5440
/tmp
localhost

5440001  82345992


If we already have all this extra stuff, why not include an actual error
message then, or at least the first line of an error (or maybe just swap
any newlines with spaces)?


Not impossible. I can play around with that and see if it's as straightforward
as I think it is.


I'm sure the code side of this is trivial; it's a question of why Tom 
was objecting. It would probably be better for us to come to a 
conclusion before working on this.


On a related note... something else we could do here would be to keep a 
last-known-good copy of the config files around. That way if you flubbed 
something at least the server would still start. I do think that any 
admin worth anything would notice an error from pg_ctl, but maybe others 
have a different opinion.
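
Purely as a sketch of that idea from the operator's side (paths and the data
directory below are made up), keeping a fallback copy around before a reload:

    import shutil
    import subprocess

    datadir = '/var/lib/postgresql/data'      # assumed location
    conf = datadir + '/postgresql.conf'

    # Keep a known-good copy before reloading.  As discussed in this thread,
    # pg_ctl reload does not currently report whether the new settings were
    # actually accepted, so the copy is the fallback if they were not.
    shutil.copy2(conf, conf + '.last_good')
    subprocess.run(['pg_ctl', 'reload', '-D', datadir], check=True)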

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioning WIP patch

2015-03-03 Thread Bruce Momjian
On Fri, Feb 27, 2015 at 09:09:35AM +0900, Amit Langote wrote:
 On 27-02-2015 AM 03:24, Andres Freund wrote:
  On 2015-02-26 12:15:17 +0900, Amit Langote wrote:
  On 26-02-2015 AM 05:15, Josh Berkus wrote:
  I would love to have it for 9.5, but I guess the
  patch isn't nearly baked enough for that?
  
  I'm not quite sure what would qualify as baked enough for 9.5 though we
  can surely try to reach some consensus on various implementation aspects
  and perhaps even get it ready in time for 9.5.
  
  I think it's absolutely unrealistic to get this into 9.5. There's barely
  been any progress on the current (last!) commitfest - where on earth
  should the energy come from to make this patch ready? And why would that be
  fair against all the others that have submitted in time?
  
 
 I realize and I apologize that it was irresponsible of me to have said
 that; maybe got a bit too excited. I do not want to unduly draw people's
 time on something that's not quite ready while there are other things
 people have worked hard on to get in time. In all earnestness, I say we
 spend time perfecting those things.
 
 I'll add this into CF-JUN'15. I will keep posting updates meanwhile so
 that when that commitfest finally starts, we will have something worth
 considering.

I am _very_ glad you have started on this.  There is a huge need for
this, and I am certainly excited about it.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Normalizing units for postgresql.conf WAS: Patch: raise default for max_wal_segments to 1GB

2015-03-03 Thread Josh Berkus
On 03/03/2015 10:37 AM, Heikki Linnakangas wrote:
 On 03/03/2015 08:31 PM, Josh Berkus wrote:
 On 03/03/2015 10:29 AM, Heikki Linnakangas wrote:
 On 03/03/2015 08:21 PM, Josh Berkus wrote:
 On 03/03/2015 10:15 AM, Josh Berkus wrote:
 On 03/02/2015 11:25 PM, Heikki Linnakangas wrote:
 I propose that we remove the comment from max_wal_size, and also
 remove
 the in milliseconds from wal_receiver_timeout and
 autovacuum_vacuum_cost_delay.

 +1


 Actually, let's be consistent about this.  It makes no sense to remove
 unit comments from some settings which accept ms but not others.

 Do we want to remove unit comments from all settings which accept
 MB,GB or ms,s,min?  There's more than a few.  I'd be in favor of
 this, but seems like (a) it should be universal, and (b) its own patch.

 I think it's a good rule that if the commented-out default in the sample
 file does not contain a unit, then the base unit is in the comment.
 Otherwise it's not. For example:

 #shared_buffers = 32MB# min 128kB
  # (change requires restart)

 The base unit is BLCKSZ, i.e. 8k, but people will usually use
 MB/GB. And that is evident from the default value, 32MB, so there's no
 need to mention it in the comment.

 #tcp_keepalives_idle = 0# TCP_KEEPIDLE, in seconds;
  # 0 selects the system default

 Here it's not obvious what the unit should be from the default itself.
 So the comment says it's in seconds.

 Sure.  Although, do we take (s) for tcp_keepalives_idle?  Or only an INT?
 
 It's a time unit, so you can say 10s or 1ms. If you don't
 specify a unit, it implies seconds.

So if we're going to make this consistent, let's make it consistent.

1. All GUCs which accept time/size units will have them on the default
setting.

2. Time/size comments will be removed, *except* from GUCs which do not
accept (ms/s/min) or (kB/MB/GB).

Argument for: the current inconsistency confuses new users and is
entirely historical in nature.

Argument Against: will create unnecessary diff changes between 9.4's
pg.conf and 9.5's pg.conf.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Comparing primary/HS standby in tests

2015-03-03 Thread Josh Berkus
On 03/03/2015 07:49 AM, Andres Freund wrote:
 I'd very much like to add an automated test like this to the tree, but I
 don't see a way to do that sanely without a comparison tool...

We could use a comparison tool anyway.  Baron Schwartz was pointing out
that Percona has a comparison tool for MySQL, and the amount of drift
and corruption that they find in a large replication cluster is
generally pretty alarming, and *always* present.  While our replication
isn't as flaky as MySQL's, networks are often lossy or corrupt.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: raise default for max_wal_segments to 1GB

2015-03-03 Thread Heikki Linnakangas

On 03/03/2015 08:21 PM, Josh Berkus wrote:

On 03/03/2015 10:15 AM, Josh Berkus wrote:

On 03/02/2015 11:25 PM, Heikki Linnakangas wrote:

I propose that we remove the comment from max_wal_size, and also remove
the in milliseconds from wal_receiver_timeout and
autovacuum_vacuum_cost_delay.


+1



Actually, let's be consistent about this.  It makes no sense to remove
unit comments from some settings which accept ms but not others.

Do we want to remove unit comments from all settings which accept
MB,GB or ms,s,min?  There's more than a few.  I'd be in favor of
this, but seems like (a) it should be universal, and (b) its own patch.


I think it's a good rule that if the commented-out default in the sample 
file does not contain a unit, then the base unit is in the comment. 
Otherwise it's not. For example:


#shared_buffers = 32MB  # min 128kB
# (change requires restart)

The base unit is BLCKSZ, i.e. 8k, but people will usually use
MB/GB. And that is evident from the default value, 32MB, so there's no 
need to mention it in the comment.


#tcp_keepalives_idle = 0# TCP_KEEPIDLE, in seconds;
# 0 selects the system default

Here it's not obvious what the unit should be from the default itself. 
So the comment says it's in seconds.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission

2015-03-03 Thread Jim Nasby

On 3/3/15 9:08 AM, Greg Stark wrote:

On Mon, Jun 30, 2014 at 8:06 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote:

After sleeping on it, I realised that the code would return '{all}' for
'all' in pg_hba.conf, but '{"all"}' for '"all"'. So it's not exactly
ambiguous, but I don't think it's especially useful for callers.


Hm. Nope, it doesn't. It just says {all} regardless of whether all
is quoted or not.

This makes sense, the rules for when to quote things in pg_hba and
when to quote things for arrays are separate. And we definitely don't
want to start adding quotes to every token that the parser noted was
quoted because then you'll start seeing things like {\"all\"} and
{\"database with space\"} which won't make it any easier to match
things.

I'm not sure adding a separate column is really the solution either.
You can have things like all,databasename (which is the keyword all
not a database all). Or more likely something like sameuser,bob or
even things like replication,all.


What about a separate column that's just the text from pg_hba? Or is 
that what you're opposed to?


FWIW, I'd say that having the individual array elements be correct is 
more important than what the result of array_out is. That way you could 
always do array_to_string(..., ', ') and get valid pg_hba output.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: raise default for max_wal_segments to 1GB

2015-03-03 Thread Josh Berkus
On 03/03/2015 10:15 AM, Josh Berkus wrote:
 On 03/02/2015 11:25 PM, Heikki Linnakangas wrote:
 I propose that we remove the comment from max_wal_size, and also remove
 the in milliseconds from wal_receiver_timeout and
 autovacuum_vacuum_cost_delay.
 
 +1
 

Actually, let's be consistent about this.  It makes no sense to remove
unit comments from some settings which accept ms but not others.

Do we want to remove unit comments from all settings which accept
MB,GB or ms,s,min?  There's more than a few.  I'd be in favor of
this, but seems like (a) it should be universal, and (b) its own patch.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: raise default for max_wal_segments to 1GB

2015-03-03 Thread Heikki Linnakangas

On 03/03/2015 08:31 PM, Josh Berkus wrote:

On 03/03/2015 10:29 AM, Heikki Linnakangas wrote:

On 03/03/2015 08:21 PM, Josh Berkus wrote:

On 03/03/2015 10:15 AM, Josh Berkus wrote:

On 03/02/2015 11:25 PM, Heikki Linnakangas wrote:

I propose that we remove the comment from max_wal_size, and also remove
the in milliseconds from wal_receiver_timeout and
autovacuum_vacuum_cost_delay.


+1



Actually, let's be consistent about this.  It makes no sense to remove
unit comments from some settings which accept ms but not others.

Do we want to remove unit comments from all settings which accept
MB,GB or ms,s,min?  There's more than a few.  I'd be in favor of
this, but seems like (a) it should be universal, and (b) its own patch.


I think it's a good rule that if the commented-out default in the sample
file does not contain a unit, then the base unit is in the comment.
Otherwise it's not. For example:

#shared_buffers = 32MB# min 128kB
 # (change requires restart)

The base unit is BLCKSZ, i.e. 8k, but people will usually use
MB/GB. And that is evident from the default value, 32MB, so there's no
need to mention it in the comment.

#tcp_keepalives_idle = 0# TCP_KEEPIDLE, in seconds;
 # 0 selects the system default

Here it's not obvious what the unit should be from the default itself.
So the comment says it's in seconds.


Sure.  Although, do we take (s) for tcp_keepalives_idle?  Or only an INT?


It's a time unit, so you can say 10s or 1ms. If you don't 
specify a unit, it implies seconds.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: raise default for max_wal_segments to 1GB

2015-03-03 Thread Corey Huinker
Naive question: would it be /possible/ to change configuration to accept
percentages, and have a percentage mean that fraction of existing RAM at startup time?

I ask because most of the tuning guidelines I see suggest setting memory
parameters as a % of RAM available.

On Tue, Mar 3, 2015 at 1:29 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

 On 03/03/2015 08:21 PM, Josh Berkus wrote:

 On 03/03/2015 10:15 AM, Josh Berkus wrote:

 On 03/02/2015 11:25 PM, Heikki Linnakangas wrote:

 I propose that we remove the comment from max_wal_size, and also remove
 the in milliseconds from wal_receiver_timeout and
 autovacuum_vacuum_cost_delay.


 +1


 Actually, let's be consistent about this.  It makes no sense to remove
 unit comments from some settings which accept ms but not others.

 Do we want to remove unit comments from all settings which accept
 MB,GB or ms,s,min?  There's more than a few.  I'd be in favor of
 this, but seems like (a) it should be universal, and (b) its own patch.


 I think it's a good rule that if the commented-out default in the sample
 file does not contain a unit, then the base unit is in the comment.
 Otherwise it's not. For example:

 #shared_buffers = 32MB  # min 128kB
 # (change requires restart)

 The base unit is BLCKSZ, i.e. 8k, but people will usually use
 MB/GB. And that is evident from the default value, 32MB, so there's no need
 to mention it in the comment.

 #tcp_keepalives_idle = 0# TCP_KEEPIDLE, in seconds;
 # 0 selects the system default

 Here it's not obvious what the unit should be from the default itself. So
 the comment says it's in seconds.

 - Heikki



 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] Patch: raise default for max_wal_segments to 1GB

2015-03-03 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 Do we want to remove unit comments from all settings which accept
 MB,GB or ms,s,min?  There's more than a few.  I'd be in favor of
 this, but seems like (a) it should be universal, and (b) its own patch.

Meh.  Doing this strikes me as a serious documentation failure, since it
is no longer apparent what happens if you put in a unitless number.

Now, if we were to change the server so that it *refused* settings that
didn't have a unit, that argument would become moot.  But I'm not going
to defend the castle against the villagers who will show up if you do
that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: raise default for max_wal_segments to 1GB

2015-03-03 Thread Josh Berkus
On 03/03/2015 11:57 AM, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
 Do we want to remove unit comments from all settings which accept
 MB,GB or ms,s,min?  There's more than a few.  I'd be in favor of
 this, but seems like (a) it should be universal, and (b) its own patch.
 
 Meh.  Doing this strikes me as a serious documentation failure, since it
 is no longer apparent what happens if you put in a unitless number.

Well, you know my opinion about documentation in the pg.conf file.
However, I'm not confident I'm in the majority there.

 Now, if we were to change the server so that it *refused* settings that
 didn't have a unit, that argument would become moot.  But I'm not going
 to defend the castle against the villagers who will show up if you do
 that.

No, thanks!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: raise default for max_wal_segments to 1GB

2015-03-03 Thread Josh Berkus
On 03/02/2015 11:25 PM, Heikki Linnakangas wrote:
 I propose that we remove the comment from max_wal_size, and also remove
 the in milliseconds from wal_receiver_timeout and
 autovacuum_vacuum_cost_delay.

+1

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Abbreviated keys for text cost model fix

2015-03-03 Thread Peter Geoghegan
On Tue, Mar 3, 2015 at 2:00 AM, Jeremy Harris j...@wizmail.org wrote:
 Yes; there seemed no advantage to any additional complexity.
 The merge consistently performs fewer comparisons than the
 quicksort, on random input - and many fewer if there are
 any sorted (or reverse-sorted) sections.  However, it also
 consistently takes slightly longer (a few percent).  I was
 unable to chase this down but assume it to be a caching
 effect.  So I don't currently think it should replace the
 current sort for all use.

It's definitely caching effects. That's a very important consideration here.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: raise default for max_wal_segments to 1GB

2015-03-03 Thread Josh Berkus
On 03/03/2015 10:29 AM, Heikki Linnakangas wrote:
 On 03/03/2015 08:21 PM, Josh Berkus wrote:
 On 03/03/2015 10:15 AM, Josh Berkus wrote:
 On 03/02/2015 11:25 PM, Heikki Linnakangas wrote:
 I propose that we remove the comment from max_wal_size, and also remove
 the in milliseconds from wal_receiver_timeout and
 autovacuum_vacuum_cost_delay.

 +1


 Actually, let's be consistent about this.  It makes no sense to remove
 unit comments from some settings which accept ms but not others.

 Do we want to remove unit comments from all settings which accept
 MB,GB or ms,s,min?  There's more than a few.  I'd be in favor of
 this, but seems like (a) it should be universal, and (b) its own patch.
 
 I think it's a good rule that if the commented-out default in the sample
 file does not contain a unit, then the base unit is in the comment.
 Otherwise it's not. For example:
 
 #shared_buffers = 32MB# min 128kB
 # (change requires restart)
 
 The base unit is BLCKSZ, i.e. 8k, but people will usually use
 MB/GB. And that is evident from the default value, 32MB, so there's no
 need to mention it in the comment.
 
 #tcp_keepalives_idle = 0# TCP_KEEPIDLE, in seconds;
 # 0 selects the system default
 
 Here it's not obvious what the unit should be from the default itself.
 So the comment says it's in seconds.

Sure.  Although, do we take (s) for tcp_keepalives_idle?  Or only an INT?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission

2015-03-03 Thread Greg Stark
On Tue, Mar 3, 2015 at 6:05 PM, Jim Nasby jim.na...@bluetreble.com wrote:

 What about a separate column that's just the text from pg_hba? Or is that 
 what you're opposed to?

I'm not sure what you mean by that. There's a rawline field we could
put somewhere but it contains the entire line.

 FWIW, I'd say that having the individual array elements be correct is more
 important than what the result of array_out is. That way you could always do
 array_to_string(..., ', ') and get valid pg_hba output.

Well I don't think you can get that without making the view less
useful for every other purpose.

Like, I would want to be able to do WHERE user @> array[?] or WHERE
database = array[?] or to join against a list of users or databases
somewhere else.

To do what you suggest would mean the tokens will need to be quoted
based on pg_hba.conf syntax requirements. That would mean I would need
to check each variable or join value against pg_hba.conf's quoting
requirements to compare with it. It seems more practical to have that
knowledge if you're actually going to generate a pg_hba.conf than to
pass around these quoted strings all the time.

On further review I've made a few more changes attached.

I think we should change the column names to users and databases
to be clear they're lists and also to avoid the user SQL reserved
word.

I removed the dependency on strlist_to_array which is in
objectaddress.c which isn't a very sensible dependency -- it does seem
like it would be handy to have a list-based version of construct_array
moved to arrayfuncs.c but for now it's not much more work to handle
these ourselves.

I changed the options to accumulate one big array instead of an array
of bunches of options. Previously you could only end up with a
singleton array with a comma-delimited string or a two element array
with one of those and map=.

I think the error if pg_hba fails to reload needs to be LOG. It would
be too unexpected to the user who isn't necessarily the one who issued
the SIGHUP to spontaneously get a warning.

I also removed the mask from local entries and made some of the
NULLS that shouldn't be possible to happen (unknown auth method or
connection method) actually throw errors.
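
To make the intended usage concrete, a hedged example of the kind of lookup
this column layout is meant to support, assuming the pg_hba_settings view from
the attached patch and querying it from Python with psycopg2:

    import psycopg2

    conn = psycopg2.connect(dbname='postgres')   # connection details assumed
    cur = conn.cursor()
    # Because the array elements are plain (unquoted) tokens, role names can
    # be matched directly with array operators.
    cur.execute("""
        SELECT type, databases, users, method, line_number
          FROM pg_hba_settings
         WHERE users @> ARRAY[%s]
    """, ('bob',))
    for row in cur.fetchall():
        print(row)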

-- 
greg
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
***
*** 7393,7398 
--- 7393,7403 
entryviews/entry
   /row
  
+  row
+   entrylink linkend=view-pg-hba-settingsstructnamepg_hba_settings/structname/link/entry
+   entryclient authentication settings/entry
+  /row
+ 
  /tbody
 /tgroup
/table
***
*** 9696,9699  SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
--- 9701,9786 
  
   /sect1
  
+  sect1 id=view-pg-hba-settings
+   titlestructnamepg_hba_settings/structname/title
+ 
+   indexterm zone=view-pg-hba-settings
+primarypg_hba_settings/primary
+   /indexterm
+ 
+   para
+The read-only structnamepg_hba_settings/structname view provides
+access to the client authentication configuration from pg_hba.conf.
+Access to this view is limited to superusers. 
+   /para
+ 
+   table
+titlestructnamepg_hba_settings/ Columns/title
+ 
+tgroup cols=3
+ thead
+  row
+   entryName/entry
+   entryType/entry
+   entryDescription/entry
+  /row
+ /thead
+ tbody
+  row
+   entrystructfieldtype/structfield/entry
+   entrytypetext/type/entry
+   entryType of connection/entry
+  /row
+  row
+   entrystructfielddatabases/structfield/entry
+   entrytypetext[]/type/entry
+   entryList of database names/entry
+  /row
+  row
+   entrystructfieldusers/structfield/entry
+   entrytypetext[]/type/entry
+   entryList of user names/entry
+  /row
+  row
+   entrystructfieldaddress/structfield/entry
+   entrytypeinet/type/entry
+   entryClient machine address/entry
+  /row
+  row
+   entrystructfieldmask/structfield/entry
+   entrytypeinet/type/entry
+   entryIP Mask/entry
+  /row
+  row
+   entrystructfieldcompare_method/structfield/entry
+   entrytypetext/type/entry
+   entryIP address comparison method/entry
+  /row
+  row
+   entrystructfieldhostname/structfield/entry
+   entrytypetext/type/entry
+   entryClient host name/entry
+  /row
+  row
+   entrystructfieldmethod/structfield/entry
+   entrytypetext/type/entry
+   entryAuthentication method/entry
+  /row
+  row
+   entrystructfieldoptions/structfield/entry
+   entrytypetext[]/type/entry
+   entryConfiguration options set for authentication method/entry
+  /row
+  row
+   entrystructfieldline_number/structfield/entry
+   entrytypeinteger/type/entry
+   entry
+Line number within client authentication configuration file 
+the current value was set at
+   /entry
+  /row
+ /tbody
+/tgroup
+   /table
+  /sect1
  /chapter

Re: [HACKERS] Idea: GSoC - Query Rewrite with Materialized Views

2015-03-03 Thread David Fetter
On Tue, Mar 03, 2015 at 05:49:06PM -0300, Alvaro Herrera wrote:
 Jim Nasby wrote:
 
  FWIW, what I would find most useful at this point is a way to get
  the equivalent of an AFTER STATEMENT trigger that provided all
  changed rows in a MV as the result of a statement.
 
 Ah, like
 https://www.postgresql.org/message-id/1402790204.65037.YahooMailNeo%40web122301.mail.ne1.yahoo.com

Yes, very much like that.

Kevin, might you be able to give some guidance on this?

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: raise default for max_wal_segments to 1GB

2015-03-03 Thread Corey Huinker
No intention to hijack. Dropping issue for now.

On Tue, Mar 3, 2015 at 2:05 PM, Josh Berkus j...@agliodbs.com wrote:

 On 03/03/2015 10:58 AM, Corey Huinker wrote:
  Naive question: would it be /possible/ to change configuration to accept
  percentages, and have a percentage mean that fraction of existing RAM at startup time?
 
  I ask because most of the tuning guidelines I see suggest setting memory
  parameters as a % of RAM available.

 Waaay off topic for this thread.  Please don't hijack; start a new
 thread if you want to discuss this.

 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com



Re: [HACKERS] NULL-pointer check and incorrect comment for pstate in addRangeTableEntry

2015-03-03 Thread Robert Haas
On Wed, Feb 25, 2015 at 7:27 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Coverity is pointing out that addRangeTableEntry contains the
 following code path that does a NULL-pointer check on pstate:
 if (pstate != NULL)
 pstate->p_rtable = lappend(pstate->p_rtable, rte);
 But pstate is dereferenced before in isLockedRefname when grabbing the
 lock mode:
 lockmode = isLockedRefname(pstate, refname) ? RowShareLock : AccessShareLock;

 Note that there is as well the following comment that is confusing on
 top of addRangeTableEntry:
  * If pstate is NULL, we just build an RTE and return it without adding it
  * to an rtable list.

 So I have looked at all the code paths calling this function and found
 first that the only potential point where pstate could be NULL is
 transformTopLevelStmt in analyze.c. One level higher there are
 parse_analyze_varparams and parse_analyze that may use pstate as NULL,
 and even one level more higher in the stack there is
 pg_analyze_and_rewrite. But well, looking at each case individually,
 in all cases we never pass NULL for the parse tree node, so I think
 that we should remove the comment on top of addRangeTableEntry, remove
 the NULL-pointer check and add an assertion as in the patch attached.

 Thoughts?

That seems to make sense to me.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_dump

2015-03-03 Thread Peter Eisentraut
On 3/1/15 2:17 PM, Stephen Frost wrote:
 Peter, if you have a minute, could you take a look at this thread and
 discussion of having TAP tests under src/test/modules which need to
 install an extension?  I think it's something we certainly want to
 support but I'm not sure it's a good idea to just run install in every
 directory that has a prove_check.

I don't think the way the tests are set up is scalable.  Over time, we
will surely want to test more and different extension dumping scenarios.
 We don't want to have to create a new fully built-out extension for
each of those test cases, and we're going to run out of useful names for
the extensions, too.

Moreover, I would like to have tests of pg_dump in src/bin/pg_dump/, not
in remote areas of the code.

Here's what I would do:

- set up basic scaffolding for TAP tests in src/bin/pg_dump

- write a Perl function that can create an extension on the fly, given
name, C code, SQL code

- add to the proposed t/001_dump_test.pl code to write the extension

- add that test to the pg_dump test suite

Eventually, the dump-and-restore routine could also be refactored, but
as long as we only have one test case, that can wait.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] remove pg_standby?

2015-03-03 Thread Robert Haas
On Mon, Mar 2, 2015 at 6:22 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Mar 3, 2015 at 3:07 AM, Josh Berkus j...@agliodbs.com wrote:
 Per document,

 --
 In fast failover, the server is brought up immediately. Any WAL files
 in the archive that have not yet been applied will be ignored, and all
 transactions in those files are lost. To trigger a fast failover,
 create a trigger file and write the word fast into it. pg_standby can
 also be configured to execute a fast failover automatically if no new
 WAL file appears within a defined interval.
 --

 I thought we had that as a 9.4 feature, actually.  Well wait ... that's
 for streaming.

 s/9.4/9.3?

 That's different from one we had in 9.3. Fast failover that pg_standby
 supports is something like the feature that I was proposing at
 http://www.postgresql.org/message-id/cahgqgwhtvydqkzaywya9zyylecakif5p0kpcpune_tsrgtf...@mail.gmail.com
 that is, the feature which allows us to give up replaying remaining
 WAL data for some reasons at the standby promotion. OTOH, fast
 failover that was supported in 9.3 enables us to skip an end-of-recovery
 checkpoint at the promotion and reduce the failover time.

Calling those things by the same name is very confusing.  The
data-losing feature ought to be called immediate failover or
something else more dangerous-sounding than fast.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idea: GSoC - Query Rewrite with Materialized Views

2015-03-03 Thread Alvaro Herrera
Jim Nasby wrote:

 FWIW, what I would find most useful at this point is a way to get the
 equivalent of an AFTER STATEMENT trigger that provided all changed rows in a
 MV as the result of a statement.

Ah, like
https://www.postgresql.org/message-id/1402790204.65037.YahooMailNeo%40web122301.mail.ne1.yahoo.com

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add more tests on event triggers

2015-03-03 Thread Robert Haas
On Wed, Feb 25, 2015 at 10:03 AM, Fabien COELHO coe...@cri.ensmp.fr wrote:
 You can add tests in src/test/modules/dummy_seclabel.

 Patch attached to test sec label there, in addition to the other more
 standard checks in event_trigger.

These tests seem worthwhile to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add modulo (%) operator to pgbench

2015-03-03 Thread Robert Haas
On Mon, Mar 2, 2015 at 5:41 PM, Fabien COELHO coe...@cri.ensmp.fr wrote:
 but I'd like to have a more robust discussion about what we want the error
 reporting to look like rather than just sliding it into this patch.

 As an input, I suggest that the error reporting feature should provide a
 clue about where the issue is, including a line number and possibly the
 offending line. Not sure what else is needed.

I agree.  But I think it might be better to try to put everything
related to a single error on one line, in a consistent format, e.g.:

bad.sql:3: syntax error in set command at column 25
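
Just to show the shape of such a one-line report (a throwaway Python sketch,
not pgbench code; the field layout is only an assumption):

    def report_error(path, lineno, message, column=None):
        # One line per error: file, line number, message, optional column.
        suffix = " at column %d" % column if column is not None else ""
        print("%s:%d: %s%s" % (path, lineno, message, suffix))

    report_error("bad.sql", 3, "syntax error in \\set command", column=25)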

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CATUPDATE confusion?

2015-03-03 Thread Peter Eisentraut
On 2/28/15 6:32 PM, Stephen Frost wrote:
 This isn't really /etc/shadow though, this is more like direct access to
 the filesystem through the device node- and you'll note that Linux
 certainly has got an independent set of permissions for that called the
 capabilities system.  That's because messing with those pieces can crash
 the kernel.  You're not going to crash the kernel if you goof up
 /etc/shadow.

I think this characterization is incorrect.  The Linux capability system
does not exist because the actions are scary or can crash the kernel.
Capabilities exist because they are not attached to file system objects
and can therefore not be represented using the usual permission system.

Note that one can write directly to raw devices or the kernel memory
through various /dev and /proc files.  No capability protects against
that.  It's only the file permissions, possibly in combination with
mount options.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: raise default for max_wal_segments to 1GB

2015-03-03 Thread Jim Nasby

On 3/3/15 2:36 PM, Andrew Dunstan wrote:

On 03/03/2015 02:59 PM, Josh Berkus wrote:

On 03/03/2015 11:57 AM, Tom Lane wrote:

Josh Berkus j...@agliodbs.com writes:

Do we want to remove unit comments from all settings which accept
MB,GB or ms,s,min?  There's more than a few.  I'd be in favor of
this, but seems like (a) it should be universal, and (b) its own patch.

Meh.  Doing this strikes me as a serious documentation failure, since it
is no longer apparent what happens if you put in a unitless number.

Well, you know my opinion about documentation in the pg.conf file.
However, I'm not confident I'm in the majority there.


If we were consistent across all variables for each vartype we could 
maybe get away with this. But that's not the case. How are users 
supposed to know that statement_timeout is in ms while 
authentication_timeout is in seconds?



I think there's a difference between comments about the function of a
GUC and stating the units it's specified in. It's more than annoying to
have to go and look that up where it's not stated.


Look up the units?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission

2015-03-03 Thread Haribabu Kommi
On Wed, Mar 4, 2015 at 12:35 PM, Haribabu Kommi
kommi.harib...@gmail.com wrote:
 + foreach(line, parsed_hba_lines)

 In the above for loop it is better to add check_for_interrupts to
 avoid looping for a long time
 if parsed_hba_lines is large.

Updated patch is attached with the addition of check_for_interrupts in
the for loop.

Regards,
Hari Babu
Fujitsu Australia


Catalog_view_to_HBA_settings_patch_V7.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How about to have relnamespace and relrole?

2015-03-03 Thread Jim Nasby

On 3/3/15 8:04 PM, Kyotaro HORIGUCHI wrote:

Note: The OID alias types don't strictly comply with the transaction
   isolation rules so do not use them where exact transaction
   isolation on the values of these types has a
   significance. Likewise, since they look as simple constants to
   planner so you might get slower plans than the queries joining
   the system tables correspond to the OID types.


Might I suggest:

Note: The OID alias types do not completely follow transaction isolation 
rules. The planner also treats them as simple constants, which may 
result in sub-optimal planning.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)

2015-03-03 Thread Amit Kapila
On Fri, Dec 26, 2014 at 1:08 PM, Abhijit Menon-Sen a...@2ndquadrant.com
wrote:

 At 2014-09-25 15:40:11 +0530, a...@2ndquadrant.com wrote:
 
  All right, then I'll post a version that addresses Amit's other
  points, adds a new file/function to pgstattuple, acquires content
  locks, and uses HeapTupleSatisfiesVacuum, hint-bit setting and all.

 Sorry, I forgot to post this patch. It does what I listed above, and
 seems to work fine (it's still quite a lot faster than pgstattuple
 in many cases).


I think one of the comments is not handled in the latest patch.
5. Don't we need the handling for empty page (PageIsEmpty()) case?

I think we should have similar handling for PageIsEmpty()
as you have done for PageIsNew() in your patch.

+ stat.tuple_count = vac_estimate_reltuples(rel, false, nblocks, scanned,
+  stat.tuple_count);

Here for scanned tuples, you have used the live tuple counter
whereas I think we need to count HEAPTUPLE_INSERT_IN_PROGRESS
and HEAPTUPLE_DELETE_IN_PROGRESS and
HEAPTUPLE_RECENTLY_DEAD.

Refer the similar logic in Vacuum.

Although I am not in favor of using HeapTupleSatisfiesVacuum(), however
if you want to use it then let's be consistent with what Vacuum does
for estimation of tuples.


 A couple of remaining questions:

 1. I didn't change the handling of LP_DEAD items, because the way it is
now matches what pgstattuple counts. I'm open to changing it, though.
Maybe it makes sense to count LP_DEAD items iff lp_len != 0? Or just
leave it as it is? I think it doesn't matter much.

 2. I changed the code to acquire the content locks on the buffer, as
discussed, but it still uses HeapTupleSatisfiesVacuum. Amit suggested
using HeapTupleSatisfiesVisibility, but it's not clear to me why that
would be better. I welcome advice in this matter.


The reason to use HeapTupleSatisfiesVacuum() is that it helps us
in freezing and some other similar stuff which is required by
Vacuum.  Also it could be beneficial if we want to take specific
action for various result codes of Vacuum. I think as this routine
mostly gives the estimated count, so it might not matter much even
if we use HeapTupleSatisfiesVacuum(), however it is better to be
consistent with pgstat_heap() in this case.
Do you have any reason for using HeapTupleSatisfiesVacuum() rather
than what is used in pgstat_heap()?

(If anything, I should use HeapTupleIsSurelyDead, which doesn't set
any hint bits, but which I earlier rejected as too conservative in
its dead/not-dead decisions for this purpose.)

 (I've actually patched the pgstattuple.sgml docs as well, but I want to
 re-read that to make sure it's up to date, and didn't want to wait to
 post the code changes.)


Sure, post the same when it is ready.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-03 Thread Michael Paquier
On Wed, Mar 4, 2015 at 12:41 AM, Syed, Rahila rahila.s...@nttdata.com wrote:
 Please find attached an updated patch with the WAL replay error fixed. The patch
 follows the chunk ID approach of the xlog format.

(Review done independently of the chunk_id stuff being good or not,
already gave my opinion on the matter).

  * readRecordBufSize is set to the new buffer size.
- *
+
The patch has some noise diffs.

You may want to change the values of BKPBLOCK_WILL_INIT and
BKPBLOCK_SAME_REL to respectively 0x01 and 0x02.

+   uint8   chunk_id = 0;
+   chunk_id |= XLR_CHUNK_BLOCK_REFERENCE;

Why not simply that:
chunk_id = XLR_CHUNK_BLOCK_REFERENCE;

+#define XLR_CHUNK_ID_DATA_SHORT 255
+#define XLR_CHUNK_ID_DATA_LONG 254
Why aren't those just using one bit as well? This seems inconsistent
with the rest.

+ if ((blk->with_hole == 0 && blk->hole_offset != 0) ||
+ (blk->with_hole == 1 && blk->hole_offset <= 0))
In xlogreader.c blk->with_hole is defined as a boolean but compared
with an integer, could you remove the ==0 and ==1 portions for
clarity?

-   goto err;
+   goto err;
}
}
-
if (remaining != datatotal)
This gathers incorrect code alignment and unnecessary diffs.

 typedef struct XLogRecordBlockHeader
 {
+   /* Chunk ID precedes */
+
uint8   id;
What prevents the declaration of chunk_id as an int8 here instead of
this comment? This is confusing.

 Following are brief measurement numbers.
                          WAL
  FPW compression on     122.032 MB
  FPW compression off    155.239 MB
  HEAD                   155.236 MB

What is the test run in this case? How many block images have been
generated in WAL for each case? You could gather some of those numbers
with pg_xlogdump --stat for example.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add LINE: hint when schemaname.typename is a non-existent schema

2015-03-03 Thread Jeevan Chalke
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   tested, failed
Spec compliant:   not tested
Documentation:not tested

Tom already suggested a few changes, which I too think the author needs to address.
So marking it Waiting on Author.

However, I see that the following example does not work well.

postgres=# create or replace function
f1(a abc.test.id%type) returns int as
$$ select 1; $$
language sql;
ERROR:  schema "abc" does not exist

Is that expected?
I guess we need it in all the places in parse_*.c where we look up a namespace.
Please fix.


Also, like Tom's suggestion on make_oper_cache_key, can we push this
down inside func_get_detail() as well, just to limit it to the namespace lookup?

However, the patch does not apply cleanly on the latest sources; it needs a rebase.

 Regarding Tom's comments on parse_utilcmd.c:
I guess the block needs to be moved to after the pstate and CreateStmtContext are set up
properly.  I guess we can move it to just after the pstate setup, so that it
results in minimal changes?

Can we have a small test case? Or would that be too much for this feature?

The new status of this patch is: Waiting on Author


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Install shared libs in lib/ and bin/ with MSVC (Was: install libpq.dll in bin directory on Windows / Cygwin)

2015-03-03 Thread Michael Paquier
On Tue, Mar 3, 2015 at 8:36 PM, Asif Naeem anaeem...@gmail.com wrote:
 Thank you Michael. I have looked at the patch.

Thanks for the review!

 Overall logic looks good to me.
 I have checked it with MSVC{2013,2008}. It works for MSVC 2013 but fails for
 MSVC 2008; I think the condition if ($proj =~
 qr{ResourceCompile\s*Include=([^]+)}) is not going to work for MSVC2008
 and MSVC2005, i.e.
 [...]

Thanks for the details; my patch is definitely missing vcproj entries.
Note that I don't yet have an environment with MSVC 2008 or similar; I
just have 2010 on my box for now. So you will have to wait until I have
a fresh environment to get an updated patch...

 It seems that the search criteria can be narrowed to skip reading the related
 Makefile for the SO_MAJOR_VERSION string when the VS project file is a
 StaticLibrary or Application.

Well, agreed, and the patch takes care of that for vcxproj files by
only installing shared libraries if they use DynamicLibrary. Perhaps
you have in mind a better logic?

 Although this patch is going to make the MSVC
 build consistent with the Cygwin and MinGW builds, the following files seem
 redundant now; is there any use for them other than backward compatibility?
 i.e.
 inst\lib\libpq.dll
 inst\lib\libpgtypes.dll
 inst\lib\libecpg_compat.dll
 inst\lib\libecpg.dll

Indeed, I hadn't noticed them. But we can simply remove them as they
are installed in bin/ as well with this patch; that seems better for
consistency with MinGW and Cygwin in any case.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2015-03-03 Thread Albe Laurenz
Etsuro Fujita wrote:
 While updating the patch, I noticed that in the previous patch, there is
 a bug in pushing down parameterized UPDATE/DELETE queries; generic plans
 for such queries fail with a can't-happen error.  I fixed the bug and
 tried to add the regression tests that execute the generic plans, but I
 couldn't because I can't figure out how to force generic plans.  Is
 there any way to do that?

I don't know about a way to force it, but if you run the statement six
times, it will probably switch to a generic plan.
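
For example, something along these lines (table and statement are purely illustrative):

PREPARE del(int) AS DELETE FROM remote_tab WHERE id = $1;
EXECUTE del(1);
EXECUTE del(2);
EXECUTE del(3);
EXECUTE del(4);
EXECUTE del(5);
EXECUTE del(6);   -- from about here the plan cache may choose a generic plan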

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-03-03 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Sat, Feb 28, 2015 at 12:27 AM, Stephen Frost sfr...@snowman.net wrote:
  While this generally works, the usual expectation is that functions
  that should be superuser-only have a check in the function rather than
  depending on the execute privilege.  I'm certainly happy to debate the
  merits of that approach, but for the purposes of this patch, I'd suggest
  you stick an if (!superuser()) ereport("must be superuser") into the
  function itself and not worry about setting the correct permissions on
  it.
 
 -1.  If that policy exists at all, it's a BAD policy, because it
 prevents users from changing the permissions using DDL.  I think the
 superuser check should be inside the function, when, for example, it
 masks some of the output data depending on the user's permissions.
 But I see little virtue in handicapping an attempt by a superuser to
 grant SELECT rights on pg_file_settings.

It's not a documented policy but it's certainly what a whole slew of
functions *do*.  Including pg_start_backup, pg_stop_backup,
pg_switch_xlog, pg_rotate_logfile, pg_create_restore_point,
pg_xlog_replay_pause, lo_import, lo_export, and pg_xlog_replay_resume,
pg_read_file, pg_read_text_file, pg_read_binary_file, and pg_ls_dir.

Indeed, it's the larger portion of what the additional role attributes
are for.  I'm not entirely thrilled to hear that nearly all of that work
might be moot, but if that's the case then so be it and let's just
remove all those superuser checks and instead revoke EXECUTE from public
and let the superuser grant out those rights.
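
Roughly like this, assuming the in-function superuser() checks were removed
(the role name is made up):

REVOKE EXECUTE ON FUNCTION pg_start_backup(text, boolean) FROM PUBLIC;
REVOKE EXECUTE ON FUNCTION pg_stop_backup() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION pg_start_backup(text, boolean) TO backup_operator;
GRANT EXECUTE ON FUNCTION pg_stop_backup() TO backup_operator;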

As for pg_stat_get_wal_senders, pg_stat_get_activity, and probably some
others, column-level privileges and/or RLS policies might be able to
provide what we want there (or possibly not), but it's something to
consider if we want to go this route.

For pg_signal_backend, we could revoke direct EXECUTE permissions on it
and then keep just the check that says only superusers can signal
superuser initiated processes, and then a superuser could grant EXECUTE
rights on it directly for users they want to have that access.  We could
possibly also create new helper functions for pg_terminate and
pg_cancel.

The discussion I'm having with Peter on another thread is a very similar
case that should be looping in, which is if we should continue to have
any superuser check on updating catalog tables.  He is advocating that
we remove that also and let the superuser GRANT modification access on
catalog tables to users.

For my part, I understood that we specifically didn't want to allow that
for the same reason that we didn't want to simply depend on the GRANT
system for the above functions, but everyone else on these discussions
so far is advocating for using the GRANT system.  My memory might be
wrong, but I could have sworn that I had brought up exactly that
suggestion in years past only to have it shot down.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Idea: closing the loop for pg_ctl reload

2015-03-03 Thread Tom Lane
Jim Nasby jim.na...@bluetreble.com writes:
 On 3/3/15 11:48 AM, Andres Freund wrote:
 It'll be confusing to have different interfaces in one/multiple error cases.

 If we simply don't want the code complexity then fine, but I just don't 
 buy this argument. How could it possibly be confusing?

What I'm concerned about is confusing the code.  There is a lot of stuff
that looks at pidfiles and a lot of it is not very bright (note upthread
argument about cat vs head -1).  I don't want possibly localized
(non-ASCII) text in there, especially when there's not going to be any
sane way to know which encoding it's in.  And I definitely don't want
multiline error messages in there.

It's possible we could dumb things down enough to meet these restrictions,
but I'd really rather not go there at all.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idea: closing the loop for pg_ctl reload

2015-03-03 Thread Jan de Visser
On March 3, 2015 04:57:58 PM Jim Nasby wrote:
 On 3/3/15 11:48 AM, Andres Freund wrote:
   I'm saying that you'll need a way to notice that a reload was processed
   or not. And that can't really be the message itself, there has to be
   some other field; like the timestamp Tom proposes.
 
 Ahh, right. We should make sure we don't go brain-dead if the system 
 clock moves backwards. I assume we couldn't just fstat the file...

The timestamp field is already there (in my patch). It gets populated when the 
server starts and repopulated by SIGHUP_handler. It's the only timestamp 
pg_ctl pays attention to.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-03-03 Thread Stephen Frost
Jim,

* Jim Nasby (jim.na...@bluetreble.com) wrote:
 On 3/3/15 5:22 PM, Stephen Frost wrote:
 The
 problem with the role attribute approach is that they aren't inherited
 the way GRANTs are, which means you can't have a backup role that is
 then granted out to users, you'd have to set a BACKUP role attribute
 for every role added.
 
 Yeah, but you'd still have to grant backup to every role created
 anyway, right?

Yes, you would.

 Or you could create a role that has the backup attribute and then
 grant that to users. Then they'd have to intentionally SET ROLE
 my_backup_role to elevate their privilege. That seems like a safer
 way to do things...

This is already possible with the GRANT system: create a 'noinherit'
role instead of an 'inherit' role.  I agree it's safer to require a
'SET ROLE', and I configure all of my systems with a noinherit 'admin'
role.
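
A minimal sketch of that pattern (role names are made up for illustration):

CREATE ROLE backup_admin NOLOGIN;        -- holds the elevated GRANTs
CREATE ROLE alice LOGIN NOINHERIT;       -- NOINHERIT: membership alone confers nothing
GRANT backup_admin TO alice;
-- alice must deliberately escalate before using backup_admin's rights:
SET ROLE backup_admin;
-- ... privileged work here ...
RESET ROLE;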

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Combining Aggregates

2015-03-03 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Feb 24, 2015 at 2:20 PM, Peter Eisentraut pete...@gmx.net wrote:
 I think the combine function is not actually a property of the
 aggregate, but a property of the transition function.  If two aggregates
 have the same transition function, they will also have the same combine
 function.  The combine function really just says, how do you combine two
 series of these function calls.  Say maybe this should be put into
 pg_proc instead.  (Or you make the transition functions transition
 operators and put the info into pg_operator instead, which is where
 function call optimization information is usually kept.)

 This seems like a weird design to me.  It's probably true that if the
 transition function is the same, the state-combiner function will also
 be the same.  But the state-combiner function is only going to exist
 for aggregate transition functions, not functions or operators in
 general.  So linking it from pg_proc or pg_operator feels wrong to me.

FWIW, I agree with Robert.  It's better to keep this in pg_aggregate.
We don't need yet another mostly-empty column in pg_proc; and as for
pg_operator, there is no expectation that an aggregate transition function
is in pg_operator.

Also, I don't think it's a foregone conclusion that same transition
function implies same combiner function: that logic falls apart when
you consider that one of them might be polymorphic and the other not.
(Admittedly, that probably means the aggregate creator is missing a bet;
but we should not design the catalog schema in a way that says that only
maximally efficient aggregate designs are allowed.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configurable location for extension .control files

2015-03-03 Thread Oskari Saarenmaa
On 18.02.2015, 01:49, Jim Nasby wrote:
 On 2/17/15 4:39 PM, Oskari Saarenmaa wrote:
 On 10.06.2013, 17:51, Dimitri Fontaine wrote:
 Andres Freund and...@2ndquadrant.com writes:
 In any case, no packager is going to ship an insecure-by-default
 configuration, which is what Dimitri seems to be fantasizing would
 happen.  It would have to be local option to relax the permissions
 on the directory, no matter where it is.

 *I* don't want that at all. All I'd like to have is a postgresql.conf
   option specifying additional locations.

 Same from me. I think I would even take non-plural location.

 Here's a patch to allow overriding extension installation directory.
 The patch allows superusers to change it at runtime, but we could also
 restrict it to postgresql.conf if needed.  I don't really see a point in
 restricting it (or not implementing the option at all) as the superuser
 can already load custom C code and sql from anywhere in the filesystem;
 not having configurable extension directory just makes it harder to test
 extensions and to create private clusters of PG data in a custom
 location without using custom binaries.
 
 I'm interested in this because it could potentially make it possible to
 install SQL extensions without OS access. (My understanding is there's
 some issue with writing the extension files even if LIBDIR is writable
 by the OS account).

I'm not sure this patch makes extensions without OS access any easier to
implement; you still need to write the files somewhere, and someone
needs to set up the cluster with a nonstandard extension path.

 I don't think anyone should be packaging and shipping PG with
 extension_directory set, but we really should allow it for superusers
 and postgresql.conf so people can test extensions without hacks like
 this: https://github.com/ohmu/pgmemcache/blob/master/localtests.sh#L23
 
 FWIW, what I'd like to see is breaking the cluster setup/teardown capability
 in pg_regress into its own tool. It wouldn't solve the extension test
 problem, but it would eliminate the need for most of what that script is
 doing, and it would do it more robustly. It would make it very easy to
 unit test things with frameworks other than pg_regress.

This would certainly be useful.  I can try to do something about it if
we get configurable extension path supported.  The patch is now in
https://commitfest.postgresql.org/5/170/

/ Oskari


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2015-03-03 Thread Etsuro Fujita
On 2015/02/16 12:03, Etsuro Fujita wrote:
 I'll update the patch.

While updating the patch, I noticed that in the previous patch, there is
a bug in pushing down parameterized UPDATE/DELETE queries; generic plans
for such queries fail with a can't-happen error.  I fixed the bug and
tried to add the regression tests that execute the generic plans, but I
couldn't because I can't figure out how to force generic plans.  Is
there any way to do that?

Best regards,
Etsuro Fujita


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)

2015-03-03 Thread Amit Kapila
On Mon, Feb 23, 2015 at 7:11 AM, Tomas Vondra tomas.von...@2ndquadrant.com
wrote:
 On 28.1.2015 05:03, Abhijit Menon-Sen wrote:
  At 2015-01-27 17:00:27 -0600, jim.na...@bluetreble.com wrote:
 
 Otherwise, the code looks OK to me. Now, there are a few features I'd
 like to have for production use (to minimize the impact):

 1) no index support :-(

I'd like to see support for more relation types (at least btree
indexes). Are there any plans for that? Do we have an idea on how to
compute that?

 2) sampling just a portion of the table

For example, being able to sample just 5% of blocks, making it less
obtrusive, especially on huge tables. Interestingly, there's a
TABLESAMPLE patch in this CF, so maybe it's possible to reuse some
of the methods (e.g. functions behind SYSTEM sampling)?

 3) throttling

Another feature minimizing impact of running this on production might
be some sort of throttling, e.g. saying 'limit the scan to 4 MB/s'
or something along those lines.


I think these features could be done separately if anybody is interested.
The patch in its proposed form seems useful to me.

 4) prefetch

fbstat_heap is using visibility map to skip fully-visible pages,
which is nice, but if we skip too many pages it breaks readahead
similarly to bitmap heap scan. I believe this is another place where
effective_io_concurrency (i.e. prefetch) would be appropriate.


Good point.  We can even think of using the technique used by Vacuum,
which is to skip only when we can skip at least SKIP_PAGES_THRESHOLD
pages.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-03-03 Thread Greg Stark
On Tue, Mar 3, 2015 at 10:29 PM, Stephen Frost sfr...@snowman.net wrote:
 -1.  If that policy exists at all, it's a BAD policy, because it
 prevents users from changing the permissions using DDL.  I think the
 superuser check should be inside the function, when, for example, it
 masks some of the output data depending on the user's permissions.
 But I see little virtue in handicapping an attempt by a superuser to
 grant SELECT rights on pg_file_settings.

 It's not a documented policy but it's certainly what a whole slew of
 functions *do*.  Including pg_start_backup, pg_stop_backup,
 pg_switch_xlog, pg_rotate_logfile, pg_create_restore_point,
 pg_xlog_replay_pause, lo_import, lo_export, and pg_xlog_replay_resume,
 pg_read_file, pg_read_text_file, pg_read_binary_file, and pg_ls_dir.

And the same issue comes up for the pg_hba_settings patch.

Fwiw it's not entirely true that users can't change the permissions on
these functions via DDL -- it's just a lot harder. They have to create
a security-definer wrapper function and then grant access to that
function to whom they want.
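
A sketch of that wrapper approach (wrapper and role names are made up; the
function must be created by a superuser, so that callers effectively run
pg_switch_xlog() with the owner's rights):

CREATE FUNCTION switch_xlog_wrapper() RETURNS pg_lsn
    LANGUAGE sql SECURITY DEFINER
    SET search_path = pg_catalog, pg_temp
    AS 'SELECT pg_switch_xlog()';
REVOKE EXECUTE ON FUNCTION switch_xlog_wrapper() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION switch_xlog_wrapper() TO wal_admin;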

I'm generally of the opinion we should use the GRANT system and not
hard-wire things but I also see the concern that it's really easy to
grant access to something without realizing all the consequences. It's
a lot harder to audit your system to be sure nothing inappropriate has
been granted which can be escalated to super-user access. It's not
clear how to determine what the set of things are that could do that.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] failures with tuplesort and ordered set aggregates (due to 5cefbf5a6c44)

2015-03-03 Thread Robert Haas
On Fri, Feb 20, 2015 at 4:01 PM, Peter Geoghegan p...@heroku.com wrote:
 On Fri, Feb 20, 2015 at 11:58 AM, Tomas Vondra
 tomas.von...@2ndquadrant.com wrote:
 This seems to happen because ordered_set_startup() calls
 tuplesort_begin_datum() when (use_tuples == true), which only sets
 'onlyKey' and leaves (sortKeys == NULL). So 'mergeruns' fails because it
 does not expect that.

 Oops. You're right. Attached patch fixes the issue, by making
 tuplesort_begin_datum() consistent with other comparable routines for
 other tuple cases. I think it makes sense to have the 'sortKeys' field
 always set, and the 'onlyKey' field also set when that additional
 optimization makes sense. That was what I understood the API to be, so
 arguably this is a pre-existing issue with tuplesort_begin_datum().

This doesn't completely fix the issue.  Even with this patch applied:

CREATE TABLE stuff AS SELECT random()::text AS randtext FROM
generate_series(1,5000);
set maintenance_work_mem='1024';
create index on stuff using hash (randtext);
...wait for a bit, server crash...

I find your statement that this is a pre-existing issue in
tuplesort_begin_datum() to be pretty misleading, unless what you mean
by it is pre-existing since November, when an earlier patch by Peter
Geoghegan changed the comments without changing the code to match.
Commit 5ea86e6e65dd2da3e9a3464484985d48328e7fe3, changed the comments
for the sortKey field to say that it would be set in all cases except
for the hash index case, but it did not make that statement true.
Subsequently, another of your patches, which I committed as
5cefbf5a6c4466ac6b1cc2a4316b4eba9108c802, added a dependency on
state->sortKeys always being set.  Now you've got this patch here,
which you claim makes sure that sortKeys is always set, but it
actually doesn't, which is why the above still crashes.  There are not
so many tuplesort_begin_xxx routines that you couldn't audit them all
before submitting your patches.

Anyway, I propose the attached instead, which makes both Tomas's test
case and the one above pass.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


tuplesort-fixes.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Combining Aggregates

2015-03-03 Thread Robert Haas
On Tue, Feb 24, 2015 at 2:20 PM, Peter Eisentraut pete...@gmx.net wrote:
 On 2/20/15 3:09 PM, Tomas Vondra wrote:
 The 'combine' function gets two such 'state' values, while transition
 gets 'state' + next value.

 I think the combine function is not actually a property of the
 aggregate, but a property of the transition function.  If two aggregates
 have the same transition function, they will also have the same combine
 function.  The combine function really just says, how do you combine two
 series of these function calls.  Say maybe this should be put into
 pg_proc instead.  (Or you make the transition functions transition
 operators and put the info into pg_operator instead, which is where
 function call optimization information is usually kept.)

This seems like a weird design to me.  It's probably true that if the
transition function is the same, the state-combiner function will also
be the same.  But the state-combiner function is only going to exist
for aggregate transition functions, not functions or operators in
general.  So linking it from pg_proc or pg_operator feels wrong to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NULL-pointer check and incorrect comment for pstate in addRangeTableEntry

2015-03-03 Thread Michael Paquier
On Wed, Mar 4, 2015 at 6:43 AM, Robert Haas wrote:
 That seems to make sense to me.  Committed.

Thanks.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idea: closing the loop for pg_ctl reload

2015-03-03 Thread Jim Nasby

On 3/3/15 5:13 PM, Tom Lane wrote:

Jim Nasby jim.na...@bluetreble.com writes:

On 3/3/15 11:48 AM, Andres Freund wrote:

It'll be confusing to have different interfaces in one/multiple error cases.



If we simply don't want the code complexity then fine, but I just don't
buy this argument. How could it possibly be confusing?


What I'm concerned about is confusing the code.  There is a lot of stuff
that looks at pidfiles and a lot of it is not very bright (note upthread
argument about cat vs head -1).  I don't want possibly localized
(non-ASCII) text in there, especially when there's not going to be any
sane way to know which encoding it's in.  And I definitely don't want
multiline error messages in there.


Definitely no multi-line. If we keep that restriction, couldn't we just 
dedicate one entire line to the error message? ISTM that would be safe.



It's possible we could dumb things down enough to meet these restrictions,
but I'd really rather not go there at all.


IMHO the added DBA convenience would be worth it (assuming we can make 
it safe). I know I'd certainly appreciate it...


On 3/3/15 5:24 PM, Jan de Visser wrote: On March 3, 2015 04:57:58 PM 
Jim Nasby wrote:

 On 3/3/15 11:48 AM, Andres Freund wrote:
   I'm saying that you'll need a way to notice that a reload was 
processed

or not. And that can't really be the message itself, there has to be
some other field; like the timestamp Tom proposes.

 Ahh, right. We should make sure we don't go brain-dead if the system
 clock moves backwards. I assume we couldn't just fstat the file...

 The timestamp field is already there (in my patch). It gets populated 
when the

 server starts and repopulated by SIGHUP_handler. It's the only timestamp
 pg_ctl pays attention to.

What happens if someone does a reload sometime in the future (perhaps 
because they've messed with the system clock for test purposes)? Do we 
still detect a new reload?

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission

2015-03-03 Thread Greg Stark
On Wed, Mar 4, 2015 at 12:17 AM, Jim Nasby jim.na...@bluetreble.com wrote:
 I can make these changes if you want.

Personally I'm just not convinced this is worth it. It makes the
catalogs harder for people to read and use and only benefits people
who have users named "all" or databases named "all", "sameuser", or
"samerole", which I can't really imagine would be anyone.

If this were going to be the infrastructure on which lots of tools
rested rather than just a read-only view that was mostly going to be
read by humans that might be different. Are you envisioning a tool
that would look at this view, offer a gui for users to make changes
based on that information, and build a new pg_hba.conf to replace it?

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission

2015-03-03 Thread Greg Stark
On Wed, Mar 4, 2015 at 12:34 AM, Haribabu Kommi
kommi.harib...@gmail.com wrote:
 Out of curiosity, regarding the result materialization code addition:
 the caller of the hba_settings function,
 ExecMakeTableFunctionResult, also stores the results in a tuplestore anyway.
 Is there any advantage
 to doing it in the hba_settings function rather than in the caller?

That might protect against concurrent pg_hba reloads, though I suspect
there's a CHECK_FOR_INTERRUPTS hanging out in that loop. We could
maybe put one in this loop and check if the parser memory context has
been reset.

But the immediate problem is that the API that looked up a line
by line number, called repeatedly with successive line
numbers, was O(n^2). Each time it was called it had to walk down the
list from the head all over again: 1 + 2 + 3 + 4 + ... + n = n(n+1)/2, or
O(n^2). This isn't performance critical but it would not have run in a
reasonable length of time for large pg_hba files.

And given that the alternative requires storing state between calls, the code
is at least as simple with the tuplestore, imho even simpler.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Obsolete SnapshotNow reference within snapbuild.c

2015-03-03 Thread Peter Geoghegan
SnapBuildCommitTxn() has what I gather is an obsolete reference to
SnapshotNow(). Attached patch corrects this.

-- 
Peter Geoghegan
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index e911453..ff5ff26 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1077,7 +1077,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 		/* refcount of the snapshot builder for the new snapshot */
 		SnapBuildSnapIncRefcount(builder->snapshot);
 
-		/* add a new SnapshotNow to all currently running transactions */
+		/* add a new Snapshot to all currently running transactions */
 		SnapBuildDistributeNewCatalogSnapshot(builder, lsn);
 	}
 	else

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-03-03 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 It's not a documented policy but it's certainly what a whole slew of
 functions *do*.  Including pg_start_backup, pg_stop_backup,
 pg_switch_xlog, pg_rotate_logfile, pg_create_restore_point,
 pg_xlog_replay_pause, lo_import, lo_export, and pg_xlog_replay_resume,
 pg_read_file, pg_read_text_file, pg_read_binary_file, and pg_ls_dir.

 Indeed, it's the larger portion of what the additional role attributes
 are for.  I'm not entirely thrilled to hear that nearly all of that work
 might be moot, but if that's the case then so be it and let's just
 remove all those superuser checks and instead revoke EXECUTE from public
 and let the superuser grant out those rights.

It does seem like that could be an easier and more flexible answer than
inventing a pile of role attributes.

 The discussion I'm having with Peter on another thread is a very similar
 case that should be looping in, which is if we should continue to have
 any superuser check on updating catalog tables.  He is advocating that
 we remove that also and let the superuser GRANT modification access on
 catalog tables to users.

 For my part, I understood that we specifically didn't want to allow that
 for the same reason that we didn't want to simply depend on the GRANT
 system for the above functions, but everyone else on these discussions
 so far is advocating for using the GRANT system.  My memory might be
 wrong, but I could have sworn that I had brought up exactly that
 suggestion in years past only to have it shot down.

I'm a bit less comfortable with this one, mainly because the ability to
modify the system catalogs directly implies the ability to crash, corrupt,
or hijack the database in any number of non-obvious ways.  I would go so
far as to argue that if you grant me modify permissions on many of the
catalogs, I would have the ability to become superuser outright, and
therefore any notion that such a grant is any weaker than granting full
superuserness is a dangerous lie.

It may be that the same type of permissions escalation is possible with
some of the functions you mention above; but anywhere that it isn't,
I should think GRANT on the function is a reasonable solution.

One aspect of this that merits some thought is that in some cases
access to some set of functions is best granted as a unit.  That's
easy with role properties but much less so with plain GRANT.
Do we have enough such cases to make it an issue?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] failures with tuplesort and ordered set aggregates (due to 5cefbf5a6c44)

2015-03-03 Thread Peter Geoghegan
On Tue, Mar 3, 2015 at 3:53 PM, Robert Haas robertmh...@gmail.com wrote:
 I find your statement that this is a pre-existing issue in
 tuplesort_begin_datum() to be pretty misleading, unless what you mean
 by it is pre-existing since November, when an earlier patch by Peter
 Geoghegan changed the comments without changing the code to match.
 Commit 5ea86e6e65dd2da3e9a3464484985d48328e7fe3, changed the comments
 for the sortKey field to say that it would be set in all cases except
 for the hash index case, but it did not make that statement true.

Agreed. I believe that is in fact what I meant. I'm not trying to
deflect blame - just to explain my understanding of the problem.

 Subsequently, another of your patches, which I committed as
 5cefbf5a6c4466ac6b1cc2a4316b4eba9108c802, added a dependency on
 state-sortKeys always being set.  Now you've got this patch here,
 which you claim makes sure that sortKeys is always set, but it
 actually doesn't, which is why the above still crashes.  There are not
 so many tuplesort_begin_xxx routines that you couldn't audit them all
 before submitting your patches.

Sorry. I should have caught the hash index case too.

 Anyway, I propose the attached instead, which makes both Tomas's test
 case and the one above pass.

My patch actually matches Andrew Gierth's datumsort patch, in that it
also uses this convention, as I believe it should. For that reason,
I'd prefer to make the comment added in November true, rather than
changing the comment...I feel it ought to be true anyway (which is
what I meant about a pre-existing issue). You'll still need the
defenses you've added for the hash case, of course. You might want
to also put a comment specifying why it's necessary, since the hash
tuple case is the oddball case that doesn't always have sortKeys
set.

In other words, I suggest that you commit the union of our two
patches, less your changes to the comments around 'sortKeys'.

Thanks
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-03-03 Thread Jim Nasby

On 3/3/15 5:22 PM, Stephen Frost wrote:

The
problem with the role attribute approach is that they aren't inherited
the way GRANTs are, which means you can't have a backup role that is
then granted out to users, you'd have to set a BACKUP role attribute
for every role added.


Yeah, but you'd still have to grant backup to every role created 
anyway, right?


Or you could create a role that has the backup attribute and then grant 
that to users. Then they'd have to intentionally SET ROLE my_backup_role 
to elevate their privilege. That seems like a safer way to do things...

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical column ordering

2015-03-03 Thread Bruce Momjian
On Tue, Mar  3, 2015 at 11:41:20AM -0600, Jim Nasby wrote:
 Wouldn't it need attno info too, so all 3 orderings?
 
 Uh, what is the third ordering?  Physical, logical, and ?  It already
 gets information about dropped columns, if that is the third one.
 
 attnum; used in other catalogs to reference columns.
 
  If you're shuffling everything through pg_dump anyway then maybe it's
 not needed...

Yes, all those attno system table links are done with pg_dump SQL
commands.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission

2015-03-03 Thread Josh Berkus
On 03/03/2015 05:07 PM, Greg Stark wrote:
 On Wed, Mar 4, 2015 at 12:17 AM, Jim Nasby jim.na...@bluetreble.com wrote:
 I can make these changes if you want.
 
 Personally I'm just not convinced this is worth it. It makes the
 catalogs harder for people to read and use and only benefits people
 who have users named all or databases named all, sameuser, or
 samerole which I can't really imagine would be anyone.
 
 If this were going to be the infrastructure on which lots of tools
 rested rather than just a read-only view that was mostly going to be
 read by humans that might be different. Are you envisioning a tool
 that would look at this view, offer a gui for users to make changes
 based on that information, and build a new pg_hba.conf to replace it?

I'm planning to write such a tool.  However, I am not concerned about
weird name cases like the above.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission

2015-03-03 Thread Haribabu Kommi
On Wed, Mar 4, 2015 at 12:18 PM, Greg Stark st...@mit.edu wrote:
 On Wed, Mar 4, 2015 at 12:34 AM, Haribabu Kommi
 kommi.harib...@gmail.com wrote:
 Out of curiosity, regarding the result materialization code addition:
 the caller of the hba_settings function,
 ExecMakeTableFunctionResult, also stores the results in a tuplestore anyway.
 Is there any advantage
 to doing it in the hba_settings function rather than in the caller?

 That might protect against concurrent pg_hba reloads, though I suspect
 there's a CHECK_FOR_INTERRUPTS hanging out in that loop. We could
 maybe put one in this loop and check if the parser memory context has
 been reset.

I feel there is no problem with concurrent pg_hba reloads, because
CHECK_FOR_INTERRUPTS doesn't reload the conf files. That is done in the
PostgresMain function once the
query finishes. Am I missing something?

+ foreach(line, parsed_hba_lines)

In the above foreach loop it would be better to add a CHECK_FOR_INTERRUPTS
call, to avoid looping for a long time
if parsed_hba_lines is large.

 But the immediate problem is that having the API that looked up a line
 by line number and then calling it repeatedly with successive line
 numbers was O(n^2). Each time it was called it had to walk down the
 list from the head all over again. 1 + 2 + 3 + 4 + ... n = n(n+1)/2 or
 O(n^2). This isn't performance critical but it would not have run in a
 reasonable length of time for large pg_hba files.

 And having to store the state between calls means the code is at least
 as simple for the tuplestore, imho even simpler.

Got it. Thanks.

Regards,
Hari Babu
Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idea: GSoC - Query Rewrite with Materialized Views

2015-03-03 Thread Jim Nasby

On 3/3/15 3:34 PM, David Fetter wrote:

On Tue, Mar 03, 2015 at 05:49:06PM -0300, Alvaro Herrera wrote:

Jim Nasby wrote:


FWIW, what I would find most useful at this point is a way to get
the equivalent of an AFTER STATEMENT trigger that provided all
changed rows in a MV as the result of a statement.


Ah, like
https://www.postgresql.org/message-id/1402790204.65037.YahooMailNeo%40web122301.mail.ne1.yahoo.com


Yes, very much like that.


Actually, I was talking about the next step beyond that. I don't want 
what changed in a single table; I want what changed *in the source of 
the entire MV*. Kevin has a whitepaper that describes how to do this in 
set notation; theoretically this is a matter of converting that to SQL. 
IIRC this needs the deltas and current (or maybe NEW and OLD) for every 
table in the MV. So one way you could model this is a function that 
accepts a bunch of NEW and OLD recordsets.


Theoretically you could actually drive that with per-row triggers, but 
the performance would presumably suck. Next best thing would be 
providing NEW and OLD for AFTER STATEMENT triggers (what Kevin was 
talking about in that email). Though, if you're driving this at a 
statement level that means you can't actually reference the MV in a 
statement that's performing DML on any of the dependent tables.
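
Purely as a hypothetical sketch of the statement-level OLD/NEW access being 
wished for here (REFERENCING/transition tables are SQL-standard-style syntax, 
not something PostgreSQL offers at this point; table and function names are 
invented):

CREATE TRIGGER mv_delta_upd
    AFTER UPDATE ON base_table
    REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
    FOR EACH STATEMENT
    EXECUTE PROCEDURE apply_mv_delta();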


As you can see, this is all pretty involved. Doing just a small part of 
this would make for a good GSoC project. AFTER STATEMENT NEW and OLD 
might be a good project; I don't know how much more work Kevin's stuff 
needs.  But there's much greater value in creating something that would 
take the definition for a MV and turn that into appropriate delta logic. 
That would be the basis for detecting if a MV was stale (beyond just the 
gross level check of were any of the tables involved touched), and is 
what is needed to do *any* kind of incremental update.


That logic doesn't have to be driven by triggers. For example, you could 
have PgQ or similar capture all DML on all tables for a MV and feed that 
data to the delta logic on an async incremental basis. It's pretty easy 
for an end user to setup PgQ or similar but doing the delta logic is 
tightly coupled to the MV definition, which would be very hard for an 
end user to deal with.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission

2015-03-03 Thread Jim Nasby

On 3/3/15 12:57 PM, Greg Stark wrote:

On Tue, Mar 3, 2015 at 6:05 PM, Jim Nasby jim.na...@bluetreble.com wrote:


What about a separate column that's just the text from pg_hba? Or is that what 
you're opposed to?


I'm not sure what you mean by that. There's a rawline field we could
put somewhere but it contains the entire line.


I mean having a field for each of users/databases that gives you valid 
pg_hba.conf output. That would allow for cut & paste. But perhaps that's 
just not a use case.



FWIW, I'd say that having the individual array elements be correct is more
important than what the result of array_out is. That way you could always do
array_to_string(..., ', ') and get valid pg_hba output.


Well I don't think you can get that without making the view less
useful for every other purpose.

Like, I would want to be able to do WHERE user @> array[?] or WHERE
database = array[?] or to join against a list of users or databases
somewhere else.


I think we're screwed in that regard anyway, because of the special 
constructs. You'd need different logic to handle things like +role and 
sameuser. We might even end up painted in a corner where we can't change 
it in the future because it'll break everyone's scripts.



To do what you suggest would mean the tokens will need to be quoted
based on pg_hba.conf syntax requirements. That would mean I would need
to check each variable or join value against pg_hba.conf's quoting
requirements to compare with it. It seems more practical to have that
knowledge if you're actually going to generate a pg_hba.conf than to
pass around these quoted strings all the time.


What about this:

- database_specials enum[] contains all occurrences of "all", "sameuser", 
"samerole" and "replication" (or maybe it doesn't need to be an array)
- in_roles name[] is a list of all cases of +role_name, with the + 
stripped off (I think the @ case is already handled before the SRF is 
called??)

- role_specials enum[] handles "all" (enum[] for any future expansion)

Alternatively in_roles could just be combined with role_specials as long 
as we preserve the +.


Any tokens that match the special conditions do not show up in 
databases/users, and those fields become name[]. AFAIK that means the 
quoting should be identical (at least looking at check_db() and 
check_role()).
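
A rough sketch of how the split might then be queried (view and column names 
follow the proposal above and are otherwise just assumptions):

SELECT line_number, databases, database_specials, users, role_specials
  FROM pg_hba_settings
 WHERE 'mydb' = ANY (databases)
    OR 'all' = ANY (database_specials::text[]);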


I can make these changes if you want.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission

2015-03-03 Thread Haribabu Kommi
On Wed, Mar 4, 2015 at 5:57 AM, Greg Stark st...@mit.edu wrote:
 On further review I've made a few more changes attached.

 I think we should change the column names to "users" and "databases"
 to be clear they're lists and also to avoid the "user" SQL reserved
 word.

 I removed the dependency on strlist_to_array which is in
 objectaddress.c which isn't a very sensible dependency -- it does seem
 like it would be handy to have a list-based version of construct_array
 moved to arrayfuncs.c but for now it's not much more work to handle
 these ourselves.

 I changed the options to accumulate one big array instead of an array
 of bunches of options. Previously you could only end up with a
 singleton array with a comma-delimited string or a two element array
 with one of those and map=.

The changes are fine, this eliminates the unnecessary dependency on
objectaddress.

 I think the error if pg_hba fails to reload needs to be LOG. It would
 be too unexpected to the user who isn't necessarily the one who issued
 the SIGHUP to spontaneously get a warning.

 I also removed the mask from local entries and made some of the
 NULLS that shouldn't be possible to happen (unknown auth method or
 connection method) actually throw errors.

The changes are fine. All the tests pass.

Out of curiosity, regarding the result materialization code addition:
the caller of the hba_settings function,
ExecMakeTableFunctionResult, also stores the results in a tuplestore anyway.
Is there any advantage
to doing it in the hba_settings function rather than in the caller?

Regards,
Hari Babu
Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idea: closing the loop for pg_ctl reload

2015-03-03 Thread Jim Nasby

On 3/3/15 11:48 AM, Andres Freund wrote:

On 2015-03-03 11:43:46 -0600, Jim Nasby wrote:

It's certainly better than now, but why put DBAs through an extra step for
no reason?

Because it makes it more complicated than it already is? It's nontrivial
to capture the output, escape it to somehow fit into a delimited field,
et al.  I'd rather have a committed improvement than talk about a
bigger one.


I don't see how this would be significantly more complex, but of course 
that's subjective.



Though, in the case of multiple errors perhaps it would be best
to just report a count and point them at the log.

It'll be confusing to have different interfaces in one/multiple error cases.


If we simply don't want the code complexity then fine, but I just don't 
buy this argument. How could it possibly be confusing? Either you'll get 
the actual error message or a message that says "Didn't work, check the 
log." And the error will always be in the log no matter what. I really 
can't imagine that anyone would be unhappy that we just up-front told 
them what the error was if we could. Why make them jump through needless 
hoops?


 I'm saying that you'll need a way to notice that a reload was processed
 or not. And that can't really be the message itself, there has to be
 some other field; like the timestamp Tom proposes.

Ahh, right. We should make sure we don't go brain-dead if the system 
clock moves backwards. I assume we couldn't just fstat the file...

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-03-03 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Stephen Frost sfr...@snowman.net writes:
  The discussion I'm having with Peter on another thread is a very similar
  case that should be looping in, which is if we should continue to have
  any superuser check on updating catalog tables.  He is advocating that
  we remove that also and let the superuser GRANT modification access on
  catalog tables to users.
 
  For my part, I understood that we specifically didn't want to allow that
  for the same reason that we didn't want to simply depend on the GRANT
  system for the above functions, but everyone else on these discussions
  so far is advocating for using the GRANT system.  My memory might be
  wrong, but I could have sworn that I had brought up exactly that
  suggestion in years past only to have it shot down.
 
 I'm a bit less comfortable with this one, mainly because the ability to
 modify the system catalogs directly implies the ability to crash, corrupt,
 or hijack the database in any number of non-obvious ways.  I would go so
 far as to argue that if you grant me modify permissions on many of the
 catalogs, I would have the ability to become superuser outright, and
 therefore any notion that such a grant is any weaker than granting full
 superuserness is a dangerous lie.

I tend to agree with this approach and it's what I was advocating for in
my overall analysis of superuser checks that gave rise to the additional
role attributes idea.  If it can be used to become superuser then it
doesn't make sense to have it be independent from superuser.

 It may be that the same type of permissions escalation is possible with
 some of the functions you mention above; but anywhere that it isn't,
 I should think GRANT on the function is a reasonable solution.

The functions identified for the new role attributes were specifically
those that, I believe, could be used without also giving the user
superuser rights.  Those are:

pg_start_backup, pg_stop_backup, pg_switch_xlog, pg_rotate_logfile,
pg_create_restore_point, pg_xlog_replay_pause, lo_import, lo_export,
and pg_xlog_replay_resume.

 One aspect of this that merits some thought is that in some cases
 access to some set of functions is best granted as a unit.  That's
 easy with role properties but much less so with plain GRANT.
 Do we have enough such cases to make it an issue?

That's what roles are for, in my mind.  If we're fine with using the
grant system to control executing these functions then I would suggest
we address this by providing some pre-configured roles or even just
documentation which lists what a given role would need.  That's
certainly what a lot of people coming from other databases expect.  The
problem with the role attribute approach is that they aren't inherited
the way GRANTs are, which means you can't have a backup role that is
then granted out to users; you'd have to set a BACKUP role attribute
for every role added.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and logical decoding

2015-03-03 Thread Peter Geoghegan
On Tue, Mar 3, 2015 at 3:13 AM, Andres Freund and...@2ndquadrant.com wrote:
 toast_save_datum() is called with a heap_insert() call before heap
 insertion for the tuple proper. We're relying on the assumption that
 if there is no immediate super deletion record, things are fine. We
 cannot speculatively insert into catalogs, BTW.

 I fail to see what prohibits e.g. another catalog modifying transaction
 to commit inbetween and distribute a new snapshot.

SnapBuildDistributeNewCatalogSnapshot()/REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT
does look like a problematic case. It's problematic specifically
because it involves some xact queuing a change to *some other* xact -
we cannot assume that this won't happen between WAL-logging within
heap_insert() and heap_delete(). Can you think of any other such
cases?

I think that REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID should be fine.
REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID is not an issue either. In
both cases I believe my assumption does not break because there won't
be writes to system catalogs between the relevant heap_insert() and
heap_delete() calls, which are tightly coupled, and also because
speculative insertion into catalogs is unsupported. That just leaves
the non-*CHANGE_INTERNAL_* (regular DML) cases, which should also be
fine.

As for SnapBuildDistributeNewCatalogSnapshot(), I'm open to
suggestions. Do you see any opportunity around making assumptions
about heavyweight locking preventing the interleaving of some
REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT change between a
REORDER_BUFFER_CHANGE_INTERNAL_INSERT change and a
REORDER_BUFFER_CHANGE_INTERNAL_DELETE change? AFAICT, that's all this
approach really needs to worry about (or the interleaving of something
else not under the control of speculative insertion - doesn't have to
be a REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT, that's just the most
obvious problematic case).

 Why should we see a REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID case
 next, in the case that we need to care about (when the tuple was super
 deleted immediately afterwards)?

 It's irrelevant whether you care about it or not. Your
 ReorderBufferIterTXNNext() consumes a change that needs to actually be
 processed. It could very well be something important like a new
 snapshot.

But it is actually processed, in the next iteration (we avoid calling
ReorderBufferIterTXNNext() at the top of the loop if it has already
been called for that iteration as part of peeking ahead). AFAICT all
that is at issue is the safety of one particular assumption I've made:
that it is safe to assume that there will never be a super deletion in
the event of not seeing a super deletion change immediately following
a speculative insertion change within some xact when decoding. It's
still going to be processed if it's something else. The implementation
will, however, fail to consolidate the speculative insertion change
and super deletion change if my assumption is ever faulty.

This whole approach doesn't seem to be all that different to how there
is currently coordination within ReorderBufferCommit() between
TOAST-related changes, and changes to the relation proper. In fact,
I've now duplicated the way the IsToastRelation() case performs
dlist_delete(&change->node) in order to avoid chunk reuse in the
event of spilling to disk. Stress testing by decreasing
max_changes_in_memory to 1 does not exhibit broken behavior, I
believe (although that does break the test_decoding regression tests
on the master branch, FWIW). Also, obviously I have not considered the
SnapBuildDistributeNewCatalogSnapshot() case too closely.

I attach an updated patch that I believe at least handles the
serialization aspects correctly, FYI. Note that I assert that
REORDER_BUFFER_CHANGE_INTERNAL_DELETE follows a
REORDER_BUFFER_CHANGE_INTERNAL_INSERT, so if you can find a way to
break the assumption it should cause an assertion failure, which is
something to look out for during testing.

 While what I have here *is* very ugly, I see no better alternative. By
 doing what you suggest, we'd be finding a special case way of safely
 violating the general WAL log-before-data rule.

 No, it won't. We don't WAL log modifications that don't need to persist
 in a bunch of places. It'd be perfectly fine to start upsert with an
 'acquiring an insertion lock' record that pretty much only contains the
 item pointer (and a FPI if necessary to prevent torn pages). And then
 only log the full new record once it's sure that the whole thing needs
 to survive.  Redo then can simply clean out the space touched by the
 insertion lock.

That seems like a lot of work in the general case to not do something
that will only very rarely need to occur anyway. The optimistic
approach of value locking scheme #2 has a small race that is detected
(which necessitates super deletion), but that will only very rarely be
required. Also, there is value in making super deletion (and
speculative insertion) as close as possible to regular deletion 

Re: [HACKERS] Bootstrap DATA is a pita

2015-03-03 Thread Robert Haas
On Sat, Feb 21, 2015 at 11:34 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andres Freund and...@2ndquadrant.com writes:
 On 2015-02-20 22:19:54 -0500, Peter Eisentraut wrote:
 On 2/20/15 8:46 PM, Josh Berkus wrote:
 Or what about just doing CSV?

 I don't think that would actually address the problems.  It would just
 be the same format as now with different delimiters.

 Yea, we need hierarchies and named keys.

 Yeah.  One thought though is that I don't think we need the data layer
 in your proposal; that is, I'd flatten the representation to something
 more like

  {
  oid = 2249,
  oiddefine = 'CSTRINGOID',
  typname = 'cstring',
  typlen = -2,
  typbyval = 1,
  ...
  }

Even this promises to vastly increase the number of lines in the file,
and make it harder to compare entries by grepping out some common
substring.  I agree that the current format is a pain in the tail, but
pg_proc.h is 5k lines already.  I don't want it to be 100k lines
instead.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Bug in pg_dump

2015-03-03 Thread Michael Paquier
On Wed, Mar 4, 2015 at 6:48 AM, Peter Eisentraut pete...@gmx.net wrote:
 - set up basic scaffolding for TAP tests in src/bin/pg_dump

Agreed.

 - write a Perl function that can create an extension on the fly, given
 name, C code, SQL code

I am perplexed about that. Where would the SQL code or C code be stored?
In the Perl script itself? I cannot really see the advantage of
generating the skeleton of an extension automatically from some C or
SQL code, compared to storing the extension statically as-is. Adding
those extensions to src/test/modules is out of scope, to avoid bloating
it, so we could for example add such test extensions under t/extensions/
or similar, have prove_check scan that folder, and then install those
extensions into the temporary installation.

 - add to the proposed t/001_dump_test.pl code to write the extension
 - add that test to the pg_dump test suite
 Eventually, the dump-and-restore routine could also be refactored, but
 as long as we only have one test case, that can wait.

Agreed on all those points.
-- 
Michael




[HACKERS] MD5 authentication needs help

2015-03-03 Thread Bruce Momjian
It feels like MD5 has accumulated enough problems that we need to start
looking for another way to store and pass passwords.  The MD5 problems
are:

1)  MD5 makes users feel uneasy (though our usage is mostly safe) 

2)  The per-session salt sent to the client is only 32 bits, meaning
that it is possible to replay an observed MD5 hash in ~16k connection
attempts.

3)  Using the user name for the MD5 storage salt allows the MD5 stored
hash to be used on a different cluster if the user used the same
password. 

4)  Using the user name for the MD5 storage salt causes the renaming of
a user to break the stored password.

For these reasons, it is probably time to start thinking about a
replacement that fixes these issues.  We would keep MD5 but recommend
a better option.
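
To make the salt problems concrete, here is a small, self-contained C
sketch of the computation (MD5() is OpenSSL's; md5_hex() and the
constants are illustrative helpers, not PostgreSQL code).  The stored
value depends only on password and user name, which is why it travels
between clusters (3) and breaks when the user is renamed (4), and the
on-the-wire response depends only on that stored value plus the 32-bit
per-session salt, which is what makes replaying an observed response
possible (2).

/* Build with: cc md5_auth_sketch.c -lcrypto */
#include <stdio.h>
#include <string.h>
#include <openssl/md5.h>

/* Illustrative helper, not a PostgreSQL function: out = hex(md5(input)). */
static void
md5_hex(const char *input, size_t len, char out[33])
{
    unsigned char digest[MD5_DIGEST_LENGTH];

    MD5((const unsigned char *) input, len, digest);
    for (int i = 0; i < MD5_DIGEST_LENGTH; i++)
        sprintf(out + i * 2, "%02x", digest[i]);
    out[32] = '\0';
}

int
main(void)
{
    const char *password = "secret";
    const char *username = "alice";
    const char  salt[4] = {0x01, 0x02, 0x03, 0x04};  /* per-session, 32 bits */

    char        buf[128];
    char        stored[33];     /* stored hash (the "md5" prefix is omitted) */
    char        response[33];   /* what the client answers for this salt */

    /*
     * Storage salt is the user name: the same password and name give the
     * same stored value on every cluster (3), and renaming the user
     * invalidates it (4).
     */
    snprintf(buf, sizeof(buf), "%s%s", password, username);
    md5_hex(buf, strlen(buf), stored);

    /*
     * The response depends only on the stored value and the 32-bit salt,
     * so a captured response can be replayed whenever the salt repeats (2).
     */
    memcpy(buf, stored, 32);
    memcpy(buf + 32, salt, sizeof(salt));
    md5_hex(buf, 32 + sizeof(salt), response);

    printf("stored:   md5%s\n", stored);
    printf("response: md5%s\n", response);
    return 0;
}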

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-03-03 Thread Kouhei Kaigai
 Obviously FDW can add multiple paths at a time, like GetForeignPaths,
 so IMO it should be renamed to GetForeignJoinPaths, with plural form.
 
 In addition to that, new member of RelOptInfo, fdw_handler, should be
 initialized explicitly in build_simple_rel.
 
 Please see attached a patch for these changes.

Thanks for your checks. Yep, the name of the FDW handler should be
...Paths(), instead of Path().

The attached one integrates Hanada-san's updates.
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


 -Original Message-
 From: Shigeru Hanada [mailto:shigeru.han...@gmail.com]
 Sent: Tuesday, March 03, 2015 9:26 PM
 To: Kaigai Kouhei(海外 浩平)
 Cc: Robert Haas; Tom Lane; pgsql-hackers@postgreSQL.org
 Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom
 Plan API)
 
 Kaigai-san,
 
 The v6 patch applied cleanly on the master branch.  I'll rebase my
 patch onto it, but before that I have a comment about the name of the
 new FDW API handler GetForeignJoinPath.
 
 Obviously FDW can add multiple paths at a time, like GetForeignPaths,
 so IMO it should be renamed to GetForeignJoinPaths, with plural form.
 
 In addition to that, new member of RelOptInfo, fdw_handler, should be
 initialized explicitly in build_simple_rel.
 
 Please see attached a patch for these changes.
 
 I'll review the v6 path afterwards.
 
 
 2015-03-03 20:20 GMT+09:00 Kouhei Kaigai kai...@ak.jp.nec.com:
  Sorry, I made a mistake while creating the patch.
  Attached one is the correct version.
  --
  NEC OSS Promotion Center / PG-Strom Project
  KaiGai Kohei kai...@ak.jp.nec.com
 
 
  -Original Message-
  From: pgsql-hackers-ow...@postgresql.org
  [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai
  Sent: Tuesday, March 03, 2015 6:31 PM
  To: Kaigai Kouhei(海外 浩平); Robert Haas
  Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
  Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan 
  API)
 
  The attached version of custom/foreign-join interface patch
  fixes up the problem reported on the join-pushdown support
  thread.
 
  The previous version referenced *_ps_tlist in setrefs.c to check
  whether the Custom/ForeignScan node is associated with a particular
  base relation or not. That logic assumed such a node performs a base
  relation scan whenever *_ps_tlist is valid, which was incorrect when
  the underlying pseudo-scan relation has an empty targetlist.
  Instead, the revised patch checks scanrelid itself: if it is zero,
  the Custom/ForeignScan node is not associated with a particular base
  relation, so its slot descriptor for the scan is constructed from
  *_ps_tlist.
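
  As a minimal standalone illustration of that rule (the struct and the
  strings here are simplified stand-ins invented for the example, not
  the actual planner or executor structures):

  #include <stdio.h>

  /* Toy stand-in for the Scan node: only the field that matters here. */
  typedef struct ToyScan
  {
      unsigned int scanrelid;   /* range table index, or 0 for a pseudo scan */
  } ToyScan;

  static const char *
  slot_descriptor_source(const ToyScan *scan)
  {
      /* scanrelid == 0: not tied to one base relation (e.g. a pushed-down
       * join), so the scan slot descriptor must come from *_ps_tlist. */
      if (scan->scanrelid == 0)
          return "build TupleDesc from *_ps_tlist";
      return "use the base relation's TupleDesc";
  }

  int
  main(void)
  {
      ToyScan     join_scan = {0};    /* pseudo scan of a joined relation */
      ToyScan     base_scan = {3};    /* ordinary scan of range table entry 3 */

      printf("join_scan: %s\n", slot_descriptor_source(&join_scan));
      printf("base_scan: %s\n", slot_descriptor_source(&base_scan));
      return 0;
  }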
 
 
  Also, I noticed a potential problem if a CSP/FDW driver wants to
  display expression nodes using deparse_expression() but a Var node
  within the expression does not appear in the *_ps_tlist.
  For example, the remote query below returns rows with two columns.

    SELECT atext, btext FROM tbl_a, tbl_b WHERE aid = bid;

  The ForeignScan therefore behaves like a scan on a relation with two
  columns, and the FDW driver will put two TargetEntry nodes on the
  fdw_ps_tlist. If the FDW is designed to keep the join condition
  (aid = bid) in expression node form, it is expected to be saved in
  the custom/fdw_expr variable, and setrefs.c then rewrites its Var
  nodes according to *_ps_tlist.
  That means we also have to add entries for both aid and bid to
  *_ps_tlist to avoid a failure on variable lookup. However, these
  additional entries change the definition of the slot descriptor.
  So, I adjusted ExecInitForeignScan and ExecInitCustomScan to use
  ExecCleanTypeFromTL(), not ExecTypeFromTL(), when they construct
  the slot descriptor based on the *_ps_tlist.
  This expects CSP/FDW drivers to add target entries with resjunk=true
  if they want additional entries for variable lookups in EXPLAIN
  output.
 
  Fortunately or unfortunately, postgres_fdw keeps its remote query
  in cstring form, so it does not need to add junk entries on the
  fdw_ps_tlist.
 
  Thanks,
  --
  NEC OSS Promotion Center / PG-Strom Project
  KaiGai Kohei kai...@ak.jp.nec.com
 
 
   -Original Message-
   From: pgsql-hackers-ow...@postgresql.org
   [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai
   Sent: Sunday, February 15, 2015 11:01 PM
   To: Kaigai Kouhei(海外 浩平); Robert Haas
   Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
   Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan 
   API)
  
   The attached patch is a rebased version of the join replacement with
   foreign-/custom-scan. There are no feature updates at this moment,
   but SGML documentation has been added (according to Michael's comment).

   This infrastructure allows a foreign-data wrapper or custom-scan
   provider to add alternative scan paths for relation joins.
   From the viewpoint of the executor, it looks like a scan on a pseudo-
   relation that is 
