Re: [HACKERS] MD5 authentication needs help

2015-03-05 Thread Jim Nasby

On 3/4/15 2:56 PM, Stephen Frost wrote:

2)  The per-session salt sent to the client is only 32-bits, meaning
that it is possible to replay an observed MD5 hash in ~16k connection
attempts.

Yes, and we have no (PG-based) mechanism to prevent those connection
attempts, which is a pretty horrible situation to be in.


Is there some reason we don't just fix that? I'm thinking that this is a 
special case where we could just modify the pg_auth tuple in-place 
without bloating the catalog (we already do that somewhere else). Is 
there something else that makes this difficult? Are we afraid of an 
extra GUC to control it?

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] MD5 authentication needs help

2015-03-05 Thread Stephen Frost
* Jim Nasby (jim.na...@bluetreble.com) wrote:
 On 3/4/15 2:56 PM, Stephen Frost wrote:
 2)  The per-session salt sent to the client is only 32-bits, meaning
 that it is possible to replay an observed MD5 hash in ~16k connection
 attempts.
 Yes, and we have no (PG-based) mechanism to prevent those connection
 attempts, which is a pretty horrible situation to be in.
 
 Is there some reason we don't just fix that? I'm thinking that this
 is a special case where we could just modify the pg_auth tuple
 in-place without bloating the catalog (we already do that somewhere
 else). Is there something else that makes this difficult? Are we
 afraid of an extra GUC to control it?

I'm all for it, though I would ask that we provide a way for superusers
to delegate the ability to reset locked accounts to non-superusers.

I'd want to think about it a bit more before settling on using pg_authid
to track the data.  In any case, I do think we need a way to disable
this ability for certain roles and, further, that we not track failed
logins in cases where it's disabled (which might well be the default; I
don't think we want to add this overhead for systems which have lots of
recurring logins, such as application users that aren't doing
pooling).

Thanks!

Stephen




Re: [HACKERS] MD5 authentication needs help

2015-03-05 Thread Jim Nasby

On 3/5/15 2:17 PM, Stephen Frost wrote:

* Jim Nasby (jim.na...@bluetreble.com) wrote:

On 3/4/15 2:56 PM, Stephen Frost wrote:

2)  The per-session salt sent to the client is only 32-bits, meaning

that it is possible to replay an observed MD5 hash in ~16k connection
attempts.

Yes, and we have no (PG-based) mechanism to prevent those connection
attempts, which is a pretty horrible situation to be in.


Is there some reason we don't just fix that? I'm thinking that this
is a special case where we could just modify the pg_auth tuple
in-place without bloating the catalog (we already do that somewhere
else). Is there something else that makes this difficult? Are we
afraid of an extra GUC to control it?


I'm all for it, though I would ask that we provide a way for superusers
to delegate the ability to reset locked accounts to non-superusers.

I'd want to think about it a bit more before settling on using pg_authid


I guess it's a question of how durable we want it to be. We could 
conceivably keep it in shared memory and let it wipe on a crash.


But we already have code that ignores MVCC on a catalog table (IIRC for 
updating pg_class stats after vacuum) so the pattern is there. I don't 
see that we need more sophistication than that...



to track the data.  In any case, I do think we need a way to disable
this ability for certain roles


In the interest of something for this release... do we really need that? 
My thought is we just special-case the postgres user and be done with 
it. Though, if there's some other way to reset an account from the 
shell, no need to even special case postgres.


Though, I guess if we just follow the normal GUC behavior of allowing 
per-database and -user overrides it wouldn't be that hard.
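
For illustration, something like the following (the GUC name is 
hypothetical, a sketch only, borrowing the name used later in this 
thread):

    -- Hypothetical setting; the feature does not exist yet.
    ALTER SYSTEM SET allowed_authentication_failures = 0;         -- off cluster-wide
    ALTER DATABASE prod SET allowed_authentication_failures = 5;  -- opt in per database
    ALTER ROLE app_pool SET allowed_authentication_failures = 0;  -- exempt a pooled role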



and, further, that we not track failed
logins in cases where it's disabled (which might well be the default; I
don't think we want to add this overhead for systems which have lots of
recurring logins, such as application users that aren't doing
pooling).


Yeah, presumably if allowed_authentication_failures <= 0 then we don't 
bother with the check at all.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-05 Thread Jim Nasby

On 3/4/15 9:10 AM, Robert Haas wrote:

On Wed, Feb 25, 2015 at 5:06 PM, Jim Nasby jim.na...@bluetreble.com wrote:

Could the large allocation[2] for the dead tuple array in lazy_space_alloc
cause problems with linux OOM? [1] and some other things I've read indicate
that a large mmap will count towards total system memory, including
producing a failure if overcommit is disabled.


I believe that this is possible.


Would it be worth avoiding the full size allocation when we can?


Maybe.  I'm not aware of any evidence that this is an actual problem
as opposed to a theoretical one.  vacrelstats->dead_tuples is limited
to a 1GB allocation, which is not a trivial amount of memory, but it's
not huge, either.  But we could consider changing the representation
from a single flat array to a list of chunks, with each chunk capped
at say 64MB.  That would not only reduce the amount of memory that we


I was thinking the simpler route of just repalloc'ing... the memcpy 
would suck, but much less so than the extra index pass. 64M gets us 11M 
tuples, which probably isn't very common.
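
A quick arithmetic check of those figures (assuming the 6-byte
ItemPointerData entries that the dead-tuple array stores):

    SELECT (64 * 1024 * 1024) / 6            AS tuples_per_64mb_chunk,  -- ~11.2M
           (1024::bigint * 1024 * 1024) / 6  AS tuples_per_1gb;         -- 178,956,970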



needlessly allocate, but would allow autovacuum to make use of more
than 1GB of maintenance_work_mem, which it looks like it currently
can't.  I'm not sure that's a huge problem right now either, because


I'm confused... how is autovacuum special in this regard? Each worker 
can use up to 1G, just like a regular vacuum, right? Or are you just 
saying getting rid of the 1G limit would be good?



it's probably rare to vacuum a table with more than 1GB / 6 =
178,956,970 dead tuples in it, but it would certainly suck if you did
and if the current 1GB limit forced you to do multiple vacuum passes.


Yeah, with 100+ GB machines not that uncommon today, perhaps it's worth 
significantly upping this.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] a2e35b53 added unused variable to ConversionCreate()

2015-03-05 Thread Alvaro Herrera
Peter Geoghegan wrote:
 I'm seeing this on the master branch tip when building at -O2:
 
 pg_conversion.c: In function ‘ConversionCreate’:
 pg_conversion.c:53:6: error: variable ‘oid’ set but not used
 [-Werror=unused-but-set-variable]
   Oid   oid;
   ^
 cc1: all warnings being treated as errors
 
 I think that commit a2e35b53 did this.

Obviously the commit did not introduce an unused variable, but instead
made another variable take its place as the function's return value.  In
an assert-enabled build it was used by an assert, which is why I didn't
notice the problem.  I removed the assert and the variable so this
should be fixed now.

Thanks for reporting.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Idea: closing the loop for pg_ctl reload

2015-03-05 Thread Jim Nasby

On 3/4/15 7:13 PM, Jan de Visser wrote:

On March 4, 2015 11:08:09 PM Andres Freund wrote:

Let's get the basic feature (notification of failed reloads) done
first. That will be required with or without including the error
message.  Then we can get more fancy later, if somebody really wants to
invest the time.


Except for also fixing pg_reload_conf() to pay attention to what happens with
postmaster.pid, the patch I submitted does exactly what you want I think.

And I don't mind spending time on stuff like this. I'm not smart enough to deal
with actual, you know, database technology.


Yeah, let's at least get this wrapped and we can see about improving it.

I like the idea of doing a here-doc or similar in the .pid, though I 
think it'd be sufficient to just prefix all the continuation lines with 
a tab. An uglier option would be just stripping the newlines out.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] MD5 authentication needs help

2015-03-05 Thread Stephen Frost
* Jim Nasby (jim.na...@bluetreble.com) wrote:
 On 3/5/15 2:17 PM, Stephen Frost wrote:
 * Jim Nasby (jim.na...@bluetreble.com) wrote:
 I'm all for it, though I would ask that we provide a way for superusers
 to delegate the ability to reset locked accounts to non-superusers.
 
 I'd want to think about it a bit more before settling on using pg_authid
 
 I guess it's a question of how durable we want it to be. We could
 conceivably keep it in shared memory and let it wipe on a crash.
 
 But we already have code that ignores MVCC on a catalog table (IIRC
 for updating pg_class stats after vacuum) so the pattern is there. I
 don't see that we need more sophistication than that...

I'm not sure we should jump to that immediately..

 to track the data.  In any case, I do think we need a way to disable
 this ability for certain roles
 
 In the interest of something for this release... do we really need
 that? My thought is we just special-case the postgres user and be
 done with it. Though, if there's some other way to reset an account
 from the shell, no need to even special case postgres.

I don't think this is going to happen for 9.5 unless someone shows up
with code in the *very* short term..  Further, realistically, we would
want to design this properly and not just hack something together.

 Though, I guess if we just follow the normal GUC behavior of
 allowing per-database and -user overrides it wouldn't be that hard.

Yes, using the GUC-based approach would allow users to be excluded from
an overall (or per-database) policy.

Thanks,

Stephen




Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-03-05 Thread Peter Geoghegan
On Wed, Mar 4, 2015 at 5:18 PM, Peter Geoghegan p...@heroku.com wrote:
 Attached patch series forms what I'm calling V3.0 of the INSERT ... ON
 CONFLICT IGNORE/UPDATE feature. (Sorry about all the threads. I feel
 this development justifies a new thread, though.)

Bruce Momjian kindly made available a server for stress-testing [1].
I'm using jjanes_upsert for this task (I stopped doing this for a
little while, and was in need of a new server).

At the very highest client count I'm testing (128), I see unprincipled
deadlocks for the exclusion constraint case very infrequently:


2015-03-05 14:09:36 EST [ 64987901 ]: ERROR:  deadlock detected
2015-03-05 14:09:36 EST [ 64987901 ]: DETAIL:  Process 7044 waits for
ShareLock on promise tuple with token 1 of transaction 64987589;
blocked by process 7200.
Process 7200 waits for ShareLock on transaction 64987901;
blocked by process 7044.
Process 7044: insert into upsert_race_test (index, count)
values ('541','-1') on conflict
  update set count=TARGET.count + EXCLUDED.count
  where TARGET.index = EXCLUDED.index
  returning count
Process 7200: insert into upsert_race_test (index, count)
values ('541','-1') on conflict
  update set count=TARGET.count + EXCLUDED.count
  where TARGET.index = EXCLUDED.index
  returning count
2015-03-05 14:09:36 EST [ 64987901 ]: HINT:  See server log for query details.
2015-03-05 14:09:36 EST [ 64987901 ]: STATEMENT:  insert into
upsert_race_test (index, count) values ('541','-1') on conflict
  update set count=TARGET.count + EXCLUDED.count
  where TARGET.index = EXCLUDED.index
  returning count



(Reminder: Exclusion constraints doing UPSERTs, and not just IGNOREs,
are artificial; this has been enabled within the parser solely for the
benefit of this stress-testing. Also, the B-Tree AM does not have nor
require livelock insurance).

This only happens after just over 30 minutes, while consistently doing
128 client exclusion constraint runs. This is pretty close to the most
stressful thing that you could throw at the implementation, so that's
really not too bad.
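
For reference, a guess at the kind of schema such a run exercises (the
real jjanes_upsert tables may differ; btree_gist is assumed so that an
integer column can carry an exclusion constraint, and the ON CONFLICT
syntax is the patch's experimental form taken from the log above):

    CREATE EXTENSION IF NOT EXISTS btree_gist;

    CREATE TABLE upsert_race_test (
        index int,
        count int,
        EXCLUDE USING gist (index WITH =)
    );

    INSERT INTO upsert_race_test (index, count) VALUES ('541', '-1')
    ON CONFLICT
      UPDATE SET count = TARGET.count + EXCLUDED.count
      WHERE TARGET.index = EXCLUDED.index
      RETURNING count;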

I believe that this regression occurred as a direct result of adding
livelock insurance. Basically, we cannot be 100% sure that the
interleaving of WAL-logged things within and across transactions won't
be such that the only, oldest session that gets to wait in the
second stage (the second possible call to
check_exclusion_or_unique_constraint(), from ExecInsertIndexTuples())
will really be the oldest XID. Another *older* xact could just get
in ahead of us, waiting on our promise tuple as we wait on its xact
end (maybe it updates some third tuple that we didn't see that has
already committed...not 100% sure). Obviously XID assignment order
does not guarantee that things like heap insertion and index tuple
insertion occur in serial XID order, especially with confounding
factors like super deletion. And so, every once in a long while we
deadlock.

Now, the very fact that this happens at all actually demonstrates the
need for livelock insurance, IMV. The fact that we reliably
terminate is *reassuring*, because livelocks are in general a
terrifying possibility. We cannot *truly* solve the unprincipled
deadlock problem without adding livelocks, I think. But what we have
here is very close to eliminating unprincipled deadlocks, while not
also adding any livelocks, AFAICT. I'd argue that that's good enough.

Of course, when I remove livelock insurance, the problem ostensibly
goes away (I've waited on such a stress test session for a couple of
hours now, so this conclusion seems very likely to be correct). I
think that we should do nothing about this, other than document it as
possible in our comments on livelock insurance. Again, it's very
unlikely to be a problem in the real world, especially since B-Trees
are unaffected.

Also, I should point out that one of those waiters doesn't look like
an insertion-related wait at all: "7200 waits for ShareLock on
transaction 64987901; blocked by process 7044". Looks like row locking
is an essential part of this deadlock, and ordinarily that isn't even
possible for exclusion constraints (unless the patch is hacked to make
the parser accept exclusion constraints for an UPSERT, as it has been
here). Not quite sure what exact XactLockTableWait() call did this,
but don't think it was any of them within
check_exclusion_or_unique_constraint(), based on the lack of details
(TID and so on are not shown).

[1] http://momjian.us/main/blogs/pgblog/2012.html#January_20_2012
-- 
Peter Geoghegan




Re: [HACKERS] Bug in pg_dump

2015-03-05 Thread Michael Paquier
On Wed, Mar 4, 2015 at 2:03 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Wed, Mar 4, 2015 at 6:48 AM, Peter Eisentraut pete...@gmx.net wrote:
 - set up basic scaffolding for TAP tests in src/bin/pg_dump

 Agreed.

 - write a Perl function that can create an extension on the fly, given
 name, C code, SQL code

 I am perplexed about that. Where would the SQL code or C code be stored?
 In the .pl script itself? I cannot really see the advantage of
 automatically generating the skeleton of an extension from some C or SQL
 code compared to storing the extension statically as-is. Adding
 those extensions to src/test/modules is out of scope, to avoid bloating it,
 so we could for example add such test extensions in t/extensions/ or
 similar, have prove_check scan this folder, and then install those
 extensions in the temporary installation.

 - add to the proposed t/001_dump_test.pl code to write the extension
 - add that test to the pg_dump test suite
 Eventually, the dump-and-restore routine could also be refactored, but
 as long as we only have one test case, that can wait.

 Agreed on all those points.

Please note that I have created a new thread especially for this purpose here:
http://www.postgresql.org/message-id/CAB7nPqRx=zmbfjyjrwhguhhqk__8m+wd+p95cenjtmhxxr7...@mail.gmail.com
Perhaps we should move there this discussion as it is rather
independent of the problem that has been reported.

Regards,
-- 
Michael




Re: [HACKERS] object description for FDW user mappings

2015-03-05 Thread Amit Langote
On 06-03-2015 AM 01:32, Tom Lane wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
  appendStringInfo(buffer, _("user mapping for %s in server %s"),
                   usename, srv->servername);
 
 +1 for the concept, but to be nitpicky, "in" doesn't seem like the right
 word here.  "on server" would read better to me; or perhaps "at server".
 

One more option may be "for server" (reading the doc for CREATE USER MAPPING)

Thanks,
Amit





Re: [HACKERS] object description for FDW user mappings

2015-03-05 Thread Amit Langote
On 06-03-2015 AM 09:18, Amit Langote wrote:
 On 06-03-2015 AM 01:32, Tom Lane wrote:

 +1 for the concept, but to be nitpicky, "in" doesn't seem like the right
 word here.  "on server" would read better to me; or perhaps "at server".

 
 One more option may be "for server" (reading the doc for CREATE USER MAPPING)

Oh, I see it's been done that way already.

Thanks,
Amit





Re: [HACKERS] object description for FDW user mappings

2015-03-05 Thread Amit Langote
On 06-03-2015 AM 09:30, Tom Lane wrote:
 Amit Langote langote_amit...@lab.ntt.co.jp writes:
 
 One more option may be "for server" (reading the doc for CREATE USER MAPPING)
 
 Hm, but then you'd have "user mapping for foo for server bar", which
 doesn't read so nicely either.
 

Yeah, I had totally missed the "for foo" part; I was thinking of it as "of foo".

By the way, in this case, is foo the name/id of a local user or does it
really refer to some foo on the remote server?

Thanks,
Amit





Re: [HACKERS] Weirdly pesimistic estimates in optimizer

2015-03-05 Thread Jim Nasby

On 2/28/15 12:01 PM, David Kubečka wrote:

With 'random_fk_dupl':
  ->  Index Scan using facts_fk_idx on facts  (cost=0.42..5.75
rows=100 width=15) (actual time=0.009..0.117 rows=98 loops=100)
With 'random_fk_uniq':
  ->  Index Scan using facts_fk_idx on facts (cost=0.42..214.26
rows=100 width=15) (actual time=0.007..0.109 rows=98 loops=100)

I have read the optimizer README file and also looked briefly at the
code, but this seems to be something not related to particular
implementation of algorithm (e.g. nested loop). Perhaps it's the way how
cost estimates are propagated down (or sideways? that would be weird...)
the query tree. But I am really not sure, since this is my first time
looking at the optimizer code base. I should also add that I have
reproduced this behaviour for all versions of Pg from 9.2 up to current
devel.


This got answered on one of the other lists, right?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] object description for FDW user mappings

2015-03-05 Thread Tom Lane
Amit Langote langote_amit...@lab.ntt.co.jp writes:
 By the way, in this case, is foo the name/id of a local user or does it
 really refer to some foo on the remote server?

It's the name of a local user.  I see your point that somebody might
misread this as suggesting that it's a remote username, but not sure
that there's anything great we can do here to disambiguate that.
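
To make the distinction concrete, a minimal sketch assuming postgres_fdw
(all names are illustrative): the mapped role is local, and any remote
account name only appears in the mapping's OPTIONS.

    CREATE SERVER bar FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'remote.example.com', dbname 'remotedb');

    CREATE USER MAPPING FOR foo SERVER bar
        OPTIONS (user 'remote_account', password 'secret');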

regards, tom lane




Re: [HACKERS] Join push-down support for foreign tables

2015-03-05 Thread Kouhei Kaigai
 Actually val and val2 come from public.lt on the r side, but as you say
 it's too difficult to know that from the EXPLAIN output.  Do you have any
 idea to make the Output item more readable?

A fundamental reason why we need to have symbolic aliases here is that
postgres_fdw has the remote query in cstring form. It makes the
implementation complicated to deconstruct/reconstruct a query that was
once constructed at the underlying foreign-path level.
If ForeignScan kept the items needed to construct the remote query in
expression node form (with construction of the remote query probably
delayed to the beginning of the executor), we would be able to construct
a more human-readable remote query.

However, I don't recommend working on this big refactoring stuff
within the scope of the join push-down support project.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Shigeru Hanada
 Sent: Thursday, March 05, 2015 10:00 PM
 To: Ashutosh Bapat
 Cc: Kaigai Kouhei(海外 浩平); Robert Haas; PostgreSQL-development
 Subject: Re: [HACKERS] Join push-down support for foreign tables
 
 Hi Ashutosh, thanks for the review.
 
 2015-03-04 19:17 GMT+09:00 Ashutosh Bapat ashutosh.ba...@enterprisedb.com:
  In create_foreignscan_path() we have lines like -
  1587 pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
  1588                                                        required_outer);
  Now, that the same function is being used for creating foreign scan paths
  for joins, we should be calling get_joinrel_parampathinfo() on a join rel
  and get_baserel_parampathinfo() on base rel.
 
 Got it.  Please let me check the difference.
 
 
  The patch seems to handle all the restriction clauses in the same way. There
  are two kinds of restriction clauses - a. join quals (specified using ON
  clause; optimizer might move them to the other class if that doesn't affect
  correctness) and b. quals on join relation (specified in the WHERE clause,
  optimizer might move them to the other class if that doesn't affect
  correctness). The quals in a are applied while the join is being computed
  whereas those in b are applied after the join is computed. For example,
  postgres=# select * from lt;
   val | val2
  -----+------
     1 |    2
     1 |    3
  (2 rows)
 
  postgres=# select * from lt2;
   val | val2
  -----+------
     1 |    2
  (1 row)
 
  postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2);
   val | val2 | val | val2
  -----+------+-----+------
     1 |    2 |   1 |    2
     1 |    3 |     |
  (2 rows)
 
  postgres=# select * from lt left join lt2 on (true) where (lt.val2 =
  lt2.val2);
   val | val2 | val | val2
  -----+------+-----+------
     1 |    2 |   1 |    2
  (1 row)
 
  The difference between these two kinds is evident in the case of outer joins;
  for inner joins the optimizer puts all of them in class b. The remote query
  sent to the foreign server has all of them in the ON clause. Consider foreign
  tables ft1 and ft2 pointing to local tables on the same server.
  postgres=# \d ft1
         Foreign table "public.ft1"
   Column |  Type   | Modifiers | FDW Options
  --------+---------+-----------+-------------
   val    | integer |           |
   val2   | integer |           |
  Server: loopback
  FDW Options: (table_name 'lt')
 
  postgres=# \d ft2
         Foreign table "public.ft2"
   Column |  Type   | Modifiers | FDW Options
  --------+---------+-----------+-------------
   val    | integer |           |
   val2   | integer |           |
  Server: loopback
  FDW Options: (table_name 'lt2')
 
  postgres=# explain verbose select * from ft1 left join ft2 on (ft1.val2 =
  ft2.val2) where ft1.val + ft2.val  ft1.val2 or ft2.val is null;
 
                                   QUERY PLAN
  ---------------------------------------------------------------------------
   Foreign Scan  (cost=100.00..125.60 rows=2560 width=16)
 Output: val, val2, val, val2
 Remote SQL: SELECT r.a_0, r.a_1, l.a_0, l.a_1 FROM (SELECT val, val2 FROM
  public.lt2) l (a_0, a_1) RIGHT JOIN (SELECT val, val2 FROM public.lt) r (a
  _0, a_1)  ON r.a_0 + l.a_0)  r.a_1) OR (l.a_0 IS NULL))) AND ((r.a_1 =
  l.a_1))
  (3 rows)
 
  The result is then wrong
  postgres=# select * from ft1 left join ft2 on (ft1.val2 = ft2.val2) where
  ft1.val + ft2.val  ft1.val2 or ft2.val is null;
   val | val2 | val | val2
  -----+------+-----+------
     1 |    2 |     |
     1 |    3 |     |
  (2 rows)
 
  which should match the result obtained by substituting local tables for
  foreign ones
  postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2) where
  lt.val + lt2.val  lt.val2 or lt2.val is null;
   val | val2 | val | val2
  -----+------+-----+------
     1 |    3 |     |
  (1 row)
 
  Once we start distinguishing the two 

Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission

2015-03-05 Thread Peter Eisentraut
On 3/5/15 9:42 AM, Greg Stark wrote:
 Well if you want to read the file as is you can do so using the file
 reading functions which afaik were specifically intended for the
 purpose of writing config editing tools.

Sure, but those things are almost never installed by default, and I
don't want to install them if they provide easy write access, and they
don't help you find the actual file.  This proposal is much nicer.

 Joining against other catalog tables may be kind of exaggerated but if
 I have data in an SQL view I certainly expect to be able to write
 WHERE clauses to find the rows I want without having to do string
 parsing. If it were a comma-delimited string I have to do something
 like WHERE users LIKE '%,username,%' but then that's not good enough
 since it may be the first or last and the username itself may contain
 white-space or a comma etc etc.

The point is, it should be one or the other (or both), not something in
the middle.

It's either a textual representation of the file or a semantic one.  If
it's the latter, then all user names, group names, and special key words
need to be resolved in some way that they cannot be confused with
unfortunately named user names.  And there needs to be a way that we can
add more in the future.
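
A toy sketch of the difference (table and column names here are
hypothetical, not the proposed catalog):

    CREATE TEMP TABLE hba_demo (
        line_no    int,
        users_text text,     -- flat, file-like representation
        users_arr  text[]    -- semantic representation
    );

    INSERT INTO hba_demo VALUES
        (1, 'alice,bob smith', ARRAY['alice', 'bob smith']);

    -- String matching is fragile (first/last entries, spaces, quoting):
    SELECT * FROM hba_demo WHERE users_text LIKE '%,alice,%';   -- misses the row

    -- Array membership just works:
    SELECT * FROM hba_demo WHERE 'alice' = ANY (users_arr);     -- finds the row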





Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-05 Thread Fujii Masao
On Thu, Mar 5, 2015 at 10:28 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2015-03-05 12:14:04 +, Syed, Rahila wrote:
 Please find attached  a patch. As discussed, flag to denote
 compression and presence of hole in block image has been added in
 XLogRecordImageHeader rather than block header.

 FWIW, I personally won't commit it with things done that way. I think
 it's going the wrong way, leading to a harder to interpret and less
 flexible format.  I'm not going to further protest if Fujii or Heikki
 commit it this way though.

I'm pretty sure that we can discuss the *better* WAL format even after
committing this patch.

Regards,

-- 
Fujii Masao




Re: [HACKERS] object description for FDW user mappings

2015-03-05 Thread Tom Lane
Amit Langote langote_amit...@lab.ntt.co.jp writes:
 On 06-03-2015 AM 01:32, Tom Lane wrote:
 +1 for the concept, but to be nitpicky, "in" doesn't seem like the right
 word here.  "on server" would read better to me; or perhaps "at server".

 One more option may be "for server" (reading the doc for CREATE USER MAPPING)

Hm, but then you'd have "user mapping for foo for server bar", which
doesn't read so nicely either.

regards, tom lane




[HACKERS] Rethinking pg_dump's function sorting code

2015-03-05 Thread Tom Lane
In bug #12832 Marko Tiikkaja points out that commit
7b583b20b1c95acb621c71251150beef958bb603 created a rather unnecessary
dump failure hazard, since it applies pg_get_function_identity_arguments()
to every function in the database, even those that won't get dumped.
I think we should fix this by getting rid of pg_dump's use of that
function altogether.  A low-tech way to sort functions of identical names
would be to compare argument type OIDs, as in the attached simple patch.
If people feel it's important to avoid depending on numerical OID order,
we could instead look up type names locally and compare them as in the
attached less-simple patch.  (Both patches neglect reverting the data
collection aspects of the prior commit, since that's mechanical; the only
interesting part is what we'll do to sort.)

Neither patch will exactly preserve the sort behavior of the current
code, but I don't think that's important.

Comments?

regards, tom lane

diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 4b9bba0..816c9f0 100644
*** a/src/bin/pg_dump/pg_dump_sort.c
--- b/src/bin/pg_dump/pg_dump_sort.c
*** DOTypeNameCompare(const void *p1, const 
*** 291,303 
  	{
  		FuncInfo   *fobj1 = *(FuncInfo *const *) p1;
  		FuncInfo   *fobj2 = *(FuncInfo *const *) p2;
  
  		cmpval = fobj1->nargs - fobj2->nargs;
  		if (cmpval != 0)
  			return cmpval;
! 		cmpval = strcmp(fobj1->proiargs, fobj2->proiargs);
! 		if (cmpval != 0)
! 			return cmpval;
  	}
  	else if (obj1-objType == DO_OPERATOR)
  	{
--- 291,307 
  	{
  		FuncInfo   *fobj1 = *(FuncInfo *const *) p1;
  		FuncInfo   *fobj2 = *(FuncInfo *const *) p2;
+ 		int			i;
  
  		cmpval = fobj1->nargs - fobj2->nargs;
  		if (cmpval != 0)
  			return cmpval;
! 		for (i = 0; i < fobj1->nargs; i++)
! 		{
! 			cmpval = oidcmp(fobj1->argtypes[i], fobj2->argtypes[i]);
! 			if (cmpval != 0)
! 				return cmpval;
! 		}
  	}
  	else if (obj1-objType == DO_OPERATOR)
  	{
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 4b9bba0..de9407c 100644
*** a/src/bin/pg_dump/pg_dump_sort.c
--- b/src/bin/pg_dump/pg_dump_sort.c
*** DOTypeNameCompare(const void *p1, const 
*** 291,303 
  	{
  		FuncInfo   *fobj1 = *(FuncInfo *const *) p1;
  		FuncInfo   *fobj2 = *(FuncInfo *const *) p2;
  
  		cmpval = fobj1->nargs - fobj2->nargs;
  		if (cmpval != 0)
  			return cmpval;
! 		cmpval = strcmp(fobj1->proiargs, fobj2->proiargs);
! 		if (cmpval != 0)
! 			return cmpval;
  	}
  	else if (obj1-objType == DO_OPERATOR)
  	{
--- 291,313 
  	{
  		FuncInfo   *fobj1 = *(FuncInfo *const *) p1;
  		FuncInfo   *fobj2 = *(FuncInfo *const *) p2;
+ 		int			i;
  
  		cmpval = fobj1->nargs - fobj2->nargs;
  		if (cmpval != 0)
  			return cmpval;
! 		for (i = 0; i < fobj1->nargs; i++)
! 		{
! 			TypeInfo   *argtype1 = findTypeByOid(fobj1->argtypes[i]);
! 			TypeInfo   *argtype2 = findTypeByOid(fobj2->argtypes[i]);
! 
! 			if (argtype1 && argtype2)
! 			{
! 				cmpval = strcmp(argtype1->dobj.name, argtype2->dobj.name);
! 				if (cmpval != 0)
! 					return cmpval;
! 			}
! 		}
  	}
  	else if (obj1-objType == DO_OPERATOR)
  	{



Re: [HACKERS] object description for FDW user mappings

2015-03-05 Thread David G Johnston
Tom Lane wrote:
 Amit Langote <Langote_Amit_f8@.co> writes:
 By the way, in this case, is foo the name/id of a local user or does it
 really refer to some foo on the remote server?
 
 It's the name of a local user.  I see your point that somebody might
 misread this as suggesting that it's a remote username, but not sure
 that there's anything great we can do here to disambiguate that.

Yeah, clarifying that point seems to add verbosity for little gain.

On the message aspect, is it against style to use composite notation in a
case like this?

user mapping (%s, %s)

It is basically a single, if compound, noun so adding in/at/to doesn't feel
right...

David J.









Re: [HACKERS] anyarray

2015-03-05 Thread Peter Eisentraut
On 3/4/15 10:28 PM, Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
 On 2/13/15 10:20 AM, Teodor Sigaev wrote:
 Some users of the intarray contrib module wish to use its features with
 other kinds of arrays, not only the int4 type. The suggested module
 generalizes intarray over other (not all) types of pgsql.
 
 I think this module should be merged with the intarray module.  Having
 two modules with very similar functionality would be confusing.
 
 Perhaps.  I think it would be hard to remove intarray without breaking
 things for existing users of it; even if the functionality remains under
 another name.  And surely we don't want to generalize intarray while
 keeping that same name.  So it might be hard to get to a clean solution.

Maybe we could have the anyarray module (name TBD) install two
extensions: one of its own, and one that is named intarray that just
pulls anyarray in as a dependency.

There are more fundamental questions to be answered about the
functionality of this proposal first, however.





Re: [HACKERS] Configurable location for extension .control files

2015-03-05 Thread Jim Nasby

On 3/4/15 1:41 AM, Oskari Saarenmaa wrote:

I'm interested in this because it could potentially make it possible to
install SQL extensions without OS access. (My understanding is there's
some issue with writing the extension files even if LIBDIR is writable
by the OS account).

I'm not sure this patch makes extensions without OS access any easier to
implement; you still need to write the files somewhere, and someone
needs to set up the cluster with a nonstandard extension path.


Right, I wasn't expecting it to work out of the box.

Admittedly I'm foggy on what the exact problem with pushing a file into 
that directory via SQL is, so maybe it still won't help.



I don't think anyone should be packaging and shipping PG with
extension_directory set, but we really should allow it for superusers
and postgresql.conf so people can test extensions without hacks like
this:https://github.com/ohmu/pgmemcache/blob/master/localtests.sh#L23


FWIW, what I'd like to see is breaking the cluster setup/teardown capability
in pg_regress out into its own tool. It wouldn't solve the extension test
problem, but it would eliminate the need for most of what that script is
doing, and it would do it more robustly. It would make it very easy to
unit test things with frameworks other than pg_regress.

This would certainly be useful.  I can try to do something about it if
we get configurable extension path supported.  The patch is now in
https://commitfest.postgresql.org/5/170/


Awesome! Let me know if I can help. Or I may start on it if I can get 
some other stuff off my plate.


By the way, after thinking about this a little more, I think a utility 
for standing up temporary Postgres clusters would be very welcome by 
power users. It's a very common need for QA environments. Originally I 
was thinking about it in terms of things like extension testing, but now 
I realize there's a much larger audience than that. So I definitely 
think this should be a stand-alone shell command (pg_temp_cluster?), and 
either have pg_regress call it or (probably more logical) add it to the 
make files as a dependency for make check (make 
temp-cluster/remove-temp-cluster or similar).

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] pg_upgrade and rsync

2015-03-05 Thread Bruce Momjian
On Thu, Mar  5, 2015 at 10:55:28AM +0300, Vladimir Borodin wrote:
 You are correct that a pg_controldata file is copied over that has
 wal_level=minimal, but that should not be a problem.
 
 
 I suppose this is the root cause of why the replica does not start as hot
 standby.
 It is enough to start it as warm standby, but not hot standby.
 See CheckRequiredParameterValues function in xlog.c which is called inside of
 StartupXLOG function.

Yes, you are correct.  I spent all day building a test harness so I
could automate this setup and test various failures.  I was able to
reproduce your failure, and you are correct that the proper fix is to
set wal_level=hot_standby on the new master, and then start and stop the
new cluster just before rsync.
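
A quick way to confirm the new master's setting during that start/stop
cycle, before the final rsync (sketch):

    SHOW wal_level;   -- expect 'hot_standby' here, not 'minimal'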

The root cause is that pg_upgrade calls pg_resetxlog -o on the new
cluster _after_ the new cluster stopped for the final time, so rsync is
copying the incorrect pg_controldata wal_level value.  Also, even if
pg_resetxlog preserved wal_level in the control file, there is no
guarantee that the user configured the new cluster's wal_level for
hot_standby anyway.

What I have done is to update the pg_upgrade instructions to add this
required step.  Updated doc patch attached.  (I also added the --delete
flag to rsync.)  Thanks so much for your detailed report.

 But it could not be done with the --size-only key, because the control file is
 of fixed size and rsync would skip it. Or maybe everything should be copied
 with --size-only and the control file should be copied without this option.
 
 
 Well, what happens is that there is no _new_ standby pg_controldata
 file, so it is copied fully from the new master.  Are you running initdb
 to create the new standby --- you shouldn't be doing that as the rsync
 will do that for you.  
 
 
 No, I don’t. The scenario of the problem with copying control-file was in case
 when I:
 1. ran pg_upgrade on master and got control-file with wal_level = minimal,
 2. did rsync --size-only to replica (and it got this control-file with
 wal_level = minimal),
 3. started and stopped postgres on master to get «good» control-file with
 «wal_level = hot_standby»,
 4. did rsync --size-only to replica one more time. And this time control-file
 is not copied because of the same size of control-file.
 
 Actually, if you don’t do step 2, everything works as expected. Sorry for
 bothering you.

Ah, yes, I think doing rsync twice is never a good suggestion.  It can
lead to too many failures.  Doing the start/stop before rsync seems like
the best solution.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB                           http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
new file mode 100644
index 07ca0dc..e25e0d0
*** a/doc/src/sgml/backup.sgml
--- b/doc/src/sgml/backup.sgml
*** tar -cf backup.tar /usr/local/pgsql/data
*** 438,445 
    Another option is to use <application>rsync</application> to perform a file
    system backup.  This is done by first running <application>rsync</application>
    while the database server is running, then shutting down the database
!   server just long enough to do a second <application>rsync</application>.  The
!   second <application>rsync</application> will be much quicker than the first,
    because it has relatively little data to transfer, and the end result
    will be consistent because the server was down.  This method
    allows a file system backup to be performed with minimal downtime.
--- 438,447 ----
    Another option is to use <application>rsync</application> to perform a file
    system backup.  This is done by first running <application>rsync</application>
    while the database server is running, then shutting down the database
!   server long enough to do an <command>rsync --checksum</command>.
!   (<option>--checksum</option> is necessary because <command>rsync</command> only
!   has file modification-time granularity of one second.)  The
!   second <application>rsync</application> will be quicker than the first,
    because it has relatively little data to transfer, and the end result
    will be consistent because the server was down.  This method
    allows a file system backup to be performed with minimal downtime.
diff --git a/doc/src/sgml/pgupgrade.sgml b/doc/src/sgml/pgupgrade.sgml
new file mode 100644
index e1cd260..56cb45d
*** a/doc/src/sgml/pgupgrade.sgml
--- b/doc/src/sgml/pgupgrade.sgml
*** NET STOP postgresql-8.4
*** 315,320 
--- 315,324 
  NET STOP postgresql-9.0
  </programlisting>
  </para>
+ 
+ <para>
+  Log-shipping standby servers can remain running until a later step.
+ </para>
 </step>
  
 <step>
*** pg_upgrade.exe
*** 399,404 
--- 403,535 
 </step>
  
 <step>
+ <title>Upgrade any Log-Shipping Standby Servers</title>
+ 
+ <para>
+  If you have Log-Shipping Standby Servers (<xref
+  linkend="warm-standby">), follow these steps to upgrade them (before
+

Re: [HACKERS] Weirdly pesimistic estimates in optimizer

2015-03-05 Thread Jim Nasby

On 3/5/15 7:58 PM, Jim Nasby wrote:

This got answered on one of the other lists, right?


That was supposed to be off-list. I'll answer my own question: yes.

Sorry for the noise. :(
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] MD5 authentication needs help

2015-03-05 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
 One way to fix #2 would be to use a per-user or per-cluster counter for
 the session salt, rather than a random number --- that would change
 replays from ~16k to 4 billion, with no wire protocol change needed.

I'm not against doing that if we decide to ignore the pg_authid-based
vector (which we could certainly do), but given the relatively poor
hashing algorithm we use and the small salt, along with the commonly
used practice of using TLS to address network-based attacks, I'm not
sure it's really worth it.

Note that changing the algorithm or the salt would require a wireline
protocol change and therefore isn't interesting to consider as, if we're
going to do that, we should be moving to a vetted solution instead.

Thanks!

Stephen




Re: [HACKERS] MD5 authentication needs help

2015-03-05 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
 On Wed, Mar  4, 2015 at 04:19:00PM -0500, Stephen Frost wrote:
   Hm, well, "don't change the wireline protocol" could be another wanna-have
   ... but if we want to stop using MD5, that's not really a realistic goal
   is it?
  
  I'm trying to address both sides of the issue- improve the current
  situation without breaking existing clients AND provide a new auth
  method and encourage everyone to move to it as soon as they're able.  We
  can't simply deprecate md5 as that would break pg_upgrade for any users
  who are currently using it.
 
 Actually, pg_upgrade uses 'trust' and a private socket file, at least on
 Unix.  Of course, post-upgrade, users would have trouble logging in.

The post-upgrade trouble logging in is what I was getting at.  Of
course, that issue exists with a pg_dump-based approach also, so my
reference to pg_upgrade above wasn't accurate and it should have been
"would break upgrades for any users who are currently using it."

  This half of the discussion has been all about improving md5 without
  breaking the wireline protocol or existing users.
  
  The other half of the discussion is about implementing a new
  password-based authentication based on one of the vetted authentication
  protocols already in use today (SCRAM or SRP, for example).  Using those
  new authentication protocols would include a move off of the MD5 hashing
  function, of course.  This would also mean breaking the on-disk hash,
  but that's necessary anyway because what we do there today isn't secure
  either and no amount of futzing is going to change that.
 
 While I don't like the requirement to use TLS to improve MD5 fix, I also
 don't like the idea of having users go through updating all these
 passwords only to have us implement the _right_ solution in the next
 release.  I don't see why it is useful to be patching up MD5 with a TLS
 requirement when we know they should be moving to SCRAM or SRP.  If the
 MD5 change was transparent to users/admins, that would be different.

So, TLS *isn't* an actual requirement with the approach that I've
outlined, it's just that if you're not using it then you have a somewhat
larger risk of successful network-based attacks, but then, if you're
not using TLS then you're unlikely to be worried about that vector.

  I've got nearly zero interest in trying to go half-way on this by
  designing something that we think is secure which has had no external
  review or anyone else who uses it.  Further, going that route makes me
  very nervous that we'd decide on certain compromises in order to make
  things easier for users without actually realising the problems with
  such an approach (eg: well, if we use hack X we wouldn't have to
  change what is stored on disk and therefore we wouldn't break
 
 I am not happy to blindly accept a new security setup without
 understanding exactly what it is trying to fix, which is why I am asking
 all these questions.

I certainly didn't intend to suggest that anyone blindly accept a new
security setup.  I don't mind the questions but I do get a bit
frustrated at the suggestions that we can fix the md5 auth mechanism
by simply increasing the salt or putting in a better hashing algorithm.
Those are not solutions to this authentication challenge- for that, we
need to look at SCRAM or SRP, where RFCs have been reviewed and
published which outline exactly how they work and why they work the way
they do.  I'd certainly encourage everyone interested in this to go look
at those RFCs.

SCRAM - https://tools.ietf.org/html/rfc5802
SRP - https://www.ietf.org/rfc/rfc2945.txt

  of a transition would be a lot easier on our users and I'd definitely
  recommend that we add that with this new authentication mechanism, to
  address this kind of issue in the future (not to mention that's often
  asked for..).  Password complexity would be another thing we should
  really add and is also often requested.
 
 I agree our password management could use improvement.

I'm *very* glad to hear that.  When I brought it up 10-or-so years ago,
there was very little interest in these kinds of improvements.

  Frankly, in the end, I don't see us being able to produce something in
  time for this release unless someone can be dedicated to the effort over
  the next couple of months, and therefore I'd prefer to improve the
  current issues with md5 without breaking the wireline protocol than
  simply do nothing (again).
 
 I am not sure why we have to shove something into 9.5 --- as you said,
 this issue has been known about for 10+ years.

It'd be nice to show that we're being responsive when issues are brought
up, even if they've existed for a long time (because, in the minds of
some, that we've done nothing in 10 years doesn't exactly reflect very
well on us :( ).

 I think we should do what we can to improve MD5 in cases where the
 user/admin needs to take no action, and then move to add a better
 authentication method.

Great, I 

Re: [HACKERS] object description for FDW user mappings

2015-03-05 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 When commit cae565e503 introduced FDW user mappings, it used this in
 getObjectDescription for them:
   appendStringInfo(buffer, _("user mapping for %s"), usename);

 This was later mostly copied (by yours truly) as object identity by
 commit f8348ea32e wherein I used this:
   appendStringInfo(buffer, "%s", usename);

 As it turns out, this is wrong, because the pg_user_mapping catalog has
 a two-column primary key which is user OID and server OID.  Therefore
 it seems to me that the correct object description and identity must
 include both username and server name.  I propose we change the above to
 this:

   appendStringInfo(buffer, _("user mapping for %s in server %s"),
                    usename, srv->servername);

+1 for the concept, but to be nitpicky, "in" doesn't seem like the right
word here.  "on server" would read better to me; or perhaps "at server".

regards, tom lane




Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-03-05 Thread Sawada Masahiko
On Sat, Feb 28, 2015 at 2:27 PM, Stephen Frost sfr...@snowman.net wrote:
 Sawada,

 * Sawada Masahiko (sawada.m...@gmail.com) wrote:
 Attached patch is latest version including following changes.
 - This view is available to superusers only
 - Add sourceline column

 Alright, first off, to Josh's point- I'm definitely interested in a
 capability to show where the heck a given config value is coming from.
 I'd be even happier if there was a way to *force* where config values
 have to come from, but that ship sailed a year ago or so.  Having this
 be for the files, specifically, seems perfectly reasonable to me.  I
 could see having another function which will report where a currently
 set GUC variable came from (eg: ALTER ROLE or ALTER DATABASE), but
 having a function for "which config file is this GUC set in" seems
 perfectly reasonable to me.
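
As a usage sketch of what is being proposed (column names follow the
patch: name, setting, sourceline; the final shape may differ):

    SELECT name, setting, sourceline
    FROM pg_file_settings
    WHERE name = 'shared_buffers';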

 To that point, here's a review of this patch.

 diff --git a/src/backend/catalog/system_views.sql 
 b/src/backend/catalog/system_views.sql
 index 6df6bf2..f628cb0 100644
 --- a/src/backend/catalog/system_views.sql
 +++ b/src/backend/catalog/system_views.sql
 @@ -412,6 +412,11 @@ CREATE RULE pg_settings_n AS

  GRANT SELECT, UPDATE ON pg_settings TO PUBLIC;

 +CREATE VIEW pg_file_settings AS
 +   SELECT * FROM pg_show_all_file_settings() AS A;
 +
 +REVOKE ALL on pg_file_settings FROM public;

 While this generally works, the usual expectation is that functions
 that should be superuser-only have a check in the function rather than
 depending on the execute privilege.  I'm certainly happy to debate the
 merits of that approach, but for the purposes of this patch, I'd suggest
 you stick an if (!superuser()) ereport("must be superuser") into the
 function itself and not worry about setting the correct permissions on
 it.

 + ConfigFileVariable *guc;

 As this ends up being an array, I'd call it guc_array or something
 along those lines.  GUC in PG-land has a pretty specific single-entity
 meaning.

 @@ -344,6 +346,21 @@ ProcessConfigFile(GucContext context)
   PGC_BACKEND, 
 PGC_S_DYNAMIC_DEFAULT);
   }

 + guc_file_variables = (ConfigFileVariable *)
 + guc_malloc(FATAL, num_guc_file_variables * 
 sizeof(struct ConfigFileVariable));

 Uh, perhaps I missed it, but what happens on a reload?  Aren't we going
 to realloc this every time?  Seems like we should be doing a
 guc_malloc() the first time through but doing guc_realloc() after, or
 we'll leak memory on every reload.

 + /*
 +  * Apply guc config parameters to guc_file_variable
 +  */
 + guc = guc_file_variables;
 + for (item = head; item; item = item->next, guc++)
 + {
 + guc->name = guc_strdup(FATAL, item->name);
 + guc->value = guc_strdup(FATAL, item->value);
 + guc->filename = guc_strdup(FATAL, item->filename);
 + guc->sourceline = item->sourceline;
 + }

 Uh, ditto and double-down here.  I don't see a great solution other than
 looping through the previous array and free'ing each of these, since we
 can't depend on the memory context machinery being up and ready at this
 point, as I recall.

 diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
 index 931993c..c69ce66 100644
 --- a/src/backend/utils/misc/guc.c
 +++ b/src/backend/utils/misc/guc.c
 @@ -3561,9 +3561,17 @@ static const char *const map_old_guc_names[] = {
   */
  static struct config_generic **guc_variables;

 +/*
 + * lookup of variables for pg_file_settings view.
 + */
 +static struct ConfigFileVariable *guc_file_variables;
 +
  /* Current number of variables contained in the vector */
  static int   num_guc_variables;

 +/* Number of file variables */
 +static int   num_guc_file_variables;
 +

 I'd put these together, and add a comment explaining that
 guc_file_variables is an array of length num_guc_file_variables..

  /*
 + * Return the total number of GUC File variables
 + */
 +int
 +GetNumConfigFileOptions(void)
 +{
 + return num_guc_file_variables;
 +}

 I don't see the point of this function..

 +Datum
 +show_all_file_settings(PG_FUNCTION_ARGS)
 +{
 + FuncCallContext *funcctx;
 + TupleDesc   tupdesc;
 + int call_cntr;
 + int max_calls;
 + AttInMetadata *attinmeta;
 + MemoryContext oldcontext;
 +
 + if (SRF_IS_FIRSTCALL())
 + {
 + funcctx = SRF_FIRSTCALL_INIT();
 +
 + oldcontext = 
 MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
 +
 + /*
 +  * need a tuple descriptor representing NUM_PG_SETTINGS_ATTS 
 columns
 +  * of the appropriate types
 +  */
 +
 + tupdesc = CreateTemplateTupleDesc(NUM_PG_FILE_SETTINGS_ATTS, 
 false);
 + TupleDescInitEntry(tupdesc, (AttrNumber) 1, "name",
 +TEXTOID, -1, 0);
 + TupleDescInitEntry(tupdesc, (AttrNumber) 2, "setting",
 +

Re: [HACKERS] contraints_exclusion fails to refute simple condition

2015-03-05 Thread Andrew Gierth
 "Sandro" == Sandro Santilli <s...@keybit.net> writes:

 Sandro> PostGIS installs standard constraints of this kind:

 Sandro>   CHECK (geometrytype(g) = 'POINT'::text OR g IS NULL)

If geometrytype() is strict, the IS NULL condition is superfluous.
CHECK(x) _passes_ when x is null, rather than rejecting the row.

 Sandro> But it is _NOT_ used if the NOT NULL condition is removed:

 Sandro>   WHERE geometrytype(g) = 'LINESTRING'

 Sandro> As the geometrytype is defined as STRICT and IMMUTABLE,
 Sandro> there's no way for geometrytype(g) = 'LINESTRING' to hold true,
 Sandro> so why is the IS NOT NULL condition also needed by the
 Sandro> planner ?

 Sandro> Andres Freund on IRC suggested that
 Sandro> predicate_refuted_by_simple_clause() looks like trying to
 Sandro> handle such cases, but if that's the case it seems to fail
 Sandro> here.

It looks like it searches only one level deep into the clause, so it
knows that if OP or FUNC is strict that (g OP const) or (FUNC(g)) will
refute (g IS NULL), but not that (FUNC(g) OP const) will do so, because
the reference to g is nested too deep.

In this case the obvious thing to do is remove the redundant clause.
Whether it would be worth teaching the predicate prover to recurse in
such cases would be a performance tradeoff - likely a poor one, since it
would have to recurse into arbitrarily complex expressions.
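
A self-contained way to see the behaviour, using a strict immutable SQL
function in place of PostGIS's geometrytype() (sketch; assumes
constraint_exclusion = on so the prover is consulted for a plain table):

    CREATE FUNCTION kind(i int) RETURNS text
        LANGUAGE sql STRICT IMMUTABLE
        AS $$ SELECT CASE WHEN i % 2 = 0 THEN 'POINT' ELSE 'LINESTRING' END $$;

    CREATE TABLE geoms (
        g int,
        CHECK (kind(g) = 'POINT' OR g IS NULL)
    );

    SET constraint_exclusion = on;

    -- With the explicit IS NOT NULL both arms of the constraint can be
    -- refuted one level deep, so the scan should be excluded:
    EXPLAIN SELECT * FROM geoms WHERE kind(g) = 'LINESTRING' AND g IS NOT NULL;

    -- Without it, refuting the "g IS NULL" arm would require looking inside
    -- kind(g), which the prover does not do, so the table is still scanned:
    EXPLAIN SELECT * FROM geoms WHERE kind(g) = 'LINESTRING';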

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] CATUPDATE confusion?

2015-03-05 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Peter Eisentraut pete...@gmx.net writes:
  Any other opinions?
  The options are:
  0) Leave as is.
  1) Remove catupdate and replace with superuser checks.
  2) Remove catupdate and rely on regular table permissions on catalogs.
 
 I think I vote for (1).  I do not like (2) because of the argument I made
 upthread that write access on system catalogs is far more dangerous than
 a naive DBA might think.  (0) has some attraction but really CATUPDATE
 is one more concept than we need.

I agree with #1 on this.

 On the other hand, if Stephen is going to push forward with his plan to
 subdivide superuserness, we might have the equivalent of CATUPDATE right
 back again.  (But at least it would be properly documented and
 supported...)

The subdivision of superuserness is intended only for operations which
can't be used to directly give the user superuser access back, and
therefore I don't think we'd ever put back CATUPDATE or an equivalent.

I'd much rather reduce the need for direct catalog modifications by
adding additional syntax for those operations which can't be done
without modifying the catalog directly and, where it makes sense to, add
a way to control access to those operations.

For example, changing a database to not accept connections seems like
something the database owner should be allowed to do.  Perhaps it'd be
interesting to allow users other than the owner to do that, perhaps not,
but that would be an independent question to address.
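
To make that concrete (the database name is invented), what has to happen
today is a direct catalog update, which is exactly the kind of thing
dedicated, permission-checked syntax would replace:

  -- currently requires superuser-level catalog access:
  UPDATE pg_database SET datallowconn = false WHERE datname = 'appdb';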

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-03-05 Thread Stephen Frost
* Peter Eisentraut (pete...@gmx.net) wrote:
 On 3/3/15 5:29 PM, Stephen Frost wrote:
  For my part, I understood that we specifically didn't want to allow that
  for the same reason that we didn't want to simply depend on the GRANT
  system for the above functions, but everyone else on these discussions
  so far is advocating for using the GRANT system.  My memory might be
  wrong, but I could have sworn that I had brought up exactly that
  suggestion in years past only to have it shot down.
 
 I think a lot of this access control work is done based on some
 undocumented understandings, when in fact there is no consensus on
 anything.  I think we need more clarity.

When there hasn't been discussion on a particular topic, and years of
ongoing development have all used one approach, I tend to figure
that makes it an unspoken consensus.  I'm not saying we shouldn't
question that when it makes sense to, we certainly should, but I'm not
sure it makes sense to ask "why didn't you attempt to get consensus on
this thing we've all been doing for years?"

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Additional role attributes superuser review

2015-03-05 Thread Stephen Frost
* Peter Eisentraut (pete...@gmx.net) wrote:
 On 2/28/15 10:10 PM, Stephen Frost wrote:
  * Adam Brightwell (adam.brightw...@crunchydatasolutions.com) wrote:
  I have attached and updated patch for review.
  
  Thanks!  I've gone over this and made quite a few documentation and
  comment updates, but not too much else, so I'm pretty happy with how
  this is coming along.  As mentioned elsewhere, this conflicts with the
  GetUserId() to has_privs_of_role() cleanup, but as I anticipate handling
  both this patch and that one, I'll find some way to manage. :)
  
  Updated patch attached.  Barring objections, I'll be moving forward with
  this soonish.  Would certainly appreciate any additional testing or
  review that you (or anyone!) has time to provide.
 
 Let's move this discussion to the right thread.

Agreed.

 Why are we not using roles and function execute privileges for this?

There isn't a particular reason not to, except that the existing checks
are in C code and those would need to be removed and the permission
changes done at initdb time to revoke EXECUTE from PUBLIC for these
functions.  Further, as you pointed out, we'd need to dump out the
permissions for the catalog tables and functions with this approach.  I
don't expect that to be too difficult to do though.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] deparsing utility commands

2015-03-05 Thread Stephen Frost
Alvaro,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
 Thanks for the review.  I have pushed these patches, squashed in one
 commit, with the exception of the one that changed ALTER TABLE.  You had
 enough comments about that one that I decided to put it aside for now,
 and get the other ones done.  I will post a new series shortly.

I look forward to seeing the new series posted soon.

 I fixed (most of?) the issues you pointed out; some additional comments:

These all looked good to me.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] TABLESAMPLE patch

2015-03-05 Thread Amit Kapila
On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek p...@2ndquadrant.com wrote:


 I didn't add the whole-page visibility caching as the tuple ids we get
from sampling methods don't map well to the visibility info we get from
heapgetpage (they map to the values in the rs_vistuples array, not to its
indexes). Commented about it in code also.


I think we should set pagemode for system sampling as it can
have a dual benefit: one is that it will allow us to cache tuples, and
the other is that it can allow pruning of the page, which is done in
heapgetpage().  Do you see any downside to it?

Few other comments:

1.
Current patch fails to apply, minor change is required:
patching file `src/backend/utils/misc/Makefile'
Hunk #1 FAILED at 15.
1 out of 1 hunk FAILED -- saving rejects to
src/backend/utils/misc/Makefile.rej

2.
Few warnings in code (compiled on windows(msvc))
1src\backend\utils\tablesample\bernoulli.c(217): warning C4305: '=' :
truncation from 'double' to 'float4'
1src\backend\utils\tablesample\system.c(177): warning C4305: '=' :
truncation from 'double' to 'float4'

3.
+static void
+InitScanRelation(SampleScanState *node, EState *estate, int eflags,
+ TableSampleClause *tablesample)
{
..
+ /*
+ * Page at a time mode is useless for us as we need to check visibility
+ * of tuples individually because tuple offsets returned by sampling
+ * methods map to rs_vistuples values and not to its indexes.
+ */
+ node->ss.ss_currentScanDesc->rs_pageatatime = false;
..
}

Modifying the scan descriptor in nodeSamplescan.c looks slightly odd.
Do we modify it this way at any other place?

I think it is better to either teach the heap_beginscan_* APIs about
it or expose it via a new API in heapam.c


4.
+Datum
+tsm_system_cost(PG_FUNCTION_ARGS)
{
..
+ *tuples = path->rows * samplesize;
}

It seems the above calculation assumes all rows in the table are of
equal size, and hence multiplying directly with samplesize will
give the estimated number of rows for the sample method; however, if
row size varies then this calculation might not give right
results.  I think if possible we should consider the possibility
of rows with varying sizes, or else we can at least write a comment
to indicate the possibility of improvement in the future.

5.
gram.y

-
 /*
  * Given UPDATE foo set set ..., we have to decide without looking any


Unrelated line removed.

6.
@@ -13577,6 +13608,7 @@ reserved_keyword:
  | PLACING
  | PRIMARY
  | REFERENCES
+ | REPEATABLE

Have you tried to investigate the reason why it is giving a shift/reduce
error for the unreserved category?  If we are not able to resolve that,
then at least we can try to put it in some less restrictive category.
I tried (on Windows) by putting it in (type_func_name_keyword:) and
it seems to be working.

To investigate, you can try with information at below link:
http://www.gnu.org/software/bison/manual/html_node/Understanding.html

Basically I think we should try some more before concluding
to change the category of REPEATABLE and especially if we
want to make it a RESERVED keyword.

On a related note, it seems you have agreed upthread with
Kyotaro-san for below change, but I don't see the same in latest patch:

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4578b5e..8cf09d5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -10590,7 +10590,7 @@ tablesample_clause:
;

 opt_repeatable_clause:
-   REPEATABLE '(' AexprConst ')'   { $$ = (Node *) $3;
}
+   REPEATABLE '(' a_expr ')'   { $$ = (Node *) $3;
}
| /*EMPTY*/
{ $$ = NULL; }
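
For illustration only (the table name is invented), the a_expr version would
accept an arbitrary expression as the sampling seed rather than just a
constant, roughly:

  SELECT * FROM test_tab TABLESAMPLE SYSTEM (10) REPEATABLE (40 + 2);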



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] WALWriter active during recovery

2015-03-05 Thread Fujii Masao
On Thu, Dec 18, 2014 at 6:43 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Currently, WALReceiver writes and fsyncs data it receives. Clearly,
 while we are waiting for an fsync we aren't doing any other useful
 work.

 Following patch starts WALWriter during recovery and makes it
 responsible for fsyncing data, allowing WALReceiver to progress other
 useful actions.

With the patch, replication didn't work fine on my machine. I started
the standby server after removing all the WAL files from the standby.
ISTM that the patch doesn't handle that case. That is, in that case,
the standby tries to start up walreceiver and replication to retrieve
the REDO-starting checkpoint record *before* starting up walwriter
(IOW, before reaching the consistent point). Then, since walreceiver works
without walwriter, no received WAL data can be fsync'd in the standby,
so replication cannot advance any further. I think that walwriter needs
to start before walreceiver starts.

I just marked this patch as Waiting on Author.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange assertion using VACOPT_FREEZE in vacuum.c

2015-03-05 Thread Robert Haas
On Thu, Mar 5, 2015 at 9:58 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Robert Haas wrote:
 On Sun, Mar 1, 2015 at 6:58 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  Hm, why not. That would remove all inconsistencies between the parser
  and the autovacuum code path. Perhaps something like the attached
  makes sense then?

 I really don't see this patch, or any of the previous ones, as solving
 any actual problem.  There's no bug here, and no reason to suspect
 that future code changes would be particularly likely to introduce one.
 Assertions are a great way to help developers catch coding mistakes,
 but it's a real stretch to think that a developer is going to add a
 new syntax for ANALYZE that involves setting options proper to VACUUM
 and not notice it.

 That was my opinion of previous patches in this thread.  But I think
 this last one makes a lot more sense: why is the parser concerned with
 figuring out the right defaults given FREEZE/not-FREEZE?  I think there
 is a real gain in clarity here by deferring those decisions until vacuum
 time.  The parser's job should be to pass the FREEZE flag down only,
 which is what this patch does, and consequently results in a (small) net
 reduction of LOC in gram.y.

Sure, I'll buy that.  If you want to refactor things that way, I have
no issue with it - I just want to point out that it seems to have zip
to do with what started this thread, which is why I wondered if we
were just talking for the sake of talking.

 Here's a simple idea to improve the patch: make VacuumParams include
 VacuumStmt and the for_wraparound and do_toast flags.  ISTM that reduces
 the number of arguments to be passed down in a couple of places.  In
 particular:

 vacuum(VacuumParams *params, BufferAccessStrategy bstrategy, bool isTopLevel)

 vacuum_rel(VacuumParams *params)

 Does that sound more attractive?

I dislike passing down parser nodes straight into utility commands.
It tends to make those functions hard for in-core users to call,
and also to lead to security vulnerabilities where we look up the same
names more than once and just hope that we get the same OID every
time.  Stuffing the VacuumStmt pointer inside the VacuumParams object
doesn't, for me, help anything.  It'd be a lot more interesting if we
could get rid of that altogether.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Combining Aggregates

2015-03-05 Thread Robert Haas
On Wed, Mar 4, 2015 at 11:00 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
 Anyway, I could identify, at least, these complicated issues around
 two-phase aggregate integration with the planner. Can someone suggest
 a minimally invasive way to do this integration?

I don't think I have the brain space to think about that until we get
the basic parallelism stuff in.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MD5 authentication needs help

2015-03-05 Thread Stephen Frost
* Josh Berkus (j...@agliodbs.com) wrote:
 Catching up here ...
 
 On 03/03/2015 06:01 PM, Bruce Momjian wrote:
  It feels like MD5 has accumulated enough problems that we need to start
  looking for another way to store and pass passwords.  The MD5 problems
  are:
  
  1)  MD5 makes users feel uneasy (though our usage is mostly safe) 
  
  2)  The per-session salt sent to the client is only 32-bits, meaning
  that it is possible to replay an observed MD5 hash in ~16k connection
  attempts.
 
 Seems like we could pretty easily increase the size of the salt.  Of
 course, that just increases the required number of connection attempts,
 without really fixing the problem.

Please do not propose hacks on top of the existing md5 authentication
method which break the wireline protocol.  There is absolutely nothing
useful to be gained from that.  We need to introduce a new auth method
with whatever protocol changes that requires without breaking the
existing auth method.  Discussing trade-offs of changing the existing
md5 mechanism *without* breaking the wireline protocol may be worthwhile
as we might be able to improve things a bit there, but nothing there is
a proper solution for security conscious individuals.

  3)  Using the user name for the MD5 storage salt allows the MD5 stored
  hash to be used on a different cluster if the user used the same
  password. 
 
 This is a feature as well as a bug. For example, pgBouncer relies on
 this aspect of md5 auth.

It's not a feature and pgBouncer could be made to not rely on this.

  4)  Using the user name for the MD5 storage salt causes the renaming of
  a user to break the stored password.
 
 Weirdly, in 17 years of Postgres, I've never encountered this issue.

I agree, that's kind of odd. :)

 So, are we more worried about attackers getting a copy of pg_authid, or
 sniffing the hash on the wire?

They are both attack vectors to consider but most applications address
the network-based risk through TLS and have a hash-based approach to
address the pg_authid-based risk.  Administrators are used to that and
are quite surprised to discover that PG doesn't work that way.  As I
mentioned up-thread, it's certainly very rare for anyone concerned about
a network-based risk to *not* use TLS, but they tend to still use
passwords for the actual authentication mechanism and the md5 auth mech
makes the wrong trade-off for them.  The password auth mech isn't ideal
either since the server ends up seeing the PW.  The change I'm
suggesting addresses both the pg_authid-based risk and the server
seeing the PW risk without changing the wireline protocol.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-03-05 Thread Stephen Frost
* Peter Eisentraut (pete...@gmx.net) wrote:
 On 3/3/15 5:58 PM, Tom Lane wrote:
  One aspect of this that merits some thought is that in some cases
  access to some set of functions is best granted as a unit.  That's
  easy with role properties but much less so with plain GRANT.
  Do we have enough such cases to make it an issue?
 
 You could have built-in roles, such as backup and ship the system with
 the backup role having permissions on some functions.  And then users
 are granted those roles.  Similar to how some Linux systems ship with
 groups such as adm.

One thought I had for this was a contrib module which added an extension
to create and grant those roles.  That approach would mean that we don't
need to worry about upgrade-path problems which we could get into if we
declared new roles like 'backup' which users might already have.

An alternative approach which might be better, now that I think about
it, would be to declare that the 'pg_' prefix applies to roles too and
then have a 'pg_backup' role which is granted the correct permissions.
Personally, I like that idea a lot..

We could then have pg_upgrade throw an error and pg_dump a warning (or
something along those lines) if they find any existing roles with that
prefix.
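
A minimal sketch of what the pg_backup idea could look like, assuming the
hard-coded superuser checks inside the backup functions were replaced by
ordinary EXECUTE permission checks as discussed (the grantee role name is
invented):

  CREATE ROLE pg_backup NOLOGIN;
  GRANT EXECUTE ON FUNCTION pg_start_backup(text, boolean) TO pg_backup;
  GRANT EXECUTE ON FUNCTION pg_stop_backup() TO pg_backup;
  GRANT pg_backup TO backup_user;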

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Join push-down support for foreign tables

2015-03-05 Thread Shigeru Hanada
Here is the v5 patch of Join push-down support for foreign tables.

Changes since v4:

- Separate remote conditions into ON and WHERE, per Ashutosh.
- Add regression test cases for foreign join.
- Don't skip reversed relation combination in OUTER join cases.

I'm now working on two issues from Kaigai-san and Ashutosh: whole-row
reference handling and use of get_joinrel_parampathinfo().

2015-03-05 22:00 GMT+09:00 Shigeru Hanada shigeru.han...@gmail.com:
 Hi Ashutosh, thanks for the review.

 2015-03-04 19:17 GMT+09:00 Ashutosh Bapat ashutosh.ba...@enterprisedb.com:
 In create_foreignscan_path() we have lines like -
 1587 pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
 1588
 required_outer);
 Now, that the same function is being used for creating foreign scan paths
 for joins, we should be calling get_joinrel_parampathinfo() on a join rel
 and get_baserel_parampathinfo() on base rel.

 Got it.  Please let me check the difference.


 The patch seems to handle all the restriction clauses in the same way. There
 are two kinds of restriction clauses - a. join quals (specified using ON
 clause; optimizer might move them to the other class if that doesn't affect
 correctness) and b. quals on join relation (specified in the WHERE clause,
 optimizer might move them to the other class if that doesn't affect
 correctness). The quals in a are applied while the join is being computed
 whereas those in b are applied after the join is computed. For example,
 postgres=# select * from lt;
  val | val2
 -+--
1 |2
1 |3
 (2 rows)

 postgres=# select * from lt2;
  val | val2
 -+--
1 |2
 (1 row)

 postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2);
  val | val2 | val | val2
 -+--+-+--
1 |2 |   1 |2
1 |3 | |
 (2 rows)

 postgres=# select * from lt left join lt2 on (true) where (lt.val2 =
 lt2.val2);
  val | val2 | val | val2
 -+--+-+--
1 |2 |   1 |2
 (1 row)

 The difference between these two kinds is evident in case of outer joins,
 for inner join optimizer puts all of them in class b. The remote query
 sent to the foreign server has all those in ON clause. Consider foreign
 tables ft1 and ft2 pointing to local tables on the same server.
 postgres=# \d ft1
  Foreign table public.ft1
  Column |  Type   | Modifiers | FDW Options
 +-+---+-
  val| integer |   |
  val2   | integer |   |
 Server: loopback
 FDW Options: (table_name 'lt')

 postgres=# \d ft2
  Foreign table public.ft2
  Column |  Type   | Modifiers | FDW Options
 +-+---+-
  val| integer |   |
  val2   | integer |   |
 Server: loopback
 FDW Options: (table_name 'lt2')

 postgres=# explain verbose select * from ft1 left join ft2 on (ft1.val2 =
 ft2.val2) where ft1.val + ft2.val > ft1.val2 or ft2.val is null;

 QUERY PLAN

 ---
 
  Foreign Scan  (cost=100.00..125.60 rows=2560 width=16)
Output: val, val2, val, val2
Remote SQL: SELECT r.a_0, r.a_1, l.a_0, l.a_1 FROM (SELECT val, val2 FROM
 public.lt2) l (a_0, a_1) RIGHT JOIN (SELECT val, val2 FROM public.lt) r (a
 _0, a_1)  ON ((((r.a_0 + l.a_0) > r.a_1) OR (l.a_0 IS NULL))) AND ((r.a_1 =
 l.a_1))
 (3 rows)

 The result is then wrong
 postgres=# select * from ft1 left join ft2 on (ft1.val2 = ft2.val2) where
 ft1.val + ft2.val > ft1.val2 or ft2.val is null;
  val | val2 | val | val2
 -+--+-+--
1 |2 | |
1 |3 | |
 (2 rows)

 which should match the result obtained by substituting local tables for
 foreign ones
 postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2) where
 lt.val + lt2.val > lt.val2 or lt2.val is null;
  val | val2 | val | val2
 -+--+-+--
1 |3 | |
 (1 row)

 Once we start distinguishing the two kinds of quals, there is some
 optimization possible. For pushing down a join it's essential that all the
 quals in a are safe to be pushed down. But a join can be pushed down even
 if the quals in b are not safe to be pushed down; the more clauses we push
 down to the foreign server, the fewer rows are fetched from the foreign
 server. In postgresGetForeignJoinPath, instead of checking all the
 restriction clauses to be safe to be pushed down, we need to check only
 those which are join quals (class a).

 The argument restrictlist of GetForeignJoinPaths contains both
 conditions mixed, so I added extract_actual_join_clauses() to separate
 it into two lists, join_quals and other clauses.  This is similar to
 what create_nestloop_plan and siblings do.



 Following EXPLAIN output seems to be confusing
 ft1 and ft2 both are pointing to same lt on a 

Re: [HACKERS] File based Incremental backup v8

2015-03-05 Thread Robert Haas
On Wed, Mar 4, 2015 at 11:42 PM, Bruce Momjian br...@momjian.us wrote:
 On Thu, Mar  5, 2015 at 01:25:13PM +0900, Fujii Masao wrote:
  Yeah, it might make the situation better than today. But I'm afraid that
  many users might get disappointed about that behavior of an incremental
  backup after the release...
 
  I don't get what do you mean here. Can you elaborate this point?

 The proposed version of LSN-based incremental backup has some limitations
 (e.g., every database file needs to be read even when there has been no
 modification in the database since the last backup, which may make the
 backup time longer than users expect) which may disappoint users. So I'm
 afraid that users who can benefit from the feature might be very limited.
 IOW, I'm just sticking to the idea of the timestamp-based one :) But I
 should drop it if the majority on the list prefers the LSN-based one even
 if it has such limitations.

 We need numbers on how effective each level of tracking will be.  Until
 then, the patch can't move forward.

The point is that this is a stepping stone toward what will ultimately
be a better solution.  You can use timestamps today if (a) whole-file
granularity is good enough for you and (b) you trust your system clock
to never go backwards.  In fact, if you use pg_start_backup() and
pg_stop_backup(), you don't even need a server patch; you can just go
right ahead and implement whatever you like.  A server patch would be
needed to make pg_basebackup do a file-time-based incremental backup,
but I'm not excited about that because I think the approach is a
dead-end.
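
In other words, something like the following already works today, with all
the caveats above (the label is arbitrary and the copy step is left as a
comment because it happens outside the server):

  SELECT pg_start_backup('incr', true);
  -- externally: copy only the files whose mtime is newer than the previous
  -- backup's marker, excluding pg_xlog, and trust the system clock
  SELECT pg_stop_backup();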

If you want block-level granularity, and you should, an approach based
on file times is never going to get you there.  An approach based on
LSNs can.  If the first version of the patch requires reading the
whole database, fine, it's not going to perform all that terribly
well.  But we can optimize that later by keeping track of which blocks
have been modified since a given LSN.  If we do that, we can get
better reliability than the timestamp approach can ever offer, plus
excellent transfer and storage characteristics.

What I'm unhappy with about this patch is that it insists on sending
the whole file if a single block in that file has changed.  That is
lame.  To get something useful out of this, we should be looking to
send only those blocks whose LSNs have actually changed.  That would
reduce I/O (in the worst case, the current patch reads each file in its
entirety twice) and transfer bandwidth as compared to the proposed
patch.  We'd still have to read the whole database so it might very
well do more I/O than the file-timestamp approach, but it would beat
the file-timestamp approach on transfer bandwidth and on the amount of
storage required to store the incremental.  In many workloads, I
expect those savings would be quite significant.  If we then went back
in a later release and implemented one of the various proposals to
avoid needing to read every block, we'd then have a very robust and
complete solution.

But I agree with Fujii to the extent that I see little value in
committing this patch in the form proposed.  Being smart enough to use
the LSN to identify changed blocks, but then sending the entirety of
every file anyway because you don't want to go to the trouble of
figuring out how to revise the wire protocol to identify the
individual blocks being sent and write the tools to reconstruct a full
backup based on that data, does not seem like enough of a win. As
Fujii says, if we ship this patch as written, people will just keep
using the timestamp-based approach anyway.  Let's wait until we have
something that is, at least in some circumstances, a material
improvement over the status quo before committing anything.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange assertion using VACOPT_FREEZE in vacuum.c

2015-03-05 Thread Stephen Frost
Alvaro,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
 Robert Haas wrote:
  On Sun, Mar 1, 2015 at 6:58 AM, Michael Paquier
  michael.paqu...@gmail.com wrote:
   Hm, why not. That would remove all inconsistencies between the parser
   and the autovacuum code path. Perhaps something like the attached
   makes sense then?
  
  I really don't see this patch, or any of the previous ones, as solving
  any actual problem.  There's no bug here, and no reason to suspect
  that future code changes would be particularly likely to introduce one.
  Assertions are a great way to help developers catch coding mistakes,
  but it's a real stretch to think that a developer is going to add a
  new syntax for ANALYZE that involves setting options proper to VACUUM
  and not notice it.
 
 That was my opinion of previous patches in this thread.  But I think
 this last one makes a lot more sense: why is the parser concerned with
 figuring out the right defaults given FREEZE/not-FREEZE?  I think there
 is a real gain in clarity here by deferring those decisions until vacuum
 time.  The parser's job should be to pass the FREEZE flag down only,
 which is what this patch does, and consequently results in a (small) net
 reduction of LOC in gram.y.

Yeah, that was my thinking also in my earlier review.

 Here's a simple idea to improve the patch: make VacuumParams include
 VacuumStmt and the for_wraparound and do_toast flags.  ISTM that reduces
 the number of arguments to be passed down in a couple of places.  In
 particular:
 
 vacuum(VacuumParams *params, BufferAccessStrategy bstrategy, bool isTopLevel)
 
 vacuum_rel(VacuumParams *params)
 
 Does that sound more attractive?

I had been hoping we'd be able to provide an API which didn't require
autovacuum to build up a VacuumStmt, but that's not a big deal and it's
doing it currently anyway.  Just mentioning it as that was one of the
other things that I had been hoping to get out of this.

Thanks!

Stephen


signature.asc
Description: Digital signature


[HACKERS] object description for FDW user mappings

2015-03-05 Thread Alvaro Herrera
When commit cae565e503 introduced FDW user mappings, it used this in
getObjectDescription for them:

appendStringInfo(buffer, _("user mapping for %s"), usename);

This was later mostly copied (by yours truly) as object identity by
commit f8348ea32e wherein I used this:
appendStringInfo(buffer, "%s", usename);

As it turns out, this is wrong, because the pg_user_mapping catalog has
a two-column primary key which is user OID and server OID.  Therefore
it seems to me that the correct object description and identity must
include both username and server name.  I propose we change the above to
this:

appendStringInfo(buffer, _("user mapping for %s in server %s"), usename,
                 srv->servername);

I propose to apply the (appropriately rebased) attached patch to all
back branches.  Note in particular the wording changes in some error
messages in the foreign_data test.
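
A quick way to eyeball the before/after output is something like the
following sketch, which just walks pg_depend rather than naming a specific
mapping:

  SELECT pg_describe_object(classid, objid, objsubid)
  FROM   pg_depend
  WHERE  classid = 'pg_user_mapping'::regclass;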

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services
From 82e8814c9f4b89d31d51b2fa370add1c04ae0f12 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera alvhe...@alvh.no-ip.org
Date: Thu, 5 Mar 2015 12:30:53 -0300
Subject: [PATCH] fix user mapping object description/identity

---
 src/backend/catalog/objectaddress.c| 30 --
 src/test/regress/expected/foreign_data.out | 24 
 2 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index 70043fc..0740b4f 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -2510,14 +2510,17 @@ getObjectDescription(const ObjectAddress *object)
 HeapTuple	tup;
 Oid			useid;
 char	   *usename;
+Form_pg_user_mapping umform;
+ForeignServer *srv;
 
 tup = SearchSysCache1(USERMAPPINGOID,
 	  ObjectIdGetDatum(object->objectId));
 if (!HeapTupleIsValid(tup))
 	elog(ERROR, "cache lookup failed for user mapping %u",
 		 object->objectId);
-
-useid = ((Form_pg_user_mapping) GETSTRUCT(tup))->umuser;
+umform = (Form_pg_user_mapping) GETSTRUCT(tup);
+useid = umform->umuser;
+srv = GetForeignServer(umform->umserver);
 
 ReleaseSysCache(tup);
 
@@ -2526,7 +2529,8 @@ getObjectDescription(const ObjectAddress *object)
 else
 	usename = "public";
 
-appendStringInfo(buffer, _("user mapping for %s"), usename);
+appendStringInfo(buffer, _("user mapping for %s in server %s"), usename,
+ srv->servername);
 break;
 			}
 
@@ -3906,19 +3910,18 @@ getObjectIdentityParts(const ObjectAddress *object,
 			{
 HeapTuple	tup;
 Oid			useid;
+Form_pg_user_mapping umform;
+ForeignServer *srv;
 const char *usename;
 
-/* no objname support */
-if (objname)
-	*objname = NIL;
-
 tup = SearchSysCache1(USERMAPPINGOID,
 	  ObjectIdGetDatum(object->objectId));
 if (!HeapTupleIsValid(tup))
 	elog(ERROR, "cache lookup failed for user mapping %u",
 		 object->objectId);
-
-useid = ((Form_pg_user_mapping) GETSTRUCT(tup))->umuser;
+umform = (Form_pg_user_mapping) GETSTRUCT(tup);
+useid = umform->umuser;
+srv = GetForeignServer(umform->umserver);
 
 ReleaseSysCache(tup);
 
@@ -3927,7 +3930,14 @@ getObjectIdentityParts(const ObjectAddress *object,
 else
 	usename = "public";
 
-appendStringInfoString(buffer, usename);
+if (objname)
+{
+	*objname = list_make1(pstrdup(usename));
+	*objargs = list_make1(pstrdup(srv->servername));
+}
+
+appendStringInfo(buffer, "%s in server %s", usename,
+ srv->servername);
 break;
 			}
 
diff --git a/src/test/regress/expected/foreign_data.out b/src/test/regress/expected/foreign_data.out
index 632b7e5..4d0de1f 100644
--- a/src/test/regress/expected/foreign_data.out
+++ b/src/test/regress/expected/foreign_data.out
@@ -247,7 +247,7 @@ CREATE USER MAPPING FOR current_user SERVER s1;
 DROP FOREIGN DATA WRAPPER foo;  -- ERROR
 ERROR:  cannot drop foreign-data wrapper foo because other objects depend on it
 DETAIL:  server s1 depends on foreign-data wrapper foo
-user mapping for foreign_data_user depends on server s1
+user mapping for foreign_data_user in server s1 depends on server s1
 HINT:  Use DROP ... CASCADE to drop the dependent objects too.
 SET ROLE regress_test_role;
 DROP FOREIGN DATA WRAPPER foo CASCADE;  -- ERROR
@@ -256,7 +256,7 @@ RESET ROLE;
 DROP FOREIGN DATA WRAPPER foo CASCADE;
 NOTICE:  drop cascades to 2 other objects
 DETAIL:  drop cascades to server s1
-drop cascades to user mapping for foreign_data_user
+drop cascades to user mapping for foreign_data_user in server s1
 \dew+
 List of foreign-data wrappers
 Name|   Owner   | Handler |Validator | Access privileges | FDW Options | Description 
@@ -526,10 +526,10 @@ CREATE 

Re: [HACKERS] MD5 authentication needs help

2015-03-05 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
 On Wed, Mar  4, 2015 at 05:56:25PM -0800, Josh Berkus wrote:
  So, are we more worried about attackers getting a copy of pg_authid, or
  sniffing the hash on the wire?
 
 Both.  Stephen is more worried about pg_authid, but I am more worried
 about sniffing.

I'm also worried about both, but if the admin is worried about sniffing
in their environment, they're much more likely to use TLS than to set up
client side certificates, kerberos, or some other strong auth mechanism,
simply because TLS is pretty darn easy to get working and distros set it
up for you by default.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission

2015-03-05 Thread Pavel Stehule
2015-03-04 22:41 GMT+01:00 Peter Eisentraut pete...@gmx.net:

 On 3/3/15 7:17 PM, Jim Nasby wrote:
  I think we're screwed in that regard anyway, because of the special
  constructs. You'd need different logic to handle things like +role and
  sameuser. We might even end up painted into a corner where we can't change
  it in the future because it'll break everyone's scripts.

 Yeah, I'm getting worried about this.  I think most people agree that
 getting a peek at pg_hba.conf from within the server is useful, but
 everyone seems to have quite different uses for it.  Greg wants to join
 against other catalog tables, Jim wants to reassemble a valid and
 accurate pg_hba.conf, Josh wants to write an editing tool.  Personally,
 I'd like to see something as close to the actual file as possible.

 If there were an obviously correct way to map the various special
 constructs to the available SQL data types, then fine.  But if there
 isn't, then we shouldn't give a false overinterpretation.  So I'd render
 everything that's disputed as a plain text field.  (Not even an array of
 text.)


I disagree with the last note - arrays are valid where they are expected. I
don't see any reason why anybody should have to parse information out of any
table.

The face of pg_hba.conf in SQL space can be different from the original file -
but all data should be clean (without the necessity of additional parsing)

Regards

Pavel




 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] Join push-down support for foreign tables

2015-03-05 Thread Shigeru Hanada
Hi Ashutosh, thanks for the review.

2015-03-04 19:17 GMT+09:00 Ashutosh Bapat ashutosh.ba...@enterprisedb.com:
 In create_foreignscan_path() we have lines like -
 1587 pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
 1588
 required_outer);
 Now, that the same function is being used for creating foreign scan paths
 for joins, we should be calling get_joinrel_parampathinfo() on a join rel
 and get_baserel_parampathinfo() on base rel.

Got it.  Please let me check the difference.


 The patch seems to handle all the restriction clauses in the same way. There
 are two kinds of restriction clauses - a. join quals (specified using ON
 clause; optimizer might move them to the other class if that doesn't affect
 correctness) and b. quals on join relation (specified in the WHERE clause,
 optimizer might move them to the other class if that doesn't affect
 correctness). The quals in a are applied while the join is being computed
 whereas those in b are applied after the join is computed. For example,
 postgres=# select * from lt;
  val | val2
 -+--
1 |2
1 |3
 (2 rows)

 postgres=# select * from lt2;
  val | val2
 -+--
1 |2
 (1 row)

 postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2);
  val | val2 | val | val2
 -+--+-+--
1 |2 |   1 |2
1 |3 | |
 (2 rows)

 postgres=# select * from lt left join lt2 on (true) where (lt.val2 =
 lt2.val2);
  val | val2 | val | val2
 -+--+-+--
1 |2 |   1 |2
 (1 row)

 The difference between these two kinds is evident in case of outer joins,
 for inner join optimizer puts all of them in class b. The remote query
 sent to the foreign server has all those in ON clause. Consider foreign
 tables ft1 and ft2 pointing to local tables on the same server.
 postgres=# \d ft1
  Foreign table public.ft1
  Column |  Type   | Modifiers | FDW Options
 +-+---+-
  val| integer |   |
  val2   | integer |   |
 Server: loopback
 FDW Options: (table_name 'lt')

 postgres=# \d ft2
  Foreign table public.ft2
  Column |  Type   | Modifiers | FDW Options
 +-+---+-
  val| integer |   |
  val2   | integer |   |
 Server: loopback
 FDW Options: (table_name 'lt2')

 postgres=# explain verbose select * from ft1 left join ft2 on (ft1.val2 =
 ft2.val2) where ft1.val + ft2.val > ft1.val2 or ft2.val is null;

 QUERY PLAN

 ---
 
  Foreign Scan  (cost=100.00..125.60 rows=2560 width=16)
Output: val, val2, val, val2
Remote SQL: SELECT r.a_0, r.a_1, l.a_0, l.a_1 FROM (SELECT val, val2 FROM
 public.lt2) l (a_0, a_1) RIGHT JOIN (SELECT val, val2 FROM public.lt) r (a
 _0, a_1)  ON ((((r.a_0 + l.a_0) > r.a_1) OR (l.a_0 IS NULL))) AND ((r.a_1 =
 l.a_1))
 (3 rows)

 The result is then wrong
 postgres=# select * from ft1 left join ft2 on (ft1.val2 = ft2.val2) where
 ft1.val + ft2.val > ft1.val2 or ft2.val is null;
  val | val2 | val | val2
 -+--+-+--
1 |2 | |
1 |3 | |
 (2 rows)

 which should match the result obtained by substituting local tables for
 foreign ones
 postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2) where
 lt.val + lt2.val > lt.val2 or lt2.val is null;
  val | val2 | val | val2
 -+--+-+--
1 |3 | |
 (1 row)

 Once we start distinguishing the two kinds of quals, there is some
 optimization possible. For pushing down a join it's essential that all the
 quals in a are safe to be pushed down. But a join can be pushed down even
 if the quals in b are not safe to be pushed down; the more clauses we push
 down to the foreign server, the fewer rows are fetched from the foreign
 server. In postgresGetForeignJoinPath, instead of checking all the
 restriction clauses to be safe to be pushed down, we need to check only
 those which are join quals (class a).

The argument restrictlist of GetForeignJoinPaths contains both
conditions mixed, so I added extract_actual_join_clauses() to separate
it into two lists, join_quals and other clauses.  This is similar to
what create_nestloop_plan and siblings do.



 Following EXPLAIN output seems to be confusing
 ft1 and ft2 both are pointing to same lt on a foreign server.
 postgres=# explain verbose select ft1.val + ft1.val2 from ft1, ft2 where
 ft1.val + ft1.val2 = ft2.val;

 QUERY PLAN

 ---
 --
  Foreign Scan  (cost=100.00..132.00 rows=2560 width=8)
Output: (val + val2)
Remote SQL: SELECT r.a_0, r.a_1 FROM (SELECT val, NULL FROM 

Re: [HACKERS] Table-level log_autovacuum_min_duration

2015-03-05 Thread Fujii Masao
On Thu, Feb 19, 2015 at 3:32 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Thu, Feb 19, 2015 at 2:13 PM, Naoya Anzai
 anzai-na...@mxu.nes.nec.co.jp wrote:
 As a result, I think you should not delete VACOPT_VERBOSE.

 In v8 it is not deleted. It is still declared, and its use is isolated
 in gram.y, similarly to VACOPT_FREEZE.

 According to the last mail I posted, there is clearly a difference
 between the manual-vacuum log and the autovacuum log.

 Did you test the latest patch v8? I have added checks in it to see if
 a process is an autovacuum worker to control elevel and the extra logs
 of v7 do not show up.

 So, at least you should not touch the mechanism of VACOPT_VERBOSE
 until both vacuum log formats are unified to a same format.

 If you mean that we should have the same kind of log output for
 autovacuum and manual vacuum, I think that this is not going to
 happen. Autovacuum entries are kept less verbose on purpose, a contract
 that v7 clearly broke.

 If you agree my think, please undo your removing VACOPT_VERBOSE work.

 Well, I don't agree :) And I am guessing that you did not look at v8
 either. Centralizing the control of logs using log_min_duration is
 more extensible than simply having VACOPT_VERBOSE.
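
For reference, the table-level knob this patch series is about is used like
this (the relation name is invented):

  ALTER TABLE busy_tab SET (log_autovacuum_min_duration = 0);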

With the patch, VACUUM ANALYZE VERBOSE doesn't emit any verbose message.
Why did you remove that functionality?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission

2015-03-05 Thread Greg Stark
On Wed, Mar 4, 2015 at 9:41 PM, Peter Eisentraut pete...@gmx.net wrote:
 everyone seems to have quite different uses for it.  Greg wants to join
 against other catalog tables, Jim wants to reassemble a valid and
 accurate pg_hba.conf, Josh wants to write an editing tool.  Personally,
 I'd like to see something as close to the actual file as possible.

Well if you want to read the file as is you can do so using the file
reading functions which afaik were specifically intended for the
purpose of writing config editing tools.


Just to repeat, even if you're reassembling a valid and accurate
pg_hba.conf or writing an editing tool, you can do that with what we
have today as long as there are no databases named "all", "sameuser",
or "samerole" and no roles named "all". That's something an editing
tool could check, warning the user and refusing to write a config file
if that's the case, and I doubt there are any such users out there
anyway.

Having five separate columns would work but I just think it's way more
complicated than necessary and burdens everyone who wants to use the
view less formally.


 If there were an obviously correct way to map the various special
 constructs to the available SQL data types, then fine.  But if there
 isn't, then we shouldn't give a false overinterpretation.  So I'd render
 everything that's disputed as a plain text field.  (Not even an array of
 text.)

Joining against other catalog tables may be kind of exaggerated but if
I have data in an SQL view I certainly expect to be able to write
WHERE clauses to find the rows I want without having to do string
parsing. If it were a comma-delimited string I have to do something
like WHERE users LIKE '%,username,%' but then that's not good enough
since it may be the first or last and the username itself may contain
white-space or a comma etc etc.

I think knowing about the + prefix and the risk of tokens like all
and sameuser etc are pretty small compromises to make. But having to
know the quoting rules and write a tokenizer are too far.
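
To make that concrete -- the view and column names below are placeholders,
not the proposed catalog definition:

  -- with text[] columns, membership tests are simple and quoting-safe
  SELECT * FROM pg_hba_settings WHERE 'alice' = ANY (user_names);

  -- with a flat text column you are back to fragile string matching
  SELECT * FROM pg_hba_settings WHERE user_list LIKE '%alice%';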



-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2015-03-05 Thread Etsuro Fujita

On 2015/03/04 17:07, Etsuro Fujita wrote:

On 2015/03/04 16:58, Albe Laurenz wrote:

Etsuro Fujita wrote:

While updating the patch, I noticed that in the previous patch, there is
a bug in pushing down parameterized UPDATE/DELETE queries; generic plans
for such queries fail with a can't-happen error.  I fixed the bug and
tried to add the regression tests that execute the generic plans, but I
couldn't because I can't figure out how to force generic plans.  Is
there any way to do that?


I don't know about a way to force it, but if you run the statement six
times, it will probably switch to a generic plan.


Yeah, I was just thinking running the statement six times in the
regression tests ...
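
Something along these lines might work for the regression test, leaning on
the plancache heuristic of considering a generic plan after five custom
plans (the table and column names are made up):

  PREPARE upd(int) AS UPDATE ft SET val = val + 1 WHERE key = $1;
  EXECUTE upd(1); EXECUTE upd(2); EXECUTE upd(3);
  EXECUTE upd(4); EXECUTE upd(5);
  EXPLAIN (VERBOSE, COSTS OFF) EXECUTE upd(6);  -- hopefully generic by now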


Here is an updated version.  In this version, the bug has been fixed,
but regression tests for that haven't been added, because I'm not
sure that the above way is a good idea and I don't have any other ideas.


The EXPLAIN output has also been improved as discussed in [1].

On top of this, I'll try to extend the join push-down patch to support a 
pushed-down update on a join, though I'm still digesting the series of 
patches.


Comments welcome.

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/31942.1410534...@sss.pgh.pa.us
*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***
*** 189,198  is_foreign_expr(PlannerInfo *root,
  	if (!foreign_expr_walker((Node *) expr, glob_cxt, loc_cxt))
  		return false;
  
- 	/* Expressions examined here should be boolean, ie noncollatable */
- 	Assert(loc_cxt.collation == InvalidOid);
- 	Assert(loc_cxt.state == FDW_COLLATE_NONE);
- 
  	/*
  	 * An expression which includes any mutable functions can't be sent over
  	 * because its result is not stable.  For example, sending now() remote
--- 189,194 
***
*** 788,794  deparseTargetList(StringInfo buf,
   *
   * If params is not NULL, it receives a list of Params and other-relation Vars
   * used in the clauses; these values must be transmitted to the remote server
!  * as parameter values.
   *
   * If params is NULL, we're generating the query for EXPLAIN purposes,
   * so Params and other-relation Vars should be replaced by dummy values.
--- 784,791 
   *
   * If params is not NULL, it receives a list of Params and other-relation Vars
   * used in the clauses; these values must be transmitted to the remote server
!  * as parameter values.  Caller is responsible for initializing the result list
!  * to empty if necessary.
   *
   * If params is NULL, we're generating the query for EXPLAIN purposes,
   * so Params and other-relation Vars should be replaced by dummy values.
***
*** 805,813  appendWhereClause(StringInfo buf,
  	int			nestlevel;
  	ListCell   *lc;
  
- 	if (params)
- 		*params = NIL;			/* initialize result list to empty */
- 
  	/* Set up context struct for recursion */
  	context.root = root;
  	context.foreignrel = baserel;
--- 802,807 
***
*** 940,945  deparseUpdateSql(StringInfo buf, PlannerInfo *root,
--- 934,996 
  }
  
  /*
+  * deparse remote UPDATE statement
+  *
+  * The statement text is appended to buf, and we also create an integer List
+  * of the columns being retrieved by RETURNING (if any), which is returned
+  * to *retrieved_attrs.
+  */
+ void
+ deparsePushedDownUpdateSql(StringInfo buf, PlannerInfo *root,
+ 		   Index rtindex, Relation rel,
+ 		   List	*targetlist,
+ 		   List *targetAttrs,
+ 		   List	*remote_conds,
+ 		   List **params_list,
+ 		   List *returningList,
+ 		   List **retrieved_attrs)
+ {
+ 	RelOptInfo *baserel = root->simple_rel_array[rtindex];
+ 	deparse_expr_cxt context;
+ 	bool		first;
+ 	ListCell   *lc;
+ 
+ 	if (params_list)
+ 		*params_list = NIL;		/* initialize result list to empty */
+ 
+ 	/* Set up context struct for recursion */
+ 	context.root = root;
+ 	context.foreignrel = baserel;
+ 	context.buf = buf;
+ 	context.params_list = params_list;
+ 
+ 	appendStringInfoString(buf, "UPDATE ");
+ 	deparseRelation(buf, rel);
+ 	appendStringInfoString(buf, " SET ");
+ 
+ 	first = true;
+ 	foreach(lc, targetAttrs)
+ 	{
+ 		int			attnum = lfirst_int(lc);
+ 		TargetEntry *tle = get_tle_by_resno(targetlist, attnum);
+ 
+ 		if (!first)
+ 			appendStringInfoString(buf, ", ");
+ 		first = false;
+ 
+ 		deparseColumnRef(buf, rtindex, attnum, root);
+ 		appendStringInfo(buf, " = ");
+ 		deparseExpr((Expr *) tle->expr, context);
+ 	}
+ 	if (remote_conds)
+ 		appendWhereClause(buf, root, baserel, remote_conds,
+ 		  true, params_list);
+ 
+ 	deparseReturningList(buf, root, rtindex, rel, false,
+ 		 returningList, retrieved_attrs);
+ }
+ 
+ /*
   * deparse remote DELETE statement
   *
   * The statement text is appended to buf, and we also create an integer List
***
*** 962,967  deparseDeleteSql(StringInfo buf, PlannerInfo *root,
--- 1013,1048 
  }
  
  /*
+  * deparse remote DELETE statement
+  *
+  * The statement 

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-05 Thread Syed, Rahila

Hello,

Please find attached  a patch. As discussed, flag to denote compression and 
presence of hole in block image has been added in XLogRecordImageHeader rather 
than block header.  

Following are WAL numbers based on attached  test script  posted by Michael 
earlier in the thread.
 
                        WAL generated
FPW compression on      122.032 MB
FPW compression off     155.223 MB
HEAD                    155.236 MB

Compression : 21 %
Number of block images generated in WAL :   63637


Thank you,
Rahila Syed


__
Disclaimer: This email and any attachments are sent in strictest confidence
for the sole use of the addressee and may contain legally privileged,
confidential, and proprietary data. If you are not the intended recipient,
please advise the sender by replying promptly to this email and then delete
and destroy this email and any attachments without any further use, copying
or forwarding.


Support-compression-of-full-page-writes-in-WAL_v23.patch
Description: Support-compression-of-full-page-writes-in-WAL_v23.patch


compress_run.bash
Description: compress_run.bash

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] contraints_exclusion fails to refute simple condition

2015-03-05 Thread Sandro Santilli
PostGIS installs standard constraints of this kind:

  CHECK (geometrytype(g) = 'POINT'::text OR g IS NULL)

The constraint is used by constraint_exclusion if using this condition:

  WHERE g IS NOT NULL AND geometrytype(g) = 'LINESTRING'

But it is _NOT_ used if the NOT NULL condition is removed:

  WHERE geometrytype(g) = 'LINESTRING'

As geometrytype is defined as STRICT and IMMUTABLE, there's
no way for geometrytype(g) = 'LINESTRING' to hold true when g is NULL,
so why is the IS NOT NULL condition also needed by the planner?

Andres Freund on IRC suggested that predicate_refuted_by_simple_clause()
looks like trying to handle such cases, but if that's the case it seems
to fail here.

--strk; 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-05 Thread Michael Paquier
On Thu, Mar 5, 2015 at 9:14 PM, Syed, Rahila rahila.s...@nttdata.com wrote:
 Please find attached  a patch. As discussed, flag to denote compression and 
 presence of hole in block image has been added in XLogRecordImageHeader 
 rather than block header.

 Following are WAL numbers based on attached  test script  posted by Michael 
 earlier in the thread.

                         WAL generated
 FPW compression on      122.032 MB
 FPW compression off     155.223 MB
 HEAD                    155.236 MB

 Compression : 21 %
 Number of block images generated in WAL :   63637

ISTM that we are getting a nice thing here. I tested the patch and WAL
replay is working correctly.

Some nitpicky comments...

+ * bkp_info stores flags for information about the backup block image
+ * BKPIMAGE_IS_COMPRESSED is used to identify if a given block image
is compressed.
+ * BKPIMAGE_WITH_HOLE is used to identify the presence of a hole in a
block image.
+ * If the block image has no hole, it is ensured that the raw size of
a compressed
+ * block image is equal to BLCKSZ, hence the contents of
+ * XLogRecordBlockImageCompressionInfo are not necessary.
Take care of the limit of 80 characters per line. (Perhaps you could
run pgindent on your code before sending a patch?). The first line of
this paragraph is a sentence in itself, no?

In xlogreader.c, blk->with_hole is a boolean, you could remove the ==0
and ==1 it is compared with.

+  /*
+   * Length of a block image must be less than BLCKSZ
+   * if the block has hole
+   */
if the block has a hole. (End of the sentence needs a dot.)

+   /*
+* Length of a block image must be equal to BLCKSZ
+* if the block does not have hole
+*/
if the block does not have a hole.

Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-05 Thread Andres Freund
On 2015-03-05 12:14:04 +, Syed, Rahila wrote:
 Please find attached  a patch. As discussed, flag to denote
 compression and presence of hole in block image has been added in
 XLogRecordImageHeader rather than block header.

FWIW, I personally won't commit it with things done that way. I think
it's going the wrong way, leading to a harder to interpret and less
flexible format.  I'm not going to further protest if Fujii or Heikki
commit it this way though.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] anyarray

2015-03-05 Thread Andrew Dunstan


On 03/04/2015 10:28 PM, Tom Lane wrote:

Peter Eisentraut pete...@gmx.net writes:

On 2/13/15 10:20 AM, Teodor Sigaev wrote:

Some users of the intarray contrib module wish to use its features with
other kinds of arrays, not only the int4 type. The suggested module
generalizes intarray over other (but not all) types of pgsql.

I think this module should be merged with the intarray module.  Having
two modules with very similar functionality would be confusing.

Perhaps.  I think it would be hard to remove intarray without breaking
things for existing users of it; even if the functionality remains under
another name.  And surely we don't want to generalize intarray while
keeping that same name.  So it might be hard to get to a clean solution.

Speaking of names, I can't avoid the feeling that it is a seriously bad
idea to name an extension the same thing as an existing core type.





+1. We have far too much experience already of this type of naming 
confusion.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange assertion using VACOPT_FREEZE in vacuum.c

2015-03-05 Thread Alvaro Herrera
Robert Haas wrote:
 On Sun, Mar 1, 2015 at 6:58 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  Hm, why not. That would remove all inconsistencies between the parser
  and the autovacuum code path. Perhaps something like the attached
  makes sense then?
 
 I really don't see this patch, or any of the previous ones, as solving
 any actual problem.  There's no bug here, and no reason to suspect
 that future code changes would be particularly likely to introduce one.
 Assertions are a great way to help developers catch coding mistakes,
 but it's a real stretch to think that a developer is going to add a
 new syntax for ANALYZE that involves setting options proper to VACUUM
 and not notice it.

That was my opinion of previous patches in this thread.  But I think
this last one makes a lot more sense: why is the parser concerned with
figuring out the right defaults given FREEZE/not-FREEZE?  I think there
is a real gain in clarity here by deferring those decisions until vacuum
time.  The parser's job should be to pass the FREEZE flag down only,
which is what this patch does, and consequently results in a (small) net
reduction of LOC in gram.y.

Here's a simple idea to improve the patch: make VacuumParams include
VacuumStmt and the for_wraparound and do_toast flags.  ISTM that reduces
the number of arguments to be passed down in a couple of places.  In
particular:

vacuum(VacuumParams *params, BufferAccessStrategy bstrategy, bool isTopLevel)

vacuum_rel(VacuumParams *params)

Does that sound more attractive?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Combining Aggregates

2015-03-05 Thread Ashutosh Bapat
On Thu, Mar 5, 2015 at 9:30 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

  On Wed, Mar 4, 2015 at 4:41 AM, David Rowley dgrowle...@gmail.com
 wrote:
   This thread mentions parallel queries as a use case, but that means
   passing data between processes, and that requires being able to
   serialize and deserialize the aggregate state somehow. For actual data
   types that's not overly difficult I guess (we can use in/out
 functions),
   but what about aggretates using 'internal' state? I.e. aggregates
   passing pointers that we can't serialize?
  
   This is a good question. I really don't know the answer to it as I've
 not
   looked at the Robert's API for passing data between backends yet.
  
   Maybe Robert or Amit can answer this?
 
  I think parallel aggregation will probably require both the
  infrastructure discussed here, namely the ability to combine two
  transition states into a single transition state, and also the ability
  to serialize and de-serialize transition states, which has previously
  been discussed in the context of letting hash aggregates spill to
  disk.  My current thinking is that the parallel plan will look
  something like this:
 
  HashAggregateFinish
  -> Funnel
     -> HashAggregatePartial
        -> PartialHeapScan
 
  So the workers will all pull from a partial heap scan and each worker
  will aggregate its own portion of the data.  Then it'll need to pass
  the results of that step back to the master, which will aggregate the
  partial results and produce the final results.  I'm guessing that if
  we're grouping on, say, column a, the HashAggregatePartial nodes will
  kick out 2-column tuples of the form (value for a, serialized
  transition state for group with that value for a).
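 
  As a plain-SQL sketch of what that partial/combine split means (hypothetical
  per-worker inputs w1 and w2 standing in for the partial scans, and avg()'s
  (sum, count) pair as the transition state):
 
      -- each "worker" computes per-group partial states, and the leader
      -- combines and finalizes them (x assumed to be float8/numeric, so the
      -- division below is exact):
      SELECT a, sum(s) / sum(c) AS avg_x
      FROM (
          SELECT a, sum(x) AS s, count(x) AS c FROM w1 GROUP BY a
          UNION ALL
          SELECT a, sum(x) AS s, count(x) AS c FROM w2 GROUP BY a
      ) AS partial_states
      GROUP BY a;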
 
 It may not be an urgent topic to be discussed towards v9.5; however,
 I'd like to raise a topic about the planner and aggregates.

 Once we get two-phase aggregation, the planner needs to pay attention to
 the possibility of partial aggregation during plan construction, even
 though our current implementation attaches the Agg node after the
 join/scan plan construction.

 Probably, a straightforward design is to add FunnelPath with some
 child nodes on set_rel_pathlist() or add_paths_to_joinrel().
 Its child node may be a PartialAggregate node (or some other parallel-safe
 node, of course). In this case, it must inform the planner that this node
 (more precisely, the targetlist of the node) returns a partial aggregation
 instead of values row-by-row.
 Then the planner needs to track which form of data is returned to the upper
 node. A Path node may have a flag to show it, and we may also need to inject
 a dummy PartialAggregate if we try to join a relation that returns values
 row-by-row with another one that returns a partial aggregate.
 This also leads to another problem: the RelOptInfo->targetlist will depend
 on the Path node chosen. Even if a float datum is expected as the argument
 of AVG(), the state to be combined is float[3]. So we will need to adjust
 the targetlist of the RelOptInfo once the Path is chosen.

 Anyway, these are at least the complicated issues I could find around
 integrating two-phase aggregation with the planner. Can someone suggest a
 minimally invasive way to do this integration?


Postgres-XC solved this question by creating a plan with two Agg/Group
nodes, one for combining transitioned result and one for creating the
distributed transition results (one per distributed run per group). So,
Agg/Group for combining result had as many Agg/Group nodes as there are
distributed/parallel runs. But XC chose this way to reduce the code
footprint. In Postgres, we can have different nodes for combining and
transitioning as you have specified above. Aggregation is not pathified in
current planner, hence XC took the approach of pushing the Agg nodes down
the plan tree when there was distributed/parallel execution possible. If we
can get aggregation pathified, we can go by path-based approach which might
give a better judgement of whether or not to distribute the aggregates
itself.

Looking at Postgres-XC might be useful to get ideas. I can help you there.


 Thanks,
 --
 NEC OSS Promotion Center / PG-Strom Project
 KaiGai Kohei kai...@ak.jp.nec.com





-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-03-05 Thread Jim Nasby

On 3/2/15 10:58 AM, Sawada Masahiko wrote:

On Wed, Feb 25, 2015 at 4:58 PM, Jim Nasby jim.na...@bluetreble.com wrote:

On 2/24/15 8:28 AM, Sawada Masahiko wrote:


According to the above discussion, VACUUM and REINDEX should have
trailing options. Tom seems (to me) to be suggesting SQL-style
(bare word preceded by WITH) options, and Jim '()'-style options?
(Anyway, VACUUM gets a *third additional* option style, but that is
a different discussion from this one.)



Well, almost everything does a trailing WITH. We need to either stick with
that for consistency, or add leading () as an option to those WITH commands.

Does anyone know why those are WITH? Is it ANSI?

As a refresher, current commands are:

  VACUUM (ANALYZE, VERBOSE) table1 (col1);
  REINDEX INDEX index1 FORCE;
  COPY table1 FROM 'file.txt' WITH (FORMAT csv);
  CREATE MATERIALIZED VIEW mv1 WITH (storageparam, ...) AS qry WITH DATA;
  CREATE EXTENSION ext1 WITH SCHEMA s1 VERSION v1 FROM over;
  CREATE ROLE role WITH LOGIN;
  GRANT  WITH GRANT OPTION;
  CREATE VIEW v1 AS qry WITH CASCADED CHECK OPTION;
  ALTER DATABASE db1 WITH CONNECTION LIMIT 50;
  DECLARE c1 INSENSITIVE SCROLL CURSOR WITH HOLD;


BTW, I'm fine with Tom's bare-word with WITH idea. That seems to be the 
most consistent with everything else. Is there a problem with doing 
that? I know getting syntax is one of the hard parts of new features, 
but it seems like we reached consensus here...
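
For concreteness, I'd expect that to end up looking something like this
(purely hypothetical syntax, nothing here is committed):

    REINDEX INDEX index1 WITH VERBOSE;
    REINDEX TABLE table1 WITH VERBOSE FORCE;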



We have discussed this option, including the FORCE option, but I
think there are no users who want to use both the FORCE and VERBOSE
options at the same time.


I find that very hard to believe... I would expect a primary use case 
for VERBOSE to be "I ran REINDEX, but it doesn't seem to have done
anything... what's going on?" and that's exactly when you might want to
use FORCE.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Combining Aggregates

2015-03-05 Thread David Rowley
On 6 March 2015 at 19:01, Ashutosh Bapat ashutosh.ba...@enterprisedb.com
wrote:

 Postgres-XC solved this question by creating a plan with two Agg/Group
 nodes, one for combining transitioned result and one for creating the
 distributed transition results (one per distributed run per group).



 So, Agg/Group for combining result had as many Agg/Group nodes as there
 are distributed/parallel runs.


This sounds quite like the planner must be forcing the executor to have
to execute the plan on a fixed number of worker processes.

I really hoped that we could, one day, have a load monitor process that
decided what might be the best number of threads to execute a parallel plan
on. Otherwise how would we decide how many worker processes to allocate to
a plan? Surely there must be times where only utilising half of the
processors for a query would be better than trying to use all processors
and having many more context switches to perform.

Probably the harder part about dynamically deciding the number of workers
would be around the costing. Where maybe the plan will execute the fastest
with 32 workers, but if it was only given 2 workers then it might execute
better as a non-parallel plan.


 But XC chose this way to reduce the code footprint. In Postgres, we can
 have different nodes for combining and transitioning as you have specified
 above. Aggregation is not pathified in current planner, hence XC took the
 approach of pushing the Agg nodes down the plan tree when there was
 distributed/parallel execution possible. If we can get aggregation
 pathified, we can go by path-based approach which might give a better
 judgement of whether or not to distribute the aggregates itself.

 Looking at Postgres-XC might be useful to get ideas. I can help you there.



 Regards

David Rowley


Re: [HACKERS] pg_upgrade and rsync

2015-03-05 Thread Vladimir Borodin

 On 6 March 2015, at 6:11, Bruce Momjian br...@momjian.us wrote:
 
 On Thu, Mar  5, 2015 at 10:55:28AM +0300, Vladimir Borodin wrote:
You are correct that a pg_controldata file is copied over that has
wal_level=minimal, but that should not be a problem.
 
 
  I suppose this is the root cause of why the replica does not start as hot
  standby. It is enough to start it as a warm standby, but not a hot standby.
 See CheckRequiredParameterValues function in xlog.c which is called inside of
 StartupXLOG function.
 
 Yes, you are correct.  I spent all day building a test harness so I
 could automate this setup and test various failures.  I was able to
 reproduce your failure, and you are correct that the proper fix is to
 set wal_level=hot_standby on the new master, and then start and stop the
 new cluster just before rsync.
 
 The root cause is that pg_upgrade calls pg_resetxlog -o on the new
 cluster _after_ the new cluster stopped for the final time, so rsync is
 copying the incorrect pg_controldata wal_level value.  Also, even if
 pg_resetxlog preserved wal_level in the control file, there is no
 guarantee that the user configured the new cluster's wal_level for
 hot_standby anyway.
 
 What I have done is to update the pg_upgrade instructions to add this
 required step.  Updated doc patch attached.  (I also added the --delete
 flag to rsync.)  Thanks so much for your detailed report.

It seems to work fine now. The only thing that would be nice to change is to 
explicitly write that these instructions are correct for hot standby 
installations too.

+ <para>
+  If you have Log-Shipping Standby Servers (<xref
+  linkend="warm-standby">), follow these steps to upgrade them (before
+  starting any servers):
+ </para>

Actually, I’ve entered this thread because it is not obvious from the paragraph 
above or any other places.

 
But it could not be done with the --size-only key, because the control-file is
of fixed size and rsync would skip it. Or maybe everything should be copied
with --size-only and the control-file should be copied without this option.
 
 
Well, what happens is that there is no _new_ standby pg_controldata
file, so it is copied fully from the new master.  Are you running initdb
to create the new standby --- you shouldn't be doing that as the rsync
will do that for you.  
 
 
 No, I don’t. The scenario of the problem with copying control-file was in 
 case
 when I:
 1. ran pg_upgrade on master and got control-file with wal_level = minimal,
 2. did rsync --size-only to replica (and it got this control-file with
 wal_level = minimal),
 3. started and stopped postgres on master to get «good» control-file with
 wal_level = hot_standby,
 4. did rsync --size-only to replica one more time. And this time control-file
 is not copied because of the same size of control-file.
 
 Actually, if you don’t do step 2, everything works as expected. Sorry for
 bothering you.
 
 Ah, yes, I think doing rsync twice is never a good suggestion.  It can
 lead to too many failures.  Doing the start/stop before rsync seems like
 the best solution.
 
 -- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com
 
  + Everyone has their own god. +
 rsync.diff


--
May the force be with you…
https://simply.name



Re: [HACKERS] Combining Aggregates

2015-03-05 Thread Ashutosh Bapat
On Fri, Mar 6, 2015 at 12:41 PM, David Rowley dgrow...@gmail.com wrote:

 On 6 March 2015 at 19:01, Ashutosh Bapat ashutosh.ba...@enterprisedb.com
 wrote:

 Postgres-XC solved this question by creating a plan with two Agg/Group
 nodes, one for combining transitioned result and one for creating the
 distributed transition results (one per distributed run per group).



 So, Agg/Group for combining result had as many Agg/Group nodes as there
 are distributed/parallel runs.


  This sounds quite like the planner must be forcing the executor to have
 to execute the plan on a fixed number of worker processes.

 I really hoped that we could, one day, have a load monitor process that
 decided what might be the best number of threads to execute a parallel plan
 on. Otherwise how would we decide how many worker processes to allocate to
 a plan? Surely there must be times where only utilising half of the
 processors for a query would be better than trying to use all processors
  and having many more context switches to perform.

 Probably the harder part about dynamically deciding the number of workers
 would be around the costing. Where maybe the plan will execute the fastest
 with 32 workers, but if it was only given 2 workers then it might execute
 better as a non-parallel plan.


XC does that, because it knew on how many nodes it had to distribute the
aggregation while creating the plan. To keep that dynamic, we can add a
place-holder planner node for producing transitioned results for a given
distributed run. At the time of execution, that node creates one executor
node (corresponding to the place-holder node) per parallel run. I haven't
seen a precedent in PG code for creating more than one executor node for a
given planner node, but is that a rule?




 But XC chose this way to reduce the code footprint. In Postgres, we can
 have different nodes for combining and transitioning as you have specified
 above. Aggregation is not pathified in current planner, hence XC took the
 approach of pushing the Agg nodes down the plan tree when there was
 distributed/parallel execution possible. If we can get aggregation
 pathified, we can go by path-based approach which might give a better
 judgement of whether or not to distribute the aggregates itself.

 Looking at Postgres-XC might be useful to get ideas. I can help you there.



  Regards

 David Rowley




-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Strange assertion using VACOPT_FREEZE in vacuum.c

2015-03-05 Thread Michael Paquier
On Fri, Mar 6, 2015 at 12:42 AM, Robert Haas wrote:
 On Thu, Mar 5, 2015 at 9:58 AM, Alvaro Herrera wrote:
 Here's a simple idea to improve the patch: make VacuumParams include
 VacuumStmt and the for_wraparound and do_toast flags.  ISTM that reduces
 the number of arguments to be passed down in a couple of places.  In
 particular:

 vacuum(VacuumParams *params, BufferAccessStrategy bstrategy, bool isTopLevel)

 vacuum_rel(VacuumParams *params)

 Does that sound more attractive?

 I dislike passing down parser nodes straight into utility commands.
 It tends to make those functions hard for in-core users to call,
 and also to lead to security vulnerabilities where we look up the same
 names more than once and just hope that we get the same OID every
 time.  Stuffing the VacuumStmt pointer inside the VacuumParams object
 doesn't, for me, help anything.  It'd be a lot more interesting if we
 could get rid of that altogether.

Do you mean removing VacuumStmt entirely from the stack? We would then
need to add relation and va_cols as additional arguments of things
like vacuum_rel, analyze_rel, do_analyze_rel or similar.

FWIW, adding do_toast and for_wraparound into VacuumParams makes sense
to me, but not VacuumStmt. It has little meaning as VacuumParams
should be used for parameters.
-- 
Michael




Re: [HACKERS] Table-level log_autovacuum_min_duration

2015-03-05 Thread Fujii Masao
On Thu, Mar 5, 2015 at 9:49 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Thu, Mar 5, 2015 at 7:10 PM, Fujii Masao wrote:
 With the patch, VACUUM ANALYZE VERBOSE doesn't emit any verbose message.
 Why did you remove that functionality?

 Oops. Sorry about that. In gram.y, the combination of VacuumStmt with
 AnalyzeStmt overwrote the value of log_min_duration incorrectly. I
 also found another bug related to logging of ANALYZE not working
 correctly because of the use of IsAutoVacuumWorkerProcess() instead of
 VACOPT_VERBOSE (this is reducing the diffs of the patch btw). All
 those things are fixed in the attached.

Thanks for updating the patch!

Why does log_min_duration need to be set even when a manual VACUUM command is
executed? Per the latest version of the patch, log_min_duration is checked only
when the process is an autovacuum worker. So ISTM that log_min_duration doesn't
need to be set in gram.y; it's even confusing to me. Or, if you're going to
implement something like VACUUM VERBOSE DURATION n (i.e., the verbose message
is output if n seconds have elapsed), that might be necessary, though...
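
(For reference, the table-level knob itself would be used roughly like this,
assuming the reloption form the patch proposes; the value is in milliseconds
and -1 disables the logging:)

    ALTER TABLE t1 SET (log_autovacuum_min_duration = 0);  -- log every autovacuum of t1
    ALTER TABLE t1 RESET (log_autovacuum_min_duration);    -- fall back to the GUC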

Regards,

-- 
Fujii Masao




Re: [HACKERS] Weirdly pesimistic estimates in optimizer

2015-03-05 Thread Tom Lane
David Kubečka kubecka@gmail.com writes:
 There is division by loop_count because of the predicted effect of caching,
 and it is exactly this division which makes the run_cost for a single index
 lookup so low compared with the query version with random_fk_uniq.  So the
 main problem is that the planner calls cost_index with loop_count equal to
 the number of rows of the inner table, *but it ignores semi-join semantics*,
 i.e. it doesn't account for the number of *unique* rows in the inner table,
 which will actually be the number of loops.

Good point.
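
The query in question has roughly this shape (hypothetical table and column
names, following the thread):

    SELECT sum(f.amount)
    FROM facts f
    WHERE f.fk IN (SELECT fk FROM random_fk_uniq);
    -- the IN becomes a semi-join; the parameterized index scan on facts.fk is
    -- descended once per *distinct* fk value produced by the subquery, which
    -- is what loop_count ought to reflect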

 Since this is my first time looking into the optimizer (and in fact any
 postgres) code I am not yet able to locate the exact place where this
 should be repaired, but I hope that in few days I will have a patch :-)

Hm, this might not be the best problem to tackle for your first Postgres
patch :-(.  The information about the estimated number of unique rows
isn't readily available at the point where we're creating parameterized
paths.  When we do compute such an estimate, it's done like this:

pathnode->path.rows = estimate_num_groups(root, uniq_exprs, rel->rows);

where uniq_exprs is a list of right-hand-side expressions we've determined
belong to the semijoin, and rel->rows represents the raw size of the
semijoin's RHS relation (which might itself be a join).  Neither of those
things are available to create_index_path.

I chewed on this for awhile and decided that there'd be no real harm in
taking identification of the unique expressions out of 
create_unique_path() and doing it earlier, in initsplan.c; we'd need a
couple more fields in SpecialJoinInfo but that doesn't seem like a
problem.  However, rel->rows is a *big* problem; we simply have not made
any join size estimates yet, and can't, because these things are done
bottom up.

However ... estimate_num_groups's dependency on its rowcount input is not
large (it's basically using it as a clamp).  So conceivably we could have
get_loop_count just multiply together the sizes of the base relations
included in the semijoin's RHS to get a preliminary estimate of that
number.  This would be the right thing anyway for a single relation in the
RHS, which is the most common case.  It would usually be an overestimate
for join RHS, but we could hope that the output of estimate_num_groups
wouldn't be affected too badly.

regards, tom lane

