Re: [HACKERS] MD5 authentication needs help

2015-03-07 Thread Bruce Momjian
On Sat, Mar  7, 2015 at 12:52:15PM -0500, Stephen Frost wrote:
 * Bruce Momjian (br...@momjian.us) wrote:
  On Fri, Mar  6, 2015 at 07:00:10PM -0500, Stephen Frost wrote:
   suggested to me as one change we could make that would reduce the risk
   of disk-based attacks while trading that off for a higher risk on the
   side of network-based attacks while not breaking the existing network
   protocol.  To make it very clear- it is not a solution but rather a poor
   bandaid.  What we really need is a new password based protocol which is
   based on a future-proof, well designed protocol, such as SCRAM.
  
  Again, agreed.
 
 Great.
 
 Have you had a chance to review the SCRAM RFC or do you have any
 questions as it relates to how SCRAM works?  I'd love to start a proper
 discussion about how we go about implementing it.

I am feeling we have to wait for 9.5 to settle first.

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] MD5 authentication needs help

2015-03-07 Thread Bruce Momjian
On Sat, Mar  7, 2015 at 12:49:15PM -0500, Stephen Frost wrote:
 * Bruce Momjian (br...@momjian.us) wrote:
  On Fri, Mar  6, 2015 at 07:02:36PM -0500, Stephen Frost wrote:
   * Bruce Momjian (br...@momjian.us) wrote:
I think the best solution to this would be to introduce a per-cluster
salt that is used for every password hash.  That way, you could not
replay a pg_authid hash on another server _unless_ you had manually
assigned the same cluster salt to both servers, or connection pooler.
   
   Wouldn't that break the wireline protocol, unless you used a fixed set
   of known challenges?  Perhaps I'm not following what you mean by a
   cluster-wide salt here.
  
  Well, right now the server sends a random 32-bit number as session salt,
  but due to the birthday problem, that can repeat in ~16k connection
  attempts.  If we use an incremented counter for each session, we spread
  the salt evenly over the entire 32-bit key space, making replays much
  more difficult, e.g. 4 billion.
 
 Ok, this is the incremented counter approach you brought up previously.
 Using the term 'cluster-wide salt' confused me as I thought you were
 referring to changing the on-disk format somehow with this.

Yes, I used the term cluster-wide salt in two cases:  first,
cluster-wide counter for the MD5 session salt improvement, and second,
cluster-wide fixed salt to prevent pg_authid reuse, but to allow reuse
if the cluster-wide fixed salt was set to match on two clusters, e.g.
for pooling.

  I don't think the client would ever know the number was different from 
  the random number we used to send.
 
 Agreed.
 
  The big win for this idea is that it requires no client or admin
  changes, while your idea of using a new salt does.  My personal opinion
  is that the 32-bit counter idea improves MD5 replays, and that we are
  better off going with an entirely new authentication method to fix the
  pg_authid vulnerability.
 
 There's apparently some confusion here as my approach does not require
 any client or admin changes either.  If users are not using TLS today

Really?  From your previous email I see:

 this approach looks like this: pre-determine and store the values (on a
 per-user basis, so a new field in pg_authid or some hack on the existing
 field) which will be sent to the client in the AuthenticationMD5Password
 message.  Further, calculate a random salt to be used when storing data
 in pg_authid.  Then, for however many variations we feel are necessary,
 calculate and store, for each AuthenticationMD5Password value:

How do you generate these X versions of password hashes without admin
changes?  In fact, I am not even sure how the server would create these
unless it had the raw passwords.

 then they will be slightly more susceptible to network-based sniffing

Slightly?  X versions as stored in pg_authid vs 16k or 4 billion?

 attacks than they are today due to the replay issue, but they are
 already at risk to a variety of other (arguably worse) attacks in that
 case.  Further, we can at least tell users about those risks and provide
 a way to address them.

Right, but again, the user can choose to use TLS if they wish.  I think
you are saying MD5 replay security is worthless without TLS, but I can
assure you many users are happy with that.  The fact this issue rarely
comes up is a testament to that, and in fact, until we documented
exactly how MD5 worked, we got many more questions about MD5.  Perhaps
we should document the pg_authid reuse risk.

  I think your argument is that if users are using TLS, there is no replay
  problem and you want to focus on the pg_authid problem.  I think fixing
  the pg_authid problem inside MD5 is just too complex and we are better
  off creating a new authentication method for that.  
 
 There is no replay problem if users are using TLS, that's correct.  I
 don't believe the approach I've outlined is particularly complex, but
 that's clearly a biased viewpoint.
 
 I certainly agree that we need to create a new authentication method to
 address these issues properly and would love to discuss that further
 rather than discuss the merits of these potential improvements to md5.
 
  The bottom line is we promised MD5 to prevent replay, but not pg_authid
  stealing, and if we can improve on what we promised, we should do that
  and not hijack MD5 to fix a different problem, particularly because
  fixing MD5 for pg_authid requires admin action.
 
 Perhaps I'm missing it, but the documentation in 19.3.2 for 9.4 mentions
 md5 being useful to address password sniffing risk but doesn't
 mention anything about replay or the pg_authid-based risk.  Sniffing of
 the cleartext password is addressed with the approach I've outlined, but
 the replay risk is greater.

I assumed sniffing meant sniff to later replay, which is why we use the
random session salt.  We document how it works so users can decide.

 Further, we can provide a solution to the replay concern by encouraging
 use of 

Re: [HACKERS] MD5 authentication needs help

2015-03-07 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
 On Fri, Mar  6, 2015 at 07:02:36PM -0500, Stephen Frost wrote:
  * Bruce Momjian (br...@momjian.us) wrote:
   I think the best solution to this would be to introduce a per-cluster
   salt that is used for every password hash.  That way, you could not
   replay a pg_authid hash on another server _unless_ you had manually
   assigned the same cluster salt to both servers, or connection pooler.
  
  Wouldn't that break the wireline protocol, unless you used a fixed set
  of known challenges?  Perhaps I'm not following what you mean by a
  cluster-wide salt here.
 
 Well, right now the server sends a random 32-bit number as session salt,
 but due to the birthday problem, that can repeat in ~16k connection
 attempts.  If we use an incremented counter for each session, we spread
 the salt evenly over the entire 32-bit key space, making replays much
 more difficult, e.g. 4 billion.

Ok, this is the incremented counter approach you brought up previously.
Using the term 'cluster-wide salt' confused me as I thought you were
referring to changing the on-disk format somehow with this.

 I don't think the client would ever know the number was different from 
 the random number we used to send.

Agreed.

 The big win for this idea is that it requires no client or admin
 changes, while your idea of using a new salt does.  My personal opinion
 is that the 32-bit counter idea improves MD5 replays, and that we are
 better off going with an entirely new authentication method to fix the
 pg_authid vulnerability.

There's apparently some confusion here as my approach does not require
any client or admin changes either.  If users are not using TLS today
then they will be slightly more susceptible to network-based sniffing
attacks than they are today due to the replay issue, but they are
already at risk to a variety of other (arguably worse) attacks in that
case.  Further, we can at least tell users about those risks and provide
a way to address them.

 I think your argument is that if users are using TLS, there is no replay
 problem and you want to focus on the pg_authid problem.  I think fixing
 the pg_authid problem inside MD5 is just too complex and we are better
 off creating a new authentication method for that.  

There is no replay problem if users are using TLS, that's correct.  I
don't believe the approach I've outlined is particularly complex, but
that's clearly a biased viewpoint.

I certainly agree that we need to create a new authentication method to
address these issues properly and would love to discuss that further
rather than discuss the merits of these potential improvements to md5.

 The bottom line is we promised MD5 to prevent replay, but not pg_authid
 stealing, and if we can improve on what we promised, we should do that
 and not hijack MD5 to fix a different problem, particularly because
 fixing MD5 for pg_authid requires admin action.

Perhaps I'm missing it, but the documentation in 19.3.2 for 9.4 mentions
md5 being useful to address password sniffing risk but doesn't
mention anything about replay or the pg_authid-based risk.  Sniffing of
the cleartext password is addressed with the approach I've outlined, but
the replay risk is greater.

Further, we can provide a solution to the replay concern by encouraging
use of SSL/TLS.  We can not provide any solution to the pg_authid-based
risk with this approach (with the possible exception of recommending
password-based auth, though the poor salt used makes that less
effective than md5 with the suggestion I've outlined).

The incremental counter approach implies not using the approach I've
outlined, which is fine, but it doesn't change the pg_authid risk, nor
does it help at all for TLS-using environments, and for users who are
not using TLS, it doesn't make them particularly more secure to
cleartext-password loss, connection hijacking, data loss, or anything
except replay attacks.  When it comes to risks, for my part at least, I
don't feel like the replay risk is the largest concern to an operator
who is not using TLS.

Thanks,

Stephen




Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2015-03-07 Thread Petr Jelinek

On 04/03/15 23:51, Andreas Karlsson wrote:

On 01/29/2015 12:28 AM, Peter Geoghegan wrote:

* I'm not sure about the idea of polymorphic catalog functions (that
return the type internal, but the actual struct returned varying
based on build settings).

I tend to think that things would be better if there was always a
uniform return type for such internal type returning functions, but
that *its* structure varied according to the availability of int128
(i.e. HAVE_INT128) at compile time. What we should probably do is have
a third aggregate struct, that encapsulates this idea of (say)
int2_accum() piggybacking on one of either Int128AggState or
NumericAggState directly. Maybe that would be called PolyNumAggState.
Then this common code is all that is needed on both types of build (at
the top of int4_accum(), for example):

PolyNumAggState *state;

state = PG_ARGISNULL(0) ? NULL : (PolyNumAggState *)
PG_GETARG_POINTER(0);

I'm not sure if I'd do this with a containing struct or a simple
#ifdef HAVE_INT128 typedef #else ... , but I think it's an
improvement either way.


Not entirely sure exactly what I mean but I have changed to something
like this, relying on typedef, in the attached version of the patch. I
think it looks better after the changes.
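
For reference, the typedef variant discussed above would look roughly like
this (an illustrative sketch, not necessarily the exact shape of the
attached patch):

    /* Pick the aggregate transition state type at compile time. */
    #ifdef HAVE_INT128
    typedef Int128AggState PolyNumAggState;
    #else
    typedef NumericAggState PolyNumAggState;
    #endif

With that in place, int2_accum()/int4_accum() can declare a
PolyNumAggState * and the calling code stays identical on both kinds of
build.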



I think this made the patch much better indeed.

What I am wondering is if those numeric_int16_* functions that also deal 
with either the Int128AggState or NumericAggState should be renamed in 
similar fashion.



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] File based Incremental backup v8

2015-03-07 Thread Bruce Momjian
On Thu, Mar  5, 2015 at 11:10:08AM -0500, Robert Haas wrote:
 But I agree with Fujii to the extent that I see little value in
 committing this patch in the form proposed.  Being smart enough to use
 the LSN to identify changed blocks, but then sending the entirety of
 every file anyway because you don't want to go to the trouble of
 figuring out how to revise the wire protocol to identify the
 individual blocks being sent and write the tools to reconstruct a full
 backup based on that data, does not seem like enough of a win. As
 Fujii says, if we ship this patch as written, people will just keep
 using the timestamp-based approach anyway.  Let's wait until we have
 something that is, at least in some circumstances, a material
 improvement over the status quo before committing anything.

The big problem I have with this patch is that it has not followed the
proper process for development, i.e. at the top of the TODO list we
have:

Desirability - Design - Implement - Test - Review - Commit

This patch has continued in development without getting agreement on
its Desirability or Design, meaning we are going to continue going back
to those points until there is agreement.  Posting more versions of this
patch is not going to change that.

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] MD5 authentication needs help

2015-03-07 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
 On Sat, Mar  7, 2015 at 12:49:15PM -0500, Stephen Frost wrote:
  Ok, this is the incremented counter approach you brought up previously.
  Using the term 'cluster-wide salt' confused me as I thought you were
  referring to changing the on-disk format somehow with this.
 
 Yes, I used the term cluster-wide salt in two cases:  first,
 cluster-wide counter for the MD5 session salt improvement, and second,
 cluster-wide fixed salt to prevent pg_authid reuse, but to allow reuse
 if the cluster-wide fixed salt was set to match on two clusters, e.g.
 for pooling.

Ok, how, in the second case, is pg_authid reuse prevented unless we also
change the on-disk pg_authid format?  It doesn't matter if there's a
different counter used between different clusters, if the attacker has
what's in pg_authid then they can trivially add any salt we send them
during the challenge/response to the pg_authid value and send us the
response we expect.

   The big win for this idea is that it requires no client or admin
   changes, while your idea of using a new salt does.  My personal opinion
   is that the 32-bit counter idea improves MD5 replays, and that we are
   better off going with an entirely new authentication method to fix the
   pg_authid vulnerability.
  
  There's apparently some confusion here as my approach does not require
  any client or admin changes either.  If users are not using TLS today
 
 Really?  From your previous email I see:
 
  this approach looks like this: pre-determine and store the values (on a
  per-user basis, so a new field in pg_authid or some hack on the existing
  field) which will be sent to the client in the AuthenticationMD5Password
  message.  Further, calculate a random salt to be used when storing data
  in pg_authid.  Then, for however many variations we feel are necessary,
  calculate and store, for each AuthenticationMD5Password value:
 
 How do you generate these X versions of password hashes without admin
 changes?  In fact, I am not even sure how the server would create these
 unless it had the raw passwords.

Ok, I've apparently not been clear enough with the proposal.  What we
are hashing is the *current value* of what's in pg_authid- which we've
already got on disk.  All we have to do is read the current on-disk
value, add a salt to it, and store it back on disk.  We'd want to store
more than one, as discussed, since otherwise the challenge/response is
utterly useless.
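
To make the mechanics concrete (a rough illustration of this reading of the
proposal, not actual patch code): what is on disk today is
"md5" || md5_hex(password || username), and at login the client answers the
challenge with "md5" || md5_hex(<those 32 hex chars> || <4-byte salt>).  So
the server could precompute the expected answer for a small, fixed set of
predetermined salts using only the existing on-disk value, e.g. with the
existing helper pg_md5_encrypt() from libpq/md5.h (N_PRESET_SALTS and
preset_salt are made-up names, purely for illustration):

    char	stored[MD5_PASSWD_LEN + 1];	/* "md5" + 32 hex chars, from pg_authid */
    char	expected[N_PRESET_SALTS][MD5_PASSWD_LEN + 1];
    int		i;

    for (i = 0; i < N_PRESET_SALTS; i++)
        pg_md5_encrypt(stored + strlen("md5"),	/* skip the "md5" prefix */
                       preset_salt[i], 4, expected[i]);

and then store the expected[] values (plus the new random salt) instead of
the raw hash, so the raw hash itself no longer has to live on disk.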

  then they will be slightly more susceptible to network-based sniffing
 
 Slightly?  X versions as stored in pg_authid vs 16k or 4 billion?

Sadly, yes.  It's not a perfect test, but a simple:

time for a in `seq 1 1000`; do nc localhost 5432 < /dev/null; done

Gave me 9.15s, or ~0.00915s per connection on a single thread.  That
times 16k is 146s or about two and a half minutes.  Of course, I'm
comparing this against what we currently do since, well, that's what we
currently do.  Changing it to 4b would certainly improve that.  Of
course, using multiple threads, having multiple challenge/responses on
hand (due to listening for a while) or simply breaking the MD5 hash
(which we know isn't a terribly great hashing algorithm these days)
would change that.

  attacks than they are today due to the replay issue, but they are
  already at risk to a variety of other (arguably worse) attacks in that
  case.  Further, we can at least tell users about those risks and provide
  a way to address them.
 
 Right, but again, the user can choose to use TLS if they wish.  I think
 you are saying MD5 replay security is worthless without TLS, but I can
 assure you many users are happy with that.  The fact this issue rarely
 comes up is a testament to that, and in fact, until we documented
 exactly how MD5 worked, we got many more questions about MD5.  Perhaps
 we should document the pg_authid reuse risk.

We should certainly document the pg_authid reuse risk.  For my part, I
don't think it's that people are happy with that trade-off but rather
that they simply don't realize the risk is there because, basically, who
would ever do that?  I'm not aware of any other server application which
even offers an option to store the authentication token out on disk in a
format which can be trivially used to authenticate to the system
(excluding people who are storing cleartext passwords..).

I do feel that people who are worried about MD5 replay security who are
*not* using TLS have not considered their risks properly.  It is unclear
to me why anyone should feel safe from an attacker who is able to sniff
the network traffic thanks to our challenge/response protocol.

  Further, we can provide a solution to the replay concern by encouraging
  use of SSL/TLS.  We can not provide any solution to the pg_authid-based
  risk with this approach (with the possible exception of recommending
  password-based auth, though the poor salt used makes that less
  effective than md5 with the suggestion I've outlined).
  

Re: [HACKERS] MD5 authentication needs help

2015-03-07 Thread Bruce Momjian
On Fri, Mar  6, 2015 at 07:02:36PM -0500, Stephen Frost wrote:
 * Bruce Momjian (br...@momjian.us) wrote:
  On Fri, Mar  6, 2015 at 12:50:14PM -0800, Josh Berkus wrote:
   On 03/06/2015 08:19 AM, Stephen Frost wrote:
Well, server-side, we already have that- have pgbouncer run on the
database server (something which I'm typically in favor of anyway) and
use 'peer'.  If it supported TLS then it could use certificates instead.
The question is what to do after the pooler has connected and that's
actually a generic issue which goes beyond poolers and into
applications, basically, how can I re-authenticate this connection
using a different role.  We have SET ROLE, but that gives a lot of
power to the role the pooler logs in as.  It'd definitely be neat to
provide a way to use SCRAM or similar to do that re-authentication after
the initial connection.
   
   Using pgbouncer on the DB server is common, but less common than using
   it on an intermediate server or even the app server itself.  So anything
   we create needs to be implementable with all three configurations in
   some way.
  
  I think the best solution to this would be to introduce a per-cluster
  salt that is used for every password hash.  That way, you could not
  replay a pg_authid hash on another server _unless_ you had manually
  assigned the same cluster salt to both servers, or connection pooler.
 
 Wouldn't that break the wireline protocol, unless you used a fixed set
 of known challenges?  Perhaps I'm not following what you mean by a
 cluster-wide salt here.

Well, right now the server sends a random 32-bit number as session salt,
but due to the birthday problem, that can repeat in ~16k connection
attempts.  If we use an incremented counter for each session, we spread
the salt evenly over the entire 32-bit key space, making replays much
more difficult, e.g. 4 billion.
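
(For reference, the usual birthday approximation here, assuming each salt is
drawn uniformly at random from the 2^32 possibilities, is

    P(some salt repeats among k connections) ~= 1 - exp(-k(k-1) / 2^33)

so a purely random salt already has a noticeable chance of repeating after a
few tens of thousands of connection attempts, whereas an incrementing counter
cannot hand out the same salt twice until it wraps after 2^32 sessions.)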

I don't think the client would ever know the number was different from 
the random number we used to send.

The big win for this idea is that it requires no client or admin
changes, while your idea of using a new salt does.  My personal opinion
is that the 32-bit counter idea improves MD5 replays, and that we are
better off going with an entirely new authentication method to fix the
pg_authid vulnerability.

I think your argument is that if users are using TLS, there is no replay
problem and you want to focus on the pg_authid problem.  I think fixing
the pg_authid problem inside MD5 is just too complex and we are better
off creating a new authentication method for that.  

The bottom line is we promised MD5 to prevent replay, but not pg_authid
stealing, and if we can improve on what we promised, we should do that
and not hijack MD5 to fix a different problem, particularly because
fixing MD5 for pg_authid requires admin action.

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] MD5 authentication needs help

2015-03-07 Thread Bruce Momjian
On Fri, Mar  6, 2015 at 07:00:10PM -0500, Stephen Frost wrote:
   I'm also worried about both, but if the admin is worried about sniffing
   in their environment, they're much more likely to use TLS than to set up
   client side certificates, kerberos, or some other strong auth mechanism,
   simply because TLS is pretty darn easy to get working and distros set it
   up for you by default.
  
  I think your view might be skewed.  I think there are many people who care
  about password security who don't care to do TLS.
 
 I'm not quite sure that I follow what you mean about caring for
 password security.
 
 I'll try to clarify by suggesting a few things that I think you might
 mean and will respond to them.  Please clarify if I've completely missed
 what you're getting at here.
 
 If the concern is database access due to an attacker who can sniff the
 network data then the approach which I suggested would make things
 materially worse for users who do not employ TLS.  Once the attacker has
 sniffed the network for a little while, they'll have one and likely more
 challenge/responses which they could use to attempt to log in with.  As
 discussed, our existing challenge/response system is vulnerable to
 network sniffing based replay attacks also.

Yes, I am worried about that attack vector.  We have always been open
that MD5 only prevents password sniffing, though the 16k replay issue has
put a dent into that promise.

See the email I just sent on how I think we should proceed, i.e. by
creating a new authentication method to fix pg_authid stealing, and use
a salt counter to improve MD5.

  Also, my suggestion to use a counter for the session salt, to reduce
  replay from 16k to 4 billion, has not received any comments, and it does
  not break the wire protocol.  I feel that is an incremental improvement
  we should consider.
 
 You are correct, that would improve the existing protocol regarding
 database-access risk due to network-based sniffing attacks.
 Unfortunately, it would not improve cleartext or database access
 risk due to disk-based attacks.

Agreed.

 
  I think you are minimizing the downside of your idea using X challenges
  instead of 16k challenges to get in.  Again, if my idea is valid, it
  would be X challenges vs 4 billion challenges.
 
 The reason I have not been as excited by this approach is that we
 already have a solution for network-based sniffing attacks.  As Josh
 mentioned, there are users out there who even go to extraordinary
 lengths to thwart network-based sniffing attacks by using stunnel with
 pg_bouncer.  On the flip side, while there are ways to protect backups
 through encryption, many other vectors exist for disk-based attacks
 which result in either database access or finding the cleartext
 password with much less difficulty.
 
 Further, this only improves the authentication handling and doesn't
 improve our risk to other network-based attacks, including connection
 hijacking, sniffing during password set/reset, data compromise as it's
 sent across the wire, etc.  Encouraging use of TLS addresses all of
 those risks.  I don't recall any complaints about these other
 network-based attacks and I do believe that's because TLS is available.
 Combined with the approach I've suggested, we would reduce the risk of
 disk-based attacks to the extent we're able to without breaking the
 protocol.

Agreed.  Again, TLS has its own value beyond simple authentication.  I
just don't think getting MD5 to try to fix the pg_authid problem is the
right approach.

 For my part, doing this, or going with my suggestion, or doing nothing
 with md5, really doesn't move us forward very much, which frustrates me
 greatly.  I brought this suggestion to this list because it was

Agreed.

 suggested to me as one change we could make that would reduce the risk
 of disk-based attacks while trading that off for a higher risk on the
 side of network-based attacks while not breaking the existing network
 protocol.  To make it very clear- it is not a solution but rather a poor
 bandaid.  What we really need is a new password based protocol which is
 based on a future-proof, well designed protocol, such as SCRAM.

Again, agreed.

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] MD5 authentication needs help

2015-03-07 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
 On Fri, Mar  6, 2015 at 07:00:10PM -0500, Stephen Frost wrote:
  suggested to me as one change we could make that would reduce the risk
  of disk-based attacks while trading that off for a higher risk on the
  side of network-based attacks while not breaking the existing network
  protocol.  To make it very clear- it is not a solution but rather a poor
  bandaid.  What we really need is a new password based protocol which is
  based on a future-proof, well designed protocol, such as SCRAM.
 
 Again, agreed.

Great.

Have you had a chance to review the SCRAM RFC or do you have any
questions as it relates to how SCRAM works?  I'd love to start a proper
discussion about how we go about implementing it.
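
For anyone who wants the shape of it before reading the RFC, the SCRAM
exchange defined in RFC 5802 is, schematically:

    C: n,,n=<username>,r=<client nonce>
    S: r=<client nonce + server nonce>,s=<base64 salt>,i=<iteration count>
    C: c=biws,r=<combined nonce>,p=<client proof>
    S: v=<server signature>

The server keeps only a salted, iterated verifier, and that verifier cannot
by itself be replayed to log in the way a stolen md5 pg_authid entry can.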

Thanks!

Stephen




Re: [HACKERS] TABLESAMPLE patch

2015-03-07 Thread Petr Jelinek

On 05/03/15 09:21, Amit Kapila wrote:

On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek p...@2ndquadrant.com wrote:
 
 
  I didn't add the whole page visibility caching as the tuple ids we
get from sampling methods don't map well to the visibility info we get
from heapgetpage (it maps to the values in the rs_vistuples array, not
to its indexes). Commented about it in code also.
 

I think we should set pagemode for system sampling as it can
have dual benefit, one is it will allow us caching tuples and other
is it can allow us pruning of page which is done in heapgetpage().
Do you see any downside to it?


Double checking for tuple visibility is the only downside I can think 
of. Plus some added code complexity of course. I guess we could use 
binary search on rs_vistuples (it's already sorted) so that info won't 
be completely useless. Not sure if worth it compared to normal 
visibility check, but not hard to do.
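
A minimal sketch of that lookup, assuming the rs_vistuples[]/rs_ntuples
fields that heapgetpage() fills in page-at-a-time mode (illustrative only,
not patch code):

    static bool
    tuple_offset_is_visible(HeapScanDesc scan, OffsetNumber tupoffset)
    {
        int		low = 0;
        int		high = scan->rs_ntuples - 1;

        while (low <= high)
        {
            int		mid = (low + high) / 2;

            if (scan->rs_vistuples[mid] == tupoffset)
                return true;
            if (scan->rs_vistuples[mid] < tupoffset)
                low = mid + 1;
            else
                high = mid - 1;
        }
        return false;
    }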


I personally don't see the page pruning in TABLESAMPLE as all that 
important though, considering that in practice it will only scan a tiny 
portion of a relation and usually one that does not get many updates (in 
practice the main use-case is BI).




Few other comments:

1.
Current patch fails to apply, minor change is required:
patching file `src/backend/utils/misc/Makefile'
Hunk #1 FAILED at 15.
1 out of 1 hunk FAILED -- saving rejects to
src/backend/utils/misc/Makefile.rej


Ah bitrot over time.



2.
Few warnings in code (compiled on windows(msvc))
1>src\backend\utils\tablesample\bernoulli.c(217): warning C4305: '=' :
truncation from 'double' to 'float4'
1>src\backend\utils\tablesample\system.c(177): warning C4305: '=' :
truncation from 'double' to 'float4'



I think this is just compiler stupidity but hopefully fixed (I don't 
have msvc to check for it and other compilers I tried don't complain).



3.
+static void
+InitScanRelation(SampleScanState *node, EState *estate, int eflags,
+TableSampleClause *tablesample)
{
..
+	/*
+	 * Page at a time mode is useless for us as we need to check visibility
+	 * of tuples individually because tuple offsets returned by sampling
+	 * methods map to rs_vistuples values and not to its indexes.
+	 */
+	node->ss.ss_currentScanDesc->rs_pageatatime = false;
..
}

Modifying scandescriptor in nodeSamplescan.c looks slightly odd,
Do we modify this way at anyother place?

I think it is better to either teach heap_beginscan_* api's about
it or expose it via new API in heapam.c



Yeah I agree, I taught the heap_beginscan_strat about it as that one is 
the advanced API.




4.
+Datum
+tsm_system_cost(PG_FUNCTION_ARGS)
{
..
+	*tuples = path->rows * samplesize;
}

It seems above calculation considers all rows in table are of
equal size and hence multiplying directly with samplesize will
give estimated number of rows for sample method, however if
row size varies then this calculation might not give right
results.  I think if possible we should consider the possibility
of rows with varying sizes else we can at least write a comment
to indicate the possibility of improvement for future.



I am not sure how we would know what size the tuples in the random blocks 
we are going to read later would have. That said, I am sure that the 
costing can be improved, even if I don't know how myself.




6.
@@ -13577,6 +13608,7 @@ reserved_keyword:
| PLACING
| PRIMARY
| REFERENCES
+| REPEATABLE

Have you tried to investigate the reason why it is giving shift/reduce
error for unreserved category and if we are not able to resolve that,
then at least we can try to make it in some less restrictive category.
I tried (on windows) by putting it in (type_func_name_keyword:) and
it seems to be working.

To investigate, you can try with information at below link:
http://www.gnu.org/software/bison/manual/html_node/Understanding.html

Basically I think we should try some more before concluding
to change the category of REPEATABLE and especially if we
want to make it a RESERVED keyword.


Yes it can be moved to type_func_name_keyword which is not all that much 
better but at least something. I did try to play with this already 
during first submission but could not find a way to move it to something 
less restrictive.


I could not even pinpoint what exactly is the shift/reduce issue except 
that it has something to do with the fact that the REPEATABLE clause is 
optional (at least I didn't have the problem when it wasn't optional).




On a related note, it seems you have agreed upthread with
Kyotaro-san for below change, but I don't see the same in latest patch:

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4578b5e..8cf09d5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -10590,7 +10590,7 @@ tablesample_clause:
 ;

  opt_repeatable_clause:
-			REPEATABLE '(' AexprConst ')'	{ $$ = (Node *) $3; }
+			REPEATABLE '(' a_expr ')'		{ $$ = (Node *) $3; }

Re: [HACKERS] proposal: searching in array function - array_position

2015-03-07 Thread Petr Jelinek

On 22/02/15 12:19, Pavel Stehule wrote:



2015-02-22 3:00 GMT+01:00 Petr Jelinek p...@2ndquadrant.com:

On 28/01/15 08:15, Pavel Stehule wrote:



2015-01-28 0:01 GMT+01:00 Jim Nasby jim.na...@bluetreble.com:

 On 1/27/15 4:36 AM, Pavel Stehule wrote:


 It is only partially identical - I would to use cache for
 array_offset, but it is not necessary for array_offsets ..
 depends how we would to modify current API to support
externally
 cached data.


 Externally cached data?


Some from these functions has own caches for minimize access to
typcache
(array_map, array_cmp is example). And in first case, I am trying to
push these information from fn_extra, in second case I don't do it,
because I don't expect a repeated call (and I am expecting so
type cache
will be enough).


You actually do caching via fn_extra in both case and I think that's
the correct way, and yes that part can be moved common function.

I also see that the documentation does not say what is returned by
array_offset if nothing is found (it's documented in code but not in
sgml).


rebased + fixed docs



You still duplicate the type cache code, but many other array functions 
do that too so I will not hold that against you. (Maybe somebody should 
write a separate patch that would put all that duplicate code into a 
common function?)
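
For what it's worth, the fn_extra pattern that keeps getting duplicated
looks roughly like this (a sketch only, with element_type standing in for
the array's element type OID):

    TypeCacheEntry *typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;

    /* Look the element type up once and keep it across calls in fn_extra. */
    if (typentry == NULL || typentry->type_id != element_type)
    {
        typentry = lookup_type_cache(element_type, TYPECACHE_EQ_OPR_FINFO);
        fcinfo->flinfo->fn_extra = (void *) typentry;
    }

so a shared helper would mostly just be a home for those few lines.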


I think this patch is ready for committer so I am going to mark it as such.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] CATUPDATE confusion?

2015-03-07 Thread Stephen Frost
* Peter Eisentraut (pete...@gmx.net) wrote:
 On 3/7/15 12:31 AM, Tom Lane wrote:
  Peter Eisentraut pete...@gmx.net writes:
  On 12/29/14 7:16 PM, Adam Brightwell wrote:
  Given this discussion, I have attached a patch that removes CATUPDATE
  for review/discussion.
  
  committed this version
  
  Hmm .. I'm not sure that summarily removing usecatupd from those three
  system views was well thought out.  pg_shadow, especially, has no reason
  to live at all except for backwards compatibility, and clients might well
  expect that column to still be there.  I wonder if we'd not be better off
  to keep the column in the views but have it read from rolsuper.
 
 I doubt anyone is reading the column.  And if they are, they should stop.

Certainly pgAdmin and similar tools are.  From that standpoint though, I
agree that they should be modified to no longer read it, as the whole
point of those kinds of tools is to allow users to modify those
attributes and having CATUPDATE in the view would surely be confusing as
the user wouldn't be able to modify it.

 pg_shadow and pg_user have been kept around because it is plausible that
 a lot of tools want to have a list of users, and requiring all of them
 to change to pg_authid at once was deemed too onerous at the time.  I
 don't think this requires us to keep all the details the same forever.

pg_shadow, pg_user and pg_group were added when role support was added,
specifically for backwards compatibility.  I don't believe there was
ever discussion about keeping them because filtering pg_roles based on
rolcanlogin was too onerous.  That said, we already decided recently
that we wanted to keep them updated to match the actual attributes
available (note that the replication role attribute modified those
views) and I think that makes sense on the removal side as well as the
new-attribute side.

I continue to feel that we really should officially deprecate those
views as I don't think they're actually all that useful any more and
maintaining them ends up bringing up all these questions, discussion,
and ends up being largely busy-work if no one really uses them.

Thanks,

Stephen




Re: [HACKERS] improve pgbench syntax error messages

2015-03-07 Thread Fabien COELHO



Here is a v3, which (1) activates better error messages from bison
and (2) improves the error reporting from the scanner as well.


v4.

While adding a basic function call syntax to expressions, I noticed that 
it would be useful to access the detail field of syntax errors so as to 
report the name of the unknown function. This version just adds the hook 
(expr_yyerror_detailed) that could be called later for this purpose.


--
Fabien.

diff --git a/contrib/pgbench/exprparse.y b/contrib/pgbench/exprparse.y
index 243c6b9..0405a50 100644
--- a/contrib/pgbench/exprparse.y
+++ b/contrib/pgbench/exprparse.y
@@ -26,6 +26,10 @@ static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr,
 %expect 0
 %name-prefix=expr_yy
 
+/* better error reporting with bison */
+%define parse.lac full
+%define parse.error verbose
+
 %union
 {
 	int64		ival;
diff --git a/contrib/pgbench/exprscan.l b/contrib/pgbench/exprscan.l
index 4c9229c..de3f09a 100644
--- a/contrib/pgbench/exprscan.l
+++ b/contrib/pgbench/exprscan.l
@@ -18,6 +18,13 @@ static YY_BUFFER_STATE scanbufhandle;
 static char *scanbuf;
 static int	scanbuflen;
 
+/* context information for error reporting */
+static char *expr_source = NULL;
+static int expr_lineno = 0;
+static char *expr_full_line = NULL;
+static char *expr_command = NULL;
+static int expr_col = 0;
+
 /* flex 2.5.4 doesn't bother with a decl for this */
 int expr_yylex(void);
 
@@ -52,29 +59,42 @@ space			[ \t\r\f]
 
 .{
 	yycol += yyleng;
-	fprintf(stderr, "unexpected character '%s'\n", yytext);
+	syntax_error(expr_source, expr_lineno, expr_full_line, expr_command,
+				 "unexpected character", yytext, expr_col + yycol);
+	/* dead code, exit is called from syntax_error */
 	return CHAR_ERROR;
 }
 %%
 
 void
-yyerror(const char *message)
+expr_yyerror_detailed(const char *message, const char * detail)
 {
-	/* yyline is always 1 as pgbench calls the parser for each line...
-	 * so the interesting location information is the column number */
-	fprintf(stderr, "%s at column %d\n", message, yycol);
-	/* go on to raise the error from pgbench with more information */
-	/* exit(1); */
+	syntax_error(expr_source, expr_lineno, expr_full_line, expr_command,
+				 message, detail, expr_col + yycol);
+}
+
+void yyerror(const char *message)
+{
+	expr_yyerror_detailed(message, NULL);
 }
 
 /*
  * Called before any actual parsing is done
  */
 void
-expr_scanner_init(const char *str)
+expr_scanner_init(const char *str, const char *source,
+  const int lineno, const char *line,
+  const char *cmd, const int ecol)
 {
 	Size	slen = strlen(str);
 
+	/* save context informations for error messages */
+	expr_source = (char *) source;
+	expr_lineno = (int) lineno;
+	expr_full_line = (char *) line;
+	expr_command = (char *) cmd;
+	expr_col = (int) ecol;
+
 	/*
 	 * Might be left over after error
 	 */
@@ -102,4 +122,9 @@ expr_scanner_finish(void)
 {
 	yy_delete_buffer(scanbufhandle);
 	pg_free(scanbuf);
+	expr_source = NULL;
+	expr_lineno = 0;
+	expr_full_line = NULL;
+	expr_command = NULL;
+	expr_col = 0;
 }
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 706fdf5..3e50bae 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -291,6 +291,7 @@ typedef struct
 	int			type;			/* command type (SQL_COMMAND or META_COMMAND) */
 	int			argc;			/* number of command words */
 	char	   *argv[MAX_ARGS]; /* command word list */
+	int			cols[MAX_ARGS]; /* corresponding column starting from 1 */
 	PgBenchExpr *expr;			/* parsed expression */
 } Command;
 
@@ -2189,6 +2190,28 @@ parseQuery(Command *cmd, const char *raw_sql)
 	return true;
 }
 
+void syntax_error(const char *source, const int lineno,
+  const char *line, const char *command,
+  const char *msg, const char *more, const int column)
+{
+	fprintf(stderr, "%s:%d: %s", source, lineno, msg);
+	if (more)
+		fprintf(stderr, " (%s)", more);
+	if (column != -1)
+		fprintf(stderr, " at column %d", column);
+	fprintf(stderr, " in command \"%s\"\n", command);
+	if (line) {
+		fprintf(stderr, "%s\n", line);
+		if (column != -1) {
+			int i = 0;
+			for (i = 0; i < column - 1; i++)
+				fprintf(stderr, " ");
+			fprintf(stderr, "^ error found here\n");
+		}
+	}
+	exit(1);
+}
+
 /* Parse a command; return a Command struct, or NULL if it's a comment */
 static Command *
 process_commands(char *buf, const char *source, const int lineno)
@@ -2200,6 +2223,11 @@ process_commands(char *buf, const char *source, const int lineno)
 	char	   *p,
 			   *tok;
 
+/* error message generation helper */
+#define SYNTAX_ERROR(msg, more, col)	\
+	syntax_error(source, lineno, my_commands->line,		\
+				 my_commands->argv[0], msg, more, col)
+
 	/* Make the string buf end at the next newline */
 	if ((p = strchr(buf, '\n')) != NULL)
 		*p = '\0';
@@ -2228,11 +2256,13 @@ process_commands(char *buf, const char *source, const int lineno)
 		j = 0;
 		tok = strtok(++p, delim);
 
+		/* stop set argument parsing the variable name */
 		if 

Re: [HACKERS] CATUPDATE confusion?

2015-03-07 Thread Peter Eisentraut
On 3/7/15 12:31 AM, Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
 On 12/29/14 7:16 PM, Adam Brightwell wrote:
 Given this discussion, I have attached a patch that removes CATUPDATE
 for review/discussion.
 
 committed this version
 
 Hmm .. I'm not sure that summarily removing usecatupd from those three
 system views was well thought out.  pg_shadow, especially, has no reason
 to live at all except for backwards compatibility, and clients might well
 expect that column to still be there.  I wonder if we'd not be better off
 to keep the column in the views but have it read from rolsuper.

I doubt anyone is reading the column.  And if they are, they should stop.

pg_shadow and pg_user have been kept around because it is plausible that
a lot of tools want to have a list of users, and requiring all of them
to change to pg_authid at once was deemed too onerous at the time.  I
don't think this requires us to keep all the details the same forever.





Re: [HACKERS] File based Incremental backup v8

2015-03-07 Thread Marco Nenciarini
On 05/03/15 05:42, Bruce Momjian wrote:
 On Thu, Mar  5, 2015 at 01:25:13PM +0900, Fujii Masao wrote:
 Yeah, it might make the situation better than today. But I'm afraid that
 many users might get disappointed about that behavior of an incremental
 backup after the release...

 I don't get what do you mean here. Can you elaborate this point?

 The proposed version of LSN-based incremental backup has some limitations
 (e.g., every database files need to be read even when there is no 
 modification
 in database since last backup, and which may make the backup time longer than
 users expect) which may disappoint users. So I'm afraid that users who can
 benefit from the feature might be very limited. IOW, I'm just sticking to
 the idea of timestamp-based one :) But I should drop it if the majority in
 the list prefers the LSN-based one even if it has such limitations.
 
 We need numbers on how effective each level of tracking will be.  Until
 then, the patch can't move forward.
 

I've written a little test script to estimate how much space can be saved by 
file level incremental, and I've run it on some real backups I have access to.

The script takes two basebackup directories and simulates how much data can be 
saved in the 2nd backup using incremental backup (using file size/time and LSN)

It assumes that every file in base, global and pg_tblspc which matches both 
size and modification time will also match from the LSN point of view.

The result is that many databases can take advantage of incremental backup, 
even if the saving is not that big, and considering LSNs yields a result 
almost identical to the approach based on filesystem metadata.

== Very big geographic database (similar to openstreetmap main DB), it contains 
versioned data, interval two months 

First backup size: 13286623850656 (12.1TiB)
Second backup size: 13323511925626 (12.1TiB)
Matching files count: 17094
Matching LSN count: 14580
Matching files size: 9129755116499 (8.3TiB, 68.5%)
Matching LSN size: 9128568799332 (8.3TiB, 68.5%)


== Big on-line store database, old data regularly moved to historic partitions, 
interval one day

First backup size: 1355285058842 (1.2TiB)
Second backup size: 1358389467239 (1.2TiB)
Matching files count: 3937
Matching LSN count: 2821
Matching files size: 762292960220 (709.9GiB, 56.1%)
Matching LSN size: 762122543668 (709.8GiB, 56.1%)


== Ticketing system database, interval one day

First backup size: 144988275 (138.3MiB)
Second backup size: 146135155 (139.4MiB)
Matching files count: 3124
Matching LSN count: 2641
Matching files size: 76908986 (73.3MiB, 52.6%)
Matching LSN size: 67747928 (64.6MiB, 46.4%)


== Online store, interval one day

First backup size: 20418561133 (19.0GiB)
Second backup size: 20475290733 (19.1GiB)
Matching files count: 5744
Matching LSN count: 4302
Matching files size: 4432709876 (4.1GiB, 21.6%)
Matching LSN size: 4388993884 (4.1GiB, 21.4%)


== Heavily updated database, interval one week

First backup size: 3203198962 (3.0GiB)
Second backup size: 3222409202 (3.0GiB)
Matching files count: 1801
Matching LSN count: 1273
Matching files size: 91206317 (87.0MiB, 2.8%)
Matching LSN size: 69083532 (65.9MiB, 2.1%)

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
#!/usr/bin/env python
from __future__ import print_function

import collections
from optparse import OptionParser
import os
import sys

__author__ = 'Marco Nenciarini marco.nenciar...@2ndquadrant.it'

usage = "usage: %prog [options] backup_1 backup_2"

parser = OptionParser(usage=usage)
(options, args) = parser.parse_args()

# need 2 directories
if len(args) != 2:
    parser.print_help()
    sys.exit(1)

FileItem = collections.namedtuple('FileItem', 'size time path')


def get_files(target_dir):
    """Return a set of FileItem"""
    files = set()
    for dir_path, _, file_names in os.walk(target_dir, followlinks=True):
        for filename in file_names:
            path = os.path.join(dir_path, filename)
            rel_path = path[len(target_dir) + 1:]
            stats = os.stat(path)
            files.add(FileItem(stats.st_size, stats.st_mtime, rel_path))
    return files


def size_fmt(num, suffix='B'):
    """Format a size"""
    for unit in ['', 'Ki', 'Mi', 'Gi', 'Ti', 'Pi', 'Ei', 'Zi']:
        if abs(num) < 1024.0:
            return "%3.1f%s%s" % (num, unit, suffix)
        num /= 1024.0
    return "%.1f%s%s" % (num, 'Yi', suffix)

def percent_fmt(a, b):
    """Format a percent"""
    return "%.1f%%" % (100.0*a/b)


def get_size(file_set):
    return reduce(lambda x, y: x + y.size, file_set, 0)


def report(a, b):
    # find files that are identical (same size and same time)
    common = a & b

    # remove files count only main forks LSN
    common_lsn = set()
    for item in common:
        # remove non main fork
        if any(suffix in item.path
               for suffix in ('_fsm', '_vm', '_init')):
            continue
        # remove things outside data 

Re: [HACKERS] Improving test coverage of extensions with pg_dump

2015-03-07 Thread Michael Paquier
On Tue, Mar 3, 2015 at 2:40 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Those patches are really simple, but then perhaps there are better or
 simpler ways than what is attached, so feel free to comment if you
 have any ideas.

Attached are new patches somewhat based on the comments provided by
Peter Eisentraut
(http://www.postgresql.org/message-id/54f62c3f.8070...@gmx.net).
- 0001 makes prove_check install the contents of the path located at
$(CURDIR)/t/extra if present, which would be the path that test
developers could use to store extensions, plugins or modules that
would then be usable by a given set of TAP tests as they are installed
in the temporary installation before launching the tests.
- 0002 is the test for pg_dump checking extensions containing FK
tables, this time integrated as a TAP test in src/bin/pg_dump, with
the extension in src/bin/pg_dump/t/extra.
IMO, this approach is scalable enough thinking long-term. And I think
that the location of those extra extensions is fine in t/ as a
hardcoded subfolder (any fresh idea being of course welcome), as this
makes the stuff in extra/ dedicated to only one set of TAP tests, and
it is even possible to set multiple extensions/modules.
Regards,
-- 
Michael
From e3eec3c8f8af72d3c85e217441009842e09ce1fc Mon Sep 17 00:00:00 2001
From: Michael Paquier michael@otacoo.com
Date: Sat, 7 Mar 2015 21:13:15 +
Subject: [PATCH 1/2] Make prove_check install contents t/extra if present

This gives module developers the possibility to add a set of extensions
and/or tools that will be available in the temporary installation of a
given path, making them available for TAP test.
---
 src/Makefile.global.in | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 7c39d82..e856ca8 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -323,6 +323,8 @@ endef
 define prove_check
 $(MKDIR_P) tmp_check/log
 $(MAKE) -C $(top_builddir) DESTDIR='$(CURDIR)'/tmp_check/install install >'$(CURDIR)'/tmp_check/log/install.log 2>&1
+# Install extra set of test utilities if present
+@if test -d $(CURDIR)/t/extra; then $(MAKE) -C $(CURDIR)/t/extra DESTDIR='$(CURDIR)'/tmp_check/install install >>'$(CURDIR)'/tmp_check/log/install.log 2>&1; fi
 cd $(srcdir) && TESTDIR='$(CURDIR)' PATH=$(CURDIR)/tmp_check/install$(bindir):$$PATH $(call add_to_path,$(ld_library_path_var),$(CURDIR)/tmp_check/install$(libdir)) top_builddir='$(CURDIR)/$(top_builddir)' PGPORT='6$(DEF_PGPORT)' $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) t/*.pl
 endef
 
-- 
2.3.1

From 0c7a45348d9a699aeeef6cbc911beca6e6e45235 Mon Sep 17 00:00:00 2001
From: Michael Paquier michael@otacoo.com
Date: Sat, 7 Mar 2015 21:18:58 +
Subject: [PATCH 2/2] Add TAP test for pg_dump checking data dump of extension
 tables

The test added checks the data dump ordering of tables added in an extension
linked among them with foreign keys. In order to do that, a test extension in
the set of TAP tests of pg_dump, combined with a TAP test script making use
of it.
---
 src/bin/pg_dump/.gitignore |  2 ++
 src/bin/pg_dump/Makefile   |  6 +
 src/bin/pg_dump/t/001_dump_test.pl | 27 ++
 src/bin/pg_dump/t/extra/Makefile   | 14 +++
 src/bin/pg_dump/t/extra/tables_fk/Makefile | 16 +
 src/bin/pg_dump/t/extra/tables_fk/README   |  5 
 .../pg_dump/t/extra/tables_fk/tables_fk--1.0.sql   | 20 
 .../pg_dump/t/extra/tables_fk/tables_fk.control|  5 
 8 files changed, 95 insertions(+)
 create mode 100644 src/bin/pg_dump/t/001_dump_test.pl
 create mode 100644 src/bin/pg_dump/t/extra/Makefile
 create mode 100644 src/bin/pg_dump/t/extra/tables_fk/Makefile
 create mode 100644 src/bin/pg_dump/t/extra/tables_fk/README
 create mode 100644 src/bin/pg_dump/t/extra/tables_fk/tables_fk--1.0.sql
 create mode 100644 src/bin/pg_dump/t/extra/tables_fk/tables_fk.control

diff --git a/src/bin/pg_dump/.gitignore b/src/bin/pg_dump/.gitignore
index c2c8677..02666f9 100644
--- a/src/bin/pg_dump/.gitignore
+++ b/src/bin/pg_dump/.gitignore
@@ -3,3 +3,5 @@
 /pg_dump
 /pg_dumpall
 /pg_restore
+
+/tmp_check
diff --git a/src/bin/pg_dump/Makefile b/src/bin/pg_dump/Makefile
index e890a62..1a3f2ca 100644
--- a/src/bin/pg_dump/Makefile
+++ b/src/bin/pg_dump/Makefile
@@ -49,5 +49,11 @@ installdirs:
 uninstall:
 	rm -f $(addprefix '$(DESTDIR)$(bindir)'/, pg_dump$(X) pg_restore$(X) pg_dumpall$(X))
 
+check: all
+	$(prove_check)
+
+installcheck: install
+	$(prove_installcheck)
+
 clean distclean maintainer-clean:
 	rm -f pg_dump$(X) pg_restore$(X) pg_dumpall$(X) $(OBJS) pg_dump.o common.o pg_dump_sort.o pg_restore.o pg_dumpall.o kwlookup.c $(KEYWRDOBJS)
diff --git a/src/bin/pg_dump/t/001_dump_test.pl b/src/bin/pg_dump/t/001_dump_test.pl
new file mode 100644
index 000..0953bf5
--- /dev/null
+++ b/src/bin/pg_dump/t/001_dump_test.pl
@@ -0,0 +1,27 @@
+use strict;
+use 

Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-07 Thread Jim Nasby

On 3/7/15 12:48 AM, Noah Misch wrote:

On Sat, Mar 07, 2015 at 12:46:42AM -0500, Tom Lane wrote:

Noah Misch n...@leadboat.com writes:

On Thu, Mar 05, 2015 at 03:28:12PM -0600, Jim Nasby wrote:

I was thinking the simpler route of just repalloc'ing... the memcpy would
suck, but much less so than the extra index pass. 64M gets us 11M tuples,
which probably isn't very common.



+1.  Start far below 64 MiB; grow geometrically using repalloc_huge(); cap
growth at vac_work_mem.


+1 for repalloc'ing at need, but I'm not sure about the start far below
64 MiB part.  64MB is a pretty small amount on nearly any machine these
days (and for anybody who thinks it isn't, that's why maintenance_work_mem
is a tunable).


True; nothing would explode, especially since the allocation would be strictly
smaller than it is today.  However, I can't think of a place in PostgreSQL
where a growable allocation begins so aggressively, nor a reason to break new
ground in that respect.  For comparison, tuplestore/tuplesort start memtupsize
at 1 KiB.  (One could make a separate case for that practice being wrong.)


A different line of thought is that it would seem to make sense to have
the initial allocation vary depending on the relation size.  For instance,
you could assume there might be 10 dead tuples per page, and hence try to
alloc that much if it fits in vac_work_mem.


Sounds better than a fixed 64 MiB start, though I'm not sure it's better than
a fixed 256 KiB start.


In the case of vacuum, I think we presumably have a pretty good 
indicator of how much space we should need; namely reltuples * 
autovacuum_scale_factor. There shouldn't be too much more space needed 
than that if autovac is keeping up with things.


If we go that route, does it still make sense to explicitly use 
repalloc_huge? It will just cut over to that at some point (128M?) 
anyway, and if you're vacuuming a small relation presumably it's not 
worth messing with.
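
For concreteness, the grow-as-needed idea being discussed might look
something like this, with names borrowed loosely from vacuumlazy.c (purely a
sketch, not a patch):

    /* before recording another dead tuple */
    if (vacrelstats->num_dead_tuples >= vacrelstats->max_dead_tuples &&
        vacrelstats->max_dead_tuples < maxtuples)
    {
        /* double the array, but never beyond the maintenance_work_mem cap */
        vacrelstats->max_dead_tuples = Min(vacrelstats->max_dead_tuples * 2,
                                           maxtuples);
        vacrelstats->dead_tuples = (ItemPointer)
            repalloc_huge(vacrelstats->dead_tuples,
                          vacrelstats->max_dead_tuples * sizeof(ItemPointerData));
    }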

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] MD5 authentication needs help

2015-03-07 Thread Bruce Momjian
On Sat, Mar  7, 2015 at 03:15:46PM -0500, Bruce Momjian wrote:
  Gave me 9.15s, or ~0.00915s per connection on a single thread.  That
  times 16k is 146s or about two and a half minutes.  Of course, I'm
  comparing this against what we currently do since, well, that's what we
  currently do.  Changing it to 4b would certainly improve that.  Of
  course, using multiple threads, having multiple challenge/responses on
  hand (due to listening for a while) or simply breaking the MD5 hash
  (which we know isn't a terribly great hashing algorithm these days)
  would change that.
 
 Uh, my calculations show that as 434 days of trying.  (Not sure why you
 didn't bother doing that calculation.)  I think anyone who is worried
 about that level of attack would already be using MD5.  Again, MD5 is
 mostly used in low-security settings where you just don't want the
 password sent over the wire in cleartext.  Frankly, without TLS, you are
 already sending your queries and data across in clear-text, and there
 are other attack vectors.

Actually, with a counter, the bad guy just has to wait for the counter
to roll around, and then try to catch the counter on the values he has
recorded, meaning you wouldn't even be able to detect the hack attempts.
:-)

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-07 Thread Andres Freund
On 2015-03-05 15:28:12 -0600, Jim Nasby wrote:
 I was thinking the simpler route of just repalloc'ing... the memcpy would
 suck, but much less so than the extra index pass. 64M gets us 11M tuples,
 which probably isn't very common.

That has the chance of considerably increasing the peak memory usage
though, as you obviously need both the old and new allocation during the
repalloc().

And in contrast to the unused memory at the tail of the array, which
will usually not be actually allocated by the OS at all, this is memory
that's actually read/written respectively.
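
(A crude way to see that effect on Linux -- purely an illustrative sketch,
not PostgreSQL code -- is to watch VmRSS before and after touching a large
allocation:)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Print the process's resident set size (Linux-specific). */
static void
print_rss(const char *label)
{
	FILE   *f = fopen("/proc/self/status", "r");
	char	line[256];

	if (f == NULL)
		return;
	while (fgets(line, sizeof(line), f))
		if (strncmp(line, "VmRSS:", 6) == 0)
			printf("%s %s", label, line);
	fclose(f);
}

int
main(void)
{
	size_t	sz = 256UL * 1024 * 1024;
	char   *p = malloc(sz);

	if (p == NULL)
		return 1;
	print_rss("after malloc:  ");	/* RSS barely moves: pages not yet backed */
	memset(p, 1, sz);				/* touching the pages makes them real */
	print_rss("after memset:  ");
	free(p);
	return 0;
}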

I've to say, I'm rather unconvinced that it's worth changing stuff
around here. If overcommit is enabled, vacuum won't fail unless the
memory is actually used (= no problem). If overcommit is disabled and
you get memory allocation failures, you're probably already running awfully
close to the maximum of your configuration and you're better off
adjusting it.  I'm not aware of any field complaints about this and thus
I'm not sure it's worth fiddling with this.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bootstrap DATA is a pita

2015-03-07 Thread Andres Freund
On 2015-03-07 16:43:15 -0600, Jim Nasby wrote:
 Semi-related... if we put some special handling in some places for bootstrap
 mode, couldn't most catalog objects be created using SQL, once we got
 pg_class, pg_attributes and pg_type created? That would theoretically allow
 us to drive much more of initdb with plain SQL (possibly created via
 pg_dump).

Several people have now made that suggestion, but I *seriously* doubt
that we actually want to go there. The overhead of executing SQL
commands in comparison to the bki stuff is really rather
noticeable. Doing the majority of the large number of insertions via SQL
will make initdb noticeably slower. And it's already annoyingly
slow. Besides make install it's probably the thing I wait most for
during development.

That's besides the fact that SQL commands aren't actually that
comfortably editable in bulk.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-07 Thread Jim Nasby

On 3/7/15 4:49 PM, Andres Freund wrote:

On 2015-03-05 15:28:12 -0600, Jim Nasby wrote:

I was thinking the simpler route of just repalloc'ing... the memcpy would
suck, but much less so than the extra index pass. 64M gets us 11M tuples,
which probably isn't very common.


That has the chance of considerably increasing the peak memory usage
though, as you obviously need both the old and new allocation during the
repalloc().

And in contrast to the unused memory at the tail of the array, which
will usually not be actually allocated by the OS at all, this is memory
that's actually read/written respectively.


That leaves me wondering why we bother with dynamic resizing in other 
areas (like sorts, for example) then? Why not just palloc work_mem and 
be done with it? What makes those cases different?



I've to say, I'm rather unconvinced that it's worth changing stuff
around here. If overcommit is enabled, vacuum won't fail unless the
memory is actually used (= no problem). If overcommit is disabled and
you get memory allocation failures, you're probably already running awfully
close to the maximum of your configuration and you're better off
adjusting it.  I'm not aware of any field complaints about this and thus
I'm not sure it's worth fiddling with this.


Perhaps; Noah seems to be the only one who's seen this.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Additional role attributes superuser review

2015-03-07 Thread Stephen Frost
Peter, all,

* Peter Eisentraut (pete...@gmx.net) wrote:
 Why are we not using roles and function execute privileges for this?

Alright, I've got an initial patch to do this for pg_start/stop_backup,
pg_switch_xlog, and pg_create_restore_point.  The actual backend changes
are quite small, as expected.  I'll add in the changes for the other
functions being discussed and adapt the documentation changes from
the earlier patch to make sense, but what I'd really appreciate are any
comments or thoughts regarding the changes to pg_dump (which are generic
to all of the function changes, of course).

I've added a notion of the catalog schema to pg_dump's internal
_namespaceinfo representation and then marked pg_catalog as being that
schema, as well as being a dumpable schema.  Throughout the
selectDumpable functions, I've made changes to only mark the objects in
the catalog as dumpable if they are functions.  I'm planning to look
into the extension and binary upgrade paths since I'm a bit worried
those may not work with this approach, but I wanted to get this out
there for at least an initial review as, if people feel this makes
things too ugly on the pg_dump side of things then we may want to
reconsider using the role attributes instead.

Thanks!

Stephen
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
new file mode 100644
index 2179bf7..36029d0
*** a/src/backend/access/transam/xlogfuncs.c
--- b/src/backend/access/transam/xlogfuncs.c
*** pg_start_backup(PG_FUNCTION_ARGS)
*** 54,64 
  
  	backupidstr = text_to_cstring(backupid);
  
- 	if (!superuser() && !has_rolreplication(GetUserId()))
- 		ereport(ERROR,
- (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
- 		   errmsg("must be superuser or replication role to run a backup")));
- 
  	startpoint = do_pg_start_backup(backupidstr, fast, NULL, NULL);
  
  	PG_RETURN_LSN(startpoint);
--- 54,59 
*** pg_stop_backup(PG_FUNCTION_ARGS)
*** 82,92 
  {
  	XLogRecPtr	stoppoint;
  
- 	if (!superuser() && !has_rolreplication(GetUserId()))
- 		ereport(ERROR,
- (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
- 		 (errmsg("must be superuser or replication role to run a backup"))));
- 
  	stoppoint = do_pg_stop_backup(NULL, true, NULL);
  
  	PG_RETURN_LSN(stoppoint);
--- 77,82 
*** pg_switch_xlog(PG_FUNCTION_ARGS)
*** 100,110 
  {
  	XLogRecPtr	switchpoint;
  
- 	if (!superuser())
- 		ereport(ERROR,
- (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
- 			 (errmsg("must be superuser to switch transaction log files"))));
- 
  	if (RecoveryInProgress())
  		ereport(ERROR,
  (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
--- 90,95 
*** pg_create_restore_point(PG_FUNCTION_ARGS
*** 129,139 
  	char	   *restore_name_str;
  	XLogRecPtr	restorepoint;
  
- 	if (!superuser())
- 		ereport(ERROR,
- (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-  (errmsg("must be superuser to create a restore point"))));
- 
  	if (RecoveryInProgress())
  		ereport(ERROR,
  (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
--- 114,119 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
new file mode 100644
index 2800f73..38605de
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*** RETURNS interval
*** 897,899 
--- 897,907 
  LANGUAGE INTERNAL
  STRICT IMMUTABLE
  AS 'make_interval';
+ 
+ -- Revoke privileges for functions that should not be available to
+ -- all users.  Administrators are allowed to change this later, if
+ -- they wish.
+ REVOKE EXECUTE ON FUNCTION pg_start_backup(text, boolean) FROM public;
+ REVOKE EXECUTE ON FUNCTION pg_stop_backup() FROM public;
+ REVOKE EXECUTE ON FUNCTION pg_switch_xlog() FROM public;
+ REVOKE EXECUTE ON FUNCTION pg_create_restore_point(text) FROM public;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
new file mode 100644
index fdfb431..164608a
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
*** selectDumpableNamespace(NamespaceInfo *n
*** 1234,1245 
--- 1234,1255 
  	 * If specific tables are being dumped, do not dump any complete
  	 * namespaces. If specific namespaces are being dumped, dump just those
  	 * namespaces. Otherwise, dump all non-system namespaces.
+ 	 *
+ 	 * Note that we do consider dumping ACLs of functions in pg_catalog,
+ 	 * so mark that as a dumpable namespace, but further mark it as the
+ 	 * catalog namespace.
  	 */
+ 
+ 	/* Will be set when we may be dumping catalog ACLs, see below. */
+ 	nsinfo->catalog = false;
+ 
  	if (table_include_oids.head != NULL)
  		nsinfo->dobj.dump = false;
  	else if (schema_include_oids.head != NULL)
  		nsinfo->dobj.dump = simple_oid_list_member(schema_include_oids,
  												   nsinfo->dobj.catId.oid);
+ 	else if (strncmp(nsinfo->dobj.name, "pg_catalog", 10) == 0)
+ 		nsinfo->dobj.dump = nsinfo->catalog = true;
  	else if 

Re: [HACKERS] MD5 authentication needs help

2015-03-07 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
 On Sat, Mar  7, 2015 at 01:56:51PM -0500, Stephen Frost wrote:
  * Bruce Momjian (br...@momjian.us) wrote:
   Yes, I used the term cluster-wide salt in two cases:  first,
   cluster-wide counter for the MD5 session salt improvement, and second,
   cluster-wide fixed salt to prevent pg_authid reuse, but to allow reuse
   if the cluster-wide fixed salt was set to match on two clusters, e.g.
   for pooling.
  
  Ok, how, in the second case, is pg_authid reuse prevented unless we also
  change the on-disk pg_authid format?  It doesn't matter if there's a
 
 Yes, the second use would be a new authentication method.

Ok..  I don't think it's a good idea to try and base a new auth method
on what we're doing for md5, in any case.  To the extent that we have
options as it relates to SCRAM or another reviewed authentication
protocol, perhaps it'd make sense to consider a per-cluster salt or
similar, but I'm not sure we actually want or need to do so.

  different counter used between different clusters, if the attacker has
  what's in pg_authid then they can trivially add any salt we send them
  during the challenge/response to the pg_authid value and send us the
  response we expect.
 
 It would be a server-fixed salt sent to the client and applied before
 the session salt.  The pg_authid value would also be stored with this salt. 
 Again, I am brainstorming about poolers.

When it comes to poolers, I'm thinking we should be looking at an
approach to do an after-connection re-authentication.  Essentially a
'SET ROLE WITH mypassword;' kind of approach, but also using a proper
protocol, etc.  That way, the pooler wouldn't really need to worry about
the actual authentication of the user.  This is also brainstorming, of
course, I don't have a fully formed idea about how this would work, but
I do know that some other RDBMS's offer a similar masquerade kind of
ability.

   How do you generate these X versions of password hashes without admin
   changes?  In fact, I am not even sure how the server would create these
   unless it had the raw passwords.
  
  Ok, I've apparently not been clear enough with the proposal.  What we
  are hashing is the *current value* of what's in pg_authid- which we've
  already got on disk.  All we have to do is read the current on-disk
  value, add a salt to it, and store it back on disk.  We'd want to store
  more than one, as discussed, since otherwise the challenge/response is
  utterly useless.
 
 Oh, that is very interesting.  It seems like you are basically applying
 my server-fixed salt idea, except you are doing it X times so sniffing
 and then replay is less of a problem.   Hmmm.  I am very glad there is
 no admin work ---  I did not realize that.  It does make the trade-off
 of your idea more appealing, though I still think it weakens MD5 in an
 unacceptable way.

Well, that's where I come down on the side of our existing MD5 approach
is pretty darn weak wrt replay.  I agree, we can fix that, but I was
comparing to what our MD5 approach currently provides.
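
To make that trade-off concrete, here is roughly what the md5 exchange
boils down to, per the documented protocol.  This is an illustrative
sketch only, using OpenSSL's MD5() for the digest (compile with -lcrypto);
it is not PostgreSQL source:

#include <stdio.h>
#include <string.h>
#include <openssl/md5.h>

static void
md5_hex(const unsigned char *data, size_t len, char out[33])
{
	unsigned char	digest[MD5_DIGEST_LENGTH];
	int				i;

	MD5(data, len, digest);
	for (i = 0; i < MD5_DIGEST_LENGTH; i++)
		sprintf(out + i * 2, "%02x", digest[i]);
}

/* pg_authid.rolpassword stores "md5" || md5(password || username). */
static void
stored_value(const char *password, const char *user, char out[36])
{
	unsigned char	buf[256];
	char			digest[33];

	snprintf((char *) buf, sizeof(buf), "%s%s", password, user);
	md5_hex(buf, strlen((char *) buf), digest);
	snprintf(out, 36, "md5%s", digest);
}

/*
 * The client answers a 4-byte salt with "md5" || md5(md5(password || user) || salt).
 * The inner md5(password || user) is exactly what pg_authid already holds,
 * so anyone who has read pg_authid can answer any challenge without ever
 * knowing the password -- which is the reuse problem under discussion.
 */
static void
response_for_salt(const char *pg_authid_value, const unsigned char salt[4], char out[36])
{
	unsigned char	buf[36];
	char			digest[33];

	memcpy(buf, pg_authid_value + 3, 32);	/* strip the "md5" prefix */
	memcpy(buf + 32, salt, 4);
	md5_hex(buf, 36, digest);
	snprintf(out, 36, "md5%s", digest);
}

int
main(void)
{
	const unsigned char salt[4] = {0x01, 0x02, 0x03, 0x04};
	char	stored[36], resp[36];

	stored_value("secret", "alice", stored);
	response_for_salt(stored, salt, resp);
	printf("pg_authid: %s\nresponse:  %s\n", stored, resp);
	return 0;
}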

then they will be slightly more susceptible to network-based sniffing
   
   Slightly?  X versions as stored in pg_authid vs 16k or 4 billion?
  
  Sadly, yes.  It's not a perfect test, but a simple:
  
  time for a in `seq 1 1000`; do nc localhost 5432  /dev/null; done
  
  Gave me 9.15s, or ~0.00915s per connection on a single thread.  That
  times 16k is 146s or about two and a half minutes.  Of course, I'm
  comparing this against what we currently do since, well, that's what we
  currently do.  Changing it to 4b would certainly improve that.  Of
  course, using multiple threads, having multiple challenge/responses on
  hand (due to listening for a while) or simply breaking the MD5 hash
  (which we know isn't a terribly great hashing algorithm these days)
  would change that.
 
 Uh, my calculations show that as 434 days of trying.  (Not sure why you
 didn't bother doing that calculation.)  

I did.  I guess it wasn't clear but I was pointing out the difference
between my suggestion and the *current* state of things, where we have
the birthday problem.

 I think anyone who is worried
 about that level of attack would already be using MD5.  Again, MD5 is
 mostly used in low-security settings where you just don't want the
 password sent over the wire in cleartext.  Frankly, without TLS, you are
 already sending your queries and data across in clear-text, and there
 are other attack vectors.

This suggestion does not send the cleartext password over the wire.

 Fine, but causing MD5 to be less secure doesn't warrant fixing it this
 way.

What I was getting at is that it's not much less secure than our current
MD5 implementation when it comes to replay attacks.  Improving MD5 to be
more robust against replay attacks would be good, except that it doesn't
address the pg_authid-based risk.  I wish there was a solution which
allowed us to have both but I don't see one, without a wireline 

Re: [HACKERS] MD5 authentication needs help

2015-03-07 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
 On Sat, Mar  7, 2015 at 03:15:46PM -0500, Bruce Momjian wrote:
   Gave me 9.15s, or ~0.00915s per connection on a single thread.  That
   times 16k is 146s or about two and a half minutes.  Of course, I'm
   comparing this against what we currently do since, well, that's what we
   currently do.  Changing it to 4b would certainly improve that.  Of
   course, using multiple threads, having multiple challenge/responses on
   hand (due to listening for a while) or simply breaking the MD5 hash
   (which we know isn't a terribly great hashing algorithm these days)
   would change that.
  
  Uh, my calculations show that as 434 days of trying.  (Not sure why you
  didn't bother doing that calculation.)  I think anyone who is worried
  about that level of attack would already be using MD5.  Again, MD5 is
  mostly used in low-security settings where you just don't want the
  password sent over the wire in cleartext.  Frankly, without TLS, you are
  already sending your queries and data across in clear-text, and there
  are other attack vectors.
 
 Actually, with a counter, the bad guy just has to wait for the counter
 to roll around, and then try to catch the counter on the values he has
 recorded, meaning you wouldn't even be able to detect the hack attempts.
 :-)

That's true, if the counter is at an individual-level.  If it's cluster
wide then they aren't very likely to have the same counter for the same
individual after the wrap-around.  Then again, what individual is going
to be logging in 4 billion times?  There's a number of trade-offs here,
which is why we'd really be better off using an approach which security
folks have already vetted.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Bootstrap DATA is a pita

2015-03-07 Thread Jim Nasby

On 3/4/15 9:07 AM, Stephen Frost wrote:

* Robert Haas (robertmh...@gmail.com) wrote:

On Wed, Mar 4, 2015 at 9:42 AM, Tom Lane t...@sss.pgh.pa.us wrote:

and make it harder to compare entries by grepping out some common
substring.



Could you give an example of the sort of thing you wish to do?


On that angle, I'm dubious that a format that allows omission of fields is
going to be easy for editing scripts to modify, no matter what the layout
convention is.  I've found it relatively easy to write sed or even Emacs
macros to add new column values to old-school pg_proc.h ... but in this
brave new world, I'm going to be really hoping that the column default
works for 99.9% of pg_proc entries when we add a new pg_proc column,
because slipping a value into a desired position is gonna be hard for
a script when you don't know whether the adjacent existing fields are
present or not.


I wonder if we should have a tool in our repository to help people
edit the file.  So instead of going in there yourself and changing
things by hand, or writing your own script, you can do:

updatepgproc.pl --oid 5678 provolatile=v

or

updatepgproc.pl --name='.*xact.*' prowhatever=someval

Regardless of what format we end up with, that seems like it would
make things easier.


Alright, I'll bite on this- we have this really neat tool for editing
data in bulk, or individual values, or pulling out data to look at based
on particular values or even functions...  It's called PostgreSQL.

What if we had an easy way to export an existing table into whatever
format we decide to use for initdb to use?  For that matter, what if
that file was simple to import into PG?

What about having a way to load all the catalog tables from their git
repo files into a pg_dev schema?  Maybe even include a make target or
initdb option which does that?  (the point here being to provide a way
to modify the tables and compare the results to the existing tables
without breaking the instance one is using for this)

I have to admit that I've never tried to do that with the existing
format, but seems like an interesting idea to consider.  I further
wonder if it'd be possible to generate the table structures too..


Semi-related... if we put some special handling in some places for 
bootstrap mode, couldn't most catalog objects be created using SQL, once 
we got pg_class, pg_attributes and pg_type created? That would 
theoretically allow us to drive much more of initdb with plain SQL 
(possibly created via pg_dump).

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bootstrap DATA is a pita

2015-03-07 Thread Stephen Frost
* Andrew Dunstan (and...@dunslane.net) wrote:
 On 03/07/2015 05:46 PM, Andres Freund wrote:
 On 2015-03-07 16:43:15 -0600, Jim Nasby wrote:
 Semi-related... if we put some special handling in some places for bootstrap
 mode, couldn't most catalog objects be created using SQL, once we got
 pg_class, pg_attributes and pg_type created? That would theoretically allow
 us to drive much more of initdb with plain SQL (possibly created via
 pg_dump).
 Several people have now made that suggestion, but I *seriously* doubt
 that we actually want to go there. The overhead of executing SQL
 commands in comparison to the bki stuff is really rather
 noticeable. Doing the majority of the large number of insertions via SQL
 will make initdb noticeably slower. And it's already annoyingly
 slow. Besides make install it's probably the thing I wait most for
 during development.
 
 My reaction exactly. We should not make users pay a price for
 developers' convenience.

Just to clarify, since Jim was responding to my comment, my thought was
*not* to use SQL commands inside initdb, but rather to use PG to create
the source files that we have today in our tree, which wouldn't slow
down initdb at all.

 That's besides the fact that SQL commands aren't actually that
 comfortably editable in bulk.
 
 Indeed.

No, they aren't, but having the data in a table in PG, with a way to
easily export to the format needed by BKI, would make bulk updates much
easier..

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Bootstrap DATA is a pita

2015-03-07 Thread Jim Nasby

On 3/7/15 6:02 PM, Stephen Frost wrote:

* Andrew Dunstan (and...@dunslane.net) wrote:

On 03/07/2015 05:46 PM, Andres Freund wrote:

On 2015-03-07 16:43:15 -0600, Jim Nasby wrote:

Semi-related... if we put some special handling in some places for bootstrap
mode, couldn't most catalog objects be created using SQL, once we got
pg_class, pg_attributes and pg_type created? That would theoretically allow
us to drive much more of initdb with plain SQL (possibly created via
pg_dump).

Several people have now made that suggestion, but I *seriously* doubt
that we actually want to go there. The overhead of executing SQL
commands in comparison to the bki stuff is really rather
noticeable. Doing the majority of the large number of insertions via SQL
will make initdb noticeably slower. And it's already annoyingly
slow. Besides make install it's probably the thing I wait most for
during development.


My reaction exactly. We should not make users pay a price for
developers' convenience.


How often does a normal user actually initdb? I don't think it's that 
incredibly common. Added time to our development cycle certainly is a 
concern though.



Just to clarify, since Jim was responding to my comment, my thought was
*not* to use SQL commands inside initdb, but rather to use PG to create
the source files that we have today in our tree, which wouldn't slow
down initdb at all.


Yeah, I was thinking SQL would make it even easier, but perhaps not. 
Since the other options here seem to have hit a dead end though, it 
seems your load it into tables idea is what we've got left...



That's besides the fact that SQL commands aren't actually that
comfortably editable in bulk.


Indeed.


No, they aren't, but having the data in a table in PG, with a way to
easily export to the format needed by BKI, would make bulk updates much
easier..


My thought was that pg_dump would be useful here, so instead of hand 
editing you'd just make changes in a live database and then dump it.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MD5 authentication needs help

2015-03-07 Thread Bruce Momjian
On Sat, Mar  7, 2015 at 01:56:51PM -0500, Stephen Frost wrote:
 * Bruce Momjian (br...@momjian.us) wrote:
  On Sat, Mar  7, 2015 at 12:49:15PM -0500, Stephen Frost wrote:
   Ok, this is the incremented counter approach you brought up previously.
   Using the term 'cluster-wide salt' confused me as I thought you were
   referring to changing the on-disk format somehow with this.
  
  Yes, I used the term cluster-wide salt in two cases:  first,
  cluster-wide counter for the MD5 session salt improvement, and second,
  cluster-wide fixed salt to prevent pg_authid reuse, but to allow reuse
  if the cluster-wide fixed salt was set to match on two clusters, e.g.
  for pooling.
 
 Ok, how, in the second case, is pg_authid reuse prevented unless we also
 change the on-disk pg_authid format?  It doesn't matter if there's a

Yes, the second use would be a new authentication method.

 different counter used between different clusters, if the attacker has
 what's in pg_authid then they can trivially add any salt we send them
 during the challenge/response to the pg_authid value and send us the
 response we expect.

It would be a server-fixed salt sent to the client and applied before
the session salt.  The pg_authid value would also be stored with this salt. 
Again, I am brainstorming about poolers.

The big win for this idea is that it requires no client or admin
changes, while your idea of using a new salt does.  My personal opinion
is that the 32-bit counter idea improves MD5 replays, and that we are
better off going with an entirely new authentication method to fix the
pg_authid vulnerability.
   
   There's apparently some confusion here as my approach does not require
   any client or admin changes either.  If users are not using TLS today
  
  Really?  From your previous email I see:
  
   this approach looks like this: pre-determine and store the values (on a
   per-user basis, so a new field in pg_authid or some hack on the existing
   field) which will be sent to the client in the AuthenticationMD5Password
   message.  Further, calculate a random salt to be used when storing data
   in pg_authid.  Then, for however many variations we feel are necessary,
   calculate and store, for each AuthenticationMD5Password value:
  
  How do you generate these X versions of password hashes without admin
  changes?  In fact, I am not even sure how the server would create these
  unless it had the raw passwords.
 
 Ok, I've apparently not been clear enough with the proposal.  What we
 are hashing is the *current value* of what's in pg_authid- which we've
 already got on disk.  All we have to do is read the current on-disk
 value, add a salt to it, and store it back on disk.  We'd want to store
 more than one, as discussed, since otherwise the challenge/response is
 utterly useless.

Oh, that is very interesting.  It seems like you are basically applying
my server-fixed salt idea, except you are doing it X times so sniffing
and then replay is less of a problem.   Hmmm.  I am very glad there is
no admin work ---  I did not realize that.  It does make the trade-off
of your idea more appealing, though I still think it weakens MD5 in an
unacceptable way.

   then they will be slightly more susceptible to network-based sniffing
  
  Slightly?  X versions as stored in pg_authid vs 16k or 4 billion?
 
 Sadly, yes.  It's not a perfect test, but a simple:
 
 time for a in `seq 1 1000`; do nc localhost 5432  /dev/null; done
 
 Gave me 9.15s, or ~0.00915s per connection on a single thread.  That
 times 16k is 146s or about two and a half minutes.  Of course, I'm
 comparing this against what we currently do since, well, that's what we
 currently do.  Changing it to 4b would certainly improve that.  Of
 course, using multiple threads, having multiple challenge/responses on
 hand (due to listening for a while) or simply breaking the MD5 hash
 (which we know isn't a terribly great hashing algorithm these days)
 would change that.

Uh, my calculations show that as 434 days of trying.  (Not sure why you
didn't bother doing that calculation.)  I think anyone who is worried
about that level of attack would already be using MD5.  Again, MD5 is
mostly used in low-security settings where you just don't want the
password sent over the wire in cleartext.  Frankly, without TLS, you are
already sending your queries and data across in clear-text, and there
are other attack vectors.

   attacks than they are today due to the replay issue, but they are
   already at risk to a variety of other (arguably worse) attacks in that
   case.  Further, we can at least tell users about those risks and provide
   a way to address them.
  
  Right, but again, the user can choose to use TLS if they wish.  I think
  you are saying MD5 replay security is worthless without TLS, but I can
  assure you many users are happy with that.  The fact this issue rarely
  comes up is a testament to that, and in fact, until we documented
  exactly how MD5 

Re: [HACKERS] File based Incremental backup v8

2015-03-07 Thread Gabriele Bartolini
Hi Bruce,

2015-03-08 5:37 GMT+11:00 Bruce Momjian br...@momjian.us:

 Desirability -> Design -> Implement -> Test -> Review -> Commit

 This patch has continued in development without getting agreement on
 its Desirability or Design, meaning we are going to continue going back
 to those points until there is agreement.  Posting more versions of this
 patch is not going to change that.


Could you please elaborate that?

I actually think the approach that has been followed is what makes open
source and collaborative development work. The initial idea was based on
timestamp approach which, thanks to the input of several developers, led
Marco to develop LSN based checks and move forward the feature
implementation.

The numbers that Marco has posted clearly show that a lot of users will
benefit from this file-based approach for incremental backup through
pg_basebackup.

As far as I see it, the only missing bit is the pg_restorebackup tool which
is quite trivial - given the existing prototype in Python.

Thanks,
Gabriele


Re: [HACKERS] File based Incremental backup v8

2015-03-07 Thread Gabriele Bartolini
Hi Robert,

2015-03-07 2:57 GMT+11:00 Robert Haas robertmh...@gmail.com:

 By the way, unless I'm missing something, this patch only seems to
 include the code to construct an incremental backup, but no tools
 whatsoever to do anything useful with it once you've got it.


As stated previously, Marco is writing a tool called pg_restorebackup (the
prototype in Python has been already posted) to be included in the core. I
am in Australia now and not in the office so I cannot confirm it, but I am
pretty sure he had already written it and was about to send it to the list.

He's been trying to find more data - see the one that he's sent - in order
to convince that even a file-based approach is useful.

  I think that's 100% unacceptable.


I agree, that's why pg_restorebackup written in C is part of this patch.
See: https://wiki.postgresql.org/wiki/Incremental_backup


 Users need to be able to manipulate
 PostgreSQL backups using either standard operating system tools or
 tools provided with PostgreSQL.  Some people may prefer to use
 something like repmgr or pitrtools or omniptr in addition, but that
 shouldn't be a requirement for incremental backup to be usable.


Not at all. I believe those tools will have to use pg_basebackup and
pg_restorebackup. If they want to use streaming replication protocol they
will be responsible to make sure that - if the protocol changes - they
adapt their technology.

Agile development is good, but that does not mean you can divide a big
 project into arbitrarily small chunks.  At some point the chunks are
 too small to be sure that the overall direction is right, and/or
 individually useless.


The goal has always been to provide file-based incremental backup. I can
assure that this has always been our compass and the direction to follow.

I repeat that, using pg_restorebackup, this patch will transparently let
users benefit from incremental backup even when it will be moved to an
internal block-level logic. Users will continue to execute pg_basebackup
and pg_restorebackup, ignoring that with - for example 9.5 - it is
file-based (saving between 50-70% of space and time) or block-level - for
example 9.6.

My proposal is that Marco provides pg_restorebackup according to the
initial plan - a matter of hours/days.

Cheers,
Gabriele


Re: [HACKERS] Bootstrap DATA is a pita

2015-03-07 Thread Andrew Dunstan


On 03/07/2015 05:46 PM, Andres Freund wrote:

On 2015-03-07 16:43:15 -0600, Jim Nasby wrote:

Semi-related... if we put some special handling in some places for bootstrap
mode, couldn't most catalog objects be created using SQL, once we got
pg_class, pg_attributes and pg_type created? That would theoretically allow
us to drive much more of initdb with plain SQL (possibly created via
pg_dump).

Several people have now made that suggestion, but I *seriously* doubt
that we actually want to go there. The overhead of executing SQL
commands in comparison to the bki stuff is really rather
noticeable. Doing the majority of the large number of insertions via SQL
will make initdb noticeably slower. And it's already annoyingly
slow. Besides make install it's probably the thing I wait most for
during development.



My reaction exactly. We should not make users pay a price for 
developers' convenience.




That's besides the fact that SQL commands aren't actually that
comfortably editable in bulk.


Indeed.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CATUPDATE confusion?

2015-03-07 Thread Adam Brightwell
All,


 pg_shadow, pg_user and pg_group were added when role support was added,
 specifically for backwards compatibility.  I don't believe there was
 ever discussion about keeping them because filtering pg_roles based on
 rolcanlogin was too onerous.  That said, we already decided recently
 that we wanted to keep them updated to match the actual attributes
 available (note that the replication role attribute modified those
 views) and I think that makes sense on the removal side as well as the
 new-attribute side.

 I continue to feel that we really should officially deprecate those
 views as I don't think they're actually all that useful any more and
 maintaining them ends up bringing up all these questions, discussion,
 and ends up being largely busy-work if no one really uses them.


Deprecation would certainly seem like an appropriate path for 'usecatupd'
(and the mentioned views).  Though, is there a 'formal' deprecation
policy/process that the community follows?  I certainly understand and
support giving client/dependent tools the time and opportunity to update
accordingly.  Therefore, I think having them read from 'rolsuper' for the
time being would be ok, provided that they are actually removed at the next
possible opportunity.  Otherwise, I'd probably lean towards just removing
them now and getting it over with.

-Adam

-- 
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com


[HACKERS] Strange debug message of walreciver?

2015-03-07 Thread Tatsuo Ishii
When I set log_min_messages to debug5 and looked into the walreceiver log,
I saw this:

3600 2015-03-08 09:47:38 JST DEBUG:  sendtime 2015-03-08 09:47:38.31493+09 
receipttime 2015-03-08 09:47:38.315027+09 replication apply delay -1945478837 
ms transfer latency 0 ms

The "replication apply delay -1945478837 ms" part looks strange
because the delay is below 0. The number is formatted as %d in the elog
call, and I suspect this is some kind of integer overflow.
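
For what it's worth, the number is consistent with a 32-bit truncation when
the chunk replay start time is zero (i.e. the PostgreSQL epoch, 2000-01-01).
A minimal sketch, assuming the delay is computed roughly as
(int) (secs * 1000 + usecs / 1000) and run on a 64-bit build:

#include <stdio.h>

int
main(void)
{
	/*
	 * With a replay start time of 0, "secs" is the full distance from
	 * 2000-01-01 to the timestamps above -- about 479 million seconds.
	 * Multiplying by 1000 and truncating to a 32-bit int wraps around to
	 * a large negative value, matching the DEBUG output.
	 */
	long	secs = 479090858;	/* 2000-01-01 .. 2015-03-08 00:47:38 UTC */
	long	usecs = 315027;

	printf("%d ms\n", (int) (secs * 1000 + usecs / 1000));		/* -1945478837 */
	return 0;
}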

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] File based Incremental backup v8

2015-03-07 Thread Bruce Momjian
On Sun, Mar  8, 2015 at 09:26:38AM +1100, Gabriele Bartolini wrote:
 Hi Bruce,
 
 2015-03-08 5:37 GMT+11:00 Bruce Momjian br...@momjian.us:
 
         Desirability -> Design -> Implement -> Test -> Review -> Commit
 
 This patch has continued in development without getting agreement on
 its Desirability or Design, meaning we are going to continue going back
 to those points until there is agreement.  Posting more versions of this
 patch is not going to change that.
 
 
 Could you please elaborate that?
 
 I actually think the approach that has been followed is what makes open source
 and collaborative development work. The initial idea was based on timestamp
 approach which, thanks to the input of several developers, led Marco to 
 develop
 LSN based checks and move forward the feature implementation.

OK, if you think everyone just going off on their own and working on patches
that have little chance of being accepted is fine, you can do it, but that
rarely makes successful open source software.  You can do whatever you
want with your patch.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Wrong error message in REINDEX command

2015-03-07 Thread Sawada Masahiko
Hi,

I got a wrong error message when I ran the REINDEX SYSTEM command in a
transaction, as follows.
It should say "ERROR:  REINDEX SYSTEM cannot run inside a transaction block".
The attached patch fixes it.

[postgres][5432](1)=# begin;
BEGIN
[postgres][5432](1)=# reindex system postgres;
ERROR:  REINDEX DATABASE cannot run inside a transaction block
STATEMENT:  reindex system postgres;

Regards,

---
Sawada Masahiko


fix_reindex_error_message.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange debug message of walreciver?

2015-03-07 Thread Fabrízio de Royes Mello
On Sat, Mar 7, 2015 at 10:18 PM, Tatsuo Ishii is...@postgresql.org wrote:

 When I set log_min_messages to debug5 and looked into walreciver log,
 I saw this:

 3600 2015-03-08 09:47:38 JST DEBUG:  sendtime 2015-03-08
09:47:38.31493+09 receipttime 2015-03-08 09:47:38.315027+09 replication
apply delay -1945478837 ms tran sfer latency 0 ms

 The replication apply delay -1945478837 ms part looks strange
 because the delay is below 0. The number is formatted as %d in elog
 call, and I suspect this is kind of integer overflow.


Looking at GetReplicationApplyDelay() in walreceiverfuncs.c I noticed that
the integer overflow occurs because sometimes the return value of
GetCurrentChunkReplayStartTime() is 0 (zero).

I added an elog into GetReplicationApplyDelay to check this info:

DEBUG:  GetReplicationApplyDelay 479099832 seconds, 352 milliseconds,
(0.00, 479099832352083.00)
DEBUG:  sendtime 2015-03-08 00:17:12.351987-03 receipttime 2015-03-08
00:17:12.352043-03 replication apply delay -1936504800 ms transfer latency
0 ms
DEBUG:  GetReplicationApplyDelay 479099841 seconds, 450 milliseconds,
(0.00, 479099841450320.00)
DEBUG:  sendtime 2015-03-08 00:17:21.45018-03 receipttime 2015-03-08
00:17:21.450294-03 replication apply delay -1936495702 ms transfer latency
0 ms

Maybe we should check for that and return zero from GetReplicationApplyDelay().
The attached patch implements it.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello
diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
index 496605f..0c45b66 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -328,6 +328,8 @@ GetReplicationApplyDelay(void)
 	long		secs;
 	int			usecs;
 
+	TimestampTz	chunckReplayStartTime;
+
 	SpinLockAcquire(&walrcv->mutex);
 	receivePtr = walrcv->receivedUpto;
 	SpinLockRelease(&walrcv->mutex);
@@ -337,7 +339,12 @@ GetReplicationApplyDelay(void)
 	if (receivePtr == replayPtr)
 		return 0;
 
-	TimestampDifference(GetCurrentChunkReplayStartTime(),
+	chunckReplayStartTime = GetCurrentChunkReplayStartTime();
+
+	if (chunckReplayStartTime == 0)
+		return 0;
+
+	TimestampDifference(chunckReplayStartTime,
 		GetCurrentTimestamp(),
 		&secs, &usecs);
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN range operator class

2015-03-07 Thread Andreas Karlsson

On 02/11/2015 07:34 PM, Emre Hasegeli wrote:

The current code compiles but the brin test suite fails.


Now, only a test in opr_sanity fails.


Yeah, there is still a test which fails in opr_sanity.


Yes but they were also required by this patch.  This version adds more
functions and operators.  I can split them appropriately after your
review.


Ok, sounds fine to me.


This problem remains.  There is also a similar problem with the
range types, namely empty ranges.  There should be special cases
for them on some of the strategies.  I tried to solve the problems
in several different ways, but got a segfault one line or another.
This makes me think that the BRIN framework doesn't support storing
types other than the indexed column's type in the values array.
For example, brin_deform_tuple() iterates over the values array and
copies them using the length of the attr on the index, not the length
of the type defined by the OpcInfo function.  If storing other types
isn't supported, why is it required to return oids from the OpcInfo
function?  I am confused.



I leave this to someone more knowledgable about BRIN to answer.


I think I have fixed them.


Looks good as far as I can tell.


I have fixed different address families by adding another support
function.

I used STORAGE parameter to support the point data type.  To make it
work I added some operators between box and point data type.  We can
support all geometric types with this method.


Looks to me like this should work.

= New comments

- Searching for the empty range is slow since the empty range matches 
all brin ranges.


EXPLAIN ANALYZE SELECT * FROM foo WHERE r = '[1,1)';
                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=12.01..16.02 rows=1 width=14) (actual time=47.603..47.605 rows=1 loops=1)
   Recheck Cond: (r = 'empty'::int4range)
   Rows Removed by Index Recheck: 20
   Heap Blocks: lossy=1082
   ->  Bitmap Index Scan on foo_r_idx  (cost=0.00..12.01 rows=1 width=0) (actual time=0.169..0.169 rows=11000 loops=1)
         Index Cond: (r = 'empty'::int4range)
 Planning time: 0.062 ms
 Execution time: 47.647 ms
(8 rows)

- Found a typo in the docs: withing the range

- Why have you removed the USE_ASSERT_CHECKING code from brin.c?

- Remove the redundant "or not" from "/* includes empty element or not */".

- Minor grammar gripe: Change "Check that" to "Check if" in the comments 
in brin_inclusion_add_value().


- Won't the code incorrectly return false if the first added element to 
an index page is empty?


- Would it be worth optimizing the code by checking for empty ranges 
after checking for overlap in brin_inclusion_add_value()? I would 
imagine that empty ranges are rare in most use cases.


- Typo in comment: "If the it" -> "If it"

- Typo in comment: "Note that this strategies" -> "Note that these strategies"


- Typo in comment: "inequality strategies does not" -> "inequality strategies do not"


- Typo in comment: "geometric types which uses" -> "geometric types which use"


- I get 'ERROR:  missing strategy 7 for attribute 1 of index 
bar_i_idx' when running the query below. Why does this not fail in the 
test suite? The overlap operator works just fine. If I read your code 
correctly other strategies are also missing.


SELECT * FROM bar WHERE i = '::1';

- I do not think this comment is true: "Used to determine the addresses 
have a common union or not". It actually checks if we can create a range 
which contains both ranges.


- Compact the random spaces in "select numrange(1.0, 2.0) + numrange(2.5, 
3.0);-- should fail"


--
Andreas Karlsson


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers