Re: [HACKERS] MD5 authentication needs help
On Sat, Mar 7, 2015 at 12:52:15PM -0500, Stephen Frost wrote:
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Fri, Mar 6, 2015 at 07:00:10PM -0500, Stephen Frost wrote:
> > > suggested to me as one change we could make that would reduce the risk of disk-based attacks while trading that off for a higher risk on the side of network-based attacks while not breaking the existing network protocol. To make it very clear- it is not a solution but rather a poor bandaid. What we really need is a new password based protocol which is based on a future-proof, well designed protocol, such as SCRAM.
> > Again, agreed.
> Great. Have you had a chance to review the SCRAM RFC or do you have any questions as it relates to how SCRAM works? I'd love to start a proper discussion about how we go about implementing it.

I am feeling we have to wait for 9.5 to settle first.

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MD5 authentication needs help
On Sat, Mar 7, 2015 at 12:49:15PM -0500, Stephen Frost wrote:
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Fri, Mar 6, 2015 at 07:02:36PM -0500, Stephen Frost wrote:
> > > * Bruce Momjian (br...@momjian.us) wrote:
> > > > I think the best solution to this would be to introduce a per-cluster salt that is used for every password hash. That way, you could not replay a pg_authid hash on another server _unless_ you had manually assigned the same cluster salt to both servers, or connection pooler.
> > > Wouldn't that break the wireline protocol, unless you used a fixed set of known challenges? Perhaps I'm not following what you mean by a cluster-wide salt here.
> > Well, right now the server sends a random 32-bit number as session salt, but due to the birthday problem, that can repeat in ~16k connection attempts. If we use an incremented counter for each session, we spread the salt evenly over the entire 32-bit key space, making replays much more difficult, e.g. 4 billion.
> Ok, this is the incremented counter approach you brought up previously. Using the term 'cluster-wide salt' confused me as I thought you were referring to changing the on-disk format somehow with this.

Yes, I used the term cluster-wide salt in two cases: first, cluster-wide counter for the MD5 session salt improvement, and second, cluster-wide fixed salt to prevent pg_authid reuse, but to allow reuse if the cluster-wide fixed salt was set to match on two clusters, e.g. for pooling.

> > I don't think the client would ever know the number was different from the random number we used to send.
> Agreed.
> > The big win for this idea is that it requires no client or admin changes, while your idea of using a new salt does. My personal opinion is that the 32-bit counter idea improves MD5 replays, and that we are better off going with an entirely new authentication method to fix the pg_authid vulnerability.
> There's apparently some confusion here as my approach does not require any client or admin changes either. If users are not using TLS today

Really? From your previous email I see:

> this approach looks like this: pre-determine and store the values (on a per-user basis, so a new field in pg_authid or some hack on the existing field) which will be sent to the client in the AuthenticationMD5Password message. Further, calculate a random salt to be used when storing data in pg_authid. Then, for however many variations we feel are necessary, calculate and store, for each AuthenticationMD5Password value:

How do you generate these X versions of password hashes without admin changes? In fact, I am not even sure how the server would create these unless it had the raw passwords.

> then they will be slightly more susceptible to network-based sniffing

Slightly? X versions as stored in pg_authid vs 16k or 4 billion?

> attacks than they are today due to the replay issue, but they are already at risk to a variety of other (arguably worse) attacks in that case. Further, we can at least tell users about those risks and provide a way to address them.

Right, but again, the user can choose to use TLS if they wish. I think you are saying MD5 replay security is worthless without TLS, but I can assure you many users are happy with that. The fact this issue rarely comes up is a testament to that, and in fact, until we documented exactly how MD5 worked, we got many more questions about MD5. Perhaps we should document the pg_authid reuse risk.

> > I think your argument is that if users are using TLS, there is no replay problem and you want to focus on the pg_authid problem. I think fixing the pg_authid problem inside MD5 is just too complex and we are better off creating a new authentication method for that.
> There is no replay problem if users are using TLS, that's correct. I don't believe the approach I've outlined is particularly complex, but that's clearly a biased viewpoint. I certainly agree that we need to create a new authentication method to address these issues properly and would love to discuss that further rather than discuss the merits of these potential improvements to md5.
> > The bottom line is we promised MD5 to prevent replay, but not pg_authid stealing, and if we can improve on what we promised, we should do that and not hijack MD5 to fix a different problem, particularly because fixing MD5 for pg_authid requires admin action.
> Perhaps I'm missing it, but the documentation in 19.3.2 for 9.4 mentions md5 being useful to address password sniffing risk but doesn't mention anything about replay or the pg_authid-based risk. Sniffing of the cleartext password is addressed with the approach I've outlined, but the replay risk is greater.

I assumed sniffing meant sniff to later replay, which is why we use the random session salt. We document how it works so users can decide.

> Further, we can provide a solution to the replay concern by encouraging use of
Re: [HACKERS] MD5 authentication needs help
* Bruce Momjian (br...@momjian.us) wrote:
> On Fri, Mar 6, 2015 at 07:02:36PM -0500, Stephen Frost wrote:
> > * Bruce Momjian (br...@momjian.us) wrote:
> > > I think the best solution to this would be to introduce a per-cluster salt that is used for every password hash. That way, you could not replay a pg_authid hash on another server _unless_ you had manually assigned the same cluster salt to both servers, or connection pooler.
> > Wouldn't that break the wireline protocol, unless you used a fixed set of known challenges? Perhaps I'm not following what you mean by a cluster-wide salt here.
> Well, right now the server sends a random 32-bit number as session salt, but due to the birthday problem, that can repeat in ~16k connection attempts. If we use an incremented counter for each session, we spread the salt evenly over the entire 32-bit key space, making replays much more difficult, e.g. 4 billion.

Ok, this is the incremented counter approach you brought up previously. Using the term 'cluster-wide salt' confused me as I thought you were referring to changing the on-disk format somehow with this.

> I don't think the client would ever know the number was different from the random number we used to send.

Agreed.

> The big win for this idea is that it requires no client or admin changes, while your idea of using a new salt does. My personal opinion is that the 32-bit counter idea improves MD5 replays, and that we are better off going with an entirely new authentication method to fix the pg_authid vulnerability.

There's apparently some confusion here as my approach does not require any client or admin changes either. If users are not using TLS today then they will be slightly more susceptible to network-based sniffing attacks than they are today due to the replay issue, but they are already at risk to a variety of other (arguably worse) attacks in that case. Further, we can at least tell users about those risks and provide a way to address them.

> I think your argument is that if users are using TLS, there is no replay problem and you want to focus on the pg_authid problem. I think fixing the pg_authid problem inside MD5 is just too complex and we are better off creating a new authentication method for that.

There is no replay problem if users are using TLS, that's correct. I don't believe the approach I've outlined is particularly complex, but that's clearly a biased viewpoint. I certainly agree that we need to create a new authentication method to address these issues properly and would love to discuss that further rather than discuss the merits of these potential improvements to md5.

> The bottom line is we promised MD5 to prevent replay, but not pg_authid stealing, and if we can improve on what we promised, we should do that and not hijack MD5 to fix a different problem, particularly because fixing MD5 for pg_authid requires admin action.

Perhaps I'm missing it, but the documentation in 19.3.2 for 9.4 mentions md5 being useful to address password sniffing risk but doesn't mention anything about replay or the pg_authid-based risk. Sniffing of the cleartext password is addressed with the approach I've outlined, but the replay risk is greater. Further, we can provide a solution to the replay concern by encouraging use of SSL/TLS. We can not provide any solution to the pg_authid-based risk with this approach (with the possible exception of recommending password-based auth, though the poor salt used makes that less effective than md5 with the suggestion I've outlined).

The incremental counter approach implies not using the approach I've outlined, which is fine, but it doesn't change the pg_authid risk, nor does it help at all for TLS-using environments, and for users who are not using TLS, it doesn't make them particularly more secure to cleartext-password loss, connection hijacking, data loss, or anything except replay attacks.
When it comes to risks, for my part at least, I don't feel like the replay risk is the largest concern to an operator who is not using TLS.

Thanks,

Stephen
Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates
On 04/03/15 23:51, Andreas Karlsson wrote:
> On 01/29/2015 12:28 AM, Peter Geoghegan wrote:
> > * I'm not sure about the idea of polymorphic catalog functions (that return the type internal, but the actual struct returned varying based on build settings). I tend to think that things would be better if there was always a uniform return type for such internal type returning functions, but that *its* structure varied according to the availability of int128 (i.e. HAVE_INT128) at compile time. What we should probably do is have a third aggregate struct, that encapsulates this idea of (say) int2_accum() piggybacking on one of either Int128AggState or NumericAggState directly. Maybe that would be called PolyNumAggState. Then this common code is all that is needed on both types of build (at the top of int4_accum(), for example):
> >
> >     PolyNumAggState *state;
> >
> >     state = PG_ARGISNULL(0) ? NULL : (PolyNumAggState *) PG_GETARG_POINTER(0);
> >
> > I'm not sure if I'd do this with a containing struct or a simple "#ifdef HAVE_INT128 typedef #else ...", but I think it's an improvement either way.
> Not entirely sure exactly what I mean but I have changed to something like this, relying on typedef, in the attached version of the patch. I think it looks better after the changes.

I think this made the patch much better indeed. What I am wondering is if those numeric_int16_* functions that also deal with either the Int128AggState or NumericAggState should be renamed in similar fashion.

--
Petr Jelinek  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] File based Incremental backup v8
On Thu, Mar 5, 2015 at 11:10:08AM -0500, Robert Haas wrote:
> But I agree with Fujii to the extent that I see little value in committing this patch in the form proposed. Being smart enough to use the LSN to identify changed blocks, but then sending the entirety of every file anyway because you don't want to go to the trouble of figuring out how to revise the wire protocol to identify the individual blocks being sent and write the tools to reconstruct a full backup based on that data, does not seem like enough of a win. As Fujii says, if we ship this patch as written, people will just keep using the timestamp-based approach anyway. Let's wait until we have something that is, at least in some circumstances, a material improvement over the status quo before committing anything.

The big problem I have with this patch is that it has not followed the proper process for development, i.e. at the top of the TODO list we have:

    Desirability -> Design -> Implement -> Test -> Review -> Commit

This patch has continued in development without getting agreement on its Desirability or Design, meaning we are going to continue going back to those points until there is agreement. Posting more versions of this patch is not going to change that.

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ Everyone has their own god. +
Re: [HACKERS] MD5 authentication needs help
* Bruce Momjian (br...@momjian.us) wrote:
> On Sat, Mar 7, 2015 at 12:49:15PM -0500, Stephen Frost wrote:
> > Ok, this is the incremented counter approach you brought up previously. Using the term 'cluster-wide salt' confused me as I thought you were referring to changing the on-disk format somehow with this.
> Yes, I used the term cluster-wide salt in two cases: first, cluster-wide counter for the MD5 session salt improvement, and second, cluster-wide fixed salt to prevent pg_authid reuse, but to allow reuse if the cluster-wide fixed salt was set to match on two clusters, e.g. for pooling.

Ok, how, in the second case, is pg_authid reuse prevented unless we also change the on-disk pg_authid format? It doesn't matter if there's a different counter used between different clusters, if the attacker has what's in pg_authid then they can trivially add any salt we send them during the challenge/response to the pg_authid value and send us the response we expect.

> > > The big win for this idea is that it requires no client or admin changes, while your idea of using a new salt does. My personal opinion is that the 32-bit counter idea improves MD5 replays, and that we are better off going with an entirely new authentication method to fix the pg_authid vulnerability.
> > There's apparently some confusion here as my approach does not require any client or admin changes either. If users are not using TLS today
> Really? From your previous email I see:
> > this approach looks like this: pre-determine and store the values (on a per-user basis, so a new field in pg_authid or some hack on the existing field) which will be sent to the client in the AuthenticationMD5Password message. Further, calculate a random salt to be used when storing data in pg_authid. Then, for however many variations we feel are necessary, calculate and store, for each AuthenticationMD5Password value:
> How do you generate these X versions of password hashes without admin changes? In fact, I am not even sure how the server would create these unless it had the raw passwords.

Ok, I've apparently not been clear enough with the proposal. What we are hashing is the *current value* of what's in pg_authid- which we've already got on disk. All we have to do is read the current on-disk value, add a salt to it, and store it back on disk. We'd want to store more than one, as discussed, since otherwise the challenge/response is utterly useless.

> > then they will be slightly more susceptible to network-based sniffing
> Slightly? X versions as stored in pg_authid vs 16k or 4 billion?

Sadly, yes. It's not a perfect test, but a simple:

    time for a in `seq 1 1000`; do nc localhost 5432 </dev/null; done

gave me 9.15s, or ~0.00915s per connection on a single thread. That times 16k is 146s or about two and a half minutes. Of course, I'm comparing this against what we currently do since, well, that's what we currently do. Changing it to 4b would certainly improve that. Of course, using multiple threads, having multiple challenge/responses on hand (due to listening for a while) or simply breaking the MD5 hash (which we know isn't a terribly great hashing algorithm these days) would change that.

> > attacks than they are today due to the replay issue, but they are already at risk to a variety of other (arguably worse) attacks in that case. Further, we can at least tell users about those risks and provide a way to address them.
> Right, but again, the user can choose to use TLS if they wish. I think you are saying MD5 replay security is worthless without TLS, but I can assure you many users are happy with that. The fact this issue rarely comes up is a testament to that, and in fact, until we documented exactly how MD5 worked, we got many more questions about MD5. Perhaps we should document the pg_authid reuse risk.

We should certainly document the pg_authid reuse risk. For my part, I don't think it's that people are happy with that trade-off but rather that they simply don't realize the risk is there because, basically, who would ever do that? I'm not aware of any other server application which even offers an option to store the authentication token out on disk in a format which can be trivially used to authenticate to the system (excluding people who are storing cleartext passwords..).

I do feel that people who are worried about MD5 replay security who are *not* using TLS have not considered their risks properly. It is unclear to me why anyone should feel safe from an attacker who is able to sniff the network traffic thanks to our challenge/response protocol.

> > Further, we can provide a solution to the replay concern by encouraging use of SSL/TLS. We can not provide any solution to the pg_authid-based risk with this approach (with the possible exception of recommending password-based auth, though the poor salt used makes that less effective than md5 with the suggestion I've outlined).
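[Editor's note: the pass-the-hash property discussed in this exchange follows directly from how PostgreSQL's documented md5 wire exchange is built. A short illustrative Python sketch (usernames and values invented) shows that the stored pg_authid string alone is enough to answer any challenge, without ever knowing the password:]

```python
import hashlib
import os

def md5_hex(*parts: bytes) -> str:
    return hashlib.md5(b"".join(parts)).hexdigest()

# What pg_authid stores for md5 auth: "md5" || md5(password || username)
user, password = b"alice", b"secret"
stored = "md5" + md5_hex(password, user)

# Challenge/response: the server sends a 4-byte salt, and the client
# answers with "md5" || md5( md5(password || username) || salt )
salt = os.urandom(4)
client_response = "md5" + md5_hex(md5_hex(password, user).encode(), salt)

# An attacker holding only the pg_authid value computes the same
# response by appending the salt to the stored hex digest:
attacker_response = "md5" + md5_hex(stored[3:].encode(), salt)
assert attacker_response == client_response
```

This is why adding a counter-based session salt, or any number of pre-computed challenge variants, cannot remove the disk-based risk: whatever string the server accepts as proof is itself sitting in pg_authid.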
Re: [HACKERS] MD5 authentication needs help
On Fri, Mar 6, 2015 at 07:02:36PM -0500, Stephen Frost wrote:
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Fri, Mar 6, 2015 at 12:50:14PM -0800, Josh Berkus wrote:
> > > On 03/06/2015 08:19 AM, Stephen Frost wrote:
> > > > Well, server-side, we already have that- have pgbouncer run on the database server (something which I'm typically in favor of anyway) and use 'peer'. If it supported TLS then it could use certificates instead. The question is what to do after the pooler has connected and that's actually a generic issue which goes beyond poolers and into applications, basically, how can I re-authenticate this connection using a different role. We have SET ROLE, but that gives a lot of power to the role the pooler logs in as. It'd definitely be neat to provide a way to use SCRAM or similar to do that re-authentication after the initial connection.
> > > Using pgbouncer on the DB server is common, but less common than using it on an intermediate server or even the app server itself. So anything we create needs to be implementable with all three configurations in some way.
> > I think the best solution to this would be to introduce a per-cluster salt that is used for every password hash. That way, you could not replay a pg_authid hash on another server _unless_ you had manually assigned the same cluster salt to both servers, or connection pooler.
> Wouldn't that break the wireline protocol, unless you used a fixed set of known challenges? Perhaps I'm not following what you mean by a cluster-wide salt here.

Well, right now the server sends a random 32-bit number as session salt, but due to the birthday problem, that can repeat in ~16k connection attempts. If we use an incremented counter for each session, we spread the salt evenly over the entire 32-bit key space, making replays much more difficult, e.g. 4 billion. I don't think the client would ever know the number was different from the random number we used to send.

The big win for this idea is that it requires no client or admin changes, while your idea of using a new salt does. My personal opinion is that the 32-bit counter idea improves MD5 replays, and that we are better off going with an entirely new authentication method to fix the pg_authid vulnerability.

I think your argument is that if users are using TLS, there is no replay problem and you want to focus on the pg_authid problem. I think fixing the pg_authid problem inside MD5 is just too complex and we are better off creating a new authentication method for that.

The bottom line is we promised MD5 to prevent replay, but not pg_authid stealing, and if we can improve on what we promised, we should do that and not hijack MD5 to fix a different problem, particularly because fixing MD5 for pg_authid requires admin action.

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ Everyone has their own god. +
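[Editor's note: the birthday-problem claim above can be sanity-checked with a short illustrative sketch (not PostgreSQL code) using the standard approximation P ≈ 1 − exp(−n(n−1)/2N) for the chance that at least two of n random 32-bit session salts coincide; an incrementing counter, by contrast, cannot repeat until it wraps after 2^32 sessions:]

```python
import math

N = 2 ** 32  # size of the 32-bit session-salt space

def p_repeat(n: int, space: int = N) -> float:
    # Birthday approximation: probability that at least two of n
    # uniformly random draws from `space` values coincide.
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * space))

# With random salts, repeats accumulate well before the space is exhausted;
# a counter-based salt has zero repeats until 2**32 sessions have passed.
for n in (1_000, 16_000, 80_000):
    print(f"{n:>6} sessions: P(salt repeat) ~ {p_repeat(n):.3f}")
```

Around 16k connections the repeat probability is already a few percent, and it reaches even odds near 80k, which is the sense in which random salts "can repeat in ~16k connection attempts" while a counter spreads them over all 4 billion values.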
Re: [HACKERS] MD5 authentication needs help
On Fri, Mar 6, 2015 at 07:00:10PM -0500, Stephen Frost wrote:
> > > I'm also worried about both, but if the admin is worried about sniffing in their environment, they're much more likely to use TLS than to set up client side certificates, kerberos, or some other strong auth mechanism, simply because TLS is pretty darn easy to get working and distros set it up for you by default.
> > I think your view might be skewed. I think there are many people who care about password security who don't care to do TLS.
> I'm not quite sure that I follow what you mean about caring for password security. I'll try to clarify by suggesting a few things that I think you might mean and will respond to them. Please clarify if I've completely missed what you're getting at here. If the concern is database access due to an attacker who can sniff the network data then the approach which I suggested would make things materially worse for users who do not employ TLS. Once the attacker has sniffed the network for a little while, they'll have one and likely more challenge/responses which they could use to attempt to log in with. As discussed, our existing challenge/response system is vulnerable to network sniffing based replay attacks also.

Yes, I am worried about that attack vector. We have always been open that MD5 only prevents password sniffing, though the 16k replay idea has put a dent into that promise. See the email I just sent on how I think we should proceed, i.e. by creating a new authentication method to fix pg_authid stealing, and use a salt counter to improve MD5.

Also, my suggestion to use a counter for the session salt, to reduce replay from 16k to 4 billion, has not received any comments, and it does not break the wire protocol. I feel that is an incremental improvement we should consider.

> You are correct, that would improve the existing protocol regarding database-access risk due to network-based sniffing attacks. Unfortunately, it would not improve cleartext or database access risk due to disk-based attacks.

Agreed. I think you are minimizing the downside of your idea using X challenges instead of 16k challenges to get in. Again, if my idea is valid, it would be X challenges vs 4 billion challenges.

> The reason I have not been as excited by this approach is that we already have a solution for network-based sniffing attacks. As Josh mentioned, there are users out there who even go to extraordinary lengths to thwart network-based sniffing attacks by using stunnel with pg_bouncer. On the flip side, while there are ways to protect backups through encryption, many other vectors exist for disk-based attacks which result in either database access or finding the cleartext password with much less difficulty. Further, this only improves the authentication handling and doesn't improve our risk to other network-based attacks, including connection hijacking, sniffing during password set/reset, data compromise as it's sent across the wire, etc. Encouraging use of TLS addresses all of those risks.

I don't recall any complaints about these other network-based attacks and I do believe that's because TLS is available.

> Combined with the approach I've suggested, we would reduce the risk of disk-based attacks to the extent we're able to without breaking the protocol.

Agreed. Again, TLS has its own value beyond simple authentication. I just don't think getting MD5 to try to fix the pg_authid problem is the right approach.

> For my part, doing this, or going with my suggestion, or doing nothing with md5, really doesn't move us forward very much, which frustrates me greatly.

Agreed.

> I brought this suggestion to this list because it was suggested to me as one change we could make that would reduce the risk of disk-based attacks while trading that off for a higher risk on the side of network-based attacks while not breaking the existing network protocol. To make it very clear- it is not a solution but rather a poor bandaid. What we really need is a new password based protocol which is based on a future-proof, well designed protocol, such as SCRAM.

Again, agreed.

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ Everyone has their own god. +
Re: [HACKERS] MD5 authentication needs help
* Bruce Momjian (br...@momjian.us) wrote:
> On Fri, Mar 6, 2015 at 07:00:10PM -0500, Stephen Frost wrote:
> > suggested to me as one change we could make that would reduce the risk of disk-based attacks while trading that off for a higher risk on the side of network-based attacks while not breaking the existing network protocol. To make it very clear- it is not a solution but rather a poor bandaid. What we really need is a new password based protocol which is based on a future-proof, well designed protocol, such as SCRAM.
> Again, agreed.

Great. Have you had a chance to review the SCRAM RFC or do you have any questions as it relates to how SCRAM works? I'd love to start a proper discussion about how we go about implementing it.

Thanks!

Stephen
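[Editor's note: since the thread points at the SCRAM RFC without summarizing it, here is a rough Python sketch of the RFC 5802 key derivation, shown with SHA-256 for concreteness (the RFC is generic over the hash) and with a simplified placeholder AuthMessage rather than the real wire format. The property relevant to this thread is that the server stores only StoredKey/ServerKey, and producing a valid proof requires ClientKey, which the stored value alone does not reveal:]

```python
import hashlib
import hmac
import os

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def HMAC(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

def Hi(password: bytes, salt: bytes, iterations: int) -> bytes:
    # Hi() in RFC 5802 is PBKDF2 with HMAC as the PRF
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# --- password setup: this is all the server keeps on disk ---
password, salt, iters = b"secret", os.urandom(16), 4096
salted_password = Hi(password, salt, iters)
stored_key = H(HMAC(salted_password, b"Client Key"))
server_key = HMAC(salted_password, b"Server Key")

# --- authentication: the client proves knowledge of the password ---
auth_message = b"n=user,r=nonce...simplified..."  # placeholder, not RFC wire format
client_key = HMAC(Hi(password, salt, iters), b"Client Key")
client_proof = xor(client_key, HMAC(stored_key, auth_message))

# --- server-side verification using only stored_key ---
recovered_client_key = xor(client_proof, HMAC(stored_key, auth_message))
assert H(recovered_client_key) == stored_key  # authenticated
```

Unlike the md5 scheme, an attacker who steals only stored_key cannot compute client_proof without inverting the hash (though, as RFC 5802's security considerations note, stored_key combined with a sniffed successful exchange does allow impersonation, so it is a mitigation rather than a cure-all).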
Re: [HACKERS] TABLESAMPLE patch
On 05/03/15 09:21, Amit Kapila wrote:
> On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek <p...@2ndquadrant.com> wrote:
> > I didn't add the whole page visibility caching as the tuple ids we get from sampling methods don't map well to the visibility info we get from heapgetpage (it maps to the values in the rs_vistuples array, not to its indexes). Commented about it in code also.
> I think we should set pagemode for system sampling as it can have dual benefit, one is it will allow us caching tuples and other is it can allow us pruning of page which is done in heapgetpage(). Do you see any downside to it?

Double checking for tuple visibility is the only downside I can think of. Plus some added code complexity of course. I guess we could use binary search on rs_vistuples (it's already sorted) so that info won't be completely useless. Not sure if worth it compared to normal visibility check, but not hard to do. I personally don't see the page pruning in TABLESAMPLE as all that important though, considering that in practice it will only scan a tiny portion of a relation and usually one that does not get many updates (in practice the main use-case is BI).

> Few other comments:
>
> 1. Current patch fails to apply, minor change is required:
> patching file `src/backend/utils/misc/Makefile'
> Hunk #1 FAILED at 15.
> 1 out of 1 hunk FAILED -- saving rejects to src/backend/utils/misc/Makefile.rej

Ah, bitrot over time.

> 2. Few warnings in code (compiled on windows (msvc)):
> 1>src\backend\utils\tablesample\bernoulli.c(217): warning C4305: '=' : truncation from 'double' to 'float4'
> 1>src\backend\utils\tablesample\system.c(177): warning C4305: '=' : truncation from 'double' to 'float4'

I think this is just compiler stupidity but hopefully fixed (I don't have msvc to check for it and other compilers I tried don't complain).

> 3.
> +static void
> +InitScanRelation(SampleScanState *node, EState *estate, int eflags,
> +                 TableSampleClause *tablesample)
> {
> ..
> +    /*
> +     * Page at a time mode is useless for us as we need to check visibility
> +     * of tuples individually because tuple offsets returned by sampling
> +     * methods map to rs_vistuples values and not to its indexes.
> +     */
> +    node->ss.ss_currentScanDesc->rs_pageatatime = false;
> ..
> }
>
> Modifying the scan descriptor in nodeSamplescan.c looks slightly odd; do we modify it this way at any other place? I think it is better to either teach the heap_beginscan_* APIs about it or expose it via a new API in heapam.c.

Yeah, I agree, so I taught heap_beginscan_strat about it as that one is the advanced API.

> 4.
> +Datum
> +tsm_system_cost(PG_FUNCTION_ARGS)
> {
> ..
> +    *tuples = path->rows * samplesize;
> }
>
> It seems the above calculation considers all rows in the table to be of equal size, and hence multiplying directly with samplesize will give the estimated number of rows for the sample method; however, if row size varies then this calculation might not give right results. I think if possible we should consider the possibility of rows with varying sizes, else we can at least write a comment to indicate the possibility of improvement for the future.

I am not sure how we would know what size the tuples would have in the random blocks that we are going to read later. That said, I am sure that costing can be improved even if I don't know how myself.

> 6.
> @@ -13577,6 +13608,7 @@ reserved_keyword:
>             | PLACING
>             | PRIMARY
>             | REFERENCES
> +           | REPEATABLE
>
> Have you tried to investigate the reason why it is giving shift/reduce errors for the unreserved category, and if we are not able to resolve that, then at least we can try to make it some less restrictive category? I tried (on windows) by putting it in type_func_name_keyword and it seems to be working. To investigate, you can try with the information at the link below:
> http://www.gnu.org/software/bison/manual/html_node/Understanding.html
> Basically I think we should try some more before concluding to change the category of REPEATABLE, and especially if we want to make it a RESERVED keyword.

Yes, it can be moved to type_func_name_keyword, which is not all that much better but at least something. I did try to play with this already during the first submission but could not find a way to move it to something less restrictive. I could not even pinpoint what exactly is the shift/reduce issue except that it has something to do with the fact that the REPEATABLE clause is optional (at least I didn't have the problem when it wasn't optional).

> On a related note, it seems you have agreed upthread with Kyotaro-san for the below change, but I don't see the same in the latest patch:
>
> diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
> index 4578b5e..8cf09d5 100644
> --- a/src/backend/parser/gram.y
> +++ b/src/backend/parser/gram.y
> @@ -10590,7 +10590,7 @@ tablesample_clause:
>             ;
>
>  opt_repeatable_clause:
> -           REPEATABLE '(' AexprConst ')'   { $$ = (Node *) $3; }
> +           REPEATABLE '(' a_expr ')'       { $$ = (Node *) $3; }
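[Editor's note: the binary-search-on-rs_vistuples idea mentioned above can be illustrated outside of C. A Python sketch (the function name is invented for illustration) of testing whether a sampled tuple offset appears in the sorted array of visible offsets:]

```python
from bisect import bisect_left

def offset_is_visible(rs_vistuples: list[int], offset: int) -> bool:
    # rs_vistuples holds the offsets of the page's visible tuples in
    # ascending order, so membership is an O(log n) binary search
    # instead of a linear scan over the array.
    i = bisect_left(rs_vistuples, offset)
    return i < len(rs_vistuples) and rs_vistuples[i] == offset

visible = [1, 4, 5, 9, 12]
print(offset_is_visible(visible, 9))   # True
print(offset_is_visible(visible, 7))   # False
```

This matters because, as noted in the review exchange, sampling methods return tuple offsets that map to the *values* stored in rs_vistuples rather than to its indexes, so a direct index lookup is not possible.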
Re: [HACKERS] proposal: searching in array function - array_position
On 22/02/15 12:19, Pavel Stehule wrote:
> 2015-02-22 3:00 GMT+01:00 Petr Jelinek <p...@2ndquadrant.com>:
> > On 28/01/15 08:15, Pavel Stehule wrote:
> > > 2015-01-28 0:01 GMT+01:00 Jim Nasby <jim.na...@bluetreble.com>:
> > > > On 1/27/15 4:36 AM, Pavel Stehule wrote:
> > > > > It is only partially identical - I would to use cache for array_offset, but it is not necessary for array_offsets .. depends how we would to modify current API to support externally cached data.
> > > > Externally cached data?
> > > Some of these functions have their own caches to minimize access to typcache (array_map, array_cmp for example). And in the first case, I am trying to push this information via fn_extra; in the second case I don't do it, because I don't expect a repeated call (and I am expecting the type cache will be enough).
> > You actually do caching via fn_extra in both cases and I think that's the correct way, and yes that part can be moved to a common function. I also see that the documentation does not say what is returned by array_offset if nothing is found (it's documented in code but not in sgml).
> rebased + fixed docs

You still duplicate the type cache code, but many other array functions do that too so I will not hold that against you. (Maybe somebody should write a separate patch that would put all that duplicate code into a common function?) I think this patch is ready for committer so I am going to mark it as such.

--
Petr Jelinek  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] CATUPDATE confusion?
* Peter Eisentraut (pete...@gmx.net) wrote: On 3/7/15 12:31 AM, Tom Lane wrote: Peter Eisentraut pete...@gmx.net writes: On 12/29/14 7:16 PM, Adam Brightwell wrote: Given this discussion, I have attached a patch that removes CATUPDATE for review/discussion. committed this version Hmm .. I'm not sure that summarily removing usecatupd from those three system views was well thought out. pg_shadow, especially, has no reason to live at all except for backwards compatibility, and clients might well expect that column to still be there. I wonder if we'd not be better off to keep the column in the views but have it read from rolsuper. I doubt anyone is reading the column. And if they are, they should stop. Certainly pgAdmin and similar tools are. From that standpoint though, I agree that they should be modified to no longer read it, as the whole point of those kinds of tools is to allow users to modify those attributes and having CATUPDATE in the view would surely be confusing as the user wouldn't be able to modify it. pg_shadow and pg_user have been kept around because it is plausible that a lot of tools want to have a list of users, and requiring all of them to change to pg_authid at once was deemed too onerous at the time. I don't think this requires us to keep all the details the same forever. pg_shadow, pg_user and pg_group were added when role support was added, specifically for backwards compatibility. I don't believe there was ever discussion about keeping them because filtering pg_roles based on rolcanlogin was too onerous. That said, we already decided recently that we wanted to keep them updated to match the actual attributes available (note that the replication role attribute modified those views) and I think that makes sense on the removal side as well as the new-attribute side. 
I continue to feel that we really should officially deprecate those views, as I don't think they're actually all that useful any more; maintaining them keeps raising these questions and discussions, and ends up being largely busy-work if no one really uses them. Thanks, Stephen
Re: [HACKERS] improve pgbench syntax error messages
Here is a v3, which (1) activates better error messages from bison and (2) improves the error reporting from the scanner as well. v4. While adding a basic function call syntax to expressions, I noticed that it would be useful to access the detail field of syntax errors so as to report the name of the unknown function. This version just adds the hook (expr_yyerror_detailed) that could be called later for this purpose. -- Fabien.

diff --git a/contrib/pgbench/exprparse.y b/contrib/pgbench/exprparse.y
index 243c6b9..0405a50 100644
--- a/contrib/pgbench/exprparse.y
+++ b/contrib/pgbench/exprparse.y
@@ -26,6 +26,10 @@ static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr,
 %expect 0
 %name-prefix=expr_yy

+/* better error reporting with bison */
+%define parse.lac full
+%define parse.error verbose
+
 %union
 {
 	int64		ival;

diff --git a/contrib/pgbench/exprscan.l b/contrib/pgbench/exprscan.l
index 4c9229c..de3f09a 100644
--- a/contrib/pgbench/exprscan.l
+++ b/contrib/pgbench/exprscan.l
@@ -18,6 +18,13 @@ static YY_BUFFER_STATE scanbufhandle;
 static char *scanbuf;
 static int	scanbuflen;

+/* context information for error reporting */
+static char *expr_source = NULL;
+static int expr_lineno = 0;
+static char *expr_full_line = NULL;
+static char *expr_command = NULL;
+static int expr_col = 0;
+
 /* flex 2.5.4 doesn't bother with a decl for this */
 int expr_yylex(void);

@@ -52,29 +59,42 @@ space [ \t\r\f]
 .	{
 	yycol += yyleng;
-	fprintf(stderr, "unexpected character '%s'\n", yytext);
+	syntax_error(expr_source, expr_lineno, expr_full_line, expr_command,
+				 "unexpected character", yytext, expr_col + yycol);
+	/* dead code, exit is called from syntax_error */
 	return CHAR_ERROR;
 }
 %%

 void
-yyerror(const char *message)
+expr_yyerror_detailed(const char *message, const char * detail)
 {
-	/* yyline is always 1 as pgbench calls the parser for each line...
-	 * so the interesting location information is the column number */
-	fprintf(stderr, "%s at column %d\n", message, yycol);
-	/* go on to raise the error from pgbench with more information */
-	/* exit(1); */
+	syntax_error(expr_source, expr_lineno, expr_full_line, expr_command,
+				 message, detail, expr_col + yycol);
+}
+
+void yyerror(const char *message)
+{
+	expr_yyerror_detailed(message, NULL);
 }

 /*
  * Called before any actual parsing is done
  */
 void
-expr_scanner_init(const char *str)
+expr_scanner_init(const char *str, const char *source,
+				  const int lineno, const char *line,
+				  const char *cmd, const int ecol)
 {
 	Size		slen = strlen(str);

+	/* save context informations for error messages */
+	expr_source = (char *) source;
+	expr_lineno = (int) lineno;
+	expr_full_line = (char *) line;
+	expr_command = (char *) cmd;
+	expr_col = (int) ecol;
+
 	/*
 	 * Might be left over after error
 	 */
@@ -102,4 +122,9 @@ expr_scanner_finish(void)
 {
 	yy_delete_buffer(scanbufhandle);
 	pg_free(scanbuf);
+	expr_source = NULL;
+	expr_lineno = 0;
+	expr_full_line = NULL;
+	expr_command = NULL;
+	expr_col = 0;
 }

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 706fdf5..3e50bae 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -291,6 +291,7 @@ typedef struct
 	int			type;			/* command type (SQL_COMMAND or META_COMMAND) */
 	int			argc;			/* number of command words */
 	char	   *argv[MAX_ARGS];	/* command word list */
+	int			cols[MAX_ARGS];	/* corresponding column starting from 1 */
 	PgBenchExpr *expr;			/* parsed expression */
 } Command;

@@ -2189,6 +2190,28 @@ parseQuery(Command *cmd, const char *raw_sql)
 	return true;
 }

+void syntax_error(const char *source, const int lineno,
+				  const char *line, const char *command,
+				  const char *msg, const char *more, const int column)
+{
+	fprintf(stderr, "%s:%d: %s", source, lineno, msg);
+	if (more)
+		fprintf(stderr, " (%s)", more);
+	if (column != -1)
+		fprintf(stderr, " at column %d", column);
+	fprintf(stderr, " in command \"%s\"\n", command);
+	if (line) {
+		fprintf(stderr, "%s\n", line);
+		if (column != -1) {
+			int i = 0;
+			for (i = 0; i < column - 1; i++)
+				fprintf(stderr, " ");
+			fprintf(stderr, "^ error found here\n");
+		}
+	}
+	exit(1);
+}
+
 /* Parse a command; return a Command struct, or NULL if it's a comment */
 static Command *
 process_commands(char *buf, const char *source, const int lineno)
@@ -2200,6 +2223,11 @@ process_commands(char *buf, const char *source, const int lineno)
 	char	   *p,
 			   *tok;

+/* error message generation helper */
+#define SYNTAX_ERROR(msg, more, col) \
+	syntax_error(source, lineno, my_commands->line, \
+				 my_commands->argv[0], msg, more, col)
+
 	/* Make the string buf end at the next newline */
 	if ((p = strchr(buf, '\n')) != NULL)
 		*p = '\0';
@@ -2228,11 +2256,13 @@ process_commands(char *buf, const char *source, const int lineno)
 		j = 0;
 		tok = strtok(++p, delim);

+		/* stop set argument parsing the variable name */
		if
Re: [HACKERS] CATUPDATE confusion?
On 3/7/15 12:31 AM, Tom Lane wrote: Peter Eisentraut pete...@gmx.net writes: On 12/29/14 7:16 PM, Adam Brightwell wrote: Given this discussion, I have attached a patch that removes CATUPDATE for review/discussion. committed this version Hmm .. I'm not sure that summarily removing usecatupd from those three system views was well thought out. pg_shadow, especially, has no reason to live at all except for backwards compatibility, and clients might well expect that column to still be there. I wonder if we'd not be better off to keep the column in the views but have it read from rolsuper. I doubt anyone is reading the column. And if they are, they should stop. pg_shadow and pg_user have been kept around because it is plausible that a lot of tools want to have a list of users, and requiring all of them to change to pg_authid at once was deemed too onerous at the time. I don't think this requires us to keep all the details the same forever.
Re: [HACKERS] File based Incremental backup v8
On 05/03/15 05:42, Bruce Momjian wrote: On Thu, Mar 5, 2015 at 01:25:13PM +0900, Fujii Masao wrote: Yeah, it might make the situation better than today. But I'm afraid that many users might get disappointed about that behavior of an incremental backup after the release... I don't get what you mean here. Can you elaborate this point? The proposed version of LSN-based incremental backup has some limitations (e.g., every database file needs to be read even when there has been no modification in the database since the last backup, which may make the backup time longer than users expect) which may disappoint users. So I'm afraid that users who can benefit from the feature might be very limited. IOW, I'm just sticking to the idea of the timestamp-based one :) But I should drop it if the majority on the list prefers the LSN-based one even if it has such limitations. We need numbers on how effective each level of tracking will be. Until then, the patch can't move forward. I've written a little test script to estimate how much space can be saved by file-level incremental backup, and I've run it on some real backups I have access to. The script takes two basebackup directories and simulates how much data could be saved in the second backup using incremental backup (matching on file size/time and on LSN). It assumes that every file in base, global and pg_tblspc which matches both size and modification time will also match from the LSN point of view. The result is that many databases can take advantage of incremental backup, even if the saving is not so big, and considering LSNs yields a result almost identical to the approach based on filesystem metadata.
==
Very big geographic database (similar to openstreetmap main DB), it contains versioned data, interval two months
First backup size: 13286623850656 (12.1TiB)
Second backup size: 13323511925626 (12.1TiB)
Matching files count: 17094
Matching LSN count: 14580
Matching files size: 9129755116499 (8.3TiB, 68.5%)
Matching LSN size: 9128568799332 (8.3TiB, 68.5%)
==
Big on-line store database, old data regularly moved to historic partitions, interval one day
First backup size: 1355285058842 (1.2TiB)
Second backup size: 1358389467239 (1.2TiB)
Matching files count: 3937
Matching LSN count: 2821
Matching files size: 762292960220 (709.9GiB, 56.1%)
Matching LSN size: 762122543668 (709.8GiB, 56.1%)
==
Ticketing system database, interval one day
First backup size: 144988275 (138.3MiB)
Second backup size: 146135155 (139.4MiB)
Matching files count: 3124
Matching LSN count: 2641
Matching files size: 76908986 (73.3MiB, 52.6%)
Matching LSN size: 67747928 (64.6MiB, 46.4%)
==
Online store, interval one day
First backup size: 20418561133 (19.0GiB)
Second backup size: 20475290733 (19.1GiB)
Matching files count: 5744
Matching LSN count: 4302
Matching files size: 4432709876 (4.1GiB, 21.6%)
Matching LSN size: 4388993884 (4.1GiB, 21.4%)
==
Heavily updated database, interval one week
First backup size: 3203198962 (3.0GiB)
Second backup size: 3222409202 (3.0GiB)
Matching files count: 1801
Matching LSN count: 1273
Matching files size: 91206317 (87.0MiB, 2.8%)
Matching LSN size: 69083532 (65.9MiB, 2.1%)

Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it

#!/usr/bin/env python
from __future__ import print_function

import collections
from optparse import OptionParser
import os
import sys

__author__ = 'Marco Nenciarini marco.nenciar...@2ndquadrant.it'

usage = "usage: %prog [options] backup_1 backup_2"
parser = OptionParser(usage=usage)
(options, args) = parser.parse_args()

# need 2 directories
if len(args) != 2:
    parser.print_help()
    sys.exit(1)

FileItem = collections.namedtuple('FileItem', 'size time path')

def get_files(target_dir):
    """Return a set of FileItem"""
    files = set()
    for dir_path, _, file_names in os.walk(target_dir, followlinks=True):
        for filename in file_names:
            path = os.path.join(dir_path, filename)
            rel_path = path[len(target_dir) + 1:]
            stats = os.stat(path)
            files.add(FileItem(stats.st_size, stats.st_mtime, rel_path))
    return files

def size_fmt(num, suffix='B'):
    """Format a size"""
    for unit in ['', 'Ki', 'Mi', 'Gi', 'Ti', 'Pi', 'Ei', 'Zi']:
        if abs(num) < 1024.0:
            return "%3.1f%s%s" % (num, unit, suffix)
        num /= 1024.0
    return "%.1f%s%s" % (num, 'Yi', suffix)

def percent_fmt(a, b):
    """Format a percent"""
    return "%.1f%%" % (100.0 * a / b)

def get_size(file_set):
    return reduce(lambda x, y: x + y.size, file_set, 0)

def report(a, b):
    # find files that are identical (same size and same time)
    common = a & b
    # count only main forks for LSN matching
    common_lsn = set()
    for item in common:
        # remove non main fork
        if any(suffix in item.path for suffix in ('_fsm', '_vm', '_init')):
            continue
        # remove things outside data
Re: [HACKERS] Improving test coverage of extensions with pg_dump
On Tue, Mar 3, 2015 at 2:40 PM, Michael Paquier michael.paqu...@gmail.com wrote: Those patches are really simple, but then perhaps there are better or simpler ways than what is attached, so feel free to comment if you have any ideas. Attached are new patches somewhat based on the comments provided by Peter Eisentraut (http://www.postgresql.org/message-id/54f62c3f.8070...@gmx.net). - 0001 makes prove_check install the contents of the path located at $(CURDIR)/t/extra if present, which would be the path that test developers could use to store extensions, plugins or modules that would then be usable by a given set of TAP tests as they are installed in the temporary installation before launching the tests. - 0002 is the test for pg_dump checking extensions containing FK tables, this time integrated as a TAP test in src/bin/pg_dump, with the extension in src/bin/pg_dump/t/extra. IMO, this approach is scalable enough thinking long-term. And I think that the location of those extra extensions is fine in t/ as an hardcoded subfolder (any fresh idea being of course welcome) as this makes the stuff in extra/ dedicated to only on set of TAP tests, and it is even possible to set multiple extensions/modules. Regards, -- Michael From e3eec3c8f8af72d3c85e217441009842e09ce1fc Mon Sep 17 00:00:00 2001 From: Michael Paquier michael@otacoo.com Date: Sat, 7 Mar 2015 21:13:15 + Subject: [PATCH 1/2] Make prove_check install contents t/extra if present This gives module developers the possibility to add a set of extensions and/or tools that will be available in the temporary installation of a given path, making them available for TAP test. 
--- src/Makefile.global.in | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/Makefile.global.in b/src/Makefile.global.in index 7c39d82..e856ca8 100644 --- a/src/Makefile.global.in +++ b/src/Makefile.global.in @@ -323,6 +323,8 @@ endef define prove_check $(MKDIR_P) tmp_check/log $(MAKE) -C $(top_builddir) DESTDIR='$(CURDIR)'/tmp_check/install install '$(CURDIR)'/tmp_check/log/install.log 21 +# Install extra set of test utilities if present +@if test -d $(CURDIR)/t/extra; then $(MAKE) -C $(CURDIR)/t/extra DESTDIR='$(CURDIR)'/tmp_check/install install '$(CURDIR)'/tmp_check/log/install.log 21; fi cd $(srcdir) TESTDIR='$(CURDIR)' PATH=$(CURDIR)/tmp_check/install$(bindir):$$PATH $(call add_to_path,$(ld_library_path_var),$(CURDIR)/tmp_check/install$(libdir)) top_builddir='$(CURDIR)/$(top_builddir)' PGPORT='6$(DEF_PGPORT)' $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) t/*.pl endef -- 2.3.1 From 0c7a45348d9a699aeeef6cbc911beca6e6e45235 Mon Sep 17 00:00:00 2001 From: Michael Paquier michael@otacoo.com Date: Sat, 7 Mar 2015 21:18:58 + Subject: [PATCH 2/2] Add TAP test for pg_dump checking data dump of extension tables The test added checks the data dump ordering of tables added in an extension linked among them with foreign keys. In order to do that, a test extension in the set of TAP tests of pg_dump, combined with a TAP test script making use of it. 
--- src/bin/pg_dump/.gitignore | 2 ++ src/bin/pg_dump/Makefile | 6 + src/bin/pg_dump/t/001_dump_test.pl | 27 ++ src/bin/pg_dump/t/extra/Makefile | 14 +++ src/bin/pg_dump/t/extra/tables_fk/Makefile | 16 + src/bin/pg_dump/t/extra/tables_fk/README | 5 .../pg_dump/t/extra/tables_fk/tables_fk--1.0.sql | 20 .../pg_dump/t/extra/tables_fk/tables_fk.control| 5 8 files changed, 95 insertions(+) create mode 100644 src/bin/pg_dump/t/001_dump_test.pl create mode 100644 src/bin/pg_dump/t/extra/Makefile create mode 100644 src/bin/pg_dump/t/extra/tables_fk/Makefile create mode 100644 src/bin/pg_dump/t/extra/tables_fk/README create mode 100644 src/bin/pg_dump/t/extra/tables_fk/tables_fk--1.0.sql create mode 100644 src/bin/pg_dump/t/extra/tables_fk/tables_fk.control diff --git a/src/bin/pg_dump/.gitignore b/src/bin/pg_dump/.gitignore index c2c8677..02666f9 100644 --- a/src/bin/pg_dump/.gitignore +++ b/src/bin/pg_dump/.gitignore @@ -3,3 +3,5 @@ /pg_dump /pg_dumpall /pg_restore + +/tmp_check diff --git a/src/bin/pg_dump/Makefile b/src/bin/pg_dump/Makefile index e890a62..1a3f2ca 100644 --- a/src/bin/pg_dump/Makefile +++ b/src/bin/pg_dump/Makefile @@ -49,5 +49,11 @@ installdirs: uninstall: rm -f $(addprefix '$(DESTDIR)$(bindir)'/, pg_dump$(X) pg_restore$(X) pg_dumpall$(X)) +check: all + $(prove_check) + +installcheck: install + $(prove_installcheck) + clean distclean maintainer-clean: rm -f pg_dump$(X) pg_restore$(X) pg_dumpall$(X) $(OBJS) pg_dump.o common.o pg_dump_sort.o pg_restore.o pg_dumpall.o kwlookup.c $(KEYWRDOBJS) diff --git a/src/bin/pg_dump/t/001_dump_test.pl b/src/bin/pg_dump/t/001_dump_test.pl new file mode 100644 index 000..0953bf5 --- /dev/null +++ b/src/bin/pg_dump/t/001_dump_test.pl @@ -0,0 +1,27 @@ +use strict; +use
Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit
On 3/7/15 12:48 AM, Noah Misch wrote: On Sat, Mar 07, 2015 at 12:46:42AM -0500, Tom Lane wrote: Noah Misch n...@leadboat.com writes: On Thu, Mar 05, 2015 at 03:28:12PM -0600, Jim Nasby wrote: I was thinking the simpler route of just repalloc'ing... the memcpy would suck, but much less so than the extra index pass. 64M gets us 11M tuples, which probably isn't very common. +1. Start far below 64 MiB; grow geometrically using repalloc_huge(); cap growth at vac_work_mem. +1 for repalloc'ing at need, but I'm not sure about the start far below 64 MiB part. 64MB is a pretty small amount on nearly any machine these days (and for anybody who thinks it isn't, that's why maintenance_work_mem is a tunable). True; nothing would explode, especially since the allocation would be strictly smaller than it is today. However, I can't think of a place in PostgreSQL where a growable allocation begins so aggressively, nor a reason to break new ground in that respect. For comparison, tuplestore/tuplesort start memtupsize at 1 KiB. (One could make a separate case for that practice being wrong.) A different line of thought is that it would seem to make sense to have the initial allocation vary depending on the relation size. For instance, you could assume there might be 10 dead tuples per page, and hence try to alloc that much if it fits in vac_work_mem. Sounds better than a fixed 64 MiB start, though I'm not sure it's better than a fixed 256 KiB start. In the case of vacuum, I think we presumably have a pretty good indicator of how much space we should need; namely reltuples * autovacuum_scale_factor. There shouldn't be too much more space needed than that if autovac is keeping up with things. If we go that route, does it still make sense to explicitly use repalloc_huge? It will just cut over to that at some point (128M?) anyway, and if you're vacuuming a small relation presumably it's not worth messing with. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? 
Get it in Treble! http://BlueTreble.com
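The allocation policy debated above (start well below 64 MiB, grow geometrically with repalloc_huge(), cap at vac_work_mem) is easy to reason about numerically. The Python sketch below models it with hypothetical sizes, a 256 KiB start and a 64 MiB cap; note how the transient peak during a copy-based repalloc is the old chunk plus the new one, which is the peak-memory concern raised elsewhere in this thread.

def growth_steps(needed_bytes, start=256 * 1024, cap=64 * 1024 * 1024):
    # Geometric (doubling) growth from `start`, capped at `cap`.
    # Returns (allocations, peak), where peak counts the old and new
    # chunks living together during each repalloc-style copy.
    size = min(start, cap)
    allocs = [size]
    peak = size
    while size < needed_bytes and size < cap:
        new = min(size * 2, cap)
        peak = max(peak, size + new)  # old + new both live during the copy
        size = new
        allocs.append(size)
    return allocs, peak

# e.g. growing to hold ~10 MiB of dead-tuple pointers
allocs, peak = growth_steps(10 * 1024 * 1024)

So reaching 10 MiB of demand costs seven allocations and a transient peak of 1.5x the final size, while a workload that never exceeds a few hundred KiB never pays for the full 64 MiB up front.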
Re: [HACKERS] MD5 authentication needs help
On Sat, Mar 7, 2015 at 03:15:46PM -0500, Bruce Momjian wrote: Gave me 9.15s, or ~0.00915s per connection on a single thread. That times 16k is 146s or about two and a half minutes. Of course, I'm comparing this against what we currently do since, well, that's what we currently do. Changing it to 4b would certainly improve that. Of course, using multiple threads, having multiple challenge/responses on hand (due to listening for a while) or simply breaking the MD5 hash (which we know isn't a terribly great hashing algorithm these days) would change that. Uh, my calculations show that as 434 days of trying. (Not sure why you didn't bother doing that calculation.) I think anyone who is worried about that level of attack would already be using MD5. Again, MD5 is mostly used in low-security settings where you just don't want the password sent over the wire in cleartext. Frankly, without TLS, you are already sending your queries and data across in clear-text, and there are other attack vectors. Actually, with a counter, the bad guy just has to wait for the counter to roll around, and then try to catch the counter on the values he has recorded, meaning you wouldn't even be able to detect the hack attempts. :-) -- Bruce Momjian br...@momjian.us  http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
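The timing argument in this exchange is straightforward to reproduce. Using the measured ~9.15 s per 1000 connection attempts, walking 16k distinct salts takes minutes on a single thread, while walking the full 32-bit space takes over a year; the arithmetic below lands in the same ballpark as the 146 s and 434 days quoted above (the small differences are rounding in the quoted figures).

per_conn = 9.15 / 1000          # seconds per connection attempt, single thread

sixteen_k = 16384 * per_conn    # effective diversity with today's random salt
full_space = 2 ** 32 * per_conn # an incrementing counter over the full space

print("16k attempts: %.0f s (~%.1f min)" % (sixteen_k, sixteen_k / 60.0))
print("2^32 attempts: %.0f days" % (full_space / 86400.0))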
Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit
On 2015-03-05 15:28:12 -0600, Jim Nasby wrote: I was thinking the simpler route of just repalloc'ing... the memcpy would suck, but much less so than the extra index pass. 64M gets us 11M tuples, which probably isn't very common. That has the chance of considerably increasing the peak memory usage though, as you obviously need both the old and new allocation during the repalloc(). And in contrast to the unused memory at the tail of the array, which will usually not be actually allocated by the OS at all, this is memory that's actually read/written respectively. I've to say, I'm rather unconvinced that it's worth changing stuff around here. If overcommit is enabled, vacuum won't fail unless the memory is actually used (= no problem). If overcommit is disabled and you get memory allocations, you're probably already running awfully close to the maximum of your configuration and you're better off adjusting it. I'm not aware of any field complaints about this and thus I'm not sure it's worth fiddling with this. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Bootstrap DATA is a pita
On 2015-03-07 16:43:15 -0600, Jim Nasby wrote: Semi-related... if we put some special handling in some places for bootstrap mode, couldn't most catalog objects be created using SQL, once we got pg_class, pg_attributes and pg_type created? That would theoretically allow us to drive much more of initdb with plain SQL (possibly created via pg_dump). Several people have now made that suggestion, but I *seriously* doubt that we actually want to go there. The overhead of executing SQL commands in comparison to the bki stuff is really rather noticeable. Doing the majority of the large number of insertions via SQL will make initdb noticeably slower. And it's already annoyingly slow. Besides make install it's probably the thing I wait most for during development. That's besides the fact that SQL commands aren't actually that comfortably editable in bulk. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit
On 3/7/15 4:49 PM, Andres Freund wrote: On 2015-03-05 15:28:12 -0600, Jim Nasby wrote: I was thinking the simpler route of just repalloc'ing... the memcpy would suck, but much less so than the extra index pass. 64M gets us 11M tuples, which probably isn't very common. That has the chance of considerably increasing the peak memory usage though, as you obviously need both the old and new allocation during the repalloc(). And in contrast to the unused memory at the tail of the array, which will usually not be actually allocated by the OS at all, this is memory that's actually read/written respectively. That leaves me wondering why we bother with dynamic resizing in other areas (like sorts, for example) then? Why not just palloc work_mem and be done with it? What makes those cases different? I've to say, I'm rather unconvinced that it's worth changing stuff around here. If overcommit is enabled, vacuum won't fail unless the memory is actually used (= no problem). If overcommit is disabled and you get memory allocations, you're probably already running awfully close to the maximum of your configuration and you're better off adjusting it. I'm not aware of any field complaints about this and thus I'm not sure it's worth fiddling with this. Perhaps; Noah seems to be the only one who's seen this. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Additional role attributes superuser review
Peter, all, * Peter Eisentraut (pete...@gmx.net) wrote: Why are we not using roles and function execute privileges for this? Alright, I've got an initial patch to do this for pg_start/stop_backup, pg_switch_xlog, and pg_create_restore_point. The actual backend changes are quite small, as expected. I'll add in the changes for the other functions being discussed and adapt the documentation changes from the earlier patch to make sense, but what I'd really appreciate are any comments or thoughts regarding the changes to pg_dump (which are generic to all of the function changes, of course). I've added a notion of the catalog schema to pg_dump's internal _namespaceinfo representation and then marked pg_catalog as being that schema, as well as being a dumpable schema. Throughout the selectDumpable functions, I've made changes to only mark the objects in the catalog as dumpable if they are functions. I'm planning to look into the extension and binary upgrade paths since I'm a bit worried those may not work with this approach, but I wanted to get this out there for at least an initial review as, if people feel this makes things too ugly on the pg_dump side of things then we may want to reconsider using the role attributes instead. Thanks! 
Stephen diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c new file mode 100644 index 2179bf7..36029d0 *** a/src/backend/access/transam/xlogfuncs.c --- b/src/backend/access/transam/xlogfuncs.c *** pg_start_backup(PG_FUNCTION_ARGS) *** 54,64 backupidstr = text_to_cstring(backupid); - if (!superuser() !has_rolreplication(GetUserId())) - ereport(ERROR, - (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), - errmsg(must be superuser or replication role to run a backup))); - startpoint = do_pg_start_backup(backupidstr, fast, NULL, NULL); PG_RETURN_LSN(startpoint); --- 54,59 *** pg_stop_backup(PG_FUNCTION_ARGS) *** 82,92 { XLogRecPtr stoppoint; - if (!superuser() !has_rolreplication(GetUserId())) - ereport(ERROR, - (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), - (errmsg(must be superuser or replication role to run a backup; - stoppoint = do_pg_stop_backup(NULL, true, NULL); PG_RETURN_LSN(stoppoint); --- 77,82 *** pg_switch_xlog(PG_FUNCTION_ARGS) *** 100,110 { XLogRecPtr switchpoint; - if (!superuser()) - ereport(ERROR, - (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), - (errmsg(must be superuser to switch transaction log files; - if (RecoveryInProgress()) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), --- 90,95 *** pg_create_restore_point(PG_FUNCTION_ARGS *** 129,139 char *restore_name_str; XLogRecPtr restorepoint; - if (!superuser()) - ereport(ERROR, - (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), - (errmsg(must be superuser to create a restore point; - if (RecoveryInProgress()) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), --- 114,119 diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql new file mode 100644 index 2800f73..38605de *** a/src/backend/catalog/system_views.sql --- b/src/backend/catalog/system_views.sql *** RETURNS interval *** 897,899 --- 897,907 LANGUAGE INTERNAL STRICT IMMUTABLE AS 'make_interval'; + + -- Revoke privileges for functions that should not be 
available to + -- all users. Administrators are allowed to change this later, if + -- they wish. + REVOKE EXECUTE ON FUNCTION pg_start_backup(text, boolean) FROM public; + REVOKE EXECUTE ON FUNCTION pg_stop_backup() FROM public; + REVOKE EXECUTE ON FUNCTION pg_switch_xlog() FROM public; + REVOKE EXECUTE ON FUNCTION pg_create_restore_point(text) FROM public; diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c new file mode 100644 index fdfb431..164608a *** a/src/bin/pg_dump/pg_dump.c --- b/src/bin/pg_dump/pg_dump.c *** selectDumpableNamespace(NamespaceInfo *n *** 1234,1245 --- 1234,1255 * If specific tables are being dumped, do not dump any complete * namespaces. If specific namespaces are being dumped, dump just those * namespaces. Otherwise, dump all non-system namespaces. + * + * Note that we do consider dumping ACLs of functions in pg_catalog, + * so mark that as a dumpable namespace, but further mark it as the + * catalog namespace. */ + + /* Will be set when we may be dumping catalog ACLs, see below. */ + nsinfo-catalog = false; + if (table_include_oids.head != NULL) nsinfo-dobj.dump = false; else if (schema_include_oids.head != NULL) nsinfo-dobj.dump = simple_oid_list_member(schema_include_oids, nsinfo-dobj.catId.oid); + else if (strncmp(nsinfo-dobj.name, pg_catalog, 10) == 0) + nsinfo-dobj.dump = nsinfo-catalog = true; else if
Re: [HACKERS] MD5 authentication needs help
* Bruce Momjian (br...@momjian.us) wrote: On Sat, Mar 7, 2015 at 01:56:51PM -0500, Stephen Frost wrote: * Bruce Momjian (br...@momjian.us) wrote: Yes, I used the term cluster-wide salt in two cases: first, cluster-wide counter for the MD5 session salt improvement, and second, cluster-wide fixed salt to prevent pg_authid reuse, but to allow reuse if the cluster-wide fixed salt was set to match on two clusters, e.g. for pooling. Ok, how, in the second case, is pg_authid reuse prevented unless we also change the on-disk pg_authid format? It doesn't matter if there's a Yes, the second use would be a new authentication method. Ok.. I don't think it's a good idea to try and base a new auth method on what we're doing for md5, in any case. To the extent that we have options as it relates to SCRAM or another reviewed authentication protocol, perhaps it'd make sense to consider a per-cluster salt or similar, but I'm not sure we actually want or need to do so. different counter used between different clusters, if the attacker has what's in pg_authid then they can trivially add any salt we send them during the challenge/response to the pg_authid value and send us the response we expect. It would be a server-fixed salt sent to the client and applied before the session salt. The pg_authid value would also store with this salt. Again, I am brainstorming about poolers. When it comes to poolers, I'm thinking we should be looking at an approach to do an after-connection re-authentication. Essentially a 'SET ROLE WITH mypassword;' kind of approach, but also using a proper protocol, etc. That way, the pooler wouldn't really need to worry about the actual authentication of the user. This is also brainstorming, of course, I don't have a fully formed idea about how this would work, but I do know that some other RDBMS's offer a similar masquerade kind of ability. How do you generate these X versions of password hashes without admin changes? 
In fact, I am not even sure how the server would create these unless it had the raw passwords. Ok, I've apparently not been clear enough with the proposal. What we are hashing is the *current value* of what's in pg_authid- which we've already got on disk. All we have to do is read the current on-disk value, add a salt to it, and store it back on disk. We'd want to store more than one, as discussed, since otherwise the challenge/response is utterly useless. Oh, that is very interesting. It seems like you are basically applying my server-fixed salt idea, except you are doing it X times so sniffing and then replay is less of a problem. Hmmm. I am very glad there is no admin work --- I did not realize that. It does make the trade-off of your idea more appealing, though I still think it weakens MD5 in an unacceptable way. Well, that's where I come down on the side of our existing MD5 approach is pretty darn weak wrt replay. I agree, we can fix that, but I was comparing to what our MD5 approach currently provides. then they will be slightly more susceptible to network-based sniffing Slightly? X versions as stored in pg_authid vs 16k or 4 billion? Sadly, yes. It's not a perfect test, but a simple: time for a in `seq 1 1000`; do nc localhost 5432 < /dev/null; done Gave me 9.15s, or ~0.00915s per connection on a single thread. That times 16k is 146s or about two and a half minutes. Of course, I'm comparing this against what we currently do since, well, that's what we currently do. Changing it to 4b would certainly improve that. Of course, using multiple threads, having multiple challenge/responses on hand (due to listening for a while) or simply breaking the MD5 hash (which we know isn't a terribly great hashing algorithm these days) would change that. Uh, my calculations show that as 434 days of trying. (Not sure why you didn't bother doing that calculation.) I did. 
I guess it wasn't clear but I was pointing out the difference between my suggestion and the *current* state of things, where we have the birthday problem. I think anyone who is worried about that level of attack would already be using MD5. Again, MD5 is mostly used in low-security settings where you just don't want the password sent over the wire in cleartext. Frankly, without TLS, you are already sending your queries and data across in clear-text, and there are other attack vectors. This suggestion does not send the cleartext password over the wire. Fine, but causing MD5 to be less secure doesn't warrant fixing it this way. What I was getting at is that it's not much less secure than our current MD5 implementation when it comes to replay attacks. Improving MD5 to be more robust against replay attacks would be good, except that it doesn't address the pg_authid-based risk. I wish there was a solution which allowed us to have both but I don't see one, without a wireline
Re: [HACKERS] MD5 authentication needs help
* Bruce Momjian (br...@momjian.us) wrote: On Sat, Mar 7, 2015 at 03:15:46PM -0500, Bruce Momjian wrote: Gave me 9.15s, or ~0.00915s per connection on a single thread. That times 16k is 146s or about two and a half minutes. Of course, I'm comparing this against what we currently do since, well, that's what we currently do. Changing it to 4b would certainly improve that. Of course, using multiple threads, having multiple challenge/responses on hand (due to listening for a while) or simply breaking the MD5 hash (which we know isn't a terribly great hashing algorithm these days) would change that. Uh, my calculations show that as 434 days of trying. (Not sure why you didn't bother doing that calculation.) I think anyone who is worried about that level of attack would already be using MD5. Again, MD5 is mostly used in low-security settings where you just don't want the password sent over the wire in cleartext. Frankly, without TLS, you are already sending your queries and data across in clear-text, and there are other attack vectors. Actually, with a counter, the bad guy just has to wait for the counter to roll around, and then try to catch the counter on the values he has recorded, meaning you wouldn't even be able to detect the hack attempts. :-) That's true, if the counter is at an individual-level. If it's cluster wide then they aren't very likely to have the same counter for the same individual after the wrap-around. Then again, what individual is going to be logging in 4 billion times? There's a number of trade-offs here, which is why we'd really be better off using an approach which security folks have already vetted. Thanks! Stephen
Re: [HACKERS] Bootstrap DATA is a pita
On 3/4/15 9:07 AM, Stephen Frost wrote: * Robert Haas (robertmh...@gmail.com) wrote: On Wed, Mar 4, 2015 at 9:42 AM, Tom Lane t...@sss.pgh.pa.us wrote: and make it harder to compare entries by grepping out some common substring. Could you give an example of the sort of thing you wish to do? On that angle, I'm dubious that a format that allows omission of fields is going to be easy for editing scripts to modify, no matter what the layout convention is. I've found it relatively easy to write sed or even Emacs macros to add new column values to old-school pg_proc.h ... but in this brave new world, I'm going to be really hoping that the column default works for 99.9% of pg_proc entries when we add a new pg_proc column, because slipping a value into a desired position is gonna be hard for a script when you don't know whether the adjacent existing fields are present or not. I wonder if we should have a tool in our repository to help people edit the file. So instead of going in there yourself and changing things by hand, or writing your own script, you can do: updatepgproc.pl --oid 5678 provolatile=v or updatepgproc.pl --name='.*xact.*' prowhatever=someval Regardless of what format we end up with, that seems like it would make things easier. Alright, I'll bite on this- we have this really neat tool for editing data in bulk, or individual values, or pulling out data to look at based on particular values or even functions... It's called PostgreSQL. What if we had an easy way to export an existing table into whatever format we decide to use for initdb to use? For that matter, what if that file was simple to import into PG? What about having a way to load all the catalog tables from their git repo files into a pg_dev schema? Maybe even include a make target or initdb option which does that? 
(the point here being to provide a way to modify the tables and compare the results to the existing tables without breaking the instance one is using for this) I have to admit that I've never tried to do that with the existing format, but seems like an interesting idea to consider. I further wonder if it'd be possible to generate the table structures too.. Semi-related... if we put some special handling in some places for bootstrap mode, couldn't most catalog objects be created using SQL, once we got pg_class, pg_attributes and pg_type created? That would theoretically allow us to drive much more of initdb with plain SQL (possibly created via pg_dump). -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bootstrap DATA is a pita
* Andrew Dunstan (and...@dunslane.net) wrote: On 03/07/2015 05:46 PM, Andres Freund wrote: On 2015-03-07 16:43:15 -0600, Jim Nasby wrote: Semi-related... if we put some special handling in some places for bootstrap mode, couldn't most catalog objects be created using SQL, once we got pg_class, pg_attributes and pg_type created? That would theoretically allow us to drive much more of initdb with plain SQL (possibly created via pg_dump). Several people have now made that suggestion, but I *seriously* doubt that we actually want to go there. The overhead of executing SQL commands in comparison to the bki stuff is really rather noticeable. Doing the majority of the large number of insertions via SQL will make initdb noticeably slower. And it's already annoyingly slow. Besides make install it's probably the thing I wait most for during development. My reaction exactly. We should not make users pay a price for developers' convenience. Just to clarify, since Jim was responding to my comment, my thought was *not* to use SQL commands inside initdb, but rather to use PG to create the source files that we have today in our tree, which wouldn't slow down initdb at all. That's besides the fact that SQL commands aren't actually that comfortably editable in bulk. Indeed. No, they aren't, but having the data in a table in PG, with a way to easily export to the format needed by BKI, would make bulk updates much easier.. Thanks! Stephen
Re: [HACKERS] Bootstrap DATA is a pita
On 3/7/15 6:02 PM, Stephen Frost wrote: * Andrew Dunstan (and...@dunslane.net) wrote: On 03/07/2015 05:46 PM, Andres Freund wrote: On 2015-03-07 16:43:15 -0600, Jim Nasby wrote: Semi-related... if we put some special handling in some places for bootstrap mode, couldn't most catalog objects be created using SQL, once we got pg_class, pg_attributes and pg_type created? That would theoretically allow us to drive much more of initdb with plain SQL (possibly created via pg_dump). Several people have now made that suggestion, but I *seriously* doubt that we actually want to go there. The overhead of executing SQL commands in comparison to the bki stuff is really rather noticeable. Doing the majority of the large number of insertions via SQL will make initdb noticeably slower. And it's already annoyingly slow. Besides make install it's probably the thing I wait most for during development. My reaction exactly. We should not make users pay a price for developers' convenience. How often does a normal user actually initdb? I don't think it's that incredibly common. Added time to our development cycle certainly is a concern though. Just to clarify, since Jim was responding to my comment, my thought was *not* to use SQL commands inside initdb, but rather to use PG to create the source files that we have today in our tree, which wouldn't slow down initdb at all. Yeah, I was thinking SQL would make it even easier, but perhaps not. Since the other options here seem to have hit a dead end though, it seems your load it into tables idea is what we've got left... That's besides the fact that SQL commands aren't actually that comfortably editable in bulk. Indeed. No, they aren't, but having the data in a table in PG, with a way to easily export to the format needed by BKI, would make bulk updates much easier.. My thought was that pg_dump would be useful here, so instead of hand editing you'd just make changes in a live database and then dump it. 
-- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] MD5 authentication needs help
On Sat, Mar 7, 2015 at 01:56:51PM -0500, Stephen Frost wrote: * Bruce Momjian (br...@momjian.us) wrote: On Sat, Mar 7, 2015 at 12:49:15PM -0500, Stephen Frost wrote: Ok, this is the incremented counter approach you brought up previously. Using the term 'cluster-wide salt' confused me as I thought you were referring to changing the on-disk format somehow with this. Yes, I used the term cluster-wide salt in two cases: first, cluster-wide counter for the MD5 session salt improvement, and second, cluster-wide fixed salt to prevent pg_authid reuse, but to allow reuse if the cluster-wide fixed salt was set to match on two clusters, e.g. for pooling. Ok, how, in the second case, is pg_authid reuse prevented unless we also change the on-disk pg_authid format? It doesn't matter if there's a Yes, the second use would be a new authentication method. different counter used between different clusters, if the attacker has what's in pg_authid then they can trivially add any salt we send them during the challenge/response to the pg_authid value and send us the response we expect. It would be a server-fixed salt sent to the client and applied before the session salt. The pg_authid value would also store with this salt. Again, I am brainstorming about poolers. The big win for this idea is that it requires no client or admin changes, while your idea of using a new salt does. My personal opinion is that the 32-bit counter idea improves MD5 replays, and that we are better off going with an entirely new authentication method to fix the pg_authid vulnerability. There's apparently some confusion here as my approach does not require any client or admin changes either. If users are not using TLS today Really? From your previous email I see: this approach looks like this: pre-determine and store the values (on a per-user basis, so a new field in pg_authid or some hack on the existing field) which will be sent to the client in the AuthenticationMD5Password message. 
Further, calculate a random salt to be used when storing data in pg_authid. Then, for however many variations we feel are necessary, calculate and store, for each AuthenticationMD5Password value: How do you generate these X versions of password hashes without admin changes? In fact, I am not even sure how the server would create these unless it had the raw passwords. Ok, I've apparently not been clear enough with the proposal. What we are hashing is the *current value* of what's in pg_authid- which we've already got on disk. All we have to do is read the current on-disk value, add a salt to it, and store it back on disk. We'd want to store more than one, as discussed, since otherwise the challenge/response is utterly useless. Oh, that is very interesting. It seems like you are basically applying my server-fixed salt idea, except you are doing it X times so sniffing and then replay is less of a problem. Hmmm. I am very glad there is no admin work --- I did not realize that. It does make the trade-off of your idea more appealing, though I still think it weakens MD5 in an unacceptable way. then they will be slightly more susceptible to network-based sniffing Slightly? X versions as stored in pg_authid vs 16k or 4 billion? Sadly, yes. It's not a perfect test, but a simple: time for a in `seq 1 1000`; do nc localhost 5432 < /dev/null; done Gave me 9.15s, or ~0.00915s per connection on a single thread. That times 16k is 146s or about two and a half minutes. Of course, I'm comparing this against what we currently do since, well, that's what we currently do. Changing it to 4b would certainly improve that. Of course, using multiple threads, having multiple challenge/responses on hand (due to listening for a while) or simply breaking the MD5 hash (which we know isn't a terribly great hashing algorithm these days) would change that. Uh, my calculations show that as 434 days of trying. (Not sure why you didn't bother doing that calculation.) 
I think anyone who is worried about that level of attack would already be using MD5. Again, MD5 is mostly used in low-security settings where you just don't want the password sent over the wire in cleartext. Frankly, without TLS, you are already sending your queries and data across in clear-text, and there are other attack vectors. attacks than they are today due to the replay issue, but they are already at risk to a variety of other (arguably worse) attacks in that case. Further, we can at least tell users about those risks and provide a way to address them. Right, but again, the user can choose to use TLS if they wish. I think you are saying MD5 replay security is worthless without TLS, but I can assure you many users are happy with that. The fact this issue rarely comes up is a testament to that, and in fact, until we documented exactly how MD5
Re: [HACKERS] File based Incremental backup v8
Hi Bruce, 2015-03-08 5:37 GMT+11:00 Bruce Momjian br...@momjian.us: Desirability - Design - Implement - Test - Review - Commit This patch has continued in development without getting agreement on its Desirability or Design, meaning we are going to continue going back to those points until there is agreement. Posting more versions of this patch is not going to change that. Could you please elaborate that? I actually think the approach that has been followed is what makes open source and collaborative development work. The initial idea was based on timestamp approach which, thanks to the input of several developers, led Marco to develop LSN based checks and move forward the feature implementation. The numbers that Marco has posted clearly show that a lot of users will benefit from this file-based approach for incremental backup through pg_basebackup. As far as I see it, the only missing bit is the pg_restorebackup tool which is quite trivial - given the existing prototype in Python. Thanks, Gabriele
Re: [HACKERS] File based Incremental backup v8
Hi Robert, 2015-03-07 2:57 GMT+11:00 Robert Haas robertmh...@gmail.com: By the way, unless I'm missing something, this patch only seems to include the code to construct an incremental backup, but no tools whatsoever to do anything useful with it once you've got it. As stated previously, Marco is writing a tool called pg_restorebackup (the prototype in Python has already been posted) to be included in the core. I am in Australia now and not in the office so I cannot confirm it, but I am pretty sure he had already written it and was about to send it to the list. He's been trying to find more data - see the numbers that he's sent - in order to show that even a file-based approach is useful. I think that's 100% unacceptable. I agree, that's why pg_restorebackup written in C is part of this patch. See: https://wiki.postgresql.org/wiki/Incremental_backup Users need to be able to manipulate PostgreSQL backups using either standard operating system tools or tools provided with PostgreSQL. Some people may prefer to use something like repmgr or pitrtools or omniptr in addition, but that shouldn't be a requirement for incremental backup to be usable. Not at all. I believe those tools will have to use pg_basebackup and pg_restorebackup. If they want to use the streaming replication protocol they will be responsible for making sure that - if the protocol changes - they adapt their technology. Agile development is good, but that does not mean you can divide a big project into arbitrarily small chunks. At some point the chunks are too small to be sure that the overall direction is right, and/or individually useless. The goal has always been to provide file-based incremental backup. I can assure you that this has always been our compass and the direction to follow. I repeat that, using pg_restorebackup, this patch will transparently let users benefit from incremental backup even when it will be moved to an internal block-level logic. 
Users will continue to execute pg_basebackup and pg_restorebackup without needing to care whether the backup is file-based (saving between 50-70% of space and time), as in - for example - 9.5, or block-level, as in - for example - 9.6. My proposal is that Marco provides pg_restorebackup according to the initial plan - a matter of hours/days. Cheers, Gabriele
Re: [HACKERS] Bootstrap DATA is a pita
On 03/07/2015 05:46 PM, Andres Freund wrote: On 2015-03-07 16:43:15 -0600, Jim Nasby wrote: Semi-related... if we put some special handling in some places for bootstrap mode, couldn't most catalog objects be created using SQL, once we got pg_class, pg_attributes and pg_type created? That would theoretically allow us to drive much more of initdb with plain SQL (possibly created via pg_dump). Several people have now made that suggestion, but I *seriously* doubt that we actually want to go there. The overhead of executing SQL commands in comparison to the bki stuff is really rather noticeable. Doing the majority of the large number of insertions via SQL will make initdb noticeably slower. And it's already annoyingly slow. Besides make install it's probably the thing I wait most for during development. My reaction exactly. We should not make users pay a price for developers' convenience. That's besides the fact that SQL commands aren't actually that comfortably editable in bulk. Indeed. cheers andrew
Re: [HACKERS] CATUPDATE confusion?
All, pg_shadow, pg_user and pg_group were added when role support was added, specifically for backwards compatibility. I don't believe there was ever discussion about keeping them because filtering pg_roles based on rolcanlogin was too onerous. That said, we already decided recently that we wanted to keep them updated to match the actual attributes available (note that the replication role attribute modified those views) and I think that makes sense on the removal side as well as the new-attribute side. I continue to feel that we really should officially deprecate those views as I don't think they're actually all that useful any more and maintaining them ends up bringing up all these questions, discussion, and ends up being largely busy-work if no one really uses them. Deprecation would certainly seem like an appropriate path for 'usecatupd' (and the mentioned views). Though, is there a 'formal' deprecation policy/process that the community follows? I certainly understand and support giving client/dependent tools the time and opportunity to update accordingly. Therefore, I think having them read from 'rolsuper' for the time being would be ok, provided that they are actually removed at the next possible opportunity. Otherwise, I'd probably lean towards just removing them now and getting it over with. -Adam -- Adam Brightwell - adam.brightw...@crunchydatasolutions.com Database Engineer - www.crunchydatasolutions.com
[HACKERS] Strange debug message of walreciver?
When I set log_min_messages to debug5 and looked into the walreceiver log, I saw this: 3600 2015-03-08 09:47:38 JST DEBUG: sendtime 2015-03-08 09:47:38.31493+09 receipttime 2015-03-08 09:47:38.315027+09 replication apply delay -1945478837 ms transfer latency 0 ms The "replication apply delay -1945478837 ms" part looks strange because the delay is below 0. The number is formatted as %d in the elog call, and I suspect this is some kind of integer overflow. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] File based Incremental backup v8
On Sun, Mar 8, 2015 at 09:26:38AM +1100, Gabriele Bartolini wrote: Hi Bruce, 2015-03-08 5:37 GMT+11:00 Bruce Momjian br...@momjian.us: Desirability - Design - Implement - Test - Review - Commit This patch has continued in development without getting agreement on its Desirability or Design, meaning we are going to continue going back to those points until there is agreement. Posting more versions of this patch is not going to change that. Could you please elaborate that? I actually think the approach that has been followed is what makes open source and collaborative development work. The initial idea was based on timestamp approach which, thanks to the input of several developers, led Marco to develop LSN based checks and move forward the feature implementation. OK, if you want everyone just going off on their own and working on patches that have little chance of being accepted, you can do that, but it rarely makes for successful open source software. You can do whatever you want with your patch. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
[HACKERS] Wrong error message in REINDEX command
Hi, I got a wrong error message when I ran the REINDEX SYSTEM command in a transaction, as follows. It should say ERROR: REINDEX SYSTEM cannot run inside a transaction block. The attached patch fixes it.

[postgres][5432](1)=# begin;
BEGIN
[postgres][5432](1)=# reindex system postgres;
ERROR:  REINDEX DATABASE cannot run inside a transaction block
STATEMENT:  reindex system postgres;

Regards, --- Sawada Masahiko fix_reindex_error_message.patch Description: Binary data
Re: [HACKERS] Strange debug message of walreciver?
On Sat, Mar 7, 2015 at 10:18 PM, Tatsuo Ishii is...@postgresql.org wrote: When I set log_min_messages to debug5 and looked into the walreceiver log, I saw this: 3600 2015-03-08 09:47:38 JST DEBUG: sendtime 2015-03-08 09:47:38.31493+09 receipttime 2015-03-08 09:47:38.315027+09 replication apply delay -1945478837 ms transfer latency 0 ms The "replication apply delay -1945478837 ms" part looks strange because the delay is below 0. The number is formatted as %d in the elog call, and I suspect this is some kind of integer overflow. Looking at GetReplicationApplyDelay() in walreceiverfuncs.c I noticed that the integer overflow occurs because the return value of GetCurrentChunkReplayStartTime() is sometimes 0 (zero). I added an elog into GetReplicationApplyDelay to check this info:

DEBUG: GetReplicationApplyDelay 479099832 seconds, 352 milliseconds, (0.00, 479099832352083.00)
DEBUG: sendtime 2015-03-08 00:17:12.351987-03 receipttime 2015-03-08 00:17:12.352043-03 replication apply delay -1936504800 ms transfer latency 0 ms
DEBUG: GetReplicationApplyDelay 479099841 seconds, 450 milliseconds, (0.00, 479099841450320.00)
DEBUG: sendtime 2015-03-08 00:17:21.45018-03 receipttime 2015-03-08 00:17:21.450294-03 replication apply delay -1936495702 ms transfer latency 0 ms

Maybe we should check for this case and return zero from GetReplicationApplyDelay. The attached patch implements it.
Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello

diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
index 496605f..0c45b66 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -328,6 +328,8 @@ GetReplicationApplyDelay(void)
 	long		secs;
 	int			usecs;
 
+	TimestampTz chunkReplayStartTime;
+
 	SpinLockAcquire(&walrcv->mutex);
 	receivePtr = walrcv->receivedUpto;
 	SpinLockRelease(&walrcv->mutex);
@@ -337,7 +339,12 @@ GetReplicationApplyDelay(void)
 	if (receivePtr == replayPtr)
 		return 0;
 
-	TimestampDifference(GetCurrentChunkReplayStartTime(),
+	chunkReplayStartTime = GetCurrentChunkReplayStartTime();
+
+	if (chunkReplayStartTime == 0)
+		return 0;
+
+	TimestampDifference(chunkReplayStartTime,
 						GetCurrentTimestamp(),
 						&secs, &usecs);
Re: [HACKERS] BRIN range operator class
On 02/11/2015 07:34 PM, Emre Hasegeli wrote: The current code compiles but the brin test suite fails. Now, only a test in opr_sanity fails. Yeah, there is still a test which fails in opr_sanity. Yes, but they were also required by this patch. This version adds more functions and operators. I can split them appropriately after your review. Ok, sounds fine to me. This problem remains. There is also a similar problem with the range types, namely empty ranges. There should be special cases for them in some of the strategies. I tried to solve the problems in several different ways, but got a segfault in one place or another. This makes me think that the BRIN framework doesn't support storing types other than the indexed column's type in the values array. For example, brin_deform_tuple() iterates over the values array and copies the entries using the length of the attr on the index, not the length of the type defined by the OpcInfo function. If storing other types isn't supported, why is it required to return OIDs from the OpcInfo function? I am confused. I leave this to someone more knowledgeable about BRIN to answer. I think I have fixed them. Looks good as far as I can tell. I have fixed the different address families by adding another support function. I used the STORAGE parameter to support the point data type. To make it work I added some operators between the box and point data types. We can support all geometric types with this method. Looks to me like this should work. = New comments - Searching for the empty range is slow since the empty range matches all brin ranges. 
EXPLAIN ANALYZE SELECT * FROM foo WHERE r = '[1,1)';

                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=12.01..16.02 rows=1 width=14) (actual time=47.603..47.605 rows=1 loops=1)
   Recheck Cond: (r = 'empty'::int4range)
   Rows Removed by Index Recheck: 20
   Heap Blocks: lossy=1082
   ->  Bitmap Index Scan on foo_r_idx  (cost=0.00..12.01 rows=1 width=0) (actual time=0.169..0.169 rows=11000 loops=1)
         Index Cond: (r = 'empty'::int4range)
 Planning time: 0.062 ms
 Execution time: 47.647 ms
(8 rows)

- Found a typo in the docs: withing the range
- Why have you removed the USE_ASSERT_CHECKING code from brin.c?
- Remove redundant or not from /* includes empty element or not */.
- Minor grammar gripe: Change Check that to Check if in the comments in brin_inclusion_add_value().
- Won't the code incorrectly return false if the first added element to an index page is empty?
- Would it be worth optimizing the code by checking for empty ranges after checking for overlap in brin_inclusion_add_value()? I would imagine that empty ranges are rare in most use cases.
- Typo in comment: If the it - If it
- Typo in comment: Note that this strategies - Note that these strategies
- Typo in comment: inequality strategies does not - inequality strategies do not
- Typo in comment: geometric types which uses - geometric types which use
- I get 'ERROR: missing strategy 7 for attribute 1 of index bar_i_idx' when running the query below. Why does this not fail in the test suite? The overlap operator works just fine. If I read your code correctly other strategies are also missing. SELECT * FROM bar WHERE i = '::1';
- I do not think this comment is true: Used to determine the addresses have a common union or not. It actually checks if we can create a range which contains both ranges. 
- Compact random spaces in select numrange(1.0, 2.0) + numrange(2.5, 3.0);-- should fail -- Andreas Karlsson