[GENERAL] Migration to pglister - Before
Greetings,

We will be migrating these lists to pglister in the next few minutes. This final email on the old list system is intended to let you know that future emails will have different headers and you will need to adjust your filters.

The changes which we expect to be most significant to users can be found on the wiki here:

https://wiki.postgresql.org/wiki/PGLister_Announce

Once the migration of these lists is complete, an 'after' email will be sent out.

Thanks!

Stephen
Re: [GENERAL] Query on pg_settings view
Greetings,

* Laurenz Albe (laurenz.a...@cybertec.at) wrote:
> Ronen Nofar wrote:
> > I have a weird case when running a query on the pg_settings view.
> > I have two users: the first one is the default user, postgres, which is a
> > superuser, and the other one is a role which I had created, called test_role,
> > which is not a superuser.
> > When I run a select on pg_settings with these two users I receive different
> > results.
>
> That is intentional, because some settings should only be
> visible for superusers, for example everything that has to
> do with the operating system (location of configuration file
> or socket directories).

This isn't quite correct any longer: with PG10, we have a default role called 'pg_read_all_settings' which can be GRANT'd to other roles to allow viewing of all settings, even those previously restricted to superuser-only.

Thanks!

Stephen
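P.S. For instance, a superuser could grant it like this (a minimal sketch, using the 'test_role' role name from the question):

    GRANT pg_read_all_settings TO test_role;

After that, a SELECT from pg_settings as test_role should show the settings that were previously superuser-only.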
Re: [GENERAL] missing public on schema public
Tom, all,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Bo Thorbjørn Jensen writes:
> > I have some additional info and a fix.
> > Firstly, steps to reproduce:
>
> Yeah, I can reproduce this. I suspect it got broken by Stephen's hacking
> around with default ACLs. A simple example is

Yes, it's related to the work I did with pg_dump's ACL handling, because we're no longer just always including the whole revoke/grant set of ACLs for everything in the output.

> $ pg_dump -c -U postgres postgres | grep -i public
> DROP SCHEMA public;
> -- Name: public; Type: SCHEMA; Schema: -; Owner: postgres
> CREATE SCHEMA public;
> ALTER SCHEMA public OWNER TO postgres;
> -- Name: SCHEMA public; Type: COMMENT; Schema: -; Owner: postgres
> COMMENT ON SCHEMA public IS 'standard public schema';
> -- Name: public; Type: ACL; Schema: -; Owner: postgres
> GRANT ALL ON SCHEMA public TO PUBLIC;
>
> That's fine, but if I shove it through an archive file:

This works because I added into pg_dump.c a check based on whether the output is clean (and therefore whether the public schema is being recreated or not). In hindsight, that wasn't really the right thing to do, because it ends up only working when pg_dump is run with -c and doesn't consider the case where pg_dump is run without -c but pg_restore is.

> $ pg_dump -f p.dump -Fc -U postgres postgres
>
> $ pg_restore -c p.dump | grep -i public

This doesn't work because pg_dump isn't run with -c, while pg_restore is. If the archive is created with pg_dump -c (as the above was), then the results match up between the two runs. Note also that if pg_dump is run with -c then a pg_restore without -c would actually still include the GRANT statement, which isn't really correct either. That's obviously a change from what we had before and wasn't intentional.

> This is *REALLY BAD*. Quite aside from the restore being wrong,
> those two sequences should never ever give different results.
> Stephen, you put some filtering logic in the wrong place in pg_dump.

I do wish it were that simple. Unfortunately, the public schema is just ridiculously special: it's a 'user' object that is nevertheless created by initdb, it's got special non-default ACLs on it, and it has explicit special code to skip over it when a restore is happening, unless -c is used.

What I'm afraid we need to do here is basically continue to hack on that code in pg_backup_archiver.c's _printTocEntry() to teach it to issue the default GRANT ALL ON SCHEMA public TO PUBLIC; when we are processing the TOC entry for CREATE SCHEMA public;. That would make the recreation of the public schema, when pg_dump or pg_restore is being run with -c, actually match how the public schema is created by initdb, and the rest would end up falling into place, I think.

One complication, however, is what happens when a user drops and recreates the public schema. If that's done, we'll end up not dumping out the delta from the public schema's initial ACLs, which wouldn't be correct if you're restoring into a newly initdb'd cluster. I'm thinking that we need to forcibly look at the delta between public-as-installed-by-initdb and whatever-public-is-now, regardless of whether the public schema was recreated by the user or not, because on restore we are expecting a newly initdb'd cluster with the public schema as originally installed (or as installed by pg_dump/pg_restore following the logic above).

I'll play around with this approach and see if things end up working out in a better fashion with it.
Baking this knowledge into pg_backup_archiver.c is certainly ugly, but handling of public has always been hard-coded into that, and we even added more special handling to that code 10 years ago to deal with the COMMENT on the public schema, so this is really just more of the same.

Thanks!

Stephen
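P.S. Concretely, the idea is that a -c run of pg_dump or pg_restore would emit, for the public schema's TOC entry, something like:

    DROP SCHEMA public;
    CREATE SCHEMA public;
    GRANT ALL ON SCHEMA public TO PUBLIC;

so that the recreated schema carries the same ACL that initdb gives it, with any user-made ACL changes then dumped as a delta on top of that.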
Re: [GENERAL] Roles inherited from a role which is the owner of a database can drop it?
Greetings,

* Ivan Voras (ivo...@gmail.com) wrote:
> On 30 October 2017 at 22:10, David G. Johnston wrote:
> > Not quite following, but ownership is an inheritable permission;
>
> Basically, I'm asking if "ownership" can be revoked from the set of
> inherited permissions? If there is a role G which is granted to role A, and
> G is the owner of a database, can A be made to not be able to do what only
> owners can (specifically in this case, drop databases)?

No, that's exactly what role membership means: you have the same rights as the other role.

> > and even if it was not, SET ROLE is all that would be required. Any owner
> > can drop an object that it owns.
>
> It's kind of the reverse: I'm wondering if ownership can be made
> un-inheritable.

No, because even if ownership weren't inheritable, the user would simply do 'SET ROLE owner;' and then have all of the ownership rights that way.

> Just considering the case of dropping databases for now. I.e. let the
> developers do everything except that. It's a start.

I think you're assuming far too much about what being a database owner means. I'd suggest you really think about why the developers need to be database owners at all; in other words, what's the *other* privilege that's currently only available to database owners that you need developers to be able to do? I have a hunch that it might be GRANT'ing rights on the database, but there are only a couple of such rights (eg: CONNECT) and you might be better off managing those in another way.

Thanks!

Stephen
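P.S. To make the point concrete, here's a minimal sketch (the role and database names are made up):

    CREATE ROLE g NOLOGIN;
    CREATE ROLE a LOGIN INHERIT;
    GRANT g TO a;
    ALTER DATABASE appdb OWNER TO g;

    -- Connected as a (to some other database):
    DROP DATABASE appdb;   -- allowed: a inherits g's ownership
    -- And even if a were NOINHERIT:
    SET ROLE g;
    DROP DATABASE appdb;   -- still allowed

Membership in g always permits SET ROLE g, so there's no way to keep a from exercising g's ownership.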
Re: [GENERAL] pg_audit to mask literal sql
Greetings,

* rakeshkumar464 (rakeshkumar...@outlook.com) wrote:
> By mask I mean pgaudit should log where ssn = '123-456-7891' as where ssn = '?'

Data masking really isn't part of auditing, and so even if pgaudit could do so, that wouldn't really be the right place to make it happen. There have been discussions about data masking previously, but they haven't really led anywhere.

Having proper auditing capabilities built into the backend, and then a way to classify errors (such as a syntax error or other issue where we couldn't tell what the query actually was due to a user fat-fingering something) as 'not to be logged', would at least get us closer to your goal of not wanting sensitive data in the log files, but PG isn't there yet.

That said, there are quite a few people who do use PG with HIPAA and address its requirements in other ways (as discussed elsewhere on this thread).

Thanks!

Stephen
Re: [GENERAL] Backup strategy using 'wal_keep_segments'
Greetings,

* Rhhh Lin (ruanline...@hotmail.com) wrote:
> I would actually be an advocate for using a proper archive_command in order
> to facilitate a proper (per the documentation) PITR and backup strategy.

Glad to hear it.

> However, a colleague had suggested such a creative approach (possibly less
> administrative overhead, ease of maintenance? I'm not sure) and I struggled
> to find any evidence online in blogs/white-papers/documentation that this was
> a feasible approach. That said, I couldn't find any info rejecting it as a
> method either, which led me to post here. Essentially, this was a difference
> of opinion on approach, and I was looking to gather information in order to
> make an informed opposing argument.

I'd strongly suggest considering tried-and-true approaches instead of coming up with something novel, particularly when it comes to backups and recovery.

Thanks!

Stephen
Re: [GENERAL] Backup strategy using 'wal_keep_segments'
Greetings,

* Rhhh Lin (ruanline...@hotmail.com) wrote:
> A colleague recently suggested that instead of implementing an
> 'archive_command' to push archivable WALs to a secondary location (for
> further backup to tape, for example), we could instead persist the WAL files
> in their current location by setting the "wal_keep_segments" parameter to an
> extreme value, e.g. 1000, and have the 'archive_command' do nothing.

Michael's points are good and I wouldn't recommend using this archive command either, but what isn't clear to me is what you're actually trying to solve by using such a method..? You haven't said anywhere what's wrong with archive_command (I know that there certainly are some things wrong with it, of course, but there are solutions to a number of those issues that aren't a hack like this...).

Thanks!

Stephen
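P.S. For reference, the configuration being proposed amounts to something like this (a sketch of the colleague's suggestion, not a recommendation):

    # postgresql.conf
    wal_keep_segments = 1000        # pins roughly 16GB of WAL in pg_xlog
    archive_mode = on
    archive_command = '/bin/true'   # "archive" by doing nothing

Nothing here ever copies WAL anywhere else, so losing the data directory loses the "archive" too, and WAL older than the retention window is silently recycled.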
Re: [GENERAL] pgpass file type restrictions
Matt,

* Desidero (desid...@gmail.com) wrote:
> I agree that it would be better for us to use something other than LDAP,

If you happen to be using Active Directory, then you should really be using Kerberos-based auth instead. AD includes both LDAP and a KDC, and the LDAP half is really *not* the way to handle authentication in that environment (or, well, really, LDAP isn't a terribly secure option in any environment, but if it's all you've got and you're not allowed to change it, then I suppose there's not much to be done about it).

Thanks!

Stephen
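P.S. In an AD environment that typically means a pg_hba.conf line along these lines (a sketch; the realm name is made up):

    # TYPE  DATABASE  USER  ADDRESS     METHOD
    host    all       all   0.0.0.0/0   gss include_realm=1 krb_realm=EXAMPLE.COM

plus a service principal for the PostgreSQL server created in AD and exported into the server's keytab.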
Re: [GENERAL] Permissions for Web App
Greetings,

* Igal @ Lucee.org (i...@lucee.org) wrote:
> It worked, thanks!

Be sure to check that you're really getting what you want here.

> For future reference and for the benefit of others, the command that
> I ran is:
>
> ALTER DEFAULT PRIVILEGES IN SCHEMA public
>   GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO webapp;

Note that DEFAULT PRIVILEGES apply to a specific role (in the above case, whatever CURRENT_USER is) and therefore will only be helpful if you have only one user creating tables, in this case.

There have been a lot of questions and discussions about having an actual 'read only' role that can be granted out. Now that we've actually got default roles in PG, this is something that becomes at least somewhat more practical and might be able to happen for PG11 if there's interest and effort put into it.

Thanks!

Stephen
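P.S. If more than one role creates tables, each creating role needs its own entry, which can be set explicitly with FOR ROLE (a sketch; 'alice' and 'bob' are hypothetical table-creating roles):

    ALTER DEFAULT PRIVILEGES FOR ROLE alice IN SCHEMA public
      GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO webapp;
    ALTER DEFAULT PRIVILEGES FOR ROLE bob IN SCHEMA public
      GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO webapp;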
Re: [GENERAL] pg_start/stop_backup naming conventions
Greetings,

* mj0nes (matthew.jo...@ramtech.co.uk) wrote:
> I'm just starting out on a rolling backup strategy and the naming convention
> has thrown me slightly for the WAL and "backup_label" files.
>
> What I want to do is pair up the backup label files with the associated tar
> ball of the data directory. When I first ran pg_start_backup('label'), I
> naively thought a file would be created named label. Maybe not that naive
> considering the documentation:
>
> /pg_start_backup creates a backup label file, *called backup_label*, in the
> cluster directory with information about your backup, including the start
> time and label string./

That's an interesting point, but not particularly relevant any more, as we've deprecated that approach to doing backups now because of issues with the backup_label file and what happens if the system is restarted during a backup.

> Thanks for any pointers.

Frankly, developing your own backup software for PG isn't trivial and isn't something I'd recommend. There are a number of good tools out there which have been developed over years to work well specifically for PG backups and to do things correctly (such as making sure WAL archiving is working and that WAL files are sync'd to disk before telling PG that they've been copied).

Thanks!

Stephen
Re: [GENERAL] Using cp to back up a database?
Ron,

* Ron Johnson (ron.l.john...@cox.net) wrote:
> Maybe my original question wasn't clear, so I'll try again: is it
> safe to do a physical backup using cp (as opposed to rsync)?

Frankly, I'd say no.

There's nothing to guarantee that the backup is actually sync'd out to disk. Further, you're probably in the same boat when it comes to WAL archiving (just using 'cp' there isn't good for the same reason), though that's arguably worse because once you return true from archive_command, that WAL file will be gone and you could end up with a hole in your WAL stream.

Next, this backup method has been deprecated because of issues with the backup_label file and what happens when the system crashes during a backup.

So, no, you shouldn't be just using 'cp', or 'rsync', or any similar simple tools for doing a proper PG backup. Use a tool which has been developed specifically for PG, such as pgBackRest, barman, WAL-E, WAL-G, etc.

Thanks!

Stephen
Re: [GENERAL] [HACKERS] USER Profiles for PostgreSQL
Greetings John,

* John R Pierce (pie...@hogranch.com) wrote:
> On 9/20/2017 6:55 AM, Stephen Frost wrote:
> > If AD is in the mix here, then there's no need to have things happening
> > at the database level when it comes to passwords- configure PG to use
> > Kerberos and create a princ in AD and put that on the database server
> > and then users can authenticate that way.
>
> for web apps? how does a web browser do kerberos over http ?

Yes, using SPNEGO, which the major browsers have all supported for a very long time.

Thanks!

Stephen
Re: [GENERAL] [HACKERS] USER Profiles for PostgreSQL
John,

* John R Pierce (pie...@hogranch.com) wrote:
> On 9/19/2017 3:32 PM, chiru r wrote:
> > How do those application accounts get recognized in the database?
> >
> > Let's say App_user1 is authenticated through the application; after that,
> > how does App_user1 get access to the DB?
> >
> > Can you please provide more information on how the app users are
> > accessing the database?
>
> the user isn't accessing the database, the application itself is
> accessing the database. a web app might have 10s of 1000s of
> unique users, but a web app typically only uses a single application
> account to access the database.

While that's one approach, it's certainly not the only one (and not necessarily considered a good approach either...).

If AD is in the mix here, then there's no need to have things happening at the database level when it comes to passwords: configure PG to use Kerberos, create a princ in AD, put that on the database server, and then users can authenticate that way. Further, applications can be configured to accept and proxy Kerberos credentials from the user to authenticate with to the database, allowing the application to actually be logged in as the user for that session, with only those rights the user has.

Group/role membership does still have to be sync'd between the AD/LDAP directory and the database for those permissions to be set up, but that isn't all that hard to do using a cronjob and ldap_fdw, or similar; people just have to realize that there's a bit of lag. The same goes for creating the accounts in the database in the first place, of course.

Thanks!

Stephen
Re: [GENERAL] [HACKERS] USER Profiles for PostgreSQL
Bruce,

* Bruce Momjian (br...@momjian.us) wrote:
> On Tue, Sep 19, 2017 at 01:28:11PM -0400, Stephen Frost wrote:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > > chiru r <chir...@gmail.com> writes:
> > > > We are looking for user profiles in open source PostgreSQL.
> > > > For example, if a user password failed n+ times while logging in, the user
> > > > access has to be blocked for a few seconds.
> > > > Please let us know, is there any plan to implement user profiles in
> > > > future releases?
> > >
> > > Not particularly. You can do that sort of thing already via PAM,
> > > for example.
> >
> > Ugh, hardly, and it's hokey and a huge pain to do, and only works on
> > platforms that have PAM.
> >
> > Better is to use an external authentication system (Kerberos, for
> > example) which can deal with this, but I do think this is also something
> > we should be considering for core, especially now that we've got a
> > reasonable password-based authentication method with SCRAM.
>
> Does LDAP do this too?

Active Directory does this, with Kerberos as the authentication mechanism.

Straight LDAP might also support it, but I wouldn't recommend it, because it's really insecure: the PG server sees the user's password in cleartext (and it may be sent in cleartext across the network too, unless careful steps are taken to make sure that the client only ever connects over SSL to a known, trusted, and verified server).

Thanks!

Stephen
Re: [GENERAL] [HACKERS] USER Profiles for PostgreSQL
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> chiru r writes:
> > We are looking for user profiles in open source PostgreSQL.
> > For example, if a user password failed n+ times while logging in, the user
> > access has to be blocked for a few seconds.
> > Please let us know, is there any plan to implement user profiles in future
> > releases?
>
> Not particularly. You can do that sort of thing already via PAM,
> for example.

Ugh, hardly, and it's hokey and a huge pain to do, and only works on platforms that have PAM.

Better is to use an external authentication system (Kerberos, for example) which can deal with this, but I do think this is also something we should be considering for core, especially now that we've got a reasonable password-based authentication method with SCRAM.

Thanks!

Stephen
Re: [GENERAL] Logging the fact that a log was shipped
* Christoph Moench-Tegeder (c...@burggraben.net) wrote:
> ## Stephen Frost (sfr...@snowman.net):
>
> > Worse, such scripts run the serious risk of losing WAL if a crash
> > happens because nothing is ensuring that the WAL has been sync'd to disk
> > before returning from the archive_command.
>
> That risk already exists when using rsync/scp/... and should be
> mitigated by filesystem settings on the receiving side.

I was including rsync/scp/similar tools, yes, just pointing out that such tools should be avoided when doing PG backups and WAL archiving.

I have a hard time seeing "require filesystems to be mounted as sync" as really being a viable solution, though I suppose it would be technically correct.

Thanks!

Stephen
Re: [GENERAL] Logging the fact that a log was shipped
* Christoph Moench-Tegeder (c...@burggraben.net) wrote:
> ## Ron Johnson (ron.l.john...@cox.net):
>
> > How is this done in v8.4? (I tried adding "date; rsync ..." but pg
> > didn't like that *at all*.)
>
> There's a DEBUG1-level log message on successful archive_command
> completion - that would give you a lot of other low-prio log
> messages which you probably don't care about.
> I'd put a wrapper around your rsync (a short shell script
> would be sufficient) along the lines of rsync ... && logger "done";
> that way you'd get the information via syslog.
> On the other hand, do you really need the details about each WAL
> segment? Since 9.4 there's pg_stat_wal_archiver... You're really
> making your job harder than it needs to be with that ancient
> PostgreSQL...

Worse, such scripts run the serious risk of losing WAL if a crash happens, because nothing is ensuring that the WAL has been sync'd to disk before returning from the archive_command.

Most of the existing tools for dealing with WAL archival (pgbackrest, barman and WAL-E, at least) already log successful and unsuccessful archive command runs. I'm pretty sure barman supports back to 8.4 and I know pgbackrest does.

Thanks!

Stephen
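P.S. The wrapper being described would be something like this sketch (hypothetical paths and host; invoked as archive_command = '/usr/local/bin/archive_wal.sh %p %f'):

    #!/bin/sh
    # $1 = full path of the WAL file, $2 = its file name.
    # NOTE: nothing here fsyncs the copy on the far side!
    rsync -a "$1" backuphost:/wal_archive/"$2" && logger "archived WAL $2"

which illustrates the problem above: nothing guarantees the segment has reached stable storage on the receiving side before PG is told it was archived.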
Re: [GENERAL] PG and database encryption
Greetings,

* Scott Marlowe (scott.marl...@gmail.com) wrote:
> On Tue, Aug 22, 2017 at 3:13 PM, PT wrote:
> > On Tue, 22 Aug 2017 12:48:13 -0700 (MST)
> > rakeshkumar464 wrote:
> > > We have a requirement to encrypt the entire database. What is the best tool
> > > to accomplish this? Our primary goal is that it should be transparent to the
> > > application, with no change in the application, as compared to an unencrypted
> > > database. Reading about the pgcrypto module, it seems it is good for a few
> > > columns only, and using it to encrypt the entire database is not a good
> > > use-case.
> > >
> > > Is this something best done with file-level encryption? What are the good
> > > tools on Linux (RHES), preferably open-source?

There's been discussion and work to add the ability to have encryption at the database instance level, instead of requiring filesystem-level or harddisk-level encryption, but nothing has gone into PG yet to support this. I'm hopeful that there will be progress towards this in the next round of PG development, ideally landing in PG11.

> > On multiple occasions, I demonstrated that an unencrypted database was the
> > least likely disclosure vector for sensitive data, and that we shouldn't
> > waste any time on it until we had ensured that all other breach vectors had
> > been fixed. Over the course of 4 years at that job, we never managed to get
> > all the other (more likely) breach vectors secured.

While that's interesting, I don't find it particularly relevant: fix the breach vectors you reasonably can. If you're worried that backups or hard drives might not be properly handled and may fall into an attacker's hands, and adding encryption to address that is straight-forward and simple to do, then it's entirely reasonable to do so, even if your application isn't as secure as you'd like it to be.

> > As a result, my opinion would be that you use filesystem encryption. It's
> > very efficient, low management overhead, and proven technology that doesn't
> > interfere with anything else you're doing. You can then check that box on
> > whatever form you have to fill out and the bureaucrats will leave you alone.
> > On top of that, it effectively protects against possible breach vectors
> > that don't require changing the application.

If PG-level encryption were also efficient, low management overhead, and based on proven technology that didn't interfere with anything else, then it would be an entirely good thing to consider doing. We don't have that today, but that doesn't make it unreasonable to ask for.

> > Real security will require changing the application. But take my word for it,
> > nobody wants to hear the list of breach vectors that can only be fixed by
> > modifying the application. Because people aren't interested in real security,
> > they're just interested in checking boxes on a form.

If it's trivial to address a certain set of attack vectors, then it makes sense to do so, even if it's a relatively unlikely vector to be hit with. I agree that putting non-trivial effort into addressing unlikely attack vectors is a bad idea when there are more likely vectors to consider addressing first.

> This. Without a much stricter definition of the attack vectors you're
> trying to defeat, "encrypt the whole database" is a very hand-wavy
> proposition. Are you protecting against people getting into the data
> center and stealing your hard drives? Rogue applications getting
> access to the db? Someone sniffing the passwords or unencrypting them
> on the servers? Etc, etc.
This continues to come up in discussions which outline the attack vectors that database-level encryption addresses. We know the vectors which database/filesystem-level encryption addresses; we don't need to beat up people asking for it by telling them that they don't know. Let's just make it clear that there's only a subset of vectors addressed by database or filesystem encryption, ensure that everyone understands what those are, and then individuals can weigh the choice themselves as to whether it makes sense for their particular situation to address those vectors through database/filesystem-level encryption or not.

> OP: It's just too generic a requirement to take seriously. Sit down, come
> up with possible attack vectors and possible ways to thwart them.

This is certainly a good exercise, but "someone stole the backups" is absolutely an entirely reasonable attack vector to consider, and database-level encryption is one approach which could (if implemented properly) address that vector. There are certainly other approaches to address that vector as well, of course, such as using backup technology which provides its own encryption (though that possibly requires managing a different set of keys), or running the backup through GPG or similar (but that gets painful quickly), et al.

> Security isn't something you do one time and you're done, it's a
> constant process of design,
Re: [GENERAL] Planner statistics usage for composite type
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Dmitry Lazurkin writes:
> > I am trying to find a workaround for cross-column statistics.
> > ...
> > Wrong estimate. The planner doesn't use statistics. In the code I see usage of
> > the function scalargtsel, which returns the default selectivity because
> > ROW('tag1', 0.9)::tag_sim is not a Const.
>
> Yeah, that's because eval_const_expressions doesn't know how to fold
> a constant RowExpr to a simple Const. I have a patch laying about
> someplace to improve that, but I keep forgetting about it until
> we're in beta phase :-(. In the meantime you could perhaps do the
> folding by hand:

Put it into the CF..?

Thanks!

Stephen
Re: [GENERAL] LDAP authentication without OU in ldapbasedn
Greetings,

* Gregory Nicol (gregory.ni...@medbank.com.mt) wrote:
> I can't seem to get LDAP authentication working without an OU in the
> ldapbasedn. My users are spread across multiple OUs without a common root OU,
> which is why I'm trying to authenticate with just the DC.

As it looks like you're working in a Microsoft Windows AD environment, I'd strongly suggest you consider using Kerberos/GSS authentication instead of LDAP. With LDAP, the user has to constantly re-type their password and the password is sent to the PostgreSQL server. Neither of these is ideal, and both are avoided by simply using Kerberos, which is what AD uses.

Authentication using LDAP really shouldn't ever be done in an environment which has Active Directory.

Thanks!

Stephen
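P.S. For what it's worth, if LDAP must be used despite the above, search+bind mode searches the whole subtree under ldapbasedn, so a base DN of just the DC components works across multiple OUs. A sketch, with made-up server and domain names:

    host all all 0.0.0.0/0 ldap ldapserver=dc1.example.com ldapbasedn="dc=example,dc=com" ldapsearchattribute=sAMAccountName

But again, Kerberos/GSS is the better answer in an AD environment.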
Re: [GENERAL] Associating a basebackup and it's .backup WAL file
Greetings,

* cen (imba...@gmail.com) wrote:
> Seems like a lot of manual work to me; to automate it I'd basically
> have to diff both directories and then copy only the newest
> differences over to the recovery. So far I was unable to find a
> supersecret git repo with bash scripts accomplishing this task,
> which is surprising.

I would hazard to suggest that's because people have realized that bash-scripting backup and recovery of PostgreSQL, while ensuring that all of the appropriate precautions are taken and checks are done, is next to impossible; were it to be done, the scripts would be more complex and challenging to use than any of the existing solutions.

I'd strongly suggest you consider one of the maintained backup solutions that have already been written instead of inventing yet another one.

Thanks!

Stephen
Re: [GENERAL] pg_start/stop_backup non-exclusive scripts to snapshot
Greetings,

* mariusz (mar...@mtvk.pl) wrote:
> On Tue, 2017-07-04 at 17:55 -0400, Stephen Frost wrote:
> > > How is this done inside a shell script?
> >
> > Generally, it's not. I suppose it might be possible to use '\!' with
> > psql and then have a shell snippet that looks for some file that's
> > touched when the snapshot has finished, but really, trying to perform a
> > PG backup using hacked-together shell scripts isn't recommended and
> > tends to have problems.
>
> not saying to use scripts or pg tools, but if OP needs a script, may
> consider perl module IPC::Run instead of shell script - easy to use
> interacting with subprocesses over filehandles or pipes (even supports
> ptys). that way you can keep your active psql session, pass input to it,
> view and process its output, while it doesn't block you from doing other
> things meanwhile
>
> of course it's perl, not shell, but looks "scripty" and can do what OP
> wants

Sure, that's more-or-less what pgBackRest does (which is Perl, mostly).

Thanks!

Stephen
Re: [GENERAL] pg_start/stop_backup non-exclusive scripts to snapshot
Melvin,

* Melvin Davidson (melvin6...@gmail.com) wrote:
> On Wed, Jul 5, 2017 at 10:14 AM, Stephen Frost <sfr...@snowman.net> wrote:
> > Part of my concern is that such a script is unlikely to show any problems
> > until it comes time to do a restore
>
> As previously stated, the script was used to set up a slave and has done so
> successfully many times. There are subsequent scripts that check results.

Ah, the impression was that this was being suggested as a way of performing regular backups. For simply creating a replica, pg_basebackup works pretty well for small systems. For larger databases, being able to perform parallel backup/restore is very useful, even if it's just for building a replica.

> > What might be worse would be to pull the plug while the backup is running
> > and then try to bring the primary back online.
>
> Uh, who would be so stupid as to do that?

Pulling the plug in the middle of various operations is a good way to simulate what happens if the system crashes, for whatever reason, and to make sure that processes and procedures are in place to address such a failure scenario.

> > Right, there's little sense in trying to perfect a shell script when
> > proper solutions exist.
>
> It's better to create something that others criticise than to create
> nothing and criticise others. Go create, have fun!!

Indeed, I'd certainly welcome criticism of pgBackRest.

Thanks!

Stephen
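P.S. For the replica-building case mentioned above, a single pg_basebackup invocation along these lines is usually all that's needed (a sketch; the host, user, and directory are made up):

    $ pg_basebackup -h primary.example.com -U replicator \
          -D /var/lib/postgresql/9.6/main -X stream -R -P

-X stream pulls the WAL needed for consistency over the same connection, and -R writes a recovery.conf pointing back at the primary.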
Re: [GENERAL] pg_start/stop_backup non-exclusive scripts to snapshot
Greetings,

* Melvin Davidson (melvin6...@gmail.com) wrote:
> Stephen,
> > This script is a good example of why trying to take a PG backup using
> > shell scripts isn't a good idea.
>
> Your criticism is noted, however, I have used it many times in the past
> with absolutely no problem. I submitted that script as a possible solution
> to the op's problem/question. If you have an alternate solution or can make
> improvements to it, then I am sure the op and I would welcome them.

Part of my concern is that such a script is unlikely to show any problems until it comes time to do a restore: it could be failing now due to the issues I noted previously, without any obvious error being thrown, but with the resulting backup not being viable. Hopefully that isn't the case, and ideally you're performing test restores of each backup you take to ensure that it works.

Further, it doesn't address the OP's question, which was specifically how to avoid using the now-deprecated exclusive backup method that the script you posted uses.

* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Wed, Jul 5, 2017 at 10:47 PM, Melvin Davidson wrote:
> > Your criticism is noted, however, I have used it many times in the past
> > with absolutely no problem.
>
> Plug off the server on which the backup is stored just after your
> script finishes; you have a good chance of being surprised if you try to
> restore from this backup later on.

What might be worse would be to pull the plug while the backup is running and then try to bring the primary back online. :/ That issue is part of why the API used in this script is now deprecated.

> > I submitted that script as a possible solution
> > to the op's problem/question. If you have an alternate solution or can make
> > improvements to it, then I am sure the op and I would welcome them.
>
> Stephen has mentioned two of them, with hundreds of man-hours spent in
> developing those backup tools to be robust solutions, done by
> specialists on the matter.

Right, there's little sense in trying to perfect a shell script when proper solutions exist.

Thanks!

Stephen
Re: [GENERAL] pg_start/stop_backup non-exclusive scripts to snapshot
Greetings,

* Melvin Davidson (melvin6...@gmail.com) wrote:
> On Tue, Jul 4, 2017 at 5:55 PM, Stephen Frost <sfr...@snowman.net> wrote:
> > I'd recommend considering one of the existing PG backup tools which know
> > how to properly perform WAL archiving and track the start/stop points
> > in the WAL of the backup. Trying to write your own using shell scripts,
> > even with ZFS snapshots, isn't trivial. If you trust the ZFS snapshot
> > to be perfectly atomic across all filesystems/tablespaces used for PG,
> > you could just take a snapshot and forget the rest- PG will do crash
> > recovery when you have to restore from that snapshot, but that's not much
> > different from having to do WAL replay of the WAL generated during the
> > backup.
> >
> > As for existing solutions, my preference/bias is for pgBackRest, but
> > there are other options out there which also work, such as barman.
>
> Here is a model shell script I use to do a base backup to set up a slave.
> See attached ws_base_backup.sh

This script is a good example of why trying to take a PG backup using shell scripts isn't a good idea. Offhand, there are issues like:

- No check that start_backup was successful
- No check that stop_backup was successful
- No syncing of files to disk anywhere
- Requires running as root (without any particular clear reason why)
- Doesn't check if the database is already in 'exclusive backup' mode
- Doesn't check the return codes for the main 'tar' command
- Uses pipes without checking return codes through PIPESTATUS
- Doesn't capture the output from pg_start/stop_backup
- Doesn't verify that all of the WAL required for the backup was archived
- Doesn't check the exit code of the rsync

I'm sure there are other issues also, and I do hope it's working well enough that you have viable backups, but I wouldn't use such a script today (though I wrote plenty like it in the distant past).

Thanks!

Stephen
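P.S. As an illustration of just one of those points: checking a pipeline's return codes in bash requires PIPESTATUS, since $? only reflects the last command in the pipe (a sketch, not taken from the script in question):

    tar -czf - "$PGDATA" | ssh backuphost 'cat > /backups/base.tar.gz'
    if [ "${PIPESTATUS[0]}" -ne 0 ] || [ "${PIPESTATUS[1]}" -ne 0 ]; then
        echo "base backup failed" >&2
        exit 1
    fi

And even that still doesn't fsync the result on the far side, which is another of the items above.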
Re: [GENERAL] pg_start/stop_backup non-exclusive scripts to snapshot
Greetings,

* hvjunk (hvj...@gmail.com) wrote:
> I've previously done ZFS snapshot backups like this:
>
> psql -c "select pg_start_backup('snapshot');"
> zfs snapshot TANK/postgresql@`date '+%Ymd'`
> psql -c "select * from pg_stop_backup();"

Hopefully you are also doing WAL archiving...

> Reading the PostgreSQL 9.6 documentation, the advice/future is to use the
> non-exclusive method, where I'll need to keep a session *open* while the
> snapshot takes place, and after that I'll have to issue the
> pg_stop_backup(false); in that active connection that issued the
> pg_start_backup('backup',false,false);

Right.

> How is this done inside a shell script?

Generally, it's not. I suppose it might be possible to use '\!' with psql and then have a shell snippet that looks for some file that's touched when the snapshot has finished, but really, trying to perform a PG backup using hacked-together shell scripts isn't recommended and tends to have problems.

In particular WAL archiving: there's no simple way for a shell script which is being used for archiving to confirm that the WAL it has "archived" has been completely written out to disk (which is required for proper archiving). Further, simple shell scripts also don't check that all of the WAL has been archived and that there aren't any holes in the WAL between the starting point of the backup and the end point.

> Especially, how do I do error checking from the commands, as psql -c "select
> pg_start_backup('test',false,false);" is not going to work?

I'd recommend considering one of the existing PG backup tools which know how to properly perform WAL archiving and track the start/stop points in the WAL of the backup. Trying to write your own using shell scripts, even with ZFS snapshots, isn't trivial. If you trust the ZFS snapshot to be perfectly atomic across all filesystems/tablespaces used for PG, you could just take a snapshot and forget the rest: PG will do crash recovery when you have to restore from that snapshot, but that's not much different from having to do WAL replay of the WAL generated during the backup.

As for existing solutions, my preference/bias is for pgBackRest, but there are other options out there which also work, such as barman.

Thanks!

Stephen
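P.S. For completeness, the '\!' idea mentioned above would look roughly like this: \! runs its command synchronously from within the still-open session, so the start and stop calls bracket the snapshot (a sketch only, reusing the pool name from the question; all the usual caveats about hand-rolled backup scripts apply):

    psql <<'EOF'
    SELECT pg_start_backup('snapshot', false, false);
    \! zfs snapshot TANK/postgresql@manual
    SELECT * FROM pg_stop_backup(false);
    EOF

Note that error-checking the zfs step is exactly the hard part, and this still does nothing about verifying WAL archiving, which is the larger problem described above.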
Re: [GENERAL] Querying a policy
Jeff,

* Jean-Francois Bernier (jean.francois.bern...@boreal-is.com) wrote:
> We are evaluating migrating our software RLS to Postgres by using policies.

Neat!

> Having a "FOR UPDATE POLICY" on a table, I was wondering if there is a way to
> know, before trying an update and getting an error, if the current row can be
> updated?

Unfortunately, not as easily as it seems you would like, currently, though perhaps we could change that..

> The goal is to show or hide the edit button in my software forms or lists.

Right, makes sense.

> I know that this query can return the CHECK condition of my POLICY:
> SELECT pg_get_expr(polwithcheck, polrelid, true) FROM pg_policy;

Yup.

> But is there a simpler way to get the ids the current user can read and the
> ones that he can update?

Well, have you considered using the expression from the above query to add a column to your SELECT query that indicates whether the row is updatable or not..? That is, fetch the result of:

SELECT pg_get_expr(polwithcheck, polrelid, true) FROM pg_policy;

into a variable in your application, then construct your query as:

"SELECT * , " . variable . " from ..."

The same could be done through a view, potentially, or perhaps with a plpgsql function, but I'm guessing that negates some of the "cleanliness" that you get with RLS and base tables. I certainly like the idea in general.

I will caution that, to be fair, just because the WITH CHECK clause says a given row can be modified at SELECT time doesn't guarantee that the same row will be updatable in some later transaction, as that depends on just what the policy is.

In any case, very cool to hear about people working to use RLS! Would love to chat further about your use-case and see what we can do to make RLS easier to use.

Thanks!

Stephen
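P.S. Put together, that might look like this (a sketch with made-up table and column names, since your schema isn't shown):

    -- Fetch the WITH CHECK expression for the UPDATE policy on "documents":
    SELECT pg_get_expr(polwithcheck, polrelid, true)
      FROM pg_policy
     WHERE polrelid = 'documents'::regclass
       AND polcmd = 'w';

    -- Suppose it returns: (owner = CURRENT_USER)
    -- Then embed it as an extra column:
    SELECT *, (owner = CURRENT_USER) AS can_update
      FROM documents;

Rows the user can't see at all are already filtered out by the USING clause of the SELECT policy, so can_update only needs to distinguish readable-but-not-updatable rows.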
Re: [GENERAL] Large data and slow queries
Greetings,

* Samuel Williams (space.ship.travel...@gmail.com) wrote:
> Thanks for all the suggestions Stephen.
>
> > That explain analyze shows a whole ton of heap fetches. When was the
> > last time a VACUUM was run on this table, to build the visibility map?
> > Without the visibility map being current, an Index-Only Scan, as is
> > happening here, can really suck.
>
> I'm using the default 9.6 config, I thought that auto-vacuum was on by
> default?

Sure, but that doesn't mean the table gets vacuumed. In particular, actual vacuums are only kicked off when the number of *updated* or *deleted* tuples passes the autovacuum threshold. If no updates or deletes are done on the table (iow, it's essentially an insert-only or insert-mostly table), the autovacuum daemon never runs a vacuum on it (though it'll run analyzes).

https://www.postgresql.org/docs/current/static/runtime-config-autovacuum.html#GUC-AUTOVACUUM-VACUUM-THRESHOLD

Check pg_stat_all_tables to see when/if the table has actually been vacuumed.

Thanks!

Stephen
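P.S. Something along these lines will show it (a sketch, using the 'user_event' table from this thread):

    SELECT relname, last_vacuum, last_autovacuum, n_dead_tup
      FROM pg_stat_all_tables
     WHERE relname = 'user_event';

    -- If the vacuum columns are NULL, run one manually to build the
    -- visibility map:
    VACUUM VERBOSE user_event;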
Re: [GENERAL] Large data and slow queries
Greetings,

* Samuel Williams (space.ship.travel...@gmail.com) wrote:
> We want the following kinds of query to be fast:

"Kinds of query" isn't helpful; you should be reviewing exactly the queries you care about, because the statistics, your exact data set, and the exact query you're running will all have an impact.

> SELECT DISTINCT "user_event"."user_id" FROM "user_event" WHERE
> "user_event"."what" IN ('poll', 'location_change',
> 'midnight_location') AND ("user_event"."created_at" >= '2016-04-19
> 01:23:55') AND (latitude > -37.03079375089291 AND latitude <
> -36.67086424910709 AND longitude > 174.6307139779924 AND longitude <
> 175.0805140220076);

Ugh. You should really look at and consider PostGIS whenever you're working with geospatial data.

> We have a btree index and it appears to be working. However, it's
> still pretty slow.

You're finding some 17M records and then reducing that with DISTINCT to only 114k. Hard to say if it's faster to just brute-force your way through that with a HashAgg (as your explain analyze shows), or if a loose index scan would work better (check out how to do one in PG here: https://wiki.postgresql.org/wiki/Loose_indexscan).

> EXPLAIN ANALYZE gives the following:
> https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121#gistcomment-2065314

That explain analyze shows a whole ton of heap fetches. When was the last time a VACUUM was run on this table, to build the visibility map? Without the visibility map being current, an Index-Only Scan, as is happening here, can really suck.

> CLUSTER user_event ON index_user_event_for_visits_3 followed by
> analyze... Our data is mostly time series but sometimes we get some
> dumps with historical records.
>
> Perhaps add a BRIN index on created_at

BRIN would be helpful if you wanted a smaller index. That doesn't seem to be the case here.

> I'm wondering if... we can use an index to cache all user_ids seen on
> a given day. If we change our index to be more discrete, e.g.
> created_at::date, would this help? The set union of user_ids for 365
> days should be pretty fast?

Materialized views are often useful, particularly when the results are (relatively) slow-moving.

> I'm open to any ideas or suggestions; ideally we can keep
> optimisations within the database, rather than adding a layer of
> caching on top.

If you need to go the materialized view route, I'd definitely recommend doing that in the database rather than outside it.

Thanks!

Stephen
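P.S. The basic loose-index-scan pattern from that wiki page looks like this (shown without your WHERE clauses, which would need to be folded into both branches):

    WITH RECURSIVE t AS (
        SELECT min(user_id) AS user_id FROM user_event
        UNION ALL
        SELECT (SELECT min(user_id) FROM user_event WHERE user_id > t.user_id)
          FROM t
         WHERE t.user_id IS NOT NULL
    )
    SELECT user_id FROM t WHERE user_id IS NOT NULL;

It walks the btree once per distinct value instead of visiting all 17M matching rows, which can be a big win when the number of distinct values is small relative to the row count.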
Re: [GENERAL] A change in the Debian install
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> (But ... these statements are based on an assumption of out-of-the-
> box Postgres behavior. I would not exactly put it past the Debian
> packagers to have decided to change this for reasons of their own,
> and their track record of telling us about such decisions is many
> miles south of abysmal. So you might look at whatever patches
> are in the Debian package to see if there's anything touching
> pgstat.c's socket-setup logic.)

I don't believe this is really a fair assessment. Maybe at some point in the distant past, but not today. Christoph is regularly on this list contributing to threads regarding packaging and submitting patches of his own for improvements to PG, and the patches currently included in the Debian distribution are, at least mostly, for things which really should be possible to do with configure options but which we don't provide today, or things we should just be handling already.

The non-comment/documentation patches included in the Debian PG 9.6 packages are:

50-per-version-dirs.patch

  Use version-specific installation directories so that several major
  versions can be installed in parallel. This includes changing pkglibdir
  and includedir_server. Those might be able to be set through existing
  configure flags, and that's probably something we could work with
  Christoph to do. There's also a change to pg_config which might be a bit
  more difficult to handle upstream (related to how pg_config ends up in
  /usr/bin, but that isn't the "right" BINDIR).

51-default-sockets-in-var.patch

  Use /var/run/postgresql/ for DEFAULT_PGSOCKET_DIR. We really should
  allow this to be changed in configure.

54-debian-alternatives-for-external-tools.patch

  Use 'sensible-editor' for DEFAULT_EDITOR, and 'pager' for
  DEFAULT_PAGER. These could also be done through configure switches,
  I would think.

64-pg_upgrade-sockdir

  This is a bit of a curious one; the description is:

    Fix for: connection to database failed: Unix-domain socket path
    "/build/buildd-postgresql-9.3_9.3~beta1-1-i386-mHjRUH/postgresql-9.3-9.3~beta1/build/contrib/pg_upgrade/.s.PGSQL.50432"
    is too long (maximum 107 bytes)

    See also: http://lists.debian.org/debian-wb-team/2013/05/msg00015.html

  This basically adds a mechanism to fall back to using /tmp if the socket
  path is too long. It would probably be good to figure out a better way.

filter-debug-prefix-map

  Description: Remove -fdebug-prefix-map=/p/w/d=. from CFLAGS and CONFIGURE
  in pg_config. The embedded path makes the build non-reproducible, and the
  flag is useless for building extensions anyway.

  Not sure there's much we can do about this one, but it's also just for
  pg_config.

hurd-sem_init

  Adds -pthread to LIBS and forces POSIX semaphores on Hurd. I'm not sure
  if this is all still required, but if so, it'd probably be ideal if we
  could figure out a way to handle it automatically instead of making the
  Debian packagers have to do it.

In short, there's certainly nothing that touches pgstat.c's socket-setup logic, and almost all of the changes are just changes to #define's to specify alternative paths or external tools.

I'll start a discussion with Christoph about whether we might already be able to remove some of these, and where we might be able to make upstream changes to remove the need for others.

Thanks!

Stephen
Re: [GENERAL] browser interface to forums please?
Greetings,

* Robin St.Clair (ro...@robinstclair.net) wrote:
> Please do not encrypt what does not need to be encrypted. Signing
> communications to a mailing list probably isn't required?

Signing communications demonstrates that the message was, indeed, from me. You are certainly welcome to ignore it.

Thanks!

Stephen
Re: [GENERAL] browser interface to forums please?
Greetings Vinny,

* vinny (vi...@xs4all.nl) wrote:
> I meant it as "in an ideal world". It's a bit like buying a car
> and finding out that they have not put the wheels on. It's not
> difficult to put them on yourself, but you kind of expect that the
> people who want you to use their car would do that for you.

There's a bit of a difference between buying a car and using a service which is provided for free by a team of volunteers.

I agree that the "in an ideal world" wording is better. :)

Thanks!

Stephen
Re: [GENERAL] browser interface to forums please?
Greetings,

* vinny (vi...@xs4all.nl) wrote:
> The thing is, mailing lists are far from user-friendly if you are not
> used to them.
> Even in this thread several people have explained how much work they
> have done to get it into a state where they can easily work with it.
> Can you expect Joe Average to do something like that if they want to
> get more involved in PgSQL?

I don't actually feel that it's really all that unreasonable, no. I've explained that we use mailing lists to a few different groups at different colleges I've spoken at, and while there's been a bit of grousing from a few individuals, I don't recall anyone not knowing what a mailing list is or having all that bad of a reaction.

I'll be speaking later this month again at GMU, so I'll make it a point to discuss it with the group there.

> Now, I'm not saying the mailing lists should go; I'm saying there
> should be an easier way to access them. It should be possible to
> register on the site, post a message, and read replies, without having
> to subscribe to the list and set up a way of dealing with the influx
> of messages that are, for the most part, simply not interesting to the
> average user.

I don't think there's anyone who is particularly against that idea, but it's far from trivial to do and to address the possible spam which would result from it. All of the website code is open-source and improvements to it would be greatly welcomed, as long as they don't create a significant increase in the maintenance burden for the pginfra team.

> I'd love to have an RSS feed that contains only new questions, so I
> can just watch the popup on my screen the way I do with the rest of
> the world, and not have to deal with replies to topics that I don't
> care about anyway.

I don't see any reason, off-hand at least, why this couldn't be provided. We already provide RSS feeds for other things and it's reasonably straight-forward. Replying to the RSS feed would require an email client though, at least for now. Perhaps that could be integrated into the 'whole mbox' download option, or something along those lines, so you can pull the email/thread into your client easily to reply.

> And yes, I can probably set up my email to do something like that;
> the point is that I shouldn't have to.

I'm all for improving things and adding automation where it'll help, but the infrastructure is basically run by volunteers. Making statements like "I shouldn't have to" isn't the best approach to getting the changes you'd like to see happen.

Thanks!

Stephen
Re: [GENERAL] Checksum and Wal files
Greetings,

* Poul Kristensen (bcc5...@gmail.com) wrote:
> Are all files, inclusive of WAL files, given a checksum?

WAL files include checksums, yes. The heap files in recent versions can also include checksums, but they don't by default. Currently, checksums can only be enabled at initdb time, meaning that if you have a database which doesn't have checksums today then you have to create a *new* cluster, export all of the data from the existing database using pg_dump, and then import it into the cluster which has heap checksums enabled.

> We will be doing regular tests of recovery using checksums as the parameter
> of a successful recovery.

I'm curious what you mean by this statement. There aren't many tools out there today for checking whether the heap checksums in PostgreSQL are valid or not. One approach is to use PostgreSQL itself and convince it to read every page, though that can be a bit troublesome to do. Another option is to use a backup tool which checks checksums as it's doing the backup (pgbackrest being one, and I think there may have been a fork of barman which also did..?).

That said, I would strongly encourage you to find a better approach to testing your backup and recovery procedures, as just checking checksums doesn't tell you if there's been any database-level corruption; it'll only identify filesystem-and-below corruption (well, it might catch some database-level bugs, but the coverage there is exceedingly small).

Thanks!

Stephen
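P.S. For reference, enabling checksums at initdb time is just (a sketch; the data directory path is made up):

    $ initdb --data-checksums -D /var/lib/postgresql/9.6/main

after which an existing cluster's data can be moved over with pg_dump from the old cluster and a restore into the new one.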
Re: [GENERAL] Incremental / Level -1 backup in PG
Greetings,

* rakeshkumar464 (rakeshkumar...@outlook.com) wrote:
> > The short answer is 'no'. There are complications around this,
> > particularly at the edges, and because files can be written and rewritten
> > as you're reading them.
> > Basically, no file with a timestamp after the checkpoint before the
> > backup can be omitted from an incremental backup.
>
> what you have written above applies to oracle/db2 too. In case you are not
> aware, during backup, those products have a mechanism to save the image
> of any changing block as it existed before the start of the backup. that is
> used to reconstruct the PIT image of the block.

That is WAL archiving, which PG already does, but it is different from backups.

> Anyhow, looks like this can't be done in PG.

PG does support WAL archiving, as discussed on this thread, and it works exactly as you describe above. As I mentioned, there are also tools for performing incremental backups, which isn't quite the same as straight WAL archiving.

Thanks!

Stephen
Re: [GENERAL] Incremental / Level -1 backup in PG
Greetings,

* rakeshkumar464 (rakeshkumar...@outlook.com) wrote:
> If the first choice is a lot faster in Oracle/DB2, I have reasons to believe
> that the same should be true for PG also. But as someone explained, the PG
> technology can not support this.

This statement isn't correct. There are, in fact, tools for doing incremental backup and restore with PG, and they work quite well, as I mentioned. Currently they operate at a file level, which means 1G chunks instead of block-level, but there is work being done to change that too.

There isn't anything in "PG technology" that makes this impossible or, really, even that difficult; it's more that there hasn't been effort put into it, simply because the file-level incremental solution works quite well in most cases.

Thanks!

Stephen
Re: [GENERAL] Incremental / Level -1 backup in PG
John,

* John R Pierce (pie...@hogranch.com) wrote:
> On 3/21/2017 5:27 PM, Rakesh Kumar wrote:
> > PG does not have a concept of incremental backup. The way it works in
> > Oracle and other RDBMSs is that an incremental backup only backs up changed
> > blocks since the last full backup. So if only 10% of blocks changed since
> > the last full backup, the incremental backup will be only for that 10%.
> > I am wondering whether it is technically feasible to implement it like this:
> >
> > 1 - At the time of full backup, note the last modified time of each data
> > file in a repository.
> > 2 - Next time when the incremental backup runs, for every data file it will
> > check its last modified time against the one in the repository to
> > determine whether it has changed since the last full backup. If yes, back
> > it up.
> >
> > Now on to restore:
> >
> > 1 - First restore the full backup.
> > 2 - Restore the incremental backup.
> >
> > My question: Will it work in PG?
>
> basebackup + WAL archive lets you do just exactly this. you can
> restore to any transaction between when that basebackup was taken,
> and the latest entry in the WAL archive. it's referred to in the
> documentation as PITR, Point in Time Recovery.

WAL must always be kept for file-level backups, of course, but it does not allow the kind of incremental backup the OP was suggesting.

It's important to understand that you might start reading a file whose timestamp is X, read half of it, then PG starts writing to the first half of the file, and you finish reading the file, all within the same second. A later incremental backup might assume that file hadn't been changed from the version you have and therefore not back it up. The WAL for the change which was written by PG would be in the first 'full' backup, but would not be included in the WAL which is generated during the incremental backup, leading to a situation where that write would be lost and you have a corrupted backup.

Do not try to implement an incremental backup solution using simple/naive tools like rsync with timestamp-based incrementals. It is not safe.

Thanks!

Stephen
Re: [GENERAL] Incremental / Level -1 backup in PG
Greetings,

* Rakesh Kumar (rakeshkumar...@outlook.com) wrote:
> PG does not have a concept of incremental backup. The way it works in Oracle
> and other RDBMSs is that an incremental backup only backs up changed blocks
> since the last full backup. So if only 10% of blocks changed since the last
> full backup, the incremental backup will be only for 10%.
> I am wondering whether it is technically feasible to implement it like this:
>
> 1 - At the time of full backup, note the last modified time of each data file
> in a repository.
> 2 - Next time when the incremental backup runs, for every data file it will
> check its last modified time against the one in the repository to determine
> whether it has changed since the last full backup. If yes, back it up.
>
> Now on to restore:
>
> 1 - First restore the full backup.
> 2 - Restore the incremental backup.
>
> My question: Will it work in PG?

The short answer is 'no'. There are complications around this, particularly at the edges, and because files can be written and rewritten as you're reading them. Basically, no file with a timestamp after the checkpoint before the backup can be omitted from an incremental backup.

I strongly recommend you use one of the existing backup solutions for PostgreSQL which know how to properly perform incremental backups. I know at least pgBackRest and barman do; I'm not sure about others.

Thanks!

Stephen
Re: [GENERAL] Postgres backup solution
Lawrence,

First off, I strongly recommend that you figure out how to send regular
plain-text emails, at least to this mailing list, as the whole
"winmail.dat" thing is going to throw people off and you're unlikely to
get many responses because of it.

Regarding your question..

* Lawrence Cohan (lco...@web.com) wrote:
> What would be a recommended solution for backing up a very large Postgres
> (~13 terabyte) database in order to protect against data
> deletion/corruption? The current setup is only to backup/restore to a
> standby read-only Postgres server via AWS S3 using wal-e; however, this
> does not offer the comfort of keeping a full backup available in case we
> need to restore some deleted or corrupted data.

If the goal is to be able to do partial restores (such as just one table)
then your best bet is probably to use pg_dump. Given the size of your
database, you'll probably want to pg_dump in directory format and then
send each of those files to S3 (assuming you wish to continue using S3 for
backups). Note that pg_dump doesn't directly support S3 currently. Also,
the pg_dump will hold open a transaction for a long time, which may be an
issue depending on your environment.

If you're looking for file-based backups of the entire cluster and don't
mind using regular non-S3 storage then you might consider pgBackRest or
barman. With file-based backups, you have to restore at least an entire
database to be able to pull out data from it. We are working to add S3
support to pgBackRest, but it's not there today.

Thanks!

Stephen

signature.asc Description: Digital signature
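As a sketch of the directory-format approach suggested above (database
name, job count, paths, and bucket are hypothetical; the S3 copy uses the
AWS CLI, since pg_dump has no native S3 support):

    # parallel directory-format dump, then copy the directory to S3
    pg_dump -Fd -j 8 -f /backup/mydb.dump mydb
    aws s3 sync /backup/mydb.dump s3://my-backup-bucket/mydb/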
Re: [GENERAL] intentional or oversight? pg_dump -c does not restore default privileges on schema public
Greetings,

* Stephen Frost (sfr...@snowman.net) wrote:
> * Frank van Vugt (ftm.van.v...@foxi.nl) wrote:
> > Well, I didn't run into this issue with any of my db's that 'nicely' use
> > tables in various schemas, it was actually the one 'older' db with
> > everything in the public schema that brought it up, so maybe keeping one
> > of those around isn't too bad an idea ;)
>
> Yeah, I'll be including this in a regression test also, to make sure we
> don't end up breaking this special case again in the future.

I've now pushed a fix for this; would be great to know if it solves your
issue.

Thanks!

Stephen

signature.asc Description: Digital signature
Re: R: R: [GENERAL] Slow queries on very big (and partitioned) table
Greetings,

* Job (j...@colliniconsulting.it) wrote:
> > If this is really what you're mostly doing, having constraint exclusion and
> > an index on 'profile' would probably be enough, if you insist on continuing
> > to have the table partitioned by day (which I continue to argue is a bad
> > idea- based on the number of total rows you mentioned and the number of
> > partitions, you have partitions with less than 20M rows each and that's
> > really small; month-based partitions with a BRIN would probably work
> > better). If you get to the point of having years worth of daily partitions,
> > you're going to see increases in planning time.
>
> Based on our simulations, we can accumulate up to 5GB of data for a single
> day, with some millions of rows.

5GB and a few million rows isn't actually all that much.

> We thought to implement one partition per day.
> Do you think that would be fine?

Really depends on what you're doing. If you're running very short queries
that pull out just a record or a few records, then you're going to be
unhappy with the planning time required when you have hundreds and
thousands of partitions, which is why I typically recommend against using
partition-by-day unless you're only keeping a few months' worth of data.

Thanks!

Stephen

signature.asc Description: Digital signature
Re: R: [GENERAL] Slow queries on very big (and partitioned) table
Greetings,

* Job (j...@colliniconsulting.it) wrote:
> here is primarily a partitioned table (for the 20/2/2017 logs):
> flashstart=# \d webtraffic_archive_day_2017_02_20;
>             Table "public.webtraffic_archive_day_2017_02_20"
>   Column    |            Type             | Modifiers
>  -----------+-----------------------------+-----------------------------
>  id         | numeric(1000,1)             | not null default function_get_next_sequence('webtraffic_archive_id_seq'::text)
>  timestamp  | timestamp without time zone |
>  domain     | character varying(255)      |
>  action     | integer                     |
>  profile    | character varying(50)       |
>  accessi    | integer                     |
>  url        | text                        |
> Indexes:
>     "webtraffic_archive_day_2017_02_20_action_wbidx" btree (action)
>     "webtraffic_archive_day_2017_02_20_domain_wbidx" btree (domain)
>     "webtraffic_archive_day_2017_02_20_profile_wbidx" btree (profile)
>     "webtraffic_archive_day_2017_02_20_timestamp_date_wbidx" brin (("timestamp"::date))
>     "webtraffic_archive_day_2017_02_20_timestamp_time_wbidx" btree (("timestamp"::time without time zone))
>     "webtraffic_archive_day_2017_02_20_timestamp_wbidx" brin ("timestamp")

You don't have a CHECK constraint on your individual partitions, so PG has
no idea which of the partitions contains data from what time period. To
have PostgreSQL's constraint exclusion work to eliminate entire
partitions, you need to add a CHECK constraint on the partition key. In
this case, it looks like you'd want something like:

CHECK (timestamp >= partition_date AND timestamp < partition_date+1)

Unfortunately, that'll require locking each table and scanning it to make
sure that the CHECK constraint isn't violated.

A couple of other notes- you probably don't need both a BRIN and a btree
index on the same column. If the BRIN works well enough for you then you
can drop the btree index. If it doesn't, then you might as well just keep
the btree index and drop the BRIN. It seems equally unnecessary to have a
BRIN on a cast of the column. You should also be using timestamp with
time zone, really.

> This is the query planner:
> explain analyze
> select * from webtraffic_archive where timestamp::date='2017-02-20' and
> profile='f62467'

If this is really what you're mostly doing, having constraint exclusion
and an index on 'profile' would probably be enough, if you insist on
continuing to have the table partitioned by day (which I continue to argue
is a bad idea- based on the number of total rows you mentioned and the
number of partitions, you have partitions with less than 20M rows each and
that's really small; month-based partitions with a BRIN would probably
work better). If you get to the point of having years worth of daily
partitions, you're going to see increases in planning time.

Thanks!

Stephen

signature.asc Description: Digital signature
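Making the CHECK constraint above concrete for the 2017-02-20 partition
(the constraint name is made up; as noted, adding it takes a lock and
scans the table to validate):

    ALTER TABLE webtraffic_archive_day_2017_02_20
        ADD CONSTRAINT webtraffic_archive_day_2017_02_20_ts_check
        CHECK ("timestamp" >= '2017-02-20' AND "timestamp" < '2017-02-21');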
Re: [GENERAL] Slow queries on very big (and partitioned) table
Greetings,

* Job (j...@colliniconsulting.it) wrote:
> we have a test machine with PostgreSQL 9.6.1 and about 800,000,000 records
> in a table.
> The table is partitioned by day, with indexes on the partitioned tables.

You probably shouldn't be partitioning by day for such a small dataset,
unless you've only got a few days worth of data that make up those 800m
records. Having hundreds of partitions leads to slow query planning time.

There is work happening to improve on this by having declarative
partitions instead of using CHECK constraints and the constraint exclusion
mechanism.

Thanks!

Stephen

signature.asc Description: Digital signature
Re: [GENERAL] intentional or oversight? pg_dump -c does not restore default privileges on schema public
Frank,

* Frank van Vugt (ftm.van.v...@foxi.nl) wrote:
> Well, I didn't run into this issue with any of my db's that 'nicely' use
> tables in various schemas, it was actually the one 'older' db with
> everything in the public schema that brought it up, so maybe keeping one
> of those around isn't too bad an idea ;)

Yeah, I'll be including this in a regression test also, to make sure we
don't end up breaking this special case again in the future.

Thanks!

Stephen

signature.asc Description: Digital signature
Re: [GENERAL] intentional or oversight? pg_dump -c does not restore default privileges on schema public
Greetings,

* Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> On 02/13/2017 06:04 AM, Stephen Frost wrote:
> > * Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> >> I am following this up to the point of not understanding what
> >> exactly changed between 9.5 and 9.6. Namely, 9.5 does include the
> >> default ACLs in the dump output and 9.6 does not.
> >
> > Quite a bit in pg_dump changed, but the relevant bit here is that we now
> > try to include in the pg_dump output any ACLs which have been changed
> > from their initdb-time settings for initdb-time objects. What that
> > means is that if you don't change the privileges for the public schema
> > from what they're set to at initdb time, then we don't dump out any ACL
> > commands for the public schema. That ends up being incorrect in '-c'
> > mode because we drop the public schema in that mode and recreate it, in
> > which case we need to re-implement the ACLs which existed for the public
> > schema at initdb time.
>
> Thanks for the explanation in this post and your previous one. If I am
> following, pg_init_privs is the initial state of objects' ACLs, and if
> that changes then those entries are removed.

No, if the object is *dropped* then the entry is removed from
pg_init_privs. Otherwise, the entries in pg_init_privs aren't changed.

> So would not the general case be, on recreating an object, to use the
> ACLs in pg_init_privs if they exist, and otherwise use the ACLs as they
> exist wherever they go on a change away from pg_init_privs?

pg_init_privs doesn't track the object's name, so this isn't actually
possible. Even if we did track the name of the object, I don't think we'd
actually want to set the privileges to what they were set to at initdb
time. If you drop the public schema and then recreate it, are you really
expecting it to get the initdb-time privileges it had..? How would you
reconcile that with default privileges (which we don't have for schemas
right now, but it's been proposed...). This case is about a pg_dump, which
is a very different case in that we want to recreate the state of the
system as it existed at the time of the dump.

> I gather that is what you are proposing as a special case for the public
> schema. Just wondering why it should not be the general case?

Not quite.. This is about what pg_dump does when a -c is used.
Unfortunately, it's *already* doing something special with the public
schema (if it wasn't, then this wouldn't really be an issue..). This is
just about making it do the right thing in that already-existing
special case.

Thanks!

Stephen

signature.asc Description: Digital signature
Re: [GENERAL] intentional or oversight? pg_dump -c does not restore default privileges on schema public
Frank,

* Frank van Vugt (ftm.van.v...@foxi.nl) wrote:
> On Saturday 11 February 2017 at 15:28:55, Tom Lane wrote:
> > I'm inclined to argue that it was a mistake to include any non-pinned
> > objects in pg_init_privs.
> >
> > We might need to fix pg_dump too, but I think these entries in
> > pg_init_privs should simply not be there.
>
> Thanks for picking this up, I'll probably see this subject pop up on
> hackers and/or committers at some point ;)

We should be able to get it addressed shortly.

> Allow me to emphasize that this issue basically means that for v9.6, after
> restoring a dump created with the '-c' option, one ends up in a situation
> that might be quite confusing for users who didn't have to pay much
> attention yet to handling privileges... i.e. trying even a plain select on
> table_a in the public schema as a non-system user returns something like:
> ERROR: relation "table_a" does not exist

Yes, it's unfortunate that many users aren't really familiar with
schema-level privileges.

For your specific case, if you drop/recreate the public schema in the
system that you're dumping the data from, and then set the ACLs to what
you want, they should be dumped out, even with a pg_dump -c. It's only
when you're using -c with the initdb-time public schema, and initdb-time
ACLs, that the issue arises.

Thanks!

Stephen

signature.asc Description: Digital signature
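A minimal sketch of that workaround- drop and recreate the public schema
so its ACLs are user-set rather than initdb-time (this assumes the schema
is empty, or that you're willing to use CASCADE; note that GRANT ALL on a
schema means USAGE plus CREATE):

    DROP SCHEMA public;
    CREATE SCHEMA public;
    GRANT ALL ON SCHEMA public TO PUBLIC;
    COMMENT ON SCHEMA public IS 'standard public schema';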
Re: [GENERAL] intentional or oversight? pg_dump -c does not restore default privileges on schema public
Adrian,

* Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> I am following this up to the point of not understanding what
> exactly changed between 9.5 and 9.6. Namely, 9.5 does include the
> default ACLs in the dump output and 9.6 does not.

Quite a bit in pg_dump changed, but the relevant bit here is that we now
try to include in the pg_dump output any ACLs which have been changed
from their initdb-time settings for initdb-time objects. What that
means is that if you don't change the privileges for the public schema
from what they're set to at initdb time, then we don't dump out any ACL
commands for the public schema. That ends up being incorrect in '-c'
mode because we drop the public schema in that mode and recreate it, in
which case we need to re-implement the ACLs which existed for the public
schema at initdb time.

Thanks!

Stephen

signature.asc Description: Digital signature
Re: [GENERAL] intentional or oversight? pg_dump -c does not restore default privileges on schema public
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost <sfr...@snowman.net> writes:
> > I'm not seeing a very simple answer for this, unfortunately.
>
> I'm inclined to argue that it was a mistake to include any non-pinned
> objects in pg_init_privs. The reason initdb leaves some objects unpinned
> is exactly because they can be dropped and recreated, and that means
> that their "initial" privileges are not static system properties.
>
> We might need to fix pg_dump too, but I think these entries in
> pg_init_privs should simply not be there.

Wouldn't that mean we then need to drop and recreate all of those
non-pinned objects, with their ACLs, on a -c? Or are you thinking we would
just dump out whatever ACLs they have that are different from default
object creation, meaning that we'd have ACLs for nearly all the views in
pg_catalog and information_schema included in regular pg_dump output?

Just dumping out the ACLs wouldn't necessarily get us back to the same
state as the database which was dumped out, though, as the existing
objects might already have some GRANTs performed on them.

pg_init_privs also handles the initial privileges for extension objects,
though I'm guessing you aren't including those. The initdb-time objects
are certainly similar though, in some ways, to extensions, the main
difference being that you can't really drop/recreate all of the
initdb-time objects.

It seems to me that the oddball here is the public schema, and
specifically, the public schema when the user has been using the
from-initdb public schema (which has initdb-time privileges which are
recorded into pg_init_privs). Had the user dropped and recreated the
public schema, the initdb-time privs for the original public schema in
pg_init_privs would have been removed, and the new public schema wouldn't
have had any entries in pg_init_privs.

The reason it's an oddball is that it's the only from-initdb-time object
which is dropped and recreated as part of pg_dump -c. If we dropped all
non-pinned objects and recreated them when running pg_dump -c then perhaps
what you're suggesting would work, but we don't, and I don't think that's
really what you were thinking we would do.

While it's a bit grotty, I tend to think the best approach is to continue
to assume that the only from-initdb-time object that the user actually
wants us to drop/recreate on a pg_dump -c is the public schema, which
means we need to handle things a bit differently when working in -c mode
for the public schema when pulling ACLs.

Thanks!

Stephen

signature.asc Description: Digital signature
Re: [GENERAL] intentional or oversight? pg_dump -c does not restore default privileges on schema public
Greetings,

* Frank van Vugt (ftm.van.v...@foxi.nl) wrote:
> I noticed the following and wondered whether this is intentional or an
> oversight in pg_dump's '-c' option?
>
> The clean option causes the public schema to be dropped and recreated, but
> this is done with the default schema privileges, which are not the same as
> the ones assigned during create database:

Interesting. The reason this happens is that the privileges for the public
schema aren't dumped when they are the same as what you would get from a
default install in 9.6+, but using -c will end up dropping and recreating
it, which, as you note, will end up having different privileges than the
default install because they'll be the regular default privileges of
"nothing" for schemas.

This is happening just for the public schema due to how it's handled in a
special way in pg_backup_archiver.c:_printTocEntry(). This only impacts
ACLs because those are the only things which are different for the public
schema vs. its initdb settings (there's no SECURITY LABEL, for example, on
the initdb'd public schema).

Due to how the public schema is (and always has been) handled in this
special way, this is a bug which needs to be fixed by having the default
ACLs for the public schema included in the dump output if -c is being
used.

I'm not seeing a very simple answer for this, unfortunately. I'm thinking
we're going to need to pull the public schema's permissions differently if
we're in clean mode (by comparing to the default schema privileges) vs.
when we're not (in which case we should be comparing to *public*'s
initdb-time privileges, as we do now).

One option would be to handle that by hacking up buildACLQueries() to take
a flag which basically means "we are dropping the public schema, do not
consider its pg_init_privs settings", but that strikes me as awfully
grotty. Another option would be to change getNamespaces() to run a special
query (perhaps as a UNION-ALL combination with the existing query) that is
just to get the info for the 'public' schema (and exclude the 'public'
schema from the first half of the query, of course).

Thanks for the report!

Stephen

signature.asc Description: Digital signature
Re: [GENERAL] Can we not give tyrannical pedants control of #postgresql?
Greetings,

* Merlin Moncure (mmonc...@gmail.com) wrote:
> On Thu, Jan 19, 2017 at 5:23 PM, Julian Paul wrote:
> > I hope that particular stereotypes aren't proven here, but it appears
> > #postgresql encourages a particular tier and makes one aware of its rigid
> > hierarchy. I owe a lot to #postgresql but not to these particular users;
> > I've perhaps been idle for too long and the channel has changed for the
> > worse, well, that's not my fault. I leave it with the community to sort
> > out.
>
> I haven't been on IRC much lately, but I've noticed this trend as well.

I'm on it pretty regularly, though I wasn't when the event which started
this thread happened, so I can't really speak to it and that's why I
hadn't responded.

In general, I feel like the channel is quite welcoming to newcomers, but
there is often a bit of a learning curve and when others point things out
that can sometimes be off-putting (not unlike our mailing lists..).

In any case, I'm happy to try and help out if people feel that there's
abuse of ops or inappropriate behavior.

Thanks!

Stephen

signature.asc Description: Digital signature
Re: R: [GENERAL] Partitioned "views"
Greetings,

* Job (j...@colliniconsulting.it) wrote:
> >> The mat view takes longer and longer to update because it runs the full
> >> query. What you really want to do is have a side-table that you update
> >> regularly with appropriate SQL to issue UPDATE statements for just the
> >> current day (or whatever).
>
> If correct, I leave only the latest data in the "side-table" and
> synchronize only that data into the materialized view.
> If I delete data from the side-table (e.g., TRUNCATE), will it be lost
> during the next materialized view update, or remain intact?

I was suggesting that you, essentially, write your own SQL to have a
materialized view, *not* use the PG materialized view system. In other
words, the 'side-table' that you create would be *your* materialized view,
but to PG it'd just look like a regular table. You can't modify a PG
materialized view.

Thanks!

Stephen

signature.asc Description: Digital signature
Re: [GENERAL] Partitioned "views"
Greetings,

* Job (j...@colliniconsulting.it) wrote:
> we use a materialized view to aggregate data from a very big table
> containing logs.
> The source table is partitioned, one table per day.
>
> Since the refresh of the materialized view seems to take longer and
> longer, we would like to know if it is possible to make a "partitioned
> materialized view", in order to update *only* the current day's log
> aggregation, leaving old days untouched.

You can do this, but you have to write the SQL for it yourself; there's no
current way in PG to ask for a materialized view to be partitioned.

The mat view takes longer and longer to update because it runs the full
query. What you really want to do is have a side-table that you update
regularly with appropriate SQL to issue UPDATE statements for just the
current day (or whatever).

Another approach which can be used is to have a trigger which will
automatically update the side-table for every change to the 'big' table,
but that will mean every update on the 'big' table takes longer, and if
the updates are happening concurrently then you may run into locking, and
possibly deadlocking if it's done wrong.

Thanks!

Stephen

signature.asc Description: Digital signature
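A minimal sketch of the hand-rolled side-table approach described above
(all names are hypothetical; ON CONFLICT requires 9.5+):

    CREATE TABLE webtraffic_daily_agg (
        day  date PRIMARY KEY,
        hits bigint NOT NULL
    );

    -- re-aggregate only the current day, leaving prior days untouched
    INSERT INTO webtraffic_daily_agg (day, hits)
    SELECT "timestamp"::date, count(*)
      FROM webtraffic_archive
     WHERE "timestamp" >= current_date
     GROUP BY 1
    ON CONFLICT (day) DO UPDATE SET hits = EXCLUDED.hits;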
Re: [GENERAL] Doubts regarding postgres Security
* Thomas Kellerer (spam_ea...@gmx.net) wrote:
> Stephen Frost wrote on 21.01.2017 at 22:37:
> >> So, there is no solution for my first question; we need that if users
> >> enter the wrong password more than 5 times, their account gets locked,
> >> and then only a DBA can unlock the account.
> >
> > I understood the question and there is an answer- use PAM.
>
> What about LDAP?

Perhaps, but not what I'd recommend. If you've got LDAP then you probably
have Active Directory and you should just be using GSSAPI. If you've
actually *only* got LDAP, then perhaps, but all the LDAP methods require
the user's password to be sent, in one form or another, to the PG server,
which isn't ideal, and usually there's a better option in such
environments.

Of course, PAM requires the user's password to be sent to the server also,
which is also why I don't generally recommend using it either, but without
more info ...

Thanks!

Stephen

signature.asc Description: Digital signature
Re: [GENERAL] Doubts regarding postgres Security
Pawan,

Please be sure to include the mailing list on replies, so others can
benefit from the discussion. Also, please reply in-line, as I do below,
instead of top-posting.

* PAWAN SHARMA (er.pawanshr0...@gmail.com) wrote:
> So, there is no solution for my first question; we need that if users enter
> the wrong password more than 5 times, their account gets locked, and then
> only a DBA can unlock the account.

I understood the question and there is an answer- use PAM. The other
option is to not use password-based authentication with PG and instead use
a better method such as certificate-based auth or GSSAPI.

Thanks!

Stephen

signature.asc Description: Digital signature
Re: [GENERAL] Doubts regarding postgres Security
Pawan,

* PAWAN SHARMA (er.pawanshr0...@gmail.com) wrote:
> 1. How can we set up an account-blocking feature after a max number of
> invalid password entries?

There are ways to accomplish this, but they're unfortunately complicated.
In the past, I've implemented these kinds of requirements by using the
'PAM' auth method and configuring saslauthd and PAM with various PAM
modules. Unfortunately, most PAM modules are not very well suited to
running as non-root or with alternate directories, so it can be a bit
tricky.

> 2. How can we use SSL encryption in Postgres in a Linux environment?

This can be a bit tricky too, depending on your PKI, but generally
speaking, you simply need to configure the SSL-relevant options in
postgresql.conf and then set sslmode=verify-ca (or verify-full, depending
on what you want to do). Specific documentation is available here:

https://www.postgresql.org/docs/current/static/ssl-tcp.html

> 3. How can we do transparent data encryption in Postgres?

If you mean between PG and the filesystem, you could either use full disk
encryption provided by the OS, or something like pgcrypto. Documentation
for pgcrypto is here:

https://www.postgresql.org/docs/current/static/pgcrypto.html

Thanks!

Stephen

signature.asc Description: Digital signature
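A minimal sketch of the SSL pieces for question 2 (host, database, and
file names are hypothetical, and the certificates must already exist;
with verify-full the client also needs the CA certificate, by default in
~/.postgresql/root.crt):

    # postgresql.conf, server side
    ssl = on
    ssl_cert_file = 'server.crt'
    ssl_key_file = 'server.key'

    # client side: verify the server certificate and host name
    psql "host=db.example.com dbname=appdb user=app sslmode=verify-full"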
Re: [GENERAL] PgPool or alternatives
Simon,

* Simon Windsor (simon.wind...@cornfield.me.uk) wrote:
> My employer wants to move from an in-house Oracle solution to a
> cloud-based Postgres system. The system will involve a number of
> data loaders running 24x7 feeding several Postgres databases that
> will be used by internal applications and external customer
> applications.
>
> For the record, internal and external applications make heavy use of
> temporary tables that are session-related. This requirement means I
> cannot consider normal replication methods.
>
> Is PgPool the only viable option that will allow the data loaders to
> feed [n] databases that will be functionally identical?

I'm not sure what you mean by 'functionally identical', but I wouldn't
generally consider that to be a property of pgpool (or pgbouncer, or any
other connection pooler, really).

That said, my general feeling is that pgbouncer tends to be simpler,
faster, and less likely to introduce oddities that you don't expect. The
'session' mode might work for you, though it might be debatable if that
really helps you all that much. 'transaction' mode is what I usually
recommend as it allows idle connections to be handled by pgbouncer (unlike
'session' mode), but there are caveats to using that mode, of course.

I'm a bit curious where you're thinking of using the connection pooler
also though. If you have data loaders running 24x7 feeding data constantly
to PG, do you really need a connection pooler for those? Connection
poolers make a lot of sense for environments where there's lots of
down-time on the connection, but the less down-time, the less they make
sense.

Thanks!

Stephen

signature.asc Description: Digital signature
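A minimal pgbouncer.ini sketch for the 'transaction' pool mode discussed
above (host, port, database, and file paths are hypothetical):

    [databases]
    appdb = host=127.0.0.1 port=5432 dbname=appdb

    [pgbouncer]
    listen_addr = *
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    pool_mode = transaction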
Re: [GENERAL] Combining count() and row_number() as window functions
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Thomas Kellerer writes:
> > I assumed that the count() wouldn't increase the runtime of the query as
> > the result of the row_number() can be used to calculate that.
>
> No such knowledge exists in Postgres. Given our general approach in which
> functions (including window functions) are black boxes, it's hard to see
> how it could be done in a way that wasn't an ugly kluge.

No, but what's interesting about this is that the WindowAgg count(*) query
is, apparently, quite a bit slower than the subquery with a regular
aggregate of count(*), but the WindowAgg plan is costed out as being
cheaper. I put the two up if anyone else finds that easier to read:

https://explain.depesz.com/s/bc67
https://explain.depesz.com/s/UWZt

That said, it probably doesn't matter if it was costed cheaper since I
don't think we would actually consider running an aggregate with an
'OVER ()' clause as a regular aggregate instead of as a WindowAgg. I don't
know how expensive it would be to consider such a path, but it seems like
it might not be too bad since you would only look at those cases if it's
an empty window clause, which should be cheap to check.

The other thing I wonder about is if there's some way we could make the
WindowAgg query faster through code changes in how the windowing count(*)
is called. I've not really looked at any code, this is all pure
speculation, so feel free to ignore me if I'm completely off-base here. :)

Thanks!

Stephen

signature.asc Description: Digital signature
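The two formulations being compared above, sketched against a hypothetical
table t (both return the same total when no filtering is involved):

    -- WindowAgg version
    SELECT t.*, count(*) OVER () AS total_rows FROM t;

    -- scalar-subquery version using a regular aggregate
    SELECT t.*, (SELECT count(*) FROM t) AS total_rows FROM t;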
Re: [GENERAL] Are new connection/security features in order, given connection pooling?
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Guyren Howe writes:
> > I'm not following. What I would like is just a lightweight way to switch
> > the connections to use a different role, or some moral equivalent, that
> > would prevent an SQL injection from wreaking havoc. I'm not proposing
> > anything that will change anything else about how the application is
> > using the database.
> >
> > SET ROLE doesn't work, because an SQL injection can just SET ROLE back
> > to the privileged user.
> >
> > I would like a regime where there is no permanent privileged relationship
> > between the client application and the database; a user would need to
> > supply validating information that we can use to switch a connection to
> > something with minimal privileges for just that role, for the duration of
> > one session or transaction.
>
> And what happens after that? You can't reuse the session anyway, because
> there would be no way to switch it back to a more-privileged state so you
> could use it for another user ID. (If there were a way, then a successful
> SQL-injection attack could use it too.)

That's based on an incorrect assumption: that the state which allows a
connection to be reused by another user ID is 'more privileged'.

If we had a way to require authentication for a 'SET ROLE' call then you
would be able to do what Guyren is asking about, and when you're done with
the session something issues 'RESET ROLE;' and you're back to a role that
*isn't* more privileged; to move to another user ID you have to go through
the authentication process again.

This isn't fantasy, it's been done, but it's more complicated than it
really should be, as noted up-thread. There's also complications around
how to handle the authentication credentials so you aren't constantly
asking the user to re-authenticate, but that's not impossible to deal with
either; you just need something akin to the cookie/unique ID used by every
website that authenticates users today.

> If you end up having to dedicate each connection to a particular
> unprivileged userID, then you can just open the connection as that user
> to start with; a magic one-way privilege switch doesn't really help.

Well, to that I'd say "it depends." There are certainly scenarios where
you have multiple userIDs and a connection pooler like pgbouncer which
handles the different connections to the database, and it's a lot better
than making new connections because new connections are so painful and
slow to create. This doesn't work great if you have thousands of
concurrently connected independent users, of course.

> We've discussed this problem repeatedly (you might want to search the
> archives a bit) and never found a solution that was both fully secure
> and did much of anything for connection-pooling scenarios.

I don't agree that this is unsolvable, but it would require things like
protocol-level changes which no one has had the gumption to work through
and propose.

In short, I agree with Guyren: there are features needed here that we
don't have, and it would be a great deal better if we did.

Thanks!

Stephen

signature.asc Description: Digital signature
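What the pooled-connection flow looks like with today's SET ROLE (role
names are hypothetical; the pooler's login role must be a member of the
target role, and the missing piece discussed above is that nothing stops
injected SQL from issuing the same RESET ROLE):

    SET ROLE alice;        -- take on the requesting user's privileges
    SELECT current_user;   -- now 'alice'
    RESET ROLE;            -- back to the minimally-privileged login role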
Re: [GENERAL] LYDB: Feasible to use PG roles instead of application-level security?
Greetings,

* Guyren Howe (guy...@gmail.com) wrote:
> it occurs to me to wonder whether it is practical to use PG's own roles
> and security model in lieu of using an application-level one.

The short answer is yes.

> It seems that the role system in PG is sufficient for most general
> purposes. One could presumably also have a table with role names and
> associated metainformation (email address etc) as needed.

Yup. That can get a bit awkward if you have multiple databases inside of a
single cluster, as you would have to pick which database to put that
metainformation in, but that isn't a very big issue.

> If I have a system with many thousands of users, is it practical to manage
> these users' authentication and authorization using *just* Postgres?

For this, it really depends on whether the PG authorization model matches
the requirements you have. The PG auth model, particularly with RLS, is
extremely flexible, but you would really need to evaluate what the exact
requirements are and how you would handle that with the PG auth model. Of
course, if there are just a few exceptions or complicated cases that can't
be satisfied directly with PG today, you could use security definer
functions.

One area that isn't fully addressed with the PG auth model today is
partial access to a certain column. Consider a table where you want users
to have access to all of the rows and all of the columns *except* for
column X for rows where ID is > 1000. The PG auth model today can be used
to say "you can't access column X" or to say "you can't access rows where
ID > 1000", but you can't combine those, yet. I'm hopeful that we'll get
there, as there are definitely use-cases for that kind of access control,
but it's unlikely to happen for PG10.

> It occurs to me that some client frameworks might have issues with their
> connection pools if those connections keep switching users, assuming they
> even can, but let's set that aside for now. Or perhaps every connection
> could immediately do a SET USER before executing its connection?

Again, yes, connection poolers can be an issue, but it's possible to use
the role system and do a 'set role X' after having connected as some user
that has very little access. The issue here is controlling that role
change- there's no direct way in PG today to require a password to be
provided when doing the role change, which is unfortunate.

One approach to solving that with RLS is to use a security definer
function to change a record in a table that is then used in all RLS
policies. It's a bit complicated and doesn't involve doing 'set role'
though, so there are some trade-offs there. If you really want connection
pooling and independent users in PG's role system then you'll end up
having to have the app code do the authentication (or maybe auth to PG as
the user and, if successful, reconnect as the regular user and set role...
that's pretty awkward though) and then connect and do the 'set role'.

One big question here, however, is if you're going to have thousands of
*concurrently connected* users. Thousands of users shouldn't be too much
of an issue, but if they're all connected using PG's main auth system then
you'll have thousands of backend processes running. That'll end up causing
some amount of overhead even if they're mostly idle. If your application
can handle connecting/disconnecting pretty easily and you have a
relatively short timeout (though, ideally, not too short) then perhaps
your number of concurrent connections won't be too bad.
> This seems an attractive proposition from a security standpoint: if I use
> row-level security pervasively, I can have a security system that's
> nestled nice and close to the data and presumably tricky for a hacker to
> work around, given direct access only to the client application.

If that's the threat model you want to address then you'll have to work
out the concurrent connections question. One thing which can help is to
use a common user for 'read-only/public-access (or at least low-value)'
queries from the app, if there are such.

> Is this practical? Has anyone here done it? What might the caveats be?

Yes, yes, see above.

Thanks!

Stephen

signature.asc Description: Digital signature
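A minimal RLS sketch of the approach discussed above (table, column, and
role names are hypothetical):

    CREATE TABLE documents (
        id    serial PRIMARY KEY,
        owner name   NOT NULL DEFAULT current_user,
        body  text
    );
    ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
    CREATE POLICY owner_only ON documents
        USING (owner = current_user);
    GRANT SELECT, INSERT, UPDATE, DELETE ON documents TO app_users;
    GRANT USAGE ON SEQUENCE documents_id_seq TO app_users;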
Re: [GENERAL] Is there a reason the "-v/verbose" switch is not printed with pg_dumpall --help
Daniel,

* Stephen Frost (sfr...@snowman.net) wrote:
> * Daniel Westermann (daniel.westerm...@dbi-services.com) wrote:
> > postgres@pgbox:/u01/app/postgres/local/dmk/ [PG961] pg_dumpall -V
> > pg_dumpall (PostgreSQL) 9.6.1
> > postgres@pgbox:/u01/app/postgres/local/dmk/ [PG961] pg_dumpall --help
> > pg_dumpall extracts a PostgreSQL database cluster into an SQL script file.
>
> Hmm. Looks like an oversight, will see about fixing it.

I've pushed a fix for this.

Thanks!

Stephen

signature.asc Description: Digital signature
Re: [GENERAL] Querying dead rows
* Rakesh Kumar (rakeshkumar...@outlook.com) wrote:
> Is there a way to query dead rows (that is, rows which are dead and still
> not cleaned up by vacuum) using SQL? I am asking this just as an academic
> question.

CREATE EXTENSION pageinspect;

https://www.postgresql.org/docs/9.6/static/pageinspect.html

Thanks!

Stephen

signature.asc Description: Digital signature
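A sketch of what that looks like (the table name and page number are
hypothetical; dead rows show up as tuples whose t_xmax is set):

    CREATE EXTENSION pageinspect;

    SELECT lp, t_xmin, t_xmax, t_ctid
      FROM heap_page_items(get_raw_page('mytable', 0));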
Re: [GENERAL] Is there a reason the "-v/verbose" switch is not printed with pg_dumpall --help
Daniel,

* Daniel Westermann (daniel.westerm...@dbi-services.com) wrote:
> postgres@pgbox:/u01/app/postgres/local/dmk/ [PG961] pg_dumpall -V
> pg_dumpall (PostgreSQL) 9.6.1
> postgres@pgbox:/u01/app/postgres/local/dmk/ [PG961] pg_dumpall --help
> pg_dumpall extracts a PostgreSQL database cluster into an SQL script file.

Hmm. Looks like an oversight, will see about fixing it.

Thanks!

Stephen

signature.asc Description: Digital signature
Re: [GENERAL] Recursive row level security policy
Simon,

* Simon Charette (charett...@gmail.com) wrote:
> Ahh, makes sense, thanks for the explanation!
>
> I was assuming USING() clauses were executed in the context of the
> owner of the policy, bypassing RLS.

No, as with views, a USING() clause is executed as the caller, not the
owner of the relation. Security definer functions can be used to execute
actions in the policy as another user.

Note that RLS won't be applied for the table owner either (unless the
relation has 'FORCE RLS' enabled for it), so you don't have to have
functions which are run as superuser to use the approach Joe recommended.

Thanks!

Stephen

signature.asc Description: Digital signature
Re: [GENERAL] Postgresql 9.6 and Big Data
Job,

* Job (j...@colliniconsulting.it) wrote:
> we are planning to store historical data in a PostgreSQL 9.6 table.

The question is less about what you're storing in PG and more about what
you're going to be doing with that data.

> We see from the PostgreSQL limits that it can handle "big data".
> In fact, the limit for a single table is 32TB.

Very few people would advocate storing a single, non-partitioned, 32TB
table in any database system. As with most other database technologies,
you'll want to partition up your data. The general rule of thumb is to
partition at the 100s-of-millions-of-tuples level, which tends to lead to
tables which are somewhere between 5G and 100G. At 100G per table, you
would end up with a few hundred tables to get up to 32TB, which is quite
manageable in PG. Indeed, PG can support many more tables than that, but
putting more than a few hundred into a single inheritance structure using
constraint exclusion for the partitioning can lead to long planning times.

> We need to archive this data to generate report and analysis views.

Is the historical data changing? Are the queries changing? Will you
primarily be performing queries which look at all of the data?

> Has anyone experience with PostgreSQL 9.6 and databases and tables bigger
> than four or five terabytes?

Yes.

> What hardware has been required?

This depends a great deal on what you're actually doing.

> Were there any problems or bottlenecks with data that big?

If you're looking to run queries against 4 or 5 TBs at a time where all of
the tuples have to be considered (no index-based lookups, etc), then
you'll want to be doing parallel work. With 9.6, you could try using the
parallel query capability, but even with 9.5 and earlier you could pretty
trivially write your own parallel query system by using multiple
connections to the database, and it works very well.

The only downside to using multiple queries is if you have ongoing changes
happening and you need a consistent view of the data- that's still
possible to do with multiple processes, but you have to do some prep work
ahead of time to extract out the keys for all of the records you want to
process in a given run and store them in a static side-table that the
parallel processes then use. You have to avoid removing records that are
included in that set, of course, and depending on your requirements you
might wish to avoid updating them also.

Thanks!

Stephen

signature.asc Description: Digital signature
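A minimal sketch of trying 9.6's parallel query capability mentioned above
(the table name is hypothetical, and max_worker_processes must be large
enough to actually supply the workers):

    SET max_parallel_workers_per_gather = 4;
    EXPLAIN SELECT count(*) FROM history_data;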
Re: [GENERAL] Row level security performance joining large tables
David,

* David R. Pike (david.p...@trustedconcepts.com) wrote:
> From what I can understand, the RLS implementation strives to execute
> policy checks before user-provided predicate checks so as to avoid leaking
> protected data. Is there any way to make the join look "safe" to the
> optimizer to avoid full table scans? Isn't this a common scenario?

You can use a security barrier view which is owned by the same user that
the tables underneath are owned by. That will bypass RLS on the tables
themselves, so you'll need to implement the appropriate quals in the
security barrier view.

As Tom mentions, we're working to improve RLS optimization as well. As is
pretty common with various features, the initial implementation provides
the functionality but perhaps isn't as performant as one might like, and
then we iterate and improve it in the subsequent releases.

Thanks!

Stephen

signature.asc Description: Digital signature
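A minimal sketch of the security barrier view approach (the names and the
qual are hypothetical; the view must be owned by the tables' owner for the
underlying RLS to be bypassed):

    CREATE VIEW visible_rows WITH (security_barrier) AS
        SELECT * FROM big_table
         WHERE tenant_id = current_setting('app.tenant')::int;

    GRANT SELECT ON visible_rows TO app_user;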
Re: [GENERAL] avoiding index on incremental column
* t.dalpo...@gmail.com (t.dalpo...@gmail.com) wrote:
> I have a very large table whose first column is a numeric value, starting
> from 0 at the first row and incremented by 1 with each new row inserted.
> No holes, no duplicates.
> I need to perform some very fast queries based on this value, mainly
> around the last inserted rows.
> What is the best I can do? A normal index, or is there a way to
> instruct the system to take advantage of that strong ordering?
> In theory, the number of the row is the only info the system would
> need to directly access that row.
> So I'd like to avoid useless complex indexes if possible.

A BRIN index should work pretty well in that scenario. A btree index would
most likely be better/faster for query time, but more expensive to
maintain.

Thanks!

Stephen

signature.asc Description: Digital signature
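The two options above as DDL, against a hypothetical table:

    -- tiny and cheap to maintain; relies on the strong physical ordering
    CREATE INDEX bigtable_seq_brin ON bigtable USING brin (seq);

    -- or: faster lookups, but a much larger index that costs more to maintain
    CREATE INDEX bigtable_seq_btree ON bigtable USING btree (seq);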
Re: [GENERAL] BRIN indexes and ORDER BY
Darren,

* Darren Lafreniere (dlafreni...@onezero.com) wrote:
> Tom Lane wrote:
> >> Gavin Wahl wrote:
> >>> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
> >>> just find the page range with the largest/smallest value, and then only
> >>> scan that one. Would that be hard to implement? I'm interested in
> >>> working on it if someone can give me some pointers.
> >
> > I think this proposal is fairly broken anyway. The page range with the
> > largest max-value may once have contained the largest live row, but
> > there's no guarantee that it still does. It might even be completely
> > empty. You could imagine an algorithm like this:
> >
> > 1. Find page-range with largest max. Scan it to identify live row with
> > largest value. If *no* live values, find page-range with next largest
> > max, repeat until no page ranges remain (whereupon return NULL).
> >
> > 2. For each remaining page-range whose indexed max exceeds the value
> > currently in hand, scan that page-range to see if any value exceeds
> > the one in hand, replacing the value if so.
> >
> > This'd probably allow you to omit scanning some of the page-ranges
> > in the table, but in a lot of cases you'd end up scanning many of them;
> > and you'd need a lot of working state to remember which ranges you'd
> > already looked at. It'd certainly always be a lot more expensive than
> > answering the same question with a btree index, because in no case do
> > you get to avoid scanning the entire contents of the index.

[...]

> A b-tree index would certainly be faster for ordering. But in scenarios
> where you have huge datasets that can't afford the space or update time
> required for b-tree, could such a BRIN-accelerated ordering algorithm at
> least be faster than ordering with no index?

For at least some of the common BRIN use-cases, where the rows are
inserted in order and never/very rarely modified or deleted, this approach
would work very well. Certainly, using this would be much cheaper than a
seqscan/top-N sort, for small values of 'N' relative to the number of rows
in the table, in those cases.

In general, I like the idea of supporting this, as BRIN indexes strike me
as very good for very large tables which have highly clumped data in them,
and being able to do a top-N query on those can be very useful at times.

Thanks!

Stephen

signature.asc Description: Digital signature
Re: [GENERAL] Multi tenancy : schema vs databases
* Jeff Janes (jeff.ja...@gmail.com) wrote:
> On Fri, Sep 30, 2016 at 2:06 AM, Rakesh Kumar wrote:
> > We require complete data isolation. Absolutely nothing should be shared
> > between two tenants.
>
> Then you need different clusters per tenant. Otherwise, the WAL records of
> different tenants are inextricably mingled together.

Different clusters are also required to have independent file-level
backups, independent roles, independent tablespaces, etc.

It's also far easier to move a single cluster from one system to another
to adjust for growth than to try and move an individual schema or
database.

Thanks!

Stephen

signature.asc Description: Digital signature
Re: [GENERAL] Privileges on public schema can't be revoked?
Greg,

* Greg Fodor (gfo...@gmail.com) wrote:
> Ah, I wasn't aware of the PUBLIC meta-role. Not sure if it's useful
> feedback, but I spent a lot of time digging around the web for solutions
> that would basically let me query the database to see all of the
> effective privileges for a user, and none of the solutions I found
> were able to get me to a point where this was clear, since they all
> queried against the various information schema tables that I think
> neglect to take into account the PUBLIC meta-role.

\dn+ in psql will give you the access privileges for all schemas. I'd have
to look at the "other solutions" you're referring to, but, in general, we
do not exclude the public role in any way from the access privilege
system.

The \dn+ above should return something like:

  postgres=UC/postgres+
  =UC/postgres

for the Access Privileges column for the public schema, which shows that
the 'postgres' role and the '' role (aka, 'public') have been granted both
USAGE and CREATE on that schema.

Thanks!

Stephen

signature.asc Description: Digital signature
Re: [GENERAL] Privileges on public schema can't be revoked?
Greg,

* Greg Fodor (gfo...@gmail.com) wrote:
> Apologies in advance about this since it is likely something obvious,
> but I am seeing some very basic behavior that does not make sense.
> I've tested this on a fresh build of 9.6rc1 and also 9.1.24 (just to
> see if it was a regression). After creating a test database, and a
> test user from which I revoked all privileges on the public schema,
> that user is still able to create tables in the public schema. Revokes
> on other schemas work as expected; it seems the public schema is
> treated specially.
>
> https://gist.github.com/gfodor/c360683f25f55497c8c657255fd0e0f8
>
> Any help appreciated!

The privilege on the public schema was granted to "public" and therefore
must be revoked from "public" to remove that privilege. Revoking from
"guy" doesn't have any effect.

Note that if you revoke all privileges from 'public' then only users who
have been explicitly granted access will be able to create or *use* any
objects in the public schema.

Generally, I revoke CREATE rights from the public schema, but leave USAGE
rights, as I then put trusted extensions and other tools into the public
schema.

Thanks!

Stephen

signature.asc Description: Digital signature
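The statements implied above, for reference:

    -- remove the initdb-time grant entirely
    REVOKE ALL ON SCHEMA public FROM PUBLIC;

    -- or the usual middle ground: no creating, but existing objects
    -- remain usable by everyone
    REVOKE CREATE ON SCHEMA public FROM PUBLIC;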
Re: [GENERAL] restore a specific schema from physical backup
* David Steele (da...@pgmasters.net) wrote:
> On 7/29/16 5:31 PM, Rakesh Kumar wrote:
> > Sure.
> >
> > 1 - You ran pg_basebackup on node-1 against a live cluster and store
> > it on NFS or tape.
> > 2 - Do a restore on node-2 from the backup taken in (1), but only for
> > a subset of the database (schema/database).
> > 3 - Put the cluster live on node-2 after (2) completes. Essentially the
> > cluster will now be a small subset of the cluster on node-1.
> >
> > Benefit: If I have to restore only 5% of the entire db, it should be a
> > lot faster.
>
> pgBackRest allows specified databases to be restored from a cluster backup:
>
> http://www.pgbackrest.org/user-guide.html#restore/option-db-include
>
> I know you are interested in schema-level restores but this is the
> closest thing that I know of.

We have discussed providing the ability to restore a subset of a database
from a physical backup, but it's far from trivial. Working out what files
contain the catalog requires first reading through pg_filenode.map and
then understanding the structures of the relevant catalogs. Only then will
you know what schemas and tables exist and what their relfilenodes are,
which is necessary to perform the restore of those objects.

Of course, WAL replay still has to be performed also, to reach a
consistent backup point. We've worked out how to get that to work, though
if you have a lot of WAL then that can still take a bit of time and disk
space.

With sufficient interest and resources, we might be able to make it
happen, but I wouldn't expect it near-term. Until then, at least the
database-level option, as David mentioned, can be used.

Thanks!

Stephen

signature.asc Description: Digital signature
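A sketch of that database-level restore (the stanza and database names are
hypothetical; see the user guide linked above for the authoritative
syntax):

    pgbackrest --stanza=main --db-include=appdb restore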
Re: [GENERAL] Allowing multiple versions of PG under NetBSD
* Larry Rosenman (l...@lerctr.org) wrote:
> On 2016-07-29 15:14, Bruce Momjian wrote:
> > On Fri, Jul 29, 2016 at 03:09:59PM -0500, Larry Rosenman wrote:
> >>> Data directory naming, as well as keeping the init scripts straight.
> >>
> >> And who gets 5432, and Unix socket naming; it starts to get messy.
> >
> > Well, pg_upgrade uses a socket created in the current run directory, so
> > that should be fine.
>
> If we're talking JUST about pg_upgrade, that's one thing. Peaceful
> co-existence on an ongoing basis is quite another.

It's not an insurmountable problem, though it's a bit painful. Still, both
the Debian-based and RedHat-based distributions demonstrate how it can be
done.

Thanks!

Stephen

signature.asc Description: Digital signature
Re: [GENERAL] Uber migrated from Postgres to MySQL
Jerry,

* Jerry Sievers (gsiever...@comcast.net) wrote:
> Bruce Momjian writes:
> > I agree, but I am not sure how to improve it. The big complaint I have
> > heard is that once you upgrade and open up writes on the upgraded
> > server, you can't re-apply those writes to the old server if you need to
> > fall back to the old server. I also don't see how to improve that either.
>
> Hmmm, is it at least theoretically possible that if a newly upgraded
> system were run for an interval where *no* incompatible changes to DDL
> etc had been done...
>
> ...that a downgrade could be performed?
>
> Er, using a not-yet-invented pg_downgrade :-)

The short answer is 'no'.

Consider a case like the GIN page changes- as soon as you execute DML on a
column that has a GIN index on it, we're going to rewrite that page using
a newer version of the page format, and an older version of PG isn't going
to understand it. Those kinds of on-disk changes are, I suspect, why you
have to set the "compatibility" option in the big $O product to be able to
later do a downgrade.

> That is, since higher version knew enough about lower version to
> rejigger everything... just maybe it could do the reverse.

That might work if you opened the database in read-only mode, but not once
you start making changes.

Thanks!

Stephen

signature.asc Description: Digital signature
Re: [GENERAL] pg_dumping extensions having sequences with 9.6beta3
Michael,

* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Tue, Jul 26, 2016 at 4:50 PM, Noah Misch wrote:
> > [Action required within 72 hours. This is a generic notification.]
> >
> > The above-described topic is currently a PostgreSQL 9.6 open item.
> > Stephen, since you committed the patch believed to have created it, you
> > own this open item. If some other commit is more relevant or if this
> > does not belong as a 9.6 open item, please let us know. Otherwise,
> > please observe the policy on open item ownership[1] and send a status
> > update within 72 hours of this message. Include a date for your
> > subsequent status update. Testers may discover new open items at any
> > time, and I want to plan to get them all fixed well in advance of
> > shipping 9.6rc1. Consequently, I will appreciate your efforts toward
> > speedy resolution. Thanks.
> >
> > [1] http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com
>
> I am not sure what Stephen's status is on this item, but I am planning
> to look at it within the next 24 hours.

That'd be great. It's definitely on my list of things to look into, but
I'm extremely busy this week. I hope to look into it on Friday; would be
great to see what you find.

Thanks!

Stephen

signature.asc Description: Digital signature
Re: [GENERAL] Memory usage per session
* amatv...@bitec.ru (amatv...@bitec.ru) wrote:
> > On 08/07/2016 14:11, amatv...@bitec.ru wrote:
> >> The test runs about 11K lines of code
> >> Memory usage per session:
> >> Oracle: about 5M
> >> MSSqlServer: about 4M
> >> PostgreSQL: about 160M
>
> > Visual C???
> > You will have to run PostgreSQL on a proper Unix system to test for
> > performance.
>
> Of course we understand that Unix is faster (damn fork :))
> Our current problem is memory:
> (160M vs 5M) * 100 sessions = 16G vs 0.5G
> We can just get "out of memory".
> :(((

Do you have 100 CPUs on this system which apparently doesn't have 16G of
RAM available for PG to use?

If not, you should probably consider connection pooling to reduce the
number of PG sessions to something approaching the number of CPUs/cores
you have in the system.

Thanks!

Stephen

signature.asc Description: Digital signature
Re: [GENERAL] Fastest memmove in C
* Merlin Moncure (mmonc...@gmail.com) wrote:
> Well, testing is the key here. Microbenchmarks demonstrating the value
> are not enough; proven statistically relevant benchmarks generated
> from postgres are the data points needed to make an assessment. My
> recommendation would be to dynamically link these routines into
> postgres and start running a battery of memory-heavy tests (start with
> pgbench -S on a small database). Ideally you could tease out some
> measurable benefit; license discussions and formal integration
> strategies are secondary to that IMO.

While I agree with this, I'm trying to figure out why this isn't being
incorporated into glibc instead..? Perhaps I missed it, but I didn't see a
discussion of that in the article.

I'd certainly rather rely on glibc if we can, though I know that we've
ended up implementing our own routines at times too. On the other hand, if
there's a reason the glibc folks don't want this, we should consider
that..

Thanks!

Stephen

signature.asc Description: Digital signature
Re: [GENERAL] How safe is pg_basebackup + continuous archiving?
Greetings,

* Kaixi Luo (kaixi...@gmail.com) wrote:
> We use PostgreSQL at work and we do daily backups with pg_dump. After that
> we pg_restore the dump and check the database for any data corruption. As
> the database grows, the whole pg_dump / pg_restore cycle time is quickly
> approaching 24h, so we need to change strategies.

I've found this to be a good strategy also, but it's far from perfect.
Corruption can still occur, for example, in indexes on the primary system.
Generally speaking, pg_dump doesn't exercise indexes and therefore you
won't notice if an index is corrupt.

> We've thought about using pg_basebackup + continuous archiving as an
> alternative backup process, but I have doubts regarding the safety of such
> a procedure. As far as I know, pg_basebackup is done via rsync (and we
> also archive WALs using rsync), so if by any chance disk corruption occurs
> on the master server, the corruption would be carried over to our backup
> server.

This is correct, but checksums are now available in modern versions of PG,
which will detect disk corruption. Those checksums would be carried over
to the backup server and could be verified there by using pg_dump (note
that this still wouldn't help with indexes, but you don't have coverage
there today anyway).

> How can we check for backup corruption in this case? Thank you very much.

There has been some discussion about a specific tool for checking the
checksums throughout the entire system. I don't know of anyone actively
working on that, unfortunately.

There are a number of tools available to help with online backups and
continuous archiving beyond pg_basebackup and having to hand-roll scripts.
I'm personally biased towards and prefer pgBackRest, as I helped start
that project, but there are other tools, such as barman and WAL-E, which
would still be better than trying to implement everything correctly on
your own.

Thanks!

Stephen

signature.asc Description: Digital signature
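Note that, in these versions, data checksums can only be enabled when the
cluster is created; a sketch (the data directory path is hypothetical):

    initdb --data-checksums -D /var/lib/pgsql/9.6/data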
Re: [GENERAL] WAL's listing in pg_xlog by some sql query
* Michael Paquier (michael.paqu...@gmail.com) wrote: > On Mon, Jun 6, 2016 at 9:51 PM, Stephen Frost <sfr...@snowman.net> wrote: > > * Vik Fearing (v...@2ndquadrant.fr) wrote: > >> On 03/06/16 04:32, Michael Paquier wrote: > >> > On Fri, Jun 3, 2016 at 11:23 AM, Sameer Kumar <sameer.ku...@ashnik.com> > >> > wrote: > >> >> On Fri, Jun 3, 2016 at 4:30 AM Stephen Frost <sfr...@snowman.net> wrote: > >> >>> Given the usefulness of this specific query and that it could be used > >> >>> without risk of the user being able to gain superuser access through > >> >>> it, > >> >>> I'd like to see a new function added which does not have the superuser > >> >>> check, but is not allowed to be called by public initially either. > >> > >> CREATE FUNCTION ls_dir(text) > >> RETURNS SETOF text > >> LANGUAGE sql > >> SECURITY DEFINER > >> AS 'select * from pg_ls_dir($1)'; > > > > This isn't a good idea as it allows access to a great deal more than > > just the number of xlogs. Further, as described above, it gives that > > access to everyone and not just to specific roles. > > Sure, because it is possible to pass a path string at will. In the context > of this use case, Alex could just hardcode pg_xlog and nothing else > than the list of files in this path would be leaked to a user who has > the execution grant right of this function. Part of what I was getting at is that it's better for a specific function to be designed and implemented carefully than everyone having to write their own (often with poor results, as above). For a common use-case, such as this, that function would be best provided as part of core rather than out in a contrib module or only in the mailing list archives or similar. > > This is a great example of why we should provide an explicit function > > which is documented (both in our documentation and in the documentation > > of tools like check_postgres.pl) that users can use and can GRANT access > > to for their monitoring systems which gives access to only the > > information needed- that is, the number of xlog segments. > I have been wondering for some time now about the possibility to have > at SQL level a representation of the shared memory structure > XLogCtlData instead, though there is no tracking of what is the newest > segment that has been recycled ahead, but it could be a reason to > track that as well. I've not looked into it myself, but off-hand that does seem useful. Thanks! Stephen signature.asc Description: Digital signature
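As a rough sketch of such a carefully-scoped function, using the pg_xlog_file_list name floated in this thread (it would need to be created by a superuser, and the 'monitoring' role is hypothetical):

    CREATE FUNCTION pg_xlog_file_list() RETURNS SETOF text
    LANGUAGE sql SECURITY DEFINER AS $$
        -- the path is hardcoded, so callers cannot list arbitrary directories
        SELECT pg_ls_dir('pg_xlog')
    $$;

    -- not callable by public; grant it only to the roles which need it
    REVOKE EXECUTE ON FUNCTION pg_xlog_file_list() FROM PUBLIC;
    GRANT EXECUTE ON FUNCTION pg_xlog_file_list() TO monitoring;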
Re: [GENERAL] postgres_fdw and Kerberos authentication
Jean-Marc, * Jean-Marc Lessard (jean-marc.less...@ultra-ft.com) wrote: > Stephen Frost [sfr...@snowman.net] wrote: > > The database owner operating system user has to be trusted, along with any > > superusers in the database, but if you assume those, then having PG manage > > the different Kerberos cache files > > (one for each backend which has authenticated via Kerberos and passed > > through delegation credentials) should work. > > Clearly, we can't give the user control over which credential cache to use. > > True, in such a case (single sign on) the user should not specify a user in > the user mapping, so that its own Kerberos ticket be used to authenticate. I don't know that it's actually an issue for the user to specify the mapping- either it'll be allowed or not, based on the credentials in the Kerberos cache and pg_ident mappings. What we can't do is allow the user to control which cache they are able to use. In other words, there should be one credential cache per backend process and that holds exactly the credentials which are forwarded from the client system. > > Having to trust the OS user and superusers with those credentials isn't any > > different from using passwords with postgres_fdw. > > OS user and superusers, should not have access and allowed to manage the > credential files. This isn't possible with traditional Unix permissions. Perhaps something could be done with SELinux, but we're not going to depend on that. Ultimately, the credential cache must be available to the backend process, which runs as the OS user. The PG superuser can execute arbitrary commands as the OS user, so there isn't any distinction between the OS user and the PG superuser. As mentioned up-thread, this is exactly the same as Apache, except that Apache happens to run as root whereas we run as a non-root user. > For example, in a secure environment with separation of duties at the > organization level (tier1, tier3, superuser, sys admins, etc), the tier1 DB > users cannot connect onto the DB server (as OS user), but may move data form > one database to another. Sure, I assumed that we were discussing a case where DB users connect to the database, not log on to the DB server as an OS user. > I agree that tier1 users cannot query the catalog and see other user > password, but a superuser can, which is considered a security breach by > auditors. > Storing a password in plain text even for a short period of time is > unfortunately not authorized. Agreed. This isn't the same as a Kerberos credential cache, but it's not as far different as one might assume either. The superuser will be able to access the credential cache of anyone who has forwarded their Kerberos ticket to the server, which is the same for any environment that allows Kerberos credential proxying. Thanks! Stephen signature.asc Description: Digital signature
Re: [GENERAL] WAL's listing in pg_xlog by some sql query
* Vik Fearing (v...@2ndquadrant.fr) wrote: > On 03/06/16 04:32, Michael Paquier wrote: > > On Fri, Jun 3, 2016 at 11:23 AM, Sameer Kumar <sameer.ku...@ashnik.com> > > wrote: > >> On Fri, Jun 3, 2016 at 4:30 AM Stephen Frost <sfr...@snowman.net> wrote: > >>> Given the usefulness of this specific query and that it could be used > >>> without risk of the user being able to gain superuser access through it, > >>> I'd like to see a new function added which does not have the superuser > >>> check, but is not allowed to be called by public initially either. > > CREATE FUNCTION ls_dir(text) > RETURNS SETOF text > LANGUAGE sql > SECURITY DEFINER > AS 'select * from pg_ls_dir($1)'; This isn't a good idea as it allows access to a great deal more than just the number of xlogs. Further, as described above, it gives that access to everyone and not just to specific roles. This is a great example of why we should provide an explicit function which is documented (both in our documentation and in the documentation of tools like check_postgres.pl) that users can use and can GRANT access to for their monitoring systems which gives access to only the information needed- that is, the number of xlog segments. Thanks! Stephen signature.asc Description: Digital signature
Re: [GENERAL] psql remote shell command
* David G. Johnston (david.g.johns...@gmail.com) wrote: > On Thu, Jun 2, 2016 at 7:30 PM, Dennis wrote: > > Is it possible to execute a command on the system that is hosting postgresql > > remotely using psql or other mechanism? I know I can use \! in psql but > > that executes the commands on the host where I am running psql from. Also, > > is it possible for a postgres login/user to stop or restart a running > > postgres instance from inside psql or similar client. e.g. psql -c “pg_ctl > > stop -D some_data_dir” > > ssh user@hostname ? > > You should ask your system administrator for help. Odds are if you cannot > do "ssh user@hostname" then the person hosting the server doesn't want you > to be able to execute arbitrary commands on the host. To be fair, if the administrator really didn't want you to have access to the postgres unix user account on the system, they shouldn't give you a PG superuser account. Thanks! Stephen signature.asc Description: Digital signature
Re: [GENERAL] psql remote shell command
* Dennis (denn...@visi.com) wrote: > Is it possible to execute a command on the system that is hosting postgresql > remotely using psql or other mechanism? I know I can use \! in psql but that > executes the commands on the host where I am running psql from. Also, is it > possible for a postgres login/user to stop or restart a running postgres > instance from inside psql or similar client. e.g. psql -c “pg_ctl stop -D > some_data_dir” You can use COPY with PROGRAM, assuming you're a superuser. COPY (select 1) TO PROGRAM 'whatever command'; You could use that to issue a pg_ctl stop, though you'll then lose your connection to the database. You can 'reload' the running configuration by using: select pg_reload_conf(); Not all parameters can be changed while PG is running, but most of them can be. Parameters in postgresql.conf can be updated via ALTER SYSTEM through psql also. Issuing a 'pg_ctl restart' via COPY PROGRAM isn't a good idea and may not work, though I suppose you could try if you really wish to. Thanks! Stephen signature.asc Description: Digital signature
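For example (the data directory path is hypothetical, and remember that stopping the server also terminates the session issuing the command):

    -- superuser only: the command runs as the OS user PG runs as
    COPY (SELECT 1) TO PROGRAM 'pg_ctl stop -m fast -D /path/to/data';

    -- change a setting in postgresql.auto.conf, then reload it
    ALTER SYSTEM SET log_min_duration_statement = '500ms';
    SELECT pg_reload_conf();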
Re: [GENERAL] WAL's listing in pg_xlog by some sql query
David, * David G. Johnston (david.g.johns...@gmail.com) wrote: > On Thu, Jun 2, 2016 at 4:29 PM, Stephen Frost <sfr...@snowman.net> wrote: > > > * Sameer Kumar (sameer.ku...@ashnik.com) wrote: > > > On Fri, 3 Jun 2016, 12:14 a.m. Alex Ignatov, <a.igna...@postgrespro.ru> > > > wrote: > > > > Can I list all WAL files in pg_xlog by using some sql query in > > Postgres? > > > > > > Try > > > > > > Select pg_ls_dir('pg_xlog'); > > > > Note that this currently requires superuser privileges. > > > > Given the usefulness of this specific query and that it could be used > > without risk of the user being able to gain superuser access through it, > > I'd like to see a new function added which does not have the superuser > > check, but is not allowed to be called by public initially either. > > > > Something along the lines of 'pg_xlog_file_list()', perhaps. There is a > > check in check_postgres.pl which could take advantage of this also. > > Should be a very straight-forward function to write, perhaps good as a > > starter project for someone. > > Isn't this the reason we created the newfangled pg_* roles in 9.6? No, the default roles are specifically to address situations where our GRANT system is unable to provide the privilege granularity necessary; ie: the function needs to be executable by 'public' but should behave differently depending on whether the individual calling it has privileged access or not. In other words, a case like pg_cancel_backend/pg_terminate_backend, where users can cancel queries of roles they are a member of, superusers can cancel queries of all roles, and members of pg_signal_backend can cancel queries for all non-superusers. In this case, I think we'd want a whole new function, in which case it does not need to be callable by a non-privileged individual and does not need to distinguish between a non-privileged user, a privileged user, and superuser. Technically, we could have the pg_ls_dir() function check its argument and decide to allow it if some new default role 'pg_allow_xlog_ls' existed and the user was a member of it, but that strikes me as a whole lot of unnecessary complexity and potential for issue, not to mention that it certainly wouldn't be very straight-forward to document or explain to users. The suggested function would also be able to take additional arguments, or maybe a second column in the result set, to extract/identify subsets of xlogs ("xlogs waiting to be archived via archive_cmd", "xlogs being held due to wal_keep_segments", etc). Thanks! Stephen signature.asc Description: Digital signature
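For the pg_signal_backend case above, the grant itself is plain SQL in 9.6 (the 'monitoring' role name is made up):

    -- members may cancel or terminate non-superuser backends
    GRANT pg_signal_backend TO monitoring;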
Re: [GENERAL] WAL's listing in pg_xlog by some sql query
* Sameer Kumar (sameer.ku...@ashnik.com) wrote: > On Fri, 3 Jun 2016, 12:14 a.m. Alex Ignatov wrote: > > Can I list all WAL files in pg_xlog by using some sql query in Postgres? > > Try > > Select pg_ls_dir('pg_xlog'); Note that this currently requires superuser privileges. Given the usefulness of this specific query and that it could be used without risk of the user being able to gain superuser access through it, I'd like to see a new function added which does not have the superuser check, but is not allowed to be called by public initially either. Something along the lines of 'pg_xlog_file_list()', perhaps. There is a check in check_postgres.pl which could take advantage of this also. Should be a very straight-forward function to write, perhaps good as a starter project for someone. Thanks! Stephen signature.asc Description: Digital signature
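In the meantime, a superuser can count the segments directly; the regex keeps only WAL segment names (24 hex characters) and filters out entries such as the archive_status directory:

    SELECT count(*) AS xlog_segments
      FROM pg_ls_dir('pg_xlog') AS f
     WHERE f ~ '^[0-9A-F]{24}$';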
Re: [GENERAL] postgres_fdw and Kerberos authentication
* Tom Lane (t...@sss.pgh.pa.us) wrote: > Jean-Marc Lessard writes: > > A nice way to meet security requirements would be to provide single sign on > > support for the postgres_fdw. > > As long as you have defined a user in the source and destination databases, > > and configure the Kerberos authentication you should be able to use > > postgres_fdw. > It's not really that easy, because postgres_fdw (like the server in > general) is running as the database-owner operating system user. > How will you associate a Postgres role that's responsible for a > particular connection request with some Kerberos credentials, > while keeping it away from credentials that belong to other roles? That's actually not that difficult and is something which Apache and mod_auth_kerb have been doing for a very long time. > This is certainly something that'd be useful to have, but it's not > clear how to do it in a secure fashion. The database owner operating system user has to be trusted, along with any superusers in the database, but if you assume those, then having PG manage the different Kerberos cache files (one for each backend which has authenticated via Kerberos and passed through delegation credentials) should work. Clearly, we can't give the user control over which credential cache to use. Having to trust the OS user and superusers with those credentials isn't any different from using passwords with postgres_fdw. Thanks! Stephen signature.asc Description: Digital signature
Re: [GENERAL] Clarify "allow_system_table_mods"
* Melvin Davidson (melvin6...@gmail.com) wrote: > Hmmm, if you go back a few comments, you will note that per initdb --help > there is no such option available. It's not an option *to* initdb, it's an option which is used *by* initdb. I'm afraid I'm done with this particular discussion. Hopefully its purpose is now clear and you understand a bit better what is required to actually add a column to pg_class. Thanks! Stephen signature.asc Description: Digital signature
Re: [GENERAL] Clarify "allow_system_table_mods"
Melvin, * Melvin Davidson (melvin6...@gmail.com) wrote: > So in essence "Allows modification of the structure of system tables" does > NOT allow _structural_ changes (probably only dml changes) > and the documentation should be changed to clarify. That would imply that other changes are acceptable. That is not the case. Certain DML changes could cause crashes too. The documentation is pretty clear that this option is for initdb, and not anyone else. I'm unconvinced that we need anything more. Thanks! Stephen signature.asc Description: Digital signature
Re: [GENERAL] Clarify "allow_system_table_mods"
* Melvin Davidson (melvin6...@gmail.com) wrote: > Then could you please clarify exactly what structural mods are permitted by > "Allows modification of the structure of system tables"? I would say, in short, those modifications which are implemented by PG developers who know what's safe to do with the catalog tables. Beyond that, I'm afraid you'd need to read the source code. I don't know offhand the complete set of "what's safe to do", though it's probably a pretty short list and certainly doesn't include adding columns. Ultimately, that switch isn't for end users to use to modify the catalogs. If you'd like to modify the structure of pg_class, you would start by looking at src/include/catalog/pg_class.h, though there's quite a few other bits that would need to change too. Thanks! Stephen signature.asc Description: Digital signature
Re: [GENERAL] Clarify "allow_system_table_mods"
David, Melvin, * David G. Johnston (david.g.johns...@gmail.com) wrote: > On Monday, April 25, 2016, Melvin Davidson wrote: > > I need clarification on allow_system_table_mods parameter > > Per the documentation: > > "Allows modification of the structure of system tables." This is used by > > initdb. This parameter can only be set at server start. > > However, attempting to modify pg_class to add another column fails with > > "STATUS_ACCESS_VIOLATION" as below. > > So either only certain system catalogs may be changed, or only certain > > types of structure changes are allowed. There might be some things about system catalogs you can modify, but generally speaking, you can't add a column or otherwise change the structure. System catalogs are represented in memory by C structures (and more), so it's not too surprising that adding a column causes a crash. > If it's going to prevent the action it should be able to do so without > crashing the server, and I doubt it's intended to be opinionated aside from > covering all DDL. What little there is on Google seems to support this. Sorry, but as a superuser, there's a lot of things you can do to crash the server, this is just one way. The effort required to prevent anything bad from happening when a user is running as a superuser is far from trivial. In short, no, you can't just add a column to pg_class via SQL, and I don't think we're going to be very interested in trying to "fix" such cases. Thanks! Stephen signature.asc Description: Digital signature
Re: [GENERAL] Enhancement Request
Rob, * Rob Brucks (rob.bru...@rackspace.com) wrote: > I'd like to propose two enhancements to the PostgreSQL code, but I'm not sure > if this is the correct mailing list. So if it's not then please let me know > where I need to post this. This is the correct place. I don't know why people are suggesting third party sites, but the correct place is -general, as you've done, which is fantastic. > These are monitoring-centric enhancement requests since I'm trying to > implement accurate monitoring in a secure fashion. I've been working on exactly this problem and 9.6 will (finally) have the start of work to improve PostgreSQL in this area. Your thoughts and use-cases are exactly what we need to continue that effort and get to a point where monitoring solutions can depend on PostgreSQL to provide the features, capabilities, and information which they need, without requiring the monitoring user to be a superuser. > * General monitoring: > We have a need for a "monitoring" role in PostgreSQL that has read-only > access to any "pg_stat" view. As of 9.4, only a super-user can read all > columns of "pg_stat_activity", "pg_stat_replication", and "pg_stat_archiver" > (there may be other restricted views as well). These views provide critical > insight on how well the cluster is operating and what is going on. That was proposed and was called 'pg_monitor'. Unfortunately, through a lack of support and questions about such a role possibly being "too broad", it ended up not being included for 9.6. I'd very much like to work through those issues and find a solution for post-9.6 (note that we are past the feature freeze point for 9.6, so any further changes will be for later versions). > * Streaming Replication Monitoring: > Make the "pg_stat_replication" view more persistent (maybe keep the rows for > 24 hours or have a registration process?). I believe this is improved when working with replication slots in recent versions. > If anyone can provide insight on how I could accomplish these in a simple > manner by other means then I'm all ears! Please continue to engage with the PostgreSQL community on these issues. I agree that these are critical features which we really need to have and will continue to work on them, but support from users, particularly with detailed real-world use-cases, goes a very long way. Thanks! Stephen signature.asc Description: Digital signature
Re: [GENERAL] Postgresql 9.3.4 file system compatibility
* Albe Laurenz (laurenz.a...@wien.gv.at) wrote: > Marllius wrote: > > OCFS2 = oracle cluster file system 2 > I think using OCFS2 for PostgreSQL data is a good idea if you want > to be the first at something or try to discover bugs in OCFS2. I've found that OCFS2 is a very decent clustered filesystem for smallish environments (4 or 8 nodes). We had trouble with larger clusters. > Why do you want a cluster file system for PostgreSQL? You cannot > have more than one server access the same data at the same time > anyway. This is probably the better question. Using a clustered filesystem can be handy for redundancy, and you could use OCFS2 or DRBD for that, but don't expect to be able to run multiple PG servers concurrently from the same set of data files. Thanks! Stephen signature.asc Description: Digital signature
Re: [GENERAL] Really unique session ID - PID + connection timestamp?
* Christoph Berg (c...@df7cb.de) wrote: > Re: Durumdara 2016-04-09 >> > In MS the session id is smallint, so it can repeats after server restarts, > > but my coll. found a "session creation timestamp". > > This is a key which unique. > > With this we can check for died sessions and we can clean their records. > > > > We want create same mechanism. > > I know there are adv. locks in PG, but I want to use session id. > > > > This could be: > > pg_backend_pid() > > > > May pid repeats. > > Where I can get timestamp or some other unique data with I can create a > > combined primary key? > > Btw, what you are describing is exactly what %c in log_line_prefix > does. That's not currently exposed at an SQL level anywhere though, is it? Perhaps we should add a way to get that. Another thought would be to essentially expose 'log_line_prefix()' at the SQL level but I'm not sure that generally makes sense. We could also have an explicit function to get MyStartTime, similar to pg_backend_pid(). Thanks! Stephen signature.asc Description: Digital signature
Re: [GENERAL] Really unique session ID - PID + connection timestamp?
Greetings, * Durumdara (durumd...@gmail.com) wrote: > Where I can get timestamp or some other unique data with I can create a > combined primary key? Not entirely sure about the rest of it, but if you want backend start time, you can look at pg_stat_activity (which also happens to have the pid). If you want just your own, combine it with pg_backend_pid, as in: select pid || ',' || backend_start from pg_stat_activity where pid = pg_backend_pid(); Thanks! Stephen signature.asc Description: Digital signature
Re: [GENERAL] Only owners can ANALYZE tables...seems overly restrictive
David, * David G. Johnston (david.g.johns...@gmail.com) wrote: > On Thu, Mar 24, 2016 at 4:51 AM, Stephen Frost <sfr...@snowman.net> wrote: > > I don't see any reason why the patch itself would be terribly difficult, > > but are we sure we'd want just ANALYZE and not VACUUM also? Which would > > have to be another bit, since those are pretty different actions. > In the limited experience that prompted this request, the benefit of > performing a VACUUM is significantly less than the benefit of performing > ANALYZE, and the cost of the former is considerably higher. I'm quite > content to leave VACUUM decisions to the auto-vacuum process which balances > the benefit of removing bloat with the I/O cost of doing so. I guess I don't entirely follow that logic. autovacuum, even though its name doesn't imply it, is *also* quite responsible for ensuring that ANALYZE is done regularly on the tables and even has options to control when ANALYZE is run which would allow it to run more frequently than vacuums. Further, a lot of ETL processes could have very good reason to want to run a VACUUM, especially with the changes that we continue to make which make that process less and less expensive an operation to run. > > The question really is- what other things might we want as grantable > > rights in the future? Once these 16 bits are gone, it's a whole bunch > > of work to get more. > If I am reading parsenodes.h correctly we presently use only 12 of 16 bits > and those that are present all seem ancient. With no other existing need > to add a single additional grantable option, let alone 4, I'm not seeing this > as being particularly concerning. They're not all ancient- TRUNCATE was added not that long ago and took quite a few years of convincing before it was accepted (I asked for it when I first started working on PG, some 15-or-so years ago and it wasn't actually included until 3 or 4 years ago, iirc). Further, as we add new features, new kinds of GRANTs can be needed. Consider the case of auditing, for example. When we finally get around to adding support for proper in-core auditing, it may be desirable for individuals other than the owner of a relation to be able to control the auditing of the table. > Let someone else argue for inclusion of VACUUM before considering adding it > - all I believe that we need is ANALYZE. I want programs doing ETL to be > able to get the system into "good-enough" shape to be functional; > maintenance processes can deal with the rest. ANALYZE is a maintenance process too, really, so I don't entirely buy your argument here. Either we support having these maintenance-type actions being performed by non-owners, or we don't and encourage everyone to configure autovacuum to meet their needs. Thanks! Stephen signature.asc Description: Digital signature
Re: [GENERAL] Only owners can ANALYZE tables...seems overly restrictive
David, * David G. Johnston (david.g.johns...@gmail.com) wrote: > Which means that, aside from effort, the main blocking factors here are > code complexity (which I understand) and limited grant "bits" as Stephen > puts it. So I pose the question: do any of the committers consider a grant > bit too valuable to consume on an ANALYZE grant? I wasn't referring to "bits" as "things" to do but rather as actual zeros and ones- AclMode is a 32-bit integer, of which the bottom half are 'regular' grantable rights and the top half are "with grant option" indications, meaning we've only got 16 to work with, and every object uses AclMode, so we have to support *all* kinds of GRANTs with those 16 bits. See src/include/nodes/parsenodes.h, around line 63. > If that and/or general code complexity means this will not be added even if > a patch was proposed for 9.7 then I'll move on and institute one of the > hacks that has been proffered. Otherwise I have (more than) half a mind to > find some way to get a patch written. I don't see any reason why the patch itself would be terribly difficult, but are we sure we'd want just ANALYZE and not VACUUM also? Which would have to be another bit, since those are pretty different actions. The question really is- what other things might we want as grantable rights in the future? Once these 16 bits are gone, it's a whole bunch of work to get more. Thanks! Stephen signature.asc Description: Digital signature
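Purely to illustrate that layout (the real defines live in src/include/nodes/parsenodes.h, as noted above): a privilege occupies one bit in the low 16, and its "with grant option" twin sits exactly 16 bits higher:

    -- e.g. the bit in position 0 and its grant-option counterpart
    SELECT to_hex(1 << 0)         AS priv_bit,          -- 1
           to_hex((1 << 0) << 16) AS grant_option_bit;  -- 10000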
Re: [GENERAL] UPSERT and HOT-update
Daniel, * CHENG Yuk-Pong, Daniel (j16s...@gmail.com) wrote: > I am doing a INSERT...ON CONFLICT...UPDATE.. on a table. The query is > mostly-UPDATE and does not change any columns most of the time, like > so: > > CREATE INDEX ON book(title); > INSERT INTO book (isbn, title, author, lastupdate) VALUES ($1,$2,$3, now()) > ON CONFLICT (isbn) DO UPDATE set title=excluded.title, > author=excluded.author, lastupdate=excluded.lastupdate; > > PostgreSQL seems to consider the title as changed and refused to do a > HOT-update. It works if I remove the `title=...` part. > > Are there any tricks to make it smarter? The title don't change most > of the time after all. If it's really that infrequent for the title to change, you could do something like: insert into book (isbn, title, author, lastupdate) values ('$1','$2', '$3', now()) on conflict (isbn) do update set author=excluded.author, lastupdate = excluded.lastupdate where book.title = excluded.title; and if that doesn't change any rows then the title did change and you need to run the command you have above. What might be kind of neat would be to have multiple UPDATE clauses allowed for the INSERT .. ON CONFLICT DO UPDATE and then you could have different WHERE clauses and do it all in one command. Another interesting idea would be a different kind of 'UPDATE SET' operation (maybe '*=' or something?) which says "only change this if the value actually changed." There's clearly a lot of cases where that use-case is desired. Thanks! Stephen signature.asc Description: Digital signature
Re: [GENERAL] Only owners can ANALYZE tables...seems overly restrictive
* David G. Johnston (david.g.johns...@gmail.com) wrote: > On Mon, Feb 29, 2016 at 6:31 AM, Stephen Frost <sfr...@snowman.net> wrote: > > > * David G. Johnston (david.g.johns...@gmail.com) wrote: > > > Given the amount of damage a person with write access to a table can get > > > into it seems pointless to not allow them to analyze the table after > > their > > > updates - since best practices would say that normal work with a table > > > should not be performed by an owner. > > > > > > I think the check for whether a given user can or cannot analyze a table > > > should be whether the user has INSERT, UPDATE, or DELETE privileges. > > > > Realistically, ANALYZE is a background/maintenance task that autovacuum > > should be handling for you. > > Then my recent experience of adding a bunch of records and having the > subsequent select query take forever because the table wasn't analyzed is > not supposed to happen? What am I doing wrong then that autovacuum didn't > run for me? Perhaps nothing. Making autovacuum more aggressive is a trade-off and evidently there weren't enough changes or perhaps not enough time for autovacuum to realize it needed to kick in and re-analyze the table. One thought about how to address that might be to have a given backend, which is already sending stats info to the statistics collector, somehow also bump autovacuum to wake it up from its sleep to go analyze the tables just modified. This is all very hand-wavy as I don't have time at the moment to run it down, but I do think it'd be good to reduce the need to run ANALYZE by hand after every data load. > > > I suppose row-level-security might come into play here... > > > > Yes, you may only have access to a subset of the table. > > > > > TBH, since you cannot see the data being analyzed I don't see a security > implication here if you allow someone to ANALYZE the whole table even when > RLS is in place. I wasn't looking at it from a security implication standpoint as I suspect that any issue there could actually be addressed, if any exist. What I was getting at is that you're making an assumption that any user with DML rights on the table also has enough information about the table overall to know when it makes sense to ANALYZE the table or not. That's a bit of a stretch to begin with, but when you consider that RLS may be involved and the user may only have access to 1% (or less) of the overall table, it's that much more of a reach. > If we had plenty more bits to allow ANALYZE to be independently > > GRANT'able, then maybe, but those are a limited resource. > > > > The planner and system performance seems important enough to give it such > a resource. But as I stated initially I personally believe that a user > with INSERT/DELETE/UPDATE permissions on a table (maybe require all three) > should also be allowed to ANALYZE said table. I don't think requiring all three would make any sense and would, instead, simply be confusing. I'm not completely against your general idea, but let's keep it simple. Thanks! Stephen signature.asc Description: Digital signature
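Until something like that exists, per-table storage parameters are one way to make auto-ANALYZE kick in sooner after a load (the table name and numbers here are hypothetical; this is exactly the aggressiveness trade-off mentioned above):

    ALTER TABLE etl_target SET (
        autovacuum_analyze_scale_factor = 0.02, -- re-analyze after ~2% churn
        autovacuum_analyze_threshold    = 500
    );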
Re: [GENERAL] Only owners can ANALYZE tables...seems overly restrictive
* David G. Johnston (david.g.johns...@gmail.com) wrote: > Given the amount of damage a person with write access to a table can get > into it seems pointless to not allow them to analyze the table after their > updates - since best practices would say that normal work with a table > should not be performed by an owner. > > I think the check for whether a given user can or cannot analyze a table > should be whether the user has INSERT, UPDATE, or DELETE privileges. Realistically, ANALYZE is a background/maintenance task that autovacuum should be handling for you. > I suppose row-level-security might come into play here... Yes, you may only have access to a subset of the table. If we had plenty more bits to allow ANALYZE to be independently GRANT'able, then maybe, but those are a limited resource. Thanks! Stephen signature.asc Description: Digital signature
Re: [GENERAL] "plan should not reference subplan's variable" when using row level security
* Adam Guthrie (asguth...@gmail.com) wrote: > On 24 February 2016 at 20:27, Stephen Frost <sfr...@snowman.net> wrote: > > Yeah, looks like a bug to me. My gut reaction is that we're pulling up > > a subquery in a way that isn't possible and that plan shouldn't be > > getting built/considered. > Thanks - shall I go ahead and submit a bug report? Sure. > > As a work-around, until we fix it, you could create an sql function to > > check for the existence of the id in 'a' and use that in the policy > > definition. > I've also discovered that using the following policy instead > > CREATE POLICY a_select ON b FOR SELECT > USING ( a_id IN (SELECT id FROM a) ); > > also seems to work around the issue. Yes, that also works, but it could get painful if 'a' gets large. An SQL function like: select exists (select * from a where a.id = $1); would still use an indexed lookup against 'a'. Thanks! Stephen signature.asc Description: Digital signature
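Spelling that suggestion out against the a/b tables from this thread (the function name is made up, and it assumes the querying role can read 'a'):

    CREATE FUNCTION id_in_a(int) RETURNS boolean
    LANGUAGE sql STABLE
    AS $$ SELECT EXISTS (SELECT 1 FROM a WHERE a.id = $1) $$;

    DROP POLICY IF EXISTS a_select ON b;
    CREATE POLICY a_select ON b FOR SELECT
        USING ( id_in_a(a_id) );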
Re: [GENERAL] "plan should not reference subplan's variable" when using row level security
Adrian, * Adrian Klaver (adrian.kla...@aklaver.com) wrote: > I started to work through this when I realized the > permissions/attributes of the role test are not shown. This seems to > be important as the UPDATE example works if you run it immediately > after: > > INSERT INTO b (id, a_id, text) VALUES (1, 1, 'one'); Not sure what you mean- the necessary permissions for the test role are just the 'GRANT ALL ON ALL TABLES' which is included. If you run the UPDATE immediately after the INSERT, then it's before that GRANT and, more importantly, before the 'SET ROLE', meaning that you're running it as the table owner, and the policy is ignored (policies are not applied to the owner of the table, unless FORCE RLS is used). Thanks! Stephen signature.asc Description: Digital signature