[GENERAL] Migration to pglister - Before

2017-11-20 Thread Stephen Frost
Greetings,

We will be migrating these lists to pglister in the next few minutes.

This final email on the old list system is intended to let you know
that future emails will have different headers and you will need to
adjust your filters.

The changes which we expect to be most significant to users can be found
on the wiki here: https://wiki.postgresql.org/wiki/PGLister_Announce

Once the migration of these lists is complete, an 'after' email will be
sent out.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Query on pg_settings view

2017-11-15 Thread Stephen Frost
Greetings,

* Laurenz Albe (laurenz.a...@cybertec.at) wrote:
> Ronen Nofar wrote:
> > I have a weird case when running a query on  the pg_settings view.
> > I have two users, first one is the default user - postgres which is a 
> > superuser
> > and another one is a role which i had created, i called it test_role and 
> > it's not a superuser.
> > When I run a select on pg_settings with these two users I receive different 
> > results.
> 
> That is intentional, because some settings should only be
> visible for superusers, for example everything that has to
> do with the operating system (location of configuration file
> or socket directories).

This isn't quite correct any longer- with PG10, we have a default role
called 'pg_read_all_settings' which can be GRANT'd to other roles to
allow viewing of all settings, even those previously restricted to
superuser-only.
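
For example, with the role from your message (assuming it's still called
test_role), something along these lines should do it:

    -- PG10 and later: allow test_role to see all settings in pg_settings,
    -- including those otherwise shown only to superusers
    GRANT pg_read_all_settings TO test_role;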

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] missing public on schema public

2017-11-14 Thread Stephen Frost
Tom, all,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Bo Thorbjørn Jensen  writes:
> > I have some additional info and a fix.
> > Firstly steps to reproduce:
> 
> Yeah, I can reproduce this.  I suspect it got broken by Stephen's hacking
> around with default ACLs.  A simple example is

Yes, it's related to the work I did with pg_dump's ACL handling, because
we're no longer just always including the whole revoke/grant set of ACLs
for everything in the output.

> $ pg_dump -c -U postgres postgres | grep -i public
> DROP SCHEMA public;
> -- Name: public; Type: SCHEMA; Schema: -; Owner: postgres
> CREATE SCHEMA public;
> ALTER SCHEMA public OWNER TO postgres;
> -- Name: SCHEMA public; Type: COMMENT; Schema: -; Owner: postgres
> COMMENT ON SCHEMA public IS 'standard public schema';
> -- Name: public; Type: ACL; Schema: -; Owner: postgres
> GRANT ALL ON SCHEMA public TO PUBLIC;
> 
> That's fine, but if I shove it through an archive file:

This works because I added a check into pg_dump.c based on whether the
output is a 'clean' (-c) one (and therefore whether the public schema is
being recreated or not).

In hindsight, that wasn't really the right thing to do because it ends
up only working when pg_dump is run with -c and doesn't consider the
case where pg_dump is run without -c but pg_restore is.

> $ pg_dump -f p.dump -Fc -U postgres postgres
> 
> $ pg_restore -c p.dump | grep -i public

This doesn't work because pg_dump isn't run with -c, while pg_restore
is.  If the archive is created with pg_dump -c (as the plain-text dump
further above was), then the results match up between the two runs.  Note
also that if pg_dump is
run with -c then a pg_restore without -c would actually still include
the GRANT statement, which isn't really correct either.

That's obviously a change from what we had before and wasn't
intentional.

> This is *REALLY BAD*.  Quite aside from the restore being wrong,
> those two sequences should never ever give different results.
> Stephen, you put some filtering logic in the wrong place in pg_dump.

I do wish it was that simple.

Unfortunately, the public schema is just ridiculously special: it's a
'user' object but is created by initdb, it's got special non-default ACLs
on it, and there's explicit special code to skip over it when a restore is
happening, unless -c is used.

What I'm afraid we need to do here is basically continue to hack on that
code in pg_backup_archiver.c's _printTocEntry() to teach it to issue the
default GRANT ALL ON SCHEMA public TO PUBLIC; when we are processing the
TOC entry for CREATE SCHEMA public;.

That would make the recreation of the public schema when pg_dump or
pg_restore is being run with -c actually match how the public schema is
created by initdb, and the rest would end up falling into place, I
think.
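
In other words, when -c is in play, a sketch of what would need to be
emitted for the public schema to end up matching a freshly initdb'd
cluster (these are all statements already shown in the plain pg_dump
output above):

    DROP SCHEMA public;
    CREATE SCHEMA public;
    COMMENT ON SCHEMA public IS 'standard public schema';
    GRANT ALL ON SCHEMA public TO PUBLIC;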

One complication, however, is what happens when a user drops and
recreates the public schema.  If that's done, we'll end up not dumping
out the delta from the public schema's initial ACLs, which wouldn't be
correct if you're restoring into a newly initdb'd cluster.  I'm thinking
that we need to forcibly look at the delta from
public-as-installed-by-initdb and whatever-public-is-now, regardless of
if the public schema was recreated by the user or not, because on
restore we are expecting a newly initdb'd cluster with the public schema
as originally installed (or as installed by pg_dump/pg_restore following
the logic above).

I'll play around with this approach and see if things end up working out
in a better fashion with it.  Baking this knowledge into
pg_backup_archiver.c is certainly ugly, but handling of public has
always been hard-coded into that, and we even added more special
handling to that code 10 years ago to deal with the COMMENT on the
public schema, so this is really just more of the same.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Roles inherited from a role which is the owner of a database can drop it?

2017-10-31 Thread Stephen Frost
Greetings,

* Ivan Voras (ivo...@gmail.com) wrote:
> On 30 October 2017 at 22:10, David G. Johnston 
> wrote:
> > ​Not quite following but ownership is an inheritable permission;
> 
> Basically, I'm asking if "ownership" can be revoked from the set of
> inherited permissions? If there is a role G which is granted to role A, and
> G is the owner of a database, can A be made to not be able to do what only
> owners can (specifically in this case, drop databases)?

No, that's exactly what role membership means- you have the same rights
as the other role.

> > and even if it was not SET ROLE is all that would be required.​  Any owner
> > can drop an object that it owns.
> 
> It's kind of the reverse: I'm wondering if ownership can be made
> un-inheritable.

No, because even if ownership wasn't inheritable the user would simply
do 'SET ROLE owner;' and then have all of the ownership rights that way.
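
To illustrate with the roles from your example (the database name here is
made up): even if A's membership in G were marked NOINHERIT, A could still
simply do:

    SET ROLE g;              -- allowed for any member of g, INHERIT or not
    DROP DATABASE some_db;   -- g owns it, so this succeeds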

> Just considering the case of dropping databases for now. I.e. let the
> developers do everything except that. It's a start.

I think you're assuming far too much about what being a database owner
means- I'd suggest you really think about why the developers need to be
database owners at all; in other words- what's the *other* privilege
that's currently only available to database owners that you need
developers to be able to do?

I have a hunch that it might be GRANT'ing rights on the database, but
there's only a couple such rights (eg: CONNECT) and you might be better
off managing those in another way.
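
For reference, the set of database-level privileges that can be GRANT'd is
quite small (the names here are just placeholders):

    GRANT CONNECT, TEMPORARY ON DATABASE some_db TO developers;
    -- CREATE (the ability to create schemas in the database) is the only other one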

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] pg_audit to mask literal sql

2017-10-31 Thread Stephen Frost
Greetings,

* rakeshkumar464 (rakeshkumar...@outlook.com) wrote:
> By mask I mean pgaudit should log where ssn = '123-456-7891' as where ssn =
> '?' 

Data masking really isn't part of auditing, and so even if pgaudit could
do so, that wouldn't really be the right place to make it happen.

There have been discussions about data masking previously but they
haven't really led anywhere.  Having proper auditing capabilities built
into the backend and then a way to classify errors (such as syntax error
or other issue where we couldn't tell what the query actually was due to
a user fat-fingering something) as 'not to be logged' would at least get
us closer to your goal of not wanting sensitive data in the log files,
but PG isn't there yet.

That said, there are quite a few people who do use PG with HIPAA and
address its requirements in other ways (as discussed
elsewhere on this thread).

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Backup strategy using 'wal_keep_segments'

2017-10-31 Thread Stephen Frost
Greetings,

* Rhhh Lin (ruanline...@hotmail.com) wrote:
> I would actually be an advocate for using a proper archive_command in order 
> to facilitate a proper (Per the documentation) PITR and backup strategy.

Glad to hear it.

> However, a colleague had suggested such a creative approach (Possibly, less 
> administrative overhead, ease of maintenance? I'm not sure) and I struggled 
> to find any evidence online in blogs/white-papers/documentation that this was 
> a feasible approach. That said, I couldn't find any info rejecting it as a 
> method either, which led me to post here. Essentially, this was a difference 
> of opinion on approach, and I was looking to gather information in order to 
> make an informed opposing argument.

I'd strongly suggest considering tried-and-true approaches instead of
coming up with something novel, particularly when it comes to backups
and recovery.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Backup strategy using 'wal_keep_segments'

2017-10-30 Thread Stephen Frost
Greetings,

* Rhhh Lin (ruanline...@hotmail.com) wrote:
> A colleague recently suggested that instead of implementing an 
> 'archive_command' to push archivable WALs to a secondary location (for 
> further backup to tape for example), we could instead persist the WAL files 
> in their current location by setting the "wal_keep_segments" parameter to an 
> extreme value e.g. 1000 and have the 'archive_command' do nothing.

Michael's points are good and I wouldn't recommend using this archive
command either, but what isn't clear to me is what you're actually
trying to solve by using such a method..?  You haven't said anywhere
what's wrong with archive_command (I know that there certainly are some
things wrong with it, of course, but there are solutions to a number of
those issues that aren't a hack like this ...).

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] pgpass file type restrictions

2017-10-19 Thread Stephen Frost
Matt,

* Desidero (desid...@gmail.com) wrote:
> I agree that it would be better for us to use something other than LDAP,

If you happen to be using Active Directory, then you should really be
using Kerberos-based auth instead.  AD includes both LDAP and a KDC and
the LDAP half is really *not* the way to handle authentication in that
environment (or, well, really, LDAP isn't a terribly secure option in
any environment, but if it's all you've got and you're not allowed to
change then I suppose there's not much to be done about it).

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Permissions for Web App

2017-10-10 Thread Stephen Frost
Greetings,

* Igal @ Lucee.org (i...@lucee.org) wrote:
> It worked, thanks!

Be sure to check that you're really getting what you want here.

> For future reference and for the benefit of others, the command that
> I ran is:
> 
>   ALTER DEFAULT PRIVILEGES IN SCHEMA public
>     GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO webapp;

Note that DEFAULT PRIVILEGES apply to a specific role (in the above
case, whatever CURRENT_USER is) and therefore will only be helpful if
you only have one user creating tables, in this case.
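
If more than one role creates tables in that schema, the statement has to
be repeated with FOR ROLE for each of them, along the lines of (the extra
role name here is hypothetical):

    ALTER DEFAULT PRIVILEGES FOR ROLE other_table_owner IN SCHEMA public
      GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO webapp;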

There's been a lot of questions and discussions about having an actual
'read only' role that can be granted out.  Now that we've actually got
default roles in PG, this is something that becomes at least somewhat
more practical and might be able to happen for PG11 if there's interest
and effort put into it.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] pg_start/stop_backup naming conventions

2017-10-10 Thread Stephen Frost
Greetings,

* mj0nes (matthew.jo...@ramtech.co.uk) wrote:
> I'm just starting out on a rolling backup strategy and the naming convention
> has thrown me slightly for the WAL and "backup_label" files.
> 
> What I want to do is pair up the backup label files with the associated tar
> ball of the data directory. When I first ran pg_start_backup('label'), I
> naively thought a file would be created named label. Maybe not that naive
> considering the documentation:
> 
> /pg_start_backup creates a backup label file, *called backup_label*, in the
> cluster directory with information about your backup, including the start
> time and label string./

That's an interesting point, but not particularly relevant any more as
we've deprecated that approach to doing backups now because of issues
with the backup_label file and what happens if the system is restarted
during a backup.

> Thanks for any pointers.

Frankly, developing your own backup software for PG isn't trivial and
isn't something I'd recommend.  There's a number of good tools out there
which have been developed over years to work well to specifically
address PG backups and to do things correctly (such as making sure WAL
archiving is working and that WAL files are sync'd to disk before
telling PG that it's been copied).

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Using cp to back up a database?

2017-10-10 Thread Stephen Frost
Ron,

* Ron Johnson (ron.l.john...@cox.net) wrote:
> Maybe my original question wasn't clear, so I'll try again: is it
> safe to do a physical using cp (as opposed to rsync)?

Frankly, I'd say no.  There's nothing to guarantee that the backup is
actually sync'd out to disk.  Further, you're probably in the same boat
when it comes to WAL archiving (just using 'cp' there isn't good for the
same reason), though that's arguably worse because once you return true
from archive_command, that WAL file will be gone and you could end up
with a hole in your WAL stream.
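
To be concrete, the commonly-seen pattern that has this problem looks like
(the path is just an example):

    archive_command = 'cp %p /backup/wal/%f'

Nothing there forces the archived file (or its directory entry) out to
stable storage before PG is told the segment is safely archived and may be
recycled.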

Next, this backup method has been deprecated because of issues with the
backup_label file and what happens when the system crashes during a
backup.

So, no, you shouldn't be just using 'cp', or 'rsync', or any of those
similar, simple, tools for doing a proper PG backup.  Use a tool which
has been developed specifically for PG such as pgBackRest, barman,
WAL-E, WAL-G, etc.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] [HACKERS] USER Profiles for PostgreSQL

2017-09-20 Thread Stephen Frost
Greetings John,

* John R Pierce (pie...@hogranch.com) wrote:
> On 9/20/2017 6:55 AM, Stephen Frost wrote:
> >If AD is in the mix here, then there's no need to have things happening
> >at the database level when it comes to passwords- configure PG to use
> >Kerberos and create a princ in AD and put that on the database server
> >and then users can authenticate that way.
> 
> for web apps?   how does a web browser do kerberos over http ?

Yes, and using SPNEGO, which the major browsers have all supported for a
very long time.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] [HACKERS] USER Profiles for PostgreSQL

2017-09-20 Thread Stephen Frost
John,

* John R Pierce (pie...@hogranch.com) wrote:
> On 9/19/2017 3:32 PM, chiru r wrote:
> >How those application accounts get recognized in database?
> >
> >Let say  App_user1 authenticated through application ,after that
> >how the App_user1 get access to DB?
> >
> >can you please provide more information ,how the app users are
> >accessing database ?
> 
> the user isn't accessing the database, the application itself is
> accessing the database.   a web app might have 10s of 1000s of
> unique users, but a web app typically only uses a single application
> account to access the database.

While that's one approach, it's certainly not the only one (and not
necessarily considered a good approach either..).

If AD is in the mix here, then there's no need to have things happening
at the database level when it comes to passwords- configure PG to use
Kerberos and create a princ in AD and put that on the database server
and then users can authenticate that way.
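
A minimal sketch of what that looks like on the PG side (the keytab path
is an example only; the keytab holds the postgres/<server fqdn>@REALM
principal exported from AD):

    # postgresql.conf
    krb_server_keyfile = '/etc/postgresql/postgres.keytab'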

Further, applications can be configured to accept and proxy Kerberos
credentials from the user and use them to authenticate to the database,
allowing the application to actually be logged in as the user for that
session, with only those rights the user has.

Group/Role membership does still have to be sync'd between the AD/LDAP
directory and the database for those permissions to be set up, but that
isn't all that hard to do using a cronjob and ldap_fdw, or similar;
people just have to realize that there's a bit of lag.  The same goes
for creating accounts in the first place in the database, of course.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] [HACKERS] USER Profiles for PostgreSQL

2017-09-20 Thread Stephen Frost
Bruce,

* Bruce Momjian (br...@momjian.us) wrote:
> On Tue, Sep 19, 2017 at 01:28:11PM -0400, Stephen Frost wrote:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > > chiru r <chir...@gmail.com> writes:
> > > > We are looking for User profiles in open source PostgreSQL.
> > > > For example, If a  user password failed n+ times while login ,the user
> > > > access has to be blocked few seconds.
> > > > Please let us know, is there any plan to implement user profiles in 
> > > > feature
> > > > releases?.
> > > 
> > > Not particularly.  You can do that sort of thing already via PAM,
> > > for example.
> > 
> > Ugh, hardly- it's hokey and a huge pain to do, and it only works on
> > platforms that have PAM.
> > 
> > Better is to use an external authentication system (Kerberos, for
> > example) which can deal with this, but I do think this is also something
> > we should be considering for core, especially now that we've got a
> > reasonable password-based authentication method with SCRAM.
> 
> Does LDAP do this too?

Active Directory does this, with Kerberos as the authentication
mechanism.  Straight LDAP might also support it, but I wouldn't
recommend it because it's really insecure as the PG server will see the
user's password in the cleartext (and it may be sent in cleartext across
the network too unless careful steps are taken to make sure that the
client only ever connects over SSL to a known trusted and verified
server).

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] [HACKERS] USER Profiles for PostgreSQL

2017-09-19 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> chiru r  writes:
> > We are looking for User profiles in open source PostgreSQL.
> > For example, If a  user password failed n+ times while login ,the user
> > access has to be blocked few seconds.
> > Please let us know, is there any plan to implement user profiles in feature
> > releases?.
> 
> Not particularly.  You can do that sort of thing already via PAM,
> for example.

Ugh, hardly- it's hokey and a huge pain to do, and it only works on
platforms that have PAM.

Better is to use an external authentication system (Kerberos, for
example) which can deal with this, but I do think this is also something
we should be considering for core, especially now that we've got a
reasonable password-based authentication method with SCRAM.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Logging the fact that a log was shipped

2017-08-28 Thread Stephen Frost
* Christoph Moench-Tegeder (c...@burggraben.net) wrote:
> ## Stephen Frost (sfr...@snowman.net):
> 
> > Worse, such scripts run the serious risk of losing WAL if a crash
> > happens because nothing is ensuring that the WAL has been sync'd to disk
> > before returning from the archive_command.
> 
> That risk already exists when using rsync/scp/scp/... and should be
> mitigated by filesystem settings on the receiving side.

I was including rsync/scp/similar based tools, yes, just pointing out
that such tools should be avoided when doing PG backups and WAL
archiving.

I have a hard time seeing "require filesystems be mounted as sync" as
really being a viable solution, though I suppose it would be technically
correct.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Logging the fact that a log was shipped

2017-08-28 Thread Stephen Frost
* Christoph Moench-Tegeder (c...@burggraben.net) wrote:
> ## Ron Johnson (ron.l.john...@cox.net):
> 
> > How is this done in v8.4?  (I tried adding "date; rsync ..." but pg
> > didn't like that *at all*.)
> 
> There's a DEBUG1-level log message on successful archive_command
> completion - that would give you a lot of other low-prio log
> messages which you probably don't care about.
> I'd put a wrapper around your rsync (a short shell script
> would be sufficient) in the lines of rsync ... && logger "done",
> that way you'd get the information via syslog.
> On the other hand, do you really need the details about each WAL
> segment? Since 9.4 there's pg_stat_archiver... You're really
> making your job harder than it needs to be with that ancient
> PostgreSQL...

Worse, such scripts run the serious risk of losing WAL if a crash
happens because nothing is ensuring that the WAL has been sync'd to disk
before returning from the archive_command.

Most of the existing tools for dealing with WAL archival (pgbackrest,
barman and WAL-E, at least) already log successful and unsuccessful
archive command runs.  I'm pretty sure barman supports back to 8.4 and I
know pgbackrest does.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] PG and database encryption

2017-08-23 Thread Stephen Frost
Greetings,

* Scott Marlowe (scott.marl...@gmail.com) wrote:
> On Tue, Aug 22, 2017 at 3:13 PM, PT  wrote:
> > On Tue, 22 Aug 2017 12:48:13 -0700 (MST)
> > rakeshkumar464  wrote:
> >> We have a requirement to encrypt the entire database.  What is the best 
> >> tool
> >> to accomplish this. Our primary goal is that it should be transparent to 
> >> the
> >> application, with no change in the application, as compared to un-encrypted
> >> database. Reading about pgcrypto module, it seems it is good for few 
> >> columns
> >> only and using it to encrypt entire database is not a good use-case.
> >>
> >> Is this which can be done best by file level encryption?  What are the good
> >> tools on Linux (RHES), preferably open-source.

There's been discussion and work to add the ability to have encryption
at the database instance level instead of requiring filesystem-level or
hard-disk-level encryption, but nothing has gone into PG yet to support
this.  I'm hopeful that there will be progress towards this in the next
round of PG development, ideally landing in PG11.

> > On multiple occasions, I demonstrated that an unencrypted database was the
> > least likely disclosure vector for sensitive data, and that we shouldn't
> > waste any time on it until we had ensured that all other breach vectors had
> > been fixed.  Over the course of 4 years at that job, we never managed to get
> > all the other (more likely) breach vectors secured.

While that's interesting, I don't find it particularly relevant- fix the
breach vectors you can reasonably.  If you're worried that backups or
hard drives might not be properly handled and may fall into an
attacker's hands and adding encryption to address that is
straight-forward and simple to do then it's entirely reasonable to do
so, even if your application isn't as secure as you'd like it to be.

> > As a result, my opinion would be that you use filesystem encryption. It's
> > very efficient, low management overhead, and proven technology that doesn't
> > interfere with anything else you're doing. You can then check that box on
> > whatever form you have to fill out and the bureaucrats will leave you alone.
> > On top of that, it effectively protects against possible breach vectors that
> > don't require changing the application.

If PG-level encryption was also efficient, low management overhead and
based on proven technology that didn't interfere with anything else then
it would be an entirely good thing to consider doing.  We don't have
that today, but that doesn't make it unreasonable to ask for.

> > Real security will require changing the application. But take my word for 
> > it,
> > nobody wants to hear the list of breach vectors that can only be fixed by
> > modifying the application. Because people aren't interested in real 
> > security,
> > they're just interested in checking boxes on a form.

If it's trivial to address a certain set of attack vectors, then it
makes sense to do so, even if it's a relatively unlikely vector to be
hit with.  I agree that non-trivial efforts to address unlikely attack
vectors is a bad idea when there are more likely vectors to consider
addressing first.

> This. Without a much stricter definition of the attack vectors you're
> trying to defeat "encrypt the whole database" is a very hand-wavy
> proposition. Are you protecting against people getting into the data
> center and stealing your hard drives? Rogue applications getting
> access to the db? Someone sniffing the passwords or unencrypting them
> on the servers etc etc.

This continues to come up in discussions which outline the attack
vectors that database-level encryption addresses.  We know the vectors
which database/filesystem-level encryption addresses; we don't need to
beat up people asking for it by telling them that they don't know.  Let's
just make it clear that there's only a subset of vectors addressed by
database or filesystem encryption, ensure that everyone understands what
those are, and then individuals can weigh the choice themselves as to
whether it makes sense for their particular situation to address those
vectors through database/filesystem-level encryption or not.

> OP: It's just too generic a requirement to take seriously. Sit down, come
> up with possible attack vectors and possible ways to thwart them.

This is certainly a good exercise, but "someone stole the backups" is
absolutely an entirely reasonable attack vector to consider and
database-level encryption is one approach which could (if implemented
properly) address that vector.  There are certainly other approaches to
address that vector as well, of course, such as using backup technology
which provides its own encryption (though that possibly requires managing
a different set of keys), or running the backup through GPG or similar
(though that gets painful quickly), et al.

> Security isn't something you do one time and you're done, it's a
> constant process of design, 

Re: [GENERAL] Planner statistics usage for composite type

2017-07-19 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Dmitry Lazurkin  writes:
> > I am trying to find workaround for cross-column statistics.
> > ...
> > Wrong estimate. Planner doesn't use statistics. In code I see usage of
> > function scalargtsel which returns default selectivity because
> > ROW('tag1', 0.9)::tag_sim is not Const.
> 
> Yeah, that's because eval_const_expressions doesn't know how to fold
> a constant RowExpr to a simple Const.  I have a patch laying about
> someplace to improve that, but I keep forgetting about it until
> we're in beta phase :-(.  In the meantime you could perhaps do the
> folding by hand:

Put it into the CF..?

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] LDAP authentication without OU in ldapbasedn

2017-07-13 Thread Stephen Frost
Greetings,

* Gregory Nicol (gregory.ni...@medbank.com.mt) wrote:
> I can't seem to get LDAP Authentication working without an OU in the 
> ldapbasedn. My users are spread across multiple OUs without a common root OU 
> which is why I'm trying to authenticate with just the DC.

As it looks like you're working in a Microsoft Windows AD environment,
I'd strongly suggest you consider using Kerberos/GSS authentication
instead of LDAP.  With LDAP, the user has to constantly re-type their
password and the password is sent to the PostgreSQL server.  Neither of
these are ideal and both are avoided by simply using Kerberos, which is
what AD uses.

Authentication using LDAP really shouldn't ever be done in an
environment which has Active Directory.
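
With GSS the pg_hba.conf entry ends up being along these lines (the
address range and realm are examples only):

    host  all  all  10.0.0.0/8  gss  include_realm=0  krb_realm=EXAMPLE.COM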

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Associating a basebackup and it's .backup WAL file

2017-07-13 Thread Stephen Frost
Greetings,

* cen (imba...@gmail.com) wrote:
> Seems like a lot of manual work to me, to automate it I'd basically
> have to diff both directories and then copy only the newest
> differences over to the recovery. So far I was unable to find a
> supersecret git repo with bash scripts accomplishing this tasks
> which is surprising.

I would hazard to suggest that's because people have realized that bash
scripting backup and recovery of PostgreSQL, and ensuring that all of
the appropriate precautions are taken and checks are done, is next to
impossible; were it to be done the scripts would be more complex and
challenging to use than any of the existing solutions.

I'd strongly suggest you consider one of the maintained backup solutions
that have already been written instead of inventing yet another one.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] pg_start/stop_backup non-exclusive scripts to snapshot

2017-07-10 Thread Stephen Frost
Greetings,

* mariusz (mar...@mtvk.pl) wrote:
> On Tue, 2017-07-04 at 17:55 -0400, Stephen Frost wrote:
> > > How is this done inside a shell script?
> > 
> > Generally, it's not.  I suppose it might be possible to use '\!' with
> > psql and then have a shell snippet that looks for some file that's
> > touched when the snapshot has finished, but really, trying to perform a
> > PG backup using hacked together shell scripts isn't recommended and
> > tends to have problems.
> 
> not saying to use scripts or pg tools, but if OP needs a script, may
> consider perl module IPC::Run instead of shell script - easy to use
> interacting with subprocesses over filehandles or pipes (even supports
> ptys). that way you can keep your active psql session, pass input to it,
> view and process it's output, while doesn't block you from doing other
> things meanwhile
> 
> of course it's perl, not shell, but looks "scripty" and can do what OP
> wants 

Sure, that's more-or-less what pgBackRest does (which is Perl, mostly).

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] pg_start/stop_backup non-exclusive scripts to snapshot

2017-07-05 Thread Stephen Frost
Melvin,

* Melvin Davidson (melvin6...@gmail.com) wrote:
> On Wed, Jul 5, 2017 at 10:14 AM, Stephen Frost <sfr...@snowman.net> wrote:
> >Part of my concern is that such a script is unlikely to show any problems
> until it comes time to do a restore
> As previously stated, the script was used to set up a slave and has done so
> successfully many times. There are subsequent scripts
> that check results.

Ah, the impression was that this was being suggested as a way of
performing regular backups.  For simply creating a replica,
pg_basebackup works pretty well for small systems.  For larger
databases, being able to perform parallel backup/restore is very useful,
even if it's just for building a replica.
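
For a small system that's often just something like (host, user and path
are placeholders):

    pg_basebackup -h primary.example.com -U replicator \
        -D /var/lib/pgsql/10/data -X stream -R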

> >What might be worse would be to pull the plug while the backup is running
> and then try to bring the primary back online.
> Uh, whom would be so stupid as to do that?

Pulling the plug in the middle of various operations is a good way to
simulate what happens if the system crashes, for whatever reason, and to
make sure that processes and procedures are in place to address such a
failure scenario.

> >Right, there's little sense in trying to perfect a shell script when
> proper solutions exist.
> >>It's better to create something that others criticise than to create
> nothing and criticise others. Go create, have fun!!

Indeed, I'd certainly welcome criticism of pgBackRest.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] pg_start/stop_backup non-exclusive scripts to snapshot

2017-07-05 Thread Stephen Frost
Greetings,

* Melvin Davidson (melvin6...@gmail.com) wrote:
> Stephen,
> >This script is a good example of why trying to take a PG backup using
> shell scripts isn't a good idea.
> 
> Your criticism is noted, however, I have used it many times in the past
> with absolutely no problem. I submitted that script as a possible solution
> to the op's problem/question. If you have an alternate solution or can make
> improvements to it, then I am sure the op and I would welcome them.

Part of my concern is that such a script is unlikely to show any
problems until it comes time to do a restore- it could be failing now
due to the issues I noted previously without any obvious error being
thrown but with the resulting backup not being viable.  Hopefully that
isn't the case and ideally you're performing test restores of each
backup you take to ensure that it works.

Further, it doesn't address the OP's question, which was specifically
how to avoid using the now-deprecated exclusive backup method that the
script you posted uses.

* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Wed, Jul 5, 2017 at 10:47 PM, Melvin Davidson  wrote:
> > Your criticism is noted, however, I have used it many times in the past 
> > with absolutely no problem.
> 
> Plug off the server on which is stored the backup just after your
> script finishes, you have a good chance to be surprised if you try to
> restore from this backup later on.

What might be worse would be to pull the plug while the backup is
running and then try to bring the primary back online. :/  That issue is
part of why the API used in this script is now deprecated.

> > I submitted that script as a possible solution
> > to the op's problem/question. If you have an alternate solution or can make 
> > improvements to it, then I am sure the op and I would welcome them.
> 
> Stephen has mentioned two of them, with hundreds of man hours spent in
> developing those backup tools to be robust solutions, done by
> specialists on the matter.

Right, there's little sense in trying to perfect a shell script when
proper solutions exist.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] pg_start/stop_backup non-exclusive scripts to snapshot

2017-07-05 Thread Stephen Frost
Greetings,

* Melvin Davidson (melvin6...@gmail.com) wrote:
> On Tue, Jul 4, 2017 at 5:55 PM, Stephen Frost <sfr...@snowman.net> wrote:
> > I'd recommend considering one of the existing PG backup tools which know
> > how to properly perform WAL archiving and tracking the start/stop points
> > in the WAL of the backup.  Trying to write your own using shell scripts,
> > even with ZFS snapshots, isn't trivial.  If you trust the ZFS snapshot
> > to be perfectly atomic across all filesystems/tablespaces used for PG,
> > you could just take a snapshot and forget the rest- PG will do crash
> > recovery when you have to restore from that snapshot but that's not much
> > different from having to do WAL replay of the WAL generated during the
> > backup.
> >
> > As for existing solutions, my preference/bias is for pgBackRest, but
> > there are other options out there which also work, such as barman.
> 
> Here is a model shell script I use to do a base backup to set up a slave.
> See attached ws_base_backup.sh

This script is a good example of why trying to take a PG backup using
shell scripts isn't a good idea.  Offhand, there's issues like:

- No check that start_backup was successful
- No check that stop_backup was successful
- No syncing of files to disk anywhere
- Requires running as root (without any particular clear reason why)
- Doesn't check if the database is already in 'exclusive backup' mode
- Doesn't check the return codes for the main 'tar' command
- Uses pipes without checking return codes through PIPESTATUS
- Doesn't capture the output from pg_start/stop_backup
- Doesn't verify that all of the WAL required for the backup was
  archived
- Doesn't check the exit code of the rsync

I'm sure there's other issues also and I do hope it's working enough
that you have viable backups, but I wouldn't use such a script today
(though I wrote plenty like it in the distant past).
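
Just to illustrate the PIPESTATUS point above, in bash each command in a
pipeline has to be checked individually, e.g. (paths are placeholders):

    tar -cf - "$PGDATA" | gzip > /backup/base.tar.gz
    if [ "${PIPESTATUS[0]}" -ne 0 ] || [ "${PIPESTATUS[1]}" -ne 0 ]; then
        echo "base backup failed" >&2
        exit 1
    fi

Checking only $? would report just gzip's exit status and happily ignore a
failed tar.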

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] pg_start/stop_backup non-exclusive scripts to snapshot

2017-07-04 Thread Stephen Frost
Greetings,

* hvjunk (hvj...@gmail.com) wrote:
> I’ve previously done ZFS snapshot backups like this:
> 
> psql -c “select pg_start_backup(‘snapshot’);”
> zfs snapshot TANK/postgresql@`date ‘+%Ymd’`
> psql -c “select * from  pg_stop_backup();”

Hopefully you are also doing WAL archiving...

> Reading the PostgreSQL9.6 documentation, the advice/future is to use the 
> non-exclusive method, where I’ll need to keep a session *open* while the 
> snapshot takes place, and after that I’ll have to issue the 
> pg_stop_backup(false); in that active connection that issued the 
> pg_start_backup(‘backup’,false,false);

Right.

> How is this done inside a shell script?

Generally, it's not.  I suppose it might be possible to use '\!' with
psql and then have a shell snippet that looks for some file that's
touched when the snapshot has finished, but really, trying to perform a
PG backup using hacked together shell scripts isn't recommended and
tends to have problems.
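
For what it's worth, the shape it would have to take is a single psql
session that stays open across the snapshot, something like this sketch
(no error handling, and the backup_label/tablespace_map contents returned
by pg_stop_backup(false) still have to be saved alongside the snapshot):

    psql --no-psqlrc --set ON_ERROR_STOP=1 <<'EOF'
    SELECT pg_start_backup('snapshot', false, false);
    \! zfs snapshot TANK/postgresql@`date '+%Y%m%d'`
    SELECT * FROM pg_stop_backup(false);
    EOF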

In particular WAL archiving- there's no simple way for a shell script
which is being used for archiving to confirm that the WAL it has
"archived" has been completely written out to disk (which is required
for proper archiving).  Further, simple shell scripts also don't check
that all of the WAL has been archived and that there aren't any holes in
the WAL between the starting point of the backup and the end point.

> Especially how to do error checking from the commands as psql -c “select 
> pg_start_backup{‘test’,false,false);” not going to work?

I'd recommend considering one of the existing PG backup tools which know
how to properly perform WAL archiving and tracking the start/stop points
in the WAL of the backup.  Trying to write your own using shell scripts,
even with ZFS snapshots, isn't trivial.  If you trust the ZFS snapshot
to be perfectly atomic across all filesystems/tablespaces used for PG,
you could just take a snapshot and forget the rest- PG will do crash
recovery when you have to restore from that snapshot but that's not much
different from having to do WAL replay of the WAL generated during the
backup.

As for existing solutions, my preference/bias is for pgBackRest, but
there are other options out there which also work, such as barman.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Querying a policy

2017-05-10 Thread Stephen Frost
Jeff,

* Jean-Francois Bernier (jean.francois.bern...@boreal-is.com) wrote:
> We are evaluating migrating our software RLS to Postgres by using policies.

Neat!

> Having a "FOR UPDATE POLICY" on a table, I was wondering if there is a way to 
> know, before trying an Update and getting an error, if the current row can be 
> updated ?

Unfortunately, not as easily as it seems you would like, currently,
though perhaps we could change that..

> The goal is to show or hide the edit button in my software forms or lists.

Right, makes sense.

> I know that this query can return the CHECK condition of my POLICY:
> SELECT pg_get_expr(polwithcheck, polrelid, true) FROM pg_policy;

Yup.

> But is there a simpler way to get the ids the current user can read and the 
> ones that he can update?

Well, have you considered using the expression from the above query to
add a column to your SELECT query that indicates whether each row is
updatable or not..?  That is, construct your
query by doing:

SELECT pg_get_expr(polwithcheck, polrelid, true) FROM pg_policy ; into a
variable in your application, then:

"SELECT * , " . variable . " from ..."

The same could be done through a view, potentially, or perhaps with a
plpgsql function, but I'm guessing that negates some of the
"cleanliness" that you get with RLS and base tables.

I certainly like the idea in general.  I will caution that, to be fair,
just because the WITH CHECK clause says a given row can be modified at
SELECT time doesn't guarantee that the same row will be updatable in
some later transaction, as it depends on just what the policy is.

In any case, very cool to hear about people working to use RLS!  Would
love to chat further about your use-case and see what we can do to make
RLS easier to use.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Large data and slow queries

2017-04-19 Thread Stephen Frost
Greetings,

* Samuel Williams (space.ship.travel...@gmail.com) wrote:
> Thanks for all the suggestions Stephen.
> 
> > That explain analyze shows a whole ton of heap fetches.  When was the
> last time a VACUUM was run on this table, to build the visibility map?
> Without the visibility map being current, an Index-Only Scan, as is
> happening here, can really suck.
> 
> I'm using the default 9.6 config, I thought that auto-vacuum was on by 
> default?

Sure, but that doesn't mean the table gets vacuumed.  In particular,
actual vacuums are only kicked off when the number of *updated* or
*deleted* tuples passes the autovacuum threshold.  If no updates or
deletes are done on the table (iow, it's essentially an insert-only or
insert-mostly table), the autovacuum daemon never runs a vacuum on it
(though it'll still run analyzes).

https://www.postgresql.org/docs/current/static/runtime-config-autovacuum.html#GUC-AUTOVACUUM-VACUUM-THRESHOLD

Check pg_stat_all_tables to see when/if the table has actually been
vacuumed.
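
For example (using the table name from elsewhere in the thread):

    SELECT relname, last_vacuum, last_autovacuum, n_dead_tup
      FROM pg_stat_all_tables
     WHERE relname = 'user_event';

    -- an explicit VACUUM will build the visibility map for an
    -- insert-mostly table that autovacuum never gets around to:
    VACUUM VERBOSE user_event;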

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Large data and slow queries

2017-04-19 Thread Stephen Frost
Greetings,

* Samuel Williams (space.ship.travel...@gmail.com) wrote:
> We want the following kinds of query to be fast:

"kinds of query" isn't helpful, you should be reviewing exactly the
queries you care about because statistics and your exact data set and
what the exact query you're running is will all have an impact.

> SELECT DISTINCT "user_event"."user_id" FROM "user_event" WHERE
> "user_event"."what" IN ('poll', 'location_change',
> 'midnight_location') AND ("user_event"."created_at" >= '2016-04-19
> 01:23:55') AND (latitude > -37.03079375089291 AND latitude <
> -36.67086424910709 AND longitude > 174.6307139779924 AND longitude <
> 175.0805140220076);

Ugh.  You should really look at and consider PostGIS whenever you're
working with geospatial data.
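
Purely as a sketch of what that could look like (assuming a geometry
column "geom" in SRID 4326 were added), the lat/long box becomes an
indexable spatial predicate:

    CREATE INDEX ON user_event USING gist (geom);

    SELECT DISTINCT user_id
      FROM user_event
     WHERE what IN ('poll', 'location_change', 'midnight_location')
       AND created_at >= '2016-04-19 01:23:55'
       AND geom && ST_MakeEnvelope(174.6307, -37.0308, 175.0805, -36.6709, 4326);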

> We have a btree index and it appears to be working. However, it's
> still pretty slow.

You're finding some 17M records and then reducing that with DISTINCT to
only 114k.  Hard to say if it's faster to just brute-force your way
through that with a HashAgg (as your explain analyze shows), or if a
loose index scan would work better (check out how to do one in PG here:
https://wiki.postgresql.org/wiki/Loose_indexscan).
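
The basic shape of a loose index scan from that wiki page, ignoring the
WHERE clauses for the moment (they'd need to be folded into both arms, and
into the index), is a recursive CTE that hops from one distinct value to
the next:

    WITH RECURSIVE t AS (
      (SELECT user_id FROM user_event ORDER BY user_id LIMIT 1)
      UNION ALL
      SELECT (SELECT user_id FROM user_event
               WHERE user_id > t.user_id
               ORDER BY user_id LIMIT 1)
        FROM t
       WHERE t.user_id IS NOT NULL
    )
    SELECT user_id FROM t WHERE user_id IS NOT NULL;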

> EXPLAIN ANALYZE gives the following:
> https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121#gistcomment-2065314

That explain analyze shows a whole ton of heap fetches.  When was the
last time a VACUUM was run on this table, to build the visibility map?
Without the visibility map being current, an Index-Only Scan, as is
happening here, can really suck.

> CLUSTER user_event ON index_user_event_for_visits_3 followed by
> analyze... Our data is mostly time series but sometimes we get some
> dumps with historical records.
> 
> Perhaps add a BRIN index on created_at

BRIN would be helpful if you wanted a smaller index.  That doesn't seem
to be the case here.

> I'm wondering if... we can use an index to cache, all user_ids seen on
> a given day. If we change our index to be more discrete, e.g.
> created_at::date, would this help? The set union of user_ids for 365
> days should be pretty fast?

Materialized views are often useful, particularly when the results are
(relatively) slow moving.
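
A rough sketch of the "user_ids seen per day" idea as a materialized view
(the name is made up):

    CREATE MATERIALIZED VIEW user_ids_by_day AS
      SELECT created_at::date AS day, user_id
        FROM user_event
       GROUP BY 1, 2;

    -- refreshed on whatever schedule the data is allowed to lag by:
    REFRESH MATERIALIZED VIEW user_ids_by_day;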

> I'm open to any ideas or suggestions, ideally we can keep
> optimisations within the database, rather than adding a layer of
> caching on top.

If you need to go the materialized view route, I'd definitely recommend
doing that in the database rather than outside it.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] A change in the Debian install

2017-04-06 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> (But ... these statements are based on an assumption of out-of-the-
> box Postgres behavior.  I would not exactly put it past the Debian
> packagers to have decided to change this for reasons of their own,
> and their track record of telling us about such decisions is many
> miles south of abysmal.  So you might look at whatever patches
> are in the Debian package to see if there's anything touching
> pgstat.c's socket-setup logic.)

I don't believe this is really a fair assessment.  Maybe at some point
in the distant past, but not today.  Christoph is regularly on this list
contributing to threads regarding packaging, submitting patches of his
own for improvements to PG, and the patches currently included in the
Debian distribution, at least mostly, are for things which really should
be possible to do with configure options, but which we don't provide
today, or things we should just be handling already.

The non-comment/documentation patches included in the Debian PG 9.6
packages are:

50-per-version-dirs.patch
  Use version specific installation directories so that several major
  versions can be installed in parallel.  This includes changing
  pkglibdir and includedir_server.  Those might be able to be set
  through existing configure flags and that's probably something we
  could work with Christoph to do.  There's also a change to pg_config
  which might be a bit more difficult to handle in upstream (related to
  how pg_config ends up in /usr/bin, but that isn't the "right" BINDIR).

51-default-sockets-in-var.patch
  Use /var/run/postgresql/ for the DEFAULT_PGSOCKET_DIR.  We really
  should allow this to be changed in configure.

54-debian-alternatives-for-external-tools.patch
  Use 'sensible-editor' for DEFAULT_EDITOR, and 'pager' for
  DEFAULT_PAGER.  These could also be done through configure switches, I
  would think.

64-pg_upgrade-sockdir
  This is a bit of a curious one, the description is:
  Fix for: connection to database failed: Unix-domain socket path
  
"/build/buildd-postgresql-9.3_9.3~beta1-1-i386-mHjRUH/postgresql-9.3-9.3~beta1/build/contrib/pg_upgrade/.s.PGSQL.50432"
  is too long (maximum 107 bytes)

  See also: http://lists.debian.org/debian-wb-team/2013/05/msg00015.html

  This basically adds a mechanism to fall back to using /tmp if the
  socket path is too long.  Would probably be good to figure out a
  better way.

filter-debug-prefix-map
  Description: Remove -fdebug-prefix-map=/p/w/d=. from CFLAGS and
  CONFIGURE in pg_config. The embedded path makes the build
  non-reproducible, and the flag is useless for building extensions
  anyway.
  Not sure there's much we can do about this one, but it's also just for
  pg_config.

hurd-sem_init
  Adds -pthread to LIBS and forces POSIX semaphores on hurd.  I'm not
  sure if this is all still required, but if so, it'd probably be ideal
  if we could figure out a way to handle it automatically instead of
  making the Debian packagers have to do it.

In short, there's certainly nothing that touches pgstat.c's socket-setup
logic and almost all of the changes are just changes to #define's to
specify alternative paths or external tools.

I'll start a discussion with Christoph on if we might, already, be able
to remove some of these, and where we might be able to make upstream
changes to remove the need for others.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] browser interface to forums please?

2017-04-04 Thread Stephen Frost
Greetings,

* Robin St.Clair (ro...@robinstclair.net) wrote:
> Please do not encrypt  what does not need to be encrypted. Signing 
> communications to a mailing list probably isn't required?

Signing communications demonstrates that the message was, indeed, from
me.  You are certainly welcome to ignore it.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] browser interface to forums please?

2017-04-04 Thread Stephen Frost
Greetings Vinny,

* vinny (vi...@xs4all.nl) wrote:
> I meant it as "in an ideal world". It's a bit like buying a car
> and finding out that they have not put the wheels on. It's not
> difficult to put them on yourself,
> but you kind of expect that the people who want you to use their
> car would do that for you.

There's a bit of a difference between buying a car and using a service
which is provided for free from a team of volunteers.

I agree that the "in an ideal world" wording is better. :)

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] browser interface to forums please?

2017-04-04 Thread Stephen Frost
Greetings,

* vinny (vi...@xs4all.nl) wrote:
> The thing is; mailinglists are far from user-friendly if you are not
> used to them.
> Even in this thread several people have explained how much work they
> have done to get it
> into a state where they can easily work with it. Can you expect Joe
> Average to do something like that
> if they want to get more involved in PgSQL?

I don't actually feel that it's really all that unreasonable, no.  I've
explained that we use mailing lists to a few different groups at
different colleges I've spoken at and while there's been a bit of
grousing from a few individuals, I don't recall anyone not knowing what
a mailing list is or having all that bad of a reaction.

I'll be speaking later this month again at GMU, so I'll make it a point
to discuss it with the group there.

> Now, I'm not saying the mailinglists should go, I'm saying there
> should be an easier way
> to access them. It should be possible to register on the site, post
> a message and read replies,
> without having to subscribe to the list and setup a way of dealing
> with the influx of messages
> that are, for the most part, simply not interesting to the average user.

I don't think there's anyone who is particularly against that idea, but
it's far from trivial to do and to address the possible spam which will
result from that.  All of the website code is open-source and
improvements to it would be greatly welcomed, as long as they don't
create a significant increase in the maintenance burden for the pginfra
team.

> I'd love to have an RSS feed that contains only new questions, so I
> can just watch the popup
> on my screen the way I do with the rest of the world, and not have
> to deal with replies to topics that I don't care about anyway.

I don't see any reason, off-hand at least, that this couldn't be
provided.  We already provide RSS feeds for other things and it's
reasonably straight-forward.  Replying to the RSS feed would require
an email client though, at least for now.  Perhaps that could be
integrated into the 'whole mbox' download option though or something
along those lines so you can pull the email/thread into your client
easily to reply.

> And yes, I can probably setup my email to do something like that,
> the point is that I shouldn't have to.

I'm all for improving things and adding automation where it'll help, but
the infrastructure is basically run by volunteers.  Making statements
like "I shouldn't have to" isn't the best approach to getting the
changes you'd like to see happen done.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Checksum and Wal files

2017-03-23 Thread Stephen Frost
Greetings,

* Poul Kristensen (bcc5...@gmail.com) wrote:
> Are all files inclusive wal files added a checksum?

WAL files include checksums, yes.  The heap files in recent versions can
also include checksums, but they don't by default.  Currently, checksums
can only be enabled at initdb-time, meaning that if you have a database
which doesn't have checksums today then you have to create a *new*
cluster, export all of the data from the existing database using 'pg_dump'
and then import it into a database in the new cluster, which has heap
checksums enabled.
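
The rough shape of that, with example paths, names and ports:

    initdb --data-checksums -D /path/to/new/cluster
    # start the new cluster on its own port, then:
    pg_dump -Fc -f mydb.dump mydb        # from the old cluster
    createdb -p 5433 mydb                # in the new, checksummed cluster
    pg_restore -p 5433 -d mydb mydb.dump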

> We will be doing regularly tests of recovery using checksums as the
> parameter
> of a succesfull recovery.

I'm curious what you mean by this statement.  There aren't many tools
out there today for checking if the heap checksums in PostgreSQL are
valid or not.  One approach is to use PostgreSQL itself and convince it
to read every page, though that can be a bit troublesome to do.  Another
option is to use a backup tool which checks checksums as it's doing the
backup (pgbackrest being one, and I think there may have been a fork of
barman which also did..?).

That said, I would strongly encourage you to find a better approach to
testing your backup and recovery procedures as just checking checksums
doesn't tell you if there's been any database-level corruption, it'll
only identify filesystem-and-below corruption (well, and it might catch
some database-level bugs, but the coverage there is exceedingly small).

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Incremental / Level -1 backup in PG

2017-03-22 Thread Stephen Frost
Greetings,

* rakeshkumar464 (rakeshkumar...@outlook.com) wrote:
> >The short answer is 'no'.  There are complications around this,
> >particularly at the edges and because files can be written and rewritten
> >as you're reading them.  
> >Basically, no file with a timestamp after the
> >checkpoint before the backup can be omitted from an incremental backup.
> 
> what you have written above applies to oracle/db2 too.  In case you are not
> aware, during backup, those products have a mechanism to save the image
> of any changing block as it existed before the start of the backup. that is
> used
> to reconstruct the PIT image of the block.

That is WAL archiving, which PG already does, but is different from
backups.

> Anyhow looks like this can't be done in PG.

PG does support WAL archiving, as discussed on this thread, and it works
exactly as you describe above.  As I mentioned, there are also tools for
performing incremental backups, which isn't quite the same as straight
WAL archiving.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Incremental / Level -1 backup in PG

2017-03-22 Thread Stephen Frost
Greetings,

* rakeshkumar464 (rakeshkumar...@outlook.com) wrote:
> If first choice is lot faster in Oracle,DB2, I have reasons to believe that
> the same should be true for PG also. But as someone explained, the PG
> technology can not support this.

This statement isn't correct.  There are, in fact, tools for doing
incremental backup and restore with PG and they work quite well, as I
mentioned.  Currently they operate at a file-level, which means 1G
chunks instead of block-level, but there is work being done to change
that too.  There isn't anything in "PG technology" that makes this
impossible or, really, even that difficult, it's more that there hasn't
been effort put into it simply because the file-level incremental
solution works quite well in most cases.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Incremental / Level -1 backup in PG

2017-03-21 Thread Stephen Frost
John,

* John R Pierce (pie...@hogranch.com) wrote:
> On 3/21/2017 5:27 PM, Rakesh Kumar wrote:
> >PG does not have a concept of incremental  backup. The way it works in 
> >Oracle and other RDBMS is that incremental backup only backups up changed 
> >blocks since the last full backup.  So if only 10% of blocks changed since 
> >the last full backup, incremental backup will be only for 10%.
> >I am wondering whether it is technically feasible to implement it like this:
> >
> >1 - At the time of full backup, note the last modified time of each data 
> >file in a repository.
> >2 - Next time when incremental backup runs, for every data file it will 
> >check the last modified time of it with the one in the repository to 
> >determine whether it has changed since last full backup. If yes, back it up.
> >
> >Now on to restore:
> >
> >1 - First restore full backup.
> >2 - Restore incremental backup.
> >
> >My question: Will it work in PG?
> 
> basebackup + WAL archive lets you do just exactly this.   you can
> restore to any transaction between when that basebackup was taken,
> and the latest entry in the WAL archive, its referred in the
> documentation as PITR, Point in Time Recovery.

WAL must always be kept for file-level backups, of course, but it does
not allow the kind of incremental backup the OP was suggesting.

It's important to understand that you might start reading a file whose
timestamp is X, read half of it, and then PG starts writing to the first
half of the file, and you finish reading the file, all within the same
second.

A later incremental backup might assume that file hadn't been changed
from the version you have and therefore not back it up.  The WAL for the
change which was written by PG would be in the first 'full' backup, but
would not be included in the WAL which is generated during the
incremental backup, leading to a situation where that write would be
lost and you have a corrupted backup.

Do not try to implement an incremental backup solution using
simple/naive tools like rsync with timestamp-based incrementals.  It is
not safe.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Incremental / Level -1 backup in PG

2017-03-21 Thread Stephen Frost
Greetings,

* Rakesh Kumar (rakeshkumar...@outlook.com) wrote:
> PG does not have a concept of incremental  backup. The way it works in Oracle 
> and other RDBMS is that incremental backup only backups up changed blocks 
> since the last full backup.  So if only 10% of blocks changed since the last 
> full backup, incremental backup will be only for 10%.
> I am wondering whether it is technically feasible to implement it like this:
> 
> 1 - At the time of full backup, note the last modified time of each data file 
> in a repository.
> 2 - Next time when incremental backup runs, for every data file it will check 
> the last modified time of it with the one in the repository to determine 
> whether it has changed since last full backup. If yes, back it up.
> 
> Now on to restore:
> 
> 1 - First restore full backup.
> 2 - Restore incremental backup.  
> 
> My question: Will it work in PG?

The short answer is 'no'.  There are complications around this,
particularly at the edges and because files can be written and rewritten
as you're reading them.  Basically, no file with a timestamp after the
checkpoint before the backup can be omitted from an incremental backup.

I strongly recommend you use one of the existing backup solutions for
PostgreSQL which know how to properly perform incremental backups.  I
know at least pgBackrest and barman do, I'm not sure about others.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Postgres backup solution

2017-03-14 Thread Stephen Frost
Lawrence,

First off, I strongly recommend that you figure out how to send regular
plain-text emails, at least to this mailing list, as the whole
"winmail.dat" thing is going to throw people off and you're unlikely to
get many responses because of it.

Regarding your question..

* Lawrence Cohan (lco...@web.com) wrote:
> What would be a recommended solution for backing up a very large Postgres 
> (~13TeraBytes) database in order to prevent from data deletion/corruption. 
> Current setup is only to backup/restore to a standby read-only Postgres 
> server 
> via AWS S3 using wal-e however this does not offer the comfort of keeping a 
> full backup available in case we need to restore some deleted or corrupted 
> data.

If the goal is to be able to do partial restores (such as just one
table) then your best bet is probably to use pg_dump.  Given the size of
your database, you'll probably want to pg_dump in directory format and
then send each of those files to S3 (assuming you wish to continue using
S3 for backups).  Note that pg_dump doesn't directly support S3
currently.  Also, the pg_dump will hold open a transaction for a long
time, which may be an issue depending on your environment.
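
As a rough sketch (the paths, job count, and bucket name below are just
placeholders), a directory-format dump plus a separate upload step might
look something like:

    # directory-format dump, using 4 parallel dump jobs
    pg_dump --format=directory --jobs=4 --file=/backups/mydb.dir mydb

    # then push the resulting files to S3 with your tool of choice,
    # e.g. the AWS CLI (bucket and path are hypothetical):
    aws s3 sync /backups/mydb.dir s3://my-backup-bucket/mydb.dir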

If you're looking for file-based backups of the entire cluster and don't
mind using regular non-S3 storage then you might consider pgBackrest or
barman.  With file-based backups, you have to restore at least an
entire database to be able to pull out data from it.

We are working to add S3 support to pgBackrest, but it's not there
today.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] intentional or oversight? pg_dump -c does not restore default priviliges on schema public

2017-03-07 Thread Stephen Frost
Greetings,

* Stephen Frost (sfr...@snowman.net) wrote:
> * Frank van Vugt (ftm.van.v...@foxi.nl) wrote:
> > Well, I didn't run into this issue with any of my db's that 'nicely' use 
> > tables in various schema's, it was actually the one 'older' db with 
> > everything 
> > in the public schema that brought it up, so maybe keeping one of those 
> > around 
> > isn't too bad an idea ;)
> 
> Yeah, I'll be including this in a regression test also, to make sure we
> don't end up breaking this special case again in the future.

I've now pushed a fix for this, would be great to know if it solves your
issue.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: R: R: [GENERAL] Slow queries on very big (and partitioned) table

2017-02-21 Thread Stephen Frost
Greetings,

* Job (j...@colliniconsulting.it) wrote:
> >If this is really what you're mostly doing, having constraint exclusion and 
> >an index on 'profile' would probably be enough, if you insist on continuing 
> >to have the table partitioned by day (which I continue to argue is a bad 
> >idea- 
> >based on the number of total rows you mentioned and the number of 
> >partitions, you have partitions with less than 20M rows each and that's 
> >really small, month-based partitions with a BRIN would probably work 
> >better).  If you get to 
> >the point of having years' worth of daily partitions, you're going to see 
> >increases in planning time.
> 
> Based on our simulation, we can raise until 5Gb of datas for a single day, 
> with some millions of rows.

5GB and a few millions rows isn't actually all that much.

> We thought to implement one partition for day.
> Do you think it should be fine?

Really depends on what you're doing.  If you're running very short
queries that pull out just a record or a few records, then you're going
to be unhappy with the planning time required when you have hundreds and
thousands of partitions, which is why I typically recommend against
using partitions-by-day unless you're only keeping a few months worth of
data.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: R: [GENERAL] Slow queries on very big (and partitioned) table

2017-02-20 Thread Stephen Frost
Greetings,

* Job (j...@colliniconsulting.it) wrote:
> here is primary a partitioned table (for 20/2/2017 logs):
> flashstart=# \d webtraffic_archive_day_2017_02_20;
>          Table "public.webtraffic_archive_day_2017_02_20"
>   Column   |            Type             |                            Modifiers
> -----------+-----------------------------+---------------------------------------------------------------------------------
>  id        | numeric(1000,1)             | not null default function_get_next_sequence('webtraffic_archive_id_seq'::text)
>  timestamp | timestamp without time zone |
>  domain    | character varying(255)      |
>  action    | integer                     |
>  profile   | character varying(50)       |
>  accessi   | integer                     |
>  url       | text                        |
> Indexes:
>     "webtraffic_archive_day_2017_02_20_action_wbidx" btree (action)
>     "webtraffic_archive_day_2017_02_20_domain_wbidx" btree (domain)
>     "webtraffic_archive_day_2017_02_20_profile_wbidx" btree (profile)
>     "webtraffic_archive_day_2017_02_20_timestamp_date_wbidx" brin (("timestamp"::date))
>     "webtraffic_archive_day_2017_02_20_timestamp_time_wbidx" btree (("timestamp"::time without time zone))
>     "webtraffic_archive_day_2017_02_20_timestamp_wbidx" brin ("timestamp")

You don't have a CHECK constraint on your individual partitions, so PG
has no idea which of the partitions contains data from what time-period.
To have PostgreSQL's constraint exclusion work to eliminate entire
partitions, you need to add a CHECK constraint on the partition key.  In
this case, it looks like you'd want something like:

CHECK (timestamp >= partition_date AND timestamp < partition_date+1)

Unfortunately, that'll require locking each table and scanning it to
make sure that the CHECK constraint isn't violated.
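
For the partition above, a rough sketch of that (the constraint name is
just an example) would be:

    ALTER TABLE webtraffic_archive_day_2017_02_20
      ADD CONSTRAINT webtraffic_archive_day_2017_02_20_ts_check
      CHECK ("timestamp" >= '2017-02-20 00:00:00'
         AND "timestamp" <  '2017-02-21 00:00:00');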

A couple of other notes- you probably don't need both a BRIN and a btree
index on the same column.  If the BRIN works well enough for you then
you can drop the btree index.  If it doesn't, then you might as well
just keep the btree index and drop the BRIN.  It seems equally
unnecessary to have a BRIN on a cast of the column.  You should also be
using timestamp w/ timezone, really.

> This is the query planner:
> explain analyze
> select * from webtraffic_archive where timestamp::date='2017-02-20' and 
> profile='f62467'

If this is really what you're mostly doing, having constraint exclusion
and an index on 'profile' would probably be enough, if you insist on
continuing to have the table partitioned by day (which I continue to
argue is a bad idea- based on the number of total rows you mentioned and
the number of partitions, you have partitions with less than 20M rows
each and that's really small, month-based partitions with a BRIN would
probably work better).  If you get to the point of having years' worth of
daily partitions, you're going to see increases in planning time.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Slow queries on very big (and partitioned) table

2017-02-20 Thread Stephen Frost
Greetings,

* Job (j...@colliniconsulting.it) wrote:
> we have a test machine with Postgresql 9.6.1 an about 800.000.000 record in a 
> table.
> Table is partitioned by day, with indexes on partitioned table.

You probably shouldn't be partitioning by day for such a small dataset,
unless you've only got a few days worth of data that make up those 800m
records.

Having hundreds of partitions leads to slow query planning time.  There
is work happening to improve on this by having declarative partitions
instead of using CHECK constraints and the constraint exclusion
mechanism.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] intentional or oversight? pg_dump -c does not restore default priviliges on schema public

2017-02-13 Thread Stephen Frost
Frank,

* Frank van Vugt (ftm.van.v...@foxi.nl) wrote:
> Well, I didn't run into this issue with any of my db's that 'nicely' use 
> tables in various schema's, it was actually the one 'older' db with 
> everything 
> in the public schema that brought it up, so maybe keeping one of those around 
> isn't too bad an idea ;)

Yeah, I'll be including this in a regression test also, to make sure we
don't end up breaking this special case again in the future.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] intentional or oversight? pg_dump -c does not restore default priviliges on schema public

2017-02-13 Thread Stephen Frost
Greetings,

* Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> On 02/13/2017 06:04 AM, Stephen Frost wrote:
> >* Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> >>I am following this up to the point of not understanding what
> >>exactly changed between 9.5 and 9.6. Namely 9.5 does include the
> >>default ACL's in the dump output and 9.6 does not.
> >
> >Quite a bit in pg_dump changed, but the relevant bit here is that we now
> >try to include in the pg_dump output any ACLs which have been changed
> >from their initdb-time settings for initdb-time objects.  What that
> >means is that if you don't change the privileges for the public schema
> >from what they're set to at initdb-time, then we don't dump out any ACL
> >commands for the public schema.  That ends up being incorrect in '-c'
> >mode because we drop the public schema in that mode and recreate it, in
> >which case we need to re-implement the ACLs which existed for the public
> >schema at initdb-time.
> 
> Thanks for the explanation in this post and your previous one. If I
> am following pg_init_privs is the initial state of objects ACLs and
> if that changes then those entries are removed.

No, if the object is *dropped* then the entry is removed from
pg_init_privs.  Otherwise, the entries in pg_init_privs aren't changed.

> So would not the
> general case be, on recreating an object use the ACLs in
> pg_init_privs if they exist otherwise use the ACLs as they exist
> wherever they go to on change away from pg_init_privs?

pg_init_privs doesn't track the object's name, so this isn't actually
possible.  Even if we did track the name of the object, I don't think
we'd actually want to set the privileges to what they were set to at
initdb time.  If you drop the public schema and then recreate it, are
you really expecting it to get the initdb-time privileges it had..?
How would you reconcile that with default privileges (which we don't
have for schemas right now, but it's been proposed...).

This case is about a pg_dump, which is a very different case in that we
want to recreate the state of the system as it existed at the time of
the dump.

> I gather that
> is what you are proposing as a special case for the public schema.
> Just wondering why it should not be the general case?

Not quite..  This is about what pg_dump does when a -c is used.
Unfortunately, it's *already* doing something special with the public
schema (if it wasn't, then this wouldn't really be an issue..).  This is
just about making it do the right thing in that already-existing
special-case.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] intentional or oversight? pg_dump -c does not restore default priviliges on schema public

2017-02-13 Thread Stephen Frost
Frank,

* Frank van Vugt (ftm.van.v...@foxi.nl) wrote:
> Op zaterdag 11 februari 2017 15:28:55 schreef Tom Lane:
> > I'm inclined to argue that it was a mistake to include any non-pinned
> > objects in pg_init_privs.
> 
> > We might need to fix pg_dump too, but I think these entries in
> > pg_init_privs should simply not be there.
> 
> Thanks for picking this up, I'll probably see this subject pop up on hackers 
> and/or committers at some point ;)

We should be able to get it addressed shortly.

> Allow me to emphasize that this issue basically means that for v9.6 after 
> restoring a dump created with the '-c' option one ends up in a situation that 
> might be quite confusing for users that didn't have to pay much attention yet 
> to handling priviliges... i.e. trying even a plain select on table_a in the 
> public schema as a non-system user returns something like:
>   ERROR:  relation "table_a" does not exist

Yes, it's unfortunate that many users aren't really familiar with
schema-level privileges.

For your specific case, if you drop/recreate the public schema in the
system that you're dumping the data from, and then set the ACLs to what
you want, they should be dumped out, even with a pg_dump -c.  It's only
when you're using -c with the initdb-time public schema, and initdb-time
ACLs, that the issue arises.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] intentional or oversight? pg_dump -c does not restore default priviliges on schema public

2017-02-13 Thread Stephen Frost
Adrian,

* Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> I am following this up to the point of not understanding what
> exactly changed between 9.5 and 9.6. Namely 9.5 does include the
> default ACL's in the dump output and 9.6 does not.

Quite a bit in pg_dump changed, but the relevant bit here is that we now
try to include in the pg_dump output any ACLs which have been changed
from their initdb-time settings for initdb-time objects.  What that
means is that if you don't change the privileges for the public schema
from what they're set to at initdb-time, then we don't dump out any ACL
commands for the public schema.  That ends up being incorrect in '-c'
mode because we drop the public schema in that mode and recreate it, in
which case we need to re-implement the ACLs which existed for the public
schema at initdb-time.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] intentional or oversight? pg_dump -c does not restore default priviliges on schema public

2017-02-13 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost <sfr...@snowman.net> writes:
> > I'm not seeing a very simple answer for this, unfortunately.
> 
> I'm inclined to argue that it was a mistake to include any non-pinned
> objects in pg_init_privs.  The reason initdb leaves some objects unpinned
> is exactly because they can be dropped and recreated, and that means
> that their "initial" privileges are not static system properties.
> 
> We might need to fix pg_dump too, but I think these entries in
> pg_init_privs should simply not be there.

Wouldn't that mean we then need to drop and recreate all of those
non-pinned objects, with their ACLs, on a -c?

Or are you thinking we would just dump out whatever ACLs they have that
are different from default object creation, meaning that we'd have ACLs
for nearly all the views in pg_catalog and information_schema included
in regular pg_dump output?  Just dumping out the ACLs wouldn't
necessairly get us back to the same state as the database which was
dumped out though as the existing objects might already have some GRANTs
performed on them.

pg_init_privs also handles the initial privileges for extension objects,
though I'm guessing you aren't including those.  The initdb-time objects
are certainly similar though, in some ways, to extensions, the main
difference being that you can't really drop/recreate all of the initdb
time objects.

It seems to me that the oddball here is the public schema, and
specifically, the public schema when the user has been using the
from-initdb public schema (which has initdb-time privileges which are
recorded into pg_init_privs).  Had the user dropped and recreated the
public schema, the initdb-time privs for the original public schema in
pg_init_privs would have been removed, and the new public schema
wouldn't have had any entries in pg_init_privs.  The reason it's an
oddball is that it's the only from-initdb time object which is dropped
and recreated as part of pg_dump -c.  If we dropped all non-pinned
objects and recreated them when running pg_dump -c then perhaps what
you're suggesting would work, but we don't and I don't think that's
really what you were thinking we would do.

While it's a bit grotty, I tend to think the best approach is to
continue to assume that the only from-initdb-time object that the user
actually wants us to drop/recreate on a pg_dump -c is the public schema,
which means we need to handle things a bit differently when working in
-c mode for the public schema when pulling ACLs.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] intentional or oversight? pg_dump -c does not restore default priviliges on schema public

2017-02-11 Thread Stephen Frost
Greetings,

* Frank van Vugt (ftm.van.v...@foxi.nl) wrote:
> I noticed the following and wondered whether this is intentional or an 
> oversight in pg_dump's '-c' option?
> 
> The clean option causes the public schema to be dropped and recreated, but 
> this is done with the default schema priviliges, which are not the same as 
> the 
> ones assigned during create database:

Interesting.  The reason this happens is that the privileges for the
public schema aren't dumped when they are the same as what you would get
from a default install in 9.6+, but using -c will end up dropping and
recreating it, which, as you note, will end up having different
privileges than the default install because they'll be the regular
default privileges of "nothing" for schemas.

This is happening just for the public schema due to how it's handled in
a special way in pg_backup_archiver.c:_printTocEntry().  This only impacts
ACLs because those are the only things which are different for the
public schema vs. its initdb settings (there's no SECURITY LABEL, for
example, on the initdb'd public schema).

Due to how the public schema is (and always has been) handled in this
special way, this is a bug which needs to be fixed by having the default
ACLs for the public schema included in the dump output if -c is being
used.

I'm not seeing a very simple answer for this, unfortunately.  I'm
thinking we're going to need to pull the public schema's permissions
differently if we're in clean mode (by comparing to the default schema
privileges) vs. when we're not (in which case we should be comparing to
*public*'s initdb-time privileges, as we do now).  One option would be
to handle that by hacking up buildACLQueries() to take a flag which
basically means "we are dropping the public schema, do not consider its
pg_init_privs settings" but that strikes me as awful grotty.  Another
option would be to change getNamespaces() to run a special query
(perhaps as a UNION-ALL combination with the existing query) that is
just to get the info for the 'public' schema (and exclude the 'public'
schema from the first half of the query, of course).

Thanks for the report!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Can we not give tyrannical pedants control of #postgresql?

2017-02-01 Thread Stephen Frost
Greetings,

* Merlin Moncure (mmonc...@gmail.com) wrote:
> On Thu, Jan 19, 2017 at 5:23 PM, Julian Paul  wrote:
> > I hope that particular stereotypes aren't proven here, but it appears
> > #postgresql encourages a particular tier and makes aware of it's rigid
> > hierarchy. I owe alot to #postgresql but not to these particular users, I've
> > perhaps been idle for too long and the channel has change for the worse,
> > well that's not my fault. I leave it with the community to sort out.
> 
> I haven't been on irc much lately, but I've noticed this trend as well.

I'm on it pretty regularly, though I wasn't when the event which started
this thread happened, so I can't really speak to it and that's why I
hadn't responded.

In general, I feel like the channel is quite welcoming to newcomers, but
there is often a bit of a learning curve and when others point things
out that can sometimes be off-putting (not unlike our mailing lists..).

In any case, I'm happy to try and help out if people feel that there's
abuse of OPs or inappropriate behavior.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: R: [GENERAL] Partitioned "views"

2017-01-22 Thread Stephen Frost
Greetings,

* Job (j...@colliniconsulting.it) wrote:
> >>The mat view takes longer and longer to update because it runs the full 
> >>query.  What you really want to do is have a side-table that you update 
> >>regularly with appropriate SQL to issue UPDATE statements for just the 
> >>current day (or whatever).
> 
> If correct, i leave only last datas into "side-table" and syncronize only 
> these datas in the materialized view.
> If i delete datas from the side-table (ex- truncate) during the next 
> materialized view update they will be lost or remain intact?

I was suggesting that you, essentially, write your own SQL to have a
materialized view, *not* use the PG materialized view system.

In other words, the 'side-table' that you create would be *your*
materialized view, but to PG, it'd just look like a regular table.

You can't modify a PG materialized view.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Partitioned "views"

2017-01-22 Thread Stephen Frost
Greetings,

* Job (j...@colliniconsulting.it) wrote:
> we use a materialized view to aggregate datas from a very big table 
> containing logs.
> The source table is partitioned, one table for a day.
> 
> Since the refresh of the materialized view seems to grow a lot about timing, 
> we would like to know if it is pssible to make a "partitioned materialized 
> views", in order to update *only* current day logs aggregation, leaving old 
> days untouchable.

You can do this, but you have to write the SQL for it yourself, there's
no current way in PG to ask for a materialized view to be partitioned.

The mat view takes longer and longer to update because it runs the full
query.  What you really want to do is have a side-table that you update
regularly with appropriate SQL to issue UPDATE statements for just the
current day (or whatever).
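
A minimal sketch of that idea, assuming a hand-rolled aggregate table
(all of the names here are made up), run periodically from cron or
similar:

    BEGIN;
    DELETE FROM webtraffic_daily_agg WHERE day = current_date;
    INSERT INTO webtraffic_daily_agg (day, profile, hits)
    SELECT "timestamp"::date, profile, sum(accessi)
      FROM webtraffic_archive
     WHERE "timestamp"::date = current_date
     GROUP BY 1, 2;
    COMMIT;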

Another approach which can be used is to have a trigger which will
automatically update the side-table for every change to the 'big' table,
but that will mean every update on the 'big' table takes longer and if
the updates are happening concurrently then you may run into locking,
and possibly deadlocking if it's done wrong.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Doubts regarding postgres Security

2017-01-21 Thread Stephen Frost
* Thomas Kellerer (spam_ea...@gmx.net) wrote:
> Stephen Frost schrieb am 21.01.2017 um 22:37:
> >>So, there is no solution for my first question, we need if users enter the
> >>wrong password more than 5 times than their account gets locked and then
> >>only DBA will unlock this account.
> >
> >I understood the question and there is an answer- use PAM.
> 
> What about LDAP?

Perhaps, but not what I'd recommend.  If you've got LDAP then you
probably have Active Directory and you should just be using GSSAPI.

If you've actually *only* got LDAP, then perhaps, but all the LDAP
methods require the user's password to be sent, in one form or another,
to the PG server, which isn't ideal, and usually there's a better option
in such environments.

Of course, PAM requires the user's password to be sent to the server
also, which is also why I don't generally recommend using it either, but
without more info ...

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Doubts regarding postgres Security

2017-01-21 Thread Stephen Frost
Pawan,

Please be sure to include the mailing list on replies, so others can
benefit from the discussion.  Also, please reply in-line, as I do below,
instead of top-posting.

* PAWAN SHARMA (er.pawanshr0...@gmail.com) wrote:
> So, there is no solution for my first question, we need if users enter the
> wrong password more than 5 times than their account gets locked and then
> only DBA will unlock this account.

I understood the question and there is an answer- use PAM.

The other option is to not use password-based authentication with PG and
instead use a better method such as certificate-based auth or GSSAPI.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Doubts regarding postgres Security

2017-01-21 Thread Stephen Frost
Pawan,

* PAWAN SHARMA (er.pawanshr0...@gmail.com) wrote:
> 1.  How can we set user account block feature after max number of
> invalid password entries?

There are ways to accomplish this, but they're unfortunately
complicated.  In the past, I've implemented these kinds of requirments
by using the 'PAM' auth method and configuring saslauthd and PAM with
various PAM modules.  Unfortunately, most PAM modules are not very well
suited to running as non-root or with alternate directories, so it can
be a bit tricky.

> 2.  How can we use SSL encryption in Postgres on Linux environment?

This can be a bit tricky too, depending on your PKI, but generally
speaking, you simply need to configure the SSL-relevant options in
postgresql.conf and then set SSLMODE=verify-ca (or verify-full,
depending on what you want to do).

Specific documentation is available here:

https://www.postgresql.org/docs/current/static/ssl-tcp.html
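
As a minimal sketch (the file names are just examples and depend on your
PKI), the server side would be along these lines:

    # postgresql.conf (server side)
    ssl = on
    ssl_cert_file = 'server.crt'
    ssl_key_file  = 'server.key'
    ssl_ca_file   = 'root.crt'

    # client side, e.g.:
    # psql "host=db.example.com dbname=mydb sslmode=verify-full"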

> 3.  How can we transparent data encryption in Postgres?

If you mean between PG and the filesystem, you could either use full
disk encryption provided by the OS, or something like pgcrypto.

Documentation for pgcrypto is here:

https://www.postgresql.org/docs/current/static/pgcrypto.html

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] PgPool or alternatives

2017-01-21 Thread Stephen Frost
Simon,

* Simon Windsor (simon.wind...@cornfield.me.uk) wrote:
> My employer wants to move from an in house Oracle solution to a
> cloud based Postgres system. The system will involve a number of
> data loaders running 24x7 feeding several Postgres Databases that
> will be used by internal applications and external customer
> applications.
> 
> For the record, internal and external applications make heavy use of
> Temporary tables, that are session related. This requirement means I
> cannot consider normal replication methods.
> 
> Is PgPool the only viable that will allow the system the data
> loaders to feed [n] databases that will be functional identical?

I'm not sure what you mean by 'functional identical', but I wouldn't
generally consider that to be a property of pgpool (or pgbouncer, or any
other connection pooler, really).

That said, my general feeling is that pgbouncer tends to be simpler,
faster, and less likely to introduce oddities that you don't expect.
The 'session' mode might work for you, though it might be debatable if
that really helps you all that much.  'transaction' mode is what I
usually recommend as it allows idle connections to be handled by
pgbouncer (unlike 'session' mode), but there are caveats to using that
mode, of course.
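
For what it's worth, a minimal pgbouncer.ini sketch for transaction mode
might look something like this (the names and numbers are placeholders,
not recommendations):

    [databases]
    mydb = host=127.0.0.1 port=5432 dbname=mydb

    [pgbouncer]
    listen_addr = *
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    pool_mode = transaction
    max_client_conn = 500
    default_pool_size = 20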

I'm a bit curious where you're thinking of using the connection pooler
also though.  If you have data loaders running 24x7 feeding data
constantly to PG, do you really need a connection pooler for those?
Connection poolers make a lot of sense for environments where there's
lots of down-time on the connection, but the less down-time, the less
they make sense.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Combining count() and row_number() as window functions

2017-01-19 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Thomas Kellerer  writes:
> > I assumed that the count() wouldn't increase the runtime of the query as 
> > the result of the row_number() can be used to calculate that. 
> 
> No such knowledge exists in Postgres.  Given our general approach in which
> functions (including window functions) are black boxes, it's hard to see
> how it could be done in a way that wasn't a ugly kluge.

No, but what's interesting about this is that the WindowAgg count(*)
query is, apparently, quite a bit slower than the subquery with regular
aggregate of count(*), but the WindowAgg plan is costed out as being
cheaper.

I put the two up if anyone else finds that easier to read:

https://explain.depesz.com/s/bc67
https://explain.depesz.com/s/UWZt

That said, it probably doesn't matter if it was costed cheaper since I
don't think we would actually consider running an aggregate with an
'OVER ()' clause as a regular aggregate instead of as a WindowAgg.  I
don't know how expensive it would be to consider such a path, but it
seems like it might not be too bad since you would only look at those
cases if it's an empty window clause, which should be cheap to check.

The other thing I wonder about is if there's some way we could make the
WindowAgg query faster through code changes in how the windowing
count(*) is called.

I've not really looked at any code, this is all pure speculation, so
feel free to ignore me if I'm completely off-base here. :)

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Are new connection/security features in order, given connection pooling?

2017-01-11 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Guyren Howe  writes:
> > I’m not following. What I would like is just a lightweight way to switch 
> > the connections to use a different role, or some moral equivalent, that 
> > would prevent an SQL injection from wrecking havoc. I’m not proposing 
> > anything that will change anything else about how the application is using 
> > the database.
> 
> > SET ROLE doesn’t work, because an SQL injection can just SET ROLE back to 
> > the privileged user.
> 
> > I would like a regime where there is no permanent privileged relationship 
> > between the client application and the database; a user would need to 
> > supply validating information that we can use to switch a connection to 
> > something with minimal privileges for just that role, for the duration of 
> > one session or transaction.
> 
> And what happens after that?  You can't reuse the session anyway, because
> there would be no way to switch it back to a more-privileged state so you
> could use it for another user ID.  (If there were a way, then a successful
> SQL-injection attack could use it too.)

That's based on an incorrect assumption that the state which allows a
connection to be reused by another user ID is 'more privileged'.

If we had a way to require authentication for a 'SET ROLE' call then you
would be able to do what Guyren is asking about and when you're done
with the session something issues 'RESET ROLE;' and you're back to a
role that *isn't* more privileged and to move to another user ID you
have to go through the authentication process again.

This isn't fantasy, it's been done, but it's more complicated than it
really should be, as noted up-thread.  There's also complications around
how to handle the authentication credentials so you aren't constantly
asking the user to re-authenticate, but that's not impossible to deal
with either, you just need something akin to the cookie/unique ID used
by every website that authenticates users today.

> If you end up having to dedicate each connection to a particular
> unprivileged userID, then you can just open the connection as that user
> to start with; a magic one-way privilege switch doesn't really help.

Well, to that I'd say "it depends."  There are certainly scenarios where
you have multiple userIDs and a connection pooler like pgbouncer which
handles the different connections to the database and it's a lot better
than making new connections because new connections are so painful and
slow to create.  This doesn't work great if you have thousands of
concurrently connected independent users, of course.

> We've discussed this problem repeatedly (you might want to search the
> archives a bit) and never found a solution that was both fully secure
> and did much of anything for connection-pooling scenarios.

I don't agree that this is unsolvable, but it would require things like
protocol-level changes which no one has had the gumption to work through
and propose.

In short, I agree with Guyren, there are features needed here that we
don't have and it would be a great deal better if we did.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] LYDB: Feasible to use PG roles instead of application-level security?

2016-12-30 Thread Stephen Frost
Greetings,

* Guyren Howe (guy...@gmail.com) wrote:
> it occurs to me to wonder whether it is practical to use PG’s own roles and 
> security model in lieu of using an application-level one.

The short answer is yes.

> It seems that the role system in PG is sufficient for most general purposes. 
> One could presumably also have a table with role names and associated 
> metainformation (email address etc) as needed.

Yup.  That can get a bit awkward if you have multiple databases inside
of a single cluster, as you would have to pick which database to put
that metainformation in, but that isn't a very big issue.

> If I have a system with many thousands of users, is it practical to manage 
> these users’ authentication and authorization using *just* Postgres?

For this, it really depends on if the PG authorization model matches the
requirements you have.  The PG auth model, particularly with RLS, is
extremely flexible but you would really need to evaluate what the exact
requirements are and how you would handle that with the PG auth model.
Of course, if there are just a few exceptions or complicated cases that
can't be satisfied directly with PG today, you could use security
definer functions.

One area that isn't fully addressed with the PG auth model today is
partial access to a certain column.  Consider a table where you want
users to have access to all of the rows and all of the columns *except*
for column X for rows where ID is > 1000.  The PG auth model today can
be used to say "you can't access column X" or to say "you can't access
rows where ID > 1000" but you can't combine those, yet.

I'm hopeful that we'll get there as there are definitely use-cases for
that kind of access control, but it's unlikely to happen for PG10.

> It occurs to me that some client frameworks might have issues with their 
> connection pools if those connections keep switching users, assuming they 
> even can, but let’s set that aside for now. Or perhaps every connection could 
> immediately do a SET USER before executing its connection?

Again, yes, connection poolers can be an issue, but it's possible to use
the role system and do a 'set role X' after having connected as some
user that has very little access.  The issue here is controlling that
role change- there's no direct way in PG today to require a password to
be provided when doing the role change, which is unfortunate.  One
approach to solving that with RLS is to use a security definer function
to change a record in a table that is then used in all RLS policies.
It's a bit complicated and doesn't involve doing 'set role' though, so
there are some trade-offs there.
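
Very roughly, the 'set role' flow looks like this (the role names are
placeholders, and it assumes the login role has been GRANTed membership
in the target roles):

    -- pooled connection is authenticated as a low-privilege login role
    SET ROLE alice;   -- switch to the application user for this request
    -- ... run alice's queries under her privileges and RLS policies ...
    RESET ROLE;       -- back to the login role before reusing the connection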

If you really want connection pooling and independent users in PG's role
system then you'll end up having to have the app code do the
authentication (or maybe auth to PG as the user and, if successful,
reconnect as the regular user and set role...  that's pretty awkward
though) and then connect and do the 'set role'.

One big question here, however, is if you're going to have thousands of
*concurrently connected* users.  Thousands of users shouldn't be too
much of an issue, but if they're all connected using PG's main auth
system then you'll have thousands of backend processes running.  That'll
end up causing some amount of overhead even if they're mostly idle.  If
your application can handle connecting/disconnecting pretty easily and
you have a relatively short timeout (though, ideally, not too short) then
perhaps your number of concurrent connections won't be too bad.

> This seems an attractive proposition from a security standpoint: if I use 
> row-level security pervasively, I can have a security system that’s nestled 
> nice and close to the data and presumably tricky to work around from a hacker 
> given direct access only to the client application.

If that's the threat model you want to address then you'll have to work
out the concurrent connections question.  One thing which can help is to
use a common user for 'read-only/public-access (or at least low-value)'
queries from the app, if there are such.

> Is this practical? Has anyone here done it? What might the caveats be?

Yes, yes, see above.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Is there a reason the "-v/verbose" switch is not printed with pg_dumpall --help

2016-12-24 Thread Stephen Frost
Daniel,

* Stephen Frost (sfr...@snowman.net) wrote:
> * Daniel Westermann (daniel.westerm...@dbi-services.com) wrote:
> > postgres@pgbox:/u01/app/postgres/local/dmk/ [PG961] pg_dumpall -V 
> > pg_dumpall (PostgreSQL) 9.6.1 
> > postgres@pgbox:/u01/app/postgres/local/dmk/ [PG961] pg_dumpall --help 
> > pg_dumpall extracts a PostgreSQL database cluster into an SQL script file. 
> 
> Hmm.  Looks like an oversight, will see about fixing it.

I've pushed a fix for this.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Querying dead rows

2016-12-23 Thread Stephen Frost
* Rakesh Kumar (rakeshkumar...@outlook.com) wrote:
> Is there a way to query dead rows (that is, rows which are dead and still not 
> cleaned up by Vacuum) using SQL.  I am asking this just as an academical 
> question.

CREATE EXTENSION pageinspect;

https://www.postgresql.org/docs/9.6/static/pageinspect.html
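
For example, a rough look at block 0 of a hypothetical table 'mytable'
(pageinspect requires superuser, and a non-zero t_xmax is only a crude
hint that a deleting or updating transaction has touched the tuple):

    SELECT lp, t_xmin, t_xmax, t_ctid
      FROM heap_page_items(get_raw_page('mytable', 0))
     WHERE t_xmax <> 0;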

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Is there a reason the "-v/verbose" switch is not printed with pg_dumpall --help

2016-12-23 Thread Stephen Frost
Daniel,

* Daniel Westermann (daniel.westerm...@dbi-services.com) wrote:
> postgres@pgbox:/u01/app/postgres/local/dmk/ [PG961] pg_dumpall -V 
> pg_dumpall (PostgreSQL) 9.6.1 
> postgres@pgbox:/u01/app/postgres/local/dmk/ [PG961] pg_dumpall --help 
> pg_dumpall extracts a PostgreSQL database cluster into an SQL script file. 

Hmm.  Looks like an oversight, will see about fixing it.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Recursive row level security policy

2016-12-17 Thread Stephen Frost
Simon,

* Simon Charette (charett...@gmail.com) wrote:
> Ahh makes sense, thanks for the explanation!
> 
> I was assuming USING() clauses were executed in the context of the
> owner of the policy, by passing RLS.

No, as with views, a USING() clause is executed as the caller not the
owner of the relation.  Security Definer functions can be used to
execute actions in the policy as another user.

Note that RLS won't be applied for the table owner either (unless the
relation has 'FORCE RLS' enabled for it), so you don't have to have
functions which are run as superuser to use the approach Joe
recommended.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Postgresql 9.6 and Big Data

2016-12-02 Thread Stephen Frost
Job,

* Job (j...@colliniconsulting.it) wrote:
> we are planning to store historically data into a Postgresql 9.6 table.

The question is less about what you're storing in PG and more about what
you're going to be doing with that data.

> We see on Postgresql limit that it could handle "big data".
> In fact, limit for a single table is 32 Tb.

Very few people would advocate storing a single, non-partitioned, 32TB
table in any database system.  As with most other database technologies,
you'll want to partition up your data.  The general rule of thumb is to
partition at the 100s-of-millions of tuples level which tends to lead to
tables which are somewhere between 5G and 100G.  At 100G per table, you
would end up with a few hundred tables to get up to 32TB, which is quite
manageable in PG.  Indeed, PG can support many more tables than that, but
putting more than a few hundred into a single inheritance structure
using constraint exclusion for the partitioning can lead to long
planning times.

> We need to archive this data to generate report and analysis views.

Is the historical data changing?  Are the queries changing?  Will you
primarily be performing queries which look at all of the data?

> Anyone has experienced Postgresql 9.6 with Database and table bigger than 
> four or five Terabytes?

Yes.

> Which hardware requirements has been used?

This depends a great deal on what you're actually doing.

> There were some problems or bottleneck with so big data?

If you're looking to run queries against 4 or 5 TBs at a time where all
of the tuples have to be considered (no index-based lookups, etc), then
you'll want to be doing parallel work.  With 9.6, you could try using
the parallel query capability, but even with 9.5 and earlier you could
pretty trivially write your own parallel query system by using multiple
connections to the database and it works very well.  The only downside
to using multiple queries is if you have ongoing changes happening and
you need a consistent view of the data- that's still possible to do with
multiple processes, but you have to do some prep work ahead of time to
extract out the keys for all of the records you want to process in a
given run and store them in a static side-table that the parallel
processes then use.  You have to avoid removing records that are
included in that set, of course, and depending on your requirements you
might wish to avoid updating them also. 
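
A very rough sketch of that prep work (the names are invented), with each
worker then taking a disjoint slice of the key set over its own
connection:

    -- one-time setup, before the workers start: snapshot the keys
    CREATE TABLE work_keys AS
      SELECT id FROM big_table;

    -- worker 0 of, say, 4 then runs:
    SELECT count(*)
      FROM big_table b
      JOIN work_keys k USING (id)
     WHERE k.id % 4 = 0;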

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Row level security performance joining large tables

2016-11-10 Thread Stephen Frost
David,

* David R. Pike (david.p...@trustedconcepts.com) wrote:
> From what I can understand the RLS implementation strives to execute policy 
> checks before user provided predicate checks so as to avoid leaking protected 
> data.  Is there any way to make the join look "safe" to the optimizer to 
> avoid full table scans?  Isn't this a common scenario?

You can use a security barrier view which is owned by the same user that
the tables underneath are owned by, that will bypass RLS on the tables
themselves and therefore you'll need to implement the appropriate quals
in the security barrier view.
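
A rough sketch of that shape (every name here is invented; the important
parts are the security_barrier option and the view being owned by the
owner of the underlying table):

    -- run as the owner of protected_table
    CREATE VIEW protected_view WITH (security_barrier) AS
      SELECT *
        FROM protected_table
       WHERE tenant_id = current_setting('app.tenant_id')::int;

    GRANT SELECT ON protected_view TO app_role;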

As Tom mentions, we're working to improve RLS optimization as well.  As
is pretty common with various features, the initial implementation
provides the functionality but perhaps isn't as performant as one might
like, and then we iterate and improve it in the subsequent releases.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] avoiding index on incremental column

2016-10-17 Thread Stephen Frost
* t.dalpo...@gmail.com (t.dalpo...@gmail.com) wrote:
> I've a very huge table whose 1st column is a numeric value, starting
> from 0 at the 1st row and incremented by 1 each new row I inserted.
> No holes, no duplicates.
> I need to perform some very fast query based on this value, mainly
> around the last inserted rows.
> What is the best I can do? A normal index or is there a way to
> instruct the system to take advantage from that strong order?
> In theory, the number of the row is the only info the system would
> need to directly access that row.
> So I'd like to avoid useless complex indexes if possible.

A BRIN index should work pretty well in that scenario.

A btree index would most likely be better/faster for query time, but
more expensive to maintain.
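
Either is a one-liner; as a sketch, assuming the table is called
'mytable' and the column 'id':

    CREATE INDEX mytable_id_brin  ON mytable USING brin  (id);
    -- or, if query speed matters more than maintenance cost:
    CREATE INDEX mytable_id_btree ON mytable USING btree (id);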

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] BRIN indexes and ORDER BY

2016-10-05 Thread Stephen Frost
Darren,

* Darren Lafreniere (dlafreni...@onezero.com) wrote:
> Tom Lane  wrote:
> > > Gavin Wahl wrote:
> > >> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
> > >> just find the page range with the largest/smallest value, and then only
> > >> scan that one. Would that be hard to implement? I'm interested in
> > working
> > >> on it if someone can give me some pointers.
> >
> > I think this proposal is fairly broken anyway.  The page range with the
> > largest max-value may once have contained the largest live row, but
> > there's no guarantee that it still does.  It might even be completely
> > empty.  You could imagine an algorithm like this:
> >
> > 1. Find page-range with largest max.  Scan it to identify live row with
> > largest value.  If *no* live values, find page-range with next largest
> > max, repeat until no page ranges remain (whereupon return NULL).
> >
> > 2. For each remaining page-range whose indexed max exceeds the value
> > currently in hand, scan that page-range to see if any value exceeds
> > the one in hand, replacing the value if so.
> >
> > This'd probably allow you to omit scanning some of the page-ranges
> > in the table, but in a lot of cases you'd end up scanning many of them;
> > and you'd need a lot of working state to remember which ranges you'd
> > already looked at.  It'd certainly always be a lot more expensive than
> > answering the same question with a btree index, because in no case do
> > you get to avoid scanning the entire contents of the index.
[...]
> A b-tree index would certainly be faster for ordering. But in scenarios
> where you have huge datasets that can't afford the space or update time
> required for b-tree, could such a BRIN-accelerated ordering algorithm at
> least be faster than ordering with no index?

For at least some of the common BRIN use-cases, where the rows are
inserted in-order and never/very-rarely modified or deleted, this
approach would work very well.

Certainly, using this would be much cheaper than a seqscan/top-N sort,
for small values of 'N', relative to the number of rows in the table,
in those cases.

In general, I like the idea of supporting this as BRIN indexes strike me
as very good for very large tables which have highly clumped data in
them and being able to do a top-N query on those can be very useful at
times.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Multi tenancy : schema vs databases

2016-09-30 Thread Stephen Frost
* Jeff Janes (jeff.ja...@gmail.com) wrote:
> On Fri, Sep 30, 2016 at 2:06 AM, Rakesh Kumar 
> wrote:
> > We require complete data isolation. Absolutely nothing should be shared
> > between two tenants.
> 
> Then you need different clusters per tenant.  Otherwise, the WAL records of
> different tenants are inextricably mingled together.

Different clusters are also required to have independent file-level
backups, independent roles, independent tablespaces, etc.

It's also far easier to move a single cluster from one system to another
to adjust for growth than to try and move an individual schema or
database.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Privileges on public schema can't be revoked?

2016-09-06 Thread Stephen Frost
Greg,

* Greg Fodor (gfo...@gmail.com) wrote:
> A, I wasn't aware of the PUBLIC meta-role. Not sure if it's useful
> feedback, I spent a lot of time digging around the web for solutions
> that would basically let me query the database to see all of the
> effective privileges for a user, and none of the solutions I found
> were able to get me to a point where this was clear, since they all
> queried against the various information schema tables that I think
> neglect to take into account the PUBLIC meta-role.

\dn+ in psql will give you the access privileges for all schemas.

I'd have to look at the "other solutions" you're referring to, but, in
general, we do not exclude the public role in any way from the access
privilege system.  The \dn+ above should return something like:

postgres=UC/postgres+
=UC/postgres

for the Access Privileges column for the public schema, which shows that
the 'postgres' role and the '' role (aka, 'public') have been granted
both USAGE and CREATE on that schema. 

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Privileges on public schema can't be revoked?

2016-09-06 Thread Stephen Frost
Greg,

* Greg Fodor (gfo...@gmail.com) wrote:
> Apologies in advance about this since it is likely something obvious,
> but I am seeing some very basic behavior that does not make sense.
> I've tested this on a fresh build of 9.6rc1 and also 9.1.24 (just to
> see if it was a regression.) After creating a test database, and a
> test user that I revoke all privileges on the public schema to, yet
> that user is still able to create tables in the public schema. Revokes
> on other schemas work as expected, it seems the public schema is
> treated specially.
> 
> https://gist.github.com/gfodor/c360683f25f55497c8c657255fd0e0f8
> 
> Any help appreciated!

The privilege on the public schema was granted to "public" and therefore
must be revoked from "public" to remove that privilege.  Revoking from
"guy" doesn't have any effect.

Note that if you revoke all privileges from 'public' then only users who
have been explicitly granted access will be able to create or *use* any
objects in the public schema.

Generally, I revoke CREATE rights from the public schema, but leave
USAGE rights, as I then put trusted extensions and other tools into the
public schema.
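
In other words, something along these lines (the role name is only an
example):

    REVOKE CREATE ON SCHEMA public FROM PUBLIC;
    -- USAGE stays granted to PUBLIC; hand CREATE back selectively:
    GRANT CREATE ON SCHEMA public TO trusted_role;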

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] restore a specific schema from physical backup

2016-07-30 Thread Stephen Frost
* David Steele (da...@pgmasters.net) wrote:
> On 7/29/16 5:31 PM, Rakesh Kumar wrote:
> > Sure.
> > 
> > 1 - You ran pg_basebackup on node-1 against a live cluster and store
> > it on NFS or tape.
> > 2 - Do a  restore on node-2 from the backup taken on (1), but only for
> > a subset of the database
> >  (schema/database)
> > 3- Put the cluster live on node-2 after (2) completes. Essentially the
> > cluster will now be a small
> > subset of cluster on node-1.
> > 
> > Benefit: If I have to restore only 5% of entire db, it should be lot faster.
> 
> pgBackRest allows specified databases to be restored from a cluster backup:
> 
> http://www.pgbackrest.org/user-guide.html#restore/option-db-include
> 
> I know you are interested in schema-level restores but this is the
> closest thing that I know of.

We have discussed providing the ability to restore a subset of a
database from a physical backup, but it's far from trivial.  Working out
what files contain the catalog requires first reading through
pg_filenode.map and then understanding the structures of the relevant
catalogs.  Only then will you know what schemas and tables exist and
what their relfilenodes are, which is necessary to perform the restore
of those objects.

Of course, WAL replay still has to be performed also, to reach a
consistent backup point.  We've worked out how to get that to work,
though if you have a lot of WAL then that can still take a bit of time
and disk space.

With sufficient interest and resources, we might be able to make it
happen, but I wouldn't expect it near-term.  Until then, at least the
database-level option, as David mentioned, can be used.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Allowing multiple versions of PG under NetBSD

2016-07-29 Thread Stephen Frost
* Larry Rosenman (l...@lerctr.org) wrote:
> On 2016-07-29 15:14, Bruce Momjian wrote:
> >On Fri, Jul 29, 2016 at 03:09:59PM -0500, Larry Rosenman wrote:
> >>>Data Directory naming, as well as keeping the init-scripts straight.
> >>>
> >>And who gets 5432, and Unix socket naming, it starts to get messy.
> >
> >Well, pg_upgrade uses a socket created in the current run directory, so
> >that should be fine.
> if we're talking JUST for pg_upgrade, that's one thing.  Peaceful
> co-existence on an ongoing basis
> is quite another.

It's not an insurmountable problem, though it's a bit painful.  Still,
both the Debian-based and RedHat-based distributions demonstrate how it
can be done.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Uber migrated from Postgres to MySQL

2016-07-29 Thread Stephen Frost
Jerry,

* Jerry Sievers (gsiever...@comcast.net) wrote:
> Bruce Momjian  writes:
> > I agree, but I am not sure how to improve it.  The big complaint I have
> > heard is that once you upgrade and open up writes on the upgraded
> > server, you can't re-apply those writes to the old server if you need to
> > fall back to the old server.  I also don't see how to improve that either.
> 
> Hmmm, is it at least theoretically possible that if a newly upgraded
> system were run for an interval where *no* incompatible changes to DDL
> etc had been done...
> 
> ...that a downgrade could be performed?
> 
> Er, using a not yet invented pg_downgrade:-)

The short answer is 'no'.  Consider a case like the GIN page changes- as
soon as you execute DML on a column that has a GIN index on it, we're
going to rewrite that page using a newer version of the page format and
an older version of PG isn't going to understand it.

Those kind of on-disk changes are, I suspect, why you have to set the
"compatibility" option in the big $O product to be able to later do a
downgrade.

> That is, since higher version knew enough about lower version to
> rejigger everything...  just maybe it could do the reverse.

That might work if you opened the database in read-only mode, but not
once you start making changes.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] pg_dumping extensions having sequences with 9.6beta3

2016-07-26 Thread Stephen Frost
Michael,

* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Tue, Jul 26, 2016 at 4:50 PM, Noah Misch  wrote:
> > [Action required within 72 hours.  This is a generic notification.]
> >
> > The above-described topic is currently a PostgreSQL 9.6 open item.  Stephen,
> > since you committed the patch believed to have created it, you own this open
> > item.  If some other commit is more relevant or if this does not belong as a
> > 9.6 open item, please let us know.  Otherwise, please observe the policy on
> > open item ownership[1] and send a status update within 72 hours of this
> > message.  Include a date for your subsequent status update.  Testers may
> > discover new open items at any time, and I want to plan to get them all 
> > fixed
> > well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
> > efforts toward speedy resolution.  Thanks.
> >
> > [1] 
> > http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com
> 
> I am not sure what's Stephen's status on this item, but I am planning
> to look at it within the next 24 hours.

That'd be great.  It's definitely on my list of things to look into, but
I'm extremely busy this week.  I hope to look into it on Friday; it would
be great to see what you find.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Memory usage per session

2016-07-08 Thread Stephen Frost
* amatv...@bitec.ru (amatv...@bitec.ru) wrote:
> > On 08/07/2016 14:11, amatv...@bitec.ru wrote:
> >> The test performs about 11K lines of code
> >> Memory usage per session:
> >> Oracle: about 5M
> >> MSSqlServer: about 4M
> >> postgreSql: about 160M
> 
> > Visual C???
> > You will have to run PostgreSQL on a proper Unix system to test for 
> > performance.
> Of cause we understand that unix is faster( damn fork :)))
> Our Current problem is memory:
>  (160m vs 5M) * 100 sessions  = 16G vs 0.5G
> We just can get "out of memory".
> :(((

Do you have 100 CPUs on this system which apparently doesn't have 16G
of RAM available for PG to use?

If not, you should probably consider connection pooling to reduce the
number of PG sessions to something approaching the number of CPUs/cores
you have in the system.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Fastest memmove in C

2016-07-07 Thread Stephen Frost
* Merlin Moncure (mmonc...@gmail.com) wrote:
> Well, testing is the key here. Microbechmarks demonstrating the value
> are not enough; proven statistically relevant benchmarks generated
> from postgres are the data points needed to make an assessment.  My
> recommendation would be to dynamically link in these routines to
> postgres and start running a battery of memory heavy tests (start with
> pgbench -S on a small database).  Ideally you could tease out some
> measurable benefit; license discussions and formal integration
> strategies are secondary to that IMO.

While I agree with this, I'm trying to figure out why this isn't being
incorporated into glibc instead..?  Perhaps I missed it, but I didn't
see a discussion of that in the article.  I'd certainly rather rely on
glibc if we can, though I know that we've ended up implementing our own
routines at times too.  On the other hand, if there's a reason the glibc
folks don't want this, we should consider that..

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] How safe is pg_basebackup + continuous archiving?

2016-06-30 Thread Stephen Frost
Greetings,

* Kaixi Luo (kaixi...@gmail.com) wrote:
> We use PostgreSQL at work and we do daily backups with pg_dump. After that
> we pg_restore the dump and check the database that there isn't any data
> corruption. As the database grows, the whole pg_dump / pg_restore cycle
> time is quickly approaching 24h, so we need to change strategies.

I've found this to be a good strategy also, but it's far from perfect.
Corruption can still occur, for example, in indexes on the primary
system.  Generally speaking, pg_dump doesn't exercise indexes and
therefore you won't notice if an index is corrupt.

> We've thought about using pg_basebackup + continuous archiving as an
> alternative backup process, but I have doubts regarding the safety of such
> procedure. As far as I know, pg_basebackup is done via rsync (and we also
> archive wals using rsync), so if by any chance  disk corruption occurs on
> the master server, the corruption would be carried over to our backup
> server.

This is correct, but checksums are now available in modern versions of
PG, which will detect disk corruption.  Those checksums would be carried
over to the backup server and could be verified there by using pg_dump
(note that this still wouldn't help with indexes, but you don't have
coverage there today anyway).
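
In the same spirit, a quick sketch of checking things from SQL on the
restored copy (this assumes the cluster was initialized with checksums,
and the table name is just an example):

  SHOW data_checksums;
  -- a sequential scan forces every heap page of the table to be read,
  -- so a corrupted page would be reported as an error
  SELECT count(*) FROM important_table;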

> How can we check for backup corruption in this case? Thanks you very much.

There has been some discussion about a specific tool for checking the
checksums throughout the entire system.  I don't know of anyone actively
working on that, unfortunately.

There are a number of tools available to help with online backups and
continuous archiving beyond pg_basebackup and having to hand-roll
scripts.  I'm personally biased towards and prefer pgBackRest, as I
helped start that project, but there are other tools, such as barman and
WAL-E, which would still be better than trying to implement everything
correctly on your own.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] WAL's listing in pg_xlog by some sql query

2016-06-08 Thread Stephen Frost
* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Mon, Jun 6, 2016 at 9:51 PM, Stephen Frost <sfr...@snowman.net> wrote:
> > * Vik Fearing (v...@2ndquadrant.fr) wrote:
> >> On 03/06/16 04:32, Michael Paquier wrote:
> >> > On Fri, Jun 3, 2016 at 11:23 AM, Sameer Kumar <sameer.ku...@ashnik.com> 
> >> > wrote:
> >> >> On Fri, Jun 3, 2016 at 4:30 AM Stephen Frost <sfr...@snowman.net> wrote:
> >> >>> Given the usefulness of this specific query and that it could be used
> >> >>> without risk of the user being able to gain superuser access through 
> >> >>> it,
> >> >>> I'd like to see a new function added which does not have the superuser
> >> >>> check, but is not allowed to be called by public initially either.
> >>
> >> CREATE FUNCTION ls_dir(text)
> >>  RETURNS SETOF text
> >>  LANGUAGE sql
> >>  SECURITY DEFINER
> >> AS 'select * from pg_ls_dir($1)';
> >
> > This isn't a good idea as it allows access to a great deal more than
> > just the number of xlogs.  Further, as described above, it gives that
> > access to everyone and not just to specific roles.
> 
> Sure, because it is possible to a path string at will. In the context
> of this use case, Alex could just hardcode pg_xlog and nothing else
> than the list of files in this path would be leaked to a user who has
> the execution grant right of this function.

Part of what I was getting at is that it's better for a specific
function to be designed and implemented carefully than everyone having
to write their own (often with poor results, as above).  For a common
use-case, such as this, that function would be best provided as part of
core rather than out in a contrib module or only in the mailing list
archives or similar.

> > This is a great example of why we should provide an explicit function
> > which is documented (both in our documentation and in the documentation
> > of tools like check_postgres.pl) that users can use and can GRANT access
> > to for their monitoring systems which gives access to only the
> > information needed- that is, the number of xlog segments.
> 
> I have been wondering for some time now about the possibility to have
> at SQL level a representation of the shared memory structure
> XLogCtlData instead, though there is no tracking of what is the newest
> segment that has been recycled ahead, but it could be a reason to
> track that as well.

I've not looked into it myself, but off-hand that does seem useful.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] postgres_fdw and Kerberos authentication

2016-06-06 Thread Stephen Frost
Jean-Marc,

* Jean-Marc Lessard (jean-marc.less...@ultra-ft.com) wrote:
> Stephen Frost [sfr...@snowman.net]  wrote:
> > The database owner operating system user has to be trusted, along with any 
> > superusers in the database, but if you assume those, then having PG manage 
> > the different Kerberos cache files
> > (one for each backend which has authenticated via Kerberos and passed 
> > through delegation credentials) should work.
> > Clearly, we can't give the user control over which credential cache to use.
> 
> True, in such a case (single sign on) the user should not specify a user in 
> the user mapping, so that its own Kerberos ticket be used to authenticate.

I don't know that it's actually an issue for the user to specify the
mapping- either it'll be allowed or not, based on the credentials in the
Kerberos cache and pg_ident mappings.  What we can't do is allow the
user to control which cache they are able to use.

In other words, there should be one credential cache per backend process
and that holds exactly the credentials which are forwarded from the
client system.

> > Having to trust the OS user and superusers with those credentials isn't any 
> > different from using passwords with postgres_fdw.
> 
> OS user and superusers, should not have access and allowed to manage the 
> credential files.

This isn't possible with traditional Unix permissions.  Perhaps
something could be done with SELinux, but we're not going to depend on
that.

Ultimately, the credential cache must be available to the backend
process, which runs as the OS user.  The PG superuser can execute
arbitrary commands as the OS user, so there isn't any distinction
between the OS user and the PG superuser.

As mentioned up-thread, this is exactly the same as Apache, except that
Apache happens to run as root whereas we run as a non-root user.

> For example, in a secure environment with separation of duties at the 
> organization level (tier1, tier3, superuser, sys admins, etc), the tier1 DB 
> users cannot connect onto the DB server (as OS user), but may move data form 
> one database to another.

Sure, I assumed that we were discussing a case where DB users connect to
the database, not log on to the DB server as an OS user.

> I agree that tier1 users cannot query the catalog and see other user 
> password, but a superuser can, which is considered a security breach by 
> auditors.
> Storing a password in plain text even for a short period of time is 
> unfortunately not authorized.

Agreed.  This isn't the same as a Kerberos credential cache, but it's
not as far different as one might assume either.  The superuser will be
able to access the credential cache of anyone who has forwarded their
Kerberos ticket to the server, which is the same for any environment
that allows Kerberos credential proxying.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] WAL's listing in pg_xlog by some sql query

2016-06-06 Thread Stephen Frost
* Vik Fearing (v...@2ndquadrant.fr) wrote:
> On 03/06/16 04:32, Michael Paquier wrote:
> > On Fri, Jun 3, 2016 at 11:23 AM, Sameer Kumar <sameer.ku...@ashnik.com> 
> > wrote:
> >> On Fri, Jun 3, 2016 at 4:30 AM Stephen Frost <sfr...@snowman.net> wrote:
> >>> Given the usefulness of this specific query and that it could be used
> >>> without risk of the user being able to gain superuser access through it,
> >>> I'd like to see a new function added which does not have the superuser
> >>> check, but is not allowed to be called by public initially either.
> 
> CREATE FUNCTION ls_dir(text)
>  RETURNS SETOF text
>  LANGUAGE sql
>  SECURITY DEFINER
> AS 'select * from pg_ls_dir($1)';

This isn't a good idea as it allows access to a great deal more than
just the number of xlogs.  Further, as described above, it gives that
access to everyone and not just to specific roles.

This is a great example of why we should provide an explicit function
which is documented (both in our documentation and in the documentation
of tools like check_postgres.pl) that users can use and can GRANT access
to for their monitoring systems which gives access to only the
information needed- that is, the number of xlog segments.
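
Until such a function exists in core, a carefully locked-down wrapper can
approximate it.  A minimal sketch (the function and role names here are
hypothetical, and hard-coding the path avoids the problem above):

  CREATE FUNCTION pg_xlog_file_list()
    RETURNS SETOF text
    LANGUAGE sql
    SECURITY DEFINER
    SET search_path = pg_catalog
  AS $$ SELECT pg_ls_dir('pg_xlog') $$;

  REVOKE EXECUTE ON FUNCTION pg_xlog_file_list() FROM PUBLIC;
  GRANT EXECUTE ON FUNCTION pg_xlog_file_list() TO monitoring;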

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] psql remote shell command

2016-06-02 Thread Stephen Frost
* David G. Johnston (david.g.johns...@gmail.com) wrote:
> On Thu, Jun 2, 2016 at 7:30 PM, Dennis  wrote:
> > Is it possible to execute command in on system the is hosting postgresql
> > remotely using psql or other mechanism?  I know I can use \! in psql but
> > that executes the commands on the host where I am running psql from.  Also,
> > is it possible for a postgres login/user to stop or restart a running
> > postgres instance from in side psql or similar client. e.g. psql -c “pg_ctl
> > stop -D some_data_dir”
> 
> ​ssh user@hostname ?
> 
> You should ask your system administrator for help.  Odds are if you cannot
> do "ssh user@hostname" then the person hosting the server doesn't want you
> to be able to execute arbitrary commands on the host.

To be fair, if the administrator really didn't want you to have access
to the postgres unix user account on the system, they shouldn't give you
a PG superuser account.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] psql remote shell command

2016-06-02 Thread Stephen Frost
* Dennis (denn...@visi.com) wrote:
> Is it possible to execute command in on system the is hosting postgresql 
> remotely using psql or other mechanism?  I know I can use \! in psql but that 
> executes the commands on the host where I am running psql from.  Also, is it 
> possible for a postgres login/user to stop or restart a running postgres 
> instance from in side psql or similar client. e.g. psql -c “pg_ctl stop -D 
> some_data_dir”

You can use COPY with PROGRAM, assuming you're a superuser.

COPY (select 1) TO PROGRAM 'whatever command';

You could use that to issue a pg_ctl stop, though you'll then lose your
connection to the database.

You can 'reload' the running configuration by using:

select pg_reload_conf();

Not all parameters can be changed while PG is running, but most of them
can be.  Parameters in postgresql.conf can be updated via ALTER SYSTEM
through psql also.
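
For example (just a sketch- the parameter and value are arbitrary, and
ALTER SYSTEM requires 9.4 or later):

  ALTER SYSTEM SET work_mem = '64MB';
  SELECT pg_reload_conf();
  SHOW work_mem;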

Issuing a 'pg_ctl restart' via COPY PROGRAM isn't a good idea and may
not work, though I suppose you could try if you really wish to.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] WAL's listing in pg_xlog by some sql query

2016-06-02 Thread Stephen Frost
David,

* David G. Johnston (david.g.johns...@gmail.com) wrote:
> On Thu, Jun 2, 2016 at 4:29 PM, Stephen Frost <sfr...@snowman.net> wrote:
> 
> > * Sameer Kumar (sameer.ku...@ashnik.com) wrote:
> > > On Fri, 3 Jun 2016, 12:14 a.m. Alex Ignatov, <a.igna...@postgrespro.ru>
> > > wrote:
> > > > Can I list all WAL files in pg_xlog  by using some sql query in
> > Postgres?
> > >
> > > Try
> > >
> > > Select pg_ls_dir('pg_xlog');
> >
> > Note that this currently requires superuser privileges.
> >
> > Given the usefulness of this specific query and that it could be used
> > without risk of the user being able to gain superuser access through it,
> > I'd like to see a new function added which does not have the superuser
> > check, but is not allowed to be called by public initially either.
> >
> > Something along the lines of 'pg_xlog_file_list()', perhaps.  There is a
> > check in check_postgres.pl which could take advantage of this also.
> > Should be a very straight-forward function to write, perhaps good as a
> > starter project for someone.
> >
> 
> ​Isn't this the reason we created the newfangled pg_* roles in 9.6?

No, the default roles are specifically to address situations where our
GRANT system is unable to provide the privilege granularity necessary;
ie: the function needs to be executable by 'public' but should behave
differently depending on if the individual calling it has privileged
access or not.

In other words, a case like pg_cancel_backend/pg_terminate_backend, where
users can cancel queries of roles they are a member of, superusers can
cancel queries of all roles, and members of pg_signal_backend can
cancel queries for all non-superusers.
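
Handing that capability to, say, an operations role is then just a normal
GRANT (the role name 'ops' here is hypothetical):

  GRANT pg_signal_backend TO ops;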

In this case, I think we'd want a whole new function, in which case it
does not need to be callable by a non-privileged individual and does not
need to distinguish between a non-privileged user, a privileged user,
and superuser.

Technically, we could have the pg_ls_dir() function check its argument
and decide to allow it if some new default role 'pg_allow_xlog_ls'
existed and the user was a member of it, but that strikes me as a whole
lot of unnecessary complexity and potential for issue, not to mention
that it certainly wouldn't be very straight-forward to document or
explain to users.

The suggested function would also be able to take additional arguments,
or maybe a second column in the result set, to extract/identify subsets
of xlogs ("xlogs waiting to be archived via archive_cmd", "xlogs being
held due to wal_keep_segments", etc). 

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] WAL's listing in pg_xlog by some sql query

2016-06-02 Thread Stephen Frost
* Sameer Kumar (sameer.ku...@ashnik.com) wrote:
> On Fri, 3 Jun 2016, 12:14 a.m. Alex Ignatov, 
> wrote:
> > Can I list all WAL files in pg_xlog  by using some sql query in Postgres?
> 
> Try
> 
> Select pg_ls_dir('pg_xlog');

Note that this currently requires superuser privileges.

Given the usefulness of this specific query and that it could be used
without risk of the user being able to gain superuser access through it,
I'd like to see a new function added which does not have the superuser
check, but is not allowed to be called by public initially either.

Something along the lines of 'pg_xlog_file_list()', perhaps.  There is a
check in check_postgres.pl which could take advantage of this also.
Should be a very straight-forward function to write, perhaps good as a
starter project for someone.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] postgres_fdw and Kerberos authentication

2016-06-01 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Jean-Marc Lessard  writes:
> > A nice way to meet security requirements would be to provide single sign on 
> > support for the postgres_fdw.
> > As long as you have defined a user in the source and destination databases, 
> > and configure the Kerberos authentication you should be able to use 
> > postgres_fdw.
> 
> It's not really that easy, because postgres_fdw (like the server in
> general) is running as the database-owner operating system user.
> How will you associate a Postgres role that's responsible for a
> particular connection request with some Kerberos credentials,
> while keeping it away from credentials that belong to other roles?

That's actually not that difficult and is something which Apache and
mod_auth_kerb have been doing for a very long time.

> This is certainly something that'd be useful to have, but it's not
> clear how to do it in a secure fashion.

The database owner operating system user has to be trusted, along with
any superusers in the database, but if you assume those, then having PG
manage the different Kerberos cache files (one for each backend which
has authenticated via Kerberos and passed through delegation
credentials) should work.  Clearly, we can't give the user control over
which credential cache to use.

Having to trust the OS user and superusers with those credentials isn't
any different from using passwords with postgres_fdw.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Clarify "allow_system_table_mods"

2016-04-25 Thread Stephen Frost
* Melvin Davidson (melvin6...@gmail.com) wrote:
> Hmmm, if you go back a few comments, you will note that per initdb --help
> there is no such option available.

It's not an option *to* initdb, it's an option which is used *by*
initdb.

I'm afraid I'm done with this particular discussion.  Hopefully its
purpose is now clear and you understand a bit better what is required to
actually add a column to pg_class.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Clarify "allow_system_table_mods"

2016-04-25 Thread Stephen Frost
Melvin,

* Melvin Davidson (melvin6...@gmail.com) wrote:
> So in essence "*Allows modification of the structure of system tables" does
> NOT allow _structural_ changes (probably only dml changes)
> and the documentation should be changed to clarify.

That would imply that other changes are acceptable.  That is not the
case.  Certain DML changes could cause crashes too.

The documentation is pretty clear that this option is for initdb, and
not anyone else.  I'm unconvinced that we need anything more.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Clarify "allow_system_table_mods"

2016-04-25 Thread Stephen Frost
* Melvin Davidson (melvin6...@gmail.com) wrote:
> Then could you please clarify exactly what structural mods are permitted by
> *"*
> *Allows modification of the structure of system tables" ?*

I would say, in short, those modifications which are implemented by PG
developers who know what's safe to do with the catalog tables.

Beyond that, I'm afraid you'd need to read the source code.  I don't
know offhand the complete set of "what's safe to do", though it's
probably a pretty short list and certainly doesn't include adding
columns.

Ultimately, that switch isn't for end users to use to modify the
catalogs.  If you'd like to modify the structure of pg_class, you would
start by looking at src/include/catalog/pg_class.h, though there's quite
a few other bits that would need to change too.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Clarify "allow_system_table_mods"

2016-04-25 Thread Stephen Frost
David, Melvin,

* David G. Johnston (david.g.johns...@gmail.com) wrote:
> On Monday, April 25, 2016, Melvin Davidson  wrote:
> > I need clarification on allow_system_table_mods parameter
> > Per the documentation:
> > *Allows modification of the structure of system tables.* This is used by
> > initdb. This parameter can only be set at server start.
> >
> > However, attempting to modify pg_class to add another column fails with
> > "STATUS_ACCESS_VIOLATION" as below.
> > So either only certain system catalogs may be changed, or only certain
> > types of structure changes are allowed.

There might be some things about system catalogs you can modify, but
generally speaking, you can't add a column or otherwise change the
structure.  System catalogs are represented in memory by C structures
(and more), so it's not too surprising that adding a column causes a
crash.

> If it's going to prevent the action it should be able to do so without
> crashing the server..and I doubt it's intended to be opinionated aside from
> covering all DDL.  What little there is on Google seems to support this.

Sorry, but as a superuser, there are a lot of things you can do to crash
the server; this is just one way.  The effort required to prevent
anything bad from happening when a user is running as a superuser is far
from trivial.

In short, no, you can't just add a column to pg_class via SQL, and I
don't think we're going to be very interested in trying to "fix" such
cases.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Enhancement Request

2016-04-20 Thread Stephen Frost
Rob,

* Rob Brucks (rob.bru...@rackspace.com) wrote:
> I'd like to propose two enhancements to the PostgreSQL code, but I'm not sure 
> if this is the correct mailing list.  So if it's not then please let me know 
> where I need to post this.

This is the correct place.  I don't know why people are suggesting third
party sites, but the correct place is -general, as you've done, which is
fantastic.

> These are monitoring-centric enhancement requests since I'm trying to 
> implement accurate monitoring in a secure fashion.

I've been working on exactly this problem and 9.6 will (finally) have
the start of work to improve PostgreSQL in this area.  Your thoughts and
use-cases are exactly what we need to continue that effort and get to a
point where monitoring solutions can depend on PostgreSQL to provide the
features, capabilities, and information which they need, without
requiring the monitoring user to be a superuser.

> * General monitoring:
> We have a need for a "monitoring" role in PostgreSQL that has read-only 
> access to any "pg_stat" view.  As of 9.4, only a super-user can read all 
> columns of "pg_stat_activity", "pg_stat_replication", and "pg_stat_archiver" 
> (there may be other restricted views as well).  These views provide critical 
> insight on how well the cluster is operating and what is going on.

That was proposed and was called 'pg_monitor'.  Unfortunately, through a
lack of support and questions about such a role possibly being "too
broad", it ended up not being included for 9.6.  I'd very much like to
work through those issues and find a solution for post-9.6 (note that we
are past the feature freeze point for 9.6, so any further changes will
be for later versions).

> * Streaming Replication Monitoring:
> Make the "pg_stat_replication" view more persistent (maybe keep the rows for 
> 24 hours or have a registration process?).

I believe this is improved when working with replication slots in recent
versions.

> If anyone can provide insight on how I could accomplish these in a simple 
> manner by other means then I'm all ears!

Please continue to engage with the PostgreSQL community on these issues.
I agree that these are critical features which we really need to have
and will continue to work on them, but support from users, particularly
with detailed real-world use-cases, goes a very long way.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Postgresql 9.3.4 file system compatibility

2016-04-11 Thread Stephen Frost
* Albe Laurenz (laurenz.a...@wien.gv.at) wrote:
> Marllius wrote:
> > OCFS2 = oracle cluster file system 2
> 
> I think using OCFS2 for PostgreSQL data is a good idea if you want
> to be the first at something or try to discover bugs in OCFS2.

I've found that OCFS2 is a very decent clustered filesystem for smallish
environments (4 or 8 nodes).  We had trouble with larger clusters.

> Why do you want a cluster file system for PostgreSQL?  You cannot
> have more than one server access the same data at the same time
> anyway.

This is probably the better question.  Using a clustered filesystem can
be handy for redundancy, and you could use OCFS2 or DRBD for that, but
don't expect to be able to run multiple PG servers concurrently from the
same set of data files.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Really unique session ID - PID + connection timestamp?

2016-04-09 Thread Stephen Frost
* Christoph Berg (c...@df7cb.de) wrote:
> Re: Durumdara 2016-04-09 
> 
> > In MS the session id is smallint, so it can repeats after server restarts,
> > but my coll. found a "session creation timestamp".
> > This is a key which unique.
> > With this we can check for died sessions and we can clean their records.
> > 
> > We want create same mechanism.
> > I know there are adv. locks in PG, but I want to use session id.
> > 
> > This could be:
> > pg_backend_pid()
> > 
> > May pid repeats.
> > Where I can get timestamp or some other unique data with I can create a
> > combined primary key?
> 
> Btw, what you are describing is exactly what %c in log_line_prefix
> does.

That's not currently exposed at an SQL level anywhere though, is it?
Perhaps we should add a way to get that.  Another thought would be to
essentially expose 'log_line_prefix()' at the SQL level but I'm not sure
that generally makes sense.  We could also have an explicit function to
get MyStartTime, similar to pg_backend_pid().

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Really unique session ID - PID + connection timestamp?

2016-04-09 Thread Stephen Frost
Greetings,

* Durumdara (durumd...@gmail.com) wrote:
> Where I can get timestamp or some other unique data with I can create a
> combined primary key?

Not entirely sure about the rest of it, but if you want backend start
time, you can look at pg_stat_activity (which also happens to have the
pid).

If you want just your own, combine it with pg_backend_pid, as in:

select
  pid || ',' || backend_start
from pg_stat_activity where pid = pg_backend_pid();

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Only owners can ANALYZE tables...seems overly restrictive

2016-03-25 Thread Stephen Frost
David,

* David G. Johnston (david.g.johns...@gmail.com) wrote:
> On Thu, Mar 24, 2016 at 4:51 AM, Stephen Frost <sfr...@snowman.net> wrote:
> > I don't see any reason why the patch itself would be terribly difficult,
> > but are we sure we'd want just ANALYZE and not VACUUM also?  Which would
> > have to be another bit, since those are pretty different actions.
>
> In the limited experience that​ prompted this requested the benefit of
> performing a VACUUM is significantly less than the benefit of performing
> ANALYZE, and the cost of the former is considerably higher.  I'm quite
> content to leave VACUUM decisions to the auto-vacuum process which balances
> the benefit of removing bloat with the I/O cost of doing so.

I guess I don't entirely follow that logic.  autovacuum, even though
its name doesn't imply it, is *also* quite responsible for ensuring
that ANALYZE is done regularly on the tables and even has options to
control when ANALYZE is run which allow it to run more frequently than
vacuums.
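
As a sketch, those knobs can even be set per-table (the table name and
values here are only examples, not recommendations):

  ALTER TABLE etl_target SET (
      autovacuum_analyze_scale_factor = 0.02,
      autovacuum_analyze_threshold = 500
  );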

Further, a lot of ETL processes could have very good reason to want to
run a VACUUM, especially with the changes that we continue to make which
make that process less and less expensive to run.

> > The question really is- what other things might we want as grantable
> > rights in the future?  Once these 16 bits are gone, it's a whole bunch
> > of work to get more.
> 
> If I am reading parsenodes.h correctly we presently use only 12 of 16 bits
> and those that are present all seem ancient.  With no other existing need
> to add a single additional grantable option, let alone 4, I'm not see this
> as being particularly concerning.

They're not all ancient- TRUNCATE was added not that long ago and took
quite a few years of convincing before it was accepted (I asked for it
when I first started working on PG, some 15-or-so years ago and it
wasn't actually included until 3 or 4 years ago, iirc).

Further, as we add new features, new kinds of GRANTs can be needed.
Consider the case of auditing, for example.  When we finally get around
to adding support for proper in-core auditing, it may be desirable for
individuals other than the owner of a relation to be able to control the
auditing of the table.

> Let someone else argue for inclusion of VACUUM before considering adding it
> - all I believe that we need is ANALYZE.  I want programs doing ETL to be
> able to get the system into "good-enough" shape to be functional;
> maintenance processes can deal with the rest.

ANALYZE is a maintenance process too, really, so I don't entirely buy
your argument here.  Either we support having these maintenance-type
actions being performed by non-owners, or we don't and encourage
everyone to configure autovacuum to meet their needs.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Only owners can ANALYZE tables...seems overly restrictive

2016-03-24 Thread Stephen Frost
David,

* David G. Johnston (david.g.johns...@gmail.com) wrote:
> Which means that, aside from effort, the main blocking factors here are
> code complexity (which I understand) and limited grant "bits" as Stephen
> puts it.  So I pose the question: do any of the committers consider a grant
> bit too valuable to consume on an ANALYZE grant?

I wasn't referring to "bits" as "things" to do but rather as actual
zeros and ones- AclMode is a 32-bit integer, of which the bottom half are
'regular' grantable rights and the top half are "with grant option"
indications, meaning we've only got 16 to work with, and every object
uses AclMode, so we have to support *all* kinds of GRANTs with those 16
bits.

See src/include/nodes/parsenodes.h, around line 63.

> If that and/or general code complexity means this will not be added even if
> a patch was proposed for 9.7 then I'll move on and institute one of the
> hacks that has been proffered.  Otherwise I have (more than) half a mind to
> find some way to get a patch written.

I don't see any reason why the patch itself would be terribly difficult,
but are we sure we'd want just ANALYZE and not VACUUM also?  Which would
have to be another bit, since those are pretty different actions.

The question really is- what other things might we want as grantable
rights in the future?  Once these 16 bits are gone, it's a whole bunch
of work to get more.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] UPSERT and HOT-update

2016-03-19 Thread Stephen Frost
Daniel,

* CHENG Yuk-Pong, Daniel  (j16s...@gmail.com) wrote:
> I am doing a INSERT...ON CONFLICT...UPDATE.. on a table. The query is
> mostly-UPDATE and does not change any columns most of the time, like
> so:
> 
>   CREATE INDEX ON book(title);
>   INSERT INTO book (isbn, title, author, lastupdate) VALUES ($1,$2,$3, now())
> ON CONFLICT (isbn) DO UPDATE set title=excluded.title,
> author=excluded.author, lastupdate=excluded.lastupdate;
> 
> PostgreSQL seems to consider the title as changed and refused to do a
> HOT-update. It works if I remove the `title=...` part.
> 
> Are there any tricks to make it smarter? The title don't change most
> of the time after all.

If it's really that infrequent for the title to change, you could do
something like:

insert into book (isbn, title, author, lastupdate) values ('$1','$2',
'$3', now()) on conflict (isbn) do update set author=excluded.author,
lastupdate = excluded.lastupdate where book.title = excluded.title;

and if that doesn't change any rows then the title did change and you
need to run the command you have above.
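
Putting those two steps together, a minimal sketch in PL/pgSQL might look
like this (literal values stand in for your parameters, and FOUND is used
to detect that the conditional upsert touched no row):

  DO $$
  BEGIN
      INSERT INTO book (isbn, title, author, lastupdate)
      VALUES ('1234567890', 'A Title', 'An Author', now())
      ON CONFLICT (isbn) DO UPDATE
          SET author = excluded.author,
              lastupdate = excluded.lastupdate
          WHERE book.title = excluded.title;

      IF NOT FOUND THEN
          -- the title really did change, so run the full update
          INSERT INTO book (isbn, title, author, lastupdate)
          VALUES ('1234567890', 'A Title', 'An Author', now())
          ON CONFLICT (isbn) DO UPDATE
              SET title = excluded.title,
                  author = excluded.author,
                  lastupdate = excluded.lastupdate;
      END IF;
  END $$;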

What might be kind of neat would be to have multiple UPDATE clauses
allowed for the INSERT .. ON CONFLICT DO UPDATE and then you could have
different WHERE clauses and do it all in one command.

Another interesting idea would be a different kind of 'UPDATE SET'
operation (maybe '*=' or something?) which says "only change this if the
value actually changed."  There are clearly a lot of cases where that
behavior is desired.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Only owners can ANALYZE tables...seems overly restrictive

2016-02-29 Thread Stephen Frost
* David G. Johnston (david.g.johns...@gmail.com) wrote:
> On Mon, Feb 29, 2016 at 6:31 AM, Stephen Frost <sfr...@snowman.net> wrote:
> 
> > * David G. Johnston (david.g.johns...@gmail.com) wrote:
> > > Given the amount of damage a person with write access to a table can get
> > > into it seems pointless to not allow them to analyze the table after
> > their
> > > updates - since best practices would say that normal work with a table
> > > should not be performed by an owner.
> > >
> > > I should the check for whether a given user can or cannot analyze a table
> > > should be whether the user has INSERT, UPDATE, or DELETE privileges.
> >
> > Realistically, ANALYZE is a background/maintenance task that autovacuum
> > should be handling for you.
> 
> ​Then my recent experience of adding a bunch of records and having the
> subsequent select query take forever because the table wasn't analyzed is
> not supposed to happen?  What am I doing wrong then that autovacuum didn't
> run for me?​

Perhaps nothing.  Making autovacuum more aggressive is a trade-off and
evidently there weren't enough changes or perhaps not enough time for
autovacuum to realize it needed to kick in and re-analyze the table.
One thought about how to address that might be to have a given backend,
which is already sending stats info to the statistics collector, somehow
also bump autovacuum to wake it up from its sleep to go analyze the
tables just modified.  This is all very hand-wavy as I don't have time
at the moment to run it down, but I do think it'd be good to reduce the
need to run ANALYZE by hand after every data load.

> > > I suppose row-level-security might come into play here...
> >
> > Yes, you may only have access to a subset of the table.
> >
> >
> ​TBH, since you cannot see the data being analyzed I don't see a security
> implication here if you allow someone to ANALYZE the whole table even when
> RLS is in place.​

I wasn't looking at it from a security implication standpoint as I
suspect that any issue there could actually be addressed, if any exist.

What I was getting at is that you're making an assumption that any user
with DML rights on the table also has enough information about the table
overall to know when it makes sense to ANALYZE the table or not.  That's
a bit of a stretch to begin with, but when you consider that RLS may be
involved and the user may only have access to 1% (or less) of the
overall table, it's that much more of a reach.

> If we had plenty more bits to allow ANALYZE to be independently
> > GRANT'able, then maybe, but those are a limited resource.
> >
> 
> ​The planner and system performance seems important enough to give it such
> a resource.  But as I stated initially I personally believe that a user
> with INSERT/DELETE/UPDATE permissions on a table (maybe require all three)
> should also be allowed to ANALYZE said table.​

I don't think requiring all three would make any sense and would,
instead, simply be confusing.  I'm not completely against your general
idea, but let's keep it simple.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] Only owners can ANALYZE tables...seems overly restrictive

2016-02-29 Thread Stephen Frost
* David G. Johnston (david.g.johns...@gmail.com) wrote:
> Given the amount of damage a person with write access to a table can get
> into it seems pointless to not allow them to analyze the table after their
> updates - since best practices would say that normal work with a table
> should not be performed by an owner.
> 
> I should the check for whether a given user can or cannot analyze a table
> should be whether the user has INSERT, UPDATE, or DELETE privileges.

Realistically, ANALYZE is a background/maintenance task that autovacuum
should be handling for you.

> I suppose row-level-security might come into play here...

Yes, you may only have access to a subset of the table.

If we had plenty more bits to allow ANALYZE to be independently
GRANT'able, then maybe, but those are a limited resource.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] "plan should not reference subplan's variable" when using row level security

2016-02-24 Thread Stephen Frost
* Adam Guthrie (asguth...@gmail.com) wrote:
> On 24 February 2016 at 20:27, Stephen Frost <sfr...@snowman.net> wrote:
> > Yeah, looks like a bug to me.  My gut reaction is that we're pulling up
> > a subquery in a way that isn't possible and that plan shouldn't be
> > getting built/considered.
> 
> Thanks - shall I go ahead and submit a bug report?

Sure.

> > As a work-around, until we fix it, you could create an sql function to
> > check for the existance of the id in 'a' and use that in the policy
> > definition.
> 
> I've also discovered that using the following policy instead
> 
> CREATE POLICY a_select ON b FOR SELECT
> USING ( a_id IN (SELECT id FROM a) );
> 
> also seems to work around the issue.

Yes, that also works, but it could get painful if 'a' gets large.  An
SQL function like:

select exists (select * from a where a.id = $1);

would still use an indexed lookup against 'a'.
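
A minimal sketch of that workaround, using the names from your example
(the integer argument type and marking the function STABLE are my
assumptions):

  CREATE FUNCTION a_exists(integer) RETURNS boolean
      LANGUAGE sql STABLE
  AS $$ SELECT EXISTS (SELECT * FROM a WHERE a.id = $1) $$;

  CREATE POLICY a_select ON b FOR SELECT
      USING ( a_exists(a_id) );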

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [GENERAL] "plan should not reference subplan's variable" when using row level security

2016-02-24 Thread Stephen Frost
Adrian,

* Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> I started to work through this when I realized the
> permissions/attributes of the role test are not shown. This seems to
> be important as the UPDATE example works if you run it immediately
> after:
> 
> INSERT INTO b (id, a_id, text) VALUES (1, 1, 'one');

Not sure what you mean- the necessary permissions for the test role are
just the 'GRANT ALL ON ALL TABLES' which is included.

If you run the UPDATE immediately after the INSERT, then it's before
that GRANT and, more importantly, before the 'SET ROLE', meaning that
you're running it as the table owner, and the policy is ignored
(policies are not applied to the owner of the table, unless FORCE RLS is
used).
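
If you do want the policy applied to the owner as well, that's a one-line
change (a sketch, using the table from your example):

  ALTER TABLE b FORCE ROW LEVEL SECURITY;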

Thanks!

Stephen


signature.asc
Description: Digital signature


  1   2   3   4   5   >