Re: Inserts restricted to a trigger

2019-06-18 Thread Adrian Klaver

On 6/18/19 10:14 AM, Miles Elam wrote:
Thanks for the suggestion. Unfortunately we only have a single login 
role (it's a web app) and then we SET ROLE according to the contents of 
a JSON Web Token. So we end up with SESSION_USER as the logged in user 
and the active role as CURRENT_USER.


Have not tried it but nested function?:

1) Outer function runs as normal user and grabs the CURRENT_USER. This 
is passed into 2)


2) Audit function that runs with SECURITY DEFINER.

Other option is to record the CURRENT_USER in the table the trigger is 
on and just pass that to the audit function.
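
A rough, untested sketch of 1) + 2) with made-up names (audit_log,
some_table); the key point is that only the inner function is SECURITY
DEFINER, so current_user inside the trigger function is still the SET ROLE
role:

CREATE TABLE audit_log (
    id         bigserial PRIMARY KEY,
    acted_by   text NOT NULL,
    table_name text NOT NULL,
    action     text NOT NULL,
    logged_at  timestamptz NOT NULL DEFAULT now()
);
REVOKE ALL ON audit_log FROM PUBLIC;

-- 2) inner function, created by the privileged owner of audit_log and
--    SECURITY DEFINER so it can write to audit_log even though the calling
--    role has no privileges on it (in production you would also pin
--    search_path here)
CREATE FUNCTION audit_insert(p_acted_by text, p_table text, p_action text)
RETURNS void SECURITY DEFINER LANGUAGE sql AS $$
    INSERT INTO audit_log (acted_by, table_name, action)
    VALUES (p_acted_by, p_table, p_action);
$$;

-- 1) outer trigger function, runs with the caller's rights and passes the
--    active role down
CREATE FUNCTION some_table_audit() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    PERFORM audit_insert(current_user::text, TG_TABLE_NAME::text, TG_OP);
    RETURN NULL;  -- return value is ignored for AFTER row triggers
END;
$$;

CREATE TRIGGER some_table_audit_trg
    AFTER INSERT OR UPDATE OR DELETE ON some_table
    FOR EACH ROW EXECUTE PROCEDURE some_table_audit();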




It may be that we're just stuck with a gap and need to just try and keep 
track of our mutation points, such as limiting what is accessible through 
REST or GraphQL, and there is no way to fundamentally lock this down in 
Postgres. I was checking the mailing list to see if I'd missed anything.



On Tue, Jun 18, 2019 at 9:47 AM Torsten Förtsch wrote:


Have you tried session_user?

create function xx() returns table (cur text, sess text)
security definer language sql as $$
     select current_user::text, session_user::text;
$$;

Then log in as different user and:

=> select (xx()).*;
   cur    | sess
----------+-------
 postgres | write


On Tue, Jun 18, 2019 at 6:30 PM Miles Elam
<miles.e...@productops.com> wrote:

That seems straightforward. Unfortunately I also want to know
the user/role that performed the operation. If I use SECURITY
DEFINER, I get the superuser account back from CURRENT_USER, not
the actual user.

Sorry, should have included that in the original email. How do I
restrict access while still retaining info about the current
user/role?


On Mon, Jun 17, 2019 at 5:47 PM <r...@raf.org> wrote:

Adrian Klaver wrote:

 > On 6/17/19 4:54 PM, Miles Elam wrote:
 > > Is there a way to restrict direct access to a table for inserts but
 > > allow a trigger on another table to perform an insert for that user?
 > >
 > > I'm trying to implement an audit table without allowing user tampering
 > > with the audit information.
 >
 > Would the below not work?:
 > CREATE the table as superuser or other privileged user
 > Have trigger function run as above user(use SECURITY DEFINER)

and make sure not to give any other users insert/update/delete
permissions on the audit table.

 > > Thanks in advance,
 > >
 > > Miles Elam
 >
 > --
 > Adrian Klaver
 > adrian.kla...@aklaver.com 





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: psql UPDATE field [tab] expands to DEFAULT?

2019-06-18 Thread Adrian Klaver

On 6/18/19 3:23 PM, Ken Tanzer wrote:
On Mon, Jun 17, 2019 at 4:24 PM Adrian Klaver wrote:



My cheat for dealing with many/long column names is:

test=# \d up_test
                Table "public.up_test"
   Column |  Type   | Collation | Nullable | Default
+-+---+--+-
   id     | integer |           |          |
   col1   | boolean |           |          |
   col2   | integer |           |          |



test=# \pset format unaligned
Output format is unaligned.
test=# \pset fieldsep ','
Field separator is ",".

select * from up_test limit 0;
id,col1,col2

Cut and paste above.

test=# \pset fieldsep '|'
Field separator is "|".

test=# \pset format 'aligned'
Output format is aligned.


Just curious, but if you really do that often, wouldn't you be better 
off with something like this?


I could/should, I just don't do the above enough to get motivated to 
build a function. In most cases where I'm doing complicated updates I am 
not using psql; I am building them in Python from a dict.




CREATE OR REPLACE FUNCTION field_list( name ) RETURNS text AS $$

SELECT array_to_string(array_agg(column_name::text ORDER BY 
ordinal_position),',') FROM information_schema.columns WHERE table_name 
= $1;


$$ LANGUAGE sql STABLE;

Cheers,
Ken





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: psql UPDATE field [tab] expands to DEFAULT?

2019-06-18 Thread Ken Tanzer
On Mon, Jun 17, 2019 at 4:24 PM Adrian Klaver 
wrote:

>
> My cheat for dealing with many/long column names is:
>
> test=# \d up_test
>               Table "public.up_test"
>  Column |  Type   | Collation | Nullable | Default
> --------+---------+-----------+----------+---------
>  id     | integer |           |          |
>  col1   | boolean |           |          |
>  col2   | integer |           |          |
>
>
>
> test=# \pset format unaligned
> Output format is unaligned.
> test=# \pset fieldsep ','
> Field separator is ",".
>
> select * from up_test limit 0;
> id,col1,col2
>
> Cut and paste above.
>
> test=# \pset fieldsep '|'
> Field separator is "|".
>
> test=# \pset format 'aligned'
> Output format is aligned.
>
>
Just curious, but if you really do that often, wouldn't you be better off
with something like this?

CREATE OR REPLACE FUNCTION field_list( name ) RETURNS text AS $$

SELECT array_to_string(array_agg(column_name::text ORDER BY
ordinal_position),',') FROM information_schema.columns WHERE table_name =
$1;

$$ LANGUAGE sql STABLE;
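
For what it's worth, a hypothetical call against the up_test table from
earlier in the thread:

SELECT field_list('up_test');   -- expected: id,col1,col2

(It filters on table_name only, so identically named tables in other
schemas would get merged into one list.)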

Cheers,
Ken



-- 
AGENCY Software
A Free Software data system
By and for non-profits
http://agency-software.org/
https://demo.agency-software.org/client
ken.tan...@agency-software.org
(253) 245-3801

Subscribe to the mailing list to learn more about AGENCY or follow the
discussion.


Re: delete inside for plpgsql loop on same relation?

2019-06-18 Thread Tom Lane
Rob Nikander  writes:
> Are there guarantees about how this plpgsql behaves? It’s deleting from a 
> table while it loops over it. So far it seems like the delete is logically 
> after the select, as I hoped, and doesn’t interfere. 

> for row in select * from some_stuff loop
> delete from some_stuff where …
> ...
> end loop;

Sure.  A query will not see the effects of queries that start after it.
This isn't particularly plpgsql-specific.

Some qualifications are needed if you have triggers or volatile functions
in the first query (those *can* see later effects); but this usage seems
safe enough.

regards, tom lane




Re: perf tuning for 28 cores and 252GB RAM

2019-06-18 Thread Andres Freund
Hi,

On 2019-06-18 17:13:20 +0100, Fabio Ugo Venchiarutti wrote:
> Does the backend mmap() data files when that's possible?

No. That doesn't allow us to control when data is written back to disk,
which is crucial for durability/consistency.


> I've heard the "use the page cache" suggestion before, from users and
> hackers alike, but I never quite heard a solid argument dismissing potential
> overhead-related ill effects of the seek() & read() syscalls if they're
> needed, especially on many random page fetches.

We don't issue seek() for reads anymore in 12, instead do a pread() (but
it's not a particularly meaningful performance improvement). The read
obviously has cost, especially with syscalls getting more and more
expensive due to the mitigation for intel vulnerabilities.

I'd say that a bigger factor than the overhead of the read itself is
that for many workloads we'll e.g. incur additional writes when s_b is
smaller, that the kernel has less information about when to discard
data, that the kernel pagecaches have some scalability issues (partially
due to their generality), and double buffering.


> Given that shmem-based shared_buffers are bound to be mapped into the
> backend's address space anyway, why isn't that considered always
> preferable/cheaper?

See e.g. my point in my previous email in this thread about
drop/truncate.


> I'm aware that there are other benefits in counting on the page cache (eg:
> staying hot in the face of a backend restart), however I'm considering
> performance in steady state here.

There's also the issue that using a large shared buffers setting means
that each process' page table gets bigger, unless you configure
huge_pages. Which one definitely should - but that's an additional
configuration step that requires superuser access on most operating
systems.
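
For reference, the PostgreSQL side of that is a one-liner (the OS side,
shown as a comment, is the part that needs root, and the page count is a
placeholder):

ALTER SYSTEM SET huge_pages = 'try';  -- or 'on' to refuse to start without them
-- on Linux, additionally (as root), e.g.:
--   sysctl -w vm.nr_hugepages=<enough 2MB pages to cover shared_buffers>
-- then restart PostgreSQL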

Greetings,

Andres Freund




Re: perf tuning for 28 cores and 252GB RAM

2019-06-18 Thread Michael Curry
Thanks to everyone for all the detailed responses so far.

The dataset we're working with, in its original form, is roughly a
terabyte; I would guess in the database it will come out to about 2TB-ish.
It will definitely not fit in RAM, unfortunately, and our access patterns
will be quite random, at least at first (maybe as the analysis continues it
will become clear which records are irrelevant, but not yet).

I would love to take advantage of parallelism, but because this is on an
HPC cluster I don't have any personal control over the database version
(9.6.6) or anything involving OS configuration. I will take a look at the
current values of those configuration variables however; maybe we can get
at least some speedup.

It seems I can be confident that shared_buffers and work_mem, along with
effective_io_concurrency and effective_cache_size, ought to be much larger
than their default values. How much larger shared_buffers should be depends
on whether it is better to let Linux or Postgres handle the cache. I will
try to get those changes made and then start benchmarking some simple
queries.
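
For concreteness, something along these lines, where the numbers are only
placeholders to benchmark against rather than recommendations from this
thread (shared_buffers needs a restart, the rest a reload):

ALTER SYSTEM SET shared_buffers = '64GB';
ALTER SYSTEM SET effective_cache_size = '192GB';
ALTER SYSTEM SET work_mem = '1GB';
ALTER SYSTEM SET effective_io_concurrency = 200;
SELECT pg_reload_conf();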

On Tue, Jun 18, 2019 at 12:48 PM Andres Freund  wrote:

> Hi,
>
> On 2019-06-18 12:33:30 -0400, Alvaro Herrera wrote:
> > On 2019-Jun-18, Andres Freund wrote:
> >
> > > On 2019-06-17 19:45:41 -0400, Jeff Janes wrote:
> > > > If not, I would set the value small (say, 8GB) and let the OS do the
> > > > heavy lifting of deciding what to keep in cache.
> > >
> > > FWIW, in my opinion this is not a good idea in most cases. E.g. linux's
> > > pagecache doesn't scale particularly gracefully to large amounts of
> > > data, and it's decisions when to evict data aren't really better than
> > > postgres'. And there's a significant potential for additional
> > > unnecessary disk writes (because the kernel will flush dirty pagecache
> > > buffers, and then we'll just re-issue many of those writes again).
> >
> > One additional tuning point in this area is that you need to tweak the
> > Linux flusher so that it starts writing earlier than what it does by
> > default (by setting dirty_background_bytes to some reasonable value --
> > as I recall it normally runs on the dirty_background_ratio setting,
> > which means it scales up when you add RAM, which I'm not sure is really
> > sensible since you really care about the disk I/O write rate rather than
> > anything about the memory).  If you don't do that, the kernel can
> > accumulate lots of memory to write before starting to write it, which is
> > troublesome once it does.
>
> I think that's less needed these days - by default postgres triggers the
> backend to flush data when writes occur from bgwriter
> (bgwriter_flush_after) or checkpointer (checkpoint_flush_after). And
> IMO one should turn on the flushing by backends in most cases too
> (e.g. backend_flush_after=2MB), unless it's a really latency/jitter
> insensitive application, or storage is *REALLY* slow.
>
> There's a few things we don't flush that we maybe should (file extension
> writes, SLRUs), so it can still be sensible to tune
> dirty_background_bytes. But that has the disadvantage of also affecting
> temp file writes etc, which is usually not wanted.
>
> Greetings,
>
> Andres Freund
>


-- 
Michael J. Curry
cs.umd.edu/~curry


delete inside for plpgsql loop on same relation?

2019-06-18 Thread Rob Nikander
Hi,

Are there guarantees about how this plpgsql behaves? It’s deleting from a table 
while it loops over it. So far it seems like the delete is logically after the 
select, as I hoped, and doesn’t interfere. 

for row in select * from some_stuff loop
delete from some_stuff where …
...
end loop;

I’m using a temporary table of “things to process” and looping over it, 
deleting the ones as I go.
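
A self-contained version of that pattern (table and column names made up)
would look something like:

CREATE TEMP TABLE things_to_process (id int PRIMARY KEY, payload text);
INSERT INTO things_to_process
    SELECT g, 'item ' || g FROM generate_series(1, 100) AS g;

DO $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT * FROM things_to_process LOOP
        -- ... do the real work with r.payload here ...
        DELETE FROM things_to_process WHERE id = r.id;
    END LOOP;
END;
$$;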

I don’t see anything mentioned here: 
https://www.postgresql.org/docs/11/plpgsql-control-structures.html#PLPGSQL-RECORDS-ITERATING

Rob



Re: Is array_append O(n)?

2019-06-18 Thread Tom Lane
Rob Nikander  writes:
> On Jun 18, 2019, at 7:37 PM, Alvaro Herrera  wrote:
>> On 2019-Jun-18, Rob Nikander wrote:
>>> Does `array_append(arr, elt)` create a new array and copy everything?
>>> In other words, is it O(n) or O(1)? […]

>> Starting with 9.5, there's an optimization that I think is supposed to
>> make it O(1):

> Sweet! Thanks. Now I’d like to know how to tell if that optimization is being 
> used, or not. 

Time it and see ;-)

From memory, the cases that should be fast now are

* Assigning to an element of a plpgsql variable that is an array,
ie "arrayvar[n] := newelement".

* "arrayvar := arrayvar || newarrayelement".  I think it doesn't
matter whether you use operator or function-call syntax.

> Same question for a way to pop an element off the end in O(1) time.

Offhand I don't remember any optimization for that, but I might be
wrong.   Looking at the comments in the plpgsql portion of the
aforementioned commit would probably give more info.
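
For example, a crude psql timing check (the loop count is arbitrary):
doubling the count should roughly double the run time, not quadruple it,
if the optimization applies:

\timing on
DO $$
DECLARE
    a int[] := '{}';
BEGIN
    FOR i IN 1 .. 200000 LOOP
        a := a || i;   -- the "arrayvar := arrayvar || newelement" form
    END LOOP;
END;
$$;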

regards, tom lane




Re: Inserts restricted to a trigger

2019-06-18 Thread Miles Elam
Thanks for the suggestion. Unfortunately we only have a single login role
(it's a web app) and then we SET ROLE according to the contents of a JSON
Web Token. So we end up with SESSION_USER as the logged in user and the
active role as CURRENT_USER.

It may be that we're just stuck with a gap and need to just try and keep
track of our mutation points, such as limiting what is accessible through REST
or GraphQL, and there is no way to fundamentally lock this down in
Postgres. I was checking the mailing list to see if I'd missed anything.


On Tue, Jun 18, 2019 at 9:47 AM Torsten Förtsch 
wrote:

> Have you tried session_user?
>
> create function xx() returns table (cur text, sess text)
> security definer language sql as $$
> select current_user::text, session_user::text;
> $$;
>
> Then log in as different user and:
>
> => select (xx()).*;
>    cur    | sess
> ----------+-------
>  postgres | write
>
>
> On Tue, Jun 18, 2019 at 6:30 PM Miles Elam 
> wrote:
>
>> That seems straightforward. Unfortunately I also want to know the
>> user/role that performed the operation. If I use SECURITY DEFINER, I get
>> the superuser account back from CURRENT_USER, not the actual user.
>>
>> Sorry, should have included that in the original email. How do I restrict
>> access while still retaining info about the current user/role?
>>
>>
>> On Mon, Jun 17, 2019 at 5:47 PM  wrote:
>>
>>> Adrian Klaver wrote:
>>>
>>> > On 6/17/19 4:54 PM, Miles Elam wrote:
>>> > > Is there a way to restrict direct access to a table for inserts but
>>> > > allow a trigger on another table to perform an insert for that user?
>>> > >
>>> > > I'm trying to implement an audit table without allowing user tampering
>>> > > with the audit information.
>>> >
>>> > Would the below not work?:
>>> > CREATE the table as superuser or other privileged user
>>> > Have trigger function run as above user(use SECURITY DEFINER)
>>>
>>> and make sure not to give any other users insert/update/delete
>>> permissions on the audit table.
>>>
>>> > > Thanks in advance,
>>> > >
>>> > > Miles Elam
>>> >
>>> > --
>>> > Adrian Klaver
>>> > adrian.kla...@aklaver.com
>>>
>>>
>>>


Re: perf tuning for 28 cores and 252GB RAM

2019-06-18 Thread Andres Freund
Hi,

On 2019-06-18 12:33:30 -0400, Alvaro Herrera wrote:
> On 2019-Jun-18, Andres Freund wrote:
> 
> > On 2019-06-17 19:45:41 -0400, Jeff Janes wrote:
> > > If not, I would set the value small (say, 8GB) and let the OS do the
> > > heavy lifting of deciding what to keep in cache.
> > 
> > FWIW, in my opinion this is not a good idea in most cases. E.g. linux's
> > pagecache doesn't scale particularly gracefully to large amounts of
> data, and its decisions when to evict data aren't really better than
> > postgres'. And there's a significant potential for additional
> > unnecessary disk writes (because the kernel will flush dirty pagecache
> > buffers, and then we'll just re-issue many of those writes again).
> 
> One additional tuning point in this area is that you need to tweak the
> Linux flusher so that it starts writing earlier than what it does by
> default (by setting dirty_background_bytes to some reasonable value --
> as I recall it normally runs on the dirty_background_ratio setting,
> which means it scales up when you add RAM, which I'm not sure is really
> sensible since you really care about the disk I/O write rate rather than
> anything about the memory).  If you don't do that, the kernel can
> accumulate lots of memory to write before starting to write it, which is
> troublesome once it does.

I think that's less needed these days - by default postgres triggers the
backend to flush data when writes occur from bgwriter
(bgwriter_flush_after) or checkpointer (checkpoint_flush_after). And
IMO one should turn on the flushing by backends in most cases too
(e.g. backend_flush_after=2MB), unless it's a really latency/jitter
insensitive application, or storage is *REALLY* slow.
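
In configuration terms that suggestion is roughly the following (2MB is the
example value from this message, not a general recommendation; the other
*_flush_after settings already default to non-zero on Linux):

ALTER SYSTEM SET backend_flush_after = '2MB';
SELECT pg_reload_conf();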

There's a few things we don't flush that we maybe should (file extension
writes, SLRUs), so it can still be sensible to tune
dirty_background_bytes. But that has the disadvantage of also affecting
temp file writes etc, which is usually not wanted.

Greetings,

Andres Freund




Re: Is array_append O(n)?

2019-06-18 Thread Rob Nikander



> On Jun 18, 2019, at 7:37 PM, Alvaro Herrera  wrote:
> 
> On 2019-Jun-18, Rob Nikander wrote:
> 
>> Does `array_append(arr, elt)` create a new array and copy everything?
>> In other words, is it O(n) or O(1)? […]
> 
> Starting with 9.5, there's an optimization that I think is supposed to
> make it O(1):

Sweet! Thanks. Now I’d like to know how to tell if that optimization is being 
used, or not. 

Same question for a way to pop an element off the end in O(1) time.








Re: Inserts restricted to a trigger

2019-06-18 Thread Torsten Förtsch
Have you tried session_user?

create function xx() returns table (cur text, sess text)
security definer language sql as $$
select current_user::text, session_user::text;
$$;

Then log in as different user and:

=> select (xx()).*;
   cur    | sess
----------+-------
 postgres | write


On Tue, Jun 18, 2019 at 6:30 PM Miles Elam 
wrote:

> That seems straightforward. Unfortunately I also want to know the
> user/role that performed the operation. If I use SECURITY DEFINER, I get
> the superuser account back from CURRENT_USER, not the actual user.
>
> Sorry, should have included that in the original email. How do I restrict
> access while still retaining info about the current user/role?
>
>
> On Mon, Jun 17, 2019 at 5:47 PM  wrote:
>
>> Adrian Klaver wrote:
>>
>> > On 6/17/19 4:54 PM, Miles Elam wrote:
>> > > Is there a way to restrict direct access to a table for inserts but
>> > > allow a trigger on another table to perform an insert for that user?
>> > >
>> > > I'm trying to implement an audit table without allowing user tampering
>> > > with the audit information.
>> >
>> > Would the below not work?:
>> > CREATE the table as superuser or other privileged user
>> > Have trigger function run as above user(use SECURITY DEFINER)
>>
>> and make sure not to give any other users insert/update/delete
>> permissions on the audit table.
>>
>> > > Thanks in advance,
>> > >
>> > > Miles Elam
>> >
>> > --
>> > Adrian Klaver
>> > adrian.kla...@aklaver.com
>>
>>
>>


Re: Is array_append O(n)?

2019-06-18 Thread Alvaro Herrera
On 2019-Jun-18, Rob Nikander wrote:

> Does `array_append(arr, elt)` create a new array and copy everything?
> In other words, is it O(n) or O(1)? I’m trying to use plpgsql and
> realizing I don’t really have my usual data structures for basic
> algorithms the way I’d normally write them. I probably shouldn’t be
> treating arrays like Java ArrayList or C++ vector, where you can
> add/remove on the end with little cost (O(1), almost.)

Starting with 9.5, there's an optimization that I think is supposed to
make it O(1):

commit 1dc5ebc9077ab742079ce5dac9a6664248d42916
Author: Tom Lane 
AuthorDate: Thu May 14 12:08:40 2015 -0400
CommitDate: Thu May 14 12:08:49 2015 -0400

Support "expanded" objects, particularly arrays, for better performance.

This patch introduces the ability for complex datatypes to have an
in-memory representation that is different from their on-disk format.
On-disk formats are typically optimized for minimal size, and in any case
they can't contain pointers, so they are often not well-suited for
computation.  Now a datatype can invent an "expanded" in-memory format
that is better suited for its operations, and then pass that around among
the C functions that operate on the datatype.  There are also provisions
(rudimentary as yet) to allow an expanded object to be modified in-place
under suitable conditions, so that operations like assignment to an element
of an array need not involve copying the entire array.

The initial application for this feature is arrays, but it is not hard
to foresee using it for other container types like JSON, XML and hstore.
I have hopes that it will be useful to PostGIS as well.

In this initial implementation, a few heuristics have been hard-wired
into plpgsql to improve performance for arrays that are stored in
plpgsql variables.  We would like to generalize those hacks so that
other datatypes can obtain similar improvements, but figuring out some
appropriate APIs is left as a task for future work.  (The heuristics
themselves are probably not optimal yet, either, as they sometimes
force expansion of arrays that would be better left alone.)

Preliminary performance testing shows impressive speed gains for plpgsql
functions that do element-by-element access or update of large arrays.
There are other cases that get a little slower, as a result of added array
format conversions; but we can hope to improve anything that's annoyingly
bad.  In any case most applications should see a net win.

Tom Lane, reviewed by Andres Freund

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: perf tuning for 28 cores and 252GB RAM

2019-06-18 Thread Alvaro Herrera
Hello

On 2019-Jun-18, Andres Freund wrote:

> On 2019-06-17 19:45:41 -0400, Jeff Janes wrote:
> > If not, I would set the value small (say, 8GB) and let the OS do the
> > heavy lifting of deciding what to keep in cache.
> 
> FWIW, in my opinion this is not a good idea in most cases. E.g. linux's
> pagecache doesn't scale particularly gracefully to large amounts of
> data, and its decisions when to evict data aren't really better than
> postgres'. And there's a significant potential for additional
> unnecessary disk writes (because the kernel will flush dirty pagecache
> buffers, and then we'll just re-issue many of those writes again).

One additional tuning point in this area is that you need to tweak the
Linux flusher so that it starts writing earlier than what it does by
default (by setting dirty_background_bytes to some reasonable value --
as I recall it normally runs on the dirty_background_ratio setting,
which means it scales up when you add RAM, which I'm not sure is really
sensible since you really care about the disk I/O write rate rather than
anything about the memory).  If you don't do that, the kernel can
accumulate lots of memory to write before starting to write it, which is
troublesome once it does.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




RE: Connection refused (0x0000274D/10061)

2019-06-18 Thread Igor Neyman

From: Sourav Majumdar [mailto:souravmajumdar2...@gmail.com]
Sent: Tuesday, June 18, 2019 4:06 AM
To: Adrian Klaver 
Cc: Ray O'Donnell ; pgsql-gene...@postgresql.org
Subject: Re: Connection refused (0x274D/10061)

Hi, I have checked the log file. I am attaching it with this mail. PFA

On Fri, Jun 14, 2019 at 1:58 AM Adrian Klaver
<adrian.kla...@aklaver.com> wrote:
On 6/13/19 9:56 AM, Sourav Majumdar wrote:
> Hi,
> I have tried many time for setup postgreSQL for my local host. Hence I
> have tried to reInstall the one click app from
> https://www.enterprisedb.com/downloads/postgres-postgresql-downloads -  
> Windows
> x86-64  -  PostgreSQL Version 11.3. When installing using default port
> 5432, Its giving an error "Database Cluster Installation Failed". Please
> advice how to resolve the issue. I am struggling to setup this
> POSTGRESQL from more than a week.
>
> The error screenshot is attached. PFA
>

Sounds like a permissions problem to me.

See:

https://www.enterprisedb.com/docs/en/11.0/PG_Inst_Guide_v11/PostgreSQL_Installation_Guide.1.13.html#

to find the log of the install process. It will probably have more
information.


--
Adrian Klaver
adrian.kla...@aklaver.com

_

From the log that you attached:

Failed to remove inherited ACLs on (C:\PostgreSQL\data)
WScript.Network initialized...
strParentOfDataDirC:\PostgreSQL
logged in userMajumdar\Its Majumdar
Called AclCheck(C:\PostgreSQL\data)
Called IsVistaOrNewer()...
'winmgmts' object initialized...
Version:6.3
MajorVersion:6
Executing icacls to ensure the Majumdar\Its Majumdar account can read the path 
C:\PostgreSQL\data
Executing batch file 'rad3A6BD.bat'...
'icacls' is not recognized as an internal or external command,
operable program or batch file.


The installation script can't find ICACLS on your machine, the Windows utility 
that it uses to grant proper permissions.
Something's wrong with your Windows OS.

Regards,
Igor Neyman





Re: Inserts restricted to a trigger

2019-06-18 Thread Miles Elam
That seems straightforward. Unfortunately I also want to know the user/role
that performed the operation. If I use SECURITY DEFINER, I get the
superuser account back from CURRENT_USER, not the actual user.

Sorry, should have included that in the original email. How do I restrict
access while still retaining info about the current user/role?


On Mon, Jun 17, 2019 at 5:47 PM  wrote:

> Adrian Klaver wrote:
>
> > On 6/17/19 4:54 PM, Miles Elam wrote:
> > > Is there a way to restrict direct access to a table for inserts but
> > > allow a trigger on another table to perform an insert for that user?
> > >
> > > I'm trying to implement an audit table without allowing user tampering
> > > with the audit information.
> >
> > Would the below not work?:
> > CREATE the table as superuser or other privileged user
> > Have trigger function run as above user(use SECURITY DEFINER)
>
> and make sure not to give any other users insert/update/delete
> permissions on the audit table.
>
> > > Thanks in advance,
> > >
> > > Miles Elam
> >
> > --
> > Adrian Klaver
> > adrian.kla...@aklaver.com
>
>
>


Is array_append O(n)?

2019-06-18 Thread Rob Nikander
Hi,

Does `array_append(arr, elt)` create a new array and copy everything? In other 
words, is it O(n) or O(1)? I’m trying to use plpgsql and realizing I don’t 
really have my usual data structures for basic algorithms the way I’d normally 
write them. I probably shouldn’t be treating arrays like Java ArrayList or C++ 
vector, where you can add/remove on the end with little cost (O(1), almost.)

Rob



Re: perf tuning for 28 cores and 252GB RAM

2019-06-18 Thread Andres Freund
Hi,

On 2019-06-17 19:45:41 -0400, Jeff Janes wrote:
> If not, I would set the value small (say, 8GB) and let the OS do the
> heavy lifting of deciding what to keep in cache.

FWIW, in my opinion this is not a good idea in most cases. E.g. linux's
pagecache doesn't scale particularly gracefully to large amounts of
data, and its decisions when to evict data aren't really better than
postgres'. And there's a significant potential for additional
unnecessary disk writes (because the kernel will flush dirty pagecache
buffers, and then we'll just re-issue many of those writes again).

It's a bit hard to be specific without knowing the workload, but my
guidance would be that if the data has some expected form of locality
(e.g. index lookups etc, rather than just sequentially scanning the
whole database) then sizing s_b for at least the amount of data likely
to be repeatedly accessed can be quite beneficial.

If increasing s_b can achieve that most writes can be issued by
checkpointer rather than backends and bgwriter, the generated IO pattern
is *far* superior since 9.6 (as checkpointer writes are sorted, whereas
bgwriter/backend writes aren't to a meaningful degree).

The one big exception is if the workload frequently needs to
drop/truncate non-temporary tables. There we currently linearly need to
search shared_buffers, which, although the constants are fairly small,
obviously means that drop/truncations get noticeably slower with a larger
shared_buffers.

- Andres




Re: perf tuning for 28 cores and 252GB RAM

2019-06-18 Thread Fabio Ugo Venchiarutti

On 18/06/2019 00:45, Jeff Janes wrote:
On Mon, Jun 17, 2019 at 4:51 PM Michael Curry wrote:


I am using a Postgres instance in an HPC cluster, where they have
generously given me an entire node. This means I have 28 cores and
252GB RAM. I have to assume that the very conservative default
settings for things like buffers and max working memory are too
small here.

We have about 20 billion rows in a single large table. 



What is that in bytes?  Do you only have that one table?

The database is not intended to run an application but rather to
allow a few individuals to do data analysis, so we can guarantee the
number of concurrent queries will be small, and that nothing else
will need to use the server. Creating multiple different indices on
a few subsets of the columns will be needed to support the kinds of
queries we want.

What settings should be changed to maximize performance?


With 28 cores for only a few users, parallelization will probably be 
important.  That feature is fairly new to PostgreSQL and rapidly 
improving from version to version, so you will want to use the last 
version you can (v11).  And then increase the values for 
max_worker_processes, max_parallel_maintenance_workers, 
max_parallel_workers_per_gather, and max_parallel_workers.  With the 
potential for so many parallel workers running at once, you wouldn't 
want to go overboard on work_mem, maybe 2GB.  If you don't think all 
allowed users will be running large queries at the same time (because 
they are mostly thinking what query to run, or thinking about the 
results of the last one they ran, rather than actually running queries), 
then maybe higher than that.


If your entire database can comfortably fit in RAM, I would make 
shared_buffers large enough to hold the entire database.  If not, I 
would set the value small (say, 8GB) and let the OS do the heavy lifting 
of deciding what to keep in cache.



Does the backend mmap() data files when that's possible?


I've heard the "use the page cache" suggestion before, from users and 
hackers alike, but I never quite heard a solid argument dismissing 
potential overhead-related ill effects of the seek() & read() syscalls 
if they're needed, especially on many random page fetches.



Given that shmem-based shared_buffers are bound to be mapped into the 
backend's address space anyway, why isn't that considered always 
preferable/cheaper?




I'm aware that there are other benefits in counting on the page cache 
(eg: staying hot in the face of a backend restart), however I'm 
considering performance in steady state here.




TIA



If you go with the first option, you
probably want to use pg_prewarm after each restart to get the data into 
cache as fast as you can, rather than let it get loaded in naturally as 
you run queries;  Also, you would probably want to set random_page_cost 
and seq_page_cost quite low, like maybe 0.1 and 0.05.
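
A minimal pg_prewarm invocation, assuming the extension is available and a
hypothetical table big_table plus one of its indexes:

CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('big_table');       -- loads the heap into shared_buffers
SELECT pg_prewarm('big_table_pkey');  -- indexes are prewarmed separately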


You haven't described what kind of IO capacity and setup you have, 
knowing that could suggest other changes to make.  Also, seeing the 
results of `explain (analyze, buffers)`, especially with track_io_timing 
turned on, for some actual queries could provide good insight for what 
else might need changing.


Cheers,

Jeff






--
Regards

Fabio Ugo Venchiarutti
OSPCFC Network Engineering Dpt.
Ocado Technology






Re: perf tuning for 28 cores and 252GB RAM

2019-06-18 Thread Merlin Moncure
On Mon, Jun 17, 2019 at 6:46 PM Jeff Janes  wrote:
>
> On Mon, Jun 17, 2019 at 4:51 PM Michael Curry  wrote:
>>
>> I am using a Postgres instance in an HPC cluster, where they have generously 
>> given me an entire node. This means I have 28 cores and 252GB RAM. I have to 
>> assume that the very conservative default settings for things like buffers 
>> and max working memory are too small here.
>>
>> We have about 20 billion rows in a single large table.
>
>
> What is that in bytes?  Do you only have that one table?
>
>>
>> The database is not intended to run an application but rather to allow a few 
>> individuals to do data analysis, so we can guarantee the number of 
>> concurrent queries will be small, and that nothing else will need to use the 
>> server. Creating multiple different indices on a few subsets of the columns 
>> will be needed to support the kinds of queries we want.
>>
>> What settings should be changed to maximize performance?
>
>
> With 28 cores for only a few users, parallelization will probably be 
> important.  That feature is fairly new to PostgreSQL and rapidly improving 
> from version to version, so you will want to use the last version you can 
> (v11).  And then increase the values for max_worker_processes, 
> max_parallel_maintenance_workers, max_parallel_workers_per_gather, and 
> max_parallel_workers.  With the potential for so many parallel workers 
> running at once, you wouldn't want to go overboard on work_mem, maybe 2GB.  
> If you don't think all allowed users will be running large queries at the 
> same time (because they are mostly thinking what query to run, or thinking 
> about the results of the last one they ran, rather than actually running 
> queries), then maybe higher than that.
>
> If your entire database can comfortably fit in RAM, I would make 
> shared_buffers large enough to hold the entire database.  If not, I would set 
> the value small (say, 8GB) and let the OS do the heavy lifting of deciding 
> what to keep in cache.  If you go with the first option, you probably want to 
> use pg_prewarm after each restart to get the data into cache as fast as you 
> can, rather than let it get loaded in naturally as you run queries;  Also, 
> you would probably want to set random_page_cost and seq_page_cost quite low, 
> like maybe 0.1 and 0.05.
>
> You haven't described what kind of IO capacity and setup you have, knowing 
> that could suggest other changes to make.  Also, seeing the results of 
> `explain (analyze, buffers)`, especially with track_io_timing turned on, for 
> some actual queries could provide good insight for what else might need 
> changing.

This is all fantastic advice.  If all the data fits in memory (or at
least, all the data that is typically read from) and the cache is warm
then your database becomes an in memory database with respect to read
operations and all the i/o concerns and buffer management overhead go
away.

If your database does not fit in memory and your storage is fast, one
influential setting besides the above to look at is
effective_io_concurrency; it gets you faster (in some cases much
faster) bitmap heap scans. Also make sure to set effective_cache_size
high, reflecting the large amount of memory you have; this will
influence query plan choice.

merlin




Re: perf tuning for 28 cores and 252GB RAM

2019-06-18 Thread Michael Lewis
>
> If your entire database can comfortably fit in RAM, I would make
> shared_buffers large enough to hold the entire database.  If not, I would
> set the value small (say, 8GB) and let the OS do the heavy lifting of
> deciding what to keep in cache.  If you go with the first option, you
> probably want to use pg_prewarm after each restart to get the data into
> cache as fast as you can, rather than let it get loaded in naturally as you
> run queries;  Also, you would probably want to set random_page_cost and
> seq_page_cost quite low, like maybe 0.1 and 0.05.
>

In all deference to your status as a contributor, what are these
recommendations based on? Would you share the rationale? I'd just like to
better understand. I have never heard a recommendation to set random & seq
page cost below 1 before, for instance.

If the entire database were say 1 or 1.5 TBs and ram was on the order of 96
or 128 GBs, but some of the data is (almost) never accessed, would the
recommendation still be the same to rely more on the OS caching? Do you
target a particular cache hit rate as reported by Postgres stats?

>


Re: Connection refused (0x0000274D/10061)

2019-06-18 Thread Moreno Andreo

On 18/06/19 15:57, Adrian Klaver wrote:

On 6/18/19 1:06 AM, Sourav Majumdar wrote:
Hi, I have checked the log file. I am attaching it with this 
mail. PFA


In future please do not attach a 1.4MB file. Most of it was Unpacking 
info. The relevant part was at the end:


"
fixing permissions on existing directory C:/PostgreSQL/data ... 
initdb: could not change permissions of directory 
"C:/PostgreSQL/data": Permission denied


[...]

Did you follow the instructions here?:

https://www.enterprisedb.com/docs/en/11.0/PG_Inst_Guide_v11/PostgreSQL_Installation_Guide.1.08.html# 



In particular the part about:

"To perform an installation using the graphical installation wizard, 
you must have superuser or administrator privileges. To start the 
installation wizard, assume sufficient privileges and double-click the 
installer icon; if prompted, provide a password.
Note that in some versions of Windows, you can invoke the installer 
with Administrator privileges by right clicking on the installer icon 
and selecting Run as Administrator from the context menu."



We had this issue in the past, and noted that re-issuing initdb from an 
Administrator Command Prompt (not doing anything else) from the Bin 
directory worked perfectly.


This was applied for 9.5, 9.6 and 10.x. Still not using 11 in 
production, so I have no cases to test at the moment.


My 2 cent

Moreno.-







Re: Connection refused (0x0000274D/10061)

2019-06-18 Thread Adrian Klaver

On 6/18/19 1:06 AM, Sourav Majumdar wrote:

Hi, I have checked the log file. I am attaching it with this mail. PFA


In future please do not attach a 1.4MB file. Most of it was Unpacking 
info. The relevant part was at the end:


"
fixing permissions on existing directory C:/PostgreSQL/data ... initdb: 
could not change permissions of directory "C:/PostgreSQL/data": 
Permission denied


Called Die(Failed to initialise the database cluster with initdb)...
Failed to initialise the database cluster with initdb

Script stderr:
 Program ended with an error exit code

Error running cscript //NoLogo 
"C:\PostgreSQL/installer/server/initcluster.vbs" "NT 
AUTHORITY\NetworkService" "postgres" "" "C:\PostgreSQL" 
"C:\PostgreSQL\data" 5432 "DEFAULT" 0: Program ended with an error exit code

Problem running post-install step. Installation may not complete correctly
 The database cluster initialisation failed.
[23:30:13] Delete the temporary scripts directory...
Creating menu shortcuts...
Executing cscript //NoLogo 
"C:\PostgreSQL\installer\server\createshortcuts_clt.vbs" "PostgreSQL 11" 
"C:\PostgreSQL"

Script exit code: 0"


Did you follow the instructions here?:

https://www.enterprisedb.com/docs/en/11.0/PG_Inst_Guide_v11/PostgreSQL_Installation_Guide.1.08.html#

In particular the part about:

"To perform an installation using the graphical installation wizard, you 
must have superuser or administrator privileges. To start the 
installation wizard, assume sufficient privileges and double-click the 
installer icon; if prompted, provide a password.
Note that in some versions of Windows, you can invoke the installer with 
Administrator privileges by right clicking on the installer icon and 
selecting Run as Administrator from the context menu."






On Fri, Jun 14, 2019 at 1:58 AM Adrian Klaver wrote:


On 6/13/19 9:56 AM, Sourav Majumdar wrote:
 > Hi,
 > I have tried many time for setup postgreSQL for my local host. Hence I
 > have tried to reInstall the one click app from
 > https://www.enterprisedb.com/downloads/postgres-postgresql-downloads -
 > Windows x86-64  -  PostgreSQL Version 11.3. When installing using default
 > port 5432, Its giving an error "Database Cluster Installation Failed". Please
 > advice how to resolve the issue. I am struggling to setup this
 > POSTGRESQL from more than a week.
 >
 > The error screenshot is attached. PFA
 >

Sounds like a permissions problem to me.

See:


https://www.enterprisedb.com/docs/en/11.0/PG_Inst_Guide_v11/PostgreSQL_Installation_Guide.1.13.html#

to find the log of the install process. It will probably have more
information.


-- 
Adrian Klaver

adrian.kla...@aklaver.com 



--
Regards-
Sourav Majumdar
Mob.- 9732354141



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: How to return ARRAY from SQL function?

2019-06-18 Thread Tom Lane
Alexander Farber  writes:
> And then I shuffle the letters by -

> CREATE OR REPLACE FUNCTION words_shuffle(in_array text[])
> RETURNS text[] AS
> $func$
> SELECT array_agg(x ORDER BY RANDOM()) FROM UNNEST(in_array) x;
> $func$ LANGUAGE sql STABLE;

Hmm ... that's not really "stable", since it depends on random()
which is volatile.
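
A version declared with the volatility it actually has would just swap the
keyword, e.g.:

CREATE OR REPLACE FUNCTION words_shuffle(in_array text[])
RETURNS text[] AS
$func$
SELECT array_agg(x ORDER BY RANDOM()) FROM UNNEST(in_array) x;
$func$ LANGUAGE sql VOLATILE;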

regards, tom lane




Re: How to return ARRAY from SQL function?

2019-06-18 Thread Alexander Farber
Thank you, Laurenz and Tom -

On Fri, Jun 14, 2019 at 3:25 PM Tom Lane  wrote:
>
> Laurenz Albe  writes:
>
> > You'll have to specify an array of which type you want, probably
> >  ... RETURNS text[]
>
> Right.  Also, I don't recall the exact rules in this area, but I think
> that SQL functions are pickier about their return types than ordinary
> query contexts, meaning you might also need an explicit cast:
>
>SELECT ARRAY[
> '*', '*', 'А', 'А', 'А', 'А', 'А', 'А', 'А', 'А',
> ...
>]::text[];
>
> Try it without first, but if it moans about the query returning the
> wrong type, that's how to fix it.
>
>

this has worked for me:

 CREATE OR REPLACE FUNCTION words_all_letters()
RETURNS text[] AS
$func$
SELECT ARRAY[
'*', '*',
'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
'B', 'B',
'C', 'C',
'D', 'D', 'D', 'D',
'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E',
'F', 'F',
'G', 'G', 'G',
'H', 'H',
'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I',
'J',
'K',
'L', 'L', 'L', 'L',
'M', 'M',
'N', 'N', 'N', 'N', 'N', 'N',
'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O',
'P', 'P',
'Q',
'R', 'R', 'R', 'R', 'R', 'R',
'S', 'S', 'S', 'S',
'T', 'T', 'T', 'T', 'T', 'T',
'U', 'U', 'U', 'U',
'V', 'V',
'W', 'W',
'X',
'Y', 'Y',
'Z'
];
$func$ LANGUAGE sql IMMUTABLE;

And then I shuffle the letters by -

CREATE OR REPLACE FUNCTION words_shuffle(in_array text[])
RETURNS text[] AS
$func$
SELECT array_agg(x ORDER BY RANDOM()) FROM UNNEST(in_array) x;
$func$ LANGUAGE sql STABLE;

Regards
Alex


Re: Copy Bulk Ignore Duplicated

2019-06-18 Thread Peter J. Holzer
On 2019-06-17 16:01:57 -0300, Leandro Guimarães wrote:
> I've installed all dependencies, but when I try to "make install" in
> pg_bulkload folder if have some errors like this:
> 
> In file included from pgut/pgut.h:24:0,
>                  from pgut/pgut-fe.h:13,
>                  from pg_bulkload.c:17:
> /usr/include/postgresql/internal/pqexpbuffer.h:149:13: error: old-style
> parameter declarations in prototyped function definition
>  extern void printfPQExpBuffer(PQExpBuffer str, const char *fmt,...)
> pg_attribute_printf(2, 3);
>              ^
> pg_bulkload.c:743:1: error: expected ‘{’ at end of input
> 
> 
> My OS is Ubuntu Server 14.04 and PostgreSQL 9.4. I've searched about
> these errors but the advice is always to change the C code in the files, and
> I'm not sure how to do that in a tool like pg_bulkload.

The error message is misleading. There is no old-style parameter
declaration here. My guess is that the definition for the macro
pg_attribute_printf is missing, and without that definition this is
a syntax error (which the compiler reports although it seems to be
confused about the nature of the error).

In postgresql-9.5 (haven't 9.4 and am too lazy to download it)
pg_attribute_printf is defined in "c.h". 

I guess that this must be explicitly included in 9.4 but is implicitly
included in versions that the author of pg_bulkload tested.

You could try just to insert 

#include "c.h"

before line 17 of pg_bulkload.c.

But ultimately you should report this incompatibility to the author(s)
of pg_bulkload.

hp

-- 
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | h...@hjp.at        | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson

