Re: [GENERAL] query locks up when run concurrently

2016-11-23 Thread azhwkd
The group ID is part of the primary key of the group_history table. My
understanding is that two INSERTs with different group IDs should not
collide in this case, or am I wrong in thinking this?

The table definition for group_history is the following:

CREATE TABLE public.group_history (
"group" int4 NOT NULL,
id int4 NOT NULL,
sub_category int4 NOT NULL,
"date" date NOT NULL,
aa int8 NOT NULL,
ab int8 NOT NULL,
bb int8 NOT NULL,
ba int8 NOT NULL,
quantity int8 NOT NULL,
"hour" int2 NOT NULL,
CONSTRAINT group_history_pk PRIMARY KEY ("group",id,sub_category,"date","hour")
)
WITH (
OIDS=FALSE
);

Kind regards,
Sebastian

2016-11-24 0:28 GMT+01:00 Adrian Klaver :
> On 11/23/2016 01:52 PM, azhwkd wrote:
>>
>> Greetings!
>>
>> The parallel calls should not be working on the same row. Each query
>> services a different group ID on its own and there is no overlap.
>
>
> Except the INSERT query in the trigger function is working on dates not
> group ids.
>
>>
>> Kind regards,
>> Sebastian
>>
>>
>> Tom Lane wrote on Wed, 23 Nov 2016 at 17:47:
>>
>> azh...@gmail.com  writes:
>> > I have a query which if run alone usually completes in about 300ms.
>> > When run in my application this query constantly locks up and bogs
>> > down all connections of the connection pool (In the application this
>> > query is run up to 10 times in parallel with different parameters).
>> > What's really weird is that I can re-run one of the hung queries
>> from
>> > the command line while it's hung and it will complete as expected
>> > while the hung queries continue to use 100% CPU time.
>>
>> Judging from the EXPLAIN timing, most of the work is in the trigger,
>> which leads me to wonder if the parallel calls are likely to be
>> fighting
>> over inserting/updating the same row in the group_history partition
>> tables.  Or are you certain that they should be hitting different
>> rows?
>>
>> regards, tom lane
>>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com




Re: [GENERAL] query locks up when run concurrently

2016-11-23 Thread azhwkd
I'm sorry. I worded this quite poorly. I meant to say that there were
no log lines added to the postgres logfile at the time.
I hope these are the settings you were referring to (I did not change
anything in the config files vs. the default):

log_destination = 'stderr'
logging_collector = off
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_file_mode = 0600
log_truncate_on_rotation = off
log_rotation_age = 1d
log_rotation_size = 10MB
syslog_facility = 'LOCAL0'
syslog_ident = 'postgres'
event_source = 'PostgreSQL'
client_min_messages = notice
log_min_messages = warning
log_min_error_statement = error
log_min_duration_statement = -1
debug_print_parse = off
debug_print_rewritten = off
debug_print_plan = off
debug_pretty_print = on
log_checkpoints = off
log_connections = off
log_disconnections = off
log_duration = off
log_error_verbosity = default
log_hostname = off
log_line_prefix = '%t '
log_lock_waits = off
log_statement = 'none'
log_temp_files = -1
log_timezone = 'UTC'
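
With these defaults nothing about lock waits, statement timing or
connections ever reaches the log. If it helps the investigation, the
relevant switches can be flipped at runtime without a restart; a sketch
(the 500ms threshold is arbitrary):

ALTER SYSTEM SET log_lock_waits = on;                   -- log waits longer than deadlock_timeout (1s by default)
ALTER SYSTEM SET log_min_duration_statement = '500ms';  -- log statements slower than half a second
ALTER SYSTEM SET log_connections = on;
ALTER SYSTEM SET log_disconnections = on;
SELECT pg_reload_conf();                                -- apply without restarting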

I'm sorry, it seems I copied the trigger definition from the wrong table
(item_history is a test table I created yesterday while trying things; it
is identical to group_history apart from the name).
The trigger on the group_history table is exactly the same, except for the
table name.

Kind regards,
Sebastian

2016-11-24 0:26 GMT+01:00 Adrian Klaver :
> On 11/23/2016 01:47 PM, azhwkd wrote:
>>
>> Greetings!
>>
>> Yes I had a look at the logfiles but there is not a single logfile
>> generated when I'm reproducing this issue.
>
>
> We are talking about the Postgres logfile, correct?
>
> I have to believe a log file exists, but maybe no entries at that time.
>
> What are your logging settings?
>
>> Concerning locks I used the queries
>> at https://wiki.postgresql.org/wiki/Lock_Monitoring and they came up
>> empty.
>> The group_history table and its sub-tables do not have any foreign keys
>> as they only contain calculated data and the source data is removed
>> after some time.
>> The trigger which calls the partition function below is defined as
>> follows:
>>
>> create
>> trigger group_history_trigger before insert
>> on
>> item_history for each row execute procedure
>> group_history_partition_function()
>
>
> This trigger is on the table item_history not group_history, is that
> correct?
>
> If so how does the item_history table fit into this?
>
>>
>> Kind regards,
>> Sebastian
>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com




Re: [GENERAL] query locks up when run concurrently

2016-11-23 Thread Adrian Klaver

On 11/23/2016 01:52 PM, azhwkd wrote:

Greetings!

The parallel calls should not be working on the same row. Each query
services a different group ID on its own and there is no overlap.


Except the INSERT query in the trigger function is working on dates not 
group ids.




Kind regards,
Sebastian


Tom Lane wrote on Wed, 23 Nov 2016 at 17:47:

azh...@gmail.com  writes:
> I have a query which if run alone usually completes in about 300ms.
> When run in my application this query constantly locks up and bogs
> down all connections of the connection pool (In the application this
> query is run up to 10 times in parallel with different parameters).
> What's really weird is that I can re-run one of the hung queries from
> the command line while it's hung and it will complete as expected
> while the hung queries continue to use 100% CPU time.

Judging from the EXPLAIN timing, most of the work is in the trigger,
which leads me to wonder if the parallel calls are likely to be fighting
over inserting/updating the same row in the group_history partition
tables.  Or are you certain that they should be hitting different rows?

regards, tom lane




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] query locks up when run concurrently

2016-11-23 Thread Adrian Klaver

On 11/23/2016 01:47 PM, azhwkd wrote:

Greetings!

Yes I had a look at the logfiles but there is not a single logfile
generated when I'm reproducing this issue.


We are talking about the Postgres logfile, correct?

I have to believe a log file exists, but maybe no entries at that time.

What are your logging settings?


Concerning locks I used the queries
at https://wiki.postgresql.org/wiki/Lock_Monitoring and they came up empty.
The group_history table and its sub-tables do not have any foreign keys
as they only contain calculated data and the source data is removed
after some time.
The trigger which calls the partition function below is defined as follows:

create
trigger group_history_trigger before insert
on
item_history for each row execute procedure
group_history_partition_function()


This trigger is on the table item_history not group_history, is that
correct?


If so how does the item_history table fit into this?



Kind regards,
Sebastian



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] Query with large in clauses uses a lot of memory

2016-11-23 Thread Tom Lane
greigwise  writes:
> So, I decided to try an experiment.  I wrote 2 queries as follows:
> 1 ) select pg_sleep(100) ;
> 2 ) with q (s1, s2) as (select pg_sleep(100), 1)
> select * from q where s2 in ( 1, <about 3 million comma delimited numbers>)
>
> It looks to me like the connection running the big query is using about 2GB
> more memory than the other one.  I could see why it might use *some* more
> (like 28MB more?), but 2GB more seems excessive. 

Don't hold your breath waiting for that to get better.  Depending on what
I assume about the widths of your numbers, you've got something like 3
million Const parse nodes in that query, so the system is eating something
like 600-700 bytes per Const, which is not all that many copies, given that
one Const node plus List overhead is probably 100 bytes on a 64-bit
server.  OK, it's not exactly frugal perhaps, but it would not be hard at all
to get to that if you're running the query in a way that requires
keeping a plancache entry for it.  It would take significant work (and
probably some performance sacrifices) to make much of a dent in the
space consumption, and even if we put in the work, I'd only expect to
be able to dent it a bit --- an order-of-magnitude reduction is not in
the cards.  Queries with that many parse elements in them are just not
cheap.

Now, that WHERE condition will eventually get folded to the form

s2 = ANY ('{1,2,...}'::integer[])

and that constant array is a *lot* less space-wasteful, only 4 bytes
per element (or 8 bytes if we're talking bigints).  So the approach I'd
advise is trying to send the query with a constant array to begin with
--- either write it like that, or like

s2 = ANY ($1::integer[])

and send the array as an out-of-line parameter.
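
At the SQL level the difference looks roughly like this (a sketch with a
made-up table; the real call would pass the whole list as the one array
parameter):

CREATE TEMP TABLE demo (id int, payload text);

PREPARE pick(int[]) AS
  SELECT * FROM demo WHERE id = ANY ($1);

EXECUTE pick(ARRAY[1, 2, 3]);   -- one array value instead of millions of Consts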

Don't know how hard it might be to arm-wrestle ActiveRecord into doing
it like that :-(

regards, tom lane




[GENERAL] Query with large in clauses uses a lot of memory

2016-11-23 Thread greigwise
I had an issue today where the OOM killer terminated one of my postgres
processes.   
On my server I have 8 GB of RAM, shared_buffers is 1 GB and work_mem is
24 MB.
I have connection pooling which limits us to 25 connections.  Even if I'm
maxed out there, I'm still only using about 1.6 GB of my 8 GB, which seems like
it shouldn't be a problem.

Looking through my postgres logs, I noticed that right about the time of the
OOM incident, I had some queries running with pretty massive in clauses
(thank you ruby/ActiveRecord).  One of the queries was about 28MB in size.

So, I decided to try an experiment.  I wrote 2 queries as follows:
1 ) select pg_sleep(100) ;
2 ) with q (s1, s2) as (select pg_sleep(100), 1)
select * from q where s2 in ( 1, <about 3 million comma delimited numbers> )

I ran those queries via psql and did this:

-sh-4.1$ ps aux | grep -i -E "local|COMMAND" | grep -v ruby
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
postgres 20896 27.0 28.2 3416812 2132112 ? Ss   21:18   0:02 postgres:
hireology hireology [local] SELECT
postgres 20899  0.0  0.0 1281368 4800 ?Ss   21:18   0:00 postgres:
hireology hireology [local] SELECT

It looks to me like the connection running the big query is using about 2GB
more memory than the other one.  I could see why it might use *some* more
(like 28MB more?), but 2GB more seems excessive. 

So, the question is why does it use so much more memory.  And is there
anything I can do to limit this problem other than fixing the silly queries?

Thanks in advance for any help,
Greig Wise



--
View this message in context: 
http://postgresql.nabble.com/Query-with-large-in-clauses-uses-a-lot-of-memory-tp5931716.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.




Re: [GENERAL] query locks up when run concurrently

2016-11-23 Thread azhwkd
Greetings!

The parallel calls should not be working on the same row. Each query
services a different group ID on its own and there is no overlap.

Kind regards,
Sebastian


Tom Lane wrote on Wed, 23 Nov 2016 at 17:47:

> azh...@gmail.com writes:
> > I have a query which if run alone usually completes in about 300ms.
> > When run in my application this query constantly locks up and bogs
> > down all connections of the connection pool (In the application this
> > query is run up to 10 times in parallel with different parameters).
> > What's really weird is that I can re-run one of the hung queries from
> > the command line while it's hung and it will complete as expected
> > while the hung queries continue to use 100% CPU time.
>
> Judging from the EXPLAIN timing, most of the work is in the trigger,
> which leads me to wonder if the parallel calls are likely to be fighting
> over inserting/updating the same row in the group_history partition
> tables.  Or are you certain that they should be hitting different rows?
>
> regards, tom lane
>


Re: [GENERAL] query locks up when run concurrently

2016-11-23 Thread azhwkd
Greetings!

Yes I had a look at the logfiles but there is not a single logfile
generated when I'm reproducing this issue.
Concerning locks I used the queries at
https://wiki.postgresql.org/wiki/Lock_Monitoring and they came up empty.
The group_history table and its sub-tables do not have any foreign keys as
they only contain calculated data and the source data is removed after some
time.
The trigger which calls the partition function below is defined as follows:

create trigger group_history_trigger
before insert on item_history
for each row execute procedure group_history_partition_function();
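
For reference, the kind of check those wiki queries perform boils down to
joining pg_locks against pg_stat_activity; a simplified sketch (it only
shows waits on heavyweight locks, which may be why it comes up empty here
while the hung backends sit at 100% CPU):

SELECT blocked.pid    AS blocked_pid,
       blocked.query  AS blocked_query,
       blocking.pid   AS blocking_pid,
       blocking.query AS blocking_query
FROM pg_locks bl
JOIN pg_stat_activity blocked ON blocked.pid = bl.pid
JOIN pg_locks kl ON kl.locktype = bl.locktype
                AND kl.database      IS NOT DISTINCT FROM bl.database
                AND kl.relation      IS NOT DISTINCT FROM bl.relation
                AND kl.tuple         IS NOT DISTINCT FROM bl.tuple
                AND kl.transactionid IS NOT DISTINCT FROM bl.transactionid
                AND kl.pid <> bl.pid
JOIN pg_stat_activity blocking ON blocking.pid = kl.pid
WHERE NOT bl.granted
  AND kl.granted;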

Kind regards,
Sebastian

Adrian Klaver wrote on Wed, 23 Nov 2016 at 16:25:

> On 11/22/2016 11:41 PM, azh...@gmail.com wrote:
> > Greetings!
> >
> > I'm using postgres version 9.5.5 on a ubuntu 16.04.1 server
> > installation which was installed through apt-get.
> >
> > I have a query which if run alone usually completes in about 300ms.
> > When run in my application this query constantly locks up and bogs
> > down all connections of the connection pool (In the application this
> > query is run up to 10 times in parallel with different parameters).
> > What's really weird is that I can re-run one of the hung queries from
> > the command line while it's hung and it will complete as expected
> > while the hung queries continue to use 100% CPU time.
>
> Have you looked in the Postgres logs to see if there are any relevant
> messages?
>
> Best guess is that running in parallel is creating a lock problem where
> one of the queries is holding a lock that is keeping the others from
> completing.
>
> What relationships do the group_history and group_history_* tables have
> to other tables?
>
> What is the definition of the trigger that fires the trigger function
> below?
>
> >
> > The query in question is this:
> >
> > insert into group_history ("group", id, sub_category, "date", aa, ab,
> > bb, ba, quantity, "hour")
> > (select
> > a."group",
> > a.id,
> > b.sub_category,
> > to_timestamp($2)::date as "date",
> > max(a.aa / a.quantity) as aa,
> > min(a.aa / a.quantity) as ab,
> > max(a.bb / a.quantity) as bb,
> > min(a.bb/ a.quantity) as ba,
> > sum(a.quantity) as quantity,
> > extract('hour' from to_timestamp($2)) as "hour"
> > from tbla a
> > join tblb b on a.id = b.id
> > where a."group" = $1 and b."group" = $1
> > group by a."group", a.id, b.sub_category
> > );
> >
> > When I'm running a perf on the system it looks like this while running
> > the query 10 times:
> >
> > Samples: 4M of event 'cpu-clock', Event count (approx.): 18972107951
> > Overhead Shared Object Symbol
> > 17.95% postgres [.] heap_hot_search_buffer
> > 5.64% postgres [.] heap_page_prune_opt
> > 4.62% postgres [.] hash_search_with_hash_value
> > 3.80% postgres [.] LWLockRelease
> > 3.73% postgres [.] 0x002f420d
> > 2.50% postgres [.] _bt_checkkeys
> > 2.48% postgres [.] hash_any
> > 2.45% postgres [.] 0x002f41e7
> > 2.10% postgres [.] slot_getattr
> > 1.80% postgres [.] ResourceOwnerForgetBuffer
> > 1.58% postgres [.] LWLockAcquire
> > 1.58% postgres [.] ReadBufferExtended
> > 1.54% postgres [.] index_fetch_heap
> > 1.47% postgres [.] MemoryContextReset
> > 1.43% postgres [.] btgettuple
> > 1.38% postgres [.] 0x002d710c
> > 1.36% postgres [.] 0x002d70a5
> > 1.35% postgres [.] ExecQual
> >
> > Explain (Analyze, Verbose) Output
> >
> > QUERY PLAN
> >
> 
> >
> -
> > Insert on public.group_history (cost=10254.36..10315.16 rows=2432
> > width=62) (actual time=1833.967..1833.967 rows=0 loops=1)
> > -> Subquery Scan on "*SELECT*" (cost=10254.36..10315.16 rows=2432
> > width=62) (actual time=353.880..376.490 rows=6139 loops=1)
> > Output: "*SELECT*"."group", "*SELECT*".id,
> > "*SELECT*".sub_category, "*SELECT*"."when", "*SELECT*".aa,
> > "*SELECT*".ab, "*SELECT*".bb, "*SELECT*".ba, "*SELECT*".quantity,
> > "*SELECT*"."hour"
> > -> HashAggregate (cost=10254.36..10278.68 rows=2432 width=28)
> > (actual time=353.871..367.144 rows=6139 loops=1)
> > Output: a."group", a.id, b.sub_category, '2016-11-20'::date,
> > max((a.aa / a.quantity)), min((a.aa / a.quantity)), max((a.bb /
> > a.quantity)), min((a.bb / a.quantity)), sum(a.quantity), '21'::double
> > precision
> > Group Key: a."group", a.id, b.sub_category
> > -> Hash Join (cost=5558.64..10181.40 rows=2432 width=28)
> > (actual time=193.949..294.106 rows=30343 loops=1)
> > Output: a."group", a.id, a.aa, a.quantity, a.bb, b.sub_category
> > Hash Cond: (b.id = a.id)
> > -> Bitmap Heap Scan on public.auctions_extra b
> > (cost=685.19..4719.06 rows=30550 width=8) (actual time=56.678..111.038
> > rows=30343 loops=1)
> > Output: b.sub_category, b.id
> > Recheck Cond: (b."group" = 7)
> > Heap Blocks: exact=289
> 

Re: [GENERAL] Methods to quiesce PostgreSQL DB without configuring log archival

2016-11-23 Thread John R Pierce

On 11/21/2016 10:23 PM, MEERA wrote:
If archive_mode is not configured, and i use snapshot solution for 
backup of the server, how can i ensure data consistency? Is there a 
way to quiesce all the connections to DB?


select pg_start_backup('label'); before taking the file system copy or
snapshot, and


select pg_stop_backup(); when you're done.


--
john r pierce, recycling bits in santa cruz



Re: [GENERAL] Methods to quiesce PostgreSQL DB without configuring log archival

2016-11-23 Thread Kevin Grittner
On Wed, Nov 23, 2016 at 11:53 AM, Scott Mead  wrote:
> On Nov 22, 2016, at 01:23, MEERA  wrote:

>> If archive_mode is not configured, and i use snapshot solution for backup of
>> the server, how can i ensure data consistency? Is there a way to quiesce all
>> the connections to DB?
>
> If your snapshot solution is atomic, then you are *probably* okay. I would
> do a few restores to test, but atomic snapshots give you a point in time and
> should be consistent.

You should be OK *if all files in the backup are part of a single
atomic snapshot*.  You can't put pg_xlog or selected tablespaces on
a separate filesystem that has a separate snapshot and expect the
various snapshots to be combined and used as a coherent backup
unless you are archiving WAL and use pg_start_backup() (before the
first snapshot is initiated) and pg_stop_backup() (after the last
snapshot completes) to do a PITR-style recovery.
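
In SQL terms the bracketing is small; a sketch using the 9.x exclusive
backup API (the label is arbitrary, the second argument just requests an
immediate checkpoint):

SELECT pg_start_backup('fs_snapshot', true);
-- ... take the snapshot(s) of every filesystem the cluster uses ...
SELECT pg_stop_backup();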

Be sure to follow all the rules for PITR-style backup and recovery,
like deleting the postmaster.pid file and all files under pg_xlog
before starting the recovery.  And of course, do NOT delete the
backup_label file created by pg_start_backup().

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] Postgresql 9.5 and Shell scripts/variables vs. C programming/defining a value to be used

2016-11-23 Thread Poul Kristensen
My goal is to grab a user from the OS and substitute the grabbed user -
to be the owner of the database - into the SQL commands,
e.g. using shell-script substitution syntax:
   create user ${user} with login;
   create tablespace ${user} location '/path/${user}';
   create database ${user} owner ${user} tablespace ${user};
I would be surprised if this should not be possible using gnu gcc, but I
might be wrong. It is just a question of figuring out a way to do this.
http://zetcode.com/db/postgresqlc/ has an example of how to do this using
an integer as argument on
the commandline.
Do you have a hint of the substitution syntax to be used with gnu gcc?

Thanks.

/Poul



2016-11-23 16:44 GMT+01:00 Tom Lane :

> "David G. Johnston"  writes:
> > On Wed, Nov 23, 2016 at 3:46 AM, Poul Kristensen 
> wrote:
> >> In the postgresql.log I get "select datname from $1".
>
> > That is expected when you parameterize a query - the query string
> > includes
> > parameters instead of values.
>
> It looks like Poul is hoping to use a parameter as a table name,
> which doesn't work.  Parameters can only represent values (constants).
>
> regards, tom lane
>



-- 
Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualization expert and Oracle DBA


Re: [GENERAL] Partial update on an postgres upsert violates constraint

2016-11-23 Thread Tom Lane
Andreas Terrius  writes:
> Is there any way to check whether the row already exists before checking
> constraints ? I still want it to fail if it turns out to be a new row
> (which would violate the not null constraint), but updates the row if it
> already exists.

I'm not really sure why you expect this to be able to work.  The data
you are supplying is insufficient to satisfy the INSERT case, so why do
you think that an insert-or-update should be possible?

ISTM that you could try the UPDATE first and notice whether it updates
any rows or not.  I'm not sure what you're going to do if it doesn't,
though, since you still haven't got enough data to do an INSERT.
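
A sketch of that approach, with made-up table and column names:

CREATE TEMP TABLE items (id int PRIMARY KEY, price int NOT NULL);
INSERT INTO items VALUES (7, 10);

-- try the update first; RETURNING makes "did it hit a row?" easy to check
UPDATE items SET price = 42 WHERE id = 7 RETURNING id;
-- zero rows back would mean the row does not exist, and there still is
-- not enough data to INSERT it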

regards, tom lane




Re: [GENERAL] Backup "Best Practices"

2016-11-23 Thread Andreas Kretschmer


On 23 November 2016 20:31:21 CET, John R Pierce wrote:
>On 11/23/2016 11:20 AM, Israel Brewster wrote:
>> How do I determine which those are? Just based on the timestamp if
>the 
>> WAL file, such that I could do something like take the timestamp of 
>> the last basebackup and delete all WAL files older than that? Or is 
>> there a better way?
>
>there's a tool, barman, I've never used but its supposed to greatly 
>simplify this whole process...
>
>http://www.pgbarman.org/


Barman is a really great tool.

-- 
This message was sent from my Android mobile phone with K-9 Mail.




Re: [GENERAL] Backup "Best Practices"

2016-11-23 Thread Joe Conway
On 11/23/2016 02:50 PM, Israel Brewster wrote:
> On Nov 23, 2016, at 10:31 AM, John R Pierce wrote:
>> there's a tool, barman, I've never used but it's supposed to greatly
>> simplify this whole process...
>>
>> http://www.pgbarman.org/
> 
> Definitely looks like something to try. Thanks!

I'd recommend you also look at pgbackrest:
https://github.com/pgbackrest/pgbackrest

HTH,

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [GENERAL] Backup "Best Practices"

2016-11-23 Thread Israel Brewster
On Nov 23, 2016, at 10:31 AM, John R Pierce  wrote:
> 
> On 11/23/2016 11:20 AM, Israel Brewster wrote:
>> How do I determine which those are? Just based on the timestamp if the WAL 
>> file, such that I could do something like take the timestamp of the last 
>> basebackup and delete all WAL files older than that? Or is there a better 
>> way?
> 
> there's a tool, barman, I've never used but its supposed to greatly simplify 
> this whole process...
> 
> http://www.pgbarman.org/ 

Definitely looks like something to try. Thanks!

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---
> 
> -- 
> john r pierce, recycling bits in santa cruz
> 
> 
> 
> -- 
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] Backup "Best Practices"

2016-11-23 Thread John R Pierce

On 11/23/2016 11:20 AM, Israel Brewster wrote:
How do I determine which those are? Just based on the timestamp if the 
WAL file, such that I could do something like take the timestamp of 
the last basebackup and delete all WAL files older than that? Or is 
there a better way?


there's a tool, barman, I've never used but it's supposed to greatly 
simplify this whole process...


http://www.pgbarman.org/

--
john r pierce, recycling bits in santa cruz





Re: [GENERAL] Backup "Best Practices"

2016-11-23 Thread Israel Brewster

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---


On Nov 23, 2016, at 9:42 AM, John R Pierce wrote:

> On 11/23/2016 10:16 AM, Israel Brewster wrote:
>> To mitigate these issues, I am thinking of supplementing the individual
>> dumps with a full base backup and WAL archiving to a NFS share. This should
>> enable (relatively) quick/easy recovery from backup, plus the ability to do
>> PIT Recovery. I do have a few questions with this approach, however:
>
> if you do the base backup periodically (daily? weekly? monthly? kinda
> depends on your workload, data volume, and available space on your WAL
> archive server)
>
>> - How do I figure out how often I should take a full base backup? I know
>> this will depend on the amount of traffic my database is doing, and how
>> long I am willing to wait for WAL files to be replayed - the more WAL files
>> needing replayed, the longer recovery will take - but is there some rule of
>> thumb that I can use to calculate how often I need a new base backup?
>> Perhaps based on the number of WAL files?
>
> frequency of doing this is entirely dependent on your data volume, how long
> the backups take, and your tolerance for restore times. wal archive
> recoveries are generally quite fast, but of course if there's millions of
> wal files to recover, that would take a long time.  note, too, 9.6 has some
> major performance enhancements in how checkpoints are written, which should
> speed things up overall.

Backups (using pg_basebackup to a tar file, gzipped) take about 45 minutes.
Part of the goal here is to minimize restore time in a disaster recovery
scenario, so *fairly* low tolerance for restore times (say, after getting the
base backup in place, another 5 minutes or less to get it up-and-running). My
difficulty is that I don't know if that translates into two WAL files or two
thousand. But that was really my question - is there some way to know how many
WAL files are too many, and by extension how often I need to do a new base
backup? Obviously how quickly I get to that point of "too many" WAL files
could vary widely, but if I could get some idea of what that line is, or at
least how to figure out that line, that would really help with planning.

From what you said, it sounds like I can go for a while, and allow quite a few
WAL files to pile up (especially with 9.6, which I will be moving to shortly)
between base backups without creating too large a recovery delay. So I'm
thinking I may try weekly or even monthly base backups and see how that goes.

>> - What is the "best" (or just a good) method of keeping the WAL archives
>> under control? Obviously when I do a new basebackup I can "cleanup" any old
>> files that said backup doesn't need, but how do I know what those are?
>> - Should I be looking at any other backup methods in addition to/instead of
>> the basebackup/WAL archive scheme?
>
> I would keep at least 2 base backups and all wal files since the start of
> the oldest base backup.

Good thinking. I'll look into implementing that.

> when you create a new base backup, delete the oldest, and all wal archive
> files up to the start of the following one.

How do I determine which those are? Just based on the timestamp of the WAL
file, such that I could do something like take the timestamp of the last
basebackup and delete all WAL files older than that? Or is there a better way?

> the reason for keeping at least two is so you always have a recovery point
> if something catastrophic happens during the backup process.
>
> note that having base backups plus wal archives allows PiTR too, point in
> time recovery.  say you discover a bad script updated a ton of bad data into
> your database last tuesday at 3pm.  you can restore the preceding base
> backup, then recover up to tuesday just before this event.

Exactly - that's one of the primary reasons I'm taking this approach.

Thanks again for the info!

> --
> john r pierce, recycling bits in santa cruz

Re: [GENERAL] Backup "Best Practices"

2016-11-23 Thread John R Pierce

On 11/23/2016 10:16 AM, Israel Brewster wrote:


To mitigate these issues, I am thinking of supplementing the 
individual dumps with a full base backup and WAL archiving to a NFS 
share. This should enable (relatively) quick/easy recovery from 
backup, plus the ability to do PIT Recovery. I do have a few questions 
with this approach, however:


if you do the base backup periodically (daily? weekly?   monthly? kinda 
depends on your workload, data volume, and available space on your WAL 
archive server)


- How do I figure out how often I should take a full base backup? I 
know this will depend on the amount of traffic my database is doing, 
and how long I am willing to wait for WAL files to be replayed - the 
more WAL files needing replayed, the longer recovery will take - but 
is there some rule of thumb that I can use to calculate how often I 
need a new base backup? Perhaps based on the number of WAL files?


frequency of doing this is entirely dependent on your data volume, how 
long the backups take, and your tolerance for restore times. wal archive 
recoveries are generally quite fast, but of course if there's millions 
of wal files to recover, that would take a long time.  note, too, 
9.6 has some major performance enhancements in how checkpoints are 
written, which should speed things up overall.


- What is the "best" (or just a good) method of keeping the WAL 
archives under control? Obviously when I do a new basebackup I can 
"cleanup" any old files that said backup doesn't need, but how do I 
know what those are?
- Should I be looking at any other backup methods in addition 
to/instead of the basebackup/WAL archive scheme?


I would keep at least 2 base backups and all wal files since the start 
of the oldest base backup.  When you create a new base backup, delete 
the oldest, and all wal archive files up to the start of the following 
one. the reason for keeping at least two is so you always have a 
recovery point if something catastrophic happens during the backup process.


note that having base backups plus wal archives allows PiTR too, point 
in time recovery.   say you discover a bad script updated a ton of bad 
data into your database last tuesday at 3pm.  you can restore the 
preceding base backup, then recover up to tuesday just before this event.



--
john r pierce, recycling bits in santa cruz





[GENERAL] Backup "Best Practices"

2016-11-23 Thread Israel Brewster
I was wondering if someone could inform me about, or point me to an online
article about, best practices for backing up a postgresql database cluster? At
the moment, I have two servers running with streaming replication for failover
purposes, and then I do nightly database dumps for recovery/backup purposes.
However, I feel this approach is lacking in a couple areas I can think of:

- If a total failure occurs towards the end of the day, we could potentially
lose a whole day's worth of data. Similar argument for user error - there is
no way to "undo" a catastrophic user data error without going all the way back
to the previous day.
- Less-than-ideal recovery under some scenarios. Since each database in the
cluster is dumped individually, this is good should only *one* database need
to be restored, but could get tedious should the entire cluster need to be
restored.

To mitigate these issues, I am thinking of supplementing the individual dumps
with a full base backup and WAL archiving to a NFS share. This should enable
(relatively) quick/easy recovery from backup, plus the ability to do PIT
Recovery. I do have a few questions with this approach, however:

- How do I figure out how often I should take a full base backup? I know this
will depend on the amount of traffic my database is doing, and how long I am
willing to wait for WAL files to be replayed - the more WAL files needing
replayed, the longer recovery will take - but is there some rule of thumb that
I can use to calculate how often I need a new base backup? Perhaps based on
the number of WAL files?
- What is the "best" (or just a good) method of keeping the WAL archives under
control? Obviously when I do a new basebackup I can "cleanup" any old files
that said backup doesn't need, but how do I know what those are?
- Should I be looking at any other backup methods in addition to/instead of
the basebackup/WAL archive scheme?

Thanks for any information!
---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---




Re: [GENERAL] Methods to quiesce PostgreSQL DB without configuring log archival

2016-11-23 Thread Scott Mead

> On Nov 22, 2016, at 01:23, MEERA  wrote:
> 
> Hi all,
> 
> If archive_mode is not configured, and i use snapshot solution for backup of 
> the server, how can i ensure data consistency? Is there a way to quiesce all 
> the connections to DB?

If your snapshot solution is atomic, then you are *probably* okay. I would do a 
few restores to test, but atomic snapshots give you a point in time and should 
be consistent. 

Personally, I like archiving because it removes all of the maybes, but, that's 
me :)

> 
> 
> Thanks,
> Meera


Re: [GENERAL] Partial update on an postgres upsert violates constraint

2016-11-23 Thread John R Pierce

On 11/21/2016 2:32 AM, Andreas Terrius wrote:
Is there any way to check whether the row already exists before 
checking constraints ? I still want it to fail if it turns out to be a 
new row (which would violate the not null constraint), but updates the 
row if it already exists.


just do an update.  if the row doesn't exist, it will fail, you then 
rollback the transaction or savepoint.


Since if that is not possible, I would need to do a query to determine 
whether the row exists in the database which kinda eliminates the use 
of upsert. (in this case, partial upsert).


in general, anything that relies on those sorts of checks will fail 
under concurrent loads.




--
john r pierce, recycling bits in santa cruz





[GENERAL] Methods to quiesce PostgreSQL DB without configuring log archival

2016-11-23 Thread MEERA
Hi all,

If archive_mode is not configured and I use a snapshot solution for backup
of the server, how can I ensure data consistency? Is there a way to quiesce
all the connections to the DB?


Thanks,
Meera


[GENERAL] query locks up when run concurrently

2016-11-23 Thread azhwkd
Greetings!

I'm using postgres version 9.5.5 on a ubuntu 16.04.1 server
installation which was installed through apt-get.

I have a query which if run alone usually completes in about 300ms.
When run in my application this query constantly locks up and bogs
down all connections of the connection pool (In the application this
query is run up to 10 times in parallel with different parameters).
What's really weird is that I can re-run one of the hung queries from
the command line while it's hung and it will complete as expected
while the hung queries continue to use 100% CPU time.

The query in question is this:

insert into group_history ("group", id, sub_category, "date", aa, ab,
bb, ba, quantity, "hour")
(select
a."group",
a.id,
b.sub_category,
to_timestamp($2)::date as "date",
max(a.aa / a.quantity) as aa,
min(a.aa / a.quantity) as ab,
max(a.bb / a.quantity) as bb,
min(a.bb/ a.quantity) as ba,
sum(a.quantity) as quantity,
extract('hour' from to_timestamp($2)) as "hour"
from tbla a
join tblb b on a.id = b.id
where a."group" = $1 and b."group" = $1
group by a."group", a.id, b.sub_category
);

When I'm running a perf on the system it looks like this while running
the query 10 times:

Samples: 4M of event 'cpu-clock', Event count (approx.): 18972107951
Overhead  Shared Object Symbol
  17.95%  postgres  [.] heap_hot_search_buffer
   5.64%  postgres  [.] heap_page_prune_opt
   4.62%  postgres  [.] hash_search_with_hash_value
   3.80%  postgres  [.] LWLockRelease
   3.73%  postgres  [.] 0x002f420d
   2.50%  postgres  [.] _bt_checkkeys
   2.48%  postgres  [.] hash_any
   2.45%  postgres  [.] 0x002f41e7
   2.10%  postgres  [.] slot_getattr
   1.80%  postgres  [.] ResourceOwnerForgetBuffer
   1.58%  postgres  [.] LWLockAcquire
   1.58%  postgres  [.] ReadBufferExtended
   1.54%  postgres  [.] index_fetch_heap
   1.47%  postgres  [.] MemoryContextReset
   1.43%  postgres  [.] btgettuple
   1.38%  postgres  [.] 0x002d710c
   1.36%  postgres  [.] 0x002d70a5
   1.35%  postgres  [.] ExecQual

Explain (Analyze, Verbose) Output

QUERY PLAN

   
-
Insert on public.group_history  (cost=10254.36..10315.16 rows=2432
width=62) (actual time=1833.967..1833.967 rows=0 loops=1)
  ->  Subquery Scan on "*SELECT*"  (cost=10254.36..10315.16 rows=2432
width=62) (actual time=353.880..376.490 rows=6139 loops=1)
Output: "*SELECT*"."group", "*SELECT*".id,
"*SELECT*".sub_category, "*SELECT*"."when", "*SELECT*".aa,
"*SELECT*".ab, "*SELECT*".bb, "*SELECT*".ba, "*SELECT*".quantity,
"*SELECT*"."hour"
->  HashAggregate  (cost=10254.36..10278.68 rows=2432 width=28)
(actual time=353.871..367.144 rows=6139 loops=1)
  Output: a."group", a.id, b.sub_category, '2016-11-20'::date,
max((a.aa / a.quantity)), min((a.aa / a.quantity)), max((a.bb /
a.quantity)), min((a.bb / a.quantity)), sum(a.quantity), '21'::double
precision
  Group Key: a."group", a.id, b.sub_category
  ->  Hash Join  (cost=5558.64..10181.40 rows=2432 width=28)
(actual time=193.949..294.106 rows=30343 loops=1)
Output: a."group", a.id, a.aa, a.quantity, a.bb, b.sub_category
Hash Cond: (b.id = a.id)
->  Bitmap Heap Scan on public.auctions_extra b
(cost=685.19..4719.06 rows=30550 width=8) (actual time=56.678..111.038
rows=30343 loops=1)
  Output: b.sub_category, b.id
  Recheck Cond: (b."group" = 7)
  Heap Blocks: exact=289
  ->  Bitmap Index Scan on auction_extra_pk
(cost=0.00..677.55 rows=30550 width=0) (actual time=55.966..55.966
rows=30343 loops=1)
Index Cond: (b."group" = 7)
->  Hash  (cost=4280.62..4280.62 rows=30627 width=28)
(actual time=137.160..137.160 rows=30343 loops=1)
  Output: a."group", a.id, a.aa, a.quantity, a.bb, a.id
  Buckets: 16384  Batches: 4  Memory Usage: 638kB
  ->  Bitmap Heap Scan on public.tbla a
(cost=689.78..4280.62 rows=30627 width=28) (actual
time=58.530..117.064 rows=30343 loops=1)
Output: a."group", a.id, a.aa, a.quantity,
a.bb, a.id
Recheck Cond: (a."group" = 7)
Heap Blocks: exact=254
->  Bitmap Index Scan on 

Re: [GENERAL] Partial update on an postgres upsert violates constraint

2016-11-23 Thread Andreas Terrius
Is there any way to check whether the row already exists before checking
constraints ? I still want it to fail if it turns out to be a new row
(which would violate the not null constraint), but updates the row if it
already exists.

Since if that is not possible, I would need to do a query to determine
whether the row exists in the database which kinda eliminates the use of
upsert. (in this case, partial upsert).



On Sun, Nov 20, 2016 at 3:57 AM, Tom Lane  wrote:

> Adrian Klaver  writes:
> > ... So looks like constraints are checked before you get to the ON
> > CONFLICT section.
>
> Right.  ON CONFLICT is a means for dealing with duplicate-key errors in
> the specified (or inferred) unique index.  It is *not* an all-purpose
> error catcher.  In the case at hand, the given INSERT request fails due
> to not-null constraints that are unrelated to what the ON CONFLICT clause
> tests for.
>
> regards, tom lane
>
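
A minimal illustration of that ordering, with made-up names (the not-null
check on "a" fires before the conflict is ever considered, even though the
row already exists and only the UPDATE arm could apply):

CREATE TEMP TABLE t (id int PRIMARY KEY, a text NOT NULL, b text);
INSERT INTO t VALUES (1, 'set', NULL);

INSERT INTO t (id, b) VALUES (1, 'x')
ON CONFLICT (id) DO UPDATE SET b = EXCLUDED.b;
-- fails with a not-null violation on column "a"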


[GENERAL] Thank you

2016-11-23 Thread Melvin Davidson
*This is not a problem or a query.*

*Here in the U.S.A. it is the day before Thanksgiving. In keeping with
that, I'd like to thank all those that have helped me, be thankful for all
those I have helped, and forgive all those that have refused to see my
point of view.*


*Happy Thanksgiving to everyone.*
-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] query locks up when run concurrently

2016-11-23 Thread Tom Lane
azh...@gmail.com writes:
> I have a query which if run alone usually completes in about 300ms.
> When run in my application this query constantly locks up and bogs
> down all connections of the connection pool (In the application this
> query is run up to 10 times in parallel with different parameters).
> What's really weird is that I can re-run one of the hung queries from
> the command line while it's hung and it will complete as expected
> while the hung queries continue to use 100% CPU time.

Judging from the EXPLAIN timing, most of the work is in the trigger,
which leads me to wonder if the parallel calls are likely to be fighting
over inserting/updating the same row in the group_history partition
tables.  Or are you certain that they should be hitting different rows?

regards, tom lane




Re: [GENERAL] max_connections limit violation not showing in pg_stat_activity

2016-11-23 Thread Kevin Grittner
On Wed, Nov 23, 2016 at 4:43 AM, Charles Clavadetscher
 wrote:
> From: Kevin Grittner [mailto:kgri...@gmail.com]

>> Is it possible to upgrade?  You are missing over a year's worth
>> of fixes for serious bugs and security vulnerabilities.
>
> Yes. Actually it is foreseen to move to 9.6 (and RedHat 7.x).
>
>> https://www.postgresql.org/support/versioning/

An upgrade from 9.3.x to 9.6.x is a major release upgrade, which
gets you new features and usually gets you improvements in
performance and scalability.  The 9.3 major release will be
supported for almost 2 more years, so I wasn't so concerned about
that as being on 9.3.10 when the latest bug fix version of 9.3 is
9.3.15.  To avoid hitting bugs that others have already hit and
reported, with fixes published, it is wise to try to upgrade to the
latest minor release fairly quickly.  If the minor release fixes a
serious security vulnerability, I think it is a good idea to update
within a day or two of release.

> I will have to check with our hosting people how many cores we
> have or can have on the new environment.  I have seen that there
> is pgBouncer and pgPool. Would you recommend one of those?

pgBouncer is more lightweight, so if you don't need any of the
features present only in pgPool, I would go with pgBouncer.  Depending
on your application software environment, it may be even better to
use a pool built into the application development framework.  There
are several very good pools available to Java environments.  I can
personally attest to the quality of Apache dbcp, but others have
mentioned that they like other Java connection pools even better.
You might want to search the archives, or maybe someone will
mention the others again on this thread.

>>> SELECT count(*) FROM pg_stat_activity; watch 1;
>>
>> At the times when the resources are overloaded by more
>> connections than the resources can efficiently service -- well
>> that's precisely the time that a sleeping "monitoring" process
>> is least likely to be given a time slice to run. If you can
>> manage to get pgbadger to run on your environment, and you turn
>> on logging of connections and disconnections, you will be able
>> to get far more accurate information.
>
> Yes, it sounds reasonable. I assumed that this kind of
> measurements have a higher priority or reserved slots for them.
> In those occasions is when they are most needed.

There is no such feature in PostgreSQL.  It might be worthwhile,
although how that would be implemented is not obvious, short of a
sysadmin looking for the monitoring backend process and running
"nice" against it.

> And thank you for the hint to pgbadger. I will take a look into
> it, but an installation on the server completely depends on our
> hosting service people. I am not sure this is feasible, but I can
> imagine an alternative scenario, using a local installation for
> tuning and then pass the parameters to the hosters for
> implementation.

As long as you can control the PostgreSQL configuration (to set the
right logging options) and can retrieve the log files, you should
be able to use it.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [SPAM] Re: [GENERAL] Best practices to manage custom statistics

2016-11-23 Thread Adrian Klaver

On 11/23/2016 05:24 AM, Moreno Andreo wrote:

Il 15/11/2016 18:19, Adrian Klaver ha scritto:

On 11/15/2016 07:39 AM, Moreno Andreo wrote:

Sorry for late reply... i'm in some quite rough days

Il 08/11/2016 21:28, Adrian Klaver ha scritto:

On 11/08/2016 12:13 PM, Moreno Andreo wrote:

[...]

In your experience, would this approach help me lower server load?
Are there any other approach I can try?






So does the user need only their data or do they need the other users
data also?

they can be able to download also their group mates' data (stats,
numbers), if they want. Numbers can be confusing because evey user can
have more than one workplace.






How can they be working with 'old' data? The queries you are running
are compiling stats on data that exist at the time they are run and at
any point in time between stats runs the user is working with current
data regardless of what the last stats say.

Since we are on ADO.NET (with Npgsql) and we don't keep connections open
(query, fill a structure and disconnect), in the time slice between two
updates they will have both data and stats that are not "real" in that
moment...
I'll try to make an example
You and me are teammates and work everyone at his place (several miles
away). You read data from my database and get both stats and work data.
Suddenly I make a change. This change won't be visible to you until you
refresh data (that was pressing "Update" button, now it's automated
every 60 secs).
Hope It's clearer now...



Aah, I get it now. You are refreshing forms every 60 seconds over 350 
users each with their own database. Actually worse than that, as there is 
user overlap over databases (up to 10 per user), so the same database can 
be hit multiple times at a given refresh. Seems physics is at work here, 
as you have already pointed out. Namely, fetching all that data at 
regular intervals taxes the bandwidth as well as the CPU/storage.


High levels solutions that come to mind to spread the load out:

1) Replication to spread data across multiple machines.
Or just split the databases over multiple non-replicated Postgres 
instances on separate machines


2) Caching results from each individual database so subsequent calls for 
the information do not touch the database. You already touched on this 
with your counter table (a sketch follows below).


3) Spreading out the refresh interval. Not sure if the 60 second 
interval is synced across users. Still maybe setting different refresh 
intervals and/or changing it on the fly when load increases. A back off 
equation so to speak.
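
A sketch of the counter-table idea in 2), so that each refresh reads one
tiny row instead of re-running the four-way join (names are made up,
generate_series stands in for the real query, and ON CONFLICT needs 9.5+):

CREATE TABLE cached_stats (
    stat_name    text PRIMARY KEY,
    stat_value   bigint NOT NULL,
    refreshed_at timestamptz NOT NULL DEFAULT now()
);

-- run from a scheduled job instead of from every client:
INSERT INTO cached_stats (stat_name, stat_value)
SELECT 'open_items', count(*)
  FROM generate_series(1, 800)      -- the real 4-table join goes here
ON CONFLICT (stat_name)
DO UPDATE SET stat_value = EXCLUDED.stat_value, refreshed_at = now();

-- clients then only do:
SELECT stat_value, refreshed_at FROM cached_stats WHERE stat_name = 'open_items';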





Except when they are wrong:) Still been there.

Don't tell me :-)






Thanks in advance and sorry for the long message (but I had to explain
such a complex thing)
Moreno.-




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] Postgresql 9.5 and Shell scripts/variables vs. C programming/defining a value to be used

2016-11-23 Thread Tom Lane
"David G. Johnston"  writes:
> On Wed, Nov 23, 2016 at 3:46 AM, Poul Kristensen  wrote:
>> In the postgresql.log I get "select datname from $1".

> That is expected when you parameterize a query - the query string includes
> parameters instead of values.

It looks like Poul is hoping to use a parameter as a table name,
which doesn't work.  Parameters can only represent values (constants).
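
One workaround from the SQL side: identifiers can never be sent as query
parameters, but the statements can be assembled safely on the server with
format()/quote_ident() and then run as plain strings (e.g. with PQexec).
A sketch, where 'poul' stands in for the user grabbed from the OS:

SELECT format('CREATE USER %I WITH LOGIN', 'poul');
SELECT format('CREATE TABLESPACE %I LOCATION %L', 'poul', '/path/poul');
SELECT format('CREATE DATABASE %I OWNER %I TABLESPACE %I', 'poul', 'poul', 'poul');
-- each SELECT returns the finished statement text, ready to be executed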

regards, tom lane




Re: [GENERAL] query locks up when run concurrently

2016-11-23 Thread Adrian Klaver

On 11/22/2016 11:41 PM, azh...@gmail.com wrote:

Greetings!

I'm using postgres version 9.5.5 on a ubuntu 16.04.1 server
installation which was installed through apt-get.

I have a query which if run alone usually completes in about 300ms.
When run in my application this query constantly locks up and bogs
down all connections of the connection pool (In the application this
query is run up to 10 times in parallel with different parameters).
What's really weird is that I can re-run one of the hung queries from
the command line while it's hung and it will complete as expected
while the hung queries continue to use 100% CPU time.


Have you looked in the Postgres logs to see if there are any relevant 
messages?


Best guess is that running in parallel is creating a lock problem where 
one of the queries is holding a lock that is keeping the others from 
completing.


What relationships do the group_history and group_history_* tables have 
to other tables?


What is the definition of the trigger that fires the trigger function below?



The query in question is this:

insert into group_history ("group", id, sub_category, "date", aa, ab,
bb, ba, quantity, "hour")
(select
a."group",
a.id,
b.sub_category,
to_timestamp($2)::date as "date",
max(a.aa / a.quantity) as aa,
min(a.aa / a.quantity) as ab,
max(a.bb / a.quantity) as bb,
min(a.bb/ a.quantity) as ba,
sum(a.quantity) as quantity,
extract('hour' from to_timestamp($2)) as "hour"
from tbla a
join tblb b on a.id = b.id
where a."group" = $1 and b."group" = $1
group by a."group", a.id, b.sub_category
);

When I'm running a perf on the system it looks like this while running
the query 10 times:

Samples: 4M of event 'cpu-clock', Event count (approx.): 18972107951
Overhead Shared Object Symbol
17.95% postgres [.] heap_hot_search_buffer
5.64% postgres [.] heap_page_prune_opt
4.62% postgres [.] hash_search_with_hash_value
3.80% postgres [.] LWLockRelease
3.73% postgres [.] 0x002f420d
2.50% postgres [.] _bt_checkkeys
2.48% postgres [.] hash_any
2.45% postgres [.] 0x002f41e7
2.10% postgres [.] slot_getattr
1.80% postgres [.] ResourceOwnerForgetBuffer
1.58% postgres [.] LWLockAcquire
1.58% postgres [.] ReadBufferExtended
1.54% postgres [.] index_fetch_heap
1.47% postgres [.] MemoryContextReset
1.43% postgres [.] btgettuple
1.38% postgres [.] 0x002d710c
1.36% postgres [.] 0x002d70a5
1.35% postgres [.] ExecQual

Explain (Analyze, Verbose) Output

QUERY PLAN

-
Insert on public.group_history (cost=10254.36..10315.16 rows=2432
width=62) (actual time=1833.967..1833.967 rows=0 loops=1)
-> Subquery Scan on "*SELECT*" (cost=10254.36..10315.16 rows=2432
width=62) (actual time=353.880..376.490 rows=6139 loops=1)
Output: "*SELECT*"."group", "*SELECT*".id,
"*SELECT*".sub_category, "*SELECT*"."when", "*SELECT*".aa,
"*SELECT*".ab, "*SELECT*".bb, "*SELECT*".ba, "*SELECT*".quantity,
"*SELECT*"."hour"
-> HashAggregate (cost=10254.36..10278.68 rows=2432 width=28)
(actual time=353.871..367.144 rows=6139 loops=1)
Output: a."group", a.id, b.sub_category, '2016-11-20'::date,
max((a.aa / a.quantity)), min((a.aa / a.quantity)), max((a.bb /
a.quantity)), min((a.bb / a.quantity)), sum(a.quantity), '21'::double
precision
Group Key: a."group", a.id, b.sub_category
-> Hash Join (cost=5558.64..10181.40 rows=2432 width=28)
(actual time=193.949..294.106 rows=30343 loops=1)
Output: a."group", a.id, a.aa, a.quantity, a.bb, b.sub_category
Hash Cond: (b.id = a.id)
-> Bitmap Heap Scan on public.auctions_extra b
(cost=685.19..4719.06 rows=30550 width=8) (actual time=56.678..111.038
rows=30343 loops=1)
Output: b.sub_category, b.id
Recheck Cond: (b."group" = 7)
Heap Blocks: exact=289
-> Bitmap Index Scan on auction_extra_pk
(cost=0.00..677.55 rows=30550 width=0) (actual time=55.966..55.966
rows=30343 loops=1)
Index Cond: (b."group" = 7)
-> Hash (cost=4280.62..4280.62 rows=30627 width=28)
(actual time=137.160..137.160 rows=30343 loops=1)
Output: a."group", a.id, a.aa, a.quantity, a.bb, a.id
Buckets: 16384 Batches: 4 Memory Usage: 638kB
-> Bitmap Heap Scan on public.tbla a
(cost=689.78..4280.62 rows=30627 width=28) (actual
time=58.530..117.064 rows=30343 loops=1)
Output: a."group", a.id, a.aa, a.quantity,
a.bb, a.id
Recheck Cond: (a."group" = 7)
Heap Blocks: exact=254
-> Bitmap Index Scan on tbla_pk
(cost=0.00..682.12 rows=30627 width=0) (actual time=57.801..57.801
rows=30343 loops=1)
Index Cond: (a."group" = 7)
Planning time: 0.475 ms
Trigger group_history_trigger: time=1442.561 calls=6139
Execution time: 1834.119 ms


group_history_trigger:

CREATE OR REPLACE FUNCTION public.group_history_partition_function()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
declare
_new_date timestamptz;
_tablename text;
_startdate text;
begin
-- Takes the current inbound "when" 
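
A date-based partition-routing trigger of this general shape typically
looks roughly like the following sketch; the child-table naming scheme and
the format()/EXECUTE approach are assumptions here, not the poster's
actual code:

CREATE OR REPLACE FUNCTION group_history_partition_sketch()
 RETURNS trigger
 LANGUAGE plpgsql
AS $sketch$
declare
    _tablename text;
begin
    -- pick the child table from the row's date (naming scheme assumed)
    _tablename := 'group_history_' || to_char(NEW."date", 'YYYY_MM_DD');
    -- a real version would also create the child table here if it is missing
    execute format('insert into %I select ($1).*', _tablename) using NEW;
    return null;  -- suppress the insert into the parent table
end;
$sketch$;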

Re: [GENERAL] Postgresql 9.5 and Shell scripts/variables vs. C programming/defining a value to be used

2016-11-23 Thread David G. Johnston
On Wed, Nov 23, 2016 at 3:46 AM, Poul Kristensen  wrote:

> In the postgresql.log I get "select datname from $1".
>

That is expected when you parameterize a query - the query string includes
parameters instead of values.  What you are seeing is the "Prepare"
component of a "prepare-execute".  When you execute you then see something
like; "execute unnamed_query ['pg_database']" (note, I'm not familiar with
the exact syntax and terminology that would appear in the log file).

David J.


Re: [SPAM] Re: [GENERAL] Best practices to manage custom statistics

2016-11-23 Thread Moreno Andreo

Il 15/11/2016 18:19, Adrian Klaver ha scritto:

On 11/15/2016 07:39 AM, Moreno Andreo wrote:

Sorry for late reply... i'm in some quite rough days

Il 08/11/2016 21:28, Adrian Klaver ha scritto:

On 11/08/2016 12:13 PM, Moreno Andreo wrote:

[...]

In your experience, would this approach help me lower server load?
Are there any other approach I can try?


Instead of pushing why not pull.
Excuse me Adrian, but I can't get what you mean by not pushing but 
pulling.

We are now pulling data from clients for about everything we need...
what I'd like to do is either
- the database pushes updates when needed,


Pushes updates of what, the statistics you talking about or other data?
Statistics, modified by "something internal". I just need a number 
(okay, it's not just one, say about six, for now)



or
- the client pulls data from database, but querying a reduced dataset
(just a row values with all values for that user and not thousands of 
rows)


This confuses me, given your OP:

" 350 databases, 350 users, every user connects to his own database and
his teammates' (max 10 in total) so each user can connect to max 10
databases at a time"

"This is obtained, at the moment, with a select count(*) from . (that
involves 4 joins on 4 tables) to be run run every 20 secs from each
client connected to the cluster (ATM about 650 clients configured, about
200 concurrent) to each database it has rights to connect."

So does the user need only their data or do they need the other users 
data also?
They should also be able to download their group mates' data (stats, 
numbers) if they want. The numbers can be confusing because every user can 
have more than one workplace.



In other words do the users really check/need the statistics every 20
secs?

Ideally, I need the real value to be read whenever it changes. But
realistically I'm happy with a consistent value (if it should be 800 and it
reads 799, it's not an earthquake) on a regular basis. This means that if
there's no activity we will be polling the database uselessly, which is why
I thought about "pushing" data from the backend to the client; that would
be the closest to an ideal solution.

Given that you say exactness is not important over the course of the day,
why not create a mechanism for the user to poll the database when they
need the information?


This is what we did in the past. The result was that users _did not_
refresh the values (by clicking an "Update" button) and caused disasters
working with "old" data (they forgot to do it, they didn't want to do it
because

How can they be working with 'old' data? The queries you are running 
are compiling stats on data that exist at the time they are run and at 
any point in time between stats runs the user is working with current 
data regardless of what the last stats say.
Since we are on ADO.NET (with Npgsql) and we don't keep connections open 
(query, fill a structure, and disconnect), in the time slice between two 
updates they will have both data and stats that are not "real" at that 
moment...

I'll try to give an example.
You and I are teammates, each working at his own place (several miles 
apart). You read data from my database and get both stats and work data. 
Suddenly I make a change. That change won't be visible to you until you 
refresh the data (that used to mean pressing the "Update" button; now it's 
automated every 60 seconds).

Hope it's clearer now...


"it's another click, I waste my time", and so many, even stupid,
excuses... but they're the customers, they pay, and here we say that
"customer is always right")


Except when they are wrong:) Still been there.

Don't tell me :-)




So we changed it: now we check for the values and for the data every 60
seconds instead of every 20 (I just checked the correct value). I need
something that's lighter for the DB backend, at least for the values
procedure. If we had only one database, I think the queries and datasets
would stay in cache, so response times would be faster. With more than 350
databases that's not possible (or we would have to grow RAM to a very large
size...)

I've also thought about using LISTEN/NOTIFY to send value updates to the
client only when needed, but I read that with Npgsql we would need to keep
an open connection, and that's not a good idea AFAIK.
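
For completeness, the server-side half of that idea is small; here is a
sketch that assumes a hypothetical some_work_table whose changes should
invalidate the statistics (the client-side requirement of a dedicated open
connection for LISTEN of course remains):

-- Sketch only: notify listeners whenever rows feeding the statistics change.
CREATE OR REPLACE FUNCTION notify_stats_changed() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    PERFORM pg_notify('stats_changed', TG_TABLE_NAME);
    RETURN NULL;
END;
$$;

CREATE TRIGGER stats_changed_trg
AFTER INSERT OR UPDATE OR DELETE ON some_work_table
FOR EACH STATEMENT EXECUTE PROCEDURE notify_stats_changed();

-- Interested clients then subscribe with:
LISTEN stats_changed;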

Thanks
Moreno

If more details are needed, just ask.

Thanks in advance and sorry for the long message (but I had to explain
such a complex thing)
Moreno.-

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgresql 9.5 and Shell scripts/variables vs. C programming/defining a value to be used

2016-11-23 Thread Poul Kristensen
The pg_database value mentioned below does not appear in place of $1.
In postgresql.log I get "select datname from $1".
Do you have a suggestion for how to solve this?

I get double declarations when using both
char *paramValues[1]
char *paramValues[1]

//char *value1 = "pg_database";
//char **paramValues =  /* closer match to the documentation's
syntax */
char *paramValues[1] = {"pg_database"}; /* same as above, different syntax
*/
//
//char *paramValues[1]; /* this looks to be missing */
//paramValues[0]={"pg_database"}; /* what you had */
res = PQexecParams(conn,
                   "select datname from $1",
                   1,    /* there is only 1 entry in paramValues array */
                   NULL,
                   paramValues, /* address of parameter value array */
                   NULL, /* don't need param lengths since text */
                   NULL, /* default to all text params */
                   1);   /* return all values as binary */
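
For what it's worth, a parameter ($1) can only stand for a value, never for
an identifier such as a table name, so pg_database has to be written into
the SQL text itself. A minimal, self-contained sketch (the WHERE clause and
the "postgres" value are purely illustrative):

/* Build with: cc file.c -lpq */
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    const char *paramValues[1];
    PGconn     *conn;
    PGresult   *res;

    conn = PQconnectdb("");   /* take connection settings from the environment */
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    paramValues[0] = "postgres";   /* illustrative value for $1 */

    res = PQexecParams(conn,
                       "select datname from pg_database where datname = $1",
                       1,    /* one entry in paramValues */
                       NULL, /* let the server infer the parameter type */
                       paramValues,
                       NULL, /* param lengths not needed for text */
                       NULL, /* all parameters are text */
                       0);   /* return results as text */

    if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
        printf("%s\n", PQgetvalue(res, 0, 0));
    else
        fprintf(stderr, "query failed: %s", PQerrorMessage(conn));

    PQclear(res);
    PQfinish(conn);
    return 0;
}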

Thanks.

/Poul

2016-11-22 15:50 GMT+01:00 John McKown :

> On Tue, Nov 22, 2016 at 8:22 AM, Poul Kristensen 
> wrote:
>
>> I think I understand.
>> When I use this in my code I get
>> "undefined reference to `PQexecParms'
>>
>
> The correct name is PQexecParams (note the last "a"). Sorry I missed that
> when first looking. Also, just to be sure, did you include the argument
> "-lpq" on the compile command to point to the PostgreSQL library for
> linking?
>
>
>
>> when compiling.
>>
>> references in main is
>>
>> const char *conninfo; /* connection string  to the database */
>> PGconn *conn; /* connection to the database */
>> PGresult *res; /* result of sql query */
>> int   nFields;  /* print out the attribute names */
>> int i, j;  /* print the columns */
>>
>> Is there a reserved reference to use with
>>
>> Reserved res = PQexecParms(conn )
>>
>> Then I assume that I have to use another reference than res.
>>
>> Thanks.
>>
>> /Poul
>>
>>
> --
> Heisenberg may have been here.
>
> Unicode: http://xkcd.com/1726/
>
> Maranatha! <><
> John McKown
>



-- 
Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualization expert and Oracle DBA


Re: [GENERAL] max_connections limit violation not showing in pg_stat_activity

2016-11-23 Thread Charles Clavadetscher
Hello Kevin

Thank you very much for your input. I appreciate it very much.

> -Original Message-
> From: Kevin Grittner [mailto:kgri...@gmail.com]
> Sent: Dienstag, 22. November 2016 22:37
> To: Charles Clavadetscher 
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] max_connections limit violation not showing in 
> pg_stat_activity
> 
> On Tue, Nov 22, 2016 at 12:48 PM, Charles Clavadetscher 
>  wrote:
> 
> > We are using PostgreSQL 9.3.10 on RedHat (probably 6.x).
> 
> Is it possible to upgrade?  You are missing over a year's worth of fixes for 
> serious bugs and security
> vulnerabilities.

Yes. Actually a move to 9.6 (and RedHat 7.x) is planned.

> https://www.postgresql.org/support/versioning/
> 
> > Among other thing the database is the backend for a web application
> > that expects a load of a some hundred users at a time (those are
> > participans to online surveys that we use for computing economic
> > indicators and access the system every month). The whole amount of
> > people expected is above 5000, but we don't expect a too high
> > concurrent access to the database. As mentioned a few hundreds at the 
> > beginning of the surveys.
> >
> > To be sure that we won't have problems with the peak times we created
> > a load test using gatling that ramps up to 1000 users in 5 minutes in
> > bunches of 10. At the beginning we had problems with the web server
> > response that we were able to correct. Now we face problem with the
> > max_connections limit of PostgreSQL. Currently it is set to the
> > default of 100. We are going to look into it and either increase that
> > limit or consider connections pooling.
> 
> On a web site with about 3000 active users, I found (through adjusting the 
> connection pool size on the production
> database and monitoring performance) that we got best performance with a pool 
> of about 40 connections.  This was on
> a machine with 16 cores (never count HT "threads" as cores), 512GB RAM, and a 
> RAID with 40 drives of spinning rust.

OK. I will have to check with our hosting people how many cores we have, or
can have, in the new environment.
I have seen that there are pgBouncer and pgPool. Would you recommend one of
them? AFAICS both are installed on the client side, so we should be able
to use them.
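
For what it's worth, a minimal pgbouncer.ini sketch in the spirit of the
~40-connection pool mentioned above (database name, paths and numbers are
illustrative assumptions only):

; Sketch only -- adjust for the real environment.
[databases]
; clients connect to "surveydb" on pgBouncer, which forwards to the real server
surveydb = host=127.0.0.1 port=5432 dbname=surveydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction   ; reuse server connections between transactions
max_client_conn = 1000    ; clients allowed to connect to pgBouncer
default_pool_size = 40    ; actual connections opened to PostgreSQL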

> http://tbeitr.blogspot.com/2015/11/for-better-service-please-take-number.html
> 
> > What bothers me however is that running a query on pg_stat_activity
> > with a watch of 1 seconds never shows any value higher than 37 of
> > concurrent active connections.
> >
> > SELECT count(*) FROM pg_stat_activity; watch 1;
> 
> At the times when the resources are overloaded by more connections than the 
> resources can efficiently service --
> well that's precisely the time that a sleeping "monitoring" process is least 
> likely to be given a time slice to run.
> If you can manage to get pgbadger to run on your environment, and you turn on 
> logging of connections and
> disconnections, you will be able to get far more accurate information.

Yes, that sounds reasonable. I assumed that this kind of measurement has a 
higher priority, or slots reserved for it; those are precisely the occasions 
when it is most needed.

And thank you for the hint about pgbadger. I will take a look at it, but an 
installation on the server depends entirely on our hosting service people. I 
am not sure that is feasible, but I can imagine an alternative scenario: 
using a local installation for tuning and then passing the parameters to the 
hosters for implementation. That might help.

Regards
Charles

> 
> > Increasing max_connections has repercussions on the configuration of
> > work_mem (if I remember well)
> 
> Each connection can allocate one work_mem allocation per node which requires 
> a sort, hash, CTE, etc.
> 
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
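
To put that in rough, purely illustrative numbers: with work_mem = 8MB, 100
active connections each running a query with four sort or hash nodes could
in the worst case ask for about 100 x 4 x 8MB = 3.2GB of working memory on
top of shared_buffers, which is why max_connections and work_mem have to be
sized together.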



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] min/max_wal_size

2016-11-23 Thread Albe Laurenz
Torsten Förtsch wrote:
> Now, I have a database with very low write activity. Archive_command is 
> called about once per hour to
> archive one segment. When the database was moved to PG 9.5, it was initially 
> configured with insanely
> high settings for max_wal_size, min_wal_size and wal_keep_segments. I reset 
> min/max_wal_size to the
> default settings of 80MB and 1GB and reduced wal_keep_segments to 150.
> 
> Why does a database that generates a little more than 1 WAL file per hour and 
> has a checkpoint_timeout
> of 30 minutes with a completion_target=0.7 need so many of them? The default 
> value for min_wal_size is
> 80MB which amounts to 5 segments. That should be totally enough for this 
> database.
> 
> Is this because of the previously insanely high setting (min=1GB, max=9GB)? 
> Should I expect this value
> to drop in a week's time? Or is there anything that I am not aware of?

The size of pg_xlog will shrink slowly if there is little activity on the 
database.
Whenever a WAL segment has been archived, it will be deleted rather than 
recycled
as long as WAL size exceeds max_wal_size.

Yours,
Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] min/max_wal_size

2016-11-23 Thread Torsten Förtsch
On Tue, Nov 22, 2016 at 10:34 PM, Adrian Klaver 
wrote:

> On 11/22/2016 12:51 PM, Torsten Förtsch wrote:
>
>>
>> Now, I have a database with very low write activity. Archive_command is
>> called about once per hour to archive one segment. When the database was
>> moved to PG 9.5, it was initially configured with insanely high settings
>> for max_wal_size, min_wal_size and wal_keep_segments. I reset
>> min/max_wal_size to the default settings of 80MB and 1GB and reduced
>> wal_keep_segments to 150.
>>
>> I am seeing in pg_xlog the WAL segments from
>>
>> -rw--- 1 postgres postgres 16777216 Nov 17 04:01
>> pg_xlog/0001000400F9
>> ...
>> -rw--- 1 postgres postgres 16777216 Nov 22 20:00
>> pg_xlog/00010005008E
>> -rw--- 1 postgres postgres 16777216 Nov 22 20:19
>> pg_xlog/00010005008F
>> -rw--- 1 postgres postgres 16777216 Nov 15 07:50
>> pg_xlog/000100050090
>> ...
>> -rw--- 1 postgres postgres 16777216 Nov 15 07:52
>> pg_xlog/000100060017
>>
>> As you can see, the files from 1/4/F9 to 1/5/8E are old. That are 150
>> files which matches exactly wal_keep_segments. If I understand
>> correctly, the file 1/5/8F is currently written. Further, the files from
>> 1/5/90 to 1/6/17 seem to be old WAL files that have been renamed to be
>> reused in the future. Their count is 136.
>>
>> Why does a database that generates a little more than 1 WAL file per
>> hour and has a checkpoint_timeout of 30 minutes with a
>> completion_target=0.7 need so many of them? The default value for
>> min_wal_size is 80MB which amounts to 5 segments. That should be totally
>> enough for this database.
>>
>> Is this because of the previously insanely high setting (min=1GB,
>> max=9GB)? Should I expect this value to drop in a week's time? Or is
>> there anything that I am not aware of?
>>
>
> Are you talking about the recycled files?


Yes, I was talking about recycled files, and I think the suspicion that the
number of recycled files will drop over time toward min_wal_size is correct.
Overnight the number of recycled files dropped to 123, and according to the
log many checkpoints removed a file and none added one.
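
In case it is useful for watching that number continue to drop, a quick
sketch of a query to count the segment files in pg_xlog (9.5 layout;
pg_ls_dir requires superuser):

-- Count WAL segment files currently in pg_xlog.
SELECT count(*)
FROM pg_ls_dir('pg_xlog') AS f
WHERE f ~ '^[0-9A-F]{24}$';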