Re: [GENERAL] Fwd: PostgreSQL VMWare

2015-07-03 Thread Ibrahim Edib Kokdemir
Hi, It is slower with the default settings of a VMWare environment. But today
VMWare offers other things like SSD caching, vSAN, etc. If you can
afford that kind of I/O improvement in your environment, it is very
reasonable to stay on VMWare, especially because of the HA features.


2015-07-03 14:37 GMT+03:00 Andreas Joseph Krogh andr...@visena.com:

 On Friday 03 July 2015 at 12:35:07, Jean-Gérard Pailloncy 
 jg.paillo...@adnow.fr wrote:

 Hi,

 I work on a project that collects geolocalized data.
 All data will be in PostgreSQL / PostGIS.

 The small PostgreSQL databases will be on Linux guests on VMWare hosts.

  The size of the main database will grow by 50 TB / year, 500 M rows / day.
 For the largest one, we plan to test different options.
  One of them is to stay with Linux on VMWare.
  Beyond the questions about schema and sharding, I would appreciate any
  information, benchmarks, or stories about big PostgreSQL databases on
  Linux guests on VMWare hosts.


 We have a 1.5T and growing DB which we started out hosting on VMWare, BIG
 mistake!

 Never, ever, run a production DB on anything other than bare metal
 servers. Once we blasted VMWare's 8GB disk-cache, our DB random-read speed
 went down to 1MB/s with 8K blocks (as PG uses), yes that's one megabyte per
 second!

  --
 *Andreas Joseph Krogh*
 CTO / Partner - Visena AS
 Mobile: +47 909 56 963
 andr...@visena.com
 www.visena.com




Re: [GENERAL] max number of locks

2015-07-03 Thread Adrian Klaver

On 07/03/2015 03:14 AM, Fabio Pardi wrote:

Hi,


While experimenting with the number of locks, I found something I cannot
understand.

From what I can read in the documentation, at any given time, a
query can obtain a maximum number of locks given by

max_locks_per_transaction * (max_connections + max_prepared_transactions)

I then changed my db to use these settings:

mydb=# show max_locks_per_transaction ;
  max_locks_per_transaction
---
  20
(1 row)

mydb=# show max_connections ;
  max_connections
-
  2
(1 row)

mydb=# show  max_prepared_transactions ;
  max_prepared_transactions
---
  0
(1 row)

So I expected to be able to acquire a maximum of 40 locks.


On tables. To continue the docs from where you left off above:

http://www.postgresql.org/docs/9.4/static/runtime-config-locks.html

The shared lock table tracks locks on max_locks_per_transaction * 
(max_connections + max_prepared_transactions) objects (e.g., tables); 
hence, no more than this many distinct objects can be locked at any one 
time. This parameter controls the average number of object locks 
allocated for each transaction; individual transactions can lock more 
objects as long as the locks of all transactions fit in the lock table. 
This is not the number of rows that can be locked; that value is 
unlimited. ...






Then:


mydb=# begin transaction ;
BEGIN
portavita=# SELECT 1 FROM root.ac;
  ?column?
--
(0 rows)

mydb=# select count(*) from pg_locks ;
  count
---
132
(1 row)


Why can I acquire 132 locks while the expected number is 40? What am I
doing wrong?


Take a look here:

http://www.postgresql.org/docs/9.4/interactive/view-pg-locks.html

and see what the locks are actually being held on.
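As a rough, untested sketch, grouping the pg_locks contents shows how many of
those entries are actually table-level locks versus other lock types
(virtualxid, etc.):

select locktype, mode, count(*)
from pg_locks
group by locktype, mode
order by count(*) desc;

-- and, for the relation locks only, which relations they are held on
select relation::regclass, mode, count(*)
from pg_locks
where locktype = 'relation'
group by 1, 2
order by 3 desc;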



I'm running Postgres 9.2.6.



Thanks for your time,



Fabio





--
Adrian Klaver
adrian.kla...@aklaver.com




[GENERAL] Unusual sorting requirement (mixed enum/non-enum) - need thoughts

2015-07-03 Thread David G. Johnston
CREATE TYPE enum_type AS ENUM ('X-One','A-Two');

SELECT *
FROM (VALUES
('Not Enum'::text, 1::int, 'Uno'::text),
('Not Enum', 2, 'Dos'),
('Enum', 4, 'X-One'),
('Enum', 3, 'A-Two')) val (flag, id, val)
;

I need to write an ORDER BY clause that will result in the output of: 1, 2,
4, 3

Basically, if the value is capable of being cast to the enum it should be,
and the enum's sort order used; otherwise, order on the value of id.

It doesn't matter how the two groups, enums and non-enums, sort relative to
each other - for any execution of the query either all values will be
enum-able or none will be.

I'm trying to do this in pure SQL, though I suspect I'll have to do it in
pl/pgsql, put the is-enum-able check outside the query, and either use
dynamic SQL or write two separate queries.

I tried casting the enum to an integer but it would not let me :(

Thanks in advance for any thoughts.

David J.


Re: [GENERAL] database-level lockdown

2015-07-03 Thread Melvin Davidson
Wouldn't it be easier just to prevent connections to the database
while your transaction is executed?
EG:
Connect to your_database
UPDATE pg_database
   SET datallowconn = FALSE
 WHERE datname = 'your_database';
START TRANSACTION;
-- do your_transaction
COMMIT;
UPDATE pg_database
   SET datallowconn = TRUE
 WHERE datname = 'your_database';
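A rough, untested sketch of that idea (note that datallowconn only blocks
new connections, so sessions already connected would still have to be
terminated, and updating pg_database directly requires superuser rights):

UPDATE pg_database SET datallowconn = FALSE WHERE datname = 'your_database';

-- kick out anyone already connected (9.2+ column name is pid)
SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
 WHERE datname = 'your_database'
   AND pid <> pg_backend_pid();

BEGIN;
-- your_transaction here
COMMIT;

UPDATE pg_database SET datallowconn = TRUE WHERE datname = 'your_database';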


On Fri, Jul 3, 2015 at 1:25 PM, Filipe Pina filipe.p...@impactzero.pt
wrote:

 So, as database level locks do not exist (as per
 https://wiki.postgresql.org/wiki/Lock_database), I've thought of having a
 function that would lock *every* table in the database (far from the same,
 but would probably work for me).

 Something like:

 CREATE OR REPLACE FUNCTION lockdown()
 RETURNS void AS $$
 DECLARE
   t information_schema.tables.table_name%TYPE;
 BEGIN
   FOR t IN SELECT table_name
              FROM information_schema.tables
             WHERE table_schema = 'public'
               AND table_type = 'BASE TABLE'
   LOOP
     -- quote the identifier in case of mixed-case or reserved names
     EXECUTE 'LOCK TABLE ' || quote_ident(t);
   END LOOP;
 END;
 $$
 LANGUAGE plpgsql;

 But the problem is that calling the function starts its own transaction
 and once it returns, locks are removed..

 Basically the workflow is (pseudo code coming from
 Django/python/psycopg2/external, not within pgsql):

 function generic_function_restarter(developer_function) {
   # try 4 times to execute the developer function and if all of them fail
   # (due to SQLSTATE 40001 serialization failures),
   # lock the database and execute one last time
   for i in [1..4] {
     try {
       call developer_function()
       return 'success'
     }
     except SQLSTATE_40001 {
       continue
     }
     except other_error {
       return other_error
     }
   }

   # only reaches here if all tries failed with SQLSTATE_40001
   try {
     START TRANSACTION
     call lockdown()
     call developer_function()
     COMMIT TRANSACTION
     return 'success'
   }
   except any_error {
     # implicit ROLLBACK
     return any_error
   }
 }

 So, my problem here is that calling lockdown() will place the locks and
 remove them upon returning... Is it possible to execute a function without
 creating a subtransaction?

 I could place the locks from the adapter directly at the outer
 transaction level, but I have the feeling that performance would be worse...

 Thanks,
 Filipe

 On Fri, Jun 12, 2015 at 5:25, Filipe Pina filipe.p...@impactzero.pt
 wrote:

 Exactly, that’s why there’s a limit on the retry number. On the last try I
 wanted something like full lockdown to make sure the transaction will not
 fail due to serialization failure (if no other processes are touching the
 database, it can’t happen). So if two transactions were retrying over and
 over, the first one to reach max_retries would activate that “global lock”
 making the other one wait and then the second one would also be able to
 successfully commit...

 On 11/06/2015, at 20:27, Tom Lane t...@sss.pgh.pa.us wrote: Filipe Pina 
 filipe.p...@impactzero.pt writes:

 It will try 5 times to execute each instruction (in case of
  OperationalError) and in the last one it will raise the last error it
 received, aborting.

  Now my problem is that aborting for the last try (on a restartable error
 - OperationalError code 40001) is not an option... It simply needs to get
 through, locking whatever other processes and queries it needs.

 I think you need to reconsider your objectives. What if two or more
 transactions are repeatedly failing and retrying, perhaps because they
 conflict? They can't all forcibly win. regards, tom lane




-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


[GENERAL] PostgreSQL VMWare

2015-07-03 Thread Jean-Gérard Pailloncy
Hi,

I work on a project that collects geolocalized data.
All data will be in PostgreSQL / PostGIS.

The small PostgreSQL databases will be on Linux guests on VMWare hosts.

The size of the main database will grow by 50 TB / year, 500 M rows / day.
For the largest one, we plan to test different options.
One of them is to stay with Linux on VMWare.
Beyond the questions about schema and sharding, I would appreciate any
information, benchmarks, or stories about big PostgreSQL databases on Linux
guests on VMWare hosts.

Regards,
Jean-Gérard Pailloncy



Re: [GENERAL] Unusual sorting requirement (mixed enum/non-enum) - need thoughts

2015-07-03 Thread David G. Johnston
On Fri, Jul 3, 2015 at 11:27 AM, David G. Johnston 
david.g.johns...@gmail.com wrote:

 CREATE TYPE enum_type AS ENUM ('X-One','A-Two');

 SELECT *
 FROM (VALUES
 ('Not Enum'::text, 1::int, 'Uno'::text),
 ('Not Enum', 2, 'Dos'),
 ('Enum', 4, 'X-One'),
 ('Enum', 3, 'A-Two')) val (flag, id, val)
 ;

 I need to write an ORDER BY clause that will result in the output of: 1,
 2, 4, 3

  Basically, if the value is capable of being cast to the enum it should be,
  and the enum's sort order used; otherwise, order on the value of id.

 It doesn't matter how the two groups, enums and non-enums, sort relative
 to each other - for any execution of the query either all values will be
 enum-able or none will be.

  I'm trying to do this in pure SQL, though I suspect I'll have to do it in
  pl/pgsql, put the is-enum-able check outside the query, and either use
  dynamic SQL or write two separate queries.

 I tried casting the enum to an integer but it would not let me :(

  Thanks in advance for any thoughts.

 David J.


Running 9.3 and discovered the pg_enum catalog, which has the needed column
that can be converted to integer (well, numeric, since it can be
fractional...)

Trying to figure out how best to use it...
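One possible sketch along those lines (untested; assumes the labels are
unique within enum_type and that pg_enum.enumsortorder drives the enum's
sort order):

SELECT val.*
FROM (VALUES
('Not Enum'::text, 1::int, 'Uno'::text),
('Not Enum', 2, 'Dos'),
('Enum', 4, 'X-One'),
('Enum', 3, 'A-Two')) val (flag, id, val)
LEFT JOIN pg_enum e
       ON e.enumtypid = 'enum_type'::regtype
      AND e.enumlabel = val.val
-- non-enum rows first (ordered by id), then enum rows by their enum order
ORDER BY (e.enumsortorder IS NOT NULL),
         COALESCE(e.enumsortorder, val.id);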

David J.


Re: [GENERAL] database-level lockdown

2015-07-03 Thread Filipe Pina
So, as database level locks do not exist (as per 
https://wiki.postgresql.org/wiki/Lock_database), I've thought of having 
a function that would lock *every* table in the database (far from the 
same, but would probably work for me).


Something like:

CREATE OR REPLACE FUNCTION lockdown()
  RETURNS void AS $$
DECLARE
  t information_schema.tables.table_name%TYPE;
BEGIN
  FOR t IN SELECT table_name
             FROM information_schema.tables
            WHERE table_schema = 'public'
              AND table_type = 'BASE TABLE'
  LOOP
    -- quote the identifier in case of mixed-case or reserved names
    EXECUTE 'LOCK TABLE ' || quote_ident(t);
  END LOOP;
END;
$$
LANGUAGE plpgsql;

But the problem is that calling the function starts its own transaction 
and once it returns, locks are removed..


Basically the workflow is (pseudo code coming from 
Django/python/psycopg2/external, not within pgsql):


function generic_function_restarter(developer_function) {
  # try 4 times to execute the developer function and if all of them fail
  # (due to SQLSTATE 40001 serialization failures),
  # lock the database and execute one last time
  for i in [1..4] {
    try {
      call developer_function()
      return 'success'
    }
    except SQLSTATE_40001 {
      continue
    }
    except other_error {
      return other_error
    }
  }

  # only reaches here if all tries failed with SQLSTATE_40001
  try {
    START TRANSACTION
    call lockdown()
    call developer_function()
    COMMIT TRANSACTION
    return 'success'
  }
  except any_error {
    # implicit ROLLBACK
    return any_error
  }
}

So, my problem here is that calling lockdown() will place the locks and
remove them upon returning... Is it possible to execute a function
without creating a subtransaction?


I could place the locks from the adapter directly at the outer
transaction level, but I have the feeling that performance would be
worse...


Thanks,
Filipe

On Fri, Jun 12, 2015 at 5:25, Filipe Pina filipe.p...@impactzero.pt
wrote:
Exactly, that’s why there’s a limit on the retry number. On the 
last try I wanted something like full lockdown to make sure the 
transaction will not fail due to serialization failure (if no other 
processes are touching the database, it can’t happen).


So if two transactions were retrying over and over, the first one to 
reach max_retries would activate that “global lock” making the 
other one wait and then the second one would also be able to 
successfully commit...



 On 11/06/2015, at 20:27, Tom Lane t...@sss.pgh.pa.us wrote:
 
 Filipe Pina filipe.p...@impactzero.pt writes:
  It will try 5 times to execute each instruction (in case of
  OperationalError) and in the last one it will raise the last error it
  received, aborting.
  
  Now my problem is that aborting for the last try (on a restartable
  error - OperationalError code 40001) is not an option... It simply
  needs to get through, locking whatever other processes and queries it
  needs.
  
  I think you need to reconsider your objectives.  What if two or more
  transactions are repeatedly failing and retrying, perhaps because they
  conflict?  They can't all forcibly win.
  
  			regards, tom lane




Re: [GENERAL] Download PostgreSQL 9.5 Alpha

2015-07-03 Thread Charles Clavadetscher

Hello

I also could not find the download on EDB. For Ubuntu, 9.5 alpha
is available, but I was not able to install it using apt-get because it
could not resolve some dependencies. Finally I downloaded and compiled the
source code. This worked well, but I had to do some additional steps to
get the cluster up and running. I wrote some notes at


http://www.schmiedewerkstatt.ch/wiki/index.php/PostgreSQL_Compiling_From_Source

In general the problems were:
- Fully qualify the executables if you have other instances running on
your system (in my case 9.4.4). This is to make sure that you are using the
newer versions.
- Set LD_LIBRARY_PATH to the lib dir in your installation directory. The
same as before, but for libraries.


You may prefer to set PATH to the 9.5 bin dir in your current shell
instead of fully qualifying the executables.


Hope this helps.

BTW, I am not sure about this, but I guess that if I hadn't already had 9.4
on the system, apt-get would have worked. The error message
suggested that it did not want to replace some existing files (e.g.
libpq, if I remember well). I assume this was intended to avoid damaging a
working installation.


Bye
Charles

On 7/4/2015 02:24, Joshua D. Drake wrote:


On 07/03/2015 04:32 PM, Edson F. Lidorio wrote:


Hello,

PostgreSQL 9.5 Alpha does not appear on the downloads list in [1].
Where do I download it for Windows?

[1]
http://www.enterprisedb.com/products-services-training/pgdownload#windows



For those in the community who may not know, EnterpriseDB hosts the
Windows versions of PostgreSQL.

Edson,

It does not appear that they have an Alpha download available yet.

Sincerely,

JD



--
Edson










[GENERAL] Download PostgreSQL 9.5 Alpha

2015-07-03 Thread Edson F. Lidorio

Hello,

PostgreSQL 9.5 Alpha does not appear on the downloads list in [1].
Where do I download it for Windows?

[1] 
http://www.enterprisedb.com/products-services-training/pgdownload#windows



--
Edson




Re: [GENERAL] Download PostgreSQL 9.5 Alpha

2015-07-03 Thread Joshua D. Drake


On 07/03/2015 04:32 PM, Edson F. Lidorio wrote:


Hello,

PostgreSQL 9.5 Alpha does not appear on the downloads list in [1].
Where do I download it for Windows?

[1]
http://www.enterprisedb.com/products-services-training/pgdownload#windows



For those in the community who may not know, EnterpriseDB hosts the 
Windows versions of PostgreSQL.


Edson,

It does not appear that they have an Alpha download available yet.

Sincerely,

JD



--
Edson





--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing I'm offended is basically telling the world you can't
control your own emotions, so everyone else should do it for you.




[GENERAL] return jsonb without field label

2015-07-03 Thread john.tiger

I have tables with: create table mydocs (id serial primary key, data jsonb)

When I do a select data from mydocs it comes back as {data: {my jsonb
fields}}, which then gets passed into a template - I don't want to have to
keep using <%= article.data.name %> instead of just data.name


Probably easy, but I can't find out how - any suggestions?

PS: using the new, very light pg ORM library massive.js (npm massive) - easy
to run direct SQL commands - very nice.
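One SQL-level sketch (the key names below, like name and title, are
hypothetical): pull the fields out of the jsonb column as plain columns, so
the template only sees flat values:

SELECT id,
       data->>'name'  AS name,   -- ->> returns text
       data->>'title' AS title
FROM mydocs;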





Re: [GENERAL] Slow index performance

2015-07-03 Thread Marc Mamin


 -Original Message-
 From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
 ow...@postgresql.org] On Behalf Of Christian Schröder
  Sent: Friday, 3 July 2015 07:36
 To: pgsql-general@postgresql.org
 Subject: [GENERAL] Slow index performance
 
 Hi all,
 we have a strange performance issue in one of our databases (using
 PostgreSQL 9.1.18). Maybe you can help me understand what’s going on.
 
 We have two identical tables (rec_isins_current, rec_isins_archive)
 with the following structure:
 
 Table ts_frontend.rec_isins_current
Column   |  Type   | Modifiers
 +-+---
  attachment | integer | not null
  isin   | isin| not null
 Indexes:
 rec_isins_current_pkey PRIMARY KEY, btree (attachment, isin),
 tablespace extra
 rec_isins_current_attachment btree (attachment), tablespace
 extra


Hello, 

Are you sure that the column order of the PKs is the same in both tables?

(attachment, isin) or (isin, attachment).

When isin is in the second position, Postgres will read the whole index
to find matching records.
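A quick way to check this from the catalogs (sketch):

SELECT indexrelid::regclass AS index_name,
       pg_get_indexdef(indexrelid) AS definition
  FROM pg_index
 WHERE indisprimary
   AND indrelid IN ('ts_frontend.rec_isins_current'::regclass,
                    'ts_frontend.rec_isins_archive'::regclass);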

regards,

Marc Mamin

 Foreign-key constraints:
 rec_isins_attachment_fkey FOREIGN KEY (attachment) REFERENCES
 ts_frontend.attachments(id) ON UPDATE RESTRICT ON DELETE CASCADE
 Inherits: ts_frontend.rec_isins
 
 The isin type is a domain type which has char(12) as its base type.
 Both tables inherit from ts_frontend.rec_isins, which is empty and is
 only used to search both tables in a single query.
 
 When we search for an isin in both tables (using the parent table, but
 the behavior is the same if we directly search in one of the tables),
 the primary key index is used. However, while the archive table is
 pretty fast, the current table is much slower:
 
 # explain analyze select * from ts_frontend.rec_isins where isin =
 'foo';
 
                                                      QUERY PLAN
  ----------------------------------------------------------------------------------------------------------------
   Result  (cost=0.00..565831.43 rows=501 width=17) (actual time=6080.778..6080.778 rows=0 loops=1)
     ->  Append  (cost=0.00..565831.43 rows=501 width=17) (actual time=6080.777..6080.777 rows=0 loops=1)
           ->  Seq Scan on rec_isins  (cost=0.00..0.00 rows=1 width=36) (actual time=0.001..0.001 rows=0 loops=1)
                 Filter: ((isin)::bpchar = 'foo'::bpchar)
           ->  Index Scan using rec_isins_archive_pkey on rec_isins_archive rec_isins  (cost=0.00..621.61 rows=405 width=17) (actual time=10.335..10.335 rows=0 loops=1)
                 Index Cond: ((isin)::bpchar = 'foo'::bpchar)
           ->  Index Scan using rec_isins_current_pkey on rec_isins_current rec_isins  (cost=0.00..565209.82 rows=95 width=17) (actual time=6070.440..6070.440 rows=0 loops=1)
                 Index Cond: ((isin)::bpchar = 'foo'::bpchar)
   Total runtime: 6080.824 ms
 
 This is strange, because the archive table is four times larger than
 the current table and the archive index is also four times larger than
 the current index:
 
             relname            | relfilenode | reltablespace | pg_table_size
  ------------------------------+-------------+---------------+---------------
   rec_isins                    |   514533886 |             0 |          8192
   rec_isins_pkey               |   514533892 |             0 |          8192
   rec_isins_attachment         |   514533899 |             0 |          8192
   rec_isins_archive            |   507194804 |             0 |   10923393024
   rec_isins_archive_pkey       |   507197615 |     139300915 |    9048784896
   rec_isins_archive_attachment |   507197692 |     139300915 |    4706050048
   rec_isins_current            |   631621090 |             0 |    2696216576
   rec_isins_current_pkey       |   631621096 |     139300915 |    2098552832
   rec_isins_current_attachment |   631621107 |     139300915 |    1160683520
 Both tables are in the same tablespace (and thus on the same disk) and
 both indexes are also in the same tablespace (but in another than the
 tables).
 The current table has been vacuumed full and reindexed.
 
 Can anybody explain the difference? Why is the current table so slow?
 And what can we do to improve performance?
 
 Thanks for your help,
 Christian
 
 
 Deriva GmbH Financial IT and Consulting
 Christian Schröder
 Geschäftsführer
 Hans-Böckler-Straße 2 | D-37079 Göttingen
 Tel: +49 (0)551 489 500-42
 Fax: +49 (0)551 489 500-91
 http://www.deriva.de
 
 Amtsgericht Göttingen | HRB 3240
 Geschäftsführer: Christian Schröder
 
 
 
 
 

[GENERAL] max number of locks

2015-07-03 Thread Fabio Pardi
Hi,


While experimenting with the number of locks, I found something I cannot understand.

From what I can read in the documentation, at any given time, a query can
obtain a maximum number of locks given by

max_locks_per_transaction * (max_connections + max_prepared_transactions)

I then changed my db to use these settings:

mydb=# show max_locks_per_transaction ;
 max_locks_per_transaction
---
 20
(1 row)

mydb=# show max_connections ;
 max_connections
-
 2
(1 row)

mydb=# show  max_prepared_transactions ;
 max_prepared_transactions
---
 0
(1 row)

So I expected to be able to acquire a maximum of 40 locks.


Then:


mydb=# begin transaction ;
BEGIN
portavita=# SELECT 1 FROM root.ac;
 ?column?
--
(0 rows)

mydb=# select count(*) from pg_locks ;
 count
---
   132
(1 row)


Why can I acquire 132 locks while the expected number is 40? What am I doing 
wrong?

I'm running Postgres 9.2.6.



Thanks for your time,



Fabio

 


Re: [GENERAL] Backup Method

2015-07-03 Thread Guillaume Lelarge
On 3 Jul 2015 12:03 PM, howardn...@selestial.com wrote:

 Hi everyone,

 I am reaching the point with my database backups where the backups are
taking too long and starting to interfere with the running of the system
during the day. So I am looking for a bit of sage advice as to how to
proceed.

 For a typical server, I have a single database cluster with multiple
databases that I back up nightly using sequential pg_dumps to NFS.

 So what I am looking for is a backup regime that will run overnight but
with less impact (or faster) than the pg_dump approach.

 Would a file system backup (PITR-style) be the better option, or is
pg_dumpall more efficient than pg_dump?


pg_dumpall is not more efficient than pg_dump. As far as I can tell, you're
looking for PITR backups.
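In outline, the file-based (PITR) approach on 9.x looks roughly like this,
assuming WAL archiving (archive_mode / archive_command) is already set up;
the actual file copy happens outside SQL, and pg_basebackup automates the
whole sequence:

-- sketch of a low-level base backup
SELECT pg_start_backup('nightly_base_backup');
-- ... copy the data directory ($PGDATA) to the backup destination ...
SELECT pg_stop_backup();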


[GENERAL] Backup Method

2015-07-03 Thread howardn...@selestial.com

Hi everyone,

I am reaching the point with my database backups where the backups are
taking too long and starting to interfere with the running of the system
during the day. So I am looking for a bit of sage advice as to how to
proceed.


For a typical server, I have a single database cluster with multiple
databases that I back up nightly using sequential pg_dumps to NFS.


So what I am looking for is a backup regime that will run overnight but
with less impact (or faster) than the pg_dump approach.


Would a file system backup (PITR-style) be the better option, or is
pg_dumpall more efficient than pg_dump?


Look forward to your advice.

Howard.




Re: [GENERAL] Slow index performance

2015-07-03 Thread Christian Schröder
You are right ... How embarrassing ... Why did I not see this?
I will change the index and check again. I guess that the problem should be 
fixed then.
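For reference, a sketch of the kind of change meant here (assuming lookups
are mostly by isin alone; the index name is arbitrary):

-- build an index with isin as the leading column, without blocking writes
CREATE INDEX CONCURRENTLY rec_isins_current_isin
    ON ts_frontend.rec_isins_current (isin);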

Thanks a lot!
Christian


Deriva GmbH Financial IT and Consulting
Christian Schröder
Geschäftsführer
Hans-Böckler-Straße 2 | D-37079 Göttingen
Tel: +49 (0)551 489 500-42
Fax: +49 (0)551 489 500-91
http://www.deriva.de

Amtsgericht Göttingen | HRB 3240
Geschäftsführer: Christian Schröder

-----Original Message-----
From: Marc Mamin [mailto:m.ma...@intershop.de]
Sent: Friday, 3 July 2015 08:58
To: Christian Schröder; pgsql-general@postgresql.org
Subject: RE: [GENERAL] Slow index performance



 -Original Message-
 From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general- 
 ow...@postgresql.org] On Behalf Of Christian Schröder
  Sent: Friday, 3 July 2015 07:36
 To: pgsql-general@postgresql.org
 Subject: [GENERAL] Slow index performance
 
 Hi all,
 we have a strange performance issue in one of our databases (using 
 PostgreSQL 9.1.18). Maybe you can help me understand what’s going on.
 
 We have two identical tables (rec_isins_current, rec_isins_archive) 
 with the following structure:
 
 Table ts_frontend.rec_isins_current
Column   |  Type   | Modifiers
 +-+---
  attachment | integer | not null
  isin   | isin| not null
 Indexes:
 rec_isins_current_pkey PRIMARY KEY, btree (attachment, isin), 
 tablespace extra
 rec_isins_current_attachment btree (attachment), tablespace 
 extra


Hello, 

Are you sure that the column order of the PKs is the same in both tables?

(attachment, isin) or (isin, attachment).

When isin is in the second position, Postgres will read the whole index to find 
matching records.

regards,

Marc Mamin

 Foreign-key constraints:
 rec_isins_attachment_fkey FOREIGN KEY (attachment) REFERENCES
 ts_frontend.attachments(id) ON UPDATE RESTRICT ON DELETE CASCADE
 Inherits: ts_frontend.rec_isins
 
 The isin type is a domain type which has char(12) as its base type.
 Both tables inherit from ts_frontend.rec_isins, which is empty and is 
 only used to search both tables in a single query.
 
 When we search for an isin in both tables (using the parent table, but 
 the behavior is the same if we directly search in one of the tables), 
 the primary key index is used. However, while the archive table is 
 pretty fast, the current table is much slower:
 
 # explain analyze select * from ts_frontend.rec_isins where isin = 
 'foo';
 
                                                      QUERY PLAN
  ----------------------------------------------------------------------------------------------------------------
   Result  (cost=0.00..565831.43 rows=501 width=17) (actual time=6080.778..6080.778 rows=0 loops=1)
     ->  Append  (cost=0.00..565831.43 rows=501 width=17) (actual time=6080.777..6080.777 rows=0 loops=1)
           ->  Seq Scan on rec_isins  (cost=0.00..0.00 rows=1 width=36) (actual time=0.001..0.001 rows=0 loops=1)
                 Filter: ((isin)::bpchar = 'foo'::bpchar)
           ->  Index Scan using rec_isins_archive_pkey on rec_isins_archive rec_isins  (cost=0.00..621.61 rows=405 width=17) (actual time=10.335..10.335 rows=0 loops=1)
                 Index Cond: ((isin)::bpchar = 'foo'::bpchar)
           ->  Index Scan using rec_isins_current_pkey on rec_isins_current rec_isins  (cost=0.00..565209.82 rows=95 width=17) (actual time=6070.440..6070.440 rows=0 loops=1)
                 Index Cond: ((isin)::bpchar = 'foo'::bpchar)
   Total runtime: 6080.824 ms
 
 This is strange, because the archive table is four times larger than 
 the current table and the archive index is also four times larger than 
 the current index:
 
             relname            | relfilenode | reltablespace | pg_table_size
  ------------------------------+-------------+---------------+---------------
   rec_isins                    |   514533886 |             0 |          8192
   rec_isins_pkey               |   514533892 |             0 |          8192
   rec_isins_attachment         |   514533899 |             0 |          8192
   rec_isins_archive            |   507194804 |             0 |   10923393024
   rec_isins_archive_pkey       |   507197615 |     139300915 |    9048784896
   rec_isins_archive_attachment |   507197692 |     139300915 |    4706050048
   rec_isins_current            |   631621090 |             0 |

Re: [GENERAL] Fwd: PostgreSQL VMWare

2015-07-03 Thread Bill Moran
On Fri, 3 Jul 2015 12:35:07 +0200
Jean-Gérard Pailloncy jg.paillo...@adnow.fr wrote:
 
 I work on a project that collects geolocalized data.
 All data will be in PostgreSQL / PostGIS.
 
 The small PostgreSQL databases will be on Linux guests on VMWare hosts.
 
  The size of the main database will grow by 50 TB / year, 500 M rows / day.
  For the largest one, we plan to test different options.
  One of them is to stay with Linux on VMWare.
  Beyond the questions about schema and sharding, I would appreciate any
  information, benchmarks, or stories about big PostgreSQL databases on
  Linux guests on VMWare hosts.

The place I'm working now did a feasibility study about installing
their primary app on vmware instead of directly onto the hardware.
Their conclusion was that the app would be about 25% slower running
on VMWare. The app is very database-centric. However, I wasn't
involved in the tests, can't vouch for the quality of the testing,
and there _are_ other pieces involved than the database.

That being said, I've used PostgreSQL on VMs quite a bit. It does
seem slower, but I've never actually benchmarked it. And it's never
seemed slower enough for me to complain much.

The concern I have about running a large database on a VM (especially
since you're asking about performance) is not the VM itself, but all
the baggage that inevitably comes with it ... oversubscribed hosts,
terrible, cheap SANs, poor administration leading to bad configuration,
and yet another layer of obscurity preventing you from figuring out
why things are slow. In my experience, you _will_ get all of these,
because once you're on a VM, the admins will be pressured to host
more and more VMs on the existing hardware and/or add capacity at
minimal cost.

There's nothing like a VM where you never know what the performance
will be because you never know when some other VMs (completely unrelated
to you and/or your work) will saturate the IO with some ridiculous
recursive grep command or something.

-- 
Bill Moran




Re: [GENERAL] Backup Method

2015-07-03 Thread howardn...@selestial.com



On 03/07/2015 11:15, Guillaume Lelarge wrote:


pg_dumpall is not more efficient than pg_dump. As far as I can tell, 
you're looking for PITR backups.




Thanks Guillame.

In that case is there any recommendation for how often to make base 
backups in relation to the size of the cluster and the size of the WAL?





[GENERAL] Fwd: PostgreSQL VMWare

2015-07-03 Thread Jean-Gérard Pailloncy
Hi,

I work on a project that collects geolocalized data.
All data will be in PostgreSQL / PostGIS.

The small PostgreSQL databases will be on Linux guests on VMWare hosts.

The size of the main database will grow by 50 TB / year, 500 M rows / day.
For the largest one, we plan to test different options.
One of them is to stay with Linux on VMWare.
Beyond the questions about schema and sharding, I would appreciate any
information, benchmarks, or stories about big PostgreSQL databases on Linux
guests on VMWare hosts.

Regards,
Jean-Gérard Pailloncy





Re: [GENERAL] Backup Method

2015-07-03 Thread Guillaume Lelarge
On 3 Jul 2015 12:31 PM, howardn...@selestial.com wrote:



 On 03/07/2015 11:15, Guillaume Lelarge wrote:


 pg_dumpall is not more efficient than pg_dump. As far as I can tell,
you're looking for PITR backups.


 Thanks Guillame.

 In that case is there any recommendation for how often to make base
backups in relation to the size of the cluster and the size of the WAL?


Nope, not really. That depends on a lot of things. Our customers usually do
one per day.
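One rough way to put a number on it (sketch, using the 9.x function names;
the literal below is just a placeholder for the position you noted earlier):

SELECT pg_current_xlog_location();   -- note the result, e.g. '0/3000000'
-- ... after a representative interval ...
SELECT pg_size_pretty(
         pg_xlog_location_diff(pg_current_xlog_location(), '0/3000000'));

That difference is roughly the WAL volume a restore would have to replay on
top of the last base backup.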

-- 
Guillaume


Re: [GENERAL] Backup Method

2015-07-03 Thread howardn...@selestial.com



On 03/07/2015 11:39, Guillaume Lelarge wrote:


 In that case is there any recommendation for how often to make base 
backups in relation to the size of the cluster and the size of the WAL?



Nope, not really. That depends on a lot of things. Our customers 
usually do one per day.


--
Guillaume


Excuse my ignorance... Is the base backup, in general, faster than pg_dump?





Re: [GENERAL] Backup Method

2015-07-03 Thread Jan Lentfer

On 2015-07-03 13:00, howardn...@selestial.com wrote:

On 03/07/2015 11:39, Guillaume Lelarge wrote:


 In that case is there any recommendation for how often to make 
base backups in relation to the size of the cluster and the size of 
the WAL?



Nope, not really. That depends on a lot of things. Our customers 
usually do one per day.



Excuse my ignorance... Is the base backup, in general, faster than 
pg_dump?


It is a different approach. With the base backup you are actually 
backing up files from the filesystem ($PGDATA directory), whereas with 
pg_dump you're saving the SQL commands to reload and rebuild the database.
Usually a file-based backup will be faster, both on backup and 
restore, but it is - as mentioned - a different approach and it might 
also not serve all your purposes.


That is why I do weekly base backups (plus WAL Archiving) and use 
pg_dump in a parallel way to do logical backups every night.


Regards,

Jan




Re: [GENERAL] Backup Method

2015-07-03 Thread Bill Moran
On Fri, 03 Jul 2015 13:16:02 +0200
Jan Lentfer jan.lent...@web.de wrote:

 On 2015-07-03 13:00, howardn...@selestial.com wrote:
  On 03/07/2015 11:39, Guillaume Lelarge wrote:
 
   In that case is there any recommendation for how often to make 
  base backups in relation to the size of the cluster and the size of 
  the WAL?
  
 
  Nope, not really. That depends on a lot of things. Our customers 
  usually do one per day.
 
 
  Excuse my ignorance... Is the base backup, in general, faster than 
  pg_dump?
 
 It is a different approach. With the base backup you are actually 
 backing up files from the filesystem ($PGDATA directory), whereas with 
  pg_dump you're saving the SQL commands to reload and rebuild the database.
  Usually a file-based backup will be faster, both on backup and 
 restore, but it is - as mentioned - a different approach and it might 
 also not serve all your purposes.

One of the things that makes a lot of difference is the amount of
redundant data in the database. For example, indexes are completely
redundant. They sure do speed things up, but they're storing the same
data 2x for each index you have. When you do a base backup, you have
to copy all that redundancy, but when you do a pg_dump, all that
redundant data is reduced to a single CREATE INDEX command. The
result being that if your database has a lot of indexes, the pg_dump
might actually be faster.

But the only way to know is to try it out on your particular system.
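As a rough sketch, something like this shows how much of each database is
index versus heap data, i.e. how much of that redundancy a base backup would
have to copy:

SELECT c.oid::regclass                        AS table_name,
       pg_size_pretty(pg_table_size(c.oid))   AS table_size,
       pg_size_pretty(pg_indexes_size(c.oid)) AS index_size
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE c.relkind = 'r'
   AND n.nspname NOT IN ('pg_catalog', 'information_schema')
 ORDER BY pg_table_size(c.oid) + pg_indexes_size(c.oid) DESC
 LIMIT 20;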

-- 
Bill Moran




Re: [GENERAL] Fwd: PostgreSQL VMWare

2015-07-03 Thread Andreas Joseph Krogh
On Friday 03 July 2015 at 12:35:07, Jean-Gérard Pailloncy 
jg.paillo...@adnow.fr wrote:
Hi,

 I work on a project that collects geolocalized data.
 All data will be in PostgreSQL / PostGIS.

 The small PostgreSQL databases will be on Linux guests on VMWare hosts.

 The size of the main database will grow by 50 TB / year, 500 M rows / day.
 For the largest one, we plan to test different options.
 One of them is to stay with Linux on VMWare.
 Beyond the questions about schema and sharding, I would appreciate any
information, benchmarks, or stories about big PostgreSQL databases on
Linux guests on VMWare hosts.
 
We have a 1.5T and growing DB which we started out hosting on VMWare, BIG 
mistake!
 
Never, ever, run a production DB on anything other than bare metal servers.
Once we blasted VMWare's 8GB disk-cache, our DB random-read speed went down to 
1MB/s with 8K blocks (as PG uses), yes that's one megabyte per second!
 
-- 
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com
www.visena.com


 


Re: [GENERAL] Backup Method

2015-07-03 Thread howardn...@selestial.com



On 03/07/2015 12:23, Bill Moran wrote:

On Fri, 03 Jul 2015 13:16:02 +0200
Jan Lentfer jan.lent...@web.de wrote:


On 2015-07-03 13:00, howardn...@selestial.com wrote:

On 03/07/2015 11:39, Guillaume Lelarge wrote:

In that case is there any recommendation for how often to make
base backups in relation to the size of the cluster and the size of
the WAL?
Nope, not really. That depends on a lot of things. Our customers
usually do one per day.



Excuse my ignorance... Is the base backup, in general, faster than
pg_dump?

It is a different approach. With the base backup you are actually
backing up files from the filesystem ($PGDATA directory), whereas with
pg_dump you're saving the SQL commands to reload and rebuild the database.
Usually a file-based backup will be faster, both on backup and
restore, but it is - as mentioned - a different approach and it might
also not serve all your purposes.

One of the things that makes a lot of difference is the amount of
redundant data in the database. For example, indexes are completely
redundant. They sure do speed things up, but they're storing the same
data 2x for each index you have. When you do a base backup, you have
to copy all that redundancy, but when you do a pg_dump, all that
redundant data is reduced to a single CREATE INDEX command. The
result being that if your database has a lot of indexes, the pg_dump
might actually be faster.

But the only way to know is to try it out on your particular system.



Thanks everyone.

I am trying to move away from pg_dump as it is proving too slow. The
size of the database clusters is approaching 1TB (with multiple
individual compressed pg_dumps of around 100GB each), but the pace of
change is relatively glacial compared to the size, so I am hoping that
WAL backups will prove to be much more efficient. As you all point out,
it looks like I will need to test the various methods to find the best
solution for me.


A supplementary question would be: would rsync be a viable alternative
to pg_basebackup when performing the file system copy? I have seen a few
posts on this subject suggesting rsync is more prone to mistakes but
is potentially the faster option. I'd love to hear your thoughts on this
subject before I risk trying rsync.





Re: [GENERAL] Backup Method

2015-07-03 Thread Andreas Joseph Krogh
On Friday 03 July 2015 at 11:59:49, howardn...@selestial.com wrote:
Hi everyone,

 I am reaching the point with my database backups where the backups are
  taking too long and starting to interfere with the running of the system
  during the day. So I am looking for a bit of sage advice as to how to
  proceed.

  For a typical server, I have a single database cluster with multiple
  databases that I back up nightly using sequential pg_dumps to NFS.

  So what I am looking for is a backup regime that will run overnight but
  with less impact (or faster) than the pg_dump approach.

  Would a file system backup (PITR-style) be the better option, or is
  pg_dumpall more efficient than pg_dump?

 Look forward to your advice.
 
Use Barman http://www.pgbarman.org/
 
We're continuously backing up a 1.5TB cluster without any problems. Full PITR 
with a window of 1 week.
 
-- 
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com
www.visena.com