Re: [GENERAL] Fwd: PostgreSQL VMWare
Hi,

It is slower with VMware's default settings, but VMware now offers things like SSD caching, vSAN, etc. If you can afford that kind of I/O improvement in your environment, it is very reasonable to stay on VMware, especially for the HA features.

2015-07-03 14:37 GMT+03:00 Andreas Joseph Krogh andr...@visena.com:
> On Friday, 3 July 2015 at 12:35:07, Jean-Gérard Pailloncy jg.paillo...@adnow.fr wrote:
>> [...] I would appreciate it if some of you could share information, benchmarks, or stories about big PostgreSQL databases on Linux guests on VMWare hosts.
>
> We have a 1.5T and growing DB which we started out hosting on VMWare. BIG mistake! Never, ever, run a production DB on anything other than bare-metal servers. Once we blasted through VMWare's 8GB disk cache, our DB random-read speed went down to 1MB/s with 8K blocks (as PG uses) - yes, that's one megabyte per second!
>
> --
> Andreas Joseph Krogh
> CTO / Partner - Visena AS
> Mobile: +47 909 56 963
> andr...@visena.com
> www.visena.com
Re: [GENERAL] max number of locks
On 07/03/2015 03:14 AM, Fabio Pardi wrote:
> Hi,
>
> While experimenting with the number of locks, I found something I cannot understand. From what I can read in the documentation, at any given time a query can obtain a max number of locks given by
>
>   max_locks_per_transaction * (max_connections + max_prepared_transactions)
>
> I then changed my db to use these settings:
>
> mydb=# show max_locks_per_transaction ;
>  max_locks_per_transaction
> ---------------------------
>  20
> (1 row)
>
> mydb=# show max_connections ;
>  max_connections
> -----------------
>  2
> (1 row)
>
> mydb=# show max_prepared_transactions ;
>  max_prepared_transactions
> ---------------------------
>  0
> (1 row)
>
> so I expected to be able to acquire a maximum of 40 locks.

On tables. To continue the docs from where you left off above:

http://www.postgresql.org/docs/9.4/static/runtime-config-locks.html

"The shared lock table tracks locks on max_locks_per_transaction * (max_connections + max_prepared_transactions) objects (e.g., tables); hence, no more than this many distinct objects can be locked at any one time. This parameter controls the average number of object locks allocated for each transaction; individual transactions can lock more objects as long as the locks of all transactions fit in the lock table. This is not the number of rows that can be locked; that value is unlimited. ..."

> Then:
>
> mydb=# begin transaction ;
> BEGIN
> portavita=# SELECT 1 FROM root.ac;
>  ?column?
> ----------
> (0 rows)
>
> mydb=# select count(*) from pg_locks ;
>  count
> -------
>    132
> (1 row)
>
> Why can I acquire 132 locks while the expected number is 40? What am I doing wrong?

Take a look here:

http://www.postgresql.org/docs/9.4/interactive/view-pg-locks.html

and see what the locks are actually being held on.

> I'm running Postgres 9.2.6
>
> Thanks for your time,
>
> Fabio

--
Adrian Klaver
adrian.kla...@aklaver.com
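For instance, grouping pg_locks by locktype usually shows where the extra entries come from (most will be lightweight types such as virtualxid rather than table locks) - a generic sketch, not specific to Fabio's schema:

    SELECT locktype, mode, count(*)
    FROM pg_locks
    GROUP BY locktype, mode
    ORDER BY count(*) DESC;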
[GENERAL] Unusual sorting requirement (mixed enum/non-enum) - need thoughts
CREATE TYPE enum_type AS ENUM ('X-One','A-Two');

SELECT *
FROM (VALUES
        ('Not Enum'::text, 1::int, 'Uno'::text),
        ('Not Enum', 2, 'Dos'),
        ('Enum', 4, 'X-One'),
        ('Enum', 3, 'A-Two')
     ) val (flag, id, val);

I need to write an ORDER BY clause that will result in the output of: 1, 2, 4, 3

Basically, if val is capable of being cast to the enum, it should be, and the sort order of the enum used; otherwise, order on the value of id. It doesn't matter how the two groups, enums and non-enums, sort relative to each other - for any execution of the query, either all values will be enum-able or none will be.

I've been trying to do this in pure SQL, but I'm thinking I'll have to do it in pl/pgsql, put the "is enum-able" check external to the query, and either use dynamic SQL or write two separate queries. I tried casting the enum to an integer but it would not let me :(

Thanks in advance for any thoughts.

David J.
Re: [GENERAL] database-level lockdown
Wouldn't it be easier just to prevent connections to the database while your transaction is executed? E.g.:

Connect to your_database, then:

    UPDATE pg_database SET datallowconn = FALSE WHERE datname = 'your_database';

    START TRANSACTION;
    -- do your transaction
    COMMIT;

    UPDATE pg_database SET datallowconn = TRUE WHERE datname = 'your_database';

On Fri, Jul 3, 2015 at 1:25 PM, Filipe Pina filipe.p...@impactzero.pt wrote:
> So, as database-level locks do not exist (as per https://wiki.postgresql.org/wiki/Lock_database), I've thought of having a function that would lock *every* table in the database (far from the same, but would probably work for me).
> [rest of the message snipped - see Filipe's original post in this thread]

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you wish to share my fantasy is entirely up to you.
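One caveat with Melvin's approach: datallowconn only blocks *new* connections, so a hedged completion of the idea would also terminate the sessions already connected (pg_backend_pid() protects your own session):

    SELECT pg_terminate_backend(pid)
    FROM pg_stat_activity
    WHERE datname = 'your_database'
      AND pid <> pg_backend_pid();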
[GENERAL] PostgreSQL VMWare
Hi,

I work on a project that collects geolocalized data. All data will be in PostgreSQL / PostGIS. The small PostgreSQL databases will be on Linux guests on VMWare hosts. The size of the main database will grow by 50 TB / year, 500 M rows / day.

For the largest one, we plan to test different options. One of them is to stay with Linux on VMWare. Leaving aside the questions about schema and sharding, I would appreciate it if some of you could share information, benchmarks, or stories about big PostgreSQL databases on Linux guests on VMWare hosts.

Regards,
Jean-Gérard Pailloncy
Re: [GENERAL] Unusual sorting requirement (mixed enum/non-enum) - need thoughts
On Fri, Jul 3, 2015 at 11:27 AM, David G. Johnston david.g.johns...@gmail.com wrote:
> [original question snipped - see above]

Running 9.3, and I've discovered the pg_enum catalog, which has the needed column (enumsortorder) that can be converted to an integer - well, numeric, since it can be fractional. Trying to figure out how best to use it...

David J.
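One way to use it, roughly - an untested pure-SQL sketch against pg_enum (it relies on the "all enum-able or none" property above to keep the two orderings from interleaving):

    SELECT v.flag, v.id, v.val
    FROM (VALUES
            ('Not Enum'::text, 1::int, 'Uno'::text),
            ('Not Enum', 2, 'Dos'),
            ('Enum', 4, 'X-One'),
            ('Enum', 3, 'A-Two')
         ) v (flag, id, val)
    LEFT JOIN pg_enum e
           ON e.enumtypid = 'enum_type'::regtype
          AND e.enumlabel = v.val
    ORDER BY (e.enumsortorder IS NOT NULL),      -- group non-enum rows first (arbitrary)
             COALESCE(e.enumsortorder, v.id);    -- enum sort order when the label matches, id otherwise

    -- returns ids 1, 2, 4, 3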
Re: [GENERAL] database-level lockdown
So, as database-level locks do not exist (as per https://wiki.postgresql.org/wiki/Lock_database), I've thought of having a function that would lock *every* table in the database (far from the same, but would probably work for me). Something like:

    CREATE OR REPLACE FUNCTION lockdown() RETURNS void AS $$
    DECLARE
      t information_schema.tables.table_name%TYPE;
    BEGIN
      FOR t IN SELECT table_name
               FROM information_schema.tables
               WHERE table_schema = 'public'
                 AND table_type = 'BASE TABLE'
      LOOP
        -- format(%I) quotes the identifier safely
        EXECUTE format('LOCK TABLE %I', t);
      END LOOP;
    END
    $$ LANGUAGE plpgsql;

But the problem is that calling the function starts its own transaction, and once it returns, the locks are removed.

Basically the workflow is (pseudo-code coming from Django/python/psycopg2/external, not within pgsql):

    function generic_function_restarter(developer_function) {
      # try 4 times to execute developer function and if all of them fail
      # (due to SQLSTATE 40001 serialization failures),
      # lock database and execute one last time
      for i in [1..4] {
        try {
          call developer_function()
          return 'success'
        } except SQLSTATE_40001 {
          continue
        } except other_error {
          return other_error
        }
      }

      # only reaches here if all tries failed with SQLSTATE_40001
      try {
        START TRANSACTION
        call lockdown()
        call developer_function()
        COMMIT TRANSACTION
        return 'success'
      } except any_error {
        # implicit ROLLBACK
        return any_error
      }
    }

So, my problem here is that calling lockdown() will place the locks and remove them upon returning... Is it possible to execute a function without creating a subtransaction? I could place the locks from the adapter directly at the outer transaction level, but I have the feeling that performance would be worse...

Thanks,
Filipe

On Fri, Jun 12, 2015 at 5:25, Filipe Pina filipe.p...@impactzero.pt wrote:
> Exactly, that's why there's a limit on the retry number. On the last try I wanted something like a full lockdown to make sure the transaction will not fail due to serialization failure (if no other processes are touching the database, it can't happen).
>
> So if two transactions were retrying over and over, the first one to reach max_retries would activate that "global lock", making the other one wait, and then the second one would also be able to successfully commit...
>
> On 11/06/2015, at 20:27, Tom Lane t...@sss.pgh.pa.us wrote:
>> Filipe Pina filipe.p...@impactzero.pt writes:
>>> It will try 5 times to execute each instruction (in case of OperationError) and in the last one it will raise the last error it received, aborting. Now my problem is that aborting on the last try (on a restartable error - OperationalError code 40001) is not an option... It simply needs to get through, locking whatever other processes and queries it needs.
>>
>> I think you need to reconsider your objectives. What if two or more transactions are repeatedly failing and retrying, perhaps because they conflict? They can't all forcibly win.
>>
>> regards, tom lane
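For what it's worth, a table lock taken inside a PL/pgSQL function is held by the surrounding transaction rather than released when the function returns - it only disappears early when the call runs in its own implicit (autocommit) transaction. So a pattern like this (using the lockdown() function above) should hold the locks until COMMIT:

    BEGIN;
    SELECT lockdown();   -- locks are now held by this transaction
    -- ... run the developer function's statements here ...
    COMMIT;              -- all table locks released here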
Re: [GENERAL] Download PostgreSQL 9.5 Alpha
Hello

I also could not find the download on EDB. For Ubuntu, 9.5 alpha is available, but I was not able to install it using apt-get because it cannot resolve some dependencies. Finally I downloaded and compiled the source code. This worked well, but I had to do some additional steps to get the cluster up and running. I wrote some notes at http://www.schmiedewerkstatt.ch/wiki/index.php/PostgreSQL_Compiling_From_Source

In general the problems were:

- Fully qualify the executables if you have other instances running on your system (in my case 9.4.4). This is to make sure that you are using the newer versions.
- Set LD_LIBRARY_PATH to the lib dir in your installation directory. The same as before, but for libraries.

You may prefer to set PATH to the 9.5 bin dir in your current shell instead of qualifying the executables.

Hope this helps.

BTW, I am not sure about this, but I guess that if I hadn't already had 9.4 on the system, apt-get would have worked. The error message suggested that it did not want to replace some existing files (e.g. libpq, if I remember well). I assumed that this was intended to avoid damaging a working installation.

Bye
Charles

On 7/4/2015 02:24, Joshua D. Drake wrote:
> On 07/03/2015 04:32 PM, Edson F. Lidorio wrote:
>> [original question snipped]
>
> It does not appear that they have an Alpha download available yet.
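For reference, the steps Charles describes look roughly like this - a sketch with hypothetical paths and port, not the exact commands from his notes:

    # build into a private prefix so the existing 9.4 install is untouched
    ./configure --prefix=$HOME/pgsql-9.5
    make && make install

    # use the new binaries and libraries in this shell
    export PATH=$HOME/pgsql-9.5/bin:$PATH
    export LD_LIBRARY_PATH=$HOME/pgsql-9.5/lib:$LD_LIBRARY_PATH

    # create and start a cluster on a non-default port
    initdb -D $HOME/pgsql-9.5/data
    pg_ctl -D $HOME/pgsql-9.5/data -o "-p 5433" start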
[GENERAL] Download PostgreSQL 9.5 Alpha
Hello,

PostgreSQL 9.5 Alpha does not appear on the downloads list in [1]. Where do I download it for Windows?

[1] http://www.enterprisedb.com/products-services-training/pgdownload#windows

--
Edson
Re: [GENERAL] Download PostgreSQL 9.5 Alpha
On 07/03/2015 04:32 PM, Edson F. Lidorio wrote:
> Hello,
>
> PostgreSQL 9.5 Alpha does not appear on the downloads list in [1]. Where do I download it for Windows?
>
> [1] http://www.enterprisedb.com/products-services-training/pgdownload#windows

For those in the community who may not know, EnterpriseDB hosts the Windows versions of PostgreSQL.

Edson,

It does not appear that they have an Alpha download available yet.

Sincerely,

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/ - 503-667-4564
PostgreSQL Centered full stack support, consulting and development.
"Announcing I'm offended is basically telling the world you can't control your own emotions, so everyone else should do it for you."
[GENERAL] return jsonb without field label
I have a table:

    create table mydocs (id serial primary key, data jsonb)

When I do "select data from mydocs" it comes back as {data: {my jsonb fields}}, which then gets passed into a template - I don't want to have to keep using <%= article.data.name %> instead of just data.name.

Probably easy, but I can't find out how - any suggestions?

PS: using the new, very light Postgres ORM library massive.js (npm massive) - easy to run direct SQL commands - very nice.
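One way around the wrapper is to project the fields to top-level columns so each row already comes back flat - a sketch with hypothetical field names (name, title):

    -- individual fields as columns
    SELECT id, data->>'name' AS name, data->>'title' AS title
    FROM mydocs;

    -- or expand the whole document (9.4+), listing the expected keys
    SELECT m.id, d.*
    FROM mydocs m,
         jsonb_to_record(m.data) AS d(name text, title text);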
Re: [GENERAL] Slow index performance
-----Original Message-----
From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Christian Schröder
Sent: Freitag, 3. Juli 2015 07:36
To: pgsql-general@postgresql.org
Subject: [GENERAL] Slow index performance

> Hi all,
> we have a strange performance issue in one of our databases (using PostgreSQL 9.1.18). Maybe you can help me understand what's going on. We have two identical tables (rec_isins_current, rec_isins_archive) with the following structure:
>
> Table "ts_frontend.rec_isins_current"
>    Column   |  Type   | Modifiers
> ------------+---------+-----------
>  attachment | integer | not null
>  isin       | isin    | not null
> Indexes:
>     "rec_isins_current_pkey" PRIMARY KEY, btree (attachment, isin), tablespace "extra"
>     "rec_isins_current_attachment" btree (attachment), tablespace "extra"

Hello,

Are you sure that the column order of the PKs is the same in both tables: (attachment, isin) or (isin, attachment)? When isin is in second place, Postgres has to read the whole index to find matching records.

regards,
Marc Mamin

> Foreign-key constraints:
>     "rec_isins_attachment_fkey" FOREIGN KEY (attachment) REFERENCES ts_frontend.attachments(id) ON UPDATE RESTRICT ON DELETE CASCADE
> Inherits: ts_frontend.rec_isins
>
> The isin type is a domain type which has char(12) as its base type. Both tables inherit from ts_frontend.rec_isins, which is empty and is only used to search both tables in a single query.
>
> When we search for an isin in both tables (using the parent table, but the behavior is the same if we directly search in one of the tables), the primary key index is used. However, while the archive table is pretty fast, the current table is much slower:
>
> # explain analyze select * from ts_frontend.rec_isins where isin = 'foo';
>
>  Result  (cost=0.00..565831.43 rows=501 width=17) (actual time=6080.778..6080.778 rows=0 loops=1)
>    ->  Append  (cost=0.00..565831.43 rows=501 width=17) (actual time=6080.777..6080.777 rows=0 loops=1)
>          ->  Seq Scan on rec_isins  (cost=0.00..0.00 rows=1 width=36) (actual time=0.001..0.001 rows=0 loops=1)
>                Filter: ((isin)::bpchar = 'foo'::bpchar)
>          ->  Index Scan using rec_isins_archive_pkey on rec_isins_archive rec_isins  (cost=0.00..621.61 rows=405 width=17) (actual time=10.335..10.335 rows=0 loops=1)
>                Index Cond: ((isin)::bpchar = 'foo'::bpchar)
>          ->  Index Scan using rec_isins_current_pkey on rec_isins_current rec_isins  (cost=0.00..565209.82 rows=95 width=17) (actual time=6070.440..6070.440 rows=0 loops=1)
>                Index Cond: ((isin)::bpchar = 'foo'::bpchar)
>  Total runtime: 6080.824 ms
>
> This is strange, because the archive table is four times larger than the current table and the archive index is also four times larger than the current index:
>
>             relname            | relfilenode | reltablespace | pg_table_size
>  ------------------------------+-------------+---------------+---------------
>  rec_isins                     |   514533886 |             0 |          8192
>  rec_isins_pkey                |   514533892 |             0 |          8192
>  rec_isins_attachment          |   514533899 |             0 |          8192
>  rec_isins_archive             |   507194804 |             0 |   10923393024
>  rec_isins_archive_pkey        |   507197615 |     139300915 |    9048784896
>  rec_isins_archive_attachment  |   507197692 |     139300915 |    4706050048
>  rec_isins_current             |   631621090 |             0 |    2696216576
>  rec_isins_current_pkey        |   631621096 |     139300915 |    2098552832
>  rec_isins_current_attachment  |   631621107 |     139300915 |    1160683520
>
> Both tables are in the same tablespace (and thus on the same disk), and both indexes are also in the same tablespace (but in a different one than the tables). The current table has been vacuumed full and reindexed.
>
> Can anybody explain the difference? Why is the current table so slow? And what can we do to improve performance?
>
> Thanks for your help,
> Christian
>
> Deriva GmbH Financial IT and Consulting
> Christian Schröder, Geschäftsführer
> Hans-Böckler-Straße 2 | D-37079 Göttingen
> Tel: +49 (0)551 489 500-42
> Fax: +49 (0)551 489 500-91
> http://www.deriva.de
> Amtsgericht Göttingen | HRB 3240
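If Marc's hunch is right - the current table's PK leads with attachment, so a lookup by isin alone has to walk the whole index - a hedged sketch of the fix is simply an index on the current table that leads with isin (hypothetical index name):

    -- or CREATE INDEX CONCURRENTLY to avoid blocking writes during the build
    CREATE INDEX rec_isins_current_isin
        ON ts_frontend.rec_isins_current (isin);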
[GENERAL] max number of locks
Hi,

While experimenting with the number of locks, I found something I cannot understand. From what I can read in the documentation, at any given time a query can obtain a max number of locks given by

  max_locks_per_transaction * (max_connections + max_prepared_transactions)

I then changed my db to use these settings:

mydb=# show max_locks_per_transaction ;
 max_locks_per_transaction
---------------------------
 20
(1 row)

mydb=# show max_connections ;
 max_connections
-----------------
 2
(1 row)

mydb=# show max_prepared_transactions ;
 max_prepared_transactions
---------------------------
 0
(1 row)

so I expected to be able to acquire a maximum of 40 locks.

Then:

mydb=# begin transaction ;
BEGIN
portavita=# SELECT 1 FROM root.ac;
 ?column?
----------
(0 rows)

mydb=# select count(*) from pg_locks ;
 count
-------
   132
(1 row)

Why can I acquire 132 locks while the expected number is 40? What am I doing wrong?

I'm running Postgres 9.2.6

Thanks for your time,

Fabio
Re: [GENERAL] Backup Method
On 3 Jul 2015 12:03 PM, howardn...@selestial.com wrote:
> Hi everyone,
>
> I am reaching the point with my database backups where the backups are taking too long and starting to interfere with the running of the system during the day.
> [rest snipped - see the original post in this thread]

pg_dumpall is not more efficient than pg_dump. As far as I can tell, you're looking for PITR backups.
[GENERAL] Backup Method
Hi everyone,

I am reaching the point with my database backups where the backups are taking too long and starting to interfere with the running of the system during the day. So I am looking for a bit of sage advice as to how to proceed.

For a typical server, I have a single database cluster with multiple databases that I back up nightly using sequential pg_dumps to NFS. What I am looking for is a backup regime that will still run overnight but is faster or has less impact than the pg_dumps.

Would a file-system backup in PITR format be the better option, or is pg_dumpall more efficient than pg_dump?

Look forward to your advice.

Howard.
Re: [GENERAL] Slow index performance
You are right ... How embarrassing ... Why did I not see this? I will change the index and check again. I guess that the problem should be fixed then.

Thanks a lot!
Christian

Deriva GmbH Financial IT and Consulting
Christian Schröder, Geschäftsführer
Hans-Böckler-Straße 2 | D-37079 Göttingen
Tel: +49 (0)551 489 500-42
Fax: +49 (0)551 489 500-91
http://www.deriva.de
Amtsgericht Göttingen | HRB 3240

-----Original Message-----
From: Marc Mamin [mailto:m.ma...@intershop.de]
Sent: Freitag, 3. Juli 2015 08:58
To: Christian Schröder; pgsql-general@postgresql.org
Subject: RE: [GENERAL] Slow index performance

> Are you sure that the column order of the PKs is the same in both tables: (attachment, isin) or (isin, attachment)? When isin is in second place, Postgres has to read the whole index to find matching records.
>
> [rest of the quoted message snipped - see Marc's reply above]
Re: [GENERAL] Fwd: PostgreSQL VMWare
On Fri, 3 Jul 2015 12:35:07 +0200 Jean-Gérard Pailloncy jg.paillo...@adnow.fr wrote:
> I work on a project that collects geolocalized data. All data will be in PostgreSQL / PostGIS. The small PostgreSQL databases will be on Linux guests on VMWare hosts. The size of the main database will grow by 50 TB / year, 500 M rows / day.
>
> For the largest one, we plan to test different options. One of them is to stay with Linux on VMWare. Leaving aside the questions about schema and sharding, I would appreciate it if some of you could share information, benchmarks, or stories about big PostgreSQL databases on Linux guests on VMWare hosts.

The place I'm working now did a feasibility study about installing their primary app on VMware instead of directly onto the hardware. Their conclusion was that the app would be about 25% slower running on VMware. The app is very database-centric. However, I wasn't involved in the tests, can't vouch for the quality of the testing, and there _are_ other pieces involved than the database.

That being said, I've used PostgreSQL on VMs quite a bit. It does seem slower, but I've never actually benchmarked it. And it's never seemed slower enough for me to complain much.

The concern I have about running a large database on a VM (especially since you're asking about performance) is not the VM itself, but all the baggage that inevitably comes with it ... oversubscribed hosts; terrible, cheap SANs; poor administration leading to bad configuration; and yet another layer of obscurity preventing you from figuring out why things are slow. In my experience, you _will_ get all of these, because once you're on a VM, the admins will be pressured to host more and more VMs on the existing hardware and/or add capacity at minimal cost. There's nothing like a VM where you never know what the performance will be, because you never know when some other VMs (completely unrelated to you and/or your work) will saturate the I/O with some ridiculous recursive grep or something.

--
Bill Moran
Re: [GENERAL] Backup Method
On 03/07/2015 11:15, Guillaume Lelarge wrote:
> pg_dumpall is not more efficient than pg_dump. As far as I can tell, you're looking for PITR backups.

Thanks Guillaume. In that case, is there any recommendation for how often to make base backups in relation to the size of the cluster and the size of the WAL?
[GENERAL] Fwd: PostgreSQL VMWare
Hi,

I work on a project that collects geolocalized data. All data will be in PostgreSQL / PostGIS. The small PostgreSQL databases will be on Linux guests on VMWare hosts. The size of the main database will grow by 50 TB / year, 500 M rows / day.

For the largest one, we plan to test different options. One of them is to stay with Linux on VMWare. Leaving aside the questions about schema and sharding, I would appreciate it if some of you could share information, benchmarks, or stories about big PostgreSQL databases on Linux guests on VMWare hosts.

Regards,
Jean-Gérard Pailloncy
Re: [GENERAL] Backup Method
On 3 Jul 2015 12:31 PM, howardn...@selestial.com wrote:
> On 03/07/2015 11:15, Guillaume Lelarge wrote:
>> pg_dumpall is not more efficient than pg_dump. As far as I can tell, you're looking for PITR backups.
>
> Thanks Guillaume. In that case, is there any recommendation for how often to make base backups in relation to the size of the cluster and the size of the WAL?

Nope, not really. That depends on a lot of things. Our customers usually do one per day.

--
Guillaume
Re: [GENERAL] Backup Method
On 03/07/2015 11:39, Guillaume Lelarge wrote:
>> In that case is there any recommendation for how often to make base backups in relation to the size of the cluster and the size of the WAL?
>
> Nope, not really. That depends on a lot of things. Our customers usually do one per day.

Excuse my ignorance... Is a base backup, in general, faster than pg_dump?
Re: [GENERAL] Backup Method
On 2015-07-03 13:00, howardn...@selestial.com wrote:
> Excuse my ignorance... Is a base backup, in general, faster than pg_dump?

It is a different approach. With a base backup you are actually backing up files from the filesystem (the $PGDATA directory), whereas with pg_dump you're saving the SQL commands needed to reload and rebuild the database. Usually a file-based backup will be faster, both on backup and restore, but it is - as mentioned - a different approach, and it might not serve all your purposes. That is why I do weekly base backups (plus WAL archiving) and use pg_dump in parallel mode to do logical backups every night.

Regards,
Jan
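Jan's mixed setup looks roughly like this - a sketch with hypothetical paths, not drop-in configuration (flags as of the 9.4-era tools):

    # postgresql.conf: enable WAL archiving for PITR
    wal_level = archive
    archive_mode = on
    archive_command = 'cp %p /backup/wal/%f'

    # weekly base backup (compressed tar format, required WAL included)
    pg_basebackup -D /backup/base/$(date +%F) -Ft -z -x

    # nightly logical backup, parallel directory format
    pg_dump -Fd -j 4 -f /backup/dump/mydb mydb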
Re: [GENERAL] Backup Method
On Fri, 03 Jul 2015 13:16:02 +0200 Jan Lentfer jan.lent...@web.de wrote:
> It is a different approach. With a base backup you are actually backing up files from the filesystem (the $PGDATA directory), whereas with pg_dump you're saving the SQL commands needed to reload and rebuild the database. Usually a file-based backup will be faster, both on backup and restore, but it is - as mentioned - a different approach, and it might not serve all your purposes.

One of the things that makes a lot of difference is the amount of redundant data in the database. For example, indexes are completely redundant: they sure do speed things up, but they store the same data a second time for each index you have. When you do a base backup, you have to copy all that redundancy, but when you do a pg_dump, all that redundant data is reduced to a single CREATE INDEX command. The result is that if your database has a lot of indexes, the pg_dump might actually be faster. But the only way to know is to try it out on your particular system.

--
Bill Moran
Re: [GENERAL] Fwd: PostgreSQL VMWare
On Friday, 3 July 2015 at 12:35:07, Jean-Gérard Pailloncy jg.paillo...@adnow.fr wrote:
> Hi,
> I work on a project that collects geolocalized data. All data will be in PostgreSQL / PostGIS. [...] I would appreciate it if some of you could share information, benchmarks, or stories about big PostgreSQL databases on Linux guests on VMWare hosts.

We have a 1.5T and growing DB which we started out hosting on VMWare. BIG mistake! Never, ever, run a production DB on anything other than bare-metal servers. Once we blasted through VMWare's 8GB disk cache, our DB random-read speed went down to 1MB/s with 8K blocks (as PG uses) - yes, that's one megabyte per second!

--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com
www.visena.com
Re: [GENERAL] Backup Method
On 03/07/2015 12:23, Bill Moran wrote:
> One of the things that makes a lot of difference is the amount of redundant data in the database. [...] The result is that if your database has a lot of indexes, the pg_dump might actually be faster. But the only way to know is to try it out on your particular system.

Thanks everyone. I am trying to move away from pg_dump as it is proving too slow. The database clusters are approaching 1TB (with multiple individual compressed pg_dumps of around 100GB each), but the pace of change is relatively glacial compared to the size, so I am hoping that WAL backups will prove much more efficient.

As you all point out, it looks like I will need to test the various methods to find the best solution for me.

A supplementary question: would rsync be a viable alternative to pg_basebackup for performing the file-system copy? I have seen a few posts on this subject which suggest rsync is more prone to mistakes but is potentially the faster option. I'd love to hear your thoughts before I risk trying rsync.
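For reference, the rsync variant is the classic low-level base backup procedure - a hedged sketch with hypothetical paths (it assumes WAL archiving is already configured, since the archived WAL is needed to make the copy consistent; pg_basebackup does essentially the same thing with fewer ways to get it wrong):

    psql -c "SELECT pg_start_backup('nightly', true)"
    rsync -a --delete \
          --exclude 'pg_xlog/*' \
          --exclude postmaster.pid \
          "$PGDATA/" /backup/base/
    psql -c "SELECT pg_stop_backup()"

rsync's speed advantage comes from transferring only the files that changed since the previous copy, which is why it can beat pg_basebackup on a mostly static cluster.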
Re: [GENERAL] Backup Method
On Friday, 3 July 2015 at 11:59:49, howardn...@selestial.com wrote:
> Hi everyone,
> I am reaching the point with my database backups where the backups are taking too long and starting to interfere with the running of the system during the day.
> [rest snipped - see the original post in this thread]

Use Barman: http://www.pgbarman.org/

We're continuously backing up a 1.5TB cluster without any problems. Full PITR with a window of 1 week.

--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com
www.visena.com