Re: [GENERAL] dblink password required
> select * from dblink('dbname=database2 username=db_link password=mypassword','select username, email from appuser') as t1(username text, email text);

I think the problem is the above:

    username=db_link should be user=db_link

The accepted keywords are here:
http://www.postgresql.org/docs/9.3/static/libpq-connect.html#LIBPQ-CONNSTRING

Yes that's it, thanks.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
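For reference, the corrected call with the connection keyword spelled user (as libpq expects) rather than username, assuming the same database, role and password as in the original post:

    SELECT *
    FROM dblink('dbname=database2 user=db_link password=mypassword',
                'SELECT username, email FROM appuser')
         AS t1(username text, email text);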
[GENERAL] dblink password required
Hi,

I'm trying to use dblink to connect from database1 to database2 on the same machine. The Postgres version is 9.3.5.

Firstly, I've created a user to use with the connection, while logged in as the postgres superuser to database2:

create user db_link with password 'mypassword';
grant select on appuser to db_link;

Secondly, the following has been added to pg_hba.conf and a restart done.

# dblink between database1 and database2
local   database2   db_link                    md5
host    database2   db_link    127.0.0.1/32    md5
host    database2   db_link    ::1/128         md5

Connecting with psql prompts for a password as expected:

psql -U db_link -d database2
Password for user db_link:
psql (9.3.5)
Type "help" for help.

Then, as a normal user logged into database1, I tried

select * from dblink('dbname=database2 username=db_link password=mypassword','select username, email from appuser') as t1(username text, email text);

However this returns

ERROR:  password is required
DETAIL:  Non-superusers must provide a password in the connection string.

This is my first time trying dblink so there's probably something simple I'm missing, but I thought I had provided a password. Any ideas?

Regards
Oliver Kohll

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Crash in 9.4 Beta when partially collapsing left outer joins
I think this is the first time I've ever reported a PG crash, which is notable since I've been using PG for over 10 years. ;)

Using the 9.4 Beta RPMs on CentOS 6.X/64, we're experiencing a reproducible crash when running a query with a left outer join, partially collapsed.

TRAP: FailedAssertion(!(!restriction_is_or_clause((RestrictInfo *) orarg)), File: indxpath.c, Line: 1213)
2014-09-08 14:21:33.179 PDT LOG:  server process (PID 19957) was terminated by signal 6: Aborted
2014-09-08 14:21:33.179 PDT DETAIL:  Failed process was running: SELECT students.id FROM students LEFT OUTER JOIN enrollments ON ( enrollments.students_id = students.id ) WHERE ( students.id = 5008 OR ( ( students.birthcity = 'Chico' OR students.birthcity IS NULL ) AND enrollments.start 20141219 ) );
2014-09-08 14:21:33.179 PDT LOG:  terminating any other active server processes
2014-09-08 14:21:33.179 PDT WARNING:  terminating connection because of crash of another server process
2014-09-08 14:21:33.179 PDT DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

Here's the smallest query I could find that it would crash on. Run on a blank database, the following will reliably crash postgres:

CREATE TABLE students (id SERIAL PRIMARY KEY, birthcity VARCHAR DEFAULT NULL);
CREATE TABLE enrollments (students_id INTEGER NOT NULL REFERENCES students(id), start INTEGER);
SELECT students.id
FROM students
LEFT OUTER JOIN enrollments ON ( enrollments.students_id = students.id )
WHERE ( students.id = 5008
        OR ( ( students.birthcity = 'Chico' OR students.birthcity IS NULL )
             AND enrollments.start 20141219 ) );

---

Other environment stuff:

[root@db1 pgsql]# rpm -qa | grep postg
postgresql94-libs-9.4beta2-1PGDG.rhel6.x86_64
postgresql94-server-9.4beta2-1PGDG.rhel6.x86_64
postgresql94-devel-9.4beta2-1PGDG.rhel6.x86_64
postgresql92-libs-9.2.9-1PGDG.rhel6.x86_64
postgresql94-9.4beta2-1PGDG.rhel6.x86_64
postgresql94-contrib-9.4beta2-1PGDG.rhel6.x86_64

[root@db1 pgsql]# uname -a
Linux db1.schoolpathways.com 2.6.32-431.23.3.el6.x86_64 #1 SMP Thu Jul 31 17:20:51 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

What other information should I provide? We have the machine available if necessary.

Benjamin Smith

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
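For an assertion failure like this, the extra detail usually asked for is a stack backtrace from the aborting backend. A rough sketch of capturing one, assuming gdb and the matching debuginfo packages are installed and <backend_pid> is a placeholder for the PID of the session that will run the query:

    # in the psql session that will run the query, note its backend PID:
    #   SELECT pg_backend_pid();
    # then, as root, attach to that backend before running the query
    gdb -p <backend_pid>
    (gdb) continue
    # run the crashing query from the psql session; once it aborts:
    (gdb) bt full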
[GENERAL] Detect streaming replication failure
For reference: https://wiki.postgresql.org/wiki/Streaming_Replication

Assume a master - slave streaming replication configuration, PostgreSQL 9.2. Assume that the master has been chugging away, but the slave PG service has been offline for a while and the WAL archive has advanced enough that the slave cannot catch up.

When I start the slave PG instance, pg launches and runs but doesn't update. It also doesn't seem to throw any errors. The only outward sign that I can see that anything is wrong is that pg_last_xlog_replay_location() doesn't update. I can look in /var/lib/pgsql/9.2/data/pg_log/postgresql-Thu.csv and see errors there, e.g.:

2014-07-17 22:38:23.851 UTC,,,21310,,53c8505f.533e,2,,2014-07-17 22:38:23 UTC,,0,FATAL,XX000,could not receive data from WAL stream: FATAL:  requested WAL segment 000700050071 has already been removed

Is that the only way to detect this condition? I guess I'm looking for something like

select * from pg_is_replicating_ok();

that returns 1 on the slave. At the moment, it appears that I can either parse the log file, or look for pg_last_xact_replay_timestamp() being more than some acceptable threshold of minutes in the past.

http://www.postgresql.org/docs/9.2/static/functions-admin.html

Thanks,
Ben

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
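One way to approximate the hypothetical pg_is_replicating_ok() on a 9.2 standby is to compare the replay timestamp against a threshold, e.g. (the five-minute threshold is arbitrary):

    SELECT pg_is_in_recovery()
           AND (now() - pg_last_xact_replay_timestamp()) < interval '5 minutes'
           AS replication_ok;

Note that pg_last_xact_replay_timestamp() also stops advancing when the master is simply idle, so a lag-based check like this can raise false alarms when there is no write activity.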
[GENERAL] Natural key woe
I'm sure no one else on this list has done anything like this, but here's a cautionary tale.

I wanted to synchronise data in two tables (issue lists) - i.e. whenever a record is added into one, add a similar record into the other. The two tables are similar in format but not exactly the same, so only a subset of fields are copied. Both tables have synthetic primary keys; these can't be used to match data as they are auto-incrementing sequences that might interfere. What I could have done perhaps is get both tables to use the same sequence, but what I actually did is:

* join both tables based on a natural key
* use that to copy any missing items from table1 to table2
* truncate table1 and copy all of table2's rows to table1
* run this routine once an hour

The natural key was based on the creation timestamp (stored on insert) and one of the text fields, called 'subject'. The problem came when someone entered a record with no subject, leaving it null. When this was copied over and present in both tables, the *next* time the join was done, a duplicate was created because the join didn't see them as matching (null != null). So after 1 hour there were two records. After two there were four, after 3, 8 etc. When I logged in after 25 hrs and noticed table access was a little slow, there were 2^25 = 33 million records.

That's a learning experience for me at least. It's lucky I did check it at the end of that day rather than leaving it overnight, otherwise I think our server would have ground to a halt.

One other wrinkle to note. After clearing out these rows, running 'VACUUM table2', 'ANALYZE table2' and 'REINDEX table table2', some queries with simple sequence scans were taking a few seconds to run even though there are only a thousand rows in the table. I finally found that running CLUSTER on the table sorted that out, even though we're on an SSD so I would have thought seeking all over the place for a seq. scan wouldn't have made that much difference. It obviously does still make some.

Oliver Kohll
www.agilebase.co.uk

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
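For the archives: one way to make a natural-key join treat NULLs as matching is IS NOT DISTINCT FROM, which would have avoided the duplication described above. A sketch, with hypothetical column names (created_at, subject):

    -- copy rows from table2 that are missing in table1,
    -- treating NULL subjects as equal rather than as never matching
    INSERT INTO table1 (created_at, subject)
    SELECT t2.created_at, t2.subject
    FROM table2 t2
    LEFT JOIN table1 t1
           ON t1.created_at = t2.created_at
          AND t1.subject IS NOT DISTINCT FROM t2.subject
    WHERE t1.created_at IS NULL;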
Re: [GENERAL] SSD Drives
On 04/02/2014 02:55 PM, Bret Stern wrote:
> Care to share the SSD hardware you're using? I've used none to date, and have some critical data I would like to put on a development server to test with.
> Regards,
> Bret Stern

SSDs are ridiculously cheap when you consider the performance difference. We saw at *least* a 10x improvement in performance going with SATA SSDs vs. 10k SAS drives in a messy, read/write environment. (Most of our tests were 20x or more.) It's a no-brainer for us.

It might be tempting to use a consumer-grade SSD due to the significant cost savings, but the money saved is vapor. They may be OK for a dev environment, but you *will* pay in downtime in a production environment. Unlike regular hard drives, where the difference between consumer and enterprise drives is performance and a few features, SSDs are different animals. SSDs wear something like a salt-shaker. There's a fairly definite number of writes that they are good for, and when they are gone, the drive will fail. Like a salt shaker, when the salt is gone, you won't get salt any more no matter how you shake it.

So, spend the money and get the enterprise class SSDs. They have come down considerably in price over the last year or so. Although on paper the Intel enterprise SSDs tend to trail the performance numbers of the leading consumer drives, they have wear characteristics that mean you can trust them as much as you can any other drive for years, and they still leave spinning rust far, far behind.

Our production servers are 4x 1U rackmounts with 32 cores, 128 GB of ECC RAM, and SW RAID1 400 GB SSDs in each. We back up all our databases hourly, with peak volume around 200-300 QPS/server, a write ratio of perhaps 40%, and an iostat disk utilization at about 10-20% in 5 second intervals.

-Ben

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
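As an aside, most SSDs expose how much of the "salt shaker" is left through SMART, so wear can be monitored rather than guessed at. A rough sketch (attribute names vary by vendor; Intel drives report Media_Wearout_Indicator, for example, and /dev/sda is a placeholder):

    # dump the vendor SMART attributes and pick out wear/endurance counters
    smartctl -A /dev/sda | grep -i -E 'wear|endurance|percent'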
[GENERAL] vacuum vs pg_repack for clearing bloat?
Our app makes extensive use of temp tables, and this causes a significant amount of bloat that can often only be cleared with a manual vacuum process. We're looking for a better way that doesn't involve locking, we found pg_repack and pg_reorg and were wondering if anybody here could weigh in on using this instead of using vacuum? pg_repack: https://github.com/reorg/pg_repack pg_reorg http://reorg.projects.pgfoundry.org/ Thanks in advance, Ben -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
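For anyone curious what an online repack looks like in practice, a typical pg_repack invocation is along these lines (a sketch only; the extension must first be installed in the target database with CREATE EXTENSION pg_repack, and table/database names here are placeholders):

    # rebuild a single bloated table online, holding only brief locks
    pg_repack --table=public.some_bloated_table mydb

As far as I know, pg_repack will not operate on system catalogs such as pg_attribute, so catalog bloat from temp-table churn still needs a superuser VACUUM (or VACUUM FULL).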
Re: [GENERAL] vacuum vs pg_repack for clearing bloat?
On 01/15/2014 04:24 PM, Tom Lane wrote:
> Lists li...@benjamindsmith.com writes:
>> Our app makes extensive use of temp tables, and this causes a significant amount of bloat that can often only be cleared with a manual vacuum process. We're looking for a better way that doesn't involve locking, we found pg_repack and pg_reorg and were wondering if anybody here could weigh in on using this instead of using vacuum?
>
> A temp table is only accessible to the owning process, so if you're hoping for vacuuming of it to happen silently in background, you'll be sadly disappointed. The speed advantage of a temp table comes exactly from not having to worry about concurrent access, so this isn't a tradeoff that can easily be adjusted.
>
> regards, tom lane

Tom,

The process(es) creating the temp tables are not persistent, so the issue isn't trying to clean up bloat from a long running process, it's clearing out the cruft that results from creating temp tables, loading a bunch of data, then dropping the table, either explicitly or when the connection is terminated. This causes PG disk usage to climb without causing any change in pg_dump output.

I was wondering if anybody else had used either of these projects (pg_repack or pg_reorg, though reorg seems to be unsupported) and if so, how successful they had been.

-Ben

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
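A quick way to confirm that the bloat from temp-table churn is sitting in the system catalogs rather than in user tables is to size the usual suspects directly, per database:

    SELECT relname,
           pg_size_pretty(pg_total_relation_size(oid)) AS total_size
    FROM pg_class
    WHERE relname IN ('pg_attribute', 'pg_attrdef', 'pg_class', 'pg_type', 'pg_depend')
    ORDER BY pg_total_relation_size(oid) DESC;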
Re: [GENERAL] Making substrings uppercase
On 9 Sep 2013, at 21:03, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
> select string_agg(case when words like '*%*' then upper(btrim(words, '*')) else words end, ' ')
> from regexp_split_to_table('The *quick* *brown* fox jumped over the *lazy* dog', ' ') as words;
>
>                   string_agg
> ----------------------------------------------
>  The QUICK BROWN fox jumped over the LAZY dog

That's quite elegant. In the end I exported and used Perl, as some of my 'words' had spaces (they were ingredients like monosodium glutamate), but you could probably do a more complex regex in regexp_split_to_table to cope with that, or use pl/perl as previously suggested.

Thanks
Oliver
www.agilebase.co.uk
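For the archives, a sketch of a pure-SQL variant that copes with multi-word starred phrases, along the lines of the regexp_matches() approach suggested elsewhere in the thread. It relies on string_agg seeing the matches in the order regexp_matches emits them, which holds in practice for a single set-returning call:

    select string_agg(m[1] || coalesce(upper(m[2]), ''), '')
    from regexp_matches(
           'The *quick* *brown* fox contains *monosodium glutamate*',
           '([^*]*)(?:\*([^*]*)\*)?', 'g') as m;

Each match captures the plain text before a starred phrase in m[1] and the phrase between the stars (spaces included) in m[2], so everything between stars is uppercased, however many words it contains.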
[GENERAL] Making substrings uppercase
Hello,

Given a string with certain words surrounded by stars, e.g.

The *quick* *brown* fox jumped over the *lazy* dog

can you transform the words surrounded by stars with uppercase versions, i.e.

The QUICK BROWN fox jumped over the LAZY dog

Given text in a column sentence in table sentences, I can mark/extract the words as follows:

SELECT regexp_replace(sentence,'\*(.*?)\*','STARTUPPER\1ENDUPPER','g') FROM sentences;

but my first attempt at uppercase transforms doesn't work:

select regexp_replace(sentence,'\*(.*?)\*','' || upper('\1'),'g') from sentences;

I thought of using substring() to split the parts up after replacing the stars with start and end markers, but that would fail if there was more than one word starred.

Any other ideas?

Oliver

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Making substrings uppercase
On 9 Sep 2013, at 14:41, David Johnston pol...@yahoo.com wrote:
> Oliver Kohll - Mailing Lists wrote
>> select regexp_replace(sentence,'\*(.*?)\*','' || upper('\1'),'g') from sentences;
>
> Yeah, you cannot embed a function-call result in the replace-with section; it has to be a literal (with the group insertion meta-sequences allowed of course).
>
> I see two possible approaches.
> 1) Use pl/perl (or some variant thereof) which has facilities to do just this.
> 2) Use regexp_matches(...,...,'g') to explode the input string into its component parts. You can explode it so every character of the original string is in the output, with the different columns containing the raw and to-modify parts of each match. This would be done in a sub-query, and then in the parent query you would string_agg(...) the matches back together while manipulating the columns needed, i.e., string_agg(c1 || upper(c3))
>
> HTH
> David J.

I see, I'm going with Perl, thanks.

Oliver

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] replication stops working
John DeSoi wrote:
> I have a 9.2 hot standby setup with replication via rsync. For the second time, it has stopped working with no apparent error on the primary or standby. Last time this happened I fixed it by restarting the primary.
>
> Yesterday I started a new base backup around noon and it replicated without any problems for about 12 hours. Then it just stopped and I don't see any errors in the Postgres log (primary or standby). I looked at other system logs and still don't see any problems. I'm running Postgres 9.2.4 on CentOS 6.4.
>
> Thanks for any ideas or debug suggestions.
>
> John DeSoi, Ph.D.
>
> =
> wal_level = hot_standby
> wal_keep_segments = 48
> max_wal_senders = 2
> archive_mode = on
> archive_command = 'rsync --whole-file --ignore-existing --delete-after -a %p bak-postgres:/pgbackup/%f'
> archive_timeout = 300

If there are no errors in the log, how did you conclude that replication has stopped working? Since you're using a hot standby, you've also set up streaming replication in addition to the WAL archiving, correct?

Regards,
Daniel Serodio

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
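On the question of how to tell whether streaming replication is still flowing, one concrete check is to compare positions on the two sides (9.2 function and column names):

    -- on the primary: one row per connected standby; no row means no streaming
    SELECT client_addr, state, sent_location, replay_location
    FROM pg_stat_replication;

    -- on the standby: receive/replay positions and last replayed commit time
    SELECT pg_last_xlog_receive_location(),
           pg_last_xlog_replay_location(),
           pg_last_xact_replay_timestamp();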
[GENERAL] AWS, cascading replication and WAL archiving
I currently have a working 9.2 master + hot standby setup, using asynchronous replication and WAL archiving (via a shared NFS mount), running in our colocated datacenter. I need to migrate this to AWS, with as little downtime as possible.

My plan is to create two cascading standbys, daisy-chained like this: master (M) -> primary standby (S1) -> secondary standby (S2) -> tertiary standby (S3), and at migration time, promote S2 to master and then drop both M and S1 (hope this explanation makes sense).

WAL-E [1] seems like a perfect solution for WAL archiving on AWS, so I've set the master's archive_command to archive both on NFS (so I don't break the current setup) and on Amazon S3 (using WAL-E) so S2 and S3 can restore from it.

Q1) Is this a good migration strategy?

Q2) Should I promote S2 before killing M, or should I kill M before promoting S2?

Q3) Should S2 and S3 read from the same WAL archive, that's initially written to from M and by S2 when it gets promoted to master; or should I have two separate WAL archives to avoid conflicts?

Q4) How should I set S2 and S3's recovery.conf so S3 automatically follows S2 when promoted to master? recovery_target_timeline = 'latest'?

Thanks in advance,
Daniel Serodio

[1] https://github.com/wal-e/wal-e

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Cannot connect to remote postgres database
Stephen Carville wrote:
> I have been asked to evaluate Oracle, mysql and postgresql as a possible replacement for our existing Oracle and MsSQL databases. Oracle and mysql I pretty much have covered. Postgresql, OTOH, is somewhat less cooperative. I have the software (v 8.4.13) installed on 64 bit Centos 6. It is listening on all available interfaces and netstat confirms this. I created an additional user for the postgres db:

If you want to evaluate PostgreSQL, you should evaluate v9.2. 8.4 is pretty ancient and lacks lots of cool features, so your comparison won't be fair to PostgreSQL.

> postgres=# \du
>              List of roles
>  Role name | Attributes   | Member of
> -----------+--------------+-----------
>  postgres  | Superuser    | {}
>            : Create role
>            : Create DB
>  stephen   | Superuser    | {}
>            : Create role
>            : Create DB
>
> I assigned passwords using alter role etc..

Which exact ALTER ROLE did you use? Feel free to redact the actual password, of course.

> The problem is that no authentication method except trust seems to work. In pg_hba.conf:
>
> local   all   all                      trust
> host    all   all   198.204.114.0/24   md5
>
> I've tried both of the above users and get the same error each time:
>
> psql: FATAL: password authentication failed for user username
>
> I tried changing md5 to password and pam without success. Only trust works. As near as I can tell by reading the documentation, it is set up correctly but I have, obviously, done something wrong.

md5 is the standard. password is plain text (which you don't want) and pam will try to authenticate against OS users, which is probably not what you want.

> Any hints on where to start looking?

Is there any NAT happening between the client and the server? Check the server's log for a "LOG: connection received: host=x.x.x.x" message so you can check which IP is reaching the server.

Regards,
Daniel Serodio

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
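For completeness, the usual working combination is a password set via ALTER ROLE, an md5 line in pg_hba.conf for the client subnet, and a configuration reload. A sketch ('secret' is a placeholder):

    -- set (or reset) the password for the role
    ALTER ROLE stephen WITH PASSWORD 'secret';

    -- pg_hba.conf entry for md5 auth from the client subnet:
    --   host    all    all    198.204.114.0/24    md5

    -- reload so the pg_hba.conf change takes effect (or use pg_ctl reload)
    SELECT pg_reload_conf();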
Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)
On 11/10/2012 02:21 PM, Jeff Janes wrote: On Fri, Nov 9, 2012 at 4:28 PM, Lists li...@benjamindsmith.com wrote: ... 3) For each of the tables from #2, run the commands REINDEX TABLE $table; VACUUM FULL ANALYZE $table; The end result is a squeaky-clean database server with expected disk usage. NOTES: ... 2) It was sheer chance that I discovered the need to reindex prior to vacuum in order to get the disk space back. As of 9.0, a vacuum full inherently does a reindex, so doing an explicit one is neither necessary nor beneficial. I don't know if your discovery is based on a non-full vacuum, or on an older server. I can only state that merely doing a vacuum full or vacuum full $tables sequentially did not free the space, whereas the sequential reindex $table, each followed immediately by a vacuum full $table) did. If you'd like I can easily recreate the scenario by simply not cleaning up one of the DB servers until it bloats up and make available (limit distribution) a binary copy of the database (EG: rsync the /var/lib/pgsql/ filesystem late at night) in order to help identify why it didn't work as expected. 5) I don't yet know if the full option for the vacuum is necessary to free up all space. I will experiment with this and post results if useful. The answer to this is mostly non-deterministic. non-full vacuum can only free space from the end of the table. If all of your long-lived objects were created before pg_attribute got bloated and so the bloat was due only to short-lived objects, then non-full vacuum (if run often enough) should eventually be able to return that space as the short-lived objects near the end start to go away. However, if even a single long-live object finds itself at the end of the table, then only a vacuum full will ever be able to reclaim that space. Since the time period involved (weeks/months) would have included both a large number of created/destroyed temp tables and occasionally altered persistent objects it would appear that the full option a very good idea, at least periodically. -Ben -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Enabling Autovacuum Postgres 9.1 (was Unexpectedly high disk space usage)
The good news is that we have now resolved our critical problem (disk space overuse) with a somewhat hackish, slow answer that is nonetheless good enough for now. Now I'd like to work out how to get autovacuum to work smoothly within our cluster. I'm happy to try to clarify my notes and post them either here or on the PG wiki as I get them resolved. There's a wealth of how to tune PG instruction that's old and (based on this thread alone) often stale enough to be classified as disinformative. For example, nearest I can tell, the entirety of this page is just wrong and/or irrelevant for 9.x and up: http://wiki.postgresql.org/wiki/VACUUM_FULL. In constrast to the advice to avoid Vacuum Full ( http://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT) comes the insight that vacuum full is necessary to clean up stale data that is not at the end of the table. (See Jeff Janes 11/10/2012 email) non-full vacuum can only free space from the end of the table. This would imply that a full analyze is a good idea, at least periodically (weekly/monthly/quarterly) in a database that combines the use of temp tables and periodic changes to persistent objects. Does autovacuum ever do a full analyze? What about autovacuum and the reindex question at the end of this email? On 11/10/2012 02:49 PM, Adrian Klaver wrote: Seems to have changed in 8.3: http://www.postgresql.org/docs/8.3/interactive/routine-vacuuming.html Beginning in PostgreSQL 8.3, autovacuum has a multiprocess architecture: There is a daemon process, called the autovacuum launcher, which is in charge of starting autovacuum worker processes for all databases. The launcher will distribute the work across time, but attempt to start one worker on each database every autovacuum_naptime seconds. One worker will be launched for each database, with a maximum of autovacuum_max_workers processes running at the same time... Sadly, this change means that I can no be certain of the utility of the otherwise excellent-sounding advice originally offered by Scott, quoted below. It appears that naptime is (as of 9.x) almost irrelevant since it's defined per database, and dropping this from 1 minute to 5 seconds would have very little overall impact. These two can be killers. Long running transactions can cause autovacuum processes to stall out or be autocancelled. Long running transactions - is now long? In our system it's rare to have a transaction (even a prepared transaction) last much longer than a few minutes. Is that enough time to cause problems with AutoVacuum? As well, since the default nap time is 1 minute, it will take at least 50 minutes to vacuum each db as nap time is how long autovac waits between databases. Reducing autovacuum nap time to 5 or 10 seconds would be a good move here, also possibly making it more aggressive by increasing max worker threads, decreasing cost delay (possibly to zero or close to it) and / or increasing cost limit. After making such a change then watching iostat when vacuum is running to see how hard its hitting your IO subsystem. I'm guessing that with SSDs it isn't gonna be a big problem. As Greg Smith has pointed out in the past, usually the answer to an autovacuum problem is making it more, not less aggressive. Unless you're flooding your IO this is almost always the right answer. Keep in mind that autovacuum by default is setup to be VERY unaggressive because it may be running on a netbook for all it knows. 
To tune autovacuum with 50 databases, start by dropping nap time to something much lower, like 10s. Then if you need to, drop cost delay until you get to 0. If you get to 0 and it's still not hitting your IO too hard, but not keeping up, then increase cost limit. If you get to something in the 5000 to 1 range, and its still not keeping up then start bumping the thread count Should I increase the max_workers field from the default of 3 to (perhaps) 10? Noting that my solution to the disk space problem is effectively a max_worker of 1 since it's all done sequentially, I wonder if reducing max_workers would actually be better? Also, what's the thread count ? Is that max_workers? Why would I want to reduce the cost delay to 0, and how does this relate to cost_limit? Careful reading of the docs: http://www.postgresql.org/docs/9.1/static/runtime-config-resource.html makes me believe that, given my substantial I/O subsystem, I'd want to drop cost_delay to near zero and set the cost_limit really high, which is a rough restatement of the last quoted paragraph above. (I think) Assuming that I make these suggestions and notice a subsequent system load problem, what information should I be gathering in order to provide better post-incident forensics? We have statistics turned on, and aren't using replication. (yet) Lastly, there's the question of reindexing before full vacuum. I've observed that not doing a manual reindex prior to
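To make the tuning discussion concrete, one plausible starting point for postgresql.conf on hardware like that described might be the following. These values are illustrative only, not a recommendation; autovacuum_naptime is the per-database loop interval discussed above:

    autovacuum = on
    autovacuum_naptime = 10s
    autovacuum_max_workers = 6
    autovacuum_vacuum_cost_delay = 0     # no throttling; reasonable on fast SSDs
    autovacuum_vacuum_cost_limit = 2000  # only matters if cost_delay is > 0

On the question asked above: autovacuum only ever issues plain VACUUM and ANALYZE; it never runs VACUUM FULL, and it does not reindex.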
Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)
On 11/12/2012 01:31 PM, Jeff Janes wrote: On Mon, Nov 12, 2012 at 10:38 AM, Lists li...@benjamindsmith.com wrote: On 11/10/2012 02:21 PM, Jeff Janes wrote: On Fri, Nov 9, 2012 at 4:28 PM, Lists li...@benjamindsmith.com wrote: 2) It was sheer chance that I discovered the need to reindex prior to vacuum in order to get the disk space back. As of 9.0, a vacuum full inherently does a reindex, so doing an explicit one is neither necessary nor beneficial. I don't know if your discovery is based on a non-full vacuum, or on an older server. I can only state that merely doing a vacuum full or vacuum full $tables sequentially did not free the space, whereas the sequential reindex $table, each followed immediately by a vacuum full $table) did. With what version? [root@alpha ~]# rpm -qi postgresql91-server Name: postgresql91-server Relocations: (not relocatable) Version : 9.1.5 Vendor: (none) Release : 3PGDG.rhel6 Build Date: Sun 02 Sep 2012 12:13:18 PM UTC Install Date: Wed 12 Sep 2012 03:04:41 AM UTC Build Host: koji-sl6-x86-64-pg91 Group : Applications/DatabasesSource RPM: postgresql91-9.1.5-3PGDG.rhel6.src.rpm Size: 15191132 License: PostgreSQL Signature : DSA/SHA1, Sun 02 Sep 2012 12:13:24 PM UTC, Key ID 1f16d2e1442df0f8 URL : http://www.postgresql.org/ If you'd like I can easily recreate the scenario by simply not cleaning up one of the DB servers until it bloats up and make available (limit distribution) a binary copy of the database (EG: rsync the /var/lib/pgsql/ filesystem late at night) in order to help identify why it didn't work as expected. Do you think can make an easily script-able way to re-create the resistant bloat? That would be better than trying to disseminate binary files, I think. It would only be better if it actually created the situation that caused the space to not be freed. But, until you know the actual cause of a problem, I've found that it's often not productive to create simulations that may or may not be actually related to the problem. What I did was just create and drop temp tables in a tight loop, with autovacuum off, and then once pg_attribute got good and bloated, did a vacuum full as the database owner or superuser. Based on my understanding, if your loop included an intermittent schema change from within a transaction it might better approximate my actual scenario. Merely creating temp tables and then dropping them would create lots of activity at the end of the table which would free correctly. This still does not explain why reindex $table works when reindex is supposedly implicit in the vacuum. If all of your long-lived objects were created before pg_attribute got bloated and so the bloat was due only to short-lived objects, then non-full vacuum (if run often enough) should eventually be able to return that space as the short-lived objects near the end start to go away. However, if even a single long-live object finds itself at the end of the table, then only a vacuum full will ever be able to reclaim that space. Since the time period involved (weeks/months) would have included both a large number of created/destroyed temp tables and occasionally altered persistent objects it would appear that the full option a very good idea, at least periodically. If you can prevent the extreme bloat from occurring in the first place, then the end of the table would not be so far away from its desired size that it needs to get reset by a vacuum full. If you find your self in need of a vacuum full, then you should do one. 
But you should ask yourself what went wrong that you got into that situation in the first place. I agree; this is why my questions on enabling autovacuum in a related thread. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
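A sketch of the reproduction described above, with the suggested twist of an in-transaction schema change, run from a shell so each temp table belongs to a short-lived session (assumes autovacuum is off and that bloat_test is a scratch database):

    # churn pg_attribute: temp tables plus add/drop of a column per iteration
    createdb bloat_test
    psql -d bloat_test -c "CREATE TABLE persistent_tbl (id int);"
    for i in $(seq 1 10000); do
      psql -q -d bloat_test -c "BEGIN;
          CREATE TEMP TABLE t_bloat (id int, payload text);
          ALTER TABLE persistent_tbl ADD COLUMN extra int;
          ALTER TABLE persistent_tbl DROP COLUMN extra;
          DROP TABLE t_bloat;
          COMMIT;"
    done
    psql -d bloat_test -c "SELECT pg_size_pretty(pg_total_relation_size('pg_attribute'));"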
Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)
On 11/09/2012 05:26 PM, Steve Crawford wrote: Bloat in pg_attribute would correlate with A) (or any constant creation/destruction of tables). You can vacuum and/or reindex the system tables if you are connected as the superuser but you are better off preventing bloat by appropriate adjustment of your configuration settings. However note that if you do frequent bulk creation/destruction of tables you could end up bloating the attribute table between vacuum runs and may need to periodically manually shrink it. Steve, Our system divides customers into distinct databases, however customers are often clustered. (Think: different locations of 7/11) and so we have to aggregate data from different databases. We do this with dblink to get the data and temp tables to collate it, which appears to be a cause of the bloat we're seeing. -Ben -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
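A rough sketch of the aggregation pattern described, with placeholder database, table and column names:

    -- pull the same report from two customer databases and collate it locally
    CREATE TEMP TABLE combined_sales AS
      SELECT * FROM dblink('dbname=customer_a', 'SELECT site, total FROM sales')
               AS t(site text, total numeric)
      UNION ALL
      SELECT * FROM dblink('dbname=customer_b', 'SELECT site, total FROM sales')
               AS t(site text, total numeric);

    SELECT site, sum(total) FROM combined_sales GROUP BY site;

Each such temp table adds and later removes rows in pg_attribute, which is how this workload bloats the catalogs between vacuums.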
Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)
As I've spent a considerable amount of time trying to sort this out, I'm posting it for the benefit other users. I've experienced persistent, ongoing issues with autovacuum in a mixed read/write environment with midrange hardware (16 core Xeon, 128 GB RAM, 200 GB SATA3 6 Gb SSDs for disk I/O on RHEL6) using the defaults in the 9.1 RPMs provided by Postgres. (yum.postgresql.org) The cause of this is not yet determined. It may be related to the any or all of the combination of: A) extensive use of temp tables; B) extensive use of multiple dblink() calls in a single query; C) use of transactions, especially prepared transactions and multiple savepoints; D) concurrent use of pg_dump; E) use of numerous databases on a single server, average about 50; To offset this, we turned off autovacuum, and used an old script to vacuum the tables in the middle of the night when nobody was looking. Unfortunately, the vacuum script only vacuumed the userland tables and tremendous amounts of disk space were being wasted, particularly in the pg_attribute tables. However, use of any of the statements vacuum analyze, vacuum full analyze, vacuum full verbose analyze without mentioning specific tables did not resolve the extra disk space used issue, disk usage still remained at least 5x the expected amount in all cases. (in one case, use of all of these open-ended vacuum queries did almost nothing) Nor did running any variation of vacuum analyze $table in a loop thru all tables (including the pg_* tables) completely resolve the issue, either. In order to completely clean things up, we ended up writing a script do the following: 1) Determine the databases using excessive disk space, in descending order of use with this query: SELECT d.datname as Name, d.datistemplate::int AS datistemplate, pg_catalog.pg_get_userbyid(d.datdba) as Owner, CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT') THEN pg_catalog.pg_database_size(d.datname) ELSE -1 END as Size FROM pg_catalog.pg_database d order by CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT') THEN pg_catalog.pg_database_size(d.datname) ELSE NULL END desc nulls first; 2) For each database from #1, get a list of tables to be cleaned up with this query: SELECT nspname || '.' || relname AS table, pg_total_relation_size(C.oid) AS size FROM pg_class C LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) WHERE (true or (nspname NOT IN ('pg_catalog', 'information_schema'))) AND C.relkind = 'r' AND nspname !~ '^pg_toast' ORDER BY pg_total_relation_size(C.oid) DESC; 3) For each of the tables from #2, run the commands REINDEX TABLE $table; VACUUM FULL ANALYZE $table; The end result is a squeaky-clean database server with expected disk usage. NOTES: 1) The above queries are derived from queries found to determine how much disk space was used, even though the additional information provided isn't actually used by the script. 2) It was sheer chance that I discovered the need to reindex prior to vacuum in order to get the disk space back. 3) I'd like to get autovacuum to work. I've read suggestions to tweak cost_delay and/or cost_limit. I haven't yet determined if the problem is I/O based or lock/deadlock based. I'm guessing the problem is the latter, though it's hard to tell because queries stack up quickly and load average is sky high when autovacuum fails for us. 4) The aforementioned process is S-L-O-W. Think at least hours and probably days depending on your databases, your server(s), and the load. 
5) I don't yet know if the full option for the vacuum is necessary to free up all space. I will experiment with this and post results if useful. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
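Tying steps 1-3 together, the cleanup can be approximated with a shell loop like the one below. This is a sketch only, assuming superuser access; the inner query is a simplified form of the table-listing query from step 2, and it deliberately includes the pg_catalog tables:

    # for every connectable database, REINDEX and VACUUM FULL each table, largest first
    for db in $(psql -U postgres -At -c "SELECT datname FROM pg_database WHERE datallowconn"); do
      psql -U postgres -At -d "$db" -c \
        "SELECT nspname || '.' || relname
         FROM pg_class C JOIN pg_namespace N ON N.oid = C.relnamespace
         WHERE C.relkind = 'r' AND nspname NOT LIKE 'pg_toast%'
         ORDER BY pg_total_relation_size(C.oid) DESC" |
      while read tbl; do
        psql -U postgres -d "$db" -c "REINDEX TABLE $tbl; VACUUM FULL ANALYZE $tbl;"
      done
    done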
Re: [GENERAL] Unexpectedly high disk space usage
On 11/07/2012 12:42 PM, Tom Lane wrote: ... because it occasionally causes transactions and queries to hang when an update causes a vacuum mid-day, effectively taking us offline randomly. I suspect this claim is based on ancient and no longer very relevant experience. Even so, if I felt the need to keep autovacuum off, what would I need to run regularly in order to keep things neat and tidy under the hood? Would a simple vacuum within each database suffice? Should I be logged in as the database owner or as an administrative user? Thanks, Ben -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
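If autovacuum does stay off, the simplest regular substitute that also covers the system catalogs is vacuumdb run as a superuser, e.g. from cron:

    # nightly: plain (non-FULL) vacuum plus analyze of every database,
    # including the catalogs, connecting as the postgres superuser
    vacuumdb --all --analyze -U postgres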
Re: [GENERAL] Unexpectedly high disk space usage
On 11/07/2012 09:01 AM, Jeff Janes wrote: Ben, did you ever figure out where the space was going? I think we've found where the space is going, but I still don't yet know how to resolve it. I modified your query thusly in order to get a total of space used, and got an answer that matches closely: with stuff as (SELECT d.datname as Name, pg_catalog.pg_get_userbyid(d.datdba) as Owner, CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT') THEN pg_catalog.pg_database_size(d.datname) ELSE -1 END as Size FROM pg_catalog.pg_database d order by CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT') THEN pg_catalog.pg_database_size(d.datname) ELSE NULL END desc nulls first) SELECT sum(size) AS overall from stuff; Result: 171,276,369,124 # du -sbc /var/lib/pgsql/9.1/data/* Result: 172,087,129,512 Now, the question is, I see several databases that uses disk usage with sizes that are dramatically different than I get from a dump/restore to another machine: Production: santarosa444| postgres | 44 GB Dump/Restore: santarosa444| postgres | 685 MB Now, here's where it gets weird. From the disk space usage wiki, (http://wiki.postgresql.org/wiki/Disk_Usage) you'll recognize how I used it to get a total disk space used result: with mytable AS ( SELECT nspname || '.' || relname AS relation, pg_total_relation_size(C.oid) AS size FROM pg_class C LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) WHERE nspname NOT IN ('pg_catalog', 'information_schema') AND C.relkind 'i' AND nspname !~ '^pg_toast' ORDER BY pg_total_relation_size(C.oid) DESC ) SELECT sum(size) AS size FROM mytable ... but the total result is 747,569,152 which is close to the dump/restore value, not the production server value, even though I'm running this query on the production server. So there's *something* that the latter query isn't identifying that the former is. On a hunch, ran this query: with mytable AS ( SELECT nspname || '.' || relname AS relation, pg_total_relation_size(C.oid) AS size FROM pg_class C LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) WHERE (true or (nspname NOT IN ('pg_catalog', 'information_schema'))) AND C.relkind 'i' AND nspname !~ '^pg_toast' ORDER BY pg_total_relation_size(C.oid) DESC ) SELECT sum(size) AS size FROM mytable; And the result is 46,771,216,384! Removing the mytable wrapper stuff, here are the top results: pg_catalog.pg_attribute | 36727480320 pg_catalog.pg_attrdef| 3800072192 pg_catalog.pg_depend | 2665930752 pg_catalog.pg_class | 1508925440 pg_catalog.pg_type | 1113038848 public.att_claims| 451698688 public.stgrades | 127639552 pg_catalog.pg_index | 107806720 Google returns this page: http://www.postgresql.org/docs/9.1/static/catalog-pg-attribute.html which doesn't help me much. So, am I doing something wrong with admin? Our current process is that every night in the middle of the night, a script connects to each database on each server and runs a query to get all tables in each database and, for each, run VACUUM ANALYZE $table for each table in the database. And then once a week: psql -U postgres -c \l | grep -Po (\w+444) | xargs -t -i psql -U postgres {} -c REINDEX DATABASE {}; (note: there is a database for the postgres user on each DB server) The script is a remnant from PG 8.x days, so am I missing something fundamental about 9.x? I will note that autovacuum is off because it occasionally causes transactions and queries to hang when an update causes a vacuum mid-day, effectively taking us offline randomly. 
Our scenario is pretty much a worst-possible case of transactions, prepared transactions, temp tables, and concurrent read/write queries. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Unexpectedly high disk space usage
On 11/07/2012 12:42 PM, Tom Lane wrote: So you've turned off autovacuum, and are carefully not vacuuming the system catalogs. That's your problem all right. Is there a particularly good reason why this script isn't a one-liner VACUUM? Back in the 8.x days, we experienced vacuum full analyze occasionally causing other processes to hang/timeout. In an attempt to minimize the impact of the locking, we updated the script to vacuum one table at a time, which seemed to work well throughout the 8.x series. I'd happily accept that this conclusion may have simply have been wrong, but it worked well enough that nobody complained and life was good. After switching to 9.x, we read that the full vacuum was less useful and so the script was changed to vacuum analyze $table rather than vacuum full analyze $table. Are you sure that once-a-day vacuuming is sufficient, even if it was covering the system catalogs? If you've managed to bloat pg_attribute to 36GB, I suspect you've got enough churn (perhaps from temp tables) that you really need the catalogs vacuumed more often. The only thing that I could find in the docs even mentioning the idea of vacuuming catalogs is this sentence: (A manual VACUUM should fix the problem, as suggested by the hint; but note that the VACUUM must be performed by a superuser, else it will fail to process system catalogs and thus not be able to advance the database's datfrozenxid.) http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html This does NOT clearly say that the end user could vacuum catalogs, let alone that it's necessary or even a good idea. Otherwise, the only mention is of tables, and there's no mention of the idea that tables are anything but user space. My advice is dump, reload, and *don't* turn off autovacuum. ... because it occasionally causes transactions and queries to hang when an update causes a vacuum mid-day, effectively taking us offline randomly. I suspect this claim is based on ancient and no longer very relevant experience. We tried several times to turn on autovacuum with 9.1 and had problems every time. If our use case is particularly special, I'd love to work with you to get autovacuum to work in our situation too as it would make life easier for us! But for the past few months, every time we've turned it on, we've had our phones swamped with customers who are unable to use our system while our application monitors scream bloody murder, at least weekly. From what we could tell (under extreme pressure to get it all working again ASAP, mind you) it seemed that when doing a large update from within a transaction, autovacuum would get triggered before the transaction completed, causing the transaction to hang or at least slow way down, causing timeouts to occur with load balancers, so customers would then try again, compounding the ongoing problem. Pretty soon you have not only I/O issues, but also locking issues and upset customers. This issue may be compounded because we make fairly extensive use of dblink and temp tables to aggregate data for our customers who have multiple sites. -Ben -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
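For the catalog-vacuuming question raised above, the statement itself is unremarkable; it just has to be run by a superuser, in each database:

    -- as a superuser, per database
    VACUUM ANALYZE pg_catalog.pg_attribute;
    -- or, to actually return the space to the OS (takes an exclusive lock):
    VACUUM FULL pg_catalog.pg_attribute;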
Re: [GENERAL] Unexpectedly high disk space usage
On 11/07/2012 12:58 PM, Scott Marlowe wrote: My experience is that if autovac is causing problems with stalled queries etc you're either A: running ancient pg versions (pre 8.3), B: Running WAY too aggressive settings in autovac (100 threads, no nap time, cost limit of 10 etc.) or C: Your IO subsystem is absolute crap. On any modern server, default autovac settings from 8.3 and on should only have the possible problem of not being tuned aggressively enough. A) We are running PG 9.1. B) We used the default settings in the RPMs provided by yum.postgresql.org. At the bottom of this message is information about the RPMs we currently are using. C) I/O subsystem for PG is twin SATA III SSDs in a RAID 1 configuration, capable of tens of thousands of IO operations per second. Servers are recent, SATA III, 16-core Xeons with 128 GB ECC RAM in 1U rackmount cases. As stated previously, we make extensive use of temp tables, transactions, and dblink, but had no trouble with catalog table bloat in 8.x; this is a new phenomenon for us. # rpm -qi postgresql91-9.1.5-3PGDG.rhel6.x86_64 Name: postgresql91 Relocations: (not relocatable) Version : 9.1.5 Vendor: (none) Release : 3PGDG.rhel6 Build Date: Sun 02 Sep 2012 12:13:18 PM UTC Install Date: Wed 12 Sep 2012 03:04:24 AM UTC Build Host: koji-sl6-x86-64-pg91 Group : Applications/DatabasesSource RPM: postgresql91-9.1.5-3PGDG.rhel6.src.rpm Size: 5193673 License: PostgreSQL Signature : DSA/SHA1, Sun 02 Sep 2012 12:13:20 PM UTC, Key ID 1f16d2e1442df0f8 URL : http://www.postgresql.org/ Summary : PostgreSQL client programs and libraries Description : PostgreSQL is an advanced Object-Relational database management system (DBMS) that supports almost all SQL constructs (including transactions, subselects and user-defined types and functions). The postgresql package includes the client programs and libraries that you'll need to access a PostgreSQL DBMS server. These PostgreSQL client programs are programs that directly manipulate the internal structure of PostgreSQL databases on a PostgreSQL server. These client programs can be located on the same machine with the PostgreSQL server, or may be on a remote machine which accesses a PostgreSQL server over a network connection. This package contains the command-line utilities for managing PostgreSQL databases on a PostgreSQL server. If you want to manipulate a PostgreSQL database on a local or remote PostgreSQL server, you need this package. You also need to install this package if you're installing the postgresql91-server package. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to verify pg_dump files
On 11/07/2012 11:56 AM, Igor Neyman wrote: The only 100% fool-proof test would be to restore from your backup files. Regards, Igor Neyman Our internal process is to back up production databases regularly, and then use the backups offsite to populate copies of databases for developer use. This allows us to test with real data, identifying real world bugs that would not appear with often-limited, manually created, sample data, as well as verify our backups on a regular, daily basis. I'd strongly recommend something similar if it works for you. -Ben -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
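A lightweight middle ground between doing nothing and a full nightly restore is at least checking that each dump file is structurally readable. For custom-format dumps, something like the following works, though it does not prove the data restores cleanly:

    # exit status is non-zero if pg_restore cannot read the archive's table of contents
    pg_restore --list nightly_backup.dump > /dev/null && echo "dump readable"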
Re: [GENERAL] Unexpectedly high disk space usage
Jeff, thanks for the feedback! On 11/05/2012 08:51 PM, Jeff Janes wrote: My first guesses would be things that are not directly under the databases control, such as: 1) your server logs are accumulating and you aren't doing anything about them I'm guessing that this is not the case: [root@delta data]# du -shc * | grep -i log 47M pg_clog 15M pg_log 641Mpg_xlog 2) you are taking backup snapshots to somewhere in that directory and not cleaning them up Our backup snapshots (taken with pg_dump) are taken on a different server over the network. Dumps are made several times during each day. Could this be part of the problem if (somehow) they didn't complete? And if so, would there be some cleanup I'd have to do other than restarting PG? 3) your archive_command is failing (which you should see reports of in the server logs) and so you are accumulating xlog files. As I understand things, the result above under 1) demonstrates that this, also, is not the cause. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Unexpectedly high disk space usage
I followed your example, the result is at the bottom. Based on this it would seem that there are 3-4 databases that seem to be the culprit. How could I get more depth/detail on what specifically is the problem? -Ben On 11/05/2012 07:10 PM, Scott Marlowe wrote: What does du -sh have to say about it? Use unix tools to examine your file system and see where the usage is going. For instance, I can do this: cd /var/lib/postgresql/8.4/main/ du -s *|sort -n 0 server.crt 0 server.key 4 pg_tblspc 4 pg_twophase 4 PG_VERSION 4 postmaster.opts 4 postmaster.pid 12 pg_clog 12 pg_stat_tmp 12 pg_subtrans 28 pg_multixact 460 global 16392 pg_xlog 16396 base which tells me that I'm using about 16MB for each pg_xlog and base. I can then do cd into base and look around: cd base du -s *|sort -n 5416 1 5416 11563 5560 11564 Which shows me using about 5MB each for three different dbs. And so on. On an off guess, did you go from a SQL_ASCII encoding to UTF8? That might increase disk space usage a bit. [root@delta ~]# cd /var/lib/pgsql/9.1/data/ [root@delta data]# du -s * | sort -n 4 pg_ident.conf 4 pg_serial 4 pg_tblspc 4 PG_VERSION 4 postmaster.opts 4 postmaster.pid 8 pg_hba.conf 12 pg_notify 12 pg_twophase 20 postgresql.300 20 postgresql.conf 20 postgresql.conf.20120903 20 postgresql.conf.300 76 pg_subtrans 104 pg_multixact 15044 pg_log 18184 global 25216 pg_stat_tmp 47916 pg_clog 671916 pg_xlog 164753204 base [root@delta data]# cd base [root@delta base]# du -s * | sort -n 4 pgsql_tmp 612412772 638812780 64241 72424 331506 72700 160676 72896 391655 73200 52389 73216 523672 74104 619675 74956 295646 76768 307580 77896 547597 80824 571547 87368 475799 90940 631604 113876 124651 123548 148525 130096 367533 149792 439726 173648 355578 175404 679545 190732 559580 225780 511706 326468 667547 352736 655477 398736 535644 469408 136582 483716 499753 513124 270926 575612 715601 590408 487780 04 463779 713208 643540 714896 583515 803216 343438 806952 427663 855156 739506 872200 197221 975692 64371 987692 775594 1005268 595488 1024812 691482 1042212 727552 1047464 379566 1260044 76601 1276756 16384 1345072 403667 1474468 209158 1477808 172604 1536168 221124 1637652 258798 1811504 88598 1963740 245588 2076748 703467 2193536 415671 2430908 801322 2552640 319552 2785212 28315 3454880 112612 3755548 451666 3929420 100666 4651876 40451 5714940 751514 6257740 233293 7313900 184735 9334796 763606 10940780283609 20837264788338 45285640607471 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Error registering at postgresql.org
I'm trying to register at postgresql.org so I can edit the wiki to fix a broken link. I received a link on my e-mail so I could set my password, but when I submit the Change password form I get an error: Forbidden (403) CSRF verification failed. Request aborted. More information is available with DEBUG=True. I've tried appending ?DEBUG=True to the URL but got no further information. Can someone help? Thanks, Daniel Serodio -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Unexpectedly high disk space usage
We upgraded to 9.1 from 8.4 over the summer a few months ago, to new DB servers with more disk space and memory. Unexpectedly, the DB servers have steadily increased their disk space usage since. Reported system load doesn't seem to be affected. It's happening to all our DB servers running 9.1. When we reload all pg_dumps from our worst-affected server into an offline server, the disk space usage is about 26 GB, but the production database is using 166 GB. (# df /var/lib/pgsql;) To resolve this, we've tried: 1) reindexed everything (cut about 10% of disk usage temporarily) 2) tried vacuum full, and vacuum analyze on all databases. (to minimal effect) 3) Restarting PG (no discernable effect) including a full stop/start. 4) We've looked for stale prepared transactions (none found) 5) instructions from the wiki to try to determine what the cause of all the disk usage is: http://wiki.postgresql.org/wiki/Disk_Usage but when we add up all the results for all the different databases, tables, indexes, etc. in a script, we get a number very close to the usage of the freshly loaded server. (24 GB) What is Postgres doing with ~ 80% of its disk space usage? This is not normal, is it? I would hate to have to take the servers off line just to dump/restore in order to bring disk usage back to normal... SYSTEM SPECS: I've attached the postgresql.conf on a RHEL6/64 Linux server with 128 GB Of RAM and 16 real CPU cores. (HT turned on, 32 CPUs according to the O/S) #FROM: sysctl.conf: # Controls the maximum shared segment size, in bytes kernel.shmmax = 136365211648 # Controls the maximum number of shared memory segments, in pages kernel.shmall = 4294967296 -Ben # - # PostgreSQL configuration file # - # # This file consists of lines of the form: # # name = value # # (The = is optional.) Whitespace may be used. Comments are introduced with # # anywhere on a line. The complete list of parameter names and allowed # values can be found in the PostgreSQL documentation. # # The commented-out settings shown in this file represent the default values. # Re-commenting a setting is NOT sufficient to revert it to the default value; # you need to reload the server. # # This file is read on server startup and when the server receives a SIGHUP # signal. If you edit the file on a running system, you have to SIGHUP the # server for the changes to take effect, or use pg_ctl reload. Some # parameters, which are marked below, require a server shutdown and restart to # take effect. # # Any parameter can also be given as a command-line option to the server, e.g., # postgres -c log_connections=on. Some parameters can be changed at run time # with the SET SQL command. # # Memory units: kB = kilobytesTime units: ms = milliseconds #MB = megabytes s = seconds #GB = gigabytes min = minutes # h = hours # d = days #-- # FILE LOCATIONS #-- # The default values of these variables are driven from the -D command-line # option or PGDATA environment variable, represented here as ConfigDir. #data_directory = 'ConfigDir' # use data in another directory # (change requires restart) #hba_file = 'ConfigDir/pg_hba.conf' # host-based authentication file # (change requires restart) #ident_file = 'ConfigDir/pg_ident.conf' # ident configuration file # (change requires restart) # If external_pid_file is not explicitly set, no extra PID file is written. 
#external_pid_file = '(none)' # write an extra PID file # (change requires restart) #-- # CONNECTIONS AND AUTHENTICATION #-- # - Connection Settings - listen_addresses = '*' # what IP address(es) to listen on; # comma-separated list of addresses; # defaults to 'localhost', '*' = all # (change requires restart) port = 5432 # (change requires restart) # max_connections = 150 # (change requires restart) # Note: Increasing max_connections costs ~400 bytes of shared memory per # connection slot, plus lock space (see max_locks_per_transaction). #superuser_reserved_connections = 3 # (change requires restart) #unix_socket_directory = '' # (change requires restart)
[GENERAL] Index creation problem
Hi, I'm getting a problem where my application hangs in the process of adding a field to a table. The app adds the field, then creates an index on the field. It hangs for ages (minutes) until I cancel the query. My investigation so far has been Look at current queries: agilebasedata=# SELECT datname,procpid,current_query FROM pg_stat_activity; datname | procpid | current_query -+-+- agilebaseschema |5799 | IDLE in transaction agilebasedata | 18126 | SELECT datname,procpid,current_query FROM pg_stat_activity; agilebasedata |5844 | IDLE agilebasedata |5108 | CREATE INDEX l_ntvs1fk9de719830100m5aoi8suwo ON ntvs1fk9desoci59z(lower(m5aoi8suwo4jocu76) varchar_pattern_ops) agilebasedata |5109 | IDLE in transaction agilebaseschema | 25200 | IDLE agilebasedata | 29257 | IDLE agilebasedata | 31574 | IDLE (8 rows) As you can see, the only user query running is the CREATE INDEX. Cancelling this with select pg_cancel_backend(5108); gets the app back on it's feet. I thought there may be some sort of deadlock, but there is only one long running query. By the way, the indexing query should return quickly, as there are 0 rows in the table. It does in fact return immediately when you run it manually: agilebasedata=# CREATE INDEX l_nx4uaurg3r1981190097whsqcun3e9 ON nx4uaurg3r94vwqhj(lower(whsqcun3e9lfzlwlv) varchar_pattern_ops); ERROR: relation l_nx4uaurg3r1981190097whsqcun3e9 already exists agilebasedata=# drop index l_nx4uaurg3r1981190097whsqcun3e9; DROP INDEX agilebasedata=# CREATE INDEX l_nx4uaurg3r1981190097whsqcun3e9 ON nx4uaurg3r94vwqhj(lower(whsqcun3e9lfzlwlv) varchar_pattern_ops); CREATE INDEX One thing to mention is that there are a lot of indexes and relations: agilebasedata=# select count(*) from pg_index; count --- 2587 (1 row) agilebasedata=# select count(*) from pg_class; count --- 5361 (1 row) I wonder if I'm running up against some sort of limit. I am going to change the code so it doesn't add an index (it's not always necessary) but would like to get to the bottom of things first. Regards Oliver Kohll www.gtwm.co.uk - company / www.agilebase.co.uk - product
Re: [GENERAL] Index creation problem
On 19 Oct 2012, at 13:28, Frank Lanitz fr...@frank.uvena.de wrote: Just an idea without bigger investigation: what are the "idle in transaction" sessions doing? Maybe they are blocking the create index. Cheers, Frank

Good question, I don't know. The app runs on Java / Apache Tomcat, which maintains a connection pool, so I assumed the <IDLE> entries were just open connections, but I didn't notice two were idle in transaction. I haven't dealt with this before but this looks like a good explanation: http://www.depesz.com/2008/08/28/hunting-idle-in-transactions/ I will increase logging and try to replicate. Next time I will look at pg_locks too. Also, the PG version is 9.1.6. Oliver
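For the archives, a minimal way to pair a waiting backend with the sessions holding conflicting locks on the same relation (a simplified sketch for 9.1, where pg_stat_activity still has procpid and current_query; the wiki's lock monitoring page has a more complete query):

SELECT waiting.pid                 AS waiting_pid,
       waiting.relation::regclass  AS locked_relation,
       holder.pid                  AS holding_pid,
       act.current_query           AS holding_query
FROM pg_locks waiting
JOIN pg_locks holder
  ON holder.relation = waiting.relation   -- same table
 AND holder.granted                       -- they already hold their lock
 AND holder.pid <> waiting.pid
JOIN pg_stat_activity act ON act.procpid = holder.pid
WHERE NOT waiting.granted;                -- we are still waiting for ours

Run while the CREATE INDEX is stuck, the holding_pid column should point at the "idle in transaction" session that has the table locked.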
Re: [GENERAL] Index creation problem
On 19 Oct 2012, at 15:09, Albe Laurenz laurenz.a...@wien.gv.at wrote: You should always include the list in your replies - other people might be interested in the solution. Oops, thought I had. There must be at least two transactions involved to create a locking problem like you describe. But since CREATE INDEX takes strong locks, it can easily get locked by other harmless things. OK, I've reduced the calls to CREATE INDEX, waiting to test that. If there's still an issue I will use CONCURRENTLY as you suggested. Many thanks Oliver
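For reference, the CONCURRENTLY variant of the index from the original post would be the statement below. It doesn't block writes while it builds (although it can still have to wait for long-open transactions to finish), and it cannot run inside a transaction block, so it has to be issued as its own autocommitted statement:

CREATE INDEX CONCURRENTLY l_ntvs1fk9de719830100m5aoi8suwo
    ON ntvs1fk9desoci59z (lower(m5aoi8suwo4jocu76) varchar_pattern_ops);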
Re: [GENERAL] database corruption questions
Craig Ringer wrote: On 10/14/2012 05:53 AM, Heine Ferreira wrote: Hi, are there any best practices for avoiding database corruption?

* Maintain rolling backups with proper ageing. For example, keep one a day for the last 7 days, then one a week for the last 4 weeks, then one a month for the rest of the year, then one a year.

What kind of rolling backups? From pg_basebackup?

* Use warm standby with log shipping and/or replication to maintain a live copy of the DB.

* If you want point-in-time recovery, keep a few days or weeks worth of WAL archives and a basebackup around. That'll help you recover from those "oops I meant DROP TABLE unimportant; not DROP TABLE vital_financial_records;" issues.

* Keep up to date with the latest PostgreSQL patch releases. Don't be one of those people still running 9.0.0 when 9.0.10 is out.

The problem is that updating the database usually results in downtime. Or can the downtime be avoided in a replication scenario?

* Plug-pull test your system when you're testing it before going live. Put it under load with something like pgbench, then literally pull the plug out. If your database doesn't come back up fine you have hardware, OS or configuration problems.

* Don't `kill -9` the postmaster. It should be fine, but it's still not smart.

* ABSOLUTELY NEVER DELETE postmaster.pid

* Use good quality hardware with proper cooling and a good quality power supply. If possible, ECC RAM is a nice extra.

* Never, ever, ever use cheap SSDs. Use good quality hard drives or (after proper testing) high end SSDs. Read the SSD reviews periodically posted on this mailing list if considering using SSDs. Make sure the SSD has a supercapacitor or other reliable option for flushing its write cache on power loss. Always do repeated plug-pull testing when using SSDs.

* Use a solid, reliable file system. zfs-on-linux, btrfs, etc are not the right choices for a database you care about. Never, ever, ever use FAT32.

* If on Windows, do not run an anti-virus program on your database server. Nobody should be using it for other things or running programs on it anyway.

* Avoid RAID 5, mostly because the performance is terrible, but also because I've seen corruption issues with rebuilds from parity on failing disks.

* Use a good quality hardware RAID controller with a battery backup cache unit if you're using spinning disks in RAID. This is as much for performance as reliability; a BBU will make an immense difference to database performance.

* If you're going to have a UPS (you shouldn't need one as your system should be crash-safe), don't waste your money on a cheap one. Get a good online double-conversion unit that does proper power filtering. Cheap UPSs are just a battery with a fast switch, they provide no power filtering and what little surge protection they offer is done with a component that wears out after absorbing a few surges, becoming totally ineffective. Since your system should be crash-safe a cheap UPS will do nothing for corruption protection, it'll only help with uptime.

-- Craig Ringer

Thanks, Daniel Serodio

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
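On the rolling backups question: a minimal sketch of an ageing scheme driven from cron using pg_dump (the database name and paths are made up; pg_basebackup, or a filesystem-level base backup plus WAL archiving, works just as well and scales better for large databases):

# run daily: keeps 7 rolling daily dumps, overwritten by day of week
pg_dump -Fc -f /backup/mydb_day_$(date +%u).dump mydb

# run on the 1st of each month: keeps 12 rolling monthly dumps
pg_dump -Fc -f /backup/mydb_month_$(date +%m).dump mydb

The -Fc (custom format) dumps are compressed and can be restored selectively with pg_restore.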
Re: [GENERAL] allow servers to access to the same data
Tulio wrote: You can use streaming replication with hot standby (native) to have the same data accessible on both servers (but no updates or inserts on the slave, just selects), and create a virtual IP using Heartbeat: configure the master to use the virtual IP, and when it loses the IP the second (slave) server takes the IP over and will then allow updates and inserts... is that what you want?

I've come across a few mentions of Heartbeat being used for PostgreSQL failover, do you have any links to more information about this?

On 17/10/2012 10:20, GMAIL wrote: Is it possible to access the same data from two different servers? The two servers have the same IP and do not run simultaneously.

Thanks in advance, Daniel Serodio
Re: [GENERAL] allow servers to access to the same data
Shaun Thomas wrote: On 10/17/2012 12:53 PM, Daniel Serodio (lists) wrote: I've come across a few mentions of Heartbeat being used for PostgreSQL failover, do you have any links to more information about this? This was the subject of my talk at PG Open this year. I've got the entire PDF of slides, liner notes, and instructions on the Postgres Wiki: http://wiki.postgresql.org/wiki/Postgres_Open_2012 Full link to PDF: http://wiki.postgresql.org/images/0/07/Ha_postgres.pdf It's a very cut-down version of the approach we've used successfully for a while. That's great, thanks for the links. Regards, Daniel Serodio -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Is pg_basebackup also for regular backups?
I was reading the documentation for pg_basebackup and it states that the resulting backups can be used both for point-in-time recovery and as the starting point for a log shipping or streaming replication standby server. Should pg_basebackup also be used for regular backups, or only for PITR/streaming replication? Regards, Daniel Serodio -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Help estimating database and WAL size
Jasen Betts wrote: On 2012-10-08, Daniel Serodio (lists) daniel.li...@mandic.com.br wrote: We are preparing a PostgreSQL database for production usage and we need to estimate the storage size for this database. We're a team of developers with low expertise on database administration, so we are doing research, reading manuals and using our general IT knowledge to achieve this. We have actual data to migrate to this database and some rough estimations of growth. For the sake of the example, let's say we have an estimation of growth of 50% per year. The point is: what's the general proper technique for doing a good size estimation? We are estimating the storage usage by the following rules. Topics where we need advice are marked with ** asterisks **. Feedback on the whole process is more than welcome.

1) Estimate the size of each table

1.1) Discover the actual size of each row. - For fields with a fixed size (like bigint, char, etc) we used the sizes described in the documentation - For fields with a dynamic size (like text) we estimated the string length and used the function select pg_column_size('expected text here'::text)

long text is subject to compression, pg_column_size doesn't seem to test compression, compression is some sort of LZ..

Interesting, I didn't know about automatic compression. I've just read the section on TOAST and haven't been able to answer this either: is there any way to check for the compressed size?

- We added 4 more bytes for the OID that PostgreSQL uses internally

OID is optional, IIRC PGXID is not

I hadn't heard of PGXID, I've just searched Google but found no reference to this term except for this e-mail thread and some source code. What is PGXID? Where can I learn more about it?

1.2) Multiply the size of each row by the number of estimated rows ** Do I need to consider any overhead here, like row or table metadata? **

page size 8K, column overhead 1 byte per not-NULL column, NULLs are free

2) Estimate the size of each table index ** Don't know how to estimate this, need advice here **

IIRC ( data being indexed + 8 bytes ) / fill factor

3) Estimate the size of the transaction log ** We've got no idea how to estimate this, need advice **

how big are your transactions?

Very short, a couple of statements each.

4) Estimate the size of the backups (full and incremental) ** Don't know how to estimate this, need advice here **

depends on the format you use, backups tend to compress well.

5) Sum all the estimates for the actual minimum size

no, you get estimated size.

Thanks a lot for the response. Regards, Daniel Serodio

6) Apply a factor of 1.5x (the 50% growth) to the sum of the estimates 1, 2 and 4 for the minimum size after 1 year

7) Apply an overall factor of 1.2 ~ 1.4 (20% to 40% more) to estimates 5 and 6 for a good safety margin

I know the rules got pretty extensive, please let me know if you need more data or examples for a better understanding. We've also posted this question to http://dba.stackexchange.com/q/25617/10166
Re: [GENERAL] Help estimating database and WAL size
John R Pierce wrote: On 10/08/12 1:39 PM, Daniel Serodio (lists) wrote: 3) Estimate the size of the transaction log ** We've got no idea how to estimate this, need advice ** postgres doesn't have a 'transaction log', it has the WAL (Write-Ahead Logs). These are typically 16MB each. on databases with a really heavy write load, I might bump the checkpoint_segments as high as 60, which seems to result in about 120 of them being created, 2GB total. these files get reused, unless you are archiving them to implement a continuous realtime backup system (which enables PITR, Point in Time Recovery) Thanks, I was using the term transaction log as a synonym for WAL. We're planning on enabling PITR; how can we calculate the WAL size and the WAL archive size in this case? Regards, Daniel Serodio -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
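To put a rough ceiling on the live WAL, the documentation's rule of thumb is that the number of segment files normally stays below (2 + checkpoint_completion_target) * checkpoint_segments + 1, at 16 MB per segment. For example, with checkpoint_segments = 60 and the default checkpoint_completion_target of 0.5:

(2 + 0.5) * 60 + 1 = 151 segments  ->  151 * 16 MB ≈ 2.4 GB

which is in the same ballpark as the ~120 files (~2 GB) observed above. The WAL archive kept for PITR is a separate pool: it grows by 16 MB for every segment switched, so its size is driven by your write volume and by how long you retain archives between base backups, and has to be estimated from how much data you actually change.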
Re: [GENERAL] Is pg_basebackup also for regular backups?
Magnus Hagander wrote: On Mon, Oct 15, 2012 at 10:15 PM, Daniel Serodio (lists) daniel.li...@mandic.com.br wrote: I was reading the documentation for pg_basebackup and it states that resulting backups can be used both for point-in-time recovery and as the starting point for a log shipping or streaming replication standby servers. Should pg_basebackup also be used for regular backups, or only for PITR/streaming replication? It's for regular backups, based off PITR. I'm sorry, I don't understand what you mean by based off PITR. Do I need PITR enabled in order to use pg_basebackup, or do you mean that I won't lose the ability to use PITR after restoring from a backup created with pg_basebackup, or do you mean something else? Regards, Daniel Serodio
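In case a concrete example helps, a minimal standalone-backup invocation on 9.1/9.2 looks like this (host, user and path are placeholders; the -x flag makes the server include the WAL needed to make the backup consistent, so it can be restored on its own, and the connection needs a user with the REPLICATION attribute plus a replication entry in pg_hba.conf):

pg_basebackup -h dbserver -U backup_user -D /backup/base_$(date +%Y%m%d) -Ft -z -x

Each run is a full copy of the cluster, so for regular use you still want some rotation/ageing scheme around it.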
[GENERAL] Help estimating database and WAL size
We are preparing a PostgreSQL database for production usage and we need to estimate the storage size for this database. We're a team of developers with low expertise on database administration, so we are doing research, reading manuals and using our general IT knowledge to achieve this. We have actual data to migrate to this database and some rough estimations of growth. For the sake of the example, let's say we have a estimation of growth of 50% per year. The point is: what's the general proper technique for doing a good size estimation? We are estimating the storage usage by the following rules. Topics where we need advice are marked with ** asterisks **. Feedback on the whole process is more than welcome. 1) Estimate the size of each table 1.1) Discover the actual size of each row. - For fields with a fixed size (like bigint, char, etc) we used the sizes described in the documentation - For fields with a dynamic size (like text) we estimated the string length and used the function select pg_column_size('expected text here'::text) - We added 4 more bytes for the OID that PostgreSQL uses internally 1.2) Multiply the size of each row by the number of estimated rows ** Do I need to consider any overhead here, like row or table metadata? ** 2) Estimate the size of each table index ** Don't know how to estimate this, need advice here ** 3) Estimate the size of the transaction log ** We've got no idea how to estimate this, need advice ** 4) Estimate the size of the backups (full and incremental) ** Don't know how to estimate this, need advice here ** 5) Sum all the estimates for the actual minimum size 6) Apply a factor of 1.5x (the 50% growth) to the sum of the estimates 1, 2 and 4 for the minimum size after 1 year 7) Apply an overall factor of 1.2 ~ 1.4 (20% to 40% more) to estimates 5 and 6 for a good safety margin I know the rules got pretty extensive, please let me know if you need more data or examples for a better understanding. We've also posted this question to http://dba.stackexchange.com/q/25617/10166 Thanks in advance, Daniel Serodio -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] RFE: Column aliases in WHERE clauses
It would be nice if PostgreSQL supported column aliases in WHERE clauses, eg: SELECT left(value, 1) AS first_letter FROM some_table WHERE first_letter 'a'; Is this the proper mailing list for such feature requests? Thanks in advance, Daniel Serodio -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] RFE: Column aliases in WHERE clauses
Ryan Kelly wrote: On Mon, Sep 17, 2012 at 06:18:53PM -0300, Daniel Serodio (lists) wrote: It would be nice if PostgreSQL supported column aliases in WHERE clauses, eg: SELECT left(value, 1) AS first_letter FROM some_table WHERE first_letter 'a'; Is this the proper mailing list for such feature requests? I think this is explicitly disallowed by the spec. And by Tom: http://archives.postgresql.org/pgsql-general/2002-11/msg01411.php Tom's explanation makes perfect sense, thanks for the pointer. Regards, Daniel Serodio
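For the archives, the usual workaround is to introduce the alias one level down in a subquery (a sketch based on the example above; the comparison operator is just for illustration):

SELECT first_letter
FROM (SELECT left(value, 1) AS first_letter
      FROM some_table) AS s
WHERE first_letter >= 'a';

The planner flattens a simple subquery like this, so it normally costs nothing extra over repeating the expression in the WHERE clause.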
Re: [GENERAL] Too far out of the mainstream
Here's a bit of positive news spin - in a backhanded way perhaps, but still a compliment: http://www.theregister.co.uk/2012/08/31/postgresql_too_cool_for_school/ Oliver www.agilebase.co.uk -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Amazon High I/O instances
On 21 Aug 2012, at 13:32, Vincent Veyron vv.li...@wanadoo.fr wrote: Since Amazon has added new high I/O instance types and EBS volumes, anyone has done some benchmark of PostgreSQL on them ? I wonder : is there a reason why you have to go through the complexity of such a setup, rather than simply use bare metal and get good performance with simplicity? For instance, the dedibox I use for my app (visible in sig) costs 14,00 euros/month, and sits at .03% load average with 5 active users; you can admin it like a home pc. This is a general 'cloud or dedicated' question, I won't go into it but I believe cloud proponents cite management ease, scalability etc. I'm sure there's a place for every type of hosting. However I would be interested in hearing some experiences of PostgreSQL on an Amazon high I/O instance, given a client has just proposed running on one. If there are none forthcoming in the short term I may be in a position to provide some results myself in a month or two. Oliver Kohll www.agilebase.co.uk
Re: [GENERAL] Slow information_schema.views
On 22 Mar 2012, at 10:17, Albe Laurenz wrote: Or is there a better way of finding view dependencies? I see there's a pg_catalog entry for tables that a view depends on but that's not what I'm after.

You can use pg_depend and pg_rewrite as follows:

SELECT DISTINCT r.ev_class::regclass
FROM pg_depend d
JOIN pg_rewrite r ON (d.objid = r.oid)
WHERE d.classid = 'pg_rewrite'::regclass
  AND d.refclassid = 'pg_class'::regclass
  AND r.ev_class <> d.refobjid
  AND d.refobjid::regclass::text LIKE '%myviewname%';

I didn't test it very much, so play around with it a little before you trust it. I don't know if it will perform better in your case, but it should return more appropriate results (you don't want to find VIEW dummy AS SELECT * FROM t WHERE a = 'myviewname'). Yours, Laurenz Albe

Thank you - I did come to a similar method yesterday following some pointers from previous messages but I'm glad to have some confirmation it's the right direction. It does perform an order of magnitude faster for me, from 500ms down to under 20ms. My exact query is

SELECT DISTINCT dependent.relname
FROM pg_depend
JOIN pg_rewrite ON pg_depend.objid = pg_rewrite.oid
JOIN pg_class AS dependent ON pg_rewrite.ev_class = dependent.oid
JOIN pg_class AS dependee ON pg_depend.refobjid = dependee.oid
WHERE dependee.relname = 'myviewname'
  AND dependent.relname != 'myviewname'

Haven't tested this much yet either. I'll compare yours to mine and check the differences. Regards Oliver www.agilebase.co.uk
[GENERAL] Slow information_schema.views
Hello, I'm doing some SELECTs from information_schema.views to find views with dependencies on other views, i.e. SELECT table_name FROM information_schema.views WHERE view_definition ILIKE '%myviewname%'; and each is taking about 1/2 a second, which is getting a bit slow for my use. There are 1213 views listed in information_schema.views Doing an explain analyze, it looks like the issue is likely to be the pg_get_viewdef function or one of the privilege check functions. I'm not worried about privilege checks and I don't need a nicely formatted definition. Is there a way of finding out how pg_get_viewdef works so I can perhaps do a lower level query? I've previously used pg_catalog.pg_views which performs similarly. Or is there a better way of finding view dependencies? I see there's a pg_catalog entry for tables that a view depends on but that's not what I'm after. Regards Oliver Kohll www.agilebase.co.uk -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Multi master use case?
On 28 Jan 2012, at 15:27, Greg Sabino Mullane g...@turnstep.com wrote: Is this a case for multi master do you think? I.e. running one on the internet, one locally. Yes, could be. b) changing schemas (new tables, fields, views etc.) as well as data That's a tall order; I don't think anything will do that automatically, although rubyrep claims to at least pick up new tables. OK, I guess I could treat one as 'schema master' and pg_dump schema + data across to the other once a night, once all activity has stopped and standard replication completed. Any experiences/thoughts? My experience is with Bucardo, which should do the job admirably (but with the data only). My advice would be to just set up a test system and try rubyrep and Bucardo out. For the latter, use the latest Bucardo5 beta, as Bucardo4 will be deprecated soon: http://bucardo.org/downloads/Bucardo-4.99.3.tar.gz Thanks, I'll do that. Oliver www.agilebase.co.uk
Re: [GENERAL] explain analyse and nested loop joins
Thanks, It does look like an incorrect prediction. Looking again, I think it's the row estimate for the join that's out - the planner estimates one row returned, in which case a nested join would probably make sense, whereas in fact there are 23. However it's a generated (user created) query, so I think what I might do is get the application to detect this case from the query plan where there is a slow query and automatically test turning off nested joins. I'll just have to keep an eye on it to see if it becomes unnecessary in future PG versions. Regards Oliver www.agilebase.co.uk On 6 Nov 2011, at 04:17, Pavel Stehule wrote: Hello Propably there are a dependency between following columns - and then a prediction is not correct. Try to move one less selective to OUTER SELECT SELECT * FROM (SELECT your query OFFSET 0) x WHERE x.invoiced = false Regards Pavel Stehule 2011/11/5 Oliver Kohll - Mailing Lists oliver.li...@gtwm.co.uk: b2deliveryorders.complete = false AND b2deliveryorders.invoiced = false -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] explain analyse and nested loop joins
Hi, I have a query I'm trying to optimise. It takes just under a second to run, not too bad for my users but I'm worried that as the size of the data increases, it will get worse. Of course the plan may change when that happens but I'd also like to learn a bit more about optimisation anyway. The explain is here: http://explain.depesz.com/s/Ost - that one took 690ms. Seeing it had a couple of nested joins at the top, I 'set enable_nestloop = false;', resulting in an improvement of about 20x: http://explain.depesz.com/s/BRi The query is below. It joins to a second report dbvcalc_delivery_charges which I can also send if necessary. I've only guesses as to the reasons the default plan is slow or how to affect it, can someone enlighten me? Regards Oliver Kohll www.agilebase.co.uk -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Fwd: explain analyse and nested loop joins
Oops, forgot to include the query, it's SELECT b2deliveryorders.idb2deliveryorders, a2clientpremises.ida2clientpremises, a2clientpremises.premisesname, a2clientpremises.town, b2deliveryorders.expectedby, b2deliveryorders.dateordered, b2deliveryorders.invoicenumber, b2deliveryorders.deliverymethod, b2deliveryorders.driver, dbvcalc_delivery_charges.total, dbvcalc_delivery_charges.boxes, b2deliveryorders.createdbyauto FROM b2deliveryorders LEFT JOIN a2clientpremises ON b2deliveryorders.a2clientpremisespremisesname = a2clientpremises.ida2clientpremises LEFT JOIN dbvcalc_delivery_charges ON b2deliveryorders.idb2deliveryorders = dbvcalc_delivery_charges.idb2deliveryorders WHERE b2deliveryorders.complete = false AND b2deliveryorders.invoiced = false ORDER BY b2deliveryorders.expectedby NULLS FIRST; Oliver Begin forwarded message: From: Oliver Kohll - Mailing Lists oliver.li...@gtwm.co.uk Subject: explain analyse and nested loop joins Date: 5 November 2011 19:21:23 GMT To: pgsql-general pgsql-general@postgresql.org Hi, I have a query I'm trying to optimise.
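Following Pavel's suggestion, a sketch of the same query with one condition pulled outside an OFFSET 0 subquery, which acts as an optimisation fence (here invoiced is moved to the outer level on the assumption that it is the less selective of the two flags, and it is added to the inner select list so the outer WHERE can see it):

SELECT *
FROM (
    SELECT b2deliveryorders.idb2deliveryorders,
           a2clientpremises.ida2clientpremises,
           a2clientpremises.premisesname,
           a2clientpremises.town,
           b2deliveryorders.expectedby,
           b2deliveryorders.dateordered,
           b2deliveryorders.invoicenumber,
           b2deliveryorders.deliverymethod,
           b2deliveryorders.driver,
           dbvcalc_delivery_charges.total,
           dbvcalc_delivery_charges.boxes,
           b2deliveryorders.createdbyauto,
           b2deliveryorders.invoiced
    FROM b2deliveryorders
    LEFT JOIN a2clientpremises
        ON b2deliveryorders.a2clientpremisespremisesname = a2clientpremises.ida2clientpremises
    LEFT JOIN dbvcalc_delivery_charges
        ON b2deliveryorders.idb2deliveryorders = dbvcalc_delivery_charges.idb2deliveryorders
    WHERE b2deliveryorders.complete = false
    OFFSET 0
) x
WHERE x.invoiced = false
ORDER BY x.expectedby NULLS FIRST;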
Re: [GENERAL] Random multiple times
Many thanks both, those solutions are great and have gone in my wiki for future ref. Regards Oliver On 21 Sep 2011, at 21:56, Szymon Guz wrote: Short answer is: yes. More information you can find here http://simononsoftware.com/problem-with-random-in-postgresql-subselect/ regards Szymon
[GENERAL] Random multiple times
Hi, I understand random() is a volatile function and runs multiple times for multiple rows returned by a SELECT, however is there a way of getting it to run multiple times *within* another function call and in the same row. i.e. something like select regexp_replace('+1 555 555 555', E'\\d', trunc(random() * 9 + 1)::text,'g'); regexp_replace +1 111 111 111 (1 row) As you can see, it returns the same digit each time. I've tried wrapping a select around the trunc too. Regards Oliver Kohll www.gtwm.co.uk / www.agilebase.co.uk -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
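For the record, one workaround that keeps it in plain SQL is to split the string into characters so that random() is evaluated once per generated row rather than once per call (a sketch; string_agg needs 9.0 or later):

SELECT string_agg(
           CASE WHEN substr('+1 555 555 555', i, 1) ~ '[0-9]'
                THEN trunc(random() * 9 + 1)::text   -- a fresh digit for each position
                ELSE substr('+1 555 555 555', i, 1)  -- keep spaces, '+' etc.
           END,
           '' ORDER BY i) AS randomised
FROM generate_series(1, length('+1 555 555 555')) AS i;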
Re: [GENERAL] Realtime Query Dashboard Results
On 7 Jan 2011, at 22:02, THOMPSON, JARED (ATTBAPCO) jt0...@att.com wrote: I assume when people use dashboards they are not being queried every second for updating but maybe every minute? Are there any tools that work good on top of postgres? (I see in the stock market (though I am looking at/for production data) they seem to use tools that frequently update their dashboards. What is a realistic timeframe to expect query updates for a dashboard? Having written a dashboard on top of PostgreSQL (screenshot at http://blog.agilebase.co.uk/2010/03/31/dashboard-preview/ ), I can at least state my decisions: Charts are updated once a day overnight, or cached whenever someone looks at them in the system underlying the dashboard, so they are at most one day old. A chart is also updated when a user clicks on it to drill down to the data. Of course what you decide depends on what the business use case is and what demands there are on the system. In my cases so far the slowest charts take 1 or 2 seconds to generate by SQL so if necessary, each could be loaded in in real time over AJAX, though that hasn't been needed yet. Regards Oliver Kohll
Re: [GENERAL] locating cities within a radius of another
On 21 Jul 2010, at 23:14, Joe Conway m...@joeconway.com wrote: If you want something simple, and not requiring PostGIS, but plpgsql instead, see: http://archives.postgresql.org/pgsql-sql/2003-12/msg00193.php

For completeness, the earthdistance module also provides the distance between two lat/longs; the point <@> point operator syntax is simple to use: http://www.postgresql.org/docs/8.3/static/earthdistance.html Regards Oliver Kohll oli...@agilebase.co.uk / +44(0)7814 828608 / skype:okohll www.agilebase.co.uk - software
Re: [GENERAL] locating cities within a radius of another
On 22 Jul 2010, at 12:57, Geoffrey wrote: For completeness, the earthdistance module also provides the distance between two lat/longs; the point <@> point operator syntax is simple to use: http://www.postgresql.org/docs/8.3/static/earthdistance.html Disregard my last post. Surely as soon as I hit send, the light went on... I'm looking at deriving my points for point <@> point from ll_to_earth(). I constructed mine using point(longitude, latitude), where long and lat are double precision, which returns a datatype of type point. ll_to_earth() looks like it returns a datatype of type earth, so not sure if it will work. Maybe things have changed in a recent release, please let me know if so. So an example would be

select point(-2.2171,56.8952) <@> point(-1.2833,51.6667) as miles;
      miles
------------------
 363.202864676916
(1 row)

Regards Oliver Kohll oli...@agilebase.co.uk / +44(0)7814 828608 / skype:okohll www.agilebase.co.uk - software
[GENERAL] Backups / replication
Hello, I'm interested in using WAL shipping / replication for backup purposes but have no interest in failover. Currently my situation is: I have two servers, live and backup, which are in different cities. The backup server is also a test/development machine. Backups of my most important database are made hourly with pg_dump, excluding some larger tables with non-critical logging data. Even so, as the database grows, backups are taking longer and it looks as though they may start to impact performance. A full backup is made nightly and transferred to the backup machine, along with all of the day's hourly backups. I'm looking into using replication by WAL shipping - after all, there's no use to backing up data which hasn't changed since last time - only a small percentage of records are created/updated. However, I need a) to be able to restore to a point in time easily, which I can do to within an hour at the moment by restoring the correct dump. Sometimes users ask for a restore having accidentally updated/deleted records. b) to carry on running a test server database, that means one that's read and writeable. I obviously can't use a replication slave as a read/write test server at the same time. At the moment I've thought of a couple of options, I don't know if either are possible - I have a bit of a hazy idea of WAL replication. 1) Continuously ship the WAL records to somewhere on the test server unknown to Postgres but run the test machine as a normal database completely separately. If a backup is needed, delete the test database, restore to the last full backup (a filesystem backup?) and copy all WAL records into Postgres' directory so it can see them. Start it up configured to replay them, up to a certain time. 2) Run two instances of Postgres on the test/backup server on different ports, one configured as a replication slave, one normal. I'm not sure if this is possible with the RPM builds I'm using. Are either of those two likely? Any other suggestions? Another question is will the replication coming in v9.0 change things and would it be worth holding off until then? In particular Command Prompt's PITR tools look useful for restoring to a particular point in time, will these still work or will there be equivalents? Regards Oliver Kohll oli...@agilebase.co.uk / +44(0)7814 828608 / skype:okohll -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
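For option 1, the restore side would look roughly like this (paths and the target time are placeholders): restore the filesystem-level base backup, then drop a recovery.conf into the data directory before starting Postgres:

restore_command = 'cp /path/to/shipped_wal/%f "%p"'
recovery_target_time = '2010-07-26 13:00:00'

while the live server ships its WAL with something along the lines of:

archive_mode = on
archive_command = 'rsync -a %p backupserver:/path/to/shipped_wal/%f'

This is only a sketch of the standard PITR setup from the manual; the main operational cost is taking a fresh base backup often enough that replaying WAL up to the target time stays quick.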
[GENERAL] Cumulative count (running total) window fn
Hello, Many thanks to andreas.kretschmer for this helpful reply about how to set up a window function to perform a running total: http://archives.postgresql.org/pgsql-general/2010-03/msg01122.php It works perfectly with the simple test data but I've just got back to work, tried implementing it on my live data and the results are slightly different. My query is almost exactly the same - I've simplified by grouping by year only rather than year and month: select extract(year from signup_date), count(email_address), sum(count(email_address)) over (rows unbounded preceding) from email_list group by 1 order by 1; date_part | count | sum ---+---+-- 2007 | 501 | 1374 2008 | 491 | 491 2009 | 382 | 873 2010 |66 | 1440 (4 rows) What I'm looking for is date_part | count | sum ---+---+-- 2007 | 501 | 501 2008 | 491 | 992 2009 | 382 | 1374 2010 |66 | 1440 It seems to be adding up the counts but not in the right order. I've also tried an explicit ORDER BY inside the partition with no difference: select extract(year from signup_date), count(email_address), sum(count(email_address)) over (partition by 1 order by 1 asc rows unbounded preceding) from email_list group by 1 order by 1; Does anyone have any other ideas? Regards Oliver Kohll oli...@agilebase.co.uk / +44(0)7814 828608 / skype:okohll www.agilebase.co.uk - software www.gtwm.co.uk - company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Cumulative count (running total) window fn
On 29 Apr 2010, at 10:01, Magnus Hagander wrote: select extract(year from signup_date), count(email_address), sum(count(email_address)) over (partition by 1 order by 1 asc rows unbounded preceding) from email_list group by 1 order by 1; Does anyone have any other ideas? Aren't you looking for something along the line of: SELECT year, sum(c) over (order by year) FROM ( SELECT extract(year from signup_date) AS year, count(email_address) AS c FROM email_list GROUP BY extract(year from signup_date) ) (adjust for typos, I didn't test it) Yes that does work thanks, if you give the subquery a name. I'd still like to know if it's possible to do with a window function rather than a subquery. Oliver Kohll
Re: [GENERAL] Cumulative count (running total) window fn
Aren't you looking for something along the line of: SELECT year, sum(c) over (order by year) FROM ( SELECT extract(year from signup_date) AS year, count(email_address) AS c FROM email_list GROUP BY extract(year from signup_date) ) (adjust for typos, I didn't test it) Yes that does work thanks, if you give the subquery a name. I'd still like to know if it's possible to do with a window function rather than a subquery. Oliver Kohll Like this?: SELECT extract(year from signup_date), count(email_address), sum(count(email_address)) OVER (ORDER BY count(email_address)) FROM email_list GROUP BY 1 ORDER BY 1; Thom Almost, but put me on the right track! This one is exactly what I'm looking for: SELECT extract(year from signup_date), count(email_address), sum(count(email_address)) OVER (ORDER BY extract(year from signup_date)) FROM email_list GROUP BY 1 ORDER BY 1; The ORDER BY count(email_address) did give the same results for my data but only because the count values just happen to give the same ordering as the years - I tested by changing some dates. Many thanks all. Oliver
Re: [GENERAL] Cumulative count (running total) window fn
Curious note - how does the non-subselect version and the subselect version compare performance-wise? Magnus, On a test table with 12,000 rows there's not much in it, the subselect has a simpler plan but they both take practically the same time. The two plans (note I've been rewriting the field names for readability until now but haven't here): explain analyze SELECT year, sum(c) over (order by year) FROM ( SELECT extract(year from a56b7a8d6de03f67b) AS year, count(a10e4ab8863c199f1) AS c FROM a2e9a7e9e257153de GROUP BY extract(year from a56b7a8d6de03f67b) ) as subq; QUERY PLAN - WindowAgg (cost=851.49..874.06 rows=1290 width=16) (actual time=43.369..43.394 rows=5 loops=1) - Sort (cost=851.49..854.71 rows=1290 width=16) (actual time=43.340..43.342 rows=5 loops=1) Sort Key: (date_part('year'::text, a2e9a7e9e257153de.a56b7a8d6de03f67b)) Sort Method: quicksort Memory: 25kB - HashAggregate (cost=752.59..771.94 rows=1290 width=26) (actual time=43.300..43.317 rows=5 loops=1) - Seq Scan on a2e9a7e9e257153de (cost=0.00..689.56 rows=12605 width=26) (actual time=0.031..26.723 rows=12605 loops=1) Total runtime: 43.549 ms explain analyze SELECT extract(year from a56b7a8d6de03f67b), count(a10e4ab8863c199f1), sum(count(a10e4ab8863c199f1)) OVER (ORDER BY count(a10e4ab8863c199f1)) FROM a2e9a7e9e257153de GROUP BY 1 ORDER BY 1; QUERY PLAN --- Sort (cost=1382.39..1388.52 rows=2451 width=32) (actual time=44.229..44.230 rows=5 loops=1) Sort Key: (date_part('year'::text, a56b7a8d6de03f67b)) Sort Method: quicksort Memory: 25kB - WindowAgg (cost=1195.39..1244.41 rows=2451 width=32) (actual time=44.171..44.208 rows=5 loops=1) - Sort (cost=1195.39..1201.52 rows=2451 width=32) (actual time=44.125..44.127 rows=5 loops=1) Sort Key: (count(a10e4ab8863c199f1)) Sort Method: quicksort Memory: 25kB - HashAggregate (cost=1014.52..1057.41 rows=2451 width=32) (actual time=44.071..44.099 rows=5 loops=1) - Seq Scan on a2e9a7e9e257153de (cost=0.00..833.58 rows=24126 width=32) (actual time=0.032..26.683 rows=12605 loops=1) Total runtime: 44.396 ms Regards Oliver Kohll oli...@agilebase.co.uk / +44(0)7814 828608 / skype:okohll www.agilebase.co.uk - software www.gtwm.co.uk - company
Re: [GENERAL] Invalid objects
On 25 Apr 2010, at 07:34, Scott Bailey wrote: I can point you to the relevant code in GitHub if you're interested (it's Java). Absolutely. Thanks Scott Bailey Line 813 of http://github.com/okohll/agileBase/blob/master/gtpb_server/src/com/gtwm/pb/model/manageSchema/DatabaseDefn.java - private void updateViewDbAction is the top level function. Regards Oliver Kohll -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Invalid objects
Scott, I implemented a system exactly like this for the app in my signature below. Basically the system stores metadata containing each view definition and the joins between them. When someone needs to alter a view, say to remove a column, a DROP CASCADE is performed then each view is recreated in turn *in the correct order*. Everything's in a transaction, so a failure of recreation will roll back to the original state. This can of course happen if for example a column is deleted which other views use. This method is only used if the original attempt fails due to dependency errors - some view updates can work just fine anyway. I can point you to the relevant code in GitHub if you're interested (it's Java). Regards Oliver Kohll oli...@agilebase.co.uk / +44(0)7814 828608 / skype:okohll www.agilebase.co.uk - software www.gtwm.co.uk - company On 24 Apr 2010, at 13:01, Scott Bailey arta...@comcast.net wrote: Using views in Postgres can be a painful process. Changing a column in a base table will require you to drop all views that depend on it, and all views that depend on those views and so on. My coworker was complaining this morning that he now has a bunch of queries where a view is joined back on the original table to add a column that was missing from the view. It was easier to do this than to drop the view and all of it's dependencies and then find all the source code and rebuild all of the views in the correct order. So my thought was to create an invalid objects table to store the source and dependencies (and possibly permissions) when a DDL change invalidates a view or a function. And later you can call a procedure that (tries to) rebuild those invalid objects. My initial plan of attack is to just create a function that stores the information required to rebuild the dependencies before dropping them. Something like: store_and_drop('my_view_name') I'm thinking that ultimately it would be nice if postgres could do this automatically. Maybe: DROP my_view_name CASCADE WITH RESTORE So before I begin, has anyone already done this? And does anyone have any advice as to how it may best be done? Thanks Scott Bailey
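In case it helps anyone searching the archives, a very minimal sketch of the store_and_drop() idea mentioned above (this only saves the definition of the named view itself and does no identifier quoting; a complete version would walk pg_depend first and save every dependent view in dependency order, which is essentially what the Java code does):

CREATE TABLE invalid_objects (
    view_name  text,
    definition text,
    dropped_at timestamptz DEFAULT now()
);

CREATE OR REPLACE FUNCTION store_and_drop(p_view text) RETURNS void AS $$
BEGIN
    -- remember how to recreate the view...
    INSERT INTO invalid_objects (view_name, definition)
    VALUES (p_view, pg_get_viewdef(p_view::regclass, true));
    -- ...then drop it along with anything that depends on it
    EXECUTE 'DROP VIEW ' || p_view || ' CASCADE';
END;
$$ LANGUAGE plpgsql;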
[GENERAL] Running/cumulative count using windows
Hello, I'm still reasonably new to windowing functions, having used a few since 8.4 came out. I wonder if anyone can help with this one. I've got a table of email addresses in a CRM system similar to the following: CREATE TABLE test( signup_date timestamp, email_address varchar(1000) ); INSERT INTO test(signup_date, email_address) VALUES(now(), 't...@test.com'); INSERT INTO test(signup_date, email_address) VALUES(now(), 't...@test1.com'); INSERT INTO test(signup_date, email_address) VALUES(now() - '1 month'::interval, 't...@test2.com'); I'd like a running count, or cumulative count of the number of signups per month. I'm pretty sure a window function would do it but I can't work it out. So a plain count by month would be SELECT date_part('year',signup_date) as year, date_part('month',signup_date) as month, count(*) FROM test GROUP BY year, month ORDER BY year, month; giving year | month | count --+---+--- 2010 | 2 | 1 2010 | 3 | 2 How would you make the count a cumulative one? The output should then be year | month | count --+---+--- 2010 | 2 | 1 2010 | 3 | 3 Regards Oliver Kohll oli...@agilebase.co.uk / +44(0)845 456 1810 / skype:okohll www.agilebase.co.uk - software www.gtwm.co.uk - company
Re: [GENERAL] [pgsql-general] looking for a powerful frontend/report generator
On 29 Mar 2010, at 14:33, Clemens Eisserer linuxhi...@gmail.com wrote: Hi, Sorry for being a bit off-topic. Recently I've done some database frontends, which I used java+swingset+netbeans-gui-builder for. Compared to plain Java, development is quite fast - however I wonder whether you could recommend db-frontend generators like the infamous Access. What I've found so far was either: - extremely expensive - not compatible with free DBs (like postgres) - not powerful - not cross-platform. Does anybody know tools which don't have the properties listed above? Thank you in advance, Clemens

Hello, there are one or two (including one I developed) under 'GUI builders' at pgFoundry: http://pgfoundry.org/softwaremap/trove_list.php?form_cat=323 Regards Oliver Kohll oli...@agilebase.co.uk / +44(0)845 456 1810 www.agilebase.co.uk - software www.gtwm.co.uk - company
[GENERAL] pg_dump new version
Hello, May I take a lull between PG releases to ask an upgrade Q? Documentation and people often recommend that when doing a major version upgrade, to dump data from the old server with the new version of pg_dump, since the new pg_dump may have bugfixes not available to the old. I've never done this simply because it's not clear from the docs how to - perhaps it's platform dependant. The upgrade doc http://www.postgresql.org/docs/8.4/interactive/install-upgrading.html is a very clear step by step procedure to follow with the exception of 'To make the backup, you can use the pg_dumpall command from the version you are currently running. For best results, however, try to use the pg_dumpall command fromPostgreSQL 8.4.2, since this version contains bug fixes and improvements over older versions. While this advice might seem idiosyncratic since you haven't installed the new version yet, it is advisable to follow it if you plan to install the new version in parallel with the old version'. So for someone using RPM packages to install Postgres, what's the recommended sequence to do this? Regards Oliver Kohll www.agilebase.co.uk -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
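In practice with the RPM packages the trick is simply to install the new version alongside (or on another machine) and run its binaries explicitly against the old cluster; the path below is a placeholder since it depends on how the packages lay things out:

/path/to/new/bin/pg_dumpall -h oldserver -p 5432 -U postgres -f everything.sql

The new pg_dumpall just connects to the old server over a normal connection, so it doesn't matter which package the running server itself came from.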
[GENERAL] pg_dump: SQL command failed
Hello, I've just come across this in an output from a cron backup script: /etc/cron.hourly/gtwm_backup_databases.sh: pg_dump: SQL command failed pg_dump: Error message from server: ERROR: could not open relation with OID 572838 pg_dump: The command was: SELECT pg_catalog.pg_get_viewdef('572838'::pg_catalog.oid) AS viewdef The complete script is #!/bin/bash nice /usr/bin/pg_dump -U postgres -f /var/local/backup/agilebaseschema_`/bin/date +%H`.sql agilebaseschema nice /usr/bin/pg_dump -U postgres --exclude-table=dbint_log_* -f /var/local/backup/agilebasedata_`/bin/date +%H`.sql agilebasedata Running the script again manually, it works with no errors and I haven't noticed the error before. Unfortunately I recently deleted my archive of cron results so I can't do an exhaustive check and I can't see anything in the postgres log either but it's the first time I've noticed it since I can remember. A web search shows other people have come across this before now and again though I can't see any resolutions. I'm running postgres v8.4.1. Running that SELECT statement manually on both databases returns agilebasedata=# SELECT pg_catalog.pg_get_viewdef('572838'::pg_catalog.oid) AS viewdef; viewdef Not a view (1 row) Views are regularly altered, dropped and created in the agilebasedata database, 15 times today to date, which have all succeeded without error. Any help on what could cause that error? Regards Oliver Kohll oli...@agilebase.co.uk / 0845 456 1810 / 07814 828608 www.agilebase.co.uk - software www.gtwm.co.uk - company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] pgAdmin in 8.4 installation uses tools from 8.3 installation
We just upgraded a customer's PostgreSQL installation from 8.3 to 8.4.1. I wanted to make a small change to the database. I wanted to take a backup before I did it, just in case. When I run pgAdmin from the PostgreSQL/8.4/bin folder and try to take a backup, the backup fails. The output shows me that it is running PostgreSQL/8.3/bin/pg_dump.exe instead of PostgreSQL/8.4/bin/pg_dump.exe. I found that the psql window that can be invoked from the latest pgAdmin is also running a tool from the 8.3 installation. How do I tell pgAdmin to use 8.4 tools instead of 8.3 tools? Preferences? I'm running on a Mac which means the preferences screen is accessible using PgAdmin3 Preferences - on Windows it may be something like File/Tools Options/Preferences. Then set the PG bin path as appropriate. It may also be that the 8.3 bin folder is set in your PATH variable. Regards, Andy
Re: [GENERAL] FATAL: no pg_hba.conf entry for host “::1”
Hi, So the issue seems to be with the last line on IPv6, to be honest i dont even know what IPv6 is, could disabling IPv6 do the trick or do i need to add something? Your machine is connecting to PostgreSQL using IPv6's localhost address. IPv6 is a newer format of IP addressing that allows for more IP addresses - ::1 = 127.0.0.1, basically. Newer TCP/IP stacks attempt to connect to localhost using the IPv6 address first, instead of IPv4 (the current format.) When you connect, try passing -h 127.0.0.1 to psql or whatever app/library you're using. That should force it to connect over IPv4. Or just uncomment the IPv6 line in your pg_hba.conf and restart PostgreSQL. The funny thing is with the exact same details a different application connects to postgreSQL fine? Any ideas and bare in mind im a newbie when explaining Is the different application on a different server? If so, it could be that the other app's OS is using IPv4 by default instead of IPv6. Or the app is connecting over an IPv4 address (e.g. 127.0.0.1) instead of localhost. Regards, Andy
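For reference, the stock IPv6 localhost entry in pg_hba.conf looks like the line below (adjust the database/user columns and the auth method to match your setup); adding or uncommenting it and reloading PostgreSQL lets the ::1 connections through:

host    all             all             ::1/128                 md5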
Re: [GENERAL] generic modelling of data models; enforcing constraints dynamically...
On 27 Sep 2009, at 21:10, InterRob rob.mar...@gmail.com wrote: Peter, may I invite you to privately share some more details on the system you are using and the design of it? Did you implement it using PostgreSQL? Looking forward to your reply. (And with respect to your previous message: whom are you actually referring to by the acronym OPs?) Or publicly? I for one would be interested hearing more. From situations I've come across, EAV seems to be proposed when either 1) attributes are very numerous and values very sparse 2) people want to be able to quickly add (and remove?) attributes My feeling is it's probably valid for 1, at least I haven't come across anything better, but not for 2. Regards Oliver www.gtwm.co.uk - company www.gtportalbase.com - product
Re: [GENERAL] generic modelling of data models; enforcing constraints dynamically...
On 25 Sep 2009, at 07:22, InterRob rob.mar...@gmail.com wrote: I guess it IS quite overengineered indeed... What I'm trying to do is to facilitate different fieldwork methodologies for archaeological research (on project basis); there is no final agreement on data structure and semantics; however, on a meta-level all choices are rational and can be modelled... Infact, all models can be related to each other: that's where the hybrid part comes in: I wish to implement the common denominator (90%) and then further extend this, enabing specific data model implementations -- including checks for data integrity. Hi Rob, Just wondering if you've considered rapid prototyping of the core of it to try and gain consensus by giving people something they can see and talk about, as an alternative to doing a lot of design work up front? Regards Oliver Kohll www.gtwm.co.uk - company www.gtportalbase.com - product
Re: [GENERAL] Viable alternatives to SQL?
On 27 Aug 2009, at 17:11, pgsql-general-ow...@postgresql.org wrote: From: Kelly Jones kelly.terry.jo...@gmail.com Date: 27 August 2009 14:43:51 BST To: pgsql-general@postgresql.org Subject: Viable alternatives to SQL? Many sites let you search databases of information, but the search queries are very limited. I'm creating a site that'll allow arbitrary SQL queries to my data (I realize I'll need to handle injection attacks). Are there other viable ways to query data? I read a little on Business System 12 (BS12), Tutorial D, and even something called T-SQL (I think), but they all seem theoretical and not fully implemented. I want a query language that non-techies can use easily, but also supports arbitrarily complex queries. Does such a language exist? Originally I was using the webexone.com database to allow non-techies to create tables and views, however found it too limiting - the functional ceiling too low. So our org. built a web-app for this based on postgres - a GUI. However it's not aimed at letting non-techies do everything, which is rather ambitious. People familiar with SQL and relational db concepts can do practically everything but non-techs can do simpler tasks. For example I recently used some of the new windowing functions (yay!) in a view but users typically add/remove fields, filters and aggregate calculations. Info at www.gtportalbase.com, it's also just gone on github as open source. Oliver Kohll
[GENERAL] Division by zero
Hello, Divide by zero errors have come up a couple of times on this list (once raised by me). I wonder if I could propose a feature for discussion. Could the result of a division by zero be treated as infinity or null, rather than raising an error? Floating point types already have the concept of infinity. I'd have thought that there's no reason why a /0 in one row necessarily has to be fatal for the whole view. In many cases, you can imagine that returning infinity makes more sense. Strictly, I suppose, 1/0 should return infinity, 0/0 null and -1/0 negative infinity. Alternatively, all could return NaN. At least there could be a configuration option to turn on this behaviour. The concern stems from the fact that when a divide by zero occurs in a view, no rows at all are returned, just the error message. This makes it very difficult to work out where the problem value is, compared to other tools like spreadsheets, which return a cell error. A view can be very fragile. Further, the Postgres error doesn't give any details of the field and of course can't point to the row, it just says ERROR: division by zero There may well be good reasons for not treating this. I've come across comments such as 'I think everybody would agree that this would be a bad thing to do!' but remain to be convinced. I know you can use CASE and NULLIF but if you have complex calculations, that makes them a lot less readable. Regards Oliver Kohll oli...@gtwm.co.uk / 0845 456 1810 / 07814 828608 www.gtwm.co.uk - company www.gtportalbase.com - product
Re: [GENERAL] [pgsql-general] Daily digest v1.9081 (14 messages)
On 8 Jun 2009, at 17:23, Merlin Moncure mmonc...@gmail.com wrote: Is there a way when creating a table to limit it to one row? That is, without using a stored procedure? I searched the documentation, but didn't find anything. CREATE TABLE x (...); CREATE UNIQUE INDEX x_only_one_row ON x ((1)); very clever :D merlin

Too clever for me, I don't understand what's going on and can't replicate it in my 8.3. Any expansion? Rgs Oliver oli...@gtwm.co.uk / 0845 456 1810 / 07814 828608 www.gtwm.co.uk - company www.gtportalbase.com - product
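To expand: it's a unique index on a constant expression, so every row indexes to the same value and only one row can ever exist. A quick demonstration (table and column are made up; note the double parentheses needed for an expression index):

CREATE TABLE x (payload text);
CREATE UNIQUE INDEX x_only_one_row ON x ((1));

INSERT INTO x VALUES ('first');   -- ok
INSERT INTO x VALUES ('second');  -- ERROR: duplicate key value violates unique constraint "x_only_one_row"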
Re: [GENERAL] Sum of multiplied deltas
On 8 Jun 2009, at 19:01, David Fetter wrote: Hello! I've the following data: datetime | val1 | val2 time1|4 | 40% time2|7 | 30% time3| 12 | 20% ... I'd like to sum up the following: (7-4)*30% + (12-7)*20% + ... This is best done in 8.4 using Windowing. Sadly, it's an 8.4-only feature, and dodgy hacks are the rule until you can use them. Cheers, David. I do this type of thing now and again using a self join with an offset. select test_a.val1 - test_b.val1 from test test_a inner join test test_b on test_a.pkey = test_b.pkey - 1; Thought I was quite clever the first time, didn't know it was a dodgy hack! I'm trying to learn more about windowing before 8.4, how would this example be done with that? Regards Oliver -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
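Since the question was how 8.4 windowing would handle it, here is a sketch using lag() (the table name is a placeholder, and it assumes val2 is stored as a fraction such as 0.30; the first row's delta is NULL so it simply drops out of the sum):

SELECT sum(delta * val2) AS total
FROM (
    SELECT val1 - lag(val1) OVER (ORDER BY datetime) AS delta,
           val2
    FROM my_table
) AS deltas;

For the sample data this gives (7 - 4) * 0.30 + (12 - 7) * 0.20.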
Re: [GENERAL] Division by zero
On 4 Jun 2009, at 13:11, Sam Mason s...@samason.me.uk wrote: You need to take care of only one case here: denominator == 0; rest of the cases will be handled sanely by the database. CASE WHEN packet_size = 0 THEN null ELSE cost/packet_size END as unit_cost Yes; or even shorter: cost/nullif(packet_size,0) AS unit_cost Thanks Sam and others. nullif is a good one to remember. However my problem is I want to be able to deal with an arbitrary calculation a user may create. In the light of a new day I realise this is obviously not trivial and would entail reasonably complex parsing. You'd have to find elements that could cause an error (division, some aggregates) and insert in the correct place nullif or CASE, taking care of bracket matching for starters - a messy workaround to the problem. I might look into functions if that's the only reasonable way of catching exceptions. Oliver
[GENERAL] Division by zero
Hello, We have a system that allows users to create views containing calculations but divisions by zero are commonly a problem. A simple example calculation in SQL would be

SELECT cost / pack_size AS unit_cost FROM products;

Either variable could be null or zero. I don't think there's a way of returning null or infinity for divisions by zero, rather than causing an error, but I'd just like to check - and put in a vote for that functionality! If not, I will have to get the system to check for any numeric fields in user input calculations and rewrite them similar to

CASE WHEN cost IS NULL THEN null
     WHEN pack_size IS NULL THEN null
     WHEN cost = 0 THEN null
     WHEN pack_size = 0 THEN null
     ELSE cost / pack_size
END AS unit_cost

I don't want to write new functions, I'd rather keep it in plain SQL. Regards Oliver Kohll oli...@gtwm.co.uk / 0845 456 1810 / 07814 828608 www.gtwm.co.uk - company www.gtportalbase.com - product
[GENERAL] Foreign key verification trigger conditions
I have an update statement that affects every row in a given table. For that table it changes the value in a single column, which itself has a foreign key constraint. The table has an additional 9 foreign keys, some of which reference large tables. My expectation would be that only the changed column would be checked against the foreign key of interest, instead I find that all the foreign keys are checked when this statement is executed. I decided to create a simple test case to demonstrate this behaviour, but what I found was strange. The first time I created the test cases the behaviour matches my experience but the second time I created it the behaviour was then as I would have expected. This is the result I am experiencing with the unnecessary foreign key verification: testdb=# explain analyze update t1 set B = 1; QUERY PLAN - Seq Scan on t1 (cost=0.00..25.10 rows=1510 width=22) (actual time=0.026..0.029 rows=1 loops=1) Trigger for constraint fk1: time=0.111 calls=1 Trigger for constraint fk2: time=0.014 calls=1 Total runtime: 0.259 ms (4 rows) Only fk1's column is being updated, not fk2's. Below is both sessions. Any feedback on this and how to avoid it is appreciated as well as whether the developers would consider this a bug, I am inclined to believe so as it hurts performance. Thanks, -J * testdb=# create table t1 (A BIGINT, B BIGINT); CREATE TABLE testdb=# create table t2 (B BIGINT PRIMARY KEY); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index t2_pkey for table t2 CREATE TABLE testdb=# alter table t1 add constraint fk1 foreign key (B) references t2 (B); ALTER TABLE testdb=# explain analyze insert into t2 values (1); QUERY PLAN Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.008..0.010 rows=1 loops=1) Total runtime: 45.508 ms (2 rows) testdb=# explain analyze insert into t1 values (1, 1); QUERY PLAN Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.008..0.010 rows=1 loops=1) Trigger for constraint fk1: time=0.256 calls=1 Total runtime: 0.345 ms (3 rows) testdb=# explain analyze update t1 set A = 2; QUERY PLAN - Seq Scan on t1 (cost=0.00..27.70 rows=1770 width=14) (actual time=0.019..0.022 rows=1 loops=1) Total runtime: 0.125 ms (2 rows) testdb=# explain analyze update t1 set B = 1; QUERY PLAN - Seq Scan on t1 (cost=0.00..27.70 rows=1770 width=14) (actual time=0.021..0.024 rows=1 loops=1) Total runtime: 0.115 ms (2 rows) testdb=# explain analyze update t1 set B = 1; QUERY PLAN - Seq Scan on t1 (cost=0.00..27.70 rows=1770 width=14) (actual time=0.021..0.024 rows=1 loops=1) Total runtime: 0.113 ms (2 rows) testdb=# explain analyze insert into t2 values (2); QUERY PLAN Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.008..0.010 rows=1 loops=1) Total runtime: 0.120 ms (2 rows) testdb=# explain analyze update t1 set B = 2; QUERY PLAN - Seq Scan on t1 (cost=0.00..27.70 rows=1770 width=14) (actual time=0.020..0.024 rows=1 loops=1) Trigger for constraint fk1: time=0.112 calls=1 Total runtime: 0.233 ms (3 rows) testdb=# explain analyze update t1 set A = 99; QUERY PLAN - Seq Scan on t1 (cost=0.00..27.70 rows=1770 width=14) (actual time=0.021..0.025 rows=1 loops=1) Total runtime: 0.117 ms (2 rows) testdb=# alter table t1 add b2 bigint; ALTER TABLE testdb=# alter table t1 add constraint fk2 foreign key (B2) references t2 (B); ALTER TABLE testdb=# explain analyze update t1 set B = 1; QUERY PLAN - Seq Scan on t1 (cost=0.00..25.10 rows=1510 width=22) (actual time=0.026..0.029 rows=1 loops=1) Trigger for constraint fk1: time=0.111 calls=1 Trigger for constraint fk2: 
time=0.014 calls=1 Total runtime: 0.259 ms (4 rows)
Re: [GENERAL] Foreign key verification trigger conditions
Hi Tom, Thank you for pointing out the condition under which this occurs, I had not made the connection that the check was only occurring when the value in the other columns with foreign keys are null. I agree 100% that a strict key equality check that is in general use in the database should not return true for null = null. But I believe we can always come to the conclusion that a foreign key constraint is satisfied if all of the key values are null since that effectively means that the relationship is not present. Searching for ri_KeysEqual leads me to this discussion of the same topic: http://archives.postgresql.org/pgsql-hackers/2007-04/msg00803.php Would there be any interest in implementing this change? Should I be reporting a bug to get it into the development queue? (My apologies that I have neither the skills nor the resources to work on it myself.) Thanks, -J On Mon, Jun 1, 2009 at 9:20 AM, Tom Lane t...@sss.pgh.pa.us wrote: j-lists jamisonli...@gmail.com writes: I have an update statement that affects every row in a given table. For that table it changes the value in a single column, which itself has a foreign key constraint. The table has an additional 9 foreign keys, some of which reference large tables. My expectation would be that only the changed column would be checked against the foreign key of interest, instead I find that all the foreign keys are checked when this statement is executed. What your test case actually seems to show is that the skip-the-trigger optimization doesn't fire when the column value is NULL. Which is because ri_KeysEqual() doesn't consider two nulls to be equal. It's possible we could change that but I'd be worried about breaking other cases that are actually semantically critical... regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
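As an aside not raised in the thread, one common way to cut down the number of row-level FK trigger calls in a bulk statement like the one above is to restrict the UPDATE to rows whose value would actually change. This is only a sketch using the t1/t2 test tables from the thread, and it only helps when many rows would otherwise be no-op updates:
-- Rows whose B is already 1 are skipped entirely, so no FK triggers (fk1 or fk2)
-- are queued for them.
UPDATE t1 SET B = 1 WHERE B IS DISTINCT FROM 1;
-- Re-running the EXPLAIN ANALYZE from the test case should then show no trigger
-- lines at all when every row already has B = 1.
EXPLAIN ANALYZE UPDATE t1 SET B = 1 WHERE B IS DISTINCT FROM 1;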
[GENERAL] New 8.4 features
Hello, Some of the new language features in 8.4 seem like pretty major additions. I know that the window functions will be very useful. There have been many times in the past when I've wanted to aggregate in this way: http://elegantcode.com/2009/01/04/sql-window-clause/ If this is possible now it'll help massively in rapid prototyping - in the past, reports involving aggregates have needed the most time to develop. Another interesting feature is recursive SQL. I know 8.4 is only just in beta but it would be good to learn what is (and isn't) possible. These seem like the sorts of things that would get good writeups at varlena.com but I see there haven't been any new posts there in a couple of years. My question is, is anyone planning to blog / write focussing on these features? Oliver Kohll www.gtwm.co.uk - company www.gtportalbase.com - product -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
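To give a flavour of the two features mentioned, here is a rough sketch of the 8.4 syntax; the sales and parts tables and their columns are invented purely for illustration:
-- Window function: per-group rank and per-group total, without collapsing rows
-- the way GROUP BY would.
SELECT region,
       amount,
       rank() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region,
       sum(amount) OVER (PARTITION BY region) AS region_total
FROM sales;
-- Recursive query: walk a hierarchy (here, parts and sub-parts) from one root row
-- down through all of its descendants.
WITH RECURSIVE subparts AS (
    SELECT id, parent_id, name FROM parts WHERE id = 1
    UNION ALL
    SELECT p.id, p.parent_id, p.name
    FROM parts p JOIN subparts s ON p.parent_id = s.id
)
SELECT * FROM subparts;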
[GENERAL] Open sourcing
Hi, Wondering who has any practical experience of open sourcing a software product? I made a small internal database tool open source a while ago and that worked quite well - www.gtportalbase.com/opensource. People got involved in adding to the code and some people even paid a bit for our help. Now considering more. Any larger scale experiences? Regards Oliver Kohll oli...@gtwm.co.uk / 0845 456 1810 / 07814 828608 www.gtwm.co.uk - company www.gtportalbase.com - product
[GENERAL] What query on system tables retrieves table or view definitions
What is the query for retrieving a table's and a view's definition from the system tables (in 7.4)? I am trying to find my way around the system tables and I am finding it very confusing, especially the way the tables are linked. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
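A sketch of the usual catalog queries ('my_view' and 'my_table' are placeholders; these should work on 7.4). Starting psql with the -E switch also echoes the queries psql itself runs for \d and friends, which is a good way to see how the catalogs link together:
-- View definition:
SELECT pg_get_viewdef(c.oid)
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname = 'my_view' AND n.nspname = 'public';
-- Columns of a table (roughly what \d shows):
SELECT a.attname,
       format_type(a.atttypid, a.atttypmod) AS data_type,
       a.attnotnull
FROM pg_attribute a
JOIN pg_class c ON c.oid = a.attrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname = 'my_table'
  AND n.nspname = 'public'
  AND a.attnum > 0
  AND NOT a.attisdropped
ORDER BY a.attnum;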
[GENERAL] What is the syntax for setting a default date in PostgreSQL
This issue always stumps me. I need to set the default date for a column in Postgres: a value of today, today + x days, now (as a timestamp), or now + x days. I never seem to be able to locate it in the documentation or online. Can anyone post it here once and for all for posterity? I will be much obliged :) ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
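For posterity, then, a sketch (the table and column names are made up); note the defaults are evaluated when a row is inserted, not when the table is created:
CREATE TABLE example_defaults (
    created_date date      DEFAULT CURRENT_DATE,               -- today
    due_date     date      DEFAULT CURRENT_DATE + 7,           -- today + 7 days
    created_at   timestamp DEFAULT now(),                      -- current timestamp
    expires_at   timestamp DEFAULT now() + interval '30 days'  -- now + 30 days
);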
[GENERAL] Syntax Error Inserting From STDIN?
I am trying to run the following:
COPY departments (id, issue_id, title, description, feature_type) FROM stdin;
23 4 Local BuzzThings to do, people to see, places to go. aspen
I get back:
COPY departments (id, issue_id, title, description, feature_type) FROM stdin;
23 4 Local BuzzThings to do, people to see, places to go. aspen
ERROR: syntax error at or near 23 at character 80
What am I doing wrong? I am on 8.1.1... The table is defined as:
    Column    |          Type          |    Modifiers
--------------+------------------------+------------------
 id           | integer                | not null default
 issue_id     | integer                |
 title        | character varying(255) |
 description  | character varying(255) |
 feature_type | character varying(255) |
Thx. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Syntax Error Inserting From STDIN?
Interesting. How would I go about solving that? I inserted an extra line between the two, no dice. From: Scott Marlowe [EMAIL PROTECTED] Date: Tue, 20 Dec 2005 13:53:37 -0600 To: Hunter's Lists [EMAIL PROTECTED] Cc: PostgreSQL pgsql-general@postgresql.org Subject: Re: [GENERAL] Syntax Error Inserting From STDIN? On Tue, 2005-12-20 at 13:30, Hunter's Lists wrote: I am trying to run the following: COPY departments (id, issue_id, title, description, feature_type) FROM stdin; 23 4 Local BuzzThings to do, people to see, places to go. aspen I get back: COPY departments (id, issue_id, title, description, feature_type) FROM stdin; 23 4 Local BuzzThings to do, people to see, places to go. aspen ERROR: syntax error at or near 23 at character 80 Seeing that character 80 I'm gonna guess this is a CR/LF issue. I.e. pgsql on your machine is seeing the 23 as being on the same line as the copy departments statement. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Syntax Error Inserting From STDIN?
Everything was on OS X. Looks like it was a problem with spaces vs. tabs. Anyway, I went through and fixed all the lines and everything went in. We had a strange problem restoring a 8.0.4 dump to a 8.1.1 server and this was the last of the data that had to be re-imported. From: Scott Marlowe [EMAIL PROTECTED] Date: Tue, 20 Dec 2005 16:41:32 -0600 To: Hunter's Lists [EMAIL PROTECTED] Cc: PostgreSQL pgsql-general@postgresql.org Subject: Re: [GENERAL] Syntax Error Inserting From STDIN? quick answer, try a different editor. Are you editing in the same environment as the database is in, or are you editing on windows and feeding the copy data in on another platform? On Tue, 2005-12-20 at 14:40, Hunter's Lists wrote: Interesting. How would I go about solving that? I inserted an extra line between the two, no dice. From: Scott Marlowe [EMAIL PROTECTED] Date: Tue, 20 Dec 2005 13:53:37 -0600 To: Hunter's Lists [EMAIL PROTECTED] Cc: PostgreSQL pgsql-general@postgresql.org Subject: Re: [GENERAL] Syntax Error Inserting From STDIN? On Tue, 2005-12-20 at 13:30, Hunter's Lists wrote: I am trying to run the following: COPY departments (id, issue_id, title, description, feature_type) FROM stdin; 23 4 Local BuzzThings to do, people to see, places to go. aspen I get back: COPY departments (id, issue_id, title, description, feature_type) FROM stdin; 23 4 Local BuzzThings to do, people to see, places to go. aspen ERROR: syntax error at or near 23 at character 80 Seeing that character 80 I'm gonna guess this is a CR/LF issue. I.e. pgsql on your machine is seeing the 23 as being on the same line as the copy departments statement. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 6: explain analyze is your friend
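For the archives, a sketch of what the COPY block needs to look like: the columns must be separated by literal tab characters (which many editors silently turn into spaces, as happened here), one row per line, with the data terminated by a line containing only a backslash and a period. If an editor keeps mangling tabs, telling COPY to use a different delimiter is one way around it; the pipe character below is just an example:
COPY departments (id, issue_id, title, description, feature_type) FROM stdin WITH DELIMITER '|';
23|4|Local Buzz|Things to do, people to see, places to go.|aspen
\.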
Re: [GENERAL] Beyond the 1600 columns limit on windows
Yes, it is exactly that. I will follow your advice and create an abstraction layer for the data access that will return the sparse dataset using the standard dataset as input. There is just one thing I disagree with: you said the performance is not good, right? However, it is practical! Nothing is easier and more practical than keeping the sparse representation inside of the database for my application. On 11/8/05, John D. Burger [EMAIL PROTECTED] wrote: Evandro's mailing lists (Please, don't send personal messages to this address) wrote: It has nothing to do with normalisation. It is a program for scientific applications. Data values are broken into columns to allow multiple linear regression and multivariate regression tree computations. Having done similar things in the past, I wonder if your current DB design includes a column for every feature-value combination:
instanceID  color=red  color=blue  color=yellow  ...  height=71  height=72
        42  True       False      False
        43  False      True       False
        44  False      False      True
...
This is likely to be extremely sparse, and you might use a sparse representation accordingly. As several folks have suggested, the representation in the database needn't be the same as in your code. Even SPSS, the most well-known statistics software, uses the same approach and data structure that my software uses. Probably I should use another data structure but it would not be as efficient and practical as the one I use now. The point is that, if you want to use Postgres, this is not in fact efficient and practical. In fact, it might be the case that mapping from a sparse DB representation to your internal data structures is =more= efficient than naively using the same representation in both places. - John D. Burger, MITRE -- Evandro M Leite Jr, PhD Student / Software developer, University of Southampton, UK. Personal website: http://evandro.org Academic website: http://www.soton.ac.uk/~evandro Please, use Jr(at)evandro.org for personal messages
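As a sketch of the sparse layout being suggested (the table and column names below are invented), one row per instance/feature pair replaces one column per feature value, and only the cells that are true need to be stored:
CREATE TABLE instance_features (
    instance_id integer NOT NULL,
    feature     text    NOT NULL,      -- e.g. 'color=red', 'height=71'
    PRIMARY KEY (instance_id, feature)
);
-- Only the true cells from the dense matrix are stored.
INSERT INTO instance_features VALUES (42, 'color=red');
INSERT INTO instance_features VALUES (43, 'color=blue');
INSERT INTO instance_features VALUES (44, 'color=yellow');
-- The abstraction layer can then rebuild the dense row for one instance,
-- or fetch all instances having a given feature:
SELECT instance_id FROM instance_features WHERE feature = 'color=red';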
[GENERAL] Beyond the 1600 columns limit on windows
Hi guys, I would like to know if it is possible to have more than 1600 columns on Windows without recompiling Postgres. Regards -Evandro -- Evandro M Leite Jr, PhD Student / Software developer, University of Southampton, UK. Personal website: http://evandro.org Academic website: http://www.soton.ac.uk/~evandro Please, use Jr(at)evandro.org for personal messages
Re: [GENERAL] Beyond the 1600 columns limit on windows
I'm doing a PhD in data mining and I need more than 1600 columns. I got an error message saying that I cannot use more than 1600 columns. It is happening because I have to change categorical values to binary, creating new columns. Do you know if Oracle can handle it? -- Evandro M Leite Jr, PhD Student / Software developer, University of Southampton, UK. Personal website: http://evandro.org Academic website: http://www.soton.ac.uk/~evandro Mobile 079 068 70740 Office 023 8055 3644 Home 023 8055 9160 On 11/8/05, Tino Wildenhain [EMAIL PROTECTED] wrote: Evandro's mailing lists (Please, don't send personal messages to this address) schrieb: Hi guys, I would like to know if it is possible to have more than 1600 columns on Windows without recompiling Postgres. I would like to know who on earth needs 1600 columns and even beyond? Hint: you can have practically unlimited rows in your n:m table :-) ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org -- Evandro M Leite Jr, PhD Student / Software developer, University of Southampton, UK. Personal website: http://evandro.org Academic website: http://www.soton.ac.uk/~evandro Please, use Jr(at)evandro.org for personal messages
Re: [GENERAL] Beyond the 1600 columns limit on windows
Sorry, it has nothing to do with normalisation. It is a program for scientific applications. Data values are broken into columns to allow multiple linear regression and multivariate regression tree computations. Even SPSS, the most well-known statistics software, uses the same approach and data structure that my software uses. Probably I should use another data structure but it would not be as efficient and practical as the one I use now. Many thanks -Evandro On 08 Nov 2005 05:30:07 -0800, Randal L. Schwartz merlyn@stonehenge.com wrote: Evandro's == Evandro's mailing lists (Please, don't send personal messages to this address) [EMAIL PROTECTED] writes: [I would have replied to your personal address, but I'm not about to copy it from a footer.] Evandro's I'm doing a PhD in data mining and I need more than 1600 columns. I got an Evandro's error message saying that I cannot use more than 1600 columns. Evandro's It is happening because I have to change categorical values to binary Evandro's creating new columns. Do you know if Oracle can handle it? /me boggles You are doing a PhD in data mining, and you have a table that needs more than 1600 columns? /me gasps What are they *teaching* these days? If you have a design that has more than 20 or so columns, you're probably already not normalizing properly. There just aren't *that* many attributes of an object before you should start factoring parts of it out, even if it means creating some 1-1 tables. In programming, if I ever see someone name a sequence of variables, like thing1 and thing2, I know there's going to be trouble ahead, because that should have been a different data structure. Similarly, I bet some of your columns are foo1 and foo2. Signs of brokenness in the design. Or do you really have 1600 *different* attributes, none of which have a number in their name? That requires a serious amount of creativity. :) -- Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095 merlyn@stonehenge.com URL: http://www.stonehenge.com/merlyn/ Perl/Unix/security consulting, Technical writing, Comedy, etc. etc. See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training! -- Evandro M Leite Jr, PhD Student / Software developer, University of Southampton, UK. Personal website: http://evandro.org Academic website: http://www.soton.ac.uk/~evandro Please, use Jr(at)evandro.org for personal messages
[GENERAL] Seq Scan but I think it should be Index Scan
So the details of it: I'm using PostgreSQL 8.0.3 on a Gentoo kernel 2.6.12 on a P4/2.8G+HT proc, with kernel RAID0 on some SATA drives and 1G RAM. Don't know the bus speed. I'm thinking that my queries are not using indexs correctly and therefore taking longer to complete than they should. I've put the details below, but changed some names. If anyone could shed some light? pg_config --configure '--prefix=/usr' '--mandir=/usr/share/man' '--host=i686-pc-linux-gnu' '--with-docdir=/usr/share/doc/postgresql-8.0.3' '--libdir=/usr/lib' '--enable-depend' '--with-gnu-ld' '--with-perl' '--with-openssl' '--enable-nls' '--with-pam' 'CFLAGS=-O2 -march=pentium4' 'host_alias=i686-pc-linux-gnu' data=# explain analyze select count(id) from x_base where x_type 100 and x_date='2005-10-26' and x_time'06:00:00'; QUERY PLAN --- Aggregate (cost=539453.36..539453.36 rows=1 width=4) (actual time=66200.763..66200.764 rows=1 loops=1) - Seq Scan on x_base (cost=0.00..539136.18 rows=126871 width=4) (actual time=31618.624..66174.710 rows=37451 loops=1) Filter: ((x_type 100) AND (x_date = '2005-10-26'::date) AND (x_time '06:00:00'::time without time zone)) Total runtime: 66200.811 ms data=# \d x_base Table public.x_base Column| Type |Modifiers -++-- id | integer| not null default nextval('public.x_base_id_seq'::text) x_code | character(8) | x_date | date | not null x_time | time without time zone | not null a | character(1) | b | integer| c | character(5) | d | character(16) | e | character(1) | f | character(1) | g | character(10) | h | character(1) | i | character(1) | j | character varying(32) | k | integer| l | integer| m | integer| n | character varying(32) | o | integer| p | character varying(14) | q | integer| Indexes: x_base_pkey PRIMARY KEY, btree (id) ix_d_cd btree (x_date) ix_t_cb btree (x_type) Foreign-key constraints: fk_k_id FOREIGN KEY (k) REFERENCES x_file(id) Now, see that x_type index? Why didn't this thing Index Scan ix_t_cb on that column? Me thinks if it had my query would be much faster. Or perhaps if I only where x_type? I tried that but it still took a minute. I took out count() and it still took a minute. Always using Seq Scan, am I doing something dumb here? There are more than six million records in that table, maybe thats just how long it takes? Perhaps I should change architecture or schema to improve performance? Tweak the log? Thanks. /djb ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
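The comparison operators in the quoted query appear to have been lost in the archive, so the exact choice of index depends on the real predicates; as a sketch only, a multicolumn index matching the date/time tests is the usual first experiment. Whether the planner uses it depends on how selective the conditions are -- an index scan over most of six million rows can legitimately be slower than a sequential scan:
CREATE INDEX ix_base_date_time ON x_base (x_date, x_time);
ANALYZE x_base;
-- Then compare the plan and timings again (the operators below are assumed
-- for illustration, since the originals were lost):
EXPLAIN ANALYZE
SELECT count(id) FROM x_base
WHERE x_date = '2005-10-26' AND x_time > '06:00:00';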
[GENERAL] Large Table Performance
List, I've got a problem where I need to make a table that is going to grow by an average of 230,000 records per day. There are only 20 columns in the table, mostly char and integer. It's FK'd in two places to another table for import/export transaction id's and I have a serial primary key and an index on a date column for when I need to search (every search is done inside a date range). I thought it would be OK but after a few weeks of operation I have more than five million records in there. Some queries take more than five minutes to complete and I'm sad about that. How can I make this faster? I could munge dates into integers if they're faster, I'm OK with that. What can I tweak in the configuration file to speed things up? What about some drastic schema change that more experience would have shown me? I cannot show the full schema but it's like this: -- My import/export data information table ie_data (id serial primary key, date date, [12 other columns here]) big_transaction_table(id serial primary key, import_id int w/FK, export_id int w/FK, date date, [20 other necessary transaction detail columns]) So when I say select x,y,z from big_transaction_table where date >= '10/2/2005' and date <= '10/4/2005' and transaction_status in (1,2,3) order by date; it takes five+ minutes. TIA for any suggestions. /djb ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
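A sketch along the lines of the schema described (column names are taken from the message, and the date operators are assumed to form a closed range as the message describes): an index matching both the date range and the status filter is usually the first thing to try, and EXPLAIN ANALYZE will show whether it is actually used:
CREATE INDEX big_txn_date_status_idx
    ON big_transaction_table (date, transaction_status);
ANALYZE big_transaction_table;
EXPLAIN ANALYZE
SELECT x, y, z
FROM big_transaction_table
WHERE date >= '2005-10-02' AND date <= '2005-10-04'
  AND transaction_status IN (1, 2, 3)
ORDER BY date;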
[GENERAL] NTFS partition autodetection during instalation
Dear hackers, I want to embed Postgres into a software installation. I will use a silent install; how do I detect which partition is NTFS so I can tell the installer to use that partition? Many thanks -Evandro -- Evandro M Leite Jr, PhD Student / Software developer, University of Southampton, UK. Personal website: http://evandro.org Academic website: http://www.soton.ac.uk/~evandro Please, use Jr(at)evandro.org for personal messages
[GENERAL] How can this be optimized, if possible?
Hello, My database has grown far faster than expected and a query which used to run acceptably now does not. I'm trying to figure out a way to make this operate faster and scale better. I'm very open to the idea that this does not need to be done using a SQL query at all - right now I'm really just in need of some conceptual/architectural help on this one. So I have two tables:
Table category
    Column    |       Type        | Modifiers
--------------+-------------------+-----------
 head_title   | character varying |
 cat_title    | character varying |
 subcat_title | character varying |
 category     | ltree             |
Table test
  Column  |  Type   | Modifiers
----------+---------+-------------------------------------------------------
 id       | integer | not null default nextval('master.test_id_seq'::text)
 category | ltree[] |
... there are other fields in the test table, but these are really the only two relevant to this. The query I want to run against these two tables is something like this: SELECT count(*) as count, category.category, nlevel(category.category) AS level, subpath(category.category,0,nlevel(category.category)-1) as parent, category.head_title, category.cat_title, category.subcat_title FROM test, category WHERE test.category @ category.category GROUP BY category.category, category.head_title, category.cat_title, category.subcat_title Many times the WHERE clause will contain additional search criteria on the 'test' table. What I am trying to get is a count of how many rows from the test table fall into each category, limited by the search criteria. This query is starting to take an enormous amount of time (30+ seconds) and I really need the results of this in a couple seconds tops. I can do a select category from test and it completes in about .5 seconds. The category table currently only has 225 rows, the test table having approximately 30,000. SELECT count(category), category FROM test GROUP BY category is quite slow and I thought of making a materialized view of this, but then of course I don't see any way to make that result limited by my search criteria. I am completely open to re-architecting this entirely, performance of this query is critical to my application - I really just am not sure where to start. It seems like everything I do is worse than what I started with. ... It *almost* seems as if I need to build some sort of search-engine-like tool which performs all the queries against the database, has tons of memory, and caches the category attributes for each record in memory. This sure seems like a lot of work though - I sincerely hope there is an easier way. Thanks for your help, as always! - Greg ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
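One direction to experiment with, sketched here rather than taken from any reply: precompute the per-category counts for the common no-extra-criteria case into a summary table refreshed on a schedule, and fall back to the live join only when additional search conditions are supplied. The containment operator below is an assumption (the one in the original query seems to have lost a character in the archive; <@ is the ltree "array contains a descendant" operator):
CREATE TABLE category_counts (
    category ltree PRIMARY KEY,
    cnt      bigint NOT NULL
);
INSERT INTO category_counts (category, cnt)
SELECT c.category, count(*)
FROM category c, test t
WHERE t.category <@ c.category   -- operator assumed, see note above
GROUP BY c.category;
-- The interactive page then reads ~225 pre-counted rows instead of joining
-- 30,000 test rows against every category:
SELECT cnt, category, nlevel(category) AS level
FROM category_counts;
This trades freshness for speed, of course, so it only covers the case where no extra criteria are applied.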
Re: [GENERAL] Peculiar performance observation....
On Mon, 2005-03-14 at 21:14, Net Virtual Mailing Lists wrote: On Mon, 2005-03-14 at 06:11, Net Virtual Mailing Lists wrote: Hello, I am sorry to bring this up again Does anyone have any idea what might be going on here?... I'm very worried about this situation.. ;-( It looks to me like either you're not analyzing often enough, or your statistics target is too low to get a good sample. Note your estimated versus real rows are off by a factor of 70 (28 est. versus 1943 actual rows). That's a pretty big difference, and where you should be looking. - Seq Scan on jobdata (cost=0.00..7567.88 rows=28 width=52) (actual time=11.498..4800.907 rows=1943 loops=1) Yes, this is because PostgreSQL is using an index to approximate a sequential scan, which is not a good thing since PostgreSQL can't get all the information it needs from just an index, but has to visit the table to check visibility. All of these were after a vacuum full analyze, which I actually do nightly on the database. I probably confused the issue with all of my posts, this is the query which has me concerned. When running it on my system here, the disk thrashes (and I mean *THRASHES*) the entire 12-20 seconds it takes to run... WHen running on our production servers, I can't hear the disk, but see an equally troubling performance loss when using the index. I'll call this query 1: database= explain analyze select id from table1 where category @ 'a.b'; QUERY PLAN - - - Index Scan using table1_category_full_gist_idx on jobdata (cost=0.00..113.48 rows=28 width=4) (actual time=43.814..12201.528 rows=1943 loops=1) Index Cond: (category @ 'a.b'::ltree) Filter: (category @ 'a.b'::ltree) Total runtime: 1.258 ms I can do this to speed things up (this results in very little disk activity, certainly not the thrashing the original query did): create table yuck (id integer, category ltree[]); insert into yuck select id, category from table1; create index category_idx on yuck using gist(category); vacuum analyze yuck; jobs= explain analyze select id from table1 where id in (select id from yuck where category @ 'a.b'); QUERY PLAN - - - Nested Loop (cost=108.64..114.28 rows=1 width=52) (actual time=654.645..1245.212 rows=1943 loops=1) - HashAggregate (cost=108.64..108.64 rows=1 width=4) (actual time=654.202..690.709 rows=1943 loops=1) - Index Scan using category_idx on yuck (cost=0.00..108.57 rows=28 width=4) (actual time=2.046..623.436 rows=1943 loops=1) Index Cond: (category @ 'a.b'::ltree) Filter: (category @ 'a.b'::ltree) - Index Scan using table1_pkey on jobdata (cost=0.00..5.64 rows=1 width=52) (actual time=0.219..0.235 rows=1 loops=1943) Index Cond: (table1.id = outer.id) Total runtime: 1261.551 ms (8 rows) If I drop the index table1_category_full_gist_idx, the query speeds up dramatically (10-15 times faster on both dev and prod uction systems). So my concern, in short: why is it so much slower when actually using an index and why is it trying to make mince meat out of my hard drive? I'll explain it again, sorry if my quoting originally was a bit of a mess. I meant to post the last comment I made after some other comment in your original post that I think I deleted. Anyway, the reason it's slow is that PostgreSQL, unlike most other databases, cannot get the answers from an index. It can only get a pointer to the right place in the table to look for the answer. After that, due to visibility issues caused by the way postgresql implements MVCC, it then has to look IN THE TABLE to find out if the value is visible to your transaction or not. 
So it's going Index then table, then index, then table, then index, then table, for however many rows it's gonna grab. In this case 1943. In query 1, the number of rows being returned by the index scan is 1943, but the planner only thinks it's gonna get back 28. So, with a 70:1 ratio of incorrectness here, the planner thinks an index scan is a good idea. It's not, it's a terrible idea for your table. The problem is likely that the query planner is not getting the right numbers for this table, and I'm not even sure how accurate statistics can be for ltrees, as I've only ever used btree indexes in postgresql. But, upping the statistics target for the column producing this bad behavior and rerunning analyze would be the first thing to try.
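A sketch of that next step (the table and column names are taken from the thread; whether ltree statistics actually improve with a larger sample is an open question, as noted above):
ALTER TABLE table1 ALTER COLUMN category SET STATISTICS 1000;
ANALYZE table1;
-- Then re-run the problem query and compare the estimate with reality; the
-- estimated row count (rows=28 above) should move closer to the actual one (rows=1943).
EXPLAIN ANALYZE SELECT id FROM table1 WHERE category @ 'a.b';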
Re: [GENERAL] Peculiar performance observation....
Hello, I am sorry to bring this up again Does anyone have any idea what might be going on here?... I'm very worried about this situation.. ;-( - Greg Something even more peculiar (at least it seems to me..)... If I drop the index table1_category_gist_idx, I get this: jobs= explain analyze select id from table1 where category @ 'a.b' ORDER BY category; QUERY PLAN - -- Sort (cost=7568.55..7568.62 rows=28 width=52) (actual time=4842.691..4854.468 rows=1943 loops=1) Sort Key: category - Seq Scan on jobdata (cost=0.00..7567.88 rows=28 width=52) (actual time=11.498..4800.907 rows=1943 loops=1) Filter: (category @ 'a.b'::ltree) Total runtime: 4871.076 ms (5 rows) .. no disk thrashing all over the place.. I'm really perplexed about this one..;-( - Greg I have a rather peculiar performance observation and would welcome any feedback on this. First off, the main table (well, part of it.. it is quite large..): Table table1 Column | Type | Modifiers +-- +- id | integer | not null default nextval('master.id_seq'::text) user_id| integer | ... (skipping about 20 columns) category | ltree[] | somedata | text | not null Indexes: table1_pkey primary key, btree (id) table1_category_full_gist_idx gist (category) table1_id_idx btree (id) table1_fti_idx gist (fti) WHERE ((status)::text = 'open'::text) table1_user_id_idx btree (user_id) database= explain analyze select id from table1 where category @ 'a.b'; QUERY PLAN - - - Index Scan using table1_category_full_gist_idx on jobdata (cost=0.00..113.48 rows=28 width=4) (actual time=43.814..12201.528 rows=1943 loops=1) Index Cond: (category @ 'a.b'::ltree) Filter: (category @ 'a.b'::ltree) Total runtime: 1.258 ms If I do this: create table yuck (id integer, category ltree[]); insert into yuck select id, category from table1; create index category_idx on yuck using gist(category); vacuum analyze yuck; jobs= explain analyze select id from table1 where id in (select id from yuck where category @ 'a.b'); QUERY PLAN - - - Nested Loop (cost=108.64..114.28 rows=1 width=52) (actual time=654.645..1245.212 rows=1943 loops=1) - HashAggregate (cost=108.64..108.64 rows=1 width=4) (actual time=654.202..690.709 rows=1943 loops=1) - Index Scan using category_idx on yuck (cost=0.00..108.57 rows=28 width=4) (actual time=2.046..623.436 rows=1943 loops=1) Index Cond: (category @ 'a.b'::ltree) Filter: (category @ 'a.b'::ltree) - Index Scan using table1_pkey on jobdata (cost=0.00..5.64 rows=1 width=52) (actual time=0.219..0.235 rows=1 loops=1943) Index Cond: (table1.id = outer.id) Total runtime: 1261.551 ms (8 rows) In the first query, my hard disk trashes audibly the entire 12 seconds (this is actually the best run I could get, it is usually closer to 20 seconds), the second query runs almost effortlessly.. I've tried reindexing, even dropping the index and recreating it but nothing I do helps at all. Now keep in mind that I do all of my development on painfully slow hardware in order to make any performance issues really stand out. But, I've done this on production servers too with an equal performance improvement noticed. I just can't figure out why this second query is so much faster, I feel like I must have done something very wrong in my schema design or something to be suffering this sort of a performance loss. Any idea what I can do about this? Thanks as always! - Greg ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [GENERAL] Peculiar performance observation....
On Mon, 2005-03-14 at 06:11, Net Virtual Mailing Lists wrote: Hello, I am sorry to bring this up again Does anyone have any idea what might be going on here?... I'm very worried about this situation.. ;-( It looks to me like either you're not analyzing often enough, or your statistics target is too low to get a good sample. Note your estimated versus real rows are off by a factor of 70 (28 est. versus 1943 actual rows). That's a pretty big difference, and where you should be looking. - Seq Scan on jobdata (cost=0.00..7567.88 rows=28 width=52) (actual time=11.498..4800.907 rows=1943 loops=1) Yes, this is because PostgreSQL is using an index to approximate a sequential scan, which is not a good thing since PostgreSQL can't get all the information it needs from just an index, but has to visit the table to check visibility. All of these were after a vacuum full analyze, which I actually do nightly on the database. I probably confused the issue with all of my posts, this is the query which has me concerned. When running it on my system here, the disk thrashes (and I mean *THRASHES*) the entire 12-20 seconds it takes to run... WHen running on our production servers, I can't hear the disk, but see an equally troubling performance loss when using the index. database= explain analyze select id from table1 where category @ 'a.b'; QUERY PLAN - - - Index Scan using table1_category_full_gist_idx on jobdata (cost=0.00..113.48 rows=28 width=4) (actual time=43.814..12201.528 rows=1943 loops=1) Index Cond: (category @ 'a.b'::ltree) Filter: (category @ 'a.b'::ltree) Total runtime: 1.258 ms I can do this to speed things up (this results in very little disk activity, certainly not the thrashing the original query did): create table yuck (id integer, category ltree[]); insert into yuck select id, category from table1; create index category_idx on yuck using gist(category); vacuum analyze yuck; jobs= explain analyze select id from table1 where id in (select id from yuck where category @ 'a.b'); QUERY PLAN - - - Nested Loop (cost=108.64..114.28 rows=1 width=52) (actual time=654.645..1245.212 rows=1943 loops=1) - HashAggregate (cost=108.64..108.64 rows=1 width=4) (actual time=654.202..690.709 rows=1943 loops=1) - Index Scan using category_idx on yuck (cost=0.00..108.57 rows=28 width=4) (actual time=2.046..623.436 rows=1943 loops=1) Index Cond: (category @ 'a.b'::ltree) Filter: (category @ 'a.b'::ltree) - Index Scan using table1_pkey on jobdata (cost=0.00..5.64 rows=1 width=52) (actual time=0.219..0.235 rows=1 loops=1943) Index Cond: (table1.id = outer.id) Total runtime: 1261.551 ms (8 rows) If I drop the index table1_category_full_gist_idx, the query speeds up dramatically (10-15 times faster on both dev and prod uction systems). So my concern, in short: why is it so much slower when actually using an index and why is it trying to make mince meat out of my hard drive? - Greg ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
[GENERAL] Peculiar performance observation....
I have a rather peculiar performance observation and would welcome any feedback on this. First off, the main table (well, part of it.. it is quite large..): Table table1 Column | Type | Modifiers +-- +- id | integer | not null default nextval('master.id_seq'::text) user_id| integer | ... (skipping about 20 columns) category | ltree[] | somedata | text | not null Indexes: table1_pkey primary key, btree (id) table1_category_full_gist_idx gist (category) table1_id_idx btree (id) table1_fti_idx gist (fti) WHERE ((status)::text = 'open'::text) table1_user_id_idx btree (user_id) database= explain analyze select id from table1 where category @ 'a.b'; QUERY PLAN - - - Index Scan using table1_category_full_gist_idx on jobdata (cost=0.00..113.48 rows=28 width=4) (actual time=43.814..12201.528 rows=1943 loops=1) Index Cond: (category @ 'a.b'::ltree) Filter: (category @ 'a.b'::ltree) Total runtime: 1.258 ms If I do this: create table yuck (id integer, category ltree[]); insert into yuck select id, category from table1; create index category_idx on yuck using gist(category); vacuum analyze yuck; jobs= explain analyze select id from table1 where id in (select id from yuck where category @ 'a.b'); QUERY PLAN - - - Nested Loop (cost=108.64..114.28 rows=1 width=52) (actual time=654.645..1245.212 rows=1943 loops=1) - HashAggregate (cost=108.64..108.64 rows=1 width=4) (actual time=654.202..690.709 rows=1943 loops=1) - Index Scan using category_idx on yuck (cost=0.00..108.57 rows=28 width=4) (actual time=2.046..623.436 rows=1943 loops=1) Index Cond: (category @ 'a.b'::ltree) Filter: (category @ 'a.b'::ltree) - Index Scan using table1_pkey on jobdata (cost=0.00..5.64 rows=1 width=52) (actual time=0.219..0.235 rows=1 loops=1943) Index Cond: (table1.id = outer.id) Total runtime: 1261.551 ms (8 rows) In the first query, my hard disk trashes audibly the entire 12 seconds (this is actually the best run I could get, it is usually closer to 20 seconds), the second query runs almost effortlessly.. I've tried reindexing, even dropping the index and recreating it but nothing I do helps at all. Now keep in mind that I do all of my development on painfully slow hardware in order to make any performance issues really stand out. But, I've done this on production servers too with an equal performance improvement noticed. I just can't figure out why this second query is so much faster, I feel like I must have done something very wrong in my schema design or something to be suffering this sort of a performance loss. Any idea what I can do about this? Thanks as always! - Greg ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Peculiar performance observation....
Something even more peculiar (at least it seems to me..)... If I drop the index table1_category_gist_idx, I get this: jobs= explain analyze select id from table1 where category @ 'a.b' ORDER BY category; QUERY PLAN - -- Sort (cost=7568.55..7568.62 rows=28 width=52) (actual time=4842.691..4854.468 rows=1943 loops=1) Sort Key: category - Seq Scan on jobdata (cost=0.00..7567.88 rows=28 width=52) (actual time=11.498..4800.907 rows=1943 loops=1) Filter: (category @ 'a.b'::ltree) Total runtime: 4871.076 ms (5 rows) .. no disk thrashing all over the place.. I'm really perplexed about this one..;-( - Greg I have a rather peculiar performance observation and would welcome any feedback on this. First off, the main table (well, part of it.. it is quite large..): Table table1 Column | Type | Modifiers +-- +- id | integer | not null default nextval('master.id_seq'::text) user_id| integer | ... (skipping about 20 columns) category | ltree[] | somedata | text | not null Indexes: table1_pkey primary key, btree (id) table1_category_full_gist_idx gist (category) table1_id_idx btree (id) table1_fti_idx gist (fti) WHERE ((status)::text = 'open'::text) table1_user_id_idx btree (user_id) database= explain analyze select id from table1 where category @ 'a.b'; QUERY PLAN - - - Index Scan using table1_category_full_gist_idx on jobdata (cost=0.00..113.48 rows=28 width=4) (actual time=43.814..12201.528 rows=1943 loops=1) Index Cond: (category @ 'a.b'::ltree) Filter: (category @ 'a.b'::ltree) Total runtime: 1.258 ms If I do this: create table yuck (id integer, category ltree[]); insert into yuck select id, category from table1; create index category_idx on yuck using gist(category); vacuum analyze yuck; jobs= explain analyze select id from table1 where id in (select id from yuck where category @ 'a.b'); QUERY PLAN - - - Nested Loop (cost=108.64..114.28 rows=1 width=52) (actual time=654.645..1245.212 rows=1943 loops=1) - HashAggregate (cost=108.64..108.64 rows=1 width=4) (actual time=654.202..690.709 rows=1943 loops=1) - Index Scan using category_idx on yuck (cost=0.00..108.57 rows=28 width=4) (actual time=2.046..623.436 rows=1943 loops=1) Index Cond: (category @ 'a.b'::ltree) Filter: (category @ 'a.b'::ltree) - Index Scan using table1_pkey on jobdata (cost=0.00..5.64 rows=1 width=52) (actual time=0.219..0.235 rows=1 loops=1943) Index Cond: (table1.id = outer.id) Total runtime: 1261.551 ms (8 rows) In the first query, my hard disk trashes audibly the entire 12 seconds (this is actually the best run I could get, it is usually closer to 20 seconds), the second query runs almost effortlessly.. I've tried reindexing, even dropping the index and recreating it but nothing I do helps at all. Now keep in mind that I do all of my development on painfully slow hardware in order to make any performance issues really stand out. But, I've done this on production servers too with an equal performance improvement noticed. I just can't figure out why this second query is so much faster, I feel like I must have done something very wrong in my schema design or something to be suffering this sort of a performance loss. Any idea what I can do about this? Thanks as always! - Greg ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[GENERAL] Problem with inherited table, can you help?...
I have the following three tables and my inserts are blocking each other in a way I just can't understand Can someone point me in the direction as to what is causing this? jobs= \d master.locations Table master.locations Column|Type | Modifiers -+- + location_id | integer | not null default nextval('master.locations_location_id_seq'::text) user_id | integer | addr1 | character varying(50) | addr2 | character varying(50) | city| character varying(50) | not null state_id| integer | state_other | character varying(50) | country_id | integer | zip | character varying(35) | not null loc_type| character varying(9)| deleted | boolean | not null entered_dt | timestamp without time zone | not null updated_dt | timestamp without time zone | Check constraints: locations_loc_type CHECK (loc_type::text = 'primary'::text OR loc_type::text = 'secondary'::text) jobs= \d jl_site1.locations Table jl_site1.locations Column|Type | Modifiers -+- + location_id | integer | not null default nextval('master.locations_location_id_seq'::text) user_id | integer | addr1 | character varying(50) | addr2 | character varying(50) | city| character varying(50) | not null state_id| integer | state_other | character varying(50) | country_id | integer | zip | character varying(35) | not null loc_type| character varying(9)| deleted | boolean | not null entered_dt | timestamp without time zone | not null updated_dt | timestamp without time zone | Indexes: locations_pkey primary key, btree (location_id) locations_location_id_key unique, btree (location_id) locations_country_id_idx btree (country_id) locations_state_id_idx btree (state_id) locations_user_id_idx btree (user_id) locations_zip_idx btree (zip) Check constraints: locations_loc_type CHECK (loc_type::text = 'primary'::text OR loc_type::text = 'secondary'::text) Foreign-key constraints: $3 FOREIGN KEY (user_id) REFERENCES jl_site1.customer(id) ON UPDATE CASCADE ON DELETE CASCADE $2 FOREIGN KEY (country_id) REFERENCES countries(country_id) ON DELETE RESTRICT $1 FOREIGN KEY (state_id) REFERENCES states(state_id) ON DELETE RESTRICT Inherits: locations jobs= \d jl_site2.locations Table jl_site2.locations Column|Type | Modifiers -+- + location_id | integer | not null default nextval('master.locations_location_id_seq'::text) user_id | integer | addr1 | character varying(50) | addr2 | character varying(50) | city| character varying(50) | not null state_id| integer | state_other | character varying(50) | country_id | integer | zip | character varying(35) | not null loc_type| character varying(9)| deleted | boolean | not null entered_dt | timestamp without time zone | not null updated_dt | timestamp without time zone | Indexes: locations_pkey primary key, btree (location_id) locations_location_id_key unique, btree (location_id) locations_country_id_idx btree (country_id) locations_state_id_idx btree (state_id) locations_user_id_idx btree (user_id) locations_zip_idx btree (zip) Check constraints: locations_loc_type CHECK (loc_type::text = 'primary'::text OR loc_type::text = 'secondary'::text) Foreign-key constraints: $3 FOREIGN KEY (user_id) REFERENCES jl_site2.customer(id) ON UPDATE CASCADE ON DELETE CASCADE $2 FOREIGN KEY (country_id) REFERENCES countries(country_id) ON DELETE RESTRICT $1 FOREIGN KEY (state_id) REFERENCES states(state_id) ON DELETE RESTRICT Inherits: locations (NOTE: at this point, hopefull it is clear that both jl_site1 and jl_site2 inherit the master.locations table) In connection #1, I do: 1. set search_path=jl_site1,public; 2. BEGIN; 3. INSERT INTO locations
Re: [GENERAL] Disabling triggers in a transaction
Net Virtual Mailing Lists wrote: All I did was added an extra column to my table (I called it batch_process). Then in the trigger do something like (in whichever function you are calling): IF NEW.batch_update IS NOT NULL AND NEW.batch_update = ''t'' THEN NEW.batch_process := NULL; RETURN NULL; END IF; .. whatever the rest of transaction is Why don't you just set it to false instead of NULL? Wouldn't that reduce the condition to just IF NEW.batch_update THEN ...? In that case you should default the column to false of course, or the condition will always fail (the value being NULL). Personally, I would use a more descriptive name for the column, 'disable_triggers' or something like that. Yeah, that's one improvement I meant to make but just haven't gotten around to it.. It is just the way this thing got written the first time during and testing and the oh! It worked! realization.. ;-) Also, I find it more convenient to use true and false instead of having to escape 't' and 'f' all the time ;) Yeah.. ;-) Then when doing an insert, just: INSERT INTO table (..., batch_process) VALUES (..., 't') when you want the trigger not to fire... Or an update: UPDATE TABLE table SET , batch_process = 't' I'm not sure sure how to make it work on a function called from a delete trigger though.. ;-( The drawbacks of this method are that you'll have to modify all your queries when you want to disable triggers (though that can usually be solved programatically), and that only the triggers that support this method of disabling will be actually disabled. It seems like you would have to do something programatically anyways in order to say Okay, now I want to disable the triggers -- go do something If you work at the same project with multiple people who all write triggers from time to time, or when you have to deal with legacy code from an older database, I think you'll run into trouble with the above quicker than you'd like. However, if you manage to get this into the design fase of a project it'll probably work just fine (the delete problem you mentioned aside...). I think any of the solutions I've seen mentioned so far would present the same problem. :-( -- Alban Hertroys MAG Productions T: +31(0)53 4346874 F: +31(0)53 4346876 E: [EMAIL PROTECTED] W: http://www.magproductions.nl ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [GENERAL] Disabling triggers in a transaction
It is the only known way to control triggers though it isn't regularly tested by the developers. I think I've come up with another way.. I posted this recently, but did not get any feedback on it so I'm not sure how dumb it is... It is working really great for me though. All I did was add an extra column to my table (I called it batch_process). Then in the trigger do something like (in whichever function you are calling): IF NEW.batch_update IS NOT NULL AND NEW.batch_update = ''t'' THEN NEW.batch_process := NULL; RETURN NULL; END IF; .. whatever the rest of the trigger is Then when doing an insert, just: INSERT INTO table (..., batch_process) VALUES (..., 't') when you want the trigger not to fire... Or an update: UPDATE table SET ..., batch_process = 't' I'm not sure how to make it work on a function called from a delete trigger though.. ;-( - Greg ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
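Pulled together as a complete, simplified sketch of the flag-column idea (the table and trigger names are invented, and the flag column is consistently called batch_process here). Unlike the fragment above it returns NEW rather than NULL when the flag is set, since in a BEFORE trigger returning NULL would silently cancel the insert or update itself rather than just skipping the extra work:
CREATE TABLE my_table (some_column text, batch_process boolean);
CREATE OR REPLACE FUNCTION my_table_trg() RETURNS trigger AS '
BEGIN
    -- When the caller sets the flag, clear it (so it is not stored) and skip
    -- the rest of the trigger body, but still let the row through.
    IF NEW.batch_process IS NOT NULL AND NEW.batch_process = ''t'' THEN
        NEW.batch_process := NULL;
        RETURN NEW;
    END IF;
    -- ... normal trigger work goes here ...
    RETURN NEW;
END;
' LANGUAGE plpgsql;
CREATE TRIGGER my_table_batch_aware
    BEFORE INSERT OR UPDATE ON my_table
    FOR EACH ROW EXECUTE PROCEDURE my_table_trg();
-- Bulk load with the trigger body skipped:
INSERT INTO my_table (some_column, batch_process) VALUES ('value', 't');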