Re: [PERFORM] SQL With Dates
It sounds like what you're doing is comparing the planner's cost estimates from running EXPLAIN on a few different queries. The planner's cost estimate was never intended for that; it's not an absolute scale of cost, it's just a tool the planner uses to make relative comparisons between logically equivalent plans. The actual number the planner spits out is meaningless in an absolute sense: it's entirely possible for one query with an estimated cost of 1 to run slower than a query with an estimated cost of 100.

What you actually need to do is compare the real running time of the queries to see which ones are actually problematic. For that, you'd do better using a tool like pgFouine to look at actual performance trends.

-- Mark

On Mon, 2009-04-20 at 10:55 -0300, Rafael Domiciano wrote:
> Hello People, I have started work to review the SQLs of our internal software. A lot of the problems are about SQL logic, unnecessary joins, and so on. But the software has a lot of SQL with dates, doing things like:
> [..] date >= '2009-04-01' AND date <= '2009-04-15' [..]
> Redoing the SQL with a fixed date (date = '2009-04-01'), the cost in EXPLAIN always stays around 200 or less. But with a period the cost is high, about 6000 or more. The select is using an index, and the date is using an index too. Is there some way to use a date period with less cost?
> Rafael Domiciano

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
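To see real running times rather than planner estimates, run EXPLAIN ANALYZE, which actually executes the query. A minimal sketch against a hypothetical table:

```sql
-- Estimated cost only; the number is unitless and only meaningful
-- relative to other plans for the same data:
EXPLAIN
SELECT * FROM orders WHERE order_date BETWEEN '2009-04-01' AND '2009-04-15';

-- Executes the query and reports real elapsed time per plan node:
EXPLAIN ANALYZE
SELECT * FROM orders WHERE order_date BETWEEN '2009-04-01' AND '2009-04-15';
```

The "actual time" figures in the second output are what to compare across queries, not the cost numbers.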
Re: [PERFORM] multicolumn indexes still efficient if not fully stressed?
On Mon, 2009-01-12 at 18:49 +0100, Jörg Kiegeland wrote:
> Hello, I created a multicolumn index on the columns c_1,..,c_n. If I use only a true subset of these columns in a SQL query, is the index still efficient? Or is it better to create another multicolumn index defined on this subset? Thanks for any comments!

Why would you create a multicolumn index on all the columns if that's not what you actually query on? The order of columns matters for multicolumn indexes. Multicolumn indexes work best for queries that use all of the columns in the index, but they can also be helpful if at least the leftmost columns in the index are specified in the query.

So it depends on the order. If the index is defined on (c_1, c_2, c_3, c_4) and your query includes WHERE c_2=val AND c_3=val AND c_4=val, then the index is almost certainly useless. On the other hand, if you were to query WHERE c_1=val, then if c_1 is highly selective the index would still help. See here: http://www.postgresql.org/docs/8.3/interactive/indexes-multicolumn.html

-- Mark Lewis
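The leftmost-prefix rule above can be sketched in SQL (table and column names are hypothetical):

```sql
CREATE INDEX idx_c ON t (c_1, c_2, c_3, c_4);

-- Can use idx_c: the leftmost column(s) are constrained.
SELECT * FROM t WHERE c_1 = 42;
SELECT * FROM t WHERE c_1 = 42 AND c_2 = 7;

-- Almost certainly cannot use idx_c effectively: c_1 is unconstrained,
-- so the matching rows are scattered throughout the index.
SELECT * FROM t WHERE c_2 = 7 AND c_3 = 1 AND c_4 = 9;
```

Running EXPLAIN on each query shows whether the planner actually chose an index scan.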
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
On Tue, 2008-08-26 at 18:44 +0200, henk de wit wrote:
> Hi, we're currently having a problem with queries on a medium-sized table. This table is 22 GB in size (via select pg_size_pretty(pg_relation_size('table'));). It has 7 indexes, which bring the total size of the table to 35 GB (measured with pg_total_relation_size). On this table we're inserting records at a relatively low frequency of +- 6~10 per second. We're using PG 8.3.1 on a machine with two dual-core 2.4 GHz Xeon CPUs, 16 GB of memory and Debian Linux. The machine is completely devoted to PG; nothing else runs on the box.
> Lately we're getting a lot of exceptions from the Java process that does these inserts: "An I/O error occured while sending to the backend." No other information is provided with this exception (besides the stack trace of course). The pattern is that for about a minute, almost every insert to this 22 GB table results in this exception. After this minute everything is suddenly fine and PG happily accepts all inserts again. We tried to nail the problem down, and it seems that every time this happens, a select query on this same table is in progress. This select query starts right before the insert problems begin, and most often right after this select query finishes executing, inserts are fine again. Sometimes though inserts only fail in the middle of the execution of this select query. E.g. if the select query starts at 12:00 and ends at 12:03, inserts fail from 12:01 to 12:02. We have spent a lot of hours getting to the bottom of this, but our ideas for fixing this problem are more or less exhausted at the moment. I wonder if anyone recognizes this problem and could give some pointers to stuff that we could investigate next. Thanks a lot in advance.

If the select returns a lot of data and you haven't enabled cursors (by calling setFetchSize), then the entire SQL response will be loaded in memory at once, so there could be an out-of-memory condition on the client.
Or if the select uses sorts and PG thinks it has access to more sort memory than is actually available on the system (due to ulimits, physical memory restrictions, etc.), then you could run into problems that look like out-of-memory errors on the server. It could also be something else entirely, such as exceeding your max connections.

A really good place to start would be to enable tracing on the JDBC driver. Look at the docs for the PostgreSQL JDBC driver for how to enable logging; that should give you a much better picture of exactly where and what is failing. If the issue is server-side, then you will also want to look at the PostgreSQL logs on the server; anything as serious as a backend aborting should write an entry in the log.

-- Mark
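Calling setFetchSize on the JDBC statement makes the driver fetch rows incrementally through a server-side portal instead of buffering the whole result set. The behavior is roughly equivalent to this plain-SQL cursor pattern (cursor name and batch size are illustrative):

```sql
BEGIN;  -- cursors only live inside a transaction
DECLARE big_scan CURSOR FOR SELECT * FROM big_table;
FETCH 1000 FROM big_scan;   -- repeat until no rows come back
FETCH 1000 FROM big_scan;
CLOSE big_scan;
COMMIT;
```

With this approach the client only ever holds one batch of rows in memory at a time.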
Re: [PERFORM] PostgreSQL+Hibernate Performance
On Thu, 2008-08-21 at 12:33 +0530, Kranti K K Parisa™ wrote:
> On Wed, Aug 20, 2008 at 8:54 PM, Matthew Wakeling [EMAIL PROTECTED] wrote:
>> On Wed, 20 Aug 2008, Kranti K K Parisa™ wrote: will creating multiple indexes on the same columns affect performance? For example: index1: column1, column2, column3; index2: column1; index3: column2; index4: column3; index5: column1, column2
>> The sole purpose of indexes is to affect performance. However, if you have index1, there is no point in having index2 or index5. Matthew
> Thanks Matthew, does that mean I can just have index1, index3 and index4?

(trying to get the thread back into newest-comments-last order)

Well, yes, you can get away with just index1, index3 and index4, and it may well be the optimal solution for you, but it's not entirely clear-cut.

It's true that PG can use index1 to satisfy queries of the form SELECT x FROM y WHERE column1=somevalue or column1=a AND column2=b. It will not be as fast as a lookup in a single-column index, but depending on the size of the tables/indexes and the selectivity of the leading column(s) in the index, the difference in speed may be trivial.

On the other hand, if you have individual indexes on column1, column2 and column3 but no multi-column index, PG can combine the individual indexes in memory with a bitmap. This is not as fast as a normal lookup in the multi-column index would be, but can still be a big win over not having an index at all.

To make an educated decision you might want to read over some of the online documentation about indexes, in particular these two sections: http://www.postgresql.org/docs/8.3/interactive/indexes-multicolumn.html and http://www.postgresql.org/docs/8.3/interactive/indexes-bitmap-scans.html

-- Mark
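The bitmap combination of single-column indexes can be observed directly with EXPLAIN; a sketch using invented names:

```sql
CREATE INDEX idx1 ON y (column1);
CREATE INDEX idx3 ON y (column2);
CREATE INDEX idx4 ON y (column3);

-- With suitable data volumes, the plan should show several
-- "Bitmap Index Scan" nodes joined by a "BitmapAnd" node:
EXPLAIN
SELECT x FROM y WHERE column1 = 1 AND column2 = 2 AND column3 = 3;
```

Whether the planner actually chooses BitmapAnd depends on row counts and selectivity, so treat this as a way to check, not a guarantee.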
Re: [PERFORM] PostgreSQL+Hibernate Performance
On Wed, 2008-08-20 at 17:55 +0530, Kranti K K Parisa™ wrote:
> Hi, can anyone suggest performance tips for PostgreSQL using Hibernate? One of the queries: PostgreSQL has an INDEX concept and Hibernate also has column INDEXes. Which is better among them? Or is creating either of them enough? Or do we need to create both of them? And any more performance aspects?

Hibernate is a library for accessing a database such as PostgreSQL. It does not offer any add-on capabilities to the storage layer itself. So when you tell Hibernate that a column should be indexed, all that it does is create the associated PostgreSQL index when you ask Hibernate to build the DB tables for you. This is part of Hibernate's effort to protect you from the implementation details of the underlying database, in order to make supporting multiple databases with the same application code easier.

So there is no performance difference between a PG index and a Hibernate column index, because they are the same thing.

The most useful Hibernate performance-tuning advice isn't PG-specific at all; there are just things that you need to keep in mind when developing for any database to avoid pathologically bad performance. Those tips are really beyond the scope of this mailing list; Google is your friend here.

I've been the architect for an enterprise-class application for a few years now using PostgreSQL and Hibernate together in a performance-critical context, and honestly I can't think of one time that I've been bitten by a PG-specific performance issue (I've seen a lot of performance issues with Hibernate that affected all databases, though; you need to know what you're doing to make Hibernate apps that run fast). If you do run into problems, you can figure out the actual SQL that Hibernate is issuing and run the normal PostgreSQL EXPLAIN ANALYZE on it; the culprit is usually a missing index.
-- Mark
Re: [PERFORM] PostgreSQL+Hibernate Performance
The tradeoffs for multiple indexes are more or less as follows:

1. Having the right indexes makes queries faster, often dramatically so.
2. But more indexes make inserts/updates slower, although generally not dramatically slower.
3. Each index requires disk space. With several indexes, you can easily have more of your disk taken up by indexes than by actual data.

I would be careful to only create the indexes you need, but it's probably worse to have too few indexes than too many. Depends on your app though.

-- Mark

On Wed, 2008-08-20 at 20:40 +0530, Kranti K K Parisa™ wrote:
> Will creating multiple indexes on the same columns affect performance? For example: index1: column1, column2, column3; index2: column1; index3: column2; index4: column3; index5: column1, column2. I am trying to fire SQL queries keeping these columns in the WHERE conditions, and the possibilities are like the above. If we create such indexes, will it affect performance? And what is the best way to go in this case?
> On Wed, Aug 20, 2008 at 8:10 PM, Mark Lewis [EMAIL PROTECTED] wrote: [snip]
-- Best Regards, Kranti Kiran Kumar Parisa
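The disk-space cost of indexes (point 3 above) can be checked directly; a sketch with a hypothetical table name:

```sql
-- Table data vs. everything else (indexes plus TOAST overhead,
-- so treat the second number as an approximation of index size):
SELECT pg_size_pretty(pg_relation_size('some_table'))       AS table_size,
       pg_size_pretty(pg_total_relation_size('some_table')
                      - pg_relation_size('some_table'))     AS indexes_and_toast;
```

If the second column dwarfs the first, that's a hint some indexes may not be earning their keep.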
Re: [PERFORM] PostgreSQL+Hibernate Performance
Yes, we use connection pooling. As I recall, Hibernate ships with c3p0 connection pooling built in, which is what we use. We were happy enough with c3p0 that we ended up moving our other, non-Hibernate apps over to it, away from DBCP.

pgpool does connection pooling at the socket level instead of in a local library, so it's really a very different thing. If your app is the only thing talking to this database and you don't have a multi-database configuration, then it will be easier for you to use a Java-based connection pooling library like c3p0 or DBCP than to use pgpool.

-- Mark

On Wed, 2008-08-20 at 20:32 +0530, Kranti K K Parisa™ wrote:
> Hi Mark, thank you very much for the information. I will analyse the DB structure and create indexes on PG directly. Are you using any connection pooling like DBCP, or pgpool?
> Regards, KP
> On Wed, Aug 20, 2008 at 8:05 PM, Mark Lewis [EMAIL PROTECTED] wrote: [snip]
-- Mark
-- Best Regards, Kranti Kiran Kumar Parisa
Re: [PERFORM] Hardware question for a DB server
What type of usage does it need to scale for? How many concurrent connections? What size database? Data warehousing or OLTP-type workloads? Ratio of reads to writes? Do you care about losing data?

One question that's likely going to be important, depending on your answers above, is whether or not you're getting a battery-backed write cache for that ServeRAID-8K.

-- Mark Lewis

On Wed, 2008-03-12 at 19:58 +0100, Pascal Cohen wrote:
> Hello, we plan to buy a dedicated server to host our database. Here is the proposal I was given (with a second identical server for backup using log shipping):
> IBM X3650 (this is a 2U server, can hold 8 drives)
> 2 x QC Xeon E5450 (3.0GHz, 12MB L2, 1333MHz, 80W)
> 8 x 2GB RAM (16GB total)
> 2.5" SAS hotswap
> ServeRAID-8K SAS controller
> 8 x 73GB 15K 2.5" SAS drives
> CD/DVD drive
> Remote Supervisor Adapter II Slimline
> Redundant power
> 4-year, 24x7 2-hour support/warranty
> I would like specialists' advice. If you need additional details, please let me know. Thanks in advance for your help. Pascal
Re: [PERFORM] Query slows after offset of 100K
Michael, our application had a similar problem, and what we did to avoid having people click into the middle of 750k records was to show the first page with forward/back links but no link to go to the middle. So people could manually page forward as far as they want, but nobody is going to sit there clicking "next" 37k times. We have several thousand users and none of them complained about the change. Maybe that's because at the same time we made that change we also improved the rest of the searching/filtering interface. But I think that people really don't need to jump to the middle of the records anyway, as long as you have decent search abilities.

If you wanted to keep your same GUI, one workaround would be to periodically update a table which maps page number to the first unique key on each page. That decouples the expensive work of generating the page offsets from the query itself, so if your data changes fairly infrequently it might be appropriate. Sort of a materialized-view type approach.

If you can be approximate in your GUI you can do a lot more with this optimization: if people don't necessarily need to go directly to page 372898 but would be satisfied with a page roughly 47% of the way into the massive result set (think of a GUI slider), then you wouldn't need to update the lookup table as often even if the data changed frequently, because adding a few thousand records to a 750k-row result set is statistically insignificant. Your markers wouldn't need to be updated very frequently, and you wouldn't need to store a marker for each page; maybe only 100 markers spread evenly across the result set would be sufficient.

-- Mark Lewis

On Thu, 2008-02-14 at 19:49, Michael Lorenz wrote:
> Fair enough, and I did think of this as well. However, I didn't think this was a viable option in my case, since we're currently allowing the user to randomly access the pages (so $lastkey wouldn't really have any meaning).
> The user can choose to sort on object ID, name or modification time, and then go straight to any page in the list. With 750K records, that's around 37K pages. Maybe a better way to phrase my question is: how can I paginate my data on 3 different keys which allow random access to any given page, and still get reasonable performance? Should I just force the user to limit their result set to some given number of records before allowing any paginated access? Or is it just not practical, period? Thanks, Michael Lorenz
>
> On Thu, 14 Feb 2008 14:08:15 -0500, [EMAIL PROTECTED] wrote:
>> Michael Lorenz writes: My query is as follows:
>> SELECT o.objectid, o.objectname, o.isactive, o.modificationtime
>> FROM object o
>> WHERE ( o.deleted = false OR o.deleted IS NULL )
>> AND o.accountid = 111
>> ORDER BY 2
>> LIMIT 20 OFFSET 1;
>> This is guaranteed to lose --- huge OFFSET values are never a good idea (hint: the database still has to fetch those rows it's skipping over). A saner way to do pagination is to remember the last key you displayed and do something like WHERE key > $lastkey ORDER BY key LIMIT 20, which will allow the database to go directly to the desired rows, as long as you have an index on the key. You do need a unique ordering key for this to work, though.
>> regards, tom lane

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
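The keyset-pagination idea from the thread above, adapted to the example query with objectid as a unique tie-breaker (the literal values are placeholders for the last row shown on the previous page):

```sql
-- Offset pagination: cost grows linearly with the offset,
-- because the skipped rows are still fetched.
SELECT objectid, objectname
FROM object
ORDER BY objectname, objectid
LIMIT 20 OFFSET 100000;

-- Keyset pagination: seek straight to the next page via an index
-- on (objectname, objectid). Row-value comparison works on 8.2+.
SELECT objectid, objectname
FROM object
WHERE (objectname, objectid) > ('last-name-shown', 12345)
ORDER BY objectname, objectid
LIMIT 20;
```

The tie-breaker column is what makes the ordering unique, which Tom Lane's note says is required for this to work.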
Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas
> I was thinking of not even reading the file content from the controller thread, just deciding splitting points in bytes (0..ST_SIZE/4, ST_SIZE/4+1..2*ST_SIZE/4, etc.) and letting the reading thread fine-tune by beginning to process input only after having read the first newline, etc.

> The problem I was pointing out is that if chunk #2 moved forward a few bytes before it started reading in search of a newline, how will chunk #1 know that it's supposed to read up to that further point? You have to stop #1 from reading further when it catches up with where #2 started. Since the start of #2 is fuzzy until some reading is done, what you're describing will need #2 to send some feedback to #1 after they've both started, and that sounds bad to me. I like designs where the boundaries between threads are clearly defined before any of them start and none of them ever talk to the others.

I don't think any communication is needed beyond the beginning of the threads. Each thread knows that it should start at byte offset X and end at byte offset Y, but if Y happens to be in the middle of a record then it just keeps going until the end of the record. As long as the algorithm for reading past the end marker is the same as the algorithm for skipping past the beginning marker, all is well.

-- Mark Lewis
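A minimal sketch of that boundary rule, in Python rather than pgloader's actual code (function and variable names are invented): each worker gets a byte range [start, end), skips forward past the partial record at its start, and reads past end to finish the last record it began. Because the skip rule and the overrun rule use the same newline search, every record is handled exactly once with no inter-thread communication.

```python
def worker_records(data: bytes, start: int, end: int):
    """Return the newline-terminated records a worker assigned the byte
    range [start, end) would process: all records that *begin* in it."""
    if start == 0:
        pos = 0
    else:
        # Skip the record that straddles our start; it belongs to the
        # previous worker. Searching from start-1 keeps a record that
        # begins exactly at `start`.
        nl = data.find(b"\n", start - 1)
        if nl == -1:
            return []
        pos = nl + 1
    out = []
    while pos < end and pos < len(data):
        nl = data.find(b"\n", pos)
        if nl == -1:          # unterminated final record
            out.append(data[pos:])
            pos = len(data)
        else:                 # read to end of record, even past `end`
            out.append(data[pos:nl + 1])
            pos = nl + 1
    return out
```

Splitting a file into N equal byte ranges and running this per range yields each record exactly once, regardless of where the cut points fall.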
Re: [PERFORM] Autovacuum running out of memory
On Tue, 2007-10-16 at 10:14 -0400, Jason Lustig wrote:
> I ran ulimit -a for the postgres user, and here's what I got:
> ...
> max memory size (kbytes, -m) 20
> open files (-n) 100
> max user processes (-u) 100
> virtual memory (kbytes, -v) 20
> ...

These settings are all quite low for a dedicated database server; they would be more appropriate for a small development instance of PG sharing a machine with several other processes. Others have commented on the memory settings, but depending on the maximum number of connections you expect to have open at any time, you may want to consider increasing the max user processes and open files settings as well.

-- Mark Lewis
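On most Linux systems the persistent place to raise these per-user limits is /etc/security/limits.conf; the values below are only illustrative, not recommendations for any particular workload:

```
# /etc/security/limits.conf -- example entries for the postgres user
postgres  soft  nofile  8192
postgres  hard  nofile  8192
postgres  soft  nproc   4096
postgres  hard  nproc   4096
```

After editing, the limits take effect on the next login session for the user; verify with `ulimit -a` as the postgres user.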
Re: [PERFORM] quickly getting the top N rows
On Thu, 2007-10-04 at 11:00 -0700, Ben wrote:
> If I have this:
> create table foo (bar int primary key);
> ...then in my ideal world, Postgres would be able to use that index on bar to help me with this:
> select bar from foo order by bar desc limit 20;
> But in my experience, PG 8.2 is doing a full table scan on foo, then sorting it, then doing the limit. I have a more complex primary key, but I was hoping the same concept would still apply. Am I doing something wrong, or just expecting something that doesn't exist?

It has to do with the way that NULL values are stored in the index. This page has details and instructions for how to get it to work: http://developer.postgresql.org/pgdocs/postgres/indexes-ordering.html

-- Mark Lewis
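The linked page describes the ordering options added in 8.3, where you can build the index to match the query's sort exactly; a sketch:

```sql
-- 8.3 and later: declare the ordering (and NULL placement) in the index,
-- so ORDER BY bar DESC LIMIT 20 becomes a short index scan with no sort:
CREATE INDEX foo_bar_desc ON foo (bar DESC NULLS LAST);

SELECT bar FROM foo ORDER BY bar DESC LIMIT 20;
```

Checking the EXPLAIN output should show an index scan with no separate Sort node once the index ordering matches the query.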
Re: [PERFORM] Low CPU Usage
On Wed, 2007-09-19 at 10:38 -0700, [EMAIL PROTECTED] wrote:
> Hi all. Recently I have installed a brand new server with a Pentium IV 3.2 GHz, SATA disk, and 2GB of RAM, on Debian 4.0r1 with PostgreSQL 8.2.4 (previously 8.1.9). I have another similar server with an IDE disk, Red Hat EL 4 and PostgreSQL 8.2.3. I have almost the same postgresql.conf on both servers (I have more work_mem on the new one), but on the new one things go really slow. I began to monitor disk i/o and it's really OK; I have tested the disk with hdparm and it's 5 times faster than the IDE one. Running the same queries, the new server takes almost 4 minutes instead of the 18 seconds on the old one. Both databases are the same; I have vacuumed them and I don't know how to manage this issue. The only weird thing is that the older server uses 30% of CPU running the query, instead of the 3 or 5% on the new one!!! What is happening with this server? I upgraded from 8.1.9 to 8.2.4 trying to solve this issue, but I can't find a solution. Any ideas?

It could be a planning issue. Have you analyzed the new database to gather up-to-date statistics? A comparison of EXPLAIN ANALYZE results for an example query on both systems should answer that one.

Another possibility, because you're dealing with lower-end drives, is that you have a case of one drive lying about fsync where the other is not. If possible, try running your test with fsync=off on both servers. If there's a marked improvement on the new server but no significant change on the old server, then you've found your culprit.

-- Mark Lewis
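Both checks above are quick to run; the query here is a hypothetical stand-in for one of the slow ones:

```sql
-- Refresh planner statistics for the whole database, then compare
-- the real plans and timings on both servers:
ANALYZE;
EXPLAIN ANALYZE SELECT * FROM some_table WHERE some_col = 42;

-- For the fsync test, edit postgresql.conf and reload
-- (testing only; never run production with this off):
--   fsync = off
```

If the plans differ between servers, it's a statistics/planning issue; if the plans match but fsync=off levels the timings, one drive is honoring fsync and the other is not.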
Re: [PERFORM] Index files
On Sat, 2007-09-15 at 01:51 +0530, Harsh Azad wrote:
> Great, creating a new tablespace for indexes worked! Now the question is whether existing tables/indexes can be moved to the new tablespace using an ALTER command, or is the only way to drop and recreate them?

You can alter an existing index: http://www.postgresql.org/docs/8.2/static/sql-alterindex.html
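Concretely, no drop/recreate is needed; the tablespace and object names below are invented:

```sql
CREATE TABLESPACE fast_disks LOCATION '/mnt/fast_disks/pgdata';

-- Moves the index files to the new tablespace in place:
ALTER INDEX my_index SET TABLESPACE fast_disks;

-- Existing tables can be moved the same way:
ALTER TABLE my_table SET TABLESPACE fast_disks;
```

Note that moving an object rewrites its files, so the object is locked for the duration.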
Re: [PERFORM] SAN vs Internal Disks
On Thu, 2007-09-06 at 18:05 +0530, Harsh Azad wrote:
> Hi, we are currently running our DB on a dual-core, dual-proc 3 GHz Xeon, 8GB RAM, 4x SAS 146 GB 15K RPM on RAID 5. The current data size is about 50GB, but we want to purchase hardware that scales to about 1TB, as we think our business will need to support that much soon.
> - Currently we have 80% read and 20% write percentages.
> - Currently with this configuration the database is showing signs of overloading.
> - Autovacuum etc. run on this database; VACUUM FULL runs nightly.
> - Currently CPU load is about 20%, memory utilization is full (but this is also due to Linux caching disk blocks) and IO waits are frequent.
> - We have a load of about 400 queries per second.
> Now we are considering purchasing our own servers and in the process are facing the usual dilemmas. First I'll list out what machine we have decided to use: 2x quad Xeon 2.4 GHz (4-way, only 2 populated right now), 32 GB RAM, OS-only storage of 2x SCSI 146 GB 15k RPM on RAID-1 (data storage discussed below). We have already decided to split our database across 3 machines on the basis of disjoint sets of data, so we will be purchasing three of these boxes.
> HELP 1: Does something look wrong with the above configuration? I know there will be small differences between Opteron and Xeon, but do you think there is something against going for 2.4GHz quad Xeons (Clovertown, I think)?
> HELP 2: The main confusion is with regard to data storage. We have the option of going for:
> A: an IBM N3700 SAN box with 12x FC 300GB disks, partitioned into 3 disks in RAID-4 for WAL/backup and 9 disks in RAID-DP for data, plus 2 hot spares. We are also considering a similar solution from EMC, the CX310C.
> B: internal or DAS-based storage. Here for each server we should be able to have 2x disks on RAID-1 for logs, 6x disks on RAID-10 for tablespace1 and 6x disks on RAID-10 for tablespace2. Or maybe 12x disks in a single RAID-10 tablespace.

What do I think? Well...
> SAN wins on manageability, replication (say to a DR site), backup, etc. DAS wins on cost. But setting these aside for a moment, I wanted to discuss, purely on the performance side, which one is the winner. It feels like internal disks will perform better, but I need to understand the rough magnitude of the difference in performance to see if it's worth losing the manageability features. Also, if we choose to go with DAS, what would be the best tool for async replication to a DR site, and maybe even, as an extra plus, a second read-only DB server to distribute select loads?

Sounds like a good candidate for Slony replication for backups / read-only slaves.

I haven't seen a SAN yet whose DR/replication facilities are on par with a good database replication solution. My impression is that those facilities are mostly for file servers, mail servers, etc. It would be difficult for a SAN to properly replicate a database given the strict ordering, size and consistency requirements for the data files. Not impossible, but in my limited experience I haven't found one that I trust to do it reliably either, vendor boastings to the contrary notwithstanding. (Hint: make sure you know exactly what your vendor's definition of the term 'snapshot' really means.)

So before you invest in a SAN, make sure that you're actually going to be able to (and want to) use all the nice management features you're paying for. We have some SANs that are basically acting just as expensive external RAID arrays, because we do the database replication/backup in software anyway.

-- Mark Lewis
Re: [PERFORM] SAN vs Internal Disks
On Thu, 2007-09-06 at 22:28 +0530, Harsh Azad wrote:
> Thanks, Mark. If I replicate a snapshot of the data and log files (basically the entire PG data directory) and I maintain the same version of Postgres on both servers, it should work, right? I am also thinking that having SAN storage will give me the facility of keeping a warm standby DB: by just shutting one server down and starting the other, mounting the same file system, I should be able to bring my DB up when the primary incurs a physical failure. I'm only considering SAN storage for this feature. Has anyone ever used a SAN for replication and warm standby on Postgres? Regards, Harsh

We used to use a SAN for warm standby of a database, but with Oracle and not PG. It worked kinda sorta, except for occasional crashes due to buggy drivers. But after going through the exercise, we realized that we hadn't gained anything over just doing master/slave replication between two servers, except that it was more expensive, had a tendency to expose buggy drivers, had a single point of failure in the SAN array, took longer to fail over, and we couldn't use the warm standby server to perform read-only queries. So we reverted and just used the SAN as expensive DAS, and set up a separate box for DB replication.

So if that's the only reason you're considering a SAN, then I'd advise you to spend the extra money on more DAS disks. Maybe I'm jaded by past experiences, but the only real use case I can see to justify a SAN for a database would be something like Oracle RAC, and I'm not aware of any PG equivalent to that.

-- Mark Lewis
Re: [PERFORM] join tables vs. denormalization by trigger
On Tue, 2007-09-04 at 20:53 +0200, Walter Mauritz wrote: Hi, I wonder about differences in performance between two scenarios: Background: Table A, ~50,000 records Table B, ~3,000,000 records (~20 cols) Table C, ~30,000,000 records (~10 cols) a query every 3 sec. with limit 10 Table C depends on Table B which depends on Table A, int8 foreign key, btree index * consider it a read-only scenario (load data only at night, with time for vacuum analyze daily) * I'm required to show records from Table C, but also with some (~5 cols) info from Table B * where clause always contains the foreign key to Table A * where clause may contain further 1-10 search parameters Scenario A) simply inner join Table B + C Scenario B) with use of a trigger on insert/update I could push the required information from table B down to table C - so I would only need to select from table C. My question: 1) From your experience ... how much faster (approximately), in percent, do you regard Scenario B compared to A? You're assuming that B is always going to be faster than A, which certainly isn't a foregone conclusion. Let's say that you average 10 bytes per column. In scenario A, the total data size is then roughly 3,000,000 * 20 * 10 + 30,000,000 * 10 * 10 = 3.6 GB. In scenario B, due to your denormalization, the total data size is more like 30,000,000 * 30 * 10 = 9 GB, or 2.5 times more raw data. That's a lot of extra disk IO, unless your database will always fit in memory in both scenarios. Although you didn't provide enough data to answer with certainty, I would go on the assumption that A is going to be faster than B. But even if it weren't, remember that premature optimization is the root of all evil. If you try A and it doesn't perform fast enough, then you can always try B later to see if it works any better. -- Mark Lewis
Re: [PERFORM] 8.2 Autovacuum BUG ?
On Fri, 2007-08-31 at 12:25 -0400, Pallav Kalva wrote: Can you please correct me if I am wrong? I want to understand how this works. Based on what you said, it will run autovacuum again when it passes 200M transactions, as SELECTs are transactions too and are going on against these tables. But the next time it runs autovacuum, it shouldn't freeze the tuples again, as they are already frozen, and won't generate a lot of archive logs? Or is this because it ran autovacuum for the first time on this db? Does it only do this process the first time? That is correct. The tuples are now frozen, which means that they will not need to be frozen ever again unless you insert/update any records.
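To see which tables are approaching the next freeze-driven vacuum (autovacuum_freeze_max_age defaults to 200 million transactions in 8.2), you can inspect the per-table frozen-XID age. A rough sketch, assuming an 8.2 server and sufficient privileges:

```sql
-- Tables with the oldest relfrozenxid will be frozen next;
-- after a freeze pass their age drops back toward zero.
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;
```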
Re: Fwd: [PERFORM] 8.2 Query 10 times slower than 8.1 (view-heavy)
On Fri, 2007-08-31 at 19:39 -0400, Tom Lane wrote: I wrote: Mark Lewis [EMAIL PROTECTED] writes: We've been holding back from upgrading to 8.2 because this one is a show-stopper for us. Well, you could always make your own version with this patch reverted: http://archives.postgresql.org/pgsql-committers/2006-11/msg00066.php I might end up doing that in the 8.2 branch if a better solution seems too large to back-patch. I thought of a suitably small hack that should cover at least the main problem without going so far as to revert that patch entirely. What we can do is have the IS NULL estimator recognize when the clause is being applied at an outer join, and not believe the table statistics in that case. I've applied the attached patch for this --- are you interested in trying it out on your queries before 8.2.5 comes out? Wish I could, but I'm afraid that I'm not going to be in a position to try out the patch on the application that exhibits the problem for at least the next few weeks. -- Mark
Re: [PERFORM] [Solved] Postgres performance problem
Perhaps you had a long-running transaction open (probably a buggy or hung application) that was preventing dead rows from being cleaned up. Restarting PG closed the offending connection and rolled back the transaction, which allowed vacuum to clean up all the dead rows. If you're not running regular VACUUMs at all but are instead exclusively running VACUUM FULL, then I don't think you would see warnings about running out of fsm entries, which would explain why you did not notice the bloat. I haven't confirmed that though, so I might be wrong. -- Mark Lewis On Thu, 2007-08-30 at 11:50 +0200, Ruben Rubio wrote: Hi ... It seems it's solved, but the problem was not found. As you may know, I do a vacuum full and a reindex of the database each day. I have logs that confirm that it's done, and I can check that everything was fine. So, this morning, I stopped the website, stopped the database, and started it again (I was around 200 days without restarting), then I vacuumed the database and reindexed it (same command as every day). Restarted again, and ran the website again. Now it seems it's working fine. But I really don't know where the problem is. Is vacuum not working right? Maybe the database just needed a restart? I really don't know. Has someone had a similar problem? Thanks in advance, Ruben Rubio
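One way to check for the kind of long-running transaction suspected above is to look at pg_stat_activity (column names as they were in the 8.x series); a sketch:

```sql
-- Connections whose current query started long ago are suspects for
-- holding back VACUUM; '<IDLE> in transaction' entries are the classic case.
SELECT procpid, usename, current_query, query_start
FROM pg_stat_activity
ORDER BY query_start;
```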
Re: [PERFORM] LIKE query verses =
On Wed, 2007-08-29 at 18:01 +0530, Karthikeyan Mahadevan wrote:

1) EXPLAIN ANALYZE SELECT job_category.job_id, job.name, job.state, job.build_id, cat.name AS reporting_group FROM category, job_category, job, category AS cat WHERE job.job_id = job_category.job_id AND job_category.category_id = category.category_id AND cat.build_id = category.build_id AND category.name = 'build_id.pap3260-20070828_01' AND cat.name LIKE ('reporting_group.Tier2%');

QUERY PLAN
------------------------------------------------------------------
 Nested Loop  (cost=0.00..291.53 rows=8 width=103) (actual time=98.999..385.590 rows=100 loops=1)
   ->  Nested Loop  (cost=0.00..250.12 rows=9 width=34) (actual time=98.854..381.106 rows=100 loops=1)
         ->  Nested Loop  (cost=0.00..123.22 rows=1 width=34) (actual time=98.717..380.185 rows=1 loops=1)
               ->  Index Scan using idx_cat_by_name on category cat  (cost=0.00..5.97 rows=1 width=34) (actual time=95.834..245.276 rows=977 loops=1)
                     Index Cond: (((name)::text >= 'reporting'::character varying) AND ((name)::text < 'reportinh'::character varying))
                     Filter: ((name)::text ~~ 'reporting_group.Tier2%'::text)
               ->  Index Scan using idx_cat_by_bld_id on category  (cost=0.00..117.24 rows=1 width=8) (actual time=0.134..0.134 rows=0 loops=977)
                     Index Cond: ("outer".build_id = category.build_id)
                     Filter: ((name)::text = 'build_id.pap3260-20070828_01'::text)
         ->  Index Scan using idx_jcat_by_cat_id on job_category  (cost=0.00..126.00 rows=71 width=8) (actual time=0.126..0.569 rows=100 loops=1)
               Index Cond: (job_category.category_id = "outer".category_id)
   ->  Index Scan using job_pkey on job  (cost=0.00..4.59 rows=1 width=73) (actual time=0.033..0.036 rows=1 loops=100)
         Index Cond: (job.job_id = "outer".job_id)
 Total runtime: 385.882 ms

Remember that using LIKE causes PG to interpret an underscore as 'any single character', which means that it can only scan the index for all records that start with 'reporting', and then it needs to apply a filter to each match. This is going to be slower than just going directly to the matching index entry.
What you probably want to do is tell PG that you're looking for a literal underscore and not for any matching character, by escaping the underscore; that will allow it to do a much quicker index scan. Something like: cat.name like 'reporting|_group.Tier2%' ESCAPE '|' -- Mark Lewis
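Using the table and column names from the original query, the two forms would look roughly like this (illustrative only, not tested against the original schema):

```sql
-- Underscore is a single-character wildcard, so this can only use the
-- index range ['reporting', 'reportinh') and must filter each match:
SELECT count(*) FROM category WHERE name LIKE 'reporting_group.Tier2%';

-- With the underscore escaped, the index range starts at the full
-- literal prefix 'reporting_group.Tier2', which is far more selective:
SELECT count(*) FROM category WHERE name LIKE 'reporting|_group.Tier2%' ESCAPE '|';
```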
Re: Fwd: [PERFORM] 8.2 Query 10 times slower than 8.1 (view-heavy)
It looks like your view is using a left join to look for rows in one table without matching rows in the other, i.e. a SQL construct similar in form to the query below: SELECT ... FROM A LEFT JOIN B ON (...) WHERE B.primary_key IS NULL Unfortunately there has been a planner regression in 8.2 in some cases with these forms of queries. This was discussed a few weeks (months?) ago on this forum. I haven't looked closely enough to confirm that this is the problem in your case, but it seems likely. Is it possible to refactor the query to avoid using this construct to see if that helps? We've been holding back from upgrading to 8.2 because this one is a show-stopper for us. -- Mark Lewis On Tue, 2007-08-28 at 11:24 -0500, Evan Carroll wrote: -- Forwarded message -- From: Evan Carroll [EMAIL PROTECTED] Date: Aug 28, 2007 11:23 AM Subject: Re: [PERFORM] 8.2 Query 10 times slower than 8.1 (view-heavy) To: Scott Marlowe [EMAIL PROTECTED] On 8/28/07, Scott Marlowe [EMAIL PROTECTED] wrote: I looked through your query plan, and this is what stood out in the 8.2 plan: - Nested Loop Left Join (cost=8830.30..10871.27 rows=1 width=102) (actual time=2148.444..236018.971 rows=62 loops=1) Join Filter: ((public.contact.pkid = public.contact.pkid) AND (public.event.ts_in public.event.ts_in)) Filter: (public.event.pkid IS NULL) Notice the misestimation is by a factor of 62, and the actual time goes from 2149 to 236018 ms. Again, have you analyzed your tables / databases? 
contacts=# \o scott_marlowe_test
contacts=# VACUUM FULL ANALYZE;
contacts=# SELECT * FROM test_view WHERE U_ID = 8;
Cancel request sent
ERROR: canceling statement due to user request
contacts=# EXPLAIN ANALYZE SELECT * FROM test_view WHERE U_ID = 8;
output found at http://rafb.net/p/EQouMI82.html
-- Evan Carroll System Lord of the Internets [EMAIL PROTECTED] 832-445-8877
Re: [PERFORM] Indexscan is only used if we use limit n
On Wed, 2007-08-15 at 16:36 -0300, Sebastián Baioni wrote: Hello, We are running PostgreSQL 8.2.0 on amd64-portbld-freebsd6.2, compiled by GCC cc (GCC) 3.4.6 [FreeBSD] 20060305. The query only uses the index if we have a limit n:

Without limit n:
explain select esapcuit, esapcuil from esact00 t1 order by esapcuit, esapcuil

 Sort  (cost=843833.82..853396.76 rows=3825177 width=30)
   Sort Key: esapcuit, esapcuil
   ->  Seq Scan on esact00 t1  (cost=0.00..111813.77 rows=3825177 width=30)

With limit n:
explain select esapcuit, esapcuil from esact00 t1 order by esapcuit, esapcuil limit 1

This isn't really unexpected -- it's faster to do a full sequential scan of a table than it is to do a full index traversal over the table. And usually it's still cheaper even after sorting the results of the full table scan. So as near as we can tell, PG is just doing what it's supposed to do and picking the best plan it can. You didn't really ask a question -- is this causing problems somehow, or were you just confused by the behavior? -- Mark
Re: [PERFORM] Performance problems with large telemetric datasets on 7.4.2
On Fri, 2007-08-03 at 06:52 -0700, Sven Clement wrote: Hello everybody, as I'm new to this list I hope that it is the right place to post this and also the right format, so if I'm committing an error, I apologize in advance. First the background of my request: I'm currently employed by an enterprise which has approx. 250 systems distributed worldwide which are sending telemetric data to the main PostgreSQL. The remote systems are generating about 10 events per second per system, which accumulates to about 2500 tps. The data is stored for about a month before it is exported and finally deleted from the database. On the PostgreSQL server two databases are running: one with little traffic (about 750K per day) and the telemetric database with heavy write operations all around the day (over 20 million per day). We already found that the VACUUM process takes excessively long and as a consequence the database is vacuumed permanently. The hardware is an IBM X306m Server, 3.2 GHz HT (Pentium IV), 1 GB RAM and 2x 250 GB HDD (SATA-II) with ext3 fs; one of the HDDs is dedicated to the database. OS is Debian 3.1 Sarge with PostgreSQL 7.4.7 (7.4.7-6sarge1) with the libpq frontend library. Now the problem: The problem we are experiencing is that our queries are slowing down continuously, even if we are performing queries on the index, which is the timestamp of the event; a simple SELECT query with only a simple WHERE clause (< or >) takes very long to complete. So the database becomes unusable for production use, as the data has to be retrieved very quickly if we want to act based on the telemetric data. So I'm asking myself if it is useful to upgrade to the current 8.2 version and if we could expect a performance improvement just by upgrading. Thank you for your answers, Sven Clement Upgrading from 7.4.x to 8.2.x will probably give you a performance benefit, yes. There have been numerous changes since the days of 7.4. But you didn't really give any information about why the query is running slow.
Specifically, could you provide the query itself, some information about the tables/indexes/foreign keys involved, and an EXPLAIN ANALYZE for one of the problematic queries? Also, what kind of vacuuming regimen are you using? Just a daily cron maybe? Are you regularly analyzing the tables? -- Mark Lewis
Re: [PERFORM] Postgres optimizer
On Fri, 2007-08-03 at 13:58 -0400, Mouhamadou Dia wrote: Hello, I have a Postgres instance (version 8.1) running on a Solaris 10 machine. When I run the following query SELECT * FROM PROR_ORG, ( ( ( ( (PRPT_PRT LEFT OUTER JOIN PRPT_PRTADR ON PRPT_PRT.PRT_NRI = PRPT_PRTADR.PRT_NRI AND PRPT_PRTADR.ADR_F_DEF=true) LEFT OUTER JOIN PLGE_CTY ON PRPT_PRTADR.CTY_NRI = PLGE_CTY.CTY_NRI) LEFT OUTER JOIN PLGE_CTY1 PLGE_CTY_PLGE_CTY1 ON PLGE_CTY.CTY_NRI = PLGE_CTY_PLGE_CTY1.CTY_NRI AND PLGE_CTY_PLGE_CTY1.LNG_CD = 'fr') LEFT OUTER JOIN PLGE_CTRSD ON PRPT_PRTADR.CTRSD_CD = PLGE_CTRSD.CTRSD_CD AND PRPT_PRTADR.CTR_ISO_CD = PLGE_CTRSD.CTR_ISO_CD) LEFT OUTER JOIN PLGE_CTR ON PRPT_PRTADR.CTR_ISO_CD = PLGE_CTR.CTR_ISO_CD) , PROR_ORG1 PROR_ORG_PROR_ORG1, PROR_ORGT, PROR_ORGT1 PROR_ORGT_PROR_ORGT1 WHERE ( (PROR_ORG.ORGT_CD = PROR_ORGT.ORGT_CD) AND (PROR_ORGT.ORGT_CD = PROR_ORGT_PROR_ORGT1.ORGT_CD AND PROR_ORGT_PROR_ORGT1.LNG_CD = 'fr') AND (PROR_ORG.PRT_NRI = PROR_ORG_PROR_ORG1.PRT_NRI AND PROR_ORG_PROR_ORG1.LNG_CD = 'fr') AND (PROR_ORG.PRT_NRI = PRPT_PRT.PRT_NRI) ) AND ( ((PROR_ORG.ORGT_CD ='CHAIN')) ) it takes 45 seconds to run. In this case the optimizer does a sequential scan of the PRPT_PRT table (which is the largest one) despite the existence of an index on PRT_NRI column of PRPT_PRT table. I’ve activated the GEQO but it still takes nearly the same time to run (between 40 and 45s). When I change the order of PRPT_PRT and PROR_ORG tables, it takes only 30 milliseconds to run. In this case the optimizer uses the index on PRT_NRI column of PRPT_PRT table, what is normal and what I was expecting. Is there a known problem with the Postgres optimizer? For your information, the same query takes 20 milliseconds to run on Informix and 60 milliseconds to run on Oracle independently of the order of the tables in the query. PRPT_PRT has 1.3 millions rows PRPT_PRTADR has 300.000 rows PROR_ORG has 1500 rows These are the largest tables, all the others are small tables. 
All statistics are up to date. If I recall correctly, PG 8.2 was the first version where the planner supported reordering outer joins. Prior releases would get poor performance unless the joins were listed in the right order. So it is quite possible that upgrading to 8.2 would solve your problem. Do you have the ability to try that? -- Mark Lewis
Re: [PERFORM] disk filling up
On Thu, 2007-07-26 at 09:18 -0700, Brandon Shalton wrote: Hello all, My hard disk is filling up in the /base directory to where it has consumed all 200gig of that drive. All the posts that I see keep saying move to a bigger drive, but at some point a bigger drive would just get consumed. How can I keep the disk from filling up, other than getting like a half TB setup just to hold the ./base/* folder? Ummm, don't put more than 200G worth of data in there? :) You didn't give us any information about what you're using the database for, why you think that using 200G is excessive, what version of the database you're running, stuff like that. So really there's nothing that we can help you out with, except for the normal 'read the manual about vacuuming and make sure you're doing it' newbie answer. -- Mark
Re: [PERFORM] insert vs select into performance
If you're performing the inserts via JDBC, are you using addBatch/executeBatch, or are you directly executing each insert? If you directly execute each insert, then your code will wait for a server round-trip between each insert. That still won't get you to the speed of select into, but it should help. You could also look at the pgsql-jdbc archives for the JDBC driver patches which allow you to use COPY-style bulk loading, which should get you to the performance level of COPY, which should be reasonably close to the performance of select into. -- Mark Lewis On Tue, 2007-07-17 at 22:50 +0200, Thomas Finneid wrote: Michael Glaesemann wrote: On Jul 17, 2007, at 14:38 , Thomas Finneid wrote: I was doing some testing on insert compared to select into. I inserted 100 000 rows (with 8 column values) into a table, which took 14 seconds, compared to a select into, which took 0.8 seconds. (FYI, the inserts were batched, autocommit was turned off and it all happened on the local machine.) Now I am wondering why the select into is that much faster? It would be helpful if you included the actual queries you're using, as there are a number of variables: create table ciu_data_type ( id integer, loc_id integer, value1 integer, value2 real, value3 integer, value4 real, value5 real, value6 char(2), value7 char(3), value8 bigint, value9 bigint, value10 real, value11 bigint, value12 smallint, value13 double precision, value14 real, value15 real, value16 char(1), value17 varchar(18), value18 bigint, value19 char(4) ); performed with JDBC: insert into ciu_data_type (id, loc_id, value3, value5, value8, value9, value10, value11) values (?,?,?,?,?,?,?,?) select * into ciu_data_type_copy from ciu_data_type 1) If there are any constraints on the original table, the INSERT will be checking those constraints. AIUI, SELECT INTO does not generate any table constraints. No constraints in this test.
2a) Are you using INSERT INTO foo (foo1, foo2, foo3) SELECT foo1, foo2, foo3 FROM pre_foo, or individual inserts for each row? The former would be faster than the latter. 2b) If you are doing individual inserts, are you wrapping them in a transaction? The latter would be faster. Disabling autocommit, but nothing more than that. I haven't done this test in a stored function yet, nor have I tried it with a C client so far, so there is the chance that it is Java/JDBC that makes the insert so slow. I'll get to that test soon if there is any chance my theory makes sense. regards thomas
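Using the poster's table, the set-based form from 2a) would look something like the sketch below (the staging table pre_ciu_data is hypothetical):

```sql
-- One statement, one round-trip: usually much faster than row-at-a-time
-- INSERTs from the client, though still slower than COPY.
INSERT INTO ciu_data_type (id, loc_id, value3, value5, value8, value9, value10, value11)
SELECT id, loc_id, value3, value5, value8, value9, value10, value11
FROM pre_ciu_data;
```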
Re: [PERFORM] Delete Cascade FK speed issue
On Tue, 2007-07-03 at 08:05 +0200, Patric de Waha wrote: Hi, I have a database with about 80 relations. On deleting a user, this cascades through all the tables. This is very slow; for 20 users it takes 4 hours, with exclusive access to the database. No other users connected to the database. OK, I know there will be somewhere a relation with a FK without an index, which is being scanned sequentially. But how can I find out what postgres is doing while it is handling the transaction? Is there a way I can find out what postgres does, and where it hangs around, so I know where the FK might not be indexed? (The database is too big to analyze it by hand.) The way I do it now is to check the pg_locks relation, but this is not very representative. Is there a profiling method for triggers/constraints, or a method which gives me a hint why it is taking so long? In 8.1 and later, an EXPLAIN ANALYZE of the delete will show you the amount of time spent in each trigger. Remember that it will still perform the delete, so if you want to be able to re-run the DELETE over and over as you add missing indexes, run it in a transaction and roll back each time. That will tell you which foreign key constraint checks are taking up time. The output will not be nearly as useful if you don't name your foreign key constraints, but is still better than nothing. Alternatively, you can just dump the schema to a text file and spend 30 minutes and some text searching to reconstruct your foreign key dependency graph rooted at the table in question and check each column for proper indexes. We recently did this for a 150-relation database; it's not as painful as you seem to think it is. An 80-relation database is by no means too big to analyze :) -- Mark Lewis
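The rollback trick described above can be sketched like this (the table and key are made up for illustration):

```sql
BEGIN;
-- In 8.1+, the EXPLAIN ANALYZE output ends with one line per trigger,
-- including the time spent in each foreign key check:
EXPLAIN ANALYZE DELETE FROM users WHERE user_id = 42;
ROLLBACK;  -- undoes the delete, so it can be re-run after adding an index
```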
Re: [PERFORM] Volunteer to build a configuration tool
On Wed, 2007-06-20 at 11:21 -0400, Greg Smith wrote: ... One of the things that was surprising to me when I started looking at the organization of the PostgreSQL buffer cache is how little gross information about its contents is available. I kept expecting to find a summary section where you could answer questions like how much of the cache currently has information about index/table X? used as an input to the optimizer. I understand that the design model expects much of this is unknowable due to the interaction with the OS cache, and in earlier versions you couldn't make shared_buffers big enough for its contents to be all that interesting, so until recently this wasn't worth collecting. But in the current era, where it's feasible to have multi-GB caches efficiently managed by PG and one can expect processor time is relatively cheap, it seems to me one way to give a major boost to the optimizer is to add some overhead to buffer cache management so it collects such information. When I was trying to do a complete overhaul on the background writer, the #1 problem was that I had to assemble my own statistics on what was inside the buffer cache as it was scanned, because a direct inspection of every buffer is the only way to know things like what percentage of the cache is currently dirty. ... One problem with feeding the current state of the buffer cache to the planner is that the planner may be trying to prepare a plan which will execute 10,000 times. For many interesting queries, the state of the cache will be very different after the first execution, as indexes and active portions of tables are brought in. For that matter, an early stage of query execution could significantly change the contents of the buffer cache as seen by a later stage of the execution, even inside a single query. I'm not saying that inspecting the buffer cache more is a bad idea, but gathering useful information with the current planner is a bit tricky. 
For purposes of idle speculation, one could envision some non-trivial changes to PG which would make really slick use of this data: (1) Allow PG to defer deciding whether to perform an index scan or sequential scan until the moment it is needed, and then ask the buffer cache what % of the pages from the relevant indexes/tables are currently cached. (2) Automatically re-plan prepared queries with some kind of frequency (exponential in # of executions? fixed-time?), to allow the plans to adjust to changes in the buffer cache. Besides being hard to build, the problem with these approaches (or any other approach which takes current temporary state into account) is that as much as some of us might want to make use of every piece of data available to make the planner into a super-brain, there are lots of other folks who just want plan stability. The more dynamic the system is, the less predictable it can be, and especially in mission-critical stuff, predictability matters more than raw performance. Tom said it really well in a recent post: "To me, the worst catch-22 we face in this area is that we'd like the optimizer's choices of plan to be stable and understandable, but the real-world costs of queries depend enormously on short-term conditions such as how much of the table has been sucked into RAM recently by other queries. I have no good answer to that one."
Re: [PERFORM] Performance query about large tables, lots of concurrent access
On Tue, 2007-06-19 at 09:37 -0400, Karl Wright wrote: Alvaro Herrera wrote: Karl Wright wrote: This particular run lasted four days before a VACUUM became essential. The symptom that indicates that VACUUM is needed seems to be that the CPU usage of any given postgresql query skyrockets. Is this essentially correct? Are you saying you weren't used to run VACUUM all the time? If so, that's where the problem lies. Postgresql 7.4 VACUUM runs for so long that starting it with a cron job even every 24 hours caused multiple instances of VACUUM to eventually be running in my case. So I tried to find a VACUUM schedule that permitted each individual vacuum to finish before the next one started. A vacuum seemed to require 4-5 days with this particular database - or at least it did for 7.4. So I had the VACUUM schedule set to run every six days. I will be experimenting with 8.1 to see how long it takes to complete a vacuum under load conditions tonight. The longer you wait between vacuuming, the longer each vacuum is going to take. There is of course a point of diminishing returns for vacuum where this no longer holds true; if you vacuum too frequently the overhead of running the vacuum will dominate the running time. But 6 days for a busy database is probably way, way, way past that threshold. Generally, the busier the database the more frequently you need to vacuum, not less. If your update/delete transaction rate is high enough then you may need to vacuum multiple times per hour, at least on some tables. Playing with autovacuum might help you out here, because it can look at how badly a vacuum is needed and adjust the vacuuming rate on the fly on a per-table basis. Be sure to look up some reasonable autovacuum settings first; the 8.1 defaults aren't. -- Mark ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
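As a starting point, a minimal 8.1-era autovacuum configuration might look like the fragment below (values are illustrative only, not tuned recommendations; per-table activity should drive the final numbers):

```
# postgresql.conf (8.1)
stats_start_collector = on              # required for autovacuum in 8.1
stats_row_level = on
autovacuum = on
autovacuum_naptime = 60                 # seconds between checks of each database
autovacuum_vacuum_threshold = 500       # base number of updated/deleted tuples
autovacuum_vacuum_scale_factor = 0.2    # plus this fraction of the table size
```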
Re: [PERFORM] Thousands of tables versus on table?
On Mon, 2007-06-04 at 13:40 -0400, Thomas Andrews wrote: I have several thousand clients. Our clients do surveys, and each survey has two tables for the client data, responders responses Frequent inserts into both table. Right now, we are seeing significant time during inserts to these two tables. Can you provide some concrete numbers here? Perhaps an EXPLAIN ANALYZE for the insert, sizes of tables, stuff like that? Some of the indices in tableA and tableB do not index on the client ID first. What reason do you have to think that this matters? So, we are considering two possible solutions. (1) Create separate responders and responses tables for each client. (2) Make sure all indices on responders and responses start with the client id (excepting, possibly, the primary keys for these fields) and have all normal operation queries always include an id_client. Right now, for example, given a responder and a survey question, we do a query in responses by the id_responder and id_survey. This gives us a unique record, but I'm wondering if maintaining the index on (id_responder,id_survey) is more costly on inserts than maintaining the index (id_client,id_responder,id_survey) given that we also have other indices on (id_client,...). Option (1) makes me very nervous. I don't like the idea of the same sorts of data being stored in lots of different tables, in part for long-term maintenance reasons. We don't really need cross-client reporting, however. What version of PG is this? What is your vacuuming strategy? Have you tried a REINDEX to see if that helps? -- Mark Lewis ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
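Option (2) above could be sketched with the column names from the post (the index name is made up):

```sql
-- Leading with id_client lets every per-client query use the same index;
-- the (id_responder, id_survey) lookup described above can still use it,
-- but only when id_client also appears in the WHERE clause.
CREATE INDEX responses_client_resp_survey
    ON responses (id_client, id_responder, id_survey);
```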
Re: [PERFORM] LIKE search and performance
On Thu, 2007-05-24 at 21:54 +0100, James Mansion wrote: If Sybase is still like SQL Server (or the other way around), it *may* end up scanning the index *IFF* the index is a clustered index. If it's a normal index, it will do a sequential scan on the table. Are you sure its not covered? Have to check at work - but I'm off next week so it'll have to wait. It's not a win on PostgreSQL, because of our MVCC implementation. We need to scan *both* index *and* data pages if we go down that route, in which case it's a lot faster to just scan the data pages alone. Why do you need to go to all the data pages - doesn't the index structure contain all the keys so you prefilter and then check to see if the *matched* items are still in view? I'll be first to admit I know zip about Postgres, but it seems odd - doesn't the index contain copies of the key values?. I suspect that I mis-spoke with 'leaf'. I really just mean 'all index pages with data', since the scan does not even need to be in index order, just a good way to get at the data in a compact way. PG could scan the index looking for matches first and only load the actual rows if it found a match, but that could only be a possible win if there were very few matches, because the difference in cost between a full index scan and a sequential scan would need to be greater than the cost of randomly fetching all of the matching data rows from the table to look up the visibility information. So yes it would be possible, but the odds of it being faster than a sequential scan are small enough to make it not very useful. And since it's basically impossible to know the selectivity of this kind of where condition, I doubt the planner would ever realistically want to choose that plan anyway because of its poor worst-case behavior. -- Mark ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Ever Increasing IOWAIT
You're not swapping, are you? One explanation could be that PG is configured to think it has access to a little more memory than the box can really provide, which forces it to swap once it's been running for long enough to fill up its shared buffers or after a certain number of concurrent connections are opened. -- Mark Lewis On Fri, 2007-05-18 at 10:45 +1200, Ralph Mason wrote: We have a database running on a 4 processor machine. As time goes by the IO gets worse and worse, peaking at about 200% as the machine loads up. The weird thing is that if we restart postgres it’s fine for hours, but over time it goes bad again. (CPU usage graph here: http://www.flickr.com/photos/[EMAIL PROTECTED]/502596262/ ) You can clearly see where the restart happens in the IO area. This is Postgres 8.1.4 64bit. Anyone have any ideas? Thanks Ralph
Re: [PERFORM] Throttling PostgreSQL's CPU usage
1. If you go the route of using nice, you might want to run the 3D front-end at a higher priority instead of running PG at a lower priority. That way apache, php and the other parts all run at the same priority as PG and just the one task that you want to run smoothly is elevated. 2. You may not even need separate priorities if you're running on Linux with a recent kernel and you enable the sleep() calls that you would need anyway for solution #1 to work. This is because Linux kernels are getting pretty good nowadays about rewarding tasks with a lot of sleeps, although there are some further kernel changes still under development that look even more promising. -- Mark On Tue, 2007-05-08 at 16:27 -0400, Daniel Griscom wrote: I'm building a kiosk with a 3D front end accessing PostGIS/PostgreSQL via Apache/PHP. The 3D display is supposed to show smooth motion from location to location, with PostGIS giving dynamically updated information on the locations. Everything runs on the same machine, and it all works, but when I start a query the 3D display stutters horribly. It looks like PostgreSQL grabs hold of the CPU and doesn't let go until it's completed the query. I don't need the PostgreSQL query to return quickly, but I must retain smooth animation while the query is being processed. In other words, I need PostgreSQL to spread out its CPU usage so that it doesn't monopolize the CPU for any significant time (more than 50ms or so). Possible solutions: 1: Set the PostgreSQL task priority lower than the 3D renderer task, and to make sure that the 3D renderer sleep()s enough to let PostgreSQL get its work done. The obvious objection to this obvious solution is Priority inversion!, but I see that as an additional challenge to be surmounted rather than an absolute prohibition. So, any thoughts on setting the PostgreSQL task priority (including by the much-maligned tool shown at http://weblog.bignerdranch.com/?p=11)? 2: Some variation of the Cost-Based Vacuum Delay. 
Hypothetically, this would have the PostgreSQL task sleep() periodically while processing the query, allowing the 3D renderer to continue working at a reduced frame rate. My understanding, however, is that this only works during VACUUM and ANALYZE commands, so it won't help during my SELECT commands. So, any thoughts on using Cost-Based Vacuum Delay as a Cost-Based Select Delay? 3: ... some other solution I haven't thought of. Any thoughts, suggestions, ideas? Thanks, Dan
Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning
Maybe he's looking for a switch for initdb that would make it interactive and quiz you about your expected usage-- sort of a magic auto-configurator wizard doohicky? I could see that sort of thing being nice for the casual user or newbie who otherwise would have a horribly mis-tuned database. They could instead have only a marginally mis-tuned database :) On Fri, 2007-04-27 at 10:30 -0400, Michael Stone wrote: On Fri, Apr 27, 2007 at 09:27:49AM -0400, Carlos Moreno wrote: Notice that the second part of my suggestion covers this --- have additional switches to initdb so that the user can tell it about estimates on how the DB will be used: estimated size of the DB, estimated percentage of activity that will involve writing, estimated percentage of activity that will be transactions, percentage that will use indexes, percentage of queries that will be complex, etc. etc. If the person knows all that, why wouldn't they know to just change the config parameters? Mike Stone
Re: [PERFORM] a question about Direct I/O and double buffering
Not to hijack this thread, but has anybody here tested the behavior of PG on a file system with OS-level caching disabled via forcedirectio or by using an inherently non-caching file system such as ocfs2? I've been thinking about trying this setup to avoid double-caching now that the 8.x series scales shared buffers better, but I figured I'd ask first if anybody here had experience with similar configurations. -- Mark On Thu, 2007-04-05 at 13:09 -0500, Erik Jones wrote: On Apr 5, 2007, at 12:09 PM, Xiaoning Ding wrote: Hi, A page may be double buffered in PG's buffer pool and in the OS's buffer cache. Other DBMSs like DB2 and Oracle have provided a Direct I/O option to eliminate double buffering. I noticed there were discussions on the list, but I can not find a similar option in PG. Does PG support direct I/O now? The tuning guide of PG usually recommends a small shared buffer pool (compared to the size of physical memory). I think it is to avoid swapping. If there were swapping, the OS kernel may swap out some pages in PG's buffer pool even if PG wants to keep them in memory, i.e. PG would lose full control over the buffer pool. A large buffer pool is not good because it may 1. cause more pages to be double buffered, and thus decrease the efficiency of the buffer cache and buffer pool. 2. cause swapping. Am I right? If PG's buffer pool is small compared with physical memory, can I say that the hit ratio of PG's buffer pool is not so meaningful because most misses can be satisfied by the OS kernel's buffer cache? Thanks! To the best of my knowledge, Postgres itself does not have a direct IO option (although it would be a good addition). So, in order to use direct IO with postgres you'll need to consult your filesystem docs for how to set the forcedirectio mount option. I believe it can be set dynamically, but if you want it to be permanent you'll need to add it to your fstab/vfstab file.
erik jones [EMAIL PROTECTED] software developer 615-296-0838 emma(r)
Re: [PERFORM] a question about Direct I/O and double buffering
... [snipped for brevity] ... Not to hijack this thread, but has anybody here tested the behavior of PG on a file system with OS-level caching disabled via forcedirectio or by using an inherently non-caching file system such as ocfs2? I've been thinking about trying this setup to avoid double-caching now that the 8.x series scales shared buffers better, but I figured I'd ask first if anybody here had experience with similar configurations. -- Mark Rather than repeat everything that was said just last week, I'll point out that we just had a pretty decent discussion on this last week that I started, so check the archives. In summary though, if you have a high io transaction load with a db where the average size of your working set of data doesn't fit in memory with room to spare, then direct io can be a huge plus, otherwise you probably won't see much of a difference. I have yet to hear of anybody actually seeing any degradation in the db performance from it. In addition, while it doesn't bother me, I'd watch the top posting as some people get pretty religious about it (I moved your comments down). I saw the thread, but my understanding from reading through it was that you never fully tracked down the cause of the factor of 10 write volume mismatch, so I pretty much wrote it off as a data point for forcedirectio because of the unknowns. Did you ever figure out the cause of that? -- Mark Lewis
Re: [PERFORM] How long should it take to insert 200,000 records?
On Tue, 2007-02-06 at 12:01 -0500, Merlin Moncure wrote: On 2/6/07, Scott Marlowe [EMAIL PROTECTED] wrote: On Tue, 2007-02-06 at 10:40, Merlin Moncure wrote: On 2/6/07, Scott Marlowe [EMAIL PROTECTED] wrote: On Mon, 2007-02-05 at 18:35, Karen Hill wrote: I have a pl/pgsql function that is inserting 200,000 records for testing purposes. What is the expected time frame for this operation on a pc with 1/2 a gig of ram and a 7200 RPM disk? The processor is a 2ghz cpu. So far I've been sitting here for about 2 million ms waiting for it to complete, and I'm not sure how many inserts postgres is doing per second. That really depends. Doing 200,000 inserts as individual transactions will be fairly slow. Since PostgreSQL generally runs in autocommit mode, this means that if you didn't expressly begin a transaction, you are in fact inserting each row as a transaction. i.e. this: I think OP is doing insertion inside a pl/pgsql loop...transaction is implied here. Yeah, I noticed that about 10 seconds after hitting send... :) actually, I get the stupid award also because RI check to unindexed column is not possible :) (this haunts deletes, not inserts). Sure it's possible: CREATE TABLE parent (col1 int4); -- insert many millions of rows into parent CREATE TABLE child (col1 int4 REFERENCES parent(col1)); -- insert many millions of rows into child, very very slowly. - Mark Lewis
Re: [PERFORM] How long should it take to insert 200,000 records?
On Tue, 2007-02-06 at 14:06 -0500, Merlin Moncure wrote: On 2/6/07, Mark Lewis [EMAIL PROTECTED] wrote: actually, I get the stupid award also because RI check to unindexed column is not possible :) (this haunts deletes, not inserts). Sure it's possible: CREATE TABLE parent (col1 int4); -- insert many millions of rows into parent CREATE TABLE child (col1 int4 REFERENCES parent(col1)); -- insert many millions of rows into child, very very slowly. the database will not allow you to create a RI link out unless the parent table has a primary key/unique constraint, which the database backs with an index, and you can't even trick it afterwards by dropping the constraint. it's the other direction, when you cascade forward, that you can have a problem. this is most common with a delete, but can also happen on an update of a table's primary key with child tables referencing it. Hmmm, should check my SQL before hitting send I guess. Well, at least you no longer have to wear the stupid award, Merlin :) -- Mark Lewis
Re: [PERFORM] Slow update with simple query
Out of curiosity, how hard would it be to modify the output of EXPLAIN ANALYZE when doing an insert/update to include the index update times and/or non-FK constraint checking times and/or the table row update times? Or any other numbers that might be useful in circumstances like this. I'm wondering if it's possible to shed some light on the remaining dark shadows of PG performance troubleshooting. -- Mark Lewis On Thu, 2006-12-14 at 11:19 -0500, Tom Lane wrote: Arnaud Lesauvage [EMAIL PROTECTED] writes: Tom Lane a écrit : It seems the time must be going into this trigger function. What does it do? A lot of things! Indeed, if it runs it will very badly hurt performance (table lookups, string manipulation, etc...)! But it should only be triggered on INSERTs, and I am doing an UPDATE! Doh, right, I obviously still need to ingest more caffeine this morning. I think the conclusion must be that there was just too much I/O to be done to update all the rows. Have you done any tuning of shared_buffers and so forth? I recall having seen cases where update performance went bad as soon as the upper levels of a large index no longer fit into shared_buffers ... regards, tom lane
Re: [PERFORM] File Systems Compared
Anyone run their RAIDs with disk caches enabled, or is this akin to having fsync off? Disk write caches are basically always akin to having fsync off. The only time a write-cache is (more or less) safe to enable is when it is backed by a battery or in some other way made non-volatile. So a RAID controller with a battery-backed write cache can enable its own write cache, but can't safely enable the write-caches on the disk drives it manages. -- Mark Lewis
Re: [PERFORM] RES: Priority to a mission critical transaction
On Wed, 2006-11-29 at 08:25 -0500, Brian Hurt wrote: ... I have the same question. I've done some embedded real-time programming, so my innate reaction to priority inversions is that they're evil. But, especially given priority inheritance, is there any situation where priority inversion provides *worse* performance than running everything at the same priority? I can easily come up with situations where it devolves to that case - where all processes get promoted to the same high priority. But I can't think of one where using priorities makes things worse, and I can think of plenty where it makes things better. ... It can make things worse when there are at least 3 priority levels involved. The canonical sequence looks as follows:

LOW: Acquire a lock
MED: Start a long-running batch job that hogs CPU
HIGH: Wait on the lock held by the LOW task

At this point, the HIGH task can't run until the LOW task releases its lock, but the LOW task can't run to completion and release its lock until the MED job completes. (random musing): I wonder if PG could efficiently be made to temporarily raise the priority of any task holding a lock that a high-priority task waits on. I guess that would just make it so that instead of HIGH tasks being effectively reduced to LOW, LOW tasks could be promoted to HIGH. -- Mark Lewis
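The canonical sequence above can be played out with a toy strict-priority scheduler. This is a deterministic sketch; the task names, priorities, and time-slice model are invented for illustration, and there is no priority inheritance:

```python
# Strict priority scheduling, no priority inheritance: the highest-
# priority runnable task always gets the next time slice.

def run(tasks):
    """tasks: name -> {"prio": int, "work": ticks, "needs_lock": bool}.
    Returns task names in the order they finish."""
    lock_holder = "LOW"          # LOW acquired the lock before HIGH arrived
    finished = []
    while len(finished) < len(tasks):
        # runnable = unfinished tasks not blocked waiting for the lock
        runnable = [n for n, t in tasks.items()
                    if n not in finished
                    and not (t["needs_lock"] and lock_holder not in (None, n))]
        name = max(runnable, key=lambda n: tasks[n]["prio"])
        tasks[name]["work"] -= 1
        if tasks[name]["work"] == 0:
            finished.append(name)
            if lock_holder == name:
                lock_holder = None   # LOW finally releases the lock
    return finished

tasks = {
    "LOW":  {"prio": 1, "work": 2, "needs_lock": False},  # holds the lock
    "MED":  {"prio": 2, "work": 5, "needs_lock": False},  # CPU hog
    "HIGH": {"prio": 3, "work": 1, "needs_lock": True},   # waits on LOW
}
# HIGH is blocked on LOW's lock, but MED outranks LOW, so MED runs to
# completion first: the highest-priority task finishes *last of all*.
print(run(tasks))  # ['MED', 'LOW', 'HIGH']
```

With priority inheritance, LOW would be temporarily boosted above MED while HIGH waits on its lock, which is exactly the "promote the lock holder" musing above.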
Re: [PERFORM] Simple join optimized badly?
Tom, I'm interested in the problem of cross-column statistics from a theoretical perspective. It would be interesting to sit down and try to reason out a useful solution, or at very least to understand the problem better so I can anticipate when it might come and eat me. From my understanding, the main problem is that even if PG knows the selectivity of n conditions C1,...,Cn, it doesn't know whether the combined selectivity will be C1*C2*...*Cn (conditions are independent) or min(C1,C2,...,Cn) (conditions are strictly dependent), or somewhere in the middle. Therefore, row estimates could be orders of magnitude off. I suppose a common example would be a table with a serial primary key column and a timestamp value which is always inserted as CURRENT_TIMESTAMP, so the two columns are strongly correlated. If the planner guesses that 1% of the rows of the table will match pk < 100, and 1% of the rows of the table will match timestamp < X, then it would be nice for it to know that if you specify both pk < 100 AND timestamp < X the combined selectivity is still only 1% and not 1% * 1% = 0.01%. As long as I'm sitting down and reasoning about the problem anyway, are there any other types of cases you're aware of where some form of cross-column statistics would be useful? In the unlikely event that I actually come up with a brilliant and simple solution, I'd at least like to make sure that I'm solving the right problem :) Thanks, Mark Lewis On Tue, 2006-10-10 at 22:38 -0400, Tom Lane wrote: Brian Herlihy [EMAIL PROTECTED] writes: What would it take for hints to be added to postgres? A *whole lot* more thought and effort than has been expended on the subject to date. Personally I have no use for the idea of "force the planner to do exactly X given a query of exactly Y". You don't have exactly Y today, tomorrow, and the day after (if you do, you don't need a hint mechanism at all, you need a mysql-style query cache).
IMHO most of the planner mistakes we see that could be fixed via hinting are really statistical estimation errors, and so the right level to be fixing them at is hints about how to estimate the number of rows produced for given conditions. Mind you that's still a plenty hard problem, but you could at least hope that a hint of that form would be useful for more than one query. regards, tom lane
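The size of the estimation error being discussed is easy to quantify with back-of-envelope arithmetic (the row count and selectivities below are made up for illustration):

```python
# Two predicates, each matching 1% of the table. Under the planner's
# independence assumption the selectivities multiply; if the columns are
# fully correlated (one condition implies the other), the combined
# selectivity stays at 1% -- a 100x row-estimate error.
rows = 10_000_000
s1 = s2 = 0.01

independent = rows * s1 * s2        # planner's default assumption
correlated  = rows * min(s1, s2)    # fully dependent conditions

print(independent)  # 1000.0   rows estimated
print(correlated)   # 100000.0 rows actually returned
```

A 100x row-count error at one node of a plan is easily enough to flip the planner between, say, a nested loop and a hash join.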
Re: [PERFORM] Unsubscribe
I'd prefer to have a short footer link called something like Mailing List Page which would take you to a page where you could subscribe, unsubscribe, or view the archives. I think that making the link short and also making it a quick shortcut away from the archives tips the scales in terms of utility vs. annoyance. One of the tips that shows up in the footers today is just a link to the archives anyway. -- Mark Lewis On Wed, 2006-10-04 at 11:28 -0500, Bruno Wolff III wrote: On Wed, Oct 04, 2006 at 08:30:03 -0700, Joshua D. Drake [EMAIL PROTECTED] wrote: Although I 100% agree with you Bruno, it should be noted that our lists are a closed box for most people. They don't follow what is largely considered standard amongst lists which is to have list information at the bottom of each e-mail. There are reasons you don't want to do that. Footers work OK for single part email messages. They don't make so much sense in multipart messages. You can probably take a crap shoot and add the footer to the first text/plain part and not break things. This won't work so well for multipart alternative messages that have text/plain and text/html parts. You could also try to insert a footer into the html part, but that's a bit trickier since you can't just put it at the end. However, since the postgres lists are mostly just using text/plain parts for message bodies and there are already footers being used to distribute tips, it wouldn't make things significantly worse to add unsubscribe information as well. I would prefer just making the unsubscribe instructions easy to find on the web.
Re: [PERFORM] BUG #2658: Query not using index
Have you looked into a materialized view sort of approach? You could create a table which had assetid as a primary key, and max_ts as a column. Then use triggers to keep that table up to date as rows are added/updated/removed from the main table. This approach would only make sense if there were far fewer distinct assetid values than rows in the main table, and would get slow if you commonly delete rows from the main table or decrease the value for ts in the row with the highest ts for a given assetid. -- Mark Lewis On Tue, 2006-10-03 at 13:52 -0700, Graham Davis wrote: Thanks Tom, that explains it and makes sense. I guess I will have to accept this query taking 40 seconds, unless I can figure out another way to write it so it can use indexes. If there are any more syntax suggestions, please pass them on. Thanks for the help everyone. Graham. Tom Lane wrote: Graham Davis [EMAIL PROTECTED] writes: How come an aggregate like that has to use a sequential scan? I know that PostgreSQL used to have to do a sequential scan for all aggregates, but there was support added to version 8 so that aggregates would take advantage of indexes. Not in a GROUP BY context, only for the simple case. Per the comment in planagg.c: * We don't handle GROUP BY, because our current implementations of * grouping require looking at all the rows anyway, and so there's not * much point in optimizing MIN/MAX. The problem is that using an index to obtain the maximum value of ts for a given value of assetid is not the same thing as finding out what all the distinct values of assetid are. This could possibly be improved but it would take a considerable amount more work. It's definitely not in the category of bug fix. regards, tom lane
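The trigger bookkeeping behind the summary-table suggestion amounts to a conditional max-update on insert. A minimal sketch in Python, with a dict standing in for the summary table a real trigger would UPDATE (the column names follow the thread; everything else is invented):

```python
# latest: assetid -> max ts seen so far (the "summary table")
latest = {}

def on_insert(assetid, ts):
    # out-of-order arrivals are fine: only a strictly newer ts
    # replaces the stored maximum
    if assetid not in latest or ts > latest[assetid]:
        latest[assetid] = ts

# positions arriving out of order, as described in the thread
for assetid, ts in [(7, 200), (7, 150), (9, 100), (7, 300), (9, 90)]:
    on_insert(assetid, ts)

print(latest)  # {7: 300, 9: 100}
```

Note this only covers inserts; as Mark says above, deletes or updates that lower the maximum ts would force re-running the slow max query for that assetid.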
Re: [PERFORM] BUG #2658: Query not using index
Hmmm. How many distinct assetids are there? -- Mark Lewis On Tue, 2006-10-03 at 14:23 -0700, Graham Davis wrote: The summary table approach maintained by triggers is something we are considering, but it becomes a bit more complicated to implement. Currently we have groups of new positions coming in every few seconds or less. They are not guaranteed to be in order. So for instance, a group of positions from today could come in and be inserted, then a group of positions that got lost from yesterday could come in and be inserted afterwards. This means the triggers would have to do some sort of logic to figure out if the newly inserted position is actually the most recent by timestamp. If positions are ever deleted or updated, the same sort of query that is currently running slow will need to be executed in order to get the new most recent position. So there is the possibility that new positions can be inserted faster than the triggers can calculate and maintain the summary table. There are some other complications with maintaining such a summary table in our system too, but I won't get into those. Right now I'm just trying to see if I can get the query itself running faster, which would be the easiest solution for now. Graham. ... [snipped for brevity] ...
Re: [PERFORM] BUG #2658: Query not using index
A few hundred is quite a lot for the next proposal and it's kind of an ugly one, but might as well throw the idea out since you never know. Have you considered creating one partial index per assetid? Something along the lines of CREATE INDEX asset_index_N ON asset_positions(ts) WHERE assetid=N? I'd guess that the planner probably wouldn't be smart enough to use the partial indexes unless you issued a separate query for each assetid, but each one of those queries should be really fast. Of course, this is all assuming that PG knows how to use partial indexes to satisfy MAX queries; I'm not sure if it does. -- Mark Lewis On Tue, 2006-10-03 at 14:35 -0700, Graham Davis wrote: Not many. It fluctuates, but there are usually only ever a few hundred at most. Each assetid has multi-millions of positions though. ... [snipped for brevity] ...
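The per-assetid partial-index idea can be tried out on SQLite, which also supports CREATE INDEX ... WHERE. This sketch only checks that the DDL and the one-query-per-assetid pattern produce correct results; whether a given planner actually uses the partial index for MAX() has to be verified with EXPLAIN on the real system (table and index names follow the thread; the data is made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE asset_positions (assetid INTEGER, ts INTEGER)")
con.executemany("INSERT INTO asset_positions VALUES (?, ?)",
                [(1, 10), (1, 30), (1, 20), (2, 99), (2, 5)])

# one partial index per assetid, as proposed
con.execute("CREATE INDEX asset_index_1 ON asset_positions(ts) "
            "WHERE assetid = 1")
con.execute("CREATE INDEX asset_index_2 ON asset_positions(ts) "
            "WHERE assetid = 2")

# one cheap per-assetid query instead of one big GROUP BY
max_ts = {aid: con.execute("SELECT MAX(ts) FROM asset_positions "
                           "WHERE assetid = ?", (aid,)).fetchone()[0]
          for aid in (1, 2)}
print(max_ts)  # {1: 30, 2: 99}
```

With only a few hundred assetids, a few hundred such queries (or indexes) is plausible; with thousands it would get unwieldy.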
Re: [PERFORM] Decreasing BLKSZ
I'm not sure if decreasing BLKSZ is the way to go. It would allow you to have more smaller blocks in memory, but the actual coverage of the index would remain the same; if only 33% of the index fits in memory with the 8K BLKSZ then only 33% would fit in memory with a 4K BLKSZ. I can see where you're going if the tree nodes for all 15 million key entries fit in memory as well as the most recent nodes for the logtime nodes lower down in the index; basically trying to make sure that the right 33% of the index is in memory. But it seems like it might be more useful to have two indexes, one on logtime and one on key. Inserts into the logtime index would be correlated with your insert order and as such be cache-friendly, so that's not an issue. The index on just the key column would be at least as small as the active subset of a combined index, so performance should be at least as good as you could possibly achieve by reducing BLKSZ. PG 8.1 is smart enough to use a bitmap index scan to combine the two indexes at query time; if that gives you adequate performance then it would be simpler than reducing BLKSZ. -- Mark Lewis On Mon, 2006-09-25 at 17:54 -0400, Marc Morin wrote: Would it be possible to change the primary key to (logtime, key)? This could help keeping the working window small. No, the application accessing the data wants all the rows between start and end time for a particular key value. Secondly, the real working set is smaller, as the rows are all inserted at the end of the table, filling each page until it's full, so only the last pages are accessed. There's no relation between the index order and the order of data on disk, unless you CLUSTER. I'm theorizing that my problem is in updating the index itself and not the heap. 'Insert order' refers to the order by which the applications are inserting the rows and, as such, the order by which the index is being updated. This, in turn, is causing the b-tree to be traversed.
The problem is the working set of blocks at the bottom of the btree is too big for my cache. Any comment on other effects or gotchas with lowering the size of BLKSZ? Currently, our database is thrashing its cache of blocks and we're getting only ~100 inserts/second; every insert results in an evict-read-modify operation. I'm not sure that's the correct diagnosis. Do you have one transaction per insert? Every transaction means a forced sync to the disk, so you won't get more than about 100-200 commits per second, depending on your actual disk rotation speed. No, an insert consists of roughly 10,000+ rows per transaction block. To improve concurrency of the number of inserters running in parallel, try to tweak the config variables commit_delay and commit_siblings, so you get a higher overall throughput at the cost of an increased delay per connection, and increase the number of inserters. Using sensible tweaking, the throughput should scale nearly linearly with the number of backends. :-) I don't think this will help us here due to large transactions already. If feasible for your application, you can also bundle several log entries into a single transaction. If you're CPU bound, you can use COPY instead of INSERT or (if you can wait for 8.2) the new multi-row INSERT to further improve performance, but I doubt that you're CPU bound. The only way to really get over the sync limit is to have (at least) the WAL on a battery-backed RAM / SSD media that has no spinning-disk physical limit, or abandon crash safety by turning fsync off. Again, the problem is not with WAL writing; it's already on its own RAID1 disk pair. The I/O pattern we see is about 1-2% load on WAL and 100% load on the array holding the indexes and tables. Throughput is very low, something like 150K-200K bytes/second of real rows being deposited on the disk. The disks are busy seeking all over the disk platter to fetch a block, add a single row, then seek to another spot and write back a previously dirty buffer. Thanks, Markus.
-- Markus Schaber | Logical Tracking & Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in Europe! www.ffii.org www.nosoftwarepatents.org
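Mark's two-index suggestion above relies on the bitmap AND step: scan each single-column index separately, build a bitmap of candidate row ids from each, and intersect them before touching the heap. A toy sketch of that combining step (row ids, keys, and index contents are made up; a real bitmap index scan uses compact bitmaps rather than Python sets):

```python
def bitmap_from_index(index, key):
    # a real scan produces a bitmap over heap pages/rows; a set of
    # row ids expresses the same idea
    return set(index.get(key, []))

key_index     = {"k42": [3, 7, 9, 15]}          # index on (key)
logtime_index = {"2006-09-25": [7, 8, 9, 200]}  # index on (logtime)

# AND the two bitmaps: only rows matching both predicates survive
candidates = (bitmap_from_index(key_index, "k42")
              & bitmap_from_index(logtime_index, "2006-09-25"))
print(sorted(candidates))  # [7, 9] -- only these heap rows are fetched
```

This is why the two small indexes can substitute for one big composite index: each stays cheap to update on its own, and the intersection is done per query.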
Re: [PERFORM] PostgreSQL and sql-bench
Not to offend, but since most of us are PG users, we're not all that familiar with what the different tests in MySQL's sql-bench benchmark do. So you won't get very far by saying "PG is slow on benchmark X, can I make it faster?", because that doesn't include any of the information we need in order to help. Specifics would be nice, including at least the following: 1. Which specific test case(s) would you like to try to make faster? What do the table schemas look like, including indexes and constraints? 2. What strategy did you settle on for handling VACUUM and ANALYZE during the test? Have you confirmed that you aren't suffering from table bloat? 3. What are the actual results you got from the PG run in question? 4. What is the size of the data set referenced in the test run? -- Mark Lewis On Thu, 2006-09-21 at 07:52 -0700, yoav x wrote: Hi After upgrading DBI and DBD::Pg, this benchmark still picks MySQL as the winner (at least on Linux RH3 on a Dell 1875 server with 2 hyperthreaded 3.6GHz CPUs and 4GB RAM). I've applied the following parameters to postgresql.conf: max_connections = 500 shared_buffers = 3000 work_mem = 10 effective_cache_size = 30 Most queries still perform slower than with MySQL. Is there anything else that can be tweaked or is this a limitation of PG or the benchmark? Thanks.
Re: [PERFORM] Large tables (was: RAID 0 not as fast as
So this might be a dumb question, but the above statements apply to the cluster (e.g. postmaster) as a whole, not per postgres process/transaction correct? So each transaction is blocked waiting for the main postmaster to retrieve the data in the order it was requested (i.e. not multiple scanners/aio)? Each connection runs its own separate back-end process, so these statements apply per PG connection (=process). ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Optimizing DELETE
You do not have indexes on all of the columns which are linked by foreign key constraints. For example, let's say that I had a scientist table with a single column scientist_name and another table discovery which had scientist_name as a column with a foreign key constraint to the scientist table. If the system were to try to delete a row from the scientist table, then it would need to scan the discovery table for any row which referenced that scientist_name. If there is an index on the scientist_name column in the discovery table, this is a fast operation. In your case however, there most likely isn't an index on that column, so it needs to do a full table scan of the discovery table for each row deleted from the scientist table. If the discovery table has 100,000 rows, and there are 100 scientists, then deleting those 100 scientists would require scanning 100,000 * 100 = 10M records, so this sort of thing can quickly become a very expensive operation. Because of this potential for truly atrocious update/delete behavior, some database systems (SQL Server at least, and IIRC Oracle as well) either automatically create the index on discovery.scientist_name when the foreign key constraint is created, or refuse to create the foreign key constraint if there isn't already an index. PG doesn't force you to have an index, which can be desirable for performance reasons in some situations if you know what you're doing, but allows you to royally shoot yourself in the foot on deletes/updates to the parent table if you're not careful. If you have a lot of constraints and want to track down which one is unindexed, then doing an EXPLAIN ANALYZE of deleting a single row from the parent table will tell you how long each of the referential integrity checks takes, so you can figure out which indexes are missing. -- Mark Lewis On Tue, 2006-09-19 at 15:22 +0200, Ivan Voras wrote: I've just fired off a DELETE FROM table command (i.e. 
unfiltered DELETE) on a trivially small table but with many foreign key references (on similar-sized tables), and I'm waiting for it to finish. It's been 10 minutes now, which seems very excessive for a table of 9000 rows on a 3 GHz desktop machine. 'top' says it's all spent in USER time, and there's a ~500KB/s write rate going on. Just before this DELETE, I deleted data from a larger table (5 rows) using the same method and it finished in a couple of seconds - maybe it's a PostgreSQL bug? My question is: assuming it's not a bug, how do I optimize DELETEs? Increasing work_mem maybe? (I'm using PostgreSQL 8.1.4 on FreeBSD 6-amd64) (I know about TRUNCATE; I need those foreign key references to cascade) ---(end of broadcast)--- TIP 6: explain analyze is your friend ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
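The scientist/discovery scenario from the answer above can be sketched in SQL; the schema is the hypothetical one used in the explanation:

```sql
CREATE TABLE scientist (scientist_name varchar PRIMARY KEY);
CREATE TABLE discovery (
    discovery_name varchar,
    scientist_name varchar REFERENCES scientist (scientist_name)
);

-- PG does not create this index automatically; without it, every row
-- deleted from scientist triggers a full scan of discovery:
CREATE INDEX discovery_scientist_idx ON discovery (scientist_name);

-- To hunt down a missing FK index, time the referential-integrity checks
-- on a single-row delete; each RI trigger's time is reported separately:
EXPLAIN ANALYZE DELETE FROM scientist WHERE scientist_name = 'Curie';
```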
Re: [PERFORM] sql-bench
The last I checked (years ago), sql-bench was very synthetic (i.e. reflecting no realistic use case). It's the sort of test suite that's useful for database developers when testing the effects of a particular code change or optimization, but not so applicable to real-world uses. Historically the test was also bad for PG because it did nasty things like 10,000 inserts each in separate transactions, because the test was written for MySQL, which at the time didn't support transactions. Not sure if that's been fixed yet or not. Can you provide details about the schema and the queries that are slow? -- Mark On Wed, 2006-09-13 at 05:24 -0700, yoav x wrote: Hi I am trying to run sql-bench against PostgreSQL 8.1.4 on Linux. Some of the insert tests seem to be very slow For example: select_join_in Are there any tuning parameters that can be changed to speed these queries? Or are these queries especially tuned to show MySQL's strengths? ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Is this way of testing a bad idea?
Monitoring the processes using top reveals that the total amount of memory used slowly increases during the test. When reaching insert number 4, or somewhere around that, memory is exhausted, and the system begins to swap. Each of the postmaster processes seems to use a constant amount of memory, but the total memory usage increases all the same. So . . . what's using the memory? It doesn't sound like PG is using it, so is it your Java app? If it's the Java app, then it could be that your code isn't remembering to do things like close statements, or perhaps the max heap size is set too large for your hardware. With early RHEL3 kernels there was also a quirky interaction with Sun's JVM where the system swaps itself to death even when less than half the physical memory is in use. If it's neither PG nor Java, then perhaps you're misinterpreting the results of top. Remember that the free memory on a properly running Unix box that's been running for a while should hover just a bit above zero due to normal caching; read up on the 'free' command to see the actual memory utilization. -- Mark ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] PowerEdge 2950 questions
It's worse than that: if you need to read something that is not in the O/S cache, all the disks except for one need to be sent to a physical location in order to get the data. That's the basic rule with striping: it optimizes for sequential I/O at the expense of random I/O. There are some optimizations that can help, but not much. Caching by the controller can increase performance on writes because it can optimize the movement across the disks by instituting a delay between the write request and the actual write. RAID 1 (or 1+x) is the opposite. It allows the drive heads to move independently on reads when combined with some smart algorithms. Writes, however, must involve all the disk heads. Many controllers do not seem to optimize RAID 1 properly, although Linux software RAID seems to. A 4-disk RAID 1, for example, could deliver four times the seek performance, which would make it feel much faster than a 4-disk RAID 0 under certain conditions. I understand random mid-sized seeks (seek to x and read 512k) being slow on RAID 5, but if the read size is small enough not to cross a stripe boundary, this could be optimized to only one seek on one drive. Do most controllers just not do this, or is there some other reason that I'm not thinking of that would force all disks to seek? -- Mark ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] How to get higher tps
Well, at least on my test machines running gnome-terminal, my pgbench runs tend to get throttled by gnome-terminal's lousy performance to no more than 300 tps or so. Redirecting the output to /dev/null to throw away all the detailed logging gives me a 2-3x improvement in scores. Caveat: in my case the db is on the local machine, so who knows what all the interactions are. Also, when you initialized the pgbench db, what scaling factor did you use? And does running pgbench with -v improve performance at all? -- Mark On Tue, 2006-08-22 at 09:19 -0400, Marty Jia wrote: Joshua, Here it is: shared_buffers = 8 fsync = on max_fsm_pages = 35 max_connections = 1000 work_mem = 65536 effective_cache_size = 61 random_page_cost = 3 Here is the pgbench command I used: pgbench -c 10 -t 1 -d HQDB Thanks Marty -Original Message- From: Joshua D. Drake [mailto:[EMAIL PROTECTED] Sent: Monday, August 21, 2006 6:09 PM To: Marty Jia Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] How to get higher tps Marty Jia wrote: I've exhausted all the performance tuning ideas I could try, like the following parameters: shared_buffers fsync max_fsm_pages max_connections shared_buffers work_mem max_fsm_pages effective_cache_size random_page_cost I believe all of the above have the right sizes and values, but I just cannot get higher than 300 tps as tested by pgbench What values did you use? Here is our hardware Dual Intel Xeon 2.8GHz 6GB RAM Linux 2.4 kernel RedHat Enterprise Linux AS 3 200GB for PGDATA on 3Par, ext3 50GB for WAL on 3Par, ext3 With PostgreSQL 8.1.4 We don't have an i/o bottleneck. Are you sure? What does iostat say during a pgbench? What parameters are you passing to pgbench? Well, in theory, upgrading to a 2.6 kernel will help, as will making your WAL ext2 instead of ext3. What else can I try to get better tps? Someone told me I should get tps over 1500; it is hard to believe. 1500? Hmmm... I don't know about that, I can get 470 tps or so on my measly dual core 3800 with 2 gigs of ram though. Joshua D.
Drake Thanks Marty ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] How to get higher tps
Not much we can do unless you give us more info about how you're testing (pgbench setup), and what you've done with the parameters you listed below. It would also be useful if you told us more about your drive array than just "3Par". We need to know the RAID level, number/speed of disks, whether it's got a battery-backed write cache that's turned on, things like this. Like Jeff just said, it's likely that you're waiting for rotational latency, which would limit your maximum tps for sequential jobs based on the number of disks in your array. For example, a 2-disk array of 10k RPM disks is going to max out somewhere around 333 tps (2 * 10,000/60). -- Mark Lewis On Mon, 2006-08-21 at 16:45 -0400, Marty Jia wrote: I've exhausted all the performance tuning ideas I could try, like the following parameters: shared_buffers fsync max_fsm_pages max_connections shared_buffers work_mem max_fsm_pages effective_cache_size random_page_cost I believe all of the above have the right sizes and values, but I just cannot get higher than 300 tps as tested by pgbench Here is our hardware Dual Intel Xeon 2.8GHz 6GB RAM Linux 2.4 kernel RedHat Enterprise Linux AS 3 200GB for PGDATA on 3Par, ext3 50GB for WAL on 3Par, ext3 With PostgreSQL 8.1.4 We don't have an i/o bottleneck. What else can I try to get better tps? Someone told me I should get tps over 1500; it is hard to believe. Thanks Marty ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] PostgreSQL runs a query much slower than BDE and
MSSQL can give either a graphical query plan or a text-based one similar to PG's. There's no way that I've found to get the equivalent of an EXPLAIN ANALYZE, but I'm by no means an MSSQL guru. To get a neat-looking but not very useful graphical query plan from the Query Analyzer tool, hit Ctrl-L. To get the text-based one, execute SET SHOWPLAN_ALL ON, which toggles diagnostic mode on; each query that you run will return the explain plan instead of actually running, until you execute SET SHOWPLAN_ALL OFF. -- Mark Lewis On Thu, 2006-08-17 at 09:11 -0400, Tom Lane wrote: Peter Hardman [EMAIL PROTECTED] writes: I wonder whether Paradox and MySQL are just not doing the sort (this seems to be what eats up the time), since the output of the subquery is in fact already in the proper order. MSSQL (from the other thread). I feel fairly safe in assuming that MySQL's query optimizer is not nearly in the league to do this query effectively. (I like the theory Arjen mentioned that what you are measuring there is the effect of their query cache rather than a smart fundamental implementation.) I wonder whether MSSQL has an EXPLAIN equivalent ... Anyway, your point about the sort being redundant is a good one, and offhand I'd have expected PG to catch that; I'll have to look into why it didn't. But that's not going to explain a 10x speed difference, because the sort isn't 90% of the runtime. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
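The SHOWPLAN_ALL workflow described above looks roughly like this in Query Analyzer (the table name is made up; GO is the batch separator):

```sql
SET SHOWPLAN_ALL ON
GO
-- From here on, queries return their estimated plan instead of running:
SELECT * FROM orders WHERE order_id = 42
GO
SET SHOWPLAN_ALL OFF
GO
```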
Re: [PERFORM] Inner Join of the same table
Can you provide an EXPLAIN ANALYZE of the query in PG? Have you analyzed the PG database? How many rows is this query expected to return? Which version of PG are you running? What indexes have you defined? -- Mark On Tue, 2006-08-15 at 14:38 +, Sebastián Baioni wrote: Hello, I'm migrating from MS SQL Server to PostgreSQL 8.1 and I have a serious problem: Table: APORTES - Rows: 9,000,000 (9 million) *cuiT (char 11) *cuiL (char 11) *PERI (char 6) FAMI (numeric 6) I need all the cuiLs whose max(PERI) is from a cuiT, and the max(FAMI) of those cuiLs, so the statement is: SELECT DISTINCT T.cuiT, T.cuiL, U.MAXPERI, U.MAXFAMI FROM APORTES T INNER JOIN (SELECT cuiL, MAX(PERI) AS MAXPERI, MAX(FAMI) AS MAXFAMI FROM APORTES GROUP BY cuiL) AS U ON T.cuiL = U.cuiL AND T.PERI = U.MAXPERI WHERE T.cuiT = '12345678901' In MS SQL Server it takes 1 minute; in PostgreSQL for Windows it takes 40 minutes, and in PostgreSQL for Linux (FreeBSD) it takes 20 minutes. Do you know if there is any way to tune the server or optimize this statement? Thanks Sebastián Baioni ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Performance with 2 AMD/Opteron 2.6Ghz and 8gig
This isn't all that surprising. The main weaknesses of RAID-5 are poor write performance and stupid hardware controllers that make the write performance even worse than it needs to be. Your numbers bear that out. Reads off RAID-5 are usually pretty good. Your 'dd' test is going to be a little misleading though. Most DB access isn't usually purely sequential; while it's easy to see why HW RAID-5 might outperform HW-RAID-10 in large sequential reads (the RAID controller would need to be smarter than most to make RAID-10 as fast as RAID-5), I would expect that HW RAID-5 and RAID-10 random reads would be about equal or else maybe give a slight edge to RAID-10. -- Mark Lewis On Fri, 2006-07-28 at 13:31 -0400, Jeff Trout wrote: I too have a DL385 with a single DC Opteron 270. It claims to have a smart array 6i controller and over the last couple of days I've been runnign some tests on it, which have been yielding some suprising results. I've got 6 10k U320 disks in it. 2 are in a mirror set. We'll not pay any attention to them. The remaining 4 disks I've been toying with to see what config works best, using hardware raid and software raid. system info: dl dl385 - 1 opteron 270 - 5GB ram - smart array 6i cciss0: HP Smart Array 6i Controller Firmware Version: 2.58 Linux db03 2.6.17-1.2157_FC5 #1 SMP Tue Jul 11 22:53:56 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux using xfs Each drive can sustain 80MB/sec read (dd, straight off device) So here are the results I have so far. (averaged) hardware raid 5: dd - write 20GB file - 48MB/sec dd - read 20GB file - 247MB/sec [ didn't do a bonnie run on this yet ] pretty terrible write performance. good read. 
hardware raid 10 dd - write 20GB - 104MB/sec dd - read 20GB - 196MB/sec bonnie++ Version 1.03 --Sequential Output-- --Sequential Input- --Random- -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP db03 9592M 45830 97 129501 31 62981 14 48524 99 185818 19 949.0 1 software raid 5 dd - write 20GB - 85MB/sec dd - read 20GB - 135MB/sec I was very surprised at those results. I was sort of expecting it to smoke the hardware. I repeated the test many times, and kept getting these numbers. bonnie++: Version 1.03 --Sequential Output-- --Sequential Input- --Random- -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP db03 9592M 44110 97 81481 23 34604 10 44495 95 157063 28 919.3 1 software raid 10: dd - write - 20GB - 108MB/sec dd - read - 20GB - 86MB/sec (WTF? - this is repeatable!!) bonnie++ Version 1.03 --Sequential Output-- --Sequential Input- --Random- -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP db03 9592M 44539 98 105444 20 34127 8 39830 83 100374 10 1072 1 So I'm going to be going with hardware RAID 5, which went against what I thought going in - read perf is more important for my usage than write. I'm still not sure about that software RAID 10 read number. Something is not right there... -- Jeff Trout [EMAIL PROTECTED] http://www.dellsmartexitin.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] index usage
A volatile function may return a different result for each row; think of the random() or nextval() functions, for example. You wouldn't want them to return the same value for each row returned. -- Mark Lewis On Fri, 2006-07-28 at 13:59 -0700, Ben wrote: It's volatile, but it will always return an integer. On Fri, 28 Jul 2006, Daniel Caune wrote: From: [EMAIL PROTECTED] [mailto:pgsql-performance- [EMAIL PROTECTED] On behalf of Ben Sent: Friday, July 28, 2006 15:21 To: pgsql-performance@postgresql.org Subject: [PERFORM] index usage I have a table with 37000 rows, an integer column, and an index on that column. I've got a function that returns an integer. When I do a select where I restrict that column to being equal to a static number, explain tells me the index will be used. When I do the same thing but use the function instead of a static number, explain shows me a full scan on the table. I must be missing something, because my understanding is that the function will be evaluated once for the statement and then collapsed into a static number for the filtering. But the results of the explain seem to imply that's not the case? Is your function IMMUTABLE, STABLE or VOLATILE? -- Daniel ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
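A minimal sketch of how the volatility categories affect planning, using hypothetical table and function names (8.1-era quoted-body syntax):

```sql
-- VOLATILE (the default): may return a new value on every call, so the
-- planner cannot fold it to a constant and will not match it against an index.
CREATE FUNCTION pick_id() RETURNS integer AS
  'SELECT (random() * 37000)::integer' LANGUAGE sql VOLATILE;

-- STABLE: guaranteed not to change within a single statement, so the
-- planner may evaluate it once and probe the index on t.val with the result.
CREATE FUNCTION fixed_id() RETURNS integer AS
  'SELECT 12345' LANGUAGE sql STABLE;

SELECT * FROM t WHERE val = pick_id();   -- sequential scan
SELECT * FROM t WHERE val = fixed_id();  -- index scan possible
```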
[PERFORM] Savepoint performance
All, I support a system that runs on several databases including PostgreSQL. I've noticed that the other DB's always put an implicit savepoint before each statement executed, and roll back to that savepoint if the statement fails for some reason. PG does not, so unless you manually specify a savepoint you lose all previous work in the transaction. So my question is, how expensive is setting a savepoint in PG? If it's not too expensive, I'm wondering if it would be feasible to add a config parameter to psql or other client interfaces (thinking specifically of jdbc here) to do it automatically. Doing so would make it a little easier to work with PG in a multi-db environment. My main reason for wanting this is so that I can more easily import, say, 50 new 'objects' (related rows stored across several tables) in a transaction instead of only one at a time without fear that an error in one object would invalidate the whole batch. I could do this now by manually setting savepoints, but if it's not a big deal performance-wise to modify the JDBC driver to start an anonymous savepoint with each statement, then I'd prefer that approach as it seems that it would make life easier for other folks too. Thanks in advance for any feedback :) -- Mark Lewis ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
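The per-statement savepoint pattern being proposed looks like this in plain SQL; the objects table is hypothetical, with a unique constraint assumed on its first column:

```sql
BEGIN;
INSERT INTO objects (id, name) VALUES (1, 'first');    -- succeeds
SAVEPOINT stmt;
INSERT INTO objects (id, name) VALUES (1, 'dup');      -- fails: duplicate key
ROLLBACK TO SAVEPOINT stmt;                            -- undoes only the failed statement
INSERT INTO objects (id, name) VALUES (2, 'second');   -- transaction continues
COMMIT;                                                -- rows 1 and 2 are kept
```

Without the savepoint, the failed insert would abort the whole transaction, which is exactly the behavior described above.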
Re: [PERFORM] Commit slower on faster PC
The IDE drive is almost certainly lying about flushing data to the disk. Lower-end consumer drives often do. What this means is that commits will be a whole lot faster, but the database loses its ACID guarantees, because a power failure at the wrong moment could corrupt the whole database. If you don't care about your data and want the SCSI drive to perform fast just like the IDE drive, you can set fsync = off in your configuration file. -- Mark On Wed, 2006-07-12 at 10:16 -0600, Koth, Christian (DWBI) wrote: Hi, please help me with the following problem: I have noticed a strange performance behaviour using a commit statement on two different machines. On one of the machines the commit is many times faster than on the other machine which has faster hardware. Server and client are running always on the same machine. Server version (same on both machines): PostgreSQL 8.1.3. (same binaries as well) PC1: Pentium 4 (2.8 GHz) 1GB RAM IDE-HDD (approx. 50 MB/s rw), fs: ext3 Mandrake Linux: Kernel 2.4.22 PC2: Pentium 4 (3.0 GHz) 2GB RAM SCSI-HDD (approx. 
65 MB/s rw), fs: ext3 Mandrake Linux: Kernel 2.4.32 Both installations of the database have the same configuration, different from default are only the following settings on both machines: shared_buffers = 2 listen_addresses = '*' max_stack_depth = 4096 pgbench gives me the following results: PC1: transaction type: TPC-B (sort of) scaling factor: 1 number of clients: 1 number of transactions per client: 10 number of transactions actually processed: 10/10 tps = 269.905533 (including connections establishing) tps = 293.625393 (excluding connections establishing) PC2: transaction type: TPC-B (sort of) scaling factor: 1 number of clients: 1 number of transactions per client: 10 number of transactions actually processed: 10/10 tps = 46.061935 (including connections establishing) tps = 46.519634 (excluding connections establishing) My own performance test sql script which inserts and (auto)commits some data into a simple table produces the following log output in the server log: PC1: LOG: duration: 1.441 ms statement: INSERT INTO performance_test VALUES (500938362, 'Xawhefjmd'); STATEMENT: INSERT INTO performance_test VALUES (500938362, 'Xawhefjmd'); PC2: LOG: duration: 29.979 ms statement: INSERT INTO performance_test VALUES (500938362, 'Xawhefjmd'); STATEMENT: INSERT INTO performance_test VALUES (500938362, 'Xawhefjmd'); I created a 'strace' one both machines which is interesting: Opening the socket: --- PC1: socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 10 0.21 PC2: socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 8 0.15 PC1: bind(10, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr(0.0.0.0)}, 16) = 0 0.07 PC2: bind (8, {sin_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr(0.0.0.0)}}, 16) = 0 0.07 PC1: getsockname(10, {sa_family=AF_INET, sin_port=htons(32820), sin_addr=inet_addr(0.0.0.0)}, [16]) = 0 0.05 PC2: getsockname( 8, {sin_family=AF_INET, sin_port=htons(36219), sin_addr=inet_addr(0.0.0.0)}}, [16]) = 0 0.05 PC1: connect(10, {sa_family=AF_INET, 
sin_port=htons(5432), sin_addr=inet_addr(127.0.0.1)}, 16) = 0 0.000440 PC2: connect( 8, {sin_family=AF_INET, sin_port=htons(5432), sin_addr=inet_addr(127.0.0.1)}}, 16) = 0 0.000394 PC1: setsockopt(10, SOL_TCP, TCP_NODELAY, [1], 4) = 0 0.06 PC2: setsockopt (8, SOL_TCP, TCP_NODELAY, [1], 4) = 0 0.04 Inserting and commiting the data: exec. time - PC1: send(10, B\\0\0INSERT INTO performance_test VAL..., 175, 0) = 175 0.15 recv(10, 2\0\17INSERT 0 1\0Z\0\0\0\5T, 8192, 0) = 53 0.07 send(10, B\0\0\0\17\0S_2\0\0\0\0\0\0\0E\0\0\0\t\0\0\0\0\1S\0\0\0\4, 31, 0) = 31 0.11 recv(10, 2\0\0\0\4C\0\0\0\vCOMMIT\0Z\0\0\0\5I, 8192, 0) = 23 0.000211 PC2: send(8, B\\0\0INSERT INTO performance_test VAL..., 175, 0) = 175 0.14 recv(8, 2\0\17INSERT 0 1\0Z\0\0\0\5T, 8192, 0) = 53 0.05 send(8, B\0\0\0\17\0S_2\0\0\0\0\0\0\0E\0\0\0\t\0\0\0\0\1S\0\0\0\4, 31, 0) = 31 0.09 recv(8, 2\0\0\0\4C\0\0\0\vCOMMIT\0Z\0\0\0\5I, 8192, 0) = 23 0.0253 Every command is a bit faster on PC2 except the last one which is many times slower. Any help or hint where to look at would be highly appreciated because I'm running out of ideas ;-). regards, Christian ** The information contained in, or attached to, this e-mail, may contain confidential information and is intended solely for the use of the individual or entity to whom they are addressed and may be subject to legal privilege. If you have received this e-mail in error you should notify the sender immediately by reply e-mail, delete the message from your system and notify your system manager. Please do not copy it
Re: [PERFORM] Postgres consuming way too much memory???
On Thu, 2006-06-15 at 11:34 -0400, Tom Lane wrote: jody brownell [EMAIL PROTECTED] writes: When postgresql starts to go into this bloating state, I can only make it happen from my java app. That's interesting. The JDBC driver uses protocol features that aren't used by psql, so it's possible that the leak is triggered by one of those features. I wouldn't worry too much about duplicating the problem from psql anyway --- a Java test case will do fine. I am going to try closing the connection after each TX to see if this resolves it for now. If not, I will write a java app, stored procedure (table etc) reproduce it without our application. Just to mention another possible culprit; this one doesn't seem all that likely to me, but at least it's easy to investigate. With DBCP and non-ancient versions of the JDBC driver that use v3 protocol and real prepared statements, it is possible to (mis)configure the system to create an unbounded number of cached prepared statements on any particular connection. Older versions of DBCP were also known to have bugs which aggravated this issue when prepared statement caching was enabled, IIRC. -- Mark Lewis ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Optimizer internals
On Thu, 2006-06-15 at 14:05 -0400, John Vincent wrote: Now I've been told by our DBA that we should have been able to wholy satisfy that query via the indexes. DB2 can satisfy the query using only indexes because DB2 doesn't do MVCC. Although MVCC is generally a win in terms of making the database easier to use and applications less brittle, it also means that the database must inspect the visibility information for each row before it can answer a query. For most types of queries this isn't a big deal, but for count(*) type queries, it slows things down. Since adding the visibility information to indexes would make them significantly more expensive to use and maintain, it isn't done. Therefore, each row has to be fetched from the main table anyway. Since in this particular query you are counting all rows of the database, PG must fetch each row from the main table regardless, so the sequential scan is much faster because it avoids traversing the index and performing random read operations. -- Mark Lewis ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
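The effect described above is easy to see in a plan. The table name is hypothetical, but this is the plan shape PG typically chooses for an unqualified count:

```sql
-- Even with an index present, every row's visibility must be checked in
-- the heap, so the planner reads the whole table sequentially:
EXPLAIN SELECT count(*) FROM fact_table;
--  Aggregate
--    ->  Seq Scan on fact_table
```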
Re: [PERFORM] Optimizer internals
On Thu, 2006-06-15 at 14:46 -0400, John Vincent wrote: One question that we came up with is how does this affect other aggregate functions like MAX, MIN, SUM and whatnot? Being that this is our data warehouse, we use these all the time. As I've said previously, I didn't know a human could generate some of the queries we've passed through this system. Previously, MIN and MAX would also run slowly, for the same reason as COUNT(*). But there really isn't a need for that, since you can still get a big speedup by scanning the index in order, looking up each row and stopping as soon as you find a visible one. This has been fixed, so newer versions of PG will run quickly and use the index for MIN and MAX. I don't remember which version had that change; it might not be until 8.2. You can dig the archives to find out for sure. For older versions of PG before the fix, you can make MIN and MAX run quickly by rewriting them in the following form: SELECT column FROM table ORDER BY column LIMIT 1; (for MAX, order by the column DESC instead). Unfortunately SUM is in the same boat as COUNT; in order for it to return a meaningful result it must inspect visibility information for all of the rows. -- Mark ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] SAN performance mystery
On Thu, 2006-06-15 at 18:24 -0400, Tom Lane wrote: I agree with Brian's suspicion that the SATA drive isn't properly fsync'ing to disk, resulting in bogusly high throughput. However, ISTM a well-configured SAN ought to be able to match even the bogus throughput, because it should be able to rely on battery-backed cache to hold written blocks across a power failure, and hence should be able to report write-complete as soon as it's got the page in cache rather than having to wait till it's really down on magnetic platter. Which is what the SATA drive is doing ... only it can't keep the promise it's making for lack of any battery backup on its on-board cache. It really depends on your SAN RAID controller. We have an HP SAN; I don't remember the model number exactly, but we ran some tests and with the battery-backed write cache enabled, we got some improvement in write performance but it wasn't NEARLY as fast as an SATA drive which lied about write completion. The write-and-fsync latency was only about 2-3 times better than with no write cache at all. So I wouldn't assume that just because you've got a write cache on your SAN, that you're getting the same speed as fsync=off, at least for some cheap controllers. -- Mark Lewis ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Postgres fsync off (not needed) with NetApp
No. You need fsync on in order to force the data to get TO the NetApp at the right time. With fsync off, the data gets cached in the operating system. -- Mark Lewis On Wed, 2006-06-14 at 14:48 -0700, Dan Gorman wrote: All, So I thought I'd pose this question: If I have a pg database attached to a powervault (PV) with just an off-the-shelf SCSI card I generally want fsync on to prevent data corruption in case the PV should loose power. However, if I have it attached to a NetApp that ensures data writes to via the NVRAM can I safely turn fsync off to gain additional performance? Best Regards, Dan Gorman ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Initial database loading and IDE x SCSI
On Fri, 2006-06-02 at 15:25 -0300, [EMAIL PROTECTED] wrote: Hi, I would like to know if my supposition is right. Considering an environment with only one hard disk attached to a server, an initial loading of the database is probably much faster using an IDE/ATA interface with write-back on than using a SCSI interface. That's because of the SCSI command interface overhead. No, it's because the SCSI drive is honoring the database's request to make sure the data is safe. The main advantage of SCSI interfaces, the multiuser environment, is lost in this scenario. Am I right? Am I missing something here? Even if I'm right, is there something that could be done to improve SCSI loading performance in this scenario? You can perform the initial load in large transactions. The extra overhead for ensuring that data is safely written to disk will only be incurred once per transaction, so try to minimize the number of transactions. You could optionally set fsync=off in postgresql.conf, which means that the SCSI drive will operate with no more safety than an IDE drive. But you should only do that if you're willing to deal with catastrophic data corruption. But if this is for a desktop application where you need to support IDE drives, you'll need to deal with that anyway. Thanks in advance! Reimer ---(end of broadcast)--- TIP 6: explain analyze is your friend ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
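A sketch of the load-in-large-transactions advice above, with the fsync caveat noted as a config comment; the table name and file paths are hypothetical:

```sql
-- postgresql.conf, for the duration of the initial load only:
--   fsync = off   -- unsafe for normal operation; a crash can corrupt the db
-- Then do the load itself in as few transactions as possible, so the
-- sync cost is paid once per COMMIT rather than once per statement:
BEGIN;
COPY big_table FROM '/path/to/dump1.csv' WITH CSV;
COPY big_table FROM '/path/to/dump2.csv' WITH CSV;
COMMIT;
-- Afterwards, restore fsync = on and restart the server before normal use.
```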
Re: RES: RES: [PERFORM] Initial database loading and IDE x SCSI
On Fri, 2006-06-02 at 17:37 -0300, [EMAIL PROTECTED] wrote: Many thanks Mark, I will consider fsync=off only to do an initial load, not for normal database operation. This approach works well. You just need to remember to shut down the database and start it back up again with fsync enabled for it to be safe after the initial load. I was just thinking about this hypothetical scenario: a) a restore database operation b) fsync off c) write-back on (IDE) As I understand it, in this scenario it's normal for the IDE drive to be faster than the SCSI, ok? If fsync is off, then the IDE drive loses its big advantage, so IDE and SCSI should be about the same speed. Of course, the database is exposed because of the fsync=off, but if you consider only the system performance, then it is true. Isn't it? Thanks, Reimer
Re: [PERFORM] is it possible to make this faster?
On Thu, 2006-05-25 at 16:52 -0400, Tom Lane wrote: Merlin Moncure [EMAIL PROTECTED] writes: been doing a lot of pgsql/mysql performance testing lately, and there is one query that mysql does much better than pgsql...and I see it a lot in normal development: select a,b,max(c) from t group by a,b; t has an index on a,b,c. The index won't help, as per this comment from planagg.c: * We don't handle GROUP BY, because our current implementations of * grouping require looking at all the rows anyway, and so there's not * much point in optimizing MIN/MAX. Given the numbers you mention (300k rows in 2000 groups) I'm not convinced that an index-based implementation would help much; we'd still need to fetch at least one record out of every 150, which is going to cost near as much as seqscanning all of them. Well, if the MySQL server has enough RAM that the index is cached (or index + relevant chunks of data file if using InnoDB?) then that would explain how MySQL can use an index to get fast results. -- Mark Lewis
Re: [PERFORM] Getting even more insert performance (250m+rows/day)
The data gets inserted in batches every 5 minutes and I potentially have people querying it constantly, so I can't remove and re-create the index. How live does your data need to be? One possibility would be to use a separate table for each batch instead of a separate table per day, create the indexes after the import and only after the indexes have been created make the table available for user queries. You'd be trading latency for throughput in that case. Also, you mentioned that you're CPU-bound, but that you have multiple CPUs. In that case, performing N concurrent imports (where N is the number of processor cores available) might be a win over a single-threaded import. -- Mark Lewis
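The batch-table idea above might look like this sketch. SQLite stands in for PostgreSQL and all names are hypothetical; in PostgreSQL you would use CREATE TABLE, CREATE INDEX, and ALTER TABLE ... RENAME in the same way:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def import_batch(conn, batch_rows):
    # 1. Load into a staging table users cannot see yet.
    conn.execute("DROP TABLE IF EXISTS staging")
    conn.execute("CREATE TABLE staging (ts INTEGER, value TEXT)")
    conn.executemany("INSERT INTO staging VALUES (?, ?)", batch_rows)
    # 2. Retire the previous batch (this also drops its index), then
    #    build the new index only after the bulk load is done.
    conn.execute("DROP TABLE IF EXISTS batch_live")
    conn.execute("CREATE INDEX staging_idx ON staging (ts)")
    # 3. Publish by renaming into place; all inside one transaction.
    conn.execute("ALTER TABLE staging RENAME TO batch_live")
    conn.commit()

import_batch(conn, [(i, f"v{i}") for i in range(100)])
import_batch(conn, [(i, f"v{i}") for i in range(50)])   # repeatable
live = conn.execute("SELECT count(*) FROM batch_live").fetchone()[0]
print(live)  # 50
```

Queries against `batch_live` never see a half-loaded, unindexed table; the cost is that each batch becomes visible only after its index build finishes.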
Re: [PERFORM] performance question (something to do w/
Doing a SELECT with a large list of variables inside an IN runs slowly on every database we've tested. We've tested mostly in Oracle and PostgreSQL, and both get very slow very quickly (actually Oracle refuses to process the query at all after it gets too many bind parameters). In our case, we have a (potentially quite large) set of external values that we want to look up in the database. We originally thought that doing a single select with a large IN clause was the way to go, but then we did some performance analysis on the optimal batch size (number of items to include per IN clause), and discovered that for most databases, the optimal batch size was 1. For PostgreSQL I think it was 2. The moral of the story is that you're probably better off running a bunch of small selects than in trying to optimize things with one gargantuan select. -- Mark Lewis On Mon, 2006-05-08 at 13:29 -0400, Jeffrey Tenny wrote: Why does this query take so long? (PostgreSQL 8.0.3, FC4) Hopefully I have provided enough information below. 
LOG: statement: SELECT * FROM x WHERE f IN ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24,\ $25,$26,$27,$28,$29,$30,$31,$32,$33,$34,$35,$36,$37,$38,$39,$40,$41,$42,$43,$44,$45,$46,$47,$48,$49,$50,$51,$52,$53,$54,$55,$56,$57,$58,$59,$60,$61,$62,$63\ ,$64,$65,$66,$67,$68,$69,$70,$71,$72,$73,$74,$75,$76,$77,$78,$79,$80,$81,$82,$83,$84,$85,$86,$87,$88,$89,$90,$91,$92,$93,$94,$95,$96,$97,$98,$99,$100,$101,\ $102,$103,$104,$105,$106,$107,$108,$109,$110,$111,$112,$113,$114,$115,$116,$117,$118,$119,$120,$121,$122,$123,$124,$125,$126,$127,$128,$129,$130,$131,$132,\ $133,$134,$135,$136,$137,$138,$139,$140,$141,$142,$143,$144,$145,$146,$147,$148,$149,$150,$151,$152,$153,$154,$155,$156,$157,$158,$159,$160,$161,$162,$163,\ $164,$165,$166,$167,$168,$169,$170,$171,$172,$173,$174,$175,$176,$177,$178,$179,$180,$181,$182,$183,$184,$185,$186,$187,$188,$189,$190,$191,$192,$193,$194,\ $195,$196,$197,$198,$199,$200,$201,$202,$203,$204,$205,$206,$207,$208,$209,$210,$211,$212,$213,$214,$215,$216,$217,$218,$219,$220,$221,$222,$223,$224,$225,\ $226,$227,$228,$229,$230,$231,$232,$233,$234,$235,$236,$237,$238,$239,$240,$241,$242,$243,$244,$245,$246,$247,$248,$249,$250,$251,$252,$253,$254,$255,$256,\ $257,$258,$259,$260,$261,$262,$263,$264,$265,$266,$267,$268,$269,$270,$271,$272,$273,$274,$275,$276,$277,$278,$279,$280,$281,$282,$283,$284,$285,$286,$287,\ $288,$289,$290,$291,$292,$293,$294,$295,$296,$297,$298,$299,$300,$301,$302,$303,$304,$305,$306,$307,$308,$309,$310,$311,$312,$313,$314,$315,$316,$317,$318,\ $319,$320,$321,$322,$323,$324,$325,$326,$327,$328,$329,$330,$331,$332,$333,$334,$335,$336,$337,$338,$339,$340,$341,$342,$343,$344,$345,$346,$347,$348,$349,\ $350,$351,$352,$353,$354,$355,$356,$357,$358,$359,$360,$361,$362,$363,$364,$365,$366,$367,$368,$369,$370,$371,$372,$373,$374,$375,$376,$377,$378,$379,$380,\ 
$381,$382,$383,$384,$385,$386,$387,$388,$389,$390,$391,$392,$393,$394,$395,$396,$397,$398,$399,$400,$401,$402,$403,$404,$405,$406,$407,$408,$409,$410,$411,\ $412,$413,$414,$415,$416,$417,$418,$419,$420,$421,$422,$423,$424,$425,$426,$427,$428,$429,$430,$431,$432,$433,$434,$435,$436,$437,$438,$439,$440,$441,$442,\ $443,$444,$445,$446,$447,$448,$449,$450,$451,$452,$453,$454,$455,$456,$457,$458,$459,$460,$461,$462,$463,$464,$465,$466,$467,$468,$469,$470,$471,$472,$473,\ $474,$475,$476,$477,$478,$479,$480,$481,$482,$483,$484,$485,$486,$487,$488,$489,$490,$491,$492,$493,$494,$495,$496,$497,$498,$499,$500,$501,$502,$503,$504,\ $505,$506,$507,$508,$509,$510,$511,$512,$513,$514,$515,$516,$517,$518,$519,$520,$521,$522,$523,$524,$525,$526,$527,$528,$529,$530,$531,$532,$533,$534,$535,\ $536,$537,$538,$539,$540,$541,$542,$543,$544,$545,$546,$547,$548,$549,$550,$551,$552,$553,$554,$555,$556,$557,$558,$559,$560,$561,$562,$563,$564,$565,$566,\ $567,$568,$569,$570,$571,$572,$573,$574,$575,$576,$577,$578,$579,$580,$581,$582,$583,$584,$585,$586,$587,$588,$589,$590,$591,$592,$593,$594,$595,$596,$597,\ $598,$599,$600,$601,$602,$603,$604,$605,$606,$607,$608,$609,$610,$611,$612,$613,$614,$615,$616,$617,$618,$619,$620,$621,$622,$623,$624,$625,$626,$627,$628,\ $629,$630,$631,$632,$633,$634,$635,$636,$637,$638,$639,$640,$641,$642,$643,$644,$645,$646,$647,$648,$649,$650) ORDER BY f,c LOG: EXECUTOR STATISTICS DETAIL: ! system usage stats: ! 10.282945 elapsed 10.23 user 0.048992 system sec ! [25.309152 user 0.500923 sys total] ! 0/0 [0/0] filesystem blocks in/out ! 0/0 [0/10397] page faults/reclaims, 0 [0] swaps ! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent ! 0/15 [291/55] voluntary/involuntary context switches ! buffer usage stats: ! Shared blocks: 0 read, 0 written, buffer hit rate = 100.00% ! Local blocks: 0 read, 0 written, buffer hit rate = 0.00
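The "many small selects" moral above can be illustrated with a toy sketch (SQLite stands in for the database; the default batch size of 2 mirrors the value found for PostgreSQL in the informal tests mentioned, but the optimal size is workload-specific and should be measured):

```python
import sqlite3

# Look up an arbitrary external key set in fixed-size batches instead
# of one gargantuan IN list.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (f INTEGER PRIMARY KEY, c TEXT)")
conn.executemany("INSERT INTO x VALUES (?, ?)",
                 [(i, f"c{i}") for i in range(2000)])
conn.commit()

def lookup(conn, keys, batch_size=2):
    results = []
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        placeholders = ",".join("?" * len(batch))
        results.extend(conn.execute(
            f"SELECT f, c FROM x WHERE f IN ({placeholders})", batch))
    return results

rows = lookup(conn, list(range(650)))   # 650 keys, as in the log above
print(len(rows))  # 650
```

Each small query plans and executes cheaply, and the client simply concatenates the per-batch results.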
Re: [PERFORM] Slow restoration question
They are not equivalent. As I understand it, RAID 0+1 performs about the same as RAID 10 when everything is working, but degrades much less nicely in the presence of a single failed drive, and is more likely to suffer catastrophic data loss if multiple drives fail. -- Mark On Tue, 2006-05-02 at 12:40 -0600, Brendan Duddridge wrote: Everyone here always says that RAID 5 isn't good for Postgres. We have an Apple Xserve RAID configured with RAID 5. We chose RAID 5 because Apple said their Xserve RAID was optimized for RAID 5. Not sure if we made the right decision though. They give an option for formatting as RAID 0+1. Is that the same as RAID 10 that everyone talks about? Or is it the reverse? Thanks, Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED] ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com On May 2, 2006, at 11:16 AM, Jim C. Nasby wrote: On Wed, Apr 26, 2006 at 05:14:41PM +0930, Eric Lam wrote: all dumpfiles total about 17Gb. It has been running for 50ish hrs and up to about the fourth file (5-6 ish Gb) and this is on a raid 5 server. RAID5 generally doesn't bode too well for performance; that could be part of the issue. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Re: [PERFORM] hardare config question
A UPS will make it less likely that the system will reboot and destroy your database due to a power failure, but there are other causes for a system reboot. With a BBU, the only component that can fail and cause catastrophic data loss is the RAID itself. With a UPS, you are additionally vulnerable to OS crashes, failures in non-RAID hardware, UPS failures, or anything else that would necessitate a hard reboot. So a UPS is a decent replacement for a BBU only if you trust your app server/OS more than you value your data. -- Mark Lewis On Mon, 2006-05-01 at 10:58 -0700, Erik Myllymaki wrote: I have been in discussion with 3ware support and after adjusting some settings, the 3ware card in RAID 1 gets better performance than the single drive. I guess this had everything to do with the write (and maybe read?) cache. Of course now i am in a dangerous situation - using volatile write cache without a BBU. If I were to use a UPS to ensure a soft shutdown in the event of power loss, am I somewhat as safe as if I were to purchase a BBU for this RAID card? Thanks. Mark Lewis wrote: It's also possible that the single SATA drive you were testing (or the controller it was attached to) is lying about fsync and performing write caching behind your back, whereas your new controller and drives are not. You'll find a lot more info on the archives of this list about it, but basically if your application is committing a whole lot of small transactions, then it will run fast (but not safely) on a drive which lies about fsync, but slower on a better disk subsystem which doesn't lie about fsync. Try running a test with fsync=off with your new equipment and if it suddenly starts running faster, then you know that's the problem. You'll either have a choice of losing all of your data the next time the system shuts down uncleanly but being fast, or of running slow, or of fixing the applications to use chunkier transactions. 
-- Mark On Fri, 2006-04-28 at 13:36 -0400, Vivek Khera wrote: On Apr 28, 2006, at 11:37 AM, Erik Myllymaki wrote: When I had this installed on a single SATA drive running from the PE1800's on-board SATA interface, this operation took anywhere from 65-80 seconds. With my new RAID card and drives, this operation took 272 seconds!? switch it to RAID10 and re-try your experiment. if that is fast, then you know your raid controller does bad RAID5. anyhow, I have in one server (our office mail server and part-time development testing box) an adaptec SATA RAID from dell. it is configured for RAID5 and does well for normal office stuff, but when we do postgres tests on it, it just is plain old awful. but I have some LSI based cards on which RAID5 is plenty fast and suitable for the DB, but those are SCSI. For what it is worth, the Dell PE1850 internal PERC4/Si card is wicked fast when hooked up with a pair of U320 SCSI drives.
Re: [PERFORM] Hardware: HP StorageWorks MSA 1500
Hmmm. We use an MSA 1000 with Fibre Channel interconnects. No real complaints, although I was a little bit disappointed by the RAID controller's battery-backed write cache performance; tiny random writes are only about 3 times as fast with write caching enabled as with it disabled, I had (perhaps naively) hoped for more. Sequential scans from our main DB (on a 5-pair RAID 10 set with 15k RPM drives) get roughly 80MB/sec. Getting the redundant RAID controllers to fail over correctly on Linux was a big headache and required working the tech support phone all day until we finally got to the deep guru who knew the proper undocumented incantations. -- Mark Lewis On Thu, 2006-04-20 at 20:00 +0200, Mikael Carneholm wrote: We're going to get one for evaluation next week (equipped with dual 2Gbit HBA:s and 2x14 disks, iirc). Anyone with experience from them, performance wise? Regards, Mikael
Re: [PERFORM] Database possible corruption , unsolvable mystery
Can you post an explain analyze for the delete query? That will at least tell you if it is the delete itself which is slow, or a trigger / referential integrity constraint check. Which version of PG is this? -- Mark Lewis On Wed, 2006-03-29 at 12:58 -0500, Eric Lauzon wrote: Greetings, We have an issue where we have a database with many tables. The layout of the database is 3 sets of look-alike tables with different names. Each set of tables has some referential integrity that points back to the main control table. On two sets of tables we are able to efficiently delete referential and main records without a problem, but on the 3rd set we have an issue where the control table is clogged and deletes seem to take forever; as an example, on the first two sets a delete of 60K records takes about 4 to 10 seconds, but on the 3rd set it can take as long as 3 hours. This seems to only affect one database; the schema and way of doing things is replicated elsewhere and is efficient. The question is, even after dropping the 3rd set's integrity, dumping the table data, deleting the table, doing a manual checkpoint and recreating the table with the dump data, with or without referential integrity, the problem still arises. If we copy the data from the live table and do a create table aaa as select * from problematic_table; the table aaa operations are normal and efficient. This is why our investigation brought us to the following questions: 1. Are postgresql data file names hashed references to table names (as in Oracle)? [~path to data EX:/var/log/pgsql/data/[arbitrary numbers]/[datafile]]? 2. If the data files are corrupted and we re-create, is it possible it uses the same files, thus creating the same issue? 3. Since we know that all the tables have this problem, is there an internal table with undisclosed references to tables' data files? I hope the questions were clear. Have a good day, and thank you in advance.
Eric Lauzon [Recherche Développement] Above Sécurité / Above Security Tél : (450) 430-8166 Fax : (450) 430-1858 CONFIDENTIALITY NOTICE This communication is intended for the exclusive use of the addressee identified above. Its content is confidential and may contain privileged information. If you have received this communication by error, please notify the sender and delete the message without copying or disclosing it.
Re: [PERFORM] Postgres and Ingres R3 / SAN
On Tue, 2006-03-07 at 13:00 -0600, Jim C. Nasby wrote: ... PostgreSQL on a SAN won't buy you what I think you think it will. It's essentially impossible to safely run two PostgreSQL installs off the same data files without destroying your data. What a SAN can buy you is disk-level replication, but I've no experience with that. It is possible to run two instances against the same SAN using tools such as RedHat's Cluster Suite. We use that in-house as a cheap alternative for Oracle clustering, although we're not using it for our PostgreSQL servers yet. It's not for load balancing, just active/passive fault tolerance. -- Mark Lewis
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index
On Thu, 2006-02-16 at 12:15 -0500, Tom Lane wrote: Once or twice we've kicked around the idea of having some datatype-specific sorting code paths alongside the general-purpose one, but I can't honestly see this as being workable from a code maintenance standpoint. regards, tom lane It seems that instead of maintaining a different sorting code path for each data type, you could get away with one generic path and one (hopefully faster) path if you allowed data types to optionally support a 'sortKey' interface by providing a function f which maps inputs to 32-bit int outputs, such that the following two properties hold: if f(a) < f(b) then a < b; and if a == b then f(a) == f(b). So if a data type supports the sortKey interface you could perform the sort on f(value) and only refer back to the actual element comparison functions when two sortKeys have the same value. Data types which could probably provide a useful function for f would be int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII). Depending on the overhead, you might not even need to maintain 2 independent sort code paths, since you could always use f(x)=0 as the default sortKey function, which would degenerate to the exact same sort behavior in use today. -- Mark Lewis
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index
On Thu, 2006-02-16 at 17:51 -0500, Greg Stark wrote: Data types which could probably provide a useful function for f would be int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII). How exactly do you imagine doing this for text? I could see doing it for char(n)/varchar(n) where n <= 4 in SQL_ASCII though. In SQL_ASCII, just take the first 4 characters (or 8, if using a 64-bit sortKey as elsewhere suggested). The sorting key doesn't need to be a one-to-one mapping. -- Mark Lewis
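As a sketch of the prefix idea (a hypothetical Python illustration of the proposal, not PostgreSQL internals): map each SQL_ASCII string to an integer built from its first 4 bytes, sort on that cheap key, and fall back to full comparisons only among values whose keys tie:

```python
from itertools import groupby

def sort_key(s: str) -> int:
    # First 4 bytes, null-padded, interpreted big-endian so that
    # integer order matches lexicographic order of the prefix.
    b = s.encode("ascii", "replace")[:4].ljust(4, b"\x00")
    return int.from_bytes(b, "big")

def sortkey_sort(values):
    coarse = sorted(values, key=sort_key)          # cheap integer compares
    out = []
    for _, run in groupby(coarse, key=sort_key):   # ties share a prefix...
        out.extend(sorted(run))                    # ...so compare fully here
    return out

data = ["banana", "apple", "applesauce", "apricot", "app"]
print(sortkey_sort(data) == sorted(data))  # True
```

The key is deliberately not one-to-one: "apple" and "applesauce" collide, and the full comparison breaks the tie; distinct keys already imply the correct order.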
Re: [PERFORM] Reliability recommendations
Machine 1: $2000 Machine 2: $2000 Machine 3: $2000 Knowing how to rig them together and maintain them in a fully fault- tolerant way: priceless. (Sorry for the off-topic post, I couldn't resist). -- Mark Lewis On Wed, 2006-02-15 at 09:19 -0800, Craig A. James wrote: Jeremy Haile wrote: We are a small company looking to put together the most cost effective solution for our production database environment. Currently in production Postgres 8.1 is running on this machine: Dell 2850 2 x 3.0 Ghz Xeon 800Mhz FSB 2MB Cache 4 GB DDR2 400 Mhz 2 x 73 GB 10K SCSI RAID 1 (for xlog and OS) 4 x 146 GB 10K SCSI RAID 10 (for postgres data) Perc4ei controller ... I sent our scenario to our sales team at Dell and they came back with all manner of SAN, DAS, and configuration costing as much as $50k. Given what you've told us, a $50K machine is not appropriate. Instead, think about a simple system with several clones of the database and a load-balancing web server, even if one machine could handle your load. If a machine goes down, the load balancer automatically switches to the other. Look at the MTBF figures of two hypothetical machines: Machine 1: Costs $2,000, MTBF of 2 years, takes two days to fix on average. Machine 2: Costs $50,000, MTBF of 100 years (!), takes one hour to fix on average. Now go out and buy three of the $2,000 machines. Use a load-balancer front end web server that can send requests round-robin fashion to a server farm. Clone your database. In fact, clone the load-balancer too so that all three machines have all software and databases installed. Call these A, B, and C machines. At any given time, your Machine A is your web front end, serving requests to databases on A, B and C. If B or C goes down, no problem - the system keeps running. If A goes down, you switch the IP address of B or C and make it your web front end, and you're back in business in a few minutes. 
Now compare the reliability -- in order for this system to be disabled, you'd have to have ALL THREE computers fail at the same time. With the MTBF and repair time of two days, each machine has a 99.726% uptime. The MTBF, that is, the expected time until all three machines fail simultaneously, is well over 100,000 years! Of course, this is silly, machines don't last that long, but it illustrates the point: redundancy beats reliability (which is why RAID is so useful). All for $6,000. Craig
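Craig's arithmetic checks out as a back-of-the-envelope sketch (assumed figures from his example: MTBF of 2 years, mean repair time of 2 days; treating machine failures as independent is an idealization):

```python
# Availability of one machine and of a three-way redundant cluster.
mtbf_days = 2 * 365
mttr_days = 2

# Single-machine availability: up-time fraction over a failure cycle.
a_one = mtbf_days / (mtbf_days + mttr_days)   # ~0.9973, i.e. ~99.7%

# An outage requires all three machines down at once (independence assumed).
u_all = (1 - a_one) ** 3
a_cluster = 1 - u_all

print(f"one machine : {a_one:.5f}")
print(f"three-way   : {a_cluster:.12f}")
```

The cluster's unavailability is the cube of a single machine's, which is what turns days of expected downtime into effectively none.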
Re: [PERFORM] Query planner issue
You have lots of dead rows. Do a vacuum full to get it under control, then run VACUUM more frequently and/or increase your FSM settings to keep dead rows in check. In 7.2 vacuum is pretty intrusive; it will be much better behaved once you can upgrade to a more recent version. You really, really want to upgrade as soon as possible, and refer to the on-line docs about what to do with your FSM settings. -- Mark Lewis On Mon, 2006-01-30 at 23:57 +0100, Emmanuel Lacour wrote: Hi everybody, I have the following problem: on a test server, if I do a fresh import of production data and then run 'explain analyze select count(*) from mandats;' I get this result: Aggregate (cost=6487.32..6487.32 rows=1 width=0) (actual time=607.61..607.61 rows=1 loops=1) - Seq Scan on mandats (cost=0.00..6373.26 rows=45626 width=0) (actual time=0.14..496.20 rows=45626 loops=1) Total runtime: 607.95 msec On the production server, if I do the same (without other use of the server), I get: Aggregate (cost=227554.33..227554.33 rows=1 width=0) (actual time=230705.79..230705.79 rows=1 loops=1) - Seq Scan on mandats (cost=0.00..227440.26 rows=45626 width=0) (actual time=0.03..230616.64 rows=45760 loops=1) Total runtime: 230706.08 msec Does anyone have an idea how to solve this poor performance? I think it is caused by many deletes/inserts on this table every day, but how to solve it? I need to run this query each hour :(. I run vacuum each night; postgresql is unfortunately 7.2.1 :( (no upgrade before 2 or 3 months).
Re: [PERFORM] Suspending SELECTs
I am developing my applications in Objective Caml, and I have written the middleware layer myself. I could easily implement a cursor-pooling strategy, but there is no perfect solution to the problem of guaranteeing that cursors be closed. Remember that web applications require the user to open a session by connecting to the appropriate HTTP resource, but users are never required to log out. Hence, in order to eventually reclaim all cursors, I must use a magical log-out detection algorithm, which is usually implemented with a simple timeout. This guarantees the required property of safety (the population of cursors does not diverge) but does not guarantee the required property of liveness (a user connecting to the application, who has opened a session but has not logged out, and thus possesses a session token, should have access to the execution context identified by his token). I fail to see the problem here. Why should liveness be a required property? If it is simply that you can't promptly detect when a user is finished with their web session so you can free resources, then remember that there is no requirement that you dedicate a connection to their session in the first place. Even if you're using your own custom middleware, it isn't a very complicated or conceptually difficult thing to implement (see my previous post). Certainly it's simpler than allowing clients to pass around runtime state. As far as implementing this sort of thing in the back-end, it would be really hard with the PostgreSQL versioning model. Oracle can more easily (and kind of does) support cursors like you've described because they implement MVCC differently than PostgreSQL, and in their implementation you're guaranteed that you always have access to the most recent x megabytes of historical rows, so even without an open transaction to keep the required rows around you can still be relatively sure they'll be around for long enough.
In PostgreSQL, historical rows are kept in the tables themselves and periodically vacuumed, so there is no such guarantee, which means that you would need to either implement a lot of complex locking for little material gain, or just hold the cursors in moderately long-running transactions, which leads back to the solution suggested earlier. -- Mark Lewis
Re: [PERFORM] Suspending SELECTs
On Mon, 2006-01-16 at 11:13 +0100, Alessandro Baretta wrote: I am aware that what I am dreaming of is already available through cursors, but in a web application, cursors are bad boys, and should be avoided. What I would like to be able to do is to plan a query and run the plan to retrieve a limited number of rows as well as the executor's state. This way, the burden of maintaining the cursor on hold, between activations of the web resource which uses it, is transferred from the DBMS to the web application server, and, most importantly, the responsibility for garbage-collecting stale cursors is implicitly delegated to the garbage collector of active user sessions. Without this mechanism, we are left with two equally unpleasant solutions: first, any time a user instantiates a new session, a new cursor would have to be declared for all relevant queries, and an ad-hoc garbage collection daemon would have to be written to periodically scan the database for stale cursors to be closed; otherwise, instead of using cursors, the web application could resort to OFFSET-LIMIT queries--no garbage collection issues but pathetic performance and server load. Do we have any way out? Alex I know that Tom has pretty much ruled out any persistent cursor implementation in the database, but here's an idea for a workaround in the app: Have a pool of connections used for these queries. When a user runs a query the first time, create a cursor and remember that this user session is associated with that particular connection. When the user tries to view the next page of results, request that particular connection from the pool and continue to use the cursor. Between requests, this connection could of course be used to service other users. This avoids the awfulness of tying up a connection for the entire course of a user session, but still allows you to use cursors for performance.
When a user session is invalidated or times out, you remove the mapping for this connection and close the cursor. Whenever there are no more mappings for a particular connection, you can use the opportunity to close the current transaction (to prevent eternal transactions). If the site is at all busy, you will need to implement a pooling policy such as 'do not open new cursors on the connection with the oldest transaction', which will ensure that all transactions can be closed in a finite amount of time; the upper bound on the duration of a transaction is (longest_session_duration * connections in pool). Limitations: 1. You shouldn't do anything that acquires write locks on the database using these connections, because the transactions will be long-running. To mitigate this, use a separate connection pool. 2. Doesn't work well if some queries take a long time to run, because other users may need to wait for the connection, and another connection won't do. 3. If this is a busy web site, you might end up with potentially many thousands of open cursors. I don't know if this introduces an unacceptable performance penalty or other bottleneck in the server? -- Mark Lewis
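The pooling idea above can be sketched as a toy, with SQLite cursors standing in for server-side cursors and all class/table names invented for illustration: each session token is pinned to the connection that holds its open cursor, and pages are fetched incrementally across "requests".

```python
import sqlite3

class CursorPool:
    def __init__(self, db_uri, size=2):
        self.conns = [sqlite3.connect(db_uri, uri=True)
                      for _ in range(size)]
        self.session_cursors = {}   # session token -> open cursor
        self.next_conn = 0

    def start_query(self, token, sql, params=()):
        # Round-robin stands in for a real pooling policy.
        conn = self.conns[self.next_conn % len(self.conns)]
        self.next_conn += 1
        self.session_cursors[token] = conn.execute(sql, params)

    def next_page(self, token, page_size=25):
        return self.session_cursors[token].fetchmany(page_size)

    def end_session(self, token):            # on logout or timeout
        self.session_cursors.pop(token).close()

# Seed a shared in-memory database for the demo.
uri = "file:pooldemo?mode=memory&cache=shared"
seed = sqlite3.connect(uri, uri=True)
seed.execute("CREATE TABLE t (n INTEGER)")
seed.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(60)])
seed.commit()

pool = CursorPool(uri, size=2)
pool.start_query("sess-1", "SELECT n FROM t ORDER BY n")
page1 = pool.next_page("sess-1")   # rows 0..24
page2 = pool.next_page("sess-1")   # rows 25..49
pool.end_session("sess-1")
print(len(page1), page2[0][0])  # 25 25
```

A real implementation against PostgreSQL would DECLARE a cursor per session and FETCH on each page request, plus the transaction-aging policy described above.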
Re: [PERFORM] Extremely irregular query performance
If this is a query that will be executed more than once, you can also avoid incurring the planning overhead multiple times by using PREPARE. -- Mark Lewis On Wed, 2006-01-11 at 18:50 -0500, Jean-Philippe Côté wrote: Thanks a lot for this info, I was indeed exceeding the genetic optimizer's threshold. Now that it is turned off, I get a very stable response time of 435ms (more or less 5ms) for the same query. It is about three times slower than the best I got with the genetic optimizer on, but the overall average is much lower. I'll also try to play with the geqo parameters and see if things improve. Thanks again, J-P
Re: [PERFORM] help tuning queries on large database
Ron, A few days back you mentioned: Upgrade your kernel to at least 2.6.12 There's a known issue with earlier versions of the 2.6.x kernel and 64b CPUs like the Opteron. See kernel.org for details. I did some searching and couldn't find any obvious mention of this issue (I gave up after searching through the first few hundred instances of 64 in the 2.6.12 changelog). Would you mind being a little more specific about which issue you're talking about? We're about to deploy some new 16GB RAM Opteron DB servers and I'd like to check and make sure RH backported whatever the fix was to their current RHEL4 kernel. Thanks, Mark Lewis
Re: [PERFORM] browsing table with 2 million records
Do you have an index on the date column? Can you post an EXPLAIN ANALYZE for the slow query? -- Mark Lewis On Wed, 2005-10-26 at 13:41 -0700, aurora wrote: I am running Postgres 7.4 on FreeBSD. The main table has 2 million records (we would like to do at least 10 mil or more). It is mainly a FIFO structure with maybe 200,000 new records coming in each day that displace the older records. We have a GUI that lets users browse through the records page by page at about 25 records a time. (Don't ask me why but we have to have this GUI). This translates to something like select count(*) from table -- to give feedback about the DB size select * from table order by date limit 25 offset 0 Tables seem properly indexed, with vacuum and analyze run regularly. Still these very basic SQLs take up to a minute to run. I read some recent messages saying that select count(*) needs a table scan in Postgres. That's disappointing. But I can accept an approximation if there is some way to do so. But how can I optimize select * from table order by date limit x offset y? One minute response time is not acceptable. Any help would be appreciated. Wy
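One standard workaround (an aside, not from the thread itself) for the growing cost of ORDER BY date LIMIT 25 OFFSET y is "keyset" paging: remember the last (date, id) shown and continue from it, so every page is a short index range scan instead of skipping y rows. A hypothetical sketch with SQLite standing in:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, ts INTEGER)")
conn.executemany("INSERT INTO log (ts) VALUES (?)",
                 [(1000 + i,) for i in range(200)])
conn.execute("CREATE INDEX log_ts ON log (ts, id)")
conn.commit()

def page_after(conn, last_ts, last_id, page_size=25):
    # Tie-break on id so equal timestamps paginate deterministically.
    return conn.execute(
        "SELECT id, ts FROM log "
        "WHERE ts > ? OR (ts = ? AND id > ?) "
        "ORDER BY ts, id LIMIT ?",
        (last_ts, last_ts, last_id, page_size)).fetchall()

first = page_after(conn, -1, -1)                       # ts 1000..1024
second = page_after(conn, first[-1][1], first[-1][0])  # ts 1025..1049
print(first[0][1], second[0][1])  # 1000 1025
```

The trade-off is that you can only page forward/backward from a known row, not jump to an arbitrary page number.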
Re: [PERFORM] Is There Any Way ....
Which version of PG are you using? One of the new features for 8.0 was an improved caching algorithm that was smart enough to avoid letting a single big query sweep everything else out of cache. -- Mark Lewis On Tue, 2005-10-04 at 10:45 -0400, Lane Van Ingen wrote: Yes, Stefan, the kind of usage you are mentioning is exactly why I was asking. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Stefan Weiss Sent: Tuesday, October 04, 2005 6:32 AM To: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Is There Any Way On 2005-09-30 01:21, Lane Van Ingen wrote: (3) Assure that a disk-based table is always in memory (outside of keeping it in memory buffers as a result of frequent activity which would prevent LRU operations from taking it out) ? I was wondering about this too. IMO it would be useful to have a way to tell PG that some tables were needed frequently, and should be cached if possible. This would allow application developers to consider joins with these tables as cheap, even when querying on columns that are not indexed. I'm thinking about smallish tables like users, groups, *types, etc which would be needed every 2-3 queries, but might be swept out of RAM by one large query in between. Keeping a table like users on a RAM fs would not be an option, because the information is not volatile. cheers, stefan
Re: [PERFORM] Releasing memory during External sorting?
operations != passes. If you were clever, you could probably write a modified bubble-sort algorithm that only made 2 passes. A pass is a disk scan, operations are then performed (hopefully in memory) on what you read from the disk. So there's no theoretical log N lower-bound on the number of disk passes. Not that I have anything else useful to add to this discussion, just a tidbit I remembered from my CS classes back in college :) -- Mark On Fri, 2005-09-23 at 13:17 -0400, Tom Lane wrote: Ron Peacetree [EMAIL PROTECTED] writes: 2= No optimal external sorting algorithm should use more than 2 passes. 3= Optimal external sorting algorithms should use 1 pass if at all possible. A comparison-based sort must use at least N log N operations, so it would appear to me that if you haven't got approximately log N passes then your algorithm doesn't work. regards, tom lane
Re: [PERFORM] Poor performance of delete by primary key
I had a similar problem, so I downloaded 8.1 from CVS, ran it on a relatively gnarly dev workstation, imported a dump of my 8.0 database, and ran my troublesome queries with the new EXPLAIN ANALYZE. This process took about an hour and worked great, provided that you've actually named your foreign key constraints. Otherwise, you'll find out that there's a trigger for a constraint called $3 that's taking up all of your time, but you won't know what table that constraint is on. -- Mark On Tue, 2005-09-06 at 11:32 -0400, Tom Lane wrote: Brian Choate [EMAIL PROTECTED] writes: We are seeing a very strange behavior from postgres. For one of our very common tasks we have to delete records from a table of around 500,000 rows. The delete is by id which is the primary key. It seems to be consistently taking around 10 minutes to perform. This is totally out of line with the rest of the performance of the database. I'll bet this table has foreign-key references from elsewhere, and the referencing columns are either not indexed, or not of the same datatype as the master column. Unfortunately there's no very simple way to determine which FK is the problem. (In 8.1 it'll be possible to do that with EXPLAIN ANALYZE, but in existing releases EXPLAIN doesn't break out the time spent in each trigger ...) You have to just eyeball the schema :-(. regards, tom lane
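For reference, giving the constraint an explicit name (so that the per-trigger lines in the 8.1 EXPLAIN ANALYZE output read as something meaningful rather than a generated name like $3) is just a matter of declaring it. A hypothetical sketch -- the table and column names below are invented for illustration, not taken from the thread:

```sql
-- With an explicit constraint name, EXPLAIN ANALYZE's per-trigger
-- lines report "orders_customer_fk", which tells you exactly which
-- table and relationship is eating the time.
ALTER TABLE orders
  ADD CONSTRAINT orders_customer_fk
  FOREIGN KEY (customer_id) REFERENCES customers (id);
```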
Re: [PERFORM] PG8 Tuning
(Musing, trying to think of a general-purpose performance-tuning rule that applies here): Actually, it seems to me that with the addition of the WAL in PostgreSQL and the subsequent decreased need to fsync the data files themselves (only during checkpoints?), that the only time a battery-backed write cache would make a really large performance difference would be on the drive(s) hosting the WAL. So although it is in general good to have a dedicated spindle for the WAL, for many workloads it is in fact significantly better to have the WAL written to a battery-backed write cache. The exception would be for applications with fewer, larger transactions, in which case you could actually use the dedicated spindle. Hmmm, on second thought, now I think I understand the rationale behind having a non-zero commit delay setting-- the problem with putting pg_xlog on a single disk without a write cache is that frequent fsync() calls might cause it to spend most of its time seeking instead of writing (as seems to be happening to Paul here). Then again, the OS IO scheduler should take care of this for you, making this a non-issue. Perhaps Solaris 10 just has really poor IO scheduling performance with this particular hardware and workload? Ah well. Thought myself in circles and have no real conclusions to show for it. Posting anyway, maybe this will give somebody some ideas to work with. -- Mark Lewis On Fri, 2005-08-12 at 08:47 +, Steve Poe wrote: Paul, Before I say anything else, one online document which may be of assistance to you is: http://www.powerpostgresql.com/PerfList/ Some thoughts I have: 3) Your shared RAM setting seems overkill to me. Part of the challenge is you're going from 1000 to 262K with no assessment in between. Each situation can be different, but try in the range of 10 - 50K. 4) pg_xlog: If your pg_xlog is on a spindle that is *only* for pg_xlog, you're better off. If it is sharing with any other OS/DB resource, the performance will be impacted.
From what I have learned from others on this list, RAID5 is not the best choice for the database. RAID10 would be a better solution (using 8 of your disks) then take the remaining disk and do mirror with your pg_xlog if possible. Best of luck, Steve Poe On Thu, 2005-08-11 at 13:23 +0100, Paul Johnson wrote: Hi all, we're running PG8 on a Sun V250 with 8GB RAM and 2*1.3GHz SPARC CPUs running Solaris 10. The DB cluster is on an external fibre-attached Sun T3 array that has 9*36GB drives configured as a single RAID5 LUN. The system is for the sole use of a couple of data warehouse developers, hence we are keen to use 'aggressive' tuning options to maximise performance. So far we have made the following changes and measured the impact on our test suite: 1) Increase checkpoint_segments from 3 to 64. This made a 10x improvement in some cases. 2) Increase work_mem from 1,024 to 524,288. 3) Increase shared_buffers from 1,000 to 262,143 (2 GB). This required setting SHMMAX=4294967295 (4 GB) in /etc/system and re-booting the box. Question - can Postgres only use 2GB RAM, given that shared_buffers can only be set as high as 262,143 (8K pages)? So far so good... 4) Move /pg_xlog to an internal disk within the V250. This has had a severe *negative* impact on performance. Copy job has gone from 2 mins to 12 mins, simple SQL job gone from 1 min to 7 mins. Not even run long SQL jobs. I'm guessing that this is because pg_xlog has gone from a 9 spindle LUN to a single spindle disk? In cases such as this, where an external storage array with a hardware RAID controller is used, the normal advice to separate the data from the pg_xlog seems to come unstuck, or are we missing something? Cheers, Paul Johnson.
Re: [PERFORM] faster INSERT with possible pre-existing row?
Easier and faster than doing the custom trigger is to simply define a unique index and let the DB enforce the constraint with an index lookup, something like: create unique index happy_index ON happy_table(col1, col2, col3); That should run faster than the custom trigger, but not as fast as the temp table solution suggested elsewhere because it will need to do an index lookup for each row. With this solution, it is important that your shared_buffers are set high enough that the happy_index can be kept in memory, otherwise performance will drop precipitously. Also, if you are increasing the size of the table by a large percentage, you will want to ANALYZE periodically, as an optimal plan for a small table may be a disaster for a large table, and PostgreSQL won't switch plans unless you run ANALYZE. -- Mark On Tue, 2005-07-26 at 14:51 -0500, John A Meinel wrote: Matthew Nuzum wrote: On 7/26/05, Dan Harris [EMAIL PROTECTED] wrote: I am working on a process that will be inserting tens of millions of rows and need this to be as quick as possible. The catch is that for each row I could potentially insert, I need to look and see if the relationship is already there to prevent multiple entries. Currently I am doing a SELECT before doing the INSERT, but I recognize the speed penalty in doing two operations. I wonder if there is some way I can say insert this record, only if it doesn't exist already. To see if it exists, I would need to compare 3 fields instead of just enforcing a primary key. I struggled with this for a while. At first I tried stored procedures and triggers, but it took very long (over 24 hours for my dataset). After several iterations of rewriting it, first into C# then into Python I got the whole process down to under 30 min. My scenario is this: I want to normalize log data. For example, for the IP address in a log entry, I need to look up the unique id of the IP address, or if the IP address is new, insert it and then return the newly created entry.
Multiple processes use the data, but only one process, run daily, actually changes it. Because this one process knows that the data is static, it selects the tables into in-memory hash tables (C#) or Dictionaries (Python) and then does the lookups there. It is *super* fast, but it uses a *lot* of ram. ;-) To limit the ram, I wrote a version of the python code that uses gdbm files instead of Dictionaries. This requires a newer version of Python (to allow a gdbm db to work just like a dictionary) but makes life easier in case someone is using my software on a lower end machine. This doubled the time of the lookups from about 15 minutes to 30, bringing the whole process to about 45 minutes. Did you ever try the temp table approach? You could: COPY all records into temp_table, with an empty row for ip_id -- Get any entries which already exist UPDATE temp_table SET ip_id = (SELECT ip_id from ipaddress WHERE add=add) WHERE EXISTS (SELECT ip_id FROM ipaddress WHERE add=add); -- Create new entries INSERT INTO ipaddress(add) SELECT add FROM temp_table WHERE ip_id IS NULL; -- Update the rest UPDATE temp_table SET ip_id = (SELECT ip_id from ipaddress WHERE add=add) WHERE ip_id IS NULL AND EXISTS (SELECT ip_id FROM ipaddress WHERE add=add); This would let the database do all of the updating work in bulk on its side, rather than you pulling all the data out and doing it locally. An alternative would be something like: CREATE TEMP TABLE new_ids (address text, ip_id int); COPY all potentially new addresses into that table. -- Delete all entries which already exist DELETE FROM new_ids WHERE EXISTS (SELECT ip_id FROM ipaddresses WHERE add=new_ids.address); -- Now create the new entries INSERT INTO ipaddresses(add) SELECT address FROM new_ids; -- At this point you are guaranteed to have all addresses existing in -- the database If you then insert your full data into the final table, only leave the ip_id column as null.
Then if you have a partial index where ip_id is NULL, you could use the command: UPDATE final_table SET ip_id = (SELECT ip_id FROM ipaddresses WHERE add=final_table.add) WHERE ip_id IS NULL; You could also do this in a temporary table, before bulk inserting into the final table. I don't know what you have tried, but I know that for Dan, he easily has 36M rows. So I don't think he wants to pull that locally and create a in-memory hash just to insert 100 rows or so. Also, for your situation, if you do keep a local cache, you could certainly save the cache between runs, and use a temp table to determine what new ids you need to add to it. Then you wouldn't have to pull the complete set each time. You just pull new values for entries you haven't added yet. John =:-
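The unique-index approach from the top of the thread can be sketched in a few lines, using SQLite in Python purely for illustration; the happy_table/happy_index names come from the example above, and in PostgreSQL the duplicate insert would raise a unique-violation error to catch instead of sqlite3.IntegrityError:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE happy_table (col1 TEXT, col2 TEXT, col3 TEXT)")
# The unique index does the duplicate check as a single index lookup
# on insert, so no SELECT-before-INSERT round trip is needed.
conn.execute(
    "CREATE UNIQUE INDEX happy_index ON happy_table (col1, col2, col3)")

rows = [("a", "b", "c"), ("a", "b", "c"), ("x", "y", "z")]
inserted = 0
for row in rows:
    try:
        conn.execute("INSERT INTO happy_table VALUES (?, ?, ?)", row)
        inserted += 1
    except sqlite3.IntegrityError:
        pass  # duplicate on (col1, col2, col3): the unique index rejected it
```

Of the three candidate rows, only the two distinct ones land in the table; the duplicate is rejected by the index rather than by application logic.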
Re: [PERFORM] Need help to decide Mysql vs Postgres
On Fri, 2005-07-08 at 16:43 +0200, Enrico Weigelt wrote: * PFC [EMAIL PROTECTED] wrote: snip For Python it's the reverse : the MySQL driver is slow and dumb, and the postgres driver (psycopg 2) is super fast, handles all quoting, and knows about type conversions, it will automatically convert a Python List into a postgres Array and do the right thing with quoting, and it works both ways (ie you select a TEXT[] you get a list of strings all parsed for you). It knows about all the postgres types (yes even numeric = python Decimal) and you can even add your own types. That's really cool, plus the developer is a friendly guy. Is there anything similar for java ? The postgres JDBC driver is very good-- refer to the pgsql-jdbc mailing list or look at jdbc.postgresql.org. I've had only limited experience with the mysql jdbc driver, but it seemed serviceable enough, if you can live with their licensing and feature set.
Re: [PERFORM] Needed: Simplified guide to optimal memory
We run the RPM's for RH 7.3 on our 7.2 install base with no problems. RPM's as recent as for PostgreSQL 7.4.2 are available here: ftp://ftp10.us.postgresql.org/pub/postgresql/binary/v7.4.2/redhat/redhat-7.3/ Or you can always compile from source. There isn't any such thing as a 'supported' package for RH7.2 anyway. -- Mark Lewis On Thu, 2005-06-16 at 07:46 -0700, Todd Landfried wrote: Yes, it is 7.2. Why? because an older version of our software runs on RH7.3 and that was the latest supported release of Postgresql for RH7.3 (that we can find). We're currently ported to 8, but we still have a large installed base with the other version. On Jun 15, 2005, at 7:18 AM, Tom Lane wrote: Dennis Bjorklund [EMAIL PROTECTED] writes: On Wed, 15 Jun 2005, Todd Landfried wrote: NOTICE: shared_buffers is 256 This looks like it's way too low. Try something like 2048. It also is evidently PG 7.2 or before; SHOW's output hasn't looked like that in years. Try a more recent release --- there's usually nontrivial performance improvements in each major release. regards, tom lane
Re: [PERFORM] [GENERAL] Hash index vs. b-tree index (PostgreSQL
If the original paper was published in 1984, then it's been more than 20 years. Any potential patents would already have expired, no? -- Mark Lewis On Tue, 2005-05-10 at 14:35, Mischa Sandberg wrote: Quoting Jim C. Nasby [EMAIL PROTECTED]: Well, in a hash-join right now you normally end up feeding at least one side of the join with a seqscan. Wouldn't it speed things up considerably if you could look up hashes in the hash index instead? You might want to google on grace hash and hybrid hash. The PG hash join is the simplest possible: build a hash table in memory, and match an input stream against it. *Hybrid hash* is where you spill the hash to disk in a well-designed way. Instead of thinking of it as building a hash table in memory, think of it as partitioning one input; if some or all of it fits in memory, all the better. The boundary condition is the same. The real wizard of hybrid hash has to be Goetz Graefe, who sadly has now joined the MS Borg. He demonstrated that for entire-table joins, hybrid hash completely dominates sort-merge. MSSQL now uses what he developed as an academic, but I don't know what the patent state is. Grace hash is the original implementation of hybrid hash: Kitsuregawa, M., Tanaka, H., and Moto-oka, T. (1984). Architecture and Performance of Relational Algebra Machine Grace.
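The "simplest possible" scheme Mischa describes -- build a hash table in memory, then match an input stream against it -- can be sketched directly. This is only an illustration of the textbook in-memory hash join, not PostgreSQL's executor code; grace/hybrid hash differ in that they first partition both inputs to disk when the build side does not fit in memory:

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Textbook in-memory hash join: one pass to build, one pass to probe.
    Rows are tuples; build_key/probe_key are column positions."""
    table = defaultdict(list)
    for row in build_rows:            # build phase over the (smaller) input
        table[row[build_key]].append(row)
    joined = []
    for row in probe_rows:            # probe phase streams the other input
        for match in table.get(row[probe_key], ()):
            joined.append(match + row)
    return joined
```

Each input is scanned exactly once, which is why a hash join can beat repeatedly probing an index when joining whole tables.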
Re: [PERFORM] [NOVICE] Many connections lingering
If there are potentially hundreds of clients at a time, then you may be running into the maximum connection limit. In postgresql.conf, there is a max_connections setting which IIRC defaults to 100. If you try to open more concurrent connections to the backend than that, you will get a connection refused. If your DB is fairly gnarly and your performance needs are minimal it should be safe to increase max_connections. An alternative approach would be to add some kind of database broker program. Instead of each agent connecting directly to the database, they could pass their data to a broker, which could then implement connection pooling. -- Mark Lewis On Tue, 2005-04-12 at 22:09, Slavisa Garic wrote: This is a serious problem for me as there are multiple users using our software on our server and I would want to avoid having connections open for a long time. In the scenario mentioned below I haven't explained the magnitute of the communications happening between Agents and DBServer. There could possibly be 100 or more Agents per experiment, per user running on remote machines at the same time, hence we need short transactions/pgsql connections. Agents need a reliable connection because failure to connect could mean a loss of computation results that were gathered over long periods of time.
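The broker idea reduces to a small connection pool: agents borrow from a fixed set of connections instead of each opening their own, keeping the server-side count under max_connections. A minimal sketch -- the `connect` argument is a caller-supplied factory (a placeholder here, not a real PostgreSQL client call), and production code would use an established pooler rather than this:

```python
import queue

class SimplePool:
    """Minimal fixed-size connection pool (the 'broker' from the thread).
    All connections are created up front; acquire() blocks when they
    are all checked out, so the database never sees more than `size`
    concurrent connections."""

    def __init__(self, connect, size):
        self._q = queue.Queue()
        for _ in range(size):
            self._q.put(connect())  # connect() is a caller-supplied factory

    def acquire(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        return self._q.get(timeout=timeout)

    def release(self, conn):
        self._q.put(conn)
```

A thread that needs the database wraps its work in acquire()/release(); the pool's queue provides the locking, so this is safe to share across threads.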
Re: [PERFORM] Delete query takes exorbitant amount of time
Tom Lane Wrote: Hmm ... I wonder how hard it would be to teach EXPLAIN ANALYZE to show the runtime expended in each trigger when the statement is of a kind that has triggers. We couldn't break down the time *within* the triggers, but even this info would help a lot in terms of finger pointing ... Seq Scan on ... (nn.nnn ms) Trigger foo: nn.mmm ms Trigger bar: nn.mmm ms Total time: nn.mmm ms So I got the latest from CVS on Friday night to see how hard it would be to implement this, but it turns out that Tom has already committed the improvement, so I'm in Tom's fan club today. I imported my test dataset and was almost immediately able to track down the cause of my performance problem. Thanks! Mark Lewis
Re: [PERFORM] Delete query takes exorbitant amount of time
Tom, I've got a similar problem with deletes taking a very long time. I know that there are lots of foreign keys referencing this table, and other foreign keys referencing those tables, etc. I've been curious, is there a way to find out how long the foreign key checks take for each dependent table? -- Mark Lewis On Thu, 2005-03-24 at 16:52, Tom Lane wrote: Karim Nassar [EMAIL PROTECTED] writes: Here is the statement: orfs=# explain analyze DELETE FROM int_sensor_meas_type WHERE id_meas_type IN (SELECT * FROM meas_type_ids); QUERY PLAN - Hash Join (cost=11.53..42.06 rows=200 width=6) (actual time=1.564..2.840 rows=552 loops=1) ... Total runtime: 2499616.216 ms (7 rows) Notice that the actual join is taking 2.8 ms. The other ~40 minutes is in operations that we cannot see in this plan, but we can surmise are ON DELETE triggers. Where do I go from here? Look at what your triggers are doing. My private bet is that you have unindexed foreign keys referencing this table, and so each deletion forces a seqscan of some other, evidently very large, table(s). regards, tom lane
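When the culprit is found, the usual fix is to index the referencing columns: PostgreSQL automatically indexes the referenced primary key, but not the referencing side, so each ON DELETE trigger otherwise scans the child table. A hypothetical example -- the `measurement` table and its column are invented here, since the actual child tables are not shown in the thread:

```sql
-- Each row deleted from int_sensor_meas_type fires a trigger that must
-- find the rows referencing it; without an index on the referencing
-- column, that lookup is a sequential scan of the child table per row.
CREATE INDEX measurement_meas_type_idx
    ON measurement (int_sensor_meas_type_id);
```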