Re: [PERFORM] Postgres Benchmark Results
I assume red is PostgreSQL. As you add connections, MySQL always dies. On 5/20/07, PFC <[EMAIL PROTECTED]> wrote: I felt the world needed a new benchmark ;) So: Forum-style benchmark with simulation of many users posting and viewing forums and topics on a PHP website. http://home.peufeu.com/ftsbench/forum1.png One of those curves is "a very popular open-source database which claims to offer unparallelled speed". The other one is of course Postgres 8.2.3, which by popular belief is "full-featured but slow". What is your guess? ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] MVCC performance issue
In reading what you are describing, don't you think PG 9 goes a long way to helping you out? On Sat, Nov 13, 2010 at 12:53 AM, Craig Ringer wrote: > On 11/12/2010 02:25 AM, Kyriacos Kyriacou wrote: > > The >> result is to have huge fragmentation on table space, unnecessary updates >> in all affected indexes, unnecessary costly I/O operations, poor >> performance on SELECT that retrieves big record sets (i.e. reports etc) >> and slower updates. >> > > Yep. It's all about trade-offs. For some workloads the in-table MVCC > storage setup works pretty darn poorly, but for most it seems to work quite > well. > > There are various other methods of implementing relational storage with > ACID properties. You can exclude all other transactions while making a > change to a table, ensuring that nobody else can see "old" or "new" rows so > there's no need to keep them around. You can use an out-of-line redo log > (a-la Oracle). Many other methods exist, too. > > They all have advantages and disadvantages for different workloads. It's > far from trivial to mix multiple schemes within a single database, so mixing > and matching schemes for different parts of your DB isn't generally > practical. > > > 1) When a raw UPDATE is performed, store all "new raw versions" either >> in separate temporary table space >>or in a reserved space at the end of each table (can be allocated >> dynamically) etc >> > > OK, so you want a redo log a-la Oracle? > > > 2) Any SELECT queries within the same session will be again accessing >> the new version of the row >> 3) Any SELECT queries from other users will still be accessing the old >> version >> > > ... and incurring horrible random I/O penalties if the redo log doesn't fit > in RAM. Again, a-la Oracle. > > Even read-only transactions have to hit the undo log if there's an update > in progress, because rows they need may have been moved out to the undo log > as they're updated in the main table storage. > > [snip description] > > > I understand that my suggestion seems to be too simplified and also that >> there are many implementation details and difficulties that I am not >> aware. >> > > It sounds like you're describing Oracle-style MVCC, using redo logs. > > > http://blogs.sybase.com/database/2009/04/mvcc-dispelling-some-oracle-fudunderstanding-the-cost/ > > http://en.wikipedia.org/wiki/Multiversion_concurrency_control > > Oracle's MVCC approach has its own costs. Like Pg's, those costs increase > with update/delete frequency. Instead of table bloat, Oracle suffers from > redo log growth (or redo log size management issues). Instead of increased > table scan costs from dead rows, Oracle suffers from random I/O costs as it > looks up the out-of-line redo log for old rows. Instead of long-running > writer transactions causing table bloat, Oracle can have problems with > long-running reader transactions aborting when the redo log runs out of > space. > > Personally, I don't know enough to know which is "better". I suspect > they're just different, with different trade-offs. If redo logs allow you > to do without write-ahead logging, that'd be interesting - but then, the > WAL is useful for all sorts of replication options, and the use of linear > WALs means that write ordering in the tables doesn't need to be as strict, > which has performance advantages. > > -- > Craig Ringer > > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >
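For what it's worth, one concrete knob that helps with the update-heavy pattern described above (independent of any redesign of MVCC) is making HOT updates possible: if the updated columns are not indexed and the page has free space, PostgreSQL 8.3+ can reuse that space and skip index maintenance entirely. A rough sketch only, using a hypothetical accounts table:

-- Leave ~20% of each heap page free so updated row versions can stay on the same page (HOT).
ALTER TABLE accounts SET (fillfactor = 80);
-- Rewrite the table once so existing pages honour the new fillfactor.
CLUSTER accounts USING accounts_pkey;
-- Vacuum this table more aggressively so dead row versions are reclaimed quickly.
ALTER TABLE accounts SET (autovacuum_vacuum_scale_factor = 0.02);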
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
I have to concur. SQL Server is written specifically and only for Windows. It is optimized for Windows. PostgreSQL is written for just about everything, using common code, so there isn't much OS-specific optimization because it has to run on whatever OS is hosting it. Check out your config and send it to us. That would include the OS and hardware configs for both machines. On Wed, Nov 17, 2010 at 3:47 PM, Tomas Vondra wrote: > Dne 17.11.2010 05:47, Pavel Stehule napsal(a): > > 2010/11/17 Humair Mohammed : > >> > >> There are no indexes on the tables either in SQL Server or Postgresql - > I am > >> comparing apples to apples here. I ran ANALYZE on the postgresql tables, > > Actually no, you're not comparing apples to apples. You've provided so > little information that you may be comparing apples to cucumbers or > maybe some strange animals. > > 1) info about the install > > What OS is this running on? I guess it's Windows in both cases, right? > > How much memory is there? What is the size of shared_buffers? The > default PostgreSQL settings are very, very limited; you have to bump > them to much larger values. > > What are the other important settings (e.g. work_mem)? > > 2) info about the dataset > > How large are the tables? I don't mean number of rows, I mean number of > blocks / occupied disk space. Run this query > > SELECT relname, relpages, reltuples, pg_size_pretty(pg_table_size(oid)) > FROM pg_class WHERE relname IN ('table1', 'table2'); > > 3) info about the plan > > Please provide EXPLAIN ANALYZE output, maybe with info about buffers, > e.g. something like > > EXPLAIN (ANALYZE ON, BUFFERS ON) SELECT ... > > 4) no indexes ? > > Why have you decided not to use any indexes? If you want decent > performance, you will have to use indexes. Obviously there is some > overhead associated with them, but it's premature optimization unless > you prove the opposite. > > BTW I'm not an MSSQL expert, but it seems like it's building a bitmap > index on the fly, to synchronize a parallelized query - PostgreSQL does > not support that. > > regards > Tomas > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >
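For illustration only, the kind of settings Tomas is asking about might start out something like the lines below on a dedicated test box with around 4 GB of RAM. The numbers are guesses, not a recommendation, and Windows historically gets little benefit from very large shared_buffers:

shared_buffers = 512MB          # the default is far too small for any fair comparison
work_mem = 32MB                 # per sort/hash operation, so watch concurrency
maintenance_work_mem = 256MB    # speeds up ANALYZE, VACUUM and CREATE INDEX
effective_cache_size = 3GB      # planner hint: roughly what the OS is caching

After a restart, re-run the EXPLAIN (ANALYZE ON, BUFFERS ON) that Tomas suggests and compare the plans.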
Re: [PERFORM] Performance of PostgreSQL over NFS
I am wondering why anyone would do that? Too much overhead, and it's not reliable enough. On Tue, Dec 21, 2010 at 2:28 PM, Mladen Gogala wrote: > I was asked about performance of PostgreSQL on NetApp, the protocol should > be NFSv3. Has anybody tried it? The database in question is a DW type, a > bunch of documents indexed by Sphinx. Does anyone have any information? > -- > > > Mladen Gogala Sr. Oracle DBA > 1500 Broadway > New York, NY 10036 > (212) 329-5251 > http://www.vmsinfo.com The Leader in Integrated Media Intelligence > Solutions > > > > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >
Re: [PERFORM] Making the most of memory?
Josh what about the rest of your system? What operating system? Your hardware setup. Drives? Raids? What indices do you have setup for these queries? There are other reasons that could cause bad queries performance. On Jan 22, 2008 11:11 PM, Joshua Fielek <[EMAIL PROTECTED]> wrote: > > Hey folks -- > > For starters, I am fairly new to database tuning and I'm still learning > the ropes. I understand the concepts but I'm still learning the real > world impact of some of the configuration options for postgres. > > We have an application that has been having some issues with performance > within postgres 8.1.9 and later 8.2.5. The upgrade to 8.2.5 gained us a > nice little performance increase just off the improved query > optimization, but we are still having other performance issues. > > The database itself is not that large -- a db_dump of the sql file as > text is only about 110MB. I haven't checked the exact size of the actual > data base, but the entire data directory is smaller than the available > memory at about 385MB including logs and config files. This is a single > database with a relatively small number of client connections (50 or so) > making a fair number of smaller queries. This is not a massive data > effort by any means at this time, but it will be growing. > > We have available currently ~4GB (8GB total) for Postgres. We will be > moving to a server that will have about 24GB (32GB total) available for > the database, with the current server becoming a hot backup, probably > with slony or something similar to keep the databases in sync. > > I've been monitoring the memory usage of postgres on the current system > and it seems like none of the threads ever allocate more than about > 400MB total and about 80-90MB shared memory. It seems to me that since > we have a very large chunk of memory relative to the database size we > should be loading the entire database into memory. How can we be sure > we're getting the most out of the memory we're allocating to postgres? > What can we do to improve the memory usage, looking for performance > first and foremost, on both the larger and smaller systems? > > Here's the salient config items for the 8GB system: > > max_connections = 200# realistically we expect 50-150 open > shared_buffers = 38000 > sort_mem = 1048576 > work_mem = 32000 > maintenance_work_mem = 32000 > max_fsm_pages = 480001# probably too large for the max_fsm_* > max_fsm_relations = 2# items; one Db with ~400 tables. > effective_cache_size = 212016# ~2GB, could probably double this > > > Thanks, > J > -- > Joshua J. Fielek > Sr. Software Engineer > Concursive Corporation > 223 East City Hall Ave., Suite 212 > Norfolk, VA 23510 > Phone : (757) 627-3002x6656 > Mobile : (757) 754-4462 > Fax: (757) 627-8773 > Email : [EMAIL PROTECTED] > http://www.concursive.com > > ---(end of broadcast)--- > TIP 7: You can help support the PostgreSQL project by donating at > > http://www.postgresql.org/about/donate > ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
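To make that concrete, here is one hedged starting point for the 8 GB box described above, assuming PostgreSQL 8.2 (where sort_mem no longer exists and settings accept unit suffixes); the exact values are guesses to be checked against monitoring, not a prescription:

shared_buffers = 1GB            # the old value (38000 x 8 kB pages) was only ~300 MB
work_mem = 16MB                 # replaces sort_mem; applies per sort/hash, per backend
maintenance_work_mem = 256MB    # for VACUUM and index builds
effective_cache_size = 5GB      # rough size of the OS page cache on an 8 GB machine
max_fsm_pages = 500000          # generous free-space map for a growing database

Note that PostgreSQL will not pull the whole database into its own shared memory; with a data directory of roughly 385 MB, whatever is not in shared_buffers will simply sit in the OS cache, which is normal and already "in memory" for practical purposes.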
[PERFORM] What is the best way to store music files in PostgreSQL
I am going to embark on building a music library using Apache, PostgreSQL and PHP. What is the best way to store the music files? Which file type should I use? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] What is the best way to store music files in PostgreSQL
On Mon, Mar 17, 2008 at 2:01 PM, Peter Koczan <[EMAIL PROTECTED]> wrote: > > > I am going to embarkon building a music library using apache, > > > postgresql and php. What is the best way to store the music files? > > > > Your options are either to use a BLOB within the database or to store > > paths to normal files in the file system in the database. I suspect > > using normal files will make backup and management a great deal easier > > than using in-database BLOBs, so personally I'd do it that way. > > I discussed something like this with some co-workers recently, and > here's what I had to say. Not all of these apply to the original > message, but they are things to consider when marrying a database to a > file storage system. > > Storing the files in the database as BLOBs: > Pros: > - The files can always be seen by the database system as long as it's > up (there's no dependence on an external file system). > - There is one set of locking mechanisms, meaning that the file > operations can be atomic with the database operations. > - There is one set of permissions to deal with. > Cons: > - There is almost no way to access files outside of the database. If > the database goes down, you are screwed. > - If you don't make good use of tablespaces and put blobs on a > separate disk system, the disk could thrash going between data and > blobs, affecting performance. > - There are stricter limits for PostgreSQL blobs (1 GB size limits, I've > read). > > Storing files externally, storing pathnames in the database: > Pros: > - You can access and manage files from outside the database and > possibly using different interfaces. > - There's a lot less to store directly in the database. > - You can use existing file-system permissions, mechanisms, and limits. > Cons: > - You are dealing with two storage systems and two different locking > systems which are unlikely to play nice with each other. Transactions > are not guaranteed to be atomic (e.g. a database rollback will not > rollback a file system operation, a commit will not guarantee that > data in a file will stay). > - The file system has to be seen by the database system and any remote > clients that wish to use your application, meaning that a networked FS > is likely to be used (depending on how many clients you have and how > you like to separate services), with all the fun that comes from > administering one of those. Note that this one in particular really > only applies to enterprise-level installations, not smaller > installations like the original poster's. > - If you don't put files on a separate disk-system or networked FS, > you can get poor performance from the disk thrashing between the > database and the files. > > There are a couple main points: > 1. The favorite answer in computing, "it depends", applies here. What > you decide depends on your storage system, your service and > installation policies, and how important fully atomic transactions are > to you. > 2. If you want optimal performance out of either of these basic > models, you should make proper use of separate disk systems. I have no > idea which one is faster (it depends, I'm sure) nor do I have much of > an idea of how to benchmark this properly. > > Peter > It seems to me as such a database gets larger, it will become much harder to > manage with the 2 systems. I am talking mostly about music. So each song > should not get too large. I have read alot on this list and on other > resources and there seems to be leanings toward 1+0 raids for storage. 
It seems to be the most flexible when it comes to speed, redundancy and recovery time. I do want my database to be fully atomic. I think that is important as this database grows. Are my assumptions wrong? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
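For illustration, the two designs discussed in this thread might look roughly like the following; the table and column names are made up, and bytea is used rather than large objects purely to keep the sketch short:

-- Option 1: audio stored in the database (backed up and rolled back with everything else)
CREATE TABLE songs_blob (
    song_id  serial PRIMARY KEY,
    title    text NOT NULL,
    audio    bytea NOT NULL
);

-- Option 2: audio on the file system, only the path stored in the database
CREATE TABLE songs_path (
    song_id  serial PRIMARY KEY,
    title    text NOT NULL,
    path     text NOT NULL UNIQUE   -- e.g. '/srv/music/artist/album/track.mp3'
);

With option 2 the INSERT and the file copy are two separate operations, which is exactly the atomicity trade-off Peter describes above.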
Re: [PERFORM] What is the best way to store music files in PostgreSQL
On Sun, Mar 16, 2008 at 2:25 AM, Craig Ringer <[EMAIL PROTECTED]> wrote: > Rich wrote: > > I am going to embark on building a music library using apache, > > postgresql and php. What is the best way to store the music files? > > Your options are either to use a BLOB within the database or to store > paths to normal files in the file system in the database. I suspect > using normal files will make backup and management a great deal easier > than using in-database BLOBs, so personally I'd do it that way. > > Storing the audio files in the database does make it easier to keep the > database and file system backups in sync, but I'm not really sure that's > worth the costs. What costs are you speaking of? > > I'm sure that what you're doing has been done many times before, though, > so even if you're not going to use one of the existing options you might > at least want to have a look at how they work. > > > > Which file type should I use? > > I'm not sure I understand this question. Are you asking which audio > compression codec and audio container file type (like "mp3", "aac", etc) > you should use? If so, this is really not the right place to ask that. > > Do you mean which file /system/ ? > > -- > Craig Ringer > > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] NOT EXISTS or LEFT JOIN which one is better?
Al, I have looked at this before, and I am not sure the effort is worth all the thought about it. Let your EXPLAIN output tell you which is better. I read this link a year ago. http://stackoverflow.com/questions/227037/can-i-get-better-performance-using-a-join-or-using-exists On Sun, Apr 29, 2012 at 5:27 AM, AI Rumman wrote: > I can write a query to solve my requirement in either of the following ways: > > 1. > select * > from a > where NOT EXISTS > ( > select 1 > from b > where a.id = b.id) > union all > select * > from b > > > 2. > select > ( > case when b.id is not null then > b.id > else > a.id > end > ) as id > from a > left join b > on a.id = b.id > > Can anyone please tell me which one is better? >
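As a sketch of "let EXPLAIN tell you": the NOT EXISTS branch of query 1 can also be written as a LEFT JOIN with an IS NULL filter, and all of the variants can simply be run under EXPLAIN ANALYZE side by side (table names as in the question):

-- Anti-join variant: rows of a with no matching row in b
EXPLAIN ANALYZE
SELECT a.*
FROM a
LEFT JOIN b ON a.id = b.id
WHERE b.id IS NULL;

On reasonably recent releases NOT EXISTS and this form generally end up with the same anti-join plan, so the choice is mostly about readability.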
[PERFORM] Outer join query plans and performance
I tried on pgsql-general but got no reply. re-posting here as it's probably the best place to ask I'm having some significant performance problems with left join. Can anyone give me any pointers as to why the following 2 query plans are so different? EXPLAIN SELECT * FROM tokens.ta_tokens t LEFT JOIN tokens.ta_tokenhist h1 ON t.token_id = h1.token_id LEFT JOIN tokens.ta_tokenhist h2 ON t.token_id = h2.token_id WHERE h1.histdate = 'now'; Nested Loop Left Join (cost=0.00..68778.43 rows=2215 width=1402) -> Nested Loop (cost=0.00..55505.62 rows=2215 width=714) -> Index Scan using idx_tokenhist__histdate on ta_tokenhist h1 (cost=0.00..22970.70 rows=5752 width=688) Index Cond: (histdate = '2005-10-24 13:28:38.411844'::timestamp without time zone) -> Index Scan using ta_tokens_pkey on ta_tokens t (cost=0.00..5.64 rows=1 width=26) Index Cond: ((t.token_id)::integer = ("outer".token_id)::integer) -> Index Scan using fkx_tokenhist__tokens on ta_tokenhist h2 (cost=0.00..5.98 rows=1 width=688) Index Cond: (("outer".token_id)::integer = (h2.token_id)::integer) Performance is fine for this one and the plan is pretty much as i'd expect. This is where i hit a problem. EXPLAIN SELECT * FROM tokens.ta_tokens t LEFT JOIN tokens.ta_tokenhist h1 ON t.token_id = h1.token_id LEFT JOIN tokens.ta_tokenhist h2 ON t.token_id = h2.token_id WHERE h2.histdate = 'now'; Hash Join (cost=1249148.59..9000709.22 rows=2215 width=1402) Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer) -> Hash Left Join (cost=1225660.51..8181263.40 rows=4045106 width=714) Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer) -> Seq Scan on ta_tokens t (cost=0.00..71828.06 rows=4045106 width=26) -> Hash (cost=281243.21..281243.21 rows=10504921 width=688) -> Seq Scan on ta_tokenhist h1 (cost=0.00..281243.21 rows=10504921 width=688) -> Hash (cost=22970.70..22970.70 rows=5752 width=688) -> Index Scan using idx_tokenhist__histdate on ta_tokenhist h2 (cost=0.00..22970.70 rows=5752 width=688) Index Cond: (histdate = '2005-10-24 13:34:51.371905'::timestamp without time zone) I would understand if h2 was joined on h1, but it isn't. It only joins on t. can anyone give any tips on improving the performance of the second query (aside from changing the join order manually)? select version(); version -- PostgreSQL 8.0.3 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.0.2 20050821 (prerelease) (Debian 4.0.1-6) Thanks -- - Rich Doughty ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Outer join query plans and performance
Tom Lane wrote: Rich Doughty <[EMAIL PROTECTED]> writes: EXPLAIN SELECT * FROM tokens.ta_tokens t LEFT JOIN tokens.ta_tokenhist h1 ON t.token_id = h1.token_id LEFT JOIN tokens.ta_tokenhist h2 ON t.token_id = h2.token_id WHERE h1.histdate = 'now'; EXPLAIN SELECT * FROM tokens.ta_tokens t LEFT JOIN tokens.ta_tokenhist h1 ON t.token_id = h1.token_id LEFT JOIN tokens.ta_tokenhist h2 ON t.token_id = h2.token_id WHERE h2.histdate = 'now'; The reason these are different is that the second case constrains only the last-to-be-joined table, so the full cartesian product of t and h1 has to be formed. If this wasn't what you had in mind, you might be able to rearrange the order of the LEFT JOINs, but bear in mind that in general, changing outer-join ordering changes the results. (This is why the planner won't fix it for you.) FWIW mysql 4.1 (and i'm no fan at all of mysql) completes both these queries in approximately 3 seconds. postgres does the first in 6 seconds and the second in a lot longer (eventually abandoned). -- - Rich Doughty ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
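A sketch of the rearrangement Tom describes, for the slow second query: join the constrained alias first, so the histdate filter cuts the row count before the other left join is formed. In this particular query h1 and h2 scan the same table on the same condition, so swapping them should not change the answer, but in general reordering outer joins can:

EXPLAIN SELECT *
FROM tokens.ta_tokens t
LEFT JOIN tokens.ta_tokenhist h2 ON t.token_id = h2.token_id
LEFT JOIN tokens.ta_tokenhist h1 ON t.token_id = h1.token_id
WHERE h2.histdate = 'now';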
Re: [PERFORM] Outer join query plans and performance
Tom Lane wrote: Rich Doughty <[EMAIL PROTECTED]> writes: Tom Lane wrote: The reason these are different is that the second case constrains only the last-to-be-joined table, so the full cartesian product of t and h1 has to be formed. If this wasn't what you had in mind, you might be able to rearrange the order of the LEFT JOINs, but bear in mind that in general, changing outer-join ordering changes the results. (This is why the planner won't fix it for you.) FWIW mysql 4.1 (and i'm no fan at all of mysql) completes both these queries in approximately 3 seconds. Does mysql get the correct answer, though? It's hard to see how they do this fast unless they (a) are playing fast and loose with the semantics, or (b) have very substantially more analysis logic for OUTER JOIN semantics than we do. Perhaps mysql 5.x is better about this sort of thing, but for 4.x I'd definitely find theory (a) more plausible than (b). i would assume so. i'll re-run my testcase later and verify the results of the two side-by-side. The cases that would be interesting are those where rearranging the outer join order actually does change the correct answer --- it may not in this particular case, I haven't thought hard about it. It seems fairly likely to me that they are rearranging the join order here, and I'm just wondering whether they have the logic needed to verify that such a transformation is correct. regards, tom lane -- - Rich Doughty ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[PERFORM] Strange query plan involving a view
'sold'::text) -> Index Scan using fkx_tokenhist__tokens on ta_tokenhist r (cost=0.00..17.96 rows=1 width=246) Index Cond: (("outer".token_id)::integer = (r.token_id)::integer) Filter: ((status)::text = 'redeemed'::text) This query returns a lot quicker than the plan would suggest, as the planner is over-estimating the amount of rows where ((sarreport_id)::integer = 9). it thinks there are 53430 when in fact there are only 7 (despite a vacuum and analyse). Can anyone give me any suggestions? are the index stats the cause of my problem, or is it the rewrite of the query? Cheers Version: PostgreSQL 8.0.3 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.0.2 20050821 (prerelease) (Debian 4.0.1-6) -- - Rich Doughty ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Strange query plan involving a view
Tom Lane wrote: Rich Doughty <[EMAIL PROTECTED]> writes: However, the following query (which i believe should be equivalent) SELECT * FROM tokens.ta_tokenhist h INNER JOIN tokens.ta_tokens t ON h.token_id = t.token_id LEFT JOIN tokens.ta_tokenhist i ON t.token_id = i.token_id AND i.status = 'issued' LEFT JOIN tokens.ta_tokenhist s ON t.token_id = s.token_id AND s.status = 'sold' LEFT JOIN tokens.ta_tokenhist r ON t.token_id = r.token_id AND r.status = 'redeemed' WHERE h.sarreport_id = 9 ; No, that's not equivalent at all, because the implicit parenthesization is left-to-right; therefore you've injected the constraint to a few rows of ta_tokenhist (and therefore only a few rows of ta_tokens) into the bottom of the LEFT JOIN stack. In the other case the constraint is at the wrong end of the join stack, and so the full view output gets formed before anything gets thrown away. Some day the Postgres planner will probably be smart enough to rearrange the join order despite the presence of outer joins ... but today is not that day. thanks for the reply. is there any way i can achieve what i need to by using views, or should i just use a normal query? i'd prefer to use a view but i just can't get round the performance hit. -- - Rich Doughty ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] view of view
Keith Worthington wrote: Hi All, I am working on an application that uses PostgreSQL. One of the functions of the application is to generate reports. In order to keep the code in the application simple we create a view of the required data in the database and then simply execute a SELECT * FROM view_of_the_data; All of the manipulation and most of the time even the ordering is handled in the view. My question is how much if any performance degradation is there in creating a view of a view? IOW if I have a view that ties together a couple of tables and manipulates some data what will perform better; a view that filters, manipulates, and orders the data from the first view or a view that performs all the necessary calculations on the original tables? from personal experience, if the inner views contain outer joins performance isn't that great. -- - Rich Doughty ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[PERFORM] Optimizing timestamp queries? Inefficient Overlaps?
I have a table similar to this: CREATE TABLE event_resources ( event_resource_id serial NOT NULL, event_id integer NOT NULL, resource_id integer NOT NULL, start_date timestamptz NOT NULL, end_date timestamptz NOT NULL, CONSTRAINT event_resources_pkey PRIMARY KEY (event_resource_id) ); Where the same resource can be added to an event multiple times. Since the table spans a few years, any day queried should return at most 0.1% of the table, and seems perfect for indexes. So I add these: CREATE INDEX er_idx1 ON event_resources (start_date); CREATE INDEX er_idx2 ON event_resources (end_date); One query I need to perform is "All event resources that start or end on a particular day". The first thing that comes to mind is this: select * from event_resources er where er.start_date::date = $1::date or er.end_date::date = $1::date This is very slow. Pg chooses a sequential scan. (I am running vacuum and analyze) Shouldn't Pg be able to use an index here? I've tried creating function indexes using cast, but Pg returns this error message: ERROR: functions in index expression must be marked IMMUTABLE Which I assume is related to timezones and daylight saving issues in converting a timestamptz into a plain date. This form strangely won't use an index either: select * from event_resources er where (er.start_date, er.end_date) overlaps ($1::date, $1::date+1) This is the only query form I've found that will use an index: select * from event_resources er where (er.start_date >= $1::date and er.start_date < ($1::date+1)) or (er.end_date >= $1::date and er.end_date < ($1::date+1)) I know it's not exactly the same as the overlaps method, but since this works I would expect OVERLAPS to work as well. I prefer overlaps because it's clean and simple, self documenting. Another (similar) query I need to perform is "All event resources that overlap a given time range". Seems tailor-made for OVERLAPS: select * from event_resources er where (er.start_date, er.end_date) overlaps ($1::timestamptz, $2::timestamptz) Again. can't get this to use an index. I have to use this again: select * from event_resources er where (er.start_date >= $1::timestamptz and er.start_date < $2::timestamptz) or (er.end_date >= $1::timestamptz and er.end_date < $2::timestamptz) What am I doing wrong? This is Pg 8.1.2 on RHEL 4.
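One way around the IMMUTABLE error, if treating "days" in a fixed time zone is acceptable (UTC here, purely as an example), is to pin the zone inside the index expression so the cast no longer depends on the session's TimeZone setting. A hedged sketch:

-- timezone('UTC', timestamptz) and the timestamp-to-date cast are both immutable,
-- so these expression indexes are allowed
CREATE INDEX er_start_day ON event_resources (((start_date AT TIME ZONE 'UTC')::date));
CREATE INDEX er_end_day   ON event_resources (((end_date   AT TIME ZONE 'UTC')::date));

-- The query must use exactly the same expression for the indexes to be considered
SELECT *
FROM event_resources er
WHERE (er.start_date AT TIME ZONE 'UTC')::date = $1::date
   OR (er.end_date   AT TIME ZONE 'UTC')::date = $1::date;

The range form already shown above (start_date >= $1::date AND start_date < $1::date+1) remains the safest bet, since OVERLAPS is not indexable here.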
Re: [PERFORM] PostgreSQL Configuration and overflow
What are your table sizes? What are your queries like? (Mostly read, mostly write?) Can you post the "analyze" output for some of the slow queries? The three things that stand out for me is your disk configuration (RAID 5 is not ideal for databases, you really want RAID 1 or 1+0) and also that you have enable_seqscan set to off. I would leave that turned on.Lastly, your effective_cache_size looks low. Your OS is probably caching more than 512 MB, I know mine is usually 1-2 GB and I don't have 12 GB of ram available. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of fabrix peñuelas Sent: Thursday, December 28, 2006 7:58 PM To: pgsql-performance@postgresql.org Subject: [PERFORM] Postgresql Configutation and overflow Good day, I have been reading about the configuration of postgresql, but I have a server who does not give me the performance that should. The tables are indexed and made vacuum regularly, i monitor with top, ps and pg_stat_activity and when i checked was slow without a heavy load overage. Before, the server reached 2000 connections to postgresql (with max_connections=3000 in it for future workflow). I divided the load with another server for better performance, and now reach 500 connections, but yet is overflow. My question is about how much memory should i configure in shared_buffers and effective_cache_size. Features: - 4 Processsors Intel Xeon Dual 3.0Ghz - 12 GB RAM - 2 discos en RAID 1 for OS - 4 discs RAID 5 for DB - S.O Slackware 11.0 Linux 2.6.17.7 - Postgres 8.1.4 =In internet i found this: Tuning PostgreSQL for performance 2 Some basic parameters 2.1 Shared buffers # Start at 4MB (512) for a workstation # Medium size data set and 256-512MB available RAM: 16-32MB (2048-4096) # Large dataset and lots of available RAM (1-4GB): 64-256MB (8192-32768) == My postgresql.conf configuration is: #--- # FILE LOCATIONS #--- # The default values of these variables are driven from the -D command line # switch or PGDATA environment variable, represented here as ConfigDir. #data_directory = 'ConfigDir'# use data in another directory #hba_file = 'ConfigDir/pg_hba.conf'# host-based authentication file #ident_file = 'ConfigDir/pg_ident.conf'# IDENT configuration file # If external_pid_file is not explicitly set, no extra pid file is written. #external_pid_file = '(none)'# write an extra pid file #--- # CONNECTIONS AND AUTHENTICATION #--- # - Connection Settings - listen_addresses = '*'# what IP address(es) to listen on; # comma-separated list of addresses; # defaults to 'localhost', '*' = all port = 5432 max_connections = 3000 # note: increasing max_connections costs ~400 bytes of shared memory per # connection slot, plus lock space (see max_locks_per_transaction). You # might also need to raise shared_buffers to support more connections. #superuser_reserved_connections = 2 #unix_socket_directory = '' #unix_socket_group = '' #unix_socket_permissions = 0777# octal #bonjour_name = ''# defaults to the computer name #--- # RESOURCE USAGE (except WAL) #--- # - Memory - shared_buffers = 81920# min 16 or max_connections*2, 8KB each temp_buffers = 5000# min 100, 8KB each max_prepared_transactions = 1000# can be 0 or more # note: increasing max_prepared_transactions costs ~600 bytes of shared memory # per transaction slot, plus lock space (see max_locks_per_transaction). 
work_mem = 10240# min 64, size in KB maintenance_work_mem = 253952# min 1024, size in KB max_stack_depth = 4096# min 100, size in KB # - Free Space Map - #max_fsm_pages = 2# min max_fsm_relations*16, 6 bytes each #max_fsm_relations = 1000# min 100, ~70 bytes each # - Kernel Resource Usage - #max_files_per_process = 1000# min 25 #preload_libraries = '' # - Cost-Based Vacuum Delay - #vacuum_cost_delay = 0# 0-1000 milliseconds #vacuum_cost_page_hit = 1# 0-1 credits #vacuum_cost_page_miss = 10# 0-1 credits #vacuum_cost_page_dirty = 20# 0-1 credits #vacuum_cost_limit = 200# 0-1 credits # - Background writer - #bgwriter_delay = 200# 10-1 milliseconds between rounds #bgwriter_lru_percent = 1.0# 0-100% of LRU buffers scanned/round #bgwriter_lru_maxpages = 5# 0-1000 buffers max written/round #bgwriter_all_percent = 0.333# 0
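To put rough numbers on the shared_buffers / effective_cache_size question for a 12 GB box running 8.1 (values are in 8 kB pages or kB, since 8.1 does not accept unit suffixes), one hedged starting point follows; the bigger win, though, is usually putting a pooler such as pgpool in front rather than letting thousands of clients connect directly:

max_connections = 500             # let a connection pooler fan the 2000+ clients into this
shared_buffers = 262144           # 262144 x 8 kB = ~2 GB
work_mem = 16384                  # 16 MB, but remember this applies per sort, per backend
maintenance_work_mem = 262144     # 256 MB for vacuum and index builds
effective_cache_size = 1048576    # ~8 GB of expected OS cache, expressed in 8 kB pages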
Re: [PERFORM] Trivial function query optimized badly
Craig, What version of postgres are you using? I just tested this on PG 8.1.2 and was unable to reproduce these results. I wrote a simple function that returns the same text passed to it, after sleeping for 1 second. I use it in a where clause, like your example below, and regardless of the number of rows in the table, it still takes roughly 1 second, indicating to me the function is only called once. Is it possible that your function really isn't immutable? Would PG realize this and fall back to treating it as VOLATILE ? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Craig A. James Sent: Wednesday, January 03, 2007 9:11 PM To: pgsql-performance@postgresql.org Subject: [PERFORM] Trivial function query optimized badly Well, once again I'm hosed because there's no way to tell the optimizer the cost for a user-defined function. I know this issue has already been raised (by me!) several times, but I have to remind everyone about this. I frequently must rewrite my SQL to work around this problem. Here is the function definition: CREATE OR REPLACE FUNCTION cansmiles(text) RETURNS text AS '/usr/local/pgsql/lib/libchem.so', 'cansmiles' LANGUAGE 'C' STRICT IMMUTABLE; Here is the bad optimization: db=> explain analyze select version_id, 'Brc1ccc2nc(cn2c1)C(=O)O' from version where version.isosmiles = cansmiles('Brc1ccc2nc(cn2c1)C(=O)O', 1); QUERY PLAN Seq Scan on version (cost=0.00..23.41 rows=1 width=4) (actual time=1434.281..1540.253 rows=1 loops=1) Filter: (isosmiles = cansmiles('Brc1ccc2nc(cn2c1)C(=O)O'::text, 1)) Total runtime: 1540.347 ms (3 rows) I've had to break it up into two separate queries. Ironically, for large databases, Postgres does the right thing -- it computes the function, then uses the index on the "isosmiles" column. It's blazingly fast and very satisfactory. But for small databases, it apparently decides to recompute the function once per row, making the query N times slower (N = number of rows) than it should be! In this instance, there are 1000 rows, and factor of 10^4 is a pretty dramatic slowdown... To make it work, I had to call the function separately then use its result to do the select. db=> explain analyze select cansmiles('Brc1ccc2nc(cn2c1)C(=O)O', 1); QUERY PLAN Result (cost=0.00..0.01 rows=1 width=0) (actual time=1.692..1.694 rows=1 loops=1) Total runtime: 1.720 ms (2 rows) db=> explain analyze select version_id, 'Brc1ccc2nc(cn2c1)C(=O)O' from version where version.isosmiles = 'Brc1ccc2nc(cn2c1)C(=O)O'; QUERY PLAN - Index Scan using i_version_isosmiles on version (cost=0.00..5.80 rows=1 width=4) (actual time=0.114..0.117 rows=1 loops=1) Index Cond: (isosmiles = 'Brc1ccc2nc(cn2c1)C(=O)O'::text) Total runtime: 0.158 ms (3 rows) Craig ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
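For reference, the kind of throwaway test function described above can be approximated with pg_sleep (available since 8.2); the names here are made up, and this is only a sketch of the test, not Craig's actual function:

CREATE OR REPLACE FUNCTION slow_identity(t text) RETURNS text AS $$
BEGIN
    PERFORM pg_sleep(1);   -- one second per evaluation, so elapsed time ~= number of calls
    RETURN t;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- If this runs in about one second regardless of table size, the planner folded the
-- function call to a constant; if it takes a second per row, it is re-evaluated per row.
EXPLAIN ANALYZE
SELECT version_id FROM version WHERE isosmiles = slow_identity('Brc1ccc2nc(cn2c1)C(=O)O');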
Re: [PERFORM] Slow Query on Postgres 8.2
Dave, Is it me or are the two examples you attached returning different row counts? That means either the source data is different, or your queries are. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dave Dutcher Sent: Thursday, January 04, 2007 5:32 PM To: pgsql-performance@postgresql.org Subject: [PERFORM] Slow Query on Postgres 8.2 Hello, I am looking at upgrading from 8.1.2 to 8.2.0, and I've found a query which runs a lot slower. Here is the query: select type, currency_id, instrument_id, sum(amount) as total_amount from om_transaction where strategy_id in ('BASKET1','BASKET2','BASKET3','BASKET4','BASKET5','BASKET6','BASKET7',' BASKET8','BASKET9','BASKET10','BASKET11') and owner_trader_id in ('dave','sam','bob','tad', 'tim','harry','frank','bart','lisa','homer','marge','maggie','apu','milh ouse','disco stu') and cf_account_id in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,29) and as_of_date > '2006-12-04' and as_of_date <= '2006-12-05' group by type, currency_id, instrument_id; I changed the values in the in statements to fake ones, but it still takes over three seconds on 8.2, where 8.1 only takes 26 milliseconds. When I increase the number of valules in the IN clauses, the query rapidly gets worse. I tried increasing my stats target to 1000 and analyzing, but that didn't help so I put that back to 10. While the query is running the CPU is at 100%. Is there a more efficient way to write a query like this? I've attached the output from EXPLAIN ANALYZE in a file because it is somewhat large. Thanks, Dave Dutcher Telluride Asset Management 952.653.6411
Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS
I'm using 8.2 and using order by & limit is still faster than MAX() even though MAX() now seems to rewrite to an almost identical plan internally. Count(*) still seems to use a full table scan rather than an index scan. Using one of our tables, MySQL/Oracle/MS-SQL all return instantly while PG takes longer than 700 ms. Luckily we can design around this issue. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Craig A. James Sent: Sunday, January 07, 2007 5:57 PM To: Guy Rouillier; PostgreSQL Performance Subject: Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS Craig A. James wrote: > The "idiom" to replace count() was > "select col from tbl order by col desc limit 1". It worked miracles for > my app. Sorry, I meant to write, "the idiom to replace MAX()", not count()... MAX() was the function that was killing me, 'tho count() also gave me problems. Craig ---(end of broadcast)--- TIP 6: explain analyze is your friend ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS
Here's the queries and explains... Granted, it's not a huge difference here, but both timings are absolutely consistent. Using max(), this runs almost 15 queries/sec and "limit 1" runs at almost 40 queries/sec. Is the differene in explain analyze expected behavior? (rows=168196 vs. rows=1) (The table is freshly analayzed) select max(item_id) from receipt_items Result (cost=0.04..0.05 rows=1 width=0) (actual time=0.030..0.031 rows=1 loops=1) InitPlan -> Limit (cost=0.00..0.04 rows=1 width=4) (actual time=0.023..0.024 rows=1 loops=1) -> Index Scan Backward using receipt_items_pkey on receipt_items (cost=0.00..6883.71 rows=168196 width=4) (actual time=0.020..0.020 rows=1 loops=1) Filter: (item_id IS NOT NULL) Total runtime: 0.067 ms select item_id from receipt_items order by item_id desc limit 1 Limit (cost=0.00..0.04 rows=1 width=4) (actual time=0.010..0.011 rows=1 loops=1) -> Index Scan Backward using receipt_items_pkey on receipt_items (cost=0.00..6883.71 rows=168196 width=4) (actual time=0.008..0.008 rows=1 loops=1) Total runtime: 0.026 ms A couple more similar examples from this table: select max(create_date) from receipt_items Result (cost=0.05..0.06 rows=1 width=0) (actual time=0.032..0.032 rows=1 loops=1) InitPlan -> Limit (cost=0.00..0.05 rows=1 width=8) (actual time=0.025..0.026 rows=1 loops=1) -> Index Scan Backward using test_idx_1 on receipt_items (cost=0.00..7986.82 rows=168196 width=8) (actual time=0.022..0.022 rows=1 loops=1) Filter: (create_date IS NOT NULL) Total runtime: 0.069 ms select create_date from receipt_items order by create_date desc limit 1; Limit (cost=0.00..0.05 rows=1 width=8) (actual time=0.011..0.012 rows=1 loops=1) -> Index Scan Backward using test_idx_1 on receipt_items (cost=0.00..7986.82 rows=168196 width=8) (actual time=0.009..0.009 rows=1 loops=1) Total runtime: 0.027 ms -Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Sunday, January 07, 2007 8:48 PM To: Adam Rich Cc: 'Craig A. James'; 'Guy Rouillier'; 'PostgreSQL Performance' Subject: Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS "Adam Rich" <[EMAIL PROTECTED]> writes: > I'm using 8.2 and using order by & limit is still faster than MAX() > even though MAX() now seems to rewrite to an almost identical plan > internally. Care to quantify that? AFAICT any difference is within measurement noise, at least for the case of separately-issued SQL commands. > Count(*) still seems to use a full table scan rather than an index scan. Yup. Don't hold your breath for something different. Postgres has made design choices that make certain cases fast and others slow, and count(*) is one case that has come out on the short end of the stick. If that's your most important measure of performance, then indeed you should select a different database that's made different tradeoffs. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS
Here's another, more drastic example... Here the order by / limit version runs in less than 1/7000 the time of the MAX() version. select max(item_id) from events e, receipts r, receipt_items ri where e.event_id=r.event_id and r.receipt_id=ri.receipt_id Aggregate (cost=10850.84..10850.85 rows=1 width=4) (actual time=816.382..816.383 rows=1 loops=1) -> Hash Join (cost=2072.12..10503.30 rows=139019 width=4) (actual time=155.177..675.870 rows=147383 loops=1) Hash Cond: (ri.receipt_id = r.receipt_id) -> Seq Scan on receipt_items ri (cost=0.00..4097.56 rows=168196 width=8) (actual time=0.009..176.894 rows=168196 loops=1) -> Hash (cost=2010.69..2010.69 rows=24571 width=4) (actual time=155.146..155.146 rows=24571 loops=1) -> Hash Join (cost=506.84..2010.69 rows=24571 width=4) (actual time=34.803..126.452 rows=24571 loops=1) Hash Cond: (r.event_id = e.event_id) -> Seq Scan on receipts r (cost=0.00..663.58 rows=29728 width=8) (actual time=0.006..30.870 rows=29728 loops=1) -> Hash (cost=469.73..469.73 rows=14843 width=4) (actual time=34.780..34.780 rows=14843 loops=1) -> Seq Scan on events e (cost=0.00..469.73 rows=14843 width=4) (actual time=0.007..17.603 rows=14843 loops=1) Total runtime: 816.645 ms select item_id from events e, receipts r, receipt_items ri where e.event_id=r.event_id and r.receipt_id=ri.receipt_id order by item_id desc limit 1 Limit (cost=0.00..0.16 rows=1 width=4) (actual time=0.047..0.048 rows=1 loops=1) -> Nested Loop (cost=0.00..22131.43 rows=139019 width=4) (actual time=0.044..0.044 rows=1 loops=1) -> Nested Loop (cost=0.00..12987.42 rows=168196 width=8) (actual time=0.032..0.032 rows=1 loops=1) -> Index Scan Backward using receipt_items_pkey on receipt_items ri (cost=0.00..6885.50 rows=168196 width=8) (actual time=0.016..0.016 rows=1 loops=1) -> Index Scan using receipts_pkey on receipts r (cost=0.00..0.02 rows=1 width=8) (actual time=0.010..0.010 rows=1 loops=1) Index Cond: (r.receipt_id = ri.receipt_id) -> Index Scan using events_pkey on events e (cost=0.00..0.04 rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=1) Index Cond: (e.event_id = r.event_id) Total runtime: 0.112 ms -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Joshua D. Drake Sent: Sunday, January 07, 2007 9:10 PM To: Adam Rich Cc: 'Craig A. James'; 'Guy Rouillier'; 'PostgreSQL Performance' Subject: Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS On Sun, 2007-01-07 at 20:26 -0600, Adam Rich wrote: > I'm using 8.2 and using order by & limit is still faster than MAX() > even though MAX() now seems to rewrite to an almost identical plan > internally. Gonna need you to back that up :) Can we get an explain analyze? > Count(*) still seems to use a full table scan rather than an index scan. > There is a TODO out there to help this. Don't know if it will get done. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] group by will not use an index?
That query looks strange to me (a group by without an aggregate). See if this is any faster: SELECT DISTINCT DATE(inserted) FROM Messages I won't hold my breath though, I don't think there's any way around the full table scan in Postgres, because the index does not contain enough information about transactional state, so table access is always required (unlike virtually every other type of db) -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of tsuraan Sent: Tuesday, January 09, 2007 5:06 PM To: pgsql-performance Subject: [PERFORM] group by will not use an index? I have a table of messages with paths and inserted dates (among other things), like so: CREATE TABLE Messages ( msgkey BIGSERIAL PRIMARY KEY, path TEXT NOT NULL, inserted TIMESTAMP WITHOUT TIMEZONE DEFAULT NOW() ); I run a query to determine which days actually saw emails come in, like so: SELECT DATE(inserted) FROM Messages GROUP BY DATE(inserted); That's obviously not very efficient, so I made an index: CREATE INDEX messages_date_inserted_ind ON Messages(DATE(inserted)); However, GROUP BY does not use this index: =# explain analyze select date(inserted) from messages group by date(inserted); QUERY PLAN -- HashAggregate (cost=104773.10..104789.51 rows=1313 width=8) (actual time=31269.476..31269.557 rows=44 loops=1) -> Seq Scan on messages (cost=0.00..101107.25 rows=1466340 width=8) (actual time=23.923..25248.400 rows=1467036 loops=1) Total runtime: 31269.735 ms (3 rows) Is it possible to get pg to use an index in a group by? I don't see why it wouldn't be possible, but maybe I'm missing something. Using pg 8.1.4...
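For completeness: on 8.4 and later the sequential scan can be avoided by emulating a "loose index scan" with a recursive CTE that hops from one distinct day to the next via messages_date_inserted_ind; a sketch, assuming the table and index from the original post:

WITH RECURSIVE days AS (
    (SELECT DATE(inserted) AS d FROM Messages ORDER BY DATE(inserted) LIMIT 1)
  UNION ALL
    SELECT (SELECT DATE(inserted) FROM Messages
            WHERE DATE(inserted) > days.d
            ORDER BY DATE(inserted) LIMIT 1)
    FROM days
    WHERE days.d IS NOT NULL
)
SELECT d FROM days WHERE d IS NOT NULL;

Each step is a single index probe, so the cost is proportional to the number of distinct days rather than the number of rows.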
Re: [PERFORM] Partitioning
In MySQL, each partition can have its own disk, without using subpartitions. CREATE TABLE th (id INT, name VARCHAR(30), adate DATE) PARTITION BY LIST(YEAR(adate)) ( PARTITION p1999 VALUES IN (1995, 1999, 2003) DATA DIRECTORY = '/var/appdata/95/data' INDEX DIRECTORY = '/var/appdata/95/idx', PARTITION p2000 VALUES IN (1996, 2000, 2004) DATA DIRECTORY = '/var/appdata/96/data' INDEX DIRECTORY = '/var/appdata/96/idx', PARTITION p2001 VALUES IN (1997, 2001, 2005) DATA DIRECTORY = '/var/appdata/97/data' INDEX DIRECTORY = '/var/appdata/97/idx', PARTITION p2002 VALUES IN (1998, 2002, 2006) DATA DIRECTORY = '/var/appdata/98/data' INDEX DIRECTORY = '/var/appdata/98/idx' ); Subpartitions are just a way to break (parent) partitions up into smaller pieces. Those of course can be moved to other disks just like the main partitions. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bernd Helmle Sent: Thursday, January 11, 2007 6:51 AM To: Scott Marlowe Cc: Jim C. Nasby; Jeremy Haile; pgsql-performance@postgresql.org Subject: Re: [PERFORM] Partitioning On Wed, 10 Jan 2007 15:30:16 -0600, Scott Marlowe <[EMAIL PROTECTED]> wrote: [...] > > And I don't think the mysql partition supports tablespaces either. > MySQL supports distributing partitions over multiple disks via the SUBPARTITION clause [1]. I leave it to you whether their syntax is cleaner, more powerful or easier ;) Bernd [1] http://dev.mysql.com/doc/refman/5.1/en/partitioning-subpartitions.html ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 6: explain analyze is your friend
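For comparison, the rough PostgreSQL equivalent in the 8.x era is constraint-exclusion partitioning with the child tables placed in per-disk tablespaces; a hedged sketch with made-up paths and names:

-- One tablespace per disk or array
CREATE TABLESPACE disk95 LOCATION '/var/appdata/95/pgdata';
CREATE TABLESPACE disk96 LOCATION '/var/appdata/96/pgdata';

CREATE TABLE th (id int, name varchar(30), adate date);

-- Children carry CHECK constraints so constraint exclusion can skip irrelevant years
CREATE TABLE th_1995 (CHECK (adate >= DATE '1995-01-01' AND adate < DATE '1996-01-01'))
    INHERITS (th) TABLESPACE disk95;
CREATE TABLE th_1996 (CHECK (adate >= DATE '1996-01-01' AND adate < DATE '1997-01-01'))
    INHERITS (th) TABLESPACE disk96;

-- Indexes can be placed on their own tablespace as well
CREATE INDEX th_1995_adate ON th_1995 (adate) TABLESPACE disk95;

SET constraint_exclusion = on;   -- lets the planner prune children that cannot match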
Re: [PERFORM] max() versus order/limit (WAS: High update activity, PostgreSQL vs BigDBMS)
Did anybody get a chance to look at this? Is it expected behavior? Everyone seemed so incredulous, I hoped maybe this exposed a bug that would be fixed in a near release. -Original Message- From: Adam Rich [mailto:[EMAIL PROTECTED] Sent: Sunday, January 07, 2007 11:53 PM To: 'Joshua D. Drake'; 'Tom Lane' Cc: 'Craig A. James'; 'PostgreSQL Performance' Subject: RE: [PERFORM] High update activity, PostgreSQL vs BigDBMS Here's another, more drastic example... Here the order by / limit version runs in less than 1/7000 the time of the MAX() version. select max(item_id) from events e, receipts r, receipt_items ri where e.event_id=r.event_id and r.receipt_id=ri.receipt_id Aggregate (cost=10850.84..10850.85 rows=1 width=4) (actual time=816.382..816.383 rows=1 loops=1) -> Hash Join (cost=2072.12..10503.30 rows=139019 width=4) (actual time=155.177..675.870 rows=147383 loops=1) Hash Cond: (ri.receipt_id = r.receipt_id) -> Seq Scan on receipt_items ri (cost=0.00..4097.56 rows=168196 width=8) (actual time=0.009..176.894 rows=168196 loops=1) -> Hash (cost=2010.69..2010.69 rows=24571 width=4) (actual time=155.146..155.146 rows=24571 loops=1) -> Hash Join (cost=506.84..2010.69 rows=24571 width=4) (actual time=34.803..126.452 rows=24571 loops=1) Hash Cond: (r.event_id = e.event_id) -> Seq Scan on receipts r (cost=0.00..663.58 rows=29728 width=8) (actual time=0.006..30.870 rows=29728 loops=1) -> Hash (cost=469.73..469.73 rows=14843 width=4) (actual time=34.780..34.780 rows=14843 loops=1) -> Seq Scan on events e (cost=0.00..469.73 rows=14843 width=4) (actual time=0.007..17.603 rows=14843 loops=1) Total runtime: 816.645 ms select item_id from events e, receipts r, receipt_items ri where e.event_id=r.event_id and r.receipt_id=ri.receipt_id order by item_id desc limit 1 Limit (cost=0.00..0.16 rows=1 width=4) (actual time=0.047..0.048 rows=1 loops=1) -> Nested Loop (cost=0.00..22131.43 rows=139019 width=4) (actual time=0.044..0.044 rows=1 loops=1) -> Nested Loop (cost=0.00..12987.42 rows=168196 width=8) (actual time=0.032..0.032 rows=1 loops=1) -> Index Scan Backward using receipt_items_pkey on receipt_items ri (cost=0.00..6885.50 rows=168196 width=8) (actual time=0.016..0.016 rows=1 loops=1) -> Index Scan using receipts_pkey on receipts r (cost=0.00..0.02 rows=1 width=8) (actual time=0.010..0.010 rows=1 loops=1) Index Cond: (r.receipt_id = ri.receipt_id) -> Index Scan using events_pkey on events e (cost=0.00..0.04 rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=1) Index Cond: (e.event_id = r.event_id) Total runtime: 0.112 ms -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Joshua D. Drake Sent: Sunday, January 07, 2007 9:10 PM To: Adam Rich Cc: 'Craig A. James'; 'Guy Rouillier'; 'PostgreSQL Performance' Subject: Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS On Sun, 2007-01-07 at 20:26 -0600, Adam Rich wrote: > I'm using 8.2 and using order by & limit is still faster than MAX() > even though MAX() now seems to rewrite to an almost identical plan > internally. Gonna need you to back that up :) Can we get an explain analyze? > Count(*) still seems to use a full table scan rather than an index scan. > There is a TODO out there to help this. Don't know if it will get done. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. 
=== Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Configuration Advice
Doesn't sound like you want Postgres at all. Try MySQL. -Original Message- From: "Steve" <[EMAIL PROTECTED]> To: pgsql-performance@postgresql.org Sent: 1/17/2007 2:41 PM Subject: [PERFORM] Configuration Advice Hey there; I've been lurking on this list awhile, and I've been working with postgres for a number of years so I'm not exactly new to this. But I'm still having trouble getting a good balance of settings and I'd like to see what other people think. We may also be willing to hire a contractor to help tackle this problem if anyone is interested. I've got an application here that runs large (in terms of length -- the queries have a lot of conditions in them) queries that can potentially return millions of rows but on average probably return tens of thousands of rows. It's read only for most of the day, and pretty much all the queries except one are really fast. However, each night we load data from a legacy COBOL system into the SQL system and then we summarize that data to make the reports faster. This load process is intensely insert/update driven but also has a hefty amount of selects as well. This load process is taking ever longer to complete. SO ... our goal here is to make this load process take less time. It seems the big part is building the big summary table; this big summary table is currently 9 million rows big. Every night, we drop the table, re-create it, build the 9 million rows of data (we use COPY to put the data in when it's prepared, not INSERT), and then build the indexes on it -- of which there are many. Unfortunately this table gets queried in a lot of different ways and needs these indexes; also unfortunately, we have operator class indexes to support both ASC and DESC sorting on columns so these are for all intents and purposes duplicate but required under Postgres 8.1 (we've recently upgraded to Postgres 8.2, is this still a requirement?) Building these indexes takes forever! It's a long grind through inserts and then building the indexes takes a hefty amount of time too. (about 9 hours). Now, the application is likely partly at fault, and we're working to make it more efficient, but it has nothing to do with the index building time. I'm wondering what we can do to make this better if anything; would it be better to leave the indexes on? It doesn't seem to be. Would it be better to use INSERTs instead of copies? Doesn't seem to be. Anyway -- ANYTHING we can do to make this go faster is appreciated :) Here's some vital statistics: - Machine is a 16 GB, 4 actual CPU dual-core Opteron system using SCSI discs. The disc configuration seems to be a good one, it's the best of all the ones we've tested so far. - The load process itself takes about 6 gigs of memory, the rest is free for postgres because this is basically all the machine does. - If this was your machine and situation, how would you lay out the memory settings? What would you set the FSM to? Would you leave the bgwriter on or off? We've already got FSYNC off because "data integrity" doesn't matter -- this stuff is religiously backed up and we've got no problem reinstalling it. Besides, in order for this machine to go down, data integrity of the DB is the least of the worries :) Do wal_buffers/full_page_writes matter if FSYNC is off? If so, what settings? What about checkpoints? And finally, any ideas on planner constants? 
Here's what I'm using: seq_page_cost = 0.5 # measured on an arbitrary scale random_page_cost = 1.0 # same scale as above cpu_tuple_cost = 0.001 # same scale as above cpu_index_tuple_cost = 0.0001 # same scale as above cpu_operator_cost = 0.00025 # same scale as above effective_cache_size = 679006 I really don't remember how I came up with that effective_cache_size number Anyway... any advice would be appreciated :) Steve ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
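For what it's worth, the usual shape of that nightly rebuild (load with COPY into an index-less table, then build the indexes with a large maintenance_work_mem) looks roughly like the sketch below; the column names, path and the 1 GB figure are placeholders, not recommendations:

BEGIN;
DROP TABLE IF EXISTS summary;
CREATE TABLE summary (col1 integer, col2 date, amount numeric);   -- stand-in definition

-- Bulk load is fastest with no indexes in place
COPY summary FROM '/path/to/prepared_data.copy';

-- Index builds are what maintenance_work_mem is actually used for
SET LOCAL maintenance_work_mem = '1GB';
CREATE INDEX summary_col1_idx ON summary (col1);
CREATE INDEX summary_col2_col1_idx ON summary (col2, col1);
COMMIT;

Whether this beats the current process is something only timing against the real data will tell.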
Re: [PERFORM] Configuration Advice
Sorry if this came off sounding trollish All databases have their strengths & weaknesses, and I feel the responsible thing to do is exploit those strengths where possible, rather than expend significant time and effort coaxing one database to do something it wasn't designed to. There's just no such thing as "one size fits all". I have professional experience with MS-SQL, Oracle, MySQL, and Postgres. and the scenario described sounds more ideal for MySQL & MyISAM than anything else: 1) No concerns for data loss (turning fsync & full_page_writes off) since the data can be reloaded 2) No need for MVCC or transactions, since the database is read-only 3) No worries about lock contention 4) Complex queries that might take advantage of the MySQL "Query Cache" since the base data never changes 5) Queries that might obtain data directly from indexes without having to touch tables (again, no need for MVCC) If loading in the base data and creating the summary table is taking a lot of time, using MySQL with MyISAM tables (and binary logging disabled) should provide significant time savings, and it doesn't sound like there's any concerns for the downsides. Yes, postgresql holds an edge over MySQL for heavy OLTP applications, I use it for that and I love it. But for the scenario the original poster is asking about, MySQL/MyISAM is ideal. -Original Message- From: Bricklen Anderson [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 17, 2007 3:29 PM To: Adam Rich Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Configuration Advice Adam Rich wrote: > Doesn't sound like you want postgres at all Try mysql. Could you explain your reason for suggesting mysql? I'm simply curious why you would offer that as a solution. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] explain analyze output for review (was: optimizing a geo_distance()...)
If I'm reading this correctly, 89% of the query time is spent doing an index scan of earth_coords_idx. Scanning pets is only taking 6% of the total time.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mark Stosberg
Sent: Tuesday, February 06, 2007 8:40 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] explain analyze output for review (was: optimizing a geo_distance()...)

Bruno Wolff III wrote:
>
> Some people here may be able to tell you more if you show us explain
> analyze output.

Here is my explain analyze output. Some brief context of what's going on: the goal is to find "Pets Near You". We join the pets table on the shelters table to get a zipcode, and then join the shelters table with "earth_distance" to get the coordinates of the zipcode. (Is there any significant penalty for using a varchar vs an int for a join?)

I've been investigating partial indexes for the pets table. It has about 300,000 rows, but only about 10 are "active", and those are the ones we care about. Queries are also frequently made on males vs females, dogs vs cats, or specific ages, and those specific cases seem like possible candidates for partial indexes as well. I played with that approach some, but had trouble coming up with anything that benchmarked faster.

If I'm reading the explain analyze output correctly, nearly all of the time spent is related to the 'pets' table, but I can't see what to do about it. Help appreciated!

    Mark

 Nested Loop  (cost=11.82..29.90 rows=1 width=0) (actual time=37.601..1910.787 rows=628 loops=1)
   ->  Nested Loop  (cost=6.68..20.73 rows=1 width=24) (actual time=35.525..166.547 rows=1727 loops=1)
         ->  Bitmap Heap Scan on pets  (cost=6.68..14.71 rows=1 width=4) (actual time=35.427..125.594 rows=1727 loops=1)
               Recheck Cond: (((sex)::text = 'f'::text) AND (species_id = 1))
               Filter: ((pet_state)::text = 'available'::text)
               ->  BitmapAnd  (cost=6.68..6.68 rows=2 width=0) (actual time=33.398..33.398 rows=0 loops=1)
                     ->  Bitmap Index Scan on pets_sex_idx  (cost=0.00..3.21 rows=347 width=0) (actual time=14.739..14.739 rows=35579 loops=1)
                           Index Cond: ((sex)::text = 'f'::text)
                     ->  Bitmap Index Scan on pet_species_id_idx  (cost=0.00..3.21 rows=347 width=0) (actual time=16.779..16.779 rows=48695 loops=1)
                           Index Cond: (species_id = 1)
         ->  Index Scan using shelters_pkey on shelters  (cost=0.00..6.01 rows=1 width=28) (actual time=0.012..0.014 rows=1 loops=1727)
               Index Cond: ("outer".shelter_id = shelters.shelter_id)
   ->  Bitmap Heap Scan on earth_distance  (cost=5.14..9.15 rows=1 width=9) (actual time=0.984..0.984 rows=0 loops=1727)
         Recheck Cond: ((cube_enlarge(('(-2512840.11676572, 4646218.19036629, 3574817.21369166)'::cube)::cube, 160930.130863421::double precision, 3) @ earth_distance.earth_coords) AND (("outer".postal_code_for_joining)::text = (earth_distance.zipcode)::text))
         ->  BitmapAnd  (cost=5.14..5.14 rows=1 width=0) (actual time=0.978..0.978 rows=0 loops=1727)
               ->  Bitmap Index Scan on earth_coords_idx  (cost=0.00..2.15 rows=42 width=0) (actual time=0.951..0.951 rows=1223 loops=1727)
                     Index Cond: (cube_enlarge(('(-2512840.11676572, 4646218.19036629, 3574817.21369166)'::cube)::cube, 160930.130863421::double precision, 3) @ earth_coords)
               ->  Bitmap Index Scan on earth_distance_zipcode_idx  (cost=0.00..2.74 rows=212 width=0) (actual time=0.015..0.015 rows=1 loops=1727)
                     Index Cond: (("outer".postal_code_for_joining)::text = (earth_distance.zipcode)::text)
 Total runtime: 1913.099 ms

---(end of broadcast)---
TIP 6: explain analyze is your friend

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
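One concrete form of the partial-index idea discussed above is an index restricted to the rows that are actually searched. A minimal sketch, using the pets columns visible in the plan (the index name is made up):

-- Only 'available' pets are ever searched, so index just those rows,
-- keyed on the filters that appear most often (species and sex).
CREATE INDEX pets_available_idx
    ON pets (species_id, sex)
    WHERE pet_state = 'available';

-- A query whose WHERE clause implies the index predicate can use it:
SELECT shelter_id
FROM pets
WHERE pet_state = 'available'
  AND species_id = 1
  AND sex = 'f';

That said, as the first reply points out, most of the 1.9 seconds goes into rescanning earth_coords_idx once per outer row (roughly 0.95 ms x 1727 loops), so shrinking the pets scan alone is unlikely to move the total much.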
Re: [PERFORM] Do I need to rebuild php-pgsql for 8.2.3
Here's what I do...

1) Install postgresql-libs from the RHEL source
2) Install compat-postgresql-libs from postgresql.org (install, not upgrade, use rpm -hiv) use force if necessary
3) Install postgresql-libs from postgresql.org (again, install, not upgrade, use rpm -hiv) use force if necessary

If done correctly, you'll end up with all 3 client versions:

/usr/lib/libpq.so.3
/usr/lib/libpq.so.4
/usr/lib/libpq.so.5

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Michael Dengler
Sent: Wednesday, April 11, 2007 12:25 PM
To: Devrim GÜNDÜZ
Cc: pgsql-performance; Guillaume Smet
Subject: Re: [PERFORM] Do I need to rebuild php-pgsql for 8.2.3

Hi,

Thanks for the info. One more thing... I am in rpm hell. When I try to

# rpm -Uvh postgresql-libs-8.2.3-1PGDG.i686.rpm

I get:

error: Failed dependencies:
        libpq.so.3 is needed by (installed) perl-DBD-Pg-1.31-6.i386
        libpq.so.3 is needed by (installed) postgresql-python-7.4.13-2.RHEL4.1.i386
        libpq.so.3 is needed by (installed) php-pgsql-4.3.9-3.15.i386

and when I try:

# rpm -ivh compat-postgresql-libs-3-3PGDG.i686.rpm

I get:

error: Failed dependencies:
        postgresql-libs < 8.0.2 conflicts with compat-postgresql-libs-3-3PGDG.i686

grr... should I just force the upgrade (i.e. --nodeps)?

Thanks

Mike

On 4/10/07, Devrim GÜNDÜZ <[EMAIL PROTECTED]> wrote:

Hi,

On Tue, 2007-04-10 at 22:55 +0200, Guillaume Smet wrote:
> See http://developer.postgresql.org/~devrim/rpms/compat/ and choose
> the correct package for your architecture.

... or better, each RHEL4 directory in our FTP site has the compat package (that directory is not up2date now).

Regards,
--
Devrim GÜNDÜZ
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/
[PERFORM] Java Out-of-memory errors on attempts to read tables with millions of rows
Greetings,

We have several tables (in a PG 7.3.3 database on RH Linux 7.3) with 2M+ rows (each row 300-400 bytes in length) that we SELECT into a JDBC ResultSet for display to the user. We expected that the driver would not actually transmit data from the database until the application began issuing getXXX() calls. (IIRC, this is the way the Oracle driver works, and we had created a buffering mechanism to use it.) Instead, the driver appears to be attempting to create the whole rowset in Java memory before returning, and the application runs out of memory. (Java has been configured to use up to 1.5G on the machine this occurs on.)

Now, the SELECT is preceded by a COUNT of the rows that the same query would return, so perhaps that's what's causing the problem. But the question is: is this the way a ResultSet is supposed to work? Are there any configuration options available that modify this behavior? Are there commercial implementations of PG JDBC that don't have this problem? (Shame on me, but I have to ask. :)

Any help will be greatly appreciated!

Rich Cullingford
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
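By default the whole result of a query is materialized on the client before the ResultSet is returned; if memory serves, later versions of the PostgreSQL JDBC driver can stream rows in batches once a fetch size is set and autocommit is off. The same batching is also available directly in SQL through a server-side cursor, which works even on a 7.3-era setup. A minimal sketch, with a hypothetical table and columns:

-- Cursors only live inside a transaction block.
BEGIN;

-- Hypothetical large table; rows are sent only as they are fetched.
DECLARE big_cur CURSOR FOR
    SELECT id, payload
    FROM big_table
    ORDER BY id;

-- Pull manageable batches instead of the whole 2M+ rows at once.
FETCH FORWARD 1000 FROM big_cur;
FETCH FORWARD 1000 FROM big_cur;
-- ...repeat until FETCH returns no rows...

CLOSE big_cur;
COMMIT;

The preliminary COUNT mentioned above is a separate query, so it adds an extra scan but shouldn't change how the main result set is buffered.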
[PERFORM] Top n queries and GROUP BY
All,

This is a straight SQL question, maybe not appropriate for a performance list, but...

I have a simple stock holdings setup:

=> select * from t1;
 nam |    co     | num
-----+-----------+------
 joe | ibm       |  600
 abe | ibm       | 1500
 joe | cisco     | 1200
 abe | cisco     |  800
 joe | novell    |  500
 joe | microsoft |  200

What I would like to see is a "top-n-holdings-by-name", e.g., for n=2:

 nam |  co   | num
-----+-------+------
 joe | cisco | 1200
 joe | ibm   |  600
 abe | ibm   | 1500
 abe | cisco |  800

I can get part of the way by using a LIMIT clause in a subquery, e.g.,

=> select 'abe', a.co, a.num
   from (select co, num from t1 where nam='abe' order by num desc limit 2) as a;
 ?column? |  co   | num
----------+-------+------
 abe      | ibm   | 1500
 abe      | cisco |  800

but I can't figure out a correlated subquery (or GROUP BY arrangement or anything else) that will cycle through the names. I vaguely remember that these kinds of queries are hard to do in standard SQL, but I was hoping that PG, with its extensions...

Thanks,
Rich Cullingford
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: Have you searched our list archives?
http://archives.postgresql.org
Re: [PERFORM] Top n queries and GROUP BY
Rich Cullingford wrote:

All,

This is a straight SQL question, maybe not appropriate for a performance list, but...

I have a simple stock holdings setup:

=> select * from t1;
 nam |    co     | num
-----+-----------+------
 joe | ibm       |  600
 abe | ibm       | 1500
 joe | cisco     | 1200
 abe | cisco     |  800
 joe | novell    |  500
 joe | microsoft |  200

What I would like to see is a "top-n-holdings-by-name", e.g., for n=2:

 nam |  co   | num
-----+-------+------
 joe | cisco | 1200
 joe | ibm   |  600
 abe | ibm   | 1500
 abe | cisco |  800

I can get part of the way by using a LIMIT clause in a subquery, e.g.,

=> select 'abe', a.co, a.num
   from (select co, num from t1 where nam='abe' order by num desc limit 2) as a;
 ?column? |  co   | num
----------+-------+------
 abe      | ibm   | 1500
 abe      | cisco |  800

but I can't figure out a correlated subquery (or GROUP BY arrangement or anything else) that will cycle through the names. I vaguely remember that these kinds of queries are hard to do in standard SQL, but I was hoping that PG, with its extensions...

I forgot about row subqueries; for n=3, for example:

=> SELECT * FROM t1
   WHERE (nam,co,num) IN (SELECT nam,co,num FROM t1 b
                          where b.nam=t1.nam order by num desc limit 3)
   order by nam, num desc;
 nam |   co   | num
-----+--------+------
 abe | ibm    | 1500
 abe | cisco  |  800
 joe | cisco  | 1200
 joe | ibm    |  600
 joe | novell |  500
(5 rows)

Seems to work... Thanks all,

Rich Cullingford
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
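For what it's worth, later PostgreSQL releases (8.4 and up) added window functions, which express this top-n-per-group pattern directly against the same t1 table; a sketch for n=2:

-- Rank each holder's positions, largest first, and keep the top 2 per nam.
SELECT nam, co, num
FROM (
    SELECT nam, co, num,
           row_number() OVER (PARTITION BY nam ORDER BY num DESC) AS rn
    FROM t1
) ranked
WHERE rn <= 2
ORDER BY nam, num DESC;

Unlike the IN (... LIMIT n) form, this reads t1 only once; note that row_number() breaks ties arbitrarily, while rank() would keep all tied rows.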