[PERFORM] select max(id) from aTable is very slow
Hi,

we have a table with about 6,000,000 rows. There is an index on a column named id, which is an integer and serves as primary key. When we execute

    select max(id) from theTable;

it takes about 10 seconds. Explain analyze returns:

 Aggregate  (cost=153635.15..153635.15 rows=1 width=4) (actual time=9738.263..9738.264 rows=1 loops=1)
   ->  Seq Scan on job_property  (cost=0.00..137667.32 rows=6387132 width=4) (actual time=0.102..7303.649 rows=6387132 loops=1)
 Total runtime: 9738.362 ms
(3 rows)

I recreated the index on column id and ran vacuum analyze job_property, but this did not help. I tried to force index usage with SET ENABLE_SEQSCAN TO OFF; but explain analyze still shows the query being done with a seqscan.

Is this speed more or less normal for a 'dual G5 with 2 GHz, 4 GB of RAM and a SATA hd', or am I missing something?

regards
David

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message
can get through to the mailing list cleanly
Re: [PERFORM] select max(id) from aTable is very slow
David Teran wrote:
> Hi, we have a table with about 6,000,000 rows. There is an index on a
> column named id which is an integer and serves as primary key. When we
> execute select max(id) from theTable; it takes about 10 seconds.
> Explain analyze returns: ...

Due to the open-ended nature of PG's aggregate function system, it can't see inside the max() function to realise it doesn't need all the values. Fortune favours the flexible, however - the simple workaround is to use the equivalent:

    SELECT id FROM theTable ORDER BY id DESC LIMIT 1;

-- 
Richard Huxton
Archonet Ltd
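For anyone hitting the same problem from the archives, the rewrite and the plan shape it should produce look roughly like this (table and index names follow David's example; the plan is the typical expected shape, not output captured from his machine):

```sql
-- The aggregate form forces a full scan in these versions:
--   SELECT max(id) FROM theTable;
-- The equivalent ORDER BY / LIMIT form can walk the primary-key
-- index backwards and stop after the first row:
SELECT id FROM theTable ORDER BY id DESC LIMIT 1;

-- EXPLAIN should then show something like:
--   Limit
--     ->  Index Scan Backward using thetable_pkey on thetable
```

Note one subtle difference: on an empty table the aggregate returns a single NULL row, while the LIMIT form returns no rows at all.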
Re: [PERFORM] Slow response of PostgreSQL
On Tue, 17 Feb 2004, Saleem Burhani Baloch wrote:
> We ran
>   select count(*), sum(vl_ex_stax), sum(qty), unit
>   from inv_detail group by unit;
> on both databases.
>
> PostgreSQL Machine:
> P-III 600MHz (Dell Precision 220), 256 MB RAM (RDRAM), 40 GB Barracuda,
> Ext2 file system, RedHat 7.2, PostgreSQL 7.1.3-2

Besides the comments on the conf file already sent, 7.1.3 is many versions behind the current version and definitely has some deficiencies either fully or partially corrected in later versions. All in all, I'd suggest going all the way to 7.4.1, since the hash aggregate stuff might help the queries you're running.
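To illustrate the hash aggregate point, here is the poster's query with the plan shapes one would typically expect from each version (sketched from the general planner behaviour of those releases, not captured from the poster's machine):

```sql
EXPLAIN SELECT count(*), sum(vl_ex_stax), sum(qty), unit
FROM inv_detail GROUP BY unit;

-- 7.1 has to sort the whole table before it can group:
--   Aggregate
--     ->  Group
--           ->  Sort
--                 ->  Seq Scan on inv_detail
--
-- 7.4 can instead build an in-memory hash table keyed on unit,
-- skipping the sort entirely when there are few distinct units:
--   HashAggregate
--     ->  Seq Scan on inv_detail
```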
Re: [PERFORM] very large db performance question
> IMHO the size of the DB is less relevant than the query workload. For
> example, if you're storing 100GB of data but only doing a single index
> scan on it every 10 seconds, any modern machine with enough HD space
> should be fine.

I agree that the workload is likely to be the main issue in most situations. However, if your queries involve lots of counting and aggregating, your database contains several gigabytes of data, and you are using common hardware, be prepared to wait anywhere from minutes to hours, even if you are the only user.
Re: [PERFORM] Disappointing performance in db migrated from MS SQL
On Fri, 13 Feb 2004 [EMAIL PROTECTED] wrote:
> Josh, the disks in the new system should be substantially faster than
> the old. Both are Ultra160 SCSI RAID 5 arrays, but the new system has
> 15k RPM disks, as opposed to the 10k RPM disks in the old system.

Spindle speed does not correlate with throughput in any easy way. What controllers are you using for these disks? This is doubly so with a good RAID card with battery-backed cache. I'd bet that 10k RPM drives on a cached array card will beat an otherwise equal setup with 15k RPM disks and no cache. I know that losing the cache (i.e. setting it to write-through instead of write-back) slows my system down to a crawl, comparatively speaking.
Re: [PERFORM] Tables on multiple disk drives
On Tue, 17 Feb 2004, Craig Thomas wrote:
> On Tue, 17 Feb 2004, Konstantin Tokar wrote:
> > Hi! Does PostgreSQL allow creating tables and indices of a single
> > database on multiple disk drives, with the purpose of increasing
> > performance, as the Oracle database does? If a symbolic reference is
> > the only method, then the next question is: how can it be determined
> > which file corresponds to which table and index?

Your life will be simpler, and your setup will be faster without having to muck about with it, if you just buy a good RAID controller with battery-backed cache. LSI/MegaRAID and Adaptec both make serviceable controllers for reasonable prices, and as you add drives, the speed just goes up, no muddling around with symlinks.

> This works to a limited extent. For very large databases, maximum
> throughput of I/O is the paramount factor for database performance.
> With RAID controllers, your LUN is still limited to a small number of
> disks. PostgreSQL can only write to a file system, but Oracle, SAP DB,
> DB2, etc. can write directly to disk (raw I/O). With large databases
> it is advantageous to spread a table across hundreds of disks, if the
> table is quite large. I don't know of any manufacturer that makes a
> 100-disk RAID array yet.

You can run up to four LSI/MegaRAIDs in one box, each with 3 UW SCSI interfaces, and they act as one unit. That's 3*4*15 = 180 disks max. With FC-AL connections and four cards, it would be possible to approach 1000 drives. Of course, I'm not sure how fast any RAID card setup is gonna be with that many drives, but ya never know. My guess is that before you go there you buy a big external RAID box built for speed. We have a couple of 200+ drive external RAID5 storage boxes at work that are quite impressive.

> Some of the problem can be addressed by using a volume manager (such as
> LVM on Linux, or Veritas on Unix-like systems). This allows one to
> create a volume using partitions from many disks. One can then create a
> file system and mount it on the volume.

Pretty much RAID arrays in software, which means no battery-backed cache, which means it'll be fast at reading, but probably pretty slow at writes, especially if there's a lot of parallel access waiting to write to the database.

> However, to get the best performance, raw I/O capability is the best
> way to go.

Unsupported statement made as fact. I'm not saying it can't be or isn't true, but my experience has been that large RAID5 arrays are a great compromise between maximum performance and reliability, giving a good measure of each. It doesn't take 100 drives to do well; even a dozen to two dozen will get you in the same basic range as splitting out files by hand with symlinks, without all the headache of chasing down the files, shutting down the database, linking it over, etc...
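For the record, the symlink trick the thread keeps alluding to looks like the sketch below, here run against throwaway paths under /tmp instead of a real cluster (all names and OIDs are hypothetical examples). Against a real cluster you would first stop the postmaster, then look up the table's on-disk file with `SELECT relfilenode FROM pg_class WHERE relname = 'big_table';`, and move that file out of `$PGDATA/base/<db_oid>/`:

```shell
PGDATA=/tmp/pgdata_demo        # stand-in for the real data directory
OTHERDISK=/tmp/otherdisk_demo  # stand-in for a mount point on a second drive
mkdir -p "$PGDATA/base/16384" "$OTHERDISK"
echo "relation data" > "$PGDATA/base/16384/24576"   # fake relation file

# Move the relation file to the other disk and symlink it back:
mv "$PGDATA/base/16384/24576" "$OTHERDISK/24576"
ln -s "$OTHERDISK/24576" "$PGDATA/base/16384/24576"

# The file is still reachable at its old path through the link:
cat "$PGDATA/base/16384/24576"
```

The main headache, beyond the bookkeeping: any command that assigns the table a new relfilenode (TRUNCATE, REINDEX, CLUSTER) silently replaces the symlink with a plain file back on the original disk, so the relocation has to be redone.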
Re: [PERFORM] RAID or manual split?
> It seems that if I know the type and frequency of the queries a
> database will be seeing, I could split the database by hand over
> multiple disks and get better performance than I would with a RAID
> array with similar hardware.

Unlikely, but possible if you had radically different hardware for different tables.

> Six large (3-7 Mrow) 'summary' tables, each being updated continuously
> by 5-20 processes with about 0.5 transactions/second/process.

Well, you should get close to an order of magnitude better performance from a RAID controller with write-back cache on those queries.

> Periodically (currently every two weeks), join queries are performed
> between one of the 'summary' tables (same one each time) and each of
> the other five. Each join touches most rows of both tables; indexes
> aren't used. Results are written into a separate group of 'inventory'
> tables (about 500 Krow each), one for each join.

The more disks the data is spread over the better (the RAID controller will help here with striping).

> There are frequent (100-1000/day) queries of both the inventory and
> summary tables using the primary key -- always using the index and
> returning 10 rows.

RAM is what you need, to cache the data and indexes, and then as much CPU power as you can get.

> We're currently getting (barely) acceptable performance from a single
> 15k U160 SCSI disk, but db size and activity are growing quickly. I've
> got more disks and a battery-backed LSI card on order.

3 or more disks in a stripe set, with write-back caching, will almost certainly give a huge performance boost. Try that first, and only if you have issues should you think about futzing with symlinks etc.

M
Re: [PERFORM] Pl/Pgsql Functions running simultaneously
Am I on the wrong list to ask this question, or does this list usually have low activity? Just asking because I am new and I need to know where to ask this question. Thanks.

On Wednesday 02 June 2004 16:08, Marcus Whitney wrote:
> Hello all,
>
> I have an import function that I have been working on for some time
> now, and it performed well up until recently. It is doing a lot, and
> because the queries are not cached, I am not sure if that is what the
> problem is. If a function takes a while, does it lock any of the tables
> it is accessing, even for SELECT? Below is the bulk of the function:
>
>   -- set sql statement variables
>   create_import_file_sql := ''COPY '' || container_table || '' (''
>     || filtered_container_columns || '') TO ''
>     || quote_literal(formatted_import_file)
>     || '' WITH NULL AS '' || null_single_quotes;
>   upload_to_import_table_sql := ''COPY '' || import_table || '' (''
>     || field_names || '') FROM '' || quote_literal(formatted_import_file)
>     || '' WITH NULL AS '' || null_single_quotes;
>   clean_personalization_fields_sql := ''UPDATE '' || import_table
>     || '' SET emma_member_email = btrim(emma_member_email, ''
>     || quote_literal(quoted_single_quote) || ''),
>     emma_member_name_first = btrim(emma_member_name_first, ''
>     || quote_literal(quoted_single_quote) || ''),
>     emma_member_name_last = btrim(emma_member_name_last, ''
>     || quote_literal(quoted_single_quote) || '');'';
>   clean_personalization_fields_sql2 := ''UPDATE '' || import_table
>     || '' SET emma_member_email = btrim(emma_member_email),
>     emma_member_name_first = btrim(emma_member_name_first),
>     emma_member_name_last = btrim(emma_member_name_last);'';
>   set_account_id_sql := ''UPDATE '' || import_table
>     || '' SET emma_account_id = '' || account_id;
>   set_default_active_status_sql := ''UPDATE '' || import_table
>     || '' SET emma_member_status_id = 1'';
>   set_errors_for_null_email_sql := ''UPDATE '' || import_table
>     || '' SET emma_member_status_id = 2
>     WHERE emma_member_email IS NULL'';
>   record_null_email_count_sql := ''UPDATE '' || import_history_table
>     || '' SET emma_import_null_email_count = (SELECT COUNT(*) FROM ''
>     || import_table || '' WHERE emma_member_email IS NULL)
>     WHERE emma_import_history_id = '' || import_history_id;
>   set_errors_for_invalid_email_sql := ''UPDATE '' || import_table
>     || '' SET emma_member_status_id = 2
>     WHERE emma_member_email !~* '' || email_regex;
>   record_invalid_email_count_sql := ''UPDATE '' || import_history_table
>     || '' SET emma_import_invalid_email_count = (SELECT COUNT(*) FROM ''
>     || import_table || '' WHERE emma_member_email !~* '' || email_regex
>     || '') WHERE emma_import_history_id = '' || import_history_id;
>   get_dupes_in_import_sql := ''SELECT emma_member_email,
>     emma_member_status_id FROM '' || import_table
>     || '' GROUP BY emma_member_email, emma_member_status_id
>     HAVING count(*) > 1'';
>   insert_dupes_sql := ''INSERT INTO '' || dupe_table
>     || '' SELECT * FROM '' || import_table
>     || '' WHERE LOWER(emma_member_email) = LOWER(''
>     || member_table || ''.emma_member_email)'';
>   record_table_dupe_count_sql := ''UPDATE '' || import_history_table
>     || '' SET emma_import_table_dupe_email_count = (SELECT COUNT(*)
>     FROM '' || import_table || '' WHERE emma_member_email = LOWER(''
>     || member_table || ''.emma_member_email))
>     WHERE emma_import_history_id = '' || import_history_id;
>   remove_dupes_from_import_table_sql := ''DELETE FROM '' || import_table
>     || '' WHERE LOWER(emma_member_email) = LOWER(''
>     || member_table || ''.emma_member_email)'';
>   create_clean_import_file_sql := ''COPY '' || import_table || '' TO ''
>     || quote_literal(clean_import_file)
>     || '' WITH NULL AS '' || null_single_quotes;
>   create_members_groups_ids_file_sql := ''COPY '' || import_table
>     || '' (emma_member_id) TO ''
>     || quote_literal(members_groups_ids_file)
>     || '' WITH NULL AS '' || null_single_quotes;
>   empty_import_table_sql := ''TRUNCATE '' || import_table;
>   upload_clean_import_sql := ''COPY '' || member_table || '' FROM ''
>     || quote_literal(clean_import_file)
>     || '' WITH NULL AS '' || null_single_quotes;
>   upload_members_groups_ids_sql := ''COPY '' || members_groups_ids_table
>     || '' (emma_member_id) FROM ''
>     || quote_literal(members_groups_ids_file)
>     || '' WITH NULL AS '' || null_single_quotes;
>   empty_members_groups_ids_sql := ''TRUNCATE '' || members_groups_ids_table;
>   empty_members_dupes_sql := ''TRUNCATE '' || dupe_table;
>   vacuum_sql := ''VACUUM '' || member_table
>     || ''; VACUUM '' || import_table
>     || ''; VACUUM '' || container_table
>     || ''; VACUUM '' || members_groups_table
>     || ''; VACUUM '' || members_groups_ids_table
>     || ''; VACUUM '' || dupe_table;
>
>   -- BEGIN ACTIVITY
>   -- Create the filtered import file with the
>   EXECUTE create_import_file_sql;
>   -- Load data from the filtered file to the import table
>   EXECUTE
[PERFORM] Most transactions per second on largest box?
This is a very general question, but what is the largest Linux box anyone has run PostgreSQL on, and what kind of concurrent transactions per second have you seen?

We have a client who has huge bursts of activity, coinciding with highly rated TV appearances, meaning hundreds of thousands of users registering and logging in at the same time. Currently we are running a dual-CPU Dell blade server on Red Hat Linux (2.4?) and PostgreSQL 7.3.4 on i686-pc-linux-gnu, compiled by GCC 2.96, with RAID 5, and am using sqlrelay for connection pooling. It works fine under ordinary load but bogs down too much under these huge loads.

I would love to be able to tell them that a specific box, like a 2 GHz quad Xeon with 10 GB of RAM on a RAID 10 server, will get them x concurrent transactions per second with x threads running. From that I can give them a great estimate of how many registrations per minute their money can buy, but right now all I can tell them is that if you spend more money you will get more tps.

Thanks for any info you may have.
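One way to turn "spend more money" into an estimate is to benchmark a candidate box before committing to it. pgbench ships in PostgreSQL's contrib directory and reports a rough transactions-per-second figure; the scale factor and client counts below are arbitrary examples, and its TPC-B-style workload only approximates a registration/login mix:

```shell
# Initialize a test database at scale factor 10 (~1,000,000 accounts rows)
pgbench -i -s 10 testdb

# Simulate 50 concurrent clients running 200 transactions each;
# pgbench prints the achieved tps when the run completes
pgbench -c 50 -t 200 testdb
```

Repeating the second command with increasing -c values shows where a given box stops scaling, which is closer to the per-machine number the client is asking for than any general rule of thumb.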
Re: [PERFORM] Tables on multiple disk drives
Konstantin,

> Does PostgreSQL allow creating tables and indices of a single database
> on multiple disk drives, with the purpose of increasing performance,
> as the Oracle database does? If a symbolic reference is the only
> method, then the next question is: how can it be determined which file
> corresponds to which table and index?

Howdy! I bet you're a bit taken aback by the discussion that ensued, and even more confused than before. You are actually asking about two related features:

Tablespaces, which allows designating different directories/volumes for specific tables and indexes at creation time, and:

Partitioned Tables, which allows the division of large tables and/or indexes horizontally along pre-defined criteria.

The first, tablespaces, is under development and may make it for 7.5, or maybe not, but certainly the version after that. The second, partitioned tables, is NOT under development, because this feature lacks both a programmer and a clear specification.

-- 
-Josh Berkus
Aglio Database Solutions
San Francisco
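For readers finding this thread in the archives: the tablespace feature Josh describes did ship in the release that followed (numbered 8.0 rather than 7.5), with syntax along these lines (the tablespace name, path, and table are hypothetical examples):

```sql
-- Point a tablespace at a directory on the faster/second disk:
CREATE TABLESPACE fastdisk LOCATION '/mnt/disk2/pgdata';

-- Place a table and one of its indexes there at creation time:
CREATE TABLE big_table (id int PRIMARY KEY, payload text)
  TABLESPACE fastdisk;
CREATE INDEX big_table_payload_idx ON big_table (payload)
  TABLESPACE fastdisk;
```

This replaces the manual symlink juggling discussed elsewhere in this digest.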