[PERFORM] select max(id) from aTable is very slow

2004-06-03 Thread David Teran
Hi,
we have a table with about 6,000,000 rows. There is an index on a
column named id, which is an integer and serves as the primary key.

When we execute select max(id) from theTable; it takes about 10
seconds. Explain analyze returns:

 Aggregate  (cost=153635.15..153635.15 rows=1 width=4) (actual time=9738.263..9738.264 rows=1 loops=1)
   ->  Seq Scan on job_property  (cost=0.00..137667.32 rows=6387132 width=4) (actual time=0.102..7303.649 rows=6387132 loops=1)
 Total runtime: 9738.362 ms
(3 rows)


I recreated the index on column id and ran VACUUM ANALYZE job_property,
but this did not help. I tried to force index usage with SET
ENABLE_SEQSCAN TO OFF; but EXPLAIN ANALYZE still shows the query being
executed with a sequential scan.

Is this speed more or less normal for a dual 2 GHz G5 with 4 GB of RAM
and a SATA hard disk, or am I missing something?

regards David


Re: [PERFORM] select max(id) from aTable is very slow

2004-06-03 Thread Richard Huxton
David Teran wrote:
Hi,
we have a table with about 6,000,000 rows. There is an index on a
column named id, which is an integer and serves as the primary key.

When we execute select max(id) from theTable; it takes about 10  
seconds. Explain analyze returns:
Due to the open-ended nature of PG's aggregate function system, it can't 
see inside the max() function to realise it doesn't need all the values.

Fortune favours the flexible however - the simple workaround is to use 
the equivalent:
  SELECT id FROM theTable ORDER BY id DESC LIMIT 1;
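
To confirm that the rewritten query actually picks up the index, it is worth 
looking at the plan again; a quick sketch, reusing the table and column from 
the example above:

  EXPLAIN ANALYZE SELECT id FROM theTable ORDER BY id DESC LIMIT 1;

With a usable index on id this should show something like a Limit node over 
an Index Scan Backward instead of the sequential scan, and a runtime in 
milliseconds rather than seconds.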

--
  Richard Huxton
  Archonet Ltd


Re: [PERFORM] Slow response of PostgreSQL

2004-06-03 Thread Stephan Szabo
On Tue, 17 Feb 2004, Saleem Burhani Baloch wrote:

 select count(*), sum(vl_ex_stax) , sum(qty) , unit from inv_detail group by unit;
 on both databases.

 PostgreSQL Machine
 **
 P-III 600Mhz (Dell Precision 220)
 256 MB Ram (RD Ram)
 40 GB Barracuda, Ext2 file system.
 RedHat 7.2
 PostgreSQL 7.1.3-2

Besides the comments on the conf file already sent, 7.1.3 is many versions
behind the current release and definitely has some deficiencies that were
fully or partially corrected in later versions.  All in all, I'd suggest
going all the way to 7.4.1, since the hash aggregate support might help the
queries you're running.
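
Once on 7.4, one quick way to see whether the planner actually switched to 
the hash-based plan is to run EXPLAIN ANALYZE on the same query; a sketch, 
using the query quoted above:

  EXPLAIN ANALYZE
  SELECT count(*), sum(vl_ex_stax), sum(qty), unit
    FROM inv_detail
   GROUP BY unit;

Given enough sort_mem, the plan should show a HashAggregate node rather than 
a sort-and-group plan.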


Re: [PERFORM] very large db performance question

2004-06-03 Thread Eric Jain
 IMHO the size of the DB is less relevant than the query workload. For
 example, if you're storing 100GB of data but only doing a single
 index scan on it every 10 seconds, any modern machine with enough HD
 space should be fine.

I agree that the workload is likely to be the main issue in most
situations. However, if your queries involve lots of counting and
aggregating, your database contains several gigabytes of data, and you
are using common hardware, be prepared to wait anywhere from minutes to
hours, even if you are the only user.


Re: [PERFORM] Disappointing performance in db migrated from MS SQL

2004-06-03 Thread scott.marlowe
On Fri, 13 Feb 2004 [EMAIL PROTECTED] wrote:

  Josh, the disks in the new system should be substantially faster than
  the old. Both are Ultra160 SCSI RAID 5 arrays, but the new system has
  15k RPM disks, as opposed to the 10k RPM disks in the old system.
 
 Spindle speed does not correlate with 'throughput' in any easy way.  What
 controllers are you using for these disks?

This is doubly so with a good RAID card with battery-backed cache.

I'd bet that 10k RPM drives on a cached array card will beat an otherwise
equal setup with 15k RPM disks and no cache.  I know that losing the cache
(i.e. setting it to write-through instead of write-back) slows my system
down to a crawl, comparatively speaking.


Re: [PERFORM] Tables on multiple disk drives

2004-06-03 Thread scott.marlowe
On Tue, 17 Feb 2004, Craig Thomas wrote:

  On Tue, 17 Feb 2004, Konstantin Tokar wrote:
 
  Hi!
  Does PostgreSQL allow tables and indexes of a single database to be
  created on multiple disk drives in order to increase performance, as
  Oracle does? If a symbolic link is the only method, then the next
  question is: how can one determine which file corresponds to which
  table and index?
 
  Your life will be simpler, and your setup will be faster, without
  having to muck about with it, if you just buy a good RAID controller
  with battery-backed cache.  LSI/MegaRAID and Adaptec both make
  serviceable controllers for reasonable prices, and as you add drives,
  the speed just goes up, no muddling around with symlinks.
 
 This works to a limited extent.  For very large databases, maximum I/O
 throughput is the paramount factor for database performance.  With
 RAID controllers, your LUN is still limited to a small number of disks.
 PostgreSQL can only write to a file system, but Oracle, SAP DB, DB2, etc.
 can write directly to disk (raw I/O).  With large databases it is
 advantageous to spread a table across hundreds of disks, if the table is
 quite large.  I don't know of any manufacturer that makes a 100-disk
 RAID array yet.

You can run up to four LSI / Megaraids in one box, each with 3 UW SCSI 
interfaces, and they act as one unit.  That's 3*4*15 = 180 disks max.

With FC AL connections and four cards, it would be possible to approach 
1000 drives.  

Of course, I'm not sure how fast any RAID card setup is gonna be with that 
many drives, but ya never know.  My guess is that before you go there you 
buy a big external RAID box built for speed.  We have a couple of 200+ 
drive external RAID5 storage boxes at work that are quite impressive.

 Some of the problem can be addressed by using a volume manager (such as
 LVM in Linux, or Veritas on Unix-like systems).  This allows one to
 create a volume using partitions from many disks.  One can then create
 a file system and mount it on the volume.

That's pretty much RAID in software, which means no battery-backed cache,
which means it will be fast at reads but probably pretty slow at writes,
especially if there's a lot of parallel access waiting to write to the
database.

 However, to get the best performance, Raw I/O capability is the best
 way to go.

Unsupported statement made as fact.  I'm not saying it can't be or isn't
true, but my experience has been that large RAID5 arrays are a great
compromise between maximum performance and reliability, giving a good
measure of each.  It doesn't take 100 drives to do well; even a dozen to
two dozen will get you into the same basic range as splitting out files by
hand with symlinks, without all the headache of chasing down the files,
shutting down the database, linking it over, etc.
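
As for the side question of which file on disk belongs to which table or 
index: that mapping doesn't have to be guessed, it can be read out of the 
system catalogs.  A minimal sketch (the relation names here are just 
placeholders):

  SELECT relname, relfilenode, relkind
    FROM pg_class
   WHERE relname IN ('mytable', 'mytable_pkey');

  SELECT oid FROM pg_database WHERE datname = current_database();

The relation then lives in $PGDATA/base/<database oid>/<relfilenode>, which 
is the file a symlink scheme would have to move (current_database() needs a 
reasonably recent release; otherwise just look up the database name in 
pg_database directly).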



Re: [PERFORM] RAID or manual split?

2004-06-03 Thread matt
 It seems that if I know the type and frequency of the queries a
 database will be seeing, I could split the database by hand over
 multiple disks and get better performance than I would with a RAID array
 on similar hardware.

Unlikely, but possible if you had radically different hardware for
different tables.

 Six large (3-7 Mrow) 'summary' tables, each being updated continuously
 by 5-20 processes with about 0.5 transactions/second/process.

Well you should get close to an order of magnitude better performance from
a RAID controller with write-back cache on those queries.

 Periodically (currently every two weeks), join queries are
 performed between one of the 'summary' tables(same one each time) and
 each of the other five.  Each join touches most rows of both tables,
 indexes aren't used.  Results are written into a separate group of
 'inventory' tables (about 500 Krow each), one for each join.

The more disks the data is spread over the better (the RAID controller
will help here with striping).

 There are frequent (100-1000/day) queries of both the
 inventory and summary tables using the primary key -- always using the
 index and returning < 10 rows.

RAM is what you need, to cache the data and indexes, and then as much CPU
power as you can get.

 We're currently getting (barely) acceptable performance from a single
 15k U160 SCSI disk, but db size and activity are growing quickly.
 I've got more disks and a battery-backed LSI card on order.

3 or more disks in a stripe set, with write back caching, will almost
certainly give a huge performance boost.  Try that first, and only if you
have issues should you think about futzing with symlinks etc.

M


Re: [PERFORM] Pl/Pgsql Functions running simultaneously

2004-06-03 Thread Marcus Whitney
Am I on the wrong list to ask this question, or does this list usually have 
low activity?  Just asking because I am new and I need to know where to ask 
this question.  Thanks.

On Wednesday 02 June 2004 16:08, Marcus Whitney wrote:
 Hello all,

   I have an import function that I have been working on for some time now,
 and it performed well up until recently.  It is doing a lot, and because
 the queries are not cached, I am not sure if that is what the problem is. 
 If a function takes a while, does it lock any of the tables it is
 accessing, even for SELECT?
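
 As far as I understand it, plain SELECTs inside the function only take
 AccessShareLock, which does not block other readers, but the TRUNCATEs
 below take an exclusive lock that blocks even SELECTs until the
 transaction commits.  One way to watch this from a second session is to
 query the pg_locks view (7.3 and later); a rough sketch, with placeholder
 relation names:

   SELECT c.relname, l.mode, l.granted, l.pid
     FROM pg_locks l
     JOIN pg_class c ON c.oid = l.relation
    WHERE c.relname IN ('member_table_name', 'import_table_name');

 Any row with granted = f is a statement waiting behind the function's
 transaction.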

 Below is the bulk of the function:

 -- set sql statement variables
 create_import_file_sql := ''COPY '' || container_table || '' ('' ||
     filtered_container_columns || '') TO '' ||
     quote_literal(formatted_import_file) || '' WITH NULL AS '' ||
     null_single_quotes;
 upload_to_import_table_sql := ''COPY '' || import_table || '' ('' ||
     field_names || '') FROM '' || quote_literal(formatted_import_file) ||
     '' WITH NULL AS '' || null_single_quotes;
 clean_personalization_fields_sql := ''UPDATE '' || import_table ||
     '' SET emma_member_email = btrim(emma_member_email, '' ||
     quote_literal(quoted_single_quote) ||
     '') , emma_member_name_first = btrim(emma_member_name_first, '' ||
     quote_literal(quoted_single_quote) ||
     '') , emma_member_name_last = btrim(emma_member_name_last, '' ||
     quote_literal(quoted_single_quote) || '') ;'';
 clean_personalization_fields_sql2 := ''UPDATE '' || import_table ||
     '' SET emma_member_email = btrim(emma_member_email) ,
     emma_member_name_first = btrim(emma_member_name_first) ,
     emma_member_name_last = btrim(emma_member_name_last) ;'';
 set_account_id_sql := ''UPDATE '' || import_table ||
     '' SET emma_account_id = '' || account_id;
 set_default_active_status_sql := ''UPDATE '' || import_table ||
     '' SET emma_member_status_id = 1'';
 set_errors_for_null_email_sql := ''UPDATE '' || import_table ||
     '' SET emma_member_status_id = 2 WHERE emma_member_email IS NULL'';
 record_null_email_count_sql := ''UPDATE '' || import_history_table ||
     '' SET emma_import_null_email_count = (SELECT COUNT(*) FROM '' ||
     import_table ||
     '' WHERE emma_member_email IS NULL) WHERE emma_import_history_id ='' ||
     import_history_id;
 set_errors_for_invalid_email_sql := ''UPDATE '' || import_table ||
     '' SET emma_member_status_id = 2 WHERE emma_member_email !~* '' ||
     email_regex;
 record_invalid_email_count_sql := ''UPDATE '' || import_history_table ||
     '' SET emma_import_invalid_email_count = ( SELECT COUNT(*) FROM '' ||
     import_table || '' WHERE emma_member_email !~* '' || email_regex ||
     '' ) WHERE emma_import_history_id ='' || import_history_id;
 get_dupes_in_import_sql := ''SELECT emma_member_email,
     emma_member_status_id FROM '' || import_table || '' GROUP BY
     emma_member_email, emma_member_status_id having count(*) > 1'';
 insert_dupes_sql := ''INSERT INTO '' || dupe_table || '' SELECT * FROM '' ||
     import_table || '' WHERE LOWER(emma_member_email) = LOWER('' ||
     member_table || ''.emma_member_email)'';
 record_table_dupe_count_sql := ''UPDATE '' || import_history_table ||
     '' SET emma_import_table_dupe_email_count = (SELECT COUNT(*) FROM '' ||
     import_table || '' WHERE emma_member_email = LOWER('' || member_table ||
     ''.emma_member_email)) WHERE emma_import_history_id ='' ||
     import_history_id;
 remove_dupes_from_import_table_sql := ''DELETE FROM '' || import_table ||
     '' WHERE LOWER(emma_member_email) = LOWER('' || member_table ||
     ''.emma_member_email)'';
 create_clean_import_file_sql := ''COPY '' || import_table || '' TO '' ||
     quote_literal(clean_import_file) || '' WITH NULL AS '' ||
     null_single_quotes;
 create_members_groups_ids_file_sql := ''COPY '' || import_table ||
     '' (emma_member_id) TO '' || quote_literal(members_groups_ids_file) ||
     '' WITH NULL AS '' || null_single_quotes;
 empty_import_table_sql := ''TRUNCATE '' || import_table;
 upload_clean_import_sql := ''COPY '' || member_table || '' FROM '' ||
     quote_literal(clean_import_file) || '' WITH NULL AS '' ||
     null_single_quotes;
 upload_members_groups_ids_sql := ''COPY '' || members_groups_ids_table ||
     '' (emma_member_id) FROM '' || quote_literal(members_groups_ids_file) ||
     '' WITH NULL AS '' || null_single_quotes;
 empty_members_groups_ids_sql := ''TRUNCATE '' || members_groups_ids_table;
 empty_members_dupes_sql := ''TRUNCATE '' || dupe_table;
 vacuum_sql := ''VACUUM '' || member_table || ''; VACUUM '' || import_table ||
     ''; VACUUM '' || container_table || ''; VACUUM '' || members_groups_table ||
     ''; VACUUM '' || members_groups_ids_table || ''; VACUUM '' || dupe_table;

 -- BEGIN ACTIVITY
 -- Create the filtered import file with the
 EXECUTE create_import_file_sql;
 -- Load data from the filtered file to the import table
 EXECUTE 

[PERFORM] Most transactions per second on largest box?

2004-06-03 Thread jason.servetar

This is a very general question, but what is the largest Linux box anyone
has run PostgreSQL on, and what kind of concurrent transactions per second
have you seen?

We have a client who has huge bursts of activity, coinciding with
high-rated TV appearances, meaning hundreds of thousands of users
registering and logging in at the same time.

Currently we are running a dual-CPU Dell blade server on Red Hat Linux
(2.4?) and PostgreSQL 7.3.4 on i686-pc-linux-gnu, compiled by GCC 2.96,
with RAID 5, and we are using sqlrelay for connection pooling.  It works
fine under ordinary load but bogs down too much under these huge loads.

I would love to be able to tell them that a specific box, like a 2 GHz
quad Xeon with 10 GB of RAM on a RAID 10 server, will get them x concurrent
transactions per second with x threads running.  From that I could give
them a good estimate of how many registrations per minute their money can
buy, but right now all I can tell them is that if you spend more money you
will get more tps.

Thanks for any info you may have.


Re: [PERFORM] Tables on multiple disk drives

2004-06-03 Thread Josh Berkus
Konstantin,

   Does PostgreSQL allow tables and indexes of a single database to be
   created on multiple disk drives in order to increase performance, as
   Oracle does? If a symbolic link is the only method, then the next
   question is: how can one determine which file corresponds to which
   table and index?

Howdy!   I bet you're a bit taken aback by the discussion that ensued, and 
even more confused than before.

You are actually asking about two related features:

Tablespaces, which allow designating different directories/volumes for 
specific tables and indexes at creation time; and

Partitioned tables, which allow dividing large tables and/or indexes 
horizontally along pre-defined criteria.

The first, tablespaces, is under development and may make it into 7.5, or 
maybe not, but it will certainly be in the version after that.

The second, partitioned tables, is NOT under development, because this 
feature lacks both a programmer and a clear specification.
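
Until then, a rough approximation of horizontal partitioning can be built by 
hand with table inheritance, which PostgreSQL already has.  This is not the 
feature described above, just a manual workaround; a minimal sketch with 
made-up table names:

  CREATE TABLE measurements (
      id         integer,
      logged_at  timestamp,
      value      numeric
  );

  -- one child table per month; a query against the parent
  -- automatically scans the children as well
  CREATE TABLE measurements_2004_05 () INHERITS (measurements);
  CREATE TABLE measurements_2004_06 () INHERITS (measurements);

  -- the application (or a rule/trigger) has to route inserts
  -- to the appropriate child itself
  INSERT INTO measurements_2004_06 VALUES (1, '2004-06-03', 42);

This buys none of the planner help a real partitioning feature would bring; 
it mainly keeps each chunk of data in its own, smaller table, which could 
then sit on its own disk via symlinks or, eventually, tablespaces.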

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco

