[PERFORM] Improving Query
I have the following query that is a part of a function:

    select order_details.tpv_success_id,
           order_details.tpv_id,
           order_details.ver_code,
           order_details.app_id,
           order_details.acct_id,
           order_details.first_name || ' ' || order_details.last_name as customer_name,
           order_details.order_date as creation_date,
           verification_success.checked_out,
           order_details.csr_name,
           order_details.products,
           order_details.promotion,
           order_details.division_id,
           order_details.has_dish_billing_info,
           (select array_upper(acct_product_data_requirements_details.acct_prod_ids, 1)) as num_prods,
           count(distinct case when provision_issues.account_product_id is not null
                                and provision_issues.resolved_date is null
                               then provision_issues.account_product_id end) as num_open_issues,
           count(case when reconciliations.rec_id is not null
                       and reconciliation_cancels.rec_cancel_id is null
                      then reconciliations.rec_id end) as num_provisioned,
           count(case when reconciliation_cancels.rec_cancel_id is not null
                      then reconciliation_cancels.rec_cancel_id end) as num_canceled
    from frontier.order_details
    inner join frontier.verification_success
            on order_details.tpv_success_id = verification_success.tpv_success_id
    inner join frontier.acct_product_data_requirements_details
           left outer join frontier.provision_issues
                  on provision_issues.account_product_id = any(acct_product_data_requirements_details.acct_prod_ids)
                 and provision_issues.resolved_date is null
           left outer join frontier.reconciliations
                  left outer join frontier.reconciliation_cancels
                         on reconciliations.rec_id = reconciliation_cancels.rec_id
                  on reconciliations.account_product_id = any(acct_product_data_requirements_details.acct_prod_ids)
            on order_details.acct_id = acct_product_data_requirements_details.acct_id
    where verification_success.checked_out is null
    group by order_details.tpv_success_id, order_details.tpv_id, order_details.ver_code,
             order_details.app_id, order_details.acct_id,
             order_details.first_name || ' ' || order_details.last_name,
             order_details.order_date, verification_success.checked_out,
             order_details.csr_name, order_details.products, order_details.promotion,
             num_prods, order_details.division_id, order_details.has_dish_billing_info
    having (count(case when reconciliations.rec_id is not null
                        and reconciliation_cancels.rec_cancel_id is null
                       then reconciliations.rec_id end)
            < (select array_upper(acct_product_data_requirements_details.acct_prod_ids, 1)))
       and (count(distinct case when provision_issues.account_product_id is not null
                                 and provision_issues.resolved_date is null
                                then provision_issues.account_product_id end)
            + count(case when reconciliations.rec_id is not null
                          and reconciliation_cancels.rec_cancel_id is null
                         then reconciliations.rec_id end)
            + count(case when reconciliation_cancels.rec_cancel_id is not null
                         then reconciliation_cancels.rec_cancel_id end))
            < (select array_upper(acct_product_data_requirements_details.acct_prod_ids, 1))
       and order_details.division_id = any('{79,38,70,66,35,40,37,36,67,41,65,39}') -- this array here varies
    order by order_details.order_date

Indexes are present for the different variations. Here is the execution plan:

    Sort (cost=1350779962.18..1350969875.28 rows=75965240 width=290) (actual time=16591.711..16591.723 rows=110 loops=1)
      Sort Key: frontier.order_details.order_date
      ->  GroupAggregate (cost=1235567017.53..1295217399.34 rows=75965240 width=290) (actual time=16583.383..16591.420 rows=110 loops=1)
            Filter: ((count(CASE WHEN ((rec_id IS NOT NULL) AND (rec_cancel_id IS NULL)) THEN rec_id ELSE NULL::integer END) < (subplan)) AND (((count(DISTINCT CASE WHEN ((account_product_id IS NOT NULL) AND (resolved_date IS NULL)) THEN account_product_id ELSE NULL::integer END) + count(CASE WHEN ((rec_id IS NOT NULL) AND (rec_cancel_id IS NULL)) THEN rec_id ELSE NULL::integer END)) + count(CASE WHEN (rec_cancel_id IS NOT NULL) THEN rec_cancel_id ELSE NULL::integer END)) < (subplan)))
            ->  Sort (cost=1235567017.53..1238002161.29 rows=974057502 width=290) (actual time=16576.997..16577.513 rows=3366 loops=1)
                  Sort Key: frontier.order_details.tpv_success_id, frontier.order_details.tpv_id, frontier.order_details.ver_code, frontier.order_details.app_id, frontier.order_details.acct_id, (((frontier.order_details.first_name)::text || ' '::text) || (frontier.order_details.last_name)::text), frontier.order_details.order_date, verification_success.checked_out, frontier.order_details.csr_name, frontier.order_details.products, frontier.order_details.promotion, (subplan), frontier.order_details.division_id, frontier.order_details.has_dish_billing_info
                  ->  Merge Join (cost=11383.41..310142000.26 rows=97405750
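One possible experiment, given that 8.1 cannot match an index to a "col = ANY(array)" qualifier (8.2 handles this better): if the function builds the array itself, interpolating the values as an IN list of literals lets the planner expand them into indexable conditions. A sketch, assuming an index on division_id exists and that the values can be interpolated:

    -- hypothetical rewrite of the varying-array filter; the values are just the example set above
    ... and order_details.division_id in (79, 38, 70, 66, 35, 40, 37, 36, 67, 41, 65, 39)
    order by order_details.order_date;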
[PERFORM] Hardware for PostgreSQL
I am trying to build a very robust DB server that will support 1000+ concurrent users (already have seen a max of 237, no pooling being used). I have read so many articles now that I am just saturated. I have a general idea but would like feedback from others. I understand query tuning and table design play a large role in performance, but taking that factor away and focusing on just hardware, what is the best hardware to get for Pg to work at the highest level (meaning speed at returning results)?

How does Pg utilize multiple processors? The more the better? Are queries spread across multiple processors? Is Pg 64-bit? If so, what processors are recommended? I read this: http://www.postgresql.org/files/documentation/books/aw_pgsql/hw_performance/node12.html

"POSTGRESQL uses a multi-process model, meaning each database connection has its own Unix process. Because of this, all multi-cpu operating systems can spread multiple database connections among the available CPUs. However, if only a single database connection is active, it can only use one CPU. POSTGRESQL does not use multi-threading to allow a single process to use multiple CPUs."

It's pretty old (2003), but is it still accurate? If this statement is accurate, how would it affect connection pooling software like pg_pool?

RAM? The more the merrier, right? I understand that shmmax and the pg config file parameters for shared mem have to be adjusted to use it.

Disks? Standard RAID rules, right? 1 for safety, 5 for the best mix of performance and safety? Any preference of SCSI over SATA? What about using a high-speed (fibre channel) mass storage device?

Who has built the biggest, baddest Pg server out there, and what do you use? Thanks!
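For illustration, the shared-memory adjustment typically happens in two places; the numbers below are only placeholder examples for an 8GB box of this era, not recommendations:

    # /etc/sysctl.conf -- raise the kernel's shared memory ceiling (bytes)
    kernel.shmmax = 1073741824

    # postgresql.conf -- shared_buffers is counted in 8kB pages in this era
    shared_buffers = 65536          # 65536 x 8kB = 512MB
    effective_cache_size = 524288   # ~4GB; a planner hint, allocates nothing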
[PERFORM] Running on an NFS Mounted Directory
I was wondering if there were any performance issues with having a data directory that was an NFS mounted drive? Say like a SAN or NAS device? Has anyone done this before?
Re: [PERFORM] Running on an NFS Mounted Directory
I am looking for the best solution to have a large amount of disk storage attached to my PostgreSQL 8.1 server. I was thinking of having a SAN or NAS device be mounted by the pg server over NFS, hence the question about NFS performance. What other options/protocols are there to get high performance and data integrity while having the benefit of not having the physical storage attached to the db server?

On 4/27/06 12:55 AM, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote:

> On Wed, Apr 26, 2006 at 07:35:42PM -0700, Steve Wampler wrote:
>> On Wed, Apr 26, 2006 at 10:06:58PM -0400, Ketema Harris wrote:
>>> I was wondering if there were any performance issues with having a data
>>> directory that was an nfs mounted drive? Say like a SAN or NAS device?
>>> Has anyone done this before?
>>
>> My understanding is that NFS is pretty poor in performance in general,
>> so I would expect it to be particularly bad for a DB. You might run
>> some (non-DB) performance tests to get a feel for how bad it might be.
>> (Someone once told me that NFS topped out at around 12MB/s, but I don't
>> know if that's really true [they were trying to sell a competitive
>> networked filesystem].)
>>
>> In any event, you're at least limited by ethernet speeds, if not more.
>
> More importantly, the latency involved will kill commit performance. If
> it doesn't then it's likely that fsync isn't being obeyed, which means 0
> data integrity.
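One rough way to quantify the commit-latency concern, assuming the pgbench contrib tool is available, is to compare a single-client run against local disk and against the NFS mount; with fsync honored, the transaction rate is bounded by synchronous write latency:

    $ pgbench -i -s 1 testdb        # initialize a small test database
    $ pgbench -c 1 -t 1000 testdb   # one client, 1000 transactions; tps ~ 1/commit-latency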
Re: [PERFORM] Running on an NFS Mounted Directory
OK. My thought process was that having non-local storage, say a big RAID 5 SAN (I am talking 5TB with expansion capability up to 10), would allow me to have redundancy, expandability, and hopefully still retain decent performance from the db. I also would hopefully then not have to do periodic backups from the db server to some other type of storage. Is this not a good idea? How bad of a performance hit are we talking about? Also, in regards to the commit data integrity: as far as the db is concerned, once the data is sent to the SAN or NAS isn't it "written"? The storage may have that write in cache, but from my reading and understanding of how these various storage devices work, that is how they keep up performance. I would expect my bottleneck, if any, to be the actual Ethernet transfer to the storage, and I am going to try and compensate for that with a full gigabit backbone.

On 4/27/06 8:44 AM, "Michael Stone" <[EMAIL PROTECTED]> wrote:

> On Thu, Apr 27, 2006 at 08:38:55AM -0400, Ketema Harris wrote:
>> I am looking for the best solution to have a large amount of disk storage
>> attached to my PostgreSQL 8.1 server.
>>
>> What other options/protocols are there to get high performance and data
>> integrity while having the benefit of not having the physical storage
>> attached to the db server?
>
> These are two distinct requirements. Are both really requirements or is
> one "nice to have"? The "best" solution for "a large amount of disk
> storage" isn't "not having the physical storage attached to the db
> server". If you use non-local storage it will be slower and more
> expensive, quite likely by a large margin. There may be other advantages
> to doing so, but you haven't mentioned any of those as requirements.
>
> Mike Stone
Re: [PERFORM] Running on an NFS Mounted Directory
Yes, you're right; I meant not having to do the backups from the db server itself. I can do that within the storage device now, by allocating space for it and letting the device copy the data files on some periodic basis.

On 4/27/06 9:05 AM, "Bruno Wolff III" <[EMAIL PROTECTED]> wrote:

> On Thu, Apr 27, 2006 at 08:57:51 -0400,
> Ketema Harris <[EMAIL PROTECTED]> wrote:
>> performance from the db. I also would hopefully then not have to do
>> periodic backups from the db server to some other type of storage. Is this
>> not a good idea? How bad of a performance hit are we talking about? Also,
>
> You always need to do backups if you care about your data. What if someone
> accidentally deletes a lot of data? What if someone blows up your data
> center (or there is a flood)?
Re: [PERFORM] Running on an NFS Mounted Directory
First, I appreciate all of your input.

> No, backups are completely unrelated to your storage type; you need them
> either way.

Please see my other post; I meant the storage device would do the backups.

> redundancy, expandability

What I mean by those vague words is: Redundancy: RAID 5. Expandability: the ability to stick another drive in my array and get more storage without having to turn off the db.

> Do you need the ability to do snapshots?

Yes.

> Do you want to share one big, expensive, reliable unit between
> multiple systems? Will you be doing failover?

Yes, and yes. Really only one other system, a phone system, but it is the crux of my business and will be writing a lot of recorded phone calls. I am working with a storage company now to set up the failover; I want the db and phone systems to never know the storage switched over.

You have given me a lot to think about. The performance concerns me and I will have to find some way to test. Perhaps spending a little less on the storage system and more on the actual servers is the way to go, then utilizing some combination of pg_backup and the archive_command directive with a periodic script. Thank you all; I will keep researching this, and the more input the better.

On 4/27/06 9:24 AM, "Michael Stone" <[EMAIL PROTECTED]> wrote:

> On Thu, Apr 27, 2006 at 08:57:51AM -0400, Ketema Harris wrote:
>> OK. My thought process was that having non local storage as say a big raid
>> 5 san ( I am talking 5 TB with expansion capability up to 10 )
>
> That's two disk trays for a cheap slow array. (Versus a more expensive
> solution with more spindles and better seek performance.)
>
>> would allow me to have redundancy, expandability, and hopefully still
>> retain decent performance from the db. I also would hopefully then not
>> have to do periodic backups from the db server to some other type of
>> storage.
>
> No, backups are completely unrelated to your storage type; you need them
> either way. On a SAN you can use a SAN backup solution to back multiple
> systems up with a single backup unit without involving the host CPUs.
> This is fairly useless if you aren't amortizing the cost over a large
> environment.
>
>> Is this not a good idea?
>
> It really depends on what you're hoping to get. As described, it's not
> clear. (I don't know what you mean by "redundancy, expandability" or
> "decent performance".)
>
>> How bad of a performance hit are we talking about?
>
> Way too many factors for an easy answer. Consider the case of NAS vs
> SCSI direct attach storage. You're probably in that case comparing a
> single 125MB/s (peak) gigabit ethernet channel to (potentially several)
> 320MB/s (peak) SCSI channels. With a high-end NAS you might get 120MB/s
> off that GBE. With a (more realistic) mid-range unit you're more likely
> to get 40-60MB/s. Getting 200MB/s off the SCSI channel isn't a stretch,
> and you can fairly easily stripe across multiple SCSI channels. (You can
> also bond multiple GBEs, but then your cost & complexity start going way
> up, and you're never going to scale as well.) If you have an environment
> where you're doing a lot of sequential scans it isn't even a contest.
> You can also substitute SATA for SCSI, etc.
>
> For a FC SAN the performance numbers are a lot better, but the costs &
> complexity are a lot higher. An iSCSI SAN is somewhere in the middle.
>
>> Also, in regards to the commit data integrity, as far as the db is
>> concerned once the data is sent to the san or nas isn't it "written"?
>> The storage may have that write in cache, but from my reading and
>> understanding of how these various storage devices work that is how
>> they keep up performance.
>
> Depends on the configuration, but yes, most should be able to report
> back a "write" once the data is in a non-volatile cache. You can do the
> same with a direct-attached array and eliminate the latency inherent in
> accessing the remote storage.
>
>> I would expect my bottleneck if any to be the actual Ethernet transfer
>> to the storage, and I am going to try and compensate for that with a
>> full gigabit backbone.
>
> see above.
>
> The advantages of a NAS or SAN are in things you haven't really touched
> on. Is the filesystem going to be accessed by several systems? Do you
> need the ability to do snapshots? (You may be able to do this with
> direct-attach also, but doing it on a remote storage device tends to be
> simpler.) Do you want to share one big, expensive, reliable unit between
> multiple systems? Will you be doing failover? (Note that failover requi
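The archive_command approach mentioned above might look roughly like this on 8.1, where setting the parameter is what enables WAL archiving (the path is illustrative only):

    # postgresql.conf
    archive_command = 'cp %p /mnt/backup/wal/%f'   # copy each completed WAL segment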
Re: [PERFORM] Running on an NFS Mounted Directory
The SAN has the snapshot capability.

On 4/27/06 9:31 AM, "Bruno Wolff III" <[EMAIL PROTECTED]> wrote:

> On Thu, Apr 27, 2006 at 09:06:48 -0400,
> Ketema Harris <[EMAIL PROTECTED]> wrote:
>> Yes, you're right, I meant not have to do the backups from the db server
>> itself. I can do that within the storage device now, by allocating space
>> for it, and letting the device copy the data files on some periodic basis.
>
> Only if the database server isn't running or your SAN provides a way to
> provide a snapshot of the data at a particular instant in time.
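For a SAN snapshot to be restorable, the server also needs to be told a backup is in progress; a minimal sketch of the 8.1 procedure, assuming WAL archiving is already configured:

    select pg_start_backup('san_snapshot');
    -- take the SAN snapshot while the backup mark is active
    select pg_stop_backup();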
[PERFORM] Nested Loops vs. Hash Joins or Merge Joins
Nested loops on (start of plan truncated):

    ->  Index Scan using customer_pkey on customer (cost=0.00..3.06 rows=1 width=75) (actual time=0.011..0.015 rows=1 loops=3)
          Index Cond: ("outer".customer_id = customer.customer_id)
    ->  Seq Scan on product (cost=0.00..1.02 rows=2 width=21) (actual time=0.005..0.013 rows=2 loops=3)
    ->  Seq Scan on marketer_divisions (cost=0.00..1.02 rows=2 width=49) (actual time=0.005..0.013 rows=2 loops=3)
    ->  Seq Scan on cities (cost=0.00..1.02 rows=2 width=4) (actual time=0.005..0.013 rows=2 loops=3)
    Total runtime: 3.288 ms

Nested loops off:

    Hash Join (cost=8.27..11.78 rows=1 width=268) (actual time=1.701..1.765 rows=3 loops=1)
      Hash Cond: ("outer".city_id = "inner".city_id)
      ->  Hash Join (cost=7.24..10.73 rows=1 width=272) (actual time=1.629..1.667 rows=3 loops=1)
            Hash Cond: ("outer".customer_id = "inner".customer_id)
            ->  Seq Scan on customer (cost=0.00..3.32 rows=32 width=75) (actual time=0.006..0.136 rows=33 loops=1)
            ->  Hash (cost=7.24..7.24 rows=1 width=205) (actual time=1.366..1.366 rows=3 loops=1)
                  ->  Hash Join (cost=6.43..7.24 rows=1 width=205) (actual time=1.243..1.333 rows=3 loops=1)
                        Hash Cond: ("outer".division_id = "inner".division_id)
                        ->  Hash Join (cost=5.40..6.20 rows=1 width=164) (actual time=1.184..1.252 rows=3 loops=1)
                              Hash Cond: ("outer".ldc_id = "inner".ldc_id)
                              ->  Merge Join (cost=4.38..5.16 rows=1 width=151) (actual time=1.124..1.169 rows=3 loops=1)
                                    Merge Cond: ("outer".marketer_product_id = "inner".marketer_product_id)
                                    ->  Index Scan using "PK_marketer_product_id" on marketer_product (cost=0.00..3.04 rows=4 width=12) (actual time=0.012..0.019 rows=2 loops=1)
                                    ->  Sort (cost=4.38..4.38 rows=1 width=147) (actual time=1.098..1.109 rows=3 loops=1)
                                          Sort Key: account.marketer_product_id
                                          ->  Hash Join (cost=2.78..4.37 rows=1 width=147) (actual time=1.007..1.064 rows=3 loops=1)
                                                Hash Cond: ("outer".app_id = "inner".app_id)
                                                ->  Hash Join (cost=1.75..3.32 rows=1 width=155) (actual time=0.494..0.875 rows=31 loops=1)
                                                      Hash Cond: (("outer".app_id = "inner".app_id) AND ("outer".acct_id = "inner".acct_id))
                                                      ->  Seq Scan on account (cost=0.00..1.28 rows=28 width=155) (actual time=0.007..0.154 rows=34 loops=1)
                                                      ->  Hash (cost=1.50..1.50 rows=50 width=8) (actual time=0.451..0.451 rows=50 loops=1)
                                                            ->  Seq Scan on application (cost=0.00..1.50 rows=50 width=8) (actual time=0.006..0.223 rows=50 loops=1)
                                                ->  Hash (cost=1.03..1.03 rows=1 width=4) (actual time=0.042..0.042 rows=3 loops=1)
                                                      ->  Seq Scan on reconciliation (cost=0.00..1.03 rows=1 width=4) (actual time=0.007..0.019 rows=3 loops=1)
                                                            Filter: (transferred_date IS NULL)
                              ->  Hash (cost=1.02..1.02 rows=2 width=21) (actual time=0.036..0.036 rows=2 loops=1)
                                    ->  Seq Scan on product (cost=0.00..1.02 rows=2 width=21) (actual time=0.005..0.014 rows=2 loops=1)
                        ->  Hash (cost=1.02..1.02 rows=2 width=49) (actual time=0.036..0.036 rows=2 loops=1)
                              ->  Seq Scan on marketer_divisions (cost=0.00..1.02 rows=2 width=49) (actual time=0.007..0.016 rows=2 loops=1)
      ->  Hash (cost=1.02..1.02 rows=2 width=4) (actual time=0.039..0.039 rows=2 loops=1)
            ->  Seq Scan on cities (cost=0.00..1.02 rows=2 width=4) (actual time=0.009..0.017 rows=2 loops=1)
    Total runtime: 2.084 ms

With nested loops enabled, does it choose to use them because it sees the estimated start-up cost with loops as less? Does it not know that the total query would be faster with the hash joins? This query is in development right now, and as such there are not many rows.
When it goes to production, the reconciliation table will grow by about 50-100 rows per day where transferred_date is NULL (this is the driving criterion behind this query). As the table grows, can I expect Pg to realize that the nested loops will be slower and switch to the hash joins? If not, how would I force it to use the hash joins without just turning off nested loops completely? Is it a good idea to turn off nested loops completely? Statistics collecting and autovacuum are enabled, btw. I have an ERD diagram showing the table structures if anyone is interested in looking at it; just let me know. Thanks, Ketema
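A session-local experiment, rather than turning nested loops off globally, might look like this sketch (keeping statistics fresh so the planner sees production row counts):

    analyze reconciliation;        -- refresh row estimates on the driving table
    set enable_nestloop = off;     -- session-only planner toggle
    -- run explain analyze on the query here and compare plans
    reset enable_nestloop;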
Re: [PERFORM] Improving Query
On Oct 30, 2007, at 9:23 AM, Richard Huxton wrote:

> Ketema wrote:
>> I have the following query that is a part of a function:
> Yikes! Difficult to get a clear view of what this query is doing.

It seems complicated because you only have a small subset of the schema these tables tie into. I'd be happy to share the whole thing if it is needed.

> OK, I'm assuming you're vacuumed and analysed on all these tables...

Yes. Autovacuum is on, and I do a full vacuum every 2 days.

>> My concern is with the sort step that takes 15 seconds by itself:
>> -> Sort (cost=1235567017.53..1238002161.29 rows=974057502 width=290)
>>    (actual time=16576.997..16577.513 rows=3366 loops=1)
> That's taking hardly any time; the startup time is 16576.997 already.
> Of course, the row estimate is *way* out of line.

OK. I misread the plan and took the startup time as the time it took to perform the operation. Thanks for the link to explain analyze.

> If you look here (where the explain is a bit easier to see)
> http://explain-analyze.info/query_plans/1258-query-plan-224
> the two main things to look at seem to be the nested loops near the top
> and, a few lines down, the materialise (cost=326... These two nested
> loops seem to be pushing the row estimates wildly out of reality. They
> also consume much of the time. The immediate thing that leaps out here
> is that you are trying to join an int to an array of ints. Why are you
> using this setup rather than a separate table?

I see what you are talking about. When I initially used this setup, it was because I wanted to avoid a table that had a ton of rows in it that I knew I would have to join to often. So I made a column that holds on average 4 or 5 ints representing "products" on a particular "order". I did not realize that using a function in the join would be worse than simply having a large table.

>> How can I improve this step? Things I have thought about:
>> 1) Creating indexes on the aggregates... Found out this can't be done.
> Nope - not sure what it would mean in any case.

My initial thought was that the counts were causing the slowdown. This is not the issue, though, as you have shown.

>> 2) Create views of the counts and the sub select... is this any faster,
>> as the view is executed at run time anyway?
> Might make the query easier to write, won't make it faster. Not without
> materialised views, which are the fancy name for #3...

>> 3) Create actual tables of the sub select and aggregates... How would
>> this be maintained to ensure it was always accurate?
> Triggers.

Because of the use of this system I may take this route, as I think it will mean fewer changes.

>> 4) Increasing hardware resources. Currently the box is a single
>> processor amd64 with 8GB of RAM. Below are the settings for resource
>> usage:
>> shared_buffers = 65536
>> temp_buffers = 5000
>> max_prepared_transactions = 2000
>> work_mem = 131072
>> maintenance_work_mem = 512000
>> max_stack_depth = 7168
>> max_fsm_pages = 16
>> max_fsm_relations = 4000
> Can't say about these without knowing whether you've got only one
> connection or 100.

These are settings out of postgresql.conf. Currently sysctl.conf is set to kernel.shmmax = 805306368. Connections are at 300 and I usually have about 200 connections open. The only function of this box is Pg, so I do not mind it using every last drop of RAM and resources that it can.

>> 5) Upgrade version of pg... currently running 8.1.4
> Well, every version gets better at planning, so it can't hurt.

At one point I did go to 8.2.3 on a dev box and performance was horrible. I have not had the opportunity to see how to make the postgresql.conf file in 8.2 match the settings in 8.1, as some things have changed.

> --
> Richard Huxton
> Archonet Ltd
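The separate-table alternative Richard alludes to might look like the sketch below (table and column names are assumptions, not the real schema); the array membership test then becomes an ordinary equijoin the planner can estimate and index:

    -- hypothetical junction table replacing the acct_prod_ids int[] column
    create table frontier.acct_products (
        acct_id      integer not null,
        acct_prod_id integer not null,
        primary key (acct_id, acct_prod_id)
    );

    -- the join replaces "provision_issues.account_product_id = any(...acct_prod_ids)"
    select od.acct_id, count(distinct pi.account_product_id) as num_open_issues
    from frontier.order_details od
    join frontier.acct_products ap on ap.acct_id = od.acct_id
    left join frontier.provision_issues pi
           on pi.account_product_id = ap.acct_prod_id
          and pi.resolved_date is null
    group by od.acct_id;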
Re: [PERFORM] Improving Query
On Oct 30, 2007, at 9:31 AM, Michael Glaesemann wrote:

> On Oct 30, 2007, at 7:18 , Ketema wrote:
>> here is the execution plan:
> I've put this online here:
> http://explain-analyze.info/query_plans/1259-ketema-2007-10-30

I have attached an ERD of the tables used in this query. If it is stripped out, it can be viewed here: http://www.ketema.net/provision_list_tables_erd.jpg

>> My concern is with the sort step that takes 15 seconds by itself:
>> -> Sort (cost=1235567017.53..1238002161.29 rows=974057502 width=290)
>>    (actual time=16576.997..16577.513 rows=3366 loops=1)
> What jumps out at me is the huge difference in estimated and returned
> rows, and the huge cost estimates. Have you analyzed recently?

Yes. I run VACUUM FULL ANALYZE VERBOSE every two days with a cron job. I am running it again now anyway.

> Do you have enable_seqscan disabled? It appears so, due to the high cost here:
> -> Seq Scan on order_details (cost=1.0..10012.45 rows=35 width=199) (actual time=0.001..0.001 rows=0 loops=1)
> http://explain-analyze.info/query_plans/1259-ketema-2007-10-30#node-3594
> What does it look like with seqscan enabled?

It was disabled. The new plan is posted here: http://explain-analyze.info/query_plans/1261-provision-list-seq-scan-enabled

>> 2) Create views of the counts and the sub select... is this any faster,
>> as the view is executed at run time anyway?
> Views aren't materialized: it's like inlining the definition of the
> view itself in the query.

>> 3) Create actual tables of the sub select and aggregates... How would
>> this be maintained to ensure it was always accurate?
> One way would be to update the summaries using triggers. Hopefully you
> won't need to do this after analyzing and perhaps tweaking your server
> configuration.
>
> Unfortunately I don't have the time to look at the query plan in more
> detail, but I suspect there's a better way to get the results you're
> looking for.
>
> Michael Glaesemann
> grzm seespotcode net
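A trigger-maintained summary along the lines Michael suggests could look like this sketch (the summary table and the acct_id column on the row are hypothetical; a real version must also handle UPDATE and DELETE):

    create table provision_counts (
        acct_id         integer primary key,
        num_provisioned integer not null default 0
    );

    create function bump_provision_count() returns trigger as $$
    begin
        update provision_counts
           set num_provisioned = num_provisioned + 1
         where acct_id = new.acct_id;   -- assumes the inserted row carries acct_id
        if not found then
            insert into provision_counts (acct_id, num_provisioned)
            values (new.acct_id, 1);
        end if;
        return new;
    end;
    $$ language plpgsql;

    create trigger reconciliations_summary
        after insert on frontier.reconciliations
        for each row execute procedure bump_provision_count();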
Re: [PERFORM] Improving Query
I am definitely interested in upgrading. Is there a guide out there that explains the changes in the config files from 8.1 to 8.2? A migration guide, I guess?

On Oct 30, 2007, at 11:39 AM, Tom Lane wrote:

> Richard Huxton <[EMAIL PROTECTED]> writes:
>> Ketema wrote:
>>> 5) Upgrade version of pg... currently running 8.1.4
>> Well, every version gets better at planning, so it can't hurt.
>
> +1 ... there are at least two things about this query that 8.2 could be
> expected to be a great deal smarter about:
> * mixed outer and inner joins
> * something = ANY(array)
>
> regards, tom lane
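For reference, a major-version move in this era is a dump and reload; a minimal sketch, assuming the new 8.2 server is up on port 5433 during the migration:

    $ pg_dumpall -p 5432 > cluster.sql        # dump everything from the 8.1 cluster
    $ psql -p 5433 -f cluster.sql postgres    # restore into the 8.2 cluster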