[PERFORM] Improving Query
I have the following query that is a part of a function:

    select order_details.tpv_success_id,
           order_details.tpv_id,
           order_details.ver_code,
           order_details.app_id,
           order_details.acct_id,
           order_details.first_name || ' ' || order_details.last_name as customer_name,
           order_details.order_date as creation_date,
           verification_success.checked_out,
           order_details.csr_name,
           order_details.products,
           order_details.promotion,
           order_details.division_id,
           order_details.has_dish_billing_info,
           (select array_upper(acct_product_data_requirements_details.acct_prod_ids, 1)) as num_prods,
           count(distinct case when provision_issues.account_product_id is not null
                                and provision_issues.resolved_date is null
                               then provision_issues.account_product_id end) as num_open_issues,
           count(case when reconciliations.rec_id is not null
                       and reconciliation_cancels.rec_cancel_id is null
                      then reconciliations.rec_id end) as num_provisioned,
           count(case when reconciliation_cancels.rec_cancel_id is not null
                      then reconciliation_cancels.rec_cancel_id end) as num_canceled
    from frontier.order_details
    inner join frontier.verification_success
            on order_details.tpv_success_id = verification_success.tpv_success_id
    inner join frontier.acct_product_data_requirements_details
           left outer join frontier.provision_issues
                  on provision_issues.account_product_id = any(acct_product_data_requirements_details.acct_prod_ids)
                 and provision_issues.resolved_date is null
           left outer join frontier.reconciliations
                  left outer join frontier.reconciliation_cancels
                         on reconciliations.rec_id = reconciliation_cancels.rec_id
                  on reconciliations.account_product_id = any(acct_product_data_requirements_details.acct_prod_ids)
            on order_details.acct_id = acct_product_data_requirements_details.acct_id
    where verification_success.checked_out is null
    group by order_details.tpv_success_id, order_details.tpv_id, order_details.ver_code,
             order_details.app_id, order_details.acct_id,
             order_details.first_name || ' ' || order_details.last_name,
             order_details.order_date, verification_success.checked_out,
             order_details.csr_name, order_details.products, order_details.promotion,
             num_prods, order_details.division_id, order_details.has_dish_billing_info
    having (count(case when reconciliations.rec_id is not null
                        and reconciliation_cancels.rec_cancel_id is null
                       then reconciliations.rec_id end)
            < (select array_upper(acct_product_data_requirements_details.acct_prod_ids, 1)))
       and (count(distinct case when provision_issues.account_product_id is not null
                                 and provision_issues.resolved_date is null
                                then provision_issues.account_product_id end)
            + count(case when reconciliations.rec_id is not null
                          and reconciliation_cancels.rec_cancel_id is null
                         then reconciliations.rec_id end)
            + count(case when reconciliation_cancels.rec_cancel_id is not null
                         then reconciliation_cancels.rec_cancel_id end))
            < (select array_upper(acct_product_data_requirements_details.acct_prod_ids, 1))
       and order_details.division_id = any('{79,38,70,66,35,40,37,36,67,41,65,39}') -- this array here varies
    order by order_details.order_date

Indexes are present for the different variations. Here is the execution plan:

    Sort (cost=1350779962.18..1350969875.28 rows=75965240 width=290) (actual time=16591.711..16591.723 rows=110 loops=1)
      Sort Key: frontier.order_details.order_date
      ->  GroupAggregate (cost=1235567017.53..1295217399.34 rows=75965240 width=290) (actual time=16583.383..16591.420 rows=110 loops=1)
            Filter: ((count(CASE WHEN ((rec_id IS NOT NULL) AND (rec_cancel_id IS NULL)) THEN rec_id ELSE NULL::integer END) < (subplan)) AND (((count(DISTINCT CASE WHEN ((account_product_id IS NOT NULL) AND (resolved_date IS NULL)) THEN account_product_id ELSE NULL::integer END) + count(CASE WHEN ((rec_id IS NOT NULL) AND (rec_cancel_id IS NULL)) THEN rec_id ELSE NULL::integer END)) + count(CASE WHEN (rec_cancel_id IS NOT NULL) THEN rec_cancel_id ELSE NULL::integer END)) < (subplan)))
            ->  Sort (cost=1235567017.53..1238002161.29 rows=974057502 width=290) (actual time=16576.997..16577.513 rows=3366 loops=1)
                  Sort Key: frontier.order_details.tpv_success_id, frontier.order_details.tpv_id, frontier.order_details.ver_code, frontier.order_details.app_id, frontier.order_details.acct_id, (((frontier.order_details.first_name)::text || ' '::text) || (frontier.order_details.last_name)::text), frontier.order_details.order_date, verification_success.checked_out, frontier.order_details.csr_name, frontier.order_details.products, frontier.order_details.promotion, (subplan), frontier.order_details.division_id, frontier.order_details.has_dish_billing_info
                  ->  Merge Join (cost=11383.41..310142000.26 rows=97405750
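One possible experiment, given that 8.1 cannot match an index to a "col = ANY(array)" qualifier (8.2 handles this better): if the function builds the array itself, interpolating the values as an IN list of literals lets the planner expand them into indexable conditions. A sketch, assuming an index on division_id exists and that the values can be interpolated:

    -- hypothetical rewrite of the varying-array filter; the values are just the example set above
    ... and order_details.division_id in (79, 38, 70, 66, 35, 40, 37, 36, 67, 41, 65, 39)
    order by order_details.order_date;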
[PERFORM] Hardware for PostgreSQL
I am trying to build a very robust DB server that will support 1000+ concurrent users (already have seen a max of 237, no pooling being used). I have read so many articles now that I am just saturated. I have a general idea but would like feedback from others. I understand query tuning and table design play a large role in performance, but taking that factor away and focusing on just hardware, what is the best hardware to get for Pg to work at the highest level (meaning speed at returning results)?

How does Pg utilize multiple processors? The more the better? Are queries spread across multiple processors? Is Pg 64-bit? If so, what processors are recommended? I read this: http://www.postgresql.org/files/documentation/books/aw_pgsql/hw_performance/node12.html

"POSTGRESQL uses a multi-process model, meaning each database connection has its own Unix process. Because of this, all multi-cpu operating systems can spread multiple database connections among the available CPUs. However, if only a single database connection is active, it can only use one CPU. POSTGRESQL does not use multi-threading to allow a single process to use multiple CPUs."

It's pretty old (2003), but is it still accurate? If this statement is accurate, how would it affect connection pooling software like pg_pool?

RAM? The more the merrier, right? I understand that shmmax and the pg config file parameters for shared mem have to be adjusted to use it.

Disks? Standard RAID rules, right? 1 for safety, 5 for the best mix of performance and safety? Any preference of SCSI over SATA? What about using a high-speed (fibre channel) mass storage device?

Who has built the biggest, baddest Pg server out there, and what do you use? Thanks!
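For illustration, the shared-memory adjustment typically happens in two places; the numbers below are only placeholder examples for an 8GB box of this era, not recommendations:

    # /etc/sysctl.conf -- raise the kernel's shared memory ceiling (bytes)
    kernel.shmmax = 1073741824

    # postgresql.conf -- shared_buffers is counted in 8kB pages in this era
    shared_buffers = 65536          # 65536 x 8kB = 512MB
    effective_cache_size = 524288   # ~4GB; a planner hint, allocates nothing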
[PERFORM] Running on an NFS Mounted Directory
I was wondering if there were any performance issues with having a data directory that was an NFS mounted drive? Say like a SAN or NAS device? Has anyone done this before?
Re: [PERFORM] Running on an NFS Mounted Directory
I am looking for the best solution to have a large amount of disk storage attached to my PostgreSQL 8.1 server. I was thinking of having a SAN or NAS device be mounted by the pg server over NFS, hence the question about NFS performance. What other options/protocols are there to get high performance and data integrity while having the benefit of not having the physical storage attached to the db server?

On 4/27/06 12:55 AM, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote:

> On Wed, Apr 26, 2006 at 07:35:42PM -0700, Steve Wampler wrote:
>> On Wed, Apr 26, 2006 at 10:06:58PM -0400, Ketema Harris wrote:
>>> I was wondering if there were any performance issues with having a data
>>> directory that was an nfs mounted drive? Say like a SAN or NAS device?
>>> Has anyone done this before?
>>
>> My understanding is that NFS is pretty poor in performance in general,
>> so I would expect it to be particularly bad for a DB. You might run
>> some (non-DB) performance tests to get a feel for how bad it might be.
>> (Someone once told me that NFS topped out at around 12MB/s, but I don't
>> know if that's really true [they were trying to sell a competitive
>> networked filesystem].)
>>
>> In any event, you're at least limited by ethernet speeds, if not more.
>
> More importantly, the latency involved will kill commit performance. If
> it doesn't then it's likely that fsync isn't being obeyed, which means 0
> data integrity.
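One rough way to quantify the commit-latency concern, assuming the pgbench contrib tool is available, is to compare a single-client run against local disk and against the NFS mount; with fsync honored, the transaction rate is bounded by synchronous write latency:

    $ pgbench -i -s 1 testdb        # initialize a small test database
    $ pgbench -c 1 -t 1000 testdb   # one client, 1000 transactions; tps ~ 1/commit-latency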
Re: [PERFORM] Running on an NFS Mounted Directory
OK. My thought process was that having non-local storage, say a big RAID 5 SAN (I am talking 5TB with expansion capability up to 10), would allow me to have redundancy, expandability, and hopefully still retain decent performance from the db. I also would hopefully then not have to do periodic backups from the db server to some other type of storage. Is this not a good idea? How bad of a performance hit are we talking about? Also, in regards to the commit data integrity: as far as the db is concerned, once the data is sent to the SAN or NAS isn't it "written"? The storage may have that write in cache, but from my reading and understanding of how these various storage devices work, that is how they keep up performance. I would expect my bottleneck, if any, to be the actual Ethernet transfer to the storage, and I am going to try and compensate for that with a full gigabit backbone.

On 4/27/06 8:44 AM, "Michael Stone" <[EMAIL PROTECTED]> wrote:

> On Thu, Apr 27, 2006 at 08:38:55AM -0400, Ketema Harris wrote:
>> I am looking for the best solution to have a large amount of disk storage
>> attached to my PostgreSQL 8.1 server.
>>
>> What other options/protocols are there to get high performance and data
>> integrity while having the benefit of not having the physical storage
>> attached to the db server?
>
> These are two distinct requirements. Are both really requirements or is
> one "nice to have"? The "best" solution for "a large amount of disk
> storage" isn't "not having the physical storage attached to the db
> server". If you use non-local storage it will be slower and more
> expensive, quite likely by a large margin. There may be other advantages
> to doing so, but you haven't mentioned any of those as requirements.
>
> Mike Stone
Re: [PERFORM] Running on an NFS Mounted Directory
Yes, you're right; I meant not having to do the backups from the db server itself. I can do that within the storage device now, by allocating space for it and letting the device copy the data files on some periodic basis.

On 4/27/06 9:05 AM, "Bruno Wolff III" <[EMAIL PROTECTED]> wrote:

> On Thu, Apr 27, 2006 at 08:57:51 -0400,
> Ketema Harris <[EMAIL PROTECTED]> wrote:
>> performance from the db. I also would hopefully then not have to do
>> periodic backups from the db server to some other type of storage. Is this
>> not a good idea? How bad of a performance hit are we talking about? Also,
>
> You always need to do backups if you care about your data. What if someone
> accidentally deletes a lot of data? What if someone blows up your data
> center (or there is a flood)?
Re: [PERFORM] Running on an NFS Mounted Directory
First, I appreciate all of your input.

> No, backups are completely unrelated to your storage type; you need them
> either way.

Please see my other post; I meant the storage device would do the backups.

> redundancy, expandability

What I mean by those vague words is: Redundancy: RAID 5. Expandability: the ability to stick another drive in my array and get more storage without having to turn off the db.

> Do you need the ability to do snapshots?

Yes.

> Do you want to share one big, expensive, reliable unit between
> multiple systems? Will you be doing failover?

Yes, and yes. Really only one other system, a phone system, but it is the crux of my business and will be writing a lot of recorded phone calls. I am working with a storage company now to set up the failover; I want the db and phone systems to never know the storage switched over.

You have given me a lot to think about. The performance concerns me and I will have to find some way to test. Perhaps spending a little less on the storage system and more on the actual servers is the way to go, then utilizing some combination of pg_backup and the archive_command directive with a periodic script. Thank you all; I will keep researching this, and the more input the better.

On 4/27/06 9:24 AM, "Michael Stone" <[EMAIL PROTECTED]> wrote:

> On Thu, Apr 27, 2006 at 08:57:51AM -0400, Ketema Harris wrote:
>> OK. My thought process was that having non local storage as say a big raid
>> 5 san ( I am talking 5 TB with expansion capability up to 10 )
>
> That's two disk trays for a cheap slow array. (Versus a more expensive
> solution with more spindles and better seek performance.)
>
>> would allow me to have redundancy, expandability, and hopefully still
>> retain decent performance from the db. I also would hopefully then not
>> have to do periodic backups from the db server to some other type of
>> storage.
>
> No, backups are completely unrelated to your storage type; you need them
> either way. On a SAN you can use a SAN backup solution to back multiple
> systems up with a single backup unit without involving the host CPUs.
> This is fairly useless if you aren't amortizing the cost over a large
> environment.
>
>> Is this not a good idea?
>
> It really depends on what you're hoping to get. As described, it's not
> clear. (I don't know what you mean by "redundancy, expandability" or
> "decent performance".)
>
>> How bad of a performance hit are we talking about?
>
> Way too many factors for an easy answer. Consider the case of NAS vs
> SCSI direct attach storage. You're probably in that case comparing a
> single 125MB/s (peak) gigabit ethernet channel to (potentially several)
> 320MB/s (peak) SCSI channels. With a high-end NAS you might get 120MB/s
> off that GBE. With a (more realistic) mid-range unit you're more likely
> to get 40-60MB/s. Getting 200MB/s off the SCSI channel isn't a stretch,
> and you can fairly easily stripe across multiple SCSI channels. (You can
> also bond multiple GBEs, but then your cost & complexity start going way
> up, and you're never going to scale as well.) If you have an environment
> where you're doing a lot of sequential scans it isn't even a contest.
> You can also substitute SATA for SCSI, etc.
>
> For a FC SAN the performance numbers are a lot better, but the costs &
> complexity are a lot higher. An iSCSI SAN is somewhere in the middle.
>
>> Also, in regards to the commit data integrity, as far as the db is
>> concerned once the data is sent to the san or nas isn't it "written"?
>> The storage may have that write in cache, but from my reading and
>> understanding of how these various storage devices work that is how
>> they keep up performance.
>
> Depends on the configuration, but yes, most should be able to report
> back a "write" once the data is in a non-volatile cache. You can do the
> same with a direct-attached array and eliminate the latency inherent in
> accessing the remote storage.
>
>> I would expect my bottleneck if any to be the actual Ethernet transfer
>> to the storage, and I am going to try and compensate for that with a
>> full gigabit backbone.
>
> see above.
>
> The advantages of a NAS or SAN are in things you haven't really touched
> on. Is the filesystem going to be accessed by several systems? Do you
> need the ability to do snapshots? (You may be able to do this with
> direct-attach also, but doing it on a remote storage device tends to be
> simpler.) Do you want to share one big, expensive, reliable unit between
> multiple systems? Will you be doing failover? (Note that failover requi
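The archive_command approach mentioned above might look roughly like this on 8.1, where setting the parameter is what enables WAL archiving (the path is illustrative only):

    # postgresql.conf
    archive_command = 'cp %p /mnt/backup/wal/%f'   # copy each completed WAL segment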
Re: [PERFORM] Running on an NFS Mounted Directory
The SAN has the snapshot capability.

On 4/27/06 9:31 AM, "Bruno Wolff III" <[EMAIL PROTECTED]> wrote:

> On Thu, Apr 27, 2006 at 09:06:48 -0400,
> Ketema Harris <[EMAIL PROTECTED]> wrote:
>> Yes, you're right, I meant not have to do the backups from the db server
>> itself. I can do that within the storage device now, by allocating space
>> for it, and letting the device copy the data files on some periodic basis.
>
> Only if the database server isn't running or your SAN provides a way to
> provide a snapshot of the data at a particular instant in time.
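For a SAN snapshot to be restorable, the server also needs to be told a backup is in progress; a minimal sketch of the 8.1 procedure, assuming WAL archiving is already configured:

    select pg_start_backup('san_snapshot');
    -- take the SAN snapshot while the backup mark is active
    select pg_stop_backup();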
[PERFORM] Nested Loops vs. Hash Joins or Merge Joins
Nested loops on (start of plan truncated):

    ->  Index Scan using customer_pkey on customer (cost=0.00..3.06 rows=1 width=75) (actual time=0.011..0.015 rows=1 loops=3)
          Index Cond: ("outer".customer_id = customer.customer_id)
    ->  Seq Scan on product (cost=0.00..1.02 rows=2 width=21) (actual time=0.005..0.013 rows=2 loops=3)
    ->  Seq Scan on marketer_divisions (cost=0.00..1.02 rows=2 width=49) (actual time=0.005..0.013 rows=2 loops=3)
    ->  Seq Scan on cities (cost=0.00..1.02 rows=2 width=4) (actual time=0.005..0.013 rows=2 loops=3)
    Total runtime: 3.288 ms

Nested loops off:

    Hash Join (cost=8.27..11.78 rows=1 width=268) (actual time=1.701..1.765 rows=3 loops=1)
      Hash Cond: ("outer".city_id = "inner".city_id)
      ->  Hash Join (cost=7.24..10.73 rows=1 width=272) (actual time=1.629..1.667 rows=3 loops=1)
            Hash Cond: ("outer".customer_id = "inner".customer_id)
            ->  Seq Scan on customer (cost=0.00..3.32 rows=32 width=75) (actual time=0.006..0.136 rows=33 loops=1)
            ->  Hash (cost=7.24..7.24 rows=1 width=205) (actual time=1.366..1.366 rows=3 loops=1)
                  ->  Hash Join (cost=6.43..7.24 rows=1 width=205) (actual time=1.243..1.333 rows=3 loops=1)
                        Hash Cond: ("outer".division_id = "inner".division_id)
                        ->  Hash Join (cost=5.40..6.20 rows=1 width=164) (actual time=1.184..1.252 rows=3 loops=1)
                              Hash Cond: ("outer".ldc_id = "inner".ldc_id)
                              ->  Merge Join (cost=4.38..5.16 rows=1 width=151) (actual time=1.124..1.169 rows=3 loops=1)
                                    Merge Cond: ("outer".marketer_product_id = "inner".marketer_product_id)
                                    ->  Index Scan using "PK_marketer_product_id" on marketer_product (cost=0.00..3.04 rows=4 width=12) (actual time=0.012..0.019 rows=2 loops=1)
                                    ->  Sort (cost=4.38..4.38 rows=1 width=147) (actual time=1.098..1.109 rows=3 loops=1)
                                          Sort Key: account.marketer_product_id
                                          ->  Hash Join (cost=2.78..4.37 rows=1 width=147) (actual time=1.007..1.064 rows=3 loops=1)
                                                Hash Cond: ("outer".app_id = "inner".app_id)
                                                ->  Hash Join (cost=1.75..3.32 rows=1 width=155) (actual time=0.494..0.875 rows=31 loops=1)
                                                      Hash Cond: (("outer".app_id = "inner".app_id) AND ("outer".acct_id = "inner".acct_id))
                                                      ->  Seq Scan on account (cost=0.00..1.28 rows=28 width=155) (actual time=0.007..0.154 rows=34 loops=1)
                                                      ->  Hash (cost=1.50..1.50 rows=50 width=8) (actual time=0.451..0.451 rows=50 loops=1)
                                                            ->  Seq Scan on application (cost=0.00..1.50 rows=50 width=8) (actual time=0.006..0.223 rows=50 loops=1)
                                                ->  Hash (cost=1.03..1.03 rows=1 width=4) (actual time=0.042..0.042 rows=3 loops=1)
                                                      ->  Seq Scan on reconciliation (cost=0.00..1.03 rows=1 width=4) (actual time=0.007..0.019 rows=3 loops=1)
                                                            Filter: (transferred_date IS NULL)
                              ->  Hash (cost=1.02..1.02 rows=2 width=21) (actual time=0.036..0.036 rows=2 loops=1)
                                    ->  Seq Scan on product (cost=0.00..1.02 rows=2 width=21) (actual time=0.005..0.014 rows=2 loops=1)
                        ->  Hash (cost=1.02..1.02 rows=2 width=49) (actual time=0.036..0.036 rows=2 loops=1)
                              ->  Seq Scan on marketer_divisions (cost=0.00..1.02 rows=2 width=49) (actual time=0.007..0.016 rows=2 loops=1)
      ->  Hash (cost=1.02..1.02 rows=2 width=4) (actual time=0.039..0.039 rows=2 loops=1)
            ->  Seq Scan on cities (cost=0.00..1.02 rows=2 width=4) (actual time=0.009..0.017 rows=2 loops=1)
    Total runtime: 2.084 ms

With nested loops enabled, does it choose to use them because it sees the estimated start-up cost with loops as less? Does it not know that the total query would be faster with the hash joins? This query is in development right now, and as such there are not many rows.
When it goes to production, the reconciliation table will grow by about 50-100 rows per day where transferred_date is NULL (this is the driving criterion behind this query). As the table grows, can I expect Pg to realize that the nested loops will be slower and switch to the hash joins? If not, how would I force it to use the hash joins without just turning off nested loops completely? Is it a good idea to turn off nested loops completely? Statistics collecting and autovacuum are enabled, btw. I have an ERD diagram showing the table structures if anyone is interested in looking at it; just let me know. Thanks, Ketema
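A session-local experiment, rather than turning nested loops off globally, might look like this sketch (keeping statistics fresh so the planner sees production row counts):

    analyze reconciliation;        -- refresh row estimates on the driving table
    set enable_nestloop = off;     -- session-only planner toggle
    -- run explain analyze on the query here and compare plans
    reset enable_nestloop;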
Re: [PERFORM] Improving Query
On Oct 30, 2007, at 9:23 AM, Richard Huxton wrote:

> Ketema wrote:
>> I have the following query that is a part of a function:
> Yikes! Difficult to get a clear view of what this query is doing.

It seems complicated because you only have a small subset of the schema these tables tie into. I'd be happy to share the whole thing if it is needed.

> OK, I'm assuming you're vacuumed and analysed on all these tables...

Yes. Autovacuum is on, and I do a full vacuum every 2 days.

>> My concern is with the sort step that takes 15 seconds by itself:
>> -> Sort (cost=1235567017.53..1238002161.29 rows=974057502 width=290)
>>    (actual time=16576.997..16577.513 rows=3366 loops=1)
> That's taking hardly any time; the startup time is 16576.997 already.
> Of course, the row estimate is *way* out of line.

OK. I misread the plan and took the startup time as the time it took to perform the operation. Thanks for the link to explain analyze.

> If you look here (where the explain is a bit easier to see)
> http://explain-analyze.info/query_plans/1258-query-plan-224
> the two main things to look at seem to be the nested loops near the top
> and, a few lines down, the materialise (cost=326... These two nested
> loops seem to be pushing the row estimates wildly out of reality. They
> also consume much of the time. The immediate thing that leaps out here
> is that you are trying to join an int to an array of ints. Why are you
> using this setup rather than a separate table?

I see what you are talking about. When I initially used this setup, it was because I wanted to avoid a table that had a ton of rows in it that I knew I would have to join to often. So I made a column that holds on average 4 or 5 ints representing "products" on a particular "order". I did not realize that using a function in the join would be worse than simply having a large table.

>> How can I improve this step? Things I have thought about:
>> 1) Creating indexes on the aggregates... Found out this can't be done.
> Nope - not sure what it would mean in any case.

My initial thought was that the counts were causing the slowdown. This is not the issue, though, as you have shown.

>> 2) Create views of the counts and the sub select... is this any faster,
>> as the view is executed at run time anyway?
> Might make the query easier to write, won't make it faster. Not without
> materialised views, which are the fancy name for #3...

>> 3) Create actual tables of the sub select and aggregates... How would
>> this be maintained to ensure it was always accurate?
> Triggers.

Because of the use of this system I may take this route, as I think it will mean fewer changes.

>> 4) Increasing hardware resources. Currently the box is a single
>> processor amd64 with 8GB of RAM. Below are the settings for resource
>> usage:
>> shared_buffers = 65536
>> temp_buffers = 5000
>> max_prepared_transactions = 2000
>> work_mem = 131072
>> maintenance_work_mem = 512000
>> max_stack_depth = 7168
>> max_fsm_pages = 16
>> max_fsm_relations = 4000
> Can't say about these without knowing whether you've got only one
> connection or 100.

These are settings out of postgresql.conf. Currently sysctl.conf is set to kernel.shmmax = 805306368. Connections are at 300 and I usually have about 200 connections open. The only function of this box is Pg, so I do not mind it using every last drop of RAM and resources that it can.

>> 5) Upgrade version of pg... currently running 8.1.4
> Well, every version gets better at planning, so it can't hurt.

At one point I did go to 8.2.3 on a dev box and performance was horrible. I have not had the opportunity to see how to make the postgresql.conf file in 8.2 match the settings in 8.1, as some things have changed.

> --
> Richard Huxton
> Archonet Ltd
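The separate-table alternative Richard alludes to might look like the sketch below (table and column names are assumptions, not the real schema); the array membership test then becomes an ordinary equijoin the planner can estimate and index:

    -- hypothetical junction table replacing the acct_prod_ids int[] column
    create table frontier.acct_products (
        acct_id      integer not null,
        acct_prod_id integer not null,
        primary key (acct_id, acct_prod_id)
    );

    -- the join replaces "provision_issues.account_product_id = any(...acct_prod_ids)"
    select od.acct_id, count(distinct pi.account_product_id) as num_open_issues
    from frontier.order_details od
    join frontier.acct_products ap on ap.acct_id = od.acct_id
    left join frontier.provision_issues pi
           on pi.account_product_id = ap.acct_prod_id
          and pi.resolved_date is null
    group by od.acct_id;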
Re: [PERFORM] Improving Query
On Oct 30, 2007, at 9:31 AM, Michael Glaesemann wrote:

> On Oct 30, 2007, at 7:18 , Ketema wrote:
>> here is the execution plan:
> I've put this online here:
> http://explain-analyze.info/query_plans/1259-ketema-2007-10-30

I have attached an ERD of the tables used in this query. If it is stripped out, it can be viewed here: http://www.ketema.net/provision_list_tables_erd.jpg

>> My concern is with the sort step that takes 15 seconds by itself:
>> -> Sort (cost=1235567017.53..1238002161.29 rows=974057502 width=290)
>>    (actual time=16576.997..16577.513 rows=3366 loops=1)
> What jumps out at me is the huge difference in estimated and returned
> rows, and the huge cost estimates. Have you analyzed recently?

Yes. I run VACUUM FULL ANALYZE VERBOSE every two days with a cron job. I am running it again now anyway.

> Do you have enable_seqscan disabled? It appears so, due to the high cost here:
> -> Seq Scan on order_details (cost=1.0..10012.45 rows=35 width=199) (actual time=0.001..0.001 rows=0 loops=1)
> http://explain-analyze.info/query_plans/1259-ketema-2007-10-30#node-3594
> What does it look like with seqscan enabled?

It was disabled. The new plan is posted here: http://explain-analyze.info/query_plans/1261-provision-list-seq-scan-enabled

>> 2) Create views of the counts and the sub select... is this any faster,
>> as the view is executed at run time anyway?
> Views aren't materialized: it's like inlining the definition of the
> view itself in the query.

>> 3) Create actual tables of the sub select and aggregates... How would
>> this be maintained to ensure it was always accurate?
> One way would be to update the summaries using triggers. Hopefully you
> won't need to do this after analyzing and perhaps tweaking your server
> configuration.
>
> Unfortunately I don't have the time to look at the query plan in more
> detail, but I suspect there's a better way to get the results you're
> looking for.
>
> Michael Glaesemann
> grzm seespotcode net
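A trigger-maintained summary along the lines Michael suggests could look like this sketch (the summary table and the acct_id column on the row are hypothetical; a real version must also handle UPDATE and DELETE):

    create table provision_counts (
        acct_id         integer primary key,
        num_provisioned integer not null default 0
    );

    create function bump_provision_count() returns trigger as $$
    begin
        update provision_counts
           set num_provisioned = num_provisioned + 1
         where acct_id = new.acct_id;   -- assumes the inserted row carries acct_id
        if not found then
            insert into provision_counts (acct_id, num_provisioned)
            values (new.acct_id, 1);
        end if;
        return new;
    end;
    $$ language plpgsql;

    create trigger reconciliations_summary
        after insert on frontier.reconciliations
        for each row execute procedure bump_provision_count();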
Re: [PERFORM] Improving Query
I am definitely interested in upgrading. Is there a guide out there that explains the changes in the config files from 8.1 to 8.2? A migration guide, I guess?

On Oct 30, 2007, at 11:39 AM, Tom Lane wrote:

> Richard Huxton <[EMAIL PROTECTED]> writes:
>> Ketema wrote:
>>> 5) Upgrade version of pg... currently running 8.1.4
>> Well, every version gets better at planning, so it can't hurt.
>
> +1 ... there are at least two things about this query that 8.2 could be
> expected to be a great deal smarter about:
> * mixed outer and inner joins
> * something = ANY(array)
>
> regards, tom lane
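For reference, a major-version move in this era is a dump and reload; a minimal sketch, assuming the new 8.2 server is up on port 5433 during the migration:

    $ pg_dumpall -p 5432 > cluster.sql        # dump everything from the 8.1 cluster
    $ psql -p 5433 -f cluster.sql postgres    # restore into the 8.2 cluster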