Re: [PERFORM] Hardware for PostgreSQL

2007-11-01 Thread Joe Uhl
Magnus Hagander wrote:
> Ron St-Pierre wrote:
>   
>> Joe Uhl wrote:
>> 
>>> I realize there are people who discourage looking at Dell, but i've been
>>> very happy with a larger ball of equipment we ordered recently from
>>> them.  Our database servers consist of a PowerEdge 2950 connected to a
>>> PowerVault MD1000 with a 1 meter SAS cable.
>>>
>>>   
>>>   
>> We have a similar piece of equipment from Dell (the PowerEdge), and when
>> we had a problem with it we received excellent service from them. When
>> our raid controller went down (machine < 1 year old), Dell helped to
>> diagnose the problem and installed a new one at our hosting facility,
>> all within 24 hours.
>> 
>
> 24 hours?! I have a new one for my HP boxes onsite in 4 hours, including
> a tech if needed...
>
> But I assume Dell also has service-agreement deals you can get to get
> the level of service you'd want. (But you won't get it for a
> non-brand-name server, most likely)
>
> Bottom line - don't underestimate the service you get from the vendor
> when something breaks. Because eventually, something *will* break.
>
>
> //Magnus
>   
Yeah, the response time depends on the service level purchased.  I
generally go with 24-hour because everything is redundant, so a day of
downtime isn't going to bring services down (though it could make them
slow, depending on what fails), but you can purchase 4 hr and in some
cases even 2 hr response.  I had a "gold" level support contract on a
server that failed a while back, and within 3 net hours they diagnosed and
fixed the problem by getting onsite and replacing the motherboard and a
CPU.  I haven't had any of our 24-hour support level devices fail yet, so
I don't have anything to compare there.

If you do go with Dell and want the higher support contracts, I'll
restate that a small business account is the way to go.  Typically the
prices are better, to the point that a support-level upgrade appears free
when compared to the best shopping-cart combo I can come up with.

Joe



Re: [PERFORM] Hardware for PostgreSQL

2007-11-01 Thread Adam Tauno Williams
> > You're likely better off (performance-wise) putting it on the same disk
> > as the database itself if that one has better RAID, for example.
> I'm thinking along the lines of since nothing much writes to the OS
> Disk, I should(keyword) be safe.

You are almost certainly wrong about this; think "syslog".

-- 
Adam Tauno Williams, Network & Systems Administrator
Consultant - http://www.whitemiceconsulting.com
Developer - http://www.opengroupware.org




[PERFORM] select max(field) from table much faster with a group by clause?

2007-11-01 Thread Palle Girgensohn

Hi,

I have a table "login" with approx 600,000 tuples, a person table with 
approx 10 tuples.


When running
select max("when") from login where userid='userid'

it takes a second or two, but when adding "group by userid" the planner 
decides on using another plan, and it gets *much* faster. See example below.


Number of tuples per user varies from zero to a couple of thousand. It 
seems to be slower when there are no tuples at all, but it is always slow.


This is only for max() and min(). For count(), the plan is the same, it 
always uses "Aggregate".


Any ideas about this? Do we need to add "group by userid" to our code base 
to optimize, or is there another way? Updating PostgreSQL to 8.2 is a 
long-term option, but I'd like a short-term option as well...


Regards,
Palle


pp=# select version();
                                            version
--------------------------------------------------------------------------------------------------
 PostgreSQL 8.1.8 on amd64-portbld-freebsd6.1, compiled by GCC cc (GCC) 3.4.4 [FreeBSD] 20050518
(1 row)

Time: 0,530 ms
pp=# explain analyze SELECT max("when") FROM login WHERE userid='girgen' ;
                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=323.80..323.81 rows=1 width=0) (actual time=3478.781..3478.785 rows=1 loops=1)
   InitPlan
     ->  Limit  (cost=0.00..323.80 rows=1 width=8) (actual time=3478.768..3478.768 rows=0 loops=1)
           ->  Index Scan Backward using login_when_idx on "login"  (cost=0.00..131461.90 rows=406 width=8) (actual time=3478.759..3478.759 rows=0 loops=1)
                 Filter: (("when" IS NOT NULL) AND (userid = 'sarah.gilliam1'::text))
 Total runtime: 3478.868 ms
(6 rows)

Time: 3480,442 ms
pp=# explain analyze SELECT max("when") FROM login WHERE userid='girgen' group by userid;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.00..648.44 rows=1 width=25) (actual time=0.191..0.191 rows=0 loops=1)
   ->  Index Scan using login_userid_idx on "login"  (cost=0.00..646.40 rows=406 width=25) (actual time=0.183..0.183 rows=0 loops=1)
         Index Cond: (userid = 'sarah.gilliam1'::text)
 Total runtime: 0.243 ms
(4 rows)

Time: 0,938 ms
pp=# \d login
                 Table "public.login"
 Column |           Type           |     Modifiers
--------+--------------------------+--------------------
 userid | text                     |
 kursid | integer                  |
 when   | timestamp with time zone |
 mode   | text                     | default 'pm'::text
Indexes:
    "login_kurs_user_idx" btree (kursid, userid)
    "login_userid_idx" btree (userid)
    "login_when_idx" btree ("when")
Foreign-key constraints:
    "pp_fk1" FOREIGN KEY (userid) REFERENCES person(userid) ON UPDATE CASCADE ON DELETE CASCADE
    "pp_fk2" FOREIGN KEY (kursid) REFERENCES course(id) ON UPDATE CASCADE ON DELETE CASCADE






Re: [Fwd: Re: [PERFORM] Outer joins and Seq scans]

2007-11-01 Thread Tom Lane
Sami Dalouche <[EMAIL PROTECTED]> writes:
> Compare that to the following query, that is exactly the same except
> that the City table is inner'joined instead of outer joined
> ...
> the explain analyze is available at :
> http://www.photosdesami.com/temp/exp6.txt

AFAICS it's just absolutely blind luck that that query is fast.  The
planner chooses to do the contactinf7_/city8_ join first, and because
that happens to return no rows at all, all the rest of the query falls
out in no time, even managing to avoid the scan of adcreatedevent.
If there were any rows out of that join it would be a great deal slower.

There is a pretty significant semantic difference between the two
queries, too, now that I look closer: when you make 
"... join City city8_ on contactinf7_.city_id=city8_.id"
a plain join instead of left join, that means the join to contactinf7_
can be reduced to a plain join as well, because no rows with nulls for
contactinf7_ could possibly contribute to the upper join's result.
That optimization doesn't apply in the original form of the query,
which restricts the planner's freedom to rearrange things.
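
The same reduction in miniature, with made-up table names loosely echoing
the aliases above (not the actual query):

  SELECT * FROM ad a
    LEFT JOIN contactinfo c ON c.id = a.contact_id
    LEFT JOIN city ci       ON ci.id = c.city_id;  -- both joins must stay outer

  SELECT * FROM ad a
    LEFT JOIN contactinfo c ON c.id = a.contact_id
    JOIN city ci            ON ci.id = c.city_id;  -- ci.id = c.city_id rejects any
                                                   -- null-extended c row, so the first
                                                   -- join can be treated as inner too
                                                   -- and reordered freely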

regards, tom lane



Re: [PERFORM] select max(field) from table much faster with a group by clause?

2007-11-01 Thread Tom Lane
Palle Girgensohn <[EMAIL PROTECTED]> writes:
> When running
>  select max("when") from login where userid='userid'
> it takes a second or two, but when adding "group by userid" the planner 
> decides on using another plan, and it gets *much* faster. See example below.

It's only faster for cases where there are few or no rows for the
particular userid ...

> Number of tuples per user varies from zero to a couple of thousands.

The planner is using an intermediate estimate of 406 rows.  You might be
well advised to increase the statistics target for login.userid --- with
luck that would help it to choose the right plan type for both common
and uncommon userids.
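
Concretely, that would be something along these lines (the target value is
just an example):

  ALTER TABLE login ALTER COLUMN userid SET STATISTICS 100;
  ANALYZE login;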

regards, tom lane



Re: [Fwd: Re: [PERFORM] Outer joins and Seq scans]

2007-11-01 Thread Sami Dalouche
Thanks for your answer.

So, basically, what you are saying is that there is nothing particularly
wrong with the query, nor with its optimization? So if I need
performance for this query, I should just resort to other techniques
(giving more memory to postgres, caching outside postgres, etc.)?

Regards,
Sami Dalouche

On Thursday, 01 November 2007 at 09:29 -0400, Tom Lane wrote:
> Sami Dalouche <[EMAIL PROTECTED]> writes:
> > Compare that to the following query, that is exactly the same except
> > that the City table is inner'joined instead of outer joined
> > ...
> > the explain analyze is available at :
> > http://www.photosdesami.com/temp/exp6.txt
> 
> AFAICS it's just absolutely blind luck that that query is fast.  The
> planner chooses to do the contactinf7_/city8_ join first, and because
> that happens to return no rows at all, all the rest of the query falls
> out in no time, even managing to avoid the scan of adcreatedevent.
> If there were any rows out of that join it would be a great deal slower.
> 
> There is a pretty significant semantic difference between the two
> queries, too, now that I look closer: when you make 
> "... join City city8_ on contactinf7_.city_id=city8_.id"
> a plain join instead of left join, that means the join to contactinf7_
> can be reduced to a plain join as well, because no rows with nulls for
> contactinf7_ could possibly contribute to the upper join's result.
> That optimization doesn't apply in the original form of the query,
> which restricts the planner's freedom to rearrange things.
> 
>   regards, tom lane




Re: [PERFORM] select max(field) from table much faster with a group by clause?

2007-11-01 Thread Palle Girgensohn



--On Thursday, November 01, 2007 09.43.39 -0400 Tom Lane <[EMAIL PROTECTED]> wrote:

> Palle Girgensohn <[EMAIL PROTECTED]> writes:
>> When running
>>  select max("when") from login where userid='userid'
>> it takes a second or two, but when adding "group by userid" the planner
>> decides on using another plan, and it gets *much* faster. See example
>> below.
>
> It's only faster for cases where there are few or no rows for the
> particular userid ...
>
>> Number of tuples per user varies from zero to a couple of thousands.
>
> The planner is using an intermediate estimate of 406 rows.  You might be
> well advised to increase the statistics target for login.userid --- with
> luck that would help it to choose the right plan type for both common
> and uncommon userids.


Unfortunately, altering statistics doesn't help. I see no difference when 
altering the value from 10 (default) to 100, 1000 or 10. :-(


Are there any other things I can modify?

OH, btw, maybe something in the postgresql.conf sucks?

max_connections = 100
shared_buffers = 3                  # min 16 or max_connections*2, 8KB each
temp_buffers = 2500                 # min 100, 8KB each
max_prepared_transactions = 100     # can be 0 or more
work_mem = 16384                    # min 64, size in KB
maintenance_work_mem = 16384        # min 1024, size in KB
max_stack_depth = 32768             # min 100, size in KB
max_fsm_pages = 50
max_fsm_relations = 2
max_files_per_process = 2000
fsync = off
checkpoint_segments = 50            # in logfile segments, min 1, 16MB each
effective_cache_size = 1            # typically 8KB each
random_page_cost = 1.8
geqo = on
geqo_threshold = 10
from_collapse_limit = 8
join_collapse_limit = 8             # 1 disables collapsing of explicit




Re: [PERFORM] select max(field) from table much faster with a group by clause?

2007-11-01 Thread Tom Lane
Palle Girgensohn <[EMAIL PROTECTED]> writes:
> Unfortunately, altering statistics doesn't help. I see no difference when 
> altering the value from 10 (default) to 100, 1000 or 10. :-(

Um, you did re-ANALYZE the table after changing the setting?

regards, tom lane



Re: [PERFORM] select max(field) from table much faster with a group by clause?

2007-11-01 Thread Palle Girgensohn



--On Thursday, November 01, 2007 11.06.57 -0400 Tom Lane <[EMAIL PROTECTED]> wrote:

> Palle Girgensohn <[EMAIL PROTECTED]> writes:
>> Unfortunately, altering statistics doesn't help. I see no difference
>> when altering the value from 10 (default) to 100, 1000 or 10. :-(
>
> Um, you did re-ANALYZE the table after changing the setting?


alter table login alter userid SET statistics 1000;
vacuum analyze login;




Re: [PERFORM] select max(field) from table much faster with a group by clause?

2007-11-01 Thread Scott Marlowe
On 11/1/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> Palle Girgensohn <[EMAIL PROTECTED]> writes:
> > Unfortunately, altering statistics doesn't help. I see no difference when
> > altering the value from 10 (default) to 100, 1000 or 10. :-(
>
> Um, you did re-ANALYZE the table after changing the setting?

And he changed it with

ALTER TABLE name ALTER [ COLUMN ] column SET STORAGE { PLAIN |
EXTERNAL | EXTENDED | MAIN }

right?



Re: [PERFORM] select max(field) from table much faster with a group by clause?

2007-11-01 Thread Tom Lane
Palle Girgensohn <[EMAIL PROTECTED]> writes:
> --On torsdag, november 01, 2007 11.06.57 -0400 Tom Lane <[EMAIL PROTECTED]> 
> wrote:
>> Um, you did re-ANALYZE the table after changing the setting?

> alter table login alter userid SET statistics 1000;
> vacuum analyze login;

Hm, that's the approved procedure all right.  But the plans didn't
change at all?  Not even the estimated number of rows?
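
One quick way to check whether the larger target actually took effect is to
look at the stored column statistics, e.g. via the standard pg_stats view
(a sketch):

  SELECT n_distinct, null_frac, most_common_vals, most_common_freqs
  FROM pg_stats
  WHERE tablename = 'login' AND attname = 'userid';

With a statistics target of 1000, most_common_vals can hold up to 1000
entries, so a longer list there is a sign the new target was used.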

regards, tom lane



Re: [PERFORM] select max(field) from table much faster with a group by clause?

2007-11-01 Thread Gregory Stark

"Palle Girgensohn" <[EMAIL PROTECTED]> writes:

> Are there any other things I can modify?

You might consider an index on (userid, "when"). Keep in mind that every new
index imposes an incremental cost on every update and insert and increases the
time for vacuum.

> max_prepared_transactions = 100 # can be 0 or more

Are you actually using prepared transactions (are you synchronising multiple
databases using a transaction manager)? If not then set this to 0 as it takes
some resources.

> maintenance_work_mem = 16384    # min 1024, size in KB

Raising this might decrease vacuum times if that's a problem.

> fsync = off

You realize that this means if the system loses power or the kernel crashes
you could have data corruption? Do you take very frequent backups or can you
reconstruct your data?
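
If those points apply, the corresponding postgresql.conf changes would look
something like this (values are only illustrative):

  max_prepared_transactions = 0      # not using two-phase commit / a transaction manager
  maintenance_work_mem = 131072      # 128MB, in KB on 8.1; helps VACUUM and index builds
  fsync = on                         # keep crash safety unless the data is easily rebuilt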

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!



Re: [PERFORM] Hardware for PostgreSQL

2007-11-01 Thread Steve Crawford
Ketema wrote:
> I am trying to build a very Robust DB server that will support 1000+
> concurrent users (all ready have seen max of 237 no pooling being
> used).  I have read so many articles now that I am just saturated.  I
> have a general idea but would like feedback from others.

Describe it a bit better. 1,000 users or 1,000 simultaneous connections?
I.e., do you have a front-end where someone logs on, gets a connection,
and keeps it for the duration, or is it a web-type app where each request
might connect-query-disconnect? If the latter, are your connections
persistent? How many queries/second do you expect?

How complex are the queries (retrieve single record or data-mining)?
Read-only or lots of updates? Do the read-queries need to be done every
time or are they candidates for caching?

> RAM?  The more the merrier right?

Generally, true. But once you reach the point that everything can fit in
RAM, more is just wasted $$$. And, according to my reading, there are
cases where more RAM can hurt - basically, if your queries are large
enough to just flush the cache, you don't benefit from caching but are
still hurt by the time spent checking the cache for the data.

> Who has built the biggest baddest Pg server out there and what do you
> use?

Not me.

Someone just showed me live system monitoring data on one of his several
PG machines. That one was clocking multi-thousand TPS on a server
(Sun??) with 128GB RAM. That much RAM makes "top" look amusing.

Several of the social-networking sites are using PG - generally
spreading load over several (dozens) of servers. They also make heavy
use of pooling and caching - think dedicated memcached servers offering
a combined pool of several TB RAM.

For pooling, pgbouncer seems to have a good reputation. Tests on my
current production server show it shaving a few ms off every
connect-query-disconnect cycle. Connects are fairly fast in PG but that
delay becomes a significant issue under heavy load.

Test pooling carefully, though. If you blindly run everything through
your pooler instead of just selected apps, you can end up with
unexpected problems when one client changes a backend setting like "set
statement_timeout to 5". If the next client assigned to that backend
connection runs a long-duration analysis query, it is likely to fail.
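
To make that failure mode concrete, a sketch with a made-up table name,
assuming the pooler hands backends around without resetting session settings:

  -- client A, through the pooler:
  SET statement_timeout TO 5;        -- 5 milliseconds, left behind on the backend
  -- client A finishes; the backend keeps the setting

  -- client B is assigned the same backend:
  SELECT count(*) FROM big_history;  -- ERROR:  canceling statement due to statement timeout

Having the pooler issue RESET ALL when a connection is returned avoids this.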

Cheers,
Steve



Re: [PERFORM] select max(field) from table much faster with a group by clause?

2007-11-01 Thread Palle Girgensohn



--On Thursday, November 01, 2007 09.43.39 -0400 Tom Lane <[EMAIL PROTECTED]> wrote:

> Palle Girgensohn <[EMAIL PROTECTED]> writes:
>> When running
>>  select max("when") from login where userid='userid'
>> it takes a second or two, but when adding "group by userid" the planner
>> decides on using another plan, and it gets *much* faster. See example
>> below.
>
> It's only faster for cases where there are few or no rows for the
> particular userid ...


Well, no, not really. See below. OTOH, it is sometimes a bit slower. It
seems to depend on how far off the estimated number of rows is? Weird?



>> Number of tuples per user varies from zero to a couple of thousands.
>
> The planner is using an intermediate estimate of 406 rows.  You might be
> well advised to increase the statistics target for login.userid --- with
> luck that would help it to choose the right plan type for both common
> and uncommon userids.


I'll try that, thanks!

--

pp=#  SELECT max("when") FROM login WHERE userid='kudo' group by userid;
 max
---
2007-01-04 15:31:46.863325+01
(1 row)

Time: 6,194 ms
pp=#  SELECT max("when") FROM login WHERE userid='kudo' ;
 max
---
2007-01-04 15:31:46.863325+01
(1 row)

Time: 992,391 ms
pp=#  SELECT max("when") FROM login WHERE userid='kudo' ;
 max
---
2007-01-04 15:31:46.863325+01
(1 row)

Time: 779,582 ms
pp=#  SELECT max("when") FROM login WHERE userid='kudo' ;
 max
---
2007-01-04 15:31:46.863325+01
(1 row)

Time: 818,667 ms
pp=#  SELECT max("when") FROM login WHERE userid='kudo' ;
 max
---
2007-01-04 15:31:46.863325+01
(1 row)

Time: 640,242 ms
pp=#  SELECT max("when") FROM login WHERE userid='kudo' group by userid;
 max
---
2007-01-04 15:31:46.863325+01
(1 row)

Time: 18,384 ms
pp=#  SELECT count(*) FROM login WHERE userid='kudo' group by userid;
count
---
 1998
(1 row)

Time: 12,762 ms
pp=# explain analyze SELECT max("when") FROM login WHERE userid='kudo' group by userid;
                                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.00..648.44 rows=1 width=25) (actual time=24.700..24.703 rows=1 loops=1)
   ->  Index Scan using login_userid_idx on "login"  (cost=0.00..646.40 rows=406 width=25) (actual time=0.140..16.931 rows=1998 loops=1)
         Index Cond: (userid = 'kudo'::text)
 Total runtime: 24.779 ms
(4 rows)

Time: 25,633 ms
pp=# explain analyze SELECT max("when") FROM login WHERE userid='kudo' ;
                                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=323.93..323.94 rows=1 width=0) (actual time=1400.994..1400.997 rows=1 loops=1)
   InitPlan
     ->  Limit  (cost=0.00..323.93 rows=1 width=8) (actual time=1400.975..1400.979 rows=1 loops=1)
           ->  Index Scan Backward using login_when_idx on "login"  (cost=0.00..131515.87 rows=406 width=8) (actual time=1400.968..1400.968 rows=1 loops=1)
                 Filter: (("when" IS NOT NULL) AND (userid = 'kudo'::text))
 Total runtime: 1401.057 ms
(6 rows)

Time: 1401,881 ms


pp=#  SELECT userid, count("when") FROM login WHERE userid in ('girgen' , 'kudo') group by userid;
 userid | count
--------+-------
 kudo   |  1998
 girgen |  1120
(2 rows)


pp=# explain analyze SELECT max("when") FROM login WHERE userid='girgen' group by userid;
                                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.00..648.44 rows=1 width=25) (actual time=25.137..25.141 rows=1 loops=1)
   ->  Index Scan using login_userid_idx on "login"  (cost=0.00..646.40 rows=406 width=25) (actual time=0.121..20.712 rows=1120 loops=1)
         Index Cond: (userid = 'girgen'::text)
 Total runtime: 25.209 ms
(4 rows)

Time: 25,986 ms

pp=# explain analyze SELECT max("when") FROM login WHERE userid='girgen' ;
                                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=323.93..323.94 rows=1 width=0) (actual time=6.695..6.698 rows=1 loops=1)
   InitPlan
     ->  Limit  (cost=0.00..323.93 rows=1 width=8) (actual time=6.669..6.675 rows=1 loops=1)
           ->  Index Scan Backward using login_when_idx on "login"  (cost=0.00..131515.87 rows=406 width=8) (actual time=6.660..6.660 rows=1 loops=1)
                 Filter: (("when" IS NOT NULL) AND (userid = 'girgen'::text))
 Total runtime: 6.785 ms
(6 rows)

Re: [PERFORM] Hardware for PostgreSQL

2007-11-01 Thread Steve Crawford
Magnus Hagander wrote:
> Ow Mun Heng wrote:
>>> You're likely better off (performance-wise) putting it on the same disk
>>> as the database itself if that one has better RAID, for example.
>> I'm thinking along the lines of since nothing much writes to the OS
>> Disk, I should(keyword) be safe.
> 
> Unless it's *always* in the cache (not so likely), reads will also move
> the heads...

And if you aren't mounted noatime, reads will also cause a write.

Cheers,
Steve



[PERFORM] How to avoid hashjoin and mergejoin

2007-11-01 Thread Carlo Stonebanks
I am comparing the same query on two different PG 8.2 servers, one Linux 
(8GB RAM) and one Windows (32GB RAM). Both have similar drives and CPUs.


The Windows postgresql.conf is pretty well tuned, but it looks like 
someone had wiped out the Linux config so the default one was re-installed. 
All performance-related memory allocation values seem to be set to the 
defaults, but mods have been made: max_connections = 100 and shared_buffers 
= 32MB.


The performance for this query is terrible on the Linux server, and good on 
the Windows server - presumably because the original Linux PG config has 
been lost. This query requires that we "set enable_seqscan to 'off';"


Still, the Linux server did not create the same, fast plan as the Windows 
server. In order to get the same plan we had to:


set enable_hashjoin to 'off';
set enable_mergejoin to 'off';

The plans were now similar, using nested loops and bitmapped heap scans. Now 
the Linux query outperformed the Windows query.


Question: Can anyone tell me which config values would have made PG select 
hash join and merge joins when the nested loop/bitmap heap scan combination 
was faster?


Carlo 





Re: [PERFORM] How to avoid hashjoin and mergejoin

2007-11-01 Thread Scott Marlowe
On 11/1/07, Carlo Stonebanks <[EMAIL PROTECTED]> wrote:
> I am comparing the same query on two different PG 8.2 servers, one Linux
> (8GB RAM) and one Windows (32GB RAM). Both have similar drives and CPU's.
>
> The Windows postgresql.conf is pretty well tuned, but it looks like
> someone had wiped out the Linux config so the default one was re-installed.
> All performance-related memory allocation values seem to be set to the
> defaults, but mods have been made: max_connections = 100 and shared_buffers
> = 32MB.
>
> The performance for this query is terrible on the Linux server, and good on
> the Windows server - presumably because the original Linux PG config has
> been lost. This query requires that we "set enable_seqscan to 'off';"

Have you run analyze on the server yet?

A few general points on performance tuning.  With 8.2 you should set
shared_buffers to a pretty big chunk of memory on Linux, up to 25% or
so.  That means 32 Meg of shared buffers is REAL low for a Linux server.
Try running anywhere from 512Meg up to 1Gig for starters and see if
that helps too.  Also turn up work_mem to something like 16 to 32 meg,
then restart the server after making these changes.
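
Sketched as postgresql.conf lines, with illustrative starting values only
(8.2 accepts units like MB/GB, and shared_buffers requires a restart):

  shared_buffers = 1GB    # somewhere in the 512MB-1GB range to start, up to ~25% of RAM
  work_mem = 16MB         # per sort/hash operation, so keep concurrent queries in mind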

Then give us the explain analyze output with all the enable_xxx set to ON.

summary: analyze, increase shared_buffers and work_mem, give us explain analyze.



Re: [PERFORM] How to avoid hashjoin and mergejoin

2007-11-01 Thread Tom Lane
"Carlo Stonebanks" <[EMAIL PROTECTED]> writes:
> Still, the Linux server did not create the same, fast plan as the Windows 
> server. In order to get the same plan we had to:

> set enable_hashjoin to 'off';
> set enable_mergejoin to 'off';

This is just about never the appropriate way to solve a performance
problem, as it will inevitably create performance problems in other
queries.

What I'm wondering is whether the tables have been ANALYZEd recently,
and also whether there are any nondefault postgresql.conf settings in
use on the other server.

regards, tom lane



Re: [PERFORM] Hardware for PostgreSQL

2007-11-01 Thread Ow Mun Heng

On Thu, 2007-11-01 at 11:16 -0700, Steve Crawford wrote:
> Magnus Hagander wrote:
> > Ow Mun Heng wrote:
> >>> You're likely better off (performance-wise) putting it on the same disk
> >>> as the database itself if that one has better RAID, for example.
> >> I'm thinking along the lines of since nothing much writes to the OS
> >> Disk, I should(keyword) be safe.
> > 
> > Unless it's *always* in the cache (not so likely), reads will also move
> > the heads...
> 
> And if you aren't mounted noatime, reads will also cause a write.


/dev/VolGroup00/LogVol01  /             ext3  defaults,noatime         1 1
/dev/md0                  /raid1_array  ext3  noatime,data=writeback   1 1

Yep..yep..



[PERFORM] hardware for PostgreSQL

2007-11-01 Thread Mark Floyd

Hello,

I am new to setting up PostgreSQL machines for our operational
environments and would appreciate it if someone could take a look at this
setup; throw tomatoes if it looks too bad.  We're expecting an
initial load of about 5 million text meta-data records to our
database, and are expecting upwards of 50 million records by 2008.
We are expecting 40 "connect-query-disconnect" clients every 5
minutes or so, and are looking at 15 connections/sec on our
front-facing components.  We've designed a set of Dell systems which
we are planning to stick into our Slony/PgPool-II hybrid cluster,
taking over our current random hodgepodge of machines we used when
first experimenting.  Each of these systems will be identical.  Speed
is important but we are putting more weight on the storage aspects.
Below is our model system:


Dell PowerEdge Energy 2950
(2) Quad Core Intel Xeon L5320, 2x4MB Cache, 1.86GHz, 1066MHz FSB
4GB 667MHz Dual Ranked DIMMs, Energy Smart

PERC 5/i, x8 Backplane, Integrated Controller Card

Hard Drive Configuration: Integrated SAS/SATA RAID 1/RAID 5

Hard Drive 1 (for operating system): 36GB 10K RPM SAS 3Gbps 2.5-in Hot Plug HD

Hard Drive 2 (for logs): 36GB 10K RPM SAS 3Gbps 2.5-in Hot Plug HD

Hard Drives 3,4,5,6 (in a RAID 5 configuration): (4) 146GB 10K SAS 3Gbps Hard Drive, 2.5-inch, Hot Plug


Network Adapter: Dual Embedded Broadcom NetXtreme II 5708 Gigabit Ethernet NIC


It's overkill for our initial system, but we are shooting for a system
that allows for growth.  If someone can let us know whether we're on the
right path or are shooting ourselves in the foot with this setup, I'd
appreciate it.


Thanks,

- Mark



Re: [PERFORM] hardware for PostgreSQL

2007-11-01 Thread Scott Marlowe
On 11/1/07, Mark Floyd <[EMAIL PROTECTED]> wrote:
> Hello,
> Dell PowerEdge Energy 2950
> (2)  Quad Core Intel Xeon L5320, 2x4MB Cache, 1.86Ghz, 1066Mhz FSB
> 4GB 667Mhz Dual Ranked DIMMs, Energy Smart
>
> PERC 5/i, x8 Backplane, Integrated Controller Card
>
> Hard Drive Configuration: Integrated SAS/SATA RAID1/Raid 5
>
> Hard Drive 1 (For Operating System): 36GB 10K RPM SAS 3Gbps 2.5-in
> Hot Plug HD
> Hard Drive 2 (For logs): 36GB 10K RPM SAS 3Gbps 2.5-in Hot Plug HD
>
> Hard Drives 3,4,5,6 (In a RAID 5 Configuration): (4) 146GB 10K SAS
> 3Gbps Hard Drive, 2-5 inch, Hot Plug

If you can fit 8 drives in it, for the love of god add two more and
mirror your OS and xlog drives (I assume that's what you mean by
drive 2 for logs).  Running a server on non-redundant drives is not
the best way to do things.

And if you can live on ~ 300 Gigs of storage instead of 450 Gigs, look
into RAID-10 for your data array.  RAID 10 is noticeably faster than
RAID-5 for any database that sees a fair bit of writing activity.

> It's overkill for our initial system but we are shooting for a system
> that allows for growth.  If someone can let us know if we're on the
> right path or are shooting ourselves in the foot with this setup I'd
> appreciate it.

Other than the 8 cores, it's not really overkill.  And depending on
your usage patterns, 8 cores may well not be overkill either.



Re: [PERFORM] How to avoid hashjoin and mergejoin

2007-11-01 Thread Carlo Stonebanks
<< This is just about never the appropriate way to solve a performance
problem, as it will inevitably create performance problems in other
queries.>>

In this particular example, this was done to "force" the query on the Linux
box to use the same plan as on the Windows box to prove that - once the
correct plan was chosen - the Linux box could at least MATCH the Windows
box.

That being said, I should mention this: we take certain "core" queries that
we know are essential and embed them in plpgsql SRFs that save the
various settings, modify them as required for the query, then restore them
after the rows are returned.

Does this address the problem you mentioned?
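
For reference, a minimal sketch of that kind of wrapper (the table and
function names are made up, and it uses SET LOCAL, which reverts
automatically at transaction end, instead of the explicit save/restore
described above):

  CREATE OR REPLACE FUNCTION core_report(p_id integer)
  RETURNS SETOF some_table AS $$
  DECLARE
      r some_table%ROWTYPE;
  BEGIN
      SET LOCAL enable_hashjoin = off;   -- scoped to the surrounding transaction
      SET LOCAL enable_mergejoin = off;
      FOR r IN SELECT * FROM some_table WHERE parent_id = p_id LOOP
          RETURN NEXT r;
      END LOOP;
      RETURN;
  END;
  $$ LANGUAGE plpgsql;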

<< What I'm wondering is whether the tables have been ANALYZEd recently,>>

This is SUPPOSED to be done after a restore - but I will verify, thanks for
the reminder.

<< and also whether there are any nondefault postgresql.conf settings in
use on the other server.>>

Definitely - this is what alerted me to the fact that there was something
suspicious. We try to optimize our memory settings (based on various tuning
docs, advice from here, and good old trial-and-error). Since the new config
had barely any changes, I knew something was wrong.

Carlo 

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: November 1, 2007 5:42 PM
To: Carlo Stonebanks
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] How to avoid hashjoin and mergejoin 

"Carlo Stonebanks" <[EMAIL PROTECTED]> writes:
> Still, the Linux server did not create the same, fast plan as the Windows 
> server. In order to get the same plan we had to:

> set enable_hashjoin to 'off';
> set enable_mergejoin to 'off';

This is just about never the appropriate way to solve a performance
problem, as it will inevitably create performance problems in other
queries.

What I'm wondering is whether the tables have been ANALYZEd recently,
and also whether there are any nondefault postgresql.conf settings in
use on the other server.

regards, tom lane



Re: [PERFORM] select max(field) from table much faster with a group by clause?

2007-11-01 Thread hubert depesz lubaczewski
On Thu, Nov 01, 2007 at 02:07:55PM +0100, Palle Girgensohn wrote:
> I have a table "login" with approx 600,000 tuples, a person table with 
> approx 10 tuples.
> When running
> select max("when") from login where userid='userid'
> it takes a second or two, but when adding "group by userid" the planner 
> decides on using another plan, and it gets *much* faster. See example below.
> pp=# explain analyze SELECT max("when") FROM login WHERE userid='girgen' ;

just do:
create index q on login (userid, "when"); and you should be fine.
if it will not help, rewrite the query as:
select "when"
from login
where userid = 'girgen'
order by userid desc, "when" desc limit 1;

depesz

-- 
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA.  here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)
