Re: [PERFORM] which hardware setup

2010-05-25 Thread Jesper Krogh

Option 2:
App Server and Postgres: Dual Xeon 5520 quad core with 12GB ram and  
2x 146GB 15k RPM SAS (RAID1) disks




 you didnt mention your dataset size, but i the second option would  
be preferrable in most situations since it gives more of the os memory  
for disc caching. 12 gb vs 4 gb for the host running pg


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] which hardware setup

2010-05-25 Thread Pedro Axelrud
Sorry Jesper, I thought I had mentioned.. our dataset have 18GB.


Pedro Axelrud
http://mailee.me
http://softa.com.br
http://flavors.me/pedroaxl


On Tue, May 25, 2010 at 03:21, Jesper Krogh jes...@krogh.cc wrote:

 Option 2:
 App Server and Postgres: Dual Xeon 5520 quad core with 12GB ram and 2x
 146GB 15k RPM SAS (RAID1) disks


  you didnt mention your dataset size, but i the second option would be
 preferrable in most situations since it gives more of the os memory for disc
 caching. 12 gb vs 4 gb for the host running pg



[PERFORM] which hardware setup

2010-05-24 Thread Pedro Axelrud
Hello,

I work for a web app to send email newsletters, and I have one question
about postgres' performance in two different setups. Actually we have one
4GB Ram VPS running our app server (it's a rails app under nginx and thin)
and a 4GB Ram VPS running the database (18GB). We want to migrate to bare
metal servers, but I have made two different setups and don't know what is
the best option:

Option 1:
App Server: Dual Xeon 5130 dual core with 4GB ram and SATA disk
Postgres: Xeon 3360 quad core with 4GB ram and 2 x 146GB 15k RPM SAS (RAID1)
disks

Option 2:
App Server and Postgres: Dual Xeon 5520 quad core with 12GB ram and 2x 146GB
15k RPM SAS (RAID1) disks

I know the first option would be better in terms of I/O for postgres, but
our app server doesnt use much I/O and with the second option we would have
much more ram.


Thank you

Pedro Axelrud
http://mailee.me
http://softa.com.br
http://flavors.me/pedroaxl


Re: [PERFORM] Which hardware ?

2008-06-20 Thread Lionel
Scott Marlowe wrote:
 We had a reporting server with about 80G of data on a machine with 4G
 ram last place I worked, and it could take it a few extra seconds to
 hit the old data, but the SW RAID-10 on it made it much faster at
 reporting than it would have been with a single disk.

Would this be a nice choice ?

HP Proliant DL320 G5p  Xeon DC 3 GHz - 8 Go RAM DDR2 ECC
- 4 x 146 Go SAS 15k rpm   - RAID-10 HP Smart Array (128 Mo cache)

I finally choose to have 2 data tables:
- one with pre aggregated (dividing size by 10), unpartitionned (=the 
database they currently use)
- one with original data, yearly partitionned

I will choose before each statement which table will be used depending on 
which select/joins/where/groupby the user choosed.
The aggregated datas will allow me to maintain actual performances (and even 
improve it using the new hardware twice more powerfull).

I think lines aggregation will be handled by the java application (excel/csv 
file loaded in memory),
which will be much faster than using a trigger on insertion in the full 
table.

Thanks.



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Which hardware ?

2008-06-17 Thread Lionel
Hi,

I need to install a 8.3 database and was wondering which hardware would be 
sufficient to have good performances (less than 30s for² slowest select).

Database size: 25 Go /year, 5 years of history
One main table containing 40 million lines per year.
Batch inserts of 10 lines. Very very few deletes, few updates.

30 other tables, 4 levels of hierarchy, containing from 10 lines up to 2 
lines.
5 of them have forein keys on the main table.

I will use table partitionning on the year column.

Statements will mainly do sums on the main table, grouped by whatever column 
of the database (3-5 joined tables, or join on join), with some criterions 
that may vary, lots of joined varchar in ('a','b',...,'z').
It's almost impossible to predict what users will do via the webapplication 
that queries this database: almost all select, join, group by, where... 
possibilities are available.

Up to 4 simultaneous users.

I'm planning to host it on a quad xeon 2.66Ghz with 8Go of DDR2, and a dual 
(RAID1) SATA2 750Go HD.
Perharps with another HD for indexes.

Do you think it will be enough ?
Is another RAID for better performances a minimum requirement ?
Will a secondary HD for indexes help ?

Which OS would you use ? (knowing that there will be a JDK 1.6 installed 
too)

With 5 millions of lines, the same application runs quite fast on windows 
2000 on a single P4 2.8 GHz (very few statements last more than 10s, mostly 
when concurrent statements are made). Each statement consumes 100% of the 
CPU.


thanks for advices.



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Which hardware ?

2008-06-17 Thread Scott Marlowe
On Tue, Jun 17, 2008 at 7:38 AM, Lionel [EMAIL PROTECTED] wrote:
 Hi,

 I need to install a 8.3 database and was wondering which hardware would be
 sufficient to have good performances (less than 30s for² slowest select).

 Database size: 25 Go /year, 5 years of history
 One main table containing 40 million lines per year.
 Batch inserts of 10 lines. Very very few deletes, few updates.

 30 other tables, 4 levels of hierarchy, containing from 10 lines up to 2
 lines.
 5 of them have forein keys on the main table.

 I will use table partitionning on the year column.

 Statements will mainly do sums on the main table, grouped by whatever column
 of the database (3-5 joined tables, or join on join), with some criterions
 that may vary, lots of joined varchar in ('a','b',...,'z').
 It's almost impossible to predict what users will do via the webapplication
 that queries this database: almost all select, join, group by, where...
 possibilities are available.

 Up to 4 simultaneous users.

 I'm planning to host it on a quad xeon 2.66Ghz with 8Go of DDR2, and a dual
 (RAID1) SATA2 750Go HD.
 Perharps with another HD for indexes.

 Do you think it will be enough ?
 Is another RAID for better performances a minimum requirement ?
 Will a secondary HD for indexes help ?

More drives, all in the same RAID-10 setup.  For reporting like this
writing speed often isn't that critical, so you are often better off
with software RAID-10 than using a mediocre hardware RAID controller
(most adapatecs, low end LSI, etc...)

You'd be surprised what going from a 2 disk RAID1 to a 4 disk RAID10
can do in these circumstances.  Going up to 6, 8, 10 or more disks
really makes a difference.

 Which OS would you use ? (knowing that there will be a JDK 1.6 installed
 too)

I'd use RHEL5 because it's what I'm familiar with.  Any stable flavor
of linux or FreeBSD7 are good performance choices if you know how to
drive them.

 With 5 millions of lines, the same application runs quite fast on windows
 2000 on a single P4 2.8 GHz (very few statements last more than 10s, mostly
 when concurrent statements are made). Each statement consumes 100% of the
 CPU.

That statement about concurrent statements REALLY sells me on the idea
of a many disk RAID10 here.  I'd take that over quad cores for what
you're doing any day.  Not that I'd turn down quad cores here either.
:)

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Which hardware ?

2008-06-17 Thread Andrew Sullivan
On Tue, Jun 17, 2008 at 03:38:59PM +0200, Lionel wrote:
 Hi,
 
 I need to install a 8.3 database and was wondering which hardware would be 
 sufficient to have good performances (less than 30s for� slowest select).

 Statements will mainly do sums on the main table, grouped by whatever column 
 of the database (3-5 joined tables, or join on join), with some criterions 
 that may vary, lots of joined varchar in ('a','b',...,'z').
 It's almost impossible to predict what users will do via the webapplication 
 that queries this database: almost all select, join, group by, where... 
 possibilities are available.

I'm not sure that I have any specific recommendation to make in the
face of such sweeping requirements.  But I'd say you need to make I/O
cheap, which means piles of memory and extremely fast disk
subsystems.

Also, there's another important question (which never gets asked in
these discussions), which is, How much is the performance worth to
you?  If the last 10% of users get something longer than 30s, but
less than 40s, and they will pay no more to get the extra 10s response
time, then it's worth nothing to you, and you shouldn't fix it.
 
 Up to 4 simultaneous users.

You won't need lots of processer, then.
 
 I'm planning to host it on a quad xeon 2.66Ghz with 8Go of DDR2, and a dual 
 (RAID1) SATA2 750Go HD.
 Perharps with another HD for indexes.

How big's the database?  If you can have enough memory to hold the
whole thing, including all indexes, in memory, that's what you want.
Apart from that, dual SATA2 is probably underpowered.  But. . .
 
 Which OS would you use ? (knowing that there will be a JDK 1.6 installed 
 too)

. . .I think this is the real mistake.  Get a separate database box.
It's approximately impossible to tune a box correctly for both your
application and your database, in my experience.

A

-- 
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Which hardware ?

2008-06-17 Thread Matthew Wakeling

On Tue, 17 Jun 2008, Lionel wrote:

I need to install a 8.3 database and was wondering which hardware would be
sufficient to have good performances (less than 30s for² slowest select).



It's almost impossible to predict what users will do via the webapplication
that queries this database: almost all select, join, group by, where...
possibilities are available.


Well, Scott has given you some good pointers on how to make a fast system. 
However, your original question (is this fast enough) is impossible to 
answer, especially if the users are allowed to run absolutely anything 
they want. I bet you I could craft a query that takes more than 30 seconds 
regardless of how fast you make your system.


Having said that, I'll add the suggestion that you should put as much RAM 
in the box as possible. It can only help. As others have said, if you only 
have four users, then CPU power isn't going to be such a problem, and 
given that, I'd disagree with Andrew and say as long as you have plenty of 
RAM, Java can play well with a database on the same box. Depends what it 
is doing, of course.


Matthew

--
To be or not to be   -- Shakespeare
To do is to be   -- Nietzsche
To be is to do   -- Sartre
Do be do be do   -- Sinatra
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Which hardware ?

2008-06-17 Thread Lionel
Andrew Sullivan wrote:
 You won't need lots of processer, then.

can't find less than quad core for this price range...

 How big's the database?

with 20 millions of rows, the main table is 3.5 Go on win XP.
With 8 Go of indexes.

I estimate the whole database around 30 Go / year

  If you can have enough memory to hold the
 whole thing, including all indexes, in memory, that's what you want.
 Apart from that, dual SATA2 is probably underpowered.  But. . .

RAID is twice more expansive.
(600euros/month for a 5x750Go SATA2 with 12Gb of ram and unnecessary 2x quad 
core)

didn't find any RAID 10 not too expansive dedicated server.

If this setup is twice as fast, I can afford it. But if it a 30sec VS 
40sec...I'm not sure my customer will pay.

 Which OS would you use ? (knowing that there will be a JDK 1.6
 installed too)

 . . .I think this is the real mistake.  Get a separate database box.
 It's approximately impossible to tune a box correctly for both your
 application and your database, in my experience.

My tomcat webapp is well coded  and consumes nearly nothing.
On such powerful hardware, I prefer to run both on the same server.
I could eventually run it on a different server, much less powerfull, but 
it's not on the same network, I guess this would be an issue. 



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Which hardware ?

2008-06-17 Thread Greg Smith

On Tue, 17 Jun 2008, Andrew Sullivan wrote:


Which OS would you use ? (knowing that there will be a JDK 1.6 installed
too)


. . .I think this is the real mistake.  Get a separate database box.
It's approximately impossible to tune a box correctly for both your
application and your database, in my experience.


I can't remember the last time I deployed a PG box that didn't have a Java 
app or three on it, too.  You've got even odds that putting it a separate 
system will even be a improvement.  Yes, if the Java app is a pig and the 
machine doesn't have enough resources, separating it out to another system 
will help.  But there are plenty of these buggers that will end up so much 
slower from the additional network latency that it's a net loss (depends 
on how the app groups its requests for rows).


If you know enough about Java to watch things like how much memory the 
JVMs are taking up, I wouldn't hesitate to put them all on the same 
machine.  Considering that Lionel's system seems pretty overpowered for 
what he's doing--runs plenty fast on a much slower system, enough RAM to 
hold a large portion of the primary tables and database, all batch updates 
that don't really need a good RAID setup--I'd say looks good here and 
recommend he just follow the plan he outlined.  Just watch the system with 
top for a bit under load to make sure the Java processes are staying under 
control.


As for OS, a RHEL5 or clone like CentOS should work fine here, which is 
more appropriate depends on your support requirements.  I would recommend 
against using FreeBSD as it's not the friendliest Java platform, and the 
additional complexity of Solaris seems like overkill for your app. 
Basically, evem though it's available for more of them, I only consider 
deploying a Java app on one of the mainstream platforms listed at 
http://www.java.com/en/download/manual.jsp right now because those are the 
mature releases.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Which hardware ?

2008-06-17 Thread Andrew Sullivan
On Tue, Jun 17, 2008 at 04:49:17PM +0200, Lionel wrote:
 My tomcat webapp is well coded  and consumes nearly nothing.

If I were ever inclined to say, Nonsense, about code I've never
seen, this is probably the occasion on which I'd do it.  A running JVM
is necessarily going to use some memory, and that is memory use that
you won't be able to factor out properly when developing models of
your database system performance.

 I could eventually run it on a different server, much less powerfull, but 
 it's not on the same network, I guess this would be an issue. 

The power of the system is hard to know about in the context (with
only 8Go of memory, I don't consider this a powerful box at all,
note).  But why wouldn't it be on the same network?  You're using the
network stack anyway, note: JVMs can't go over domain sockets.

A

-- 
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Which hardware ?

2008-06-17 Thread Scott Marlowe
On Tue, Jun 17, 2008 at 9:32 AM, Greg Smith [EMAIL PROTECTED] wrote:

  Considering that Lionel's system seems pretty overpowered for what he's
 doing--runs plenty fast on a much slower system, enough RAM to hold a large
 portion of the primary tables and database, all batch updates that don't
 really need a good RAID setup--I'd say looks good here and recommend he
 just follow the plan he outlined.  Just watch the system with top for a bit
 under load to make sure the Java processes are staying under control.

In the original post he mentioned that he had 5 years of data at about
25G / year.

With 125G of data, it's likely that if most queries are on recent data
it'll be in RAM, but anything that hits older data will NOT have that
luxury.  Which is why I recommended RAID-10.  It doesn't have to be on
a $1200 card with 44 disks or something, but even 4 disks in a sw
RAID-10 will be noticeably faster (about 2x) than a simple RAID-1 at
hitting that old data.

We had a reporting server with about 80G of data on a machine with 4G
ram last place I worked, and it could take it a few extra seconds to
hit the old data, but the SW RAID-10 on it made it much faster at
reporting than it would have been with a single disk.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Which hardware ?

2008-06-17 Thread Scott Marlowe
On Tue, Jun 17, 2008 at 9:42 AM, Andrew Sullivan [EMAIL PROTECTED] wrote:
 On Tue, Jun 17, 2008 at 04:49:17PM +0200, Lionel wrote:
 My tomcat webapp is well coded  and consumes nearly nothing.

 If I were ever inclined to say, Nonsense, about code I've never
 seen, this is probably the occasion on which I'd do it.  A running JVM
 is necessarily going to use some memory, and that is memory use that
 you won't be able to factor out properly when developing models of
 your database system performance.

But if that amount of memory is 256 Megs and it only ever acts as a
control panel or data access point, it's probably not a huge issue.
If it's 2 Gig it's another issue.  It's all about scale.  The real
performance hog for me on all in one boxes has been perl / fastcgi
setups.

 The power of the system is hard to know about in the context (with
 only 8Go of memory, I don't consider this a powerful box at all,
 note).

I always think of main memory in terms of how high a cache hit rate it
can get me.  If 8G gets you a 50% hit rate, and 16G gets you a 95% hit
rate, then 16G is the way to go.  But if 8G gets you to 75% and 32G
gets you to 79% because of your usage patterns (the world isn't always
bell curve shaped) then 8G is plenty and it's time to work on faster
disk subsystems if you need more performance.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Which hardware ?

2008-06-17 Thread Greg Smith

On Tue, 17 Jun 2008, Andrew Sullivan wrote:

A running JVM is necessarily going to use some memory, and that is 
memory use that you won't be able to factor out properly when developing 
models of your database system performance.


Now you've wandered into pure FUD.  Tuning maximum memory usage on a Java 
app so you can model it is straightforward (albeit a little confusing at 
first), and in most cases you can just sample it periodically to get a 
good enough estimate for database tuning purposes.  JVMs let you adjust 
maximum memory use with -Xmx , and if anything the bigger problem I run 
into is that using too much memory hits that limit and crashes Java long 
before it becomes a hazard to the database.


This is a system with 8GB of RAM here; having some Tomcat instances 
co-existing with the database when there's that much room to work is not 
that hard.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Which hardware ?

2008-06-17 Thread Greg Smith

On Tue, 17 Jun 2008, Scott Marlowe wrote:


We had a reporting server with about 80G of data on a machine with 4G
ram last place I worked, and it could take it a few extra seconds to
hit the old data, but the SW RAID-10 on it made it much faster at
reporting than it would have been with a single disk.


I agree with your statement above, that query time could likely be dropped 
a few seconds with a better disk setup.  I just question whether that's 
necessary given the performance target here.


Right now the app is running on an underpowered Windows box and is 
returning results in around 10s, on a sample data set that sounds like 1/8 
of a year worth of data (1/40 of the total).  It is seemingly CPU bound 
with not enough processor to handle concurrent queries being the source of 
the worst-case behavior.  The target is keeping that 30s on more powerful 
hardware, with at least 6X as much processor power and a more efficient 
OS, while using yearly partitions to keep the amount of data to juggle at 
once under control.  That seems reasonable to me, and while better disks 
would be nice I don't see any evidence they're really needed here.  This 
application sounds a batch processing/reporting one where plus or minus a 
few seconds doesn't have a lot of business value.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Which hardware ?

2008-06-17 Thread Scott Marlowe
On Tue, Jun 17, 2008 at 10:56 AM, Greg Smith [EMAIL PROTECTED] wrote:
 On Tue, 17 Jun 2008, Scott Marlowe wrote:

 We had a reporting server with about 80G of data on a machine with 4G
 ram last place I worked, and it could take it a few extra seconds to
 hit the old data, but the SW RAID-10 on it made it much faster at
 reporting than it would have been with a single disk.

 I agree with your statement above, that query time could likely be dropped a
 few seconds with a better disk setup.  I just question whether that's
 necessary given the performance target here.

 Right now the app is running on an underpowered Windows box and is returning
 results in around 10s, on a sample data set that sounds like 1/8 of a year
 worth of data (1/40 of the total).  It is seemingly CPU bound with not
 enough processor to handle concurrent queries being the source of the
 worst-case behavior.  The target is keeping that 30s on more powerful
 hardware, with at least 6X as much processor power and a more efficient OS,
 while using yearly partitions to keep the amount of data to juggle at once
 under control.  That seems reasonable to me, and while better disks would be
 nice I don't see any evidence they're really needed here.  This application
 sounds a batch processing/reporting one where plus or minus a few seconds
 doesn't have a lot of business value.

I think you're making a big assumption that this is CPU bound.  And it
may be that when all the queries are operating on current data that it
is.  But as soon as a few ugly queries fire that need to read tens of
gigs of data off the drives, then you'll start to switch to I/O bound
and the system will slow a lot.

We had a single drive box doing work on an 80G set that was just fine
with the most recent bits.  Until I ran a report that ran across the
last year instead of the last two days, and took 2 hours to run.

All the queries that had run really quickly on all the recent data
suddenly were going from 1 or 2 seconds to 2 or 3 minutes.  And I'd
have to kill my reporting query.

Moved it to the same exact hardware but with a 4 disc RAID-10 and the
little queries stayed 1-2 seconds while th reporting queries were cut
down by factors of about 4 to 10.  RAID-1 will be somewhere between
them I'd imagine.  RAID-10 has an amazing ability to handle parallel
accesses without falling over performance-wise.

You're absolutely right though, we really need to know the value of
fast performance here.

If you're monitoring industrial systems you need fast enough response
to spot problems before they escalate to disasters.

If you're running aggregations of numbers used for filling out
quarterly reports, not so much.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Which hardware ?

2008-06-17 Thread Scott Marlowe
On Tue, Jun 17, 2008 at 11:59 AM, Lionel [EMAIL PROTECTED] wrote:
 Scott Marlowe wrote:
 You're absolutely right though, we really need to know the value of
 fast performance here.

 the main problem is that my customers are used to have their reporting after
 few seconds.
 They want do have 10 times more data but still have the same speed, which
 is, I think, quite impossible.

 If you're running aggregations of numbers used for filling out
 quarterly reports, not so much.

 The application is used to analyse products sales behaviour, display charts,
 perform comparisons, study progression...
 10-40 seconds seems to be a quite good performance.
 More than 1 minute will be too slow (meaning they won't pay for that).

 I did some test with a 20 millions lines database on a single disk dual core
 2GB win XP system (default postgresql config), most of the time is spent in
 I/O: 50-100 secs for statements that scan 6 millions of lines, which will
 happen. Almost no CPU activity.

 So here is the next question: 4 disks RAID10 (did not find a french web host
 yet) or 5 disk RAID5 (found at 600euros/month) ?
 I don't want to have any RAID issue...
 I did not have any problem with my basic RAID1 since many years, and don't
 want that to change.

Do you have root access on your servers?  then just ask for 5 disks
with one holding the OS / Apps and you'll do the rest.  Software RAID
is probably a good fit for cheap right now.

If you can set it up yourself, you might be best off with 2 disk
RAID-1.  5 750G disks in a RAID-1 yields 750G of storage (duh) but
allows for five different readers to operate without the heads having
to seek.  large amounts of data can be read at a medium speed from a
RAID-1 like this.  But most RAID implementations don't aggregate
bandwidth for RAID-1.

They do for RAID-0.  So, having a huge RAID-0 zero array allows for
reading a large chunk of data really fast from all disks at once.

RAID1+0 gives you the ability to tune this in either direction.  But
the standard config of a 4 disk setup (striping two mirrors, each made
from two disks, is a good compromise to start with.  Average read
speed of array is doubled, and the ability to have two reads not
conflict helps too.

RAID5 is a comproise to provide the most storage while having mediocre
performance or, when degraded, horrifficaly poor performance.

Hard drives are cheap, hosting not as much.

Also, always look at optimizing their queries.  A lot of analysis is
done by brute force queries that rewritten intelligently suddenly run
in minutes not hours.  or seconds not minutes.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Which hardware ?

2008-06-17 Thread Greg Smith

On Tue, 17 Jun 2008, Lionel wrote:


I did some test with a 20 millions lines database on a single disk dual core
2GB win XP system (default postgresql config), most of the time is spent in
I/O: 50-100 secs for statements that scan 6 millions of lines, which will
happen. Almost no CPU activity.


I hope you're aware that the default config is awful, and there are all 
sorts of possible causes for heavy I/O churn that might improve if you 
setup the postgresql.conf file to use the server's resources more 
aggressively (the default is setup for machines with a very small amount 
of RAM).  There are lots of links to articles that cover the various areas 
you might improve at 
http://wiki.postgresql.org/wiki/Performance_Optimization


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Which hardware ?

2008-06-17 Thread Jon D



 From: [EMAIL PROTECTED]
 Subject: [PERFORM] Which hardware ?
 Date: Tue, 17 Jun 2008 15:38:59 +0200
 To: pgsql-performance@postgresql.org
 
 Hi,
 
 I need to install a 8.3 database and was wondering which hardware would be 
 sufficient to have good performances (less than 30s for² slowest select).
 
 Database size: 25 Go /year, 5 years of history
 One main table containing 40 million lines per year.
 Batch inserts of 10 lines. Very very few deletes, few updates.
 
 30 other tables, 4 levels of hierarchy, containing from 10 lines up to 2 
 lines.
 5 of them have forein keys on the main table.
 
 I will use table partitionning on the year column.
 
 Statements will mainly do sums on the main table, grouped by whatever column 
 of the database (3-5 joined tables, or join on join), with some criterions 
 that may vary, lots of joined varchar in ('a','b',...,'z').
 It's almost impossible to predict what users will do via the webapplication 
 that queries this database: almost all select, join, group by, where... 
 possibilities are available.
 
 Up to 4 simultaneous users.
 
 I'm planning to host it on a quad xeon 2.66Ghz with 8Go of DDR2, and a dual 
 (RAID1) SATA2 750Go HD.
 Perharps with another HD for indexes.
 
 Do you think it will be enough ?
 Is another RAID for better performances a minimum requirement ?
 Will a secondary HD for indexes help ?
 
 Which OS would you use ? (knowing that there will be a JDK 1.6 installed 
 too)
 
 With 5 millions of lines, the same application runs quite fast on windows 
 2000 on a single P4 2.8 GHz (very few statements last more than 10s, mostly 
 when concurrent statements are made). Each statement consumes 100% of the 
 CPU.
 
 
 thanks for advices.
 
 
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance


I think hardware isnt going to solve your problem,  especially the cpu.  You 
only have four users..  and postgres can only use 1 core per query.   If you 
have sequential scans that span this table and say it has 60-80 million rows,  
It can could take longer then 30 seconds.  Even if you have alot of ram.  Just 
imagine what postgres is doing... if its target search is going to end in 
searching 40 million rows and it has to aggregate on two,  or three columns its 
going to be slow.  No amount of hardware is going to fix this.   Sure you can 
gain some speed by having entire tables in ram.   No magic bullet here.   Disk 
is definitely not a magic bullet.Even if you have a bunch of fast disks its 
still much slower then RAM in performing reads.So if you read heavy then 
adding more disk isnt going to just solve all your problems.   RAM is nice.   
The more pages you can keep in ram the less reading from the disk.  

Even with that all said and done...   aggregating lots of rows takes time.I 
suggest you come up with a system from preaggregating your data if possible.  
Identify all of your target dimensions.   If your lucky,  you only have a few 
key dimensions which can reduce size of table by lots and reduce queries to 1-2 
seconds.   There are a number of ways to tackle this,  but postgres is a nice 
db to do this with, since writers do not block readers.   

I think you should focus on getting this system to work well with minimal 
hardware first.  Then you can upgrade.   Over the next few years the db is only 
going to get larger.   You have 4 users now.. but who's to say what it will 
evolve into.
_
Earn cashback on your purchases with Live Search - the search that pays you 
back!
http://search.live.com/cashback/?pkw=form=MIJAAF/publ=HMTGL/crea=earncashback
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance