Re: [PERFORM] hardware performance and some more

2003-07-25 Thread Ron Johnson
On Fri, 2003-07-25 at 11:13, Josh Berkus wrote:
> Folks,
> 
> > Since PG doesn't have active-active clustering, that's out, but since
> > the database will be very static, why not have, say 8 machines, each
> > with its own copy of the database?  (Since there are so few updates,
> > you feed the updates to a little Perl app that then makes the changes
> > on each machine.)  (A round-robin load balancer would do the trick
> > in utilizing them all.)
> 
> Another approach I've seen work is to have several servers connect to one SAN 
> or NAS where the data lives.  Only one server is enabled to handle "write" 
> requests; all the rest are read-only.  This does mean having dispatching 
> middleware that parcels out requests among the servers, but works very well 
> for the java-based company that's using it.

Wouldn't the cache on the read-only databases get out of sync with
the true on-disk data?

-- 
Ron Johnson, Jr.  Home: [EMAIL PROTECTED]
Jefferson, LA  USA

"I'm not a vegetarian because I love animals, I'm a vegetarian
 because I hate vegetables!"
    unknown



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] hardware performance and some more

2003-07-25 Thread Josh Berkus
Folks,

> Since PG doesn't have active-active clustering, that's out, but since
> the database will be very static, why not have, say 8 machines, each
> with its own copy of the database?  (Since there are so few updates,
> you feed the updates to a little Perl app that then makes the changes
> on each machine.)  (A round-robin load balancer would do the trick
> in utilizing them all.)

Another approach I've seen work is to have several servers connect to one SAN 
or NAS where the data lives.  Only one server is enabled to handle "write" 
requests; all the rest are read-only.  This does mean having dispatching 
middleware that parcels out requests among the servers, but it works very well 
for the Java-based company that's using it.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] hardware performance and some more

2003-07-25 Thread Shridhar Daithankar
On 25 Jul 2003 at 18:41, Kasim Oztoprak wrote:
> what exactly do you mean from a pilot program?

For example, get a quad-CPU box, load the data, and ask only 10 operators to test 
the system.

Beta testing, basically.

Bye
 Shridhar

--
The man on tops walks a lonely street; the "chain" of command is often a noose.




Re: [PERFORM] hardware performance and some more

2003-07-25 Thread Kasim Oztoprak
On 25 Jul 2003 17:13 EEST you wrote:

> On 25 Jul 2003 at 16:38, Kasim Oztoprak wrote:
> > this is kind of directory assistance application. actually the select statements 
> > are not
> > very complex. the database contain 25 million subscriber records and the operators 
> > searches 
> > for the subscriber numbers or addresses. there are not much update operations 
> > actually the 
> > update ratio is approximately %0.1 . 
> > 
> > i will use at least 4 machines each having 4 cpu with the speed of 2.8 ghz xeon 
> > processors.
> > and suitable memory capacity with it. 
> 
> Are you going to duplicate the data?
> 
> If you are going to have 3000 sql statements per second, I would suggest,
> 
> 1. Get quad CPU. You probably need that horsepower
> 2. Use prepared statements and stored procedures to avoid parsing overhead.
> 
> I doubt you would need cluster of machines though. If you run it thr. a pilot 
> program, that would give you an idea whether or not you need a cluster..
> 
> Bye
>  Shridhar
>

i will try to cluster them. i can duplicate the data if i need to. in the case of 
updates, i will then push them through to each copy. 

what exactly do you mean by a pilot program?

-kasım 


Re: [PERFORM] hardware performance and some more

2003-07-25 Thread Ron Johnson
On Fri, 2003-07-25 at 11:38, Kasim Oztoprak wrote:
> On 24 Jul 2003 23:25 EEST you wrote:
> 
> > On Thu, 2003-07-24 at 13:25, Kasim Oztoprak wrote:
> > > On 24 Jul 2003 17:08 EEST you wrote:
> > > 
> > > > On 24 Jul 2003 at 15:54, Kasim Oztoprak wrote:
> > [snip]
> > > 
> > > we do not have memory problem or disk problems. as I have seen in the list the 
> > > best way to 
> > > use disks are using raid 10 for data and raid 1 for os. we can put as much 
> > > memory as 
> > > we require. 
> > > 
> > > now the question, if we have 100 searches per second and in each search if we 
> > > need 30 sql
> > > instruction, what will be the performance of the system in the order of time. 
> > > Let us say
> > > we have two machines described aove in a cluster.
> > 
> > That's 3000 sql statements per second, 180 thousand per minute
> > What the heck is this database doing!
> > 
> > A quad-CPU Opteron sure is looking useful right about now...  Or
> > an quad-CPU AlphaServer ES45 running Linux, if 4x Opterons aren't
> > available.
> > 
> > How complicated are each of these SELECT statements?
> 
> this is kind of directory assistance application. actually the select statements are 
> not
> very complex. the database contain 25 million subscriber records and the operators 
> searches 
> for the subscriber numbers or addresses. there are not much update operations 
> actually the 
> update ratio is approximately %0.1 . 
> 
> i will use at least 4 machines each having 4 cpu with the speed of 2.8 ghz xeon 
> processors.
> and suitable memory capacity with it. 
> 
> i hope it will overcome with this problem. any similar implementation?

Since PG doesn't have active-active clustering, that's out, but since
the database will be very static, why not have, say, 8 machines, each
with its own copy of the database?  (Since there are so few updates,
you feed the updates to a little Perl app that then makes the changes
on each machine.)  (A round-robin load balancer would do the trick
in utilizing them all.)

Also, with lots of machines, you could get away with less expensive
machines, say 2GHz CPU, 1GB RAM and a 40GB IDE drive.  Then, if one
goes down for some reason, you've only lost a small portion of your
capacity, and replacing a part will be very inexpensive.

And if volume increases, just add more USD1000 machines...
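
A minimal sketch of the replicated-copies idea above, written in Python rather than the Perl mentioned in the post. The class name `ReplicaPool`, the method names, and the host names are all made up for illustration; a real version would open database connections instead of returning strings, and would need a plan for a copy that is down when an update arrives (otherwise the copies drift apart).

```python
from itertools import cycle


class ReplicaPool:
    """Round-robin reads across identical copies; every update goes to all copies."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self._next_reader = cycle(self.replicas)  # endless round-robin iterator

    def route_read(self):
        """Return the replica that should serve the next read-only query."""
        return next(self._next_reader)

    def fan_out(self, statement):
        """Pair an update statement with every replica it must be applied to."""
        return [(replica, statement) for replica in self.replicas]


# Hypothetical host names, purely for illustration.
pool = ReplicaPool(["db1", "db2", "db3"])
print([pool.route_read() for _ in range(4)])
print(pool.fan_out("UPDATE subscriber SET ..."))
```

With so few updates, applying each one to every copy is cheap; the read traffic is what gets divided among the machines.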



Re: [PERFORM] hardware performance and some more

2003-07-25 Thread Shridhar Daithankar
On 24 Jul 2003 at 9:42, William Yu wrote:

> As far as I can tell, the performance impact seems to be minimal. 
> There's a periodic storm of replication updates in cases where there's 
> mass updates sync last resync. But if you have mostly reads and few 
> writes, you shouldn't see this situation. The biggest performance impact 
> seems to be the CPU power needed to zip/unzip/encrypt/decrypt files.

Can you use WAL-based replication? I don't have a URL handy, but there are 
replication projects which transmit WAL files to another server when they fill 
up.

OTOH, I was thinking of a simple replication scheme. If postgresql provided a 
hook where it calls an external library routine for each heapinsert in WAL, 
there could be a simple multi-slave replication system. One wouldn't have to 
wait till the WAL file fills up.

Of course, it's up to the library to make sure that it does not hold up 
postgresql commits for so long that it would hamper performance.

There would also need to be a receiving hook which would directly heapinsert 
the data on another node.

But if the external library is threaded, will that work well with postgresql?

Just a thought. If it works, load-balancing could be a lot easier and 
near-realtime.


Bye
 Shridhar

--
We fight only when there is no other choice.  We prefer the ways of peaceful contact.
 -- Kirk, "Spectre of the Gun", stardate 4385.3




Re: [PERFORM] hardware performance and some more

2003-07-25 Thread Shridhar Daithankar
On 25 Jul 2003 at 16:38, Kasim Oztoprak wrote:
> this is kind of directory assistance application. actually the select
> statements are not very complex. the database contain 25 million subscriber
> records and the operators searches for the subscriber numbers or addresses.
> there are not much update operations actually the update ratio is
> approximately %0.1 .
> 
> i will use at least 4 machines each having 4 cpu with the speed of 2.8 ghz
> xeon processors. and suitable memory capacity with it.

Are you going to duplicate the data?

If you are going to have 3000 sql statements per second, I would suggest:

1. Get a quad-CPU machine. You will probably need that horsepower.
2. Use prepared statements and stored procedures to avoid parsing overhead.

I doubt you would need a cluster of machines, though. If you run it through a 
pilot program, that would give you an idea whether or not you need a cluster.

Bye
 Shridhar

--
Default, n.:  The hardware's, of course.




Re: [PERFORM] hardware performance and some more

2003-07-25 Thread Kasim Oztoprak
On 24 Jul 2003 23:25 EEST you wrote:

> On Thu, 2003-07-24 at 13:25, Kasim Oztoprak wrote:
> > On 24 Jul 2003 17:08 EEST you wrote:
> > 
> > > On 24 Jul 2003 at 15:54, Kasim Oztoprak wrote:
> [snip]
> > 
> > we do not have memory problem or disk problems. as I have seen in the list the 
> > best way to 
> > use disks are using raid 10 for data and raid 1 for os. we can put as much memory 
> > as 
> > we require. 
> > 
> > now the question, if we have 100 searches per second and in each search if we need 
> > 30 sql
> > instruction, what will be the performance of the system in the order of time. Let 
> > us say
> > we have two machines described aove in a cluster.
> 
> That's 3000 sql statements per second, 180 thousand per minute
> What the heck is this database doing!
> 
> A quad-CPU Opteron sure is looking useful right about now...  Or
> an quad-CPU AlphaServer ES45 running Linux, if 4x Opterons aren't
> available.
> 
> How complicated are each of these SELECT statements?

this is a kind of directory assistance application. the select statements are 
actually not very complex. the database contains 25 million subscriber records, 
and the operators search for subscriber numbers or addresses. there are not many 
update operations; the update ratio is approximately 0.1%. 

i will use at least 4 machines, each having four 2.8 ghz xeon processors and a 
suitable amount of memory. 

i hope that will overcome this problem. is there any similar implementation?



Re: [PERFORM] hardware performance and some more

2003-07-24 Thread Ron Johnson
On Thu, 2003-07-24 at 13:25, Kasim Oztoprak wrote:
> On 24 Jul 2003 17:08 EEST you wrote:
> 
> > On 24 Jul 2003 at 15:54, Kasim Oztoprak wrote:
[snip]
> 
> we do not have memory problem or disk problems. as I have seen in the list
> the best way to use disks are using raid 10 for data and raid 1 for os. we
> can put as much memory as we require.
> 
> now the question, if we have 100 searches per second and in each search if
> we need 30 sql instruction, what will be the performance of the system in
> the order of time. Let us say we have two machines described above in a
> cluster.

That's 3000 sql statements per second, 180 thousand per minute
What the heck is this database doing!
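
The arithmetic behind those figures, plus a rough per-statement budget, can be sketched as below. The search rate and statements-per-search come from the question; the 4-CPU count is taken from the quad-CPU boxes discussed in this thread, and treating the load as spread evenly over the CPUs is a simplifying assumption, not a measurement.

```python
searches_per_second = 100     # from the question
statements_per_search = 30    # from the question
cpus = 4                      # assumption: one quad-CPU machine

statements_per_second = searches_per_second * statements_per_search
statements_per_minute = statements_per_second * 60

# Average CPU time available per statement if the work spreads evenly.
budget_ms = cpus / statements_per_second * 1000

print(statements_per_second)   # 3000
print(statements_per_minute)   # 180000
print(round(budget_ms, 2))     # 1.33
```

So each statement has a little over a millisecond of CPU time on average, which is why the complexity of the SELECTs matters.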

A quad-CPU Opteron sure is looking useful right about now...  Or
a quad-CPU AlphaServer ES45 running Linux, if 4x Opterons aren't
available.

How complicated is each of these SELECT statements?



Re: [PERFORM] hardware performance and some more

2003-07-24 Thread William Yu
| first of all I would like to learn that, any of you use the postgresql
| within the clustered environment? Or, let me ask you the question, in
| different manner, can we use postgresql in a cluster environment? If
| we can do what is the support method of the postgresql for clusters?
You could do active-active but it would require work on your end. I did 
a recent check on all the Postgres replication packages and they all 
seem to be single master -> single/many slaves. Updating on more than 1 
server looks to be problematic. I run an active-active now but I had to 
develop my own custom replication strategy.

As a background, we develop & host web-based apps that use Postgres as 
the DB engine. Since our clients access our server over the internet, 
uptime is a big issue. Hence, we have two server farms: one colocated in 
San Francisco and the other in Sterling, VA. In addition to redundancy, 
we also wanted to spread the load across the servers. To do this, we 
went with the expedient method of 1-minute DNS zonemaps where, if both 
servers are up, 70% of traffic is sent to the faster farm and 30% to the 
other. Both servers are constantly monitored and if one goes down, a new 
zonemap is pushed out listing only the servers that are up.
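
The weighting logic described above might look roughly like this. It is only a sketch of the share calculation: the function name `zonemap`, the farm labels, and the idea of expressing the map as fractions are illustrative assumptions, and the step that actually publishes DNS records is left out.

```python
def zonemap(weights, healthy):
    """Return each healthy farm's share of traffic, renormalised to sum to 1."""
    live = {farm: w for farm, w in weights.items() if farm in healthy}
    total = sum(live.values())
    if total == 0:
        return {}  # nothing is up, so there is no valid map to publish
    return {farm: w / total for farm, w in live.items()}


# Placeholder labels; the post does not say which site is the faster one.
WEIGHTS = {"fast_farm": 70, "slow_farm": 30}

print(zonemap(WEIGHTS, {"fast_farm", "slow_farm"}))  # {'fast_farm': 0.7, 'slow_farm': 0.3}
print(zonemap(WEIGHTS, {"slow_farm"}))               # {'slow_farm': 1.0}
```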

The first step in making this work was converting all integer keys to 
character keys. By making keys into characters, we could prepend a 
server location code so ID 100 generated at SF would not conflict with 
ID 100 generated in Sterling. Instead, they would be marked as S0100 
and V0100. Another benefit is the increase of possible key 
combinations by being able to use alpha characters. (36^(n-1) versus 10^n)
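
To make the key scheme concrete, here is a small sketch. `make_key` and `keyspace` are hypothetical helper names, the width of 5 is chosen only to reproduce the S0100/V0100 example, and the sequence part is kept decimal as in that example.

```python
def make_key(site_code, seq, width=5):
    """Build a fixed-width key: one site letter followed by a zero-padded sequence."""
    body = str(seq).zfill(width - 1)
    if len(body) > width - 1:
        raise ValueError("sequence number does not fit in the key width")
    return site_code + body


def keyspace(width):
    """Compare key counts: site letter + alphanumeric body vs. all-digit key."""
    return 36 ** (width - 1), 10 ** width


print(make_key("S", 100))   # S0100
print(make_key("V", 100))   # V0100
print(keyspace(5))          # (1679616, 100000)
```

Note that the alphanumeric form only comes out ahead once the key is at least 3 characters wide (36^2 = 1296 versus 10^3 = 1000).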

At this time, the method we use is a periodic sweep of all updated 
records. In every table, we add extra fields to mark the date/time the 
record was last inserted/updated/deleted. All records touched since the 
last resync are extracted, zipped up, pgp-encrypted and then posted on 
an ftp server. Files are then transferred between servers, records 
unpacked and inserted/updated. Some checks are needed to determine what 
takes precedence if users updated the same record on both servers but 
otherwise it's a straightforward process.
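
A stripped-down sketch of the sweep described above, using Python's built-in `sqlite3` as a stand-in for the two Postgres servers so that it runs anywhere. The table, column names, and integer timestamps are invented for illustration, the zip/pgp/ftp transport is omitted, deletes are not handled, and "newest timestamp wins" is just one possible precedence rule, not necessarily the one used in the setup described.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE subscriber (id TEXT PRIMARY KEY, name TEXT, modified_at INTEGER)"
    )
    return db


def changed_since(db, last_sync):
    """Extract rows touched after the previous resync."""
    return db.execute(
        "SELECT id, name, modified_at FROM subscriber WHERE modified_at > ?",
        (last_sync,),
    ).fetchall()


def apply_changes(db, rows):
    """Insert unknown rows; update known rows only if the incoming copy is newer."""
    for row_id, name, modified_at in rows:
        current = db.execute(
            "SELECT modified_at FROM subscriber WHERE id = ?", (row_id,)
        ).fetchone()
        if current is None:
            db.execute(
                "INSERT INTO subscriber VALUES (?, ?, ?)", (row_id, name, modified_at)
            )
        elif modified_at > current[0]:
            db.execute(
                "UPDATE subscriber SET name = ?, modified_at = ? WHERE id = ?",
                (name, modified_at, row_id),
            )
    db.commit()


sf, va = make_db(), make_db()
sf.execute("INSERT INTO subscriber VALUES ('S0100', 'Alice', 10)")
sf.execute("INSERT INTO subscriber VALUES ('S0101', 'Bob', 25)")
va.execute("INSERT INTO subscriber VALUES ('S0100', 'Alice', 10)")

apply_changes(va, changed_since(sf, last_sync=20))
print(va.execute("SELECT id, name FROM subscriber ORDER BY id").fetchall())
```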

As far as I can tell, the performance impact seems to be minimal. 
There's a periodic storm of replication updates in cases where there have 
been mass updates since the last resync. But if you have mostly reads and few 
writes, you shouldn't see this situation. The biggest performance impact 
seems to be the CPU power needed to zip/unzip/encrypt/decrypt files.

I'm thinking over strats to get more "real-time" replication working. I 
suppose I could just make the resync program run more often but that's a 
bit inelegant. Perhaps I could capture every update/delete/insert/alter 
statement from the postgres logs, parsing them out to commands and then 
zipping/encrypting every command as a separate item to be processed. Or 
add triggers to every table where updated records are pushed to a custom 
"updated log".

The biggest problem is of course locks -- especially at the application 
level. I'm still thinking over what to do here.
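
The trigger idea mentioned above (pushing every change into a custom "updated log") can be illustrated as follows. This uses SQLite via Python's `sqlite3` purely as a runnable stand-in: the trigger syntax here is SQLite's, whereas in PostgreSQL the trigger would call a separately defined trigger function. Table and trigger names are invented.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE subscriber (id TEXT PRIMARY KEY, name TEXT);

-- One row per change, in the order the changes happened.
CREATE TABLE updated_log (
    seq    INTEGER PRIMARY KEY AUTOINCREMENT,
    op     TEXT,
    row_id TEXT
);

CREATE TRIGGER subscriber_ins AFTER INSERT ON subscriber
BEGIN
    INSERT INTO updated_log (op, row_id) VALUES ('I', NEW.id);
END;

CREATE TRIGGER subscriber_upd AFTER UPDATE ON subscriber
BEGIN
    INSERT INTO updated_log (op, row_id) VALUES ('U', NEW.id);
END;

CREATE TRIGGER subscriber_del AFTER DELETE ON subscriber
BEGIN
    INSERT INTO updated_log (op, row_id) VALUES ('D', OLD.id);
END;
""")

db.execute("INSERT INTO subscriber VALUES ('S0100', 'Alice')")
db.execute("UPDATE subscriber SET name = 'Alicia' WHERE id = 'S0100'")
db.execute("DELETE FROM subscriber WHERE id = 'S0100'")

log = db.execute("SELECT op, row_id FROM updated_log ORDER BY seq").fetchall()
print(log)
```

A resync job could then ship only the rows named in the log instead of sweeping every table, at the cost of extra write work on each change.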



Re: [PERFORM] hardware performance and some more

2003-07-24 Thread Kasim Oztoprak
On 24 Jul 2003 18:44 EEST you wrote:

> > Now, the second question is related to the performance of the database. Assuming 
> > we have a
> > dell's poweredge 6650 with 4 x 2.8 Ghz Xeon processors having 2 MB of cache for 
> > each, with the
> > main memory of lets say 32 GB. We can either use a small SAN from EMC or we can 
> > put all disks
> > into the machines with the required raid confiuration.
> >
> > We will install RedHat Advanced Server 2.1 to the machine as the operating system 
> > and postgresql as
> > the database server. We have a database having 25 millions records  having the 
> > length of 250 bytes
> > on average for each record. And there are 1000 operators accessing the database 
> > concurrently. The main
> > operation on the database (about 95%) is select rather than insert, so do you have 
> > any idea about
> > the performance of the system?
> 
> I have a very similar installation: Dell PE6600 with dual 2.0 Xeons/2MB cache, 4 GB 
> memory, 6-disk RAID-10 for data, 2-disk RAID-1 for RH Linux 8.  My database has over 
> 60 million records averaging  200 bytes per tuple.  I have a large nightly data 
> load, then very complex multi-table join queries all day with a few INSERT 
> transactions.  While I do not have 1000 concurrent users (more like 30 for me), my 
> processors and disks seem to be idle the vast majority of the time - this machine is 
> overkill.  So I think you will have no problem with your hardware, and could 
> probably easily get away with only two processors.  Someday, if you can determine 
> with certainty that the CPU is a bottleneck, drop in the 3rd and 4th processors (and 
> $10,000).   And save yourself money on the RAM as well - it's incredibly easy to put 
> in more if you need it.  If you really want to spend money, set up the fastest disk 
> arrays you can imagine.
>  

i have some time before production, so i can wait for the beta and the production 
release of version 7.4. as i have seen from your comments, you have 30 clients 
reaching the database. assuming the maximum number of searches for each client is 5, 
there will be at most 3 searches per second. in my case, there will be around 
100 searches per second, so the main bottleneck comes from there. 

and finally, the rate of insert operations is about 0.1% (1 in every thousand). 
i started to learn about my limitations a few days ago, and i would like to learn 
whether i can solve my problem with postgresql or not. 

> I cannot emphasize enough: allocate a big chunk of time for tuning your database and 
> learning from this list.  I migrated from Microsoft SQL Server.  Out of the box 
> PostgreSQL was horrible for me, and even after significant tuning it crawled on 
> certain queries (compared to MSSQL).  The list helped me find a data type mismatch 
> in a JOIN clause, and since then the performance of PostgreSQL has blown the doors 
> off of MSSQL.  Since I only gave myself a couple days to do tuning before the db had 
> to go in production, I almost had to abandon PostgreSQL and revert to MS.  My 
> problems were solved in the nick of time, but I really wish I had made more time for 
> tuning.  
>  
> Running strong in production for 7 months now with PostgreSQL 7.3, and eagerly 
> awaiting 7.4!
>  
> Roman Fail
> POS Portal, Inc.
>  


Re: [PERFORM] hardware performance and some more

2003-07-24 Thread Roman Fail
> Now, the second question is related to the performance of the database.
> Assuming we have a dell's poweredge 6650 with 4 x 2.8 Ghz Xeon processors
> having 2 MB of cache for each, with the main memory of lets say 32 GB. We
> can either use a small SAN from EMC or we can put all disks into the
> machines with the required raid configuration.
>
> We will install RedHat Advanced Server 2.1 to the machine as the operating
> system and postgresql as the database server. We have a database having 25
> millions records  having the length of 250 bytes on average for each record.
> And there are 1000 operators accessing the database concurrently. The main
> operation on the database (about 95%) is select rather than insert, so do
> you have any idea about the performance of the system?

I have a very similar installation: Dell PE6600 with dual 2.0 Xeons/2MB cache, 4 GB 
memory, 6-disk RAID-10 for data, 2-disk RAID-1 for RH Linux 8.  My database has over 
60 million records averaging  200 bytes per tuple.  I have a large nightly data load, 
then very complex multi-table join queries all day with a few INSERT transactions.  
While I do not have 1000 concurrent users (more like 30 for me), my processors and 
disks seem to be idle the vast majority of the time - this machine is overkill.  So I 
think you will have no problem with your hardware, and could probably easily get away 
with only two processors.  Someday, if you can determine with certainty that the CPU 
is a bottleneck, drop in the 3rd and 4th processors (and $10,000).   And save yourself 
money on the RAM as well - it's incredibly easy to put in more if you need it.  If you 
really want to spend money, set up the fastest disk arrays you can imagine.
 
I cannot emphasize enough: allocate a big chunk of time for tuning your database and 
learning from this list.  I migrated from Microsoft SQL Server.  Out of the box 
PostgreSQL was horrible for me, and even after significant tuning it crawled on 
certain queries (compared to MSSQL).  The list helped me find a data type mismatch in 
a JOIN clause, and since then the performance of PostgreSQL has blown the doors off of 
MSSQL.  Since I only gave myself a couple days to do tuning before the db had to go in 
production, I almost had to abandon PostgreSQL and revert to MS.  My problems were 
solved in the nick of time, but I really wish I had made more time for tuning.  
 
Running strong in production for 7 months now with PostgreSQL 7.3, and eagerly 
awaiting 7.4!
 
Roman Fail
POS Portal, Inc.
 
 
 
 
 



Re: [PERFORM] hardware performance and some more

2003-07-24 Thread Kasim Oztoprak
On 24 Jul 2003 17:08 EEST you wrote:

> On 24 Jul 2003 at 15:54, Kasim Oztoprak wrote:
> 
> > The questions for this explanation are:
> >   1 - Can we use postgresql within clustered environment?
> >   2 - if the answer is yes, in which method can we use postgresql within a 
> > cluster?
> >   active - passive or active - active?
> 
> Coupled with linux-HA( See http://linux-ha.org) heartbeat service, it *should* 
> be possible to run postgresql in active-passive clustering.
> 
> If postgresql supported read-only database so that several nodes could read off 
> a single disk but only one could update that, a sort of active-active should be 
> possible as well. But postgresql can not have a read only database. That would 
> be a handy addition in such cases..
>  

so in a master/slave configuration we can use the system within a clustered 
environment. 

> > Now, the second question is related to the performance of the database. Assuming 
> > we have a 
> > dell's poweredge 6650 with 4 x 2.8 Ghz Xeon processors having 2 MB of cache for 
> > each, with the 
> > main memory of lets say 32 GB. We can either use a small SAN from EMC or we can 
> > put all disks 
> > into the machines with the required raid confiuration.
> > 
> > We will install RedHat Advanced Server 2.1 to the machine as the operating system 
> > and postgresql as 
> > the database server. We have a database having 25 millions records  having the 
> > length of 250 bytes 
> > on average for each record. And there are 1000 operators accessing the database 
> > concurrently. The main 
> > operation on the database (about 95%) is select rather than insert, so do you have 
> > any idea about 
> > the performance of the system? 
> 
> Assuming 325 bytes per tuple (250 bytes field + 24-28 byte header + varchar fields) 
> gives 25 tuples per 8K page, there would be 8GB of data. This configuration 
> could fly with 12-16GB of RAM. After all data is read, that is. You can cut down 
> on other requirements as well. Maybe a 2x opteron with 16GB RAM might be a 
> better fit, but check out how much CPU cache it has.

we do not have memory or disk problems. as i have seen on the list, the best way 
to use disks is raid 10 for data and raid 1 for the os. we can put in as much 
memory as we require. 

now the question: if we have 100 searches per second and each search needs 30 sql 
statements, what will the performance of the system be in terms of time? let us 
say we have two of the machines described above in a cluster.



> 
> A grep -rwn across data directory would fill the disk cache pretty well..:-)
> 
> HTH
> 
> Bye
>  Shridhar
> 


Re: [PERFORM] hardware performance and some more

2003-07-24 Thread Shridhar Daithankar
On 24 Jul 2003 at 15:54, Kasim Oztoprak wrote:

> The questions for this explanation are:
>   1 - Can we use postgresql within clustered environment?
>   2 - if the answer is yes, in which method can we use postgresql within a 
> cluster?
>   active - passive or active - active?

Coupled with the linux-HA (see http://linux-ha.org) heartbeat service, it *should* 
be possible to run postgresql in active-passive clustering.

If postgresql supported a read-only database so that several nodes could read off 
a single disk but only one could update it, a sort of active-active should be 
possible as well. But postgresql cannot have a read-only database. That would 
be a handy addition in such cases.
 
> Now, the second question is related to the performance of the database.
> Assuming we have a dell's poweredge 6650 with 4 x 2.8 Ghz Xeon processors
> having 2 MB of cache for each, with the main memory of lets say 32 GB. We
> can either use a small SAN from EMC or we can put all disks into the
> machines with the required raid configuration.
> 
> We will install RedHat Advanced Server 2.1 to the machine as the operating
> system and postgresql as the database server. We have a database having 25
> millions records  having the length of 250 bytes on average for each record.
> And there are 1000 operators accessing the database concurrently. The main
> operation on the database (about 95%) is select rather than insert, so do
> you have any idea about the performance of the system?

Assuming 325 bytes per tuple (250 bytes field + 24-28 byte header + varchar fields) 
gives 25 tuples per 8K page, there would be 8GB of data. This configuration 
could fly with 12-16GB of RAM. After all data is read, that is. You can cut down 
on other requirements as well. Maybe a 2x opteron with 16GB RAM might be a 
better fit, but check out how much CPU cache it has.

A grep -rwn across data directory would fill the disk cache pretty well..:-)
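
The sizing estimate above can be reproduced with a few lines of arithmetic. The 325-byte tuple size and the 25 million rows come from this thread; 8192 bytes is PostgreSQL's default page size. The sketch ignores the per-page header and per-row line pointers as well as all indexes, so treat the result as a rough lower bound rather than a measured size.

```python
PAGE_SIZE = 8192          # default PostgreSQL block size, in bytes
TUPLE_BYTES = 325         # assumed average row size, from the estimate above
ROWS = 25_000_000         # subscriber records

tuples_per_page = PAGE_SIZE // TUPLE_BYTES    # whole tuples that fit in one page
pages = -(-ROWS // tuples_per_page)           # ceiling division
table_bytes = pages * PAGE_SIZE

print(tuples_per_page)                # 25
print(pages)                          # 1000000
print(round(table_bytes / 1e9, 2))    # 8.19
```

That is where the "8GB of data" figure comes from, and why 12-16GB of RAM would let the whole table sit in cache with room left for indexes.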

HTH

Bye
 Shridhar

--
Egotism, n: Doing the New York Times crossword puzzle with a pen.
Egotist, n: A person of low taste, more interested in himself than me.
        -- Ambrose Bierce, "The Devil's Dictionary"




Re: [PERFORM] Hardware performance

2003-07-18 Thread Magnus Hagander
> > >Adam Witney wrote:
> [snip]
> > If you would go with that one, make sure to get the optional BBWC
> > (Battery Backed Write Cache). Without it the controller won't enable
> > the write-back cache (which it really shouldn't, since it wouldn't be
> > safe without the batteries). WB cache can really speed things up in
> > many db situations - it's sort of like "speed of fsync off, security
> > of fsync on". I've seen huge speedups with both postgresql and other
> > databases on that.
> 
> Don't forget to check the batteries!!!  And if you have an HPaq service
> contract, don't rely on them to do it...

That's what management software is for.. :-) (Yes, it does check the
batteries. They are also reported on reboot, but you don't want to do
that often, of course)

Under the service contract, HP will *replace* the batteries for free,
though - but you have to know when to replace them.

//Magnus



Re: [PERFORM] Hardware performance

2003-07-18 Thread Ron Johnson
On Thu, 2003-07-17 at 13:55, Magnus Hagander wrote:
> >Adam Witney wrote:
[snip]
> If you would go with that one, make sure to get the optional BBWC
> (Battery Backed Write Cache). Without it the controller won't enable the
> write-back cache (which it really shouldn't, since it wouldn't be safe
> without the batteries). WB cache can really speed things on in many db
> situations - it's sort of like "speed of fsync off, security of fsync
> on". I've seen huge speedups with both postgresql and other databases on
> that.

Don't forget to check the batteries!!!  And if you have an HPaq service
contract, don't rely on them to do it...



Re: [PERFORM] Hardware performance

2003-07-18 Thread Ron Johnson
On Wed, 2003-07-16 at 23:25, Roman Fail wrote:
[snip] 
> has every bit of redundancy you can order.  While uncommon, the
> backplane is one of the many single points of failure!

Unless you go with a shared-disk cluster (Oracle 9iRAC or OpenVMS)
or replication.

Face it, if your pockets are deep enough, you can make everything
redundant and burden-sharing (i.e., not just waiting for the master
system to die).  (And with some enterprise FC controllers, you can
mirror the disks many kilometers away.)



Re: [PERFORM] Hardware performance

2003-07-17 Thread Robert Creager
On Thu, 17 Jul 2003 16:20:42 +0100
Adam Witney <[EMAIL PROTECTED]> said something like:

> 
> Actually I am going through the same questions myself at the
> moment I would like to have a 2 disk RAID1 and a 4 disk RAID5, so
> need at least 6 disks
> 
> Anybody have any suggestions or experience with other hardware
> manufacturers for this size of setup? (2U rack, up to 6 disks, 2
> processors, ~2GB RAM, if possible)
> 

We recently bought a couple of Compaq ProLiant DL380 units.  They are
2U, and support 6 disks, 2 CPUs, and a maximum of 12GB of RAM.

We purchased 2 units, each with 1 CPU, 4x72GB disks in RAID 0+1, 1GB of
memory, and redundant fans and power supplies, for around $11,000 total.
Unfortunately they are running Win2K with SQLAnywhere (ClearQuest/Web
server) ;-)  So far (5 months), they're real bored...

Cheers,
Rob

-- 
 21:16:04 up  1:19,  1 user,  load average: 2.04, 1.99, 1.38




Re: [PERFORM] Hardware performance

2003-07-17 Thread Magnus Hagander
>Adam Witney wrote:
>> Actually I am going through the same questions myself at the moment I
>> would like to have a 2 disk RAID1 and a 4 disk RAID5, so need at least 6
>> disks
>>
>> Anybody have any suggestions or experience with other hardware manufacturers
>> for this size of setup? (2U rack, up to 6 disks, 2 processors, ~2GB RAM, if
>> possible)
>
>I tend to use either 1U or 4U servers, depending on the application. But
>I've had good experiences with IBM recently, and a quick look on their
>site shows the x345 with these specs:
>
>*  2U, 2-way server delivers extreme performance and availability for
>demanding applications
>*  Up to 2 Intel Xeon processors up to 3.06GHz with 533MHz front-side
>bus speed for outstanding performance
>*  Features up to 8GB of DDR memory, 5 PCI (4 PCI-X) slots and up to 6
>hard disk drives for robust expansion
>*  Hot-swap redundant cooling, power and hard disk drives for high
>availability
>*  Integrated dual Ultra320 SCSI with RAID-1 for data protection
>
>This may not wrap well, but here is the url:
>http://www-132.ibm.com/webapp/wcs/stores/servlet/CategoryDisplay?catalogId=-840&storeId=1&categoryId=2559454&langId=-1&dualCurrId=73
>
>Handles 6 drives; maybe that fits the bill?

[naturally, there should be one for each of the major server vendors,
eh?]

I've used mainly HP (as in former Compaq) machines here, with nothing
but good experience.  HP's machine in the same class is the DL380G3.
Almost identical specs to the IBM (I'd expect all major vendors have
fairly similar machines). Holds 12GB RAM. Only 3 PCI-X slots (2 of them
hotplug). RPS. 6 disk slots (Ultra-320) that can be put on one or two
SCSI chains (builtin RAID controller only handles a single channel,
though, so you'd need an extra SmartArray controller if you want to
split them). RAID0/1/1+0/5.

If you would go with that one, make sure to get the optional BBWC
(Battery Backed Write Cache). Without it the controller won't enable the
write-back cache (which it really shouldn't, since it wouldn't be safe
without the batteries). WB cache can really speed things up in many db
situations - it's sort of like "speed of fsync off, security of fsync
on". I've seen huge speedups with both postgresql and other databases on
that.


If you want to be "ready for more storage", I'd suggest looking at a 1U
server with a 3U external disk rack. That'll give you 16 disks in 4U (2
in the server + 14 in the rack on 2 channels), which is hard to beat. If
you have no need to go there, then sure, the 2U machine will be better.
But I've found the "small machine with external rack" a lot more
flexible than the "big machine with disks inside it". (For example, you
can put two 1U servers to it, and have 7 disks assigned to each server)
In HP world that would mean DL360G3 and the StorageWorks 4354.
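
The density argument above can be checked with a little arithmetic. This is only a sketch using the disk and rack-unit counts quoted in this message; the function name `disks_per_rack_unit` is hypothetical and the figures say nothing about performance, only about bays per U.

```python
def disks_per_rack_unit(total_disks, rack_units):
    """Average number of disk bays per rack unit (U) of space used."""
    return total_disks / rack_units


# 2U server with 6 internal bays (the DL380G3-class machine).
internal_2u = disks_per_rack_unit(total_disks=6, rack_units=2)

# 1U server (2 internal disks) plus a 3U external rack (14 disks).
external_4u = disks_per_rack_unit(total_disks=2 + 14, rack_units=1 + 3)

# Two 1U servers sharing one 3U rack, 7 rack disks assigned to each.
shared_5u = disks_per_rack_unit(total_disks=2 * 2 + 14, rack_units=2 * 1 + 3)

print(f"2U server, internal disks:   {internal_2u:.1f} disks per U")
print(f"1U server + 3U rack:         {external_4u:.1f} disks per U")
print(f"two 1U servers + 3U rack:    {shared_5u:.1f} disks per U")
```

On these numbers the external rack comes out ahead on density, which matches the "hard to beat" remark, though the 2U box remains the simpler choice if six disks are enough.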

The mandatory link:
http://h18004.www1.hp.com/products/servers/platforms/index-dl-ml.html



Though if you are already equipped with servers from one vendor, I'd
suggest sticking to it as long as the specs are fairly close. Then you
only need one set of management software etc.


//Magnus

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Hardware performance

2003-07-17 Thread Andrew Sullivan
On Thu, Jul 17, 2003 at 07:57:53AM -0700, Joe Conway wrote:
> 
> As I said, I've never personally found it necessary to move WAL off to a 
> different physical drive. What do you think is the best configuration 

On our Solaris test boxes (where, alas, we do not have the luxury of
1/2 TB external RAID boxes :-( ), putting WAL on a disk of its own
yielded something like 30% improvement in throughput on high
transaction volumes.  So it's definitely important in some cases.

A


Andrew Sullivan 204-4141 Yonge Street
Liberty RMS   Toronto, Ontario Canada
<[EMAIL PROTECTED]>  M2P 2A8
 +1 416 646 3304 x110




Re: [PERFORM] Hardware performance

2003-07-17 Thread Jord Tanner

On Thu, 2003-07-17 at 08:20, Adam Witney wrote:


> Anybody have any suggestions or experience with other hardware manufacturers
> for this size of setup? (2U rack, up to 6 disks, 2 processors, ~2GB RAM, if
> possible)
> 
> Thanks
> 
> adam

Check out http://www.amaxit.com It is all white box stuff, but they have
some really cool gear.

-- 
Jord Tanner <[EMAIL PROTECTED]>


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Hardware performance

2003-07-17 Thread Jean-Luc Lachance
Sorry for the redundant duplication of the repetition.
I should have read the follow-up messages.


Joe Conway wrote:
> 
> Jean-Luc Lachance wrote:
> > I am curious. How can you have RAID 1+0 with only 2 drives?
> > If you are thinking about partitioning the drives, won't this defeat the
> > purpose?
> 
> Yeah -- Hannu already pointed out that my mind was fuzzy when I made
> that statement :-(. See subsequent posts.
> 
> Joe

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Hardware performance

2003-07-17 Thread Joe Conway
Adam Witney wrote:
> Actually I am going through the same questions myself at the moment I
> would like to have a 2 disk RAID1 and a 4 disk RAID5, so need at least 6
> disks
>
> Anybody have any suggestions or experience with other hardware manufacturers
> for this size of setup? (2U rack, up to 6 disks, 2 processors, ~2GB RAM, if
> possible)

I tend to use either 1U or 4U servers, depending on the application. But 
I've had good experiences with IBM recently, and a quick look on their 
site shows the x345 with these specs:

•  2U, 2-way server delivers extreme performance and availability for 
demanding applications
•  Up to 2 Intel Xeon processors up to 3.06GHz with 533MHz front-side 
bus speed for outstanding performance
•  Features up to 8GB of DDR memory, 5 PCI (4 PCI-X) slots and up to 6 
hard disk drives for robust expansion
•  Hot-swap redundant cooling, power and hard disk drives for high 
availability
•  Integrated dual Ultra320 SCSI with RAID-1 for data protection

This may not wrap well, but here is the url:
http://www-132.ibm.com/webapp/wcs/stores/servlet/CategoryDisplay?catalogId=-840&storeId=1&categoryId=2559454&langId=-1&dualCurrId=73
Handles 6 drives; maybe that fits the bill?

Joe

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Hardware performance

2003-07-17 Thread Joe Conway
Jean-Luc Lachance wrote:
> I am curious. How can you have RAID 1+0 with only 2 drives?
> If you are thinking about partitioning the drives, won't this defeat the
> purpose?

Yeah -- Hannu already pointed out that my mind was fuzzy when I made 
that statement :-(. See subsequent posts.

Joe





Re: [PERFORM] Hardware performance

2003-07-17 Thread Jean-Luc Lachance
I am curious. How can you have RAID 1+0 with only 2 drives?
If you are thinking about partitioning the drives, won't this defeat the
purpose?

JLL

Joe Conway wrote:
> 
> [...]
> 2 drives, RAID 1+0: WAL
> 2 drives, RAID 1+0: data
> [...]



Re: [PERFORM] Hardware performance

2003-07-17 Thread Adam Witney
On 17/7/03 4:09 pm, "Joe Conway" <[EMAIL PROTECTED]> wrote:

> Adam Witney wrote:
>> I think the issue from the original poster's point of view is that the Dell
>> PE2650 can only hold a maximum of 5 internal drives
>> 
> 
> True enough, but maybe that's a reason to be looking at other
> alternatives. I think he said the hardware hasn't been bought yet.

Actually I am going through the same questions myself at the moment. I
would like to have a 2 disk RAID1 and a 4 disk RAID5, so I need at least 6
disks.

Anybody have any suggestions or experience with other hardware manufacturers
for this size of setup? (2U rack, up to 6 disks, 2 processors, ~2GB RAM, if
possible)

Thanks

adam


-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.




Re: [PERFORM] Hardware performance

2003-07-17 Thread Joe Conway
Adam Witney wrote:
> I think the issue from the original poster's point of view is that the Dell
> PE2650 can only hold a maximum of 5 internal drives

True enough, but maybe that's a reason to be looking at other 
alternatives. I think he said the hardware hasn't been bought yet.

Joe





Re: [PERFORM] Hardware performance

2003-07-17 Thread Adam Witney

> As I said, I've never personally found it necessary to move WAL off to a
> different physical drive. What do you think is the best configuration
> given the constraint of 5 drives? 1 drive for OS, and 4 for RAID 1+0 for
> data-plus-WAL? I guess the ideal would be to find enough money for that
> 6th drive, use the mirrored pair for both OS and WAL.

I think the issue from the original poster's point of view is that the Dell
PE2650 can only hold a maximum of 5 internal drives.




Re: [PERFORM] Hardware performance

2003-07-17 Thread Joe Conway
Hannu Krosing wrote:
> How do you do RAID 1+0 with just two drives ?

Hmm, good point -- I must have been tired last night ;-). With two 
drives you can do mirroring or striping, but not both.

Usually I've seen a pair of mirrored drives for the OS, and a RAID 1+0 
array for data. But that requires 6 drives, not 5. On non-database 
servers usually the data array is RAID 5, and you could get away with 5 
drives (as someone else pointed out).
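
The drive-count reasoning above can be written down as a small check. This is a sketch only: the minimum drive counts are the standard ones for each RAID level, but the function name `check_layout` and the layout names are hypothetical and not part of the original discussion.

```python
# Minimum number of drives each RAID level needs (standard definitions).
MIN_DRIVES = {"RAID0": 2, "RAID1": 2, "RAID5": 3, "RAID10": 4}


def check_layout(arrays, drive_budget):
    """Return a list of problems with a proposed layout; empty means it fits.

    arrays is a list of (name, level, drive_count) tuples. A level of
    "single" means one plain drive with no RAID.
    """
    problems = []
    for name, level, count in arrays:
        if level == "single":
            if count != 1:
                problems.append(f"{name}: a single drive means exactly 1 drive")
            continue
        needed = MIN_DRIVES[level]
        if count < needed:
            problems.append(f"{name}: {level} needs at least {needed} drives, got {count}")
        elif level == "RAID10" and count % 2:
            problems.append(f"{name}: RAID10 needs an even number of drives")
    total = sum(count for _, _, count in arrays)
    if total > drive_budget:
        problems.append(f"layout uses {total} drives but only {drive_budget} are available")
    return problems


# Layout first proposed in the thread: RAID 1+0 on two-drive sets.
original = [("OS", "single", 1), ("WAL", "RAID10", 2), ("data", "RAID10", 2)]
# Mirrored OS pair plus a four-drive RAID 1+0 data array.
mirrored_os = [("OS", "RAID1", 2), ("data", "RAID10", 4)]
# One OS drive plus four drives of RAID 1+0 holding data and WAL.
five_drive = [("OS", "single", 1), ("data+WAL", "RAID10", 4)]

for label, layout in [("original", original), ("mirrored_os", mirrored_os), ("five_drive", five_drive)]:
    print(label, check_layout(layout, drive_budget=5) or "fits in 5 drives")
```

With a budget of 5 drives, only the last layout passes: the first fails because RAID 1+0 needs four drives per array, and the second needs a sixth drive.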

As I said, I've never personally found it necessary to move WAL off to a 
different physical drive. What do you think is the best configuration 
given the constraint of 5 drives? 1 drive for OS, and 4 for RAID 1+0 for 
data-plus-WAL? I guess the ideal would be to find enough money for that 
6th drive, use the mirrored pair for both OS and WAL.

Joe





Re: [PERFORM] Hardware performance

2003-07-17 Thread Vincent van Leeuwen
On 2003-07-16 19:57:22 -0700, Balazs Wellisch wrote:
> We're now stuck on the question of what type of RAID configuration to use
> for this server. RAID 5 offers the best fault tolerance but doesn't perform
> all that well. RAID 10 offers much better performance, but no hot swap. Or
> should we not use RAID at all. I know that ideally the log (WAL) files
> should reside on a separate disk from the rest of the DB. Should we use 4
> separate drives instead? One for the OS, one for data, one for WAL, one for
> swap? Or RAID 10 for everything plus 1 drive for WAL? Or RAID 5 for
> everything?
> 

We have recently run our own test (simulating our own database load) on a new
server which contained 7 15K rpm disks. Since we always want to have a
hot-spare drive (servers are located in a hard-to-reach datacenter) and we
always want redundancy, we tested two different configurations:
- 6 disk RAID 10 array, holding everything
- 4 disk RAID 5 array holding postgresql data and 2 disk RAID 1 array holding
  OS, swap and WAL logs

Our database is used for a very busy community website, so our load contains a
lot of inserts/updates, but many more selects than updates.

Our findings were that the 6 disk RAID 10 set was significantly faster than
the other setup.

So I'd recommend a 4-disk RAID 10 array. I'd use the 5th drive for a hot-spare
drive, but that's your own call. However, it would be best if you tested some
different setups under your own database load to see what works best for you.


Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Hardware performance

2003-07-17 Thread Hannu Krosing
Joe Conway kirjutas N, 17.07.2003 kell 07:52:
> To an extent it depends on how big the drives are and how large you 
> expect the database to get. For maximal performance you want RAID 1+0 
> for data and WAL; and you want OS, data, and WAL each on their own 
> drives. So with 5 drives one possible configuration is:
> 
> 1 drive OS: OS on its own drive makes it easy to upgrade, or restore 
> the OS from CD if needed
> 2 drives, RAID 1+0: WAL
> 2 drives, RAID 1+0: data

How do you do RAID 1+0 with just two drives ?

--
Hannu





Re: [PERFORM] Hardware performance

2003-07-16 Thread Joe Conway
Balazs Wellisch wrote:
> first of all I'd like to thank everyone who responded to my earlier
> post. I have a much better understanding of postgres performance
> tuning now. In case anyone's interested we've decided to go with RH9
> and PostgreSQL 7.3 and we'll do the OS and DB tuning ourselves.
> (should be a good learning experience)

Good choice! I think you'll find that this list will be a great resource 
as you learn. One point here is that you should use 7.3.3 (latest 
release version) instead of the version of Postgres in the distribution. 
Also, you might want to rebuild the RPMs from source using
"--target i686".

> We have the budget for 5 drives. Does anyone have any real world
> experience with what hard drive configuration works best for
> postgres? This is going to be a dedicated DB server. There are going
> to be a large number of transactions being written to the database.

To an extent it depends on how big the drives are and how large you 
expect the database to get. For maximal performance you want RAID 1+0 
for data and WAL; and you want OS, data, and WAL each on their own 
drives. So with 5 drives one possible configuration is:

1 drive OS: OS on its own drive makes it easy to upgrade, or restore 
the OS from CD if needed
2 drives, RAID 1+0: WAL
2 drives, RAID 1+0: data

But I've seen reports that with fast I/O subsystems, there was no 
measurable difference with WAL separated from data. And to be honest, 
I've never personally found it necessary to separate WAL from data. You 
may want to test with WAL on the same volume as the data to see if there 
is enough difference to warrant separating it or not given your load and 
your actual hardware. If not, use 1 OS drive and 4 RAID 1+0 drives as 
one volume.

You never want to find any significant use of hard disk based swap space -- 
if you see that, you are probably misconfigured, and performance will be 
poor no matter how you've set up the drives.

> And there will be some moderately complex queries run concurrently to
> present this information in the form of various reports on the web.

Once you have some data on your test server, and you have complex 
queries to tune, there will be a few details you'll get asked every time 
if you don't provide them when posting a question to the list:

1) Have you been running VACUUM and ANALYZE (or VACUUM ANALYZE) at
   appropriate intervals?
2) What are the table definitions and indexes for all tables involved?
3) What is the output of EXPLAIN ANALYZE?

HTH,

Joe



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Hardware performance

2003-07-16 Thread Roman Fail
I've got a Dell 2650 set up with 5 drives and a separate app server connecting with 
JDBC.  Since you've only got 5 drives, my conclusion regarding the best balance of 
performance and redundancy was:
 
2 drives have the OS, swap, and WAL in RAID-1
3 drives have the data in RAID-5
 
If you can afford it, get the 2+3 split backplane and make the 3 data drives the 
biggest, fastest you can afford.  Currently that means the 15k 73GB drives, which 
would give you 146GB for data.  Make the OS drives smaller and slower if you need to 
save cash.  
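
The 146GB figure follows from the usual RAID capacity rules. As a rough sketch (the helper name `usable_capacity_gb` is hypothetical, and it ignores filesystem overhead and controller-specific details):

```python
def usable_capacity_gb(level, drives, drive_gb):
    """Usable capacity of one array under the standard RAID capacity rules."""
    if level == "RAID0":
        if drives < 2:
            raise ValueError("RAID0 needs at least 2 drives")
        return drives * drive_gb          # pure striping, no redundancy
    if level == "RAID1":
        if drives < 2:
            raise ValueError("RAID1 needs at least 2 drives")
        return drive_gb                   # every drive holds the same data
    if level == "RAID5":
        if drives < 3:
            raise ValueError("RAID5 needs at least 3 drives")
        return (drives - 1) * drive_gb    # one drive's worth of parity
    if level == "RAID10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID10 needs an even number of drives, at least 4")
        return (drives // 2) * drive_gb   # striped set of mirrored pairs
    raise ValueError(f"unknown RAID level: {level}")


# The data array described above: three 73GB drives in RAID-5.
print(usable_capacity_gb("RAID5", 3, 73))    # 146
# The six-bay alternative: four 73GB drives in RAID-10.
print(usable_capacity_gb("RAID10", 4, 73))   # 146
```

Under these rules a 4-drive RAID-10 array of the same disks gives the same usable space as the 3-drive RAID-5 array, so the extra drive buys redundancy and write performance rather than capacity.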
 
If only it had six drive bays... you could use 4 drives for the data and do RAID-10.  
If you've got the additional rackspace available, you could get the 5U Dell 2600 
instead for the same ballpark cost.  If you order it with rack rails, it comes all set 
up for rack installation...a special sideways faceplate and everything.
 
By the way, RAID-5 is not the best fault tolerance, RAID-1 or RAID-10 is.  And you can 
certainly hot-swap RAID-10 arrays.  I've actually done it... recently!  I am of the 
mind that single drives are not an option for production servers - I just don't need 
the pain of the server going down at all.  Although they DO go down despite 
redundancy...I just had a SCSI backplane go out in a Dell 6600 that has every bit of 
redundancy you can order.  While uncommon, the backplane is one of the many single 
points of failure!  
 
Roman Fail
POS Portal, Inc.
 
 

-----Original Message-----
From: Balazs Wellisch [mailto:[EMAIL PROTECTED] 
Sent: Wed 7/16/2003 7:57 PM 
To: [EMAIL PROTECTED] 
Cc: 
Subject: [PERFORM] Hardware performance


Hi all,
 
first of all I'd like to thank everyone who responded to my earlier post. I 
have a much better understanding of postgres performance tuning now. In case anyone's 
interested we've decided to go with RH9 and PostgreSQL 7.3 and we'll do the OS and DB 
tuning ourselves. (should be a good learning experience)
 
We are now getting ready to purchase the hardware that will be used to run the 
database server. We're spending quite a bit of money on it because this will 
eventually, if things go well within two months, become a production server. We're 
getting all RH certified hardware from Dell. (Dell 2650)
 
We're now stuck on the question of what type of RAID configuration to use for 
this server. RAID 5 offers the best fault tolerance but doesn't perform all that well. 
RAID 10 offers much better performance, but no hot swap. Or should we not use RAID at 
all. I know that ideally the log (WAL) files should reside on a separate disk from the 
rest of the DB. Should we use 4 separate drives instead? One for the OS, one for data, 
one for WAL, one for swap? Or RAID 10 for everything plus 1 drive for WAL? Or RAID 5 
for everything?
 
We have the budget for 5 drives. Does anyone have any real world experience 
with what hard drive configuration works best for postgres? This is going to be a 
dedicated DB server. There are going to be a large number of transactions being 
written to the database. (Information is logged from a separate app through ODBC to 
postgres) And there will be some moderately complex queries run concurrently to 
present this information in the form of various reports on the web. (The app server is 
a separate machine and will connect to the DB through JDBC to create the HTML reports)
 
Any thoughts, ideas, comments would be appreciated.
 
Thank you,
 
Balazs Wellisch
Neu Solutions
[EMAIL PROTECTED]
 

