Re: [PERFORM] Hardware/OS recommendations for large databases (5TB)
On Nov 16, 2005, at 4:50 PM, Claus Guttesen wrote:

> I'm (also) FreeBSD-biased but I'm not sure whether the 5 TB fs will
> work so well if tools like fsck are needed. Gvinum could be one option
> but I don't have any experience in that area.

Then look into an external filer and mount via NFS. Then it is not FreeBSD's responsibility to manage the volume.

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [PERFORM] Hardware/OS recommendations for large databases (5TB)
> at 5TB data, i'd vote that the application is disk I/O bound, and the
> difference in CPU speed at the level of dual opteron vs. dual-core
> opteron is not gonna be noticed.
>
> to maximize disk, try getting a dedicated high-end disk system like
> nstor or netapp file servers hooked up to fiber channel, then use a
> good high-end fiber channel controller like one from LSI.
>
> and go with FreeBSD amd64 port. It is *way* fast, especially the
> FreeBSD 6.0 disk system.

I'm (also) FreeBSD-biased but I'm not sure whether the 5 TB fs will work so well if tools like fsck are needed. Gvinum could be one option but I don't have any experience in that area.

regards
Claus
Re: [PERFORM] Hardware/OS recommendations for large databases (5TB)
On Nov 15, 2005, at 3:28 AM, Claus Guttesen wrote:

> Hardware-wise I'd say dual core opterons. One dual-core-opteron
> performs better than two single-core at the same speed. Tyan makes

at 5TB data, i'd vote that the application is disk I/O bound, and the difference in CPU speed at the level of dual opteron vs. dual-core opteron is not gonna be noticed.

to maximize disk, try getting a dedicated high-end disk system like nstor or netapp file servers hooked up to fiber channel, then use a good high-end fiber channel controller like one from LSI.

and go with FreeBSD amd64 port. It is *way* fast, especially the FreeBSD 6.0 disk system.
Re: [PERFORM] Hardware/OS recommendations for large databases ( 5TB)
James Mello wrote:
> Unless there was a way to guarantee consistency, it would be hard at
> best to make this work. Convergence on large data sets across boxes is
> non-trivial, and diffing databases is difficult at best. Unless there
> was some form of automated way to ensure consistency, going 8 ways into
> separate boxes is *very* hard. I do suppose that if you have fancy
> storage (EMC, Hitachi) you could do BCV or Shadow copies. But in terms
> of commodity stuff, I'd have to agree with Merlin.

If you're talking about data consistency, I don't see why that's an issue in a bulk-load/read-only setup. Either bulk load on 1 server and then do a file copy to all the others -- or simultaneously bulk load on all servers.

If you're talking about consistency in directing queries to the appropriate servers, I agree that's a more complicated issue but not insurmountable. If you don't use persistent connections, you can probably get pretty good routing using DNS -- monitor servers by looking at top/iostat/memory info/etc. and continually change the DNS zonemaps to direct traffic to less busy servers. (I use this method for our global load balancers -- pretty easy to script via Perl/Python/etc.)

Mind you, since you need a dual-processor motherboard anyway to get PCI-X, every machine would be a 2x dual-core, so there's enough CPU power to handle the cases where 2 or 3 queries get sent to the same server back-to-back. Of course, I/O would take a hit in this case -- but I/O would take a hit in every case on a single 16-core mega system.

If you use persistent connections, it'll definitely require extra programming beyond simple scripting. Take one of the open-source projects like PgPool or SQLRelay and alter it so it monitors all servers to see which server is least busy before passing a query on.
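A minimal sketch of the "route to the least busy server" idea from the post above. The hostnames and load numbers are hypothetical; a real version would refresh the load readings continually by polling top/iostat over SSH or a small agent, just as described for the DNS zonemap approach:

```python
import random

def least_busy(loads):
    """Return the hostname reporting the lowest load average."""
    return min(loads, key=loads.get)

def pick_server(loads, threshold=2.0):
    """Route to the least busy replica; if every box is above the
    saturation threshold, pick randomly so no single server absorbs
    the whole backlog (threshold value is an assumption)."""
    if all(v >= threshold for v in loads.values()):
        return random.choice(list(loads))
    return least_busy(loads)

# Hypothetical 1-minute load averages gathered from each replica.
loads = {"db1": 0.2, "db2": 1.5, "db3": 0.7}
print(pick_server(loads))  # → db1
```

This is only the selection policy; hooking it to DNS updates or to a connection proxy is the extra work the post describes.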
Re: [PERFORM] Hardware/OS recommendations for large databases ( 5TB)
Unless there was a way to guarantee consistency, it would be hard at best to make this work. Convergence on large data sets across boxes is non-trivial, and diffing databases is difficult at best. Unless there was some form of automated way to ensure consistency, going 8 ways into separate boxes is *very* hard. I do suppose that if you have fancy storage (EMC, Hitachi) you could do BCV or Shadow copies. But in terms of commodity stuff, I'd have to agree with Merlin.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of William Yu
Sent: Tuesday, November 15, 2005 10:57 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Hardware/OS recommendations for large databases ( 5TB)

Merlin Moncure wrote:
>> You could instead buy 8 machines that total 16 cores, 128GB RAM and
>
> It's hard to say what would be better. My gut says the 5u box would
> be a lot better at handling high cpu/high concurrency problems...like
> your typical business erp backend. This is pure speculation of
> course...I'll defer to the experts here.

In this specific case (data warehouse app), multiple machines is the better bet. Load data on 1 machine, copy to other servers and then use a middleman to spread out SQL statements to each machine.

I was going to suggest pgpool as the middleman but I believe it's limited to 2 machines max at this time. I suppose you could daisy chain pgpools running on every machine.
Re: [PERFORM] Hardware/OS recommendations for large databases ( 5TB)
Merlin Moncure wrote:
>> You could instead buy 8 machines that total 16 cores, 128GB RAM and
>
> It's hard to say what would be better. My gut says the 5u box would
> be a lot better at handling high cpu/high concurrency problems...like
> your typical business erp backend. This is pure speculation of
> course...I'll defer to the experts here.

In this specific case (data warehouse app), multiple machines is the better bet. Load data on 1 machine, copy to other servers and then use a middleman to spread out SQL statements to each machine.

I was going to suggest pgpool as the middleman but I believe it's limited to 2 machines max at this time. I suppose you could daisy chain pgpools running on every machine.
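A minimal sketch of such a middleman, assuming a pool of identical read-only replicas that were bulk-loaded with the same data. The hostnames are hypothetical, and routing here is plain round-robin rather than pgpool's actual behavior; a real version would hold a libpq connection to each backend:

```python
import itertools

class QueryDispatcher:
    """Spread read-only SQL statements across identical replicas,
    round-robin, as a stand-in for a pgpool-style middleman."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._cycle = itertools.cycle(self.servers)

    def route(self, sql):
        """Pick the next replica in rotation for this statement."""
        return next(self._cycle), sql

dispatcher = QueryDispatcher(["db1", "db2", "db3"])
print(dispatcher.route("SELECT count(*) FROM facts")[0])  # → db1
```

Round-robin only works because the dataset is load-once/read-many; anything with writes brings back the consistency problem James raises above.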
Re: [PERFORM] Hardware/OS recommendations for large databases ( 5TB)
Luke,

-----Original Message-----
From: Luke Lonergan [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, November 15, 2005 7:10 AM
To: Adam Weisberg
Cc: pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Hardware/OS recommendations for large databases ( 5TB)

Adam,

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]] On Behalf Of Claus Guttesen
> Sent: Tuesday, November 15, 2005 12:29 AM
> To: Adam Weisberg
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Hardware/OS recommendations for large databases ( 5TB)
>
> > Does anyone have recommendations for hardware and/or OS to work with
> > around 5TB datasets?
>
> Hardware-wise I'd say dual-core Opterons. One dual-core Opteron
> performs better than two single-core at the same speed. Tyan makes
> some boards that have four sockets, thereby giving you 8 CPUs (if you
> need that many). Sun and HP also make nice hardware although the Tyan
> board is more competitively priced.
>
> OS-wise I would choose the FreeBSD amd64 port, but partitions larger
> than 2 TB need some special care (using gpt rather than disklabel
> etc.), and tools like fsck may not be able to completely check
> partitions larger than 2 TB. Linux or Solaris with either LVM or
> Veritas FS sound like candidates.

I agree - you can get a very good one from www.acmemicro.com or www.rackable.com with 8x 400GB SATA disks and the new 3Ware 9550SX SATA RAID controller for about $6K with two Opteron 272 CPUs and 8GB of RAM on a Tyan 2882 motherboard. We get about 400MB/s sustained disk read performance on these (with tuning) on Linux using the xfs filesystem, which is one of the most critical factors for large databases.

Note that you want to have your DBMS use all of the CPU and disk channel bandwidth you have on each query, which takes a parallel database like Bizgres MPP to achieve.
Regards,
- Luke

The What's New FAQ for PostgreSQL 8.1 says "the buffer manager for 8.1 has been enhanced to scale almost linearly with the number of processors, leading to significant performance gains on 8-way, 16-way, dual-core, and multi-core CPU servers." Why not just use it as-is?

Cheers,
Adam
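For scale, a quick back-of-the-envelope on the 400MB/s figure quoted above. The dataset size and read rate come from the thread; the calculation itself is only illustrative:

```python
# Sequential-scan time for a 5TB dataset at the quoted 400MB/s
# sustained read rate on a single box (decimal units throughout).
dataset_bytes = 5 * 10**12          # 5 TB
rate_bytes_per_s = 400 * 10**6      # 400 MB/s sustained read

scan_seconds = dataset_bytes / rate_bytes_per_s
scan_hours = scan_seconds / 3600
print(f"{scan_seconds:.0f} s ~= {scan_hours:.1f} hours")  # → 12500 s ~= 3.5 hours
```

Three and a half hours for one full pass is why the thread keeps returning to aggregate scan rate across machines, rather than CPU count, as the figure of merit.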
Re: [PERFORM] Hardware/OS recommendations for large databases ( 5TB)
> Merlin,
>
> > just FYI: tyan makes a 8 socket motherboard (up to 16 cores!):
> > http://www.swt.com/vx50.html
> >
> > It can be loaded with up to 128 gb memory if all the sockets are
> > filled :).
>
> Another thought - I priced out a maxed out machine with 16 cores and
> 128GB of RAM and 1.5TB of usable disk - $71,000.
>
> You could instead buy 8 machines that total 16 cores, 128GB RAM and
> 28TB of disk for $48,000, and it would be 16 times faster in scan rate,
> which is the most important factor for large databases. The size would
> be 16 rack units instead of 5, and you'd have to add a GigE switch for
> $1500.

It's hard to say what would be better. My gut says the 5u box would be a lot better at handling high cpu/high concurrency problems...like your typical business erp backend. This is pure speculation of course...I'll defer to the experts here.

Merlin
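Putting the two configurations quoted above side by side (all figures are from the post; the per-unit math is just spelled out):

```python
# One maxed-out 5U box vs. eight commodity machines, per the quoted prices.
big_box = {"cost": 71_000, "cores": 16, "ram_gb": 128, "disk_tb": 1.5, "scan_x": 1}
cluster = {"cost": 48_000 + 1_500,  # 8 machines plus the GigE switch
           "cores": 16, "ram_gb": 128, "disk_tb": 28, "scan_x": 16}

for name, cfg in (("5U box", big_box), ("8-node cluster", cluster)):
    print(f"{name}: ${cfg['cost'] / cfg['disk_tb']:,.0f}/TB usable, "
          f"{cfg['scan_x']}x relative scan rate")
```

Same core and RAM totals either way; the cluster wins on $/TB and aggregate scan rate, which is Luke's point, while Merlin's caveat is that a single box may still handle high-concurrency OLTP-style loads better.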
Re: [PERFORM] Hardware/OS recommendations for large databases (5TB)
> Hardware-wise I'd say dual-core Opterons. One dual-core Opteron
> performs better than two single-core at the same speed. Tyan makes
> some boards that have four sockets, thereby giving you 8 CPUs (if you
> need that many). Sun and HP also make nice hardware although the Tyan
> board is more competitively priced.

just FYI: Tyan makes an 8-socket motherboard (up to 16 cores!):
http://www.swt.com/vx50.html

It can be loaded with up to 128 GB memory if all the sockets are filled :).

Merlin
Re: [PERFORM] Hardware/OS recommendations for large databases (5TB)
> Does anyone have recommendations for hardware and/or OS to work with
> around 5TB datasets?

Hardware-wise I'd say dual-core Opterons. One dual-core Opteron performs better than two single-core at the same speed. Tyan makes some boards that have four sockets, thereby giving you 8 CPUs (if you need that many). Sun and HP also make nice hardware, although the Tyan board is more competitively priced.

OS-wise I would choose the FreeBSD amd64 port, but partitions larger than 2 TB need some special care: use gpt rather than disklabel etc., and tools like fsck may not be able to completely check partitions larger than 2 TB. Linux or Solaris with either LVM or Veritas FS sound like candidates.

> I have been working with datasets no bigger than around 30GB, and that
> (I'm afraid to admit) has been in MSSQL.

Well, our data are just below 30 GB so I can't help you there :-)

regards
Claus