Re: Webservers with Terabytes of Data - recommended setups
In article [EMAIL PROTECTED], Nick Holland wrote: Dumping the data from one disk to another is fine and dandy when you are talking about the 40G disk in your home or desktop computer; the fact that you are down for a few hours is no big deal. But what about a server? I don't care how fast your disks are, moving 300G of data to a new disk system is a lot of slow work. This I usually quantify as: we double storage capacity every 18 months; unfortunately, transfer speed (actual access/read/write speed) doubles much more slowly than that. Deal with it. --Toby.
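To put rough numbers on that (assuming, say, a sustained 50 MB/s end to end, which is optimistic for a box that is still serving traffic):

    300 GB = 300,000 MB
    300,000 MB / 50 MB/s = 6,000 s, i.e. roughly 1 hour 40 minutes

and that is one uninterrupted sequential copy; add verification, filesystem overhead and the live load, and a working day disappears quickly.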
Re: Webservers with Terabytes of Data - recommended setups
On Fri, Apr 20, 2007 at 11:04:34PM -0700, Clint Pachl wrote: What do you consider a sane number of front ends, 10, less, more? Well, I think that depends on too many variables. I have a movie server (OBSD) that exports NFS to two home theatre computers (FBSD). The movie server is a dual P3 1GHz with 4 U320 SCSI disks in RAID0. When simultaneously playing different DVDs on the two theatre computers, the movie server is 90% idle; that's with TCP mounts. When using UDP mounts it's 96% idle. Although movie files are large sequential data, the bottleneck in my network is my 100Mb/s LAN.

I don't have the experience that others here have, but a small ISP that I worked for used NFS to serve HTTP. It was a Linux shop; they had a NetApp NFS-exporting 5000 users' /home dirs to a dozen 1U cheapo i386 whiteboxes that ran apache/tomcat/cgi etc. Disk and CPU (for cgi, https, tomcat, php, etc.) were separated. The only problem that they had with NFS was flock when mbox was used for mail storage on the mail farm (same NetApp). When Courier maildir was used, this was no longer an issue. The web farm was mainly read-only, while the mail farm was split read and write, to the same NetApp. All eggs were in the one NetApp basket... Maybe not on the same scale as the OP has in mind. Maybe it's time for me to revisit this yet again, but never been very successful with high traffic. All I can say is that I love NFS. You're missing out. Plus it is so simple. I have wanted to check out AFS for fail-over reasons, but too many docs for me to read.

One last note. Holland's disk structuring is very cool (read his earlier post for details). If I were to serve NFS to dozens or hundreds of clients I would use his scheme, but apply his partitioning scheme at the host level. If an NFS server is saturated, spread the load by adding another server. The drawback is that each client has multiple NFS mounts. However, if you have this many machines uniformly accessing an NFS array, the entire mounting process should be automated. This is where clever planning takes place.

Now I work for Sun, and they have something like 30,000 employees. Nearly all staff use Sun Ray workstations, and home directories are NFS mounts over a global WAN. There is not one massive /home box, obviously. There are many home NFS servers, in each of many cities. From here in Scotland, I can work with an engineer elsewhere by cd'ing to /somewhere/holland, /nowhere/japan, /elsewhere/colorado. It only takes a couple of seconds for the automounter to kick in. The output of mount shows the layout of /home something like:

    /home/user1  box1.uk:/export/home5/28/user1
    /home/user2  box9.au:/export/home17/2/user2

So, many average-sized boxes are used, that in turn have many average disk packs, that are split. As is expected, LDAP and NIS are used. -- Craig Skinner | http://www.kepax.co.uk | [EMAIL PROTECTED]
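Applied to a web farm, a front end's /etc/fstab under such a per-host split might look something like this (a sketch only; the hostnames, paths and three-way split are invented for the example):

    nfs01:/export/web/a   /var/www/a   nfs  ro,nodev,nosuid  0 0
    nfs02:/export/web/b   /var/www/b   nfs  ro,nodev,nosuid  0 0
    nfs03:/export/users   /home        nfs  rw,nodev,nosuid  0 0

with amd(8) or a configuration management tool generating the entries, so that adding a storage server never means editing every client by hand.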
Re: Webservers with Terabytes of Data - recommended setups
* Joachim Schipper [EMAIL PROTECTED] [2007-04-20 00:36]: On Thu, Apr 19, 2007 at 10:51:56PM +0100, Stuart Henderson wrote: I don't think NFS/AFS is that good an idea; you'll need very beefy fileservers and a fast network. NFS may actually be useful; if you really need the files in one directory space for management/updates that's a way to do it (i.e. mount all the various storage servers by NFS on a management station/ftp server/whatever). Something like that might be a very good idea, yes. Just don't try to serve everything directly off NFS. there is nothing wrong with serving directly from NFS. -- Henning Brauer, [EMAIL PROTECTED], [EMAIL PROTECTED] BS Web Services, http://bsws.de Full-Service ISP - Secure Hosting, Mail and DNS Services Dedicated Servers, Rootservers, Application Hosting - Hamburg Amsterdam
Re: Webservers with Terabytes of Data - recommended setups
On Fri, Apr 20, 2007 at 12:36:29PM +0200, Henning Brauer wrote: * Joachim Schipper [EMAIL PROTECTED] [2007-04-20 00:36]: On Thu, Apr 19, 2007 at 10:51:56PM +0100, Stuart Henderson wrote: I don't think NFS/AFS is that good an idea; you'll need very beefy fileservers and a fast network. NFS may actually be useful; if you really need the files in one directory space for management/updates that's a way to do it (i.e. mount all the various storage servers by NFS on a management station/ftp server/whatever). Something like that might be a very good idea, yes. Just don't try to serve everything directly off NFS. there is nothing wrong with serving directly from NFS. Really? You have a lot more experience in this area, so I will defer to you if you are sure, but it seems to me that in the sort of system I explicitly assumed (something like a web farm), serving everything off NFS would involve either very expensive hardware or be rather slow. I see how in your example - a lot of storage, not accessed often - just serving everything off NFS makes perfect sense. However, that was not what I was talking about. Perhaps you could elaborate a little? I'm interested, at least... Joachim -- TFMotD: hostapd.conf (5) - configuration file for the Host Access Point daemon
Re: Webservers with Terabytes of Data - recommended setups
On Fri, Apr 20, 2007 at 09:03:54AM -0500, Jacob Yocom-Piatt wrote: from my observations redundancy is the biggest problem with NFS and that its ability to efficiently serve up data is more than ample. Redundancy is certainly a problem, but lots of US HPC and distributed computing sites have severe scaling problems with NFS. High r/w traffic has killed several file servers in projects that we work with, and it sucks big time. I don't know anyone who's happy or excited or confident in their HPC NFS deployments; everyone I've talked to hopes for a real solution to this problem. ;) If the OP's use case involves lots of writes (especially from many clients), I'd be concerned about NFS' ability to keep up. Then again, I've had problems with pretty much all of the network filesystems (including AFS, though it's the least bad in my experience). I'm still waiting for Ceph[0] to mature (and to shed its linuxisms). ;) [0] http://ceph.sf.net/ -- o--{ Will Maier }--o | web:...http://www.lfod.us/ | [EMAIL PROTECTED] | *--[ BSD Unix: Live Free or Die ]--*
Re: Webservers with Terabytes of Data - recommended setups
Joachim Schipper wrote: there is nothing wrong with serving directly from NFS. Really? You have a lot more experience in this area, so I will defer to you if you are sure, but it seems to me that in the sort of system I explicitly assumed (something like a web farm), serving everything off NFS would involve either very expensive hardware or be rather slow. I see how in your example - a lot of storage, not accessed often - just serving everything off NFS makes perfect sense. However, that was not what I was talking about.

At HPC facilities (LANL, Sandia, LLNL, Argonne, etc.) NFS is used extensively for this purpose, since the amount of storage required for simulation outputs greatly outstrips the storage that any one machine can provide, especially the compute nodes. Before I switched my email address I would get regular notifications that NFS filesystems were down for this-or-that many hours at compute facility X. From my observations, redundancy is the biggest problem with NFS; its ability to efficiently serve up data is more than ample. AFS provides additional redundancy via volume replication and by having the various services that comprise it spread over several machines. There is a lot of documentation to go through, though. cheers, jake

Perhaps you could elaborate a little? I'm interested, at least... Joachim
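For a sense of what that AFS replication looks like in practice, an OpenAFS-style sequence runs roughly as follows (a sketch from memory; the server, partition and volume names are invented, so check the vos documentation before trusting the exact syntax):

    vos addsite fs2.example.com /vicepa web.static   # add a read-only replica site
    vos release web.static                           # push RO copies out to the replicas

so clients can keep reading from a surviving replica if one fileserver goes away.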
Re: Webservers with Terabytes of Data - recommended setups
Jason Beaudoin wrote: snip Use all the tricks you can for YOUR solution, including: * lots of small partitions What is the reasoning behind this? Thanks for the awesome post! I think it runs something like this: if there is a problem somewhere on the disk and it's all one big partition, you must fix the big partition; if it's lots of small partitions, you fix only the one with the problem. Even worse, in some situations the difference is between being dead and being somewhat crippled. Methinks there's lots of hard-won experience behind Nick's answers ;)
Re: Webservers with Terabytes of Data - recommended setups
Bullshit. just use NFS :) -Bob * Steven Harms [EMAIL PROTECTED] [2007-04-19 17:01]: This isn't an OpenBSD-specific solution, but you should be able to use an EMC SAN to accomplish this (we use a Fibre Channel setup) On 4/19/07, Stuart Henderson [EMAIL PROTECTED] wrote: On 2007/04/19 18:08, Daniel Ouellet wrote: Stuart Henderson wrote: I don't think NFS/AFS is that good an idea; you'll need very beefy fileservers and a fast network. NFS may actually be useful; if you really need the files in one directory space for management/updates that's a way to do it (i.e. mount all the various storage servers by NFS on a management station/ftp server/whatever). Good idea yes, but if I recall properly, unless major changes have been done, doesn't the use of NFS become a huge bottleneck compared to a local drive? I think the archive is full of complaints about the throughput of NFS not being so good. I meant using it the other way round: have the *webservers* export their filesystems, and ftp/management servers mount them to provide a single space for carrying out updates and backups, locating files, etc. Having a bunch of webservers serve data from a large NFS store seems less attractive for most of the cases I can think of. The main one I see where it may be attractive is where heavy CGI processing or similar is done (that's usually a different situation to having many TB of data, though). In the CGI case, there are some benefits to distributing files another way (notably avoiding the NFS server as a point of failure); rsync, as Joachim mentioned, is one way to shift the files around. CVS is also suitable; it encourages keeping tighter control over changes too, and isn't difficult to learn. -- #!/usr/bin/perl if ((not 0 && not 1) != (! 0 && ! 1)) { print "Larry and Tom must smoke some really primo stuff...\n"; }
Re: Webservers with Terabytes of Data - recommended setups
* Joachim Schipper [EMAIL PROTECTED] [2007-04-20 14:49]: On Fri, Apr 20, 2007 at 12:36:29PM +0200, Henning Brauer wrote: * Joachim Schipper [EMAIL PROTECTED] [2007-04-20 00:36]: On Thu, Apr 19, 2007 at 10:51:56PM +0100, Stuart Henderson wrote: I don't think NFS/AFS is that good an idea; you'll need very beefy fileservers and a fast network. NFS may actually be useful; if you really need the files in one directory space for management/updates that's a way to do it (i.e. mount all the various storage servers by NFS on a management station/ftp server/whatever). Something like that might be a very good idea, yes. Just don't try to serve everything directly off NFS. there is nothing wrong with serving directly from NFS. Really? You have a lot more experience in this area, so I will defer to you if you are sure, but it seems to me that in the sort of system I explicitly assumed (something like a web farm), serving everything off NFS would involve either very expensive hardware or be rather slow. no. cache works. reads are no problem whatsoever in this kind of setup (well. I am sure you can make that a problem with many frontend servers and lots to read. obviously. but for any sane number of frontends, should not) -- Henning Brauer, [EMAIL PROTECTED], [EMAIL PROTECTED] BS Web Services, http://bsws.de Full-Service ISP - Secure Hosting, Mail and DNS Services Dedicated Servers, Rootservers, Application Hosting - Hamburg Amsterdam
Re: Webservers with Terabytes of Data - recommended setups
On Fri, Apr 20, 2007 at 07:56:16PM +0200, Henning Brauer wrote: * Joachim Schipper [EMAIL PROTECTED] [2007-04-20 14:49]: On Fri, Apr 20, 2007 at 12:36:29PM +0200, Henning Brauer wrote: * Joachim Schipper [EMAIL PROTECTED] [2007-04-20 00:36]: On Thu, Apr 19, 2007 at 10:51:56PM +0100, Stuart Henderson wrote: I don't think NFS/AFS is that good an idea; you'll need very beefy fileservers and a fast network. NFS may actually be useful; if you really need the files in one directory space for management/updates that's a way to do it (i.e. mount all the various storage servers by NFS on a management station/ftp server/whatever). Something like that might be a very good idea, yes. Just don't try to serve everything directly off NFS. there is nothing wrong with serving directly from NFS. Really? You have a lot more experience in this area, so I will defer to you if you are sure, but it seems to me that in the sort of system I explicitly assumed (something like a web farm), serving everything off NFS would involve either very expensive hardware or be rather slow. no. cache works. reads are no problem whatsoever in this kind of setup (well. I am sure you can make that a problem with many frontend servers and lots to read. obviously. but for any sane number of frontends, should not) Yeah, you are right. Now what was I thinking, anyway? Anyway, thanks! Joachim -- TFMotD: pci_make_tag, pci_decompose_tag, pci_conf_read, pci_conf_write (9) - PCI config space manipulation functions
Re: Webservers with Terabytes of Data - recommended setups
Henning Brauer wrote: * Joachim Schipper [EMAIL PROTECTED] [2007-04-20 14:49]: On Fri, Apr 20, 2007 at 12:36:29PM +0200, Henning Brauer wrote: * Joachim Schipper [EMAIL PROTECTED] [2007-04-20 00:36]: On Thu, Apr 19, 2007 at 10:51:56PM +0100, Stuart Henderson wrote: I don't think NFS/AFS is that good an idea; you'll need very beefy fileservers and a fast network. NFS may actually be useful; if you really need the files in one directory space for management/updates that's a way to do it (i.e. mount all the various storage servers by NFS on a management station/ftp server/whatever). Something like that might be a very good idea, yes. Just don't try to serve everything directly off NFS. there is nothing wrong with serving directly from NFS. Really? You have a lot more experience in this area, so I will defer to you if you are sure, but it seems to me that in the sort of system I explicitly assumed (something like a web farm), serving everything off NFS would involve either very expensive hardware or be rather slow. no. cache works. reads are no problem whatsoever in this kind of setup (well. I am sure you can make that a problem with many frontend servers and lots to read. obviously. but for any sane number of frontends, should not)

OK, then how well does CARP work for NFS failover, mounting from a backup in case something goes wrong with the main NFS server? Is it efficient, is it even possible, does the mount recover by itself? What kind of delay? What do you consider a sane number of front ends, 10, less, more? Cache, you mean cache on the NFS source, or cache on the NFS client? Sorry, looks like I have more questions than answers, as I skipped NFS a few years ago because of the bottleneck on the NFS transfer. Write was bad, read OK, but not huge. It may well be different now; I would be happy with decent read performance, but what can be expected? The archive is not too nice on the subject, I have to say. It always looks like a bottleneck on the NFS side. For a small site, or low traffic, yes that's great, but what can one expect before reaching the limits here? Any ideas? Maybe it's time for me to revisit this yet again, but I've never been very successful with high traffic. Many thanks Daniel
Re: Webservers with Terabytes of Data - recommended setups
On Friday 20 April 2007 08:32, Tony Abernethy wrote: Jason Beaudoin wrote: snip Use all the tricks you can for YOUR solution, including: * lots of small partitions What is the reasoning behind this? Thanks for the awesome post! I think it runs something like this: if there is a problem somewhere on the disk and it's all one big partition, you must fix the big partition; if it's lots of small partitions, you fix the one with the problem. Even worse, in some situations the difference is between being dead and being somewhat crippled. Methinks there's lots of hard-won experience behind Nick's answers ;)

Your last assumption is the most correct, and Nick has put some of that experience into FAQ-14 for our reading pleasure. In general, you always want to assume a failure *WILL* occur, rather than think in terms of if something will fail. Having lots of small partitions, and using read-only partitions wherever possible (also mentioned by Nick), gives you a number of important advantages.

Assume that someone, possibly you, has managed to trip over the power cord; how long will it take you to get the server back up? If your partitions are read/write, then you will be doing an fsck on each of them. That means time. If your partitions are huge, then you will need a lot of RAM and time to perform the fsck. If you have a massive partition and insufficient RAM, then your fsck will fail (see FAQ-14.7, fsck(8) time and memory requirements) and you'll be stuck like a turtle on its back at a soup competition. The above is just your start-up time after a crash or power loss.

Assume that someone, possibly you, has written some bad code that will scribble all over the data in one of your partitions. How long will it take you to recover? If the partition was marked RO, then you don't have a problem. If it was a small RW partition, you can repair it reasonably quickly from backup. If your backup media fails, your losses are minimal. By comparison, if it's a huge RW partition, then you're stuffed.

The list of reasons goes on and on, but when you really think about it, you'll understand that you're just doing proper risk management by trying to mitigate as many of the bad effects of failures as possible. Never drink the marketing kool-aid that will try to sell you on the idea that failures are somehow avoidable. Sure, it might sound like a nice idea, but the idea always falls short of reality. Being prepared for the reality of failures is a much better approach than sticking your head in the sand. /jcr
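As a concrete illustration of the read-only point, a static-content partition can live in fstab as ro and be flipped to read-write only for planned updates. A sketch (the device, mount point and options are hypothetical):

    # /etc/fstab excerpt
    /dev/wd0g /data/static ffs ro,nodev,nosuid 1 2

    # during a planned update:
    mount -u -o rw /data/static
    # ...copy the new content in...
    mount -u -o ro /data/static

After a crash, the read-only partitions are still clean, so their fsck is essentially instant and the box is back on the air while only the small read-write partitions are being checked.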
Re: Webservers with Terabytes of Data - recommended setups
J.C. Roberts wrote: On Friday 20 April 2007 08:32, Tony Abernethy wrote: Jason Beaudoin wrote: snip Use all the tricks you can for YOUR solution, including: * lots of small partitions What is the reasoning behind this? Thanks for the awesome post! I think it runs something like this: if there is a problem somewhere on the disk and it's all one big partition, you must fix the big partition; if it's lots of small partitions, you fix the one with the problem. Even worse, in some situations the difference is between being dead and being somewhat crippled. Methinks there's lots of hard-won experience behind Nick's answers ;)

yeah, though fortunately most of it was in the form of confirmation of already-held paranoia. :)

Your last assumption is the most correct, and Nick has put some of that experience into FAQ-14 for our reading pleasure.

In addition to Tony and J.C.'s comments (I've edited them out for size, go back and read 'em if you haven't), let me add another really big reason: growth and scalability.

The usual logic goes something like this: I need a lot of space, so I'm going to build a file system that has a lot of space in it, and you drop all that space into one file system. Efficient? For a while, yes. BUT, what about when it fills up? Usual response: use a volume manager, or dump the data to a new, bigger disk system. OK, the ability of some volume managers to dynamically increase the size of a file system is kinda cool, but I would argue that for many apps it is just another way of saying "The initial design SUCKED and I had more money than brains to fix the problem" (assuming one of the commercial products, of course). Somewhat of an oversimplification, of course...but...

Dumping the data from one disk to another is fine and dandy when you are talking about the 40G disk in your home or desktop computer; the fact that you are down for a few hours is no big deal. But what about a server? I don't care how fast your disks are, moving 300G of data to a new disk system is a lot of slow work.

Here's a better idea: break your data into more manageable chunks, and design the system to fill those chunks AND make it easy to add more later. So, you implement today with 1TB of data space, broken up into two 500G chunks. Fill the first one, move on to the second one. Fill the second one, you bolt on more storage -- a process which will probably take minutes, not hours. And when you bolt on more storage, you will be doing it in the future, when capacity is bigger and cost is less.

Let's look at the machine I mentioned yesterday, our e-mail archive system:

disks:
Filesystem   1K-blocks      Used     Avail Capacity  Mounted on
/dev/wd0a       199358     49742    139650    26%    /
/dev/wd0e      1030550         6    979018     0%    /tmp
/dev/wd0d      4126462   2877500   1042640    73%    /usr
/dev/wd0f      1030550    197942    781082    20%    /var
/dev/wd0h      3093790   1989324    949778    68%    /home
/dev/wd0g    154803456  18471934 128591350    13%    /archive
/dev/wd2e    288411108 262462948  11527606    96%    /archive/a03
/dev/wd2f    288408068 264898440   9089226    97%    /archive/a04
/dev/wd3e    480678832 442797322  13847570    97%    /archive/a05
/dev/wd3f    480675792 440723042  15918962    97%    /archive/a06
/dev/wd4e    480678832 439989958  16654934    96%    /archive/a07
/dev/wd4f    480675792 443581618  13060386    97%    /archive/a08
/dev/wd1e    480678840  19931182 436713716     4%    /archive/a09
/dev/wd1f    480678368         2     45668     0%    /archive/a10

Look that over carefully, you can almost see the story of the machine's design. wd0 is a mirrored pair of 300G SATA drives (Accusys 75160). Note that only a little more than half the drive is allocated at this time! Why? Because there's no reason to wait for an fsck on 300G when 160G is plenty. And besides, I may have guessed wrong in how big I made /var or /tmp or ...

wd2 is a RAID5 set of 300G SATA drives (Accusys 76130). Why? Because it was the biggest bang for the buck at the time, split down the middle for manageability. wd1 also started out as 300G drives, but has since been replaced by the now cheaper-per-gigabyte 500G drives. It has only just started being used, a couple of days ago. wd3 and wd4 are also 1TB arrays made up of three 500G drives each. They were purchased after the original 300G drives were getting full. Funny how that works: the 500G drives we just purchased (a09 and a10) cost less than the 300G drives we installed originally. Delaying purchasing storage until you need it is a good thing!

The suspiciously missing a01 and a02 partitions are now sitting on a shelf, as they have been removed from the system. It is relatively unlikely that we will need to go back to those, but we hang on to 'em, Just In Case (and it is cheaper to hang onto three 300G SATA drives now than it is to restore from DVD if we were to need to). Granted, in five years, those drives may not spin up, nor may
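For what it's worth, bolting on one of those chunks really is only a few minutes of work; roughly something like this on OpenBSD (a sketch only; the device name, partition letter and mount point are invented for the example):

    # the new array appears as wd5
    fdisk -iy wd5
    disklabel -E wd5                 # carve out an 'e' partition covering the space
    newfs wd5e
    mkdir /archive/a11
    echo '/dev/wd5e /archive/a11 ffs rw,nodev,nosuid 1 2' >> /etc/fstab
    mount /archive/a11

and none of the existing data ever has to move.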
Re: Webservers with Terabytes of Data - recommended setups
On Wed, Apr 18, 2007 at 03:22:07PM +0530, Siju George wrote: Hi, how do you handle it when you have to serve terabytes of data through http/https/ftp etc.? Put it on different machines and use some kind of loadbalancer/intelligent program that directs to the right machine? Use some kind of clustering software? What hardware do you use to make your system scalable from a few terabytes of data to a few hundred of them? Does OpenBSD have any clustering software available? Is anyone running such setups? Please let me know :-) I don't really know, but how about some HTTP proxy (hoststated comes to mind; Pound or Squid would also work) and a lot of hosts each serving a subset of the total behind that? Yes, that's exactly what you said. I don't think NFS/AFS is that good an idea; you'll need very beefy fileservers and a fast network. Maybe rsync'ing from a central fileserver would work? However, there are a lot of specialized solutions available (various SANs come to mind; Google has published several papers on filesystems and algorithms like MapReduce, although the latter isn't going to help you for serving HTTP). All in all, though, I think the most important considerations are the rate of change and the reliability requirements. A big web host might hit an impressive amount of data, but it doesn't change all that often, and a site occasionally going offline is usually tolerated (just restore a recent backup). In such cases, something like the above seems to work. Joachim -- TFMotD: moduli (5) - system moduli file
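The rsync idea is about as simple as it gets; each front end could pull from the master with something along these lines (the hostnames and paths are made up for the example):

    # from cron on each front end: mirror the master's document tree,
    # deleting files that have gone away on the master
    rsync -a --delete master.example.com:/export/htdocs/ /var/www/htdocs/

run every few minutes or kicked off from the master after each content update. It costs a full copy of the data on every front end, but it takes the fileserver out of the serving path entirely.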
Re: Webservers with Terabytes of Data - recommended setups
I don't think NFS/AFS is that good an idea; you'll need very beefy fileservers and a fast network. NFS may actually be useful; if you really need the files in one directory space for management/updates, that's a way to do it (i.e. mount all the various storage servers by NFS on a management station/ftp server/whatever). For serving content, some HTTP-based scheme to get the requests to hit the right server is probably in order. Proxies are useful if you have special requirements (for example SSL, where it doesn't make sense to have the CPU and the disk in the same place), but it normally makes more sense to distribute the requests to the correct server/s in the first place: either by front-ends that know the location of content and send a Location: header (if you want to give out URLs with a single server name), or by the HTML pointing clients to the files on the right servers. various SANs come to mind TFMotD: fsck(8) (-: Relying on black-box vendors for fixes is an additional bonus. Works for some people, though. Allegedly.
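As a sketch of the Location: idea, a front end running Apache (in the OpenBSD base system at the time of writing) only needs to know which box owns which slice of the tree; the hostnames, paths and two-way split below are invented for the example:

    # httpd.conf on the front end: bounce each request to the
    # storage server that actually holds the files
    Redirect temp /files/vol1 http://store1.example.com/files/vol1
    Redirect temp /files/vol2 http://store2.example.com/files/vol2

so the published URLs all carry the front end's name, while the bytes come straight off whichever storage box owns them.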
Re: Webservers with Terabytes of Data - recommended setups
Stuart Henderson wrote: I don't think NFS/AFS is that good an idea; you'll need very beefy fileservers and a fast network. NFS may actually be useful; if you really need the files in one directory space for management/updates that's a way to do it (i.e. mount all the various storage servers by NFS on a management station/ftp server/whatever). Good idea yes, but if I recall properly, unless major changes have been done, doesn't the use of NFS become a huge bottleneck compared to a local drive? I think the archive is full of complaints about the throughput of NFS not being so good. Am I wrong here? I would love to use NFS as well for multiple servers accessing one source, but so far it has always seemed not so good to do that. If that's wrong, please correct me, as I would love to know whether that is still the case or not. Best, Daniel
Re: Webservers with Terabytes of Data - recommended setups
On Thu, Apr 19, 2007 at 10:51:56PM +0100, Stuart Henderson wrote: I don't think NFS/AFS is that good an idea; you'll need very beefy fileservers and a fast network. NFS may actually be useful; if you really need the files in one directory space for management/updates that's a way to do it (i.e. mount all the various storage servers by NFS on a management station/ftp server/whatever). Something like that might be a very good idea, yes. Just don't try to serve everything directly off NFS. (An even better idea might be setting up a repository for your favourite version control system and making partial checkouts. It gets you most of the benefit of a unified filesystem, at the cost of complex - and thus fragile - checkin hooks. On the other hand, version control is likely to be a big plus.) For serving content, some HTTP-based scheme to get the requests to hit the right server is probably in order. Proxies are useful if you have special requirements (for example SSL, where it doesn't make sense to have the CPU and the disk in the same place), but it normally makes more sense to distribute the requests to the correct server/s in the first place: either by front-ends that know the location of content and send a Location: header (if you want to give out URLs with a single server name), or by the HTML pointing clients to the files on the right servers. I think doing that in HTML will quickly become an administration nightmare. various SANs come to mind TFMotD: fsck(8) (-: Relying on black-box vendors for fixes is an additional bonus. Works for some people, though. Allegedly. Yeah, they seem to work. It wouldn't be my first choice, either, but I've never tried to run OpenBSD in this kind of environment. At least a good, expen$$$ive SAN is good for covering your backside. Joachim -- TFMotD: perl561delta (1) - what's new for perl v5.6.x
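With CVS (already in the OpenBSD base system) as the favourite version control system, a partial checkout might look roughly like this; the repository host, module and paths are invented for the example:

    # one-time, on each front end: check out only the slice this box serves
    cd /var/www && cvs -d mgmt.example.com:/cvs checkout -d htdocs site/htdocs
    # after each content commit (from cron or a commit hook):
    cd /var/www/htdocs && cvs up -dP

The commit hooks that trigger those updates are where the "complex, and thus fragile" part tends to live.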
Re: Webservers with Terabytes of Data - recommended setups
This isn't an OpenBSD-specific solution, but you should be able to use an EMC SAN to accomplish this (we use a Fibre Channel setup) On 4/19/07, Stuart Henderson [EMAIL PROTECTED] wrote: On 2007/04/19 18:08, Daniel Ouellet wrote: Stuart Henderson wrote: I don't think NFS/AFS is that good an idea; you'll need very beefy fileservers and a fast network. NFS may actually be useful; if you really need the files in one directory space for management/updates that's a way to do it (i.e. mount all the various storage servers by NFS on a management station/ftp server/whatever). Good idea yes, but if I recall properly, unless major changes have been done, doesn't the use of NFS become a huge bottleneck compared to a local drive? I think the archive is full of complaints about the throughput of NFS not being so good. I meant using it the other way round: have the *webservers* export their filesystems, and ftp/management servers mount them to provide a single space for carrying out updates and backups, locating files, etc. Having a bunch of webservers serve data from a large NFS store seems less attractive for most of the cases I can think of. The main one I see where it may be attractive is where heavy CGI processing or similar is done (that's usually a different situation to having many TB of data, though). In the CGI case, there are some benefits to distributing files another way (notably avoiding the NFS server as a point of failure); rsync, as Joachim mentioned, is one way to shift the files around. CVS is also suitable; it encourages keeping tighter control over changes too, and isn't difficult to learn.
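In that other-way-round setup, each webserver's /etc/exports stays tiny; something like the following (hostnames and paths invented for the example):

    # /etc/exports on each webserver: only the management box may mount the docroot
    /var/www/htdocs -maproot=root mgmt.example.com

and on the management box:

    mount -t nfs web01:/var/www/htdocs /mnt/web01
    mount -t nfs web02:/var/www/htdocs /mnt/web02

which gives one place from which to run backups and push updates, while no webserver depends on an NFS server being up in order to answer requests.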
Re: Webservers with Terabytes of Data - recommended setups
Siju George wrote: Hi, how do you handle it when you have to serve terabytes of data through http/https/ftp etc.? Put it on different machines and use some kind of loadbalancer/intelligent program that directs to the right machine? Use some kind of clustering software? What hardware do you use to make your system scalable from a few terabytes of data to a few hundred of them? Does OpenBSD have any clustering software available? Is anyone running such setups? Please let me know :-) Thank you so much. Kind Regards Siju

Too open-ended a question... Are you talking about many TB on one site? Lots of sites? Is there some reason it has to be on one server or one site? Is this huge storage, huge demand? Huge storage, low demand? Is this storage all needed on day 1, or will it grow with time? (hint: if it grows with time, build for NOW, with the ability to add later; don't buy storage in advance!) etc. Let the answers to those questions guide your engineering work, don't rely on knee-jerk reactions. And don't be afraid to change the question to meet available answers. :) A common error is to take the given proposed solution (posed as a problem, but often someone has digested the REAL problem into what they think is the only possible model, and sent you down a bad alley) as gospel, and never question the basic assumptions.

I've got a web server with over 3.5TB of storage on it that cost about $6000US a year or so ago. It's a huge-storage, low-demand app; it probably gets on average a query a day, if that. If the box breaks, time can be spent repairing it, but we don't want to lose the data (it's carefully backed up, but the backup media is so compressed, it takes longer to uncompress the files than it does to scp them back into the box!). So, the thing has redundancy where it counts (disk) and simplicity where it doesn't matter, and it can be upgraded, enhanced and changed as needed. And, we have a small enough amount invested in the thing that we can completely change our mind about the approach to the problem any time in the future and throw it all away with a very clear conscience. (My current boss-of-the-week thinks he wants to replace this with an unknown proprietary app feeding a $30,000 per-processor database server attached to a $60,000 disk array, so you can see how insignificant the price tag on this system is. You can also see something about my boss. And why I'm looking for a better job.)

Let's say you have one website that you are trying to serve massive amounts of static files from. I presume you aren't just dropping people at the root of a massive directory tree and letting them dig for their desired file... you probably have some kind of app directing them to the file they need. Well, you should have no problem also directing them to the SERVER they need... do a little magic on the front-end machine and you can implement massive amounts of redundancy for very low cost. For example, if you have two machines, A and B, skip RAID, just put both data sets on both machines (a minimal sketch follows below). If you lose A, serve A's files from B; it's a little slower, but still working. Repair A, resync (if needed) and you are back up and running at 100%. Now you can use the absolutely cheapest and least redundant machines around to accomplish your task. (In this case, your front-end machines would have to be a little more sophisticated... but they should still have multiple-machine redundancy.)

SANs are the cool way to do this, of course. Also a very expensive way... and something I'd try to avoid unless it was really needed. Design it simple, design it to be fixable WHEN it breaks, and you will save your hair...

Use all the tricks you can for YOUR solution, including:
* lots of small partitions
* RO any partitions you can (no need to fsck after an oops)
* Assume you will need more storage later, and figure out how to add it without removing data from your existing storage.
* Assume your existing 500G disk is going to look pathetic in a few years when 10TB microdrives are in your palmtop computer, and make sure you have a plan to migrate the data off those first disks you installed.
* Guess how much processor you need, and figure out how to deal with it when you are wrong.
* Keep in mind that if you don't expect lots of demand this year, next year's systems will be a lot faster, bigger and cheaper.
* Last year's computers loaded with modern disks are still pretty darned fast for many applications.

Nick.
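A minimal version of that two-machine idea, assuming plain rsync over ssh and invented hostnames/paths:

    # on A, from cron (and the mirror image on B): keep a current copy of B's data set
    rsync -a --delete serverB.example.com:/data/setB/ /data/setB-copy/

with the front end taught that setB normally lives on B, and on A's copy whenever B is down.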
Webservers with Terabytes of Data - recommended setups
Hi, how do you handle it when you have to serve terabytes of data through http/https/ftp etc.? Put it on different machines and use some kind of loadbalancer/intelligent program that directs to the right machine? Use some kind of clustering software? What hardware do you use to make your system scalable from a few terabytes of data to a few hundred of them? Does OpenBSD have any clustering software available? Is anyone running such setups? Please let me know :-) Thank you so much. Kind Regards Siju