Offlist exchange forwarded with permission.

----- Forwarded message from Kyle Schochenmaier <[EMAIL PROTECTED]> -----
From: Kyle Schochenmaier <[EMAIL PROTECTED]>
Date: Wed, 12 Nov 2008 16:37:39 -0600
To: Eugen Leitl <[EMAIL PROTECTED]>
Cc: Troy Benjegerdes <[EMAIL PROTECTED]>
Subject: Re: [Pvfs2-users] linux vserver and PVFS2

To answer a couple of your more basic questions about the filesystem:

Performance: SATA disks provide great throughput as long as you don't
overload them with I/Os. If your workload is fairly sequential, they will
perform on par with FC disks.

In my experience, scaling depends mostly on the disks and the network, not
so much on the filesystem. Once you have lots of spindles moving and a
really fast network (InfiniBand, MX, 10GbE), the filesystem's performance
comes into play a bit more, and there are performance tweaks that can be
made at that point. Also, for systems where the disks have more bandwidth
available than the network, scaling seems to be really good - in some cases
nearly linear - and again you'd be limited by network rates.

On 4 bonded GigE connections, I would expect you to be limited by network
performance, since each node you add contributes another 50-80 MB/s. But I
may not exactly understand your setup - if you have 4x GigE per node, that's
a lot different.

I'll leave the DRBD stuff for Troy ;-)

Typical failover/HA setups for PVFS2 are handled by a daemon called
Heartbeat, and for it to work you basically need multiple physical nodes
with access to the same LUNs (disks). I don't know if that is simple to
implement without SRP/iSCSI/etc. There is no mirroring done inside PVFS2;
I've looked into it, and several others have looked into it and provided
proofs of concept, but none of that work has seen the light of day.

Hope this helps a little.

~Kyle

Kyle Schochenmaier

On Wed, Nov 12, 2008 at 3:09 PM, Eugen Leitl <[EMAIL PROTECTED]> wrote:
> On Wed, Nov 12, 2008 at 12:19:21PM -0600, Troy Benjegerdes wrote:
>> What's your application?
>
> Hosting some ~100 vservers/node.
>
>> I was just looking at infiniband card prices, and it might cost you less
>> than 4xGigE to get a 24 port IB switch and these cards..
>
> The 4 ports are onboard a 407 EUR Sun Fire X2100 M2 kit. GBit switches
> are almost free, too, and having several switches offers some redundancy.
>
>> http://www.colfaxdirect.com/store/pc/viewPrd.asp?idcategory=6&idproduct=12
>>
>> For tolerating node failures, I would do some sort of software mirroring
>> across nodes, using either something like DRBD, or Infiniband SRP.
>
> I've used DRBD, but then I'd be wasting even more active nodes.
> It would be nice to mix striping and mirroring at the PVFS2 level.
>
>> Eugen Leitl wrote:
>> >I'm planning to eventually operate a 20+ node cluster of Debian boxes
>> >(2-4 cores AMD64, 8 GByte RAM, about 1-2 TByte RAID 1, 4x GBit Ether
>> >interfaces, probably with jumbo frames) with a unified filestore.
>> >
>> >A few questions I've been unable to answer by searching:
>> >
>> >Can I make Linux vserver guests' fs live on PVFS2, if I don't use
>> >unification
>> >(http://linux-vserver.org/Frequently_Asked_Questions#Unification)?
>> >
>> >How much aggregate throughput can I expect with some 20
>> >nodes, with a modern 7 krpm SATA drive (RAID 1 pair, about
>> >80 MByte/s sustainable throughput, or so)?
>> >
>> >Is there a way to set up PVFS2 to tolerate 1-2 node losses on
>> >the above 20-node assembly, and how much of the raw storage would I lose
>> >that way?
>> >
>> >Thanks,
>> >
>
> --
> Eugen* Leitl <a href="http://leitl.org">leitl</a> http://leitl.org
> ______________________________________________________________
> ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
> 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE

----- End forwarded message -----

--
Eugen* Leitl <a href="http://leitl.org">leitl</a> http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
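A rough back-of-the-envelope sketch (Python) of the scaling picture Kyle
describes above, under assumptions taken loosely from the thread rather than
from measurements: each GigE link carries roughly 117 MB/s of usable payload,
a 4-link bond aggregates roughly linearly, and each PVFS2 server node
contributes about 50-80 MB/s from its RAID-1 SATA pair. The function name and
exact numbers are illustrative only.

# Sketch: how fast a single client behind 4 bonded GigE links could read
# from N PVFS2 servers, if each server adds ~50-80 MB/s until the bond
# saturates. All figures are assumptions, not benchmark results.

def single_client_throughput(servers, per_server_mbs=(50.0, 80.0),
                             bonded_links=4, link_mbs=117.0):
    """Return (low, high) estimated MB/s one client sees from `servers` nodes."""
    ceiling = bonded_links * link_mbs          # ~470 MB/s for 4x bonded GigE
    low = min(servers * per_server_mbs[0], ceiling)
    high = min(servers * per_server_mbs[1], ceiling)
    return low, high

if __name__ == "__main__":
    for n in (4, 8, 12, 20):
        lo, hi = single_client_throughput(n)
        print(f"{n:2d} servers: ~{lo:.0f}-{hi:.0f} MB/s to one client")

With these numbers, the 4x GigE bond becomes the bottleneck somewhere around
6-9 servers, which is the point Kyle is making about being network-limited.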

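A similarly rough sketch for the last question in the thread (what tolerating
1-2 node losses costs in raw storage), assuming PVFS2 itself does no
replication, as Kyle notes, and that redundancy instead comes from mirroring
every block across 2 or 3 nodes, DRBD-style. The 1.5 TB per-node figure is
just an illustrative midpoint of the 1-2 TB mentioned above.

# Sketch: usable capacity of a 20-node cluster when every block is stored
# on k nodes. k copies survive any k-1 node failures in the worst case
# (i.e. even if the failed nodes all hold replicas of the same data).

def usable_capacity(nodes, tb_per_node, copies=2):
    """Usable TB when every block is replicated to `copies` nodes."""
    return nodes * tb_per_node / copies

if __name__ == "__main__":
    nodes, tb_per_node = 20, 1.5        # ~1-2 TB usable per node after local RAID 1
    raw = nodes * tb_per_node
    for copies in (2, 3):               # survive 1 or 2 node losses
        use = usable_capacity(nodes, tb_per_node, copies)
        print(f"{copies} copies: {use:.0f} of {raw:.0f} TB usable "
              f"({use / raw:.0%}), tolerates {copies - 1} node loss(es)")

So under these assumptions, tolerating any single node loss costs half the raw
capacity, and tolerating any two losses costs two thirds, on top of the
capacity already spent on the per-node RAID 1 pairs.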