On Friday 14 July 2006 01:14 pm, Rob Ross wrote:
> Hi David,
>
> I need a little better idea of the entire workload to really answer your
> question, but I can talk a little about pros and cons at least.
>
> First, we've actually run a CD farm on top of PVFS (well PVFS1) before.
> Long ago we had a system at Clemson that ripped CDs onto PVFS and then
> scheduled encodings on various cluster nodes using a job scheduler (PBS
> at the time). A pretty cheezy version of what you're talking about, but
> it was cool :). Also, it was relatively fast, once we got cdparanoia to
> write in multi-KB blocks (a patch they accepted long ago).

Basically the same thing then.

> The first thing to think about is reliability. As long as you have
> redundant storage on the various servers, and your rebuild times aren't
> too long, you will not lose data. If a server fails for some reason, you
> will lose access to data until you get the server running again, but the
> data will still be there. There are ways to maintain access in the event
> of server failure, but those require SAN hardware that you might not
> already have in-house.

Aha, this is about what I thought. Our current plan is to use commodity RAID
boxes with 32x750 GB drives, starting out with 6 servers, and then adding more
machines when needed. (Another question -- if you do a 'mv stuff stuff', will
that cause the file to be re-saved across the newly enlarged fs? I think I saw
this trick on the Lustre list; would that work with PVFS2?) SAN is definitely
something to think about, but we don't do anything like that yet.

> Alternatively you could just split the space up into two volumes and
> rsync or something to mirror, up to you. It sounds like this would be a
> viable model for you, and disk is cheap.

And so is my boss. :)

> Access from windows is going to require exporting with samba or NFS. We
> don't really suggest that usually and don't test it in-house, but it
> should work. Access will be relatively slow because of the lack of
> client caching in PVFS.

The encoding machines (for now) copy files to their own FS, encode, then copy
all the resulting files back for storage (they do all this in parallel). I
thought about this before I sent an email, and I assume I can still get some
increase in throughput if the encoding machines talk to separate PVFS clients
(say, encoders 1-5 talk to PVFS client #1, 6-10 talk to client #2, etc.). If
this is bogus, just say so. :) I realized this was an issue, but as long as I
can get the daily encoding done, there are other benefits that outweigh this
problem.
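
To make the grouping concrete, here is a minimal sketch of the split described above. It assumes a fixed group size of five and 1-based numbering for both encoders and clients; `client_for_encoder` is a hypothetical helper name used only for illustration and is not part of PVFS.

```python
def client_for_encoder(encoder_id: int, encoders_per_client: int = 5) -> int:
    """Return the 1-based PVFS client number for a 1-based encoder number.

    Assumption: encoders are grouped in fixed-size blocks, so encoders 1-5
    go to client #1, 6-10 to client #2, and so on.
    """
    if encoder_id < 1:
        raise ValueError("encoder_id must be >= 1")
    if encoders_per_client < 1:
        raise ValueError("encoders_per_client must be >= 1")
    return (encoder_id - 1) // encoders_per_client + 1


# Build the assignment table for the first ten encoders.
assignments = {n: client_for_encoder(n) for n in range(1, 11)}
```

Whether this actually raises throughput depends on where the bottleneck is, which is the open question here.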

> If the majority of your I/O traffic is to and from windows boxes, I
> would say you should probably find something with better windows support
> or that caches on clients so as to get better performance via samba or
> NFS. If a significant part of your I/O traffic is on the linux side,
> then I think PVFS might make sense for you.

It's definitely significant. We do some of the encoding in Unix, some in
Windows (because we have a lot of storage machines that basically just sit
there, so it's nice to have them do something). We fill a lot of hard drives,
and I want to start using Linux for that. (All the downstream companies from
us want NTFS, so I will have to use Captive/Linux-NTFS...)

> Regards,
>
> Rob
>
> David Case wrote:
> > I am looking at PVFS to replace a system of 45 machines holding about
> > 100tb of data, where we currently use nfs and a bunch of symlinks to keep
> > track of everything. When I was given this thing to administer, I was
> > able to parallelize parts of it, but having everything under one
> > filesystem would be really really nice.
> >
> > This system is basically the music equivalent of a render farm -- we get
> > about 60-100 cds a day, we encode them in a lossless compressed format,
> > then have a set of windows machines that run the various encodings (mp3,
> > aac, wma, etc... some of which are unavailable on Linux). And then the
> > files get delivered to a bunch of downstream partner companies (probably
> > 70 or so). We keep a copy of all the encodings we do (23 different
> > encodings at this time, more soon). So ideally we need a fast parallel
> > system that can also serve as an archive (because when new partner
> > companies come in, we give them the whole catalog).
> >
> > If we had a catastrophic loss of all the data, I would probably lose my
> > job, but occasional partial losses are recoverable (we keep a store copy
> > of every CD, and we have recovered from losing 10,000 albums in a fairly
> > short amount of time)
> >
> > So do you think PVFS is suitable? I saw a post on this list that it is
> > best suited for use as a fast scratchpad. I really like PVFS over Lustre
> > just from looking at it -- a kernel module is vastly more palatable than
> > patching the kernel, plus the whole thing is totally free and open,
> > unlike Lustre.
> >
> > What do you think?

--
David Case
Digital Distribution Wrangler
[EMAIL PROTECTED]
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
