>>>>> "Ski" == Ski Kacoroski <[email protected]> writes:
Ski> John,

Ski> Thanks for your reply...

You're welcome. Always glad to help.

Ski> On 01/29/2010 09:12 AM, John Stoffel wrote:
>>
Ski> I have been tussling with a SAN problem for several weeks now and
Ski> would like comments from you folks on it.
>>
>> This doesn't sound so much like a SAN problem, but a replication
>> problem between your Equallogic boxes.
>>
Ski> Situation: Equallogic iSCSI SAN. Primary site has a 2TB volume
Ski> (1.7TB used per the SAN, 1.5TB used per the operating system).
Ski> The volume is used as the datastore for a Scalix (used to be HP
Ski> OpenMail) mail server. File system is ext3 mounted with defaults
Ski> and yes, I plan to redo this over the weekend. It has about 30
Ski> million files, of which 60% are less than 4K in size. Equallogic
Ski> uses a 64K stripe on its arrays with a 256K block size (not
Ski> changeable). DR site has 6TB allocated for replicas.
>>
>> How are you doing the replication? Block level? Rsync?

Ski> Block level.

I suspected that, since it's the only efficient way to do the copy...

Ski> My primary problem is that the replication keeps failing because
Ski> it runs out of space, even though I have 6TB available. I can do
Ski> the first replica, and sometimes a second or third, but then it
Ski> starts failing due to lack of space. Change amounts range from
Ski> 200 to 500GB. Even with that I should be able to create a few
Ski> replicas into 6TB (I would think).
>>
>> I'm not totally surprised, since it sounds like you're doing block
>> level replication here, and since your files are all so much smaller
>> than the minimum block size, you're having problems: when only 4K of
>> a block changes, it has to send the entire 64K stripe or 256K block
>> over to the replica system.
>>
>> Does the initial replica take only 2TB of space? And then the
>> follow-ons take lots more than the size of the changed files would
>> suggest?

Ski> Yes.

Ski> What have you experienced with SANs and applications that have
Ski> millions of small files? What tricks did you use to make them
Ski> work? Am I barking up the wrong tree and need to go in a totally
Ski> different direction?
>>
>> I think you'll need to bite the bullet and do some sort of per-file
>> replication, just because your usage is killing your SAN replication.
>> I assume the Scalix mailstore is in maildir format, with each message
>> in its own file? Not fun.

Ski> No, the application basically creates a massive linked list on
Ski> disk, with each email broken into several smaller files (headers,
Ski> body, wrapper, attachments, etc.). The data store has 960
Ski> directories which each have around 47000 files (some have less).

So go take the application vendor outside and shoot them. :] It's a
piss-poor design, esp with memory being as cheap as it is these days
for stuff like this. They should have used a proper DB instead.

>> I'm in a Netapp world these days, and while I do replication of
>> volumes with lots and lots of small files, it's not at the level of
>> churn you're at, nor is it important that I keep multiple snapshots
>> around.
>>
>> Turning off atime updates in ext3 might be a good first step;
>> anything you can do to limit changes to the filesystem would be a
>> good thing.
>>
>> If you can break your filesystem down into smaller sub-units, that
>> might let you do rsync style file level scans more efficiently. Or
>> maybe you just do the initial replica using the block level stuff,
>> THEN do a file level scan on the replica so you don't impact your
>> production box and keep copies there.
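(To put some very rough numbers on that amplification point, by the
way: I'm only guessing at the churn, but if a day's worth of delivery
and expiry touches on the order of a million of those small files, and
each one dirties a different 256K block, that's roughly 1,000,000 x
256KB, or about 250GB, marked as changed, even though only a few GB of
actual mail data moved. Which lines up pretty well with the 200 to
500GB change rates you're seeing.)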
Ski> I tried once using rsync and after several days gave up because
Ski> of so many files. Perhaps I should try a block level initial
Ski> replication, then write some sort of parallel rsync that will do
Ski> small directories at a time.

It's not going to be pretty, but running a group of 10 rsyncs across
those 960 directories in parallel might be your best option (rough
sketch at the end of this mail). Still sucks. And it will put a huge
load on your array as well.

You might be able to tweak the ext3 settings so that it knows it's
running on a 256K block size device, so that it will lay stuff out on
those boundaries, which might be a big help. Something like this
(stride is counted in filesystem blocks, so with 4K blocks a 256K
boundary is stride=64, and -J size is in megabytes):

  mkfs -t ext3 -E stride=64 -J size=32 -O dir_index /dev/lun...

You might also look at the -G parameter, so that filesystem metadata
is more grouped, to hopefully keep the number of changed blocks to a
minimum. Moving to ext4 with extents might be a better option though.

Can you test this out with a small test case? That would be the best
thing to do.

Good luck!
John
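P.S. Here's a rough, untested sketch of the kind of parallel rsync I
mean. The mount points are just placeholders, and you'd want to tune
the -P concurrency to whatever your array can stand:

  #!/bin/sh
  # Run one rsync per top-level mailstore directory, at most 10 at a
  # time, so the 960 directories get copied in parallel batches.
  # (Assumes no whitespace in the directory names.)
  SRC=/mnt/scalix-store   # source copy (or the block-level replica)
  DST=/mnt/dr-store       # DR-side filesystem
  ls "$SRC" | xargs -P10 -I{} rsync -a --delete "$SRC/{}/" "$DST/{}/"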
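P.P.S. On the atime suggestion above: remounting with noatime is a
one-liner you can try right away (mount point is a placeholder again);
add noatime to the options in /etc/fstab as well so it survives a
reboot:

  mount -o remount,noatime /mnt/scalix-store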
