Ski> I have been tussling with a SAN problem for several weeks now and
Ski> would like comments from you folks on it.

This doesn't sound so much like a SAN problem as a replication
problem between your Equallogic boxes.

Ski> Situation: Equallogic iscsi san. Primary site has a 2TB volume
Ski> (1.7TB used per the SAN, 1.5TB used per the operating system).
Ski> The volume is used as the datastore for a Scalix (used to be HP
Ski> OpenMail) mail server. File system is ext3 mounted with defaults
Ski> and yes I plan to redo this over the weekend. It has about 30
Ski> million files of which 60% are less than 4K in size. Equallogic
Ski> uses a 64K strip on its arrays with a 256K block size (not
Ski> changeable). DR site has 6TB allocated for replicas.

How are you doing the replication? Block level? Rsync?

Ski> My primary problem is that the replication keeps failing for
Ski> running out of space even though I have 6TB available. I can do
Ski> the first replica, and sometimes a second or third, but then it
Ski> starts failing due to lack of space. Change amounts range from
Ski> 200 to 500GB. Even with that I should be able to create a few
Ski> replicas into 6TB (I would think).

I'm not totally surprised, since it sounds like you're doing block
level replication here. Your files are all so much smaller than the
minimum block size that when only 4K of a block changes, it has to
send the entire 64K stripe or 256K block over to the replica system.

Does the initial replica take only 2TB of space? And then do the
follow-ons take lots more than the size of the changed files would
suggest?

Ski> What have you experienced with SANs and applications that have
Ski> millions of small files? What tricks did you use to make them
Ski> work? Am I barking up the wrong tree and need to go in a totally
Ski> different direction?

I think you'll need to bite the bullet and do some sort of per-file
replication, just because your usage is killing your SAN replication.
I assume the Scalix mailstore is maildir format, with each message in
its own file? Not fun.
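
To put rough numbers on the block-size point above, here is a
back-of-the-envelope sketch in Python. It assumes replication tracks
changes at the 256K block granularity and that every changed file
dirties its own block, which is a worst case and not something I know
about how Equallogic actually does it. The count of changed files is
made up for illustration, and filesystem metadata writes (inodes,
journal, directories) are ignored.

```python
import math

KIB = 1024
GIB = 1024 ** 3


def worst_case_replicated_bytes(changed_files, file_size, block_size=256 * KIB):
    """Upper bound on replicated data, assuming each changed file
    dirties its own block(s) and whole blocks are shipped."""
    blocks_per_file = max(1, math.ceil(file_size / block_size))
    return changed_files * blocks_per_file * block_size


# Hypothetical workload: about a million 4K messages changed.
changed_files = 2 ** 20
file_size = 4 * KIB

logical = changed_files * file_size
replicated = worst_case_replicated_bytes(changed_files, file_size)

print(f"logical change:   {logical / GIB:.0f} GiB")
print(f"worst-case delta: {replicated / GIB:.0f} GiB")
print(f"amplification:    {replicated / logical:.0f}x")
```

Under those assumptions, 4 GiB of changed mail could turn into as
much as 256 GiB of replica delta, a 64x blowup. Real numbers will be
lower when several small files share a block.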

I'm in a Netapp world these days, and while I do replication of
volumes with lots and lots of small files, it's not at the level of
churn you're at, nor is it important that I keep multiple snapshots
around.

Turning off atime updates in ext3 might be a good first step. Anything
you can do to limit changes to the filesystem would be a good thing.

If you can break your filesystem down into smaller sub-units, that
might let you do rsync style file level scans more efficiently. Or
maybe you just do the initial replica using the block level stuff,
THEN do a file level scan on the replica so you don't impact your
production box, and keep copies there.

More details though!

Thanks,
John
