[Lustre-discuss] SSD caching of MDT
Article by Jeff Layton: http://www.linux-mag.com/id/7839

Does anyone have views on whether this sort of caching would be useful for the MDT? My feeling is that MDT reads are probably pretty random but writes might benefit...?

GREG

--
Greg Matthews       01235 778658
Senior Computer Systems Administrator
Diamond Light Source, Oxfordshire, UK
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] SSD caching of MDT
IMHO, SSD has more IOPS than disks and larger capacity than RAID/NVRAM, so it seems that SSD should help the MDS. Do you want SSD in a dual-host environment to support failover?

regards

On 8/19/2010 8:29 AM, Gregory Matthews wrote:
> Article by Jeff Layton: http://www.linux-mag.com/id/7839
> Does anyone have views on whether this sort of caching would be useful for the MDT? My feeling is that MDT reads are probably pretty random but writes might benefit...?
Re: [Lustre-discuss] SSD caching of MDT
On 2010-08-19, at 7:27, LaoTsao 老曹 laot...@gmail.com wrote:
> IMHO, SSD has more IOPS than disks and larger capacity than RAID/NVRAM, so it seems that SSD should help the MDS. Do you want SSD in a dual-host environment to support failover?

The MDS is doing mostly linear writes to the journal, plus asynchronous checkpoints of those blocks to the filesystem (so they can be reordered and merged). That is one reason why we have seen relatively modest performance gains from SSDs in benchmarks.

That said, I think there is still benefit to be had from SSDs _if the whole MDT is on SSD_, because I suspect the random lookup/unlink performance for full/aged filesystems will be much better. We haven't done any benchmarks to that effect, however.

Cheers, Andreas
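The two access patterns Andreas describes can be compared directly with fio. A minimal sketch, assuming fio is installed; the test file path is a placeholder, and the sizes are deliberately small so the runs finish quickly:

```shell
# Journal-like pattern: small sequential writes with a sync after each,
# roughly what the MDS journal generates.
fio --name=journal-like --filename=/tmp/mdt-io-test --size=256m \
    --rw=write --bs=4k --fsync=1 --ioengine=sync

# Aged-MDT pattern: small random reads, roughly the lookup workload
# on a full/aged filesystem. Compare the reported IOPS of the two runs.
fio --name=random-lookup --filename=/tmp/mdt-io-test --size=256m \
    --rw=randread --bs=4k --ioengine=sync

rm -f /tmp/mdt-io-test
```

On rotating disks the second run should show far lower IOPS than the first; on a good SSD the gap should largely disappear, which is the case for putting the whole MDT on SSD.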
Re: [Lustre-discuss] SSD caching of MDT
On Thu, Aug 19, 2010 at 01:29:37PM +0100, Gregory Matthews wrote:
> Article by Jeff Layton: http://www.linux-mag.com/id/7839
> Does anyone have views on whether this sort of caching would be useful for the MDT? My feeling is that MDT reads are probably pretty random but writes might benefit...?

If you look at the tiny size of inodes in slabtop on an MDS, you'll see that for most filesystems all read ops are probably 100% cached in RAM by a decent-sized MDS. i.e. once you have traversed all inodes of a filesystem once, the MDTs are likely a write-only medium, and the RAM of the MDS is a faster IOPS machine than any SSD could ever be.

You are then left with an MDT workload of entirely small writes. That is definitely not an SSD sweet spot - many SSDs will fragment badly and slow down horrendously, which e.g. JBODs of 15k rpm SAS disks will not do. Basically, beware of cheap SSDs, possibly any SSD, and certainly any SSD that isn't an Intel X25-E or better. The Marvell-controller SSDs we now sadly have many of, I would not inflict upon any MDT.

Also, having experimented with ramdisk MDTs (not in production, obviously), it is clear that even this 'perfect' medium doesn't solve all Lustre IOPS problems - far from it. Usually it just means that you hit algorithmic or NUMA problems in the Lustre MDS code, or (more likely) the ops just flow onto the OSTs and those become the bottleneck instead. Basically, ramdisk MDT speedups weren't big over even, say, 16 fast FC or SAS disks. SSDs would be in between if they were behaving perfectly, which would require extensive testing to determine.

Looking at it a different way: Lustre's statahead kinda works OK, creates are (IIRC) batched so also scale OK, so deletes might be the only workload left where the fastest MDT money can buy would get you any significant benefit... probably not worth the spend for most folks.
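A back-of-envelope check of the "everything is cached in RAM" claim. The per-inode figure below is an illustrative assumption (roughly 2 KiB of slab per cached inode, covering the ldiskfs inode plus associated state), not a measurement; check your own numbers with slabtop:

```shell
# Inspect actual per-object slab sizes on a live MDS (one-shot output):
#   slabtop -o | grep -i inode

# Rough estimate: how many inodes fit in a 64 GiB MDS's RAM at an
# assumed ~2 KiB of slab per cached inode?
ram_bytes=$((64 * 1024 * 1024 * 1024))
per_inode=$((2 * 1024))
echo $((ram_bytes / per_inode))   # prints 33554432, i.e. ~33 million inodes
```

Even a filesystem with tens of millions of files can therefore have its entire inode working set resident in MDS RAM, after which the MDT sees essentially no read traffic.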
Assuming for a moment that SSDs worked as they should, other Lustre-related workloads for which SSDs might be suitable are external journals for OSTs, md bitmaps, or (one day) perhaps ZFS intent logs.

cheers,
robin

--
Dr Robin Humble, HPC Systems Analyst, NCI National Facility
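The external-journal idea above can be sketched as follows. This is a hypothetical command fragment, not a tested recipe: device names and the fsname/MGS NID are placeholders, the journal device is destructively reformatted, and sizes should be checked against your own configuration.

```shell
# 1) Format a small SSD partition as an external ldiskfs journal device.
mke2fs -O journal_dev -b 4096 /dev/ssd_part

# 2) Format the OST, pointing its ldiskfs journal at that device.
mkfs.lustre --ost --fsname=testfs --mgsnode=mgs@tcp \
    --mkfsoptions="-J device=/dev/ssd_part" /dev/ost_disk
```

This keeps the journal's small synchronous writes off the bulk-data spindles; whether a given SSD sustains that write pattern without the fragmentation slowdowns described earlier is exactly what would need testing.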