[Lustre-discuss] SSD caching of MDT

2010-08-19 Thread Gregory Matthews
Article by Jeff Layton:

http://www.linux-mag.com/id/7839

Does anyone have views on whether this sort of caching would be useful for 
the MDT? My feeling is that MDT reads are probably pretty random, but 
writes might benefit...?

GREG
-- 
Greg Matthews01235 778658
Senior Computer Systems Administrator
Diamond Light Source, Oxfordshire, UK
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] SSD caching of MDT

2010-08-19 Thread LaoTsao 老曹
IMHO, SSDs have more IOPS than disks and larger capacity than RAID 
controller NVRAM, so it seems that SSDs should help the MDS. Do you want 
SSDs in a dual-host environment to support failover?

regards


On 8/19/2010 8:29 AM, Gregory Matthews wrote:

Article by Jeff Layton:

http://www.linux-mag.com/id/7839

Does anyone have views on whether this sort of caching would be useful for
the MDT? My feeling is that MDT reads are probably pretty random, but
writes might benefit...?

GREG


Re: [Lustre-discuss] SSD caching of MDT

2010-08-19 Thread Andreas Dilger
On 2010-08-19, at 7:27, LaoTsao 老曹 laot...@gmail.com wrote:
 IMHO, SSDs have more IOPS than disks and larger capacity than RAID 
 controller NVRAM, so it seems that SSDs should help the MDS. Do you want 
 SSDs in a dual-host environment to support failover?
 regards
 
 
 On 8/19/2010 8:29 AM, Gregory Matthews wrote:
 Article by Jeff Layton:
 
 http://www.linux-mag.com/id/7839
 
 Does anyone have views on whether this sort of caching would be useful for
 the MDT? My feeling is that MDT reads are probably pretty random, but
 writes might benefit...?
 

The MDS mostly does linear writes to the journal, plus asynchronous 
checkpointing of those blocks to the filesystem (so they can be reordered 
and merged).

That is one reason why we have seen relatively modest performance gains from 
SSD in benchmarks. 

That said, I think there is still benefit to be had from SSDs _if the whole 
MDT is on SSD_, because I suspect the random lookup/unlink performance for 
full/aged filesystems will be much better.  We haven't done any benchmarks to 
this effect, however. 
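To put numbers on that hypothesis, a metadata-heavy benchmark such as mdtest could be run against otherwise-identical MDTs on SSD and on disk. A minimal sketch (the path, process count, and file counts are hypothetical; exact flags vary by mdtest version):

```shell
# Compare create/stat/unlink rates with the MDT on SSD vs. on disk.
# -n: files per process, -i: iterations, -d: target directory (hypothetical path).
mpirun -np 8 mdtest -n 10000 -i 3 -d /mnt/lustre/mdtest
```

For the full/aged-filesystem case Andreas describes, the MDT would first need to be aged artificially, e.g. by filling it and deleting a random subset of files before running the benchmark.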

Cheers, Andreas


Re: [Lustre-discuss] SSD caching of MDT

2010-08-19 Thread Robin Humble
On Thu, Aug 19, 2010 at 01:29:37PM +0100, Gregory Matthews wrote:
Article by Jeff Layton:

http://www.linux-mag.com/id/7839

Does anyone have views on whether this sort of caching would be useful for 
the MDT? My feeling is that MDT reads are probably pretty random, but 
writes might benefit...?

If you look at the tiny size of inodes in slabtop on an MDS, you'll
see that all read ops for most filesystems are probably 100% cached in RAM
by a decently sized MDS. I.e., once you have traversed all inodes of a
filesystem once, the MDTs are likely write-only media, and the RAM of the
MDS is a faster IOPS machine than any SSD could ever be.
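As a back-of-envelope check of that claim (the per-inode figure here is an assumption; the real slab footprint varies and should be read from slabtop on a live MDS):

```shell
# Estimate how many inodes an MDS can cache in RAM.
# Assumes ~1 KB of slab per cached inode (inode + dentry + Lustre
# metadata); check slabtop for the actual figure on your system.
ram_gb=64                 # hypothetical MDS RAM size
bytes_per_inode=1024      # assumed slab footprint per cached inode
inodes=$(( ram_gb * 1024 * 1024 * 1024 / bytes_per_inode ))
echo "A ${ram_gb} GB MDS could cache roughly ${inodes} inodes"
```

At that scale (tens of millions of inodes) the metadata working set of most filesystems fits entirely in RAM, which is why the read side stops mattering.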

You are then left with an MDT workload of entirely small writes. That is
definitely not an SSD sweet spot - many SSDs will fragment badly and
slow down horrendously, which, e.g., JBODs of 15k RPM SAS disks will not do.
Basically, beware of cheap SSDs, possibly any SSD, and certainly any SSD
that isn't an Intel X25-E or better. The Marvell-controller SSDs we now
sadly have many of, I would not inflict upon any MDT.

Also, having experimented with ramdisk MDTs (not in production,
obviously), it is clear that even this 'perfect' medium doesn't solve
all Lustre IOPS problems - far from it. Usually it just means that you
hit algorithmic or NUMA problems in the Lustre MDS code, or (more likely)
the ops just flow on to the OSTs and those become the bottleneck instead.
Basically, ramdisk MDT speedups weren't big compared to even just, say,
16 fast FC or SAS disks. SSDs would fall in between if they behaved
perfectly, which would require extensive testing to determine.

Looking at it a different way: Lustre's statahead works reasonably well,
and creates are (IIRC) batched so they also scale OK, so deletes might be
the only workload left where the fastest MDT money can buy would get
you any significant benefit... probably not worth the spend for most
folks.

Assuming for a moment that SSDs worked as they should, other
Lustre-related workloads for which SSDs might be suitable are external
journals for OSTs, md bitmaps, or (one day) perhaps ZFS intent logs.
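For the external-journal case, a minimal sketch of the commands involved (device names are hypothetical; --mkfsoptions is passed through to mke2fs, and exact options depend on your Lustre version):

```shell
# Create a dedicated external journal device on an SSD partition.
mke2fs -O journal_dev -b 4096 /dev/ssd0p1

# Format the OST with its ldiskfs journal on that SSD device.
mkfs.lustre --ost --fsname=lustre --mgsnode=mgs@tcp0 \
    --mkfsoptions="-J device=/dev/ssd0p1" /dev/sdb
```

The journal sees exactly the small, linear write stream discussed above, so it is a better fit for an SSD than the randomly-updated MDT blocks themselves.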

cheers,
robin
--
Dr Robin Humble, HPC Systems Analyst, NCI National Facility