On Feb 02, 2009 02:39 -0800, Ben Rockwood wrote:
> This is just a thought exercise.... but I'm curious what exactly would be
> involved in essentially biasing caching such that an 'ls -al' was never slow.
>
> In my experience, IO speed can vary, but if a user types "ls -al" in
> the shell and the response isn't nearly instantaneous they start calling
> IT staff. Being able to cache all that data (perhaps by priming it) and
> ensuring it's not bumped out later would be interesting.
>
> For ZFS this is primarily a function of ZAP and DNLC, correct?
> Does "metadata" caching satisfy everything a directory listing could
> want, or are there bits of data that slip through requiring actual disk IO?
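On the priming side: my understanding (unverified) is that just walking
the tree and lstat()ing everything is enough to pull the dnodes and
directory ZAP blocks into the ARC/DNLC, so a cron job running something
like the sketch below would do the priming.  This is only a sketch, and
the "warmdir" name is made up:

/*
 * warmdir.c - walk a tree and lstat() everything, to pull directory
 * ZAP blocks and dnodes into the ARC/DNLC.
 * Build: cc -o warmdir warmdir.c
 */
#define _XOPEN_SOURCE 500       /* needed for nftw() on some platforms */
#include <ftw.h>
#include <sys/stat.h>
#include <stdio.h>

static long nentries;

/* nftw() already lstat()s each entry for us; just count what was touched. */
static int
visit(const char *path, const struct stat *sb, int type, struct FTW *ftw)
{
	nentries++;
	return (0);
}

int
main(int argc, char *argv[])
{
	if (argc != 2) {
		(void) fprintf(stderr, "usage: %s <dir>\n", argv[0]);
		return (1);
	}
	/* FTW_PHYS: lstat() instead of stat(), don't follow symlinks */
	if (nftw(argv[1], visit, 64, FTW_PHYS) != 0) {
		perror("nftw");
		return (1);
	}
	(void) printf("primed %ld entries under %s\n", nentries, argv[1]);
	return (0);
}

That only primes the cache, of course; without a way to pin metadata in
the ARC there is no guarantee it stays resident under memory pressure,
which I think is really the crux of the question.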
At the SPA level, it would seem possible to use flash in a different manner
than ZFS does today.  Instead of using flash only as a cache (as is
essentially done with the L2ARC and the Logzilla), the SPA could allocate
metadata on special RAID-1 VDEV(s) built from SSDs, and conversely avoid
data allocations on those SSD VDEV(s) unless the other VDEVs were full.

My understanding is that the SPA already knows whether a specific allocation
is for data or metadata, though some tweaks might be needed so that e.g. the
meta-dnode contents and ZAPs are treated as metadata.  This could put all of
the "ls -l" data permanently in high-IOPS flash storage.

The main question is what fraction of the pool would need to be on SSDs to
make this workable: 5%? 10%?  It obviously depends on the average file size,
but I'd suspect there is some rough estimate of how much SSD storage would
be needed to have a good chance that all metadata is in flash (a very rough
back-of-the-envelope sketch is appended at the end).

Cheers, Andreas

PS - I'm not a ZFS developer, so don't take this as gospel... just musings.

--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
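A very rough back-of-the-envelope for the "what fraction" question above.
The only hard number here is the 512-byte on-disk dnode; the per-file
directory ZAP cost and the 2x for metadata ditto copies are guesses, so
treat the output as an order-of-magnitude estimate only:

/*
 * mdfrac.c - rough estimate of what fraction of a pool is metadata,
 * as a function of average file size.  Per-file costs are assumptions.
 */
#include <stdio.h>

int
main(void)
{
	/*
	 * Assumed per-file metadata: 512-byte dnode plus a guess of
	 * ~128 bytes for the directory ZAP entry and indirect-block
	 * share, doubled for the metadata ditto copies.
	 */
	const double md_per_file = (512.0 + 128.0) * 2;
	const double avg_kb[] = { 4, 16, 64, 256, 1024 };
	size_t i;

	for (i = 0; i < sizeof (avg_kb) / sizeof (avg_kb[0]); i++) {
		double frac = md_per_file / (avg_kb[i] * 1024.0);
		(void) printf("avg file %5.0f KB -> metadata ~%5.2f%% of pool\n",
		    avg_kb[i], frac * 100.0);
	}
	return (0);
}

With those guesses, a 4-16 KB average file size needs roughly 10-30% of the
pool on SSD, while 64 KB and larger drops below 2%, so 5-10% is probably a
sane middle-of-the-road provisioning target.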