Jeff/Troy,

----- "Jeff Darcy" <[EMAIL PROTECTED]> wrote:

> We've tried loading "canned" MDT/OST images into memory on some nodes
> and serving from there, and it does seem to work.  There are two
> downsides, though.  One is that the Linux loopback driver is a real
> performance bottleneck, ever since some bright person had the idea to
> make it less multi-threaded than it had been.  Another is that booting
> tends to involve metadata-heavy access patterns which are not exactly
> Lustre's strength - a situation made worse when you have nearly a
> thousand clients doing it at the same time and your MDS is a
> relatively small node like the others.  So far we've found that NBD
> serves us better in the boot/root filesystem role, though that means
> a read-only root which involves its own complexity.  Your mileage
> will almost certainly vary.

A good trick with Lustre to get around the metadata bottleneck is to put disk 
image files (e.g. SquashFS) on Lustre and mount them with the loopback driver 
on each compute node (or "lctl attach_device"?). That way, instead of 
bothering the MDS for every lookup, clients only need to seek through a single 
file on an OST. By either striping the read-only image across all your OSTs or 
keeping a round-robin image per OST, you can get pretty good scalability.
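Roughly, the setup looks something like this (the paths and mount points 
below are made up for illustration; note the striping has to be set before 
the image file is written):

  # on a node that sees both the source tree and the Lustre filesystem
  lfs setstripe -c -1 /lustre/images      # stripe new files over all OSTs
  mksquashfs /exports/rootfs /lustre/images/rootfs.squashfs

  # on each compute node
  mount -t squashfs -o loop,ro /lustre/images/rootfs.squashfs /mnt/root

For the round-robin variant you would instead make one copy per OST with 
"lfs setstripe -c 1 -i <ost_index>" and have each node mount the copy sitting 
on "its" OST.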

I tried this on our 700-node compute cluster, but to be honest the overall 
booting performance was not that different from a couple of NFS servers 
serving a read-only root, so in the end it wasn't really worth the extra 
complexity.

We do still use SquashFS on Lustre from time to time, when we have a directory 
tree with 30,000 small files in it that needs to be read by every farm machine. 
It's rare, but it does happen, and it's the kind of small-file workload that 
NFS has traditionally handled much better than Lustre.

Daire
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss