On Wed, Jun 17 at 13:49, Alan Hargreaves wrote:
> Another question worth asking here is: is a find over the entire filesystem something that they would expect to be executed with sufficient regularity that the execution time would have a business impact?

Exactly.  That's such an odd business workload on 250,000,000 files
that there isn't likely to be much of a shortcut other than just
throwing tons of spindles (or SSDs) at the problem, and/or having tons
of memory.

If the finds are just by name, that's easy for the system to cache, but
if you're expecting to run something against the output of find with
-exec to parse/process 250M files on a regular basis, you'll likely be
severely I/O bound.  Almost to the point of arguing for something like
Hadoop or another form of distributed map/reduce on your dataset with
a lot of nodes, instead of a single storage server.
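To illustrate the difference (paths and patterns below are made up, not
from the original poster's setup), compare a name-only search against
one that has to read every file's contents:

    # Name-only search: walks directory metadata only, which ZFS can
    # keep hot in the ARC after a first pass.
    find /tank/data -name '*.log'

    # Content processing via -exec: every matching file must actually
    # be read from disk, so this stays I/O bound no matter how well
    # the metadata is cached.
    find /tank/data -type f -exec grep -l 'ERROR' {} +

The first run of either will still crawl 250M inodes; the point is that
only the name-only case has any hope of being served from cache on
repeat runs.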


--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss