> How do you assign one file to multiple folders, and then see the
> multiple parent folders associated with the file?
I think what the OP is describing is usually called hard links: several
directory entries, possibly in different folders, that refer to the
same file.
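Since this is python-list, here is a minimal sketch of what that looks
like from Python, assuming a filesystem that supports hard links. The
names paths_for_same_file and demo are mine, purely for illustration.
It links one file into a second folder and then finds every path that
shares the same device and inode number:

```python
import os
import tempfile


def paths_for_same_file(root, target):
    """Return every path under root that is a hard link to target."""
    wanted = os.stat(target)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Same device and same inode number means the same underlying file.
            if (st.st_dev, st.st_ino) == (wanted.st_dev, wanted.st_ino):
                matches.append(path)
    return sorted(matches)


def demo():
    """Create one file, link it into a second folder, report both parents."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "a"))
        os.mkdir(os.path.join(root, "b"))
        original = os.path.join(root, "a", "report.txt")
        with open(original, "w") as f:
            f.write("hello\n")
        # Second directory entry for the same file (hypothetical layout).
        os.link(original, os.path.join(root, "b", "report.txt"))
        link_count = os.stat(original).st_nlink
        found = paths_for_same_file(root, original)
        parents = [os.path.basename(os.path.dirname(p)) for p in found]
        return link_count, parents
```

Note that the filesystem only stores the link count with the file, not
the list of parent folders, so finding the parents means walking the
tree, as the sketch does.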
Not to rain on your parade, but you have some misconceptions about
filesystems on Linux (and, maybe, in general). First, Linux has a
virtual filesystem that serves as an interface to many possible other
filesystems. The virtual filesystem is the query interface that also
ensures that the actual filesystems used are somewhat similar, since
they need to provide the functionality required by the interface, even
though some of that functionality is optional.
Admittedly, it's a simpler query language than SQL. On the other hand,
SQL is way too expressive for what a filesystem does or can do. So,
> What kind of script/commands will show the 50 Largest .zip files, in
> descending order by size, across a large set of subdirectories?
for d in dir1 dir2 dir3 ; do    # placeholders for the large set of subdirectories
    find "$d" -type f -name '*.zip' -printf '%s\t%p\n'
done | sort -rn | head -n50
I haven't tried running it, for obvious reasons. sort's default order
is ascending, so -r is needed to put the largest files first, and head
then keeps the top 50. Note that -printf is a GNU find extension.
All things considered, I think I'd prefer the shell solution over your
database in most cases because it queries the live filesystem (so
there's little chance of results being out of date).
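If you want the same live query from Python rather than the shell, a
rough equivalent might look like the sketch below. The function name
largest_files and its parameters are my own invention, and I'm assuming
that only regular files should count, which is what find's -type f
does:

```python
import heapq
import os
import stat


def largest_files(roots, suffix=".zip", limit=50):
    """Return up to `limit` (size, path) pairs, largest first."""

    def candidates():
        for root in roots:
            for dirpath, _dirnames, filenames in os.walk(root):
                for name in filenames:
                    if not name.endswith(suffix):
                        continue
                    path = os.path.join(dirpath, name)
                    try:
                        st = os.lstat(path)
                    except OSError:
                        continue  # file vanished or is unreadable; skip it
                    # Regular files only, so symlinks are left out.
                    if stat.S_ISREG(st.st_mode):
                        yield st.st_size, path

    # nlargest keeps only `limit` items in memory and returns them
    # in descending order.
    return heapq.nlargest(limit, candidates())
```

Calling largest_files(["dir1", "dir2"]) with your own directories in
place of those placeholder names gives the list already sorted in
descending order by size.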
So... to your speed measurements: you really need to know better what
you are measuring. There are plenty of different filesystems that can be
"hiding" behind the virtual filesystem, and their performance will vary
greatly based on the operation you are trying to perform. Not just
that, the performance will differ greatly based on the configuration
you used for a particular filesystem, file sizes and number of files
in a directory, the media backing the filesystem, the amount of memory
and how "warm" it is during the test... Testing performance requires a
lot of knowledge and understanding, and it's very rare that you can
claim a total result, like "X is faster than Y" (in every possible
situation). Think of it as being as difficult as it is in math to find
totality proofs (i.e. that P(x) holds for all x in some large, usually
infinite, set).
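To make the "warm" point concrete, here is a small sketch that times
the same walk several times and reports each run separately instead of
one averaged number. The function names are hypothetical, and this is
only an illustration of the idea, not a proper benchmark. On many
systems the first run is noticeably slower than the later ones because
the metadata is not yet cached, but that depends on everything listed
above:

```python
import os
import time


def count_and_stat(root):
    """Walk root, stat every file, and return how many files were seen."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished between listing and stat
            count += 1
    return count


def time_walk(root, repeats=3):
    """Return the wall-clock seconds taken by each of `repeats` walks."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        count_and_stat(root)
        timings.append(time.perf_counter() - start)
    return timings
```

If the numbers from time_walk differ a lot between runs, any single
figure you quote is mostly telling you about the cache state at the
time.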
Finally, retrieving filesystem-related information through Python is
not usually a very fast way to do it. Python likes to sacrifice speed
for convenience. So, measuring against its performance is a
non-starter.
--
https://mail.python.org/mailman3//lists/python-list.python.org