On Fri, May 14, 2004 at 01:38:12PM -0400, Tanner Lovelace <[EMAIL PROTECTED]> wrote: > David Rasch said the following on 5/14/04 1:13 PM: > > > >With an inode -> filename hash, couldn't this be done linearly O(N) ? > >Check each file against the hash looking for a file already using this > >inode? > > Theoretically, but the data structures could get tricky. You'd need to > sure the hashing function was large enough so that there weren't collisions. > But, since the inode is a number, the perfect (as far as no collisions go) > hashing function is the identity function. But, then how do you store it. > If you allocate an array for all possible values, then yes, you can do > it O(N) by just checking that bucket but at the cost of a huge amount > of memory. With each bit you lower the hash function you decrease memory > requirements but also increase the number of things you must check to > avoid collisions. It would be interesting to see the curve that > results from this tradeoff. You have to pay somewhere, be it memory or > operations.
Well, collisions in hash functions aren't the kiss of death, and a non-identity hashing functions could yield very good results. Clearly rsync knows something about hashes as it uses them for determining what parts of the file need transfered. As for memory, having spent some time modding rsync, I can tell you that one of rsync's main flaws is keeping the entire file transfer list in memory, and storing the inode for each file wouldn't be a big change. Since this is all a theoretical discussion, I was just pointing out that the algorithm is not necessarily O(N^2) and could also be computed _before_ the file transfer and the memory for the above hash could be then discarded. David
signature.asc
Description: Digital signature
-- TriLUG mailing list : http://www.trilug.org/mailman/listinfo/trilug TriLUG Organizational FAQ : http://trilug.org/faq/ TriLUG Member Services FAQ : http://members.trilug.org/services_faq/ TriLUG PGP Keyring : http://trilug.org/~chrish/trilug.asc
