First, I'd like to note that I made a mistake earlier: I said shuffling is O(n) space and O(m) time when in fact it's O(n) in both space and time. I forgot you have to initialize the array.
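For reference, here's a minimal sketch of the shuffle approach I had in mind (the function name and the partial Fisher-Yates formulation are mine): even though only the first m swaps do useful work, you still have to build the array of n entries, so it's O(n) in both space and time.

    import random

    def sample_by_shuffle(n, m):
        # Pick m distinct numbers from 0..n-1 via a partial Fisher-Yates shuffle.
        a = list(range(n))              # the O(n) initialization I forgot about
        for i in range(m):              # only O(m) swaps actually happen
            j = random.randrange(i, n)
            a[i], a[j] = a[j], a[i]
        return a[:m]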

On Wed, 26 Apr 2006, Rob wrote:

On Wed, Apr 26, 2006 at 11:21:19PM -0400, Rob wrote:
both a and bmare in the[1] set, even though hash(a) == hash(b).  This would
use space O(k+k*log m ) and a running time that is a pain in the ass

Stupid terminal fuckups... that is supposed to read "a and b are in the
set" and below that "O(m+k*log m)[1]"; my term was screwed up so my edit
ended up off by one line :-(

Ahh, OK, I finally understand what you're proposing. I think it's just O(m+k) space though - at most m nodes in the binary trees and always k pointers to root nodes. As for time, if we're stepping i from 0 to m-1, it takes O(log n) to calculate the hash and O(log(i/k)) to search the tree at that node, and there's an i/n chance of a collision that makes you go again, so the expected number of attempts at step i is n/(n-i) and the work at each i is O((n/(n-i))*log(n*i/k)). Summing that over i gives an absolutely hideous integral, but the result is worse than O(m*log(m)) and better than O(n*m*log(m)/(n-m)).
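Just to make sure I've got your scheme right, here's a rough sketch of what I think you're describing (the names are mine, and the hash is left as a parameter). The one cheat: I'm using Python sets as a stand-in for the per-bucket binary trees, just to show the control flow - in the real thing each bucket would be a balanced tree so the lookup at step i costs O(log(i/k)).

    import random

    def sample_with_buckets(n, m, k, hash_fn):
        # Draw m distinct values from 0..n-1 using k buckets keyed by hash_fn.
        buckets = [set() for _ in range(k)]   # k root pointers, at most m stored values: O(m+k) space
        out = []
        for i in range(m):
            while True:                        # expected n/(n-i) attempts at step i
                x = random.randrange(n)
                h = hash_fn(x)                 # O(log n) if this is p^x mod k
                if x not in buckets[h]:        # O(log(i/k)) with a real tree per bucket
                    buckets[h].add(x)
                    out.append(x)
                    break
        return out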

Also, why bother with p^x (mod k)? Since the x are random and n >> k, just x (mod k) should give good coverage of the hash space, no? Using x (mod k) also lets you pick k on the fly, to better suit your particular m and n.
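Concretely, with the sketch above that would just be something like the following (the rule of thumb for k is only an illustrative guess, not something I've worked out):

    n, m = 10**9, 10**4
    k = max(1, m // 4)                 # pick k on the fly from m; purely a guess
    picks = sample_with_buckets(n, m, k, lambda x: x % k)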

Still, is this the best one can do? Is it possible to get O(m*log(m)) or better time without using O(n) space? What if we restrict ourselves to deterministic algorithms?

                        Alexey
