First, I'd like to note that I made a mistake earlier: I said shuffling is
O(n) space and O(m) time when in fact it's O(n) in both space and time. I
forgot you have to initialize the array.
On Wed, 26 Apr 2006, Rob wrote:
On Wed, Apr 26, 2006 at 11:21:19PM -0400, Rob wrote:
both a and bmare in the[1] set, even though hash(a) == hash(b). This would
use space O(k+k*log m ) and a running time that is a pain in the ass
Stupid terminal fuckups... that is supposed to read "a and b are in the
set" and below that "O(m+k*log m)[1]"; my term was screwed up so my edit
ended up off by one line :-(
Ahh, OK I finally understand what you're proposing. I think it's just
O(m+k) space though - at most m nodes in binary trees and always k
pointers to root nodes. As far as time, if we're stepping i from 0 to m-1,
it takes O(log n) to calculate the hash, O(log(i/k)) to search the tree at
that node, and there's an i/n chance that you have to go again, so the
work at every i is O((n/(n-i))*log(n*i/k)). This gives an absolutely
hideous integral, but the result is worse than O(m*log(m)) but better than
O(n*m*log(m)/(n-m)).
Also, why bother with p^x(mod k)? Since the x are random, and n >> k, then
just x(mod k) should give good coverage of the hash space, no? If you can
use x(mod k), it also lets you pick k on the fly, to be more optimal for
your particular m and n.
Still, is this the best one can do? Is it possbile to get O(m*log(m)) or
better time without using O(n) space? What about if we restrict ourselves
to deterministic algorithms?
Alexey