On Mon, Jul 18, 2011 at 4:04 AM, Claudio Martella <[email protected]> wrote: > No, you can have collisions, so the index is not perfect (which means > you can have buckets for colliding keys and empty unused entries in the > hashtable directory).
Well, if a perfect index is what you are after, you can generate hashing functions that avoid collisions (You know this quality work by your fellow countrymen: http://sux.dsi.unimi.it/docs/it/unimi/dsi/sux4j/mph/MinimalPerfectHashFunction.html?) > Also, the index stores only offsets, not the keys. So they overhead of > the index is sizeof(long) * number of key-value pairs in the file (in > the optimal case, the overhead is 16 bytes for each colliding key-value > pair as it's managed through a linked-list). For this reason using a > load-factor > 1 can actually save memory and increase performance. > OK > Thank you very much for all this information. I'll try to implement a > prototype in September, I'm pretty busy with deadlines right now. > Good stuff, St.Ack
