Re: don't need dictionary's keys - hash table?

2006-07-13 Thread Nick Vatamaniuc
Yeah hash(hash(immutable))=hash(immutable) it seems. Not sure this is a specification but it happens that way: -- $ hash('abc') -1600925533 $ hash(hash('abc')) -1600925533 $ hash(hash(hash(('abc' -1600925533 -

Re: don't need dictionary's keys - hash table?

2006-07-13 Thread Diez B. Roggisch
I am a little surprised that hash(hash(s)) == hash(s), is that actually true? As hashes are integers the hash of an integer is the integer itself, this isn't to surprising. Diez -- http://mail.python.org/mailman/listinfo/python-list

Re: don't need dictionary's keys - hash table?

2006-07-13 Thread Peter Otten
Terry Hancock wrote: Nick Vatamaniuc wrote: The original post just said that he wanted to index some values by their urls and didn't really care about the urls themselves, so I suggested that he just use the hash of the key as the key. The dictionary will then take a hash of that [note

Re: don't need dictionary's keys - hash table?

2006-07-13 Thread Peter Otten
Terry Hancock wrote: Nick Vatamaniuc wrote: The original post just said that he wanted to index some values by their urls and didn't really care about the urls themselves, so I suggested that he just use the hash of the key as the key. The dictionary will then take a hash of that [note

Re: don't need dictionary's keys - hash table?

2006-07-13 Thread Fredrik Lundh
[EMAIL PROTECTED] wrote: depending on your application, a bloom filter might be a good enough: http://en.wikipedia.org/wiki/Bloom_filter Thanks (everyone) for the comments. I like the idea of the bloom filter or using an md5 hash, since a rare collision will not be a show-stopper in

Re: don't need dictionary's keys - hash table?

2006-07-13 Thread Paul Rubin
Fredrik Lundh [EMAIL PROTECTED] writes: btw, since my brain is on vacation, could anyone who already has all the necessary background information in their head (Paul?) tell me if using a chopped-up MD5 or SHA-1 (or whatever) digest as the hash functions for a bloom filter is a good idea...

Re: don't need dictionary's keys - hash table?

2006-07-13 Thread Paul Rubin
Paul Rubin http://[EMAIL PROTECTED] writes: Finally, I think to use Bloom filters effectively you have to choose nlayers and layersize carefully, according to the number of keys you expect to see, the amount of memory you have available, and/or the probability of false matches that you're

Re: don't need dictionary's keys - hash table?

2006-07-13 Thread Piet van Oostrum
[EMAIL PROTECTED] (k) wrote: k Hello, k I am using some very large dictionaries with keys that are long strings k (urls). For a large dictionary these keys start to take up a k significant amount of memory. I do not need access to these keys -- I k only need to be able to retrieve the value

don't need dictionary's keys - hash table?

2006-07-12 Thread kdotsky
Hello, I am using some very large dictionaries with keys that are long strings (urls). For a large dictionary these keys start to take up a significant amount of memory. I do not need access to these keys -- I only need to be able to retrieve the value associated with a certain key, so I do not

Re: don't need dictionary's keys - hash table?

2006-07-12 Thread kdotsky
[EMAIL PROTECTED] wrote: Hello, I am using some very large dictionaries with keys that are long strings (urls). For a large dictionary these keys start to take up a significant amount of memory. I do not need access to these keys -- I only need to be able to retrieve the value associated

Re: don't need dictionary's keys - hash table?

2006-07-12 Thread Fredrik Lundh
[EMAIL PROTECTED] wrote: Will the hash function always generate unique keys? no. hash() is designed for dictionaries (hash tables), not for use as a cryptographic hash. depending on your application, a bloom filter might be a good enough: http://en.wikipedia.org/wiki/Bloom_filter (see

Re: don't need dictionary's keys - hash table?

2006-07-12 Thread Fredrik Lundh
[EMAIL PROTECTED] wrote: I just realized that of course the hash is not always going to be unique, so this wouldn't really work. And it seems a hash table would still need to store the keys (as strings) so that string comparisons can be done when a collision occurs. btw, Python's dictionary

Re: don't need dictionary's keys - hash table?

2006-07-12 Thread Diez B. Roggisch
[EMAIL PROTECTED] wrote: Hello, I am using some very large dictionaries with keys that are long strings (urls). For a large dictionary these keys start to take up a significant amount of memory. I do not need access to these keys -- I only need to be able to retrieve the value associated

Re: don't need dictionary's keys - hash table?

2006-07-12 Thread Nick Vatamaniuc
It should be enough but it might be a little slower than hash(string). Diez B. Roggisch wrote: [EMAIL PROTECTED] wrote: Hello, I am using some very large dictionaries with keys that are long strings (urls). For a large dictionary these keys start to take up a significant amount of

Re: don't need dictionary's keys - hash table?

2006-07-12 Thread Nick Vatamaniuc
Dictionaries are hash tables in Python. If you don't really want to store the keys, just use dic[hash(key)]=value, this way the dictionary will have the same shape and distribution of keys as dic[key]=value because hash('abc')=hash(hash('abc')) but the long string of actual keys are not

Re: don't need dictionary's keys - hash table?

2006-07-12 Thread Fredrik Lundh
Nick Vatamaniuc wrote: If you don't really want to store the keys, just use dic[hash(key)]=value, this way the dictionary will have the same shape and distribution of keys as dic[key]=value because hash('abc')=hash(hash('abc')) but the long string of actual keys are not referenced by the

Re: don't need dictionary's keys - hash table?

2006-07-12 Thread Tim Chase
how come you're so sure that there will never be any collisions ? because none of his strings want their insurance to go up... :*) -tkc -- http://mail.python.org/mailman/listinfo/python-list

Re: don't need dictionary's keys - hash table?

2006-07-12 Thread Nick Vatamaniuc
Fred, I never said there will be no collisions. For clarity, can you quote the exact phrase where I mentioned that? To say that there will be no collision is nonsense because the # of possible long url strings is certainly larger than 2^32, applying the pigeon hole principle you could easily

Re: don't need dictionary's keys - hash table?

2006-07-12 Thread Fredrik Lundh
Nick Vatamaniuc wrote: I never said there will be no collisions. For clarity, can you quote the exact phrase where I mentioned that? the text I quoted is only true if you can guarantee that there will be no collisions. /F -- http://mail.python.org/mailman/listinfo/python-list

Re: don't need dictionary's keys - hash table?

2006-07-12 Thread Diez B. Roggisch
Please don't top post. I had to fix that below. Any other thoughts or considerations are appreciated. You could try and create a md5 sum of your strings and use that as key. It _should_ be good enough, but I'm no crypto expert so take that with a grain of salt. It should be enough but it

Re: don't need dictionary's keys - hash table?

2006-07-12 Thread Paul Rubin
[EMAIL PROTECTED] writes: do it this way am I going to get the memory savings I am after? Will the hash function always generate unique keys? Also, would the same technique work for a set? A hash function that always generates unique hashes is called a perfect hash. They tend to be slow,

Re: don't need dictionary's keys - hash table?

2006-07-12 Thread kdotsky
depending on your application, a bloom filter might be a good enough: http://en.wikipedia.org/wiki/Bloom_filter Thanks (everyone) for the comments. I like the idea of the bloom filter or using an md5 hash, since a rare collision will not be a show-stopper in my case. --

Re: don't need dictionary's keys - hash table?

2006-07-12 Thread Terry Hancock
Nick Vatamaniuc wrote: The original post just said that he wanted to index some values by their urls and didn't really care about the urls themselves, so I suggested that he just use the hash of the key as the key. The dictionary will then take a hash of that [note that:

Re: don't need dictionary's keys - hash table?

2006-07-12 Thread Ganesan Rajagopal
Terry == Terry Hancock [EMAIL PROTECTED] writes: Note that it is trivial to catch collisions on entry and correct them: key = hash(url_string) i = 0 if d.has_key(key): while d.has_key(key): i += 1 hash is a number. It's sufficient to do while d.has_key(key): key += 1

Re: don't need dictionary's keys - hash table?

2006-07-12 Thread Paul Rubin
Ganesan Rajagopal [EMAIL PROTECTED] writes: hash is a number. It's sufficient to do while d.has_key(key): key += 1 This is called linear probing and is not considered that great a collision resolution strategy for hash tables. Really, if you want an exhaustive study about hashing, see