Yeah hash(hash(immutable))=hash(immutable) it seems. Not sure this is a
specification but it happens that way:
--
$ hash('abc')
-1600925533
$ hash(hash('abc'))
-1600925533
$ hash(hash(hash(('abc'
-1600925533
-
I am a little surprised that hash(hash(s)) == hash(s), is that actually
true?
As hashes are integers the hash of an integer is the integer itself,
this isn't to surprising.
Diez
--
http://mail.python.org/mailman/listinfo/python-list
Terry Hancock wrote:
Nick Vatamaniuc wrote:
The original post just said that he wanted to index some values by
their urls and didn't really care about the urls themselves, so I
suggested that he just use the hash of the key as the key. The
dictionary will then take a hash of that [note
Terry Hancock wrote:
Nick Vatamaniuc wrote:
The original post just said that he wanted to index some values by
their urls and didn't really care about the urls themselves, so I
suggested that he just use the hash of the key as the key. The
dictionary will then take a hash of that [note
[EMAIL PROTECTED] wrote:
depending on your application, a bloom filter might be a good enough:
http://en.wikipedia.org/wiki/Bloom_filter
Thanks (everyone) for the comments. I like the idea of the bloom
filter or using an md5 hash, since a rare collision will not be a
show-stopper in
Fredrik Lundh [EMAIL PROTECTED] writes:
btw, since my brain is on vacation, could anyone who already has all
the necessary background information in their head (Paul?) tell me
if using a chopped-up MD5 or SHA-1 (or whatever) digest as the hash
functions for a bloom filter is a good idea...
Paul Rubin http://[EMAIL PROTECTED] writes:
Finally, I think to use Bloom filters effectively you have to choose
nlayers and layersize carefully, according to the number of keys you
expect to see, the amount of memory you have available, and/or the
probability of false matches that you're
[EMAIL PROTECTED] (k) wrote:
k Hello,
k I am using some very large dictionaries with keys that are long strings
k (urls). For a large dictionary these keys start to take up a
k significant amount of memory. I do not need access to these keys -- I
k only need to be able to retrieve the value
Hello,
I am using some very large dictionaries with keys that are long strings
(urls). For a large dictionary these keys start to take up a
significant amount of memory. I do not need access to these keys -- I
only need to be able to retrieve the value associated with a certain
key, so I do not
[EMAIL PROTECTED] wrote:
Hello,
I am using some very large dictionaries with keys that are long strings
(urls). For a large dictionary these keys start to take up a
significant amount of memory. I do not need access to these keys -- I
only need to be able to retrieve the value associated
[EMAIL PROTECTED] wrote:
Will the hash function always generate unique keys?
no. hash() is designed for dictionaries (hash tables), not for use as a
cryptographic hash.
depending on your application, a bloom filter might be a good enough:
http://en.wikipedia.org/wiki/Bloom_filter
(see
[EMAIL PROTECTED] wrote:
I just realized that of course the hash is not always going to be
unique, so this wouldn't really work. And it seems a hash table would
still need to store the keys (as strings) so that string comparisons
can be done when a collision occurs.
btw, Python's dictionary
[EMAIL PROTECTED] wrote:
Hello,
I am using some very large dictionaries with keys that are long strings
(urls). For a large dictionary these keys start to take up a
significant amount of memory. I do not need access to these keys -- I
only need to be able to retrieve the value associated
It should be enough but it might be a little slower than hash(string).
Diez B. Roggisch wrote:
[EMAIL PROTECTED] wrote:
Hello,
I am using some very large dictionaries with keys that are long strings
(urls). For a large dictionary these keys start to take up a
significant amount of
Dictionaries are hash tables in Python.
If you don't really want to store the keys, just use
dic[hash(key)]=value, this way the dictionary will have the same shape
and distribution of keys as dic[key]=value because
hash('abc')=hash(hash('abc')) but the long string of actual keys are
not
Nick Vatamaniuc wrote:
If you don't really want to store the keys, just use
dic[hash(key)]=value, this way the dictionary will have the same shape
and distribution of keys as dic[key]=value because
hash('abc')=hash(hash('abc')) but the long string of actual keys are
not referenced by the
how come you're so sure that there will never be any collisions ?
because none of his strings want their insurance to go up...
:*)
-tkc
--
http://mail.python.org/mailman/listinfo/python-list
Fred,
I never said there will be no collisions. For clarity, can you quote
the exact phrase where I mentioned that?
To say that there will be no collision is nonsense because the # of
possible long url strings is certainly larger than 2^32, applying the
pigeon hole principle you could easily
Nick Vatamaniuc wrote:
I never said there will be no collisions. For clarity, can you quote
the exact phrase where I mentioned that?
the text I quoted is only true if you can guarantee that there will be
no collisions.
/F
--
http://mail.python.org/mailman/listinfo/python-list
Please don't top post. I had to fix that below.
Any other thoughts or considerations are appreciated.
You could try and create a md5 sum of your strings and use that as key. It
_should_ be good enough, but I'm no crypto expert so take that with a grain
of salt.
It should be enough but it
[EMAIL PROTECTED] writes:
do it this way am I going to get the memory savings I am after? Will
the hash function always generate unique keys? Also, would the same
technique work for a set?
A hash function that always generates unique hashes is called a
perfect hash. They tend to be slow,
depending on your application, a bloom filter might be a good enough:
http://en.wikipedia.org/wiki/Bloom_filter
Thanks (everyone) for the comments. I like the idea of the bloom
filter or using an md5 hash, since a rare collision will not be a
show-stopper in my case.
--
Nick Vatamaniuc wrote:
The original post just said that he wanted to index some values by
their urls and didn't really care about the urls themselves, so I
suggested that he just use the hash of the key as the key. The
dictionary will then take a hash of that [note that:
Terry == Terry Hancock [EMAIL PROTECTED] writes:
Note that it is trivial to catch collisions on entry and correct them:
key = hash(url_string)
i = 0
if d.has_key(key):
while d.has_key(key):
i += 1
hash is a number. It's sufficient to do
while d.has_key(key):
key += 1
Ganesan Rajagopal [EMAIL PROTECTED] writes:
hash is a number. It's sufficient to do
while d.has_key(key):
key += 1
This is called linear probing and is not considered that great a
collision resolution strategy for hash tables. Really, if you want an
exhaustive study about hashing, see
25 matches
Mail list logo