* Would it be a problem to use CRC32 instead of SHA? (Since security is
not a problem, and CRC32 is faster.)
What happens if you get a collision?
That is, you have two different long identifiers:
a.b.c.d...something
a.b.c.d...anotherthing
which by bad luck both hash to the same value:
a.b.c.d.$AABB99
a.b.c.d.$AABB99
(or whatever).
Yes, that was the question. How do I avoid that? (Of course, a full
SHA-256 hash value would make a collision practically impossible.)
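One way to approach the collision question above is to detect collisions rather than hope they never happen: remember which original name produced each shortened name, and fail loudly if a second, different name maps to the same result. The following is only a minimal sketch under assumptions not stated in the thread: the names ShortNameRegistry, crc32_suffix and sha256_suffix are hypothetical, and the "keep the first four components, then append .$HASH" layout is guessed from the a.b.c.d.$AABB99 example.

```python
import hashlib
import zlib


def crc32_suffix(name: str) -> str:
    """Return the CRC32 of the name as 8 hex digits (only 32 bits)."""
    return f"{zlib.crc32(name.encode('utf-8')):08X}"


def sha256_suffix(name: str, hex_digits: int = 16) -> str:
    """Return the first hex_digits hex digits of the SHA-256 digest."""
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    return digest[:hex_digits].upper()


class ShortNameRegistry:
    """Hypothetical helper: shortens names and refuses silent collisions."""

    def __init__(self, prefix_parts: int = 4, suffix=crc32_suffix):
        self.prefix_parts = prefix_parts  # leading components kept verbatim
        self.suffix = suffix              # function: full name -> hash text
        self.originals = {}               # short name -> original long name

    def shorten(self, name: str) -> str:
        parts = name.split(".")
        if len(parts) <= self.prefix_parts:
            return name  # nothing to shorten
        prefix = ".".join(parts[: self.prefix_parts])
        short = f"{prefix}.${self.suffix(name)}"
        # setdefault stores the name on first sight, else returns the old one.
        previous = self.originals.setdefault(short, name)
        if previous != name:
            raise ValueError(
                f"collision: {previous!r} and {name!r} both map to {short!r}"
            )
        return short
```

With this in place, CRC32 stays usable when speed matters, because a collision becomes an explicit error instead of silent data corruption. Passing suffix=sha256_suffix trades a longer name for a much smaller collision probability.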
* Can somebody think of a better algorithm, one that would give a better
chance of recognizing the original identifier from the modified one?
Rather than truncating the most significant part of the identifier, the
field name, you should truncate the least important part, the middle.
a.b.c.d.e.f.g.something
goes to:
a.b...g.something
or similar.
Yes, this is a good idea. Thank you.
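The middle-truncation suggestion above could be sketched roughly like this. The function name truncate_middle and its head/tail parameters are hypothetical; the thread only gives the a.b...g.something example, so the choice of keeping two components at each end is an assumption made to match it.

```python
def truncate_middle(name: str, head: int = 2, tail: int = 2,
                    sep: str = ".") -> str:
    """Keep the first `head` and last `tail` components, elide the middle."""
    parts = name.split(sep)
    if len(parts) <= head + tail:
        return name  # nothing in the middle to drop
    kept_head = sep.join(parts[:head])
    # Index from the front so that tail=0 yields an empty list.
    kept_tail = sep.join(parts[len(parts) - tail:])
    return f"{kept_head}...{kept_tail}"


print(truncate_middle("a.b.c.d.e.f.g.something"))  # a.b...g.something
```

Note that two different identifiers sharing the same head and tail still shorten to the same string, so uniqueness has to be checked separately if it matters.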