On 14.11.2012 01:41, Richard Baron Penman wrote:
I found the MD5 and SHA hashes slow to calculate.
Slow? For URLs? Are you kidding? How many URLs per second do you want to
calculate?
The builtin hash is fast but I was concerned about collisions. What
rate of collisions could I expect?
MD5
thanks for perspective!
--
http://mail.python.org/mailman/listinfo/python-list
On 14.11.2012 02:39, Roy Smith wrote:
The next step is to reduce the number of bits you are encoding. You
said in another post that 1 collision in 10 million hashes would be
tolerable. So you need:
>>> math.log(10*1000*1000, 2)
23.25349666421154
24 bits worth of key.
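For what it's worth, a minimal sketch of what encoding only ~24 bits might look like (the function name and the choice of md5 as the underlying hash are my own illustration, not from the thread):

```python
import base64
import hashlib

def short_id(url, nbits=24):
    """Illustrative only: keep the first nbits of an md5 digest
    and base64-encode them into a short, URL-safe ID."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    nbytes = (nbits + 7) // 8          # 24 bits -> 3 bytes
    return base64.urlsafe_b64encode(digest[:nbytes]).decode("ascii")

print(short_id("docs.python.org/library/uuid.html"))  # 4 characters long
```

Note the birthday-paradox caveat raised elsewhere in the thread: 24 bits is far too few once millions of URLs are hashed.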
Nope :-)
Base64
On 11/14/2012 06:29 AM, Johannes Bauer wrote:
snip
When doing these calculations, it's important to keep the birthday
paradox in mind (this is kind of counter-intuitive): the chance of a
collision rises tremendously when we're looking for *any* arbitrary
two hashes colliding within a
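To put rough numbers on that, the standard birthday-bound approximation P ≈ 1 - exp(-n(n-1)/2d) for n items in a space of d values can be evaluated directly (a sketch of mine, not from the original posts):

```python
import math

def collision_probability(n, bits):
    """Approximate probability of at least one collision among n
    uniformly random hash values of the given bit width."""
    d = 2.0 ** bits
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * d))

print(collision_probability(10000, 24))  # ~0.95: 24 bits collides fast
print(collision_probability(10000, 64))  # ~2.7e-12
```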
On 14.11.2012 13:33, Dave Angel wrote:
The birthday paradox could have been important had the OP stated his goal
differently. What he said was:
Ideally I would want to avoid collisions altogether. But if that means
significant extra CPU time then 1 collision in 10 million hashes would be
tolerable.
Hello,
I want to create a URL-safe unique ID for URL's.
Currently I use:
url_id = base64.urlsafe_b64encode(url)
>>> base64.urlsafe_b64encode('docs.python.org/library/uuid.html')
'ZG9jcy5weXRob24ub3JnL2xpYnJhcnkvdXVpZC5odG1s'
I would prefer more concise ID's.
What do you recommend? - Compression?
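As a quick sanity check on the compression idea: for strings as short as a typical URL, zlib's header and checksum overhead usually cancels any savings (a sketch; the example URL is reused from above):

```python
import base64
import zlib

url = b"docs.python.org/library/uuid.html"
plain = base64.urlsafe_b64encode(url)
packed = base64.urlsafe_b64encode(zlib.compress(url, 9))

# For short URLs the "compressed" form tends to come out longer.
print(len(url), len(plain), len(packed))
```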
In 0692e6a2-343c-4eb0-be57-fe5c815ef...@googlegroups.com Richard
richar...@gmail.com writes:
I want to create a URL-safe unique ID for URL's.
Currently I use:
url_id = base64.urlsafe_b64encode(url)
base64.urlsafe_b64encode('docs.python.org/library/uuid.html')
Good point - one way encoding would be fine.
Also, this is performed millions of times, so ideally it should be efficient.
On Wednesday, November 14, 2012 10:34:03 AM UTC+11, John Gordon wrote:
In 0692e6a2-343c-4eb0-be57-fe5c815ef...@googlegroups.com Richard
richar...@gmail.com writes:
I want to
One option would be using a hash. Python's built-in hash, a 32-bit
CRC, 128-bit MD5, 256-bit SHA or one of the many others that exist,
depending on the needs. Higher bit counts will reduce the odds of
accidental collisions; cryptographically secure ones if outside
attacks matter. In such a case,
I found the MD5 and SHA hashes slow to calculate.
The builtin hash is fast but I was concerned about collisions. What
rate of collisions could I expect?
Outside attacks not an issue and multiple processes would be used.
On Wed, Nov 14, 2012 at 11:26 AM, Chris Kaynor ckay...@zindagigames.com
On 14.11.2012 01:26, Chris Kaynor wrote:
One option would be using a hash. Python's built-in hash, a 32-bit
CRC, 128-bit MD5, 256-bit SHA or one of the many others that exist,
depending on the needs. Higher bit counts will reduce the odds of
accidental collisions; cryptographically secure
These URL ID's would just be used internally for quick lookups, not exposed
publicly in a web application.
Ideally I would want to avoid collisions altogether. But if that means
significant extra CPU time then 1 collision in 10 million hashes would be
tolerable.
On 14.11.2012 01:41, Richard Baron Penman wrote:
I found the MD5 and SHA hashes slow to calculate.
The builtin hash is fast but I was concerned about collisions. What
rate of collisions could I expect?
Seriously? It takes about 1-5 ms to sha1() one MB of data on a modern
CPU, 1.5 ms on my box.
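A throwaway timeit sketch along these lines (numbers will vary by machine; the setup is mine, not from the original post):

```python
import hashlib
import timeit

url = "docs.python.org/library/uuid.html"
data = url.encode("utf-8")

# Compare the built-in hash against md5 and sha1 on a short URL.
for name, fn in [
    ("builtin hash", lambda: hash(url)),
    ("md5", lambda: hashlib.md5(data).hexdigest()),
    ("sha1", lambda: hashlib.sha1(data).hexdigest()),
]:
    t = timeit.timeit(fn, number=100000)
    print("%-13s %.3f s per 100k calls" % (name, t))
```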
On 14.11.2012 01:50, Richard wrote:
These URL ID's would just be used internally for quick lookups, not exposed
publicly in a web application.
Ideally I would want to avoid collisions altogether. But if that means
significant extra CPU time then 1 collision in 10 million hashes would be
tolerable.
I found md5 / sha 4-5 times slower than hash. And base64 a lot slower.
No database or else I would just use their ID.
On Wednesday, November 14, 2012 11:59:55 AM UTC+11, Christian Heimes wrote:
On 14.11.2012 01:41, Richard Baron Penman wrote:
I found the MD5 and SHA hashes slow to
In article 0692e6a2-343c-4eb0-be57-fe5c815ef...@googlegroups.com,
Richard richar...@gmail.com wrote:
Hello,
I want to create a URL-safe unique ID for URL's.
Currently I use:
url_id = base64.urlsafe_b64encode(url)
base64.urlsafe_b64encode('docs.python.org/library/uuid.html')
I am dealing with URL's rather than integers
So the use case - I'm storing webpages on disk and want a quick retrieval
system based on URL.
I can't store the files in a single directory because of OS limitations so have
been using a sub folder structure.
For example to store data at URL abc: a/b/c/index.html
This data is also viewed
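One way the sub-folder scheme might look with a hash in the mix (the path layout, function name, and `root`/`depth` parameters are hypothetical illustrations, not the OP's actual code):

```python
import hashlib
import os

def url_to_path(url, root="cache", depth=2):
    """Map a URL to a nested file path via its md5 hex digest.
    The first `depth` hex characters become directory levels, which
    bounds how many entries any single directory can accumulate."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    parts = list(digest[:depth]) + [digest[depth:]]
    return os.path.join(root, *parts, "index.html")

print(url_to_path("docs.python.org/library/uuid.html"))
```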
The next step is to reduce the number of bits you are encoding. You
said in another post that 1 collision in 10 million hashes would be
tolerable. So you need:
>>> math.log(10*1000*1000, 2)
23.25349666421154
I think a difficulty would be finding a hash algorithm that maps
In article 1ce88f36-bfc7-4a55-89f8-70d1645d2...@googlegroups.com,
Richard richar...@gmail.com wrote:
So the use case - I'm storing webpages on disk and want a quick retrieval
system based on URL.
I can't store the files in a single directory because of OS limitations so
have been using a
thanks for pointer to Varnish.
I found MongoDB had a lot of size overhead, so it ended up using 4x the
size of the data stored.
On Wed, Nov 14, 2012 at 2:25 PM, Richard richar...@gmail.com wrote:
So the use case - I'm storing webpages on disk and want a quick retrieval
system based on URL.
I can't store the files in a single directory because of OS limitations so
have been using a sub folder structure.
For example
yeah good point - I have gone with md5 for now.
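In that spirit, the md5 variant of the original url_id could be as simple as the following (a sketch of the approach, not the OP's code):

```python
import hashlib

def url_id(url):
    """Fixed-length (32 hex chars), URL-safe ID for a URL."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()

print(url_id("docs.python.org/library/uuid.html"))
```

Unlike base64 of the URL itself, the length stays constant no matter how long the URL gets.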
On Wednesday, November 14, 2012 3:06:18 PM UTC+11, Chris Angelico wrote:
On Wed, Nov 14, 2012 at 2:25 PM, Richard richar...@gmail.com wrote:
So the use case - I'm storing webpages on disk and want a quick retrieval
system based on URL.