[issue4751] Patch for better thread support in hashlib

2009-05-03 Thread Gregory P. Smith

Gregory P. Smith g...@krypto.org added the comment:

Committed with a couple refactorings in trunk r72267.  I also added a
test (basically checking for corruption that would occur if the locks
weren't working).

(I'll sort out any py3k vs trunk differences to make future change
merges easier).

--
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-04-07 Thread Lukas Lueg

Lukas Lueg knabberknusperh...@yahoo.de added the comment:

bump

hashlibopenssl_gil_py27.diff has not yet been applied to py27 and does
not apply cleanly any more. Here is an updated version.

--
status: pending - open
Added file: http://bugs.python.org/file13646/hashlibopenssl_gil_py27_2.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-04-07 Thread Lukas Lueg

Changes by Lukas Lueg knabberknusperh...@yahoo.de:


Removed file: http://bugs.python.org/file13057/hashlibopenssl_gil_py27.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-02-12 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

@ebfe: Did you wrote the patch (for python 2.7)? Are you still 
interrested to write the patch?

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-02-12 Thread Lukas Lueg

Lukas Lueg knabberknusperh...@yahoo.de added the comment:

yes, I got lost on that one. I'll create a patch for 2.7 tonight.

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-02-12 Thread Lukas Lueg

Lukas Lueg knabberknusperh...@yahoo.de added the comment:

Patch for 2.7

Added file: http://bugs.python.org/file13057/hashlibopenssl_gil_py27.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-02-12 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

@ebfe: Your patch is very close to r68411 (patch for py3k), and so it 
looks correct (I didn't test it).

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-02-12 Thread Collin Winter

Changes by Collin Winter coll...@gmail.com:


--
nosy: +collinwinter, jyasskin

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-02-11 Thread Gregory P. Smith

Gregory P. Smith g...@krypto.org added the comment:

assigning to me so i don't lose track of making sure this happens for 
trunk.

--
assignee:  - gregory.p.smith
components: +Extension Modules -Library (Lib)

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-08 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

There is still a potential problem.
Figure the following:
- thread A executes ENTER_HASHLIB while lock is NULL; therefore, thread
A has released the GIL and doesn't hold any lock
- thread B enters EVP_update with a large buffer (it can be there, since
A doens't hold the GIL)
- thread B allocates the lock, releases the GIL, and allocates the lock
- thread A continues running and arrives at LEAVE_HASHLIB; there,
self-lock is not NULL anymore, so it tries to release it; since it
hasn't acquired it before, this may block forever (depending on the
platform I assume)

To remove this possibility, the macros should probably look like:

#define ENTER_HASHLIB(obj) \
{ \
int __lock_exists = ((obj)-lock) != NULL; \
if (__lock_exists) { \
if (!PyThread_acquire_lock((obj)-lock, 0)) { \
Py_BEGIN_ALLOW_THREADS \
PyThread_acquire_lock((obj)-lock, 1); \
Py_END_ALLOW_THREADS \
} \
}

#define LEAVE_HASHLIB(obj) \
if (__lock_exists) { \
PyThread_release_lock((obj)-lock); \
} \
}

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-08 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Oops, nevermind what I said. The GIL isn't released if obj-lock isn't
there.

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-08 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Haypo's last patch is ok. If you want it to be in 2.7 too, however,
you'll have to provide another patch (I won't do it myself).

--
resolution:  - accepted
versions:  -Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-08 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Committed to py3k in r68411. Please tell me if you intend to do a patch
for 2.7. Otherwise, you/I can close the issue.

--
stage:  - committed/rejected
status: open - pending

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-08 Thread Lukas Lueg

Lukas Lueg knabberknusperh...@yahoo.de added the comment:

I'll do a patch for 2.7

--
versions: +Python 2.7 -Python 3.1

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-06 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

The patch looks fine to me, apart from one point: the return value of
PyThread_allocate_lock() should be checked for NULL, and the error
either propagated or cleared.

(I'd also suggest lowering HASHLIB_GIL_MINSIZE to 2048 or 4196)

Gregory, what's your take?

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-06 Thread Gregory P. Smith

Gregory P. Smith g...@krypto.org added the comment:

hashlibopenssl_small_lock-4.diff looks good to me.

I also agree that HASHLIB_GIL_MINSIZE should be lowered to 2048.

Commit it, and please backport it to trunk before closing this bug.

--
nosy:  -gps
versions: +Python 2.7, Python 3.1 -Python 3.0

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-06 Thread Lukas Lueg

Changes by Lukas Lueg knabberknusperh...@yahoo.de:


Removed file: http://bugs.python.org/file12587/hashlibopenssl_small_lock-4.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-06 Thread Lukas Lueg

Lukas Lueg knabberknusperh...@yahoo.de added the comment:

PyThread_allocate_lock can fail without interference. object-lock will
stay NULL and the GIL is simply not released.

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-04 Thread Lukas Lueg

Changes by Lukas Lueg knabberknusperh...@yahoo.de:


Removed file: http://bugs.python.org/file12533/hashopenssl_threads-4.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-04 Thread Lukas Lueg

Lukas Lueg knabberknusperh...@yahoo.de added the comment:

I've modified haypo's patch as commented. The object's lock should be
free 99.9% of the time so we try non-blocking first and can thereby skip
 releasing and re-locking the gil (to avoid a deadlock).

Added file: http://bugs.python.org/file12587/hashlibopenssl_small_lock-4.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-04 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


Removed file: http://bugs.python.org/file12542/hashlibopenssl_small_lock-2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-04 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


Removed file: http://bugs.python.org/file12554/hashlibopenssl_small_lock-3.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-03 Thread ebfe

Changes by ebfe knabberknusperh...@yahoo.de:


Removed file: http://bugs.python.org/file12557/md5module_small_locks.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-03 Thread ebfe

ebfe knabberknusperh...@yahoo.de added the comment:

Haypo, we can probably reduce overhead by defining ENTER_HASHLIB like this:

#define ENTER_HASHLIB(obj) \
if ((obj)-lock) { \
if (!PyThread_acquire_lock((obj)-lock, 0)) { \
Py_BEGIN_ALLOW_THREADS \
PyThread_acquire_lock((obj)-lock, 1); \
Py_END_ALLOW_THREADS \
} \
}

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-03 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

I'm not sure about the approach of dynamically allocating self-lock.
Imagine you allocate this lock while another thread is between
ENTER_HASHLIB and LEAVE_HASHLIB. What happens on LEAVE_HASHLIB? The
thread tries to release a lock it hadn't acquired (because the lock was
NULL at the time). Is it simply ignored?

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-03 Thread Lukas Lueg

Lukas Lueg knabberknusperh...@yahoo.de added the comment:

The lock is created while having the GIL in EVP_update. No other
function releases the GIL (besides the creator-function which does not
need the local lock).

Thereby no other thread can be in between ENTER and LEAVE while the lock
is allocated.

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-02 Thread ebfe

ebfe knabberknusperh...@yahoo.de added the comment:

Releasing the GIL is somewhat expensive and should be avoided if
possible. I've moved LEAVE_HASHLIB in EVP_update so the object gets
unlocked before we call Py_END_ALLOW_THREADS. This is *only* possible
because EVP_update does not use the object beyond those lines.

Here is a new patch and a small test-script.

Added file: http://bugs.python.org/file12533/hashopenssl_threads-4.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-02 Thread ebfe

ebfe knabberknusperh...@yahoo.de added the comment:

test-script

Added file: http://bugs.python.org/file12534/hashlibtest2.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-02 Thread ebfe

Changes by ebfe knabberknusperh...@yahoo.de:


Removed file: http://bugs.python.org/file12461/hashopenssl_threads-3.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-02 Thread ebfe

ebfe knabberknusperh...@yahoo.de added the comment:

gnarf, actually it should be 'threads.append(Hasher(md))' in the script :-\

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-02 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

 Releasing the GIL is somewhat expensive and should be avoided 
 if possible.

Another possible solution is to create a lockless object by default, 
and create a lock if the data size is bigger than N (eg. 8 KB). When 
the lock is created, update will always use the lock (and so the GIL).

In general, you have two classes of hashlib usages:
 - hash a big files by chunk of k KB (eg. 256 KB)
 - hash a very small string (eg. 8 bytes)

When you have a small string, you don't need to release the GIL nor to 
use locks. Whereas for a file, you can always release the GIL (and so 
you need a lock to protect the context).

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-02 Thread ebfe

ebfe knabberknusperh...@yahoo.de added the comment:

I don't think this is actually worth the trouble. You run into situation
where one thread might decide that it needs a lock now with other
threads being in the to-be-locked-area at that time.

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-02 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

New implementation of finer lock grain in _hashlibopenssl: only create 
the lock at the first update with more than 8 KB bytes. Object 
creation/deallocation is faster if we hash less than 8 KB.

Changes between hashopenssl_threads-4.diff and my new patch: fix the 
deadlock in ENTER_HASHLIB() (for the GIL) without speed change 
(because we don't change the GIL state if we don't use a lock).

Changes between py3k trunk and my new patch:
 - release the GIL with large byte string = faster with multiple CPUs
 - fix EVP_get_block_size() and EVP_get_digest_size(): the context was 
not protected by the lock!

Added file: http://bugs.python.org/file12541/hashlibopenssl_small_lock.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-02 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


Removed file: http://bugs.python.org/file12459/hashopenssl_threads-2.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-02 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

Update small lock patch: replace all tabs by spaces! I forget a change 
between Python trunk and my patch: there is also the error message for 
Unicode object. Before:
import hashlib; hashlib.md5(abc)
   TypeError: object supporting the buffer API required
after:
import hashlib; hashlib.md5(abc)
   TypeError: Unicode-objects must be encoded before hashing

Added file: http://bugs.python.org/file12542/hashlibopenssl_small_lock-2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-02 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


Removed file: http://bugs.python.org/file12541/hashlibopenssl_small_lock.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-02 Thread Gregory P. Smith

Gregory P. Smith g...@krypto.org added the comment:

First:  thanks for doing this.  I've had a patch sitting in my own
sandbox to release the GIL while hashing for a while but I hadn't
finished testing it.  It looks pretty similar to what you've done so
lets go with the patch being developed in this issue.

Rather than making HASHLIB_GIL_MINSIZE a constant I suggest making it a
property of the hash object so that it can be set by the user.  Most
users will be fine with the default but depending upon the application,
platform and hash algorithm being used other values may make more sense.

--
nosy: +gregory.p.smith

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-02 Thread ebfe

ebfe knabberknusperh...@yahoo.de added the comment:

I don't think so.

The interface should stay simple - python has very few such magic knobs.
People will optimize for their own box as you said - and that code will
run worse on all the others...

Besides, we've lived so long with single-threaded openssl. Let's make
HASHLIB_GIL_MINSIZE such that there is no risk of additional overhead
introduced by this patch and refer to it's current value in the
hashlib-module's documentation.

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-02 Thread ebfe

ebfe knabberknusperh...@yahoo.de added the comment:

haypo, the patch will not compile when WITH_THREADS is not defined. The
'lock'-member in the object structure is not present without
WITH_THREADS however the line 'if (self-lock == NULL  view.len =
HASHLIB_GIL_MINSIZE)' will always refer to it.

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-02 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

About HASHLIB_GIL_MINSIZE, I'm unable to mesure the overhead. I tried 
timeit with 8190 and 8200 but the results are *very* close. I'm 
running Linux, it's maybe different on other OS.

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-02 Thread ebfe

ebfe knabberknusperh...@yahoo.de added the comment:

Here is another patch, this time for the fallback-md5-module. I know
that situations are rare where openssl is not present but threading is.
However they might occur out there and the md5module needed some love
anyway:

- The MD5 class from the fallback module can now also use threads with
'small locks'
- The behaviour regarding unicode data input is now consistent as to
what the openssl-driven classes do.
- Some code cleanup.


I might act on the sha modules as way the next days. sha256.c still
accepts 's#'...

I might a

Added file: http://bugs.python.org/file12557/md5module_small_locks.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-02 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

ebfe Here is another patch, this time for the fallback-md5-module

Please open a separated issue for each module, this issue is already 
too long and complex ;-) And it would be easier to fix other modules 
when patches for hashlib will be accepted ;-)

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-01 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-01 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

Ooooh, I suggested to ebfe to remove the GIL unlock/lock, but I was 
wrong :-( I hate locks! What is the right fix? Replace
   ENTER_HASHLIB(self)
   Py_BEGIN_ALLOW_THREADS
   ...
   Py_END_ALLOW_THREADS
   LEAVE_HASHLIB(self)
by
   Py_BEGIN_ALLOW_THREADS
   ENTER_HASHLIB(self)
   ...
   LEAVE_HASHLIB(self)
   Py_END_ALLOW_THREADS
?

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2009-01-01 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

The right fix would probably be to define ENTER_HASHLIB(self) as
Py_BEGIN_ALLOW_THREADS
PyThread_acquire_lock(self-lock)
Py_END_ALLOW_THREADS

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2008-12-31 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Based on quick testing on my computer, we could probably put the limit
as low as 1KB. But it may be that locks are cheap under Linux.
In any case, the patch looks good, but I'm no OpenSSL expert.

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2008-12-31 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Actually, your code can deadlock since ENTER_HASHLIB doesn't release the
GIL. Think about it:

// Thread A is here, holding the GIL and waiting for self-lock to be 
// released by thread B
ENTER_HASHLIB(self)
Py_BEGIN_ALLOW_THREADS
// Thread B is here, holding self-lock and waiting for the GIL to be
// released by thread A
Py_END_ALLOW_THREADS
LEAVE_HASHLIB(self)

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2008-12-26 Thread ebfe

New submission from ebfe knabberknusperh...@yahoo.de:

The hashlib functions provided by _hashopenssl.c hold the GIL all the
time although the underlying openssl-library is basically thread-safe.
I've attached a patch (svn diff) which basically does four things:

* If python is compiled with thread-support, the EVPobject is extended
by an additional PyThread_type_lock which protects the objects individually.
* The 'update' function releases the GIL if the to-be-hashed object is a
Bytes-object and therefor provides trustworthy locking (all other types,
including subclasses, are not trustworthy!). This allows multiple
threads to do hashing in parallel.
* The EVP_hash function removes duplicated code.
* The situation regarding unicode objects is now more meaningful. Upon
passing a unicode-string to the .update() function, the original hashlib
throws a TypeError: object supporting the buffer API required which is
confusing. I think it's perfectly valid not to accept unicode-strings as
input and people should required to call str.encode() upon their strings
before hashing, so a well-defined byte-representation of their strings
get hashed. Therefor I patched the MY_GET_BUFFER_VIEW_OR_ERROUT-macro to
throw TypeError: Unicode-objects must be encoded before hashing. This
also fixes issue #1118


I've tested this patch and did not run into problems. CPU occupancy
relies on the buffer-size passed to .update() as releasing the GIL is
basically not worth the effort for very small buffers. More testing may
be needed...

--
components: Library (Lib)
files: hashopenssl_threads.diff
keywords: patch
messages: 78297
nosy: ebfe
severity: normal
status: open
title: Patch for better thread support in hashlib
type: performance
versions: Python 3.0
Added file: http://bugs.python.org/file12453/hashopenssl_threads.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2008-12-26 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

I think that you don't use Py_BEGIN_ALLOW_THREADS / 
Py_END_ALLOW_THREADS correctly: the GIL can be released when the 
hashlib lock is acquired (to run hash functions in parallel threads). 
So the macros should be:

#define ENTER_HASHLIB \
   PyThread_acquire_lock(self-lock, 1); \
   Py_BEGIN_ALLOW_THREADS

#define LEAVE_HASHLIB \
   Py_END_ALLOW_THREADS \
   PyThread_release_lock(self-lock);

If I'm right, issue #4738 (zlib) is also affected.

--
nosy: +haypo

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2008-12-26 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Hi,

Very good idea. However, you don't need to discriminate for the bytes
type specifically. When a buffer is taken on the object (with
PyObject_GetBuffer()), the object is internally locked until the
buffer is release with PyBuffer_Release(). Try with a bytearray and
you'll see: if you resize the bytearray while hashing it in another
thread, you'll get a BufferError exception.

All in all, it should make your code and macros much simpler.

--
nosy: +pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2008-12-26 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

EVP_copy() and EVP_get_digest_size() should call 
ENTER_HASHLIB/LEAVE_HASHLIB to protect self-ctx.

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2008-12-26 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

If view.len is negative, EVP_hash() may read invalid memory :-/ Be 
careful of integer overflow in this block:

   Py_ssize_t offset = 0, sublen = len;
   while (sublen) {
  unsigned int process = sublen  MUNCH_SIZE ? MUNCH_SIZE : 
sublen;
  ...
   }

You removed Py_SAFE_DOWNCAST(len, Py_ssize_t, unsigned int) which 
should be used (eg. on process?).

Note: you might modify len directly instead of using a second variable 
(sublen), and cp instead of using an offset.

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2008-12-26 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

New version of ebfe's patch:
 - ENTER/LEAVE_HASHLIB:
   * don't touch GIL in ENTER_HASHLIB (it's useless)
   * add mandatory argument (explicit use of self)
 - EVP_hash():
   * restore Py_SAFE_DOWNCAST
   * simplify the code: always use the while() instead of if+while
   * use while (0  len) to skip zero or negative value (even if 
pitrou told me that len should not be negative)
 - EVP_dealloc(): free the lock before the context
 - release the GIL for all calls to EVP_hash()
 - use the context lock in EVP_copy() and EVP_get_digest_size() to 
protect self
 - don't use the context lock in EVP_repr() (useless because we don't 
read OpenSSL context)
 - fix the indentation of the code (replace tab by spaces)

Some rules for ENTER/LEAVE_HASHLIB:
 * it is only needed to protect the context attribute (eg. name 
doesn't need to be protected by the lock)
 * it doesn't touch the GIL: use an explicit call to 
Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS

About the GIL:
 * EVP_DigestInit() and EVP_MD_CTX_copy() are consired fast enough to 
no release the GIL
 * The GIL is released for the slowest function: EVP_DigestUpdate() 
(called in our EVP_hash() function)

Added file: http://bugs.python.org/file12459/hashopenssl_threads-2.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2008-12-26 Thread ebfe

Changes by ebfe knabberknusperh...@yahoo.de:


Removed file: http://bugs.python.org/file12453/hashopenssl_threads.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2008-12-26 Thread ebfe

ebfe knabberknusperh...@yahoo.de added the comment:

Thanks for the advices.

Antoine, maybe you could clarify the situation regarding buffer-locks
for me. In older versions of PEP 3118 the PyBUF_LOCK flag was still
present but it doesn't seem to have made it's way into the final draft.
Is it save to assume that a buffer-view will not change until release()
is called - for all types supporting the buffer protocol in py3k ??

I've done some testing and the overhead of releasing and re-locking the
GIL is definitely a performance problem when trying to hash many small
strings (doubled runtime for 100.000 times b'abc'). I've taken on
haypo's patch to release the GIL only when the buffer is larger than 10kb.

Added file: http://bugs.python.org/file12461/hashopenssl_threads-3.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2008-12-26 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

 Is it save to assume that a buffer-view will not change until release()
 is called - for all types supporting the buffer protocol in py3k ??

Yes, it is!

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2008-12-26 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

 I've taken on haypo's patch to release the GIL only 
 when the buffer is larger than 10kb

You can factorize the code by moving Py_BEGIN_ALLOW_THREADS / 
Py_END_ALLOW_THREADS *into* EVP_hash ;-)

10 KB is a random value or the fast value for your computer?

I wrote a small benchmark: md5sum.py, my Python multithreaded version 
of md5sum. Results on 129 files (between 7 and 10 MB) on an Intel Quad 
Core @ 2.5 GHz:
 - without the patch: best=10.6 sec / average ~= 11.5 sec
 - with the patch (version 3): best=7.7 sec / average ~= 8.5 sec

My program creates N threads for N files, which is maybe stupid (eg. 
limit to C+1 thread for C cores).

Added file: http://bugs.python.org/file12462/md5sum.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2008-12-26 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

New version of my md5sum.py program limited to 10 threads. New 
benchmark with 160 files (size in 7..10 MB):
 - Python unpatched: best=4.8 sec
 - C version (/usr/bin/md5sum): best=3.6 sec
 - Python patched: best=2.1 sec

As everybody knows, Python is faster than the C language ;-) And the 
patch is really useful (the program is more than twice faster with 4 
cores).

Added file: http://bugs.python.org/file12464/md5sum.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2008-12-26 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


Removed file: http://bugs.python.org/file12462/md5sum.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2008-12-26 Thread ebfe

ebfe knabberknusperh...@yahoo.de added the comment:

Here is another simple benchmarker. For me it shows almost perfect
scaling (2 cores = 196% performance) if the buffer put into .update() is
large enough.

I deliberately did not move Py_BEGIN_ALLOW_THREADS into EVP_hash as we
might call this function without having some lock on the input buffer.

The 10kb limit was based on my own computer (MacBook Pro 2x2.5GHz) and
is somewhat more-safe-than-sorry.
Hashing is *very* fast on modern CPUs and working on many small strings
becomes very inefficient when releasing the GIL all the time. Just try
to hash 10240 bytes vs. 10241 bytes.

Added file: http://bugs.python.org/file12465/hashlibtest.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4751] Patch for better thread support in hashlib

2008-12-26 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

hashlibtest.py results on my Quad Core with 4 threads:
 - unpatched: best=13.0 sec
 - patched: best=3.25 sec

Some maths: 13.0 / 4 = 3.25 \o/

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4751
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com