Re: BCrypt + Python3

2013-05-18 Thread Donald Stufft

On May 18, 2013, at 5:15 AM, Aymeric Augustin wrote:

> Apologies for answering so late. I see the change discussed here was already 
> committed. The change itself is fine — essentially because it's limited to 
> the bcrypt password hasher — but I'd like to bring some perspective to parts 
> of this discussion.
> 
> Overall, I strongly advocate consistency in the Python ecosystem, and the 
> standard library sets the, err, standard. Here's how it deals with this 
> situation in Python 3.
> 
> >>> import hashlib
> 
> 1) Hash functions must reject str objects because the encoding isn't 
> guaranteed:
> 
> >>> hashlib.md5('foo')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: Unicode-objects must be encoded before hashing
> 
> 2) Digests must be returned as bytes (quite obviously):
> 
> >>> hashlib.md5(b'foo').digest()
> b'\xac\xbd\x18\xdbL\xc2\xf8\\\xed\xefeO\xcc\xc4\xa4\xd8'
> 
> 3) Hex digests must be returned as str:
> 
> >>> hashlib.md5(b'foo').hexdigest()
> 'acbd18db4cc2f85cedef654fccc4a4d8'
> 
> Adapting this example to Python 2 is left as an exercise :)
> 
> As a consequence, I agree with Claude's recommendation to use unicode strings 
> whenever possible (e.g. for hex digests). However, I believe that a simple 
> hash function mustn't accept unicode strings. Wrappers (say, a 
> make_password_hash function) must encode unicode strings to bytes before 
> passing them to hash functions.
> 
> Regarding Donald's pull request, `data = force_bytes(data)` makes sense, 
> because the hasher must be fed bytes. There's already a `password = 
> force_bytes(password)` just above.
> 
> I'm less enthusiastic about the change adding `force_text(data)`. It actually 
> works around bcrypt.hashpw returning an unexpected type in these 
> circumstances. But if that's how bcrypt.hashpw works, that's fine.

Well, the Python library returns bytes (and accepts bytes for the salt) because 
fundamentally bcrypt operates on bytes, and the C library reflects that. The 
force_text would need to happen either in Django or in the Python library and I 
believe it's more appropriate for it to happen in Django.
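A minimal sketch of the split described above, assuming a library call that, like bcrypt.hashpw, accepts and returns bytes. So that it runs without the bcrypt package, `fake_hashpw` is a hypothetical stand-in built on the standard library's PBKDF2, `SketchBCryptHasher` is a hypothetical hasher class, and `force_bytes`/`force_text` are simplified local versions of the helpers in django.utils.encoding, not Django's own code.

```python
import binascii
import hashlib


def force_bytes(s, encoding="utf-8"):
    # Simplified stand-in for django.utils.encoding.force_bytes:
    # bytes pass through unchanged, anything else is converted to str and encoded.
    if isinstance(s, bytes):
        return s
    return str(s).encode(encoding)


def force_text(s, encoding="utf-8"):
    # Simplified stand-in for django.utils.encoding.force_text:
    # str passes through unchanged, bytes are decoded.
    if isinstance(s, str):
        return s
    return s.decode(encoding)


def fake_hashpw(password, salt):
    # Hypothetical stand-in for a bytes-only library call such as
    # bcrypt.hashpw: it rejects str and returns ASCII-only bytes.
    if not isinstance(password, bytes) or not isinstance(salt, bytes):
        raise TypeError("password and salt must be bytes")
    derived = hashlib.pbkdf2_hmac("sha256", password, salt, 1000)
    return salt + b"$" + binascii.hexlify(derived)


class SketchBCryptHasher:
    # Hypothetical hasher: bytes cross into the library, str comes back out.
    algorithm = "bcrypt"

    def encode(self, password, salt):
        password = force_bytes(password)
        salt = force_bytes(salt)
        data = fake_hashpw(password, salt)
        # The library hands back bytes; the rest of the framework gets str.
        return "%s$%s" % (self.algorithm, force_text(data))
```

Keeping the conversion in the hasher means the library never has to guess an encoding, while callers on the Django side only ever see str.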

> 
> Donald, we've discussed this before and I know you have strong feelings 
> against the design of the standard library in this regard. Still, Python is 
> the environment we're living in, and we shouldn't fight it.
> 
> -- 
> Aymeric.
> 
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Django developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@googlegroups.com.
> Visit this group at http://groups.google.com/group/django-developers?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.
> 
> 


-
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: BCrypt + Python3

2013-05-18 Thread Aymeric Augustin
Apologies for answering so late. I see the change discussed here was already 
committed. The change itself is fine — essentially because it's limited to the 
bcrypt password hasher — but I'd like to bring some perspective to parts of 
this discussion.

Overall, I strongly advocate consistency in the Python ecosystem, and the 
standard library sets the, err, standard. Here's how it deals with this 
situation in Python 3.

>>> import hashlib

1) Hash functions must reject str objects because the encoding isn't guaranteed:

>>> hashlib.md5('foo')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Unicode-objects must be encoded before hashing

2) Digests must be returned as bytes (quite obviously):

>>> hashlib.md5(b'foo').digest()
b'\xac\xbd\x18\xdbL\xc2\xf8\\\xed\xefeO\xcc\xc4\xa4\xd8'

3) Hex digests must be returned as str:

>>> hashlib.md5(b'foo').hexdigest()
'acbd18db4cc2f85cedef654fccc4a4d8'

Adapting this example to Python 2 is left as an exercise :)

As a consequence, I agree with Claude's recommendation to use unicode strings 
whenever possible (e.g. for hex digests). However, I believe that a simple hash 
function mustn't accept unicode strings. Wrappers (say, a make_password_hash 
function) must encode unicode strings to bytes before passing them to hash 
functions.
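A minimal sketch of that layering, assuming UTF-8 as the wrapper's chosen encoding: the low-level function (here the standard library's hashlib.sha256) only ever sees bytes, while the wrapper accepts text, encodes it, and returns the ASCII-safe hex digest as str. `make_password_hash` is the hypothetical name from the paragraph above; a real password hasher would also need a salt and a deliberately slow key-derivation function rather than a single SHA-256 pass.

```python
import hashlib


def make_password_hash(password: str, encoding: str = "utf-8") -> str:
    # The wrapper, not the hash function, decides how text becomes bytes.
    if not isinstance(password, str):
        raise TypeError("expected str, got %s" % type(password).__name__)
    raw = password.encode(encoding)
    # hashlib itself only accepts bytes-like objects on Python 3.
    return hashlib.sha256(raw).hexdigest()
```

For example, `make_password_hash("foo")` equals `hashlib.sha256(b"foo").hexdigest()`, while `hashlib.sha256("foo")` still raises TypeError.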

Regarding Donald's pull request, `data = force_bytes(data)` makes sense, 
because the hasher must be fed bytes. There's already a `password = 
force_bytes(password)` just above.

I'm less enthusiastic about the change adding `force_text(data)`. It actually 
works around bcrypt.hashpw returning an unexpected type in these circumstances. 
But if that's how bcrypt.hashpw works, that's fine.

Donald, we've discussed this before and I know you have strong feelings against 
the design of the standard library in this regard. Still, Python is the 
environment we're living in, and we shouldn't fight it.

-- 
Aymeric.







Re: BCrypt + Python3

2013-05-11 Thread Donald Stufft

On May 11, 2013, at 4:10 AM, Claude Paroz wrote:

> On Saturday, May 11, 2013 07:59:18 UTC+2, Donald Stufft wrote:
> I went looking for BCrypt + Django + Python3 today and this is what I found: 
> 
> The current recommended solution to bcrypt + Django is using py-bcrypt which 
> is not compatible with Python3. 
> 
> Someone else has taken py-bcrypt and created py3k-bcrypt; however, they made 
> the decision to enforce having str instances sent in for password/salt 
> instead of bytes which makes it incompatible with Django's encode function on 
> the BCrypt password hashers[1]. 
> 
> So I created a library simply called `bcrypt`[2] which has the same API as 
> py-bcrypt but it functions on Python 2.6+ and 3.x (as well as PyPy 2.0 since 
> it's implemented via CFFI). When testing this against Python3 + Django I 
> discovered that Django isn't properly encoding/decoding when talking to the 
> external library so I made a patch that causes it to always send bytes to the 
> external library, and str instances to other parts of Django. That can be found here: 
> https://github.com/django/django/pull/1052 
> 
> My bcrypt library is obviously new code but it's a small wrapper over 
> crypt_blowfish from OpenWall[3] and I'm wondering (assuming no one objects to 
> me merging my Patch) if it would make sense to switch the documentation away 
> from suggesting py-bcrypt and have it suggest bcrypt instead since it will 
> allow BCrypt to function on Python3 as well. 
> 
> Thoughts? 
> 
> [1] I believe this is inherently wrong as bcrypt operates on bytes, not on 
> unicode characters, and in order for this to work py3k-bcrypt must be 
> assuming a character set it can encode(). 
> [2] Found at https://crate.io/packages/bcrypt/ or 
> https://github.com/dstufft/bcrypt 
> [3] Found at http://www.openwall.com/crypt/ 
> 
> 
> Hi Donald,
> 
> There are several approaches to string handling in Python 3, for both input 
> and output. As for me, I generally favour unicode strings whenever 
> possible. See for example the Python hashlib behaviour for digest() 
> and hexdigest(): digest() returns a bytestring as it can return a full range 
> of bytes (0-255), while hexdigest() returns a string as the result is 
> guaranteed to be ASCII-safe.
> 
> Similarly, I would have returned a string (unicode) from hashpw(), as long as it 
> is guaranteed to be ASCII-safe. As for inputs, I think it is easy enough to 
> accept both bytestrings and strings, test them and encode('utf-8') when 
> needed.
> I recognize that it looks a bit odd on Python 2 to receive unicode when you 
> fed bytes to a method.
> 
> I'm not sure there is a "right" way, it's all about design and choice. Feel 
> free to ignore me :-)

As far as the return value from hashpw goes, it's bytes primarily because the 
inputs to hashpw are expected to be bytes.
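Claude's digest()/hexdigest() comparison can be checked with the standard library alone. This sketch uses MD5 only because the thread's own examples do; it shows that the raw digest contains bytes outside the ASCII range, while the hex digest carries the same information in ASCII characters, which is why returning it as str is lossless.

```python
import hashlib

raw = hashlib.md5(b"foo").digest()        # bytes; each byte may be 0-255
hexed = hashlib.md5(b"foo").hexdigest()   # str; only the characters 0-9 and a-f

# The raw digest contains bytes above 0x7f, so it cannot be decoded as ASCII.
has_non_ascii = any(byte > 0x7F for byte in raw)

# The hex digest carries the same information using ASCII characters only,
# which is why returning it as str loses nothing.
hex_is_ascii = all(ch in "0123456789abcdef" for ch in hexed)
same_information = bytes.fromhex(hexed) == raw
```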

As far as the input values for hashpw go, it accepts only bytes because 
bcrypt as an algorithm functions only on streams of bytes, not on unicode 
characters. Not every character in the world can be represented in utf-8 and I 
believe it's better for a library to require bytes (as hashlib does on 
Python 3) than to make a possibly erroneous guess.
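To make the "erroneous guess" concrete: the same text encodes to different byte sequences under different encodings, so a library that silently encodes str as UTF-8 (as accepting both types would require) hashes something other than what a client using Latin-1 sent. A minimal sketch with hashlib; `coerce_to_bytes` is a hypothetical lenient helper, and SHA-256 stands in for bcrypt, which likewise operates on bytes.

```python
import hashlib


def coerce_to_bytes(value):
    # Hypothetical lenient input handling: bytes pass through unchanged,
    # str is encoded with an assumed encoding (UTF-8).
    if isinstance(value, bytes):
        return value
    return value.encode("utf-8")


text = "café"
utf8_bytes = text.encode("utf-8")      # b'caf\xc3\xa9'
latin1_bytes = text.encode("latin-1")  # b'caf\xe9'

# Same characters, different bytes, therefore different hashes.
utf8_hash = hashlib.sha256(coerce_to_bytes(text)).hexdigest()
latin1_hash = hashlib.sha256(latin1_bytes).hexdigest()
```

If a hash had been stored from the Latin-1 bytes, the helper's UTF-8 guess would never match it; requiring bytes forces the caller to make that choice explicitly.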

> 
> Claude
> 
> 


-
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA





BCrypt + Python3

2013-05-10 Thread Donald Stufft
I went looking for BCrypt + Django + Python3 today and this is what I found:

The current recommended solution to bcrypt + Django is using py-bcrypt which is 
not compatible with Python3. 

Someone else has taken py-bcrypt and created py3k-bcrypt; however, they made the 
decision to enforce having str instances sent in for password/salt instead of 
bytes which makes it incompatible with Django's encode function on the BCrypt 
password hashers[1].

So I created a library simply called `bcrypt`[2] which has the same API as 
py-bcrypt but it functions on Python 2.6+ and 3.x (as well as PyPy 2.0 since 
it's implemented via CFFI). When testing this against Python3 + Django I 
discovered that Django isn't properly encoding/decoding when talking to the 
external library so I made a patch that causes it to always send bytes to the 
external library, and str instances to other parts of Django. That can be found here: 
https://github.com/django/django/pull/1052

My bcrypt library is obviously new code but it's a small wrapper over 
crypt_blowfish from OpenWall[3] and I'm wondering (assuming no one objects to 
me merging my Patch) if it would make sense to switch the documentation away 
from suggesting py-bcrypt and have it suggest bcrypt instead since it will 
allow BCrypt to function on Python3 as well.

Thoughts?

[1] I believe this is inherently wrong as bcrypt operates on bytes, not on 
unicode characters, and in order for this to work py3k-bcrypt must be assuming 
a character set it can encode().
[2] Found at https://crate.io/packages/bcrypt/ or 
https://github.com/dstufft/bcrypt
[3] Found at http://www.openwall.com/crypt/

-
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


