Gregory P. Smith wrote:
> I'm in favor of not allowing unicode for hash functions. Depending on
> the system default encoding for a hash will not be portable.
>
> Another question for hashlib: it uses PyArg_Parse to get a single 's'
> out of an optional parameter [see the code] and I couldn't figure out
> what the best thing to do there was. It just needs a C string to pass
> to openssl to look up a hash function by name. It's C, so I doubt it'll
> ever be anything but ASCII. How should that parameter be parsed
> instead of the old 's' string format? PyBUF_CHARACTER actually sounds
> ideal in that case, assuming it guarantees UTF-8, but I wasn't clear
> that it did that (is it always UTF-8, or is it the system "default
> encoding", which is possibly useless as far as APIs expecting C
> strings are concerned)?
> Requiring a bytes object would also work but I really don't like the
> idea of users needing to use a specific type for something so simple.
> (I consider string constants with their preceding b, r, u, s, type
> characters ugly in code without a good reason for them to be there.)
>
The PyBUF_CHARACTER flag was an add-on after I realized that the old buffer API was being used in several places to get Unicode objects to encode their data as a string (in the default encoding of the system, I believe). The unicode object is the only one that I know of that actually does something different when it is called with PyBUF_CHARACTER.

> test_hashlib.py passed on the x86 OS X system I was using to write the
> code. I neglected to run the full suite or grep for hashlib in other
> test suites and run those, so I missed the test_unicodedata failure.
> Sorry about the breakage.
>
> Is it just me or do unicode objects supporting the buffer api seem
> like an odd concept, given that buffer api consumers (rather than
> unicode consumers) shouldn't need to know about encodings of the data
> being received?

I think you have a point. The buffer API does support the concept of "formats" but not "encodings", so having this PyBUF_CHARACTER flag looks rather like a hack. I'd have to look, because I don't even remember what is returned as the "format" from a unicode object if it is requested (it is probably not correct).

I would prefer that the notion of encoding a unicode object be separated from the notion of the buffer API, but last week I couldn't see another way to un-tease it.

-Travis

>
> -gps
>
> On 8/26/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
>> Change r57490 by Gregory P Smith broke a test in test_unicodedata and,
>> on PPC OSX, several tests in test_hashlib.
>>
>> Looking into this it's pretty clear *why* it broke: before, the 's#'
>> format code was used, while Gregory's change changed this into using
>> the buffer API (to ensure the data won't move around). Now, when a
>> (Unicode) string is passed to s#, it uses the UTF-8 encoding. But the
>> buffer API uses the raw bytes in the Unicode object, which is
>> typically UTF-16 or UTF-32.
>> (I can't quite figure out why the tests
>> didn't fail on my Linux box; I'm guessing it's an endianness issue,
>> but it can't be that simple. Perhaps that box happens to be falling
>> back on a different implementation of the checksums?)
>>
>> I checked in a fix (because I don't like broken tests :-) which
>> restores the old behavior by passing PyBUF_CHARACTER to
>> PyObject_GetBuffer(), which enables a special case in the buffer API
>> for PyUnicode that returns the UTF-8 encoded bytes instead of the raw
>> bytes. (I still find this questionable, especially since a few random
>> places in bytesobject.c also use PyBUF_CHARACTER, presumably to make
>> tests pass, but for the *bytes* type, requesting *characters* (even
>> encoded ones) is iffy.)
>>
>> But I'm wondering if passing a Unicode string to the various hash
>> digest functions should work at all! Hashes are defined on sequences
>> of bytes, and IMO we should insist on the user to pass us bytes, and
>> not second-guess what to do with Unicode.
>>
>> Opinions?
>>
>> --
>> --Guido van Rossum (home page: http://www.python.org/~guido/)
>> _______________________________________________
>> Python-3000 mailing list
>> Python-3000@python.org
>> http://mail.python.org/mailman/listinfo/python-3000
>> Unsubscribe:
>> http://mail.python.org/mailman/options/python-3000/greg%40krypto.org
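
For readers following the thread, here is a rough sketch of why the choice of encoding matters to a hash. It is not code from the change under discussion: the sample string, the `digest_for` helper, and the use of SHA-256 are all assumptions made for illustration. It encodes one string several ways and hashes each result, mirroring the UTF-8 versus UTF-16/UTF-32 mismatch described above. The final check reflects how current CPython 3 releases behave, where the hash constructors reject `str` outright.

```python
import hashlib

# Illustrative sample only: a short string with one non-ASCII character.
text = "caf\u00e9"


def digest_for(encoding: str) -> str:
    """Encode `text` explicitly, then hash the resulting bytes."""
    return hashlib.sha256(text.encode(encoding)).hexdigest()


# The same characters produce different byte sequences under each
# encoding, so each one yields a different digest.
encodings = ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le")
digests = {enc: digest_for(enc) for enc in encodings}

for enc, value in digests.items():
    print(f"{enc:10} {value}")

# Current CPython 3 refuses to guess an encoding: passing a str to a
# hash constructor raises TypeError instead of hashing some encoding.
try:
    hashlib.sha256(text)
    str_accepted = True
except TypeError:
    str_accepted = False

print("str accepted directly:", str_accepted)
```

Making the caller write `text.encode("utf-8")` keeps the encoding visible at the call site, which is the portability concern raised at the top of this message.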