On 8/26/07, Travis Oliphant <[EMAIL PROTECTED]> wrote:
>
> Gregory P. Smith wrote:
> > I'm in favor of not allowing unicode for hash functions.  Depending
> > on the system default encoding for a hash will not be portable.
> >
> > Another question for hashlib: it uses PyArg_Parse to get a single
> > 's' out of an optional parameter [see the code] and I couldn't
> > figure out what the best thing to do there was.  It just needs a C
> > string to pass to openssl to look up a hash function by name.  It's
> > C, so I doubt it'll ever be anything but ASCII.  How should that
> > parameter be parsed instead of with the old 's' string format?
> > PyBUF_CHARACTER actually sounds ideal in that case, assuming it
> > guarantees UTF-8, but I wasn't clear that it does (is it always
> > UTF-8, or the system "default encoding", which is possibly useless
> > as far as APIs expecting C strings are concerned)?  Requiring a
> > bytes object would also work, but I really don't like the idea of
> > users needing to use a specific type for something so simple.  (I
> > consider string constants with their preceding b, r, u, s type
> > characters ugly in code without a good reason for them to be there.)
>
> The PyBUF_CHARACTER flag was an add-on after I realized that the old
> buffer API was being used in several places to get Unicode objects to
> encode their data as a string (in the default encoding of the system,
> I believe).
>
> The unicode object is the only one that I know of that actually does
> something different when it is called with PyBUF_CHARACTER.
>
> > Is it just me, or do unicode objects supporting the buffer API seem
> > like an odd concept, given that buffer API consumers (rather than
> > unicode consumers) shouldn't need to know about the encoding of the
> > data being received?
>
> I think you have a point.  The buffer API does support the concept of
> "formats" but not "encodings", so having this PyBUF_CHARACTER flag
> looks rather like a hack.  I'd have to look, because I don't even
> remember what is returned as the "format" from a unicode object if it
> is requested (it is probably not correct).
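(For anyone reading along without the module open: the hashlib code in
question is roughly the following shape.  This is a simplified sketch
with illustrative names, not the actual _hashopenssl.c source.)

    /* Simplified sketch, not the real _hashopenssl.c code. */
    #include <Python.h>
    #include <openssl/evp.h>

    static PyObject *
    hash_new(PyObject *self, PyObject *args)
    {
        const char *name = NULL;

        /* The old 's' format hands back a C string -- which is where
         * the "what encoding is this?" question comes from. */
        if (!PyArg_ParseTuple(args, "|s:new", &name))
            return NULL;

        if (name != NULL) {
            const EVP_MD *digest = EVP_get_digestbyname(name);
            if (digest == NULL) {
                PyErr_SetString(PyExc_ValueError,
                                "unsupported hash type");
                return NULL;
            }
            /* ... construct and return the hash object ... */
        }
        Py_RETURN_NONE;
    }

All it needs in the end is a plain C string to hand to OpenSSL, which is
why the encoding question matters and the choice of format character is
otherwise uninteresting.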
As for what a unicode object reports as its "format": given that UTF-8
characters have varying widths, I don't see how it could ever
practically be correct for unicode.

> I would prefer that the notion of encoding a unicode object is
> separated from the notion of the buffer API, but last week I couldn't
> see another way to un-tease it.
>
> -Travis

A thought that just occurred to me...  Would a PyBUF_CANONICAL flag be
useful instead of CHARACTER?  For unicode that would mean UTF-8 (not
just the default encoding), but I could imagine other potential uses,
such as multi-dimensional buffers (PIL image objects?) presenting a
defined canonical form of the data, useful for either serialization or
hashing.  Any object implementing the buffer API would define its own
canonical form.

-gps
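P.S.  To make the PyBUF_CANONICAL idea concrete, a consumer might look
something like the sketch below.  Purely illustrative: PyBUF_CANONICAL
doesn't exist (the #define here is just a placeholder bit), and I'm
assuming the new buffer API's PyObject_GetBuffer / PyBuffer_Release
entry points.

    #include <Python.h>

    /* Hypothetical flag for this sketch only -- not a real PyBUF_* bit. */
    #ifndef PyBUF_CANONICAL
    #define PyBUF_CANONICAL 0x0800
    #endif

    /* Feed any buffer-supporting object's canonical form to an update
     * callback (e.g. a hash), without the consumer knowing anything
     * about encodings: a unicode object would hand back UTF-8, an
     * image object whatever canonical layout it defines. */
    static int
    update_from_canonical(PyObject *obj,
                          void (*update)(const void *, Py_ssize_t))
    {
        Py_buffer view;

        if (PyObject_GetBuffer(obj, &view, PyBUF_CANONICAL) < 0)
            return -1;   /* object defines no canonical form */
        update(view.buf, view.len);
        PyBuffer_Release(&view);
        return 0;
    }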