At 10:20 PM 6/21/2010 +1000, Nick Coghlan wrote:
For the idea of avoiding excess copying of bytes through multiple encoding/decoding calls... isn't that meant to be handled at an architectural level (i.e. decode once on the way in, encode once on the way out)? Optimising the single-byte codec case by minimising data copying (possibly through creative use of PEP 3118) may be something that we want to look at eventually, but it strikes me as something of a premature optimisation at this point in time (i.e. the old adage "first get it working, then get it working fast").
The issue is, I'd like to have an idempotent incantation that I can use to make the inputs and outputs to stdlib functions behave in a type-safe manner with respect to bytes, in cases where bytes are really what I want operated on.
Note too that this is an argument for symmetry in wrapping the inputs and outputs, so that the code doesn't have to "know" what it's dealing with!
After all, right now, if a stdlib function might return bytes or unicode depending on runtime conditions, I can't even hardcode an .encode() call -- it would fail if the return value is already bytes.
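To make that failure concrete, here is a minimal sketch for Python 3. The function `maybe_text` is a hypothetical stand-in for a stdlib call whose return type varies at runtime, not a real API, and `naive_to_bytes` is likewise an illustrative name.

```python
def maybe_text(as_bytes):
    """Hypothetical stand-in: returns bytes or str depending on a runtime condition."""
    return b"spam" if as_bytes else "spam"


def naive_to_bytes(value):
    # Hardcoded .encode(): fine for str, but bytes objects have no
    # .encode() method in Python 3, so this raises AttributeError for bytes.
    return value.encode("ascii")


encoded = naive_to_bytes(maybe_text(False))  # str input: works

try:
    naive_to_bytes(maybe_text(True))         # bytes input: fails
except AttributeError as exc:
    failure = exc
else:
    failure = None
```

The caller is forced to inspect the type before converting, which is exactly the "ask" step the hardcoded call was meant to avoid.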
This basically goes against the "tell, don't ask" pattern, and the Pythonically idempotent approach. That is, Python builtins normally hand you back the same thing if it's already what you want: int(someInt) -> someInt, iter(someIter) -> someIter, etc.
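As an illustration of what an idempotent conversion could look like, here is a rough sketch. The name `ensure_bytes` and its UTF-8 default are assumptions made for the example; no such function exists in the stdlib.

```python
def ensure_bytes(value, encoding="utf-8"):
    """Hypothetical helper: pass bytes through untouched, encode str."""
    if isinstance(value, bytes):
        return value                    # already what we want: no copy, no error
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


# The builtin behaviour being imitated: iter() on an iterator returns it as-is.
it = iter([1, 2, 3])
same_iterator = iter(it) is it

raw = b"caf\xc3\xa9"
passthrough = ensure_bytes(raw) is raw              # bytes in, same object out
from_text = ensure_bytes("caf\u00e9")               # str in, encoded bytes out
twice = ensure_bytes(ensure_bytes("caf\u00e9"))     # applying it twice is harmless
```

Because the bytes branch returns its argument unchanged, the call can be sprinkled on inputs and outputs without knowing in advance which type will show up, and without adding a copy when the value is already bytes.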
Since this incantation may need to be used often, and in places that are not known to me in advance, I would like it to not impose new overhead in unexpected places (cf. the usual argument brought against making changes to the 'list' type that would change certain operations from O(1) to O(log something)).
It's more about predictability, and having One *Obvious* Way To Do It, as opposed to "several ways, which you need to think carefully about and restructure your entire architecture around if necessary". One obvious way means I can focus on the mechanical effort of porting *first*, without having to think.
So, the performance issue isn't really about performance *per se*, so much as about the "mental UI" of the language. You could just as easily lie and tell me that your bstr implementation is O(1), and I would probably be happy and never notice, because the issue was never really about performance as such, but about having to *think* about it. (i.e., breaking flow.)
Really, the entire issue can presumably be dealt with by some series of incantations - it's just code after all. But having to sit and think about *every* situation where I'm dealing with bytes/unicode distinctions seems like torture compared to being able to say, "okay, so when dealing with this sort of API and this sort of data, this is the One Obvious Way to do the conversions."
It's One Obvious Way that I want, but some people seem to be arguing that the One Obvious Way is to Think Carefully About It Every Time -- and that seems to violate the "Obvious" part, IMO. ;-)