"Jim Jewett" <[EMAIL PROTECTED]> wrote:
> On 10/3/06, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > Jim Jewett wrote:
> > > By knowing that there is only one possible representation for a given
> > > string, he skips the equivalency cache.  On the other hand, he also
> > > loses the equivalency cache.
> >
> > What is an equivalency cache, and why would one like to have one?
>
> Same string, different encoding.
>
> The Py 2.x unicode implementation saves a cached copy of the string
> encoded in the default encoding, but
>
> (1) it always creates the UCS4 (or UCS2) encoding, even though it
>     isn't always needed.
> (2) any 3rd encoding -- no matter how frequent -- requires either
>     a fresh copy every time, or manual caching.
>
> An equivalency cache would save all input/output encodings that the
> string was recoded to/from.  (Possibly only with weak references --
> the mapping itself might benefit from tuning based on various
> applications.)
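
The manual caching mentioned in point (2) above could be sketched roughly as follows. This is only an illustration written in present-day Python 3; the CachedText class and its encode method are hypothetical names, not part of Python or of any proposal in this thread, and a real equivalency cache might use weak references instead of a plain dict.

```python
import codecs


class CachedText:
    """Hypothetical wrapper that remembers every encoding it has produced."""

    def __init__(self, text):
        self.text = text
        self._encoded = {}  # normalized codec name -> encoded bytes

    def encode(self, codec):
        # Normalize the codec name so "UTF8" and "utf-8" share one entry.
        name = codecs.lookup(codec).name
        try:
            return self._encoded[name]
        except KeyError:
            # First request for this codec: encode once and keep the result.
            data = self._encoded[name] = self.text.encode(name)
            return data


s = CachedText("Löwis")
first = s.encode("utf-16-le")   # encodes and stores the bytes
second = s.encode("utf-16-le")  # served from the cache, no recoding
```

Saving the encoded result in a local or global variable is the same idea without the wrapper.
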
If users don't want to recode every time, they should save the encoded
result in a local or global variable.

I'm personally not terribly concerned about needing to recode text every
time one needs to access Tcl, win32, GTK, or Qt APIs.  In a large portion
of those cases Python already does that, and so far I've not heard any
substantial complaints of "Python is slow when accessing API X".

Whether we choose the internal encoding based on content (Latin-1, UCS-2,
UCS-4), or choose a single internal encoding based on a tradeoff between
representation size and access time, I don't think it matters.  Why?  The
odds are poor that any encoding we choose will really be the right
internal encoding for more than a handful of cases, so users are going to
need to recode, or write code to handle the one (or more) internal
encoding(s) available, which they already do.

 - Josiah
_______________________________________________
Python-3000 mailing list
http://mail.python.org/mailman/listinfo/python-3000
