Michael Torrie <torr...@gmail.com>:

> On 03/18/2016 02:26 AM, Jussi Piitulainen wrote:
>> I think Julia's way of dealing with its strings-as-UTF-8 [2] is more
>> promising. Indexing is by bytes (1-based in Julia) but the value at a
>> valid index is the whole UTF-8 character at that point, and an
>> invalid index raises an exception.
>
> This seems to me to be a leaky abstraction.
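
For reference, the indexing scheme quoted above can be mimicked in Python over a bytes object. This is only a rough sketch, not Julia's implementation: char_at is a made-up name, offsets are 0-based here rather than 1-based, and IndexError is my stand-in for whatever exception Julia actually raises.

```python
def char_at(data: bytes, i: int) -> str:
    """Return the whole UTF-8 character that starts at byte offset i.

    Hypothetical helper: 0-based offsets, IndexError for an offset that
    falls inside a character. Assumes data is well-formed UTF-8.
    """
    if not 0 <= i < len(data):
        raise IndexError(f"byte offset {i} out of range")
    lead = data[i]
    if lead & 0xC0 == 0x80:
        # 10xxxxxx is a continuation byte, so no character starts here.
        raise IndexError(f"byte offset {i} is inside a character")
    if lead < 0x80:
        width = 1          # 0xxxxxxx: ASCII
    elif lead >> 5 == 0b110:
        width = 2          # 110xxxxx
    elif lead >> 4 == 0b1110:
        width = 3          # 1110xxxx
    elif lead >> 3 == 0b11110:
        width = 4          # 11110xxx
    else:
        raise ValueError(f"invalid UTF-8 lead byte at offset {i}")
    # The lookup touches at most four bytes, so it is O(1) in len(data).
    return data[i:i + width].decode("utf-8")


s = "aé€".encode("utf-8")   # 1 + 2 + 3 bytes
print(char_at(s, 0), char_at(s, 1), char_at(s, 3))
```

Offsets 0, 1 and 3 are valid in that example; offset 2 lands in the middle of the two-byte character and raises. Note that the offset of the n-th character is not known without scanning from the start.
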

It may be that Python's Unicode abstraction is an untenable illusion
because the underlying reality is 8-bit and there's no way to hide it
completely.

There's no problem providing pure Unicode strings. Things get iffy when
Python's OS abstraction pretends sys.stdin is text or filenames are
strings.

> Julia's approach is interesting, but it strikes me as somewhat broken
> as it pretends to do O(1) indexing, but in reality it's still O(n)

If the underlying encoding is 8-bit, converting it to an O(1) structure
would still be O(n).


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list