On 10/24/05, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote: > Indeed. My guess is that indexing is more common than you think, > especially when iterating over the string. Of course, iteration > could also operate on UTF-8, if you introduced string iterator > objects.
Python's slice-and-dice model pretty much ensures that indexing is common. Almost everything is ultimately represented as indices: regex search results have the index in the API, find()/index() return indices, many operations take a start and/or end index. As long as that's the case, indexing better be fast. Changing the APIs would be much work, although perhaps not impossible of Python 3000. For example, Raymond Hettinger's partition() API doesn't refer to indices at all, and can replace many uses of find() or index(). Still, the mere existence of __getitem__ and __getslice__ on strings makes it necessary to implement them efficiently. How realistic would it be to drop them? What should replace them? Some kind of abstract pointers-into-strings perhaps, but that seems much more complex. The trick seems to be to support both simple programs manipulating short strings (where indexing is probably the easiest API to understand, and the additional copying is unlikely to cause performance problems) , as well as programs manipulating very large buffers containing text and doing sophisticated string processing on them. Perhaps we could provide a different kind of API to support the latter, perhaps based on a mutable character buffer data type without direct indexing? -- --Guido van Rossum (home page: http://www.python.org/~guido/) _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com