On Saturday, 1 October 2011 at 22:21:01, Antoine Pitrou wrote:
> So, since people are confused at the number of possible options, you
> propose to add a new option and therefore increase the confusion?

The idea is to provide an API very close to the str type. So if your program
becomes slow in some functions and these functions manipulate strings, just
try replacing str() with strarray() at the beginning of your loop, and rerun
your benchmark.

I don't know if we really need all str methods (ljust(), endswith(),
isspace(), lower(), strip(), ...) or if a UnicodeBuilder supporting in-place
a+=b would be enough. I suppose it would simply be more practical to have the
same methods.

Another useful use case is replacing a substring: with strarray, you can use
the standard array[a:b] = newsubstring to insert, replace or delete. Extract
of the strarray unit tests:

    abc = strarray('abc')
    abc[:1] = '123'   # replace
    self.assertEqual(abc, '123bc')
    abc[3:3] = '45'   # insert
    self.assertEqual(abc, '12345bc')
    abc[5:] = ''      # delete
    self.assertEqual(abc, '12345')

But only "replace" would be O(1) with respect to the length of the whole
string (when the new substring has the same length as the old one); "insert"
and "delete" have to move the tail. (Still, an "insert" near the end of a
strarray requires less work than a replace on a classic str, which copies the
whole string.)

You cannot insert/delete using StringIO, str.join, or
StringBuilder/UnicodeBuilder, but you can using array('u').

Of course, you can replace a single character: strarray[i] = 'x'.

(Using array[a:b]=newstr and array.index(), you can implement your own
in-place .replace() function.)

> I don't understand why StringIO couldn't simply be optimized a little
> more, if it needs to.

Honestly, I didn't know that StringIO.write() is more efficient than
str+=str, and it is surprising to use the io module (which is supposed to be
related to files) to manipulate strings. But maybe we can document this
"trick" (is it a trick or not?) in the str documentation (and in the FAQ, and
on stackoverflow.com, and ...).

> Or, if straightforward string concatenation really needs to be fast,
> then str + str should be optimized (like it used to be).

We cannot have the best performance and the lowest memory usage at the same
time with the new str implementation (PEP 393).
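
To make the API idea discussed above concrete, here is a minimal sketch of a
mutable string supporting in-place += and slice assignment. StrArray is a
hypothetical name, and the class is backed by a plain Python list of
characters. It is not the actual strarray implementation, and it says nothing
about performance or memory usage.

```python
class StrArray:
    """Hypothetical mutable string: a thin wrapper around a list of characters."""

    def __init__(self, text=""):
        self._chars = list(text)

    def __iadd__(self, other):
        # In-place a += b: extend the existing buffer and keep the same object.
        self._chars.extend(str(other))
        return self

    def __setitem__(self, index, value):
        # Slices rely on list slice assignment, which can insert, replace
        # or delete. A plain index must receive exactly one character.
        if isinstance(index, slice):
            self._chars[index] = str(value)
        elif len(value) == 1:
            self._chars[index] = value
        else:
            raise ValueError("expected a single character")

    def __getitem__(self, index):
        result = self._chars[index]
        return "".join(result) if isinstance(index, slice) else result

    def __len__(self):
        return len(self._chars)

    def __str__(self):
        return "".join(self._chars)

    def __eq__(self, other):
        if isinstance(other, (str, StrArray)):
            return str(self) == str(other)
        return NotImplemented


buf = StrArray("abc")
same_object = buf       # second reference, to check the object is not replaced
buf += "def"            # in-place concatenation
buf[:1] = "123"         # replace
buf[3:3] = "45"         # insert
buf[7:] = ""            # delete
buf[0] = "x"            # replace a single character
```

Because every operation mutates the same list, buf stays the same object
throughout, which a str cannot do since it is immutable.
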
The new implementation is even more focused on read-only (constant) strings
than the previous one (a Py_UNICODE array using two memory blocks). PEP 393
uses one memory block, so you cannot resize a str object anymore. The old str
type, StringIO, array (and strarray) use two memory blocks, so it is possible
to resize them (objects keep their identity after the resize).

It *might* be possible to implement a strarray that is fast on concatenation
and has a small memory footprint, but we cannot use it for the str type
because str is immutable in Python.

--

On second thought, it may be easy to implement strarray if it reuses
unicodeobject.c. For example, strarray can be a special (mutable) case of
PyUnicodeObject (which uses two memory blocks): the string would always be
ready and never be compact.

By the way, bytesobject.c and bytearrayobject.c are a fiasco: most functions
are duplicated even though the code is very similar. A big refactoring is
required to remove the duplicated code there.

Victor