As has been said, a builtin *could* be written that would be "friendly to subclassing", by the definition in this thread. (I'll stay out of the argument for the moment as to whether that would be better)
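For anyone skimming the thread, here is a minimal sketch of what "not friendly to subclassing" means for str today. `HTMLString` is a made-up name used only for illustration; it is not from the standard library or from anyone's actual code in this thread.

```python
class HTMLString(str):
    """Hypothetical str subclass, used only for illustration."""


h = HTMLString("<b>hello</b>")

# The instance itself is both an HTMLString and a str.
is_subclass_instance = isinstance(h, HTMLString) and isinstance(h, str)

# Inherited methods build their result as a plain str, so the subclass
# is lost as soon as a method is called or the string is concatenated.
upper_type = type(h.upper())
concat_type = type(h + "<i>!</i>")
```

Keeping the subclass would mean overriding every method that returns a string, which is what UserString does.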
I suspect that the reason str acts like it does is that it was originally written a LONG time ago, when you couldn't subclass basic built-in types at all. Secondarily, it could be a performance tweak -- minimal memory and peak performance are pretty critical for strings.

But collections.UserString does exist -- so if you want to subclass, and performance isn't critical, then use that. Steven A pointed out that UserStrings are not instances of str though. I think THAT is a bug. And it's probably that way because with the magic of duck typing, no one cared -- but with all the static type hinting going on now, that is a bigger liability than it used to be. Also because when it was written, you couldn't subclass str.

Though I will note that run-time type checking of strings is relatively common compared to other types, due to the whole a-str-is-a-sequence-of-str issue: distinguishing between a sequence of strings and a string itself is sometimes needed. And str is rarely duck typed.

If anyone actually has a real need for this I'd post an issue -- it'd be interesting to see if the core devs see this as a bug or a feature (well, probably not a feature, but maybe a missing feature).

OK -- I got distracted and tried it out -- it was pretty easy to update UserString to be a subclass of str. I suspect it isn't done that way now because it was originally written when you could not subclass str -- so it stored an internal str instead.

The really hacky part of my prototype is this:

    # self.data is the original attribute for storing the string internally.
    # Partly to prevent my having to re-write all the other methods, and
    # partly because you get recursion if you try to use the methods on
    # self when overriding them ...

    @property
    def data(self):
        return "".join(self)

The "".join is because it was the only way I quickly thought of to make a native string without invoking the __str__ method and other initialization machinery. I wonder if there is another way?
Certainly there is in C, but in pure Python?

Anyway, after I did that and wrote a __new__ -- the rest of it "just worked".

    def __new__(cls, s):
        return super().__new__(cls, s)

UserString and its subclasses return instances of themselves, and instances are instances of str.

Code with a couple asserts in the __main__ block enclosed.

Enjoy!

-CHB

NOTE: VERY minimally tested :-)

On Tue, Dec 20, 2022 at 4:17 PM Chris Angelico <ros...@gmail.com> wrote:

> On Wed, 21 Dec 2022 at 09:30, Cameron Simpson <c...@cskk.id.au> wrote:
> >
> > On 19Dec2022 22:45, Chris Angelico <ros...@gmail.com> wrote:
> > >On Mon, 19 Dec 2022 at 22:37, Steven D'Aprano <st...@pearwood.info> wrote:
> > >> > But this much (say with a better validator) gets you static type checking,
> > >> > syntax highlighting, and inherent documentation of intent.
> > >>
> > >> Any half-way decent static type-checker will immediately fail as soon as
> > >> you call a method on this html string, because it will know that the
> > >> method returns a vanilla string, not a html string.
> > >
> > >But what does it even mean to uppercase an HTML string? Unless you
> > >define that operation specifically, the most logical meaning is
> > >"convert it into a plain string, and uppercase that".
> >
> > Yes, this was my thought. I've got a few subclasses of builtin types.
> > They are not painless.
> >
> > For HTML "uppercase" is a kind of ok notion because the tags are case
> > insensitive.
>
> Tag names are, but their attributes might not be, so even that might
> not be safe.
>
> > Not the case with, say, XML - my personal nagging example is
> > from KML (Google map markup dialect) where IIRC a "ScreenOverlay" and a
> > "screenoverlay" both existing with different semantics. Ugh.
>
> Ugh indeed. Why? Why? Why?
>
> > So indeed, I'd probably _want_ .upper to return a plain string and have
> > special methods to do more targetted things as appropriate.
> >
>
> Agreed.
>
> ChrisA
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/T7FZ3FIA6INMHQIRVZ3ZZJC6UAQQCFOI/
> Code of Conduct: http://python.org/psf/codeofconduct/
>

--
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython

"""
A UserString implementation that subclasses from str

so instances of it and its subclasses are instances of string
-- could be handy for using with static typing.

NOTE: this could probably be cleaner code, but this was done with
      an absolute minimum of changes from what's in the standard library
"""

import sys as _sys


class UserString(str):

    def __new__(cls, s):
        return super().__new__(cls, s)

    # There's no need for this logic in __init__
    # def __init__(self, seq):
    #     if isinstance(seq, str):
    #         self.data = seq
    #     elif isinstance(seq, UserString):
    #         self.data = seq.data[:]
    #     else:
    #         self.data = str(seq)

    @property
    def data(self):
        return "".join(self)

    def __str__(self):
        return str(self.data)

    def __repr__(self):
        return repr(self.data)

    def __int__(self):
        return int(self.data)

    def __float__(self):
        return float(self.data)

    def __complex__(self):
        return complex(self.data)

    def __hash__(self):
        return hash(self.data)

    def __getnewargs__(self):
        return (self.data[:],)

    def __eq__(self, string):
        if isinstance(string, UserString):
            return self.data == string.data
        return self.data == string

    def __lt__(self, string):
        if isinstance(string, UserString):
            return self.data < string.data
        return self.data < string

    def __le__(self, string):
        if isinstance(string, UserString):
            return self.data <= string.data
        return self.data <= string

    def __gt__(self, string):
        if isinstance(string, UserString):
            return self.data > string.data
        return self.data > string

    def __ge__(self, string):
        if isinstance(string, UserString):
            return self.data >= string.data
        return self.data >= string

    def __contains__(self, char):
        if isinstance(char, UserString):
            char = char.data
        return char in self.data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.__class__(self.data[index])

    def __add__(self, other):
        if isinstance(other, UserString):
            return self.__class__(self.data + other.data)
        elif isinstance(other, str):
            return self.__class__(self.data + other)
        return self.__class__(self.data + str(other))

    def __radd__(self, other):
        if isinstance(other, str):
            return self.__class__(other + self.data)
        return self.__class__(str(other) + self.data)

    def __mul__(self, n):
        return self.__class__(self.data * n)

    __rmul__ = __mul__

    def __mod__(self, args):
        return self.__class__(self.data % args)

    def __rmod__(self, template):
        return self.__class__(str(template) % self)

    # the following methods are defined in alphabetical order:
    def capitalize(self):
        return self.__class__(self.data.capitalize())

    def casefold(self):
        return self.__class__(self.data.casefold())

    def center(self, width, *args):
        return self.__class__(self.data.center(width, *args))

    def count(self, sub, start=0, end=_sys.maxsize):
        if isinstance(sub, UserString):
            sub = sub.data
        return self.data.count(sub, start, end)

    def removeprefix(self, prefix, /):
        if isinstance(prefix, UserString):
            prefix = prefix.data
        return self.__class__(self.data.removeprefix(prefix))

    def removesuffix(self, suffix, /):
        if isinstance(suffix, UserString):
            suffix = suffix.data
        return self.__class__(self.data.removesuffix(suffix))

    def encode(self, encoding='utf-8', errors='strict'):
        encoding = 'utf-8' if encoding is None else encoding
        errors = 'strict' if errors is None else errors
        return self.data.encode(encoding, errors)

    def endswith(self, suffix, start=0, end=_sys.maxsize):
        return self.data.endswith(suffix, start, end)

    def expandtabs(self, tabsize=8):
        return self.__class__(self.data.expandtabs(tabsize))

    def find(self, sub, start=0, end=_sys.maxsize):
        if isinstance(sub, UserString):
            sub = sub.data
        return self.data.find(sub, start, end)

    def format(self, /, *args, **kwds):
        return self.data.format(*args, **kwds)

    def format_map(self, mapping):
        return self.data.format_map(mapping)

    def index(self, sub, start=0, end=_sys.maxsize):
        return self.data.index(sub, start, end)

    def isalpha(self):
        return self.data.isalpha()

    def isalnum(self):
        return self.data.isalnum()

    def isascii(self):
        return self.data.isascii()

    def isdecimal(self):
        return self.data.isdecimal()

    def isdigit(self):
        return self.data.isdigit()

    def isidentifier(self):
        return self.data.isidentifier()

    def islower(self):
        return self.data.islower()

    def isnumeric(self):
        return self.data.isnumeric()

    def isprintable(self):
        return self.data.isprintable()

    def isspace(self):
        return self.data.isspace()

    def istitle(self):
        return self.data.istitle()

    def isupper(self):
        return self.data.isupper()

    def join(self, seq):
        return self.data.join(seq)

    def ljust(self, width, *args):
        return self.__class__(self.data.ljust(width, *args))

    def lower(self):
        return self.__class__(self.data.lower())

    def lstrip(self, chars=None):
        return self.__class__(self.data.lstrip(chars))

    maketrans = str.maketrans

    def partition(self, sep):
        return self.data.partition(sep)

    def replace(self, old, new, maxsplit=-1):
        if isinstance(old, UserString):
            old = old.data
        if isinstance(new, UserString):
            new = new.data
        return self.__class__(self.data.replace(old, new, maxsplit))

    def rfind(self, sub, start=0, end=_sys.maxsize):
        if isinstance(sub, UserString):
            sub = sub.data
        return self.data.rfind(sub, start, end)

    def rindex(self, sub, start=0, end=_sys.maxsize):
        return self.data.rindex(sub, start, end)

    def rjust(self, width, *args):
        return self.__class__(self.data.rjust(width, *args))

    def rpartition(self, sep):
        return self.data.rpartition(sep)

    def rstrip(self, chars=None):
        return self.__class__(self.data.rstrip(chars))

    def split(self, sep=None, maxsplit=-1):
        return self.data.split(sep, maxsplit)

    def rsplit(self, sep=None, maxsplit=-1):
        return self.data.rsplit(sep, maxsplit)

    def splitlines(self, keepends=False):
        return self.data.splitlines(keepends)

    def startswith(self, prefix, start=0, end=_sys.maxsize):
        return self.data.startswith(prefix, start, end)

    def strip(self, chars=None):
        return self.__class__(self.data.strip(chars))

    def swapcase(self):
        return self.__class__(self.data.swapcase())

    def title(self):
        return self.__class__(self.data.title())

    def translate(self, *args):
        return self.__class__(self.data.translate(*args))

    def upper(self):
        return self.__class__(self.data.upper())

    def zfill(self, width):
        return self.__class__(self.data.zfill(width))


if __name__ == "__main__":
    # make sure it works, at least a little

    us = UserString("something")

    assert isinstance(us, UserString)
    assert isinstance(us, str)

    us_upper = us.upper()
    assert isinstance(us_upper, UserString)
    assert isinstance(us_upper, str)

    # try subclassing
    class SpecialString(UserString):
        def special(self):
            return "Special" + self

    ss = SpecialString("something")

    assert isinstance(ss, SpecialString)
    assert isinstance(ss, UserString)
    assert isinstance(ss, str)

    ss_upper = ss.upper()
    assert isinstance(ss_upper, SpecialString)
    assert isinstance(ss_upper, UserString)
    assert isinstance(ss_upper, str)
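On the open question of how to get a native str in pure Python without going through the overridden methods: one possible approach, sketched here on the assumption of CPython behaviour, is to call the base-class implementation explicitly. `Sub` is a hypothetical class whose overrides raise, so that any accidental call to them would be visible. This is only a sketch of candidates alongside the "".join approach from the prototype, not a claim about what the standard library should do.

```python
class Sub(str):
    """Hypothetical subclass whose overrides fail loudly if consulted."""

    def __str__(self):
        raise RuntimeError("overridden __str__ was called")

    def __getitem__(self, index):
        raise RuntimeError("overridden __getitem__ was called")


s = Sub("something")

# The approach used in the prototype: join over the inherited str iterator.
via_join = "".join(s)

# Candidate alternatives: call str's own methods directly, so the
# subclass overrides are never looked up.
via_dunder_str = str.__str__(s)
via_slice = str.__getitem__(s, slice(None))
```

In this sketch all three results compare equal to the original text, and none of the raising overrides is triggered. Whether the results are exact `str` objects rather than subclass instances is worth checking on the interpreter in use.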