As has been said, a builtin *could* be written that would be "friendly to
subclassing", by the definition in this thread. (I'll stay out of the
argument, for the moment, as to whether that would be better.)

I suspect that the reason str acts like it does is that it was originally
written a LONG time ago, when you couldn't subclass basic built-in types at
all.

Secondarily, it could be a performance tweak -- minimal memory and peak
performance are pretty critical for strings.

But collections.UserString does exist -- so if you want to subclass, and
performance isn't critical, then use that. Steven D'Aprano pointed out that
UserStrings are not instances of str, though. I think THAT is a bug. It's
probably that way because, when it was written, you couldn't subclass str,
and with the magic of duck typing, no one cared -- but with all the static
type hinting going on now, that is a bigger liability than it used to be.
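
A quick way to check that claim, as a minimal sketch against the standard
library's collections.UserString (the variable names are only for
illustration):

```python
from collections import UserString

us = UserString("something")

# The stdlib UserString wraps a str in .data instead of inheriting from str.
is_str = isinstance(us, str)            # False
wraps_str = isinstance(us.data, str)    # True

# Its methods do return UserString instances, so duck typing mostly works.
upper_type = type(us.upper())           # UserString

print(is_str, wraps_str, upper_type.__name__)
```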

Though I will note that run-time type checking of strings is relatively
common compared to other types: because of the whole
a-str-is-a-sequence-of-str issue, you sometimes need to distinguish between
a sequence of strings and a string itself. And str is rarely duck typed.
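
To make the a-str-is-a-sequence-of-str point concrete, here is a sketch of
the kind of run-time check in question. The as_name_list helper is made up
for illustration and is not part of any library:

```python
from collections import UserString
from collections.abc import Iterable


def as_name_list(names):
    """Hypothetical helper: accept a single name or an iterable of names."""
    # A str is itself an iterable of one-character strs, so it has to be
    # special-cased before the generic iterable branch.
    if isinstance(names, str):
        return [names]
    if isinstance(names, Iterable):
        return list(names)
    raise TypeError(f"expected str or iterable of str, got {type(names).__name__}")


print(as_name_list("abc"))          # ['abc']
print(as_name_list(["a", "b"]))     # ['a', 'b']
# The stdlib UserString fails the isinstance(names, str) check, so it is
# treated as a sequence and split into one-character pieces.
print(len(as_name_list(UserString("abc"))))   # 3
```

A UserString that really is a str would take the first branch and come back
as a single name, which is what most callers would expect.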

If anyone actually has a real need for this, I'd post an issue -- it'd be
interesting to see whether the core devs consider this a bug or a feature
(well, probably not a feature, but maybe a missing feature).

OK -- I got distracted and tried it out -- it was pretty easy to update
UserString to be a subclass of str. I suspect it isn't done that way now
because it was originally written when you could not subclass str -- so
it stored an internal str instead.

The really hacky part of my prototype is this:

    # self.data is the original attribute for storing the string internally.
    # I kept it as a property, partly to avoid having to rewrite all the
    # other methods, and partly because you get recursion if you try to use
    # the methods on self when overriding them.
    @property
    def data(self):
        return "".join(self)

The "".join is because it was the only way I quickly thought of to make a
native string without invoking the __str__ method and other initialization
machinery. I wonder if there is another way? Certainly there is in C, but
in pure Python?
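
One candidate answer, offered as a sketch rather than a tested
recommendation: call str's own __str__ directly on the instance. In CPython
this appears to return an exact str copy for instances of subclasses,
without going through any overridden methods; other implementations may
differ. The Wrapped class below is only a stand-in for illustration:

```python
class Wrapped(str):
    """Stand-in str subclass whose __str__ must not be used to get the data."""

    def __str__(self):
        raise RuntimeError("overridden __str__ was called")


w = Wrapped("something")

via_join = "".join(w)        # the approach used in the prototype
via_base = str.__str__(w)    # unbound call to str's own __str__

# Both should be plain str objects holding the same text.
print(type(via_join) is str, type(via_base) is str)
print(via_join == via_base == "something")
```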

Anyway, after I did that and wrote a __new__ -- the rest of it "just
worked".

    def __new__(cls, s):
        return super().__new__(cls, s)

UserString and its subclasses return instances of themselves, and instances
are instances of str.

Code with a couple of asserts in the __main__ block is enclosed.

Enjoy!

-CHB

NOTE: VERY minimally tested :-)

On Tue, Dec 20, 2022 at 4:17 PM Chris Angelico <ros...@gmail.com> wrote:

> On Wed, 21 Dec 2022 at 09:30, Cameron Simpson <c...@cskk.id.au> wrote:
> >
> > On 19Dec2022 22:45, Chris Angelico <ros...@gmail.com> wrote:
> > >On Mon, 19 Dec 2022 at 22:37, Steven D'Aprano <st...@pearwood.info> wrote:
> > >> > But this much (say with a better validator) gets you static type checking,
> > >> > syntax highlighting, and inherent documentation of intent.
> > >>
> > >> Any half-way decent static type-checker will immediately fail as soon as
> > >> you call a method on this html string, because it will know that the
> > >> method returns a vanilla string, not a html string.
> > >
> > >But what does it even mean to uppercase an HTML string? Unless you
> > >define that operation specifically, the most logical meaning is
> > >"convert it into a plain string, and uppercase that".
> >
> > Yes, this was my thought. I've got a few subclasses of builtin types.
> > They are not painless.
> >
> > For HTML "uppercase" is a kind of ok notion because the tags are case
> > insensitive.
>
> Tag names are, but their attributes might not be, so even that might
> not be safe.
>
> > Not the case with, say, XML - my personal nagging example is
> > from KML (Google map markup dialect) where IIRC a "ScreenOverlay" and a
> > "screenoverlay" both exist with different semantics. Ugh.
>
> Ugh indeed. Why? Why? Why?
>
> > So indeed, I'd probably _want_ .upper to return a plain string and have
> > special methods to do more targeted things as appropriate.
> >
>
> Agreed.
>
> ChrisA
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/T7FZ3FIA6INMHQIRVZ3ZZJC6UAQQCFOI/
> Code of Conduct: http://python.org/psf/codeofconduct/
>


-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
"""
A UserString implementation that subclasses str,

so instances of it and its subclasses are instances of str
 -- could be handy for use with static typing.

NOTE: this could probably be cleaner code, but this was done with
      an absolute minimum of changes from what's in the standard library
"""


import sys as _sys

class UserString(str):
    def __new__(cls, s):
        return super().__new__(cls, s)

    # There's no need for this logic in __init__
    # def __init__(self, seq):
    #     if isinstance(seq, str):
    #         self.data = seq
    #     elif isinstance(seq, UserString):
    #         self.data = seq.data[:]
    #     else:
    #         self.data = str(seq)

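    @property
    def data(self):
        # Build a plain str copy of self. "".join() iterates via str's own
        # iterator, so it avoids the __str__ and __getitem__ overrides below
        # (calling those here would recurse back into this property).
        return "".join(self)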

    def __str__(self):
        return str(self.data)

    def __repr__(self):
        return repr(self.data)

    def __int__(self):
        return int(self.data)

    def __float__(self):
        return float(self.data)

    def __complex__(self):
        return complex(self.data)

    def __hash__(self):
        return hash(self.data)

    def __getnewargs__(self):
        return (self.data[:],)

    def __eq__(self, string):
        if isinstance(string, UserString):
            return self.data == string.data
        return self.data == string

    def __lt__(self, string):
        if isinstance(string, UserString):
            return self.data < string.data
        return self.data < string

    def __le__(self, string):
        if isinstance(string, UserString):
            return self.data <= string.data
        return self.data <= string

    def __gt__(self, string):
        if isinstance(string, UserString):
            return self.data > string.data
        return self.data > string

    def __ge__(self, string):
        if isinstance(string, UserString):
            return self.data >= string.data
        return self.data >= string

    def __contains__(self, char):
        if isinstance(char, UserString):
            char = char.data
        return char in self.data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.__class__(self.data[index])

    def __add__(self, other):
        if isinstance(other, UserString):
            return self.__class__(self.data + other.data)
        elif isinstance(other, str):
            return self.__class__(self.data + other)
        return self.__class__(self.data + str(other))

    def __radd__(self, other):
        if isinstance(other, str):
            return self.__class__(other + self.data)
        return self.__class__(str(other) + self.data)

    def __mul__(self, n):
        return self.__class__(self.data * n)

    __rmul__ = __mul__

    def __mod__(self, args):
        return self.__class__(self.data % args)

    def __rmod__(self, template):
        return self.__class__(str(template) % self)

    # the following methods are defined in alphabetical order:
    def capitalize(self):
        return self.__class__(self.data.capitalize())

    def casefold(self):
        return self.__class__(self.data.casefold())

    def center(self, width, *args):
        return self.__class__(self.data.center(width, *args))

    def count(self, sub, start=0, end=_sys.maxsize):
        if isinstance(sub, UserString):
            sub = sub.data
        return self.data.count(sub, start, end)

    def removeprefix(self, prefix, /):
        if isinstance(prefix, UserString):
            prefix = prefix.data
        return self.__class__(self.data.removeprefix(prefix))

    def removesuffix(self, suffix, /):
        if isinstance(suffix, UserString):
            suffix = suffix.data
        return self.__class__(self.data.removesuffix(suffix))

    def encode(self, encoding='utf-8', errors='strict'):
        encoding = 'utf-8' if encoding is None else encoding
        errors = 'strict' if errors is None else errors
        return self.data.encode(encoding, errors)

    def endswith(self, suffix, start=0, end=_sys.maxsize):
        return self.data.endswith(suffix, start, end)

    def expandtabs(self, tabsize=8):
        return self.__class__(self.data.expandtabs(tabsize))

    def find(self, sub, start=0, end=_sys.maxsize):
        if isinstance(sub, UserString):
            sub = sub.data
        return self.data.find(sub, start, end)

    def format(self, /, *args, **kwds):
        return self.data.format(*args, **kwds)

    def format_map(self, mapping):
        return self.data.format_map(mapping)

    def index(self, sub, start=0, end=_sys.maxsize):
        return self.data.index(sub, start, end)

    def isalpha(self):
        return self.data.isalpha()

    def isalnum(self):
        return self.data.isalnum()

    def isascii(self):
        return self.data.isascii()

    def isdecimal(self):
        return self.data.isdecimal()

    def isdigit(self):
        return self.data.isdigit()

    def isidentifier(self):
        return self.data.isidentifier()

    def islower(self):
        return self.data.islower()

    def isnumeric(self):
        return self.data.isnumeric()

    def isprintable(self):
        return self.data.isprintable()

    def isspace(self):
        return self.data.isspace()

    def istitle(self):
        return self.data.istitle()

    def isupper(self):
        return self.data.isupper()

    def join(self, seq):
        return self.data.join(seq)

    def ljust(self, width, *args):
        return self.__class__(self.data.ljust(width, *args))

    def lower(self):
        return self.__class__(self.data.lower())

    def lstrip(self, chars=None):
        return self.__class__(self.data.lstrip(chars))

    maketrans = str.maketrans

    def partition(self, sep):
        return self.data.partition(sep)

    def replace(self, old, new, maxsplit=-1):
        if isinstance(old, UserString):
            old = old.data
        if isinstance(new, UserString):
            new = new.data
        return self.__class__(self.data.replace(old, new, maxsplit))

    def rfind(self, sub, start=0, end=_sys.maxsize):
        if isinstance(sub, UserString):
            sub = sub.data
        return self.data.rfind(sub, start, end)

    def rindex(self, sub, start=0, end=_sys.maxsize):
        return self.data.rindex(sub, start, end)

    def rjust(self, width, *args):
        return self.__class__(self.data.rjust(width, *args))

    def rpartition(self, sep):
        return self.data.rpartition(sep)

    def rstrip(self, chars=None):
        return self.__class__(self.data.rstrip(chars))

    def split(self, sep=None, maxsplit=-1):
        return self.data.split(sep, maxsplit)

    def rsplit(self, sep=None, maxsplit=-1):
        return self.data.rsplit(sep, maxsplit)

    def splitlines(self, keepends=False):
        return self.data.splitlines(keepends)

    def startswith(self, prefix, start=0, end=_sys.maxsize):
        return self.data.startswith(prefix, start, end)

    def strip(self, chars=None):
        return self.__class__(self.data.strip(chars))

    def swapcase(self):
        return self.__class__(self.data.swapcase())

    def title(self):
        return self.__class__(self.data.title())

    def translate(self, *args):
        return self.__class__(self.data.translate(*args))

    def upper(self):
        return self.__class__(self.data.upper())

    def zfill(self, width):
        return self.__class__(self.data.zfill(width))

if __name__ == "__main__":

    # make sure it works, at least a little
    us = UserString("something")
    assert isinstance(us, UserString)
    assert isinstance(us, str)

    us_upper = us.upper()
    assert isinstance(us_upper, UserString)
    assert isinstance(us_upper, str)

    # try subclassing
    class SpecialString(UserString):
        def special(self):
            return "Special" + self

    ss = SpecialString("something")
    assert isinstance(ss, SpecialString)
    assert isinstance(ss, UserString)
    assert isinstance(ss, str)

    ss_upper = ss.upper()
    assert isinstance(ss_upper, SpecialString)
    assert isinstance(ss_upper, UserString)
    assert isinstance(ss_upper, str)


_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/I62E7PVP5NN3KYYKFOW5OUKJRQSKNL4T/
Code of Conduct: http://python.org/psf/codeofconduct/
