[Python-ideas] Re: Idea: Tagged strings in python

2022-12-19 Thread Stephen J. Turnbull
Brendan Barnwell writes:

 >  What it means for me for something to "be an HTML string" (or more 
 > precisely, to be an instance of HTMLString or whatever the class name 
 > is) is for it to be a string that has an extra tag attached to the 
 > object that means "this is HTML".

I don't like tags that lie.  Seems pointless (see below).

 > The point is that overrides are for specifying the *new* behavior
 > of the subclass (i.e., not allowing certain slice operations); you
 > shouldn't have to override methods just to retain the superclass
 > behavior.

Do you mean "retain the subclass behavior" here?  AFAICS what's being
called "hostile" is precisely retaining *superclass* behavior.

 >  I mean, we were talking about this in the context of syntax 
 > highlighting.  The utility of HTML-string highlighting would be 
 > seriously reduced if only *valid* HTML could be in an HTML string.

The proposed HTMLstring *class* is irrelevant to syntax highlighting,
regardless of its functionality.  The OP (and his syntax-highlighting
text editor!) wants standard literal syntax *in source code* that
allows an editor-that-is-not-as-programmable-as-emacs-or-vim to
recognize a fragment of text (typically in a literal string) that is
supposed to be highlighted as HTML.  Syntax highlighting is not aided
by an HTMLstring object in the *running Python program*.

I really don't understand what value your HTMLstring as str + tag
provides to the OP, or to a Python program.  I guess that an editor
written in Python could manipulate a list of TaggedString objects,
but this is a pretty impoverished model.  Emacsen have had extents/
overlays since 1990 or so, which can be nested or overlap, and nesting
and overlapping are both needed for source code highlighting.[1][2]
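The sketch below makes the contrast concrete. It models overlays in Python, assuming a hypothetical `Overlay` record of (start, end, properties) stored alongside a plain string. It is not the Emacs or XEmacs API. It only shows that spans over one piece of text can nest and overlap, which a single tag attached to the whole string cannot express.

```python
from dataclasses import dataclass, field


@dataclass
class Overlay:
    """Hypothetical span [start, end) over some text, carrying properties."""
    start: int
    end: int
    props: dict = field(default_factory=dict)


def props_at(overlays, pos):
    """Merge the properties of every overlay covering position `pos`.

    Later overlays win on conflicting keys, a crude stand-in for priority.
    """
    merged = {}
    for ov in overlays:
        if ov.start <= pos < ov.end:
            merged.update(ov.props)
    return merged


text = "<p>Hello <b>world</b></p>"  # the text itself stays a plain str
overlays = [
    Overlay(0, len(text), {"mode": "html"}),  # whole fragment is HTML
    Overlay(9, 21, {"face": "bold"}),         # nested: the <b>...</b> element
    Overlay(6, 14, {"invisible": True}),      # overlaps the bold span, ignores syntax
]

print(props_at(overlays, 12))  # {'mode': 'html', 'face': 'bold', 'invisible': True}
```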

I don't take a position on the "builtins are hostile to subclassing"
debate.  I can't recall ever noticing the problem, so I'll let you all
handle that. :-)


Footnotes: 
[1]  In Emacsen, tagged source text (overlays) is used not only for
syntax highlighting, which presumably is nested (but TagSoup HTML!),
but also to implement things like hiding text, which is an operation
on raw text that need not respect any syntax.

[2]  XEmacs's implementation of syntax highlighting actually works in
terms of "extent fragments" which are non-overlapping, but they're
horrible to work with from an editor API standpoint.  They're used only
in the implementation of the GUI display, for performance reasons, and
each one typically contains a plethora of tags.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/UVQ6PGUKF5EG6UZWOBI76ZQANNFVC5TS/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-19 Thread Chris Angelico
On Tue, 20 Dec 2022 at 13:56, Brendan Barnwell  wrote:
>
> On 2022-12-19 13:59, Chris Angelico wrote:
> > On Tue, 20 Dec 2022 at 07:13, Brendan Barnwell  wrote:
> >>   > See my example regarding a StrEnum and tell me whether that would be
> >>   > more irritating.
> >>
> >>  I can't run that example myself as I don't have Python 3.11 set up.
> >
> > The enum module was added in Python 3.4.
>
> Your example used StrEnum, which was added in Python 3.11.

Oh! My apologies. The older way of spelling it uses multiple inheritance
but comes to the same thing; it's still very definitely a string.
StrEnum is a lot more convenient, and I've been using 3.11 for long
enough that I forgot when it came in. Even back in 3.5 (the oldest
docs that I have handy), the notion of enum MI was listed as a
recommended method:

https://docs.python.org/3.5/library/enum.html#others

>>> from enum import Enum
>>> class Demo(str, Enum):
...     x = "eggs"
...     m = "ham"
...

Other than that change to the signature, the demonstration behaves
exactly the same (I just tested it on 3.5). Again, my apologies for
unintentionally providing an example that works only on very new
Pythons.
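Here is a quick check of the behaviour described above. It uses the pre-3.11 `(str, Enum)` spelling so it also runs on older Pythons. Members are genuine `str` instances, and inherited methods such as `upper()` (an arbitrary choice here) return plain `str`. Re-wrapping the result in the enum fails, because the enum only admits its fixed values.

```python
from enum import Enum


class Demo(str, Enum):
    x = "eggs"
    m = "ham"


# Members really are strings...
assert isinstance(Demo.x, str)
assert Demo.x == "eggs"

# ...but inherited str methods hand back a plain str, not a Demo.
shouted = Demo.x.upper()
print(type(shouted).__name__, shouted)  # str EGGS

# If str methods instead returned type(self)(result), this is what would run:
try:
    Demo(shouted)
except ValueError as exc:
    print("cannot re-wrap:", exc)
```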

> > Nonetheless, a StrEnum is absolutely a str, and whatever you say about
> > an HTML string has to also be valid for a StrEnum, or else the inverse
> > is.
>
> No, it doesn't, because HTMLString and StrEnum can be different
> subclasses of str with different behavior.  You seem to be missing the
> concept of subclasses here.  Yes, a StrEnum may be an instance of str,
> and an HTMLString may also be an instance of str.  But that does not
> mean the behavior of both needs to be the same.  They are instances of
> *different subclasses* of str and can have *different behavior*.  An
> instance of collections.Counter is an instance of dict and so is an
> instance of collections.defaultdict, but that doesn't mean that anything
> I say about a Counter has to be valid for a defaultdict.

That is very true, but whenever the subclass is NOT meant to behave the
same as the superclass, you provide functionality to make it differ.
Otherwise, the normal assumption should be that it behaves identically.
For instance, if you
iterate over a Counter, you would expect to get all of the keys in it;
it's true that you can subscript it with any value and get back a
zero, but the default behaviour of Counter iteration is to do the same
thing that a dict would.

And that's what we generally see. A StrEnum is a str, and any
behaviours that aren't the same as str are provided by StrEnum (for
instance, it has a different __repr__). But for anything that isn't
overridden - including any new functionality, if you upgrade Python
and keep the same StrEnum code - you get the superclass's behaviour.

> > The way things are, a StrEnum or an HTML string will behave *exactly
> > as a string does*. The alternative is that, if any new operations are
> > added to strings in the future, they have to be explicitly blocked by
> > StrEnum or else they will randomly and mysteriously misbehave - or, at
> > very best, crash with unexpected errors. Which one is more hostile to
> > subclasses?
>
> I already answered that in my previous post.  To repeat: StrEnum is the
> unusual case and I am fine with it being more difficult to create
> something like StrEnum, because that is not as important as making it
> easy to create classes that *do* return an instance of themselves (i.e.,
> an instance of the same type as "self") from their various methods.

I'm of the opinion that this is a lot less special than you might
think, since there are quite a lot of these sorts of special cases.

> The
> current behavior is more hostile to subclasses because people typically
> write subclasses to *extend* the behavior of superclasses, and that is
> hindered if you have to override every superclass method just to make it
> do the same thing but return the result wrapped in the new subclass.

Maybe, but I would say that the solution is to make an easier way to
make a subclass that automatically does those changes - not to make
this the behaviour of all classes, everywhere. Your idea to:

> One way that some libraries implement this for their own classes is to
> have an attribute or method called something like `_class` or
> `_constructor` that specifies which class to use to construct a new
> instance when needed.  By default such a class may return an instance of
> the same type as self (i.e., the most specific subclass), but subclasses
> could override it to do something else.

... have a _class attribute may be a good way to do this, since -
unless otherwise overridden - it would remain where it is. (Though,
minor bikeshedding - a dunder name is probably more appropriate here.)
It could even be done with a mixin:

class Str(autospecialize, str):
    __autospecialize__ = __class__
    def some_method(self): ...

and then the autospecialize class can handle this. There are many ways
of handling this, and IMO the best *default* 

[Python-ideas] Re: Idea: Tagged strings in python

2022-12-19 Thread Brendan Barnwell

On 2022-12-19 13:59, Chris Angelico wrote:
> On Tue, 20 Dec 2022 at 07:13, Brendan Barnwell  wrote:
>>   > See my example regarding a StrEnum and tell me whether that would be
>>   > more irritating.
>>
>>  I can't run that example myself as I don't have Python 3.11 set up.
>
> The enum module was added in Python 3.4.

Your example used StrEnum, which was added in Python 3.11.

> Nonetheless, a StrEnum is absolutely a str, and whatever you say about
> an HTML string has to also be valid for a StrEnum, or else the inverse
> is.


	No, it doesn't, because HTMLString and StrEnum can be different 
subclasses of str with different behavior.  You seem to be missing the 
concept of subclasses here.  Yes, a StrEnum may be an instance of str, 
and an HTMLString may also be an instance of str.  But that does not 
mean the behavior of both needs to be the same.  They are instances of 
*different subclasses* of str and can have *different behavior*.  An 
instance of collections.Counter is an instance of dict and so is an 
instance of collections.defaultdict, but that doesn't mean that anything 
I say about a Counter has to be valid for a defaultdict.


	One way that some libraries implement this for their own classes is to 
have an attribute or method called something like `_class` or 
`_constructor` that specifies which class to use to construct a new 
instance when needed.  By default such a class may return an instance of 
the same type as self (i.e., the most specific subclass), but subclasses 
could override it to do something else.
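Below is a minimal sketch of that pattern applied to a `str` subclass. The `_constructor` hook and the `TaggedStr`, `HTMLString` and `PlainResults` classes are hypothetical. pandas, for one, has an internal `_constructor` property on its own classes, but `str` has nothing like it. Only three methods are wrapped here. A real version would have to decide which of `str`'s many methods to cover.

```python
class TaggedStr(str):
    """str subclass whose wrapped methods build results via `_constructor`."""

    @property
    def _constructor(self):
        # Default: rebuild as the most specific subclass of self.
        return type(self)

    def upper(self):
        return self._constructor(super().upper())

    def __getitem__(self, key):
        return self._constructor(super().__getitem__(key))

    def __add__(self, other):
        result = super().__add__(other)
        if result is NotImplemented:
            return result
        return self._constructor(result)


class HTMLString(TaggedStr):
    """Inherits the default: results stay HTMLString."""


class PlainResults(TaggedStr):
    """Opts out: wrapped methods return plain str instead."""

    @property
    def _constructor(self):
        return str
```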



> The way things are, a StrEnum or an HTML string will behave *exactly
> as a string does*. The alternative is that, if any new operations are
> added to strings in the future, they have to be explicitly blocked by
> StrEnum or else they will randomly and mysteriously misbehave - or, at
> very best, crash with unexpected errors. Which one is more hostile to
> subclasses?


	I already answered that in my previous post.  To repeat: StrEnum is the 
unusual case and I am fine with it being more difficult to create 
something like StrEnum, because that is not as important as making it 
easy to create classes that *do* return an instance of themselves (i.e., 
an instance of the same type as "self") from their various methods.  The 
current behavior is more hostile to subclasses because people typically 
write subclasses to *extend* the behavior of superclasses, and that is 
hindered if you have to override every superclass method just to make it 
do the same thing but return the result wrapped in the new subclass.


--
Brendan Barnwell
"Do not follow where the path may lead.  Go, instead, where there is no 
path, and leave a trail."

   --author unknown

___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/TYEXVQ3YLPTUM6LOF2657OXGDD5DNHPZ/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-19 Thread Chris Angelico
On Tue, 20 Dec 2022 at 12:55, Ethan Furman  wrote:
>
> On 12/19/22 13:59, Chris Angelico wrote:
>
>  > The way things are, a StrEnum or an HTML string will behave *exactly
>  > as a string does*. The alternative is that, if any new operations are
>  > added to strings in the future, they have to be explicitly blocked by
>  > StrEnum or else they will randomly and mysteriously misbehave - or, at
>  > very best, crash with unexpected errors. Which one is more hostile to
>  > subclasses?
>
> As Brendan noted, mixed-type enums are special -- they are meant to be 
> whatever they subclass, with a couple extra
> features/restrictions.

Fair, but defaultdict also exhibits this behaviour, so maybe there are
a number of special cases. Or, as Syndrome put it: "When everyone's
[special]... no one will be."

> Personally, every other time I've wanted to subclass a built-in data type, 
> I've wanted the built-in methods to return my
> subclass, not the original class.
>
> All of which is to say:  sometimes you want it one way, sometimes the other.  
> ;-)

Yep, sometimes each way. So the real question is not "would the
opposite decision make sense in some situations?" but "which one is
less of a problem when it's the wrong decision?". And I put it to you
that returning an instance of the base type is less of a problem, in
the same way that *any other* operation unaware of the subclass would
behave.

def underline(head):
    """Build an underline line for the given heading"""
    return "=" * len(head)

Would you expect underline() to return the same type as head, or a
plain str? Would this be true of every single function that returns
something of the same kind as one of its parameters?

> Metaclasses, anyone?

Hmm, how would they help? I do think that metaprogramming could help
here, but not sure about metaclasses specifically.

If I wanted to automate this, I'd go for something like this:

@autospecialize
class Str(str):
    def extra_method(self): ...

where the autospecialize decorator would look at your class's first
base class, figure out which methods should get this treatment (only
if not overridden, only if they return that type, not __new__, maybe
other rules), and then add a wrapper that returns __class__(self). But
people will dispute parts of that. Maybe it should be explicitly told
which base class to handle this way. Maybe it'd be better to have an
intermediate class, rather than mutating the subclass. Maybe you
should be explicit about which methods get autospecialized. It's not
an easy problem, and simply returning the base class is the one option
that you can be confident of.
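Here is one rough way such a decorator could look, settling those open questions arbitrarily. It wraps only the method names it is given (a hypothetical default list). It skips anything the subclass defines itself. It converts a result only when its type is exactly the base class. `autospecialize` is hypothetical, not an existing library function, and it assumes the subclass constructor accepts a single base-class value.

```python
import functools

# Hypothetical default selection of methods to re-wrap.
_DEFAULT_METHODS = ("upper", "lower", "strip", "replace", "__add__", "__getitem__")


def autospecialize(cls=None, *, methods=_DEFAULT_METHODS):
    """Hypothetical decorator: make selected inherited methods return type(self)."""

    def decorate(cls):
        base = cls.__bases__[0]  # the class's first base class
        for name in methods:
            if name in cls.__dict__:
                continue  # overridden in the subclass: leave it alone

            def make_wrapper(original):
                @functools.wraps(original)
                def wrapper(self, *args, **kwargs):
                    result = original(self, *args, **kwargs)
                    # Convert only results that are exactly the base type.
                    if type(result) is base:
                        return type(self)(result)
                    return result
                return wrapper

            setattr(cls, name, make_wrapper(getattr(base, name)))
        return cls

    return decorate if cls is None else decorate(cls)


@autospecialize
class Str(str):
    def extra_method(self):
        return f"<{self}>"


@autospecialize(methods=("lower",))
class OnlyLower(str):
    pass
```

Because the check is `type(result) is base`, methods returning `int`, `list` or anything else pass through untouched. A subclass with a different `__init__` signature would still break it, for the reasons discussed in this thread.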

ChrisA
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/X263X2B4VUEWJIDRL27FNYE2C3S5KV77/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-19 Thread Ethan Furman

On 12/19/22 13:59, Chris Angelico wrote:

> The way things are, a StrEnum or an HTML string will behave *exactly
> as a string does*. The alternative is that, if any new operations are
> added to strings in the future, they have to be explicitly blocked by
> StrEnum or else they will randomly and mysteriously misbehave - or, at
> very best, crash with unexpected errors. Which one is more hostile to
> subclasses?

As Brendan noted, mixed-type enums are special -- they are meant to be whatever they subclass, with a couple extra 
features/restrictions.


Personally, every other time I've wanted to subclass a built-in data type, I've wanted the built-in methods to return my 
subclass, not the original class.


All of which is to say:  sometimes you want it one way, sometimes the other.  
;-)

Metaclasses, anyone?

--
~Ethan~
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/H6TQFFG3QZDNC4EJGROYLJVWU6L57XBA/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-19 Thread Chris Angelico
On Tue, 20 Dec 2022 at 11:16, Steven D'Aprano  wrote:
> Speaking of dicts, the dict.fromkeys method cooperates with subclasses.
> That proves that it can be done from a builtin. True, it is a
> classmethod rather than an instance method, but any instance method can
> find out its own class by calling `type()` (or the internal, C
> equivalent) on `self`. Just as we can do from Python.
>

What you really mean here is that fromkeys cooperates with subclasses
*that do not change the signature of __init__*. Otherwise, it won't
work.

The reason this is much easier with a classmethod alternate
constructor is that, if you don't want that behaviour, just don't do
that.

>>> class NumDict(dict):
...     def __init__(self, max, /):
...         for i in range(max): self[i] = i
...
>>> NumDict(10)
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
>>> NumDict(5)
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
>>> NumDict.fromkeys("abc")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: NumDict.__init__() missing 1 required positional argument: 'max'

So? Just don't use NumDict.fromkeys(), it doesn't make sense for a
NumDict. And that's fine. But if something else didn't work, it would
cause major problems. Which is why instance methods return plain
dictionaries:

>>> type(NumDict(5) | NumDict(3))
<class 'dict'>

How is the vanilla dictionary supposed to know how to construct a
NumDict correctly? Come to think of it, how is dict supposed to know
how to construct a defaultdict? Oh, it doesn't really.

>>> from collections import defaultdict
>>> d = defaultdict.fromkeys("asdf", 42)
>>> d["a"]
42
>>> d["b"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'b'
>>> d
defaultdict(None, {'a': 42, 's': 42, 'd': 42, 'f': 42})

All it does is construct a vanilla dictionary, because that's all it
knows how to do.

If the rule is "all operations return an object of the subclass
automatically", then the corollary is "all subclasses must retain the
signature of the superclass".
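The corollary can be seen in the sketch below. It writes a base class whose instance method *does* cooperate, rebuilding its result with `type(self)(...)`, and then subclasses it with a different `__init__` signature. `Bag`, `merged` and `RangeBag` are hypothetical names used only for this illustration.

```python
class Bag(dict):
    """A dict whose merge method rebuilds results as type(self)."""

    def merged(self, other):
        # Cooperative style: assumes type(self) accepts a single mapping,
        # exactly as dict itself does.
        return type(self)({**self, **other})


class RangeBag(Bag):
    # Changes the constructor signature, much like NumDict above.
    def __init__(self, start, stop, /):
        super().__init__()
        for i in range(start, stop):
            self[i] = i


print(type(Bag(a=1).merged({"b": 2})).__name__)  # Bag

try:
    RangeBag(0, 3).merged({"x": 1})
except TypeError as exc:
    # type(self)({...}) calls RangeBag.__init__ with one argument instead of two.
    print("TypeError:", exc)
```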

ChrisA
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/45GP3NBIUC3B6MQINMKMZLW4JNOCMOFL/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-19 Thread Steven D'Aprano
On Mon, Dec 19, 2022 at 03:48:01PM -0800, Christopher Barker wrote:
> On Mon, Dec 19, 2022 at 3:39 AM Steven D'Aprano  wrote
> 
> > In any case, I was making a larger point that this same issue applies to
> > other builtins like float, int and more.
> 
> 
> Actually, I think the issue is with immutable types, rather than builtins.

No.

>>> class MyList(list):
...     def frobinate(self):
...         return "something"
... 
>>> (MyList(range(5)) + []).frobinate()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute 'frobinate'

And of course, by default, MyList slices are MyLists too, right? No.

>>> type(MyList(range(5))[1:])
<class 'list'>

This is less of an issue for dicts because there are few dict methods 
and operators which return dicts.

Speaking of dicts, the dict.fromkeys method cooperates with subclasses. 
That proves that it can be done from a builtin. True, it is a 
classmethod rather than an instance method, but any instance method can 
find out its own class by calling `type()` (or the internal, C 
equivalent) on `self`. Just as we can do from Python.

> And that’s just the nature of the beast.

Of course it is not. We can write classes in Python that cooperate with 
subclasses. The only difference is that builtins are written in C. There 
is nothing fundamental to C that forces this behaviour. It's a choice.
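For illustration, here is roughly what cooperating with subclasses looks like in Python for the `MyList` example above. The overridden operations build their results with `type(self)` instead of a hard-coded `list`. In this sketch, each covered operation has to be written by hand. It also assumes subclasses keep a constructor that accepts an iterable.

```python
class MyList(list):
    """list subclass whose + and slicing return instances of type(self)."""

    def frobinate(self):
        return "something"

    def __add__(self, other):
        result = super().__add__(other)
        if result is NotImplemented:
            return result
        return type(self)(result)

    def __getitem__(self, index):
        result = super().__getitem__(index)
        # Slices come back as lists and get re-wrapped; single items do not.
        if isinstance(index, slice):
            return type(self)(result)
        return result
```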


-- 
Steve

___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/BA6M5Y5ZLPNSGHDRU7U6SBSFCAZAU3MS/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-19 Thread Chris Angelico
On Tue, 20 Dec 2022 at 07:13, Brendan Barnwell  wrote:
>  > See my example regarding a StrEnum and tell me whether that would be
>  > more irritating.
>
> I can't run that example myself as I don't have Python 3.11 set up.

The enum module was added in Python 3.4.

Nonetheless, a StrEnum is absolutely a str, and whatever you say about
an HTML string has to also be valid for a StrEnum, or else the inverse
is.

The way things are, a StrEnum or an HTML string will behave *exactly
as a string does*. The alternative is that, if any new operations are
added to strings in the future, they have to be explicitly blocked by
StrEnum or else they will randomly and mysteriously misbehave - or, at
very best, crash with unexpected errors. Which one is more hostile to
subclasses?

ChrisA
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/FAAYN7V6A26FYL6XGIOMHANDCBDXRATH/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-19 Thread Brendan Barnwell

Sorry, accidentally replied off-list. . .

On 2022-12-19 11:36, Chris Angelico wrote:
> On Tue, 20 Dec 2022 at 06:29, Brendan Barnwell  wrote:
>>
>> On 2022-12-19 03:45, Chris Angelico wrote:
>>> On Mon, 19 Dec 2022 at 22:37, Steven D'Aprano  wrote:
>>>>> But this much (say with a better validator) gets you static type checking,
>>>>> syntax highlighting, and inherent documentation of intent.
>>>>
>>>> Any half-way decent static type-checker will immediately fail as soon as
>>>> you call a method on this html string, because it will know that the
>>>> method returns a vanilla string, not a html string.
>>>
>>> But what does it even mean to uppercase an HTML string? Unless you
>>> define that operation specifically, the most logical meaning is
>>> "convert it into a plain string, and uppercase that". Or, similarly,
>>> slicing an HTML string. You could give that a completely different
>>> meaning (maybe defining its children to be tags, and slicing is taking
>>> a selection of those), but if you don't, slicing isn't really a
>>> meaningful operation.
>>
>>  I don't agree with that at all.  What it means for an HTML string to be
>> a subclass of a normal string is that all normal string operations still
>> work on an HTML string --- just like what it means for any instance of a
>> subclass to be an instance of the superclass is that you can do anything
>> to the subclass that you could do to the superclass.  Every character in
>> an HTML string is still a character and can still be uppercased.  The
>> string is still a sequence of characters and can be sliced.  All such
>> operations still have a perfectly natural meaning.
>
> And that part is already true. None of this changes. That's guaranteed
> by the concept of subclassing. But what you're doing is string
> operations on a string.
>
>> We just want them to
>> now return an *HTML* string when they're done instead of a normal one.
>> The point of having a subclass is to define *additional* behavior while
>> still retaining the superclass behavior as well.
>
> So how is it still an "HTML" string if you slice out parts of it and
> it isn't valid HTML any more?


	What it means for me for something to "be an HTML string" (or more 
precisely, to be an instance of HTMLString or whatever the class name 
is) is for it to be a string that has an extra tag attached to the 
object that means "this is HTML".  That's it.  You can make an HTML 
string that contains utter gobbledegook if you want.  Of course, some 
operations may fail (like if it has a .validate() method) but that 
doesn't mean it's not still an instance of that class.


	Or, if you do want that, you can override the slicing method to raise 
an error if the result isn't valid HTML.  The point is that overrides 
are for specifying the *new* behavior of the subclass (i.e., not 
allowing certain slice operations); you shouldn't have to override 
methods just to retain the superclass behavior.
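The sketch below illustrates the "str plus a tag" reading. It assumes a hypothetical `HTMLString` class and a deliberately crude `looks_valid()` check, which only compares counts of `<` and `>` and is nothing like real HTML validation. Slicing keeps the subclass, so the tag survives even when the result is not valid HTML. A stricter hypothetical subclass overrides `__getitem__` again to reject such slices.

```python
class HTMLString(str):
    """A str carrying the tag 'this is HTML'; content is not validated."""

    def looks_valid(self):
        # Placeholder check only: equal counts of '<' and '>'.
        return self.count("<") == self.count(">")

    def __getitem__(self, key):
        # Keep the tag on slices (and on single characters).
        return type(self)(super().__getitem__(key))


class StrictHTMLString(HTMLString):
    """Variant that refuses slices failing the placeholder check."""

    def __getitem__(self, key):
        result = super().__getitem__(key)
        if not result.looks_valid():
            raise ValueError(f"slice {key!r} does not look like valid HTML")
        return result
```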


	I mean, we were talking about this in the context of syntax 
highlighting.  The utility of HTML-string highlighting would be 
seriously reduced if only *valid* HTML could be in an HTML string.


>>  Personally I find Python's behavior in this regard (not just for
>> strings but for other builtin types) to be one of its most irritating
>> warts.
>
> See my example regarding a StrEnum and tell me whether that would be
> more irritating.

	I can't run that example myself as I don't have Python 3.11 set up. 
But just from what you showed, I don't find it convincing.  Enums are 
special in that they are specifically designed to allow only a fixed set 
of values.  I see that as the uncommon case, rather than the common one 
of subclassing an "open-ended" class to create a new "open-ended" class 
(i.e., one that does not pre-specify exactly which values are allowed). 
So no, I don't think it would be more irritating.


--
Brendan Barnwell
"Do not follow where the path may lead.  Go, instead, where there is no 
path, and leave a trail."

   --author unknown

___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/CT7UM6REJZEA3L6HHI2CGHMPXRZ7NXHI/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-19 Thread Chris Angelico
On Mon, 19 Dec 2022 at 22:37, Steven D'Aprano  wrote:
> > But this much (say with a better validator) gets you static type checking,
> > syntax highlighting, and inherent documentation of intent.
>
> Any half-way decent static type-checker will immediately fail as soon as
> you call a method on this html string, because it will know that the
> method returns a vanilla string, not a html string.

But what does it even mean to uppercase an HTML string? Unless you
define that operation specifically, the most logical meaning is
"convert it into a plain string, and uppercase that". Or, similarly,
slicing an HTML string. You could give that a completely different
meaning (maybe defining its children to be tags, and slicing is taking
a selection of those), but if you don't, slicing isn't really a
meaningful operation.

So the type checker would be right to complain: you cannot simply
uppercase an HTML string and expect sane HTML.

I might be more sympathetic if you were talking about "tainted"
strings (i.e., those which contain data from an end user), on the basis
that most operations on those should yield tainted strings, but given
that systems of taint tracking seem to have managed just fine with the
existing way of doing things, still not particularly persuasive.
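For comparison, a taint-propagating string might look roughly like the sketch below. `Tainted` is a hypothetical class. Only concatenation (from either side) and slicing are covered. Anything else, such as `upper()`, silently drops the taint unless it is overridden too.

```python
class Tainted(str):
    """Hypothetical marker for strings derived from end-user input."""

    def __add__(self, other):
        # tainted + other
        result = super().__add__(other)
        return result if result is NotImplemented else Tainted(result)

    def __radd__(self, other):
        # plain_str + tainted: tried first because Tainted subclasses str
        if not isinstance(other, str):
            return NotImplemented
        return Tainted(str.__add__(other, self))

    def __getitem__(self, key):
        return Tainted(super().__getitem__(key))


user = Tainted("<script>")
```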

ChrisA
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/7GVERAPFWRAX463V24IYRKG5HIPYQ23I/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-19 Thread Steven D'Aprano
On Mon, Dec 19, 2022 at 01:02:02AM -0600, Shantanu Jain wrote:

> collections.UserString can take away a lot of this boilerplate pain from
> user defined str subclasses.

At what performance cost?

Also:

>>> import collections
>>> s = collections.UserString('spam and eggs')
>>> isinstance(s, str)
False

which pretty much makes UserString useless for any code that does static 
checking or runtime isinstance checks.
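This check covers both sides of the exchange, using `HTMLText` as a hypothetical subclass. `collections.UserString` methods rebuild their results with `self.__class__`, so the subclass survives `upper()` and slicing. However, the instance wraps a `str` in its `.data` attribute rather than being one.

```python
import collections


class HTMLText(collections.UserString):
    def tag(self):
        return "html"


s = HTMLText("<b>spam</b> and eggs")

# UserString methods return self.__class__(...), so the subclass survives.
print(type(s.upper()).__name__, type(s[3:7]).__name__)  # HTMLText HTMLText

# But it wraps a str in .data rather than being one.
print(isinstance(s, str), isinstance(s.data, str))  # False True
```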

In any case, I was making a larger point that this same issue applies to 
other builtins like float, int and more.


-- 
Steve
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/UYRYTKMO3L5GSB2F5A4N5I6J3LTA7DQE/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-19 Thread Steven D'Aprano
On Sun, Dec 18, 2022 at 10:23:18PM -0500, David Mertz, Ph.D. wrote:

> I'd agree to "limited", but not "hostile."  Look at the suggestions I
> mentioned: validate, canonicalize, security check.  All of those are
> perfectly fine in `.__new__()`.

No, they aren't perfectly fine, because as soon as you apply any 
operation to your string subclass, you get back a plain vanilla string 
which bypasses your custom `__new__` and so does not perform the 
validation or security check.
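The sketch below demonstrates that bypass. It assumes a hypothetical `html` subclass whose `__new__` runs a placeholder check: rejecting the substring `<script`, standing in for a real security check (it is not an actual sanitiser). The check runs on direct construction. An inherited method, however, builds a plain `str` without ever calling it.

```python
class html(str):
    def __new__(cls, value):
        # Placeholder "security check" only, not real HTML sanitising.
        if "<script" in value.lower():
            raise ValueError("script tags not allowed")
        return super().__new__(cls, value)


safe = html("<p>hi</p>")

# Building a forbidden value directly is caught...
try:
    html("<SCRIPT>alert(1)</SCRIPT>")
except ValueError as exc:
    print("rejected:", exc)

# ...but an inherited method produces forbidden text as a plain str,
# and html.__new__ never runs.
sneaky = safe.replace("p>hi</p", "script>alert(1)</script")
print(type(sneaky).__name__, sneaky)  # str <script>alert(1)</script>
```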

> But this much (say with a better validator) gets you static type checking,
> syntax highlighting, and inherent documentation of intent.

Any half-way decent static type-checker will immediately fail as soon as 
you call a method on this html string, because it will know that the 
method returns a vanilla string, not a html string. And that's exactly 
what mypy does:

[steve ~]$ cat static_check_test.py 
class html(str):
    pass

def func(s:html) -> None:
    pass

func(html('').lower())

[steve ~]$ mypy static_check_test.py 
static_check_test.py:7: error: Argument 1 to "func" has incompatible type "str"; expected "html"
Found 1 error in 1 file (checked 1 source file)


Same with auto-completion. Either auto-complete will correctly show you 
that what you thought was a html object isn't, and fail to show any 
additional methods you added; or worse, it will wrongly think it is a 
html object when it isn't, and allow you to autocomplete methods that
don't exist.

___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/2JPILXSBEPUKHG4E5GH5KJFNOGNWXDYB/