[Python-ideas] Re: Idea: Tagged strings in python

2022-12-18 Thread Shantanu Jain
collections.UserString can take away a lot of this boilerplate pain from
user defined str subclasses.
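
To illustrate, here is a minimal sketch that reuses the mystr toy class from the message quoted below. UserString's own methods rebuild their results with the instance's class, so the subclass survives calls such as upper(). The trade-off is that a UserString instance wraps a str rather than being one.

```python
from collections import UserString


class mystr(UserString):
    def method(self):
        return 1234


s = mystr("hello")
upper = s.upper()              # UserString re-wraps the result in the subclass
print(type(upper).__name__)    # mystr
print(upper.method())          # 1234
print(isinstance(s, str))      # False: UserString wraps a str, it is not one
```

Code that checks isinstance(x, str) will therefore reject these objects, which may or may not matter for a given use.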

On Sun, Dec 18, 2022 at 7:28 PM Steven D'Aprano  wrote:

> On Sun, Dec 18, 2022 at 07:38:06PM -0500, David Mertz, Ph.D. wrote:
>
> > However, if you want to allow these types to possibly *do* something with
> > the strings inside (validate them, canonicalize them, do a security check,
> > etc), I think I like the other way:
> >
> > #2
> >
> > class html(str): pass
> > class css(str): pass
>
> The problem with this is that the builtins are positively hostile to
> subclassing. The issue is demonstrated with this toy example:
>
> class mystr(str):
>     def method(self):
>         return 1234
>
> s = mystr("hello")
> print(s.method())  # This is fine.
> print(s.upper().method())  # This is not.
>
>
> To be usable, we have to override every string method that returns a
> string, including dunders. So your class becomes full of tedious
> boilerplate:
>
>     def upper(self):
>         return type(self)(super().upper())
>     def lower(self):
>         return type(self)(super().lower())
>     def casefold(self):
>         return type(self)(super().casefold())
>     # Plus another 29 or so methods
>
> This is not just tedious and error-prone, but it is inefficient: calling
> super returns a regular string, which then has to be copied as a
> subclassed string and the original garbage collected.
>
>
> --
> Steve
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/4RIQ65SHYK3T2KZ2XKOPD45KH2SOFQFI/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-18 Thread David Mertz, Ph.D.
On Sun, Dec 18, 2022 at 8:29 PM Steven D'Aprano  wrote:

> > However, if you want to allow these types to possibly *do* something with
> > the strings inside (validate them, canonicalize them, do a security check,
> > etc), I think I like the other way:
> > class html(str): pass
> > class css(str): pass
>
> The problem with this is that the builtins are positively hostile to
> subclassing. The issue is demonstrated with this toy example:
>
> class mystr(str):
>     def method(self):
>         return 1234
>
> s = mystr("hello")
> print(s.method())  # This is fine.
> print(s.upper().method())  # This is not.
>

I'd agree to "limited", but not "hostile."  Look at the suggestions I
mentioned: validate, canonicalize, security check.  All of those are
perfectly fine in `.__new__()`.  E.g.:

In [1]: class html(str):
   ...:     def __new__(cls, s):
   ...:         if not "<" in s:
   ...:             raise ValueError("That doesn't look like HTML")
   ...:         return str.__new__(cls, s)


In [2]: html("Hello")


In [3]: html("Hello")
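---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-...> in <module>
----> 1 html("Hello")

<ipython-input-1-...> in __new__(cls, s)
      2     def __new__(cls, s):
      3         if not "<" in s:
----> 4             raise ValueError("That doesn't look like HTML")
      5

ValueError: That doesn't look like HTML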


I readily acknowledge that's not a very thorough validator :-).
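
For anyone who wants to try this outside IPython, here is a minimal runnable sketch of the same idea. The input "<p>Hello</p>" is only a made-up example of something that passes the crude check, not the string used in the session above.

```python
class html(str):
    def __new__(cls, s):
        # Deliberately crude check, mirroring the session above.
        if "<" not in s:
            raise ValueError("That doesn't look like HTML")
        return str.__new__(cls, s)


page = html("<p>Hello</p>")          # hypothetical input containing a tag
print(type(page).__name__)           # html
print(type(page.upper()).__name__)   # str: inherited methods return plain str

try:
    html("Hello")
except ValueError as exc:
    message = str(exc)
    print(message)                   # That doesn't look like HTML
```

The second print shows the fallback to plain str that the rest of this thread is arguing about.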

But this much (say with a better validator) gets you static type checking,
syntax highlighting, and inherent documentation of intent.

I know that lots of things one can do with a str subclass wind up producing
a str instead.  But if the thing you do is just "make sure it is created as
the right kind of thing for static checking and editor assistance," I don't
care about any of that falling back.

-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/AEQCVTJ2ABFQSQHWM62JOJQJI6UU675Y/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-18 Thread Chris Angelico
On Mon, 19 Dec 2022 at 12:29, Steven D'Aprano  wrote:
> The problem with this is that the builtins are positively hostile to
> subclassing. The issue is demonstrated with this toy example:
>
> class mystr(str):
>     def method(self):
>         return 1234
>
> s = mystr("hello")
> print(s.method())  # This is fine.
> print(s.upper().method())  # This is not.
>

"Hostile"? I dispute that. Are you saying that every method on a
string has to return something of the same type as self, rather than a
vanilla string? Because that would be far MORE hostile to other types
of string subclass:

>>> import dataclasses
>>> from enum import StrEnum
>>> class Demo(StrEnum):
...     x = "eggs"
...     m = "ham"
...
>>> Demo.x
<Demo.x: 'eggs'>
>>> isinstance(Demo.x, str)
True
>>> Demo.x.upper()
'EGGS'
>>> Demo.m + " and " + Demo.x
'ham and eggs'

Demo.x is a string. Which means that, unless there's good reason to do
otherwise, it should behave as a string. So it should be possible to
use it as if it were the string "eggs", including appending it to
something, appending something to it, uppercasing it, etc, etc, etc.

So what should happen if you do these kinds of manipulations? Should
attempting to use a string in a normal string context raise
ValueError?

>>> Demo("ham and eggs")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.12/enum.py", line 726, in __call__
    return cls.__new__(cls, value)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/enum.py", line 1121, in __new__
    raise ve_exc
ValueError: 'ham and eggs' is not a valid Demo

I would say that *that* would count as "positively hostile to subclassing".
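
A minimal sketch of that hypothetical, assuming an enum whose upper() re-wraps its result in type(self) the way the boilerplate pattern would. The class name StrictDemo is made up, and it uses the (str, Enum) mixin form rather than StrEnum only so that it also runs on Python versions before 3.11.

```python
from enum import Enum


class StrictDemo(str, Enum):
    x = "eggs"
    m = "ham"

    def upper(self):
        # Hypothetical "always return the same type as self" behaviour.
        return type(self)(str.upper(self))


try:
    StrictDemo.x.upper()
except ValueError as exc:
    error = str(exc)
    print(error)   # the uppercased value is not a member, so lookup fails
```

With the stock behaviour, the same call simply returns the plain string 'EGGS'.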

ChrisA
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/HCXWIKZ47LI7UIESEYAP63TP2CGWHR5O/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-18 Thread Steven D'Aprano
On Sun, Dec 18, 2022 at 07:38:06PM -0500, David Mertz, Ph.D. wrote:

> However, if you want to allow these types to possibly *do* something with
> the strings inside (validate them, canonicalize them, do a security check,
> etc), I think I like the other way:
> 
> #2
> 
> class html(str): pass
> class css(str): pass

The problem with this is that the builtins are positively hostile to 
subclassing. The issue is demonstrated with this toy example:

class mystr(str):
    def method(self):
        return 1234

s = mystr("hello")
print(s.method())  # This is fine.
print(s.upper().method())  # This is not.


To be usable, we have to override every string method that returns a
string, including dunders. So your class becomes full of tedious
boilerplate:

    def upper(self):
        return type(self)(super().upper())
    def lower(self):
        return type(self)(super().lower())
    def casefold(self):
        return type(self)(super().casefold())
    # Plus another 29 or so methods

This is not just tedious and error-prone, but it is inefficient: calling 
super returns a regular string, which then has to be copied as a 
subclassed string and the original garbage collected.
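
One way to shrink the boilerplate, sketched under the assumption that a class decorator is acceptable. The names preserve_type and _rewrap are hypothetical helpers, not anything in the standard library, and only the methods listed explicitly are wrapped.

```python
def _rewrap(base_method):
    """Return a method that calls base_method and re-wraps the str result."""
    def method(self, *args, **kwargs):
        result = base_method(self, *args, **kwargs)   # plain str
        return type(self)(result)                     # copy into the subclass
    method.__name__ = base_method.__name__
    return method


def preserve_type(*names):
    """Class decorator: make the named str methods return the subclass."""
    def decorate(cls):
        for name in names:
            setattr(cls, name, _rewrap(getattr(str, name)))
        return cls
    return decorate


@preserve_type("upper", "lower", "casefold")
class mystr(str):
    def method(self):
        return 1234


s = mystr("hello")
print(s.upper().method())          # 1234
print(type(s.title()).__name__)    # str: title() was not listed, so it falls back
```

This removes the repetition but not the inefficiency: each wrapped call still builds a plain str first and then copies it into the subclass.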


-- 
Steve
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/O7PU5FLLGNR7IR2V667LDPBBOEXF5NFU/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-18 Thread David Mertz, Ph.D.
Using a typing approach sounds like a fantastic idea.  Moreover, as Stephen
showed, it's easy to make Emacs utilize that, and as I showed, it's easy to
make vim follow that.  I've only written one tiny VS Code extension, but it
wouldn't be hard there either.  I'm not sure how one adds stuff to PyCharm
and other editors, but I have to believe it's possible.

So I see two obvious approaches, both of which 100% fulfill Emil's hope
without new syntax:

#1

from typing import NewType


html = NewType("html", str)
css = NewType("css", str)


a: html = html("Hello world")
b: css = css("h1 { color: #99; }")


def combine(h: html, c: css):
    print(f"Combined page elements: {h} | {c}")


combine(a, b)  # <- good
combine(b, a)  # <- bad
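
A short sketch of what approach #1 does at runtime, for readers who have not used NewType: calling the new "type" just hands back its argument, so the tag exists only for the static checker. The variable name raw is illustrative.

```python
from typing import NewType

html = NewType("html", str)

raw = "Hello world"
a = html(raw)

print(a is raw)          # True: the very same object comes back
print(type(a) is str)    # True: no new runtime class is involved
```

So the "bad" call above still runs without error; only a type checker such as mypy flags it.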



However, if you want to allow these types to possibly *do* something with
the strings inside (validate them, canonicalize them, do a security check,
etc), I think I like the other way:

#2

class html(str): pass
class css(str): pass


a: html = html("Hello world")
b: css = css("h1 { color: #99; }")


def combine(h: html, c: css):
    print(f"Combined page elements: {h} | {c}")


combine(a, b)
combine(b, a)


The type annotations in the assignment lines are optional, but if you're
doing something other than just creating an instance of the (pseudo-)type,
they might add something.  They might also be what your text editor decides
to use as its marker.

For either version, type analysis will find a problem.  If I hadn't matched
the types in the assignment, it would detect extra problems:

(py3.11) 1310-scratch % mypy tagged_types1.py
tagged_types1.py:13: error: Argument 1 to "combine" has incompatible type "css"; expected "html"  [arg-type]
tagged_types1.py:13: error: Argument 2 to "combine" has incompatible type "html"; expected "css"  [arg-type]
Found 2 errors in 1 file (checked 1 source file)


typing.Annotated can also be used, but it solves a slightly different
problem.
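
A rough sketch of the difference, assuming string metadata is used as the tag (the aliases Html and Css are made up for illustration). Annotated attaches metadata that tools can read, but a type checker still treats both aliases as plain str, so swapped arguments are not reported.

```python
from typing import Annotated, get_type_hints

Html = Annotated[str, "html"]   # metadata only; still just str to a checker
Css = Annotated[str, "css"]


def combine(h: Html, c: Css) -> str:
    return f"Combined page elements: {h} | {c}"


# A tool (an editor plugin, say) could recover the tags like this:
hints = get_type_hints(combine, include_extras=True)
print(hints["h"].__metadata__)   # ('html',)
print(hints["c"].__metadata__)   # ('css',)

# Plain strings are accepted, in either order, with no type error:
print(combine("h1 { color: red; }", "<h1>Hello</h1>"))
```

That makes Annotated a reasonable fit for editor hints and a poor fit for catching mixed-up arguments.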




On Sun, Dec 18, 2022 at 5:24 PM Paul Moore  wrote:

> On Sun, 18 Dec 2022 at 21:42, Christopher Barker 
> wrote:
>
>> On Sun, Dec 18, 2022 at 9:48 AM David Mertz, Ph.D. 
>> wrote:
>>
>>> In general, I find any proposal to change Python "because then my text
>>> editor would need to
>>> change to accommodate the language" to be unconvincing.
>>>
>>
>> Personally, I’m skeptical of any proposal to change Python to make it
>> easier for IDEs.
>>
> >> But there *may* be other good reasons to do something like this. I’m not
> >> a static typing guy, but it seems to me that it could be useful to subtype
> >> strings:
> >>
> >> This function expects an SQL string.
> >>
> >> This function returns an SQL string.
> >>
> >> Maybe not worth the overhead, but worth more than giving IDEs hints as to
> >> what to do.
>>
>
> I believe typing has annotated types that could do this.
> Paul
>


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/XCACWMITDR5YNBICCNONLUGZUYC3NFRV/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-18 Thread Paul Moore
On Sun, 18 Dec 2022 at 21:42, Christopher Barker 
wrote:

> On Sun, Dec 18, 2022 at 9:48 AM David Mertz, Ph.D. 
> wrote:
>
>> In general, I find any proposal to change Python "because then my text
>> editor would need to
>> change to accommodate the language" to be unconvincing.
>>
>
> Personally, I’m skeptical of any proposal to change Python to make it
> easier for IDEs.
>
> But there *may* be other good reasons to do something like this. I’m not a
> static typing guy, but it seems to me that it could be useful to subtype
> strings:
>
> This function expects an SQL string.
>
> This function returns an SQL string.
>
> Maybe not worth the overhead, but worth more than giving IDEs hints as to
> what to do.
>

I believe typing has annotated types that could do this.
Paul
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/U22UUM7J22IKDQCQTMHW27AISQ2H2YOY/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-18 Thread Christopher Barker
On Sun, Dec 18, 2022 at 9:48 AM David Mertz, Ph.D. 
wrote:

> In general, I find any proposal to change Python "because then my text
> editor would need to
> change to accommodate the language" to be unconvincing.
>

Personally, I’m skeptical of any proposal to change Python to make it
easier for IDEs.

But there *may* be other good reasons to do something like this. I’m not a
static typing guy, but it seems to me that it could be useful to subtype
strings:

This function expects an SQL string.

This function returns an SQL string.

Maybe not worth the overhead, but worth more than giving IDEs hints as to
what to do.

-CHB
-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/UUDANPFKWV66IN3DXGTS3VQ6A7XY6YIX/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-18 Thread David Mertz, Ph.D.
Well, obviously I have to come to the defense of vim as well :-).  I'm not
sure what year vim got the capability, but I suspect around as long as
emacs.

This isn't for exactly the same language use case, but finding a quick
example on the internet:

unlet b:current_syntax
syntax include @srcBash syntax/bash.vim
syntax region srcBashHi start="..." end="..." keepend contains=@srcBash

unlet b:current_syntax
syntax include @srcHTML syntax/html.vim
syntax region srcHTMLHi start="^...$" end="^...$" keepend contains=@srcHTML

This is easy to adapt to either the named function convention,
`html('Hello')`, or to the standardized-comment convention.

In general, I find any proposal to change Python "because then my text
editor would need to change to accommodate the language" to be unconvincing.
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/6PMUCHFX6FG2IT2VHANPGSPX4GNBJAII/


[Python-ideas] Idea: Tagged strings in python

2022-12-18 Thread Stephen J. Turnbull
e...@emilstenstrom.se writes:

 > Seems simple enough, right? The problem is: There's no syntax
 > highlighting in my code editor for the three other languages.

Then you're not using Emacs's mmm-mode, which has been available for a
couple of decades.  Now, mmm-mode doesn't solve the whole problem --
it doesn't know anything about how the languages are tagged.  But this
isn't a problem for an Emacs shop: the team decides on a convention
(or recognizes a third party's convention), and somebody will code up
the 5-line function that font-lock (syntax highlighter in Emacs) uses
to dispatch to the appropriate syntax highlighting mode.

AFAICS this requires either all editors become Emacs ;-) or all
editor maintainers get together and agree on the tags (this will need
to be extensible, there are a lot of languages out there, and some
editors will want to distinguish languages by version to flag syntax
invalid in older versions).  Is this really going to happen?  Just for
Python?  When the traditional solution of separating different
languages into different files is almost always acceptable?

There are other uses proposed for tagged strings.  In combination,
perhaps this feature is worthwhile.  But I think that on its own the
multiple language highlighting application is pretty dubious given the
limited benefit vs. the amount of complexity it will introduce not
only in Python, but in editors as well.

 > This makes for a horrible developer experience, where you
 > constantly have to hunt for characters inside of strings.

If this were a feature anyway, it would be very useful in certain
situations (for example dynamic web pages), no question about it.  But
mixed-language files are not something I want to see in projects I
work on -- and remember, I use Emacs, I have mmm-mode already.

 > If I instead use separate files, I get syntax highlighting and
 > auto-completion for each file, because editors set language based
 > on file type.

This is problematic for your case.  This means that the editor needs
to change how it dispatches to syntax highlighting.  Emacs, no
problem, it already dispatches highlighting based on tagged regions of
text.  But are other editors going to *change* to do that?

 > But should I really have to choose?

Most of the time, I'd say "yes", and you should choose multiple
files. ;-)  YMMV of course, but I really appreciate the separation of
concerns that is provided by separate files for Python code, HTML
templates, and (S)CSS presentation.

 > *Do we need a python language solution to this?*
 > Could the code editors fix this? There's a long issue thread for
 > vscode where this is discussed:
 > https://github.com/Microsoft/vscode/issues/1751 - The reasoning
 > (reasonable imho) is that this is not something that can be done
 > generally, but that it needs to be handled at the python vscode
 > extension level. Makes sense.

Makes sense, yes -- that's how Emacs does it, but Emacs is *already*
fundamentally designed on a model of implicitly tagged text.  Parsing
strings is already relatively hard because the begin marker is the
same as the end marker.  Now you need to tie it to the syntax
highlighting mode, which may change over large regions of text every
time you insert or delete a quotation mark or comment delimiter.  You
*can't* just hand it off to the Python highlighter: *every* syntax
highlighter that might be used inside a Python string at least needs
to know how to hand control back to Python.  For one thing, they all
need to learn about all four of Python's string delimiters.

And it gets worse.  I wonder how you end up with CSS and HTML inside
Python strings?  Yup, the CSS is inside a 

[Python-ideas] Re: Idea: Tagged strings in python

2022-12-18 Thread emil
dn wrote:
> > Is this a problem with Python, or with the tool?
> «
> Language injections
> Last modified: 14 December 2022
> Language injections let you work with pieces of code in other languages 
> embedded in your code. When you inject a language (such as HTML, CSS, 
> XML, RegExp, and so on) into a string literal, you get comprehensive 
> code assistance for editing that literal.
> ...
> »
> https://www.jetbrains.com/help/pycharm/using-language-injections.html
> Contains a specific example for Django scripters.
> (sadly as an image - probably wouldn't be handled by this ListServer)

I touched upon this solution in the original post. If all editors could agree
to use # language=html it would be an ok solution. But that API creates a lot
of ambiguity about which string the comment applies to. Some examples that are
non-obvious imho:


"" # language=html
"

# language=html

""

# language=html
process_html("")

# language=html
concat_html("", "")


> > If I instead use separate files, I get syntax highlighting and 
> > auto-completion for each file, because editors set language based on file 
> > type. But should I really have to choose?
> In other situations where files need to be collected together, a
> data-archive may be used (not to be confused with any historical 
> context, nor indeed with data-compression).

The point here is to have everything in one file, editable and syntax 
highlighted in that same file. I don't think this tip applies to that?
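
To make the ambiguity concrete, here is a small sketch of one possible rule, "a language comment tags the next string literal". This rule is my assumption for illustration, not how PyCharm or any other editor is documented to behave, and the function name tagged_strings is made up.

```python
import io
import re
import tokenize

TAG = re.compile(r"#\s*language=(\w+)")


def tagged_strings(source):
    """Pair each '# language=X' comment with the next string literal."""
    pending = None
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            match = TAG.match(tok.string)
            if match:
                pending = match.group(1)
        elif tok.type == tokenize.STRING and pending:
            found.append((pending, tok.string))
            pending = None   # the tag is used up by the first string
    return found


src = '# language=html\nconcat_html("<b>", "</b>")\n'
print(tagged_strings(src))   # [('html', '"<b>"')]
```

Under this rule only the first argument gets tagged. Another editor could just as reasonably tag every string in the statement, which is exactly the kind of disagreement I mean.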
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/TX35CCY4YLJEGWCODYHTWXWDM2SSANE4/