[Python-ideas] Re: Idea: Tagged strings in python

2022-12-23 Thread Cameron Simpson

On 24Dec2022 15:12, Chris Angelico wrote:
>On Sat, 24 Dec 2022 at 15:00, Cameron Simpson wrote:
>> help(list.index) seems empty.
>
>Huh that's strange. I'm checking this in a few recent versions, and
>they all say "Return first index of value".


Ugh. It isn't empty. But on my local system the pager help() invokes 
seems to use the alternate screen, so when I exit it the help's gone. I 
really need to debug that, it's incredibly annoying.
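
A possible stopgap while the pager misbehaves (a sketch, not something 
proposed in the thread): read the docstring without going through the 
pager at all. `pydoc.render_doc` with the `pydoc.plaintext` renderer is 
assumed here to produce the same text that help() would page.

```python
import pydoc

# Print the raw docstring; no pager is involved.
print(list.index.__doc__)

# Render the full help() text as a plain string instead of paging it.
text = pydoc.render_doc(list.index, renderer=pydoc.plaintext)
print(text)
```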


>> >3) If your answer to question 1 was incorrect, {does it help, would
>> >it have helped} to have a note in the docs?
>>
>> It would help to be able to understand the behaviour. I think with
>> `list.index` I'd expect an equality test only (I was surprised by your
>> "nan" example, even though "nan" is a pretty unusual value).


>It might help in those rare instances where you think to go and read
>the docs, which basically means the times when something looks wrong
>to you.


Often.


>It almost certainly won't help for cases where someone doesn't
>recognize a problem. With string uppercasing, anyone who would think
>to look in the docs for how it handles locale-specific case
>conversions is already going to understand that it can't possibly do
>anything more than a generic translation table, so I don't think it'd
>buy anything to have a note in the docs.


I'm not so sure. For example, my naive inclination might have been 
to look to see if it paid attention to, say, the POSIX locale, and 
blithely assume that such attention might be enough. Though I wonder if 
acting differently for locales might amount to mojibake sometimes.


Cheers,
Cameron Simpson 
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/MSMYS4EGPXJ32YEAWHJZOCDXHJYQLID5/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-23 Thread Chris Angelico
On Sat, 24 Dec 2022 at 15:00, Cameron Simpson  wrote:
>
> On 24Dec2022 14:35, Chris Angelico  wrote:
> >On Sat, 24 Dec 2022 at 13:15, Cameron Simpson  wrote:
> >My question was more: do you know, or do you have to look? I'll take
> >another example. Take the list.index() method, which returns the index
> >where a thing can be found. *Without checking first*, answer these
> >questions:
> >
> >1) Will identical-but-not-equal values (eg the same instance of nan) be 
> >found?
>
> I'd say no, because it should need to do an equality check to compare
> things.
>
> Well, I'm wrong. Assuming it does a precheck with "is", I guess.

Indeed; even experts don't always know this. The check used is the
"identity-or-equality" check common to a lot of containment checks in
Python.
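
For illustration only, the identity-or-equality rule can be approximated
in pure Python. This is a sketch of the observable behaviour, not the C
implementation; the helper name `index_like` is made up for this example.

```python
from math import nan


def index_like(seq, value):
    """Approximate list.index(): try identity first, then equality."""
    for i, item in enumerate(seq):
        if item is value or item == value:
            return i
    raise ValueError(f"{value!r} is not in sequence")


print(index_like([nan], nan))   # 0: found by identity, although nan != nan
print(index_like([3.0], 3))     # 0: found by equality, although 3.0 is not 3

try:
    index_like([float("nan")], nan)   # a different NaN object
except ValueError as exc:
    print("not found:", exc)
```

The last case fails both checks: the two NaN objects are distinct, and
NaN never compares equal to anything.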

> >2) Do the docs and/or docstring tell you the answer to question 1?
>
> [ To the docs!... How disappointing, the "Index" href at top right does
> not take me directly to list.index :-) ]

:)

> help(list.index) seems empty.

Huh that's strange. I'm checking this in a few recent versions, and
they all say "Return first index of value".

> The best docs seem to be .index for
> sequences: https://docs.python.org/3/library/stdtypes.html#index-19
> which only allude to the intended semantics:
>
>  index of the first occurrence of x in s (at or after index i and
>  before index j)
>
> I guess that _could_ imply an object identity check, but if I
> read it that way I might really take it to mean identity ("occurrence of
> x in s"), rather than the more useful and expected "also equal".

The docs are a bit lax about mentioning the identity check (see
further up in the same table, where containment is described in terms
of equality; in actual fact, nan in [nan] will be True), but this
seldom comes up in practice because most objects are equal to
themselves.
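
A short demonstration of the same shortcut in containment and in list
comparison (a sketch; `other` is just a second NaN object created for
the example):

```python
from math import nan

other = float("nan")        # a second, distinct NaN object

print(nan == nan)           # False: NaN never equals itself
print(nan in [nan])         # True: same object, so identity wins
print(other in [nan])       # False: different object and not equal
print([nan] == [nan])       # True: element comparison checks identity first
print([nan] == [other])     # False
```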

> The "nan" test says to me that there's an identity check, which is at
> least quite fast and maybe significantly useful for types which intern
> small values like int.

Right. I think that, originally, the identity check was considered to
be merely an optimization, and values that compare unequal to
themselves were considered weird outliers; it's only more recently
that it was decided that the "identity-or-equality" semantic check was
worth documenting. (No behavioural change, just a change in the
attitude towards "weird values".)

> >And then a logical followup:
> >
> >3) If your answer to question 1 was incorrect, {does it help, would it
> >have helped} to have a note in the docs?
>
> It would help to be able to understand the behaviour. I think with
> `list.index` I'd expect an equality test only (I was surprised by your
> "nan" example, even though "nan" is a pretty unusual value).
>

It might help in those rare instances where you think to go and read
the docs, which basically means the times when something looks wrong
to you. It almost certainly won't help for cases where someone doesn't
recognize a problem. With string uppercasing, anyone who would think
to look in the docs for how it handles locale-specific case
conversions is already going to understand that it can't possibly do
anything more than a generic translation table, so I don't think it'd
buy anything to have a note in the docs.

ChrisA
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/2ALUIGZJL3RNYFI3ZEU7OOEKVNGWL2CR/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-23 Thread Cameron Simpson

On 24Dec2022 14:35, Chris Angelico wrote:
>On Sat, 24 Dec 2022 at 13:15, Cameron Simpson wrote:
>My question was more: do you know, or do you have to look? I'll take
>another example. Take the list.index() method, which returns the index
>where a thing can be found. *Without checking first*, answer these
>questions:
>
>1) Will identical-but-not-equal values (eg the same instance of nan) be found?


I'd say no, because it should need to do an equality check to compare 
things.


Let's see:

>>> from math import nan
>>> nan == nan
False
>>> L = [nan]
>>> L.index(nan)
0

Well, I'm wrong. Assuming it does a precheck with "is", I guess.


>2) Do the docs and/or docstring tell you the answer to question 1?


[ To the docs!... How disappointing, the "Index" href at top right does 
not take me directly to list.index :-) ]


help(list.index) seems empty. The best docs seem to be .index for 
sequences: https://docs.python.org/3/library/stdtypes.html#index-19

which only allude to the intended semantics:

index of the first occurrence of x in s (at or after index i and 
before index j)


I guess that _could_ imply an object identity check, but if I 
read it that way I might really take it to mean identity ("occurrence of 
x in s"), rather than the more useful and expected "also equal".


The "nan" test says to me that there's an identity check, which is at 
least quite fast and maybe significantly useful for types which intern 
small values like int.


This test says there's a fallback to an equality test:

>>> L1=[3]
>>> L2=[3]
>>> L1 is L2
False
>>> L1 == L2
True
>>> L1 in [L2]
True


>And then a logical followup:
>
>3) If your answer to question 1 was incorrect, {does it help, would it
>have helped} to have a note in the docs?


It would help to be able to understand the behaviour. I think with 
`list.index` I'd expect an equality test only (I was surprised by your 
"nan" example, even though "nan" is a pretty unusual value).


Cheers,
Cameron Simpson 
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/FIL23Q4SBDWUWMH4NOLGVIMCI7AGQGMY/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-23 Thread Chris Angelico
On Sat, 24 Dec 2022 at 13:15, Cameron Simpson  wrote:
> >wording that clarifies whether x.upper() uppercases the string
> >in-place?
>
> Well, it says "a copy", so I'd say it's clear.
>

My question was more: do you know, or do you have to look? I'll take
another example. Take the list.index() method, which returns the index
where a thing can be found. *Without checking first*, answer these
questions:

1) Will identical-but-not-equal values (eg the same instance of nan) be found?
2) Do the docs and/or docstring tell you the answer to question 1?

And then a logical followup:

3) If your answer to question 1 was incorrect, {does it help, would it
have helped} to have a note in the docs?

ChrisA
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/J33BJF75LI3PH4ISNBJ3AIGZALBECJIB/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-23 Thread Cameron Simpson

On 24Dec2022 09:11, Chris Angelico wrote:
>On Sat, 24 Dec 2022 at 09:07, Cameron Simpson wrote:
>> On 23Dec2022 22:27, Chris Angelico wrote:
>> >I think this would be a useful feature to have, although it'll
>> >probably end up needing a LOT of information (you can't just say "give
>> >me a locale-correct uppercasing of this string" without further
>> >context). So IMO it should be third-party.
>>
>> It would probably be good to have a caveat mentioning these context
>> difficulties in the docs of the unicodedata and str/string case fiddling
>> methods. Not a complete exposition, but making it clear that for some
>> languages the rules require context, maybe with a
>> hard-to-implement-correctly example of naive/incorrect use.


>Do people actually read those warnings?


I have read them, I think, though not for a while.


>Hang on, lemme pop into the time machine and add one to the docstring
>and docs for str.upper(). Okay, I'm back. Tell me, have you read the
>docstring?


Python 3.9.13 (main, Aug 11 2022, 14:01:42)
[Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> help(str.upper)
Help on method_descriptor:

upper(self, /)
    Return a copy of the string converted to uppercase.

Hmm. Did you commit the change? Is the key to the time machine back on 
its hook?


Docs:

    str.upper()
        Return a copy of the string with all the cased characters [4] 
        converted to uppercase. Note that s.upper().isupper() might be 
        False if s contains uncased characters or if the Unicode 
        category of the resulting character(s) is not “Lu” (Letter, 
        uppercase), but e.g. “Lt” (Letter, titlecase).

        The uppercasing algorithm used is described in section 3.13 of 
        the Unicode Standard.


and [4] here:

Cased characters are those with general category property being one 
of “Lu” (Letter, uppercase), “Ll” (Letter, lowercase), or “Lt” 
(Letter, titlecase).
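
To make the quoted caveat concrete, here is a small sketch. The sample 
characters are chosen for illustration and are not taken from the docs 
themselves.

```python
import unicodedata

digits = "123"
print(digits.upper().isupper())          # False: there are no cased characters

dz = "\u01c5"   # LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON
print(unicodedata.category(dz))          # Lt (Letter, titlecase)
print(unicodedata.category(dz.upper()))  # Lu (Letter, uppercase)

sharp_s = "\u00df"                       # LATIN SMALL LETTER SHARP S
print(sharp_s.upper())                   # SS: one character becomes two
```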



>wording that clarifies whether x.upper() uppercases the string
>in-place?


Well, it says "a copy", so I'd say it's clear.

I've only got version 5.0 of Unicode here. [steps into the other 
room...] Thank you, I see you used the time machine to buy me version 
9.0 too :-)


Ah, 3.13 is 7 pages of compact text here.

I was thinking of something a bit more general, like "case changing is a 
complex, language- and context-dependent process, and str.upper (etc.) 
therefore performs a simplistic operation".


Cheers,
Cameron Simpson 
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/D47A4NQKHP4LBWM4B6J3XELBFVKN5DX6/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-23 Thread Rob Cliffe via Python-ideas




On 20/12/2022 09:16, Steven D'Aprano wrote:

On Mon, Dec 19, 2022 at 05:53:38PM -0800, Ethan Furman wrote:


Personally, every other time I've wanted to subclass a built-in data type,
I've wanted the built-in methods to return my subclass, not the original
class.


Caveat: If you were subclassing str, you would probably want __str__ and 
__repr__ (if you were not overriding them) to return plain strings.

Best wishes
Rob Cliffe
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/J4QUX4BT2MN5HAWKXV2FLPTENE5RFR3E/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-23 Thread Chris Angelico
On Sat, 24 Dec 2022 at 09:07, Cameron Simpson  wrote:
>
> On 23Dec2022 22:27, Chris Angelico  wrote:
> >I think this would be a useful feature to have, although it'll
> >probably end up needing a LOT of information (you can't just say "give
> >me a locale-correct uppercasing of this string" without further
> >context). So IMO it should be third-party.
>
> It would probably be good to have a caveat mentioning these context
> difficulties in the docs of the unicodedata and str/string case fiddling
> methods. Not a complete exposition, but making it clear that for some
> languages the rules require context, maybe with a
> hard-to-implement-correctly example of naive/incorrect use.
>

Do people actually read those warnings?

Hang on, lemme pop into the time machine and add one to the docstring
and docs for str.upper(). Okay, I'm back. Tell me, have you read the
docstring? Do you know exactly what it says? For example, is there
wording that clarifies whether x.upper() uppercases the string
in-place?

(I had to actually check that one myself, as I haven't memorized the
docstring either.)

ChrisA
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/WS73FECJKUW26OVSCYODJCQZ45ZNILR4/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-23 Thread Cameron Simpson

On 23Dec2022 22:27, Chris Angelico  wrote:

>I think this would be a useful feature to have, although it'll
>probably end up needing a LOT of information (you can't just say "give
>me a locale-correct uppercasing of this string" without further
>context). So IMO it should be third-party.


It would probably be good to have a caveat mentioning these context 
difficulties in the docs of the unicodedata and str/string case fiddling 
methods. Not a complete exposition, but making it clear that for some 
languages the rules require context, maybe with a 
hard-to-implement-correctly example of naive/incorrect use.
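
As a rough idea of what such an example could look like (a sketch only; 
the sample words are picked for illustration and are not proposed doc 
text):

```python
# German: uppercasing the sharp s is not reversible.
german = "straße"
print(german.upper())            # STRASSE
print(german.upper().lower())    # strasse, not the original spelling

# Greek: lowercasing capital sigma depends on its position in the word.
greek = "ΟΔΟΣ"
print(greek.lower())             # the last letter becomes a final sigma

# Dutch: the IJ digraph should be capitalised as a unit, which the
# generic algorithm knows nothing about.
dutch = "ijsselmeer"
print(dutch.capitalize())        # Ijsselmeer, where Dutch usage wants IJsselmeer
```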


Cheers,
Cameron Simpson 
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/LEOBA4VMTNGRSP7ZBDR4ZOVECIA6LJ2J/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-23 Thread Chris Angelico
On Fri, 23 Dec 2022 at 21:02, Steven D'Aprano  wrote:
>
> On Fri, Dec 23, 2022 at 06:02:39PM +0900, Stephen J. Turnbull wrote:
>
> > Many would argue that (POSIX) locales aren't a good fit for
> > anything. :-)
>
> :-)
>
> > I agree that it's kind of hard to see anything more complex than a
> > fixed table for the entire Unicode repertoire belonging in str,
> > though.
>
> I think for practical reasons, we don't want to overload the builtin str
> class with excessive complexity. But the string module? Or third-party
> libraries?

Not really convinced that it belongs in string, but it could go in
unicodedata (if it's lifted straight from the Unicode standards and
associated data files), locale, or any sort of third-party library.

I think this would be a useful feature to have, although it'll
probably end up needing a LOT of information (you can't just say "give
me a locale-correct uppercasing of this string" without further
context). So IMO it should be third-party.

Went looking on PyPI to see what already exists, but I didn't find
anything. Might just be that I didn't pick the right keywords to look
for though.

ChrisA
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/CB5X6OVGH52Q2YVIFK7CMSIOV5HP45EK/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-23 Thread Steven D'Aprano
On Fri, Dec 23, 2022 at 06:02:39PM +0900, Stephen J. Turnbull wrote:

> Many would argue that (POSIX) locales aren't a good fit for
> anything. :-)

:-)

> I agree that it's kind of hard to see anything more complex than a
> fixed table for the entire Unicode repertoire belonging in str,
> though.

I think for practical reasons, we don't want to overload the builtin str 
class with excessive complexity. But the string module? Or third-party 
libraries?


> (I admit that my feeling toward Erdogan makes me less
> sympathetic to the Turks. :-)

Does that include the 70% or more Turks who disapprove of Erdoğan?

There are at least 35 surviving Turkic languages, including Azerbaijani, 
Turkmen, Qashqai, Balkan Gagauz, and Tatar. Although Turkish is the 
single largest of them, it only makes up about 38% of all Turkic 
speakers.

All up, there are about 200 million speakers of Turkic languages. That's 
more than Germanic languages (excluding English) or Japanese. If any 
special case should be a special case, it is the Turkish I Problem.

But as I said, probably not in the builtin str class.
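
For readers unfamiliar with the Turkish I Problem, a minimal sketch of 
what a helper outside str might do. The names `upper_tr` and `lower_tr` 
are hypothetical, and this handles only the dotted/dotless I pair, not 
full Turkic casing.

```python
DOTTED_CAPITAL_I = "\u0130"   # İ
DOTLESS_SMALL_I = "\u0131"    # ı


def upper_tr(text):
    """Uppercase with the Turkish rule: i -> İ, ı -> I."""
    # Map dotted i first; dotless ı already uppercases to plain I.
    return text.replace("i", DOTTED_CAPITAL_I).upper()


def lower_tr(text):
    """Lowercase with the Turkish rule: I -> ı, İ -> i."""
    text = text.replace("I", DOTLESS_SMALL_I).replace(DOTTED_CAPITAL_I, "i")
    return text.lower()


print("istanbul".upper())      # ISTANBUL: the generic table loses the dot
print(upper_tr("istanbul"))    # İSTANBUL
print(lower_tr("ISI"))         # ısı
```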


-- 
Steve
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/QRKEYJ2DM2HCJWQEAVLLUNBBJYQ44WBY/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-23 Thread Stephen J. Turnbull
Chris Angelico writes:

 > I don't think str.upper() is the place for it; Python has a locale
 > module that is a better fit for this.

Many would argue that (POSIX) locales aren't a good fit for
anything. :-)

I agree that it's kind of hard to see anything more complex than a
fixed table for the entire Unicode repertoire belonging in str,
though.  (I admit that my feeling toward Erdogan makes me less
sympathetic to the Turks. :-)  Use of locale notation in keys for more
sophisticated treatment is hard to beat as far as I know, though.

Steve


Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/7KTN6H7HKXJL7NI3MWEQ7ZY6V47XUVJQ/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-22 Thread Eric V. Smith via Python-ideas
Jim Baker's draft PEP gives runtime behavior to tagged strings. I agree 
it would all be pointless if it's just hints to an editor. Sorry, I 
again don't have a handy link to his PEP. But it's similar in spirit to 
PEP 501.


Eric

On 12/22/2022 2:36 PM, Joao S. O. Bueno wrote:
> I am not enthusiastic about this idea at all: as I perceive it, it is
> an IDE problem, external to the language, and should be resolved
> there - maybe with a recommendation PEP.
>
> But on the other hand, I have seen tens of e-mails discussing
> string-subclassing, so that annotations could suffice as a hint to
> inner-string highlighting - and then: subclassing is not really
> needed at all.
>
> Maybe we can allow string tagging in annotations by using
> `str['html']`, `str['css']` and so on. (The typing module could even
> take no-op names such as "html", "css", etc. to mean those without
> any other signs, so stuff could be annotated like
> `template: html = ""`, which the same typing machinery that makes
> things like `TypedDict`, `Required`, etc. work would present as plain
> "str" to the runtime, while allowing any tooling to perceive it as a
> specialized class.)
>
> In other words, one could then either write:
>
> mytemplate: str['html'] = " "
>
> Or
>
> from typing import html
> mytemplate: html = ...
>
> (The former way could be used for arbitrary tagging as proposed by
> the O.P., and it would be trivial to add a "register" function to
> declaratively create new tags at static-analysis time.)
>
> This syntax has the benefit that static type checkers can take full
> advantage of the string subtypes, correctly pointing out when a "CSS"
> string is passed as an argument that should contain "HTML", with no
> drawbacks, no syntax changes, and no backwards compatibility breaks.

> On Thu, Dec 22, 2022 at 1:42 AM Christopher Barker wrote:
>> On Wed, Dec 21, 2022 at 9:35 AM Chris Angelico wrote:
>>> From the look of things, PyUnicode_Join (the internal function that
>>> handles str.join()) uses a lot of "reaching into the data structure"
>>> operations for efficiency. It uses PyUnicode_Check (aka "isinstance(x,
>>> str)") rather than PyUnicode_CheckExact (aka "type(x) is str") and
>>> then proceeds to cast the pointer and directly inspect its members.
>>>
>>> As such, I don't think UserString can ever truly be a str,
>>
>> I had figured subclasses of str wouldn't be full players in the C
>> code -- but join() is pretty fundamental :-(
>>
>> -CHB

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/52ERBBVWPRGXKNSVXUQCNC33DYEHTSER/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-22 Thread Rob Cliffe via Python-ideas



On 19/12/2022 03:23, David Mertz, Ph.D. wrote:
On Sun, Dec 18, 2022 at 8:29 PM Steven D'Aprano  
wrote:


> However, if you want to allow these types to possibly *do* something
> with the strings inside (validate them, canonicalize them, do a
> security check, etc), I think I like the other way:
> class html(str): pass
> class css(str): pass

The problem with this is that the builtins are positively hostile to
subclassing. The issue is demonstrated with this toy example:

class mystr(str):
    def method(self):
        return 1234

s = mystr("hello")
print(s.method())  # This is fine.
print(s.upper().method())  # This is not.




Yes, you have to do some more work with the methods you need to use:

class mystr(str):
    def method(self):
        return 1234
    def upper(self):
        # Re-wrap the plain str result so the subclass survives.
        return mystr(str(self).upper())

s = mystr("hello")
print(s.method())  # prints 1234
print(s.upper())   # prints HELLO
print(s.upper().method())  # prints 1234
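
If overriding each method by hand gets tedious, the same idea can be 
applied in a loop. This is only a sketch: the helper name 
`_returning_own_type` and the choice of wrapped methods are made up for 
illustration, and nothing here is a stdlib facility.

```python
def _returning_own_type(name):
    """Build a method that calls str.<name> and re-wraps plain str results."""
    plain = getattr(str, name)

    def method(self, *args, **kwargs):
        result = plain(self, *args, **kwargs)
        # Only plain str results are re-wrapped; anything else passes through.
        return type(self)(result) if type(result) is str else result

    method.__name__ = name
    return method


class mystr(str):
    def method(self):
        return 1234


# Hypothetical selection of methods; extend as needed.
for _name in ("upper", "lower", "strip", "replace", "title"):
    setattr(mystr, _name, _returning_own_type(_name))

s = mystr("hello")
print(type(s.upper()).__name__)        # mystr
print(s.upper().method())              # 1234
print(s.replace("l", "L").method())    # 1234
```

Methods that are not in the list (slicing, concatenation, join and so 
on) still return plain str, so the basic complaint stands.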

Best wishes
Rob Cliffe
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/OYOTMOG57PSBIMMYVIFXNLPX7Q5TR3GM/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-22 Thread Joao S. O. Bueno
I am not enthusiastic about this idea at all: as I perceive it, it is an
IDE problem, external to the language, and should be resolved there -
maybe with a recommendation PEP.

But on the other hand, I have seen tens of e-mails discussing
string-subclassing, so that annotations could suffice as a hint to
inner-string highlighting - and then: subclassing is not really needed
at all.

Maybe we can allow string tagging in annotations by using `str['html']`,
`str['css']` and so on. (The typing module could even take no-op names
such as "html", "css", etc. to mean those without any other signs, so
stuff could be annotated like `template: html = ""`, which the same
typing machinery that makes things like `TypedDict`, `Required`, etc.
work would present as plain "str" to the runtime, while allowing any
tooling to perceive it as a specialized class.)


In other words, one could then either write:

mytemplate: str['html'] = " "

Or

from typing import html
mytemplate: html = ...

(The former way could be used for arbitrary tagging as proposed by the
O.P., and it would be trivial to add a "register" function to
declaratively create new tags at static-analysis time.)

This syntax has the benefit that static type checkers can take full
advantage of the string subtypes, correctly pointing out when a "CSS"
string is passed as an argument that should contain "HTML", with no
drawbacks, no syntax changes, and no backwards compatibility breaks.
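
For comparison, the typing module already offers two ways to get a
similar effect today. The sketch below uses `NewType` and `Annotated`,
which are existing APIs and not the `str['html']` syntax proposed above;
the names `Html`, `Css`, `TaggedHtml`, `render` and `render_tagged` are
invented for this example.

```python
from typing import Annotated, NewType, get_type_hints

# NewType: distinct types for a static checker, plain str at runtime.
Html = NewType("Html", str)
Css = NewType("Css", str)


def render(template: Html) -> str:
    return template.strip()


page = Html("<p>hello</p>")
print(type(page) is str)        # True: no subclass exists at runtime
print(render(page))             # a checker would flag render(Css("a{}"))

# Annotated: attach an arbitrary tag that tools can read back.
TaggedHtml = Annotated[str, "html"]


def render_tagged(template: TaggedHtml) -> str:
    return template.strip()


hints = get_type_hints(render_tagged, include_extras=True)
print(hints["template"].__metadata__)   # ('html',)
```

Whether editors would use either form to pick a highlighter for the
string contents is a separate question that this sketch does not answer.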

On Thu, Dec 22, 2022 at 1:42 AM Christopher Barker wrote:

> On Wed, Dec 21, 2022 at 9:35 AM Chris Angelico wrote:
>
>> From the look of things, PyUnicode_Join (the internal function that
>> handles str.join()) uses a lot of "reaching into the data structure"
>> operations for efficiency. It uses PyUnicode_Check (aka "isinstance(x,
>> str)") rather than PyUnicode_CheckExact (aka "type(x) is str") and
>> then proceeds to cast the pointer and directly inspect its members.
>>
>> As such, I don't think UserString can ever truly be a str,
>
> I had figured subclasses of str wouldn't be full players in the C code --
> but join() is pretty fundamental :-(
>
> -CHB
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/LFWSTEFW46ATMCTRRM6FZYCYX7WQBWSG/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-22 Thread Rob Cliffe via Python-ideas




On 17/12/2022 16:07, e...@emilstenstrom.se wrote:

Python's currently supported string types are just single letter, so the 
suggestion is to require tagged strings to be at least two letters.



Er, no:

Python 3.8.3 (tags/v3.8.3:6f8c832, May 13 2020, 22:20:19) [MSC v.1925 32 
bit (Intel)] on win32

Type "help", "copyright", "credits" or "license" for more information.
>>> rf'{2+2}'
'4'
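
A few more two-letter prefixes that are already valid, as a quick 
sketch (the variable names are only for the example):

```python
raw_f = rf'{2+2}\n'     # raw f-string: expression evaluated, backslash kept
f_raw = fr'{2+2}\n'     # same thing, letters in the other order
raw_b = rb'\n'          # raw bytes
b_raw = br'\n'          # same thing, letters in the other order

print(raw_f)            # 4\n
print(raw_f == f_raw)   # True
print(raw_b == b_raw)   # True
print(len(raw_b))       # 2: a backslash and an "n"
```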

Best wishes
Rob Cliffe
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/OE5MVRZPX4ICFSTBAKA43BPGTFKE6C2Q/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-21 Thread Christopher Barker
On Wed, Dec 21, 2022 at 9:35 AM Chris Angelico  wrote:

> From the look of things, PyUnicode_Join (the internal function that
> handles str.join()) uses a lot of "reaching into the data structure"
> operations for efficiency. It uses PyUnicode_Check (aka "isinstance(x,
> str)") rather than PyUnicode_CheckExact (aka "type(x) is str") and
> then proceeds to cast the pointer and directly inspect its members.
>
> As such, I don't think UserString can ever truly be a str,


I had figured subclasses of str wouldn't be full players in the C code --
but join() is pretty fundamental :-(

-CHB
-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/C2HG3QJOU5SLU536CGOJ26VKXVEBZYBH/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-21 Thread Chris Angelico
On Thu, 22 Dec 2022 at 04:14, Christopher Barker  wrote:
>
> On Wed, Dec 21, 2022 at 8:54 AM Chris Angelico  wrote:
>>
>> > I think both of those will call self.__str__, which creates a recursion -- 
>> > that's what I'm trying to avoid.
>> >
>> > I'm sure there are ways to optimize this -- but only worth doing if it's 
>> > worth doing at all :-)
>> >
>>
>> Second one doesn't seem to.
>>
>> >>> class Str(str):
>> ...     def __str__(self):
>> ...         print("str!")
>> ...         return "spam"
>> ...     def __repr__(self):
>> ...         print("repr!")
>> ...         return "SPAM"
>> ...
>> >>> s = Str("ham")
>> >>> f"{s}"
>> str!
>> 'spam'
>> >>> "".join((s,))
>> 'ham'
>
>
> hmm -- interesting trick -- I had jumped to that conclusion -- I wonder what 
> it IS using under the hood?
>

From the look of things, PyUnicode_Join (the internal function that
handles str.join()) uses a lot of "reaching into the data structure"
operations for efficiency. It uses PyUnicode_Check (aka "isinstance(x,
str)") rather than PyUnicode_CheckExact (aka "type(x) is str") and
then proceeds to cast the pointer and directly inspect its members.

As such, I don't think UserString can ever truly be a str, and it'll
never work with str.join(). The best you'd ever get would be
explicitly mapping str over everything first:

>>> from collections import UserString
>>> s2 = UserString("eggs")
>>> "-".join(str(s) for s in [s, s2])
str!
'spam-eggs'

And we don't want that to be the default, since we're not writing
JavaScript code here.
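
The same points as a self-contained script, for anyone who wants to run
it outside the REPL. This is a sketch: it assumes the UserString in
question is collections.UserString, and the Str class is a trimmed-down
version of the one quoted above.

```python
from collections import UserString


class Str(str):
    def __str__(self):
        return "spam"


s = Str("ham")
joined = "".join([s])
print(joined)         # ham: join reads the underlying data, __str__ is not called
print(str(s))         # spam

s2 = UserString("eggs")
print(isinstance(s2, str))    # False: UserString wraps a str, it is not one

try:
    "-".join([s, s2])
except TypeError as exc:
    print("join rejected UserString:", exc)

# Explicitly mapping str over everything works, at the cost of calling __str__.
mapped = "-".join(str(x) for x in [s, s2])
print(mapped)         # spam-eggs
```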

ChrisA
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/3K36PWE26GVKNH5IFV44D6VKRO6TLBGB/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-21 Thread Christopher Barker
On Wed, Dec 21, 2022 at 8:54 AM Chris Angelico  wrote:

> > I think both of those will call self.__str__, which creates a recursion
> -- that's what I'm trying to avoid.
> >
> > I'm sure there are ways to optimize this -- but only worth doing if it's
> worth doing at all :-)
> >
>
> Second one doesn't seem to.
>
> >>> class Str(str):
> ...     def __str__(self):
> ...         print("str!")
> ...         return "spam"
> ...     def __repr__(self):
> ...         print("repr!")
> ...         return "SPAM"
> ...
> >>> s = Str("ham")
> >>> f"{s}"
> str!
> 'spam'
> >>> "".join((s,))
> 'ham'
>

hmm -- interesting trick -- I had jumped to that conclusion -- I wonder
what it IS using under the hood?

> Interestingly, neither does the f-string, *if* you include a format
> code with lots of room. I guess str.__format__ doesn't always call
> __str__().
>

Now that you mention that, UserString should perhaps have a __format__.
More evidence that it's not really being maintained.

Though maybe not -- perhaps the inherited one will be fine.

Now that I think about it, perhaps the inherited __str__ would be fine as
well.

-CHB


-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/UHMZ4PHWU7S3GON43LQKOSOVUI3J6AR4/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-21 Thread Chris Angelico
On Thu, 22 Dec 2022 at 03:41, Christopher Barker  wrote:
>
> On Wed, Dec 21, 2022 at 1:18 AM Steven D'Aprano  wrote:
>>
>> On Tue, Dec 20, 2022 at 11:55:49PM -0800, Jeremiah Paige wrote:
>> >     @property
>> >     def data(self):
>> >         return f"{self}"
>>
>> By my testing, on Python 3.10, this is slightly faster still:
>>
>>     @property
>>     def data(self):
>>         return "".join((self,))
>
>
> I think both of those will call self.__str__, which creates a recursion -- 
> that's what I'm trying to avoid.
>
> I'm sure there are ways to optimize this -- but only worth doing if it's 
> worth doing at all :-)
>

Second one doesn't seem to.

>>> class Str(str):
...     def __str__(self):
...         print("str!")
...         return "spam"
...     def __repr__(self):
...         print("repr!")
...         return "SPAM"
...
>>> s = Str("ham")
>>> f"{s}"
str!
'spam'
>>> "".join((s,))
'ham'

Interestingly, neither does the f-string, *if* you include a format
code with lots of room. I guess str.__format__ doesn't always call
__str__().

>>> f"{s:s}"
repr!
SPAM
>>> f"{s:1s}"
repr!
SPAM
>>> f"{s:14s}"
'ham           '

Curiouser and curiouser. Especially since the returned strings aren't
enclosed in quotes. Let's try something.

>>> format(s, "10s") is s
False
>>> format(s, "s") is s
True
>>> format(s) is s
str!
False

Huh. How about that.
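
A minimal sketch of the two cases that behave the same way in the session above, assuming CPython. Which object str.__format__ hands back for a bare "s" spec looks like an implementation detail, so this sketch does not check identity; it only records whether __str__ ran. The calls list is a helper added for illustration.

```python
class Str(str):
    calls = []  # illustration only: records each __str__ invocation

    def __str__(self):
        Str.calls.append("__str__")
        return "spam"


s = Str("ham")

# Empty format spec: for a str subclass this falls back to str(obj).
Str.calls.clear()
no_spec = format(s)
calls_no_spec = list(Str.calls)

# Explicit width: the stored characters are padded directly.
Str.calls.clear()
padded = format(s, "14s")
calls_padded = list(Str.calls)

print(repr(no_spec), calls_no_spec)
print(repr(padded), calls_padded)
```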

ChrisA
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/MJEMRGGUMAFP5W2FSRZJXG7MPWNN44GU/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-21 Thread Christopher Barker
On Wed, Dec 21, 2022 at 8:34 AM Jeremiah Paige  wrote:

> That's interesting, for me both 3.9 and 3.10 show the f-string more than
> 5x faster. This is just timeit on f'{myvar}' vs ''.join((myvar,)) so it
> may not be the most nuanced comparison for a class property.
> Probably unsurprisingly having myvar be precomputed as the single tuple
> also gives speedups, around 45% for me.
>

That may be the optimization that 3.11 is doing for you :-)

Now that I think about it, if this is immutable, which it should be, as
it's a str subclass, then perhaps the data string can be pre-computed, as
it was in the original. I liked the property, as philosophically, you don't
want to store the same data twice, but with an immutable, there should be
no danger of it getting out of sync, and it would be faster. (though memory
intensive for large strings).
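
A rough sketch of the pre-computed variant, not the actual prototype: TaggedStr and _data are hypothetical names, and it assumes the plain copy is taken once in __new__ using the "".join trick from earlier in the thread.

```python
class TaggedStr(str):
    """Hypothetical str subclass that stores a plain-str copy of its text once."""

    def __new__(cls, value=""):
        self = super().__new__(cls, value)
        # "".join() builds an exact str from the stored characters
        # without going through __str__().
        self._data = "".join((self,))
        return self

    @property
    def data(self):
        # No work per access; the text is held twice in memory instead.
        return self._data

    def __str__(self):
        return self._data


t = TaggedStr("ham")
print(repr(t.data), type(t.data).__name__, isinstance(t, str))
```

Since the instance is immutable, the stored copy cannot drift from the real contents; the cost is the second copy of the text.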

-CHB






> So if just speed is wanted maybe inject the
> tuple pre-constructed.
>
> ~ Jeremiah
>
> On Wed, Dec 21, 2022 at 1:19 AM Steven D'Aprano 
> wrote:
>
>> On Tue, Dec 20, 2022 at 11:55:49PM -0800, Jeremiah Paige wrote:
>> >     @property
>> >     def data(self):
>> >         return f"{self}"
>>
>> By my testing, on Python 3.10, this is slightly faster still:
>>
>>     @property
>>     def data(self):
>>         return "".join((self,))
>>
>> That's about 14% faster than the f-string version.
>>


Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/FUZ6H6OY4JIJ4CSUUGLDHMILZWU7VXGE/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-21 Thread Christopher Barker
On Wed, Dec 21, 2022 at 1:18 AM Steven D'Aprano  wrote:

> On Tue, Dec 20, 2022 at 11:55:49PM -0800, Jeremiah Paige wrote:
> >     @property
> >     def data(self):
> >         return f"{self}"
>
> By my testing, on Python 3.10, this is slightly faster still:
>
>     @property
>     def data(self):
>         return "".join((self,))
>

I think both of those will call self.__str__, which creates a recursion --
that's what I'm trying to avoid.

I'm sure there are ways to optimize this -- but only worth doing if it's
worth doing at all :-)
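
To make the worry concrete, here is a small sketch assuming a subclass whose __str__ is defined in terms of data. FStringData and JoinData are made-up names for illustration, not the prototype's classes.

```python
class FStringData(str):
    @property
    def data(self):
        # An f-string with no format spec ends up calling __str__()
        # on a str subclass.
        return f"{self}"

    def __str__(self):
        return self.data


class JoinData(str):
    @property
    def data(self):
        # str.join() copies the stored characters directly.
        return "".join((self,))

    def __str__(self):
        return self.data


try:
    FStringData("ham").data
except RecursionError:
    fstring_recursed = True
else:
    fstring_recursed = False

join_result = str(JoinData("ham"))
print(fstring_recursed, repr(join_result))
```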

- CHB



> That's about 14% faster than the f-string version.
>


Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/QQWOF4R7QVN6ARSE4RHAQU5FWRT7AEUR/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-21 Thread Jeremiah Paige
That's interesting, for me both 3.9 and 3.10 show the f-string more than 5x
faster. This is just timeit on f'{myvar}' vs ''.join((myvar,)) so it may not
be the most nuanced comparison for a class property.
Probably unsurprisingly, having myvar be precomputed as the single tuple also
gives speedups, around 45% for me. So if just speed is wanted, maybe inject
the tuple pre-constructed.
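
For anyone who wants to repeat the comparison, a sketch along these lines should do. The names myvar and mytuple and the iteration count are arbitrary choices, and the numbers will vary by machine and Python version, so no expected timings are shown.

```python
import timeit

myvar = "ham"
mytuple = (myvar,)  # the pre-constructed single-item tuple
namespace = {"myvar": myvar, "mytuple": mytuple}
number = 200_000

t_fstring = timeit.timeit("f'{myvar}'", globals=namespace, number=number)
t_join = timeit.timeit("''.join((myvar,))", globals=namespace, number=number)
t_join_pre = timeit.timeit("''.join(mytuple)", globals=namespace, number=number)

print(f"f-string:          {t_fstring:.4f}s")
print(f"join, fresh tuple: {t_join:.4f}s")
print(f"join, prebuilt:    {t_join_pre:.4f}s")
```

Note that this times a plain str, as in the message above; a str subclass with its own __str__ may behave quite differently.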

~ Jeremiah

On Wed, Dec 21, 2022 at 1:19 AM Steven D'Aprano  wrote:

> On Tue, Dec 20, 2022 at 11:55:49PM -0800, Jeremiah Paige wrote:
> >     @property
> >     def data(self):
> >         return f"{self}"
>
> By my testing, on Python 3.10, this is slightly faster still:
>
>     @property
>     def data(self):
>         return "".join((self,))
>
> That's about 14% faster than the f-string version.
>
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/KUNHKJJJTSXNSJRBTGZNIA2TGYM5OE7O/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-21 Thread Steven D'Aprano
On Tue, Dec 20, 2022 at 11:55:49PM -0800, Jeremiah Paige wrote:
>     @property
>     def data(self):
>         return f"{self}"

By my testing, on Python 3.10, this is slightly faster still:

    @property
    def data(self):
        return "".join((self,))

That's about 14% faster than the f-string version.

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/CCZG6ALFEV3B67LENW5ZDJG5XSHKREG4/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-20 Thread Jeremiah Paige
    @property
    def data(self):
        return f"{self}"

Not at a workstation but this should be faster than join() while still
using syntax and not builtin functions.


~ Jeremiah

On Tue, Dec 20, 2022 at 5:38 PM Christopher Barker 
wrote:

> As has been said, a builtin *could* be written that would be "friendly to
> subclassing", by the definition in this thread. (I'll stay out of the
> argument for the moment as to whether that would be better)
>
> I suspect that the reason str acts like it does is that it was originally
> written a LONG time ago, when you couldn't subclass basic built in types at
> all.
>
> Secondarily, it could be a performance tweak -- minimal memory and peak
> performance are pretty critical for strings.
>
> But collections.UserString does exist -- so if you want to subclass, and
> performance isn't critical, then use that. Steven A pointed out that
> UserStrings are not instances of str though. I think THAT is a bug. And
> it's probably that way because with the magic of duck typing, no one cared
> -- but with all the static type hinting going on now, that is a bigger
> liability than it used to be. Also because when it was written, you couldn't
> subclass str.
>
> Though I will note that run-time type checking of string is relatively
> common compared to other types, due to the whole a-str-is-a-sequence-of-str
> issue making the distinction between a sequence of strings and a string
> itself is sometimes needed. And str is rarely duck typed.
>
> If anyone actually has a real need for this I'd post an issue -- it'd be
> interesting if the core devs see this as a bug or a feature (well, probably
> not feature, but maybe missing feature)
>
> OK -- I got distracted and tried it out -- it was pretty easy to update
> UserString to be a subclass of str. I suspect it isn't done that way now
> because it was originally written because you could not subclass str -- so
> it stored an internal str instead.
>
> The really hacky part of my prototype is this:
>
> # self.data is the original attribute for storing the string internally.
> Partly to prevent my having to re-write all the other methods, and partly
> because you get recursion if you try to use the methods on self when
> overriding them ...
>
>     @property
>     def data(self):
>         return "".join(self)
>
> The "".join is because it was the only way I quickly thought of to make a
> native string without invoking the __str__ method and other initialization
> machinery. I wonder if there is another way? Certainly there is in C, but
> in pure Python?
>
> Anyway, after I did that and wrote a __new__ -- the rest of it "just
> worked".
>
>     def __new__(cls, s):
>         return super().__new__(cls, s)
>
> UserString and its subclasses return instances of themselves, and
> instances are instances of str.
>
> Code with a couple asserts in the __main__ block enclosed.
>
> Enjoy!
>
> -CHB
>
> NOTE: VERY minimally tested :-)
>
> On Tue, Dec 20, 2022 at 4:17 PM Chris Angelico  wrote:
>
>> On Wed, 21 Dec 2022 at 09:30, Cameron Simpson  wrote:
>> >
>> > On 19Dec2022 22:45, Chris Angelico  wrote:
>> > >On Mon, 19 Dec 2022 at 22:37, Steven D'Aprano 
>> wrote:
>> > >> > But this much (say with a better validator) gets you static type
>> checking,
>> > >> > syntax highlighting, and inherent documentation of intent.
>> > >>
>> > >> Any half-way decent static type-checker will immediately fail as
>> soon as
>> > >> you call a method on this html string, because it will know that the
>> > >> method returns a vanilla string, not a html string.
>> > >
>> > >But what does it even mean to uppercase an HTML string? Unless you
>> > >define that operation specifically, the most logical meaning is
>> > >"convert it into a plain string, and uppercase that".
>> >
>> > Yes, this was my thought. I've got a few subclasses of builtin types.
>> > They are not painless.
>> >
>> > For HTML "uppercase" is a kind of ok notion because the tags are case
>> > insensitive.
>>
>> Tag names are, but their attributes might not be, so even that might
>> not be safe.
>>
>> > Not the case with, say, XML - my personal nagging example is
>> > from KML (Google map markup dialect) where IIRC a "ScreenOverlay" and a
>> > "screenoverlay" both existing with different semantics. Ugh.
>>
>> Ugh indeed. Why? Why? Why?
>>
>> > So indeed, I'd probably _want_ .upper to return a plain string and have
>> > special methods to do more targeted things as appropriate.
>> >
>>
>> Agreed.
>>
>> ChrisA

[Python-ideas] Re: Idea: Tagged strings in python

2022-12-20 Thread Steven D'Aprano
On Wed, Dec 21, 2022 at 01:18:46AM -0500, David Mertz, Ph.D. wrote:

> I'm on my tablet, so cannot test at the moment. But is `str.upper()` REALLY
> wrong about the Turkish dotless I (and dotted capital I) currently?!

It has to be. Turkic languages like Turkish, Azerbaijani and Tatar 
distinguish dotted and non-dotted I's, leading to a slew of problems 
infamously known as "The Turkish I problem".

(Other languages use undotted i's but not in the same way, e.g. Irish 
roadsigns in Gaelic usually drop the dot to avoid confusion with í. And 
don't confuse the undotted i with the Latin iota ɩ, which is a 
completely different letter to the Greek iota ι. Alphabets are hard.)

In Turkic languages, we have:

Letter:       ı    I    i    İ
----------  ---  ---  ---  ---
Lowercase:    ı    ı    i    i
Uppercase:    I    I    İ    İ

Swapping case can never add or remove a dot. (The technical name for the 
dot is "tittle".) Which is perfectly logical, of course.

But most other people with Latin-based alphabets mix the dotted and 
dotless letters together, leading to this lossy table:

Letter:       ı    I    i    İ
----------  ---  ---  ---  ---
Lowercase:    ı    i    i    i
Uppercase:    I    I    I    İ

which is the official Unicode case conversion, which Python follows.

>>> "ıIiİ".lower()
'ıiii̇'
>>> "ıIiİ".upper()
'IIIİ'

Just to make the Turkish I problem even more exciting, you aren't 
supposed to use Turkish rules when changing the case of foreign proper 
nouns. So the popular children's book "Alice Harikalar Diyarında" (Alice 
in Wonderland) should use *both* sets of rules when uppercasing to give 
"ALICE HARİKALAR DİYARINDA".

Sometimes the dot can be very significant.

https://gizmodo.com/a-cellphones-missing-dot-kills-two-people-puts-three-m-382026


> That feels like a BPO needed if true.

We do whatever the Unicode standard says to do. They say that 
localisation issues are out of scope for Unicode.


-- 
Steve

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/SGQAVETZR6AZ3SS55LNVYL3TLKX6SUZ4/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-20 Thread Chris Angelico
On Wed, 21 Dec 2022 at 17:39, David Mertz, Ph.D.  wrote:
>
> Oh yeah. Good points! Do we need a PEP for str.upper() to grow an optional 
> 'locale' argument? I feel like there are examples other than the Turkish i's 
> where this matters, but it's past my bedtime, so they aren't coming to mind.
>

I don't think str.upper() is the place for it; Python has a locale
module that is a better fit for this. (That's where you'd go if you
want to alphabetize strings with proper respect to language, for
instance.) But it's a difficult problem. Some languages have different
case-folding rules depending on whether you're uppercasing a name or
some other word. German needs to know whether something's a noun,
because even when lowercased, they have an initial capital letter.

The Unicode standard offers a reasonably-generic set of tools,
including for case folding. If you feel like delving deep, the
standard talks about case conversions in section 3.13 - about a
hundred and fifty pages into this document:
https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf - but as far
as I know, that's still locale-agnostic. I think (though I haven't
checked) that Python's str.upper/str.lower follow these rules.

Anything non-generic would be a gigantic task, not well suited to the
core string type, as it would need to be extremely context-sensitive.
Anyone who needs that kind of functionality should probably be
reaching for the locale module for other reasons anyway, so IMO that
would be a better place for a case-conversion toolkit.
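
For reference, the language-aware sorting mentioned above looks roughly like this with the existing locale module. This is a sketch that assumes whatever default locale the environment provides; the word list is made up, and it falls back to the "C" locale if the default cannot be set.

```python
import locale

words = ["cherry", "Apple", "banana"]

# Try to adopt the user's default collation rules; keep the "C" locale
# if the environment does not provide a usable one.
try:
    locale.setlocale(locale.LC_COLLATE, "")
except locale.Error:
    pass

by_code_point = sorted(words)                  # plain code point order
by_locale = sorted(words, key=locale.strxfrm)  # order follows LC_COLLATE

print(by_code_point)
print(by_locale)
```

The locale module has no case-conversion counterpart to strxfrm today, so a case toolkit there would be new functionality.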

ChrisA
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/OEHB7XXMA3O7KKJVQZHS3IYJVEKYKT3P/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-20 Thread David Mertz, Ph.D.
Oh yeah. Good points! Do we need a PEP for str.upper() to grow an optional
'locale' argument? I feel like there are examples other than the Turkish
i's where this matters, but it's past my bedtime, so they aren't coming to
mind.

Maybe Koine Greek which had not adopted the minuscule/majuscule distinction
of post 10th century CE that modern Greek inherited. I feel like
`s.upper(locale='koine')` might sensibly account for this.

On Wed, Dec 21, 2022, 1:23 AM Chris Angelico  wrote:

> On Wed, 21 Dec 2022 at 17:20, David Mertz, Ph.D. 
> wrote:
> >
> > I'm on my tablet, so cannot test at the moment. But is `str.upper()`
> REALLY wrong about the Turkish dotless I (and dotted capital I) currently?!
> >
> > That feels like a BPO needed if true.
>
> It's wrong about the ASCII i and I, which upper and lower case to each
> other. There's no way for str.upper() to be told what language it's
> working with, so it goes with a default that's valid for every
> language except Turkish and its friends. This also means that
> lowercasing "İ" will give "i" which uppercases to "I", so it doesn't
> round-trip. There is no solution other than a language-aware case
> transformation.
>
> ChrisA
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/F3VEWT2QEWJQ2F65EBLNYMW5KOFM7NI5/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-20 Thread Cameron Simpson

On 21Dec2022 17:00, Steven D'Aprano  wrote:

> On Wed, Dec 21, 2022 at 09:42:51AM +1100, Cameron Simpson wrote:
>> With str subtypes, the case that comes to my mind is mixing str
>> subtypes.
> [...]
>> So, yes, for many methods I might reasonably expect a new html(str). But
>> I can contrive situations where I'd want a plain str
>
> The key word there is *contrive*.


Surely.

I think my notion is that most of the ad hoc lexical str methods don't 
know anything about a str-with-special-semantics and therefore may well 
generally want to return a plain str, so it isn't the disastrous 
starting point I think you're suggesting.


Obviously that's a generalisation.


> Obviously there are methods that are expected to return plain old
> strings. If you have a html.extract_content() method which extracts the
> body of the html document as plain text, stripping out all markup, there
> is no point returning a html object and a str will do. But most methods
> will need to keep the markup, and so they will need to return a html
> object.


Hypothetical. I'm not sure I entirely agree.

I think we can both agree there will be methods which _should_ return a 
str and methods which should return the same type as the source object.  
How the mix plays out depends on the class.



> [...] The status quo mostly hurts *lightweight* subclasses:
>
>     class TurkishString(str):
>         def upper(self):
>             return TurkishString(str.upper(self.replace('i', 'İ')))
>         def lower(self):
>             return TurkishString(str.lower(self.replace('I', 'ı')))
>
> That's fine so long as the *only* operations you do to a TurkishString
> is upper or lower. As soon as you do concatenation, substring
> replacement, stripping, joining, etc you get a regular string.
>
> So we've gone from a lightweight subclass that needs to override two
> methods, to a heavyweight subclass that needs to override 30+ methods.


I think __getattribute__ may be the go here. There's a calling cost of 
course, but you could fairly easily write a __getattribute__ which (a) 
checked for a superclass matching method and (b) special cases a few 
methods, and otherwise made all methods return either the same class 
(TurkishString) or plain str depending on the majority method flavour.


In fact, if I were doing this for real I might make a mixin or 
intermediate class with such a __getattribute__, provided there was a 
handy TurkishString(str)-like call to promote a plain str back into the 
subclass. (My personal preference is solidifying to a .promote(anything) 
method, which is a discussion for elsewhere.)



> This is probably why we don't rely on subclassing that much. Easier to
> just write a top-level function and forget about subclassing.


Oooh, I do a _lot_ of subclassing :-)

Cheers,
Cameron Simpson 
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/HSQM2AGL4SYCLVVB3S4FHHR2SHCKO5A5/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-20 Thread Chris Angelico
On Wed, 21 Dec 2022 at 17:03, Steven D'Aprano  wrote:
> The status quo mostly hurts *lightweight*
> subclasses:
>
>     class TurkishString(str):
>         def upper(self):
>             return TurkishString(str.upper(self.replace('i', 'İ')))
>         def lower(self):
>             return TurkishString(str.lower(self.replace('I', 'ı')))
>
> That's fine so long as the *only* operations you do to a TurkishString
> is upper or lower. As soon as you do concatenation, substring
> replacement, stripping, joining, etc you get a regular string.
>

Also not a great example, honestly. Part of the problem is that there
*are no good examples*. You need something that subclasses a core data
type, does not change its constructor in any way, and needs to always
get back another of itself when any method is called. But in every
other way, it is its superclass. I think defaultdict comes close, but
it changes the constructor's signature; StrEnum and IntEnum come very
close, but apparently they're just special cases and can be written
off as irrelevant; there really aren't that many situations where this
even comes up.

Part of the problem is that it's really not clear which methods should
return "the same type" and which should return a core data type.
Clearly __len__ on a string should return a vanilla integer,
regardless of the precise class of string; but should bit_length on an
integer return an int of the subclass, or a plain integer? Is it
different just because it happens to return the same data type on a
vanilla int? What about as_integer_ratio() - should that return a
tuple of two of the same type, or should it return (self, 1) ?

Remember that you're asking the int type to make these decisions globally.
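
For reference, a quick sketch of what the status quo does for a bare int subclass. Metres is a made-up name, and as_integer_ratio() on int needs Python 3.8 or later.

```python
class Metres(int):
    """Hypothetical lightweight int subclass with no overridden methods."""


n = Metres(5)

total = n + 1                 # arithmetic hands back a plain int
bits = n.bit_length()         # also a plain int
ratio = n.as_integer_ratio()  # a tuple of plain ints

print(type(total).__name__, bits, ratio, type(ratio[0]).__name__)
```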

ChrisA
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/NSPAI3QBSOYTL5BHAJR23T5YJOU3PVK2/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-20 Thread Chris Angelico
On Wed, 21 Dec 2022 at 17:20, David Mertz, Ph.D.  wrote:
>
> I'm on my tablet, so cannot test at the moment. But is `str.upper()` REALLY 
> wrong about the Turkish dotless I (and dotted capital I) currently?!
>
> That feels like a BPO needed if true.

It's wrong about the ASCII i and I, which upper and lower case to each
other. There's no way for str.upper() to be told what language it's
working with, so it goes with a default that's valid for every
language except Turkish and its friends. This also means that
lowercasing "İ" will give "i" which uppercases to "I", so it doesn't
round-trip. There is no solution other than a language-aware case
transformation.
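
A short sketch of the round-trip problem in terms of code points, assuming current CPython behaviour. Strictly speaking, lowercasing the dotted capital gives "i" followed by a combining dot above (U+0307) rather than a bare "i", but it still does not come back as the original character.

```python
dotless = "\u0131"     # ı  LATIN SMALL LETTER DOTLESS I
dotted_cap = "\u0130"  # İ  LATIN CAPITAL LETTER I WITH DOT ABOVE

# ASCII i and I map to each other regardless of language.
ascii_pair = ("i".upper(), "I".lower())

# Dotless ı uppercases to ASCII I, which lowercases to dotted i.
dotless_round_trip = dotless.upper().lower()

# Dotted İ lowercases to two code points: "i" plus U+0307.
lowered = dotted_cap.lower()
dotted_round_trip = lowered.upper()

print(ascii_pair)
print([hex(ord(c)) for c in dotless_round_trip])
print([hex(ord(c)) for c in lowered])
print([hex(ord(c)) for c in dotted_round_trip])
```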

ChrisA
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/P7VDKYKVEHYQT4HKQJPLSCZIKVVWYF46/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-20 Thread David Mertz, Ph.D.
I'm on my tablet, so cannot test at the moment. But is `str.upper()` REALLY
wrong about the Turkish dotless I (and dotted capital I) currently?!

That feels like a BPO needed if true.

On Wed, Dec 21, 2022, 1:04 AM Steven D'Aprano  wrote:

> On Wed, Dec 21, 2022 at 09:42:51AM +1100, Cameron Simpson wrote:
>
> > With str subtypes, the case that comes to my mind is mixing str
> > subtypes.
> [...]
> > So, yes, for many methods I might reasonably expect a new html(str). But
> > I can contrive situations where I'd want a plain str
>
> The key word there is *contrive*.
>
> Obviously there are methods that are expected to return plain old
> strings. If you have a html.extract_content() method which extracts the
> body of the html document as plain text, stripping out all markup, there
> is no point returning a html object and a str will do. But most methods
> will need to keep the markup, and so they will need to return a html
> object.
>
> HTML is probably not the greatest example for this issue, because I
> expect that a full-blown HTML string subclass would probably have to
> override nearly all methods, so in this *specific* case the status quo
> is probably fine in practice. The status quo mostly hurts *lightweight*
> subclasses:
>
>     class TurkishString(str):
>         def upper(self):
>             return TurkishString(str.upper(self.replace('i', 'İ')))
>         def lower(self):
>             return TurkishString(str.lower(self.replace('I', 'ı')))
>
> That's fine so long as the *only* operations you do to a TurkishString
> is upper or lower. As soon as you do concatenation, substring
> replacement, stripping, joining, etc you get a regular string.
>
> So we've gone from a lightweight subclass that needs to override two
> methods, to a heavyweight subclass that needs to override 30+ methods.
>
> This is probably why we don't rely on subclassing that much. Easier to
> just write a top-level function and forget about subclassing.
>
>
> --
> Steve
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/DXKVTYFTVKM6C2QO4DVFMP7R6XXJCQMF/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-20 Thread Steven D'Aprano
On Wed, Dec 21, 2022 at 09:42:51AM +1100, Cameron Simpson wrote:

> With str subtypes, the case that comes to my mind is mixing str 
> subtypes.
[...]
> So, yes, for many methods I might reasonably expect a new html(str). But 
> I can contrive situations where I'd want a plain str

The key word there is *contrive*.

Obviously there are methods that are expected to return plain old 
strings. If you have a html.extract_content() method which extracts the 
body of the html document as plain text, stripping out all markup, there 
is no point returning a html object and a str will do. But most methods 
will need to keep the markup, and so they will need to return a html 
object.

HTML is probably not the greatest example for this issue, because I 
expect that a full-blown HTML string subclass would probably have to 
override nearly all methods, so in this *specific* case the status quo 
is probably fine in practice. The status quo mostly hurts *lightweight* 
subclasses:

    class TurkishString(str):
        def upper(self):
            return TurkishString(str.upper(self.replace('i', 'İ')))
        def lower(self):
            return TurkishString(str.lower(self.replace('I', 'ı')))

That's fine so long as the *only* operations you do to a TurkishString 
is upper or lower. As soon as you do concatenation, substring 
replacement, stripping, joining, etc you get a regular string.

So we've gone from a lightweight subclass that needs to override two 
methods, to a heavyweight subclass that needs to override 30+ methods.
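
A minimal sketch of how quickly the subclass type is lost, reusing the TurkishString class above. The sample word and the particular operations tried are only illustrations:

```python
class TurkishString(str):
    def upper(self):
        return TurkishString(str.upper(self.replace('i', 'İ')))

    def lower(self):
        return TurkishString(str.lower(self.replace('I', 'ı')))


s = TurkishString("istanbul")

# The two overridden methods keep the subclass ...
print(type(s.upper()).__name__)    # TurkishString

# ... but every inherited operation hands back a plain str.
for result in (s + "!", s.strip(), s.replace("a", "e"), s[1:]):
    print(type(result).__name__)   # str each time
```

Each of those inherited operations would need its own override before the class could be passed around freely.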

This is probably why we don't rely on subclassing that much. Easier to 
just write a top-level function and forget about subclassing.


-- 
Steve
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/Q6JQVEUAQXGX6EMAFVGYGGF7ZENUSMRP/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-20 Thread Christopher Barker
On Tue, Dec 20, 2022 at 8:21 PM Stephen J. Turnbull wrote:

> > UserStrings are not instances of str though. I think THAT is a bug.
>
> I guess, although surely the authors of that class thought about it.


Well, kind of — the entire reason for UserString was that at the time, str
itself could not be subclassed. So it was certainly a feature at the time
;-)

The question is whether anyone thought about it again later, and the docs
seem to indicate not:

    UserString objects

    The class, UserString, acts as a wrapper around string objects. The need
    for this class has been partially supplanted by the ability to subclass
    directly from str; however, this class can be easier to work with because
    the underlying string is accessible as an attribute.

And it has no docstrings at all -- it doesn't strike me that anyone is
putting any thought into carefully maintaining it.

> Anyway, this could probably be improved with a StringLike ABC


I'm not so sure -- in many cases, the underlying C implementation is
critical -- and strings are one of those things that generally aren't
duck-typed -- subclassing is a special case of that.

Anyway -- I've only gotten this far 'cause it caught my interest -- but I
have no need for subclassing strings -- but if someone does, I think it
would be worth at least bringing up with the core devs.

-CHB
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/U6PT3QA4MHXGE2ZIELUM6PK37HM3TICK/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-20 Thread Stephen J. Turnbull
Christopher Barker writes:

 > But collections.UserString does exist -- so if you want to subclass, and
 > performance isn't critical, then use that. Steven A pointed out that
 > UserStrings are not instances of str though. I think THAT is a bug.

I guess, although surely the authors of that class thought about it.

Anyway, this could probably be improved with a StringLike ABC (and we
get to bikeshed whether bytes and bytecode are StringLike -- see ya,
I'm outttahere!)
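
For concreteness, a rough sketch of what such an ABC might look like. StringLike is hypothetical (nothing by that name exists in the standard library), and the shout() helper is only an illustration:

```python
from abc import ABC
from collections import UserString


class StringLike(ABC):
    """Hypothetical ABC; nothing by this name exists in the stdlib."""


# Virtual subclass registration: no inheritance, only isinstance() support.
StringLike.register(str)
StringLike.register(UserString)


def shout(text):
    if not isinstance(text, StringLike):
        raise TypeError(f"expected a string-like object, got {type(text).__name__}")
    return str(text).upper()


print(shout("spam"))                    # SPAM
print(shout(UserString("eggs")))        # EGGS
print(isinstance(b"ham", StringLike))   # False
```

Registration only changes what isinstance() reports; code that needs a real str underneath would presumably still reject a UserString.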
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/2M3UWBJDFOROYURIAWTIZ23WRVLIWHHG/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-20 Thread Christopher Barker
On Tue, Dec 20, 2022 at 6:20 PM Lucas Wiman  wrote:

> On Tue, Dec 20, 2022 at 5:38 PM Christopher Barker 
> wrote:
>
>> But collections.UserString does exist -- so if you want to subclass, and
>> performance isn't critical, then use that. Steven A pointed out that
>> UserStrings are not instances of str though. I think THAT is a bug. And
>> it's probably that way because with the magic of duck typing, no one cared
>> -- but with all the static type hinting going on now, that is a bigger
>> liability than it used to be. Also because when it was written, you couldn't
>> subclass str.
>>
>> Though I will note that run-time type checking of string is relatively
>> common compared to other types, due to the whole a-str-is-a-sequence-of-str
>> issue making the distinction between a sequence of strings and a string
>> itself is sometimes needed. And str is rarely duck typed.
>>
>
> Note that UserString does break some built-in functionality, like you
> can't apply regular expressions to a UserString:
> >>> class FooString(UserString):
> ...     pass
> ...
> >>> re.compile(r"asdf").match(FooString("asdf"))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: expected string or bytes-like object, got 'FooString'
>
>
> There is more discussion in this thread (
> https://stackoverflow.com/questions/59756050/python3-when-userstring-does-not-behave-as-a-string),
> including a link to a very old bug (https://bugs.python.org/issue232493).
>

I wonder how many of these issues would go away if UserString subclassed
from str. Maybe some?

But at the C level, duck typing simply doesn't work -- you need access to
an actual C string struct. Code that worked with strings *could* have a
little bit of wrapper for subclasses that would dig into it to find the
actual str underneath -- but if that code had to be written everywhere
strings are used in C -- that could be a pretty big project -- probably
what Guido meant by:

"Fixing this will be a major project, probably for Python 3000k"

I don't suppose it has been addressed at all?

Note: at least for string paths, the builtins all use fspath() (or
something) so that should be easy to make work. (and seems to with my
prototype already)

> There is a related issue with json.dump etc,

json.dump works with my prototype as well.

-CHB


-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/62JRML5OPTLW653RSN6GZD4ZE3TCZ572/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-20 Thread Lucas Wiman
On Tue, Dec 20, 2022 at 5:38 PM Christopher Barker 
wrote:

> But collections.UserString does exist -- so if you want to subclass, and
> performance isn't critical, then use that. Steven A pointed out that
> UserStrings are not instances of str though. I think THAT is a bug. And
> it's probably that way because with the magic of duck typing, no one cared
> -- but with all the static type hinting going on now, that is a bigger
> liability than it used to be. Also because when it was written, you couldn't
> subclass str.
>
> Though I will note that run-time type checking of string is relatively
> common compared to other types, due to the whole a-str-is-a-sequence-of-str
> issue making the distinction between a sequence of strings and a string
> itself is sometimes needed. And str is rarely duck typed.
>

Note that UserString does break some built-in functionality, like you can't
apply regular expressions to a UserString:
>>> import re
>>> from collections import UserString
>>> class FooString(UserString):
...     pass
...
>>> re.compile(r"asdf").match(FooString("asdf"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: expected string or bytes-like object, got 'FooString'


There is more discussion in this thread (
https://stackoverflow.com/questions/59756050/python3-when-userstring-does-not-behave-as-a-string),
including a link to a very old bug (https://bugs.python.org/issue232493).
There is a related issue with json.dump etc, though it can be worked around
since there is a python-only json implementation.
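
A short sketch of both failures and of one way around them, assuming it is acceptable to hand the wrapped str to the consumer. Note that the json workaround here uses the default= hook, not the python-only implementation mentioned above:

```python
import json
import re
from collections import UserString


class FooString(UserString):
    pass


foo = FooString("asdf")
print(isinstance(foo, str))    # False

# C-implemented consumers want a real str; .data is the wrapped one.
print(re.compile(r"asdf").match(foo.data) is not None)   # True

try:
    json.dumps(foo)
except TypeError as exc:
    print("json.dumps failed:", exc)

# default= is called for objects json cannot serialize; str() unwraps here.
print(json.dumps({"key": foo}, default=str))   # {"key": "asdf"}
```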

I have run into this in practice at a previous job, with a runtime "taint"
tracker for logging access to certain database fields in a Django
application. Many views would select all fields from a table, then not
actually use the fields I needed to log access to, which generated false
positives. (Obviously the "correct" design is to only select data that is
relevant for the given code, but I was instrumenting a legacy codebase with
updated compliance requirements.) So I think there is some legitimate use
for this, though object proxies can be made to work around most of the
issues.

- Lucas
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/DZQNWKTSUTKRMA4WFXVERW46AMPYDEAX/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-20 Thread Christopher Barker
As has been said, a builtin *could* be written that would be "friendly to
subclassing", by the definition in this thread. (I'll stay out of the
argument for the moment as to whether that would be better)

I suspect that the reason str acts like it does is that it was originally
written a LONG time ago, when you couldn't subclass basic built in types at
all.

Secondarily, it could be a performance tweak -- minimal memory and peak
performance are pretty critical for strings.

But collections.UserString does exist -- so if you want to subclass, and
performance isn't critical, then use that. Steven A pointed out that
UserStrings are not instances of str though. I think THAT is a bug. And
it's probably that way because with the magic of duck typing, no one cared
-- but with all the static type hinting going on now, that is a bigger
liability than it used to be. Also because when it was written, you couldn't
subclass str.

Though I will note that run-time type checking of strings is relatively
common compared to other types, because of the whole
a-str-is-a-sequence-of-str issue: distinguishing between a sequence of
strings and a string itself is sometimes needed. And str is rarely duck typed.
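
To illustrate the sort of run-time check meant here, a small sketch; the as_name_list() helper is made up for the example:

```python
from collections import UserString


def as_name_list(names):
    """Accept one name or an iterable of names; always return a list."""
    if isinstance(names, str):
        # A lone string is one name, not a sequence of one-letter names.
        return [names]
    return list(names)


print(as_name_list("spam"))             # ['spam']
print(as_name_list(["spam", "eggs"]))   # ['spam', 'eggs']

# UserString is not a str instance, so it is split into characters.
print(len(as_name_list(UserString("spam"))))   # 4
```

The last call is where a UserString quietly falls through the check.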

If anyone actually has a real need for this I'd post an issue -- it'd be
interesting if the core devs see this as a bug or a feature (well, probably
not feature, but maybe missing feature)

OK -- I got distracted and tried it out -- it was pretty easy to update
UserString to be a subclass of str. I suspect it isn't done that way now
because it was originally written because you could not subclass str -- so
it stored an internal str instead.

The really hacky part of my prototype is this:

    # self.data is the original attribute for storing the string internally.
    # Partly to prevent my having to re-write all the other methods, and partly
    # because you get recursion if you try to use the methods on self when
    # overriding them ...

    @property
    def data(self):
        return "".join(self)

The "".join is because it was the only way I quickly thought of to make a
native string without invoking the __str__ method and other initialization
machinery. I wonder if there is another way? Certainly there is in C, but
in pure Python?

Anyway, after I did that and wrote a __new__ -- the rest of it "just
worked".

    def __new__(cls, s):
        return super().__new__(cls, s)

UserString and its subclasses return instances of themselves, and instances
are instances of str.
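
Put together, the approach looks roughly like the sketch below. This is not the enclosed prototype: TaggedStr and Html are made-up names, and only two methods are overridden to keep it short.

```python
class TaggedStr(str):
    """Hypothetical stand-in for the prototype: a str subclass with .data."""

    def __new__(cls, s=""):
        return super().__new__(cls, s)

    @property
    def data(self):
        # Joining the characters builds an exact str without going
        # through __str__.
        return "".join(self)

    def upper(self):
        return type(self)(self.data.upper())

    def __add__(self, other):
        return type(self)(self.data + str(other))


class Html(TaggedStr):
    pass


h = Html("<b>hi</b>")
print(isinstance(h, str))         # True
print(type(h.data).__name__)      # str
print(type(h.upper()).__name__)   # Html
print(type(h + "!").__name__)     # Html
```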

Code with a couple asserts in the __main__ block enclosed.

Enjoy!

-CHB

NOTE: VERY minimally tested :-)

On Tue, Dec 20, 2022 at 4:17 PM Chris Angelico  wrote:

> On Wed, 21 Dec 2022 at 09:30, Cameron Simpson  wrote:
> >
> > On 19Dec2022 22:45, Chris Angelico  wrote:
> > >On Mon, 19 Dec 2022 at 22:37, Steven D'Aprano 
> wrote:
> > >> > But this much (say with a better validator) gets you static type
> checking,
> > >> > syntax highlighting, and inherent documentation of intent.
> > >>
> > >> Any half-way decent static type-checker will immediately fail as soon
> as
> > >> you call a method on this html string, because it will know that the
> > >> method returns a vanilla string, not a html string.
> > >
> > >But what does it even mean to uppercase an HTML string? Unless you
> > >define that operation specifically, the most logical meaning is
> > >"convert it into a plain string, and uppercase that".
> >
> > Yes, this was my thought. I've got a few subclasses of builtin types.
> > They are not painless.
> >
> > For HTML "uppercase" is a kind of ok notion because the tags are case
> > insensitive.
>
> Tag names are, but their attributes might not be, so even that might
> not be safe.
>
> > Not the case with, say, XML - my personal nagging example is
> > from KML (Google map markup dialect) where IIRC a "ScreenOverlay" and a
> > "screenoverlay" both exist with different semantics. Ugh.
>
> Ugh indeed. Why? Why? Why?
>
> > So indeed, I'd probably _want_ .upper to return a plain string and have
> > special methods to do more targeted things as appropriate.
> >
>
> Agreed.
>
> ChrisA


-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
"""
A UserString implementation that subclasses from str

so instances of it and its subclasses are instances of string
 -- could be handy for using with static typing.

NOTE: this could probably be cleaner code, but this was done with
  an absolute minimum of changes from what's in the 

[Python-ideas] Re: Idea: Tagged strings in python

2022-12-20 Thread Chris Angelico
On Wed, 21 Dec 2022 at 09:30, Cameron Simpson  wrote:
>
> On 19Dec2022 22:45, Chris Angelico  wrote:
> >On Mon, 19 Dec 2022 at 22:37, Steven D'Aprano  wrote:
> >> > But this much (say with a better validator) gets you static type 
> >> > checking,
> >> > syntax highlighting, and inherent documentation of intent.
> >>
> >> Any half-way decent static type-checker will immediately fail as soon as
> >> you call a method on this html string, because it will know that the
> >> method returns a vanilla string, not a html string.
> >
> >But what does it even mean to uppercase an HTML string? Unless you
> >define that operation specifically, the most logical meaning is
> >"convert it into a plain string, and uppercase that".
>
> Yes, this was my thought. I've got a few subclasses of builtin types.
> They are not painless.
>
> For HTML "uppercase" is a kind of ok notion because the tags are case
> insensitive.

Tag names are, but their attributes might not be, so even that might
not be safe.

> Not the case with, say, XML - my personal nagging example is
> from KML (Google map markup dialect) where IIRC a "ScreenOverlay" and a
> "screenoverlay" both exist with different semantics. Ugh.

Ugh indeed. Why? Why? Why?

> So indeed, I'd probably _want_ .upper to return a plain string and have
> special methods to do more targeted things as appropriate.
>

Agreed.

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/T7FZ3FIA6INMHQIRVZ3ZZJC6UAQQCFOI/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-20 Thread Cameron Simpson

On 20Dec2022 20:16, Steven D'Aprano  wrote:

>On Mon, Dec 19, 2022 at 05:53:38PM -0800, Ethan Furman wrote:
>> Personally, every other time I've wanted to subclass a built-in data
>> type, I've wanted the built-in methods to return my subclass, not the
>> original class.
>
>Enums are special. But outside of enums, I cannot think of any useful
>situation where the desirable behaviour is for methods on a subclass to
>generally return a superclass rather than the type of self.


With str subtypes, the case that comes to my mind is mixing str 
subtypes.


I happen to be wallowing in Django admin forms at the moment, and they 
have a mark_safe(some_html_here) function, which seems to return a str 
subtype (I infer - it's what I would be doing) - this is used in the 
templating engine to know that it _doesn't_ need to escape markup 
punctuation at render time. The default is escaping, to avoid accidental 
injection.


So...

I'd want an "html" str to support, say, addition to construct longer 
strings. html(str)+str should make a new html str with the plain str 
escaped. html(str)+css(str) should raise a TypeError. Etc etc.
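
A rough sketch of the addition rules just described. Html and Css are hypothetical classes written for this example (this is not how Django's mark_safe is implemented), and html.escape from the standard library stands in for whatever escaping the templating engine would use:

```python
from html import escape


class Html(str):
    """Hypothetical tagged string holding markup that is already safe."""

    def __add__(self, other):
        if isinstance(other, Html):
            return Html(str.__add__(self, other))
        if type(other) is str:
            # Plain text is escaped before it joins the markup.
            return Html(str.__add__(self, escape(other)))
        raise TypeError(f"cannot add {type(other).__name__} to Html")


class Css(str):
    """Hypothetical tag for stylesheet text."""


page = Html("<p>") + "1 < 2" + Html("</p>")
print(page)                   # <p>1 &lt; 2</p>
print(type(page).__name__)    # Html

try:
    Html("<p>") + Css("p { color: red }")
except TypeError as exc:
    print(exc)                # cannot add Css to Html
```

A plain str on the left-hand side is not covered here; that would need an __radd__ as well.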


html(str).upper() might uppercase only the bits outside the tags i.e.  
"foo" -> "FOO".


So, yes, for many methods I might reasonably expect a new html(str). But 
I can contrive situations where I'd want a plain str, and I'd be leery 
of "every method returns html(str)" by default - because such a string 
has substructure that seems to warrant careful thought.


Cheers,
Cameron Simpson 
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/CDQVYC2B24IUPNAF4CN5OVTURDVZJVDB/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-20 Thread Cameron Simpson

On 19Dec2022 22:45, Chris Angelico  wrote:

>On Mon, 19 Dec 2022 at 22:37, Steven D'Aprano  wrote:
>> > But this much (say with a better validator) gets you static type checking,
>> > syntax highlighting, and inherent documentation of intent.
>>
>> Any half-way decent static type-checker will immediately fail as soon as
>> you call a method on this html string, because it will know that the
>> method returns a vanilla string, not a html string.
>
>But what does it even mean to uppercase an HTML string? Unless you
>define that operation specifically, the most logical meaning is
>"convert it into a plain string, and uppercase that".


Yes, this was my thought. I've got a few subclasses of builtin types.  
They are not painless.


For HTML "uppercase" is a kind of ok notion because the tags are case 
insensitive. Not the case with, say, XML - my personal nagging example is 
from KML (Google map markup dialect) where IIRC a "ScreenOverlay" and a 
"screenoverlay" both exist with different semantics. Ugh.


So indeed, I'd probably _want_ .upper to return a plain string and have 
special methods to do more targeted things as appropriate.


Cheers,
Cameron Simpson 
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ATFPQTSMGIODXXZA72YFHQULHN3OGR6U/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-20 Thread Chris Angelico
On Tue, 20 Dec 2022 at 20:20, Steven D'Aprano  wrote:
>
> On Mon, Dec 19, 2022 at 05:53:38PM -0800, Ethan Furman wrote:
>
> > Personally, every other time I've wanted to subclass a built-in data type,
> > I've wanted the built-in methods to return my subclass, not the original
> > class.
>
> Enums are special. But outside of enums, I cannot think of any useful
> situation where the desirable behaviour is for methods on a subclass to
> generally return a superclass rather than the type of self.
>
> It's normal behaviour for operations on a class K to return K instances,
> not some superclass of K. I dare say there are a few, but they don't
> come to mind.

How should it do that, if the constructor for K has a different
signature from the constructor for K's superclass that is providing
the method? How is the superclass to know how to return a K?

Should the vanilla dict.__or__ method be able to take two defaultdicts
and return a defaultdict, or is it reasonable to demand that, in this
situation, defaultdict needs to define the method itself?

>>> defaultdict(list) | defaultdict(list)
defaultdict(<class 'list'>, {})
>>> defaultdict.__or__
<slot wrapper '__or__' of 'collections.defaultdict' objects>

I'm not sure how dict.__or__ would be expected to cope with this situation.

Yes, I'm sure it would be convenient. It would also have some
extremely annoying consequences.
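
To make the constructor problem concrete, a small sketch. PlainSub, Labelled and naive_or() are hypothetical; naive_or() stands in for a "subclass-friendly" dict.__or__ that rebuilds its result with type(left). The `|` operator on dicts needs Python 3.9 or later.

```python
from collections import defaultdict


class PlainSub(dict):
    """A dict subclass that overrides nothing."""


class Labelled(dict):
    """Hypothetical subclass whose constructor takes an extra argument."""

    def __init__(self, label, items=()):
        super().__init__(items)
        self.label = label


def naive_or(left, right):
    """Hypothetical 'subclass-friendly' merge that rebuilds via type(left)."""
    return type(left)({**left, **right})


# Today: dict.__or__ gives a plain dict, and defaultdict gets a
# defaultdict only because it defines __or__ itself.
print(type(PlainSub(a=1) | PlainSub(b=2)).__name__)           # dict
print(type(defaultdict(list) | defaultdict(list)).__name__)   # defaultdict

# The naive version passes the merged mapping as `label` and so
# quietly builds an empty Labelled.
broken = naive_or(Labelled("a", {"x": 1}), Labelled("b", {"y": 2}))
print(len(broken), broken.label)    # 0 {'x': 1, 'y': 2}
```

No exception is raised in the last case, which is arguably worse than a crash.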

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/OR7C52ZC2CN7SCPEGCNBSKFODV5OTFZD/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-20 Thread Steven D'Aprano
On Mon, Dec 19, 2022 at 05:53:38PM -0800, Ethan Furman wrote:

> Personally, every other time I've wanted to subclass a built-in data type, 
> I've wanted the built-in methods to return my subclass, not the original 
> class.

Enums are special. But outside of enums, I cannot think of any useful 
situation where the desirable behaviour is for methods on a subclass to 
generally return a superclass rather than the type of self.

It's normal behaviour for operations on a class K to return K instances, 
not some superclass of K. I dare say there are a few, but they don't 
come to mind.


> All of which is to say:  sometimes you want it one way, sometimes the 
> other.  ;-)

Yes, but one way is *overwhelmingly* more common than the other. 
Builtins make the rare form easy and the common form hard.

> Metaclasses, anyone?

Oh gods, we shouldn't need to write a metaclass just to get methods that 
create instances of the calling class instead of one of its 
superclasses.
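
For what it's worth, something close to this can be had without a metaclass. The sketch below is only an illustration: AutoStr and _rewrap are made-up names, and the list of wrapped methods is deliberately partial.

```python
import functools

# A deliberately partial list of str methods that return a new string.
_REWRAPPED = ("__add__", "__mul__", "__getitem__", "capitalize", "lower",
              "upper", "title", "strip", "replace")


def _rewrap(method):
    """Wrap a str method so plain-str results become type(self)."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        result = method(self, *args, **kwargs)
        return type(self)(result) if type(result) is str else result
    return wrapper


class AutoStr(str):
    """Hypothetical base class; subclasses get results of their own type."""


for _name in _REWRAPPED:
    setattr(AutoStr, _name, _rewrap(getattr(str, _name)))


class TurkishString(AutoStr):
    def upper(self):
        return type(self)(str.upper(self.replace('i', 'İ')))


t = TurkishString("izmir")
print(type(t + "!").__name__)     # TurkishString
print(type(t.strip()).__name__)   # TurkishString
print(t.upper())                  # İZMİR
```

It assumes the subclass constructor accepts a single string argument, and any method missing from the list still returns a plain str.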


-- 
Steve
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/TOBBYPYOYBJV2FBC6PQFKZMNK46JCT3Y/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-19 Thread Stephen J. Turnbull
Brendan Barnwell writes:

 >  What it means for me for something to "be an HTML string" (or more 
 > precisely, to be an instance of HTMLString or whatever the class name 
 > is) is for it to be a string that has an extra tag attached to the 
 > object that means "this is HTML".

I don't like tags that lie.  Seems pointless (see below).

 > The point is that overrides are for specifying the *new* behavior
 > of the subclass (i.e., not allowing certain slice operations); you
 > shouldn't have to override methods just to retain the superclass
 > behavior.

Do you mean "retain the subclass behavior" here?  AFAICS what's being
called "hostile" is precisely retaining *superclass* behavior.

 >  I mean, we were talking about this in the context of syntax 
 > highlighting.  The utility of HTML-string highlighting would be 
 > seriously reduced if only *valid* HTML could be in an HTML string.

The proposed HTMLstring *class* is irrelevant to syntax highlighting,
regardless of its functionality.  The OP (and his syntax-highlighting
text editor!) wants standard literal syntax *in source code* that
allows an editor-that-is-not-as-programmable-as-emacs-or-vim to
recognize a fragment of text (typically in a literal string) that is
supposed to be highlighted as HTML.  Syntax highlighting is not aided
by an HTMLstring object in the *running Python program*.

I really don't understand what value your HTMLstring as str + tag
provides to the OP, or to a Python program.  I guess that an editor
written in Python could manipulate a list of TaggedString objects,
but this is a pretty impoverished model.  Emacsen have had extents/
overlays since 1990 or so, which can be nested or overlap, and nesting
and overlapping are both needed for source code highlighting.[1][2]

I don't take a position on the "builtins are hostile to subclassing"
debate.  I can't recall ever noticing the problem, so I'll let you all
handle that. :-)


Footnotes: 
[1]  In Emacsen, tagged source text (overlays) is used not only for
syntax highlighting which presumably is nested (but TagSoup HTML!),
but also to implement things like hiding text, which is an operation
on raw text that need not respect any syntax.

[2]  XEmacs's implementation of syntax highlighting actually works in
terms of "extent fragments" which are non-overlapping, but they're
horrible to work with from an editor API standpoint.  They're used only
in the implementation of the GUI display, for performance reasons, and
each one typically contains a plethora of tags.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/UVQ6PGUKF5EG6UZWOBI76ZQANNFVC5TS/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-19 Thread Chris Angelico
On Tue, 20 Dec 2022 at 13:56, Brendan Barnwell  wrote:
>
> On 2022-12-19 13:59, Chris Angelico wrote:
> > On Tue, 20 Dec 2022 at 07:13, Brendan Barnwell  
> > wrote:
> >>   > See my example regarding a StrEnum and tell me whether that would be
> >>   > more irritating.
> >>
> >>  I can't run that example myself as I don't have Python 3.11 set 
> >> up.
> >
> > The enum module was added in Python 3.4.
>
> Your example used StrEnum, which was added in Python 3.11.

Oh! My apologies. The older way of spelling it uses multiple inheritance
but comes to the same thing; it's still very definitely a string.
StrEnum is a lot more convenient, and I've been using 3.11 for long
enough that I forgot when it came in. Even back in 3.5 (the oldest
docs that I have handy), the notion of enum MI was listed as a
recommended method:

https://docs.python.org/3.5/library/enum.html#others

>>> class Demo(str, Enum):
...     x = "eggs"
...     m = "ham"

Other than that change to the signature, the demonstration behaves
exactly the same (I just tested it on 3.5). Again, my apologies for
unintentionally providing an example that works only on very new
Pythons.
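
A quick sketch of what "very definitely a string" means in practice, using the same Demo class; the particular checks are only illustrations:

```python
from enum import Enum


class Demo(str, Enum):
    x = "eggs"
    m = "ham"


print(isinstance(Demo.x, str))         # True
print(Demo.x == "eggs")                # True
print(type(Demo.x.upper()).__name__)   # str

# If upper() tried to rebuild its result as a Demo, the lookup would fail,
# because "EGGS" is not the value of any member.
try:
    Demo(Demo.x.upper())
except ValueError as exc:
    print(exc)
```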

> > Nonetheless, a StrEnum is absolutely a str, and whatever you say about
> > an HTML string has to also be valid for a StrEnum, or else the inverse
> > is.
>
> No, it doesn't, because HTMLString and StrEnum can be different
> subclasses of str with different behavior.  You seem to be missing the
> concept of subclasses here.  Yes, a StrEnum may be an instance of str,
> and an HTMLString may also be an instance of str.  But that does not
> mean the behavior of both needs to be same.  They are instances of
> *different subclasses* of str and can have *different behavior*.  An
> instance of collections.Counter is an instance of dict and so is an
> instance of collections.defaultdict, but that doesn't mean that anything
> I say about a Counter has to be valid for a defaultdict.

That is very true, but whenever the subclass should NOT behave the same as
the superclass, you provide the functionality that makes it differ. Otherwise, the normal
assumption should be that it behaves identically. For instance, if you
iterate over a Counter, you would expect to get all of the keys in it;
it's true that you can subscript it with any value and get back a
zero, but the default behaviour of Counter iteration is to do the same
thing that a dict would.

And that's what we generally see. A StrEnum is a str, and any
behaviours that aren't the same as str are provided by StrEnum (for
instance, it has a different __repr__). But for anything that isn't
overridden - including any new functionality, if you upgrade Python
and keep the same StrEnum code - you get the superclass's behaviour.

> > The way things are, a StrEnum or an HTML string will behave *exactly
> > as a string does*. The alternative is that, if any new operations are
> > added to strings in the future, they have to be explicitly blocked by
> > StrEnum or else they will randomly and mysteriously misbehave - or, at
> > very best, crash with unexpected errors. Which one is more hostile to
> > subclasses?
>
> I already answered that in my previous post.  To repeat: StrEnum is the
> unusual case and I am fine with it being more difficult to create
> something like StrEnum, because that is not as important as making it
> easy to create classes that *do* return an instance of themselves (i.e.,
> an instance of the same type as "self") from their various methods.

I'm of the opinion that this is a lot less special than you might
think, since there are quite a lot of these sorts of special cases.

> The
> current behavior is more hostile to subclasses because people typically
> write subclasses to *extend* the behavior of superclasses, and that is
> hindered if you have to override every superclass method just to make it
> do the same thing but return the result wrapped in the new subclass.

Maybe, but I would say that the solution is to make an easier way to
make a subclass that automatically does those changes - not to make
this the behaviour of all classes, everywhere. Your idea to:

>One way that some libraries implement this for their own classes is to
> have an attribute or method called something like `_class` or
> `_constructor` that specifies which class to use to construct a new
> instance when needed.  By default such a class may return an instance of
> the same type as self (i.e., the most specific subclass), but subclasses
> could override it to do something else.

... have a _class attribute may be a good way to do this, since -
unless otherwise overridden - it would remain where it is. (Though,
minor bikeshedding - a dunder name is probably more appropriate here.)
It could even be done with a mixin:

    class Str(autospecialize, str):
        __autospecialize__ = __class__
        def some_method(self): ...

and then the autospecialize class can handle this. There are many ways
of handling this, and IMO the best *default* 

[Python-ideas] Re: Idea: Tagged strings in python

2022-12-19 Thread Brendan Barnwell

On 2022-12-19 13:59, Chris Angelico wrote:

> On Tue, 20 Dec 2022 at 07:13, Brendan Barnwell  wrote:
>>   > See my example regarding a StrEnum and tell me whether that would be
>>   > more irritating.
>>
>>  I can't run that example myself as I don't have Python 3.11 set up.
>
> The enum module was added in Python 3.4.

Your example used StrEnum, which was added in Python 3.11.

> Nonetheless, a StrEnum is absolutely a str, and whatever you say about
> an HTML string has to also be valid for a StrEnum, or else the inverse
> is.


	No, it doesn't, because HTMLString and StrEnum can be different 
subclasses of str with different behavior.  You seem to be missing the 
concept of subclasses here.  Yes, a StrEnum may be an instance of str, 
and an HTMLString may also be an instance of str.  But that does not 
mean the behavior of both needs to be same.  They are instances of 
*different subclasses* of str and can have *different behavior*.  An 
instance of collections.Counter is an instance of dict and so is an 
instance of collections.defaultdict, but that doesn't mean that anything 
I say about a Counter has to be valid for a defaultdict.


	One way that some libraries implement this for their own classes is to 
have an attribute or method called something like `_class` or 
`_constructor` that specifies which class to use to construct a new 
instance when needed.  By default such a class may return an instance of 
the same type as self (i.e., the most specific subclass), but subclasses 
could override it to do something else.
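
	A minimal sketch of that kind of hook, assuming a hypothetical list 
subclass (TaggedList and PlainResults are names made up for illustration; 
only the `_constructor` name comes from the description above):

```python
class TaggedList(list):
    """A list subclass whose derived results are built through a hook."""

    @property
    def _constructor(self):
        # Default: rebuild results as the most specific subclass.
        return type(self)

    def __add__(self, other):
        return self._constructor(super().__add__(other))

    def __getitem__(self, index):
        result = super().__getitem__(index)
        # Only slices produce a new list; single items pass through unchanged.
        if isinstance(index, slice):
            return self._constructor(result)
        return result


class PlainResults(TaggedList):
    @property
    def _constructor(self):
        # Opt out: derived results fall back to a plain list.
        return list


print(type(TaggedList([1, 2]) + [3]).__name__)
print(type(PlainResults([1, 2]) + [3]).__name__)
```

	The subclass that wants the "return my own type" behavior gets it for 
free, and the unusual subclass overrides one attribute rather than every 
method.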



> The way things are, a StrEnum or an HTML string will behave *exactly
> as a string does*. The alternative is that, if any new operations are
> added to strings in the future, they have to be explicitly blocked by
> StrEnum or else they will randomly and mysteriously misbehave - or, at
> very best, crash with unexpected errors. Which one is more hostile to
> subclasses?


	I already answered that in my previous post.  To repeat: StrEnum is the 
unusual case and I am fine with it being more difficult to create 
something like StrEnum, because that is not as important as making it 
easy to create classes that *do* return an instance of themselves (i.e., 
an instance of the same type as "self") from their various methods.  The 
current behavior is more hostile to subclasses because people typically 
write subclasses to *extend* the behavior of superclasses, and that is 
hindered if you have to override every superclass method just to make it 
do the same thing but return the result wrapped in the new subclass.


--
Brendan Barnwell
"Do not follow where the path may lead.  Go, instead, where there is no 
path, and leave a trail."

   --author unknown

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/TYEXVQ3YLPTUM6LOF2657OXGDD5DNHPZ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-19 Thread Chris Angelico
On Tue, 20 Dec 2022 at 12:55, Ethan Furman  wrote:
>
> On 12/19/22 13:59, Chris Angelico wrote:
>
>  > The way things are, a StrEnum or an HTML string will behave *exactly
>  > as a string does*. The alternative is that, if any new operations are
>  > added to strings in the future, they have to be explicitly blocked by
>  > StrEnum or else they will randomly and mysteriously misbehave - or, at
>  > very best, crash with unexpected errors. Which one is more hostile to
>  > subclasses?
>
> As Brendan noted, mixed-type enums are special -- they are meant to be 
> whatever they subclass, with a couple extra
> features/restrictions.

Fair, but defaultdict also exhibits this behaviour, so maybe there are
a number of special cases. Or, as Syndrome put it: "When everyone's
[special]... no one will be."

> Personally, every other time I've wanted to subclass a built-in data type, 
> I've wanted the built-in methods to return my
> subclass, not the original class.
>
> All of which is to say:  sometimes you want it one way, sometimes the other.  
> ;-)

Yep, sometimes each way. So the real question is not "would the
opposite decision make sense in some situations?" but "which one is
less of a problem when it's the wrong decision?". And I put it to you
that returning an instance of the base type is less of a problem, in
the same way that *any other* operation unaware of the subclass would
behave.

def underline(head):
    """Build an underline line for the given heading"""
    return "=" * len(head)

Would you expect underline() to return the same type as head, or a
plain str? Would this be true of every single function that returns
something of the same kind as one of its parameters?

> Metaclasses, anyone?

Hmm, how would they help? I do think that metaprogramming could help
here, but not sure about metaclasses specifically.

If I wanted to automate this, I'd go for something like this:

@autospecialize
class Str(str):
    def extra_method(self): ...

where the autospecialize decorator would look at your class's first
base class, figure out which methods should get this treatment (only
if not overridden, only if they return that type, not __new__, maybe
other rules), and then add a wrapper that returns __class__(self). But
people will dispute parts of that. Maybe it should be explicitly told
which base class to handle this way. Maybe it'd be better to have an
intermediate class, rather than mutating the subclass. Maybe you
should be explicit about which methods get autospecialized. It's not
an easy problem, and simply returning the base class is the one option
that you can be confident of.
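
As a rough sketch of what such a decorator might look like: the selection
rules below are only one possible choice among the disputable ones listed
above, and `_METHOD_TYPES` and `_OPERATORS` are names made up for this
example. It assumes the subclass can be constructed from a single
base-class value.

```python
import functools
import types

# Descriptor types that hold methods of C-implemented and Python classes.
_METHOD_TYPES = (
    types.MethodDescriptorType,   # e.g. str.upper
    types.WrapperDescriptorType,  # e.g. str.__add__
    types.FunctionType,           # methods of pure-Python base classes
)
# Operators that get the treatment; every other dunder is left alone.
_OPERATORS = {"__add__", "__mul__", "__rmul__", "__mod__", "__getitem__"}


def autospecialize(cls):
    """Wrap inherited methods so results of the base type come back as cls."""
    base = cls.__bases__[0]

    def specialize(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            result = method(self, *args, **kwargs)
            # Convert only exact instances of the base class.
            if type(result) is base:
                return type(self)(result)
            return result
        return wrapper

    for name, attr in vars(base).items():
        if name in vars(cls):
            continue  # overridden by the subclass: leave it alone
        if name.startswith("_") and name not in _OPERATORS:
            continue  # skips __new__, __repr__, __hash__ and friends
        if isinstance(attr, _METHOD_TYPES):
            setattr(cls, name, specialize(attr))
    return cls


@autospecialize
class Str(str):
    def extra_method(self):
        return len(self)


s = Str("abc")
print(type(s.upper()).__name__, type(s + "d").__name__, type(s[1:]).__name__)
```

Note that the wrapper still pays for the extra copy, and it breaks for any
subclass whose constructor does not accept a plain base-class value.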

ChrisA
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/X263X2B4VUEWJIDRL27FNYE2C3S5KV77/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-19 Thread Ethan Furman

On 12/19/22 13:59, Chris Angelico wrote:

> The way things are, a StrEnum or an HTML string will behave *exactly
> as a string does*. The alternative is that, if any new operations are
> added to strings in the future, they have to be explicitly blocked by
> StrEnum or else they will randomly and mysteriously misbehave - or, at
> very best, crash with unexpected errors. Which one is more hostile to
> subclasses?

As Brendan noted, mixed-type enums are special -- they are meant to be whatever they subclass, with a couple extra 
features/restrictions.


Personally, every other time I've wanted to subclass a built-in data type, I've wanted the built-in methods to return my 
subclass, not the original class.


All of which is to say:  sometimes you want it one way, sometimes the other.  
;-)

Metaclasses, anyone?

--
~Ethan~
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/H6TQFFG3QZDNC4EJGROYLJVWU6L57XBA/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-19 Thread Chris Angelico
On Tue, 20 Dec 2022 at 11:16, Steven D'Aprano  wrote:
> Speaking of dicts, the dict.fromkeys method cooperates with subclasses.
> That proves that it can be done from a builtin. True, it is a
> classmethod rather than an instance method, but any instance method can
> find out its own class by calling `type()` (or the internal, C
> equivalent) on `self`. Just as we can do from Python.
>

What you really mean here is that fromkeys cooperates with subclasses
*that do not change the signature of __init__*. Otherwise, it won't
work.

The reason this is much easier with a classmethod alternate
constructor is that, if you don't want that behaviour, just don't do
that.

>>> class NumDict(dict):
...     def __init__(self, max, /):
...         for i in range(max): self[i] = i
...
>>> NumDict(10)
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
>>> NumDict(5)
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
>>> NumDict.fromkeys("abc")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: NumDict.__init__() missing 1 required positional argument: 'max'

So? Just don't use NumDict.fromkeys(), it doesn't make sense for a
NumDict. And that's fine. But if something else didn't work, it would
cause major problems. Which is why instance methods return plain
dictionaries:

>>> type(NumDict(5) | NumDict(3))
<class 'dict'>

How is the vanilla dictionary supposed to know how to construct a
NumDict correctly? Come to think of it, how is dict supposed to know
how to construct a defaultdict? Oh, it doesn't really.

>>> d = defaultdict.fromkeys("asdf", 42)
>>> d["a"]
42
>>> d["b"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'b'
>>> d
defaultdict(None, {'a': 42, 's': 42, 'd': 42, 'f': 42})

All it does is construct a vanilla dictionary, because that's all it
knows how to do.

If the rule is "all operations return an object of the subclass
automatically", then the corollary is "all subclasses must retain the
signature of the superclass".
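
To illustrate that corollary, here is a sketch assuming a hypothetical
CooperativeDict base whose `|` operator rebuilds its result as
type(self). That is not how dict behaves today; it is the behaviour being
asked for. The NumDict from above then breaks as soon as two of them are
merged:

```python
class CooperativeDict(dict):
    """Hypothetical dict whose merge operator returns type(self)."""

    def __or__(self, other):
        # dict.__or__ yields a plain dict; rebuild it as the subclass.
        return type(self)(super().__or__(other))


class NumDict(CooperativeDict):
    def __init__(self, max, /):
        for i in range(max):
            self[i] = i


# Works while the subclass keeps dict's constructor signature.
merged = CooperativeDict(a=1) | {"b": 2}
print(type(merged).__name__, merged)

# Fails once the signature changes: the merged dict is passed as `max`.
try:
    NumDict(5) | NumDict(3)
except TypeError as exc:
    outcome = f"TypeError: {exc}"
else:
    outcome = "no error"
print(outcome)
```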

ChrisA
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/45GP3NBIUC3B6MQINMKMZLW4JNOCMOFL/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-19 Thread Steven D'Aprano
On Mon, Dec 19, 2022 at 03:48:01PM -0800, Christopher Barker wrote:
> On Mon, Dec 19, 2022 at 3:39 AM Steven D'Aprano  wrote
> 
> > In any case, I was making a larger point that this same issue applies to
> > other builtins like float, int and more.
> 
> 
> Actually, I think the issue is with immutable types, rather than builtins.

No.

>>> class MyList(list):
...     def frobinate(self):
...         return "something"
... 
>>> (MyList(range(5)) + []).frobinate()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute 'frobinate'

And of course, by default, MyList slices are MyLists too, right? No.

>>> type(MyList(range(5))[1:])
<class 'list'>

This is less of an issue for dicts because there are few dict methods 
and operators which return dicts.

Speaking of dicts, the dict.fromkeys method cooperates with subclasses. 
That proves that it can be done from a builtin. True, it is a 
classmethod rather than an instance method, but any instance method can 
find out its own class by calling `type()` (or the internal, C 
equivalent) on `self`. Just as we can do from Python.

> And that’s just the nature of the beast.

Of course it is not. We can write classes in Python that cooperate with 
subclasses. The only difference is that builtins are written in C. There 
is nothing fundamental to C that forces this behaviour. It's a choice.


-- 
Steve

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/BA6M5Y5ZLPNSGHDRU7U6SBSFCAZAU3MS/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-19 Thread Chris Angelico
On Tue, 20 Dec 2022 at 07:13, Brendan Barnwell  wrote:
>  > See my example regarding a StrEnum and tell me whether that would be
>  > more irritating.
>
> I can't run that example myself as I don't have Python 3.11 set up.

The enum module was added in Python 3.4.

Nonetheless, a StrEnum is absolutely a str, and whatever you say about
an HTML string has to also be valid for a StrEnum, or else the inverse
is.

The way things are, a StrEnum or an HTML string will behave *exactly
as a string does*. The alternative is that, if any new operations are
added to strings in the future, they have to be explicitly blocked by
StrEnum or else they will randomly and mysteriously misbehave - or, at
very best, crash with unexpected errors. Which one is more hostile to
subclasses?

ChrisA
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/FAAYN7V6A26FYL6XGIOMHANDCBDXRATH/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-19 Thread Brendan Barnwell

Sorry, accidentally replied off-list. . .

On 2022-12-19 11:36, Chris Angelico wrote:
> On Tue, 20 Dec 2022 at 06:29, Brendan Barnwell  wrote:
>> On 2022-12-19 03:45, Chris Angelico wrote:
>>> On Mon, 19 Dec 2022 at 22:37, Steven D'Aprano  wrote:
>>>>> But this much (say with a better validator) gets you static type checking,
>>>>> syntax highlighting, and inherent documentation of intent.
>>>>
>>>> Any half-way decent static type-checker will immediately fail as soon as
>>>> you call a method on this html string, because it will know that the
>>>> method returns a vanilla string, not a html string.
>>>
>>> But what does it even mean to uppercase an HTML string? Unless you
>>> define that operation specifically, the most logical meaning is
>>> "convert it into a plain string, and uppercase that". Or, similarly,
>>> slicing an HTML string. You could give that a completely different
>>> meaning (maybe defining its children to be tags, and slicing is taking
>>> a selection of those), but if you don't, slicing isn't really a
>>> meaningful operation.
>>
>>  I don't agree with that at all.  What it means for an HTML string to be
>> a subclass of a normal string is that all normal string operations still
>> work on an HTML string --- just like what it means for any instance of a
>> subclass to be an instance of the superclass is that you can do anything
>> to the subclass that you could do to the superclass.  Every character in
>> an HTML string is still a character and can still be uppercased.  The
>> string is still a sequence of characters and can be sliced.  All such
>> operations still have a perfectly natural meaning.
>
> And that part is already true. None of this changes. That's guaranteed
> by the concept of subclassing. But what you're doing is string
> operations on a string.
>
>> We just want them to
>> now return an *HTML* string when they're done instead of a normal one.
>> The point of having a subclass is to define *additional* behavior while
>> still retaining the superclass behavior as well.
>
> So how is it still an "HTML" string if you slice out parts of it and
> it isn't valid HTML any more?


	What it means for me for something to "be an HTML string" (or more 
precisely, to be an instance of HTMLString or whatever the class name 
is) is for it to be a string that has an extra tag attached to the 
object that means "this is HTML".  That's it.  You can make an HTML 
string that contains utter gobbledegook if you want.  Of course, some 
operations may fail (like if it has a .validate() method) but that 
doesn't mean it's not still an instance of that class.


	Or, if you do want that, you can override the slicing method to raise 
an error if the result isn't valid HTML.  The point is that overrides 
are for specifying the *new* behavior of the subclass (i.e., not 
allowing certain slice operations); you shouldn't have to override 
methods just to retain the superclass behavior.
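
	As a sketch of that kind of override, assuming a hypothetical 
HTMLString class with a deliberately crude validity check (balanced angle 
brackets; `is_plausible` is a made-up name, not a real validator):

```python
class HTMLString(str):
    def is_plausible(self):
        # Stand-in check for illustration only: angle brackets balance.
        return self.count("<") == self.count(">")

    def __getitem__(self, index):
        # Slicing keeps the subclass and rejects results that fail the check.
        result = type(self)(super().__getitem__(index))
        if not result.is_plausible():
            raise ValueError(f"slice {result!r} is not valid HTML")
        return result


doc = HTMLString("<b>hi</b>")
print(type(doc[0:3]).__name__, doc[0:3])

try:
    doc[0:2]  # would produce "<b", which fails the check
except ValueError as exc:
    print("rejected:", exc)
```

	The only method written out here is the one whose behavior is 
actually new; everything else would ideally be inherited as-is.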


	I mean, we were talking about this in the context of syntax 
highlighting.  The utility of HTML-string highlighting would be 
seriously reduced if only *valid* HTML could be in an HTML string.


>>  Personally I find Python's behavior in this regard (not just for
>> strings but for other builtin types) to be one of its most irritating
>> warts.
>
> See my example regarding a StrEnum and tell me whether that would be
> more irritating.

	I can't run that example myself as I don't have Python 3.11 set up. 
But just from what you showed, I don't find it convincing.  Enums are 
special in that they are specifically designed to allow only a fixed set 
of values.  I see that as the uncommon case, rather than the common one 
of subclassing an "open-ended" class to create a new "open-ended" class 
(i.e., one that does not pre-specify exactly which values are allowed). 
So no, I don't think it would be more irritating.


--
Brendan Barnwell
"Do not follow where the path may lead.  Go, instead, where there is no 
path, and leave a trail."

   --author unknown

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/CT7UM6REJZEA3L6HHI2CGHMPXRZ7NXHI/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-19 Thread Chris Angelico
On Mon, 19 Dec 2022 at 22:37, Steven D'Aprano  wrote:
> > But this much (say with a better validator) gets you static type checking,
> > syntax highlighting, and inherent documentation of intent.
>
> Any half-way decent static type-checker will immediately fail as soon as
> you call a method on this html string, because it will know that the
> method returns a vanilla string, not a html string.

But what does it even mean to uppercase an HTML string? Unless you
define that operation specifically, the most logical meaning is
"convert it into a plain string, and uppercase that". Or, similarly,
slicing an HTML string. You could give that a completely different
meaning (maybe defining its children to be tags, and slicing is taking
a selection of those), but if you don't, slicing isn't really a
meaningful operation.

So it should be correct: you cannot simply uppercase an HTML string
and expect sane HTML.

I might be more sympathetic if you were talking about "tainted"
strings (ie those which contain data from an end user), on the basis
that most operations on those should yield tainted strings, but given
that systems of taint tracking seem to have managed just fine with the
existing way of doing things, still not particularly persuasive.

ChrisA
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/7GVERAPFWRAX463V24IYRKG5HIPYQ23I/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-19 Thread Steven D'Aprano
On Mon, Dec 19, 2022 at 01:02:02AM -0600, Shantanu Jain wrote:

> collections.UserString can take away a lot of this boilerplate pain from
> user defined str subclasses.

At what performance cost?

Also:

>>> s = collections.UserString('spam and eggs')
>>> isinstance(s, str)
False

which pretty much makes UserString useless for any code that does static 
checking or runtime isinstance checks.
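
A small sketch of both halves of that trade-off (Html here is a 
hypothetical subclass, not anything from the stdlib):

```python
from collections import UserString


class Html(UserString):
    """Hypothetical tagged string built on UserString."""


s = Html("<p>spam and eggs</p>")
shouted = s.upper()

# UserString methods rebuild their result with the instance's own class...
print(type(shouted).__name__)

# ...but the object wraps a str (in .data) rather than being one.
print(isinstance(s, str), isinstance(s.data, str))

# APIs that insist on real str instances therefore reject it.
try:
    "".join([s])
except TypeError as exc:
    print("join failed:", exc)
```

So the methods do come back as Html, but only by giving up the "is a str" 
relationship that the subclassing approach was meant to provide.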

In any case, I was making a larger point that this same issue applies to 
other builtins like float, int and more.


-- 
Steve
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/UYRYTKMO3L5GSB2F5A4N5I6J3LTA7DQE/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-19 Thread Steven D'Aprano
On Sun, Dec 18, 2022 at 10:23:18PM -0500, David Mertz, Ph.D. wrote:

> I'd agree to "limited", but not "hostile."  Look at the suggestions I
> mentioned: validate, canonicalize, security check.  All of those are
> perfectly fine in `.__new__()`.

No, they aren't perfectly fine, because as soon as you apply any 
operation to your string subclass, you get back a plain vanilla string 
which bypasses your custom `__new__` and so does not perform the 
validation or security check.
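
A sketch of that bypass, reusing the toy `"<" in s` validator from the 
quoted message (the sample strings are made up):

```python
class html(str):
    def __new__(cls, s):
        if "<" not in s:
            raise ValueError("That doesn't look like HTML")
        return super().__new__(cls, s)


page = html("<p>hello</p>")

# Each str method returns a plain str, so html.__new__ never runs again.
stripped = page.replace("<", "").replace(">", "")
print(type(stripped).__name__, repr(stripped))

# The same value would have been rejected had it gone through the subclass.
try:
    html(stripped)
except ValueError as exc:
    print("rejected:", exc)
```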

> But this much (say with a better validator) gets you static type checking,
> syntax highlighting, and inherent documentation of intent.

Any half-way decent static type-checker will immediately fail as soon as 
you call a method on this html string, because it will know that the 
method returns a vanilla string, not a html string. And that's exactly 
what mypy does:

[steve ~]$ cat static_check_test.py 
class html(str):
    pass

def func(s:html) -> None:
    pass

func(html('').lower())

[steve ~]$ mypy static_check_test.py 
static_check_test.py:7: error: Argument 1 to "func" has incompatible 
type "str"; expected "html"
Found 1 error in 1 file (checked 1 source file)


Same with auto-completion. Either auto-complete will correctly show you 
that what you thought was a html object isn't, and fail to show any 
additional methods you added; or worse, it will wrongly think it is a 
html object when it isn't, and allow you to autocomplete methods that 
don't exist.

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/2JPILXSBEPUKHG4E5GH5KJFNOGNWXDYB/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-18 Thread Shantanu Jain
collections.UserString can take away a lot of this boilerplate pain from
user defined str subclasses.

On Sun, Dec 18, 2022 at 7:28 PM Steven D'Aprano  wrote:

> On Sun, Dec 18, 2022 at 07:38:06PM -0500, David Mertz, Ph.D. wrote:
>
> > However, if you want to allow these types to possibly *do* something with
> > the strings inside (validate them, canonicalize them, do a security check,
> > etc), I think I like the other way:
> >
> > #2
> >
> > class html(str): pass
> > class css(str): pass
>
> The problem with this is that the builtins are positively hostile to
> subclassing. The issue is demonstrated with this toy example:
>
> class mystr(str):
>     def method(self):
>         return 1234
>
> s = mystr("hello")
> print(s.method())  # This is fine.
> print(s.upper().method())  # This is not.
>
>
> To be useable, we have to override every string method that returns a
> string. Including dunders. So your class becomes full of tedious
> boilerplate:
>
>     def upper(self):
>         return type(self)(super().upper())
>     def lower(self):
>         return type(self)(super().lower())
>     def casefold(self):
>         return type(self)(super().casefold())
>     # Plus another 29 or so methods
>
> This is not just tedious and error-prone, but it is inefficient: calling
> super returns a regular string, which then has to be copied as a
> subclassed string and the original garbage collected.
>
>
> --
> Steve
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/4RIQ65SHYK3T2KZ2XKOPD45KH2SOFQFI/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-18 Thread David Mertz, Ph.D.
On Sun, Dec 18, 2022 at 8:29 PM Steven D'Aprano  wrote:

> > However, if you want to allow these types to possibly *do* something with
> > the strings inside (validate them, canonicalize them, do a security check,
> > etc), I think I like the other way:
> > class html(str): pass
> > class css(str): pass
>
> The problem with this is that the builtins are positively hostile to
> subclassing. The issue is demonstrated with this toy example:
>
> class mystr(str):
>     def method(self):
>         return 1234
>
> s = mystr("hello")
> print(s.method())  # This is fine.
> print(s.upper().method())  # This is not.
>

I'd agree to "limited", but not "hostile."  Look at the suggestions I
mentioned: validate, canonicalize, security check.  All of those are
perfectly fine in `.__new__()`.  E.g.:

In [1]: class html(str):
   ...:     def __new__(cls, s):
   ...:         if not "<" in s:
   ...:             raise ValueError("That doesn't look like HTML")
   ...:         return str.__new__(cls, s)


In [2]: html("Hello")


In [3]: html("Hello")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
 in 
----> 1 html("Hello")

 in __new__(cls, s)
      2     def __new__(cls, s):
      3         if not "<" in s:
----> 4             raise ValueError("That doesn't look like HTML")
      5


ValueError: That doesn't look like HTML


I readily acknowledge that's not a very thorough validator :-).

But this much (say with a better validator) gets you static type checking,
syntax highlighting, and inherent documentation of intent.

I know that lots of things one can do with a str subclass wind up producing
a str instead.  But if the thing you do is just "make sure it is created as
the right kind of thing for static checking and editor assistance", I don't
care about any of that falling back.

-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/AEQCVTJ2ABFQSQHWM62JOJQJI6UU675Y/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-18 Thread Chris Angelico
On Mon, 19 Dec 2022 at 12:29, Steven D'Aprano  wrote:
> The problem with this is that the builtins are positively hostile to
> subclassing. The issue is demonstrated with this toy example:
>
> class mystr(str):
>     def method(self):
>         return 1234
>
> s = mystr("hello")
> print(s.method())  # This is fine.
> print(s.upper().method())  # This is not.
>

"Hostile"? I dispute that. Are you saying that every method on a
string has to return something of the same type as self, rather than a
vanilla string? Because that would be far MORE hostile to other types
of string subclass:

>>> import dataclasses
>>> from enum import StrEnum
>>> class Demo(StrEnum):
...     x = "eggs"
...     m = "ham"
...
>>> Demo.x
<Demo.x: 'eggs'>
>>> isinstance(Demo.x, str)
True
>>> Demo.x.upper()
'EGGS'
>>> Demo.m + " and " + Demo.x
'ham and eggs'

Demo.x is a string. Which means that, unless there's good reason to do
otherwise, it should behave as a string. So it should be possible to
use it as if it were the string "eggs", including appending it to
something, appending something to it, uppercasing it, etc, etc, etc.

So what should happen if you do these kinds of manipulations? Should
attempting to use a string in a normal string context raise
ValueError?

>>> Demo("ham and eggs")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.12/enum.py", line 726, in __call__
    return cls.__new__(cls, value)
   ^^^
  File "/usr/local/lib/python3.12/enum.py", line 1121, in __new__
    raise ve_exc
ValueError: 'ham and eggs' is not a valid Demo

I would say that *that* would count as "positively hostile to subclassing".

ChrisA
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/HCXWIKZ47LI7UIESEYAP63TP2CGWHR5O/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-18 Thread Steven D'Aprano
On Sun, Dec 18, 2022 at 07:38:06PM -0500, David Mertz, Ph.D. wrote:

> However, if you want to allow these types to possibly *do* something with
> the strings inside (validate them, canonicalize them, do a security check,
> etc), I think I like the other way:
> 
> #2
> 
> class html(str): pass
> class css(str): pass

The problem with this is that the builtins are positively hostile to 
subclassing. The issue is demonstrated with this toy example:

class mystr(str):
    def method(self):
        return 1234

s = mystr("hello")
print(s.method())  # This is fine.
print(s.upper().method())  # This is not.


To be useable, we have to override every string method that returns a 
string. Including dunders. So your class becomes full of tedious 
boilerplate:

    def upper(self):
        return type(self)(super().upper())
    def lower(self):
        return type(self)(super().lower())
    def casefold(self):
        return type(self)(super().casefold())
    # Plus another 29 or so methods

This is not just tedious and error-prone, but it is inefficient: calling 
super returns a regular string, which then has to be copied as a 
subclassed string and the original garbage collected.


-- 
Steve
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/O7PU5FLLGNR7IR2V667LDPBBOEXF5NFU/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-18 Thread David Mertz, Ph.D.
Using a typing approach sounds like a fantastic idea.  Moreover, as Stephen
showed, it's easy to make Emacs utilize that, and as I showed, it's easy to
make vim follow that.  I've only written one tiny VS Code extension, but it
wouldn't be hard there either.  I'm not sure how one adds stuff to PyCharm
and other editors, but I have to believe it's possible.

So I see two obvious approaches, both of which 100% fulfill Emil's hope
without new syntax:

#1

from typing import NewType


html = NewType("html", str)
css = NewType("css", str)


a: html = html("Hello world")
b: css = css("h1 { color: #99; }")


def combine(h: html, c: css):
    print(f"Combined page elements: {h} | {c}")


combine(a, b)  # <- good
combine(b, a)  # <- bad



However, if you want to allow these types to possibly *do* something with
the strings inside (validate them, canonicalize them, do a security check,
etc), I think I like the other way:

#2

class html(str): pass
class css(str): pass


a: html = html("Hello world")
b: css = css("h1 { color: #99; }")


def combine(h: html, c: css):
    print(f"Combined page elements: {h} | {c}")


combine(a, b)
combine(b, a)


The type annotations in the assignment lines are optional, but if you're
doing something other than just creating an instance of the (pseudo-)type,
they might add something.  They might also be what your text editor decides
to use as its marker.

For either version, type analysis will find a problem.  If I hadn't matched
the types in the assignment, it would detect extra problems:

(py3.11) 1310-scratch % mypy tagged_types1.py
tagged_types1.py:13: error: Argument 1 to "combine" has incompatible type
"css"; expected "html"  [arg-type]
tagged_types1.py:13: error: Argument 2 to "combine" has incompatible type
"html"; expected "css"  [arg-type]
Found 2 errors in 1 file (checked 1 source file)


typing.Annotated can also be used, but it solves a slightly different
problem.
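
To make that difference concrete, here is a small sketch (the Html and Css
aliases and the "html"/"css" marker strings are my own choices for
illustration).  An Annotated alias is still plain str as far as a type
checker is concerned, so swapped arguments are not flagged; the tag is only
metadata that a tool, such as an editor plugin, can read back:

```python
from typing import Annotated, get_type_hints

# Both aliases are plain `str` to a type checker; the second argument
# to Annotated is free-form metadata.
Html = Annotated[str, "html"]
Css = Annotated[str, "css"]


def combine(h: Html, c: Css) -> str:
    return f"Combined page elements: {h} | {c}"


# Swapped arguments are accepted: both are just str.
print(combine("h1 { color: red; }", "<h1>Hello world</h1>"))

# A tool can still recover the markers from the annotations.
hints = get_type_hints(combine, include_extras=True)
markers = {
    name: hint.__metadata__
    for name, hint in hints.items()
    if hasattr(hint, "__metadata__")
}
print(markers)
```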




On Sun, Dec 18, 2022 at 5:24 PM Paul Moore  wrote:

> On Sun, 18 Dec 2022 at 21:42, Christopher Barker 
> wrote:
>
>> On Sun, Dec 18, 2022 at 9:48 AM David Mertz, Ph.D. 
>> wrote:
>>
>>> In general, I find any proposal to change Python "because then my text
>>> editor would need to
>>> change to accommodate the language" to be unconvincing.
>>>
>>
>> Personally, I’m skeptical of any proposal to change Python to make it
>> easier for IDEs.
>>
>> But there *may* be other good reasons to do something like this. I’m not
>> a static typing guy, but it seems to me that it could be useful to subtype
>> strings:
>>
>> This function expects an SQL string.
>>
>> This function returns an SQL string.
>>
>> Maybe not worth the overhead, but worth more than giving IDEs hints as to
>> what to do.
>>
>
> I believe typing has annotated types that could do this.
> Paul
>


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/XCACWMITDR5YNBICCNONLUGZUYC3NFRV/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-18 Thread Paul Moore
On Sun, 18 Dec 2022 at 21:42, Christopher Barker 
wrote:

> On Sun, Dec 18, 2022 at 9:48 AM David Mertz, Ph.D. 
> wrote:
>
>> In general, I find any proposal to change Python "because then my text
>> editor would need to
>> change to accommodate the language" to be unconvincing.
>>
>
> Personally, I’m skeptical of any proposal to change Python to make it
> easier for IDEs.
>
> But there *may* be other good reasons to do something like this. I’m not a
> static typing guy, but it seems to me that it could be useful to subtype
> strings:
>
> This function expects an SQL string.
>
> This function returns an SQL string.
>
> Maybe not worth the overhead, but worth more than giving IDEs hints as to
> what to do.
>

I believe typing has annotated types that could do this.
Paul
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/U22UUM7J22IKDQCQTMHW27AISQ2H2YOY/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-18 Thread Christopher Barker
On Sun, Dec 18, 2022 at 9:48 AM David Mertz, Ph.D. 
wrote:

> In general, I find any proposal to change Python "because then my text
> editor would need to
> change to accommodate the language" to be unconvincing.
>

Personally, I’m skeptical of any proposal to change Python to make it
easier for IDEs.

But there *may* be other good reasons to do something like this. I’m not a
static typing guy, but it seems to me that it could be useful to subtype
strings:

This function expects an SQL string.

This function returns an SQL string.

Maybe not worth the overhead, but worth more than giving IDEs hints as to
what to do.

-CHB
-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/UUDANPFKWV66IN3DXGTS3VQ6A7XY6YIX/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-18 Thread David Mertz, Ph.D.
Well, obviously I have to come to the defense of vim as well :-).  I'm not
sure what year vim got the capability, but I suspect around as long as
emacs.

This isn't for exactly the same language use case, but finding a quick
example on the internet:

unlet b:current_syntax
syntax include @srcBash syntax/bash.vim
syntax region srcBashHi start="..." end="..." keepend contains=@srcBash

unlet b:current_syntax
syntax include @srcHTML syntax/html.vim
syntax region srcHTMLHi start="^...$" end="^...$" keepend contains=@srcHTML

This is easy to adapt to either the named function convention,
`html('Hello')`, or to the standardized-comment convention.

In general, I find any proposal to change Python "because then my text
editor would need to
change to accommodate the language" to be unconvincing.
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/6PMUCHFX6FG2IT2VHANPGSPX4GNBJAII/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-18 Thread emil
dn wrote:
> > Is this a problem with Python, or with the tool?
> «
> Language injections
> Last modified: 14 December 2022
> Language injections let you work with pieces of code in other languages 
> embedded in your code. When you inject a language (such as HTML, CSS, 
> XML, RegExp, and so on) into a string literal, you get comprehensive 
> code assistance for editing that literal.
> ...
> »
> https://www.jetbrains.com/help/pycharm/using-language-injections.html
> Contains a specific example for Django scripters.
> (sadly as an image - probably wouldn't be handled by this ListServer)

I touched upon this solution in the original post. If all editors could agree 
to use # language=html it would be an ok solution. But that API creates a lot of 
ambiguity about what the comment applies to. Some examples which are 
non-obvious imho:


"" # language=html
"

# language=html

""

# language=html
process_html("")

# language=html
concat_html("", "")


> > If I instead use separate files, I get syntax highlighting and 
> > auto-completion for each file, because editors set language based on file 
> > type. But should I really have to choose?
>
> In other situations where files need to be collected together, a 
> data-archive may be used (not to be confused with any historical 
> context, nor indeed with data-compression).

The point here is to have everything in one file, editable and syntax 
highlighted in that same file. I don't think this tip applies to that?
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/TX35CCY4YLJEGWCODYHTWXWDM2SSANE4/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-17 Thread dn

On 18/12/2022 05.07, e...@emilstenstrom.se wrote:

I'm the maintainer of a small django library called django-components. I've run 
into a problem that I have a language-level solution (tagged strings) to, that 
I think would benefit the wider python community.

*Problem*

...


Seems simple enough, right? The problem is: There's no syntax highlighting in 
my code editor for the three other languages. This makes for a horrible 
developer experience, where you constantly have to hunt for characters inside 
of strings. You saw the missing quote in js_string right? :)


Is this a problem with Python, or with the tool?

«
Language injections
Last modified: 14 December 2022

Language injections let you work with pieces of code in other languages 
embedded in your code. When you inject a language (such as HTML, CSS, 
XML, RegExp, and so on) into a string literal, you get comprehensive 
code assistance for editing that literal.

...
»
https://www.jetbrains.com/help/pycharm/using-language-injections.html


Contains a specific example for Django scripters.
(sadly as an image - probably wouldn't be handled by this ListServer)



If I instead use separate files, I get syntax highlighting and auto-completion 
for each file, because editors set language based on file type. But should I 
really have to choose?


In other situations where files need to be collected together, a 
data-archive may be used (not to be confused with any historical 
context, nor indeed with data-compression).


Might a wrapper around such services from the PSL (Python Standard Library) 
help to both keep everything together, and yet let each format be recognised 
separately for editing?


«
Data Compression and Archiving

The modules described in this chapter support data compression with the 
zlib, gzip, bzip2 and lzma algorithms, and the creation of ZIP- and 
tar-format archives.

...
»
https://docs.python.org/3/library/archiving.html
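
As a rough sketch of that suggestion, the standard zipfile module can bundle
the parts of a component into one archive while each part keeps its own file
name and extension. The file names and contents below are invented for
illustration, and whether an editor can open archive members in place is a
separate question that this does not answer.

```python
import io
import zipfile

# Hypothetical component parts; the names and contents are illustrative only.
parts = {
    "calendar.html": '<div class="calendar">Today</div>',
    "calendar.css": ".calendar { background: pink }",
    "calendar.js": 'console.log("click!")',
}

# Write every part into one in-memory ZIP archive.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for name, text in parts.items():
        archive.writestr(name, text)

# Read the parts back; each member keeps its own file name and extension.
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    restored = {name: archive.read(name).decode("utf-8") for name in names}
```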



Disclaimer: JetBrains sponsors our PUG with monthly prizes, eg PyCharm.

--
Regards,
=dn
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/VO3EOR7FFVFVERM2MXRNC2GFR2FQHZ6J/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-17 Thread David Mertz, Ph.D.
Just to be clear on my opinion. I think Emil's idea was 100% appropriate to
share on python-ideas, and he does a good job of showing where it would be
useful. Sure, a background search is nice, but not required.

That doesn't mean I *support* the idea. I take a very conservative attitude
towards language changes. I hope I've provided okay explanation of my
non-support, but it's NOT a criticism of Emil in any way.

That said, Jim Baker pitched his similar idea to me at the last PyCon, and I
remember coming closer to feeling supportive. Maybe partially just because
I've known and liked Jim for a long time. But I think he was also suggesting
some extra semantics that seemed to move the needle in my mind.

On Sat, Dec 17, 2022, 2:27 PM Eric V. Smith via Python-ideas <
python-ideas@python.org> wrote:

> Jim Baker has been working on tagged strings, and Guido has a working
> implementation. See https://github.com/jimbaker/tagstr/issues/1
>
> I thought Jim had a draft PEP on this somewhere, but I can’t find it.
>
> --
> Eric
>
> On Dec 17, 2022, at 11:14 AM, e...@emilstenstrom.se wrote:
>
> Hi everyone!
>
> I'm the maintainer of a small django library called django-components.
> I've run into a problem that I have a language-level solution (tagged
> strings) to, that I think would benefit the wider python community.
>
> *Problem*
> A component in my library is a combination of python code, html, css and
> javascript. Currently I glue things together with a python file, where you
> put the paths to the html, css and javascript. When run, it brings all of
> the files together into a component. But for small components, having to
> juggle four different files around is cumbersome, so I've started to look
> for a way to put everything related to the component _in the same file_.
> This makes it much easier to work on, understand, and with fewer places to
> make path errors.
>
> Example:
> class Calendar(component.Component):
>     template_string = ''
>     css_string = '.calendar { background: pink }'
>     js_string = 'document.getElementsByClassName("calendar)[0].onclick =
> function() { alert("click!") }'
>
> Seems simple enough, right? The problem is: There's no syntax highlighting
> in my code editor for the three other languages. This makes for a horrible
> developer experience, where you constantly have to hunt for characters
> inside of strings. You saw the missing quote in js_string right? :)
>
> If I instead use separate files, I get syntax highlighting and
> auto-completion for each file, because editors set language based on file
> type. But should I really have to choose?
>
> *Do we need a python language solution to this?*
> Could the code editors fix this? There's a long issue thread for vscode
> where this is discussed: https://github.com/Microsoft/vscode/issues/1751
> - The reasoning (reasonable imho) is that this is not something that can be
> done generally, but that it needs to be handled at the python vscode
> extension level. Makes sense.
>
> Could the vscode language extension fix this? Well, the language extension
> has no way to know what language it should highlight. If a string is HTML
> or CSS. PyCharm has decided to use a "special python comment" #
> language=html that makes the next string be highlighted in that language.
>
> So if just all editors could standardize on that comment, everything would
> work? I guess so, but is that really the most intuitive API to standardize
> around? If the next statement is not a string, what happens? If the comment
> is on the same line as another statement, does it affect that line, or the
> next? What if there's a newline in between the comment in the string, does
> that work?
>
> *Suggested solution*
> I suggest supporting _tagged strings_ in python. They would look like
> html''.
> * Python should not hold a list of which tagged strings it should support,
> it should be possible to use any tag.
> * To avoid clashes with current raw strings and unicode strings, a tag
> should be required to be at least 2 characters long (I'm open to other ways
> to avoid this).
>
> I like this syntax because:
> 1. It's clear what string the tag is affecting.
> 2. It makes sense when you read it, even though you've never seen the
> syntax before.
> 3. It clearly communicates which language to highlight to code editors,
> since you can use the language identifiers that already exist:
> https://code.visualstudio.com/docs/languages/identifiers#_known-language-identifiers
> - for single letter languages, which are not supported to avoid clash with
> raw strings and unicode strings, the language extension would have to
> support "r-lang" and "c-lang" instead.
> 4. It mimics the syntax of tagged string templates in javascript (
> https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#tagged_templates).
> So it has some precedent.
>
> (If desirable, I think mimicking javascript further and making tagged
> strings call a function with the tag's name, would 

[Python-ideas] Re: Idea: Tagged strings in python

2022-12-17 Thread Eric V. Smith via Python-ideas
Jim Baker has been working on tagged strings, and Guido has a working 
implementation. See https://github.com/jimbaker/tagstr/issues/1

I thought Jim had a draft PEP on this somewhere, but I can’t find it. 

--
Eric

> On Dec 17, 2022, at 11:14 AM, e...@emilstenstrom.se wrote:
> 
> Hi everyone!
> 
> I'm the maintainer of a small django library called django-components. I've 
> run into a problem that I have a language-level solution (tagged strings) to, 
> that I think would benefit the wider python community.
> 
> *Problem*
> A component in my library is a combination of python code, html, css and 
> javascript. Currently I glue things together with a python file, where you 
> put the paths to the html, css and javascript. When run, it brings all of the 
> files together into a component. But for small components, having to juggle 
> four different files around is cumbersome, so I've started to look for a way 
> to put everything related to the component _in the same file_. This makes it 
> much easier to work on, understand, and with fewer places to make path errors.
> 
> Example:
> class Calendar(component.Component):
>     template_string = ''
>     css_string = '.calendar { background: pink }'
>     js_string = 'document.getElementsByClassName("calendar)[0].onclick = 
> function() { alert("click!") }'
> 
> Seems simple enough, right? The problem is: There's no syntax highlighting in 
> my code editor for the three other languages. This makes for a horrible 
> developer experience, where you constantly have to hunt for characters inside 
> of strings. You saw the missing quote in js_string right? :)
> 
> If I instead use separate files, I get syntax highlighting and 
> auto-completion for each file, because editors set language based on file 
> type. But should I really have to choose?
> 
> *Do we need a python language solution to this?*
> Could the code editors fix this? There's a long issue thread for vscode where 
> this is discussed: https://github.com/Microsoft/vscode/issues/1751 - The 
> reasoning (reasonable imho) is that this is not something that can be done 
> generally, but that it needs to be handled at the python vscode extension 
> level. Makes sense.
> 
> Could the vscode language extension fix this? Well, the language extension 
> has no way to know what language it should highlight. If a string is HTML or 
> CSS. PyCharm has decided to use a "special python comment" # language=html 
> that makes the next string be highlighted in that language. 
> 
> So if just all editors could standardize on that comment, everything would 
> work? I guess so, but is that really the most intuitive API to standardize 
> around? If the next statement is not a string, what happens? If the comment 
> is on the same line as another statement, does it affect that line, or the 
> next? What if there's a newline in between the comment in the string, does 
> that work?
> 
> *Suggested solution*
> I suggest supporting _tagged strings_ in python. They would look like 
> html''. 
> * Python should not hold a list of which tagged strings it should support, it 
> should be possible to use any tag. 
> * To avoid clashes with current raw strings and unicode strings, a tag should 
> be required to be at least 2 characters long (I'm open to other ways to avoid 
> this).
> 
> I like this syntax because:
> 1. It's clear what string the tag is affecting. 
> 2. It makes sense when you read it, even though you've never seen the syntax 
> before.
> 3. It clearly communicates which language to highlight to code editors, since 
> you can use the language identifiers that already exist: 
> https://code.visualstudio.com/docs/languages/identifiers#_known-language-identifiers
>  - for single letter languages, which are not supported to avoid clash with 
> raw strings and unicode strings, the language extension would have to support 
> "r-lang" and "c-lang" instead.
> 4. It mimics the syntax of tagged string templates in javascript 
> (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#tagged_templates).
>  So it has some precedent. 
> 
> (If desirable, I think mimicking javascript further and making tagged strings 
> call a function with the tag's name, would be a great addition to Python too. 
> This would make the syntax for parsing foreign languages much nicer. But this 
> is not required for my specific problem, it's just a nice next possible step 
> for this feature.)
> 
> *Backwards compatibility*
> This syntax currently raises an invalid syntax error. So introducing this 
> shouldn't break existing programs. Python's currently supported string types 
> are just single letter, so the suggestion is to require tagged strings to be 
> at least two letters. 
> 
> *Feedback?*
> What are your thoughts on this? Do you see a value in adding tagged strings 
> to python? Are there other use-cases where this would be useful? Does the 
> suggestion need to support calling tags as functions like in javascript 

[Python-ideas] Re: Idea: Tagged strings in python

2022-12-17 Thread David Mertz, Ph.D.
On Sat, Dec 17, 2022, 1:03 PM  wrote:

> > Moreover, there is no reason an editor could not have a capability to
> > "colorize any string passed to a function named foo()."  Perhaps with
> some
> > sort of configuration file that indicates which function names correspond
> > to which languages, but also with presets.
>
> This is an interesting idea. Some counter-arguments:
> * Anything that's hidden behind a config file won't be used except by very
> few. So, as you say, you need presets somehow.


I've been using vim long enough that I probably only edit .vimrc (or
correspondingly for neovim) every week or two.

I use VS Code much less, so when I do, I probably edit settings.json more
like once a day (when I'm using it).

But many editors, in any case, have friendly custom editors for some
elements of their configs.

Of course, if presets are fine, indeed users need not change them. Tagged
templates do EXACTLY ZERO to make this less of a concern.

> If there was a chance this could happen, it would solve my problem nicely.
> For the reasons above, I don't think this will be acceptable to editors.
>

I could trivially implement this in a few lines within every modern editor
I am aware of. I bet you can do it for your editor with less than 2 hours
effort.
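
To make "a few lines" concrete outside any particular editor, here is a
minimal sketch in Python using the standard ast module. The PRESETS table and
the find_tagged_strings helper are hypothetical names for this example, not
part of any editor's API. It finds string literals passed directly to a
function whose name appears in the table and reports where they start.

```python
import ast

# Hypothetical preset: wrapper function name -> language to highlight.
PRESETS = {"html": "html", "css": "css", "sql": "sql"}


def find_tagged_strings(source: str) -> list[tuple[str, int, int, str]]:
    """Return (language, line, column, text) for each string literal
    passed directly to a function whose name is in PRESETS."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        language = PRESETS.get(node.func.id)
        if language is None:
            continue
        for arg in node.args:
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                found.append((language, arg.lineno, arg.col_offset, arg.value))
    return found


sample = 'HEAD = html("<h1>Hello</h1>")\nprint("plain text")\n'
regions = find_tagged_strings(sample)
```

A real editor would do this in its own grammar format rather than with ast,
and would still have to decide what to do about unrelated functions that
happen to be named html.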
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/3KTEUPIUJYJAAF2RBE3RVGQDISKCNTWM/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-17 Thread Emil Stenström
On Sat, Dec 17, 2022, at 19:20, Bruce Leban wrote:
> 
> On Sat, Dec 17, 2022 at 10:10 AM  wrote:
>> I replied to this in a separate post, but html() is likely a function name 
>> that is used in millions of existing code bases. Applying this rule to all 
>> of them will lead to too many errors to be acceptable to editors I think. 
>> And if this has to be explicitly configured in an editor very few will use 
>> it.
> 
> Understood. This string suffix syntax is supported by Python today and syntax 
> highlighters could be modified to support this without requiring changes to 
> any other component.
> 
> class Calendar(component.Component):
>     template_string = '' ##html
>     css_string = '.calendar { background: pink }' ##css
>     js_string = 'document.getElementsByClassName("calendar")[0].onclick = 
> function() { alert("click!") }' ##javascript
> 
> --- Bruce

PyCharm supports syntax similar to this. They put a # language=html on the line 
in front of the string. I think this is messy for the reasons in my original 
post, but maybe this is the only reasonable way forward. I'll see if I can ask 
the vscode python language extension team what they think.

Nice to see you fixed the syntax error in the js too! :)
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/XXIAQD5BMVEYFD3MUCXT7XOC5WLFZ64L/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-17 Thread C. Titus Brown via Python-ideas


> On Dec 17, 2022, at 10:08 AM, e...@emilstenstrom.se wrote:
> 
> Bruce Leban wrote:
>>> Try googling "python-ideas string prefixes". Doing minimal diligence is a
>>> reasonable expectation before writing up an idea.
> 
> Thanks for the query "string prefixes". I tried other queries but not that 
> one. I ended my first message with "I hope I didn't break any unspoken rules" 
> and it seems I have. 

My two cents (speaking as long-term observer, not as the moderator, or perhaps 
in addition to the moderator ;) - I think your ask was appropriate, and I think 
the response of “here’s the search you should do!” was great.

Personally I think we could do without the implication that you should have 
done more due diligence. python-ideas is PRECISELY for this kind of question. 
Other forums should have a higher barrier to entry (like python-dev), but not 
python-ideas.

best,
—titus

___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/HOY4DYOSV5723735LI3BOJY7UUKD6NGK/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-17 Thread Bruce Leban
On Sat, Dec 17, 2022 at 10:10 AM  wrote:

>
> I replied to this in a separate post, but html() is likely a function name
> that is used in millions of existing code bases. Applying this rule to all
> of them will lead to too many errors to be acceptable to editors I think.
> And if this has to be explicitly configured in an editor very few will use
> it.
>

Understood. This string suffix syntax is supported by Python today and
syntax highlighters could be modified to support this without requiring
changes to any other component.

class Calendar(component.Component):
    template_string = '' ##html
    css_string = '.calendar { background: pink }' ##css
    js_string = 'document.getElementsByClassName("calendar")[0].onclick =
function() { alert("click!") }' ##javascript
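
A minimal sketch of how a tool could pick up such suffix comments, using the
standard tokenize module. The find_suffix_tags helper is a hypothetical name,
and the rule it applies (a string literal immediately followed on the same
line by a comment starting with ##) is one possible reading of the
convention, not a settled one.

```python
import io
import tokenize


def find_suffix_tags(source: str) -> list[tuple[int, str, str]]:
    """Return (line, language, literal) for each string literal that is
    immediately followed on the same line by a '##language' comment."""
    found = []
    previous = None
    for token in tokenize.generate_tokens(io.StringIO(source).readline):
        if (
            token.type == tokenize.COMMENT
            and token.string.startswith("##")
            and previous is not None
            and previous.type == tokenize.STRING
            and previous.end[0] == token.start[0]
        ):
            found.append((token.start[0], token.string[2:].strip(), previous.string))
        previous = token
    return found


sample = (
    "css_string = '.calendar { background: pink }' ##css\n"
    "name = 'plain'  # an ordinary comment\n"
)
tags = find_suffix_tags(sample)
```

Because the comment must directly follow the string on the same line, this
rule does not raise the question of which string a comment belongs to.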

--- Bruce
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/UHFXAYTAXLEX2DIV55GIJDW2K5TXUCMG/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-17 Thread emil
For reference: This thread has a much deeper discussion of this idea: 
https://discuss.python.org/t/allow-for-arbitrary-string-prefix-of-strings/19740/11

I'll continue the discussion there instead.
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/NT7KUR7GSKT7JXXNHDPJ5F3HRIP4H7FB/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-17 Thread emil
Bruce Leban wrote:
> > Try googling "python-ideas string prefixes". Doing minimal diligence is a
> > reasonable expectation before writing up an idea.

Thanks for the query "string prefixes". I tried other queries but not that one. 
I ended my first message with "I hope I didn't break any unspoken rules" and it 
seems I have. 

> > > If the tags are called as functions then you can do it today with this:
> > > def html(s):
> > >     return s
> > > HEAD = html('')
> >
> > If I'm not missing anything, this doesn't help with syntax highlighting?
> > Highlighting is the problem I'm talking about in my post above.
>
> Not true. A syntax highlighter can certainly recognize html('...') just as
> it can recognize html'...'.

I replied to this in a separate post, but html() is likely a function name that 
is used in millions of existing code bases. Applying this rule to all of them 
will lead to too many errors to be acceptable to editors I think. And if this 
has to be explicitly configured in an editor very few will use it.
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/SRGLKEP7P4LSP4R3KOQW34IV35NGDSYS/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-17 Thread emil
David Mertz, Ph.D. wrote:
> My impression whenever this idea is proposed is like Barry's.  The "win"
> isn't big enough not simply to use named functions.

Named functions solve another problem, so I don't see how this is an 
alternative? More on this below.

> Balancing out the slight "win" is the much larger loss of adding additional
> complexity to the Python language. New grammar, new parser, possibly new
> semantics if tagged strings are more than exclusively decorative.  It's not
> a *huge* complexity, but it's more than zero, and these keep adding up.
> Python is SO MUCH less simple than it was when I learned it in 1998.  While
> each individual change might have its independent value, it is now hard to
> describe Python as a "simple language."

This is an argument against _any_ change to the language. I recognize this 
sentiment, but I don't agree with stopping all change in the hope of python 
being simple again. I don't think the general python developer is there either.

> Moreover, there is no reason an editor could not have a capability to
> "colorize any string passed to a function named foo()."  Perhaps with some
> sort of configuration file that indicates which function names correspond
> to which languages, but also with presets. 

This is an interesting idea. Some counter-arguments: 
* Anything that's hidden behind a config file won't be used except by very few. 
So, as you say, you need presets somehow. 
* Using presets for something simple like html() would render a lot of 
existing code differently than before this change. I don't think this is 
acceptable.
* The idea that "when a function named X is called, the parameter should be 
highlighted with language X" seems complicated to implement in a code editor. 
* Will it apply for all arguments, just the first one, or all strings? 

Due to the above I think it makes more sense to tag _the string_, not the 
calling function. 

> The details could be worked
> out, and maybe even an informal lexicon could be developed in a shared way.
> But all we save with more syntax is two characters.  And the function style
> is exactly what JavaScript tagged strings do anyway, just as a shorthand
> for "call a function".  Compare:
> header = html`Hello`
> header = html("Hello")

The point here is not saving characters typed, it's tagging a string so it's 
easy for an editor to highlight it. For the reasons I listed above the two 
versions above are not equivalent.

> If we imagine that your favorite editor does the same colorization inside
> the wrapped string either way, how are these really different?

If there was a chance this could happen, it would solve my problem nicely. For 
the reasons above, I don't think this will be acceptable to editors.
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/WCLJ6F6BNXLTH6ZQPWTOKO6OWGAIRO5Z/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-17 Thread Bruce Leban
On Sat, Dec 17, 2022 at 9:43 AM  wrote:

>
> Your reply could easily be read as "this is a bad idea, and you shouldn't
> have bothered writing it down". I hope that was not your intention, and
> instead it comes from handling self-indulgent people expecting things from
> you all day. I know, I get those requests too. I'll assume that was not
> your intention in my answers below.


>
> Barry Scott wrote:
> > I think this has been discussed before and rejected.
>
> Do you have a link to that discussion, or is this just from memory? What
> should I search for to find this discussion? Why was it rejected?
>

Try googling "python-ideas string prefixes". Doing minimal diligence is a
reasonable expectation before writing up an idea.

> > If the tags are called as functions then you can do it today with this:
> > def html(s):
> >     return s
> > HEAD = html('')
>
> If I'm not missing anything, this doesn't help with syntax highlighting?
> Highlighting is the problem I'm talking about in my post above.
>

Not true. A syntax highlighter can certainly recognize html('...') just as
it can recognize html'...'.

--- Bruce
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/CZGJIZLCI5DVUKYIZENNADITYXNCRKLX/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-17 Thread emil
Hi Barry,

Your reply could easily be read as "this is a bad idea, and you shouldn't have 
bothered writing it down". I hope that was not your intention, and instead it 
comes from handling self-indulgent people expecting things from you all day. I 
know, I get those requests too. I'll assume that was not your intention in my 
answers below.

Barry Scott wrote:
> I think this has been discussed before and rejected.

Do you have a link to that discussion, or is this just from memory? What should 
I search for to find this discussion? Why was it rejected?

> You need 3 things to happen
> (1) a syntax change in python that is acceptable
> (2) a significant editor to support syntax highlighting for that python 
> change.
> (3) someone willing to write and support the feature in the python code base

I understand all these 3 things are needed. I'm saying that I think this 
feature is worth it. Do you mean I should do things in a different order? We are 
in the idea stage, before a (1) strict syntax can be suggested.

> Will you write and support the code?

Is committing to write the code a requirement to suggest an idea? Of course this 
is required down the line, but let's see if this is a good idea first?

> If the tags are called as functions then you can do it today with this:
> def html(s):
>     return s
> HEAD = html('')

If I'm not missing anything, this doesn't help with syntax highlighting? 
Highlighting is the problem I'm talking about in my post above.

Regards,
Emil
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/TTED5M26SNXR6JKI4RAS77RGEHXOGR2O/


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-17 Thread David Mertz, Ph.D.
My impression whenever this idea is proposed is like Barry's.  The "win"
isn't big enough to justify not simply using named functions.

Balancing out the slight "win" is the much larger loss of adding additional
complexity to the Python language. New grammar, new parser, possibly new
semantics if tagged strings are more than exclusively decorative.  It's not
a *huge* complexity, but it's more than zero, and these keep adding up.

Python is SO MUCH less simple than it was when I learned it in 1998.  While
each individual change might have its independent value, it is now hard to
describe Python as a "simple language."

Moreover, there is no reason an editor could not have a capability to
"colorize any string passed to a function named foo()."  Perhaps with some
sort of configuration file that indicates which function names correspond
to which languages, but also with presets.  The details could be worked
out, and maybe even an informal lexicon could be developed in a shared way.

But all we save with more syntax is two characters.  And the function style
is exactly what JavaScript tagged strings do anyway, just as a shorthand
for "call a function".  Compare:

header = html`Hello`
header = html("Hello")

If we imagine that your favorite editor does the same colorization inside
the wrapped string either way, how are these really different?
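
For concreteness, here is a minimal sketch of the function style in today's Python, showing both a purely decorative marker and one that carries a little extra semantics. The Markup class and the css helper are hypothetical names invented for this illustration, and the markup passed to html() is made up:

```python
def html(s):
    """Decorative marker: returns the string unchanged."""
    return s


class Markup(str):
    """Hypothetical str subclass that remembers which language it holds."""

    language = "text"


def css(s):
    """Marker that also records the language on the returned value."""
    value = Markup(s)
    value.language = "css"
    return value


header = html("<h1>Hello</h1>")
style = css(".calendar { background: pink }")

print(header)
print(style, style.language)
```

Either way the call site is only two characters longer than a tagged literal would be, and an editor can key on the function name just as it could on a prefix.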


[Python-ideas] Re: Idea: Tagged strings in python

2022-12-17 Thread Barry Scott


> On 17 Dec 2022, at 16:07, e...@emilstenstrom.se wrote:
> 
> Hi everyone!
> 
> I'm the maintainer of a small django library called django-components. I've 
> run into a problem that I have a language-level solution (tagged strings) to, 
> that I think would benefit the wider python community.
> 
> *Problem*
> A component in my library is a combination of python code, html, css and 
> javascript. Currently I glue things together with a python file, where you 
> put the paths to the html, css and javascript. When run, it brings all of the 
> files together into a component. But for small components, having to juggle 
> four different files around is cumbersome, so I've started to look for a way 
> to put everything related to the component _in the same file_. This makes it 
> much easier to work on, understand, and with fewer places to make path errors.
> 
> Example:
> class Calendar(component.Component):
>     template_string = ''
>     css_string = '.calendar { background: pink }'
>     js_string = 'document.getElementsByClassName("calendar)[0].onclick = function() { alert("click!") }'
> 
> Seems simple enough, right? The problem is: There's no syntax highlighting in 
> my code editor for the three other languages. This makes for a horrible 
> developer experience, where you constantly have to hunt for characters inside 
> of strings. You saw the missing quote in js_string right? :)
> 
> If I instead use separate files, I get syntax highlighting and 
> auto-completion for each file, because editors set language based on file 
> type. But should I really have to choose?
> 
> *Do we need a python language solution to this?*
> Could the code editors fix this? There's a long issue thread for vscode where 
> this is discussed: https://github.com/Microsoft/vscode/issues/1751 - The 
> reasoning (reasonable imho) is that this is not something that can be done 
> generally, but that it needs to be handled at the python vscode extension 
> level. Makes sense.
> 
> Could the vscode language extension fix this? Well, the language extension 
> has no way to know which language it should highlight, that is, whether a 
> string is HTML or CSS. PyCharm has decided to use a "special python comment", 
> # language=html, that makes the next string be highlighted in that language. 
> 
> So if just all editors could standardize on that comment, everything would 
> work? I guess so, but is that really the most intuitive API to standardize 
> around? If the next statement is not a string, what happens? If the comment 
> is on the same line as another statement, does it affect that line, or the 
> next? What if there's a newline in between the comment and the string, does 
> that work?
> 
> *Suggested solution*
> I suggest supporting _tagged strings_ in python. They would look like 
> html''. 
> * Python should not hold a list of which tagged strings it should support, it 
> should be possible to use any tag. 
> * To avoid clashes with current raw strings and unicode strings, a tag should 
> be required to be at least 2 characters long (I'm open to other ways to avoid 
> this).
> 
> I like this syntax because:
> 1. It's clear what string the tag is affecting. 
> 2. It makes sense when you read it, even though you've never seen the syntax 
> before.
> 3. It clearly communicates which language to highlight to code editors, since 
> you can use the language identifiers that already exist: 
> https://code.visualstudio.com/docs/languages/identifiers#_known-language-identifiers
>  - for single letter languages, which are not supported to avoid clash with 
> raw strings and unicode strings, the language extension would have to support 
> "r-lang" and "c-lang" instead.
> 4. It mimics the syntax of tagged string templates in javascript 
> (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#tagged_templates).
>  So it has some precedent. 
> 
> (If desirable, I think mimicking javascript further and making tagged strings 
> call a function with the tag's name, would be a great addition to Python too. 
> This would make the syntax for parsing foreign languages much nicer. But this 
> is not required for my specific problem, it's just a nice next possible step 
> for this feature.)
> 
> *Backwards compatibility*
> This syntax currently raises an invalid syntax error. So introducing this 
> shouldn't break existing programs. Python's currently supported string types 
> are just single letter, so the suggestion is to require tagged strings to be 
> at least two letters. 
> 
> *Feedback?*
> What are your thoughts on this? Do you see a value in adding tagged strings 
> to python? Are there other use-cases where this would be useful? Does the 
> suggestion need to support calling tags as functions like in javascript to be 
> interesting?
> 
> (I'm new to python-ideas, so I hope I haven't broken some unspoken rule with 
> this suggestion.)
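
The backwards-compatibility claim in the quoted post can be checked against whichever interpreter is at hand. This is a small sketch; the compiles helper is a hypothetical name, and the literals are arbitrary. Note that two-character prefixes such as rb and fr already exist, so a "two letters or more" rule would still need to reserve those:

```python
def compiles(source):
    """Return True if `source` is accepted by this interpreter's parser."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# Prefixes that Python already accepts, including two-character ones.
existing_prefixes = ["r'x'", "b'x'", "u'x'", "f'x'", "rb'x'", "fr'x'"]

# The proposed tagged-string spelling, which is not valid syntax today.
proposed_tags = ["html'x'", "css'x'"]

for literal in existing_prefixes + proposed_tags:
    print(literal, "accepted" if compiles(literal) else "SyntaxError")
```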

I think this has been discussed before and rejected.

You need 3 things to happen
(1) a