[Python-ideas] Re: Custom string prefixes

2019-09-03 Thread Andrew Barnert via Python-ideas
On Sep 3, 2019, at 06:17, Rhodri James  wrote:
> 
>> On 03/09/2019 13:31, Chris Angelico wrote:
>>> On Tue, Sep 3, 2019 at 10:27 PM Rhodri James  wrote:
>>> 
 On 31/08/2019 12:31, Chris Angelico wrote:
 We call it a string, but a bytes object has as much in common with
 bytearray and with a list of integers as it does with a text string.
>>> 
>>> You say that as if text strings aren't sequences of bytes.  Complicated
>>> and restricted sequences, I grant you, but no more so than a packet for
>>> a given network protocol.
>>> 
>> A text string is a sequence of characters. By "byte", I really mean
>> "octet", but Python prefers to say "byte".
> 
> And a character is a byte or sequence of bytes. (Odd-sized bytes are pretty 
> much history now, so for non-pedantic usages "byte" is good enough.)

Forget about bytes vs. octets; this still isn’t a useful perspective.

A character is a grapheme cluster, a sequence of one or more code points. A 
code point is an integer between 0 and 0x10FFFF (roughly 1.1 million). A string is a flattened sequence 
of grapheme clusters—that is, a sequence of code points. (Python ignores the 
cluster part, pretending code points are characters, at the cost of requiring 
every application to handle normalization manually. Which is normally a good 
tradeoff, but it does mean that you can’t even say whether two sequences of 
code points are the same string without calling a function.)
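(A quick interpreter session makes the point; 'é' here is just one example of a character with more than one code-point spelling:)

>>> import unicodedata
>>> s1 = "\u00e9"        # 'é' as a single precomposed code point
>>> s2 = "e\u0301"       # 'e' followed by COMBINING ACUTE ACCENT
>>> s1 == s2             # the same character to a reader, different code points
False
>>> unicodedata.normalize("NFC", s1) == unicodedata.normalize("NFC", s2)
True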

Meanwhile, there are multiple ways to store those code points as bytes. Python 
does whatever it wants under the covers, hiding it from the user. Obviously 
there is _some_ array of bytes somewhere in memory that represents the 
characters of the string in some way (I say “obviously”, but that isn’t always 
true in Swift, and frequently isn’t true in Haskell…), but you don’t have 
access to that. If you want a sequence of bytes, you have to ask for a sequence 
in some specific representation, like UTF-8 or UTF-16-BE or Shift-JIS, which it 
creates for you on the fly (albeit cached in a few special cases).
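(For example — substituting Latin-1 for Shift-JIS, since 'é' has no Shift-JIS encoding — the same five code points give three different byte sequences depending on which representation you ask for:)

>>> s = "héllo"
>>> s.encode("utf-8")
b'h\xc3\xa9llo'
>>> s.encode("utf-16-be")
b'\x00h\x00\xe9\x00l\x00l\x00o'
>>> s.encode("latin-1")
b'h\xe9llo'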

So, from your system programmer’s perspective, in what useful sense is a 
character, or a string, a sequence of bytes?

And this is all still ignoring the fact that in Python, all values are “boxed” 
in an opaque structure that you can’t access from within the language, and even 
from the C API of CPython the box structure isn’t part of the API, so even 
something simpler like, say, an int isn’t usefully a sequence of 30-bit digits 
from the system programmer’s perspective, it’s an opaque handle that you can 
pass to functions to _obtain_ a sequence of 30-bit digits. (In the case of 
strings, you have to first pass the opaque handle to one function to see what 
format to ask for, then pass it to another to obtain a sequence of 1, 2, or 
4-byte integers representing the code points in native-endian ASCII, UCS2, or 
UCS4. Which normally you don’t do—you ask for a UTF-8 string or a UTF-32 string 
that may get constructed on the fly—but if you really do want the actual 
storage, this is the way to get it.)
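(You can get a rough view of those 1-, 2-, and 4-byte kinds from pure Python, at least on CPython, without touching the C API; a sketch only, since the exact byte counts are an implementation detail that varies by version:)

import sys

# The three sizes grow by roughly 1000, 2000 and 4000 bytes over the fixed
# object header, reflecting the 1-, 2- and 4-byte storage kinds.
print(sys.getsizeof("a" * 1000))            # all code points < 128
print(sys.getsizeof("\u0394" * 1000))       # BMP code points need 2 bytes each
print(sys.getsizeof("\U0001F600" * 1000))   # astral code points need 4 bytes each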

And most of this is not peculiar to Python. In Swift, a string is a sequence of 
grapheme clusters. In Java, it’s a sequence of UTF-16 code units. In Go, it’s a 
sequence of UTF-8 code units. In Haskell, it’s a lazy linked list of code 
points. And so on. In some of those cases, a character does happen to be 
represented as a string of bytes within a larger representation, but even when 
it is, that still doesn’t mean you can usefully access it that way.

Of course a text file on disk is a sequence of bytes, and (if you know the 
encoding and normalization) you could operate directly on those. But you don’t; 
you pass the byte strings to a function that decodes them (and then sometimes 
to a second function that normalizes them into a canonical form) and then use 
your language’s string functions on the result. In fact, you probably don’t 
even do that; you let the file object buffer the byte strings however it wants 
to and just hand you decoded text objects, so you don’t even know which byte 
substrings exist in memory at any given time. (Languages with powerful 
optimizers or macro systems like Haskell or Rust might actually do that by 
translating all your string-function calls into calls directly on the stream of 
bytes, but from your perspective that’s entirely under the covers, and you’re 
doing the same thing you do in Python.)
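(Concretely, the two styles look like this; "example.txt" is just a placeholder name:)

# Doing it by hand: read raw bytes, then decode them yourself.
with open("example.txt", "rb") as f:
    text = f.read().decode("utf-8")

# The usual way: let the file object buffer and decode; you never see the bytes.
with open("example.txt", encoding="utf-8") as f:
    text = f.read()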



[Python-ideas] Re: Custom string prefixes

2019-09-03 Thread Chris Angelico
On Wed, Sep 4, 2019 at 12:43 AM Rhodri James  wrote:
>
> On 03/09/2019 15:27, Chris Angelico wrote:
> > On Tue, Sep 3, 2019 at 11:19 PM Rhodri James  wrote:
> >>
> >> On 03/09/2019 13:31, Chris Angelico wrote:
> >>> On Tue, Sep 3, 2019 at 10:27 PM Rhodri James  wrote:
> 
>  On 31/08/2019 12:31, Chris Angelico wrote:
> > We call it a string, but a bytes object has as much in common with
> > bytearray and with a list of integers as it does with a text string.
> 
>  You say that as if text strings aren't sequences of bytes.  Complicated
>  and restricted sequences, I grant you, but no more so than a packet for
>  a given network protocol.
> 
> >>>
> >>> A text string is a sequence of characters. By "byte", I really mean
> >>> "octet", but Python prefers to say "byte".
> >>
> >> And a character is a byte or sequence of bytes. (Odd-sized bytes are
> >> pretty much history now, so for non-pedantic usages "byte" is good 
> >> enough.)
> >>
> >
> > But a character is not an octet.
>
> I get that you're distinguishing between the thing and its
> representation, but I'm coming at this as an embedded systems engineer.
> For me, it's turtles^Woctets all the way down.
>

Is an integer also a sequence of bytes? A float? A list? At some
level, everything's just stored as bytes in memory, but since there
are many possible representations of the same information, it's best
not to say that a character "is" a byte, but that it "can be stored
in" some number of bytes.

In Python, subscripting a text string gives you another text string.
Subscripting a list of integers gives you an integer. Subscripting a
bytearray gives you an integer. And (as of Python 3.0) subscripting a
bytestring also gives you an integer. Whether that's right or wrong
(maybe subscripting a bytestring should have been defined as yielding
a length-1 bytestring), subscripting a text string does not give an
integer, and subscripting a bytestring does not give a character.
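A quick session shows the asymmetry:

>>> "abc"[0]
'a'
>>> b"abc"[0]
97
>>> bytearray(b"abc")[0]
97
>>> [97, 98, 99][0]
97
>>> b"abc"[0:1]    # slicing, unlike indexing, does preserve the type
b'a'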

ChrisA


[Python-ideas] Re: Custom string prefixes

2019-09-03 Thread Rhodri James

On 03/09/2019 15:27, Chris Angelico wrote:

On Tue, Sep 3, 2019 at 11:19 PM Rhodri James  wrote:


On 03/09/2019 13:31, Chris Angelico wrote:

On Tue, Sep 3, 2019 at 10:27 PM Rhodri James  wrote:


On 31/08/2019 12:31, Chris Angelico wrote:

We call it a string, but a bytes object has as much in common with
bytearray and with a list of integers as it does with a text string.


You say that as if text strings aren't sequences of bytes.  Complicated
and restricted sequences, I grant you, but no more so than a packet for
a given network protocol.



A text string is a sequence of characters. By "byte", I really mean
"octet", but Python prefers to say "byte".


And a character is a byte or sequence of bytes. (Odd-sized bytes are
pretty much history now, so for non-pedantic usages "byte" is good enough.)



But a character is not an octet.


I get that you're distinguishing between the thing and its 
representation, but I'm coming at this as an embedded systems engineer. 
For me, it's turtles^Woctets all the way down.


--
Rhodri James *-* Kynesim Ltd


[Python-ideas] Re: Custom string prefixes

2019-09-03 Thread Chris Angelico
On Tue, Sep 3, 2019 at 11:19 PM Rhodri James  wrote:
>
> On 03/09/2019 13:31, Chris Angelico wrote:
> > On Tue, Sep 3, 2019 at 10:27 PM Rhodri James  wrote:
> >>
> >> On 31/08/2019 12:31, Chris Angelico wrote:
> >>> We call it a string, but a bytes object has as much in common with
> >>> bytearray and with a list of integers as it does with a text string.
> >>
> >> You say that as if text strings aren't sequences of bytes.  Complicated
> >> and restricted sequences, I grant you, but no more so than a packet for
> >> a given network protocol.
> >>
> >
> > A text string is a sequence of characters. By "byte", I really mean
> > "octet", but Python prefers to say "byte".
>
> And a character is a byte or sequence of bytes. (Odd-sized bytes are
> pretty much history now, so for non-pedantic usages "byte" is good enough.)
>

But a character is not an octet.

ChrisA


[Python-ideas] Re: Custom string prefixes

2019-09-03 Thread Rhodri James

On 03/09/2019 13:31, Chris Angelico wrote:

On Tue, Sep 3, 2019 at 10:27 PM Rhodri James  wrote:


On 31/08/2019 12:31, Chris Angelico wrote:

We call it a string, but a bytes object has as much in common with
bytearray and with a list of integers as it does with a text string.


You say that as if text strings aren't sequences of bytes.  Complicated
and restricted sequences, I grant you, but no more so than a packet for
a given network protocol.



A text string is a sequence of characters. By "byte", I really mean
"octet", but Python prefers to say "byte".


And a character is a byte or sequence of bytes. (Odd-sized bytes are 
pretty much history now, so for non-pedantic usages "byte" is good enough.)



--
Rhodri James *-* Kynesim Ltd


[Python-ideas] Re: Custom string prefixes

2019-09-03 Thread Chris Angelico
On Tue, Sep 3, 2019 at 10:27 PM Rhodri James  wrote:
>
> On 31/08/2019 12:31, Chris Angelico wrote:
> > We call it a string, but a bytes object has as much in common with
> > bytearray and with a list of integers as it does with a text string.
>
> You say that as if text strings aren't sequences of bytes.  Complicated
> and restricted sequences, I grant you, but no more so than a packet for
> a given network protocol.
>

A text string is a sequence of characters. By "byte", I really mean
"octet", but Python prefers to say "byte".

ChrisA


[Python-ideas] Re: Custom string prefixes

2019-09-03 Thread Rhodri James

On 31/08/2019 12:31, Chris Angelico wrote:

We call it a string, but a bytes object has as much in common with
bytearray and with a list of integers as it does with a text string.


You say that as if text strings aren't sequences of bytes.  Complicated 
and restricted sequences, I grant you, but no more so than a packet for 
a given network protocol.


--
Rhodri James *-* Kynesim Ltd


[Python-ideas] Re: Custom string prefixes

2019-09-02 Thread Chris Angelico
On Mon, Sep 2, 2019 at 9:56 PM Steven D'Aprano  wrote:
>
> On Sun, Sep 01, 2019 at 12:24:24PM +1000, Chris Angelico wrote:
>
> > Older versions of Python had text and bytes be the same things.
>
> Whether a string object is *text* is a semantic question, and
> independent of what data format you use. 'Hello world!' is text, whether
> you are using Python 1.5 or Python 3.8. '\x01\x06\x13\0' is not text,
> whether you are using Python 1.5 or Python 3.8.

Okay, so "string" and "text" are completely different concepts. Hold
that thought.

> > That
> > means that, for backward compatibility, they have some common methods.
> > But does that really mean that bytes can be uppercased?
>
> I'm curious what you think that b'chris angelico'.upper() is doing, if
> it is not uppercasing the byte-string b'chris angelico'. Is it a mere
> accident that the result happens to be b'CHRIS ANGELICO'?
>
> Unicode strings are sequences of code-points, abstract integers between
> 0 and 1114111 inclusive. When you uppercase the Unicode string 'chris
> angelico', you're transforming the sequence of integers:
>
> U+0063,0068,0072,0069,0073,0020,0061,006e,0067,0065,006c,0069,0063,006f
>
> to this sequence of integers:
>
> U+0043,0048,0052,0049,0053,0020,0041,004e,0047,0045,004c,0049,0043,004f
>
> If you are prepared to call that "uppercasing", you should be prepared
> to do the same for the byte-string equivalent.
>
> (For the avoidance of doubt: this is independent of the encoding used to
> store those code points in memory or on disk. Encodings have nothing to
> do with this.)

No, they're not decoded. What happens is an *assumption* that certain
bytes represent uppercaseable characters, and others do not. I
specifically chose my example such that the corresponding code points
both represented letters, and that the uppercased versions of each
land inside the first 256 Unicode codepoints; yet uppercasing the
bytestring changes one and not the other. Is it uppercasing the number
0x61 to create the number 0x41? No, it's assuming that it means "a"
and uppercasing it to "A".
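Putting the two operations from that example side by side:

>>> b"\xe7\x61".upper()                              # only the ASCII byte 0x61 changes
b'\xe7A'
>>> b"\xe7\x61".decode("latin-1").upper().encode("latin-1")
b'\xc7A'                                             # 0xe7 ('ç') is uppercased too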

> The formal definition of a string is a sequence of symbols from an
> alphabet. That is precisely what bytes objects are: the alphabet in this
> case is the 8-bit numbers 0 to 255 inclusive, which for usefulness,
> convenience and backwards compatibility can be optionally interpreted as
> the 7-bit ASCII character set plus another 128 abstract "characters".
>
>
> > > I said they were *strings*. Strings are not necessarily text, although
> > > they often are. Formally, a string is a finite sequence of symbols that
> > > are chosen from a set called an alphabet. See:
> > >
> > > https://en.wikipedia.org/wiki/String_%28computer_science%29
> >
> > A finite sequence of symbols... you mean like a list of integers
> > within the range [0, 255]? Nothing in that formal definition says that
> > a "string" of anything other than characters should be meaningfully
> > treated as text.
>
> Sure. If your bytes don't represent text, then methods like upper()
> probably won't do anything meaningful. It's still a string though.

I specifically said a *list* of integers. Like what you'd get if you
call list() on a bytestring. There's nothing in the formal definition
you gave that precludes this from being considered a string, yet it is
somehow, by your own words, fundamentally different.

> > > > I don't think it's necessary to be too adamant about "must be some
> > > > sort of thing-we-call-string" here. Let practicality rule, since
> > > > purity has already waved a white flag at us.
> > >
> > > It is because of *practicality* that we should prefer that things that
> > > look similar should be similar. Code is read far more often that it is
> > > written, and if you read two pieces of code that look similar, we should
> > > strongly prefer that they should actually be similar.
> >
> > And you have yet to prove that this similarity is actually a thing.
>
> I'm not sure the onus is on me to prove this. "Status quo wins a
> stalemate." And surely the onus is on those proposing the new syntax to
> demonstrate that it will be fine to use string delimiters as function
> calls.

Actually it is, because YOU are the one who said that quoted strings
should be restricted to "string-like" things. Would a Path literal be
sufficiently string-like to be blessed with double quotes? A regex
literal? An IP header, represented as a bytestring? What's a string
and what's not? Why are you trying to draw a line?

> You could make a good start by finding other languages, reasonably
> conventional languages with syntax based on the Algol or C tradition,
> that use quotes '' or "" to return arbitrary types.

I gave an example wherein a list/array is represented as
";foo;bar;quux" - does that count? (VX-REXX, if you're curious.)

> Anyway, the bottom line is this:
>
> I have no objection to using prefixed quotes to represent Unicode
> strings, or byte strings, or Andrew's hypothetical UTF-16 strings, or
> 

[Python-ideas] Re: Custom string prefixes

2019-09-02 Thread Steven D'Aprano
On Sun, Sep 01, 2019 at 12:24:24PM +1000, Chris Angelico wrote:

> Older versions of Python had text and bytes be the same things.

Whether a string object is *text* is a semantic question, and 
independent of what data format you use. 'Hello world!' is text, whether 
you are using Python 1.5 or Python 3.8. '\x01\x06\x13\0' is not text, 
whether you are using Python 1.5 or Python 3.8.


> That
> means that, for backward compatibility, they have some common methods.
> But does that really mean that bytes can be uppercased?

I'm curious what you think that b'chris angelico'.upper() is doing, if 
it is not uppercasing the byte-string b'chris angelico'. Is it a mere 
accident that the result happens to be b'CHRIS ANGELICO'?

Unicode strings are sequences of code-points, abstract integers between 
0 and 1114111 inclusive. When you uppercase the Unicode string 'chris 
angelico', you're transforming the sequence of integers:

U+0063,0068,0072,0069,0073,0020,0061,006e,0067,0065,006c,0069,0063,006f

to this sequence of integers:

U+0043,0048,0052,0049,0053,0020,0041,004e,0047,0045,004c,0049,0043,004f

If you are prepared to call that "uppercasing", you should be prepared 
to do the same for the byte-string equivalent.

(For the avoidance of doubt: this is independent of the encoding used to 
store those code points in memory or on disk. Encodings have nothing to 
do with this.)
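(The same transformation, spelled out in an interpreter; shortened to 'chris' to keep the lines readable:)

>>> [f"U+{ord(c):04x}" for c in "chris"]
['U+0063', 'U+0068', 'U+0072', 'U+0069', 'U+0073']
>>> [f"U+{ord(c):04x}" for c in "chris".upper()]
['U+0043', 'U+0048', 'U+0052', 'U+0049', 'U+0053']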


[...]
> Or is it that
> we allow bytes to be treated as ASCII-encoded text, which is then
> uppercased, and then returned to being bytes?

I'm fairly confident that bytes methods aren't implemented by decoding 
to Unicode, applying the method, then re-encoding back to bytes. But 
even if they were, that's just an implementation detail.

Imagine a float method that internally converted the float to a pair of 
integers (numerator/denominator), operated on that fraction, and then 
re-converted back to a float. I'm sure you wouldn't want to say that 
this proves that floats aren't numbers.

The same applies to byte-strings. In the unlikely case that byte methods 
delegate to str methods, that doesn't mean byte-strings aren't strings. 
It just means that two sorts of strings can share a single 
implementation for their methods. Code reuse for the win!


[...]
> > py> b"\xe7\x61".upper()
> > b'\xe7A'
> >
> > Whether it is *meaningful* to do so is another question. But the same
> > applies to str.upper: just because you can call the method doesn't mean
> > that the result will be semantically valid.
> 
> So what did you actually do here? You took some bytes that represent
> an integer, 

For the sake of the argument I'll accept that *this particular* byte 
string represents an integer rather than a series of mixed binary data 
and ASCII text, or text in some unusual encoding, or pixels in an image, 
or any of a million other things it could represent.

That's absolutely fine: if it doesn't make sense to call .upper() on 
your bytes, then don't call .upper() on them. Precisely as you wouldn't 
call .upper() on a str object, if it didn't make sense to do so.


> and you called a method on it that makes no sense
> whatsoever, because now it represents a different integer.

The same applies to Unicode strings too. Any Unicode string method that 
transforms the input returns something that represents a different 
sequence of code-points, hence a different sequence of integers.

Shall we agree that neither bytes nor Unicode are strings? No, I don't 
think so either :-)


> If I were to decode that string to text
> and THEN uppercase it, it might give a quite different result:

Sure. If you perform *any* transformation on the data first, it might 
give a different result on uppercasing:

- if you reverse the bytes, uppercasing gives a different result;

- if you replace b'a' with b'e', uppercasing gives a different result

etc. And exactly the same observation applies to str objects:

- if you reverse the characters, uppercasing gives a different result;

- if you replace 'a' with 'e', uppercasing gives a different result.


> And if you choose some other encoding than Latin-1, you might get
> different results again.

Sure. The bytes methods like .upper() etc are predicated on the 
assumption that your bytes represent ASCII text. If your bytes represent 
something else, then calling the .upper() method may not be meaningful 
or useful.

In other words... if your bytes string came from an ASCII text file, 
it's probably safe to uppercase it. If your bytes string came from a 
JPEG, then uppercasing them will probably make a mess of the image, if 
not corrupt the file. So don't do that :-)

Analogy: ints support the unary minus operator. But if your int 
represents a mass, then negating it isn't meaningful. There's no such 
thing as -5 kg. Should we conclude from this that the int type in Python 
doesn't represent a number, and that the support of numeric operators 
and methods is merely for backwards compatibility? I think not.

The formal definition of 

[Python-ideas] Re: Custom string prefixes

2019-09-02 Thread Ivan Levkivskyi
On Mon, 2 Sep 2019 at 07:04, Pasha Stetsenko  wrote:

> > Don't say that this proposal won't be abused. Every one of the OP's
> > motivating examples is an abuse of the syntax, returning non-strings
> > from something that looks like a string.
>
> If you strongly believe that if something looks like a string it ought to
> quack like a string too, then we can consider 2 potential remedies:
>
> 1. Change the delimiter, for example use curly braces: `re{abc}`. This
> would still be parseable, since currently an id cannot be followed by a set
> or a dict. (The forward-slash, on the other hand, will be ambiguous).
>
> 2. We can also leave the quotation marks as delimiters. Once this feature
> is implemented, the IDEs will update their parsers, and will be emitting a
> token of "user-defined literal" type. Simply setting the color for this
> token to something different than your preferred color for strings will
> make it visually clear that those tokens aren't strings. Hence, no
> possibility for confusion.
>

Just to add my 2 cents: there are always two sides in each language
proposal: more flexibility/usability, and more language complexity.
These need to be compared and the comparison is hard because it is often
subjective. FWIW, I think in this case the added complexity outweighs
the benefits. I think only the very widely used literals (like numbers)
deserve their own syntax. For everything else it is fine to have a few extra
keystrokes.

--
Ivan


[Python-ideas] Re: Custom string prefixes

2019-09-02 Thread Pasha Stetsenko
> Don't say that this proposal won't be abused. Every one of the OP's 
> motivating examples is an abuse of the syntax, returning non-strings 
> from something that looks like a string.

If you strongly believe that if something looks like a string it ought to quack 
like a string too, then we can consider 2 potential remedies:

1. Change the delimiter, for example use curly braces: `re{abc}`. This would 
still be parseable, since currently an id cannot be followed by a set or a 
dict. (The forward-slash, on the other hand, will be ambiguous).

2. We can also leave the quotation marks as delimiters. Once this feature is 
implemented, the IDEs will update their parsers, and will be emitting a token 
of "user-defined literal" type. Simply setting the color for this token to 
something different than your preferred color for strings will make it visually 
clear that those tokens aren't strings. Hence, no possibility for confusion.


[Python-ideas] Re: Custom string prefixes

2019-08-31 Thread Chris Angelico
On Sun, Sep 1, 2019 at 10:47 AM Steven D'Aprano  wrote:
>
> On Sat, Aug 31, 2019 at 09:31:15PM +1000, Chris Angelico wrote:
> > On Sat, Aug 31, 2019 at 8:44 PM Steven D'Aprano  wrote:
> > > > So b"abc" should not be allowed?
> > >
> > > In what way are byte-STRINGS not strings? Unicode-strings and
> > > byte-strings share a significant fraction of their APIs, and are so
> > > similar that back in Python 2.2 the devs thought it was a good idea to
> > > try automagically coercing from one to the other.
> > >
> > > I was careful to write *string* rather than *str*. Sorry if that wasn't
> > > clear enough.
> > >
> >
> > We call it a string, but a bytes object has as much in common with
> > bytearray and with a list of integers as it does with a text string.
>
> I don't think that's true.

Older versions of Python had text and bytes be the same things. That
means that, for backward compatibility, they have some common methods.
But does that really mean that bytes can be uppercased? Or is it that
we allow bytes to be treated as ASCII-encoded text, which is then
uppercased, and then returned to being bytes?

> py> b'abc'.upper()
> b'ABC'
>
> py> [1, 2, 3].upper()
> Traceback (most recent call last):
>   File "", line 1, in 
> AttributeError: 'list' object has no attribute 'upper'
>
> In Python2, byte-strings and Unicode strings were both subclasses of
> type basestring. Although we have moved away from that shared base class
> in Python3, it does demonstrate that conceptually bytes and str are
> closely related to each other.

Or does it actually demonstrate that Python 3 maintains backward
compatibility with Python 2?

> > Is the contents of a MIDI file a "string"? I would say no, it's not -
> > but it can *contain* strings, eg for metadata and lyrics.
> > You can't upper-case the
> > variable-length-integer b"\xe7\x61" any more than you can upper-case
> > the integer 13281.
>
> Of course you can.
>
> py> b"\xe7\x61".upper()
> b'\xe7A'
>
> Whether it is *meaningful* to do so is another question. But the same
> applies to str.upper: just because you can call the method doesn't mean
> that the result will be semantically valid.

So what did you actually do here? You took some bytes that represent
an integer, and you called a method on it that makes no sense
whatsoever, because now it represents a different integer. There's no
sense in which your new bytes object represents an "upper-cased
version of" the integer 13281. If I were to decode that string to text
and THEN uppercase it, it might give a quite different result:

>>> b"\xe7\x61".decode("Latin-1").upper().encode("Latin-1")
b'\xc7A'

And if you choose some other encoding than Latin-1, you might get
different results again. I put it to you that bytes.upper() exists
more for backward compatibility with Python 2 than because a bytes
object is, in some way, uppercaseable.

> source = "def spam():\n\tpass\n"
> source = source.upper()  # no longer valid Python source code.

But it started out as text, and it is now uppercase text. When you do
that with bytes, you have to first layer in "this is actually encoded
text", and you are then able to destroy that.

> > Bytes and text have a long relationship, and as such, there are
> > special similarities. That doesn't mean that bytes ARE text,
>
> I didn't say that bytes are (human-readable) text. Although they can be:
> not every application needs Unicode strings, ASCII strings are still
> special, and there are still applications where one has to mix binary
> and ASCII text data.
>
> I said they were *strings*. Strings are not necessarily text, although
> they often are. Formally, a string is a finite sequence of symbols that
> are chosen from a set called an alphabet. See:
>
> https://en.wikipedia.org/wiki/String_%28computer_science%29

A finite sequence of symbols... you mean like a list of integers
within the range [0, 255]? Nothing in that formal definition says that
a "string" of anything other than characters should be meaningfully
treated as text.

> > I don't think it's necessary to be too adamant about "must be some
> > sort of thing-we-call-string" here. Let practicality rule, since
> > purity has already waved a white flag at us.
>
> It is because of *practicality* that we should prefer that things that
> look similar should be similar. Code is read far more often than it is
> written, and if you read two pieces of code that look similar, we should
> strongly prefer that they should actually be similar.

And you have yet to prove that this similarity is actually a thing.

> Would you be happy with a Pythonesque language that used prefixed
> strings as the delimiter for arbitrary data types?
>
> mylist = L"1, 2, None, {}, L"", 99.5"
>
> mydict = D"key: value, None: L"", "abc": "xyz""
>
> myset = S"1, 2, None"

At some point it's meaningless to call it a "Pythonesque" language,
but I've worked with plenty of languages that simply do not have data
types this rich, and so everything is 

[Python-ideas] Re: Custom string prefixes

2019-08-31 Thread Steven D'Aprano
On Thu, Aug 29, 2019 at 11:19:58PM +0100, Rob Cliffe via Python-ideas wrote:

> Just curious:  Is there any reason not to make decimal.Decimal a 
> built-in type?

Yes: it is big and complex, with a big complex API that is overkill for 
the major motivating use-case for a built-in decimal type.

There might be a strong case for adding a fixed-precision decimal type, 
and leaving out the complex parts of the Decimal API: no variable 
precision, just a single rounding mode, no contexts, no traps. If you 
need the full API, use the decimal module; if you just need something 
like builtin floats, but in base 10, use the built-in decimal.
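(To be clear, you can approximate that "simple" decimal today by pinning one context; a sketch using the existing decimal module, not a proposal for the new type's API:)

from decimal import Context, Decimal, ROUND_HALF_EVEN

ctx = Context(prec=16, rounding=ROUND_HALF_EVEN)   # one precision, one rounding mode
print(ctx.add(Decimal("0.1"), Decimal("0.2")))     # 0.3 -- exact in base 10
print(0.1 + 0.2)                                   # 0.30000000000000004 with binary floats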

There have been at least two proposals. Neither has got as far as a 
PEP. If I recall correctly, the first suggested using Decimal64:

https://en.wikipedia.org/wiki/Decimal64_floating-point_format

the second suggested Decimal128:

https://en.wikipedia.org/wiki/Decimal128_floating-point_format


-- 
Steven


[Python-ideas] Re: Custom string prefixes

2019-08-31 Thread Steven D'Aprano
On Sat, Aug 31, 2019 at 09:31:15PM +1000, Chris Angelico wrote:
> On Sat, Aug 31, 2019 at 8:44 PM Steven D'Aprano  wrote:
> > > So b"abc" should not be allowed?
> >
> > In what way are byte-STRINGS not strings? Unicode-strings and
> > byte-strings share a significant fraction of their APIs, and are so
> > similar that back in Python 2.2 the devs thought it was a good idea to
> > try automagically coercing from one to the other.
> >
> > I was careful to write *string* rather than *str*. Sorry if that wasn't
> > clear enough.
> >
> 
> We call it a string, but a bytes object has as much in common with
> bytearray and with a list of integers as it does with a text string.

I don't think that's true.

py> b'abc'.upper()
b'ABC'

py> [1, 2, 3].upper()
Traceback (most recent call last):
  File "", line 1, in 
AttributeError: 'list' object has no attribute 'upper'

Shall I beat this dead horse some more by listing the other 33 methods 
that byte-strings share with Unicode-strings but not lists?

Compared to just two methods shared by all three of bytes, str and list, 
(namely count() and index()), and *zero* methods shared by bytes and 
list but not str.
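You can check the overlap yourself (the exact size of the first set depends on the Python version, which is why I'm not quoting its output):

py> methods = lambda t: {m for m in dir(t) if not m.startswith('_')}
py> sorted(methods(bytes) & methods(str) & methods(list))
['count', 'index']
py> methods(bytes) & (methods(list) - methods(str))
set()
py> len(methods(bytes) & methods(str) - methods(list))   # a few dozen, depending on version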

In Python2, byte-strings and Unicode strings were both subclasses of 
type basestring. Although we have moved away from that shared base class 
in Python3, it does demonstrate that conceptually bytes and str are 
closely related to each other.


> Is the contents of a MIDI file a "string"? I would say no, it's not -
> but it can *contain* strings, eg for metadata and lyrics.

Don't confuse *human-readable native language strings* with generic 
strings. "Hello world!" is a string, but so are '\x02^xs\0' and 
b'DEADBEEF'.


> You can't upper-case the
> variable-length-integer b"\xe7\x61" any more than you can upper-case
> the integer 13281.

Of course you can.

py> b"\xe7\x61".upper()
b'\xe7A'

Whether it is *meaningful* to do so is another question. But the same 
applies to str.upper: just because you can call the method doesn't mean 
that the result will be semantically valid.

source = "def spam():\n\tpass\n"
source = source.upper()  # no longer valid Python source code.


> Those common methods are mostly built on the
> assumption that the string contains ASCII text.

As they often do. If they don't, then don't call the text methods which 
don't make sense in context.

Just as there are cases where text methods don't make sense on Unicode 
strings. You wouldn't want to call .casefold() on a password, or 
.lstrip() on a line of Python source code.


[...]
> Bytes and text have a long relationship, and as such, there are
> special similarities. That doesn't mean that bytes ARE text, 

I didn't say that bytes are (human-readable) text. Although they can be: 
not every application needs Unicode strings, ASCII strings are still 
special, and there are still applications where one has to mix binary 
and ASCII text data.

I said they were *strings*. Strings are not necessarily text, although 
they often are. Formally, a string is a finite sequence of symbols that 
are chosen from a set called an alphabet. See:

https://en.wikipedia.org/wiki/String_%28computer_science%29



> I don't think it's necessary to be too adamant about "must be some
> sort of thing-we-call-string" here. Let practicality rule, since
> purity has already waved a white flag at us.

It is because of *practicality* that we should prefer that things that 
look similar should be similar. Code is read far more often than it is 
written, and if you read two pieces of code that look similar, we should 
strongly prefer that they should actually be similar.

Would you be happy with a Pythonesque language that used prefixed 
strings as the delimiter for arbitrary data types?

mylist = L"1, 2, None, {}, L"", 99.5"

mydict = D"key: value, None: L"", "abc": "xyz""

myset = S"1, 2, None"


That's what this proposal wants: string syntax that can return arbitrary 
data types.

How about using quotes for function calls?

assert chr"9" == "\t"

assert ord"9" == 57

That's what this proposal wants: string syntax for a subset of function 
calls.

Don't say that this proposal won't be abused. Every one of the OP's 
motivating examples is an abuse of the syntax, returning non-strings 
from something that looks like a string.



-- 
Steven


[Python-ideas] Re: Custom string prefixes

2019-08-31 Thread Chris Angelico
On Sat, Aug 31, 2019 at 8:44 PM Steven D'Aprano  wrote:
> > So b"abc" should not be allowed?
>
> In what way are byte-STRINGS not strings? Unicode-strings and
> byte-strings share a significant fraction of their APIs, and are so
> similar that back in Python 2.2 the devs thought it was a good idea to
> try automagically coercing from one to the other.
>
> I was careful to write *string* rather than *str*. Sorry if that wasn't
> clear enough.
>

We call it a string, but a bytes object has as much in common with
bytearray and with a list of integers as it does with a text string.
Is the contents of a MIDI file a "string"? I would say no, it's not -
but it can *contain* strings, eg for metadata and lyrics. The MIDI
file representation of an integer might be stored in a byte-string,
but the common API between text strings and byte strings is going to
be mostly irrelevant here. You can't upper-case the
variable-length-integer b"\xe7\x61" any more than you can upper-case
the integer 13281. Those common methods are mostly built on the
assumption that the string contains ASCII text.
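(For the record, that's a MIDI variable-length quantity: seven value bits per byte, with the high bit set on every byte except the last. A few lines decode it, and none of the str-ish methods are any help here:)

>>> value = 0
>>> for byte in b"\xe7\x61":
...     value = (value << 7) | (byte & 0x7F)
...
>>> value
13281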

There are a few string-like functions that truly can be used with
completely binary data, and which actually do make a lot more sense on
a byte string than on, say, a list of integers. Notably, finding a
particular byte sequence can be done without knowing what the bytes
actually mean (and similarly bytes.split(), which does the same sort
of search), and you can strip off trailing b"\0" without needing to
give much meaning to the content. But I cannot recollect *ever* using
these methods on any bytes object that wasn't storing some form of
encoded text.

Bytes and text have a long relationship, and as such, there are
special similarities. That doesn't mean that bytes ARE text, any more
than a compiled regex is text just because it's traditional to
describe a regex in a textual form. Path objects also blur the "is
this text?" line, since you can divide a Path by a string to
concatenate them, and there are ways of smuggling arbitrary bytes
through them.

I don't think it's necessary to be too adamant about "must be some
sort of thing-we-call-string" here. Let practicality rule, since
purity has already waved a white flag at us.

ChrisA


[Python-ideas] Re: Custom string prefixes

2019-08-31 Thread Steven D'Aprano
On Thu, Aug 29, 2019 at 02:10:21PM -0700, Andrew Barnert wrote:

[...]
> And most of the string affixes people have suggested are for 
> string-ish things.

I don't think that's correct. Looking back at the original post in this 
thread, here are the motivating examples:

[quote]

There are quite a few situations where this can be used:
- Fraction literals: `frac'123/4567'`
- Decimals: `dec'5.34'`
- Date/time constants: `t'2019-08-26'`
- SQL expressions: `sql'SELECT * FROM tbl WHERE a=?'.bind(a=...)`
- Regular expressions: `rx'[a-zA-Z]+'`
- Version strings: `v'1.13.0a'`
- etc.

[/quote]

By my count, that's zero out of six string-ish things. There may have 
been other proposals, but I haven't trawled through the entire thread to 
find them.
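For comparison, the spellings we have today for those examples are ordinary constructor calls, none of which returns anything string-ish (the SQL and version cases have no single stdlib equivalent, so I've left them out):

from fractions import Fraction
from decimal import Decimal
import datetime
import re

Fraction("123/4567")                        # frac'123/4567'
Decimal("5.34")                             # dec'5.34'
datetime.date.fromisoformat("2019-08-26")   # t'2019-08-26'
re.compile("[a-zA-Z]+")                     # rx'[a-zA-Z]+'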


> I’m not sure what a “version string” is, but I 
> might design that as an actual subclass of str that adds extractor 
> methods and overrides comparison.

A version object is a record with fields, most of which are numeric. 
For an existing example, see sys.version_info which is a kind of named 
tuple, not a string.
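For example (the exact numbers obviously depend on which interpreter you run this on; this is a 3.8 session):

py> import sys
py> sys.version_info.major, sys.version_info.minor   # numeric fields, not text
(3, 8)
py> sys.version_info >= (3, 6)                       # compares like a tuple of numbers
True
py> ".".join(map(str, sys.version_info[:3]))         # the string form is derived from it
'3.8.0'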

The version *string* is just a nice human-readable representation. It 
doesn't make sense to implement string methods on a Version object. Why 
would you offer expandtabs(), find(), splitlines(), translate(), 
isspace(), capitalise(), etc methods? Or * and + (repetition and 
concatenation) operators? I cannot think of a single string 
method/operator that a Version object should implement.


> A compiled regex isn’t literally a 
> string, but neither is a bytes; it’s still clearly _similar_ to a 
> string, in important ways. 

It isn't clear to me how a compiled regex object is "similar" to a 
string. The set of methods offered by both regexes and strings is pretty 
small; by my generous count it is just two methods:

- str.split and SRE_Pattern.split;

- str.replace and SRE_Pattern.sub

neither of which use the same API or have the same semantics. Compiled 
regex objects don't offer string methods like translate, isdigits, 
upper, encode, etc. I would say that they are clearly *not* strings.
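Compare their split methods, which don't even agree on what the argument means:

py> "a1b22c".split("2")
['a1b', '', 'c']
py> import re
py> re.compile(r"\d+").split("a1b22c")
['a', 'b', 'c']
py> re.compile(r"\d").sub("#", "a1b22c")    # and sub vs replace
'a#b##c'
py> "a1b22c".replace("2", "#")
'a1b##c'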


[...]
> And versions of the proposal that allow delimiters other than quotes 
> so you can write things like regex/a.*b/, well, I’d need to see a 
> specific proposal to be sure, but that seems even less objectionable 
> in this regard. That looks like nothing else in Python, but it looks 
> like a regex in awk or sed or perl, so I’d probably read it as a regex 
> object.

Why do you need the "regex" prefix? Assuming the parser and the human 
reader can cope with using / as both a delimiter and an operator (which 
isn't a given!) /.../ for a regex object seems fine to me.

I suspect that this is going to be ambiguous though:

target = regex/a*b/ +x

could be:

target = ((regex / a) * b) / ( unary-plus x)

or 

target = (regex object) + x

so maybe we do need a prefix.


> > Let me suggest some design principles that should hold for languages 
> > with more-or-less "conventional" syntax. Languages like APL or Forth 
> > excluded.
> > 
> > - anything using ' or " quotation marks as delimiters (with or without 
> >  affixes) ought to return a string, and nothing but a string;
> 
> So b"abc" should not be allowed?

In what way are byte-STRINGS not strings? Unicode-strings and 
byte-strings share a significant fraction of their APIs, and are so 
similar that back in Python 2.2 the devs thought it was a good idea to 
try automagically coercing from one to the other.

I was careful to write *string* rather than *str*. Sorry if that wasn't 
clear enough.


> Let’s say I created a native-UTF16-string type to deal with some 
> horrible Windows or Java stuff. Why would this principle of yours 
> suggest that I shouldn’t be allowed to use u16"" just like b""?

It is a utf16 STRING so making it look like a STRING is perfectly fine.


[...]
> > - as a strong preference, anything using quotation marks as delimiters
> >  ought to be processed at compile-time (f-strings are a conspicuous 
> >  exception to that principle);
> 
> I don’t see why you should even want to _know_ whether it’s true, much 
> less have a strong preference.

Because I care about performance, at least a bit. Because I don't want 
to write code that is unnecessarily slow, for some definition of 
"unnecessary". Because I want to be able to reason (at least in broad 
terms) about the cost of certain operations.

Because I want to be able to reason about the semantics of my code.

Why do I write 1234 instead of int("1234")? The second is longer, but it 
is more explicit and it is self-documenting: the reader knows that its 
an int because it says so right there in the code, even if they come 
from Javascript where 1234 is an IEEE-754 float.

Assuming the builtin int() hasn't be shadowed.

But it's also wastefully slow.

If we are genuinely indifferent to the difference, then we should be 
equally indifferent to a proposal to replace the LOAD_CONST byte-code 
for ints as follows:

dis("1234")  # 

[Python-ideas] Re: Custom string prefixes

2019-08-30 Thread Paul Moore
On Thu, 29 Aug 2019 at 22:12, Andrew Barnert via Python-ideas
 wrote:
> As I’ve said before, I believe that anything that doesn’t have a builtin type 
> does not deserve builtin syntax. And I don’t understand why that isn’t a 
> near-ubiquitous viewpoint. But it’s not just you; at least three people (all 
> of whom dislike the whole concept of custom affixes) seem at least in 
> principle open to the idea of adding builtin affixes for types that don’t 
> exist. Which makes me think it’s almost certainly not that you’re all crazy, 
> but that I’m missing something important. Can you explain it to me?

In my case, it's me that had missed something - namely the whole of this point.

I can imagine having builtin syntax for a stdlib type (like Decimal,
Fraction, or regex), but I agree that it gives the stdlib special
privileges which I'm uncomfortable with. I definitely agree that built-in
syntax for 3rd-party types is unacceptable.

That quite probably contradicts some of my earlier statements - just
assume I was wrong previously, I'm not going to bother going back over
what I said and correcting my comments :-) I remain of the opinion
that the benefits of user-defined literals would be sufficiently
marginal that they wouldn't justify the cost, though.

Paul


[Python-ideas] Re: Custom string prefixes

2019-08-29 Thread Andrew Barnert via Python-ideas
On Aug 29, 2019, at 16:58, Greg Ewing  wrote:
> 
> Steven D'Aprano wrote:
>> I don't think that stpa...@gmail.com means that the user literally assigns 
>> to locals() themselves. I read his proposal as having the compiler 
>> automatically mangle the names in some way, similar to name mangling inside 
>> classes.
> 
> Yes, but at some point you have to define a function to handle
> your string prefix. If it's at the module level then it's no
> problem, because you can do something like
> 
>   globals()["~f"] = lambda: ...

What happens if you do this, and then include "~f" in __all__, and then import 
* from that module?

I personally would rather have my prefixes or suffixes available in every 
module that imports them, without needing to manually register them each time. 
Not a huge deal, and if nobody else agrees, fine. But if I could __all__ it, I 
could get what I want anyway. :)



[Python-ideas] Re: Custom string prefixes

2019-08-29 Thread Greg Ewing

Steven D'Aprano wrote:
I don't think that stpa...@gmail.com means that the user literally 
assigns to locals() themselves. I read his proposal as having the 
compiler automatically mangle the names in some way, similar to name 
mangling inside classes.


Yes, but at some point you have to define a function to handle
your string prefix. If it's at the module level then it's no
problem, because you can do something like

   globals()["~f"] = lambda: ...

But you can't do that for locals. So mangling to something
unspellable would effectively preclude having string prefixes
local to a function.

--
Greg


[Python-ideas] Re: Custom string prefixes

2019-08-29 Thread Greg Ewing

Rhodri James wrote:
Suppose that we did have some funky mechanism to get the compiler to 
create objects at compile time


It doesn't necessarily have to be at compile time. It can be at run
time, as long as it only happens once.

So we use "start_date" somewhere, and mutate it because the start date 
for some purpose was different.  Then we use it somewhere else, and it's 
not the start date we thought it was.  This is essentially the mutable 
default argument gotcha, just writ globally.


I don't think this is as much of a problem as it seems. We often
assign things to globals that are intended to be treated as constants,
with the understanding that it's our responsibility to refrain from
mutating them.

--
Greg


[Python-ideas] Re: Custom string prefixes

2019-08-29 Thread Eric V. Smith




One way to handle this particular case would be to do it as a variant
of f-string that doesn't join its arguments, but passes the list to
some other function. Just replace the final step BUILD_STRING step
with BUILD_LIST, then call the function. There'd need to be some way
to recognize which sections were in the literal and which came from
interpolations (one option is to simply include empty strings where
necessary such that it always starts with a literal and then
alternates), but otherwise, the "sql" manager could do all the
escaping it wants. However, this wouldn't be enough to truly
parameterize a query; it would only do escaping into the string
itself.

Another option would be to have a single variant of f-string that,
instead of creating a string, creates a "string with formatted
values". That would then be a single object that can be passed around
as normal, and if conn.execute() received such a string, it could do
the proper parameterization.
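(For what it's worth, the standard library can already separate the literal sections from the interpolation fields at run time; a rough sketch of that half of the idea using string.Formatter, not of any proposed syntax or of PEP 501's actual machinery:)

import string

def split_template(template):
    """Return (literal_text, field_name) pairs for a {}-style template, so a
    consumer such as a SQL driver could treat the two kinds of section differently."""
    return [(literal, field)
            for literal, field, _spec, _conv in string.Formatter().parse(template)]

print(split_template("SELECT * FROM tbl WHERE a={a} AND b={b}"))
# [('SELECT * FROM tbl WHERE a=', 'a'), (' AND b=', 'b')]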


See PEP 501: https://www.python.org/dev/peps/pep-0501/

Eric


[Python-ideas] Re: Custom string prefixes

2019-08-29 Thread Rob Cliffe via Python-ideas

On 29/08/2019 22:10:21, Andrew Barnert via Python-ideas wrote:


As I’ve said before, I believe that anything that doesn’t have a builtin type 
does not deserve builtin syntax. And I don’t understand why that isn’t a 
near-ubiquitous viewpoint.

+1 (maybe that means I'm missing something).
Just curious:  Is there any reason not to make decimal.Decimal a 
built-in type?  It's tried and tested.  There are situations where 
floats are appropriate, and others where Decimals are appropriate (I'm 
currently using it myself); conceptually I see them as on an equal 
footing.  If it were built-in, there would be good reason to accept 
1.23d meaning a Decimal literal (distinct from a float literal), whether 
or not (any part of) the OP's proposal was adopted.

Rob Cliffe

  But it’s not just you; at least three people (all of whom dislike the whole 
concept of custom affixes) seem at least in principle open to the idea of 
adding builtin affixes for types that don’t exist. Which makes me think it’s 
almost certainly not that you’re all crazy, but that I’m missing something 
important. Can you explain it to me?


[Python-ideas] Re: Custom string prefixes

2019-08-29 Thread Andrew Barnert via Python-ideas
On Aug 29, 2019, at 07:52, Steven D'Aprano  wrote:
> 
>> On Thu, Aug 29, 2019 at 05:30:39AM -0700, Andrew Barnert wrote:
>>> On Aug 29, 2019, at 04:58, Steven D'Aprano  wrote:
>>> 
>>> - quote marks are also used for function calls, but only a limited 
>>> subset of function calls (those which take a single string literal 
>>> argument).
>> 
>> This is a disingenuous argument.
>> 
>> When you read spam.eggs, of course you know that that means to call 
>> the __getattr__('eggs') method on spam. But do you actually read it as 
>> a special method calling syntax that’s restricted to taking a single 
>> string that must be an identifier as an argument
> 
> You make a good point about abstractions, but you are missing the 
> critical point that spam.eggs *doesn't look like a string*. Things that 
> look similar should be similar; things which are different should not 
> look similar.

Which is exactly why you’d read 1.23dec or 1.23f as a number, because it looks 
like a number and also acts like a number, rather than as a function call that 
takes the string '1.23', even if you know that’s how it’s implemented.

And most of the string affixes people have suggested are for string-ish things. 
I’m not sure what a “version string” is, but I might design that as an actual 
subclass of str that adds extractor methods and overrides comparison. A 
compiled regex isn’t literally a string, but neither is a bytes; it’s still 
clearly _similar_ to a string, in important ways. And so is a path, or a URL 
(although I don’t know what you’d use the url prefix for in Python, given that 
we don’t have a string-ish type like ObjC’s NSURL to return and I don’t think 
we need one, but presumably whoever wrote the url affix would be someone who 
disagreed and packaged the prefix with such a class).

And versions of the proposal that allow delimiters other than quotes so you can 
write things like regex/a.*b/, well, I’d need to see a specific proposal to be 
sure, but that seems even less objectionable in this regard. That looks like 
nothing else in Python, but it looks like a regex in awk or sed or perl, so I’d 
probably read it as a regex object.

> I acknowledge your point (and the OP's) that many things in Python are 
> ultimately implemented as function calls. But none of those things look 
> like strings:
> 
> - The argument to the import statement looks like an identifier 
>  (since it is an identifier, not an arbitrary string);
> 
> - The argument to __getattr__ etc looks like an identifier
>  (since it is an identifier, not an arbitrary string);
> 
> - The argument to __getitem__ is an arbitrary expression, not just
>  a string.

The arguments to the dec and f affix handlers look like numeric literals, not 
arbitrary strings.

The arguments to path and version are… probably string literal representations 
(with the quotes and all), not arbitrary strings. Although that does depend on 
the details of the specific proposal: if _any_ of your killer uses needs 
uncooked strings, then either you come up with something overcomplicated like 
C++, where you can register three different kinds of affixes, or you just always 
pass uncooked strings (because it’s trivial to cook on demand but impossible to 
de-cook).

And the arguments to regex may be some _other_ kind of restricted special 
string that… I don’t think anyone has tried to define yet, but you can vaguely 
imagine what it would have to be like, and it certainly won’t be any arbitrary 
string.

> Let me suggest some design principles that should hold for languages 
> with more-or-less "conventional" syntax. Languages like APL or Forth 
> excluded.
> 
> - anything using ' or " quotation marks as delimiters (with or without 
>  affixes) ought to return a string, and nothing but a string;

So b"abc" should not be allowed?

Let’s say I created a native-UTF16-string type to deal with some horrible 
Windows or Java stuff. Why would this principle of yours suggest that I 
shouldn’t be allowed to use u16"" just like b""?

This is a design guideline for affixes, custom or otherwise. Which could be 
useful as a filter on the list of proposed uses, to see if any good ones remain 
(and if no string affix uses remain, then of course the proposal is either 
useless or should be restricted to just numbers or whatever), but it can’t be 
an argument against all affixes, or against custom affixes, or anything else 
generic like that.

> - as a strong preference, anything using quotation marks as delimiters
>  ought to be processed at compile-time (f-strings are a conspicuous 
>  exception to that principle);

I don’t see why you should even want to _know_ whether it’s true, much less 
have a strong preference.

Here are things you probably really do care about: (a) they act like strings, 
(b) they act like constants, (c) if there are potential issues parsing them, 
you see those issues as soon as possible, (d) working with them is more than 
fast enough. Compile time is neither necessary nor sufficient for any of those.

[Python-ideas] Re: Custom string prefixes

2019-08-29 Thread Andrew Barnert via Python-ideas
> On Aug 29, 2019, at 06:40, Rhodri James  wrote:
> 
> However, it sounds like what you really want is something I've often really 
> wanted too -- a way to get the compiler to pre-create "constant" objects for 
> me.

People often say they want this, but does anyone actually ever have a good 
reason for it?

I was taken in by the lure of this idea myself—all those wasted frozenset 
constructor calls! (This was before the peephole optimizer understood 
frozensets.) Of course I hadn’t even bothered to construct the frozensets from 
tuples instead of lists, which should have been a hint that I was in premature 
optimization mode, and should have been the first thing I tried before going 
off the deep end. But hacking bytecode is fun, so I sat down and wrote a 
bytecode processor that let me replace any expression with a LOAD_CONST, much 
as the builtin optimizer does for things like simple arithmetic. It’s easy to 
hook it up to a decorator to call on a function, or to an import hook to call 
at module compile time. And then, finally, it’s time to benchmark and discover 
that it makes no difference. Stripping things down to something trivial enough 
to be tested… aha, I really was saving 13us, it’s just that 13us is not 
measurable in code that takes seconds to run.

Maybe someone has a real use case where it matters. But I’ve never seen one. I 
tried to find good nails for my shiny new hammer and never found one, and 
eventually just stopped maintaining it. And then I revived it when I wrote my 
decimal literal hack (the predecessor to the more general user literal hack I 
linked earlier in the thread) back during the 2015 iteration of this 
discussion, but again couldn’t come up with a plausible example where those 
2.3d pseudo-literals were measurably affecting performance and needed 
constifying; I don’t think I even bothered mentioning it in that thread.

Also, even if you find a problem, it‘s almost always easy to work around today. 
If the constant is constructed inside a loop, just manually lift it out of the 
loop. If it’s in a function body, this is effectively the same problem as 
global or builtin lookups being too slow inside a function body, and can be 
solved the same way, with a keyword parameter with a default value. And if the 
Python community thinks that _sin=sin is good enough for the uncommon problem 
of lookups significantly affecting performance, surely 
_vals=frozenset((1,2,3)) is also good enough for that far more uncommon 
problem, and therefore _limit=1e1000dec would also be good enough for the new 
but probably even more uncommon one.

(Also, notice that the param default can be used with mutable values, it’s just 
up to you to make sure you don’t accidentally mutate them; an invisible 
compiler optimization couldn’t do that, at least not without something like 
Victor Stinner’s FAT guards.)
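
(For anyone who hasn’t run into it, a minimal sketch of the default-value 
trick I mean; the names here are made up:)

from math import sin

# The global is bound once, at function-definition time, so each call does
# a fast local lookup instead of a global/builtin lookup.
def wave(xs, _sin=sin):
    return [_sin(x) for x in xs]

# The same trick "constifies" a container: it is built once, at def time.
# Just don't mutate the default value.
def is_valid(x, _vals=frozenset((1, 2, 3))):
    return x in _vals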

For what it’s worth, I actually found my @constify decorator more readable than 
the param default, especially for global functions—but not nearly enough so 
that it’s worth using a hacky, CPython-specific module that I have to maintain 
across Python versions (and byteplay to byteplay3 to bytecode) and that nobody 
else is using. Or to propose for a builtin (or stdlib but magic) feature.

What this all comes down to is that, despite my initial impression, I really 
don’t care whether Python thinks 1.23d is a constant value or not; I only care 
whether the human reader thinks it is one.

Think about it this way: do you know off the top of your head whether (1, 
(2,3)) gets optimized to a const the same way (1,2) does in CPython? Has it 
ever occurred to you to check before I asked? And this is actually something 
that changed relatively recently. Why would someone who doesn’t even think 
about when tuples are constified want to talk about how to force Python to 
constify other types? Because even years of Python experience hasn’t cured us 
of premature-optimization-itis.
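
(If you do want to check on your own CPython, the compiled constants will 
tell you; a quick way to look:)

import dis

def f():
    return (1, (2, 3))

print(f.__code__.co_consts)  # does the nested tuple show up as one constant?
dis.dis(f)                   # or look for a single LOAD_CONST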


___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/L53N6XRIQ7CL43B2R2ZVB3IIFHNK5XD2/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Custom string prefixes

2019-08-29 Thread Pasha Stetsenko
> There's no such thing, though, any more than there's such a thing as a
> "raw string". There are only two types of string in Python - text and
> bytes. You can't behave differently based on whether you were given a
> triple-quoted, raw, or other string literal.

A simple implementation could be something like:

@register_literal_prefix("sql")
class SqlLiteral(str):
    pass

class Connection:
    ...
    def execute(self, stmt):
        if isinstance(stmt, SqlLiteral):
            # proceed as usual
            ...
        else:
            raise TypeError("Expected sql'' string")
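
And then the check in execute() is just an isinstance test. For example
(hypothetical usage, assuming the proposed sql'' literal produces a
SqlLiteral):

stmt = SqlLiteral("SELECT * FROM people WHERE id=?")  # what sql'...' would produce
conn.execute(stmt)        # accepted
conn.execute("SELECT 1")  # raises TypeError, not created from an sql literal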
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/6D2E2GFBGKELBZB23PMT4OMEDZIJWFJD/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Custom string prefixes

2019-08-29 Thread Richard Damon
On 8/29/19 11:14 AM, Chris Angelico wrote:
> On Fri, Aug 30, 2019 at 3:51 AM Pasha Stetsenko  wrote:
>> My understanding is that for a sql prefix the most valuable part is to be 
>> able
>> to know that it was created from a literal. No other magic, definitely not
>> auto-executing. Then it would be legal to write
>>
>> result = conn.execute(sql"SELECT * FROM people WHERE id=?",
>>   user_id)
>>
>> but not
>>
>> result = conn.execute(f"SELECT * FROM people WHERE id={user_id}")
>>
>> In order to achieve this, the `execute()` method only has to look at
>> the type of its argument, and throw an error if it's a plain string.
> There's no such thing, though, any more than there's such a thing as a
> "raw string". There are only two types of string in Python - text and
> bytes. You can't behave differently based on whether you were given a
> triple-quoted, raw, or other string literal.

But isn't the idea of the sql" (or other) prefix that the 'plain
string' is put through a special function that processes it, and that
function could return an object of some other type, so it could detect
the difference?

>
>> Perhaps with some more imagination we can make
>>
>> result = conn.execute(sql"SELECT * FROM people WHERE id={user_id}")
>>
>> work too, but in this case the `sql"..."` token would only create an
>> `UnpreparedStatement` object, which expects a variable named "user_id",
>> and then the `conn.execute()` method would pass locals()/globals() into
>> the `.prepare()` method of that statement, binding those values to
>> the placeholders. Crucially, the `.prepare()` method shouldn't modify the
>> object, but return a new PreparedStatement, which then gets executed
>> by the `conn.execute()`.
> One way to handle this particular case would be to do it as a variant
> of f-string that doesn't join its arguments, but passes the list to
> some other function. Just replace the final step BUILD_STRING step
> with BUILD_LIST, then call the function. There'd need to be some way
> to recognize which sections were in the literal and which came from
> interpolations (one option is to simply include empty strings where
> necessary such that it always starts with a literal and then
> alternates), but otherwise, the "sql" manager could do all the
> escaping it wants. However, this wouldn't be enough to truly
> parameterize a query; it would only do escaping into the string
> itself.
>
> Another option would be to have a single variant of f-string that,
> instead of creating a string, creates a "string with formatted
> values". That would then be a single object that can be passed around
> as normal, and if conn.execute() received such a string, it could do
> the proper parameterization.
>
> Not sure either of them would be worth the hassle, though.
>
> ChrisA

-- 
Richard Damon
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/JNVP4DU6S3NXQ3MAXOF6XXY3E6VGKVSL/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Custom string prefixes

2019-08-29 Thread Chris Angelico
On Fri, Aug 30, 2019 at 3:51 AM Pasha Stetsenko  wrote:
>
> My understanding is that for a sql prefix the most valuable part is to be able
> to know that it was created from a literal. No other magic, definitely not
> auto-executing. Then it would be legal to write
>
> result = conn.execute(sql"SELECT * FROM people WHERE id=?",
>   user_id)
>
> but not
>
> result = conn.execute(f"SELECT * FROM people WHERE id={user_id}")
>
> In order to achieve this, the `execute()` method only has to look at
> the type of its argument, and throw an error if it's a plain string.

There's no such thing, though, any more than there's such a thing as a
"raw string". There are only two types of string in Python - text and
bytes. You can't behave differently based on whether you were given a
triple-quoted, raw, or other string literal.

> Perhaps with some more imagination we can make
>
> result = conn.execute(sql"SELECT * FROM people WHERE id={user_id}")
>
> work too, but in this case the `sql"..."` token would only create an
> `UnpreparedStatement` object, which expects a variable named "user_id",
> and then the `conn.execute()` method would pass locals()/globals() into
> the `.prepare()` method of that statement, binding those values to
> the placeholders. Crucially, the `.prepare()` method shouldn't modify the
> object, but return a new PreparedStatement, which then gets executed
> by the `conn.execute()`.

One way to handle this particular case would be to do it as a variant
of f-string that doesn't join its arguments, but passes the list to
some other function. Just replace the final step BUILD_STRING step
with BUILD_LIST, then call the function. There'd need to be some way
to recognize which sections were in the literal and which came from
interpolations (one option is to simply include empty strings where
necessary such that it always starts with a literal and then
alternates), but otherwise, the "sql" manager could do all the
escaping it wants. However, this wouldn't be enough to truly
parameterize a query; it would only do escaping into the string
itself.
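
(Very roughly, the receiving function could look something like this; purely
illustrative, none of this machinery exists today, and escape_sql_value is a
made-up stand-in for real escaping:)

def escape_sql_value(value):
    # stand-in for real, dialect-aware escaping
    return "'" + str(value).replace("'", "''") + "'"

def sql_handler(parts):
    # parts alternates literal text / interpolated value, always starting
    # with literal text, e.g. ["SELECT * FROM people WHERE id=", user_id, ""]
    out = []
    for i, part in enumerate(parts):
        out.append(part if i % 2 == 0 else escape_sql_value(part))
    return "".join(out)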

Another option would be to have a single variant of f-string that,
instead of creating a string, creates a "string with formatted
values". That would then be a single object that can be passed around
as normal, and if conn.execute() received such a string, it could do
the proper parameterization.

Not sure either of them would be worth the hassle, though.

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/6LEZYLI6KJ2WXWZM2C6PVD3STD5LF2QU/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Custom string prefixes

2019-08-29 Thread Pasha Stetsenko
> How does one get a value into locals()["re~"]?

You're right, I didn't think about that. I agree with Steven's
interpretation that the user is not expected to modify locals
herself, still the immutable nature of locals presents a
considerable challenge.

So I'm thinking that perhaps we could change that to
`globals()["re~"]`, where globals are in fact mutable and 
can even be modified by the user. This would make it so
that affixes can only be declared at a module level, similar
to how `from library import *` is not allowed in a function
either.

This is probably a saner approach anyways -- if affixes 
could mean different things in different functions, that
could be quite confusing...
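
For example, a module could register an affix like this (using the
hypothetical name mangling quoted earlier in the thread):

import re

# You can't write  re~ = re.compile  as an assignment, but the module
# dict accepts any string key:
globals()["re~"] = re.compile

# and then  re'a|b|c'  in this module would compile to
#   globals()["re~"]("a|b|c")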
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/XXMC3HQVBTKF5X7ROG4IBZYTH66KZPLB/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Custom string prefixes

2019-08-29 Thread Pasha Stetsenko
My understanding is that for a sql prefix the most valuable part is to be able
to know that it was created from a literal. No other magic, definitely not 
auto-executing. Then it would be legal to write

result = conn.execute(sql"SELECT * FROM people WHERE id=?",
  user_id)

but not

result = conn.execute(f"SELECT * FROM people WHERE id={user_id}")

In order to achieve this, the `execute()` method only has to look at
the type of its argument, and throw an error if it's a plain string.

Perhaps with some more imagination we can make

result = conn.execute(sql"SELECT * FROM people WHERE id={user_id}")

work too, but in this case the `sql"..."` token would only create an 
`UnpreparedStatement` object, which expects a variable named "user_id",
and then the `conn.execute()` method would pass locals()/globals() into
the `.prepare()` method of that statement, binding those values to
the placeholders. Crucially, the `.prepare()` method shouldn't modify the
object, but return a new PreparedStatement, which then gets executed
by the `conn.execute()`.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/Y4ISQCWYFNC5DNGUQYRXY5IZMOYUAYVP/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Custom string prefixes

2019-08-29 Thread Richard Damon
On 8/26/19 4:03 PM, stpa...@gmail.com wrote:
> In Python strings are allowed to have a number of special prefixes:
>
> b'', r'', u'', f'' 
> + their combinations.
>
> The proposal is to allow arbitrary (or letter-only) user-defined prefixes as 
> well.
> Essentially, a string prefix would serve as a decorator for a string, 
> allowing the
> user to impose a special semantics of their choosing.
>
> There are quite a few situations where this can be used:
> - Fraction literals: `frac'123/4567'`
> - Decimals: `dec'5.34'`
> - Date/time constants: `t'2019-08-26'`
> - SQL expressions: `sql'SELECT * FROM tbl WHERE a=?'.bind(a=...)`
> - Regular expressions: `rx'[a-zA-Z]+'`
> - Version strings: `v'1.13.0a'`
> - etc.
>
> This proposal has been already discussed before, in 2013:
> https://mail.python.org/archives/list/python-ideas@python.org/thread/M3OLUURUGORLUEGOJHFWEAQQXDMDYXLA/
>
> The opinions were divided whether this is a useful addition. The opponents
> mainly argued that as this only "saves a couple of keystrokes", there is no
> need to overcomplicate the language. It seems to me that now, 6 years later, 
> that argument can be dismissed by the fact that we had, in fact, added new
> prefix "f" to the language. Note how the "format strings" would fall squarely
> within this framework if they were not added by now.
>
> In addition, I believe that "saving a few keystroked" is a worthy goal if it 
> adds
> considerable clarity to the expression. Readability counts. Compare:
>
> v"1.13.0a"
> v("1.13.0a")
>
> To me, the former expression is far easier to read. Parentheses, especially as
> they become deeply nested, are not easy on the eyes. But, even more 
> importantly,
> the first expression much better conveys the *intent* of a version string. It 
> has
> a feeling of an immutable object. In the second case the string is passed to 
> the
> constructor, but the string has no meaning of its own. As such, the second
> expression feels artificial. Consider this: if the feature already existed, 
> how *would*
> you prefer to write your code?
>
> The prefixes would also help when writing functions that accept different 
> types
> of their argument. For example:
>
> collection.select("abc")   # find items with name 'abc'
> collection.select(rx"[abc]+")  # find items that match regular expression
>
> I'm not discussing possible implementation of this feature just yet, we can 
> get to
> that point later when there is a general understanding that this is worth 
> considering.

I have seen a lot of discussion on this, but a few points I thought of
haven't been brought up. One solution to all of these would be to have
these done as suffixes.

Python currently has a number of existing prefixes to strings that are
valid, and it might catch some people when they want to use a
combination that is currently a valid prefix. (It has been brought up
that this converts an invalid prefix from an immediately diagnosable
syntax error to a run time error.)

This also means that it becomes very hard to decide to add a new prefix
as that would now have a defined meaning.

A second issue is that currently some of the prefixes (like r) change
how the string literal is parsed. This means that the existing prefixes
aren't just a slightly special case of the general rules, but need to be
treated very differently, or perhaps the prefix somehow needs to
indicate which standard prefix to use to parse the string. Some of your
examples could benefit from sometimes being able to use r' and sometimes
not, so being able to say both r'string're and 'string're could be useful.

-- 
Richard Damon
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/VXIDEEE3225UIKWJOROCJVIESXBJIS2O/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Custom string prefixes

2019-08-29 Thread Paul Moore
On Thu, 29 Aug 2019 at 15:54, Steven D'Aprano  wrote:

> Let me suggest some design principles that should hold for languages
> with more-or-less "conventional" syntax. Languages like APL or Forth
> excluded.

This will degenerate into nitpicking very fast, so let me just say
that I understand the general idea that you're trying to express here.
I don't entirely agree with it, though, and I think there are some
fairly common violations of your suggestion below that make your
arguments less persuasive than maybe you'd like.

> - anything using ' or " quotation marks as delimiters (with or without
>   affixes) ought to return a string, and nothing but a string;

In C, Java and C++, 'x' is an integer (char).
In SQL (some dialects, at least) TIMESTAMP'2019-08-22 11:32:12' is a
TIMESTAMP value.
In Python, b'123' is a bytes object (which maybe you're willing to
classify as "a string", but the line blurs quite fast).

Paul
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/XPL2VXD55GL7VCM7TO36MI4ZAECEJFUS/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Custom string prefixes

2019-08-29 Thread Steven D'Aprano
On Thu, Aug 29, 2019 at 05:30:39AM -0700, Andrew Barnert wrote:
> On Aug 29, 2019, at 04:58, Steven D'Aprano  wrote:
> > 
> > - quote marks are also used for function calls, but only a limited 
> > subset of function calls (those which take a single string literal 
> > argument).
> 
> This is a disingenuous argument.
> 
> When you read spam.eggs, of course you know that that means to call 
> the __getattr__('eggs') method on spam. But do you actually read it as 
> a special method calling syntax that’s restricted to taking a single 
> string that must be an identifier as an argument

You make a good point about abstractions, but you are missing the 
critical point that spam.eggs *doesn't look like a string*. Things that 
look similar should be similar; things which are different should not 
look similar.

I acknowledge your point (and the OP's) that many things in Python are 
ultimately implemented as function calls. But none of those things look 
like strings:

- The argument to the import statement looks like an identifier 
  (since it is an identifier, not an arbitrary string);

- The argument to __getattr__ etc looks like an identifier
  (since it is an identifier, not an arbitrary string);

- The argument to __getitem__ is an arbitrary expression, not just
  a string.

All three are well understood to involve runtime lookups: modules must 
be searched for and potentially compiled, object superclass inheritance 
hierarchies must be searched; items or keys in a list or dict must be 
looked up. None of them suggest a constant literal in the same way that 
"" string delimiters do.

The large majority of languages follow similar principles, allowing for 
usually minor syntactic differences. Some syntactic conventions are very 
weak, and languages can and do differ greatly. But some are very, very 
strong, e.g.:

123.4567 is nearly always a numeric float of some kind, rather 
than (say) multiplying two ints;

' and/or " are nearly always used for delimiting strings.

Even languages like Forth, which have radically different syntax to 
mainstream languages, sort-of follows that convention of associating
quote marks with strings.

." outputs the following character string, terminating at 
the next " character.

i.e. ." foo" in Forth would be more or less equivalent to print("foo") 
in Python.

Let me suggest some design principles that should hold for languages 
with more-or-less "conventional" syntax. Languages like APL or Forth 
excluded.

- anything using ' or " quotation marks as delimiters (with or without 
  affixes) ought to return a string, and nothing but a string;

- as a strong preference, anything using quotation marks as delimiters
  ought to be processed at compile-time (f-strings are a conspicuous 
  exception to that principle);

- using affixes for numeric types seems like a fine idea, and languages
  like Julia that offer a wide-range of builtin numeric types show 
  that this works fine; in Python2 we used to have native ints and 
  longints that took a L suffix so there's precedent there.


[...]
> And the same goes for regex"a.*b" or 1.23f as well. Of course you’ll 
> know that under the covers that means something like calling 
> __whatever_registry__['regex'] with the argument "a.*b", but you’re 
> going to think of it as a regex object 

No I'm not. I'm going to think of it as a *string*, because it looks 
like a string.

Particularly given the OP's preference for single-letter prefixes.

1.23f doesn't look like a string, it looks like a number. I have no 
objection to that in principle, although of course there is a question 
whether float32 is important enough to justify either builtin syntax or 
custom, user-defined syntax.



-- 
Steven
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/RQQFV5AJCVJHYSYUVM2UQ2HQOLU6KBMV/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Custom string prefixes

2019-08-29 Thread Rhodri James

On 29/08/2019 14:40, Rhodri James wrote:
> Pace Stephen's point

My apologies, it was Steven's point.

--
Rhodri James *-* Kynesim Ltd
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/H4PWNBVMQWNTD25X7L3DW36FCP4R5Y2L/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Custom string prefixes

2019-08-29 Thread Steven D'Aprano
On Thu, Aug 29, 2019 at 08:17:39PM +1200, Greg Ewing wrote:
> stpa...@gmail.com wrote:
> 
> >re'a|b|c'  --becomes-->  (locals()["re~"])("a|b|c")
> >2.3f   --becomes-->  (locals()["~f"])("2.3")
> 
> How does one get a value into locals()["re~"]?

I don't think that stpa...@gmail.com means that the user literally 
assigns to locals() themselves. I read his proposal as having the 
compiler automatical mangle the names in some way, similar to name 
mangling inside classes.

The transformation from prefix re to mangled name 're~' is easy, the 
compiler could surely handle that, but I'm not sure how the other side 
of it will work. How does one register that re.compile (say) is to be 
aliased as the prefix 're'? I'm fairly sure we don't want to allow ~ in 
identifiers:

# not this
re~ = re.compile

I'm still not convinced that we need this parallel namespace idea, even 
in a watered down version as name-mangling. Why not just have the prefix 
X call name X for any valid name X (apart from the builtin prefixes)? I 
still am not convinced that is a good idea, but at least the complexity 
is significantly reduced.


P.S. stpa...@gmail.com if you're reading this, it would be nice if you 
signed your emails with a name, so we don't have to refer to you by your 
email address or as "the OP".

-- 
Steven
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/W45RVSWEBM22EXZQ4DGE5KP7WNIRPWCG/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Custom string prefixes

2019-08-29 Thread Steven D'Aprano
On Thu, Aug 29, 2019 at 09:58:35PM +1000, Steven D'Aprano wrote:

> Since Python is limited to ASCII syntax, we only have a small number of 
> symbols suitable for delimiters. With such a small number available, 

Oops, I had an interrupted thought there.

With such a small number available, there is bound to be some 
duplication, but it tends to be fairly consistent across the majority of 
conventional programming languages.



-- 
Steven
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/5BDUSFIRL2CC47R73HFIUEX2EX2K77N2/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Custom string prefixes

2019-08-29 Thread Rhodri James

On 29/08/2019 00:24, Andrew Barnert wrote:
> On Aug 27, 2019, at 10:21, Rhodri James  wrote:
>> You make the point yourself: this is something we already understand from 
>> dealing with complex numbers in other circumstances.  That is not true of 
>> generic single-character string prefixes.
> 
> It certainly is true for 1.23f.


I would contend that (and anyway 1.23f is redundant; 1.23 is already a 
float literal).  But anyway I said "generic single-character string 
prefixes", because that's what the original proposal was.  You seem to 
be going off on creating literal syntax for standard library types 
(which, for the record, I think is a good idea and deserves its own 
thread), but that's not what the OP seems to be going for.


--
Rhodri James *-* Kynesim Ltd
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/326USPRMZIYO4WBCEWV4HJETQTTIVKMY/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Custom string prefixes

2019-08-29 Thread Paul Moore
On Thu, 29 Aug 2019 at 14:21, Andrew Barnert  wrote:
> You can’t avoid tradeoffs by trying to come up with a rule that makes 
> language decisions automatically. (If you could, why would this list even 
> exist?) The closest thing you can get to that is the vague and 
> self-contradictory and facetious but still useful Zen.

Sorry, I wasn't trying to imply that you could. Just that choosing to
implement some, but not all, possible literal affixes on a case by
case basis was a valid language design option, and one that is taken
in many cases. Your statement

> Think about it this way; assuming f and frac and dec and re and sql and so on 
> are useful, our options are:
>
> 1) people don’t get a useful feature
> 2) we add user-defined affixes
> 3) we add all of these as builtin affixes
>
> While #3 theoretically isn’t impossible, it’s wildly implausible, and 
> probably a bad idea to boot, so the realistic choice is between 1 and 2.

seemed to imply that you thought it was an "all or nothing" choice. My
apologies if I misunderstood your point.

Paul
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/3KOQHR5TSNVLCVLOZNGXWWSRW5UHYWLX/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Custom string prefixes

2019-08-29 Thread Rhodri James

On 28/08/2019 23:01, stpa...@gmail.com wrote:

you have something that looks like a kind of string czt'...'
but is really a function call that might return absolutely
anything at all;


This is kinda the whole point. I understand, of course, how the
idea of a string-that-is-not-a-string may sound blasphemous,
however I invite you to look at this from a different perspective.


I don't think it's blasphemous.  I think it's misleading, and that's far 
worse.



Today's date is 2019-08-28. The date is a moment in time, or
perhaps a point in the calendar, but it is certainly not a string.
How do we write this date in Python? As
`datetime("2019-08-28")`. We are forced to put the date into
a string and pass that string into a function to create an actual
datetime object.


Pace Stephen's point that this is not in fact how datetime works, this 
has the major advantage of being readable.  My thought processes on 
coming across that in code would go something like; "OK, we have a 
function call.  Judging from the name its something to do with dates and 
times, so the result is going to be some date/time thing.  Oh, I 
remember seeing "from datetime import datetime" at the top, so I know 
where to look it up if it becomes important.  Fine.  Moving on."



With this proposal the code would look something like
`dt"2019-08-28"`. You're right, it's not a string anymore. But
it *should not* have been a string to begin with, we only used
a string there because Python didn't offer us any other way.
Now with prefixed strings the justice is finally done: we are
able to express the notion of a date directly.


Here my thoughts would be more like; "OK, this is some kind of special 
string.  I wonder what "dt" means.  I wonder where I look it up.  The 
string looks kind of like a date in ISO order, bear that in mind.  Maybe 
"dt" is "date/time"."  Followed a few lines later by "wait, why are we 
calling methods on that string that don't look like string methods? 
WTF?  Maybe "dt" means "delirium tremens".  Abort!  Abort!"


Obviously I've played this up a bit, but the point remains that even if 
I do work out that "dt" is actually a secret function call, I have to go 
back and fix my understanding of the code that I've already read.  This 
significantly increases the chance that my understanding will be wrong. 
This is a Bad Thing.



And the fact that it may still use strings under the hood to
achieve the desired result is really an implementation detail,
that may even change at some point in the future.


If all that dt"string" gives us is a run-time call to dt("string"), it's 
a complete non-starter as far as I'm concerned.  It's adding confusion 
for no real gain.  However, it sounds like what you really want is 
something I've often really wanted too -- a way to get the compiler to 
pre-create "constant" objects for me.  The trouble is that after 
thinking about it for a bit, it almost always turns out that I don't 
want that after all.


Suppose that we did have some funky mechanism to get the compiler to 
create objects at compile time so we don't have the run-time creation 
cost to contend with.  For the sake of argument, let's make it


  start_date = $datetime(2019,8,28)

(I know this syntax would be laughed out of court, but like I said, for 
the sake of argument...)


So we use "start_date" somewhere, and mutate it because the start date 
for some purpose was different.  Then we use it somewhere else, and it's 
not the start date we thought it was.  This is essentially the mutable 
default argument gotcha, just writ globally.


The obvious cure for that would be to have our compile-time created 
objects be immutable.  Leaving aside questions like how we do that, and 
whether contained containers are immutable, and so on, we still have the 
problem that we don't actually want an immutable object most of the 
time.  I find that almost invariably I need to use the constant as a 
starting point, but tweak it somehow.  Perhaps like in the example 
above, the start date is different for a particular purpose.  In that 
case I need to copy the immutable object to a mutable version, so I have 
all the object creation shenanigans to go through anyway, and that 
saving I thought I had has gone away.


I'm afraid these custom string prefixes won't achieve what I think you 
want to achieve, and they will make code less readable in the process.



the OP still hasn't responded to my question about the ambiguity
of the proposal (is czt'...' one three-letter prefix, or three
one-letter prefixes?)


Sorry, I thought this part was obvious. It's a single three-letter prefix.


So how do you distinguish the custom prefix "br" from a raw byte string? 
 Existing syntax allows prefixes to stack, so there's inherent 
ambiguity in multi-character prefixes.


--
Rhodri James *-* Kynesim Ltd
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to 

[Python-ideas] Re: Custom string prefixes

2019-08-29 Thread Andrew Barnert via Python-ideas
On Aug 29, 2019, at 00:58, Paul Moore  wrote:
> 
> If you
> assume everything should be handled by general mechanisms, you end up
> at the Lisp/Haskell end of the spectrum. If you decide that the
> language defines the limits, you are at the C end.

And if you don’t make either assumption, but instead judge each case on its own 
merits, you end up with a language which is better than languages at either 
extreme.

There are plenty of cases where Python generalizes beyond most languages (how 
many languages use the same feature for async functions and sequence iteration? 
or get metaclasses for free by having only one “kind” and then defining both 
construction and class definitions as type calls?), and plenty where it doesn’t 
generalize as much as most languages, and its best features are found all 
across that spectrum.

You can’t avoid tradeoffs by trying to come up with a rule that makes language 
decisions automatically. (If you could, why would this list even exist?) The 
closest thing you can get to that is the vague and self-contradictory and 
facetious but still useful Zen.

If you really did try to zealously pick one side or the other, always avoiding 
general solutions whenever a hardcoded solution is simpler no matter what, the 
best-case scenario would be something like Go, where a big ecosystem of codegen 
tools defeats your attempt to be zealous and makes your language actually 
usable despite your own efforts until soon you start using those tools even in 
the stdlib.

Also, I’m not sure the spectrum is nearly as well defined as you imply in the 
first place. It’s hard to find a large C project that doesn’t use the hell out 
of preprocessor macros to effectively create custom syntax for things like 
error handling and looping over collections (not to mention M4 macros to 
autoconf the code so it’s actually portable instead of just theoretically 
portable), and meanwhile Haskell’s syntax is chock full of special-purpose 
features you couldn’t build yourself (would anyone even use the language 
without, say, do blocks?).

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/DIOIGVP4EA5GID4DFZGJT2HPMDLNBA7Y/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Custom string prefixes

2019-08-29 Thread Andrew Barnert via Python-ideas
On Aug 29, 2019, at 04:58, Steven D'Aprano  wrote:
> 
> - quote marks are also used for function calls, but only a limited 
> subset of function calls (those which take a single string literal 
> argument).

This is a disingenuous argument.

When you read spam.eggs, of course you know that that means to call the 
__getattr__('eggs') method on spam. But do you actually read it as a special 
method calling syntax that’s restricted to taking a single string that must be 
an identifier as an argument, or do you read it as accessing the eggs member? 
Of course you read it as member access, not as a special restricted calling 
syntax (except in rare cases—e.g., you’re debugging a 
__getattribute__), because to do otherwise would be willfully obtuse, 
and would actively impede your understanding of the code. And the same goes for 
lots of other cases, like [1:7].

And the same goes for regex"a.*b" or 1.23f as well. Of course you’ll know that 
under the covers that means something like calling 
__whatever_registry__['regex'] with the argument "a.*b", but you’re going to 
think of it as a regex object or a float object, not as a special restricted 
calling syntax, unless you want to actively impede your understanding of the 
code.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/JGDXZSDXGHFAHSPIS5MCKDDWJJ2WVOV2/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Custom string prefixes

2019-08-29 Thread Steven D'Aprano
On Wed, Aug 28, 2019 at 10:01:25PM -, stpa...@gmail.com wrote:

> > you have something that looks like a kind of string czt'...' 
> > but is really a function call that might return absolutely 
> > anything at all;
> 
> This is kinda the whole point.

Yes, I understand that. And that's one of the reasons why I think that 
this is a bad idea.

Since Python is limited to ASCII syntax, we only have a small number of 
symbols suitable for delimiters. With such a small number available, 

- parentheses () are used for grouping and function calls;
- square brackets [] are used for lists and subscripting;
- curly brackets {} are used for dicts and sets;
- quote marks are used for bytes and strings;

And with your proposal:

- quote marks are also used for function calls, but only a limited 
subset of function calls (those which take a single string literal 
argument).

Across a large majority of languages, it is traditional and common to 
use round brackets for grouping and function calls, and square and curly 
brackets for collections. There are a handful of languages, like 
Mathematica, which use [] for function calls.






> I understand, of course, how the 
> idea of a string-that-is-not-a-string may sound blasphemous,

Its not a matter of blasphemy. It's a matter of readability and 
clarity.


> however I invite you to look at this from a different perspective.
> 
> Today's date is 2019-08-28. The date is a moment in time, or 
> perhaps a point in the calendar, but it is certainly not a string.
> How do we write this date in Python? As 
> `datetime("2019-08-28")`. We are forced to put the date into
> a string and pass that string into a function to create an actual
> datetime object.

We are "forced" to write that, are we? Have you ever tried it?


py> from datetime import datetime
py> datetime("2019-08-28")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: an integer is required (got type str)


> With this proposal the code would look something like 
> `dt"2019-08-28"`. You're right, it's not a string anymore. But
> it *should not* have been a string to begin with, we only used
> a string there because Python didn't offer us any other way.

py> datetime(2019, 8, 28)
datetime.datetime(2019, 8, 28, 0, 0)


It is difficult to take your argument seriously when so much of it rests 
on things which aren't true.


-- 
Steven
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/RFAZPMCGCPO4JOHLBHLTE5KNCA5RP6LN/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Custom string prefixes

2019-08-29 Thread Greg Ewing

stpa...@gmail.com wrote:
> re'a|b|c'  --becomes-->  (locals()["re~"])("a|b|c")
> 2.3f   --becomes-->  (locals()["~f"])("2.3")

How does one get a value into locals()["re~"]?

--
Greg
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/SSUP6MFT2XU2BOZKIT4TBGBEMIPQHZW2/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Custom string prefixes

2019-08-29 Thread Paul Moore
On Thu, 29 Aug 2019 at 01:18, Andrew Barnert  wrote:
> > Also, it's worth noting that the benefits of *user-defined* literals
> > are *not* the same as the benefits of things like 0.2f, or 3.14d, or
> > even re/^hello.*/. Those things may well be useful. But the benefit
> > you gain from *user-defined* literals is that of letting the end user
> > make the design decisions, rather than the language designer. And
> > that's a subtly different thing.
>
> That’s a good point, but I think you’re missing something big here.
>
> Think about it this way; assuming f and frac and dec and re and sql and so on 
> are useful, our options are:
>
> 1) people don’t get a useful feature
> 2) we add user-defined affixes
> 3) we add all of these as builtin affixes
>
> While #3 theoretically isn’t impossible, it’s wildly implausible, and 
> probably a bad idea to boot, so the realistic choice is between 1 and 2.

That's a completely different point. Built in affixes are defined by
the language, user defined affixes are defined by the user
(obviously!) That includes all aspects of design - both how a given
affix works, and whether it's justified to have an affix at all for a
given use case. The argument is identical to that of user-defined
operators vs built in operators. If you can use this argument to
justify user-defined affixes, it applies equally to user-defined
operators, which is something that has been asked for far more often,
with much more widespread precedents in other languages, and been
rejected every time.

Regarding your cases #1, #2, and #3, this is the fundamental point of
language design - you have to choose whether a feature is worthwhile
(in the face of people saying "well *I* would find it useful"), and
whether to provide a general mechanism or make a judgement on which
(if any) use cases warrant a special-case language builtin. If you
assume everything should be handled by general mechanisms, you end up
at the Lisp/Haskell end of the spectrum. If you decide that the
language defines the limits, you are at the C end. Traditionally,
Python has been a lot closer to the "language defined" end of the
scale than the "general mechanisms" end. You can argue whether that's
good or bad, or even whether things should change because people have
different expectations nowadays, but it's a fairly pervasive design
principle, and should be treated as such.

This actually goes back to the OP's point:

> we can get to that point later when there is a general understanding that 
> this is worth considering

The biggest roadblock to a "general understanding that this is worth
considering" is precisely that Python has traditionally avoided
(over-) general mechanisms for things like this. The obvious other
example, as I mentioned above, being user defined operators. I've been
very careful *not* to use the term "Pythonic" here, as it's too easy
for that to be a way of just saying "my opinion is more correct than
yours" without a real justification, but the real stumbling block for
proposals like this tends to be far less about the technical issues,
and far *more* about "does this fit into the philosophy of Python as a
language, that has made it as successful as it is?" My instinct is
that it doesn't fit well with Python's general philosophy.

Paul
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/AIB4VA56WQ2Z26GD37ITJHD64OQVVDYT/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Custom string prefixes

2019-08-28 Thread Andrew Barnert via Python-ideas
On Aug 28, 2019, at 12:45, stpa...@gmail.com wrote:
> 
> In the thread from
> 2013 where this issue was discussed, many people wanted `sql"..."`
> literal to be available as literal and nothing else.

Since this specific use has come up a few times—and a similar feature in other 
languages—can you summarize exactly what people want from this one?

IIRC, DB-API 2.0 doesn’t have any notion of compiled statements, or bound 
statements, just this:

Connection.execute(statement: str, *args) -> Cursor

So the only thing I can think of is that sql"…" is a shortcut for that. Maybe:

curs = sql"SELECT lastname FROM person WHERE firstname={firstname}"

… which would do the equivalent of:

curs = conn.execute("SELECT lastname FROM person WHERE firstname=?", 
firstname)

… except that it knows whether your particular database library uses ? or %s or 
whatever for SQL params.

I can see how that could be useful, but I’m not sure how it could be easily 
implemented.

First, it has to know where to find your connection object. Maybe the library 
that exposes the prefix requires you to put the connection in a global (or 
threadlocal or contextvar) with a specific name, or manages a pool of 
connections that it stores in its own module or something? But that seems 
simultaneously too magical and too restrictive. 

And then it has to do f-string-style evaluation of the brace contents, in your 
scope, to get the args to pass along. Which I’d assume means that prefix 
handlers need to get passed locals and globals, so the sql prefix handler can 
eval each braced expression? (Even that wouldn’t be as good as f-strings, but 
it might be good enough here?)
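
(Something like this, maybe; an entirely hypothetical handler signature, just 
to make the question concrete:)

import re

def sql_prefix(text, globals_, locals_):
    # The handler gets the raw literal text plus the caller's namespaces,
    # evaluates each {...} section itself, and returns a (query, params)
    # pair using ?-style placeholders.
    params = []
    def repl(match):
        params.append(eval(match.group(1), globals_, locals_))
        return "?"
    query = re.sub(r"\{([^{}]+)\}", repl, text)
    return query, params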

Even with all that, I‘m pretty sure I’d never use it. I’m often willing to 
bring magic into my database API, but only if I get a lot more magic (an 
expression-builder library, a full-blown ORM, that thing that I forget the name 
of that translates generators into SQL queries quasi-LINQ-style, etc.). But 
maybe there are lots of people who do want just this much magic and no more. Is 
this roughly what people are asking for?

If so, is that eval magic needed for any other examples you’ve seen besides 
sql? It’s definitely not needed for regexes, paths, really-raw strings, or any 
of the numeric examples, but if it is needed for more than one good example, 
it’s probably still worth looking at whether it’s feasible.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/TIDUG5ZWIX2ATV7QZMYULQCFPERF3LMI/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Custom string prefixes

2019-08-28 Thread Andrew Barnert via Python-ideas
On Aug 28, 2019, at 01:05, Paul Moore  wrote:
> 
> On Wed, 28 Aug 2019 at 05:04, Andrew Barnert via Python-ideas
>  wrote:
>> What matters here is not whether things like the OP’s czt'abc' or my 1.23f 
>> or 1.23d are literals to the compiler, but whether they’re readable ways to 
>> enter constant values to the human reader.
>> 
>> If so, they’re useful. Period.
>> 
>> Now, it’s possible that even though they’re useful, the feature is still not 
>> worth adding because of Chris’s issue that it can be abused, or because 
>> there’s an unavoidable performance cost that makes it a bad idea to rely on 
>> them, or because they’re not useful in _enough_ code to be worth the effort, 
>> or whatever. Those are questions worth discussing. But arguing about whether 
>> they meet (one of the three definitions of) “literal” is not relevant.
> 
> Extended (I'm avoiding the term "custom" for now) literals like 0.2f,
> 3.14D, re/^hello.*/ or qw{a b c} have a fairly solid track record in
> other languages, and I think in general have proved both useful and
> straightforward in those languages. And even in Python, constructs
> like f-strings and complex numbers are examples of such things.
> However, I know of almost no examples of other languages that have
> added *user-definable* literal types (with the notable exception of
> C++, and I don't believe I've seen use of that feature in user code -
> which is not to say that it's not used). That to me says that there
> are complexities in extending the question to user-defined literals
> that we need to be careful of.

Agreed 100%. That’s why I think we need a more concrete proposal, that includes 
at least some thought on implementation, before we can go any farther, as I 
said in my first reply.

The OP wanted to get some feeling of whether at least some people might find 
some version of this useful before going further. I think we’ve got that now 
(the fact that not 100% of the responders agree doesn’t change that), so we 
need to get more detailed now.

My own proposal was just to answer the charge that any design will inherently 
be impossible or magical or complicated by giving a design that is none of 
those. It shouldn’t be taken as any more than that. If there are good use cases 
for prefixes, prefixes plus suffixes, etc., then my proposal can’t get you 
there, so let’s wait for the OP’s more detailed proposal.

> Some specific
> questions which would need to be dealt with:
> 
> 1. What is valid in the "literal" part of the construct (this is the
> p"C:\" question)?

I think this pretty much has to be either (a) exactly what’s valid in the 
equivalent literals today, or (b) something equally simple to describe, and 
parse, even if it’s different (like really-raw strings, or perlesque regex with 
delimiters other than quotes, or whatever).

Either way, I think you want to use the same rule for all affixed literals, not 
allow a choice of different ones like C++ does.

> 2. How do definitions of literal syntax get brought into scope in time
> for the parser to act on them (this is about "import xyz_literal"
> making xyz"a string" valid but leaving abc"a string" as a syntax
> error)?

I don’t know that this is actually necessary. If `abc"a string"` raises an 
error at execution time rather than compile time, yes, that’s different from 
how most syntax errors work today, but is it really unacceptable? (Notice that 
in the most typical case, the error still gets raised from importing the module 
or from the top level of the script—but that’s just the most typical case, not 
all cases—you could get those errors from, say, calling a method, which you 
don’t normally expect.)

There’s clearly a trade off here, because the only other alternative (at least 
that I’ve thought of or seen from anyone else; I’d love to be wrong) is that 
what you've imported and/or registered affects how later imports work (and 
doesn’t that mean some kind of registry hash needs to get encoded in .pyc files 
or something too?). While that is normal for people who use import hooks, most 
people don’t use import hooks most of the time, and I suspect that weirdness 
would be more off-putting than the late errors.

Another big one: How do custom prefixes interact with builtin string prefixes? 
For suffixes, there’s no problem suffixing, say, a b-string, but for prefixes, 
there is. If this is going to be allowed, there are multiple ways it could be 
designed, but someone has to pick one and specify it.

(Actually, for suffixes, there _is_ a similar issue: is `1.2jd` a `d` suffix on 
the literal `1.2j`, or a `jd` suffix on `1.2`? I think the former, because it’s 
a trivially simple rule that doesn’t need to touch any of the rest of the 
grammar. Plus, not only is it likely never to matter, but in the rare cases 
where it does matter, I think it’s the rule you’d want. For example, if I 
created my own ComplexDecimal class and wanted to use a suffix for it, why 
would I want to define both `d` and `jd` instead of just `d`?)

[Python-ideas] Re: Custom string prefixes

2019-08-28 Thread Andrew Barnert via Python-ideas
On Aug 27, 2019, at 10:21, Rhodri James  wrote:
> 
> You make the point yourself: this is something we already understand from 
> dealing with complex numbers in other circumstances.  That is not true of 
> generic single-character string prefixes.

It certainly is true for 1.23f.

And, while 1.23d for a decimal or 1/3F for a Fraction may not be identical to 
any other context, it’s a close-enough analogy that it’s immediately familiar. 
Although I might actually prefer 1.23dec or 1/3frac or something more explicit 
in those cases. (Fortunately, there’s nothing in the design stopping me from 
doing that.)

As for string prefixes, I don’t think those should usually, or maybe even ever, 
be single-character. People have given examples like sql"…" (I’m still not sure 
exactly what that does, but it’s apparently used in other languages for 
something?) and regex"…" and path"…" (which are a lot more obvious). I’m not 
sure if they actually are useful, which is why my proposal didn’t have them; 
I’m waiting on the OP to give more complete examples, cite similar uses from 
other languages, etc. But I doubt the problem you’re talking about, that they’d 
all be unfamiliar cryptic one-letter things, is likely to arise.


[Python-ideas] Re: Custom string prefixes

2019-08-28 Thread stpasha
> all of which hugely outweighs the gain of being able to avoid a pair 
> of parentheses.
 
Thank you for summarizing the main objections so succinctly;
otherwise it becomes too easy to get lost in the discussion. Let me
try to answer them as best I can:


> you have something that looks like a kind of string czt'...' 
> but is really a function call that might return absolutely 
> anything at all;

This is kinda the whole point. I understand, of course, how the
idea of a string-that-is-not-a-string may sound blasphemous;
however, I invite you to look at this from a different perspective.

Today's date is 2019-08-28. The date is a moment in time, or
perhaps a point in the calendar, but it is certainly not a string.
How do we write this date in Python? As
`datetime.fromisoformat("2019-08-28")`. We are forced to put the
date into a string and pass that string into a function to create
an actual datetime object.

With this proposal the code would look something like
`dt"2019-08-28"`. You're right, it's not a string anymore. But
it *should not* have been a string to begin with; we only used
a string there because Python didn't offer us any other way.
Now, with prefixed strings, justice is finally done: we are
able to express the notion of a date directly.

And the fact that it may still use strings under the hood to
achieve the desired result is really an implementation detail,
that may even change at some point in the future.
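
For illustration only (the helper name `dt` and the use of
`datetime.fromisoformat` are stand-ins I'm inventing here, not
part of the proposal), this is roughly what the prefix would be
sugar for today:

from datetime import datetime

def dt(text):
    # today's spelling of "a date written as text"
    return datetime.fromisoformat(text)

when = dt("2019-08-28")
print(when.year, when.month, when.day)   # 2019 8 28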


> you have a redundant special case for calling functions that
> take a single argument, but only if that argument is a string
> literal;

There are many things in Python that are in fact function calls
in disguise. Decorators? function calls. Imports? function calls.
Class definition? function call. Getters/setters? function calls.
Attribute access? function calls. Even a function call is a 
function call via `__call__()`. I may be oversimplifying a bit, but
the point is that just because something can be written as a
function call doesn't mean it's the most natural way of doing it.

Besides, there are use cases (such as `sql'...'`) where people
do actually want to have a function that is constrained to string
literals only.

Having said that, prefixed strings (and suffixed numbers) are not
*exactly* equivalent to function calls. The points of difference
are:
- prefixes/suffixes are namespaced separately from regular
  variable names.
- their results can be automatically memoized, bringing them
  closer to builtin literals.


> you encourage people to write cryptic single-character 
> functions, like v(), x(), instead of meaningful expressions
> like Version() and re.compile();

Which is why I suggested putting them in a separate
namespace. You're right that a function `v()` is cryptic and
should be avoided. But a prefix `v"..."` is neither a function
nor a variable, so it's OK for it to be short. The existing string
prefixes are all short, after all.


> you encourage people to defer parsing that could be efficiently 
> done in your head at edit time into slow and likely inefficient
> string parsing done at runtime;

I don't encourage any such thing; it's just that most often there is
no other way. For example, consider the regular expression `[0-9]+`.
I can "parse it in my head" to understand that it means a
sequence of digits, but how exactly am I supposed to convey
this understanding to Python?

Or perhaps I can parse "2019-08-28" in my head, and write in
Python `datetime(year=2019, month=8, day=28)`. However, such a
form would greatly reduce the readability of the code from a human's
perspective. And human readability matters more than computer
readability, for now.

In fact, purely from the efficiency perspective, the prefixed strings
can potentially have better performance because they are
auto-memoized, while `datetime.fromisoformat("2019-08-28")` needs
to re-parse its input string every time (or add its own internal
memoization, but even that would be less efficient because it
doesn't know the input is a literal string).
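
As a rough model of that memoization argument (with
`functools.lru_cache` standing in for whatever the interpreter
would actually do, and `dt` again being just an illustrative name):

from datetime import datetime
from functools import lru_cache

@lru_cache(maxsize=None)
def dt(text):
    # the text is parsed only on the first call
    return datetime.fromisoformat(text)

for _ in range(3):
    d = dt("2019-08-28")   # cached after the first parse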


> the OP still hasn't responded to my question about the ambiguity
> of the proposal (is czt'...' one three-letter prefix, or three 
> one-letter prefixes?)

Sorry, I thought this part was obvious. It's a single three-letter prefix.


[Python-ideas] Re: Custom string prefixes

2019-08-28 Thread stpasha
> A really good example here is the p"C:\" question. Is the 
> proposal that the "string part" of the literal is just a normal 
> string? If so, then how do you address this genuine issue 
> that not all paths are valid? What about backslash-escapes 
> (p"C:\temp")? Is the string a raw string or not? If the 
> proposal is that the path-literal code can define how the 
> string is parsed, then how does that work?

I don't usually work with Windows, but I can see how this could
be a pain point for Windows users. They need both backslashes
and quotation marks in their paths.

As nobody has suggested yet how to deal with the problem,
I'd like to give it a try. Behold:

p{C:\}

The part within the curly braces is considered a "really-raw"
string. "Really-raw" means that every character is interpreted
exactly as it looks; there are no escape characters. Internal braces
will be allowed too, provided that they are properly nested:

p{C:\"Program Files"\{hello}\}

If you **need** to have unmatched braces in the string, your last
hope is the triple-braced literal:

p{{{Letter Ж looks like }|{... }}}

The curly braces can only be used with a string prefix (or suffix?).
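
A rough sketch of the brace-matching rule, written as an ordinary
Python function (purely illustrative; the name `scan_really_raw`
is made up):

def scan_really_raw(src, start):
    # The literal runs from the opening brace to the brace that
    # balances it; every character in between is taken verbatim.
    assert src[start] == "{"
    depth = 0
    for i in range(start, len(src)):
        if src[i] == "{":
            depth += 1
        elif src[i] == "}":
            depth -= 1
            if depth == 0:
                return src[start + 1:i]
    raise SyntaxError("unterminated really-raw literal")

print(scan_really_raw(r'p{C:\"Program Files"\{hello}\}', 1))
# C:\"Program Files"\{hello}\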

And while we're at it, why not allow chained literals:

re{(\w+)}{"\1"}
frac{1}{17}


[Python-ideas] Re: Custom string prefixes

2019-08-28 Thread stpasha
> In addition, there is the question of how user-defined literals would
> get turned into constants within the code.

So, I'm just brainstorming here, but how about the following
approach:

- Whenever a compiler sees `abc"def"`, it creates a constant of
  the type `ud_literal` with fields `.prefix="abc"`, `.content="def"`.

- When it compiles a function, then instead of a `LOAD_CONST n`
  op it would emit a `LOAD_UD_CONST n` op.

- This new op first checks whether its argument is a "ud_literal",
  and if so calls the `.resolve()` method on that argument. The
  method should call the prefix with the content, producing an
  object that the LOAD_UD_CONST op stores back in the
  `co_consts` storage of the function. It is a TypeError for the
  resolve method to return another ud_literal.

- Subsequent calls to the LOAD_UD_CONST op will see that
  the argument is no longer a ud-literal, and will return it as-is.

This system would allow each constant to be evaluated only
once and then memoized, and it would compute only those
constants that are actually used.
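
A runnable toy model of this scheme (all names are hypothetical;
a real implementation would live inside the interpreter, not in
Python code):

from datetime import datetime

class ud_literal:
    def __init__(self, prefix, content):
        self.prefix = prefix
        self.content = content

    def resolve(self, registry):
        value = registry[self.prefix](self.content)
        if isinstance(value, ud_literal):
            raise TypeError("prefix handler returned another ud_literal")
        return value

def load_ud_const(co_consts, n, registry):
    # Models LOAD_UD_CONST: resolve on first use, store the result
    # back, and behave like a plain constant load afterwards.
    if isinstance(co_consts[n], ud_literal):
        co_consts[n] = co_consts[n].resolve(registry)
    return co_consts[n]

registry = {"dt": datetime.fromisoformat}
consts = [ud_literal("dt", "2019-08-28")]   # what the compiler would emit
print(load_ud_const(consts, 0, registry))   # parsed and memoized here
print(load_ud_const(consts, 0, registry))   # returned as-is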


[Python-ideas] Re: Custom string prefixes

2019-08-28 Thread stpasha
> Ouch! That's adding a lot of additional complexity to the language. 
> ...
> This proposal adds a completely separate, parallel set of scoping rules 
> for these string prefixes. How many layers in this parallel scope?

Right, having a parallel set of scopes sounds like WAY too much work.
Which is why I didn't want to start my proposal with a particular 
implementation -- I simply don't have enough experience for that.
Still, we can brainstorm possible approaches, and come up with 
something that is feasible.

For example, how about this: prefixes/suffixes "live" in the same local
scope as normal variables; however, in order to separate them from
normal variables, their names get mangled into something that is
not a valid variable name. Thus,

re'a|b|c'  --becomes-->  (locals()["re~"])("a|b|c")
2.3f   --becomes-->  (locals()["~f"])("2.3")

Assuming that most people don't create variable names that start
or end with `~`, the impact on existing code should be minimal (we
could use an even more rare character there, say `\0`).
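
At module level, where `locals()` is `globals()`, the mangling idea
can be illustrated with ordinary code (`re.compile` is only a
stand-in for whatever callable would define the prefix):

import re

globals()["re~"] = re.compile        # registering a hypothetical re prefix

# re'a|b|c' would then desugar to roughly:
pattern = globals()["re~"]("a|b|c")
print(pattern.fullmatch("b"))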

The current string prefixes would be special-cased by the compiler to
behave exactly as they behave right now.

Also, a prefix such as `czt""` is always just a single prefix; there is
no need to treat it as three single-character prefixes.

> One of the weaknesses of string prefixes is that it's hard to get help 
> for them. ...
> What's the difference between r-strings and u-strings? help() is no help

Well, it's just another problem to overcome. I know in Python one can get
help on keywords and even operators by saying `help('class')` or `help('+')`.
We could extend this to allow `help('foo""')` to give the help for the
prefix "foo".

Specifically, if the argument to `help` is a string, and that string is
not a registered topic, then check whether the string is of the form
`<prefix>""` or `<prefix>''` or `""<suffix>` or `''<suffix>`, and invoke
the help for the corresponding prefix / suffix.

This will even solve the problem with the help for existing affixes `b""`,
`f""`, `0j`, etc.
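
A hedged sketch of that help() extension; the registry and the
helper name are invented for the example:

PREFIX_DOCS = {"f": "formatted string literal",
               "foo": "example custom prefix"}

def affix_help_name(topic):
    # Recognise topics like 'foo""' (prefix) or '""d' (suffix) and
    # return the affix name, or None for any other form.
    for quotes in ('""', "''"):
        if topic.endswith(quotes) and topic[:-2].isidentifier():
            return topic[:-2]
        if topic.startswith(quotes) and topic[2:].isidentifier():
            return topic[2:]
    return None

print(PREFIX_DOCS.get(affix_help_name('foo""')))   # example custom prefix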

>  you probably won't want to do that, since Version will probably be 
> useful for those who want to create Version objects from expressions or 
> variables, not just string literals.

For the Version class you're right. But use cases vary. In the thread
from 2013 where this issue was discussed, many people wanted an
`sql"..."` literal to be available as a literal and nothing else.
Presumably, if you wanted to construct a query dynamically, there could
be a separate function `sql_unsafe()` taking a plain string as an
argument.


> So the "pollution" isn't really pollution at all, at least not if you 
> use reasonable names, and the main justification for parallel namespaces 
> seems much weaker.

The pollution argument is that, on the one hand, we want to use short
names such as "v" for prefixes/suffixes, while on the other hand we
don't want them to be "regular" variable names because of the
possibility of name clashes. It's perfectly fine to have a short
character for a prefix and at the same time a longer name for a
function. It's like having both the `unicode()` function and the
`u"..."` prefix, or the way most command-line utilities offer short
single-character options alongside longer full-name options.

> That's an interesting position for the proponent of a new feature to 
> take. "Don't worry about this being confusing, because hardly anyone 
> will use it."

I'm sorry if I expressed myself ambiguously. What I meant to say is that
the set of different prefixes within a single program will likely be small.


> We can't extrapolate from four built-in prefixes being manageable to 
> concluding that dozens of clashing user-defined prefixes will be too.

That's a valid point. Though we can't extrapolate that they will be
unmanageable either; there's just not enough data. But we could look
at other languages that have more suffixes, say C or C++.

Ultimately, this can be a self-regulating feature: if having too many
suffixes/prefixes makes one's code unreadable, then simply stop using
them and go back to regular function calls.


[Python-ideas] Re: Custom string prefixes

2019-08-28 Thread Andrew Barnert via Python-ideas
On Aug 28, 2019, at 00:40, Chris Angelico  wrote:
> 
> On Wed, Aug 28, 2019 at 2:40 PM Andrew Barnert  wrote:
>>> People can be trusted with powerful features that can introduce
>>> complexity. There's just not a lot of point introducing a low-value
>>> feature that adds a lot of complexity.
>> 
>> But it really doesn’t add a lot of complexity.
>> 
>> If you’re not convinced that really-raw string processing is doable, drop 
>> that.
>> 
>> Since the OP hasn’t given a detailed version of his grammar, just take mine: 
>> a literal token immediately followed by one or more identifier characters 
>> (that couldn’t have been munched by the literal) is a user-suffix literal. 
>> This is compiled into code that looks up the suffix in a central registry 
>> and calls it with the token’s text. That’s all there is to it.
>> 
> 
> What is a "literal token", what is an "identifier character", 

Literals and identifier characters are already defined today, so I don’t need 
new definitions for them. 

The existing tokens are already implemented in the tokenizer and in the 
tokenize module, which is why I was able to slap together multiple variations 
on a proof of concept 4 years ago in a few minutes as a token-stream-processing 
import hook. 

My import hook version is a hack, of course, but it serves as a counter to your 
argument that there’s no simple thing that could work by being a dead simple 
thing that does work. And there’s no reason to believe a real version wouldn’t 
be at least as simple.

> and how
> does this apply to your example of having digits, a decimal point, and
> then a suffix


We add a `suffixedfloatnumber` production defined as `floatnumber identifier`. 
So `2.34` parses as a `floatnumber` the same as always. That `d` can’t be part 
of a `floatnumber`, but it can be the start of an `identifier`, and those two 
nodes together can make up a `suffixedfloatnumber`. No need for any new 
lookahead or other context. And for the concrete implementation in CPython, it 
should be obvious that the suffix can be pushed down into the tokenizer, at 
which point the parse becomes trivial.
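
As a toy illustration of that rule (the regex below is a made-up
stand-in for the real grammar, and it also shows the `1.2jd`
resolution discussed earlier in the thread):

import re

# literal = digits, optional fraction, optional imaginary j;
# suffix = identifier characters
SUFFIXED = re.compile(r'(?P<literal>\d+(?:\.\d+)?[jJ]?)(?P<suffix>[A-Za-z_]\w*)')

def split_suffixed(text):
    m = SUFFIXED.fullmatch(text)
    return (m.group('literal'), m.group('suffix')) if m else None

print(split_suffixed('2.34d'))   # ('2.34', 'd')
print(split_suffixed('1.2jd'))   # ('1.2j', 'd')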

If you’re asking how my hacky version works, you could just read the code, 
which is simpler than an explanation, but here goes (from memory, because I’m 
on my phone): To the existing tokenizer, `d` isn’t a delimiter character, so it 
tries to match the whole `2.34d`. That doesn’t match anything. But `2.34` does 
match something, etc., so ultimately it emits two tokens, `floatnumber('2.34'), 
error('d')`. My import hook reads the stream of tokens. When it sees a 
`floatnumber` followed by an `error`, it checks whether the error body could be 
an identifier token. If so, it replaces those two tokens in the stream with… I 
forget, but probably I just hand-parsed the lookup and call and emitted the 
tokens for that.

I can’t _guarantee_ that the real version would be simpler until I try it. And 
I don’t want to hijack the OP’s thread and replace his proposal (which does 
give me what I want) with mine (which doesn’t give him what he wants), unless 
he abandons the idea of attempting to implement his version. But I’m pretty 
confident it would be as simple as it sounds, which is even simpler than the 
hacky version (which, again, is dead simple and works today).

And most variations on the idea you could design would be just as simple. Maybe 
the OP will perversely design one that isn’t. If so, it’s his job to show that 
it can be implemented. And if he gives up, then I’ll argue for something that I 
can implement simply. But I don’t think that’s even going to come up.

> What if you want to have a string, and what if you want
> to have that string contain backslashes or quotes? If you want to say
> that this doesn't add complexity, give us some SIMPLE rules that
> explain this.

Well, that works exactly the same way a string does today (including the 
optional r prefix). The closing quote can now be followed by a string of 
identifier characters, but everything up to there is exactly the same as today. 
So, it doesn’t add any complexity, because it uses the same rules as today.

I did suggest, as a throwaway add-on to the OP’s proposal, that you could 
instead do raw strings or even really-raw (the string ends at the first 
matching quote; backslashes mean nothing). I don’t know if he wants either of 
those, but if he does, raw string literals are already defined in the grammar 
and implemented in the tokenizer, and really-raw is an even simpler grammar 
(identical to the existing grammar except that instead of `longstringchar | 
stringescapeseq` there’s a node matching any character other than the closing 
quote, and the same for `shortstringitem`).

> And make absolutely sure that the rules are identical for EVERY
> possible custom prefix/suffix,

Well, in my version, since the rule for suffixedstringliteral is just 
`stringliteral identifier`, of course it’s the same for every possible suffix; 
there’s no conceivable way it could be different.

If the OP wants to 

[Python-ideas] Re: Custom string prefixes

2019-08-28 Thread Konstantin Schukraft

On Wed, Aug 28, 2019 at 04:02:26PM +0100, Paul Moore wrote:

On Wed, 28 Aug 2019 at 15:55, Mike Miller  wrote:



On 2019-08-28 01:05, Paul Moore wrote:
> However, I know of almost no examples of other languages that have
> added *user-definable* literal types (with the notable exception of

Believe there is such a feature in modern JavaScript:

https://developers.google.com/web/updates/2015/01/ES6-Template-Strings#tagged_templates


Interesting - thanks for the pointer!


Elixir has something it calls sigils. It seems to be basically the
map-to-function variant:

https://elixir-lang.org/getting-started/sigils.html

Konstantin




[Python-ideas] Re: Custom string prefixes

2019-08-28 Thread Paul Moore
On Wed, 28 Aug 2019 at 15:55, Mike Miller  wrote:
>
>
> On 2019-08-28 01:05, Paul Moore wrote:
> > However, I know of almost no examples of other languages that have
> > added *user-definable* literal types (with the notable exception of
>
> Believe there is such a feature in modern JavaScript:
>
> https://developers.google.com/web/updates/2015/01/ES6-Template-Strings#tagged_templates

Interesting - thanks for the pointer!
Paul


[Python-ideas] Re: Custom string prefixes

2019-08-28 Thread Mike Miller



On 2019-08-28 01:05, Paul Moore wrote:

However, I know of almost no examples of other languages that have
added *user-definable* literal types (with the notable exception of


Believe there is such a feature in modern JavaScript:

https://developers.google.com/web/updates/2015/01/ES6-Template-Strings#tagged_templates

-Mike


[Python-ideas] Re: Custom string prefixes

2019-08-28 Thread Chris Angelico
On Wed, Aug 28, 2019 at 10:50 PM Rhodri James  wrote:
>
> On 28/08/2019 02:38, stpa...@gmail.com wrote:
> > Thanks, Andrew, you're able to explain this much better than I do.
> > Just wanted to add that Python *already* has ways to grossly abuse
> > its syntax and create unreadable code. For example, I can write
> >
> >  >>> о = 3
> >  >>> o = 5
> >  >>> ο = 6
> >  >>> (о, o, ο)
> >  (3, 5, 6)
>
> OK, I'll bite: how?  If you were using "thing.o" I would believe you
> were doing something unhelpful with properties, but just "o"?
>

'\u043e' CYRILLIC SMALL LETTER O
'o' LATIN SMALL LETTER O
'\u03bf' GREEK SMALL LETTER OMICRON

Virtually indistinguishable in most fonts, but distinct characters.
It's the same thing you can do with "I" and "l" in many fonts, or "rn"
and "m" in some, but taken to a more untypable level.
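
A quick way to check, for anyone curious:

import unicodedata

for cp in (0x043E, 0x006F, 0x03BF):
    print(hex(cp), unicodedata.name(chr(cp)))
# 0x43e CYRILLIC SMALL LETTER O
# 0x6f LATIN SMALL LETTER O
# 0x3bf GREEK SMALL LETTER OMICRON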

ChrisA


[Python-ideas] Re: Custom string prefixes

2019-08-28 Thread Paul Moore
On Wed, 28 Aug 2019 at 13:49, Rhodri James  wrote:

> OK, I'll bite: how?  If you were using "thing.o" I would believe you
> were doing something unhelpful with properties, but just "o"?

Presumably Unicode variables with confusable characters?

Paul


[Python-ideas] Re: Custom string prefixes

2019-08-28 Thread Paul Moore
On Wed, 28 Aug 2019 at 13:15, Anders Hovmöller  wrote:
>
> > On 28 Aug 2019, at 14:09, Piotr Duda  wrote:

> > There is a much simpler solution: just make `abc"whatever"` syntactic
> > sugar for `string_literal_abc(r"whatever", closure)`, where closure is
> > an object that allows read-only access to variables at the call site.
>
> So to use abc"foo" we must import string_literal_abc? Seems pretty confusing 
> to me!

The only sane proposal that I can see (assuming that no-one is
proposing to drop the principle that Python shouldn't have mutable
syntax) is to modify the definition

stringliteral   ::=  [stringprefix](shortstring | longstring)
stringprefix::=  "r" | "u" | "R" | "U" | "f" | "F"
 | "fr" | "Fr" | "fR" | "FR" | "rf" | "rF" | "Rf" | "RF"

to expand the definition of stringprefix to allow any
identifier-like token (precise details to be confirmed). Then, if it's
one of the values enumerated above (you'd also need some provison for
special-casing bytes literals, which are in a different syntax rule),
work as at present. For any other identifier-like token, you'd define

TOKEN(shortstring|longstring)

as being equivalent to

TOKEN(r(shortstring|longstring))

I.e., treat the string as a raw string, and TOKEN as a function name,
and compile to a function call of the named function with the raw
string as argument.
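
For concreteness, a rough sketch of what that desugaring would mean
at runtime (PureWindowsPath is only a stand-in for a user-defined
handler named "path"):

from pathlib import PureWindowsPath as path

#   path"C:\temp"   would compile to a call with the raw string:
p = path(r"C:\temp")
print(p.drive, p.name)   # C: temp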

That's a well-defined proposal, although whether it's what people want
is a different question. Potential issues:

1. It makes a whole class of typos that are currently syntax errors
into runtime errors - fru"foo\and {bar}" is now a function call rather
than a syntax error (it was never a raw Unicode f-string, even though
someone might think it was and be glad to be corrected by the current
syntax error...)
2. It begs the question of whether people want raw-string semantics -
whilst it's the most flexible option, it does mean that literals
wanting to allow escape sequences would need to implement it
themselves.
3. It does nothing for the edge case that a trailing \ isn't allowed -
p"C:\" wouldn't be a valid Path literal.

There are of course other possible proposals, but we'd need more than
broad statements to make sense of them (specifically, either "exactly
*what* new syntax are you suggesting we allow?", or "how are you
proposing to allow users to alter Python syntax on demand?")

Paul


[Python-ideas] Re: Custom string prefixes

2019-08-28 Thread Rhodri James

On 28/08/2019 02:38, stpa...@gmail.com wrote:

Thanks, Andrew, you're able to explain this much better than I do.
Just wanted to add that Python *already* has ways to grossly abuse
its syntax and create unreadable code. For example, I can write

 >>> о = 3
 >>> o = 5
 >>> ο = 6
 >>> (о, o, ο)
 (3, 5, 6)


OK, I'll bite: how?  If you were using "thing.o" I would believe you 
were doing something unhelpful with properties, but just "o"?


--
Rhodri James *-* Kynesim Ltd


[Python-ideas] Re: Custom string prefixes

2019-08-28 Thread Anders Hovmöller


> On 28 Aug 2019, at 14:09, Piotr Duda  wrote:
> 
> On Wed, 28 Aug 2019 at 13:18, Steven D'Aprano  wrote:
>> 
>>> On Tue, Aug 27, 2019 at 05:13:41PM -, stpa...@gmail.com wrote:
>>> 
>>> The difference between `x'...'` and `x('...')`, other than visual noise, is 
>>> the
>>> following:
>>> 
>>> - The first "x" is in its own namespace of string prefixes. The second "x"
>>>  exists in the global namespace of all other symbols.
>> 
>> Ouch! That's adding a lot of additional complexity to the language.
> 
> There is a much simpler solution: just make `abc"whatever"` syntactic
> sugar for `string_literal_abc(r"whatever", closure)`, where closure is
> an object that allows read-only access to variables at the call site.

So to use abc"foo" we must import string_literal_abc? Seems pretty confusing to 
me!


[Python-ideas] Re: Custom string prefixes

2019-08-28 Thread Piotr Duda
On Wed, 28 Aug 2019 at 13:18, Steven D'Aprano  wrote:
>
> On Tue, Aug 27, 2019 at 05:13:41PM -, stpa...@gmail.com wrote:
>
> > The difference between `x'...'` and `x('...')`, other than visual noise, is 
> > the
> > following:
> >
> > - The first "x" is in its own namespace of string prefixes. The second "x"
> >   exists in the global namespace of all other symbols.
>
> Ouch! That's adding a lot of additional complexity to the language.

There is a much simpler solution: just make `abc"whatever"` syntactic
sugar for `string_literal_abc(r"whatever", closure)`, where closure is
an object that allows read-only access to variables at the call site.


[Python-ideas] Re: Custom string prefixes

2019-08-28 Thread Steven D'Aprano
On Tue, Aug 27, 2019 at 05:13:41PM -, stpa...@gmail.com wrote:

> The difference between `x'...'` and `x('...')`, other than visual noise, is 
> the
> following:
> 
> - The first "x" is in its own namespace of string prefixes. The second "x"
>   exists in the global namespace of all other symbols.

Ouch! That's adding a lot of additional complexity to the language.

Python's scoping rules are usually described as LEGB:

- Local
- Enclosing (non-local)
- Global (module)
- Builtins

but that's an over-simplification, dating back to something like Python 
1.5 days. Python scope also includes:

- class bodies can be the local scope (but they don't work quite
  the same as function locals);
- parts of the body of comprehensions behave as if they were a
  separate scope.

This proposal adds a completely separate, parallel set of scoping rules 
for these string prefixes. How many layers in this parallel scope?

The simplest design is to have a single, interpreter-wide namespace for 
prefixes. Then we will have name clashes, especially since you seem to 
want to encourage single-character prefixes like "v" (verbose, version) 
or "d" (date, datetime, decimal). Worse, defining a new prefix will 
affect all other modules using the same prefix.

So we need a more complex parallel scope. How much more complex?


* if I define a string prefix inside a comprehension, function or 
  class body, will that apply across the entire module or just inside 
  that comp/func/class?

* how do nested functions interact with prefixes?

* do we need a set of parallel keywords equivalent to global and 
  nonlocal for prefixes?


If different modules have different registries, then not only do we need 
to build a parallel set of scoping rules for prefixes into the 
interpreter, but we need a parallel way to import them from other 
modules, otherwise they can't be re-used.

Does "from module import x" import the regular object x from the module 
namespace, or the prefix x from the prefix-namespace? So it seems we'll 
need a parallel import system as well.

All this adds more complexity to the language, more things to be coded 
and tested and documented, more for users to learn, more for other 
implementations to re-implement, and the benefit is marginal: the 
ability to drop parentheses from some but not all function calls.


Now consider another problem: introspection, or the lack thereof.

One of the weaknesses of string prefixes is that it's hard to get help 
for them. In the REPL, we can easily get help on any class or function:

help(function)

and that's really, really great. We can use the inspect module or dir() 
to introspect functions, classes and instances, but we can't do the same 
for string prefixes.

What's the difference between r-strings and u-strings? help() is no help 
(pun intended), since help sees only the string instance, not the syntax 
you used to create it. All of these will give precisely the same output:

help(str())
help('')
help(u'')
help(r"")

etc. This is a real weakness of the prefix system, and will apply 
equally to custom prefixes. It is *super easy* to introspect a class or 
function like Version; it is *really hard* to do the same for a prefix.

You want this separate namespace for prefixes so that you can have a v 
prefix without "polluting" the module namespace with a v function (or 
class). But v doesn't write itself! You still have to write a function 
or class, although you might give it a better name and then register it 
with the single-letter prefix:

@register_prefix('v')
class Version:
    ...

(say). This still leaves Version lying around in your global namespace, 
unless you explicitly delete it:

del Version

but you probably won't want to do that, since Version will probably be 
useful for those who want to create Version objects from expressions or 
variables, not just string literals.
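
For what it's worth, a minimal sketch of such a register_prefix 
decorator (entirely hypothetical) makes the point concrete: the class 
really is still sitting in the namespace after registration.

_PREFIX_REGISTRY = {}

def register_prefix(name):
    def deco(obj):
        _PREFIX_REGISTRY[name] = obj
        return obj                 # hand the class back unchanged
    return deco

@register_prefix('v')
class Version:
    def __init__(self, text):
        self.parts = tuple(int(p) for p in text.split('.'))

# v"1.2.3" would desugar to roughly:
print(_PREFIX_REGISTRY['v']("1.2.3").parts)   # (1, 2, 3)
print(Version)                                # still in the module namespace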

So the "pollution" isn't really pollution at all, at least not if you 
use reasonable names, and the main justification for parallel namespaces 
seems much weaker.

Let me put it another way: parallel namespaces is not a feature of this 
proposal. It is a point against it.


> - Python style discourages too short variable names, especially in libraries,
>   because they have increased chance of clashing with other symbols, and
>   generally may be hard to understand. At the same time, short names for
>   string prefixes could be perfectly fine: there won't be too many of them
>   anyways.

That's an interesting position for the proponent of a new feature to 
take. "Don't worry about this being confusing, because hardly anyone 
will use it."


>   The standard prefixes "b", "r", "u", "f" are all short, and nobody
>   gets confused about them.

Plenty of people get confused about raw strings.

There's only four, plus uppercase and combinations, and they are 
standard across the entire language. If there were dozens of them, 
coming from lots of different modules 

[Python-ideas] Re: Custom string prefixes

2019-08-28 Thread Rhodri James

On 27/08/2019 18:07, Andrew Barnert via Python-ideas wrote:

On Aug 27, 2019, at 08:52, Steven D'Aprano  wrote:

On Tue, Aug 27, 2019 at 05:24:19AM -0700, Andrew Barnert via Python-ideas wrote:

There is a possibility in between the two extremes of “useless” and
“complete monster”: the prefix accepts exactly one token, but can
parse that token however it wants.

How is that different from passing a string argument to a function or
class constructor that can parse that token however it wants?

x'...'

x('...')

Unless there is some significant difference between the two, what does
this proposal give us?

Before I get into this, let me ask you a question. What does the j suffix give 
us? You can write complex numbers without it just fine:

 c = complex
 c(1, 2)

And you can even write a j function trivially:

 def j(x): return complex(0, x)
 1 + j(2)

But would anyone ever write that when they can write it like this:

 1 + 2j

I don’t think so. What does the j suffix give us? The two extra keystrokes are 
trivial. The visual noise of the parens is a bigger deal. The real issue is 
that this matches the way we conceptually think of complex numbers, and the way 
we write them in other contexts. (Well, the way electrical engineers write 
them; most of the rest of us use i rather than j… but still, having to use j 
instead of i is less of an impediment to reading 1+2j than having to use 
function syntax like 1+i(2).


You make the point yourself: this is something we already understand 
from dealing with complex numbers in other circumstances.  That is not 
true of generic single-character string prefixes.


--
Rhodri James *-* Kynesim Ltd


[Python-ideas] Re: Custom string prefixes

2019-08-28 Thread Paul Moore
On Wed, 28 Aug 2019 at 05:04, Andrew Barnert via Python-ideas
 wrote:
> What matters here is not whether things like the OP’s czt'abc' or my 1.23f or 
> 1.23d are literals to the compiler, but whether they’re readable ways to 
> enter constant values to the human reader.
>
> If so, they’re useful. Period.
>
> Now, it’s possible that even though they’re useful, the feature is still not 
> worth adding because of Chris’s issue that it can be abused, or because 
> there’s an unavoidable performance cost that makes it a bad idea to rely on 
> them, or because they’re not useful in _enough_ code to be worth the effort, 
> or whatever. Those are questions worth discussing. But arguing about whether 
> they meet (one of the three definitions of) “literal” is not relevant.

Extended (I'm avoiding the term "custom" for now) literals like 0.2f,
3.14D, re/^hello.*/ or qw{a b c} have a fairly solid track record in
other languages, and I think in general have proved both useful and
straightforward in those languages. And even in Python, constructs
like f-strings and complex numbers are examples of such things.
However, I know of almost no examples of other languages that have
added *user-definable* literal types (with the notable exception of
C++, and I don't believe I've seen use of that feature in user code -
which is not to say that it's not used). That to me says that there
are complexities in extending the question to user-defined literals
that we need to be careful of.

In my view, the issue isn't abuse of the feature, or performance, or
limited value. It's the very basic problem that it's *really hard* to
define and implement such a feature in a way that everyone is happy
with - particularly in a language like Python which doesn't have a
user-exposed "compile source to binary" step (I tried very hard to
cover myself against nitpicking there - I'm sure I failed, but please,
don't get sidetracked, you know what I mean here :-)). Some specific
questions which would need to be dealt with:

1. What is valid in the "literal" part of the construct (this is the
p"C:\" question)?
2. How do definitions of literal syntax get brought into scope in time
for the parser to act on them (this is about "import xyz_literal"
making xyz"a string" valid but leaving abc"a string" as a syntax
error)?

These questions also fundamentally affect other tools like IDEs,
linters, code formatters, etc.

In addition, there is the question of how user-defined literals would
get turned into constants within the code. In common with list
expressions, tuples, etc, user-defined literals would need to be
handled as translating into runtime instructions for constructing the
value (i.e., a function call). But people typically don't expect
values that take the form of a literal like this to be "just" syntax
sugar for a function call. So there's an education issue here. Code
will get errors at runtime that the users might have expected to
happen at compile time, or in the linter.

It's not that these questions can't be answered. Obviously they can,
as you produced a proof of concept implementation. But the design
trade-offs that one person might make are deeply unsatisfactory to
someone else, and there's no "obviously right" answer (at least not
yet, as no-one Dutch has explained what's obvious ;-))

Also, it's worth noting that the benefits of *user-defined* literals
are *not* the same as the benefits of things like 0.2f, or 3.14d, or
even re/^hello.*/. Those things may well be useful. But the benefit
you gain from *user-defined* literals is that of letting the end user
make the design decisions, rather than the language designer. And
that's a subtly different thing.

So, to summarise, the real problem with user defined literal proposals
is that the benefit they give hasn't yet proven sufficient to push
anyone to properly address all of the design-time details. We keep
having high-level "would this be useful" debates, but never really
focus on the key question, of what, in precise detail, is the "this"
that we're talking about - so people are continually making arguments
based on how they conceive such a feature might work. A really good
example here is the p"C:\" question. Is the proposal that the "string
part" of the literal is just a normal string? If so, then how do you
address this genuine issue that not all paths are valid? What about
backslash-escapes (p"C:\temp")? Is the string a raw string or not? If
the proposal is that the path-literal code can define how the string
is parsed, then *how does that work*?

The OP even made this point explicitly:

> I'm not discussing possible implementation of this feature just yet, we can 
> get to
> that point later when there is a general understanding that this is worth 
> considering.

I don't think we *can* agree on much without the implementation
details (well, other than "yes, it's worth discussing, but only if
someone proposes a properly specified design" ;-))

Paul

[Python-ideas] Re: Custom string prefixes

2019-08-28 Thread Chris Angelico
On Wed, Aug 28, 2019 at 2:40 PM Andrew Barnert  wrote:
> > People can be trusted with powerful features that can introduce
> > complexity. There's just not a lot of point introducing a low-value
> > feature that adds a lot of complexity.
>
> But it really doesn’t add a lot of complexity.
>
> If you’re not convinced that really-raw string processing is doable, drop 
> that.
>
> Since the OP hasn’t given a detailed version of his grammar, just take mine: 
> a literal token immediately followed by one or more identifier characters 
> (that couldn’t have been munched by the literal) is a user-suffix literal. 
> This is compiled into code that looks up the suffix in a central registry and 
> calls it with the token’s text. That’s all there is to it.
>

What is a "literal token", what is an "identifier character", and how
does this apply to your example of having digits, a decimal point, and
then a suffix? What if you want to have a string, and what if you want
to have that string contain backslashes or quotes? If you want to say
that this doesn't add complexity, give us some SIMPLE rules that
explain this.

And make absolutely sure that the rules are identical for EVERY
possible custom prefix/suffix, because otherwise you're opening up the
problem of custom prefixes changing the parser again.

> Compare that to adding Decimal (and Fraction, as you said last time) literals 
> when the types aren’t even builtin. That’s more complexity, for less benefit. 
> So why is it better?
>

Actually no, it's a lot less complexity, because it's all baked into
the language. You don't have to have the affix registry to figure out
how to parse a script into AST. The definition of a "literal" is given
by the tokenizer, and for instance, "-1+2j" is not a literal. How is
this going to impact your registry? The distinction doesn't matter to
Decimal or Fraction, because you can perform operations on them at
compile time and retain the results, so "-1.23d" would syntactically
be unary negation on the literal Decimal("1.23"), and -4/5f would be
unary negation on the integer 4 and division between that and
Fraction(5). But does that work with your proposed registry? What is a
"literal token", and would it need to include these kinds of things?
What if some registered types need to include them and some don't?

ChrisA


[Python-ideas] Re: Custom string prefixes

2019-08-27 Thread Andrew Barnert via Python-ideas
> On Aug 27, 2019, at 18:19, Chris Angelico  wrote:
> 
>>> On Wed, Aug 28, 2019 at 10:52 AM Andrew Barnert  wrote:
>>> 
>>> On Aug 27, 2019, at 14:41, Chris Angelico  wrote:
>>> All the examples about Windows paths fall into one of two problematic boxes:
>>> 
>>> 1) Proposals that allow an arbitrary prefix to redefine the entire
>>> parser - basically impossible for anything sane
>>> 
>>> 2) Proposals that do not allow the prefix to redefine the parser, and
>>> are utterly useless, because the rest of the string still has to be
>>> valid.
>> 
>> 3) Proposals that do not allow the prefix to redefine the parser for the 
>> entire program, but do allow it to manually parse anything the tokenizer can 
>> recognize as a single (literal) token.
>> 
>> As I said, I haven’t tried to implement this example as I have with the 
>> other examples, so I can’t promise that it’s doable (with the current 
>> tokenizer, or with a reasonable change to it). But if it is doable, it’s 
>> neither insane nor useless. (And evenif it’s not doable, that’s just two 
>> examples that affixes can’t solve—Windows paths and general “super-raw 
>> strings”. They still solve all of the other examples.)
> 
> So what is the definition of "a single literal token" when you're
> creating a path-string? You want this to be valid:
> 
> x = path"C:\"
> 
> For this to work, the path prefix has to redefine the way the parser
> finds the end of the token, does it not?

I’m not sure (maybe about 60% at best), but I think last time I checked this, 
the tokenizer actually hits the error without munching the rest of the file.

If I’m wrong, then you would need to add a “really raw string literal” builtin 
that any affixes that want really raw string literals could use, but that’s all 
you’d have to do.

And I really don’t think it’s worth getting this in-depth into just one of the 
possible uses that I just tossed off as an aside, especially without actually 
sitting down and testing anything. 

>> Look at the plethora of suffixes C has for number and character literals. 
>> Look at how many things people still can’t do with them that they want to.
> 
> I don't know how many there are. The only ones I can think of are "f"
> for single-precision float, and the long and unsigned suffixes on
> integers.

Off the top of my head, there are also long long integers, and long doubles, and 
wide and three Unicode suffixes for char. Those probably aren’t all of them. 
And your compiler probably has extensions for “legacy” suffixes and nonstandard 
types like int128 or decimal64 and so on.

> Python doesn't have these because very few programs need to
> care about whether a float is single-precision or double-precision, or
> how large an int is.

Right, but the issue isn’t which ones, but how many. C doesn’t have decimals or 
fractions, and other things like datetime objects have been suggested in this 
thread, and even more in the two earlier threads. If there are too many useful 
kinds of constants, there are too many to make them all builtins.

>> Do you think Python users are incapable of the kind of restraint and taste 
>> shown by C++ users, and therefore we can’t trust Python users with a tool 
>> that might possibly (but we aren’t sure) if abused badly enough make code 
>> harder to visually parse?
> 
> People can be trusted with powerful features that can introduce
> complexity. There's just not a lot of point introducing a low-value
> feature that adds a lot of complexity.

But it really doesn’t add a lot of complexity.

If you’re not convinced that really-raw string processing is doable, drop that.

Since the OP hasn’t given a detailed version of his grammar, just take mine: a 
literal token immediately followed by one or more identifier characters (that 
couldn’t have been munched by the literal) is a user-suffix literal. This is 
compiled into code that looks up the suffix in a central registry and calls it 
with the token’s text. That’s all there is to it.

Compare that to adding Decimal (and Fraction, as you said last time) literals when 
the types aren’t even builtin. That’s more complexity, for less benefit. So why 
is it better?



[Python-ideas] Re: Custom string prefixes

2019-08-27 Thread Andrew Barnert via Python-ideas
On Aug 27, 2019, at 18:59, Steven D'Aprano  wrote:
> 
> On Tue, Aug 27, 2019 at 10:07:41AM -0700, Andrew Barnert wrote:
> 
>>> How is that different from passing a string argument to a function or 
>>> class constructor that can parse that token however it wants?
>>> 
>>>   x'...'
>>> 
>>>   x('...')
>>> 
>>> Unless there is some significant difference between the two, what does 
>>> this proposal give us?
>> 
> 
>> Before I get into this, let me ask you a question. What does the j 
>> suffix give us?
> 
> I'm going to answer that question, but before I answer it, I'm going to 
> object that this analogy is a poor one. This proposal is *in no way* a 
> proposal for a new compile-time literal.

Yes, you’re the same person who got hung up on the fact that these affixes 
don’t really give us “literals” back in either 2013 or 2016, and I don’t want 
to rehash that argument. I could point out that nobody cares that -1 isn’t 
really a literal, and almost nobody cares that the CPython optimizer 
special-cases its way around that, and the whole issue with Python having three 
different definitions of “literal” that don’t coincide, and so on, but we 
already had this conversation and I don’t think anyone but the two of us cared.

What matters here is not whether things like the OP’s czt'abc' or my 1.23f or 
1.23d are literals to the compiler, but whether they’re readable ways to enter 
constant values to the human reader. 

If so, they’re useful. Period.

Now, it’s possible that even though they’re useful, the feature is still not 
worth adding because of Chris’s issue that it can be abused, or because there’s 
an unavoidable performance cost that makes it a bad idea to rely on them, or 
because they’re not useful in _enough_ code to be worth the effort, or 
whatever. Those are questions worth discussing. But arguing about whether they 
meet (one of the three definitions of) “literal” is not relevant.

> This proposal is for mere syntactic sugar allowing us to drop the 
> parentheses from a tiny subset of function calls, those which take a 
> single string argument.

And to drop the quotes as well. And to avoid polluting the global namespace 
with otherwise-unused one-character function names.

Can you honestly tell me that you see no significant readability difference 
between these examples:

vec = [1.23f, 2.5f, 1.11f]
vec = [f('1.23'), f('2.5'), f('1.11')]

I think anyone would agree that the former is a lot more readable. Sure, you 
have to learn what the f suffix means, but once you do, it means all of the 
dozens of constants in the module are more readable. (And of course most people 
reading this code will probably be people who are used to 3D code and already 
_expect_ that format, since that’s how you write it in C, in shaders, etc.)
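
For example, assuming NumPy is available, the function-call spelling 
above is what an f suffix would have to sugar over (np.float32 parses 
the literal text directly):

import numpy as np

def f(text):
    # stand-in for whatever a registered 'f' suffix would call
    return np.float32(text)

vec = [f('1.23'), f('2.5'), f('1.11')]
print(vec[0].dtype)   # float32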

> And even then, only when the argument is a 
> string literal:
> 
>czt'abc'  # Okay.
> 
>s = 'abc'
>czt's'  # Oops, wrong, doesn't work.

Sure, just like you can’t apply an r or f prefix to a string expression.

> But, to answer your question, what does the j suffix give us?
> 
> Damn little. Unless there is a large community of Scipy and Numpy users 
> who need complex literals, I suspect that complex literals are one of 
> the least used features in Python.
> 
> I do a lot of maths in Python, and aside from experimentation in the 
> interactive interpreter, I think I can safely say that I have used 
> complex literals exactly zero times in code.

I don’t think your experience here is typical. I can’t think of a good way to 
search GitHub python repos for uses of j, but a hacky search immediately turned 
up this numpy issue: https://github.com/numpy/numpy/issues/13179:

> A fast way to get the inverse of angle, i.e., exp(1j * a) = cos(a) + 1j * 
> sin(a). Note that for large angle arrays, exp(1j*a) needlessly triples memory 
> use…

That doesn’t prove that people actually call it with `1j * a` instead of 
`complex(0, a)`, but it does seem likely.


>> You can write complex numbers without it just fine:
> [...]
> 
> Indeed. And if we didn't already have complex literals, would we accept 
> a proposal to add them now? I doubt it.

I’m not sure. I assume you’d be against it, but I suspect that most of the 
people who use it today would be for it.

But if we had custom affixes, I think everyone would be happy with “just define 
a custom j suffix”. Would anyone really argue that they need the performance 
benefit or compile-time handling? How often do you evaluate zillions of 
constants in the middle of a tight loop? And what other argument would there be 
for adding it to the grammar and the compiler and forcing every project to use 
it?

Which is exactly what I think of the Decimal and Fraction suffixes, contrary to 
what Chris says. There will be a small number of projects than get a lot of 
readability benefit, but every other project gains nothing, so why add it as a 
builtin for every project?

And I don’t see why float32 is any different from Decimal 

[Python-ideas] Re: Custom string prefixes

2019-08-27 Thread Steven D'Aprano
On Tue, Aug 27, 2019 at 10:07:41AM -0700, Andrew Barnert wrote:

> > How is that different from passing a string argument to a function or 
> > class constructor that can parse that token however it wants?
> > 
> >x'...'
> > 
> >x('...')
> > 
> > Unless there is some significant difference between the two, what does 
> > this proposal give us?
> 

> Before I get into this, let me ask you a question. What does the j 
> suffix give us?

I'm going to answer that question, but before I answer it, I'm going to 
object that this analogy is a poor one. This proposal is *in no way* a 
proposal for a new compile-time literal.

If it were, it might be interesting: I would be very interested to hear 
more about literals for a Decimal type, say, or regular expressions. But 
this proposal doesn't offer that.

This proposal is for mere syntactic sugar allowing us to drop the 
parentheses from a tiny subset of function calls, those which take a 
single string argument. And even then, only when the argument is a 
string literal:

czt'abc'  # Okay.

s = 'abc'
czt's'  # Oops, wrong, doesn't work.


But, to answer your question, what does the j suffix give us?

Damn little. Unless there is a large community of Scipy and Numpy users 
who need complex literals, I suspect that complex literals are one of 
the least used features in Python.

I do a lot of maths in Python, and aside from experimentation in the 
interactive interpreter, I think I can safely say that I have used 
complex literals exactly zero times in code.


> You can write complex numbers without it just fine:
[...]

Indeed. And if we didn't already have complex literals, would we accept 
a proposal to add them now? I doubt it. But if you think we would, how 
about a proposal to add quaternions?

q = 3 + 4i + 2j - 7k


> But would anyone ever write that when they can write it like this:
> 
> 1 + 2j

Given that complex literals are already a thing, of course you are 
correct that if I ever needed a complex literal, I would use the literal 
syntax.

But that's the point: it is *literal syntax* handled by the compiler at 
compile time, not syntactic sugar for a runtime function call that has 
to inefficiently parse a string.

Because it is built-in to the language, we don't have to do this:

def c(astring):
    assert isinstance(astring, str)
    # Parse the string at runtime
    real, imag = ...
    return complex(real, imag)

z = c"1.23 + 4.56j"

(I'm aware that the complex constructor actually does parse strings 
already, so in *this specific* example we don't have to write our own 
parser. But that doesn't apply in the general case.)
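
For reference, that built-in parsing looks like this (a small runnable check; 
the parser accepts the compact literal form, but not one with internal spaces):

    assert complex('1.23+4.56j') == complex(1.23, 4.56)
    # complex('1.23 + 4.56j') raises ValueError: spaces around the '+'
    # are not accepted.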

That is nothing like complex literals:

py> from dis import dis
py> dis(compile('1+2j', '', 'eval'))
  1           0 LOAD_CONST               2 ((1+2j))
              3 RETURN_VALUE


# Hypothetical byte-code generated from custom string prefix 
py> dis(compile("c'1+2j'", '', 'eval'))
  1           0 LOAD_NAME                0 (c)
              3 LOAD_CONST               0 ('1+2j')
              6 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
              9 RETURN_VALUE

Note that in the first case, we generate a complex literal at compile 
time; in the second case, we generate a *string* literal at compile 
time, which must be parsed at runtime.

This is not a rhetorical question: if we didn't have complex literals, 
why would you write your complex number as a string, deferring parsing 
it until runtime, when you could parse it in your head at edit-time and 
call the constructor directly?

z = complex(1.23, 4.56)  # Assuming there was no literal syntax.


> I don’t think so. What does the j suffix give us? The two extra 
> keystrokes are trivial. The visual noise of the parens is a bigger 
> deal.

I don't think it is. I think the big deals in this proposal are:

- you have something that looks like a kind of string czt'...' 
  but is really a function call that might return absolutely 
  anything at all;

- you have a redundant special case for calling functions that
  take a single argument, but only if that argument is a string
  literal;

- you encourage people to write cryptic single-character 
  functions, like v(), x(), instead of meaningful expressions
  like Version() and re.compile();

- you encourage people to defer parsing that could be efficiently 
  done in your head at edit time into slow and likely inefficient
  string parsing done at runtime;

- the OP still hasn't responded to my question about the ambiguity
  of the proposal (is czt'...' one three-letter prefix, or three 
  one-letter prefixes?)

all of which *hugely* outweighs the gain of being able to avoid a pair 
of parentheses.


[...]

> And the exact same thing is true in 3D or CUDA code that uses a lot of 
> float32 values. [...] I actually have to go through a string for 
> implementation reasons (because otherwise Python would force me to go 
> through a float64 and distort the values)

Indeed, but this proposal doesn't help 

[Python-ideas] Re: Custom string prefixes

2019-08-27 Thread stpasha
Thanks, Andrew, you're able to explain this much better than I do.
Just wanted to add that Python *already* has ways to grossly abuse
its syntax and create unreadable code. For example, I can write

>>> о = 3
>>> o = 5
>>> ο = 6
>>> (о, o, ο)
(3, 5, 6)

But just because some feature CAN get abused, doesn't mean it 
ACTUALLY gets abused in practice. People want to write nice, readable
code, because they will ultimately be the ones to support it.


[Python-ideas] Re: Custom string prefixes

2019-08-27 Thread Chris Angelico
On Wed, Aug 28, 2019 at 10:52 AM Andrew Barnert  wrote:
>
> On Aug 27, 2019, at 14:41, Chris Angelico  wrote:
> > All the examples about Windows paths fall into one of two problematic boxes:
> >
> > 1) Proposals that allow an arbitrary prefix to redefine the entire
> > parser - basically impossible for anything sane
> >
> > 2) Proposals that do not allow the prefix to redefine the parser, and
> > are utterly useless, because the rest of the string still has to be
> > valid.
>
> 3) Proposals that do not allow the prefix to redefine the parser for the 
> entire program, but do allow it to manually parse anything the tokenizer can 
> recognize as a single (literal) token.
>
> As I said, I haven’t tried to implement this example as I have with the other 
> examples, so I can’t promise that it’s doable (with the current tokenizer, or 
> with a reasonable change to it). But if it is doable, it’s neither insane nor 
> useless. (And evenif it’s not doable, that’s just two examples that affixes 
> can’t solve—Windows paths and general “super-raw strings”. They still solve 
> all of the other examples.)
>

So what is the definition of "a single literal token" when you're
creating a path-string? You want this to be valid:

x = path"C:\"

For this to work, the path prefix has to redefine the way the parser
finds the end of the token, does it not? Otherwise, you still have the
same problems you already do - backslashes have to be escaped. That's
why I say that, without being able to redefine the parser, this is
completely useless, as a "path string" might as well just be a
"string".

Which way is it?

> > That line of argument is valid for anything that is specifically
> > defined by the language.
>
> Yes, and? “Literal token” is specifically defined by the language. “Literal 
> token with attached tag” will also be specifically defined by the language. 
> The only thing open to customization is what that token gets compiled to.
>

I don't understand. Are you saying that the prefix is not going to be
able to change how backslashes are handled, or that it is? If you keep
the tokenizer exactly the same and just add a token in front of it,
then things like path"C:\" will be considered to be incomplete and
will continue to consume source code until the next quote (or throw
SyntaxError for EOL inside string literal). Or is your idea of
"literal token" something other than that?

If a "literal token" is simply a string literal, then how is this
actually helping anything? What do you achieve?

> Look at the plethora of suffixes C has for number and character literals. 
> Look at how many things people still can’t do with them that they want to.

I don't know how many there are. The only ones I can think of are "f"
for single-precision float, and the long and unsigned suffixes on
integers. Python doesn't have these because very few programs need to
care about whether a float is single-precision or double-precision, or
how large an int is.

> Look at the way user literals work in C++. While technically you can argue 
> that they are “syntax customization”, in practice the customization is highly 
> constrained. Is it _impossible_ to use that feature to write code that can’t 
> be parsed by a human reader? I don’t know if I could prove that it’s 
> impossible. However, I do know that it’s not easy. And that none of the 
> examples, or real-life uses, that I’ve seen have done so.
>

I also have not yet seen any good examples of user literals in C++.

> Do you think Python users are incapable of the kind of restraint and taste 
> shown by C++ users, and therefore we can’t trust Python users with a tool 
> that might possibly (but we aren’t sure) if abused badly enough make code 
> harder to visually parse?
>

People can be trusted with powerful features that can introduce
complexity. There's just not a lot of point introducing a low-value
feature that adds a lot of complexity.

ChrisA


[Python-ideas] Re: Custom string prefixes

2019-08-27 Thread Andrew Barnert via Python-ideas
On Aug 27, 2019, at 14:41, Chris Angelico  wrote:
> 
>> On Wed, Aug 28, 2019 at 6:03 AM Andrew Barnert  wrote:
>> 
>>> On Tuesday, August 27, 2019, 11:12:51 AM PDT, Chris Angelico 
>>>  wrote:
>>> If your conclusion here were "and that's why Python needs a proper
>>> syntax for Decimal literals", then I would be inclined to agree with
>>> you - a Decimal literal would be lossless (as it can entirely encode
>>> whatever was in the source file), and you could then create the
>>> float32 values from those.
>> 
>> I think builtin Decimal literals are a non-starter. The type isn't even 
>> builtin.
>> 
> 
> Not sure that's a total blocker, but in any case, I'm not arguing for
> that - I'm just saying that everything up to that point in your
> argument would be better served by a Decimal literal than by any
> notion of "custom literals".

No, it really couldn’t. A builtin Decimal literal would arguably serve the 
Decimal use case better (but I’m not even sure about that one; see below), but 
it doesn’t serve the float32 case that you’re responding to.

>> But they're not. You didn't even attempt to answer the comparison with 
>> complex that you quoted. The problem that `j` solves is not that there's no 
>> way to create complex values losslessly out of floats, but that there's no 
>> way to create them _readably_, in a way that's consistent with the way you 
>> read and write them in every other context. Which is exactly the problem 
>> that `f` solves. Adding a Decimal literal would not help that at all—letting 
>> me write `f(1.23d)` instead of `f('1.23')` does not let me write `1.23f`.
>> 
> TBH I don't quite understand the problem. Is it only an issue with
> negative zero? If so, maybe you should say so, because in every other
> way, building a complex out of a float added to an imaginary is
> perfectly lossless.

Negative zero is an irrelevant side issue that Serhiy brought up. It means j is 
not quite perfect—and yet j is still perfectly usable despite that. Ignore 
negative zero.

The problem that j solves is dead simple: 1 + 2j is more readable than 
complex(1, 2). And it matches what you write and read in other contexts besides 
Python. That’s the only problem j solves. But it’s a problem worth solving, at 
least for code that uses a lot of complex numbers. Without it, even if you 
wanted to pollute the namespace with a single-letter global so you could write 
c(1, 2) or 1 + j(2), it _still_ wouldn’t be nearly as readable or as familiar. 
That’s why we have j. There is literally no other benefit, and yet it’s enough.

And the problem that f solves would be exactly the same: 1.23f is more readable 
than calling float32, and it matches what you read and write in other contexts 
besides Python (like, say, C or shader code). Even if you wanted to pollute the 
namespace with a single-letter global f, it still wouldn’t be as readable or as 
familiar. That’s why we should have f. There is literally no other benefit, but 
I think it’s enough benefit, for enough programs, that we should be allowed to 
do it. Just like j.

Unlike j, however, I don’t think it’s useful in enough programs that it should 
be builtin. And I think the same is probably true for Decimal. And for most of 
the other examples that have come up in this thread. Which is why I think we’d 
be better served with something akin to C++ allowing you to explicitly register 
affixes for your specific program, than something like C with its 
too-big-to-remember-but-still-not-enough-for-many-uses zoo of builtin affixes.

>> Also, as the OP has pointed out repeatedly and nobody has yet answered, if I 
>> want to write `f(1.23d)` or `f('1.23')`, I have to pollute the global 
>> namespace with a function named `f` (a very commonly-used name); if I want 
>> to write `1.23f`, I don't, since the converter gets stored in some 
>> out-of-the-way place like `__user_literals_registry__['f']` rather than `f`. 
>> That seems like a serious benefit to me.
>> 
> Maybe. But far worse is that you have a very confusing situation that
> this registered value could be different in different programs. 

Sure, and the global f could also be different in different programs—or even in 
different modules in the same program. So what?

1.23f would always have the same meaning everywhere, it’s just that the meaning 
is something like __user_literals__['f']('1.23') instead of 
globals()['f']('1.23').
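
A rough runtime-only emulation of that lookup, just to make the namespace point 
concrete; the registry dict and the lit() helper below are illustrative 
stand-ins, not a proposed API:

    from decimal import Decimal
    from fractions import Fraction

    # Hypothetical registry: suffix handlers live here instead of in the
    # module's global namespace, so no one-letter globals are needed.
    _user_literals = {
        'j': lambda s: complex(0.0, float(s)),
        'd': Decimal,
        'F': Fraction,
    }

    def lit(text, suffix):
        # Stand-in for whatever the compiler would emit for e.g. 1.23d.
        return _user_literals[suffix](text)

    assert 1 + lit('2', 'j') == 1 + 2j
    assert lit('1.23', 'd') == Decimal('1.23')
    assert lit('1/3', 'F') == Fraction(1, 3)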

Yes, of course that is something new to be learned, if you’re looking at a 
program that does a lot of 3D math, or a lot of decimal math, or a lot of 
Windows path stuff, or whatever, people are likely to have used this feature so 
you’ll need to know how to look up the f or d or whatever. But that really 
isn’t a huge hardship, and I think the benefits outweigh the cost. 

> In
> contrast, f(1.23d) would have the same meaning everywhere: call a
> function 'f' with one parameter, the Decimal value 1.23. Allowing
> language syntax to vary between programs is a mess that needs a 

[Python-ideas] Re: Custom string prefixes

2019-08-27 Thread Chris Angelico
On Wed, Aug 28, 2019 at 6:03 AM Andrew Barnert  wrote:
>
> On Tuesday, August 27, 2019, 11:12:51 AM PDT, Chris Angelico 
>  wrote:
> > If your conclusion here were "and that's why Python needs a proper
> > syntax for Decimal literals", then I would be inclined to agree with
> > you - a Decimal literal would be lossless (as it can entirely encode
> > whatever was in the source file), and you could then create the
> > float32 values from those.
>
> I think builtin Decimal literals are a non-starter. The type isn't even 
> builtin.
>

Not sure that's a total blocker, but in any case, I'm not arguing for
that - I'm just saying that everything up to that point in your
argument would be better served by a Decimal literal than by any
notion of "custom literals".

> But they're not. You didn't even attempt to answer the comparison with 
> complex that you quoted. The problem that `j` solves is not that there's no 
> way to create complex values losslessly out of floats, but that there's no 
> way to create them _readably_, in a way that's consistent with the way you 
> read and write them in every other context. Which is exactly the problem that 
> `f` solves. Adding a Decimal literal would not help that at all—letting me 
> write `f(1.23d)` instead of `f('1.23')` does not let me write `1.23f`.
>

TBH I don't quite understand the problem. Is it only an issue with
negative zero? If so, maybe you should say so, because in every other
way, building a complex out of a float added to an imaginary is
perfectly lossless.

> Also, I think you're the one who brought up performance earlier? `%timeit 
> np.float32('1.23')` is 671ns, while `%timeit np.float32(d)` with a 
> pre-constructed `Decimal(1.23)` is 2.56us on my laptop, so adding a Decimal 
> literal instead of custom literals actually encourages _slower_ code, not 
> faster.
>

No, I didn't say that. I have no idea why numpy would take longer to
work with a Decimal than a string, and that's the sort of thing that
could easily change from one version to another. But the main argument
here is about readability, not performance.

> Also, as the OP has pointed out repeatedly and nobody has yet answered, if I 
> want to write `f(1.23d)` or `f('1.23')`, I have to pollute the global 
> namespace with a function named `f` (a very commonly-used name); if I want to 
> write `1.23f`, I don't, since the converter gets stored in some 
> out-of-the-way place like `__user_literals_registry__['f']` rather than `f`. 
> That seems like a serious benefit to me.
>

Maybe. But far worse is that you have a very confusing situation that
this registered value could be different in different programs. In
contrast, f(1.23d) would have the same meaning everywhere: call a
function 'f' with one parameter, the Decimal value 1.23. Allowing
language syntax to vary between programs is a mess that needs a LOT
more justification than anything I've seen so far.

> > But you haven't made the case for generic string prefixes or any sort
> > of "arbitrary literal" that would let you import something that
> > registers something to make your float32 literals.
>
> Sure I did; you just cut off the rest of the email that had other cases.

Which said basically the same as the parts I quoted.

> And ignored most of what you quoted about the float32 case.

What did I ignore?

> And ignored the previous emails by both me and the OP that had other cases. 
> Or can you explain to me how a builtin Decimal literal could solve the 
> problem of Windows paths?

All the examples about Windows paths fall into one of two problematic boxes:

1) Proposals that allow an arbitrary prefix to redefine the entire
parser - basically impossible for anything sane

2) Proposals that do not allow the prefix to redefine the parser, and
are utterly useless, because the rest of the string still has to be
valid.

So no, you still haven't made a case for arbitrary literals.

> Here's a few more: Numeric types that can't be losslessly converted to and 
> from Decimal, like Fraction.

If you want to push for Fraction literals as well, then sure. But
that's still very very different from *arbitrary literal types*.

> Something more similar to complex (e.g., `quat = 1.0x + 0.0y + 0.1z + 1.0w`). 
> What would Decimal literals do for me there?
>

Quaternions are sufficiently niche that it should be possible to
represent them with multiplication.

quat = 1.0 + 0.0*i + 0.1*j + 1.0*k

With appropriate objects i, j, k, it should be possible to craft
something that implements quaternion arithmetic using this syntax.
Yes, it's not quite as easy as 4+3j is, but it's also far FAR rarer.
(And remember, even regular complex numbers are more advanced than a
lot of languages have syntactic support for.)
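
A minimal sketch of what "appropriate objects i, j, k" could look like; it 
implements only enough operator support for that construction syntax, not a 
full quaternion multiplication table:

    class Quaternion:
        """Toy quaternion: only what the w + x*i + y*j + z*k syntax needs."""

        def __init__(self, w=0.0, x=0.0, y=0.0, z=0.0):
            self.w, self.x, self.y, self.z = w, x, y, z

        def __add__(self, other):
            if isinstance(other, (int, float)):
                other = Quaternion(other)
            return Quaternion(self.w + other.w, self.x + other.x,
                              self.y + other.y, self.z + other.z)

        __radd__ = __add__

        def __rmul__(self, scalar):
            # scalar * unit is all the construction syntax requires
            return Quaternion(scalar * self.w, scalar * self.x,
                              scalar * self.y, scalar * self.z)

        def __repr__(self):
            return f"({self.w} + {self.x}i + {self.y}j + {self.z}k)"

    i, j, k = Quaternion(x=1.0), Quaternion(y=1.0), Quaternion(z=1.0)

    quat = 1.0 + 0.0*i + 0.1*j + 1.0*k
    print(quat)   # (1.0 + 0.0i + 0.1j + 1.0k)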

> I think your reluctance and the OP's excitement here both come from the same 
> source: Any feature that gives you a more convenient way to write and read 
> something is good, because it lets you write things in a way that's 
> consistent with your 

[Python-ideas] Re: Custom string prefixes

2019-08-27 Thread stpasha
> But you haven't made the case for generic string prefixes or any sort
> of "arbitrary literal" that would let you import something that
> registers something to make your float32 literals.

The case can be made as follows: different people use different parts
of the Python language. Andrew would love to see the support for
decimals, fractions and float32s (possibly float16s too, and maybe
even posit numbers). Myself, I miss datetime and regular expression
literals. Other people on the 2013 thread argued at length in favor of
supporting sql-literals, which would allow them to be used in a much
safer manner. Then there are those who want to write complex 
numbers in a natural fashion, but they already got their wish granted.

In short, the needs vary, and not all of the functionality belongs to the
python standard library either.


[Python-ideas] Re: Custom string prefixes

2019-08-27 Thread Andrew Barnert via Python-ideas
 On Tuesday, August 27, 2019, 11:12:51 AM PDT, Chris Angelico 
 wrote:
 
 >On Wed, Aug 28, 2019 at 3:10 AM Andrew Barnert via Python-ideas
> wrote:
>> Before I get into this, let me ask you a question. What does the j suffix 
>> give us? You can write complex numbers without it just fine:
>>
>>    c = complex
>>    c(1, 2)
>>
>> And you can even write a j function trivially:
>>
>>    def j(x): return complex(0, x)
>>    1 + j(2)
>>
>> But would anyone ever write that when they can write it like this:
>>
>>    1 + 2j
>>
>> I don’t think so. What does the j suffix give us? The two extra keystrokes 
>> are trivial. The visual noise of the parens is a bigger deal. The real issue 
>> is that this matches the way we conceptually think of complex numbers, and 
>> the way we write them in other contexts. (Well, the way electrical engineers 
>> write them; most of the rest of us use i rather than j… but still, having to 
>> use j instead of i is less of an impediment to reading 1+2j than having to 
>> use function syntax like 1+i(2).
>>
>> And the exact same thing is true in 3D or CUDA code that uses a lot of 
>> float32 values. Or code that uses a lot of Decimal values. In those cases, I 
>> actually have to go through a string for implementation reasons (because 
>> otherwise Python would force me to go through a float64 and distort the 
>> values), but conceptually; there are no strings involved when I write this:
>>
>>    array([f('0.2'), f('0.3'), f('0.1')])
>>
>> … and it would be a lot more readable if I could write it the same way I do 
>> in other programming languages:
>>
>>    array([0.2f, 0.3f, 0.1f])
>>
>> Again, it’s not about saving 4 keystrokes per number, and the visual noise 
>> of the parens is an issue but not the main one (and quotes are barely any 
>> noise by comparison); it’s the fact that these numeric values look like 
>> numeric values instead of looking like strings
>>
> If your conclusion here were "and that's why Python needs a proper
> syntax for Decimal literals", then I would be inclined to agree with
> you - a Decimal literal would be lossless (as it can entirely encode
> whatever was in the source file), and you could then create the
> float32 values from those.

I think builtin Decimal literals are a non-starter. The type isn't even 
builtin. You surely wouldn't want to incur the cost of importing it to every 
Python session. And implementing some kind of lazy import mechanism in the 
middle of the json module is one thing, but in the middle of the compiler? So 
how _could_ you implement them? (While we're at it, what would that do to 
MicroPython and… one of the browser Pythons, I forget which… that have 100% 
syntax compatibility with Python but leave out much of the stdlib, including 
decimal? Sure, nobody ever promised they could do that, but it's a happy 
accident that they could, and do we want to break that capriciously?)

Maybe you could come up with some kind of DecimalLiteral object that doesn't 
actually act like a number, but can be converted to all of the different 
numeric types as needed (so, e.g., if you add or radd one to a `float` it 
converts to a `float`, etc.). That works great in languages like Swift and 
Haskell, but I don't think there's a feasible design for a dynamically-typed 
language.

So, even if Decimal literals really were the only thing we needed, a way to 
register Decimal literals may be the best way to do that.

But they're not. You didn't even attempt to answer the comparison with complex 
that you quoted. The problem that `j` solves is not that there's no way to 
create complex values losslessly out of floats, but that there's no way to 
create them _readably_, in a way that's consistent with the way you read and 
write them in every other context. Which is exactly the problem that `f` 
solves. Adding a Decimal literal would not help that at all—letting me write 
`f(1.23d)` instead of `f('1.23')` does not let me write `1.23f`.

Also, I think you're the one who brought up performance earlier? `%timeit 
np.float32('1.23')` is 671ns, while `%timeit np.float32(d)` with a 
pre-constructed `Decimal(1.23)` is 2.56us on my laptop, so adding a Decimal 
literal instead of custom literals actually encourages _slower_ code, not 
faster.

Also, as the OP has pointed out repeatedly and nobody has yet answered, if I 
want to write `f(1.23d)` or `f('1.23')`, I have to pollute the global namespace 
with a function named `f` (a very commonly-used name); if I want to write 
`1.23f`, I don't, since the converter gets stored in some out-of-the-way place 
like `__user_literals_registry__['f']` rather than `f`. That seems like a 
serious benefit to me.

> But you haven't made the case for generic string prefixes or any sort
> of "arbitrary literal" that would let you import something that
> registers something to make your float32 literals.

Sure I did; you just cut off the rest of the email that had other cases. And 
ignored most of what you quoted about the 

[Python-ideas] Re: Custom string prefixes

2019-08-27 Thread Andrew Barnert via Python-ideas
 On Tuesday, August 27, 2019, 11:42:23 AM PDT, Serhiy Storchaka 
 wrote:
 
 > 27.08.19 20:07, Andrew Barnert via Python-ideas writes:
>> Before I get into this, let me ask you a question. What does the j suffix 
>> give us? You can write complex numbers without it just fine:
>> 
>>      c = complex
>>      c(1, 2)
>> 
>> And you can even write a j function trivially:
>> 
>>      def j(x): return complex(0, x)
>>      1 + j(2)
>> 
>> But would anyone ever write that when they can write it like this:
>> 
>>      1 + 2j
>
> And it has its limitation. How would you write complex(-0.0, 1.0)?

And yet, despite that limitation, many people find it useful, and use it on a 
daily basis. Are you suggesting that Python would be better off without the `j` 
suffix because of that problem?


[Python-ideas] Re: Custom string prefixes

2019-08-27 Thread Chris Angelico
On Wed, Aug 28, 2019 at 3:10 AM Andrew Barnert via Python-ideas
 wrote:
> Before I get into this, let me ask you a question. What does the j suffix 
> give us? You can write complex numbers without it just fine:
>
> c = complex
> c(1, 2)
>
> And you can even write a j function trivially:
>
> def j(x): return complex(0, x)
> 1 + j(2)
>
> But would anyone ever write that when they can write it like this:
>
> 1 + 2j
>
> I don’t think so. What does the j suffix give us? The two extra keystrokes 
> are trivial. The visual noise of the parens is a bigger deal. The real issue 
> is that this matches the way we conceptually think of complex numbers, and 
> the way we write them in other contexts. (Well, the way electrical engineers 
> write them; most of the rest of us use i rather than j… but still, having to 
> use j instead of i is less of an impediment to reading 1+2j than having to 
> use function syntax like 1+i(2).
>
> And the exact same thing is true in 3D or CUDA code that uses a lot of 
> float32 values. Or code that uses a lot of Decimal values. In those cases, I 
> actually have to go through a string for implementation reasons (because 
> otherwise Python would force me to go through a float64 and distort the 
> values), but conceptually; there are no strings involved when I write this:
>
> array([f('0.2'), f('0.3'), f('0.1')])
>
> … and it would be a lot more readable if I could write it the same way I do 
> in other programming languages:
>
> array([0.2f, 0.3f, 0.1f])
>
> Again, it’s not about saving 4 keystrokes per number, and the visual noise of 
> the parens is an issue but not the main one (and quotes are barely any noise 
> by comparison); it’s the fact that these numeric values look like numeric 
> values instead of looking like strings
>

If your conclusion here were "and that's why Python needs a proper
syntax for Decimal literals", then I would be inclined to agree with
you - a Decimal literal would be lossless (as it can entirely encode
whatever was in the source file), and you could then create the
float32 values from those.

But you haven't made the case for generic string prefixes or any sort
of "arbitrary literal" that would let you import something that
registers something to make your float32 literals.

ChrisA


[Python-ideas] Re: Custom string prefixes

2019-08-27 Thread Andrew Barnert via Python-ideas
On Aug 27, 2019, at 08:36, Steven D'Aprano  wrote:
> 
> I don't wish to say that parsing strings to extract information is 
> always an anti-pattern:
> 
> http://cyrille.martraire.com/2010/01/the-string-obsession-anti-pattern/
> 
> after all we often need to process data coming from config files or 
> other user-input, where we have no choice but to accept a string.
> 
> But parsing string *literals* usually is an anti-pattern, especially 
> when there is a trivial transformation from the string to the 
> constructor arguments, e.g. 123/4567 --> Fraction(123, 4567).

But there are plenty of cases where parsing string literals is the current 
usual practice. Decimal is obvious, as well as most other non-native numeric 
types. Path objects even more so. Pandas users seem to always build their 
datetime objects out of MMDDTHHMMSS strings. And so on. 
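
To make "current usual practice" concrete, all of the following are idiomatic 
today, and each one parses a string literal at runtime (the values are just 
placeholders):

    from decimal import Decimal
    from pathlib import PurePosixPath
    from datetime import datetime

    price = Decimal('19.99')
    logdir = PurePosixPath('/var/log/myapp')
    when = datetime.fromisoformat('2019-08-27T08:36:00')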

So the status quo doesn’t mean nobody parses string literals, it means people 
_explicitly_ parse string literals. And the proposed change doesn’t mean more 
string literal parsing, it means making some of the existing, uneliminable uses 
less visually prominent and more readable. (And, relevant to the blog you 
linked, it seems to make it _less_ likely, not more, that you’d bind the string 
rather than the value to a name, or pass it around and parse it repeatedly, or 
the other bad practices they were talking about.)

I’ll admit there are some cases where I might sacrifice performance for 
convenience if we had this feature. For example, F1/3 (or 1/3F with suffixes) 
would have to mean at least Fraction(1) / 3, if not Fraction('1') / 3, or even 
that plus an extra LOAD_ATTR. That is clearly going to be more expensive than 
F(1, 3) meaning Fraction(1, 3), but I’d still do it at the REPL, and likely in 
real code as well. But I don’t think that choice would make my code worse 
(because when setup costs matter, I _wouldn’t_ make that choice), so I don’t 
see that as a problem.



[Python-ideas] Re: Custom string prefixes

2019-08-27 Thread stpasha
> Unless there is some significant difference between the two, what does 
> this proposal give us?

The difference between `x'...'` and `x('...')`, other than visual noise, is the
following:

- The first "x" is in its own namespace of string prefixes. The second "x"
  exists in the global namespace of all other symbols.

- Python style discourages too short variable names, especially in libraries,
  because they have increased chance of clashing with other symbols, and
  generally may be hard to understand. At the same time, short names for
  string prefixes could be perfectly fine: there won't be too many of them
  anyways. The standard prefixes "b", "r", "u", "f" are all short, and nobody
  gets confused about them.

- Barrier of entry. Today you can write `from re import compile as x` and then
  write `x('...')` to denote a regular expression (if you don't mind having `x`
  as a global variable). But this is not the way people usually write code. People
  write the code the way they are taught from examples, and the examples 
  don't speak about regular expression objects. The examples only show
  regular expressions-as-strings, so many python users don't even realize
  that regular expressions can be objects.

  Now, if the string prefixes were available, library authors would think "Do we
  want to export such functionality for the benefit of our users?" And if they
  answer yes, then they'll showcase this in the documentation and examples,
  and the user will see that their code has become cleaner and more 
  understandable.


[Python-ideas] Re: Custom string prefixes

2019-08-27 Thread Andrew Barnert via Python-ideas
On Aug 27, 2019, at 08:52, Steven D'Aprano  wrote:
> 
>> On Tue, Aug 27, 2019 at 05:24:19AM -0700, Andrew Barnert via Python-ideas 
>> wrote:
>> 
>> There is a possibility in between the two extremes of “useless” and 
>> “complete monster”: the prefix accepts exactly one token, but can 
>> parse that token however it wants.
> 
> How is that different from passing a string argument to a function or 
> class constructor that can parse that token however it wants?
> 
>x'...'
> 
>x('...')
> 
> Unless there is some significant difference between the two, what does 
> this proposal give us?

Before I get into this, let me ask you a question. What does the j suffix give 
us? You can write complex numbers without it just fine:

c = complex
c(1, 2)

And you can even write a j function trivially:

def j(x): return complex(0, x)
1 + j(2)

But would anyone ever write that when they can write it like this:

1 + 2j

I don’t think so. What does the j suffix give us? The two extra keystrokes are 
trivial. The visual noise of the parens is a bigger deal. The real issue is 
that this matches the way we conceptually think of complex numbers, and the way 
we write them in other contexts. (Well, the way electrical engineers write 
them; most of the rest of us use i rather than j… but still, having to use j 
instead of i is less of an impediment to reading 1+2j than having to use 
function syntax like 1+i(2).)

And the exact same thing is true in 3D or CUDA code that uses a lot of float32 
values. Or code that uses a lot of Decimal values. In those cases, I actually 
have to go through a string for implementation reasons (because otherwise 
Python would force me to go through a float64 and distort the values), but 
conceptually; there are no strings involved when I write this:

array([f('0.2'), f('0.3'), f('0.1')])

… and it would be a lot more readable if I could write it the same way I do in 
other programming languages:

array([0.2f, 0.3f, 0.1f])

Again, it’s not about saving 4 keystrokes per number, and the visual noise of 
the parens is an issue but not the main one (and quotes are barely any noise by 
comparison); it’s the fact that these numeric values look like numeric values 
instead of looking like strings

The fact that they look the same as the same values in other contexts like a 
C++ program or a GLSL shader is a pretty large added bonus. But I don’t think 
that’s essential to the value here. If you forced me to use prefixes instead of 
suffixes (I don’t think there’s any good reason for that, but who knows how the 
winds of bikeshedding may blow), I’d still prefer f2.3 to f('2.3'), because it 
still looks like a number, as it should.

I know this is doable, because I’ve written an import hook that does it, plus I 
have a decade of experience with another popular language (C++) that has 
essentially the same feature.

What about the performance cost of these values not being constants? A 
decorator that finds np.float32 calls on constants and promotes them to 
constants by hacking the bytecode is pretty trivial to write, or you can load 
the whole array in one go from a bytes constant and put the readable version in 
a comment, or whatever. But anything that’s slow enough to be worth optimizing 
is doing a huge matmul or pushing zillions of values back and forth to the GPU 
or something else that swamps the setup cost, even if the setup cost involves a 
few dozen string parses, so it never matters. At least not for me.

—-

For a completely different example—but one that I’ve also already given earlier 
in this thread, so I won’t belabor it too much:

path'C:\'

bs"this\ space won’t have a backslash before it, also \e[22; is an escape 
sequence and of course \a is still a bell because I’m using the rules from 
C/JS/etc."

bs37"this\ space has a backslash before it without raising a warning or an 
error even in Python 3.15 because I’ve implemented the 3.7 rules"

… and so on.

Some of these _could_ be done with a raw string and a (maybe slightly more 
complicated) function call, but at least the first one is impossible to do that 
way.

Unlike the numeric suffixes, this one I haven’t actually implemented a hacky 
version of, and I don’t know of any other languages that have an identical 
feature, so I can’t promise it’s feasible, but it seems like it should be.


[Python-ideas] Re: Custom string prefixes

2019-08-27 Thread Steven D'Aprano
On Tue, Aug 27, 2019 at 05:24:19AM -0700, Andrew Barnert via Python-ideas wrote:

> There is a possibility in between the two extremes of “useless” and 
> “complete monster”: the prefix accepts exactly one token, but can 
> parse that token however it wants.

How is that different from passing a string argument to a function or 
class constructor that can parse that token however it wants?

x'...'

x('...')

Unless there is some significant difference between the two, what does 
this proposal give us?


-- 
Steven


[Python-ideas] Re: Custom string prefixes

2019-08-27 Thread Steven D'Aprano
On Tue, Aug 27, 2019 at 08:22:22AM -, stpa...@gmail.com wrote:

> The string (or number) prefixes add new power to the language

I don't think they do. It's just syntactic sugar for a function call. 
There's nothing that czt'...' will do that czt('...') can't already do.

If you have a proposal that allows custom string prefixes to do 
something that a function call cannot do, I've missed it.


> If a certain feature can potentially be misused shouldn't deter us
> from adding it, if the benefits are significant.

Very true, but so far I see nothing in this proposal that suggests that 
the benefits are more significant than avoiding having to type a pair of 
parentheses. Every benefit I have seen applies equally to the function 
call version, but without the added complexity to the language of 
allowing custom string prefixes.


> And the benefits in terms of readability can be significant.

I don't think they will be. I think they will encourage cryptic 
one-character function names disguised as prefixes:

v'...' instead of Version(...)
x'...' instead of re.compile(...)

to take two examples from your proposal. At least this is somewhat 
better:

sql'...'

but that leaves the ambiguity of not knowing whether that's a chained 
function call s(q(l(...))) or a single sql(...).

I believe it will also encourage inefficient and cryptic string parsing 
instead of more clear use of seperate arguments. Your earlier example:

frac'123/4567'

The Fraction constructor already accepts such strings, and it is 
occasionally handy for parsing user-input. But using it to parse string 
literals gives slow, inefficient code for little or no benefit:

[steve@ando cpython]$ ./python -m timeit -s 'from fractions import 
Fraction' 'Fraction(123, 4567)'
2 loops, best of 5: 18.9 usec per loop

[steve@ando cpython]$ ./python -m timeit -s 'from fractions import 
Fraction' 'Fraction("123/4567")'
5000 loops, best of 5: 52.9 usec per loop


Unless you can suggest a way to parse arbitrary strings in arbitrary 
ways at compile-time, these custom string prefixes are probably doomed 
to be slow and inefficient.

The best thing I can say about this is that at least frac'123/4567' 
would probably be easy to understand, since the / syntax for fractions 
is familiar to most people from school. But the same cannot be said for 
other custom prefixes:

cf'[0; 37, 7, 1, 2, 5]'

Perhaps you can guess the meaning of that cf-string. Perhaps you can't. 
A hint might point you in the right direction:

assert cf'[0; 37, 7, 1, 2, 5]' == Fraction(123, 4567)

(By the way, the semi-colon is meaningful and not a typo.)
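
For readers who don't recognise the notation, a minimal sketch of what such a 
cf() helper might compute for the list form (the prefix version would wrap the 
same logic in a string parser):

    from fractions import Fraction

    def cf(terms):
        # Evaluate a continued fraction [a0; a1, a2, ...] from the inside out.
        result = Fraction(terms[-1])
        for a in reversed(terms[:-1]):
            result = a + 1 / result
        return result

    assert cf([0, 37, 7, 1, 2, 5]) == Fraction(123, 4567)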

To the degree that custom string prefixes will encourage cryptic one and 
two letter names, I think that this will hurt readability and clarity of 
code. But if the reader has the domain knowledge to recognise what "cf" 
stands for, this may be no worse than (say) "re" (regular expression).

In conventional code, we might call the cf function like this:

cf([0, 37, 7, 1, 2, 5])  # Single list argument.
cf(0, 37, 7, 1, 2, 5)# *args version.

Either way works for me. But it is your argument that replacing the 
parentheses with quote marks is "more readable":

cf([0, 37, 7, 1, 2, 5])
cf'[0; 37, 7, 1, 2, 5]'

not just a little bit more readable, but enough to make up for the 
inefficiency of having to write your own parser, deal with errors, 
compile a string literal, parse it at runtime, and only then call the 
actual cf constructor and return a cf object.

Even if I accepted your claim that swapping (...) for '...' was more 
readable, I am skeptical that the additional work and runtime 
inefficiency would be worth the supposed benefit.


I don't wish to say that parsing strings to extract information is 
always an anti-pattern:

http://cyrille.martraire.com/2010/01/the-string-obsession-anti-pattern/

after all we often need to process data coming from config files or 
other user-input, where we have no choice but to accept a string.

But parsing string *literals* usually is an anti-pattern, especially 
when there is a trivial transformation from the string to the 
constructor arguments, e.g. 123/4567 --> Fraction(123, 4567).


[...]
> Exactly. You look at string "1.10a" and you know it must be a version string,
> because you're a human, you're smart. The compiler is not a human, it has no
> idea. To the Python interpreter it's just a PyUnicode object of length 5. It's
> meaningless. But when you combine this string with a prefix into a single
> object, it gains power. It can have methods or special behaviors. It can have
> a type, different from `str`, that can be inspected when passing this object 
> to
> another function.

Everything you say there applies to ordinary function call syntax too:

Version('1.10a')

can have methods, special behaviours, a type different from str, etc. 
Not one of those benefits comes from *custom string prefixes*. They all 
come from the use of a 

[Python-ideas] Re: Custom string prefixes

2019-08-27 Thread Andrew Barnert via Python-ideas
On Aug 27, 2019, at 01:42, Chris Angelico  wrote:
> 
> Will these "custom
> prefixes" be able to define anything syntactically? If not, why not
> just use a function call? And if they can, then you have created an
> absolute monster, where a v-string in one context can have completely
> different syntactic influence on what follows it than a v-string in
> another context.

There is a possibility in between the two extremes of “useless” and “complete 
monster”: the prefix accepts exactly one token, but can parse that token 
however it wants.

That’s pretty close to what C++ does, and pretty close to the way my hacky 
proof of concept last time around worked, and I don’t think that only works 
because those are suffix-only designs. 

(That being said, if you do allow “really raw” string literals as input to the 
user prefixes/suffixes to handle the path'C:\' case, then it’s possible to 
invent cases that would tokenize differently with and without the feature—in 
fact. I just did—and therefore it _might_ be possible to invent cases that 
parse validly but differently, in which case the monster is lurking after all. 
Someone might want to look more carefully at the C++ rules for that?)


[Python-ideas] Re: Custom string prefixes

2019-08-27 Thread Andrew Barnert via Python-ideas
On Aug 26, 2019, at 23:43, Serhiy Storchaka  wrote:
> 
> 27.08.19 06:38, Andrew Barnert via Python-ideas writes:
>>  * JSON (register the stdlib or simplejson or ujson),
> 
> What is the JSON suffix for?

I think you’d mainly want it in combination with percent-, html-, or 
uu-equals-decoding, which makes it a potential stress test of the “multiple 
affixes” or “affixes with modifiers” idea. Which I think is important, because 
I like what the OP came up with for that idea, so I want to push it beyond just 
the “regex with flags” example to see if it breaks.

Maybe URL, which often has the same html and percent encoding issues, would be 
a better example? I personally don’t need to decode URLs that often in Python 
(unlike in, say, ObjC, where there’s a smart URL class that you use in place of 
strings all over the place), but maybe others do?

> JSON is virtually a subset of Python except that it uses true, false and 
> null instead of True, False and None.

Is it _virtually_ a subset, or literally so, modulo those three values? I don’t 
know off the top of my head. Look at all the trouble caused by Crockford just 
assuming that the syntax he’d defined was a strict subset of JS when actually 
it isn’t quite.

Actually, now that I think of it, I do know. Python has allow_nan on by 
default, so you’d need to also `from math import nan as NaN` and `from math 
import inf as Infinity`. But is that it? I’m not sure.

And of course if you’ve done this:

jdec = json.JSONDecoder(parse_float=Decimal)
__register_prefix__(jdec.decode, 'j')

… then even j'1.1' and 1.1 are no longer the same values.
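
The decoder half of that example is runnable today; only the registration call 
is hypothetical:

    import json
    from decimal import Decimal

    jdec = json.JSONDecoder(parse_float=Decimal)
    assert jdec.decode('1.1') == Decimal('1.1')
    assert Decimal('1.1') != 1.1   # not the same value as the float literal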

 Not to mention what you get if you registered Pandas’s JSON reader instead of 
the stdlib’s.


[Python-ideas] Re: Custom string prefixes

2019-08-27 Thread Chris Angelico
On Tue, Aug 27, 2019 at 6:25 PM  wrote:
> You're correct that, devoid of context, `v"smth..."` is not very meaningful. 
> The
> "v" suffix could mean "version", or "verbose", or "volatile", or "vectorized",
> or "velociraptor", or whatever. Luckily, the code is almost always exists
> within a specific context. It solves a particular problem, and works within a
> particular domain, and makes perfect sense for people working within that
> domain.
>
> This isn't much different than, say, `np.` suffix, which means "numpy" in the
> domain of numerical computations, NP-completeness for some mathematicians,
> and "no problem" for regular users.

Syntactically, the "np." prefix (not suffix fwiw) actually means "look
up the np object, then locate an attribute called <the name that follows the
dot>". That's true of every prefix you could ever get, and they're
always looked up at run time; the attribute name always follows the
exact same syntactic rules no matter what the prefix is. Literals, on
the other hand, are part of syntax - a different string type prefix
can change the way the entire file gets parsed. Will these "custom
prefixes" be able to define anything syntactically? If not, why not
just use a function call? And if they can, then you have created an
absolute monster, where a v-string in one context can have completely
different syntactic influence on what follows it than a v-string in
another context. At least with attribute lookups, you can parse a file
without knowing what "np" actually means, and even examine things at
run-time.

ChrisA


[Python-ideas] Re: Custom string prefixes

2019-08-27 Thread stpasha
Thank you, Steven, for taking the time to write such an elaborate rebuttal.
If I understand the heart of your argument correctly, you're concerned that
the prefixed strings may add confusion to the code. That nobody knows
what `l'abc'` or `czt'xxx'` could possibly mean, while at the same time
`v'1.0'` could mean many things, whereas `v'cal-{a}'` would mean nothing
at all...

These are all valid concerns. The string (or number) prefixes add new power
to the language, and with new power comes new responsibility. While the
syntax can be used to enhance readability of the code, it can also be abused
to make the code more obscure. However, Python does not attempt to be
an idiot-proof language. "We are all consenting adults" is one of its guiding
principles. The fact that a certain feature can potentially be misused shouldn't deter us
from adding it, if the benefits are significant.

And the benefits in terms of readability can be significant. Consider the
existing Python prefixes: `r'...'` is purely for readability, it adds no extra
functionality; `f'...'` has neat compiler support, but even if it didn't (and
most Python users don't actually realize f-strings get preprocessed by the
compiler) it would still enhance readability compared to `str.format()`. It's
nice to be able to write a complex number as `5 + 3j` instead of
`complex(5, 3)`. And so on.

> What's v() do? Verbose string?
> Oh, you intended a version string did you? If only you had written 
> version instead of v I might not have guessed wrong. What were 
> you saying about preferring readability and clarity over brevity?

You're correct that, devoid of context, `v"smth..."` is not very meaningful. The
"v" suffix could mean "version", or "verbose", or "volatile", or "vectorized",
or "velociraptor", or whatever. Luckily, the code is almost always exists
within a specific context. It solves a particular problem, and works within a
particular domain, and makes perfect sense for people working within that
domain.

This isn't much different than, say, `np.` suffix, which means "numpy" in the
domain of numerical computations, NP-completeness for some mathematicians,
and "no problem" for regular users. 

From a practical perspective, the meaning of each particular symbol will come
from the way that it was created or imported. For example, if you script says
`from packaging.version import v` then "v" is a version. If, on the other hand,
it says `from zoo import velociraptor as v`, then it's an altogether different 
beast.

> In other words, I got all of the meaning from the string part, not the 
> prefix. The prefix on its own, I would have guessed completely wrong.

Exactly. You look at string "1.10a" and you know it must be a version string,
because you're a human, you're smart. The compiler is not a human, it has no
idea. To the Python interpreter it's just a PyUnicode object of length 5. It's
meaningless. But when you combine this string with a prefix into a single
object, it gains power. It can have methods or special behaviors. It can have
a type, different from `str`, that can be inspected when passing this object to
another function.

Think of `v"1.10a"` as making a "typed string" (even though it may end up not
being a string at all). By writing `v"1.10a"` I convey the intent for this to 
be a
version string.

> for rather insignificant gains, the saving of two parentheses. 

Two bytes doesn't sound like a lot. I mean, it is quite little on the grand 
scale
of things. However, I don't think the simple byte-count is a proper measure
here. There could be benefits to readability even if it was 0 or negative byte
difference.

I believe a good way to think about this is the following: if the feature was 
already implemented, would people want to use it, and would it improve
readability of their code? I speculate that the answer is yes to both of these
questions, at least for some people.

As a practical example, consider function `pandas.read_csv()`. The documentation
for its `sep` parameter says "In addition, separators longer than 1 character 
and
different from ``'\s+'`` will be interpreted as regular expressions ...". In 
this case
they wanted the `sep` parameter to handle both simple separators, and the
regular expression separators. However, as there is no syntax to create a 
"regular expression string", they ended up with this dubious heuristic based on
the length of the string... Ideally, they should have said that `sep` could be 
either
a string or a regexp-object, but the barrier to write 

from re import compile as rx
rx('...')

is just impossibly high for a typical user. Not to mention that such code 
**would**
be actually harder to read, because I'd be inventing my own notation for a 
function that is commonly known under a different name.
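
For the record, here is what that rx spelling buys you today; the pattern and 
sample input are chosen purely for illustration:

    from re import compile as rx

    sep = rx(r'\s*\|\s*')     # a pattern object, not a plain string
    assert sep.split('a | b|c') == ['a', 'b', 'c']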

Another pet peeve of mine is datetime literals. Or, rather, their absence. I
often see,
again in pandas, how people create columns of strings ["2010-05-01", 
"2010-05-02", 
...], and then call `parse_datetime()`. 

[Python-ideas] Re: Custom string prefixes

2019-08-27 Thread Serhiy Storchaka

27.08.19 06:38, Andrew Barnert via Python-ideas writes:

  * JSON (register the stdlib or simplejson or ujson),


What is the JSON suffix for? JSON is virtually a subset of Python except 
that it uses true, false and null instead of True, False and None. 
If you set these three variables you can embed JSON syntax in pure Python.
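
That observation is easy to check: with those three names bound, a small JSON 
document evaluates to the same thing json.loads() produces.

    import json

    true, false, null = True, False, None

    doc = '{"enabled": true, "retries": null, "debug": false}'
    assert eval(doc) == json.loads(doc)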



[Python-ideas] Re: Custom string prefixes

2019-08-26 Thread Andrew Barnert via Python-ideas
> On Aug 26, 2019, at 18:41, stpa...@gmail.com wrote:
> 
> Thanks, Andrew, for your feedback. I didn't even think about string 
> **suffixes**, but
> clearly they can be implemented together with the prefixes for additional 
> flexibility.

What about _instead of_ rather than _together with_? Half of Steven’s 
objections are related to the ambiguity (to a human, even if not to the parser) 
of user prefixes in the (potential) presence of the builtin prefixes. None of 
those even arise with suffixes. Anyway, maybe you already have good answers 
for all of those objections, but if not…

Also, there’s at least one mainstream language (C++) that allows user suffixes 
and has literal syntax otherwise somewhat like Python’s, and the proposals for 
other languages like Rust generally seem to be trying to do “like C++ 
but minus all the usual C++ over-complexity”. Are there actual examples of 
languages with user prefixes?

The only different designs I know of rely on the static type of the evaluation 
context. (For example, in Swift, you can just statically type `23 : km` or 
`"abc]*" : regex`, or even just pass the literal to a function that’s declared 
or inferred to take a regex if that happens to be readable in your use case, so 
there’s no need for a suffix syntax.) Which is neat, but obviously not 
applicable to Python. 

> And your idea that ` ` is conceptually no different 
> than
> ` ` is absolutely insightful.

Well, back in 2015 I probably just stole the idea from C++. :)

Another question this raises, which I just remembered: the word “literal” has 
three overlapping but distinct meanings in Python. Which one do we actually 
mean here? In particular, are container displays “literals”? For that matter, 
is -2 even a literal?

Also, from what I remember, either in 2013 or in 2015, the discussion got 
side-tracked over people not liking the word “literal” to mean “something 
that’s actually the result of a runtime function call”. That may be less of a 
problem after f-strings (which are called literals in the PEP; not sure about 
the language reference), but last time around, bringing up the fact that “-2” 
is actually a function call didn’t sway anyone. So, maybe I shouldn’t be using 
the word “literal” this time, and I really hope it doesn’t ruin your proposal…

> Speaking of string suffixes, flags on regular expressions immediately come to 
> mind.
> For example `rx"(abc)"ig` could create a regular expression that performs 
> global 
> case-insensitive search.

That’s an interesting idea. And that’s something you can’t do with a 
single-affix design; you need prefixes and suffixes, unless you have some kind 
of separator for chaining, or only allow single characters.
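
For illustration only, here is a small sketch of what a combined
prefix-plus-suffix handler for regexes could look like if `rx"(abc)"ig` were
desugared to a call like `rx("(abc)", suffix="ig")`. The name `rx`, the
`suffix` keyword, and the flag mapping are all assumptions invented for this
sketch; note that Python's re module has no "global" flag, so 'g' is simply
ignored here:

    import re

    _FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL, "x": re.VERBOSE}

    def rx(pattern, suffix=""):
        # Fold each recognized suffix letter into a re flag; unknown letters
        # (such as 'g') are ignored in this sketch.
        flags = 0
        for ch in suffix:
            flags |= _FLAGS.get(ch, 0)
        return re.compile(pattern, flags)

    # What the compiler might emit for rx"(abc)"ig:
    print(bool(rx("(abc)", suffix="ig").match("ABC")))   # True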

>> I don’t think you can fairly discuss this idea without getting at least a
>> _little_ bit into the implementation details.
> 
> Right. So, the first question to answer is what the compiler should do when 
> it sees
> a prefixed (suffixed) string? That is, what byte-code should be emitted when 
> the
> compiler sees `lambda: a"bcd"e` ?
> 
> In one approach, we'd want this expression to be evaluated at compile time, 
> similar
> to how f-strings work. However, how would the compiler know what prefix "a" 
> means
> exactly? There has to be some kind of directive to tell the compiler that. 
> For example,
> imagine the compiler sees near the top of the file
> 
>#pragma from mymodule import a
> 
> It would then import the symbol `a`, call `a("bcd", suffix="e")`. This would 
> return an
> AST tree that will be plugged in place of the original string.
> 
> This solution allows maximum efficiency, but seems inflexible and deeply 
> invasive.
> 
> Another approach would defer the construction of objects to run time. Though
> not as efficient, it would allow loading prefixes at run-time. In this case
> `a"bcd"e` can
> be interpreted by the compiler as if it was
> 
>a("bcd", suffix="e")
> 
> where symbol `a` is to be looked up in the local/global scope.

My hack works basically like this. The compiler just converts it to a function 
call, which is looked up normally. I think that’s the right tack here. IIRC, my 
hack translates a D suffix into a call to something like `_user_literal_D`, 
which solves the problem of accidental pollution of the namespace. But this 
does mean that any code that wants to use the D suffix has to `from 
decimal_literals import *`, or `2.3D` raises a NameError about nothing named 
`_user_literal_D`. (Either that, or someone has to inject it into builtins…) I’m 
not sure whether that’s user-friendly enough.

Anyway, I think your registry idea makes more sense. Then `2.3D` effectively 
just means `__user_literals__['D']('2.3')`, and there’s no namespace pollution 
at all.
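
A pure-Python sketch of that registry idea, with no compiler support; the
names `__user_literals__` and `register()` are assumptions made up for
illustration:

    import decimal

    __user_literals__ = {}

    def register(tag):
        # Decorator that records a handler under its one-letter tag.
        def deco(fn):
            __user_literals__[tag] = fn
            return fn
        return deco

    @register("D")
    def _decimal_literal(text):
        return decimal.Decimal(text)

    # What the compiler would effectively emit for the source text 2.3D:
    value = __user_literals__["D"]("2.3")
    print(repr(value))   # Decimal('2.3')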

> For this approach to work, we'd create a
> new opcode, so that `a"bcd"e` would become
> 
>0 LOAD_CONST 1 ('a', 'bcd', 'e')
>2 STR_RESOLVE_TAG   0
> 
> where `STR_RESOLVE_TAG` would effectively call 

[Python-ideas] Re: Custom string prefixes

2019-08-26 Thread Steven D'Aprano
On Mon, Aug 26, 2019 at 11:03:38PM -, stpa...@gmail.com wrote:
> In Python strings are allowed to have a number of special prefixes:
> 
> b'', r'', u'', f'' 
> + their combinations.
> 
> The proposal is to allow arbitrary (or letter-only) user-defined prefixes as 
> well.
> Essentially, a string prefix would serve as a decorator for a string, 
> allowing the
> user to impose a special semantics of their choosing.
> 
> There are quite a few situations where this can be used:
> - Fraction literals: `frac'123/4567'`

Current string prefixes are allowed in combinations. Does the same apply 
to your custom prefixes?

If yes, then they are ambiguous: how could the reader tell whether the 
string prefix frac'...' is a f- r- a- c-string combination, a fra- 
c-string combination, a fr- ac-string combination, or a f- rac- string 
combination?

If no, then it will confuse and frustrate users who wonder why they can 
combine built-in prefixes like fr'...' but not their own prefixes.

What kind of object is a frac-string? You might think it is obvious that 
it is a "frac" (Fraction? Something else?) but how about a czt-string?

As a reader, at least I know that czt('...') is a function call that 
could return anything at all. That is standard across hundreds of 
programming languages. But as a string prefix, it looks like a kind of 
string, but could be anything at all. Imagine trying to reason about 
Python syntax:

1. u'...' is a unicode string, evaluating to a str.
2. r'...' is a raw string, evaluating to a str.
3. f'...' is a f-string, evaluating to a str.
4. b'...' is a byte-string, evaluating to a bytes object, which
   is not a str object but is still conceptually a kind of string.

5. Therefore z'...' is what kind of string, evaluating to what 
   kind of object?


Things that look similar should be similar. This string prefix idea 
means that things that look similar can be radically different. It looks 
like a string, but may not be anything like a string.

The same applies to function call syntax, of course, but as I mentioned 
above, function call syntax is standard across hundreds of languages and 
readers don't expect that the result of an arbitrary function call is 
necessarily the same as its first argument(s). We don't expect that 
foo('abcde') will return a string, even if we're a little unclear about 
what foo() actually does.

u- (unicode) strings, r- (raw) strings, and even b- (byte) strings are 
all kinds of *string*. We know just by looking at them that they 
evaluate to a str or bytes object. Even f-strings, which are syntax for 
executable code, are at least guaranteed to evaluate to a str object. But 
these arbitrary string prefixes could return anything.


> This proposal has been already discussed before, in 2013:
>
> https://mail.python.org/archives/list/python-ideas@python.org/thread/M3OLUURUGORLUEGOJHFWEAQQXDMDYXLA/
> 
> The opinions were divided whether this is a useful addition. The opponents
> mainly argued that as this only "saves a couple of keystrokes", there is no
> need to overcomplicate the language.

Indeed. czt'...' saves only two characters from czt('...').


> It seems to me that now, 6 years later, 
> that argument can be dismissed by the fact that we had, in fact, added new
> prefix "f" to the language.

I don't see how that follows. The existence of one new prefix adds 
*this* much new complexity:

[holds forefinger and thumb about a millimeter apart]

for significant gains. Trying to write your own f-string-equivalent 
function would be quite difficult, but being in the language, not only is it 
faster and more efficient than a function call, it also only needs to be 
written once.

But adding a new way of writing single-argument function calls with a 
string argument:

czt'...' is equivalent to czt('...')

adds *this* much complexity to the language:

[holds forefingers of each hand about shoulder-width apart]

for rather insignificant gains, the saving of two parentheses. You still 
have to write the czt() function, it will have to parse the string 
itself, you will have no support from the compiler, and anyone needing 
this czt() will either have to re-invent the wheel or hope that somebody 
publishes it on PyPI with a suitable licence.


> Note how the "format strings" would fall squarely
> within this framework if they were not added by now.
>
> In addition, I believe that "saving a few keystrokes" is a worthy goal if it 
> adds
> considerable clarity to the expression. Readability counts. Compare:
> 
> v"1.13.0a"
> v("1.13.0a")

What's v() do? Verbose string?

 
> To me, the former expression is far easier to read. Parentheses, especially as
> they become deeply nested, are not easy on the eyes. But, even more 
> importantly,
> the first expression much better conveys the *intent* of a version string. 

Oh, you intended a version string did you? If only you had written 
``version`` instead of ``v`` I might not have guessed wrong. What 

[Python-ideas] Re: Custom string prefixes

2019-08-26 Thread MRAB

On 2019-08-27 00:03, stpa...@gmail.com wrote:

In Python strings are allowed to have a number of special prefixes:

 b'', r'', u'', f''
 + their combinations.

The proposal is to allow arbitrary (or letter-only) user-defined prefixes as 
well.
Essentially, a string prefix would serve as a decorator for a string, allowing 
the
user to impose a special semantics of their choosing.

There are quite a few situations where this can be used:
- Fraction literals: `frac'123/4567'`
- Decimals: `dec'5.34'`
- Date/time constants: `t'2019-08-26'`
- SQL expressions: `sql'SELECT * FROM tbl WHERE a=?'.bind(a=...)`
- Regular expressions: `rx'[a-zA-Z]+'`
- Version strings: `v'1.13.0a'`
- etc.

This proposal has been already discussed before, in 2013:
https://mail.python.org/archives/list/python-ideas@python.org/thread/M3OLUURUGORLUEGOJHFWEAQQXDMDYXLA/

The opinions were divided whether this is a useful addition. The opponents
mainly argued that as this only "saves a couple of keystrokes", there is no
need to overcomplicate the language. It seems to me that now, 6 years later,
that argument can be dismissed by the fact that we had, in fact, added new
prefix "f" to the language. Note how the "format strings" would fall squarely
within this framework if they were not added by now.

In addition, I believe that "saving a few keystrokes" is a worthy goal if it 
adds
considerable clarity to the expression. Readability counts. Compare:

 v"1.13.0a"
 v("1.13.0a")

To me, the former expression is far easier to read. Parentheses, especially as
they become deeply nested, are not easy on the eyes. But, even more importantly,
the first expression much better conveys the *intent* of a version string. It 
has
a feeling of an immutable object. In the second case the string is passed to the
constructor, but the string has no meaning of its own. As such, the second
expression feels artificial. Consider this: if the feature already existed, how 
*would*
you prefer to write your code?

The prefixes would also help when writing functions that accept different types
of their argument. For example:

 collection.select("abc")   # find items with name 'abc'
 collection.select(rx"[abc]+")  # find items that match regular expression

I'm not discussing possible implementation of this feature just yet; we can get
to
that point later when there is a general understanding that this is worth 
considering.


At what point would backslashes be handled?


[Python-ideas] Re: Custom string prefixes

2019-08-26 Thread Andrew Barnert via Python-ideas
On Aug 26, 2019, at 16:03, stpa...@gmail.com wrote:
> 
> In Python strings are allowed to have a number of special prefixes:
> 
>b'', r'', u'', f'' 
>+ their combinations.
> 
> The proposal is to allow arbitrary (or letter-only) user-defined prefixes as 
> well.
> Essentially, a string prefix would serve as a decorator for a string, 
> allowing the
> user to impose a special semantics of their choosing.

I don’t think you can fairly discuss this idea without getting at least a 
_little_ bit into the implementation details.

How does your code specify a new prefix? How does the tokenizer know which 
prefixes are active? What code does the compiler emit for a prefixed string? 
The answers to those questions will determine which potential prefixes are 
useful.

In particular, you mention that f-strings “would fall squarely within this 
framework”, but it’s actually pretty hard to imagine an implementation that 
would have actually allowed for f-strings. They essentially need to recursively 
call the compiler on the elements inside braces, and then inline the resulting 
expressions into the containing scope.
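
To see why, consider what a purely runtime prefix handler could do instead: at
best it could grope around in the caller's frame, which is fragile and nowhere
near what real f-strings support. A rough sketch under those assumptions (the
helper name `fake_f` is made up; `sys._getframe` is a CPython implementation
detail):

    import sys

    def fake_f(template):
        # Look up names from the caller's scope and substitute them with
        # str.format -- no arbitrary expressions, unlike real f-strings.
        frame = sys._getframe(1)
        return template.format(**{**frame.f_globals, **frame.f_locals})

    name = "world"
    print(fake_f("hello {name}!"))   # hello world!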

> In addition, I believe that "saving a few keystrokes" is a worthy goal if it 
> adds
> considerable clarity to the expression. Readability counts. Compare:
> 
>v"1.13.0a"
>v("1.13.0a")
> 
> To me, the former expression is far easier to read. Parentheses, especially as
> they become deeply nested, are not easy on the eyes. But, even more 
> importantly,
> the first expression much better conveys the *intent* of a version string. It 
> has
> a feeling of an immutable object. In the second case the string is passed to 
> the
> constructor, but the string has no meaning of its own. As such, the second
> expression feels artificial. Consider this: if the feature already existed, 
> how *would*
> you prefer to write your code?

Neither. I’d prefer this:

2.3D # decimal.Decimal('2.3')
1/3F # 1/fractions.Fraction('3')

After all, why would I want to put the number in a string when it’s not a 
string, but a number? This looks a lot like C’s `2.3f` that gives me 2.3 as a 
float rather than a double, and it works like it too, so there’s no surprise. 
And C++ already proves that such a thing can be widely usable; it’s been part 
of that language for three versions, since 2011.
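
Spelled out with the stdlib constructors those suffixes would abbreviate:

    from decimal import Decimal
    from fractions import Fraction

    d = Decimal("2.3")      # what 2.3D would denote
    f = 1 / Fraction("3")   # what 1/3F would denote
    print(d, f)             # 2.3 1/3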

Also this:

p'C:\'

That can’t be handled by just using a “native
Path” prefix together with the existing raw prefix, because even in raw string 
literals you can’t end with a backslash.
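
A quick check of that claim; the backslash rule is part of the string-literal
grammar, so the error shows up at compile time:

    # Even a raw string literal cannot end in an odd number of backslashes,
    # so r'C:\' fails to parse.
    try:
        compile(r"r'C:\'", "<demo>", "eval")
    except SyntaxError as exc:
        print("SyntaxError:", exc.msg)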

And this is another place where talking about implementation matters.

At first glance it might seem like arbitrary-literal affixes would be a lot 
more difficult than string-literal-only affixes, but in fact, as soon as you 
try to implement it, you realize that you get the exact same set of issues, no 
more. See https://github.com/abarnert/userliteralhack for a proof of concept I 
wrote back in 2015. (Not that we’d want to actually implement them the way I 
did, just demonstrating that it can be done, and doesn’t cause ambiguity.) I’ve 
got a couple older PoCs up there as well if you want to play around more, 
including one that only allows string literals (so you can see that it’s 
actually no easier, and solves no ambiguity problems). I can’t remember if I 
did one that does prefixes instead of suffixes, but I don’t _think_ that raises 
any new issues, except for the one about interacting with the existing prefixes.

And it might seem like having some affixes get the totally raw token while 
others get a cooked string is too complicated, but C++ actually lives with a 
3-way distinction between raw token, cooked string, and fully parsed value. (Why 
would you ever want the last one? So your units-and-quantities library can 
define a _km suffix so 2_km is a km, 2.3_km is a km, 2.3f_km is a 
km, and maybe even 2.3dec_km is a km.) I’m not sure we need 
this last distinction, but the first one might be worth copying, so that Path 
and other literals can work, but things like version can interact nicely with 
plain string literals, and r, and b if that’s appropriate, and most of all f, 
by just accepting a cooked string.


[Python-ideas] Re: Custom string prefixes

2019-08-26 Thread Robert Vanden Eynde
On another subject, we could also have a language change stating that these
two lines are equivalent:

something"hello"
something("hello")

So that any callable in the context can be used as a prefix?

On Tue, Aug 27, 2019, 01:11  wrote:

> In Python strings are allowed to have a number of special prefixes:
>
> b'', r'', u'', f''
> + their combinations.
>
> The proposal is to allow arbitrary (or letter-only) user-defined prefixes
> as well.
> Essentially, a string prefix would serve as a decorator for a string,
> allowing the
> user to impose a special semantics of their choosing.
>
> There are quite a few situations where this can be used:
> - Fraction literals: `frac'123/4567'`
> - Decimals: `dec'5.34'`
> - Date/time constants: `t'2019-08-26'`
> - SQL expressions: `sql'SELECT * FROM tbl WHERE a=?'.bind(a=...)`
> - Regular expressions: `rx'[a-zA-Z]+'`
> - Version strings: `v'1.13.0a'`
> - etc.
>
> This proposal has been already discussed before, in 2013:
>
> https://mail.python.org/archives/list/python-ideas@python.org/thread/M3OLUURUGORLUEGOJHFWEAQQXDMDYXLA/
>
> The opinions were divided whether this is a useful addition. The opponents
> mainly argued that as this only "saves a couple of keystrokes", there is no
> need to overcomplicate the language. It seems to me that now, 6 years
> later,
> that argument can be dismissed by the fact that we had, in fact, added new
> prefix "f" to the language. Note how the "format strings" would fall
> squarely
> within this framework if they were not added by now.
>
> In addition, I believe that "saving a few keystroked" is a worthy goal if
> it adds
> considerable clarity to the expression. Readability counts. Compare:
>
> v"1.13.0a"
> v("1.13.0a")
>
> To me, the former expression is far easier to read. Parentheses,
> especially as
> they become deeply nested, are not easy on the eyes. But, even more
> importantly,
> the first expression much better conveys the *intent* of a version string.
> It has
> a feeling of an immutable object. In the second case the string is passed
> to the
> constructor, but the string has no meaning of its own. As such, the second
> expression feels artificial. Consider this: if the feature already
> existed, how *would*
> you prefer to write your code?
>
> The prefixes would also help when writing functions that accept different
> types
> of their argument. For example:
>
> collection.select("abc")   # find items with name 'abc'
> collection.select(rx"[abc]+")  # find items that match regular
> expression
>
> I'm not discussing possible implementation of this feature just yet, we
> can get to
> that point later when there is a general understanding that this is worth
> considering.
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/3Z2YTIGJLSYMKKIGRSFK2DTDIXXVDGEK/
> Code of Conduct: http://python.org/psf/codeofconduct/
>