[Python-ideas] Re: Custom string prefixes

Andrew Barnert via Python-ideas Mon, 26 Aug 2019 20:39:44 -0700

> On Aug 26, 2019, at 18:41, [email protected] wrote:
> 
> Thanks, Andrew, for your feedback. I didn't even think about string
> **suffixes**, but clearly they can be implemented together with the
> prefixes for additional flexibility.


What about _instead of_ rather than _together with_? Half of Stephen’s 
objections are related to the ambiguity (to a human, even if not to the parser) 
of user prefixes in the (potential) presence of the builtin prefixes. None of 
those even arise with suffixes. Anyway, maybe you already have good answers 
for all of those objections, but if not…

Also, there’s at least one mainstream language (C++) that allows user suffixes 
and has literal syntax otherwise somewhat like Python’s, and the proposals for 
other languages like Rust generally seem to be trying to do “like C++ 
but minus all the usual C++ over-complexity”. Are there actual examples of 
languages with user prefixes?

The only different designs I know of rely on the static type of the evaluation 
context. (For example, in Swift, you can just statically type `23 : km` or 
`"abc]*" : regex`, or even just pass the literal to a function that’s declared 
or inferred to take a regex if that happens to be readable in your use case, so 
there’s no need for a suffix syntax.) Which is neat, but obviously not 
applicable to Python. 

> And your idea that `<string literal> <suffix>` is conceptually no different
> than `<numeric literal> <suffix>` is absolutely insightful.

Well, back in 2015 I probably just stole the idea from C++. :)

Another question this raises, which I just remembered: the word “literal” has 
three overlapping but distinct meanings in Python. Which one do we actually 
mean here? In particular, are container displays “literals”? For that matter, 
is -2 even a literal?

Also, from what I remember, either in 2013 or in 2015, the discussion got 
side-tracked over people not liking the word “literal” to mean “something 
that’s actually the result of a runtime function call”. That may be less of a 
problem after f-strings (which are called literals in the PEP; not sure about 
the language reference), but last time around, bringing up the fact that “-2” 
is actually a function call didn’t sway anyone. So, maybe I shouldn’t be using 
the word “literal” this time, and I really hope it doesn’t ruin your proposal…

> Speaking of string suffixes, flags on regular expressions immediately come
> to mind. For example `rx"(abc)"ig` could create a regular expression that
> performs global case-insensitive search.

That’s an interesting idea. And that’s something you can’t do with a 
single-affix design; you need prefixes and suffixes, unless you have some kind 
of separator for chaining, or only allow single characters.

>> I don’t think you can fairly discuss this idea without getting at least a
>> _little_ bit into the implementation details.
> 
> Right. So, the first question to answer is what the compiler should do when
> it sees a prefixed (suffixed) string? That is, what byte-code should be
> emitted when the compiler sees `lambda: a"bcd"e` ?
> 
> In one approach, we'd want this expression to be evaluated at compile time,
> similar to how f-strings work. However, how would the compiler know what
> prefix "a" means exactly? There has to be some kind of directive to tell the
> compiler that. For example, imagine the compiler sees near the top of the
> file
> 
>    #pragma from mymodule import a
> 
> It would then import the symbol `a`, call `a("bcd", suffix="e")`. This would
> return an AST tree that will be plugged in place of the original string.
> 
> This solution allows maximum efficiency, but seems inflexible and deeply
> invasive.
> 
> Another approach would defer the construction of objects to run time.
> Though not as efficient, it would allow loading prefixes at run-time. In
> this case `a"bcd"e` can be interpreted by the compiler as if it was
> 
>    a("bcd", suffix="e")
> 
> where symbol `a` is to be looked up in the local/global scope.

My hack works basically like this. The compiler just converts it to a function 
call, which is looked up normally. I think that’s the right tack here. IIRC, my 
hack translates a D suffix into a call to something like _user_literal_D, 
which solves the problem with accidental pollution of the namespace. But this 
does mean that any code that wants to use the D suffix has to `from 
decimal_literals import *`, or `2.3D` raises a NameError about nothing named 
_user_literal_D. (Either that, or someone has to inject it into builtins…) I’m 
not sure whether that’s user-friendly enough.

Anyway, I think your registry idea makes more sense. Then `2.3D` effectively 
just means `__user_literals__['D']('2.3')`, and there’s no namespace pollution 
at all.
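
To make that concrete, here is a rough sketch of the registry semantics in plain Python. The helper name `resolve_literal` and the choice of LookupError for an unregistered suffix are my own assumptions for illustration; only the `__user_literals__` name comes from the discussion above.

```python
from decimal import Decimal

# Hypothetical registry mapping suffix names to handler callables.
__user_literals__ = {"D": Decimal}

def resolve_literal(text, suffix):
    """Roughly what `2.3D` could desugar to: look up the suffix in the
    registry and call its handler on the raw source text of the literal."""
    try:
        handler = __user_literals__[suffix]
    except KeyError:
        # Assumption: an unknown suffix is reported as a lookup failure.
        raise LookupError(f"no handler registered for suffix {suffix!r}") from None
    return handler(text)

value = resolve_literal("2.3", "D")   # same as Decimal("2.3")
```

Note that the handler gets the string '2.3', not the float 2.3, so nothing is lost to binary rounding before Decimal sees it.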

> For this approach to work, we'd create a
> new code op, so that `a"bcd"e` would become
> 
>    0 LOAD_CONST        1 ('a', 'bcd', 'e')
>    2 STR_RESOLVE_TAG   0
> 
> where `STR_RESOLVE_TAG` would effectively call `__resolve_tag__()` special
> method. The method would search for `a` in the registry of known string tags,
> and then pass the tuple to the corresponding constructor.

Do we even need that? It’s true that most things in Python translate reasonably 
directly to bytecodes, but in this case it might be easier to just compile to 
existing bytecodes to look up and call the function.
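
As a sketch of what I mean, the `ast` module can already build the equivalent call and compile it with no new opcode. The `desugar` helper and the toy handler `a` below are hypothetical; a real implementation would do this inside the compiler rather than in user code.

```python
import ast

def desugar(tag, body, suffix):
    """Build and compile the expression  tag(body, suffix=suffix)."""
    call = ast.Call(
        func=ast.Name(id=tag, ctx=ast.Load()),       # normal name lookup
        args=[ast.Constant(value=body)],
        keywords=[ast.keyword(arg="suffix", value=ast.Constant(value=suffix))],
    )
    tree = ast.Expression(body=call)
    ast.fix_missing_locations(tree)                  # fill in line/column info
    return compile(tree, "<tagged-literal>", "eval")

# Toy handler standing in for whatever the tag `a` is bound to.
def a(text, suffix=""):
    return (text.upper(), suffix)

code = desugar("a", "bcd", "e")          # stands in for the source  a"bcd"e
result = eval(code, {"a": a})            # `a` is found in the given globals
```

The resulting code object contains only the ordinary load-and-call instructions, which you can confirm with `dis.dis(code)`.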

> There will, of course, be a method to register new tags. Something like 
> 
>    str.__register_tag__('a', MyAObject)

If the params are (handler, name=None), and None means to use the __name__ of 
the handler as the tag, then you can use it as a decorator:

    @__register_tag__
    def D(decimal_string):
        return decimal.Decimal(decimal_string)

Although this may not be the best example, because it might actually be clearer 
(as well as more efficient) to just register the constructor:

    __register_tag__(decimal.Decimal, 'D')

… but I suspect many examples won’t be just a matter of calling a constructor 
on the string.

> As for suffix-only literals, we can treat them as if they begin with an
> underscore. Thus, `1/3f` would be equivalent to
> 
>    1/_f(3)

Does that mean you can’t actually register a prefix named `_f`? Or that, if you 
do, it also registers a suffix named `f`?

Also, I think for most non-single-letter suffixes you’d actually want an 
underscore at the start of the suffix. See C++ for lots of examples, but for a 
quick illustration, compare these:

    c = 2.99792458e8mps
    c = 2.99792458e8_mps

    c = 299_792_458mps
    c = 299_792_458_mps

The _mps suffix looks a lot better than the mps suffix, doesn’t it? But would 
you want the function to have to be named __mps with two underscores?
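
One possible answer, sketched below, is to treat a single leading underscore as a separator that is not part of the handler name, so `mps` and `_mps` both resolve to a handler called `mps`. This is purely an assumption for illustration, and the regex covers only simple decimal literals, not the full numeric grammar.

```python
import re

# Number part: digits with optional underscore groups, fraction and exponent.
# Suffix part: optional single underscore, then an identifier.
_LITERAL = re.compile(
    r"(?P<number>\d+(?:_\d+)*(?:\.\d+(?:_\d+)*)?(?:[eE][+-]?\d+)?)"
    r"(?P<suffix>_?[A-Za-z][A-Za-z0-9_]*)"
)

def split_suffix(source):
    """Split a suffixed numeric literal into (number_text, handler_name)."""
    match = _LITERAL.fullmatch(source)
    if match is None:
        raise ValueError(f"not a suffixed numeric literal: {source!r}")
    suffix = match.group("suffix")
    # Assumption: one leading underscore is only a visual separator.
    name = suffix[1:] if suffix.startswith("_") else suffix
    return match.group("number"), name

pairs = [split_suffix(s) for s in
         ("2.99792458e8mps", "2.99792458e8_mps",
          "299_792_458mps", "299_792_458_mps")]
```

All four spellings end up asking for the same handler, which sidesteps the double-underscore name, at the cost of making `_mps` and `mps` indistinguishable.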

It may be worth coming up with the most compelling examples and then working 
out what feature set would support as many as possible, rather than trying to 
work out the ultimate feature set first and then see what we can do with it. 
It’s probably worth stealing liberally from the C++ discussion (and any other 
languages that have similar features) as well as the 2013 Python discussion, 
but off the top of my head:

 * Decimal, Fraction, np.float32, mpz, …
 * Path objects
 * Windows native Path objects, possibly with “really raw” processing to allow 
   trailing backslashes
 * regex, possibly with flags, possibly with “really raw” backslashes
 * “Really raw” strings in general.
 * JSON (register the stdlib or simplejson or ujson), XML (register ETree or 
   lxml or bs4 or whatever you want), HTML, etc.
 * unit suffixes for quantities



_______________________________________________
Python-ideas mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/YMMTTJJW3RARQUWH7EFRJN5UEGP7G6YJ/
Code of Conduct: http://python.org/psf/codeofconduct/
