On Thu, Aug 29, 2019 at 02:10:21PM -0700, Andrew Barnert wrote:

[...]
> And most of the string affixes people have suggested are for
> string-ish things.

I don't think that's correct. Looking back at the original post in this
thread, here are the motivating examples:

[quote]
There are quite a few situations where this can be used:

- Fraction literals: `frac'123/4567'`
- Decimals: `dec'5.34'`
- Date/time constants: `t'2019-08-26'`
- SQL expressions: `sql'SELECT * FROM tbl WHERE a=?'.bind(a=...)`
- Regular expressions: `rx'[a-zA-Z]+'`
- Version strings: `v'1.13.0a'`
- etc.
[/quote]

By my count, that's zero out of six string-ish things. There may have
been other proposals, but I haven't trolled through the entire thread to
find them.

> I’m not sure what a “version string” is, but I
> might design that as an actual subclass of str that adds extractor
> methods and overrides comparison.

A version object is a record with fields, most of which are numeric.
For an existing example, see sys.version_info which is a kind of named
tuple, not a string. The version *string* is just a nice human-readable
representation.

It doesn't make sense to implement string methods on a Version object.
Why would you offer expandtabs(), find(), splitlines(), translate(),
isspace(), capitalize(), etc methods? Or * and + (repetition and
concatenation) operators? I cannot think of a single string
method/operator that a Version object should implement.

> A compiled regex isn’t literally a
> string, but neither is a bytes; it’s still clearly _similar_ to a
> string, in important ways.

It isn't clear to me how a compiled regex object is "similar" to a
string. The set of methods offered by both regexes and strings is pretty
small, by my generous count it is just two methods:

- str.split and SRE_Pattern.split;
- str.replace and SRE_Pattern.sub

neither of which use the same API or have the same semantics. Compiled
regex objects don't offer string methods like translate, isdigit,
upper, encode, etc. I would say that they are clearly *not* strings.

[...]
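
One rough way to check that count is to compare the public attribute names of str with those of a compiled pattern. This is only a sketch: the names `pattern_type`, `public_names` and `shared` are made up for illustration, and the exact overlap may differ between Python versions.

```python
import re

# The class of compiled patterns has been spelled differently over time
# (SRE_Pattern, later re.Pattern), so ask a real object for its type.
pattern_type = type(re.compile(""))


def public_names(cls):
    """Return the attribute names of cls that don't start with an underscore."""
    return {name for name in dir(cls) if not name.startswith("_")}


shared = public_names(str) & public_names(pattern_type)
print(sorted(shared))

# Even a shared name has a different calling convention: the string
# method takes the separator, the pattern method takes the subject.
print("a,b,c".split(","))
print(re.compile(",").split("a,b,c"))
```

The replace/sub pair does not show up in `shared` at all, since the two methods have different names; counting them as common is the generous part.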

> And versions of the proposal that allow delimiters other than quotes
> so you can write things like regex/a.*b/, well, I’d need to see a
> specific proposal to be sure, but that seems even less objectionable
> in this regard. That looks like nothing else in Python, but it looks
> like a regex in awk or sed or perl, so I’d probably read it as a regex
> object.

Why do you need the "regex" prefix?

Assuming the parser and the human reader can cope with using / as both a
delimiter and an operator (which isn't a given!) /.../ for a regex object
seems fine to me. I suspect that this is going to be ambiguous though:

    target = regex/a*b/ +x

could be:

    target = ((regex / a) * b) / (unary-plus x)

or

    target = (regex object) + x

so maybe we do need a prefix.

> > Let me suggest some design principles that should hold for languages
> > with more-or-less "conventional" syntax. Languages like APL or Forth
> > excluded.
> >
> > - anything using ' or " quotation marks as delimiters (with or without
> >   affixes) ought to return a string, and nothing but a string;
>
> So b"abc" should not be allowed?

In what way are byte-STRINGS not strings? Unicode-strings and
byte-strings share a significant fraction of their APIs, and are so
similar that back in Python 2.2 the devs thought it was a good idea to
try automagically coercing from one to the other.

I was careful to write *string* rather than *str*. Sorry if that wasn't
clear enough.

> Let’s say I created a native-UTF16-string type to deal with some
> horrible Windows or Java stuff. Why would this principle of yours
> suggest that I shouldn’t be allowed to use u16"" just like b""?

It is a utf16 STRING so making it look like a STRING is perfectly fine.

[...]
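
Returning to the `target = regex/a*b/ +x` line earlier in this message: it is already valid Python today, and the ast module shows which reading the current parser picks. This is only an illustrative sketch; `source`, `tree` and `expr` are names chosen for the example.

```python
import ast

source = "target = regex/a*b/ +x"
tree = ast.parse(source)

# The right-hand side of the assignment.
expr = tree.body[0].value

# Today this is plain arithmetic: ((regex / a) * b) / (+x)
print(ast.dump(expr))
```

Any regex/.../ syntax would have to give this existing expression a different meaning, or find a way to tell the two readings apart.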

> > - as a strong preference, anything using quotation marks as delimiters
> >   ought to be processed at compile-time (f-strings are a conspicuous
> >   exception to that principle);
>
> I don’t see why you should even want to _know_ whether it’s true, much
> less have a strong preference.

Because I care about performance, at least a bit. Because I don't want
to write code that is unnecessarily slow, for some definition of
"unnecessary". Because I want to be able to reason (at least in broad
terms) about the cost of certain operations.

Because I want to be able to reason about the semantics of my code.

Why do I write 1234 instead of int("1234")? The second is longer, but it
is more explicit and it is self-documenting: the reader knows that it's
an int because it says so right there in the code, even if they come
from Javascript where 1234 is an IEEE-754 float. Assuming the builtin
int() hasn't been shadowed.

But it's also wastefully slow.

If we are genuinely indifferent to the difference, then we should be
equally indifferent to a proposal to replace the LOAD_CONST byte-code
for ints as follows:

    dis("1234")  # in current Python
    LOAD_CONST      0 (1234)

    # In the future:
    LOAD_NAME       0 (int)
    LOAD_CONST      0 ('1234')
    CALL_FUNCTION   1 (1 positional, 0 keyword pair)

If you were asked to vote +1 or -1 on this proposal (sitting on the
fence not allowed), which would you vote?

I would vote -1. Aside from the performance hit, it's also a semantic
change: what was a compile-time literal is now a runtime function call
which can be shadowed. It is nice to know that when I say ``n = 1234``
that the value of n is guaranteed to be 1234 no matter what odd things
are going on. (Short of running a modified interpreter.)

String literals (byte- or unicode, raw or cooked, triple- or
single-quoted) are, with the exception of f-strings, LOAD_CONST calls
like ints. I think that's a valuable, useful thing to know, and not
something we should lightly give up.
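
The difference between the literal and the call can be seen with a short sketch along these lines. Opcode names vary between CPython versions, so it inspects the compiled code objects instead of relying on exact instruction names; `shadowed_int` and the two namespaces are hypothetical names used only for this illustration.

```python
import dis

literal = compile("n = 1234", "<literal>", "exec")
call = compile("n = int('1234')", "<call>", "exec")

# Show the instruction names for each version on this interpreter.
print([ins.opname for ins in dis.get_instructions(literal)])
print([ins.opname for ins in dis.get_instructions(call)])

# The literal is stored as a ready-made constant; the call only stores
# the string and has to look up the name "int" at run time.
print(literal.co_consts, literal.co_names)
print(call.co_consts, call.co_names)


def shadowed_int(text):
    """Stand-in for an 'int' that has been rebound to something else."""
    return -1


namespace_literal = {"int": shadowed_int}
namespace_call = {"int": shadowed_int}
exec(literal, namespace_literal)
exec(call, namespace_call)

print(namespace_literal["n"])  # unaffected by the shadowing
print(namespace_call["n"])     # whatever shadowed_int returned
```

Rebinding `int` changes the result of the call but cannot touch the literal, which is the semantic guarantee described above.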

> Here are things you probably really do care about: (a) they act like
> strings, (b) they act like constants,

Don't confuse immutability with constant-ness. Python doesn't have
constants, except by convention. There's no easy way to prevent a simple
name from being rebound.

> (c) if there are potential issues parsing them, you see those issues
> as soon as possible,

Like at compile-time?

Consider the difference between the compile-time syntax error you get
here:

    x = 123w456

versus the run-time error you get here:

    x = int("123w456")

I can understand saying "we have no choice but to make this a runtime
operation", or even "on the balance of pros and cons, it isn't worth the
extra work to make this happen at compile-time".

I don't like it that we have to write Decimal("123.456"), but I
understand the reasons why we have to and can accept that it is a
necessary evil. (To clarify: of course it is a feature that we *can*
pass strings to the Decimal constructor, when such strings come from
user-input or are read from data files etc.)

But I don't think that it is a feature that there is no alternative but
to pass a string, even when the value is known at edit-time. And I don't
understand your position that I shouldn't care about the difference.

> (d) working with them is more than fast enough.

You are right that Python is usually "fast enough" (except when it
isn't), and that the one-off cost of creating a few pseudo-constants is
generally only a small fraction of the cost of most programs.

But Python isn't quote-unquote "slow" because of any one single thing,
it is more of a death by a thousand cuts, lots of *small* inefficiencies
which individually don't matter but collectively add up to making Python
up to a hundred times slower than C.

When performance matters, which would you rather write?

    for item in huge_sequence:
        value = item + 1234
        value = item + int("1234")

I know that when I use a literal, it will be as fast as it possibly can
be in Python, or at least there's nothing *I* can do to make it faster.
But when I have to use a call like Decimal("123.45"), that's one more
thing for me to have to worry about: is it fast enough? Can I make it
faster? Should I make it faster?

We should be wary about hiding potentially slow code in something that
looks like fast code. (Yes, that's a criticism of properties too, but in
the case of properties we know that the benefits outweigh the risk. It's
not clear that this is the case here.)

> Compile time is neither
> necessary (Haskell) nor sufficient (Tcl) for any of that. So why
> insist on compile-time instead of insisting on a-d?

I think you will find that I said this should be "a strong preference",
which is hardly *insisting*.

> > No I'm not. I'm going to think of it as a *string*, because it looks
> > like a string.
>
> Well, yes. It’s a path string, or a regex string, or a version string,

Actually, no, it will be a Path object, a compiled regex SRE_Pattern
object, or a Version object, not a string at all.

> or whatever, which is loosely a kind of string but not literally one.
> Like bytes.

Bytes literally are strings. They just aren't strings of Unicode
characters.

> Or it’s a sql cursor, in which case it was probably a misuse of the feature.

That's one of the motivating examples. I agree it is a misuse of the
proposed feature.

> > Particularly given the OP's preference for single-letter prefixes.
>
> OK, I will agree with you there that the overuse of single-letter
> prefixes in the motivating examples is a worrying sign. In principle
> there’s nothing wrong with single letters (and I think I can make a
> good case for the f suffix as a good use in 3D-math code).

I can concur with all of that.

[...]

> As I’ve said before, I believe that anything that doesn’t have a
> builtin type does not deserve builtin syntax.

Agreed. Although there's a bit of fuzziness over the concept of
"builtin". Not all built-in objects are available in the ``builtins``
module, e.g. NoneType, or FunctionType.

> And I don’t understand
> why that isn’t a near-ubiquitous viewpoint. But it’s not just you; at
> least three people (all of whom dislike the whole concept of custom
> affixes) seem at least in principle open to the idea of adding builtin
> affixes for types that don’t exist. Which makes me think it’s almost
> certainly not that you’re all crazy, but that I’m missing something
> important. Can you explain it to me?

I thought it went without saying that a necessary pre-condition for
adding builtin syntax for a type was for the type to become built-in
first. Sorry if it wasn't as clear or obvious as I thought.

-- 
Steven
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/WDR2QHG4EBB3FP6Z2T6CGKC7O7D4KDA5/