Re: [Python-ideas] Vectorization [was Re: Add list.join() please]
On Thu, Feb 7, 2019 at 6:48 PM Steven D'Aprano wrote:

> I'm sorry, I did not see your comment that you thought new syntax was a
> bad idea. If I had, I would have responded directly to that.

Well... I don't think it's the worst idea ever. But in general, adding more operators is something I am wary about. Plus there's the "grit on Uncle Timmy's screen" test.

Actually, if I wanted an operator, I think that @ is more intuitive than extra dots. Vectorization isn't matrix multiplication, but they are sort of in the same ballpark, so the iconography is not ruined.

> We can perform thought-experiments and we don't need anything but a
> text editor for that. As far as I'm concerned, the thought experiment
> of comparing these two snippets:
>
>     ((seq .* 2)..name)..upper()
>
> versus
>
>     map(str.upper, map(operator.attrgetter('name'),
>         map(lambda a: a*2, seq)))

OK... now compare:

    (Vec(seq) * 2).name.upper()

Or:

    vec_seq = Vector(seq)
    (vec_seq * 2).name.upper()
    # ... bunch more stuff
    seq = vec_seq.unwrap()

I'm not saying the double dots are terrible, but they don't read *better* than wrapping (and optionally unwrapping) to me. If we were to take @ as "vectorize", it might be:

    (seq @* 2) @.name @.upper()

I don't hate that.

> demonstrates conclusively that even with the ugly double dot syntax,
> infix syntax easily and conclusively beats map.

Agreed.

> If I recall correctly, the three maps here were originally proposed by
> you as examples of why map() alone was sufficient and there was no
> benefit to the Julia syntax.

Well... your maps are kinda deliberately ugly. Even in that direction, I'd write:

    map(lambda s: (s*2).name.upper(), seq)

I don't *love* that, but it's a lot less monstrous than what you wrote. A comprehension is probably even better:

    [(s*2).name.upper() for s in seq]

> Again, I apologise, I did not see where you said that this was intended
> as a proof-of-concept to experiment with the concept.

All happy. Puppies and flowers.
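For the record, the map and comprehension spellings really are interchangeable. A tiny runnable check; the Record class and its `__mul__` semantics are invented here purely for illustration:

```python
from dataclasses import dataclass

@dataclass
class Record:
    """Toy object with a .name attribute and a * operator, so that
    (s * 2).name.upper() is meaningful, as in the snippets above."""
    name: str
    count: int

    def __mul__(self, n):
        # Invented semantics: multiplying scales the count
        return Record(self.name, self.count * n)

seq = [Record("spam", 1), Record("eggs", 2)]

# map-based spelling
via_map = list(map(lambda s: (s * 2).name.upper(), seq))

# comprehension spelling
via_comp = [(s * 2).name.upper() for s in seq]

assert via_map == via_comp == ["SPAM", "EGGS"]
```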
> If the Vector class is only a proof of concept, then we surely don't
> need to care about moving things in and out of "vector mode". We can
> take it as a given that "the real thing" will work that way: the syntax
> will be duck-typed and work with any iterable,

Well... I at least moderately think that a wrapper class is BETTER than new syntax. So I'd like the proof-of-concept to be at least moderately functional.

In any case, there is ZERO code needed to move in/out of "vector mode." The wrapped thing is simply an attribute of the object. When we call vectorized methods, it's just `getattr(type(item), attr)` to figure out the method in a duck-typed way.

> ... one of the things which led me to believe that you thought that a
> wrapper class was in and of itself a solution to the problem. If you had
> been proposing this Vector class as a viable working solution (or at
> least a first alpha version towards a viable solution) then worrying
> about round-tripping would be important.

Yes, I consider the Vector class a first alpha version of a viable solution. I haven't seen anything that makes me prefer new syntax.

I feel like a wrapper makes it more clear that we are "living in vector land" for a while. The same is true for NumPy, in my mind. Maybe it's just familiarity, but I LIKE the fact that I know that when my object is an ndarray, operations are going to be vectorized ones. Maybe 15 years ago different decisions could have been made, and some "vectorize this operation" syntax could have made the ndarray structure just a behavior of lists instead. But I think the separation is nice.

> But as a proof-of-concept of the functionality, then:
>
>     set( Vector(set_of_stuff) + spam )
>     list( Vector(list_of_stuff) + spam )

That's fine. But there's no harm in the class *remembering* what it wraps either.
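The "ZERO code to move in/out of vector mode" claim is easy to see in miniature. A hedged sketch, not the actual stringpy code: it vectorizes method calls plus one sample operator, and it materializes lists rather than staying lazy as the real class aims to:

```python
class Vector:
    """Toy sketch of a vectorizing wrapper (not the real stringpy.Vector).

    Method calls are applied elementwise, duck-typed at call time;
    unwrap() simply returns the wrapped sequence.
    """

    def __init__(self, it):
        self._it = it  # the wrapped thing is just an attribute

    def __getattr__(self, attr):
        # Called only for names not found normally, i.e. the vectorized
        # method names like .upper or .strip; dispatch is duck-typed
        def vectorized(*args, **kwargs):
            return Vector([getattr(item, attr)(*args, **kwargs)
                           for item in self._it])
        return vectorized

    def __mul__(self, n):
        # One sample operator, also elementwise
        return Vector([item * n for item in self._it])

    def unwrap(self):
        return self._it


words = Vector(["spam", "eggs"])
assert (words * 2).upper().unwrap() == ["SPAMSPAM", "EGGSEGGS"]
```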
We might want to distinguish:

    set(Vector(some_collection) + spam)       # Make it a set after the operations
    (Vector(some_collection) + spam).unwrap() # Recover whatever type it was before

> Why do you care about type uniformity or type-checking the contents of
> the iterable?

Because some people have said "I want my vector to be specifically a *sequence of strings*, not of other stuff." And MAYBE there is some optimization to be had if we know we'll never have a non-footype in the sequence (after all, NumPy is hella optimized). That's why the `stringpy` name that someone suggested. Maybe we'd bypass most of the Python-land calls when we did the vectorized operations, but ONLY if we assume type uniformity.

But yes, I generally care about duck-typing only.

--
Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
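The "remembering what it wraps" idea in the message above can be sketched with a wrapper that records the container type at construction. TypedVector is a hypothetical name, and unlike the stringpy approach it materializes a list internally, so iterators would not stay lazy:

```python
class TypedVector:
    """Sketch of a wrapper that remembers the original container type,
    so unwrap() can rebuild a set, list, tuple, etc.  (Hypothetical
    name; not part of the actual stringpy module.)"""

    def __init__(self, collection, _ctype=None):
        self._items = list(collection)  # materializes; loses laziness
        self._ctype = _ctype if _ctype is not None else type(collection)

    def __add__(self, other):
        # Elementwise add, carrying the remembered type along
        return TypedVector((x + other for x in self._items), self._ctype)

    def unwrap(self):
        # Rebuild the original container type
        return self._ctype(self._items)


assert isinstance((TypedVector({1, 2, 3}) + 10).unwrap(), set)
assert (TypedVector([1, 2]) + 1).unwrap() == [2, 3]
```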
Re: [Python-ideas] Vectorization [was Re: Add list.join() please]
On Thu, Feb 07, 2019 at 03:17:18PM -0500, David Mertz wrote:

> Many apologies if people got one or more encrypted versions of this.
>
> On 2/7/19 12:13 AM, Steven D'Aprano wrote:
> > It wasn't a concrete proposal, just food for thought. Unfortunately
> > the thinking seems to have missed the point of the Julia syntax and
> > run off with the idea of a wrapper class.
>
> I did not miss the point! I think adding new syntax à la Julia is a bad
> idea—or at very least, not something we can experiment with today (and
> wrote as much).

I'm sorry, I did not see your comment that you thought new syntax was a bad idea. If I had, I would have responded directly to that.

Why is it an overtly *bad* (i.e. harmful) idea? As opposed to merely not sufficiently useful, or unnecessary?

You're certainly right that we can't easily experiment in the interpreter with new syntax, but we can perform thought-experiments and we don't need anything but a text editor for that. As far as I'm concerned, the thought experiment of comparing these two snippets:

    ((seq .* 2)..name)..upper()

versus

    map(str.upper, map(operator.attrgetter('name'),
        map(lambda a: a*2, seq)))

demonstrates conclusively that even with the ugly double dot syntax, infix syntax easily and conclusively beats map.

If I recall correctly, the three maps here were originally proposed by you as examples of why map() alone was sufficient and there was no benefit to the Julia syntax. I suggested composing them together as a single operation instead of considering them in isolation.

> Therefore, something we CAN think about and experiment with today is a
> wrapper class.

Again, I apologise, I did not see where you said that this was intended as a proof-of-concept to experiment with the concept.

[...]

> One of the principles I had in mind in my demonstration is that I want
> to wrap the original collection type (or keep it an iterator if it
> started as one).
> A number of other ideas here, whether for built-in syntax or different
> behaviors of a wrapper, effectively always reduce every sequence to a
> list under the hood. This makes my approach less intrusive to move
> things in and out of "vector mode." For example:

If the Vector class is only a proof of concept, then we surely don't need to care about moving things in and out of "vector mode". We can take it as a given that "the real thing" will work that way: the syntax will be duck-typed and work with any iterable, and there will not be any actual wrapper class involved and consequently no need to move things in and out of the wrapper.

I had taken note of this functionality of the class before, and that was one of the things which led me to believe that you thought that a wrapper class was in and of itself a solution to the problem. If you had been proposing this Vector class as a viable working solution (or at least a first alpha version towards a viable solution) then worrying about round-tripping would be important.

But as a proof-of-concept of the functionality, then:

    set( Vector(set_of_stuff) + spam )
    list( Vector(list_of_stuff) + spam )

should be enough to play around with the concept.

[...]

> Inasmuch as I want to handle iterators here, it is impossible to do any
> type check upon creating a Vector. For concrete
> `collections.abc.Sequence` objects we could check, in principle. But
> I'd rather it be "we're all adults here" ... or at most provide some
> `check_type_uniformity()` function or method that had to be called
> explicitly.

Why do you care about type uniformity or type-checking the contents of the iterable? Comments like this suggest to me that you haven't understood the idea as I have tried to explain it. I'm sorry that I have failed to explain it better.

Julia is (if I understand correctly) statically typed, and that allows it to produce efficient machine code because it knows that it is iterating over (let's say) an array of 32-bit ints.
While that might be important for the efficiency of the generated machine code, it's not important for the semantic meaning of the code. In Python, we duck-type and resolve operations at runtime. We don't typically validate types in advance:

    for x in sequence:
        if not isinstance(x, Spam):
            raise TypeError('not Spam')
    for x in sequence:
        process(x)

(except under unusual circumstances).

More to the point, when we write a for-loop:

    result = []
    for a_string in seq:
        result.append(a_string.upper())

we don't expect that the interpreter will validate that the sequence contains nothing but strings in advance. So if I write this using Julia syntax:

    result = seq..upper()

I shouldn't expect the interpreter to check that seq contains nothing but strings either.

--
Steven

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct:
Re: [Python-ideas] Vectorization [was Re: Add list.join() please]
Many apologies if people got one or more encrypted versions of this.

On 2/7/19 12:13 AM, Steven D'Aprano wrote:
> It wasn't a concrete proposal, just food for thought. Unfortunately the
> thinking seems to have missed the point of the Julia syntax and run off
> with the idea of a wrapper class.

I did not miss the point! I think adding new syntax à la Julia is a bad idea—or at very least, not something we can experiment with today (and wrote as much). Therefore, something we CAN think about and experiment with today is a wrapper class.

This approach is pretty much exactly the same thing I tried in a discussion of PEP 505 a while back (None-aware operators). In the same vein as that—where I happen to dislike PEP 505 pretty strongly—one approach to simulate or avoid new syntax is precisely to use a wrapper class.

As a footnote, I think my demonstration of PEP 505 got derailed by lots of comments along the lines of "Your current toy library gets the semantics of the proposed new syntax wrong in these edge cases." Those comments were true (and I think I didn't fix all the issues, since my interest faded with the active thread)... but none of them were impossible to fix, just small errors I had made.

With my *very toy* stringpy.Vector class, I'm just experimenting with usage ideas. I have shown a number of uses that I think could be useful to capture most or all of what folks want in "string vectorization." Most of what I've put in this list is what the little module does already, but some is just ideas for what it might do if I add the code (or someone else makes a PR at https://github.com/DavidMertz/stringpy).

One of the principles I had in mind in my demonstration is that I want to wrap the original collection type (or keep it an iterator if it started as one). A number of other ideas here, whether for built-in syntax or different behaviors of a wrapper, effectively always reduce every sequence to a list under the hood.
This makes my approach less intrusive to move things in and out of "vector mode." For example:

    v1 = Vector(set_of_strings)
    set_of_strings = v1.lower().apply(my_str_fun)._it    # Get a set back

    v2 = Vector(list_of_strings)
    list_of_strings = v2.lower().apply(my_str_fun)._it   # Get a list back

    v3 = Vector(deque_of_strings)
    deque_of_strings = v3.lower().apply(my_str_fun)._it  # Get a deque back

    v4 = Vector(iter_of_strings)
    iter_of_strings = v4.lower().apply(my_str_fun)._it   # stays lazy!

So this is round-tripping through vector-land.

Small note: I use the attribute `._it` to store the "sequential thing." That feels internal, so maybe some better way of spelling "get the wrapped thing" would be desirable.

I've also lost track of whether anyone is proposing a "vector of strings" as opposed to a vector of arbitrary objects. Nothing I wrote is actually string-specific; that is just the main use case stated. My `stringpy.Vector` might be misnamed in that it is happy to contain any kind of items. But we hope they are all items with the particular methods we want to vectorize. I showed an example where a list might contain a custom string-like object that happens to have methods like `.lower()` as an illustration.

Inasmuch as I want to handle iterators here, it is impossible to do any type check upon creating a Vector. For concrete `collections.abc.Sequence` objects we could check, in principle. But I'd rather it be "we're all adults here" ... or at most provide some `check_type_uniformity()` function or method that had to be called explicitly.
Re: [Python-ideas] Vectorization [was Re: Add list.join() please]
Here are some alternate syntaxes. These are all equivalent to print(len(list)):

    (len | print)(list)
    (len |> print)(list)
    (print <| len)(list)
    print <| len << list
    list >> print <| len
    list >> len |> print

## Traditional argument order

    print <| len << list

## Stored functions

    print_lengths = len | print
    print_lengths = len |> print
    print_lengths = print <| len

These can be called using callable syntax, using << syntax, or using >> syntax.

## Lightweight traditional syntax order

    (print | len)()

# Explanation

The pipeline operators (|, |>, <|) create an object. That object implements, depending on the chosen implementation, some combination of the __call__ operator, the __rshift__ operator, and/or the __lshift__ operator.

— I am not proposing Python has all these operators at the same time, just putting these ideas out there for discussion.
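One way the explanation above could cash out in code today is sketched below. This is a hedged illustration under assumptions: Pipe is a hypothetical class name, only the | and >> spellings are shown, and builtins like len must be wrapped explicitly since they don't implement these operators themselves:

```python
class Pipe:
    """Sketch of a pipeline object: composes left-to-right with |,
    is called like a normal function, and accepts `data >> pipe`
    via __rrshift__.  Hypothetical; not a concrete proposal."""

    def __init__(self, *funcs):
        self.funcs = funcs

    def __or__(self, other):
        # Append another stage:  Pipe(len) | str
        more = other.funcs if isinstance(other, Pipe) else (other,)
        return Pipe(*self.funcs, *more)

    def __call__(self, arg):
        # Callable syntax: thread the argument through each stage
        for f in self.funcs:
            arg = f(arg)
        return arg

    def __rrshift__(self, arg):
        # >> syntax: data >> pipe
        return self(arg)


stringify_length = Pipe(len) | str
assert stringify_length([10, 20, 30]) == "3"
assert ([10, 20, 30] >> stringify_length) == "3"
```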
Re: [Python-ideas] Vectorization [was Re: Add list.join() please]
On 2019-02-07 05:27, Chris Angelico wrote:
> On Thu, Feb 7, 2019 at 4:03 PM Steven D'Aprano wrote:
> > At the risk of causing confusion^1, we could have a "vector call"
> > syntax:
> >
> >     # apply len to each element of obj, instead of obj itself
> >     len[obj]
> >
> > which has the advantage that it only requires that we give functions
> > a __getitem__ method, rather than adding new syntax. But it has the
> > disadvantage that it doesn't generalise to operators, without which I
> > don't think this is worth bothering with.
>
> Generalizing to operators is definitely going to require new syntax,
> since both operands can be arbitrary objects. So if that's essential to
> the idea, we can instantly reject anything that's based on functions
> (like "make multiplying a function by a tuple equivalent to blah blah
> blah"). In that case, we come straight to a few key questions:
>
> 1) Is this feature even worth adding syntax for? (My thinking: "quite
> possibly", based on matmul's success despite having an even narrower
> field of use than this.)
>
> 2) Should it create a list? a generator? something that depends on the
> type of the operand? (Me: "no idea")
>
> 3) Does the Julia-like "x." syntax pass the grit test? (My answer:
> "nope")
>
> 4) If not, what syntax would be more appropriate?

This is a general-purpose feature akin to comprehensions (and, in fact, can be used in place of some annoyingly-verbose comprehensions). It needs to be easy to type and read.

Pike's automap syntax is to subscript an array with [*], implying "subscript this with every possible value". It's great if you want to do just one simple thing:

    f(stuff[*])    # [f(x) for x in stuff]
    stuff[*][1]    # [x[1] for x in stuff]

but clunky for chained operations:

    (f(stuff[*])[*] * 3)[*] + 1    # [f(x) * 3 + 1 for x in stuff]

That might not be a problem in Python, since you can always just use a comprehension if vectorized application doesn't suit you. I kinda like the idea, but the devil's in the details.
Would it be possible, at compile time, to retain it as an automap throughout the expression?

    stuff[*]                 # [x for x in stuff]
    f(stuff[*])              # [f(x) for x in stuff]
    (f(stuff[*]) * 3) + 1    # [f(x) * 3 + 1 for x in stuff]

There could also be a way to 'collapse' it again. An uncollapsed automap would be collapsed at the end of the expression. (Still a bit fuzzy about the details...)
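The "retain the automap through the expression, then collapse" behavior can be approximated today with a wrapper object rather than syntax. A sketch under assumptions: Auto is a hypothetical name, only a couple of operators are shown, and collapsing happens via ordinary iteration:

```python
class Auto:
    """Sketch of an 'uncollapsed automap' value: arithmetic broadcasts
    elementwise, map() applies a function elementwise, and iterating
    (e.g. via list()) collapses it back to ordinary values."""

    def __init__(self, items):
        self._items = list(items)

    def map(self, f):
        # Stand-in for f(stuff[*]): apply f elementwise
        return Auto(f(x) for x in self._items)

    def __mul__(self, n):
        return Auto(x * n for x in self._items)

    def __add__(self, n):
        return Auto(x + n for x in self._items)

    def __iter__(self):
        # Collapse: hand back the plain elements
        return iter(self._items)


def f(x):
    return x + 1

stuff = [1, 2, 3]
result = list((Auto(stuff).map(f) * 3) + 1)
assert result == [f(x) * 3 + 1 for x in stuff]  # [7, 10, 13]
```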
Re: [Python-ideas] Multi-line string indentation
Was: "Dart (Swift) like multi line strings indentation"

This discussion petered out, but I liked the idea, as it alleviates something occasionally annoying. I am supportive of the d'' prefix; perhaps the capital prefixes can be deprecated to avoid issues? If not, a sometimes-optimized (or C-accelerated) str.dedent() is acceptable too. Anyone still interested in this?

-Mike

On 3/31/18 5:43 PM, Steven D'Aprano wrote:
> The ideal solution would:
>
> - require only a single pair of starting/ending string delimiters;
> - allow string literals to be indented to the current block, for the
>   visual look and to make it more convenient with editors which
>   automatically indent;
> - evaluate without the indents;
> - with no runtime cost.
>
> One solution is to add yet another string prefix, let's say d for
> dedent, but as Terry and others point out, that leads to a
> combinatorial explosion with f-strings and r-strings already existing.
>
> Another possibility is to make dedent a string method:
>
>     def spam():
>         text = """\
>             some text
>             another line
>             and a third
>             """.dedent()
>         print(text)
>
> and avoid the import of textwrap. However, that also imposes a runtime
> cost, which could be expensive if you are careless:
>
>     for x in seq:
>         for y in another_seq:
>             process("""\
>                 some large indented string
>                 """.dedent()
>                 )
>
> (Note: the same applies to using textwrap.dedent.)
>
> But we could avoid that runtime cost if the keyhole optimizer performed
> the dedent at compile time: a triple-quoted string literal followed by
> .dedent() could be optimized at compile-time, like other
> constant-folding.
>
> Out of all the options, including the status quo, the one I dislike the
> least is the last one:
>
> - make dedent a string method;
> - recommend (but don't require) that implementations perform the dedent
>   of string literals at compile time (failure to do so is a quality of
>   implementation issue, not a bug);
> - textwrap.dedent then becomes a thin wrapper around the string method.
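For reference, the spelling available today, with the runtime cost discussed above: textwrap.dedent removes the common leading whitespace at call time, which is exactly what a compile-time str.dedent() or d-prefix would fold away:

```python
import textwrap

def spam():
    # The backslash after the opening quotes suppresses the first
    # newline; textwrap.dedent then strips the common leading
    # indentation from every line at runtime.
    text = textwrap.dedent("""\
        some text
        another line
        and a third
        """)
    return text

assert spam() == "some text\nanother line\nand a third\n"
```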
On 4/1/18 4:41 AM, Michel Desmoulin wrote:
> A "d" prefix to do textwrap.dedent is something I wished for a long
> time.
>
> It's like the "f" one: we already can do it, but hell is it convenient
> to have a shortcut.
>
> This is especially true if, like me, you take a lot of care in the
> error messages you give to the user. I write a LOT of them, very long,
> very descriptive, and I have to either import textwrap or play the
> concatenation game.
>
> Having a str.dedent() method would be nice, but the d prefix has the
> huge advantage of being able to dedent on parsing, and hence be more
> performant.