Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-07 Thread David Mertz
On Thu, Feb 7, 2019 at 6:48 PM Steven D'Aprano  wrote:

> I'm sorry, I did not see your comment that you thought new syntax was a
> bad idea. If I had, I would have responded directly to that.
>

Well... I don't think it's the worst idea ever.  But adding more
operators is something I am generally wary of.  Plus there's the "grit
on Uncle Timmy's screen" test.

Actually, if I wanted an operator, I think that @ is more intuitive than
extra dots.  Vectorization isn't matrix multiplication, but they are sort
of in the same ballpark, so the iconography is not ruined.


> We can perform thought-experiments and
> we don't need anything but a text editor for that. As far as I'm
> concerned, the thought experiment of comparing these two snippets:
>
> ((seq .* 2)..name)..upper()
>
> versus
>
> map(str.upper, map(operator.attrgetter('name'), map(lambda a: a*2,
> seq)))
>

OK... now compare:

(Vec(seq) * 2).name.upper()

Or:

vec_seq = Vector(seq)
(vec_seq * 2).name.upper()
# ... bunch more stuff
seq = vec_seq.unwrap()

I'm not saying the double dots are terrible, but they don't read *better*
than wrapping (and optionally unwrapping) to me.

If we were to take @ as "vectorize", it might be:

(seq @* 2) @.name @.upper()

I don't hate that.

> demonstrates conclusively that even with the ugly double dot syntax,
> infix syntax easily and conclusively beats map.
>

Agreed.


> If I recall correctly, the three maps here were originally proposed by
> you as examples of why map() alone was sufficient and there was no
> benefit to the Julia syntax.


Well... your maps are kinda deliberately ugly.  Even in that direction, I'd
write:

map(lambda s: (s*2).name.upper(), seq)

I don't *love* that, but it's a lot less monstrous than what you wrote.  A
comprehension probably even better:

[(s*2).name.upper() for s in seq]

> Again, I apologise, I did not see where you said that this was intended
> as a proof-of-concept to experiment with the concept.
>

All happy.  Puppies and flowers.


> If the Vector class is only a proof of concept, then we surely don't
> need to care about moving things in and out of "vector mode". We can
> take it as a given that "the real thing" will work that way: the syntax
> will be duck-typed and work with any iterable,


Well... I at least moderately think that a wrapper class is BETTER than new
syntax. So I'd like the proof-of-concept to be at least moderately
functional.  In any case, there is ZERO code needed to move in/out of
"vector mode." The wrapped thing is simply an attribute of the object.
When we call vectorized methods, it's just `getattr(type(item), attr)` to
figure out the method in a duck-typed way.
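To make that concrete, here is a hedged sketch of the mechanism just described (the names are mine, not necessarily what stringpy actually uses): vectorized method calls resolve per-item via getattr, so any iterable of suitably duck-typed objects works.

```python
class Vector:
    def __init__(self, it):
        self._it = it                 # the wrapped thing, stored as-is

    def __getattr__(self, attr):
        # Only reached for names not otherwise defined, e.g. .upper
        def vectorized(*args, **kwargs):
            # Look the method up on each item's own type -- pure duck typing
            return Vector([getattr(type(item), attr)(item, *args, **kwargs)
                           for item in self._it])
        return vectorized

    def unwrap(self):
        return self._it

Vector(['Spam', 'Eggs']).upper().unwrap()   # ['SPAM', 'EGGS']
```

No conversion code is needed on the way in or out; the wrapped collection is just an attribute.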

> one of the things which led me to believe that you thought that a
> wrapper class was in and of itself a solution to the problem. If you had
> been proposing this Vector class as a viable working solution (or at
> least a first alpha version towards a viable solution) then worrying
> about round-tripping would be important.
>

Yes, I consider the Vector class a first alpha version of a viable
solution.  I haven't seen anything that makes me prefer new syntax.  I feel
like a wrapper makes it more clear that we are "living in vector land" for
a while.

The same is true for NumPy, in my mind.  Maybe it's just familiarity, but I
LIKE the fact that I know that when my object is an ndarray, operations are
going to be vectorized ones.  Maybe 15 years ago different decisions could
have been made, and some "vectorize this operation syntax" could have made
the ndarray structure just a behavior of lists instead.  But I think the
separation is nice.
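For instance (assuming NumPy is installed), it is the type itself that signals vectorized semantics, with no special call syntax:

```python
import numpy as np

a = np.array([1, 2, 3])

# Because a is an ndarray, arithmetic is elementwise by construction;
# the type carries the behavior, rather than the syntax.
b = a * 2 + 1   # array([3, 5, 7])
```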


> But as a proof-of-concept of the functionality, then:
>
> set( Vector(set_of_stuff) + spam )
> list( Vector(list_of_stuff) + spam )
>

That's fine.  But there's no harm in the class *remembering* what it wraps
either.  We might want to distinguish:

set(Vector(some_collection) + spam)        # Make it a set after the operations
(Vector(some_collection) + spam).unwrap()  # Recover whatever type it was before


> Why do you care about type uniformity or type-checking the contents of
> the iterable?
>

Because some people have said "I want my vector to be specifically a
*sequence of strings* not of other stuff"

And MAYBE there is some optimization to be had if we know we'll never have
a non-footype in the sequence (after all, NumPy is hella optimized).
Hence the `stringpy` name that someone suggested.  Maybe we'd bypass
most of the Python-land calls when we did the vectorized operations, but
ONLY if we assume type uniformity.
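A hedged sketch of what that explicit, opt-in check could look like (the name `check_type_uniformity` is from this discussion, not any real library):

```python
def check_type_uniformity(vec):
    """Opt-in check that every item shares one exact type.

    Note this consumes a one-shot iterator, which is why it cannot
    run implicitly at Vector creation time.
    """
    it = iter(vec)
    try:
        typ = type(next(it))
    except StopIteration:
        return True               # empty is vacuously uniform
    return all(type(x) is typ for x in it)

check_type_uniformity(['a', 'b', 'c'])   # True
check_type_uniformity(['a', 1])          # False
```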

But yes, I generally care about duck-typing only.


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-07 Thread Steven D'Aprano
On Thu, Feb 07, 2019 at 03:17:18PM -0500, David Mertz wrote:
> Many apologies if people got one or more encrypted versions of this.
> 
> On 2/7/19 12:13 AM, Steven D'Aprano wrote:
> 
> It wasn't a concrete proposal, just food for thought. Unfortunately the
> thinking seems to have missed the point of the Julia syntax and run off
> with the idea of a wrapper class.
> 
> I did not miss the point! I think adding new syntax à la Julia is a bad
> idea—or at very least, not something we can experiment with today (and
> wrote as much).

I'm sorry, I did not see your comment that you thought new syntax was a 
bad idea. If I had, I would have responded directly to that.

Why is it an overtly *bad* (i.e. harmful) idea? As opposed to merely 
not sufficiently useful, or unnecessary?

You're certainly right that we can't easily experiment in the 
interpreter with new syntax, but we can perform thought-experiments and 
we don't need anything but a text editor for that. As far as I'm 
concerned, the thought experiment of comparing these two snippets:

((seq .* 2)..name)..upper()

versus

map(str.upper, map(operator.attrgetter('name'), map(lambda a: a*2, seq)))

demonstrates conclusively that even with the ugly double dot syntax, 
infix syntax easily and conclusively beats map.

If I recall correctly, the three maps here were originally proposed by 
you as examples of why map() alone was sufficient and there was no 
benefit to the Julia syntax. I suggested composing them together as a 
single operation instead of considering them in isolation.


> Therefore, something we CAN think about and experiment with today is a
> wrapper class.

Again, I apologise, I did not see where you said that this was intended 
as a proof-of-concept to experiment with the concept.


[...]
> One of the principles I had in mind in my demonstration is that I want
> to wrap the original collection type (or keep it an iterator if it
> started as one).  A number of other ideas here, whether for built-in
> syntax or different behaviors of a wrapper, effectively always reduce
> every sequence to a list under the hood.  This makes my approach less
> intrusive to move things in and out of "vector mode."  For example:

If the Vector class is only a proof of concept, then we surely don't 
need to care about moving things in and out of "vector mode". We can 
take it as a given that "the real thing" will work that way: the syntax 
will be duck-typed and work with any iterable, and there will not be any 
actual wrapper class involved and consequently no need to move things in 
and out of the wrapper.

I had taken note of this functionality of the class before, and that was 
one of the things which led me to believe that you thought that a 
wrapper class was in and of itself a solution to the problem. If you had 
been proposing this Vector class as a viable working solution (or at 
least a first alpha version towards a viable solution) then worrying 
about round-tripping would be important.

But as a proof-of-concept of the functionality, then:

set( Vector(set_of_stuff) + spam )
list( Vector(list_of_stuff) + spam )

should be enough to play around with the concept.



[...]
> Inasmuch as I want to handle iterators here, it is impossible to do any
> type check upon creating a Vector.  For concrete
> `collections.abc.Sequence` objects we could check, in principle.  But
> I'd rather it be "we're all adults here" ... or at most provide some
> `check_type_uniformity()` function or method that had to be called
> explicitly.

Why do you care about type uniformity or type-checking the contents of 
the iterable?

Comments like this suggest to me that you haven't understood the 
idea as I have tried to explain it. I'm sorry that I have failed to 
explain it better.

Julia is (if I understand correctly) statically typed, and that allows 
it to produce efficient machine code because it knows that it is 
iterating over (let's say) an array of 32-bit ints.

While that might be important for the efficiency of the generated 
machine code, that's not important for the semantic meaning of the code. 
In Python, we duck-type and resolve operations at runtime. We don't 
typically validate types in advance:

for x in sequence:
    if not isinstance(x, Spam):
        raise TypeError('not Spam')
for x in sequence:
    process(x)

(except under unusual circumstances). More to the point, when we write a 
for-loop:

result = []
for a_string in seq:
    result.append(a_string.upper())

we don't expect that the interpreter will validate that the sequence 
contains nothing but strings in advance. So if I write this using Julia 
syntax:

result = seq..upper()

I shouldn't expect the interpreter to check that seq contains nothing but 
strings either.



-- 
Steven
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-07 Thread David Mertz
Many apologies if people got one or more encrypted versions of this.

On 2/7/19 12:13 AM, Steven D'Aprano wrote:

It wasn't a concrete proposal, just food for thought. Unfortunately the
thinking seems to have missed the point of the Julia syntax and run off
with the idea of a wrapper class.

I did not miss the point! I think adding new syntax à la Julia is a bad
idea—or at very least, not something we can experiment with today (and
wrote as much).

Therefore, something we CAN think about and experiment with today is a
wrapper class.  This approach is pretty much exactly the same thing I
tried in a discussion of PEP 505 a while back (None-aware operators).
In the same vein as that—where I happen to dislike PEP 505 pretty
strongly—one approach to simulate or avoid new syntax is precisely to
use a wrapper class.

As a footnote, I think my demonstration of PEP 505 got derailed by lots
of comments along the lines of "Your current toy library gets the
semantics of the proposed new syntax wrong in these edge cases."  Those
comments were true (and I think I didn't fix all the issues since my
interest faded with the active thread)... but none of them were
impossible to fix, just small errors I had made.

With my *very toy* stringpy.Vector class, I'm just experimenting with
usage ideas.  I have shown a number of uses that I think could be useful
to capture most or all of what folks want in "string vectorization."
Most of what I've put in this list is what the little module does
already, but some is just ideas for what it might do if I add the code
(or someone else makes a PR at https://github.com/DavidMertz/stringpy).

One of the principles I had in mind in my demonstration is that I want
to wrap the original collection type (or keep it an iterator if it
started as one).  A number of other ideas here, whether for built-in
syntax or different behaviors of a wrapper, effectively always reduce
every sequence to a list under the hood.  This makes my approach less
intrusive to move things in and out of "vector mode."  For example:

  v1 = Vector(set_of_strings)
  set_of_strings = v1.lower().apply(my_str_fun)._it  # Get a set back
  v2 = Vector(list_of_strings)
  list_of_strings = v2.lower().apply(my_str_fun)._it # Get a list back
  v3 = Vector(deque_of_strings)
  deque_of_strings = v3.lower().apply(my_str_fun)._it # Get a deque back
  v4 = Vector(iter_of_strings)
  iter_of_strings = v4.lower().apply(my_str_fun)._it  # stays lazy!

So this is round-tripping through vector-land.

Small note: I use the attribute `._it` to store the "sequential thing."
That feels internal, so maybe some better way of spelling "get the
wrapped thing" would be desirable.

I've also lost track of whether anyone is proposing a "vector of strings"
as opposed to a vector of arbitrary objects.

Nothing I wrote is actually string-specific.  That is just the main use
case stated.  My `stringpy.Vector` might be misnamed in that it is happy
to contain any kind of items.  But we hope they are all items with the
particular methods we want to vectorize.  I showed an example where a
list might contain a custom string-like object that happens to have
methods like `.lower()` as an illustration.

Inasmuch as I want to handle iterators here, it is impossible to do any
type check upon creating a Vector.  For concrete
`collections.abc.Sequence` objects we could check, in principle.  But
I'd rather it be "we're all adults here" ... or at most provide some
`check_type_uniformity()` function or method that had to be called
explicitly.


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-07 Thread James Lu
Here are some alternate syntaxes.

These are all equivalent to print(len(list)).

(len | print)(list)
(len |> print)(list)
(print <| len)(list)
print <| len << list
list >> print <| len
list >> len |> print


## Traditional argument order 
print <| len << list

## Stored functions 
print_lengths = len | print
print_lengths = len |> print
print_lengths = print <| len

These can be called using callable syntax.
These can be called using << syntax.
These can be called using >> syntax.
## Lightweight traditional syntax order
(print | len)()

# Explanation
The pipeline operators (|, |>, <|) create an object.

That object implements, depending on the chosen implementation, some 
combination of the __call__, __rshift__, and/or __lshift__ methods.
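A rough sketch of one such pipeline object, implementing only `|` and `__call__` (names and details are mine, not part of the proposal):

```python
class Pipe:
    def __init__(self, *funcs):
        self.funcs = funcs

    def __or__(self, other):
        # f | g composes left-to-right: apply f first, then g
        return Pipe(*self.funcs, other)

    def __call__(self, value):
        for f in self.funcs:
            value = f(value)
        return value

print_lengths = Pipe(len) | print
print_lengths(['a', 'bc'])    # prints 2 -- len first, then print
```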
—
I am not proposing that Python have all these operators at the same time, 
just putting these ideas out there for discussion. 


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-07 Thread MRAB

On 2019-02-07 05:27, Chris Angelico wrote:

On Thu, Feb 7, 2019 at 4:03 PM Steven D'Aprano  wrote:

At the risk of causing confusion^1, we could have a "vector call"
syntax:

# apply len to each element of obj, instead of obj itself
len[obj]

which has the advantage that it only requires that we give functions a
__getitem__ method, rather than adding new syntax. But it has the
disadvantage that it doesn't generalise to operators, without which I
don't think this is worth bothering with.
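Since built-in functions can't simply grow a __getitem__, a wrapper is the obvious way to play with this "vector call" idea (a hedged sketch; the name is mine):

```python
class vectorcallable:
    """Wrap a function so that f[iterable] maps f over the iterable."""
    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)    # normal call, unchanged

    def __getitem__(self, obj):
        return [self.func(x) for x in obj]   # vectorized "subscript call"

vlen = vectorcallable(len)
vlen('abc')               # 3         -- applied to obj itself
vlen[['a', 'bc', 'def']]  # [1, 2, 3] -- applied to each element
```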


Generalizing to operators is definitely going to require new syntax,
since both operands can be arbitrary objects. So if that's essential
to the idea, we can instantly reject anything that's based on
functions (like "make multiplying a function by a tuple equivalent to
blah blah blah"). In that case, we come straight to a few key
questions:

1) Is this feature even worth adding syntax for? (My thinking: "quite
possibly", based on matmul's success despite having an even narrower
field of use than this.)

2) Should it create a list? a generator? something that depends on the
type of the operand? (Me: "no idea")

3) Does the Julia-like "x." syntax pass the grit test? (My answer: "nope")

4) If not, what syntax would be more appropriate?

This is a general purpose feature akin to comprehensions (and, in
fact, can be used in place of some annoyingly-verbose comprehensions).
It needs to be easy to type and read.

Pike's automap syntax is to subscript an array with [*], implying
"subscript this with every possible value". It's great if you want to
do just one simple thing:

f(stuff[*])
# [f(x) for x in stuff]
stuff[*][1]
# [x[1] for x in stuff]

but clunky for chained operations:

(f(stuff[*])[*] * 3)[*] + 1
# [f(x) * 3 + 1 for x in stuff]

That might not be a problem in Python, since you can always just use a
comprehension if vectorized application doesn't suit you.

I kinda like the idea, but the devil's in the details.

Would it be possible, at compile time, to retain it as an automap 
throughout the expression?


stuff[*]
# [x for x in stuff]

f(stuff[*])
# [f(x) for x in stuff]

(f(stuff[*]) * 3) + 1
# [f(x) * 3 + 1 for x in stuff]

There could also be a way to 'collapse' it again. An uncollapsed automap 
would be collapsed at the end of the expression. (Still a bit fuzzy 
about the details...)
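One way to prototype that "uncollapsed automap" in today's Python is a proxy that keeps applying operations elementwise until explicitly collapsed (a hedged sketch; the class and method names are mine, and only * and + are implemented):

```python
class AutoMap:
    def __init__(self, items):
        self.items = list(items)

    def __mul__(self, n):
        # Stay "uncollapsed": return another AutoMap, applied elementwise
        return AutoMap(x * n for x in self.items)

    def __add__(self, n):
        return AutoMap(x + n for x in self.items)

    def collapse(self):
        return self.items

stuff = [1, 2, 3]
(AutoMap(stuff) * 3 + 1).collapse()   # [4, 7, 10]
```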



Re: [Python-ideas] Multi-line string indentation

2019-02-07 Thread Mike Miller

Was: "Dart (Swift) like multi line strings indentation"

This discussion petered-out but I liked the idea, as it alleviates something 
occasionally annoying.


Am supportive of the d'' prefix; perhaps the capital prefixes can be deprecated 
to avoid issues?  If not, a sometimes-optimized (or C-accelerated) str.dedent() 
is acceptable too.
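For comparison, the spelling available today routes through textwrap.dedent, at runtime cost:

```python
import textwrap

def spam():
    # The backslash eats the newline after the opening quotes; dedent
    # strips the common leading whitespace at runtime.
    return textwrap.dedent("""\
        some text
        another line
        and a third
        """)

spam()   # 'some text\nanother line\nand a third\n'
```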


Anyone still interested in this?

-Mike



On 3/31/18 5:43 PM, Steven D'Aprano wrote:

The ideal solution would:

- require only a single pair of starting/ending string delimiters;

- allow string literals to be indented to the current block, for
   the visual look and to make it more convenient with editors
   which automatically indent;

- evaluate without the indents;

- with no runtime cost.


One solution is to add yet another string prefix, let's say d for
dedent, but as Terry and others point out, that leads to a combinatorial
explosion with f-strings and r-strings already existing.

Another possibility is to make dedent a string method:

def spam():
    text = """\
        some text
        another line
        and a third
        """.dedent()
    print(text)

and avoid the import of textwrap. However, that also imposes a runtime
cost, which could be expensive if you are careless:

for x in seq:
    for y in another_seq:
        process("""\
            some large indented string
            """.dedent()
        )

(Note: the same applies to using textwrap.dedent.)

But we could avoid that runtime cost if the keyhole optimizer performed
the dedent at compile time:

 triple-quoted string literal
 .dedent()

could be optimized at compile-time, like other constant-folding.

Out of all the options, including the status quo, the one I dislike the
least is the last one:

- make dedent a string method;

- recommend (but don't require) that implementations perform the
   dedent of string literals at compile time;

   (failure to do so is a quality of implementation issue, not a bug)

- textwrap.dedent then becomes a thin wrapper around the string method.




On 4/1/18 4:41 AM, Michel Desmoulin wrote:
> A "d" prefix to do textwrap.dedent is something I wished for a long time.
>
> It's like the "f" one: we already can do it, but hell is it convenient to
> have a shortcut.
>
> This is especially if, like me, you take a lot of care in the error
> messages you give to the user. I write a LOT of them, very long, very
> descriptive, and I have to either import textwrap or play the
> concatenation game.
>
> Having a str.dedent() method would be nice, but the d prefix has the
> huge advantage to be able to dedent on parsing, and hence be more
> performant.
>