Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-11 Thread Kyle Lahnakoski
CHB,

Thank you! I had forgotten that discussion at the beginning of July [1]. 

Googling the list [2] also shows mention of PythonQL [3], which may
point to use cases that can guide a Vectorization idea.


[1] groupby discussion -
https://mail.python.org/pipermail/python-ideas/2018-July/051786.html

[2] google search -
https://www.google.ca/search?q=group+by+site%3Ahttps%3A%2F%2Fmail.python.org%2Fpipermail%2Fpython-ideas%2F=group+by+site%3Ahttps%3A%2F%2Fmail.python.org%2Fpipermail%2Fpython-ideas%2F

[3] PythonQL - https://github.com/pythonql/pythonql



On 2019-02-11 10:43, Christopher Barker wrote:
> Do take a look in the fairly recent archives of this list for a big
> discussion of groupby -- it kind of petered out but there were a
> couple options on the table.
>
> -CHB
>

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-11 Thread Christopher Barker
Do take a look in the fairly recent archives of this list for a big
discussion of groupby -- it kind of petered out but there were a couple
options on the table.

-CHB




Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-10 Thread Kyle Lahnakoski

On 2019-02-10 18:30, Steven D'Aprano wrote:
>
> Can you post a simplified example of how you would do it in SQL, 
> compared to what you would have to do in standard Python?

Can I do the same in standard Python? If I did, then I would use Pandas:
it has groupby, and some primitive joining, and window functions may
come naturally because of its imperative nature, but I have not tried
it.  If I can not use Pandas, then I would write the groupby and window
functions and call them in sequence. This is similar to what you see in
my code now: a number of properties whose values get dispatched to
Python functions.  My code is more complicated only because those
structures can be dispatched to translators for databases too.

I am certain there are many variations of groupby out in the wild, and
it would be nice to have the concept standardized when/if Python has
vector operations. Join would be nice to have too, but I do not use it
much; dictionary lookup seems to fill that need.  Window functions
(which are like mini queries) are powerful, but, as with Pandas, may end up
being free because Python is imperative.
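
As a rough illustration of the two paragraphs above (this is not the code from the linked repository), a groupby and a dictionary-lookup join can be sketched in standard Python. The `group_by` helper, the sample rows, and the `regions` table are all made up for this example.

```python
from collections import defaultdict


def group_by(rows, *keys):
    """Group dict rows by the values of the given keys (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return dict(groups)


# Made-up sample data, loosely modelled on the columns used in the SQL.
prices = [
    {"availability_zone": "a", "instance_type": "small", "price": 0.10},
    {"availability_zone": "a", "instance_type": "small", "price": 0.30},
    {"availability_zone": "b", "instance_type": "large", "price": 0.90},
]

# GROUP BY availability_zone, instance_type with MAX(price) and COUNT(1).
summary = {
    key: {"max_price": max(r["price"] for r in rows), "count": len(rows)}
    for key, rows in group_by(prices, "availability_zone", "instance_type").items()
}

# A join done as a plain dictionary lookup.
regions = {"a": "east", "b": "west"}
joined = [{**row, "region": regions[row["availability_zone"]]} for row in prices]
```

Every project tends to write its own variant of `group_by`, which is the standardization gap mentioned above.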

The code I pointed to has two parts. Here is the first part in SQL (well,
an approximation of SQL since I did not test this, and now I am rusty).
A detailed description is below.

|   WITH time_range AS (
|   SELECT
|   num
|   FROM
|   all_integers
|   WHERE
|   num % 60 =0 AND
|   num >= floor(<>/60/60)*60*60-<> AND
|   num < floor(<>/60/60) + 60*60
|   )
|   SELECT 
|   availability_zone,
|   instance_type,
|   time_range.num AS time,
|   MAX(price) as PRICE,
|   COUNT(1) AS `COUNT`,
|   LAST(current_price) OVER (
|   PARTITION BY
|   availability_zone,
|   instance_type
|   ORDER BY
|   timestamp
|   ) AS current_price
|   FROM
|   (
|   SELECT
|   *,
|   COALESCE(LAG(timestamp, 1), <>) OVER (
|   PARTITION BY
|   availability_zone,
|   instance_type
|   ORDER BY
|   timestamp
|   ) AS expire,
|   timestamp-<> AS effective
|   FROM
|   prices
|   ) temp
|   RIGHT JOIN
|   time_range ON time_range.num BETWEEN temp.effective AND temp.expire
|   GROUP BY
|   availability_zone,
|   instance_type,
|   time_range.num AS time
|   WHERE
|   expire > floor(<>/60/60)*60*60 - <>


Now, for the same, with description:

This WITH clause is not real SQL; it is meant to stand in for a
temporary table that contains all hours of the time range I am
interested in. Definitely easier to do in Python. All time is assumed to be
in seconds since epoch.
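
To show why this part is easier in Python, here is a minimal sketch of building the hourly range. The names `now` and `lookback` are assumptions standing in for the values the query takes as parameters; they are not taken from the original code.

```python
HOUR = 60 * 60


def hourly_range(now, lookback):
    """Start of every hour from `lookback` seconds back through the current hour.

    `now` (seconds since epoch) and `lookback` (seconds) are hypothetical
    parameters invented for this sketch.
    """
    current_hour = (now // HOUR) * HOUR
    start = current_hour - (lookback // HOUR) * HOUR
    return list(range(start, current_hour + HOUR, HOUR))
```

For example, `hourly_range(7300, 3600)` gives the previous hour and the current hour as epoch seconds.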

|   WITH time_range AS (
|   SELECT
|   num
|   FROM
|   all_integers
|   WHERE
|   num % 60 =0 AND
|   num >= floor(<>/60/60)*60*60-<> AND
|   num < floor(<>/60/60) + 60*60
|   )

We will select the three dimensions we are interested in (see GROUP BY
below), along with the MAX price we have seen in the given hour, and the
current_price for any (availability_zone, instance_type) pair.

|   SELECT 
|   availability_zone,
|   instance_type,
|   time_range.num AS time,
|   MAX(price) as PRICE,
|   COUNT(1) AS `COUNT`,
|   LAST(current_price) OVER (
|   PARTITION BY
|   availability_zone,
|   instance_type
|   ORDER BY
|   timestamp
|   ) AS current_price
|   FROM

The prices coming from Amazon only have a timestamp for when that price
is effective; so this sub-query adds an `effective` start time, and an
`expire` time so the rest of the query need only deal with ranges. The 
timestamp-<> is putting the start time back further
into the past so the past can "see" future pricing. 
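
The same step can be sketched in plain Python. This assumes that `expire` is the timestamp of the next price in the same (availability_zone, instance_type) group, with a hypothetical `now` for the most recent price, and that a hypothetical `lookahead` is the amount subtracted from the timestamp; neither name comes from the original query.

```python
from itertools import groupby
from operator import itemgetter


def add_ranges(prices, now, lookahead):
    """Attach `effective` and `expire` to each price row (illustrative only)."""
    key = itemgetter("availability_zone", "instance_type")
    # Sort so that groupby sees each partition contiguously, ordered by time.
    ordered = sorted(prices, key=lambda p: (key(p), p["timestamp"]))
    out = []
    for _, group in groupby(ordered, key=key):
        rows = list(group)
        # Window step: each row expires when the next row in its group starts.
        next_stamps = [r["timestamp"] for r in rows[1:]] + [now]
        for row, expire in zip(rows, next_stamps):
            out.append({
                **row,
                "effective": row["timestamp"] - lookahead,
                "expire": expire,
            })
    return out
```

The sort followed by `groupby` plays the role of PARTITION BY ... ORDER BY, and pairing each row with its successor is the window function.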

|   (
|   SELECT
|   *,
|   COALESCE(LAG(timestamp, 1), <>) OVER (
|   PARTITION BY
|   availability_zone,
|   instance_type
|   ORDER BY
|   timestamp
|   ) AS expire,
|   timestamp-<> AS effective
|   FROM
|   prices
|   ) temp

This is the point where we use the time_range from above and find every
hour a price is effective.  This could have been a sub-query, but I am
rusty at SQL

|   RIGHT JOIN
|   time_range ON time_range.num BETWEEN temp.effective AND temp.expire

These are the three dimensions we are interested in

|   GROUP BY
|   availability_zone,
|   instance_type,
|   time_range.num AS time

and we are only interested in calculating back to a certain point

|   WHERE
|   expire > floor(<>/60/60)*60*60 - <>







Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-10 Thread Steven D'Aprano
On Sun, Feb 10, 2019 at 01:05:42PM -0500, Kyle Lahnakoski wrote:

> I am interested in vector operations.  I have situations where I want to
> perform some conceptually simple operations on a series of
> not-defined-by-me objects to make a series of conclusions.  The
> calculations can be done succinctly in SQL, but Python makes them
> difficult.

Can you post a simplified example of how you would do it in SQL, 
compared to what you would have to do in standard Python?


-- 
Steven


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-10 Thread Christopher Barker
On Sun, Feb 10, 2019 at 10:06 AM Kyle Lahnakoski 
wrote:

> but none of these are as elegant or readable as the vectorized syntax
>
>  result = process.(vector .+ sequence) .* items
>
I've lost track a bit of who is proposing what, but this looks like an
extra set of operators: ".*", ".+", etc. for vectorized operations.

So I want to point out that this was proposed way back when for numpy:

MATLAB, for instance, has the usual operators (*, +, etc.) meaning "matrix
math", and then another set of operators with a "dot" form (.*, .+, .-)
for "itemwise" math.

numpy, on the other hand, uses the regular operators for itemwise
operations (what we're calling vectorized here), and Python lacked an extra
set of operators that could be used for matrix math. Adding another full
set (.*, .+, etc.) was discussed A LOT and the Python community did not want
that.

Then someone had the brilliant observation that matrix multiplication was
the only one that was really useful and presto! the @ operator was born.

Anyway -- just suggesting that a full set of "vectorized" operators will
likely see a lot of resistance. And for my part, having mean the opposite of
what it does for numpy would be unfortunate as well.

> I am interested in vector operations.  I have situations where I want to
> perform some conceptually simple operations on a series of
> not-defined-by-me objects to make a series of conclusions.  The
> calculations can be done succinctly in SQL, but Python makes them difficult.

Bringing real-world examples of this would be a good idea for this
discussion.

I'm inclined to think that something like pandas (maybe more generally
SQL-like than the number-crunching focus of Pandas) might be better than new
syntax for the language -- but only real examples will tell.

I don't work with data like that much, but I'm pretty sure I've seen Python
packages that attempt to address these use cases (that is, querying
and processing tabular data).

-CHB


-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-10 Thread Kyle Lahnakoski

On 2019-02-02 18:11, Steven D'Aprano wrote:
> We can improve that comprehension a tiny bit by splitting it into
> multiple steps:
>
>  temp1 = [d+e for d, e in zip(vector, sequence)]
>  temp2 = [process(x) for x in temp1]
>  result = [a*b for a, b in zip(temp2, items)]
>
> but none of these are as elegant or readable as the vectorized syntax
>
>  result = process.(vector .+ sequence) .* items

The following reads a little better:

| result = [
|     process(v+s)*i
| for v, s, i in zip(vector, sequence, items)
| ]

Vector operations will promote the use of data formats that work well
with vector operations. So, I would expect data to appear like rows in a
table, rather than in the columnar form shown above. Even if columnar
form must be dealt with, we can extend our Vector class (or whatever
abstraction you are using to enter vector space) to naturally zip() columns.

| Vector(zip(vector, sequence, items))
|     .map(lambda v, s, i: process(v+s)*i)    

If we let Vector represent a list of tuples instead of a list of values,
we can make construction simpler:

| Vector(vector, sequence, items)
|     .map(lambda v, s, i: process(v+s)*i)    

If we have zip() to extend the tuples in the Vector, then we can be
verbose to demonstrate how to use columnar data:

| Vector(vector)
| .zip(sequence)
| .map(operator.add)
| .map(process)
| .zip(items)
| .map(operator.mul)

This looks verbose, but it is not too far from the vectorized syntax:

the Vector() brings us to vector mode, and the two zip()s convert from
columnar form. This verbose form may be *better* than the vectorized
syntax because the operations are in order, rather than mixing the infix
and functional forms seen in the vectorized syntax.
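
To make the chained form concrete, here is a minimal sketch of such a Vector. It is only one possible reading of the idea: the class body, the choice to store every element as a tuple, and the `unwrap()` helper are assumptions made for illustration, and `process` is a stand-in function.

```python
import operator


class Vector:
    """Hypothetical row-oriented wrapper: every element is stored as a tuple."""

    def __init__(self, *columns):
        # Vector(a, b, c) zips the columns into rows.
        self._rows = list(zip(*columns))

    def zip(self, other):
        # Extend each row with the matching value from another column.
        out = Vector()
        out._rows = [row + (x,) for row, x in zip(self._rows, other)]
        return out

    def map(self, func):
        # Call func with the row unpacked; the result becomes a 1-tuple row.
        out = Vector()
        out._rows = [(func(*row),) for row in self._rows]
        return out

    def unwrap(self):
        # Return plain values for single-column rows, tuples otherwise.
        return [row[0] if len(row) == 1 else row for row in self._rows]


def process(x):
    # Stand-in for the `process` function used in the thread.
    return x + 1


vector, sequence, items = [1, 2, 3], [10, 20, 30], [2, 3, 4]

result = (
    Vector(vector)
    .zip(sequence)
    .map(operator.add)
    .map(process)
    .zip(items)
    .map(operator.mul)
    .unwrap()
)
```

Note that in real Python the chain needs enclosing parentheses (or backslashes) to span several lines, and `result` matches the comprehension `[process(v+s)*i for v, s, i in zip(vector, sequence, items)]`.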

I suggest this discussion include vector operations on (frozen)
dicts/objects and (frozen) lists/tuples.  Then we can have an
interesting discussion about the meaning of group_by, join, and window
functions, plus other operations we find in database query languages.

I am interested in vector operations.  I have situations where I want to
perform some conceptually simple operations on a series of
not-defined-by-me objects to make a series of conclusions.  The
calculations can be done succinctly in SQL, but Python makes them
difficult. Right now, my solution is to describe the transformations in
JSON, and have an interpreter do the processing:

https://github.com/klahnakoski/SpotManager/blob/65f2c5743f3a9cfd1363cafec258c0a663e194c3/spot/spot_manager.py#L611





Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-09 Thread Stephen J. Turnbull
Christopher Barker writes:

 > well, vectorization is kinda the *opposite* of matrix multiplication --
 > matrix multiplication is treating the matrix as a whole,

When I think of treating the matrix as a whole, I think of linear
algebra.  Matrix multiplication is repeated application of the inner
product, which is in turn a sum over vectorized multiplication.  I
share David's intuition about this, although it might not be the
common one.
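
That decomposition can be written out directly. The following is a small pure-Python sketch (the function names are made up here): elementwise multiplication, summed to give an inner product, repeated to give a matrix product.

```python
def elementwise_mul(u, v):
    # The "vectorized" multiplication: one product per pair of elements.
    return [a * b for a, b in zip(u, v)]


def inner(u, v):
    # Inner product: a sum over the vectorized multiplication.
    return sum(elementwise_mul(u, v))


def matmul(A, B):
    # Matrix product: the inner product of every row of A with every column of B.
    columns = list(zip(*B))
    return [[inner(row, col) for col in columns] for row in A]
```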

Steve



Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-08 Thread Marcos Eliziario
Just a quick idea. Wouldn't an arrow operator -> be less of an eyesore?


-- 
Marcos Eliziário Santos
mobile/whatsapp/telegram: +55(21) 9-8027-0156
skype: marcos.elizia...@gmail.com
linked-in : https://www.linkedin.com/in/eliziario/


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-08 Thread James Lu
Has anyone thought about my proposal yet? I think it is worth considering
because it allows chained function calls to be stored, which is probably a
common need; I imagine people turning the same series of chained functions
into a lambda of its own once it's used more than once in a program.

Arguably, the lambda syntax is more readable and imposes less visual burden.



Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-08 Thread David Mertz
On Fri, Feb 8, 2019 at 3:17 PM Christopher Barker 
wrote:

> >vec_seq = Vector(seq)
> >(vec_seq * 2).name.upper()
> ># ... bunch more stuff
> >seq = vec_seq.unwrap()
>
> what type would .unwrap() return?
>

The idea—and the current toy implementation/alpha—has .unwrap return
whatever type went into the Vector creation.  Might be a tuple, list, set,
deque, or it might be an iterator.  It might even be some custom collection
that isn't in the standard library.

But you can also explicitly make a Vector into something else by using that
constructor.  Pretty much the example I gave before:

set(Vector(a_list))      # Get a set
Vector(a_list).unwrap()  # Get a list (without needing to know the type
                         # to call .unwrap())

-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-08 Thread Christopher Barker
On Thu, Feb 7, 2019 at 4:27 PM David Mertz  wrote:

> Actually, if I wanted an operator, I think that @ is more intuitive than
> extra dots.  Vectorization isn't matrix multiplication, but they are sort
> of in the same ballpark, so the iconography is not ruined.

Well, vectorization is kinda the *opposite* of matrix multiplication --
matrix multiplication is treating the matrix as a whole, rather than
applying multiplication to each element. And it is certainly the opposite
in the numpy case.

Which gives me an idea -- we could make an object that applied operators
(and methods??) to each element individually, and use the @ operator when
you wanted the method to act on the whole object instead.

Note: I haven't thought about the details at all -- may not be practical to
use an operator for that.

>(Vec(seq) * 2).name.upper()

> Or:

>vec_seq = Vector(seq)
>(vec_seq * 2).name.upper()
># ... bunch more stuff
>seq = vec_seq.unwrap()

what type would .unwrap() return?

One of the strengths of the "operator" approach is that it could apply to
any (appropriately mutable) sequence and keep that sequence. I'm not sure
how much that actually matters, as I'm expecting this is a 99% list case
anyway.

and why would .unwrap() be required at all -- as opposed to say:

seq = list(vec_seq)

> I'm not saying the double dots are terrible, but they don't read *better*
> than wrapping (and optionally unwrapping) to me.

nor to me.

> Well... your maps are kinda deliberately ugly.

That's actually pretty key -- in fact, if you wanted to apply a handful of
operations to each item in a sequence, you would probably use a single
expression (if possible) in a lambda in a map, or in a comprehension,
rather than chaining the maps.

Even if it was more complex, you could write a function, and then apply
that with a map or comprehension.

In the numpy case, compare:

c = sqrt(a**2 + b**2)

to

c = [sqrt(a**2 + b**2) for a,b in zip(a,b)]

so still a single comprehension. But:

1) given the familiarity of math expressions -- the first really does read a
LOT better
2) the first version can be better optimized (by numpy)

So the questions become:

* For other than math with numbers (which we have numpy for), are there use
cases where we'd really get that much extra clarity?

* Could we better optimize, say, a sequence of strings enough to make it
all worth it?

-CHB


-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-07 Thread David Mertz
On Thu, Feb 7, 2019 at 6:48 PM Steven D'Aprano  wrote:

> I'm sorry, I did not see your comment that you thought new syntax was a
> bad idea. If I had, I would have responded directly to that.
>

Well... I don't think it's the worst idea ever.  But adding more
operators is something I am generally wary about.  Plus there's the "grit
on Uncle Timmy's screen" test.

Actually, if I wanted an operator, I think that @ is more intuitive than
extra dots.  Vectorization isn't matrix multiplication, but they are sort
of in the same ballpark, so the iconography is not ruined.


> We can perform thought-experiments and
> we don't need anything but a text editor for that. As far as I'm
> concerned, the thought experiment of comparing these two snippets:
>
> ((seq .* 2)..name)..upper()
>
> versus
>
> map(str.upper, map(operator.attrgetter('name'), map(lambda a: a*2,
> seq)))
>

OK... now compare:

(Vec(seq) * 2).name.upper()

Or:

vec_seq = Vector(seq)
(vec_seq * 2).name.upper()
# ... bunch more stuff
seq = vec_seq.unwrap()

I'm not saying the double dots are terrible, but they don't read *better*
than wrapping (and optionally unwrapping) to me.

If we were to take @ as "vectorize", it might be:

(seq @* 2) @.name @.upper()

I don't hate that.

> demonstrates conclusively that even with the ugly double dot syntax,
> infix syntax easily and conclusively beats map.
>

Agreed.


> If I recall correctly, the three maps here were originally proposed by
> you as examples of why map() alone was sufficient and there was no
> benefit to the Julia syntax.


Well... your maps are kinda deliberately ugly.  Even in that direction, I'd
write:

map(lambda s: (s*2).name.upper(), seq)

I don't *love* that, but it's a lot less monstrous than what you wrote.  A
comprehension probably even better:

[(s*2).name.upper() for s in seq]

> Again, I apologise, I did not see where you said that this was intended
> as a proof-of-concept to experiment with the concept.
>

All happy.  Puppies and flowers.


> If the Vector class is only a proof of concept, then we surely don't
> need to care about moving things in and out of "vector mode". We can
> take it as a given that "the real thing" will work that way: the syntax
> will be duck-typed and work with any iterable,


Well... I at least moderately think that a wrapper class is BETTER than new
syntax. So I'd like the proof-of-concept to be at least moderately
functional.  In any case, there is ZERO code needed to move in/out of
"vector mode." The wrapped thing is simply an attribute of the object.
When we call vectorized methods, it's just `getattr(type(item), attr)` to
figure out the method in a duck-typed way.
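
For readers who have not seen the toy implementation, here is a minimal sketch of how such a wrapper could behave. It is not the actual alpha code: it uses plain `getattr(item, attr)` on each element rather than `getattr(type(item), attr)`, and the `Tag` element class is invented purely so that `* 2`, `.name`, and `.upper()` all have something to act on.

```python
class Vector:
    """Toy wrapper that applies operators, attributes, and calls per element."""

    def __init__(self, items):
        self._items = items  # the wrapped thing is simply an attribute

    def __mul__(self, other):
        return Vector([item * other for item in self._items])

    def __getattr__(self, attr):
        # Only reached when normal lookup fails; private names are not forwarded.
        if attr.startswith("_"):
            raise AttributeError(attr)
        return Vector([getattr(item, attr) for item in self._items])

    def __call__(self, *args, **kwargs):
        # After `.upper`, the elements are bound methods; call each one.
        return Vector([f(*args, **kwargs) for f in self._items])

    def unwrap(self):
        return self._items


class Tag:
    """Hypothetical element type with a name that supports `* n`."""

    def __init__(self, name):
        self.name = name

    def __mul__(self, n):
        return Tag(self.name * n)


seq = [Tag("ab"), Tag("c")]
shouted = (Vector(seq) * 2).name.upper().unwrap()
```

Here `shouted` holds the same values as `[(s*2).name.upper() for s in seq]`.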

> one of the things which led me to believe that you thought that a
> wrapper class was in and of itself a solution to the problem. If you had
> been proposing this Vector class as a viable working solution (or at
> least a first alpha version towards a viable solution) then worrying
> about round-tripping would be important.
>

Yes, I consider the Vector class a first alpha version of a viable
solution.  I haven't seen anything that makes me prefer new syntax.  I feel
like a wrapper makes it more clear that we are "living in vector land" for
a while.

The same is true for NumPy, in my mind.  Maybe it's just familiarity, but I
LIKE the fact that I know that when my object is an ndarray, operations are
going to be vectorized ones.  Maybe 15 years ago different decisions could
have been made, and some "vectorize this operation syntax" could have made
the ndarray structure just a behavior of lists instead.  But I think the
separation is nice.


> But as a proof-of-concept of the functionality, then:
>
> set( Vector(set_of_stuff) + spam )
> list( Vector(list_of_stuff) + spam )
>

That's fine.  But there's no harm in the class *remembering* what it wraps
either.  We might want to distinguish:

set(Vector(some_collection) + spam)        # Make it a set after the operations
(Vector(some_collection) + spam).unwrap()  # Recover whatever type it was before
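
A minimal sketch of that distinction, assuming a wrapper that records the type it was given (this is illustrative, not the actual implementation, and it only covers collection types that can be rebuilt from a list, so not iterators):

```python
class Vector:
    """Toy wrapper that remembers the type of the collection it wraps."""

    def __init__(self, collection):
        self._kind = type(collection)
        self._items = list(collection)

    def _derive(self, items):
        # Build a new Vector that keeps the original collection type.
        out = Vector(items)
        out._kind = self._kind
        return out

    def __add__(self, other):
        return self._derive(x + other for x in self._items)

    def __iter__(self):
        # Lets set(...), list(...), tuple(...) consume the Vector directly.
        return iter(self._items)

    def unwrap(self):
        # Recover whatever type went in.
        return self._kind(self._items)


as_set = set(Vector([1, 1, 2]) + 1)          # explicit conversion
as_tuple = (Vector((1, 2, 3)) + 10).unwrap()  # original type recovered
```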


> Why do you care about type uniformity or type-checking the contents of
> the iterable?
>

Because some people have said "I want my vector to be specifically a
*sequence of strings* not of other stuff"

And MAYBE there is some optimization to be had if we know we'll never have
a non-footype in the sequence (after all, NumPy is hella optimized).
That's why the `stringpy` name that someone suggested.  Maybe we'd bypass
most of the Python-land calls when we did the vectorized operations, but
ONLY if we assume type uniformity.

But yes, I generally care about duck-typing only.


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-07 Thread Steven D'Aprano
On Thu, Feb 07, 2019 at 03:17:18PM -0500, David Mertz wrote:
> Many apologies if people got one or more encrypted versions of this.
> 
> On 2/7/19 12:13 AM, Steven D'Aprano wrote:
> 
> > It wasn't a concrete proposal, just food for thought. Unfortunately the
> > thinking seems to have missed the point of the Julia syntax and run off
> > with the idea of a wrapper class.
> 
> I did not miss the point! I think adding new syntax à la Julia is a bad
> idea—or at very least, not something we can experiment with today (and
> wrote as much).

I'm sorry, I did not see your comment that you thought new syntax was a 
bad idea. If I had, I would have responded directly to that.

Why is it an overtly *bad* (i.e. harmful) idea? As opposed to merely 
not sufficiently useful, or unnecessary?

You're certainly right that we can't easily experiment in the 
interpreter with new syntax, but we can perform thought-experiments and 
we don't need anything but a text editor for that. As far as I'm 
concerned, the thought experiment of comparing these two snippets:

((seq .* 2)..name)..upper()

versus

map(str.upper, map(operator.attrgetter('name'), map(lambda a: a*2, seq)))

demonstrates conclusively that even with the ugly double dot syntax, 
infix syntax easily and conclusively beats map.

If I recall correctly, the three maps here were originally proposed by 
you as examples of why map() alone was sufficient and there was no 
benefit to the Julia syntax. I suggested composing them together as a 
single operation instead of considering them in isolation.
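
For anyone who wants to run the map half of that comparison today, here is a minimal sketch. The `Item` class is purely hypothetical: it is just something that supports `* 2` and has a `.name` attribute, which is all the snippet above assumes about the elements of `seq`.

```python
import operator


class Item:
    """Hypothetical element type: supports `* 2` and carries a `.name` string."""

    def __init__(self, name):
        self.name = name

    def __mul__(self, n):
        return Item(self.name * n)


seq = [Item("ab"), Item("c")]

# The nested-map spelling: read it inside-out (double, get name, upper-case).
via_map = list(
    map(str.upper, map(operator.attrgetter("name"), map(lambda a: a * 2, seq)))
)

# The same pipeline as one comprehension, read left to right per element.
via_comp = [(a * 2).name.upper() for a in seq]
```

Both spellings give the same list; the comparison is only about which one is easier to read.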


> Therefore, something we CAN think about and experiment with today is a
> wrapper class.

Again, I apologise, I did not see where you said that this was intended 
as a proof-of-concept to experiment with the concept.


[...]
> One of the principles I had in mind in my demonstration is that I want
> to wrap the original collection type (or keep it an iterator if it
> started as one).  A number of other ideas here, whether for built-in
> syntax or different behaviors of a wrapper, effectively always reduce
> every sequence to a list under the hood.  This makes my approach less
> intrusive to move things in and out of "vector mode."  For example:

If the Vector class is only a proof of concept, then we surely don't 
need to care about moving things in and out of "vector mode". We can 
take it as a given that "the real thing" will work that way: the syntax 
will be duck-typed and work with any iterable, and there will not be any 
actual wrapper class involved and consequently no need to move things in 
and out of the wrapper.

I had taken note of this functionality of the class before, and that was 
one of the things which led me to believe that you thought that a 
wrapper class was in and of itself a solution to the problem. If you had 
been proposing this Vector class as a viable working solution (or at 
least a first alpha version towards a viable solution) then worrying 
about round-tripping would be important.

But as a proof-of-concept of the functionality, then:

set( Vector(set_of_stuff) + spam )
list( Vector(list_of_stuff) + spam )

should be enough to play around with the concept.



[...]
> Inasmuch as I want to handle iterator here, it is impossible to do any
> type check upon creating a Vector.  For concrete
> `collections.abc.Sequence` objects we could check, in principle.  But
> I'd rather it be "we're all adults here" ... or at most provide some
> `check_type_uniformity()` function or method that had to be called
> explicitly.

Why do you care about type uniformity or type-checking the contents of 
the iterable?

Comments like this suggest to me that you haven't understood the 
idea as I have tried to explain it. I'm sorry that I have failed to 
explain it better.

Julia is (if I understand correctly) statically typed, and that allows 
it to produce efficient machine code because it knows that it is 
iterating over (let's say) an array of 32-bit ints.

While that might be important for the efficiency of the generated 
machine code, that's not important for the semantic meaning of the code. 
In Python, we duck-type and resolve operations at runtime. We don't 
typically validate types in advance:

for x in sequence:
    if not isinstance(x, Spam):
        raise TypeError('not Spam')
for x in sequence:
    process(x)

(except under unusual circumstances). More to the point, when we write a 
for-loop:

result = []
for a_string in seq:
    result.append(a_string.upper())

we don't expect that the interpreter will validate that the sequence 
contains nothing but strings in advance. So if I write this using Julia 
syntax:

result = seq..upper()

I shouldn't expect the interpreter to check that seq contains nothing but 
strings either.
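
A minimal sketch of that expectation in today's Python, with an ordinary function standing in for the hypothetical `seq..upper()` syntax. The names `dotted_upper` and `Shouty` are made up for illustration. Nothing is validated up front: any element with an `.upper()` method is accepted, and a bad element only fails when the loop reaches it.

```python
def dotted_upper(seq):
    """Stand-in for the hypothetical `seq..upper()`: no up-front type check."""
    return [x.upper() for x in seq]


class Shouty:
    """Not a str, but it has .upper(), so duck typing accepts it."""

    def upper(self):
        return "SHOUTY"


ok = dotted_upper(["a", Shouty(), "b"])

# A non-string-like element fails at the point of use, exactly as the
# equivalent for-loop would.
try:
    dotted_upper(["a", 1, "b"])
except AttributeError as exc:
    error = str(exc)
```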



-- 
Steven

Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-07 Thread David Mertz
Many apologies if people got one or more encrypted versions of this.

On 2/7/19 12:13 AM, Steven D'Aprano wrote:

> It wasn't a concrete proposal, just food for thought. Unfortunately the
> thinking seems to have missed the point of the Julia syntax and run off
> with the idea of a wrapper class.

I did not miss the point! I think adding new syntax à la Julia is a bad
idea—or at very least, not something we can experiment with today (and
wrote as much).

Therefore, something we CAN think about and experiment with today is a
wrapper class.  This approach is pretty much exactly the same thing I
tried in a discussion of PEP 505 a while back (None-aware operators).
In the same vein as that—where I happen to dislike PEP 505 pretty
strongly—one approach to simulate or avoid new syntax is precisely to
use a wrapper class.

As a footnote, I think my demonstration of PEP 505 got derailed by lots
of comments along the lines of "Your current toy library gets the
semantics of the proposed new syntax wrong in these edge cases."  Those
comments were true (and I think I didn't fix all the issues since my
interest faded with the active thread)... but none of them were
impossible to fix, just small errors I had made.

With my *very toy* stringpy.Vector class, I'm just experimenting with
usage ideas.  I have shown a number of uses that I think could be useful
to capture most or all of what folks want in "string vectorization."
Most of what I've put in this list is what the little module does
already, but some is just ideas for what it might do if I add the code
(or someone else makes a PR at https://github.com/DavidMertz/stringpy).

One of the principles I had in mind in my demonstration is that I want
to wrap the original collection type (or keep it an iterator if it
started as one).  A number of other ideas here, whether for built-in
syntax or different behaviors of a wrapper, effectively always reduce
every sequence to a list under the hood.  This makes my approach less
intrusive to move things in and out of "vector mode."  For example:

  v1 = Vector(set_of_strings)
  set_of_strings = v1.lower().apply(my_str_fun)._it  # Get a set back
  v2 = Vector(list_of_strings)
  list_of_strings = v2.lower().apply(my_str_fun)._it # Get a list back
  v3 = Vector(deque_of_strings)
  deque_of_strings = v3.lower().apply(my_str_fun)._it # Get a deque back
  v4 = Vector(iter_of_strings)
  iter_of_strings = v4.lower().apply(my_str_fun)._it  # stays lazy!

So this is round-tripping through vector-land.

Small note: I use the attribute `._it` to store the "sequential thing."
 That feels internal, so maybe some better way of spelling "get the
wrapped thing" would be desirable.
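
To make the round-tripping concrete, here is a rough sketch of how such a wrapper could keep the original collection type. This is not the actual stringpy code, only an illustration under stated assumptions: the wrapped collection's constructor accepts an iterable (true for `list`, `set` and `deque`), and `.unwrap()` is used as one possible public spelling of `._it`.

```python
from collections import deque
from collections.abc import Iterator


class Vector:
    """Toy wrapper for illustration; NOT the stringpy implementation."""

    def __init__(self, it):
        self._it = it

    def _rewrap(self, results):
        # Iterators stay lazy; concrete collections are rebuilt as their own type.
        if isinstance(self._it, Iterator):
            return Vector(results)
        return Vector(type(self._it)(results))

    def apply(self, fn, *args, **kwargs):
        return self._rewrap(fn(x, *args, **kwargs) for x in self._it)

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)

        # Any other unknown attribute becomes a vectorized method call.
        def method(*args, **kwargs):
            return self._rewrap(getattr(x, name)(*args, **kwargs) for x in self._it)

        return method

    def unwrap(self):
        return self._it


as_set = Vector({"Ab", "cD"}).lower().apply(str.strip).unwrap()
as_list = Vector(["Ab", "cD"]).lower().unwrap()
as_deque = Vector(deque(["Ab"])).lower().unwrap()
lazy = Vector(iter(["Ab", "cD"])).lower().unwrap()  # nothing computed yet
```

Mappings and other collections whose constructors need something other than a plain iterable of items would need special handling that this sketch leaves out.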

> I've also lost track of whether anyone is proposing a "vector of strings"
> as opposed to a vector of arbitrary objects.

Nothing I wrote is actually string-specific.  That is just the main use
case stated.  My `stringpy.Vector` might be misnamed in that it is happy
to contain any kind of items.  But we hope they are all items with the
particular methods we want to vectorize.  I showed an example where a
list might contain a custom string-like object that happens to have
methods like `.lower()` as an illustration.

Inasmuch as I want to handle iterators here, it is impossible to do any
type check upon creating a Vector.  For concrete
`collections.abc.Sequence` objects we could check, in principle.  But
I'd rather it be "we're all adults here" ... or at most provide some
`check_type_uniformity()` function or method that had to be called
explicitly.


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-07 Thread James Lu
Here are some alternate syntaxes.

These are all equivalent to print(len(list)).

(len | print)(list)
(len |> print)(list)
(print <| len)(list)
print <| len << list
list >> print <| len
list >> len |> print


## Traditional argument order 
print <| len << list

## Stored functions 
print_lengths = len | print
print_lengths = len |> print
print_lengths = print <| len

These can be called using callable syntax.
These can be called using << syntax.
These can be called using >> syntax.
## Lightweight traditional syntax order
(print | len)()

# Explanation
The pipeline operators (|, |>, <|) create an object.

That object implements, depending on the chosen implementation, some 
combination of the __call__ operator, the __rshift__ operator, and/or the 
__lshift__ operator.
—
I am not proposing Python has all these operators at the same time, just 
putting these ideas out there for discussion. 
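
A rough sketch of what such an object could look like in current Python, using only the `|`, `>>` and `<<` spellings (`|>` and `<|` are not valid Python tokens today). The `Pipe` class is hypothetical. Plain functions do not define `|`, so this sketch assumes the first stage is wrapped explicitly, and it uses `str` instead of `print` as the second stage so that the result can be inspected.

```python
class Pipe:
    """Hypothetical pipeline object; `|` chains stages left to right."""

    def __init__(self, *funcs):
        self.funcs = funcs

    def __or__(self, other):
        # Pipe | callable  ->  a longer pipeline
        extra = other.funcs if isinstance(other, Pipe) else (other,)
        return Pipe(*self.funcs, *extra)

    def __call__(self, value):
        for f in self.funcs:
            value = f(value)
        return value

    def __rrshift__(self, value):
        # value >> pipe  ->  pipe(value)
        return self(value)

    def __lshift__(self, value):
        # pipe << value  ->  pipe(value)
        return self(value)


length_text = Pipe(len) | str

called = length_text([1, 2, 3])
shifted_right = [1, 2, 3] >> length_text
shifted_left = length_text << [1, 2, 3]
```

All three call styles give the same result; which of them to support is exactly the design question raised above.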


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-07 Thread MRAB

On 2019-02-07 05:27, Chris Angelico wrote:
> On Thu, Feb 7, 2019 at 4:03 PM Steven D'Aprano wrote:
>> At the risk of causing confusion^1, we could have a "vector call"
>> syntax:
>>
>> # apply len to each element of obj, instead of obj itself
>> len[obj]
>>
>> which has the advantage that it only requires that we give functions a
>> __getitem__ method, rather than adding new syntax. But it has the
>> disadvantage that it doesn't generalise to operators, without which I
>> don't think this is worth bothering with.
>
> Generalizing to operators is definitely going to require new syntax,
> since both operands can be arbitrary objects. So if that's essential
> to the idea, we can instantly reject anything that's based on
> functions (like "make multiplying a function by a tuple equivalent to
> blah blah blah"). In that case, we come straight to a few key
> questions:
>
> 1) Is this feature even worth adding syntax for? (My thinking: "quite
> possibly", based on matmul's success despite having an even narrower
> field of use than this.)
>
> 2) Should it create a list? a generator? something that depends on the
> type of the operand? (Me: "no idea")
>
> 3) Does the Julia-like "x." syntax pass the grit test? (My answer: "nope")
>
> 4) If not, what syntax would be more appropriate?
>
> This is a general purpose feature akin to comprehensions (and, in
> fact, can be used in place of some annoyingly-verbose comprehensions).
> It needs to be easy to type and read.
>
> Pike's automap syntax is to subscript an array with [*], implying
> "subscript this with every possible value". It's great if you want to
> do just one simple thing:
>
> f(stuff[*])
> # [f(x) for x in stuff]
> stuff[*][1]
> # [x[1] for x in stuff]
>
> but clunky for chained operations:
>
> (f(stuff[*])[*] * 3)[*] + 1
> # [f(x) * 3 + 1 for x in stuff]
>
> That might not be a problem in Python, since you can always just use a
> comprehension if vectorized application doesn't suit you.
>
> I kinda like the idea, but the devil's in the details.

Would it be possible, at compile time, to retain it as an automap 
throughout the expression?


stuff[*]
# [x for x in stuff]

f(stuff[*])
# [f(x) for x in stuff]

(f(stuff[*]) * 3) + 1
# [f(x) * 3 + 1 for x in stuff]

There could also be a way to 'collapse' it again. An uncollapsed automap 
would be collapsed at the end of the expression. (Still a bit fuzzy 
about the details...)



Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-06 Thread Chris Angelico
On Thu, Feb 7, 2019 at 4:03 PM Steven D'Aprano  wrote:
> At the risk of causing confusion^1, we could have a "vector call"
> syntax:
>
> # apply len to each element of obj, instead of obj itself
> len[obj]
>
> which has the advantage that it only requires that we give functions a
> __getitem__ method, rather than adding new syntax. But it has the
> disadvantage that it doesn't generalise to operators, without which I
> don't think this is worth bothering with.

Generalizing to operators is definitely going to require new syntax,
since both operands can be arbitrary objects. So if that's essential
to the idea, we can instantly reject anything that's based on
functions (like "make multiplying a function by a tuple equivalent to
blah blah blah"). In that case, we come straight to a few key
questions:

1) Is this feature even worth adding syntax for? (My thinking: "quite
possibly", based on matmul's success despite having an even narrower
field of use than this.)

2) Should it create a list? a generator? something that depends on the
type of the operand? (Me: "no idea")

3) Does the Julia-like "x." syntax pass the grit test? (My answer: "nope")

4) If not, what syntax would be more appropriate?

This is a general purpose feature akin to comprehensions (and, in
fact, can be used in place of some annoyingly-verbose comprehensions).
It needs to be easy to type and read.

Pike's automap syntax is to subscript an array with [*], implying
"subscript this with every possible value". It's great if you want to
do just one simple thing:

f(stuff[*])
# [f(x) for x in stuff]
stuff[*][1]
# [x[1] for x in stuff]

but clunky for chained operations:

(f(stuff[*])[*] * 3)[*] + 1
# [f(x) * 3 + 1 for x in stuff]

That might not be a problem in Python, since you can always just use a
comprehension if vectorized application doesn't suit you.

I kinda like the idea, but the devil's in the details.

ChrisA


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-06 Thread Steven D'Aprano
On Sun, Feb 03, 2019 at 09:46:44PM -0800, Christopher Barker wrote:

> I've lost track if who is advocating what, but:

Ironically, I started this sub-thread in response to your complaint that 
you didn't like having to explicitly write loops/maps. So I pointed out 
that in Julia, people can use (almost) ordinary infix syntax using 
operators and function calls and have it apply automatically to each 
item in arrays.

It wasn't a concrete proposal, just food for thought. Unfortunately the 
thinking seems to have missed the point of the Julia syntax and run off 
with the idea of a wrapper class.

[...]
> I do not get the point of this at all -- we already have map:
>
> map(v, lambda s: s.replace("a", "b"))

The order of arguments is the other way around. And you did say you 
didn't like map. Wouldn't you rather write:

items.replace("a", "b")

rather than 

map(lambda s: s.replace("a", "b"), items)

or

[s.replace("a", "b") for s in items]


I know I would. Provided of course we could distinguish between 
operations which apply to a single string, and those which apply to a 
generic collection of strings.

Besides, while a single map or comprehension is bearable, more complex 
operations are horrible to read when written that way, but trivially 
easy to read when written in standard infix arithmetic notation. See my 
earlier posts for examples.


> > v.replace("a", "b")
> >
> 
> This is adding something - maybe just compactness, but I also think
> readability.

Indeed. In Julia that also offers opportunities for the compiler to 
optimize the code, bringing it to within 10% or so of a C loop. 
Maybe PyPy could get there as well, but CPython probably can't.

> I've also lost track of whether anyone is proposing a "vector of strings"
> as opposed to a vector of arbitrary objects.

Not me.



-- 
Steven


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-06 Thread Steven D'Aprano
Before I respond to a specific point below, I'd like to make a general 
observation.

I changed the subject line of this sub-thread to discuss a feature of 
Julia, which allows one to write vectorized code in standard infix 
arithmetic notation, that applies to any array type, using any existing 
function or operator, WITHOUT having to wrap your data in a special 
delegate class like this "Vector".

So as far as I'm concerned, this entire discussion about this wrapper 
class misses the point.

(Aside: why is this class called "Vector" when it doesn't implement a 
vector?)

Anyway, on to my response to a specific point:


On Mon, Feb 04, 2019 at 11:12:08AM -0500, David Mertz wrote:
> On Mon, Feb 4, 2019 at 7:14 AM Kirill Balunov 
> wrote:
> 
> > len(v)   # -> 12
> >
> > v[len]   # -> 
> >
> >
> > In this case you can apply any function, even custom_linked_list from
> > my_inhouse_module.py.
> >
> 
> I think I really like this idea.  Maybe as an extra spelling but still
> allow .apply() to do the same thing. It feels reasonably intuitive to me.
> Not *identical to* indexing in NumPy and Pandas, but sort of in the same
> spirit as predicative or selection based indices.
> 
> What do other people on this thread think? Would you learn that easily?

obj[len] already has an established meaning as obj.__getitem__(len). 
There's going to be a clash here between key lookup and applying a 
function:

obj[len]  # look up key=len
obj[len]  # apply function len

Mathematica does use square brackets for calling functions, but it uses 
ordinary arithmetic order len[obj] rather than postfix order obj[len].

At the risk of causing confusion^1, we could have a "vector call" 
syntax:

# apply len to each element of obj, instead of obj itself
len[obj]

which has the advantage that it only requires that we give functions a 
__getitem__ method, rather than adding new syntax. But it has the 
disadvantage that it doesn't generalise to operators, without which I 
don't think this is worth bothering with.
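
For illustration only, the `__getitem__` half of this can be tried today with a small wrapper class, since built-in functions cannot be given new methods directly. The `vectorcall` name is made up, and returning a list is an arbitrary choice for the sketch.

```python
class vectorcall:
    """Hypothetical wrapper: f(x) calls normally, f[xs] maps f over xs."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        # Round brackets: an ordinary call on the object itself.
        return self.func(*args, **kwargs)

    def __getitem__(self, iterable):
        # Square brackets: apply the function to each element instead.
        return [self.func(x) for x in iterable]


vlen = vectorcall(len)

whole = vlen(["a", "bb", ""])      # length of the list itself
each = vlen[["a", "bb", ""]]       # length of each element
```

As noted, this covers function calls only; it says nothing about operators such as `+` or `*`.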




^1 Cue a thousand Stackoverflow posts asking whether they should use 
round brackets or square when calling a function, and why they get weird 
error messages sometimes and not other times.



-- 
Steven


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-05 Thread Jimmy Girardet
Hi,

I'm not sure I understand the real purpose of Vector.

Is that a new collection ?

Is that a list with a builtin map() function ?

Is it a  wrapper to other types ?

Should it be iterable ?


The clear need explained before is using a fluent interface on a collection:

MyVector.strip().replace("A","E")

Why do we need Vector to behave like a list? We just want to work on our
strings but with a cleaner/shorter/nicer syntax.

My idea (not totally clear in my mind) is that Vector should behave
much like the type it wraps, and so hold only one type.

I don't want a collection of strings, I want a MegaString (...) which I
can use exactly like a single string.

Iterating over a Vector would work like itertools.chain does.

At the end, I would only need one more method which would return an
iterable of the items like MyVector.explode()


For me Vector should be something like that :

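class Vector:

    def __init__(self, a_list):
        self.data = a_list
        # The first element fixes the expected type (assumes a non-empty list).
        self._type = type(self.data[0])

        for data in self.data:
            if type(data) is not self._type:
                raise TypeError

    def __getattr__(self, name):
        # Look the method up on the element type, e.g. str.strip.
        fn = getattr(self._type, name)

        def wrapped(*args, **kwargs):
            # Apply the method to every item, then return self for chaining.
            self.data = [fn(i, *args, **kwargs) for i in self.data]
            return self
        return wrapped

    def explode(self):
        return iter(self.data)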


I'm not saying it should only handle strings but it seems to be the
major use case.

Jimmy


On 04/02/2019 at 17:12, David Mertz wrote:
> On Mon, Feb 4, 2019 at 7:14 AM Kirill Balunov  > wrote:
>
> len(v)   # -> 12
>
> v[len]   # -> 
>
>
> In this case you can apply any function,
> even custom_linked_list from my_inhouse_module.py. 
>
>
> I think I really like this idea.  Maybe as an extra spelling but still
> allow .apply() to do the same thing. It feels reasonably intuitive to
> me. Not *identical to* indexing in NumPy and Pandas, but sort of in
> the same spirit as predicative or selection based indices.
>
> What do other people on this thread think? Would you learn that
> easily? Could you teach it?
>  
>
> >>> v[1:]  
>  'Sep', 'Oct', 'Nov', 'Dec']>  
> >>> v[i[1:]] # some helper class `i`
>  'ep', 'ct', 'ov', 'ec']>  
>
>
> This feels more forced, unfortunately.  Something short would be good,
> but not sure I like this.  This is really just a short spelling of
> pandas.IndexSlice or numpy.s_  It came up in another thread some
> months ago, but there is another proposal to allow the obvious
> spelling `slice[start:stop:sep]` as a way of creating slices.
>
> Actually, I guess that's all halfway for the above.  We'd need to do
> this still:
>
> v[itemgetter(IndexSlicer[1:])]
>
>  
> That's way too noisy.  I guess I just don't find the lowercase `i` to
> be iconic enough.  I think with a better SHORT name, I'd like:
>
> v[Item[1:]]
>
>
>  Maybe that's not the name?
>



Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-04 Thread David Mertz
On Mon, Feb 4, 2019 at 7:14 AM Kirill Balunov 
wrote:

> len(v)   # -> 12
>
> v[len]   # -> 
>
>
> In this case you can apply any function, even custom_linked_list from
> my_inhouse_module.py.
>

I think I really like this idea.  Maybe as an extra spelling but still
allow .apply() to do the same thing. It feels reasonably intuitive to me.
Not *identical to* indexing in NumPy and Pandas, but sort of in the same
spirit as predicative or selection based indices.

What do other people on this thread think? Would you learn that easily?
Could you teach it?


> >>> v[1:]
>  'Nov', 'Dec']>
> >>> v[i[1:]] # some helper class `i`
>  'ov', 'ec']>
>
>
This feels more forced, unfortunately.  Something short would be good, but
I'm not sure I like this.  This is really just a short spelling of
pandas.IndexSlice or numpy.s_.  It came up in another thread some months
ago, but there is another proposal to allow the obvious spelling
`slice[start:stop:step]` as a way of creating slices.

Actually, I guess that only gets us halfway to the above.  We'd still need
to do this:

v[itemgetter(IndexSlicer[1:])]


That's way too noisy.  I guess I just don't find the lowercase `i` to be
iconic enough.  I think with a better SHORT name, I'd like:

v[Item[1:]]


 Maybe that's not the name?

-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-04 Thread Kirill Balunov
On Sun, 3 Feb 2019 at 21:23, David Mertz wrote:

>
> I think the principled thing to do here is add the minimal number of
> methods to Vector itself, and have everything else pass through as
> vectorized calls.  Most of that minimal number are "magic method":
> __len__(), __contains__(), __str__(), __repr__(), __iter__(),
> __reversed__().  I might have forgotten a couple.  All of those should not
> be called directly, normally, but act as magic for operators or built-in
> functions.
>
> I think I should then create regular methods of the same name that perform
> the vectorized version.  So we would have:
>
> len(v)   # -> 12
>
> v.len()  # -> 
>
> list(v)  # -> ["Jan", "Feb", "Mar", "Apr", "May", "Jul" ...]
> v.list() # -> 
>
>
>
Hi David! Thank you for taking the time to implement this idea. Sorry, I'm
on a trip now and can't try it. From what I've read in this thread, I think
I mostly agree with your view of how the vector should work: that
`len(v)  # -> 12` and that a `.some_method()` call must apply to the elements
(although pedants may argue that in this case there is not much difference).
The only point that I don't like is `v.len()`, `v.list()` and so on, for the
same reasons: in general this will not work. I also don't like the option
with `.apply`: what if an `.apply` method is already defined for the elements
of the vector?


> I can't implement every single constructor that users might
> conceivably want, of course, but I can do it for the basic types in
> builtins and common standard library.  E.g. I might do:
>
> v.deque() # ->  ... >
>
>
> But I certainly won't manually add:
>
> v.custom_linked_list()  # From my_inhouse_module.py
>
>
> Hmm... maybe even I could look at names of maybe-constructors in the
> current namespace and try them.  That starts to feel too magic.  Falling
> back to this feels better:
>
> map(custom_linked_list, v)  # From my_inhouse_module.py
>
>
>
Here are my thoughts on this. At first I thought that for these purposes it
would be possible to use __call__:

len(v)   # -> 12

v(len)   # -> 


But somehow this idea did not sit right with me. Then I found another way,
and I think I even like it: reuse `__getitem__`, so that when its argument
is a function, that function is applied to every element in the
vector.

len(v)   # -> 12

v[len]   # -> 
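
A minimal sketch of that `__getitem__` dispatch, assuming a toy list-backed `Vector` (this is not the stringpy class, and the repr format is a guess). A callable key is mapped over the elements, a slice applies to the vector itself, and anything else is ordinary indexing.

```python
class Vector:
    """Toy list-backed vector, for illustration only."""

    def __init__(self, items):
        self._it = list(items)

    def __len__(self):
        return len(self._it)

    def __getitem__(self, key):
        if callable(key):
            # v[func] -> apply func to every element
            return Vector(key(x) for x in self._it)
        if isinstance(key, slice):
            # v[1:] -> slice the vector itself
            return Vector(self._it[key])
        # v[0] -> plain element access
        return self._it[key]

    def __repr__(self):
        return f"<Vector of {self._it!r}>"


v = Vector(["Jan", "Feb", "Mar"])
lengths = v[len]
tail = v[1:]
```

The dispatch relies on `callable(key)`, so it only works while no legitimate index or key is itself callable.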


In this case you can apply any function, even custom_linked_list from
my_inhouse_module.py. From this thread I did not understand what the desired
behavior is for operations like `vector + 1` and the others. Also, what
is the desired behaviour for `vector[1:5]`? Considering the above, I would
like to treat this operation the other way around:

>>> v

>>> v[1:]

>>> v[i[1:]] # some helper class `i`


With kind regards,
-gdg


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-04 Thread David Mertz
On Mon, Feb 4, 2019, 12:47 AM Christopher Barker

> I've lost track if who is advocating what, but:
>

Well, I made a toy implementation of a Vector class. I'm not sure what that
means I advocate other than the existence of a module on GitHub.

FWIW, I called the repo 'stringpy' as a start, so that expressed some
interest in it being about vectors of strings. But so far, I haven't found
anything that actually needs to be string-like.  In general, methods get
passed through to their underlying objects and deliberately duck typed,
like:

v.replace("a", "b")
>>
>
As an extra, we could enforce homogeneity, or even string-ness
specifically. I don't really know what homogeneity means though, once we
consider ABCs, subclasses, and duck types that don't use inheritance or
ABC registration. At least so far, I haven't coded anything that would get
a performance gain from enforcing the string-ness of items (but all pure
Python so far, no Cython or C).

> This is adding something - maybe just compactness, but I also think
> readability.
>

I think with chained methods the win gets greater:

v.replace("a", "b").upper().apply(myfun)

> If you want to do any generic items, it becomes a lot harder.
>

So far, generic has been a lot easier to code than hand-rolled methods.


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-03 Thread Christopher Barker
I've lost track of who is advocating what, but:

> >>> # Replace all "a" by "b"

> >>> v.apply(lambda s: s.replace("a", "b"))
>>
>
I do not get the point of this at all -- we already have map:

map(v, lambda s: s.replace("a", "b"))

these seem equally expressive and easy to me, and map doesn't require a
custom class or anything new at all.


> v.replace("a", "b")
>

This is adding something - maybe just compactness, but I also think
readability.

I've also lost track of whether anyone is proposing a "vector of strings"
as opposed to a vector of arbitrary objects.

I think a vector of strings could be useful and then it would be easier to
decide which string methods should be applied to items vs the vector as a
whole. If you want to do any generic items, it becomes a lot harder.

I think numpy has had the success it has because it assumes all dtypes are
numerical and thus support (mostly) the same operations.

-CHB


-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-03 Thread Christopher Barker
> I know, but if an element-wise operator is useful it would also be useful
> for libraries like NumPy that already support the @ operator for matrix
> multiplication.
>

A bit of history:

A fair amount of inspiration (or at least experience) for numpy came from
MATLAB.

MATLAB has essentially two complete sets of math operators: the regular
version, and the dot version.

A * B

Means matrix multiplication, and

A .* B

Means elementwise multiplication. And there is a full set of matrix and
elementwise operators.

Back in the day, Numeric (numpy’s predecessor) used the math operators for
elementwise operations, and doing matrix math was unwieldy. There was a lot
of discussion and a number of proposals for a full set of additional
operators in python that could be used for matrix operations (side note:
there was (is) a numpy.matrix class that defines __mul__ as matrix
multiplication).

Someone at some point realized that we didn’t need a full set, because
multiplication was really the only compelling use case. So the @ operator
was added.

End history.

Numpy, of course, is but one third party package, but it is an important
one — major inconsistency with it is a bad idea.

 -CHB

-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-03 Thread David Mertz
How would you spell these with funcoperators?

v.replace("a","b").upper().count("B")
vec1.replace("PLACEHOLDER", vec2)
concat = vec1 + vec2

On Sun, Feb 3, 2019, 6:40 PM Robert Vanden Eynde 
>
> On Sat, 2 Feb 2019, 21:46 Brendan Barnwell 
> Yeah, it's called pip install funcoperators :
>
>>  some_list @ str.lower @ tokenize @ remove_stopwords
>>
>
> → some_list @ to(str.lower) @ to(tokenize) @ to(remove_stopwords)
>
> Where from funcoperators import postfix as to


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-03 Thread David Mertz
On Sun, Feb 3, 2019, 6:36 PM Greg Ewing

> But they only cover the special case of a function that takes
> elements  from just one input vector. What about one that takes
> corresponding elements from two or more vectors?
>

What syntax would you like? Not necessarily new syntax per se, but what
calling convention.

I can think of a few useful cases.

vec1.replace("PLACEHOLDER", vec2)

Maybe that would transform one vector using the corresponding strings from
another vector.

What should happen if the vector lengths mismatch? I think this should
probably be an exception... unlike what zip() and itertools.zip_longest()
do. But maybe not.

concat = vec1 + vec2

Again the vector length question is there. But assuming the same length,
this seems like a reasonable way to get a new vector concatenating each
corresponding element.
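
The two calling conventions above could be sketched roughly as follows. This
is only an illustration under stated assumptions: the `Vector` class, its
`_paired` helper, and the choice of `ValueError` are all hypothetical and are
not taken from any implementation posted in this thread.

```python
class Vector:
    """Hypothetical sketch of a string vector; not code from the thread."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def __repr__(self):
        return f"<Vector of {self._items!r}>"

    def _paired(self, other):
        # Unlike zip(), refuse to silently truncate on a length mismatch.
        if len(self) != len(other):
            raise ValueError(f"length mismatch: {len(self)} != {len(other)}")
        return zip(self._items, other)

    def __add__(self, other):
        # Elementwise concatenation of corresponding items.
        if not isinstance(other, Vector):
            return NotImplemented
        return Vector(a + b for a, b in self._paired(other))

    def replace(self, old, new):
        # `new` is either one string for every item or a Vector of strings.
        if isinstance(new, Vector):
            return Vector(s.replace(old, n) for s, n in self._paired(new))
        return Vector(s.replace(old, new) for s in self._items)


vec1 = Vector(["Hello PLACEHOLDER", "Bye PLACEHOLDER"])
vec2 = Vector(["Ann", "Bob"])
replaced = vec1.replace("PLACEHOLDER", vec2)
concat = vec1 + vec2
```

Raising on a mismatch is only one of the options mentioned above; truncating
like zip() or padding like itertools.zip_longest() would need a different
`_paired`.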

Other uses? Are they different in general pattern?

>


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-03 Thread Robert Vanden Eynde
On Sat, 2 Feb 2019, 21:46 Brendan Barnwell wrote:

> some_list @ str.lower @ tokenize @ remove_stopwords
>

→ some_list @ to(str.lower) @ to(tokenize) @ to(remove_stopwords)

Where from funcoperators import postfix as to


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-03 Thread Greg Ewing

Adrien Ricocotam wrote:

 >>> # Compute the length of each element of the Vector `v`
 >>> v.apply(len)
 >>> v @ len

Another example with parameters
 >>> # Replace all "a" by "b"
 >>> v.apply(lambda s: s.replace("a", "b"))
 >>> v @ (lambda s: s.replace("a", "b"))

My personal opinion is that the two notations feel good.


But they only cover the special case of a function that takes
elements  from just one input vector. What about one that takes
corresponding elements from two or more vectors?

--
Greg


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-03 Thread MRAB

On 2019-02-03 22:58, David Mertz wrote:

 >>> len(v)  # Number of elements in the Vector `v`


Agreed, this should definitely be the behavior.  So how do we get a 
vector of lengths of each element?


 >>> # Compute the length of each element of the Vector `v`
 >>> v.apply(len)
 >>> v @ len


Also possible is:

     v.len()

We couldn't do that for every possible function, but this one is special 
inasmuch as we expect the items each to have a .__len__() but don't want 
to spell the dunders. Likewise for just a handful of other 
methods/functions.


The key difference, though, is that *I* would want a way to use both 
methods already attached to the objects/items in a vector and also a 
generic user-provided function that operates on the items. I guess you 
disagree about "method pass-through" but it reads more elegantly to me:


 >>> # Replace all "a" by "b"
 >>> v.apply(lambda s: s.replace("a", "b"))
 >>> v @ (lambda s: s.replace("a", "b"))


Compare these with:

     v.replace("a", "b")

Since we already know v is a Vector, we kinda expect methods to be 
vectorized.  This feels like the "least surprise" and also the least 
extra code.  Moreover, spelling chained methods with many .apply() calls 
(even if spelled '@') feels very cumbersome:



Do they need multiple uses of apply and @?

(A) v.apply(lambda s: s.replace("a", "b")).apply(str.upper).apply(lambda 
s: s.count("B"))

>
v.apply(lambda s: s.replace("a", "b").upper().count("B"))


(B) v @ lambda s: s.replace("a", "b") @ str.upper  @ lambda s: s.count("B")


v @ lambda s: s.replace("a", "b").upper().count("B")


(C) v.replace("a","b").upper().count("B")

Between these, (C) feels a heck of a lot more intuitive and readable to me.


[snip]


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-03 Thread Greg Ewing

Ronald Oussoren via Python-ideas wrote:


On 3 Feb 2019, at 21:34, David Mertz wrote:

Using @ both for matrix multiplication and 
element-wise application could be made to work, but would be very 
confusing.


The way @ is defined in numpy does actually work for both.
E.g. v1 @ v2 where v1 and v2 are 3-dimensional arrays is
equivalent to multiplying two 1D arrays of 2D matrices
elementwise.

Is this confusing? Maybe, but it's certainly useful.
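
A pure-Python sketch of what that stacked behaviour amounts to, assuming both
stacks hold the same number of matrices and ignoring NumPy's broadcasting
rules. `matmul_2d` and `stacked_matmul` are hypothetical helper names used for
illustration; they are not NumPy functions.

```python
def matmul_2d(a, b):
    """Ordinary matrix product of two 2D lists of numbers."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def stacked_matmul(stack_a, stack_b):
    """Multiply corresponding matrices of two equally long stacks."""
    if len(stack_a) != len(stack_b):
        raise ValueError("stacks must have the same length")
    return [matmul_2d(a, b) for a, b in zip(stack_a, stack_b)]


v1 = [[[1, 2], [3, 4]], [[1, 0], [0, 1]]]   # two 2x2 matrices
v2 = [[[0, 1], [1, 0]], [[5, 6], [7, 8]]]   # two 2x2 matrices
products = stacked_matmul(v1, v2)
```

Here the outer list plays the role of the "1D array" and each inner 2x2 list
is one matrix, so the matrix product is applied elementwise over the stack.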

--
Greg


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-03 Thread David Mertz
>
> >>> len(v)  # Number of elements in the Vector `v`
>

Agreed, this should definitely be the behavior.  So how do we get a vector
of lengths of each element?


> >>> # Compute the length of each element of the Vector `v`
> >>> v.apply(len)
> >>> v @ len
>

Also possible is:

v.len()

We couldn't do that for every possible function, but this one is special
inasmuch as we expect the items each to have a .__len__() but don't want to
spell the dunders. Likewise for just a handful of other methods/functions.

The key difference, though, is that *I* would want a way to use both
methods already attached to the objects/items in a vector and also a
generic user-provided function that operates on the items. I guess you
disagree about "method pass-through" but it reads more elegantly to me:

> >>> # Replace all "a" by "b"
> >>> v.apply(lambda s: s.replace("a", "b"))
> >>> v @ (lambda s: s.replace("a", "b"))
>

Compare these with:

v.replace("a", "b")

Since we already know v is a Vector, we kinda expect methods to be
vectorized.  This feels like the "least surprise" and also the least extra
code.  Moreover, spelling chained methods with many .apply() calls (even if
spelled '@') feels very cumbersome:

(A) v.apply(lambda s: s.replace("a", "b")).apply(str.upper).apply(lambda s:
s.count("B"))

(B) v @ (lambda s: s.replace("a", "b")) @ str.upper @ (lambda s: s.count("B"))

(C) v.replace("a","b").upper().count("B")

Between these, (C) feels a heck of a lot more intuitive and readable to me.
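
To compare the three spellings concretely, here is a minimal sketch of a class
that supports all of them. It is an assumption-laden illustration: this
`Vector`, its `apply` method, and the `__getattr__` pass-through are
hypothetical and are not the stringpy.Vector code. Note that in real Python
each lambda in the @ chain has to be parenthesized, since an unparenthesized
lambda is not a valid operand of a binary operator.

```python
class Vector:
    """Hypothetical sketch supporting apply, @ and method pass-through."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)

    def __repr__(self):
        return f"<Vector of {self._items!r}>"

    def apply(self, func):
        # Spelling (A): call func on every item.
        return Vector(func(item) for item in self._items)

    # Spelling (B): `v @ func` is just another name for apply.
    __matmul__ = apply

    def __getattr__(self, name):
        # Spelling (C): unknown public attributes become vectorized calls.
        if name.startswith("_"):
            raise AttributeError(name)

        def vectorized(*args, **kwargs):
            return Vector(getattr(item, name)(*args, **kwargs)
                          for item in self._items)
        return vectorized


v = Vector(["banana", "abc"])
a = v.apply(lambda s: s.replace("a", "b")).apply(str.upper).apply(lambda s: s.count("B"))
b = v @ (lambda s: s.replace("a", "b")) @ str.upper @ (lambda s: s.count("B"))
c = v.replace("a", "b").upper().count("B")
```

All three produce the same counts; the difference is purely in how much
ceremony each call needs.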

Here we put an emphasis on the methods already attached to objects.  But
this isn't terrible:

def double(x):
    return x*2
v.apply(double).replace("a","b").upper().count("B")

In @ notation it would be:

v @ double @ (lambda s: s.replace("a", "b")) @ str.upper @ (lambda s:
s.count("B"))

The 'double' is slightly easier, but the method calls are much worse.

MOREOVER, the model of "everything is apply/@" falls down terribly once we
have duck typing.

This is a completely silly example, but it's one that apply/@ simply cannot
address because it assumes it is the SAME function/method applied to each
object:

>>> class CaseInsensitiveStr(str):
...     def replace(self, old, new):
...         return str.upper(self).replace(old.upper(), new.upper())
...
>>> l = ['Monday', CaseInsensitiveStr('Tuesday'), 'Wednesday']
>>> v = Vector(l)
>>> v.replace('day', 'time')
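
A runnable sketch of the duck-typing point, under the assumption of a
hypothetical `Vector` with an `apply` method and a `__getattr__` pass-through
(neither is the actual implementation discussed here). Passing the single
function `str.replace` to `apply` ignores the subclass override, while the
pass-through looks up `replace` on each item. A lambda that calls
`s.replace(...)` would also dispatch per item, so the contrast is sharpest
when one fixed function is handed to `apply`.

```python
class CaseInsensitiveStr(str):
    def replace(self, old, new):
        return str.upper(self).replace(old.upper(), new.upper())


class Vector:
    """Hypothetical sketch; names and behaviour are assumptions."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)

    def apply(self, func):
        return Vector(func(item) for item in self._items)

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)

        def vectorized(*args, **kwargs):
            # The method is looked up on each item, so overrides are honoured.
            return Vector(getattr(item, name)(*args, **kwargs)
                          for item in self._items)
        return vectorized


v = Vector(['Monday', CaseInsensitiveStr('Tuesday'), 'Wednesday'])
passed_through = list(v.replace('day', 'time'))
one_function = list(v.apply(lambda s: str.replace(s, 'day', 'time')))
late_bound = list(v.apply(lambda s: s.replace('day', 'time')))
```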





-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-03 Thread Greg Ewing

Adrien Ricocotam wrote:

I honestly don’t understand why you don’t like the @ syntax.


Another problem with @ is that it already has an intended meaning,
i.e. matrix multiplication. What if you have two vectors of matrices
and you want to multiply corresponding ones?

--
Greg


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-03 Thread Adrien Ricocotam
@David Mertz 
 I don't think I explained my ideas well ^^. I'll try to be really detailed,
so that I'm sure I'm actually saying what I'm thinking.

Let's consider the idea of that Vector class this way:
Vectors are lists of a defined type (maybe immutable?) that add syntactic
sugar for vectorized operations.

Based on this small and not complete enough definition, we should be able
to apply any function to that vector.
I identify two ways functions are used with vectors: a function is either
applied to the vector as an iterable/list, or to the elements of this vector.
Thus, we need to have different notations for those two uses. To keep it
coherent with Python, if a function is applied to the vector as an
iterable, the vector is given as a parameter:

>>> len(v)  # Number of elements in the Vector `v`

If we want to apply a function to each element of the list, we should then
use another notation. So far, several have been proposed.
In the following example showing the different notations, we use the
generic way so we can apply it to user-defined functions:

>>> # Compute the length of each element of the Vector `v`
>>> v.apply(len)
>>> v @ len

Another example with parameters
>>> # Replace all "a" by "b"
>>> v.apply(lambda s: s.replace("a", "b"))
>>> v @ (lambda s: s.replace("a", "b"))

My personal opinion is that the two notations feel good. One is standard;
the other is not, but it is less verbose, which is a point in its favour.

Now that I detailed everything in my brain and by mail, I guess we are just
saying the same thing !

There's something I didn't mention on purpose: the use of `v.lower()`.
I think having special cases for how vectors work is not a good idea: it's
confusing.
If we want the user to be able to use user-defined functions we need a
notation. Having something different for some
of the functions feels weird to me. And obviously, if users can't use
their own functions, this whole thing is pretty useless.

Tell me if I got anything wrong.

Nb : I found a way to simplify my previous example using lambda instead of
partial.

On Sun, 3 Feb 2019 at 21:34, David Mertz wrote:

> On Sun, Feb 3, 2019 at 3:16 PM Ronald Oussoren 
> wrote:
>
>> The @ operator is meant for matrix multiplication (see PEP 465) and is
>> already used for that in NumPy. IMHO just that is a good enough reason for
>> not using @ as an elementwise application operator (ignoring whether having
>> such an operator is a good idea in the first place).
>>
>
> Co-opting operators is pretty common in Python.  For example, the
> `.__truediv__()` operator spelled '/' is most often used for some kind of
> numeric division.  There are some variations on that, for example vectorized
> division in NumPy.  And different numeric types operate a bit differently.
> The name of the magic method obviously suggests division.
>
> And yet, in the standard library we have pathlib which we can use like
> this (from the module documentation):
>
> >>> p = Path('/etc')
> >>> q = p / 'init.d' / 'reboot'
>
> That use is reasonable and iconic, even if it is nothing like division.
>
> The `.__mod__()` operator spelled '%' means something very different in
> relation to a float or int object versus a string object.  I.e. modulo
> division versus string interpolation.
>
> I've even seen documentation of some library that coopts `.__matmul__()`
> to do something with email addresses.  It's not a library I use, just
> something I once saw the documentation on, so I'm not certain of details.
> But you can imagine that e.g. :
>
> email = user @ domain
>
>
> Could be helpful and reasonable (exact behavior and purpose could vary,
> but it's "something about email" iconically).
>
> In other words, I'm not opposed to using the @ operator in my
> stringpy.Vector class out of purity about the meaning of operators.  I just
> am not convinced that it actually adds anything that is not easier without
> it.
>
> --
> Keeping medicines from the bloodstreams of the sick; food
> from the bellies of the hungry; books from the hands of the
> uneducated; technology from the underdeveloped; and putting
> advocates of freedom in prisons.  Intellectual property is
> to the 21st century what the slave trade was to the 16th.
>


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-03 Thread Ronald Oussoren via Python-ideas

> On 3 Feb 2019, at 21:34, David Mertz  wrote:
> 
> On Sun, Feb 3, 2019 at 3:16 PM Ronald Oussoren wrote:
> The @ operator is meant for matrix multiplication (see PEP 465) and is 
> already used for that in NumPy. IMHO just that is a good enough reason for 
> not using @ as an elementwise application operator (ignoring whether having 
> such an operator is a good idea in the first place).
> 
> Co-opting operators is pretty common in Python.  For example, the 
> `.__truediv__()` operator spelled '/' is most often used for some kind of 
> numeric division.  There are some variations on that, for example vectorized 
> division in NumPy.  And different numeric types operate a bit differently.  
> The name of the magic method obviously suggests division.

I know, but if an element-wise operator is useful it would also be useful for 
libraries like NumPy that already support the @ operator for matrix 
multiplication.  Using @ both for matrix multiplication and element-wise 
application could be made to work, but would be very confusing. 
 
Ronald

—

Twitter: @ronaldoussoren
Blog: https://blog.ronaldoussoren.net/


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-03 Thread David Mertz
On Sun, Feb 3, 2019 at 3:16 PM Ronald Oussoren 
wrote:

> The @ operator is meant for matrix multiplication (see PEP 465) and is
> already used for that in NumPy. IMHO just that is a good enough reason for
> not using @ as an elementwise application operator (ignoring whether having
> such an operator is a good idea in the first place).
>

Co-opting operators is pretty common in Python.  For example, the
`.__truediv__()` operator spelled '/' is most often used for some kind of
numeric division.  There are some variations on that, for example vectorized
division in NumPy.  And different numeric types operate a bit differently.
The name of the magic method obviously suggests division.

And yet, in the standard library we have pathlib which we can use like this
(from the module documentation):

>>> p = Path('/etc')
>>> q = p / 'init.d' / 'reboot'

That use is reasonable and iconic, even if it is nothing like division.

The `.__mod__()` operator spelled '%' means something very different in
relation to a float or int object versus a string object.  I.e. modulo
division versus string interpolation.

I've even seen documentation of some library that co-opts `.__matmul__()` to
do something with email addresses.  It's not a library I use, just
something I once saw the documentation on, so I'm not certain of details.
But you can imagine that e.g. :

email = user @ domain


Could be helpful and reasonable (exact behavior and purpose could vary, but
it's "something about email" iconically).
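
The co-opted operators mentioned above can be tried directly. In Python 3 the
'/' operator dispatches to `__truediv__`, which is what pathlib implements.
The `User` class below is purely hypothetical, a guess at what an
email-flavoured `__matmul__` might look like, and is not the library alluded
to above.

```python
from pathlib import PurePosixPath

# '/' on paths joins components instead of dividing (pathlib's __truediv__).
q = PurePosixPath('/etc') / 'init.d' / 'reboot'

# '%' means modulo for numbers and interpolation for strings.
remainder = 7 % 3
text = "%d items" % 3


class User:
    """Hypothetical class: '@' builds an email address."""

    def __init__(self, name):
        self.name = name

    def __matmul__(self, domain):
        return f"{self.name}@{domain}"


email = User("alice") @ "example.org"
```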

In other words, I'm not opposed to using the @ operator in my
stringpy.Vector class out of purity about the meaning of operators.  I just
am not convinced that it actually adds anything that is not easier without
it.

-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-03 Thread Ronald Oussoren via Python-ideas


> On 3 Feb 2019, at 09:54, Adrien Ricocotam  wrote:
> 
> Nice that you implemented it !
> 
> I think all the issues you have right now would go away if you used another 
> operation. I proposed the @ notation that is clear and different from 
> everything else,
> plus the operator is called "matmul" so it completely makes sense.

The @ operator is meant for matrix multiplication (see PEP 465) and is already 
used for that in NumPy. IMHO just that is a good enough reason for not using @ 
as an elementwise application operator (ignoring whether having such an operator 
is a good idea in the first place).

Ronald




Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-03 Thread David Mertz
On Sun, Feb 3, 2019 at 1:38 PM Adrien Ricocotam  wrote:

> I honestly don’t understand why you don’t like the @ syntax.
>

Can you show any single example that would work with the @ syntax that
would not work in almost exactly the same way without it? I have not seen
any yet, and none seem obvious.  Adding new syntax for its own sake is
definitely to be avoided when possible (even though technically the
operator exists, so it wouldn't be actual new syntax).


> My idea is using functions that take one argument: an object of the type
> of the vector. That’s actually how map works.
>

I do not understand this.  Spell my simple example using @ notation.  I.e.

my_vec @ replace {something? here for 'foo' with 'bar'}



> What I understood from your previous message is that there’s ambiguity
> when using magic functions on whether it’s applied to each element of the
> vector or the vector itself. That was the first thing I saw.
>

I decided there really isn't.  I think that any function applied to the
vector should operate on the sequence as a whole.  E.g. what length does it
have? Cast it to a different kind of sequence. Print it out. Serialize it.
Etc.

The things that are vectorized should always be methods of the vector
instead.  And ALMOST every method should in fact be a vectorized
operation.  In most cases, those will be a "pass through" to the methods of
the items inside of the vector. We won't write every possible method in the
Vector class.
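
A minimal sketch of that division of labour, assuming a hypothetical `Vector`
whose `__getattr__` forwards unknown method names to the items. This is an
illustration of the idea only, not the toy implementation mentioned in this
message.

```python
class Vector:
    """Hypothetical sketch: functions see the whole vector, methods vectorize."""

    def __init__(self, items):
        self._items = list(items)

    # A few magic methods describe the vector as a whole sequence.
    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def __repr__(self):
        return f"<Vector of {self._items!r}>"

    def __getattr__(self, name):
        # Only reached for names Vector does not define itself.
        if name.startswith("_"):
            raise AttributeError(name)

        def vectorized(*args, **kwargs):
            # Pass the call through to the same-named method of each item.
            return Vector(getattr(item, name)(*args, **kwargs)
                          for item in self._items)
        return vectorized


v = Vector("Jan Feb Mar".split())
n = len(v)                  # function: operates on the vector as a whole
upper = v.upper()           # method: passed through to each string
starts = v.startswith("J")  # method with an argument, also passed through
```

Nothing string-specific is written in the class, which is why not every
possible method has to be coded by hand.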

My toy so far only works with methods that the items actually have.  In the
examples, string methods.  But actually, I should add one method like this:

my_vec.apply(lambda x: x*2)


That is, we might want to vectorize custom functions also.  Maybe in that
example we should name the function 'double' for clarity:
'my_vec.apply(double)'.  I do think that just a few methods need to be
custom programmed because they correspond to magic methods of the items
rather than regular names (or not even directly to magic methods, but more
machinery).  So:

my_vec.list()  #-> cast each item to a list

my_vec.tuple() #-> cast each item to a tuple

my_vec.set()   #-> cast each item to a set


Maybe that's doing too much though.  We could always do that with map() or
comprehensions; it's not clear it's a common enough use case.

> Functions that could be used are then the same we can use in map. But I do
> agree it’s not easy to have functions with parameters. That’s why I used
> functools.partial
>

I really did not understand how that was meant to work.  But it was a whole
lot of lines to accomplish something very small either way.


> On Sun 3 Feb 2019 at 19:23, David Mertz  wrote:
>
>> On Sun, Feb 3, 2019 at 3:54 AM Adrien Ricocotam 
>> wrote:
>>
>>> I think all the issues you have right now would go away if you used another
>>> operation. I proposed the @ notation that is clear and different from
>>> everything else,
>>> plus the operator is called "matmul" so it completely makes sense. Then
>>> the examples would be:
>>>
>>
>>
>>> >>> l = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()
>>> >>> v = Vector(l)
>>> >>> len(v)
>>> 12
>>> >>> v @ len
>>> 
>>>
>>
>> I cannot really see how using the @ operator helps anything here.  If
>> this were a language that isn't Python (or conceivably some future version
>> of Python, but that doesn't feel likely or desirable to me), I could
>> imagine @ as an operator to vectorize any arbitrary sequence (or
>> iterator).  But given that we've already made the sequence into a Vector,
>> there's no need for extra syntax to say it should act in a vectorized way.
>>
>> Moreover, your syntax is awkward for methods with arguments.  How would I
>> spell:
>>
>> v.replace('foo', 'bar')
>>
>>
>> In the @ syntax? I actually made an error on my first pass where simply
>> naming a method was calling it.  I thought about keeping it for a moment,
>> but that really only allows zero argument calls.
>>
>> I think the principled thing to do here is add the minimal number of
>> methods to Vector itself, and have everything else pass through as
>> vectorized calls.  Most of that minimal number are "magic method":
>> __len__(), __contains__(), __str__(), __repr__(), __iter__(),
>> __reversed__().  I might have forgotten a couple.  All of those should not
>> be called directly, normally, but act as magic for operators or built-in
>> functions.
>>
>> I think I should then create regular methods of the same name that
>> perform the vectorized version.  So we would have:
>>
>> len(v)   # -> 12
>>
>> v.len()  # -> 
>>
>> list(v)  # -> ["Jan", "Feb", "Mar", "Apr", "May", "Jun" ...]
>> v.list() # -> 
>>
>>
>> I can't implement every single constructor that users might
>> conceivably want, of course, but I can do it for the basic types in
>> builtins and common standard library.  E.g. I might do:
>>
>> v.deque() # -> > "b"]) ... >
>>
>>
>> But I certainly won't manually add:
>>
>> v.custom_linked_list()  # From my_inhouse_module.py
>>
>>
>> Hmm... maybe even I 

Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-03 Thread Adrien Ricocotam
I honestly don’t understand why you don’t like the @ syntax.

My idea is using functions that take one argument: an object of the type
of the vector. That’s actually how map works.

What I understood from your previous message is that there’s ambiguity when
using magic functions on whether it’s applied to each element of the vector
or the vector itself. That was the first thing I saw.

While reading your examples, I noticed that you were using
"my_vec.function()". You just said that we will not code the ".function"
for every function. That’s the other problem I wanted to address with the @
notation.

Functions that could be used are then the same we can use in map. But I do
agree it’s not easy to have functions with parameters. That’s why I used
functools.partial

On Sun 3 Feb 2019 at 19:23, David Mertz  wrote:

> On Sun, Feb 3, 2019 at 3:54 AM Adrien Ricocotam 
> wrote:
>
>> I think all the issues you have right now would go away if you used another
>> operation. I proposed the @ notation that is clear and different from
>> everything else,
>> plus the operator is called "matmul" so it completely makes sense. Then the
>> examples would be:
>>
>
>
>> >>> l = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()
>> >>> v = Vector(l)
>> >>> len(v)
>> 12
>> >>> v @ len
>> 
>>
>
> I cannot really see how using the @ operator helps anything here.  If this
> were a language that isn't Python (or conceivably some future version of
> Python, but that doesn't feel likely or desirable to me), I could imagine @
> as an operator to vectorize any arbitrary sequence (or iterator).  But
> given that we've already made the sequence into a Vector, there's no need
> for extra syntax to say it should act in a vectorized way.
>
> Moreover, your syntax is awkward for methods with arguments.  How would I
> spell:
>
> v.replace('foo', 'bar')
>
>
> In the @ syntax? I actually made an error on my first pass where simply
> naming a method was calling it.  I thought about keeping it for a moment,
> but that really only allows zero argument calls.
>
> I think the principled thing to do here is add the minimal number of
> methods to Vector itself, and have everything else pass through as
> vectorized calls.  Most of that minimal number are "magic method":
> __len__(), __contains__(), __str__(), __repr__(), __iter__(),
> __reversed__().  I might have forgotten a couple.  All of those should not
> be called directly, normally, but act as magic for operators or built-in
> functions.
>
> I think I should then create regular methods of the same name that perform
> the vectorized version.  So we would have:
>
> len(v)   # -> 12
>
> v.len()  # -> 
>
> list(v)  # -> ["Jan", "Feb", "Mar", "Apr", "May", "Jun" ...]
> v.list() # -> 
>
>
> I can't implement every single constructor that users might
> conceivably want, of course, but I can do it for the basic types in
> builtins and common standard library.  E.g. I might do:
>
> v.deque() # ->  ... >
>
>
> But I certainly won't manually add:
>
> v.custom_linked_list()  # From my_inhouse_module.py
>
>
> Hmm... maybe even I could look at names of maybe-constructors in the
> current namespace and try them.  That starts to feel too magic.  Falling
> back to this feels better:
>
> map(custom_linked_list, v)  # From my_inhouse_module.py
>
>
>
> --
> Keeping medicines from the bloodstreams of the sick; food
> from the bellies of the hungry; books from the hands of the
> uneducated; technology from the underdeveloped; and putting
> advocates of freedom in prisons.  Intellectual property is
> to the 21st century what the slave trade was to the 16th.
>


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-03 Thread David Mertz
On Sun, Feb 3, 2019 at 3:54 AM Adrien Ricocotam  wrote:

> I think all the issues you have right now would go away if you used another
> operation. I proposed the @ notation that is clear and different from
> everything else,
> plus the operator is called "matmul" so it completely makes sense. Then the
> examples would be:
>


> >>> l = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()
> >>> v = Vector(l)
> >>> len(v)
> 12
> >>> v @ len
> 
>

I cannot really see how using the @ operator helps anything here.  If this
were a language that isn't Python (or conceivably some future version of
Python, but that doesn't feel likely or desirable to me), I could imagine @
as an operator to vectorize any arbitrary sequence (or iterator).  But
given that we've already made the sequence into a Vector, there's no need
for extra syntax to say it should act in a vectorized way.

Moreover, your syntax is awkward for methods with arguments.  How would I
spell:

v.replace('foo', 'bar')


In the @ syntax? I actually made an error on my first pass where simply
naming a method was calling it.  I thought about keeping it for a moment,
but that really only allows zero argument calls.

I think the principled thing to do here is add the minimal number of
methods to Vector itself, and have everything else pass through as
vectorized calls.  Most of that minimal number are "magic methods":
__len__(), __contains__(), __str__(), __repr__(), __iter__(),
__reversed__().  I might have forgotten a couple.  All of those should not
be called directly, normally, but act as magic for operators or built-in
functions.

I think I should then create regular methods of the same name that perform
the vectorized version.  So we would have:

len(v)   # -> 12

v.len()  # -> 

list(v)  # -> ["Jan", "Feb", "Mar", "Apr", "May", "Jun" ...]
v.list() # -> 
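
The split between magic methods and same-named regular methods might look
something like the sketch below. Everything here is an assumption made for
illustration (the class body, the method set, the repr format); the real
outputs of the examples above were lost from this archive and are not
reconstructed.

```python
class Vector:
    """Hypothetical sketch: dunders act on the vector, same-named
    regular methods act on each item."""

    def __init__(self, items):
        self._items = list(items)

    # Magic methods: used by built-ins and operators on the whole vector.
    def __len__(self):
        return len(self._items)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __reversed__(self):
        return reversed(self._items)

    def __repr__(self):
        return f"<Vector of {self._items!r}>"

    # Regular methods: the vectorized counterparts.
    # Inside method bodies, `len` and `list` still resolve to the built-ins,
    # because class scope is not an enclosing scope for functions.
    def len(self):
        return Vector(len(item) for item in self._items)

    def list(self):
        return Vector(list(item) for item in self._items)


v = Vector("Jan Feb Mar".split())
whole_length = len(v)          # length of the vector itself
item_lengths = v.len()         # length of each item
item_lists = v.list()          # each item cast to a list
```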


I can't implement every single constructor that users might
conceivably want, of course, but I can do it for the basic types in
builtins and common standard library.  E.g. I might do:

v.deque() # -> 


But I certainly won't manually add:

v.custom_linked_list()  # From my_inhouse_module.py


Hmm... maybe even I could look at names of maybe-constructors in the
current namespace and try them.  That starts to feel too magic.  Falling
back to this feels better:

map(custom_linked_list, v)  # From my_inhouse_module.py



-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-03 Thread James Lu
There is no need for any of you to argue over this small point. Tolerate each 
other’s language.

Sent from my iPhone

> On Feb 2, 2019, at 3:58 AM, Steven D'Aprano  wrote:
> 
>> On Sat, Feb 02, 2019 at 05:10:14AM +, MRAB wrote:
>>> On 2019-02-02 04:32, Steven D'Aprano wrote:
>>> [snip]
>>> 
>>> Of course it makes sense. Even numpy supports inhomogeneous data:
>>> 
>> [snip]
>> 
>> "inhomogeneous"? Who came up with that?
> 
> I don't know, but it has been used since at least the early 1920s
> 
> https://english.stackexchange.com/questions/194906/heterogeneous-vs-inhomogeneous
> 
> and the Oxford dictionary describes "inhomogeneity" as being used from 
> the late 19th century. So my guess is, probably people who were more 
> familiar with Latin and Greek than we are.
> 
> There are many words that are derived from both Latin and Greek. There's 
> no rule that says that because a word was derived from Greek, we must 
> use Greek grammatical forms for it. We are speaking English, not Greek, 
> and in English, we can negate words using the "in" prefix.
> 
> 
> 
> -- 
> Steven


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-03 Thread Adrien Ricocotam
Nice that you implemented it!

I think all the issues you have right now would go away if you used another
operation. I proposed the @ notation, which is clear and different from
everything else,
plus the operator is called "matmul" so it completely makes sense. Then the
examples would be:

>>> l = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()
>>> v = Vector(l)
>>> len(v)
12
>>> v @ len

>>> list(v)
["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov",
"Dec"]
>>> v @ list
>>> import operator
>>> v[1:]

>>> my_slice = slice(1, None, None)
>>> indexing_operation = operator.itemgetter(my_slice)
>>> v @ indexing_operation


That little example shows the need to configure functions so that they only
accept one argument.
It's actually not a new problem, since map has the same "issue".
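
As a rough illustration of the `@` idea, here is a minimal sketch. It assumes a hypothetical `Vector` that subclasses `list` and implements `__matmul__` to map a one-argument callable over its items; none of it is taken from the stringpy code. `operator.itemgetter` and `operator.methodcaller` are standard-library ways to build such one-argument callables.

```python
import operator


class Vector(list):
    """Hypothetical list subclass: `v @ f` applies f to every item."""

    def __matmul__(self, func):
        # Build a new Vector holding func(item) for each item.
        return Vector(func(item) for item in self)


v = Vector("Jan Feb Mar".split())

lengths = v @ len                                  # length of each item
tails = v @ operator.itemgetter(slice(1, None))    # item[1:] for each item
swapped = v @ operator.methodcaller("replace", "a", "X")

# len(v) is still the ordinary length of the container.
print(len(v), lengths, tails, swapped)
```

Because the callable sits on the right of `@`, ordinary `len(v)` and `list(v)` keep their usual container-level meaning.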

A vector of one element should still be a vector, as a list/tuple/dict of
one element is a list/tuple/dict, imo.

I suggested that Vector objects inherit from list, and therefore be
iterable. It would be handy to iterate over
their elements, and simple loops, maps, etc. should still be available to
them. It might be clearer to use the "old" notations
for some operations.

About the `Vector("A Super String")`, if we want it to be a vector of one
element, we should use `Vector(["A Super String"])`,
as we would do in any other function using an iterable as input.


Side note:
Honestly, I don't think this is the right thread to debate whether we should
use ["in", "un", "an", "non"] - homogeneous or heterogeneous.
As long as it's clear, does it matter?



Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread David Mertz
>
> I think it should follow the pre-existing behaviour of list, set, tuple,
> etc.
>
>  >>> Vector("hello")
> 
>

I try to keep the underlying datatype of the wrapped collection as much
as possible.  Casting a string to a list changes that.

>>> Vector(d)

>>> Vector(tuple(d))

>>> Vector(set(d))

>>> from collections import deque
>>> Vector(deque(d))



Strings are already a Collection, there is no firm need to cast them to a
list to live inside a Vector.  I like the idea of maintaining the original
type if someone wants it back later (possibly after transformations of the
values).

> Why is it pointless for a vector, but not for a list?
>

I guess it really isn't.  I was thinking of just .upper() and .lower()
where upper/lower-casing each individual letter is the same as doing so to
the whole string.  But for .replace() or .count() or .title() or
.swapcase() the meaning is very different if it is letter-at-a-time.

I guess a string gets unstringified pretty quickly no matter what though.
E.g. this seems like right behavior once we transform something:

>>> vstr = Vector('Monday')
>>> vstr

>>> vstr.upper()



I dunno... I suppose I *could* do `self._it = "".join(self._it)` whenever I
do a transform on a string to keep the underlying iterable as a string.
But the point of a Vector really is sequences of strings not sequences of
characters.
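
One way to picture "keep the original type, but join strings back together" is the sketch below. The helper name `transform` is hypothetical and is not part of the stringpy code; it only illustrates the type-preserving idea for the containers mentioned above.

```python
from collections import deque


def transform(collection, func):
    """Apply func to each element and rebuild the original container type.

    Hypothetical helper for illustration only.
    """
    results = [func(item) for item in collection]
    if isinstance(collection, str):
        # str(list) would give the list's repr, so join the pieces instead.
        return "".join(results)
    return type(collection)(results)


print(transform(("jan", "feb"), str.upper))
print(transform(deque(["jan", "feb"]), str.upper))
print(transform("Monday", str.upper))
```

The string branch is the `"".join(...)` special case; every other container here can be rebuilt from a list of results.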

-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread David Mertz
On Sat, Feb 2, 2019 at 10:00 PM MRAB  wrote:

> Perhaps a reserved attribute that lets you refer to the vector itself
> instead of its members, e.g. '.self'?
>
> >>> len(v)
> 
>  >>> len(v.self)
> 12
>

I like that! But I'm not sure if '.self' is misleading.  I use an attribute
called '._it' already that does exactly this.  But since we're asking the
length of the list or tuple or set or deque or etc that the Vector wraps,
does it feel like it would be deceptive to call them all '.self'?

I'm really not sure.  I could just rename '._it' to '.self' and get the
behavior you show (well, I still need a little checking whether the thing
wrapped is a collection or an iterator ... I guess a '.self' property.  Or
some other name to do that).

You remind me that I need to add .__getitem__() too so I can slice and index
Vectors.  But I know how to do that easily enough.

-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread MRAB

On 2019-02-03 02:03, David Mertz wrote:

> Slightly more on my initial behavior:
>
> >>> Vector({1:2,3:4})
> TypeError: Ambiguity vectorizing a map, perhaps try it.keys(),
> it.values(), or it.items()
>
> >>> Vector(37)
> TypeError: Vector can only be initialized with an iterable
>
> >>> Vector("hello")
>
>
> I'm wondering if maybe making a vector out of a scalar should simply be
> a length-one vector. What do you think?
>
> Also, should a single string be treated like a vector of characters or
> like a scalar? It feels kinda pointless to make a vector of characters
> since I cannot think of anything it would do better than a plain string
> already does (largely just the same thing slower).



[snip]
I think it should follow the pre-existing behaviour of list, set, tuple, 
etc.


>>> Vector("hello")


Why is it pointless for a vector, but not for a list?


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread David Mertz
Trying to make iterators behave in a semi-nice way also. I kinda like this
(example remains silly, but it shows idea).

>>> for n, mon in enumerate(vi.upper().replace('J','_').title()):
...     print(mon)
...     if n>3: break
...
_An
Feb
Mar
Apr
May
>>> vi
>
>>> list(vi)
['Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
>>> vi
>
>>> list(vi)
[]
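
The same consume-as-you-go behavior can be reproduced with a plain iterator and a generator expression. This is only a sketch of the semantics shown above, not the Vector implementation itself.

```python
months = iter("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split())

# Lazy, element-at-a-time transformation: nothing is consumed until iteration.
transformed = (m.upper().replace("J", "_").title() for m in months)

seen = []
for n, mon in enumerate(transformed):
    seen.append(mon)
    if n > 3:
        break

remaining = list(months)   # whatever the loop did not consume
exhausted = list(months)   # a second pass finds nothing left
print(seen, remaining, exhausted)
```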




-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread MRAB

On 2019-02-03 01:54, David Mertz wrote:

> Here is a very toy proof-of-concept:
>
> >>> from vector import Vector
> >>> l = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()
> >>> v = Vector(l)
> >>> v
>
> >>> v.strip().lower().replace('a','X')
>
> >>> vt = Vector(tuple(l))
> >>> vt
>
> >>> vt.lower().replace('o','X')
>
>
> My few lines are at https://github.com/DavidMertz/stringpy
>
> One thing I think I'd like to be different is to have some way of
> accessing EITHER the collection being held OR each element.  So now I
> just get:
>
> >>> v.__len__()
>
>
> Yes, that's an ugly spelling of `len(v)`, but let's bracket that for the
> moment.  It would be nice also to be able to ask "what's the length of
> the vector, in a non-vectorized way" (i.e. 12 in this case).  Maybe some
> naming convention like:
>
> >>> v.collection__len__()
> 12
>
> This last is just a possible behavior, not in the code I just uploaded.

Perhaps a reserved attribute that lets you refer to the vector itself 
instead of its members, e.g. '.self'?

>>> len(v)


>>> len(v.self)
12

>>> v[1 : ]
'ov', 'ec']>

>>> v.self[1 : ]
'Oct', 'Nov', 'Dec']>
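
A minimal sketch of the `.self` idea follows, assuming a hypothetical wrapper class and repr format; it is not the stringpy implementation. One caveat: the built-in `len()` requires `__len__` to return an integer, so a vectorized `len(v)` cannot be spelled that way, and the sketch only covers indexing and the `.self` escape hatch. Here `.self` returns the wrapped collection directly rather than a new Vector.

```python
class Vector:
    """Hypothetical wrapper: indexing is element-wise, `.self` is the container."""

    def __init__(self, it):
        self._it = it

    @property
    def self(self):
        # The wrapped collection itself, for non-vectorized access.
        return self._it

    def __getitem__(self, index):
        # Vectorized: apply the index or slice to every element.
        return Vector([item[index] for item in self._it])

    def __repr__(self):
        # Assumed repr format, for display only.
        return f"<Vector of {self._it!r}>"


v = Vector("Oct Nov Dec".split())
print(v[1:])         # element-wise slice
print(len(v.self))   # length of the wrapped list
print(v.self[1:])    # ordinary list slice
```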






Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread David Mertz
Slightly more on my initial behavior:

>>> Vector({1:2,3:4})
TypeError: Ambiguity vectorizing a map, perhaps try it.keys(), it.values(),
or it.items()

>>> Vector(37)
TypeError: Vector can only be initialized with an iterable

>>> Vector("hello")



I'm wondering if maybe making a vector out of a scalar should simply be a
length-one vector. What do you think?

Also, should a single string be treated like a vector of characters or like
a scalar? It feels kinda pointless to make a vector of characters since I
cannot think of anything it would do better than a plain string already
does (largely just the same thing slower).
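
The constructor checks described above could look roughly like the following. This is a sketch that assumes the checks are done with the `collections.abc` abstract base classes; the actual stringpy code may do it differently. The helper `try_vector` is hypothetical and exists only to show the messages.

```python
from collections.abc import Iterable, Mapping


class Vector:
    def __init__(self, it):
        if isinstance(it, Mapping):
            # Keys, values, or items? Refuse to guess.
            raise TypeError(
                "Ambiguity vectorizing a map, perhaps try "
                "it.keys(), it.values(), or it.items()"
            )
        if not isinstance(it, Iterable):
            raise TypeError("Vector can only be initialized with an iterable")
        self._it = it


def try_vector(value):
    """Return the Vector, or the error message if construction fails."""
    try:
        return Vector(value)
    except TypeError as exc:
        return str(exc)


print(try_vector({1: 2, 3: 4}))
print(try_vector(37))
print(type(try_vector("hello")).__name__)
```

A string passes both checks because it is iterable, which is exactly why the character-vector question arises.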



-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread David Mertz
Here is a very toy proof-of-concept:

>>> from vector import Vector
>>> l = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()
>>> v = Vector(l)
>>> v

>>> v.strip().lower().replace('a','X')

>>> vt = Vector(tuple(l))
>>> vt

>>> vt.lower().replace('o','X')



My few lines are at https://github.com/DavidMertz/stringpy

One thing I think I'd like to be different is to have some way of accessing
EITHER the collection being held OR each element.  So now I just get:

>>> v.__len__()



Yes, that's an ugly spelling of `len(v)`, but let's bracket that for the
moment.  It would be nice also to be able to ask "what's the length of the
vector, in a non-vectorized way" (i.e. 12 in this case).  Maybe some naming
convention like:

>>> v.collection__len__()
12


This last is just a possible behavior, not in the code I just uploaded.




-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread Greg Ewing

MRAB wrote:

Well, if we were using an English prefix, wouldn't it be "unhomogeneous"?


If we're sticking with Greek it would have to be something
like "anhomogeneous".

--
Greg


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread Ben Rudiak-Gould
On Sat, Feb 2, 2019 at 3:31 PM Steven D'Aprano  wrote:

> The comprehension version isn't awful:
>
> [(a*2).name.upper() for a in seq]
>
> but not all vectorized operations can be written as a chain of calls on
> a single sequence.
>

If they are strictly parallel (no dot products) and you know when writing
the code which variables hold vectors, then (denoting the vector variables
by v1, ..., vn) you can always write

[(expr with x1, ..., xn substituted for v1, ..., vn)
 for x1, ..., xn in zip(v1, ..., vn)]

which seems not much worse than the auto-vectorized version (with or
without special syntax).
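
A concrete instance of that pattern, with made-up vectors `v1`, `v2`, `v3` and the expression `v1 * v2 + v3`:

```python
v1 = [1, 2, 3]
v2 = [10, 20, 30]
v3 = [0.5, 0.5, 0.5]

# "Vectorized" v1 * v2 + v3, spelled out with zip: each xi stands in for vi.
result = [x1 * x2 + x3 for x1, x2, x3 in zip(v1, v2, v3)]
print(result)
```

Note that `zip` stops at the shortest input, so vectors of unequal length are silently truncated rather than rejected.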

Haskell (GHC) has parallel list comprehension syntax (
https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#parallel-list-comprehensions)
so you don't have to explicitly call zip. I wouldn't mind having that in
Python but I don't know what the syntax would be.


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread David Mertz
My double dot was a typo on my tablet, not borrowing Julia syntax, in this
case.

On Sat, Feb 2, 2019, 6:43 PM David Mertz wrote:

> On Sat, Feb 2, 2019, 6:23 PM Christopher Barker wrote:
>
>> a_list_of_strings.strip().lower().title()
>>
>> is a lot nicer than:
>>
>> [s.title() for s in (s.lower() for s in [s.strip(s) for s in
>> a_list_of_strings])]
>>
>> or
>>
>> list(map(str.title, (map(str.lower, (map(str.strip, a_list_of_strings
>> # untested
>>
>
> I'm warming up some. But is this imagined as vectors of strings, or as
> generically homogeneous objects? And what is homogeneity exactly in the
> face of duck typing?
>
> Absent the vector wrapping, I think I might write this for your example:
>
> map(lambda s: s..strip().lower().title(), a_list_of_strings)
>
> That's slightly longer, but just by the length of the word lambda.
>
> One could write a wrapper to vectorize pretty easily. So maybe:
>
> Vector(a_list_of_strings).strip().lower().title()
>
> This would just pass along the methods to the individual items, and
> wouldn't need to think about typing per se. Maybe other objects happen to
> have those three methods, so are string-like in a duck way.
>
>
>


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread Chris Angelico
On Sun, Feb 3, 2019 at 10:36 AM Ben Rudiak-Gould  wrote:
>
> On Sat, Feb 2, 2019 at 3:23 PM Christopher Barker  wrote:
>>
>> a_list_of_strings.strip().lower().title()
>>
>> is a lot nicer than:
>>
>> [s.title() for s in (s.lower() for s in [s.strip(s) for s in 
>> a_list_of_strings])]
>>
>> or
>>
>> list(map(str.title, (map(str.lower, (map(str.strip, a_list_of_strings # 
>> untested
>
> In this case you can write
>
> [s.strip().lower().title() for s in a_list_of_strings]

What if it's a more complicated example?

len(sorted(a_list_of_strings.casefold())[:100])

where the len() is supposed to give back a list of the lengths of the
first hundred strings, sorted case insensitively? (Okay so it's a
horrible contrived example. Bear with me.)

With current syntax, this would need multiple map calls or comprehensions:

[len(s) for s in sorted(s.casefold() for s in a_list_of_strings)[:100]]

(Better examples welcomed.)
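
For reference, the current-syntax version runs as follows on some made-up sample data. The second spelling, using `sorted(..., key=str.casefold)`, is an alternative not mentioned above; it sorts case-insensitively while keeping the original strings, which gives the same lengths here.

```python
a_list_of_strings = ["banana", "Apple", "cherry", "apple pie"]

# Casefold first, sort the casefolded strings, keep at most 100, take lengths.
lengths = [len(s) for s in sorted(s.casefold() for s in a_list_of_strings)[:100]]

# Alternative: sort the original strings by their casefolded form.
lengths_keyed = [len(s) for s in sorted(a_list_of_strings, key=str.casefold)[:100]]

print(lengths, lengths_keyed)
```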

ChrisA


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread David Mertz
On Sat, Feb 2, 2019, 6:23 PM Christopher Barker

> a_list_of_strings.strip().lower().title()
>
> is a lot nicer than:
>
> [s.title() for s in (s.lower() for s in [s.strip(s) for s in
> a_list_of_strings])]
>
> or
>
> list(map(str.title, (map(str.lower, (map(str.strip, a_list_of_strings
> # untested
>

I'm warming up some. But is this imagined as vectors of strings, or as
generically homogeneous objects? And what is homogeneity exactly in the
face of duck typing?

Absent the vector wrapping, I think I might write this for your example:

map(lambda s: s..strip().lower().title(), a_list_of_strings)

That's slightly longer, but just by the length of the word lambda.

One could write a wrapper to vectorize pretty easily. So maybe:

Vector(a_list_of_strings).strip().lower().title()

This would just pass along the methods to the individual items, and
wouldn't need to think about typing per se. Maybe other objects happen to
have those three methods, so are string-like in a duck way.
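
A bare-bones sketch of such a wrapper, assuming method forwarding is done through `__getattr__`. The class body here is illustrative only and the attribute name `_it` is an assumption.

```python
class Vector:
    """Hypothetical wrapper that forwards method calls to each item."""

    def __init__(self, it):
        self._it = it

    def __getattr__(self, name):
        # Only reached for names Vector itself lacks. Refuse private and
        # dunder names so that protocol lookups are not forwarded by accident.
        if name.startswith("_"):
            raise AttributeError(name)

        def call_on_each(*args, **kwargs):
            # Duck typing: any item with a method of this name will do.
            return Vector(
                [getattr(item, name)(*args, **kwargs) for item in self._it]
            )

        return call_on_each


a_list_of_strings = ["  jan ", "FEB", " mar"]
result = Vector(a_list_of_strings).strip().lower().title()
print(result._it)
```

No type checks are involved: an item that lacks the requested method simply raises `AttributeError` when the call reaches it.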


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread Steven D'Aprano
On Sat, Feb 02, 2019 at 03:22:12PM -0800, Christopher Barker wrote:

[This bit was me] 
> Even numpy supports inhomogeneous data:
> > py> a = np.array([1, 'spam'])
> > py> a
> > array(['1', 'spam'],
> >   dtype='|S4')
> 
> 
> well, no -- it doesn't -- look carefully, that is an array of type '|S4' --
> i.e., a 4-element-long string -- every element in that array is that same
> type.

So it is. I wondered what the cryptic '|S4' symbol meant, and I 
completely missed the '' quotes around the 1.

Thanks for the correction.

[...]
> c = np.sqrt(a**2 + b**2)
> 
> is a heck of a lot easier to read, write, and get correct than:
> 
> c = list(map(math.sqrt, map(lambda x, y: x + y, map(lambda x: x**2, a),
> map(lambda x: x**2, b)
>   )))

Indeed. This hypothetical syntax brings the readability advantages of 
infix operators to code that operates on iterables, without requiring 
every iterable to support arbitrary functions and methods.




-- 
Steve


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread Chris Angelico
On Sun, Feb 3, 2019 at 10:31 AM Steven D'Aprano  wrote:
> The dot arguably fails the "syntax should not look like grit on Tim's
> monitor" test (although attribute access already fails that test). I
> think the double-dot syntax looks like a typo, which is unfortunate.

Agreed, so I would like to see a different spelling of it. Pike has an
automap syntax that looks a lot like subscripting:

numbers[*] * 2

Borrowing that syntax would pass the grit test, and it currently isn't
valid syntax.

ChrisA


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread Ben Rudiak-Gould
On Sat, Feb 2, 2019 at 3:23 PM Christopher Barker 
wrote:

> performance aside, I use numpy because:
>
> c = np.sqrt(a**2 + b**2)
>
> is a heck of a lot easier to read, write, and get correct than:
>
> c = list(map(math.sqrt, map(lambda x, y: x + y, map(lambda x: x**2, a),
> map(lambda x: x**2, b)
>   )))
>
> or:
>
> [math.sqrt(x) for x in (a + b for a, b in zip((x**2 for x in a),
>   (x**2 for x in b)
>   ))]
>

You can also write

c = [math.sqrt(x**2 + y**2) for x, y in zip(a, b)]

or

c = list(map(lambda x, y: math.sqrt(x**2 + y**2), a, b))

or, since math.hypot exists,

c = list(map(math.hypot, a, b))

In recent Python versions you can write [*map(...)] instead of
list(map(...)), which I find more readable.
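
The three spellings can be checked against each other on some made-up inputs; `a` and `b` below are plain lists chosen so the results are whole numbers.

```python
import math

a = [3, 5, 8]
b = [4, 12, 15]

by_comprehension = [math.sqrt(x**2 + y**2) for x, y in zip(a, b)]
by_lambda = list(map(lambda x, y: math.sqrt(x**2 + y**2), a, b))
by_hypot = [*map(math.hypot, a, b)]   # unpacking instead of list(...)

print(by_comprehension, by_lambda, by_hypot)
```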

> a_list_of_strings.strip().lower().title()
>
> is a lot nicer than:
>
> [s.title() for s in (s.lower() for s in [s.strip(s) for s in
> a_list_of_strings])]
>
> or
>
> list(map(str.title, (map(str.lower, (map(str.strip, a_list_of_strings
> # untested
>

In this case you can write

[s.strip().lower().title() for s in a_list_of_strings]

-- Ben


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread Steven D'Aprano
On Sat, Feb 02, 2019 at 06:08:24PM -0500, David Mertz wrote:

> In terms of other examples:
> 
> map(str.upper, seq) uppercases each item
> map(operator.attrgetter('name'), seq) gets the name attribute of each item
> map(lambda a: a*2, seq) doubles each item

Now compose those operations:

((seq .* 2)..name)..upper()

versus

# Gag me with a spoon!
map(str.upper, map(operator.attrgetter('name'), map(lambda a: a*2, seq)))

The comprehension version isn't awful:

[(a*2).name.upper() for a in seq]

but not all vectorized operations can be written as a chain of calls on 
a single sequence.
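
To confirm that the nested-map and comprehension spellings agree, here is a small runnable check. The `Item` class is hypothetical, invented only so that `a*2` returns something with a `name` attribute.

```python
import operator
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical element type: multiplying scales qty and keeps the name."""
    name: str
    qty: int

    def __mul__(self, factor):
        return Item(self.name, self.qty * factor)


seq = [Item("spam", 1), Item("eggs", 2)]

via_map = list(
    map(str.upper, map(operator.attrgetter("name"), map(lambda a: a * 2, seq)))
)
via_comprehension = [(a * 2).name.upper() for a in seq]

print(via_map, via_comprehension)
```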

There are still some open issues that I don't have good answers for.

Consider ``x .+ y``. In Julia, I think that the compiler has enough type 
information to distinguish between the array plus scalar and array plus 
array cases, but I don't think Python will have that. So possibly there 
will still be some runtime information needed to make this work.
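
One possible shape for that runtime check is sketched below. The function name `elementwise` and the rule "iterable but not a string means vector" are assumptions made for illustration, not part of any proposal in this thread.

```python
import operator
from collections.abc import Iterable


def elementwise(op, x, y):
    """Hypothetical runtime dispatch for something like ``x .+ y``."""

    def is_vector(obj):
        # Treat strings and bytes as scalars even though they are iterable.
        return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))

    if is_vector(x) and is_vector(y):
        return [op(a, b) for a, b in zip(x, y)]   # array plus array
    if is_vector(x):
        return [op(a, y) for a in x]              # array plus scalar
    if is_vector(y):
        return [op(x, b) for b in y]              # scalar plus array
    return op(x, y)                               # plain scalars


print(elementwise(operator.add, [1, 2, 3], 10))
print(elementwise(operator.add, [1, 2], [10, 20]))
```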

The dot arguably fails the "syntax should not look like grit on Tim's 
monitor" test (although attribute access already fails that test). I 
think the double-dot syntax looks like a typo, which is unfortunate.


-- 
Steve


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread Christopher Barker
On Fri, Feb 1, 2019 at 5:00 PM David Mertz  wrote:

> This is certainly doable. But why would it be better than:
>>
>
> map(str.lower, my_string_vector)
> map(compute_grad, my_student_vector)
>

or [s.lower() for s in my_string_vector]

Side note: It's really interesting to me that Python introduced
comprehension syntax some years ago, and even "hid" reduce(), and now there
seems to be a big interest / revival of "map".

> Even numpy supports inhomogeneous data:
> py> a = np.array([1, 'spam'])
> py> a
> array(['1', 'spam'],
>       dtype='|S4')


Well, no -- it doesn't -- look carefully, that is an array of type '|S4' --
i.e. a 4-character string -- every element in that array is that same
type. Also note that numpy's support for strings is not very complete.

numpy does support an "object" type, that can be inhomogeneous -- it's
still a single type, but that type is a Python object (under the hood it's
an array of pointers to PyObjects):

In [3]: a = np.array([1, 'spam'], dtype=object)

In [4]: a

Out[4]: array([1, 'spam'], dtype=object)

And it does support vectorization to some extent:

In [5]: a * 5

Out[5]: array([5, 'spamspamspamspamspam'], dtype=object)

But not with any performance benefits.

I think there are good reasons to have a "string_vector" that is known to
be homogeneous:

Performance -- it could be significantly optimized (are there many use
cases for that? I don't know).

Clear API: a string_vector would have all the relevant string methods.

You could easily write a list subclass that passed on method calls to the
enclosed objects, but then you'd have a fair bit of confusion as to what
might be a vector method vs a method on the objects.

which I suppose leaves us with something like:

list.elements.upper()

list.elements * 5

hmm -- not sure how much I like this, but it's pretty doable.
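
A minimal sketch of what such an ``.elements`` accessor could look like. The names VList and _Elements are hypothetical, and only ``*`` is forwarded as an operator here; a fuller version would need one dunder method per operator.

```python
class _Elements:
    """Proxy that forwards method calls and ``*`` to each item."""

    def __init__(self, items):
        self._items = items

    def __getattr__(self, name):
        # Reached only for names not found on the proxy itself.
        if name.startswith("_"):
            raise AttributeError(name)

        def call_on_each(*args, **kwargs):
            return VList(getattr(item, name)(*args, **kwargs)
                         for item in self._items)

        return call_on_each

    def __mul__(self, other):
        # Operators bypass __getattr__, so they are spelled out.
        return VList(item * other for item in self._items)


class VList(list):
    """A list whose ``.elements`` attribute addresses the items."""

    @property
    def elements(self):
        return _Elements(self)


words = VList(["spam", "eggs"])
upper_each = words.elements.upper()   # method call on every item
times_each = words.elements * 2       # operator on every item
times_list = words * 2                # ordinary list repetition
```

Keeping the forwarding behind ``.elements`` avoids the confusion mentioned above: anything called directly on the list is a list method, and anything called through ``.elements`` goes to the items.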

> I still haven't seen any examples that aren't already spelled 'map(fun, it)'


And I don't think you will -- I *think* I get credit for starting this part
of the thread, and I started by saying I have often longed for
essentially a more concise way to spell map() or comprehensions.
Performance aside, I use numpy because:

c = np.sqrt(a**2 + b**2)

is a heck of a lot easier to read, write, and get correct than:

c = list(map(math.sqrt,
             map(lambda x, y: x + y,
                 map(lambda x: x**2, a),
                 map(lambda x: x**2, b))))

or:

c = [math.sqrt(s) for s in (p + q for p, q in zip((x**2 for x in a),
                                                 (x**2 for x in b)))]

Note: it took me quite a while to get those right! (and I know I could have
used the operator module to get the map version maybe a bit cleaner, but
the point stands)

Does this apply to string processing? I'm not sure, though I do a fair bit
of chaining of string operations:

my_string.strip().lower().title()

if you wanted to do that to a list of strings:

a_list_of_strings.strip().lower().title()

is a lot nicer than:

[s.title() for s in (s.lower() for s in [s.strip() for s in
a_list_of_strings])]

or

list(map(str.title, map(str.lower, map(str.strip, a_list_of_strings))))
# untested

How common is that use case? not common enough for me to go any further
with this.

-CHB








-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread Steven D'Aprano
On Sat, Feb 02, 2019 at 03:31:29PM -0500, David Mertz wrote:

> I still haven't seen any examples that aren't already spelled 'map(fun, it)'

You might be right. But then there's nothing that map() can do that 
couldn't be written as a comprehension, and nothing that you can do 
with a comprehension that can't be written as a for-loop.

And nothing that can be written as a for-loop that couldn't be written 
as a while-loop. The only loop construct we really need is a while loop. 
And even that is redundant if we had GOTO.

It's not about the functionality, but expressibility and readability.

This hypothetical vectorization syntax might have a performance 
advantage as well. My understanding is that Julia is able to efficiently 
vectorize code, bringing it to within 10% of the speed of unrolled C 
loops. It may be that CPython cannot do anything that fast, but there 
may be some opportunities for optimization that we cannot apply to 
for-loops or comprehensions due to the way they are defined.

But primarily it is about the readability of the code:

result = process.(vector .+ sequence) .* items

versus:

# Ouch!
result = map(operator.mul,
             map(process,
                 map(operator.add, vector, sequence)),
             items)


Here's the comprehension version:

result = [a*b for a, b in zip(
 [process(c) for c in 
 [d+e for d, e in zip(vector, sequence)]], 
 items)]

We can improve that comprehension a tiny bit by splitting it into 
multiple steps:

 temp1 = [d+e for d, e in zip(vector, sequence)]
 temp2 = [process(c) for c in temp1]
 result = [a*b for a, b in zip(temp2, items)]

but none of these are as elegant or readable as the vectorized syntax

 result = process.(vector .+ sequence) .* items
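
For anyone who wants to run the comparison, here is a self-contained sketch. The body of process and the sample data are placeholders invented for illustration; only the shape of the expressions comes from the examples above.

```python
import operator


def process(x):
    # Stand-in for the unspecified ``process`` function.
    return x * 10


# Sample data invented for illustration.
vector = [1, 2, 3]
sequence = [4, 5, 6]
items = [2, 2, 2]

# map() spelling: each map() takes one iterable per argument.
via_map = list(map(operator.mul,
                   map(process, map(operator.add, vector, sequence)),
                   items))

# Step-by-step comprehension spelling.
temp1 = [d + e for d, e in zip(vector, sequence)]
temp2 = [process(c) for c in temp1]
via_steps = [a * b for a, b in zip(temp2, items)]

# When every step is elementwise over same-length inputs,
# a single zip() is enough.
via_one_zip = [process(d + e) * b
               for d, e, b in zip(vector, sequence, items)]
```

The single-zip form is the closest the current language gets to the vectorized line, but it only works when all inputs are walked in lockstep.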



-- 
Steve


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread David Mertz
Beyond possibly saving 3-5 characters, I continue not to see anything
different from map in this discussion.

> list(vector) applies list to the vector itself.
> list.(vector) applies list to each component of vector.
>

In Python:

list(seq) applies list to the sequence itself
map(list, seq) applies list to each component of seq

In terms of other examples:

map(str.upper, seq) uppercases each item
map(operator.attrgetter('name'), seq) gets the name attribute of each item
map(lambda a: a*2, seq) doubles each item
(lambda a: a*2)(seq) doubles the sequence itself

... Last two might enjoy named function 'double'
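
Spelled out as runnable code, with a named function double as suggested; the sample seq is made up for illustration.

```python
def double(a):
    return a * 2


# Sample data made up for illustration.
seq = ["ab", "cd"]

whole = list(seq)                      # list applied to the sequence itself
each = list(map(list, seq))            # list applied to each component
doubled_each = list(map(double, seq))  # doubles each item
doubled_whole = double(seq)            # doubles the sequence itself
```

map() returns a lazy iterator in Python 3, so the results are wrapped in list() here to make them visible.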




> > The problem, of course, is that list() now has to understand Vector
> > specially, and so does any function you think of applying to it.
>
> *The whole point* of the Julia syntax is that no function has to
> understand any sequence. When we write:
>
> for item in vector:
>     func(item)
>
> func only has to understand item, not vector. The same applies to the
> Julia syntax
>
> func.(vector)
>
> There's no puzzle here, no tricky cases, because it is completely
> deterministic and explicit: func(x) always calls func with x as
> argument, func.(x) always calls func with each of x's items as
> arguments.
>
>
>
> > Operators are easier (even those like [1:]) because Vector can make its
> > own definition of each through (a finite set of) dunder methods. To make
> > a Vector accept an arbitrarily-named method call like my_strings.upper()
> > to mean:
>
> With the Julia syntax, there is no need for vectors (or lists, or
> generators, or tuples, or sets, or any other iterator...) to accept
> arbitrary method calls. So long as vectors can be iterated over,
> func.(vector) will work.
>


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread Steven D'Aprano
On Sat, Feb 02, 2019 at 07:58:34PM +, Jeff Allen wrote:

[MRAB asked]
> >OK, here's another one: if you use 'list(...)' on a vector, does it 
> >apply to the vector itself or its members?

With the Julia vectorization operator, there is no puzzle there.

list(vector) applies list to the vector itself.

list.(vector) applies list to each component of vector.


> The problem, of course, is that list() now has to understand Vector 
> specially, and so does any function you think of applying to it. 

*The whole point* of the Julia syntax is that no function has to 
understand any sequence. When we write:

for item in vector:
    func(item)

func only has to understand item, not vector. The same applies to the 
Julia syntax

func.(vector)

There's no puzzle here, no tricky cases, because it is completely 
deterministic and explicit: func(x) always calls func with x as 
argument, func.(x) always calls func with each of x's items as 
arguments.



> Operators are easier (even those like [1:]) because Vector can make its 
> own definition of each through (a finite set of) dunder methods. To make 
> a Vector accept an arbitrarily-named method call like my_strings.upper() 
> to mean:

With the Julia syntax, there is no need for vectors (or lists, or 
generators, or tuples, or sets, or any other iterator...) to accept 
arbitrary method calls. So long as vectors can be iterated over, 
func.(vector) will work.
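
The same idea can be approximated today with a plain helper. The name dot_call is hypothetical and stands in for the proposed ``func.(x)`` syntax; it returns a list, which is a choice of this sketch.

```python
def dot_call(func, iterable):
    """Rough stand-in for the hypothetical ``func.(iterable)``."""
    # func only ever sees individual items, never the container.
    return [func(item) for item in iterable]


from_list = dot_call(len, ["one", "two", "three"])
from_tuple = dot_call(len, ("one", "two"))
from_generator = dot_call(len, (w for w in ["a", "bb"]))
from_set = dot_call(str.upper, {"x"})

# By contrast, a plain call applies to the container itself.
plain_call = len(["one", "two", "three"])
```

Nothing about len or str.upper had to change for this to work on lists, tuples, generators or sets, which is the point being made about the Julia syntax.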



Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread Brendan Barnwell

On 2019-02-02 12:31, David Mertz wrote:

> I still haven't seen any examples that aren't already spelled 'map(fun, it)'


	The problem with this is the same problem with having a function called 
"add" instead of an operator.  There is little gain when you're applying 
ONE function, but if you're applying multiple functions you get a 
thicket of parentheses.  I would rather see this:


some_list @ str.lower @ tokenize @ remove_stopwords

. . .than this:

map(remove_stopwords, map(tokenize, map(str.lower, some_list)))

	That said, I don't necessarily think this needs to be added to the 
language.  Things like pandas already provide this and so much more that 
it's unclear whether the gain from adding vectorization on its own would 
be worth it.
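
As an illustration only, the ``@`` spelling can be emulated today with a list subclass. The class name Mapped, the stopword set, and the bodies of tokenize and remove_stopwords are all invented for this sketch.

```python
class Mapped(list):
    """List whose ``@`` applies a function to every item."""

    def __matmul__(self, func):
        return Mapped(func(item) for item in self)


STOPWORDS = {"the", "a"}  # illustrative only


def tokenize(text):
    return text.split()


def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]


some_list = Mapped(["The Quick fox", "A lazy DOG"])

# ``@`` is left-associative, so the steps run left to right.
piped = some_list @ str.lower @ tokenize @ remove_stopwords

# The equivalent nested map() spelling, read inside out.
nested = list(map(remove_stopwords,
                  map(tokenize, map(str.lower, some_list))))
```

This only works because some_list is a Mapped instance; a plain list on the left of ``@`` would raise TypeError, which is part of why a library type such as a pandas Series is the usual home for this kind of chaining.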


--
Brendan Barnwell
"Do not follow where the path may lead.  Go, instead, where there is no 
path, and leave a trail."

   --author unknown


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread David Mertz
I still haven't seen any examples that aren't already spelled 'map(fun, it)'

On Sat, Feb 2, 2019, 3:17 PM Jeff Allen wrote:

> On 02/02/2019 18:44, MRAB wrote:
>
> > On 2019-02-02 17:31, Adrien Ricocotam wrote:
> > > I personally would like the first option to be the case. But then
> > > vectors shouldn't be list-like but more generator-like.
> >
> > OK, here's another one: if you use 'list(...)' on a vector, does it apply
> > to the vector itself or its members?
> >
> > >>> list(my_strings)
> >
> > You might be wanting to convert a vector into a list:
> >
> > ['one', 'two', 'three']
> >
> > or convert each of its members into lists:
> >
> > Vector([['one'], ['two'], ['three']])
>
> More likely you mean:
>
> >>> [list(i) for i in ['one', 'two', 'three']]
> [['o', 'n', 'e'], ['t', 'w', 'o'], ['t', 'h', 'r', 'e', 'e']]
>
> The problem, of course, is that list() now has to understand Vector
> specially, and so does any function you think of applying to it. Operators
> are easier (even those like [1:]) because Vector can make its own
> definition of each through (a finite set of) dunder methods. To make a
> Vector accept an arbitrarily-named method call like my_strings.upper() to
> mean:
>
> >>> [i.upper() for i in ['one', 'two', 'three']]
> ['ONE', 'TWO', 'THREE']
>
> is perhaps just about possible by manipulating __getattribute__ to resolve
> names matching methods on the underlying type to a callable that loops over
> the content.
>
> Jeff


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread Jeff Allen

On 02/02/2019 18:44, MRAB wrote:

> On 2019-02-02 17:31, Adrien Ricocotam wrote:
> > I personally would like the first option to be the case. But then vectors
> > shouldn't be list-like but more generator-like.
>
> OK, here's another one: if you use 'list(...)' on a vector, does it
> apply to the vector itself or its members?
>
> >>> list(my_strings)
>
> You might be wanting to convert a vector into a list:
>
> ['one', 'two', 'three']
>
> or convert each of its members into lists:
>
> Vector([['one'], ['two'], ['three']])


More likely you mean:

>>> [list(i) for i in ['one', 'two', 'three']]
[['o', 'n', 'e'], ['t', 'w', 'o'], ['t', 'h', 'r', 'e', 'e']]

The problem, of course, is that list() now has to understand Vector 
specially, and so does any function you think of applying to it. 
Operators are easier (even those like [1:]) because Vector can make its 
own definition of each through (a finite set of) dunder methods. To make 
a Vector accept an arbitrarily-named method call like my_strings.upper() 
to mean:


>>> [i.upper() for i in ['one', 'two', 'three']]
['ONE', 'TWO', 'THREE']

is perhaps just about possible by manipulating __getattribute__ to 
resolve names matching methods on the underlying type to a callable that 
loops over the content.
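
A rough sketch of that idea follows. It uses __getattr__, the fallback hook that runs only when normal lookup fails, rather than __getattribute__, and it forwards any public name without checking it against the underlying type; both are simplifications of this sketch. Only the name Vector comes from the thread.

```python
class Vector:
    """Hypothetical container; only the name comes from the thread."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)

    def __repr__(self):
        return f"Vector({self._items!r})"

    def __getattr__(self, name):
        # Runs only after normal lookup fails, so real Vector
        # attributes and methods are never intercepted.
        if name.startswith("_"):
            raise AttributeError(name)

        def apply_to_each(*args, **kwargs):
            return Vector(getattr(item, name)(*args, **kwargs)
                          for item in self._items)

        return apply_to_each


my_strings = Vector(["one", "two", "three"])
shouted = my_strings.upper()   # dispatched to str.upper for each item
as_list = list(my_strings)     # list() still sees the vector itself
```

Note that list() needs no special knowledge of Vector here: it simply iterates, so the ambiguity only arises for names looked up on the vector.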


Jeff



Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread Adrien Ricocotam
That's tough. I'd say convert the vector to a list.

But :
my_vector.list()

Would apply list on each element of the vector.

Globally, I'd say if the vector is used as an argument, it's a usual
iterable, if you use a member function (or any other notation like @ or ..
or whatever) it's like map.

Note that it's just my opinion.

On Sat, 2 Feb 2019 at 19:46, MRAB wrote:

> On 2019-02-02 17:31, Adrien Ricocotam wrote:
> > I personally would like the first option to be the case. But then vectors
> > shouldn't be list-like but more generator-like.
>
> OK, here's another one: if you use 'list(...)' on a vector, does it
> apply to the vector itself or its members?
>
> >>> list(my_strings)
>
> You might be wanting to convert a vector into a list:
>
> ['one', 'two', 'three']
>
> or convert each of its members into lists:
>
> Vector([['one'], ['two'], ['three']])
>
> > On Sat, 2 Feb 2019 at 19:26, MRAB wrote:
> >
> > > On 2019-02-02 09:22, Kirill Balunov wrote:
> > > > On Sat, 2 Feb 2019 at 07:33, Steven D'Aprano wrote:
> > > >
> > > > > I didn't say anything about a vector type.
> > > >
> > > > I agree, you did not. But since you started a new thread from
> > > > the one where the vector type was briefly discussed, it seemed
> > > > appropriate to me to mention it here. Sorry about that.
> > > >
> > > > > > Therefore, it allows you to ensure that the method is present
> > > > > > for each element in the vector. The first given example is
> > > > > > what numpy is all about, and without some guarantee that L
> > > > > > consists of homogeneous data it hardly makes sense.
> > > > >
> > > > > Of course it makes sense. Even numpy supports inhomogeneous data:
> > > > >
> > > > > py> a = np.array([1, 'spam'])
> > > > > py> a
> > > > > array(['1', 'spam'],
> > > > >       dtype='|S4')
> > > >
> > > > Yes, numpy supports heterogeneous arrays to some degree, but not
> > > > in the way you presented it. Your example just shows a homogeneous
> > > > array of type `'|S4'`, in the same way that `np.array([1, 1.234])`
> > > > will be homogeneous. Of course you can write np.array([1, 'spam'],
> > > > dtype='object'), but in that case it will also be a homogeneous
> > > > array, just of type `object`.
> > > >
> > > > > Inhomogeneous data may rule out some optimizations, but that
> > > > > hardly means that it "doesn't make sense" to use it.
> > > >
> > > > I did not say that it "doesn't make sense". I only said that you
> > > > would have to be lucky to call `..method()` on a collection of
> > > > heterogeneous data. Therefore this kind of operation usually
> > > > implies that you are working with homogeneous data. Unfortunately,
> > > > built-in containers cannot provide such a guarantee without
> > > > self-checking. Therefore, in my opinion, such an operator is not
> > > > needed at the moment.
> > >
> > > Here's a question: when you use a subscript on a vector, does it
> > > apply to the vector itself, or its members?
> > >
> > > For example, given:
> > >
> > > >>> my_strings = Vector(['one', 'two', 'three'])
> > >
> > > what is:
> > >
> > > >>> my_strings[1 : ]
> > >
> > > ?
> > >
> > > Is it:
> > >
> > > Vector(['ne', 'wo', 'hree'])
> > >
> > > or:
> > >
> > > Vector(['two', 'three'])
> > >
> > > ?


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread MRAB

On 2019-02-02 17:31, Adrien Ricocotam wrote:
> I personally would like the first option to be the case. But then vectors
> shouldn't be list-like but more generator-like.

OK, here's another one: if you use 'list(...)' on a vector, does it 
apply to the vector itself or its members?


>>> list(my_strings)

You might be wanting to convert a vector into a list:

['one', 'two', 'three']

or convert each of its members into lists:

Vector([['one'], ['two'], ['three']])

> On Sat, 2 Feb 2019 at 19:26, MRAB wrote:
>
> > On 2019-02-02 09:22, Kirill Balunov wrote:
> > > On Sat, 2 Feb 2019 at 07:33, Steven D'Aprano wrote:
> > >
> > > > I didn't say anything about a vector type.
> > >
> > > I agree, you did not. But since you started a new thread from the
> > > one where the vector type was briefly discussed, it seemed
> > > appropriate to me to mention it here. Sorry about that.
> > >
> > > > > Therefore, it allows you to ensure that the method is present
> > > > > for each element in the vector. The first given example is what
> > > > > numpy is all about, and without some guarantee that L consists
> > > > > of homogeneous data it hardly makes sense.
> > > >
> > > > Of course it makes sense. Even numpy supports inhomogeneous data:
> > > >
> > > > py> a = np.array([1, 'spam'])
> > > > py> a
> > > > array(['1', 'spam'],
> > > >       dtype='|S4')
> > >
> > > Yes, numpy supports heterogeneous arrays to some degree, but not in
> > > the way you presented it. Your example just shows a homogeneous
> > > array of type `'|S4'`, in the same way that `np.array([1, 1.234])`
> > > will be homogeneous. Of course you can write np.array([1, 'spam'],
> > > dtype='object'), but in that case it will also be a homogeneous
> > > array, just of type `object`.
> > >
> > > > Inhomogeneous data may rule out some optimizations, but that
> > > > hardly means that it "doesn't make sense" to use it.
> > >
> > > I did not say that it "doesn't make sense". I only said that you
> > > would have to be lucky to call `..method()` on a collection of
> > > heterogeneous data. Therefore this kind of operation usually implies
> > > that you are working with homogeneous data. Unfortunately, built-in
> > > containers cannot provide such a guarantee without self-checking.
> > > Therefore, in my opinion, such an operator is not needed at the
> > > moment.
> >
> > Here's a question: when you use a subscript on a vector, does it
> > apply to the vector itself, or its members?
> >
> > For example, given:
> >
> > >>> my_strings = Vector(['one', 'two', 'three'])
> >
> > what is:
> >
> > >>> my_strings[1 : ]
> >
> > ?
> >
> > Is it:
> >
> > Vector(['ne', 'wo', 'hree'])
> >
> > or:
> >
> > Vector(['two', 'three'])
> >
> > ?



Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread Adrien Ricocotam
I personally would like the first option to be the case. But then vectors
shouldn't be list-like but more generator-like.

On Sat, 2 Feb 2019 at 19:26, MRAB wrote:

> On 2019-02-02 09:22, Kirill Balunov wrote:
> >
> >
> > On Sat, 2 Feb 2019 at 07:33, Steven D'Aprano wrote:
> >
> >
> > I didn't say anything about a vector type.
> >
> >
> > I agree  you did not say. But since you started a new thread from the
> > one where the vector type was a little discussed, it seemed to me  that
> > it is appropriate to mention it here. Sorry about that.
> >
> >  > Therefore, it allows you to ensure that the method is present for
> > each
> >  > element in the vector. The first given example is what numpy is
> > all about
> >  > and without some guarantee that L consists of homogeneous data it
> > hardly
> >  > make sense.
> >
> > Of course it makes sense. Even numpy supports inhomogeneous data:
> >
> > py> a = np.array([1, 'spam'])
> > py> a
> > array(['1', 'spam'],
> >dtype='|S4')
> >
> >
> > Yes, numpy, at some degree, supports heterogeneous arrays. But not in
> > the way you brought it. Your example just shows homogeneous array of
> > type `'|S4'`. In the same way as `np.array([1, 1.234])` will be
> > homogeneous. Of course you can say -  np.array([1, 'spam'],
> > dtype='object'), but in this case it will also be homogeneous array, but
> > of type `object`.
> >
> > Inhomogeneous data may rule out some optimizations, but that hardly
> > means that it "doesn't make sense" to use it.
> >
> >
> > I did not say that it  "doesn't make sense". I only said that you should
> > be lucky to call `..method()` on collections of heterogeneous data. And
> > therefore, usually this kind of operations imply that you are working
> > with a "homogeneous data". Unfortunately, built-in containers cannot
> > provide such a guarantee without self-checking. Therefore, in my opinion
> > that at the moment such an operator is not needed.
> >
> Here's a question: when you use a subscript on a vector, does it apply
> to the vector itself, or its members?
>
> For example, given:
>
>  >>> my_strings = Vector(['one', 'two', 'three'])
>
> what is:
>
>  >>> my_strings[1 : ]
>
> ?
>
> Is it:
>
> Vector(['ne', 'wo', 'hree'])
>
> or:
>
> Vector(['two', 'three'])
>
> ?


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread MRAB

On 2019-02-02 09:22, Kirill Balunov wrote:
> On Sat, 2 Feb 2019 at 07:33, Steven D'Aprano wrote:
>
> > I didn't say anything about a vector type.
>
> I agree, you did not. But since you started a new thread from the
> one where the vector type was briefly discussed, it seemed appropriate
> to me to mention it here. Sorry about that.
>
> > > Therefore, it allows you to ensure that the method is present for
> > > each element in the vector. The first given example is what numpy
> > > is all about, and without some guarantee that L consists of
> > > homogeneous data it hardly makes sense.
> >
> > Of course it makes sense. Even numpy supports inhomogeneous data:
> >
> > py> a = np.array([1, 'spam'])
> > py> a
> > array(['1', 'spam'],
> >       dtype='|S4')
>
> Yes, numpy supports heterogeneous arrays to some degree, but not in
> the way you presented it. Your example just shows a homogeneous array
> of type `'|S4'`, in the same way that `np.array([1, 1.234])` will be
> homogeneous. Of course you can write np.array([1, 'spam'],
> dtype='object'), but in that case it will also be a homogeneous array,
> just of type `object`.
>
> > Inhomogeneous data may rule out some optimizations, but that hardly
> > means that it "doesn't make sense" to use it.
>
> I did not say that it "doesn't make sense". I only said that you would
> have to be lucky to call `..method()` on a collection of heterogeneous
> data. Therefore this kind of operation usually implies that you are
> working with homogeneous data. Unfortunately, built-in containers
> cannot provide such a guarantee without self-checking. Therefore, in
> my opinion, such an operator is not needed at the moment.


Here's a question: when you use a subscript on a vector, does it apply 
to the vector itself, or its members?


For example, given:

>>> my_strings = Vector(['one', 'two', 'three'])

what is:

>>> my_strings[1 : ]

?

Is it:

Vector(['ne', 'wo', 'hree'])

or:

Vector(['two', 'three'])

?
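
Both readings are easy to implement, which is what makes the question a design decision rather than a technical one. In this sketch the Vector class is hypothetical and the method name each is invented to give the second reading a spelling.

```python
class Vector:
    """Hypothetical container used to show both readings."""

    def __init__(self, items):
        self.items = list(items)

    def __repr__(self):
        return f"Vector({self.items!r})"

    def __getitem__(self, index):
        # Reading 1: the subscript addresses the vector itself.
        if isinstance(index, slice):
            return Vector(self.items[index])
        return self.items[index]

    def each(self, index):
        # Reading 2: the subscript is applied to every member.
        return Vector(item[index] for item in self.items)


my_strings = Vector(["one", "two", "three"])
on_vector = my_strings[1:]
on_members = my_strings.each(slice(1, None))
```

Here ``[...]`` keeps its usual container meaning and the per-member form needs an explicit name, which matches the "first option" reading.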


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread MRAB

On 2019-02-02 08:58, Steven D'Aprano wrote:
> On Sat, Feb 02, 2019 at 05:10:14AM +, MRAB wrote:
> > On 2019-02-02 04:32, Steven D'Aprano wrote:
> > [snip]
> > >
> > > Of course it makes sense. Even numpy supports inhomogeneous data:
> > >
> > [snip]
> >
> > "inhomogeneous"? Who came up with that?
>
> I don't know, but it has been used since at least the early 1920s
>
> https://english.stackexchange.com/questions/194906/heterogeneous-vs-inhomogeneous
>
> and the Oxford dictionary describes "inhomogeneity" as being used from
> the late 19th century. So my guess is, probably people who were more
> familiar with Latin and Greek than we are.
>
> There are many words that are derived from both Latin and Greek. There's
> no rule that says that because a word was derived from Greek, we must
> use Greek grammatical forms for it. We are speaking English, not Greek,
> and in English, we can negate words using the "in" prefix.

Well, if we were using an English prefix, wouldn't it be "unhomogeneous"?


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread Adrien Ricocotam
@D’Aprano, I think you were misled by what I said; sorry for not being
crystal clear.

I have just read the link on Julia (which I hadn’t done before), so I get
what you mean now, and it’s not very different from what I said.

I proposed introducing a new type: « vector ».
A few steps have been made in Python for typing and I think the next step
is having typed collections. Keeping with nothing checked is better imo.
So if we take this next step, we’ll get a vector type with *not-guaranteed*
homogeneous data. Whether its type is « object », « int » or anything else
doesn’t matter as long as it’s supposed to be the same.

This doesn’t change anything in terms of usage. Of course we should/could
use map and the usual operators on collections. What I was then proposing,
to complete what you suggested and because I don’t like the dot notation, is
using the matrix-multiplication operator the same way the dots are used in
Julia.

But I have a question. I have never coded anything at the C level, nor a
compiler: is it possible for user-defined types to make the vectorization
optimized the same way it’s done with numbers in numpy?

If yes, I think it would benefit the community. If no, it’s less likely,
though it’s pursuing the steps made with typing.

On Sat 2 Feb 2019 at 10:23, Kirill Balunov  wrote:

>
>
> On Sat, 2 Feb 2019 at 07:33, Steven D'Aprano wrote:
>
>>
>> I didn't say anything about a vector type.
>>
>>
> I agree  you did not say. But since you started a new thread from the one
> where the vector type was a little discussed, it seemed to me  that it is
> appropriate to mention it here. Sorry about that.
>
>
>> > Therefore, it allows you to ensure that the method is present for each
>> > element in the vector. The first given example is what numpy is all
>> about
>> > and without some guarantee that L consists of homogeneous data it hardly
>> > make sense.
>>
>> Of course it makes sense. Even numpy supports inhomogeneous data:
>>
>> py> a = np.array([1, 'spam'])
>> py> a
>> array(['1', 'spam'],
>>   dtype='|S4')
>>
>>
> Yes, numpy, at some degree, supports heterogeneous arrays. But not in the
> way you brought it. Your example just shows homogeneous array of type
> `'|S4'`. In the same way as `np.array([1, 1.234])` will be homogeneous. Of
> course you can say -  np.array([1, 'spam'], dtype='object'), but in this
> case it will also be homogeneous array, but of type `object`.
>
>
>> Inhomogeneous data may rule out some optimizations, but that hardly
>> means that it "doesn't make sense" to use it.
>>
>
> I did not say that it  "doesn't make sense". I only said that you should
> be lucky to call `..method()` on collections of heterogeneous data. And
> therefore, usually this kind of operations imply that you are working with
> a "homogeneous data". Unfortunately, built-in containers cannot provide
> such a guarantee without self-checking. Therefore, in my opinion that at
> the moment such an operator is not needed.
>
> With kind regards,
> -gdg
>


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread Kirill Balunov
On Sat, 2 Feb 2019 at 07:33, Steven D'Aprano wrote:

>
> I didn't say anything about a vector type.
>
>
I agree, you did not. But since you started a new thread from the one
where the vector type was briefly discussed, it seemed appropriate to me
to mention it here. Sorry about that.


> > Therefore, it allows you to ensure that the method is present for each
> > element in the vector. The first given example is what numpy is all about,
> > and without some guarantee that L consists of homogeneous data it hardly
> > makes sense.
>
> Of course it makes sense. Even numpy supports inhomogeneous data:
>
> py> a = np.array([1, 'spam'])
> py> a
> array(['1', 'spam'],
>       dtype='|S4')
>
>
Yes, numpy supports heterogeneous arrays to some degree, but not in the
way you presented it. Your example just shows a homogeneous array of type
`'|S4'`, in the same way that `np.array([1, 1.234])` will be homogeneous. Of
course you can write np.array([1, 'spam'], dtype='object'), but in that
case it will also be a homogeneous array, just of type `object`.


> Inhomogeneous data may rule out some optimizations, but that hardly
> means that it "doesn't make sense" to use it.
>

I did not say that it "doesn't make sense". I only said that you would have
to be lucky to call `..method()` on a collection of heterogeneous data.
Therefore this kind of operation usually implies that you are working with
homogeneous data. Unfortunately, built-in containers cannot provide
such a guarantee without self-checking. Therefore, in my opinion, such an
operator is not needed at the moment.
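
To illustrate what "self-checking" would involve, here is a minimal sketch. The helper name common_type is invented, and it compares exact types only, ignoring subclasses, which is an assumption of this sketch.

```python
def common_type(items):
    """Return the single exact type shared by all items, else None."""
    types = {type(item) for item in items}
    return types.pop() if len(types) == 1 else None


homogeneous = common_type(["spam", "eggs"])
mixed = common_type([1, "spam"])

# A method can be mapped safely only if the shared type has it.
can_upper = homogeneous is not None and hasattr(homogeneous, "upper")
cannot_upper = mixed is not None and hasattr(mixed, "upper")
```

The check walks the whole container every time, which is the cost a built-in list would have to pay to give the guarantee described above.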

With kind regards,
-gdg


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread Steven D'Aprano
On Sat, Feb 02, 2019 at 02:06:56AM -0500, Alex Walters wrote:

> "Television" as a word must annoy you :)  I mentally replaced
> "inhomogeneous" with "heterogeneous"

They don't mean the same thing.

https://english.stackexchange.com/questions/194906/heterogeneous-vs-inhomogeneous

-- 
Steven


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-02 Thread Steven D'Aprano
On Sat, Feb 02, 2019 at 05:10:14AM +, MRAB wrote:
> On 2019-02-02 04:32, Steven D'Aprano wrote:
> [snip]
> >
> >Of course it makes sense. Even numpy supports inhomogeneous data:
> >
> [snip]
> 
> "inhomogeneous"? Who came up with that?

I don't know, but it has been used since at least the early 1920s

https://english.stackexchange.com/questions/194906/heterogeneous-vs-inhomogeneous

and the Oxford dictionary describes "inhomogeneity" as being used from 
the late 19th century. So my guess is, probably people who were more 
familiar with Latin and Greek than we are.

There are many words that are derived from both Latin and Greek. There's 
no rule that says that because a word was derived from Greek, we must 
use Greek grammatical forms for it. We are speaking English, not Greek, 
and in English, we can negate words using the "in" prefix.



-- 
Steven


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-01 Thread MRAB

On 2019-02-02 04:32, Steven D'Aprano wrote:
[snip]
>
> Of course it makes sense. Even numpy supports inhomogeneous data:
>
[snip]

"inhomogeneous"? Who came up with that?

 "in-" is a negative prefix in Latin words, but "homogeneous" 
comes from Greek, where the negative prefix is "a-" (or "an-" before a 
vowel). I'd go with either "heterogeneous" or "non-homogeneous". 



Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-01 Thread David Mertz
On Fri, Feb 1, 2019, 6:16 PM Adrien Ricocotam wrote:

> A thing I thought about but I'm not satisfied with is using the new
> matrix-multiplication operator:
>
> my_string_vector @ str.lower
>
> def compute_grad(a_student):
>     return "you bad"
> my_student_vector @ compute_grad
>

This is certainly doable. But why would it be better than:

map(str.lower, my_string_vector)
map(compute_grad, my_student_vector)

These latter seem obvious, clear, and familiar.
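
A runnable version of the two `map` calls, as a minimal sketch. The sample
data is made up for illustration, and `list()` is added because `map`
returns a lazy iterator in Python 3:

```python
my_string_vector = ["Spam", "EGGS", "Ham"]
my_student_vector = ["alice", "bob"]  # placeholder data for illustration


def compute_grad(a_student):
    return "you bad"


# map() is lazy in Python 3, so wrap it in list() to see the results.
lowered = list(map(str.lower, my_string_vector))
grades = list(map(compute_grad, my_student_vector))
```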


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-01 Thread Adrien Ricocotam
I think the actual proposal is to have a new type of list (i.e. vectors)
that works like numpy but for any data.
Instead of a list where the user has to be sure all the data is of the same
type, a vector assures them that it is full of the same kind of data, which
can be processed using a particular function (as they would do with map).

I think the proposed syntax is not great: it's kind of unique in Python and
doesn't feel pythonic to me. One thing I thought about, but am not satisfied
with, is using the new matrix-multiplication operator:

my_string_vector @ str.lower

def compute_grad(a_student):
    return "you bad"
my_student_vector @ compute_grad

But it's a bit confusing to me.
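
For what it's worth, the `@` spelling can already be tried today with a
small list subclass. This is only a sketch: the class name `Vector` and its
behaviour are assumptions for illustration, not an existing type:

```python
class Vector(list):
    """Hypothetical list subclass where `vec @ func` maps func over vec."""

    def __matmul__(self, func):
        if not callable(func):
            # Let Python raise the usual TypeError for unsupported operands.
            return NotImplemented
        return Vector(func(item) for item in self)


def compute_grad(a_student):
    return "you bad"


my_string_vector = Vector(["Spam", "EGGS"])
my_student_vector = Vector(["alice", "bob"])  # placeholder data

lowered = my_string_vector @ str.lower
grades = my_student_vector @ compute_grad
```

Note that this puts the function on the right-hand side of the operator,
which is part of why it reads oddly.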

On Fri, 1 Feb 2019 at 17:04, Kirill Balunov wrote:

>
>
> On Fri, 1 Feb 2019 at 02:24, Steven D'Aprano wrote:
>
>> On Thu, Jan 31, 2019 at 09:51:20AM -0800, Chris Barker via Python-ideas
>> wrote:
>>
>> > I do a lot of numerical programming, and used to use MATLAB and now numpy a
>> > lot. So I am very used to "vectorization" -- i.e. having operations that
>> > work on a whole collection of items at once.
>> [...]
>> > You can imagine that for more complex expressions the "vectorized" approach
>> > can make for much clearer and easier to parse code. Also much faster, which
>> > is what is usually talked about, but I think the readability is the bigger
>> > deal.
>>
>> Julia has a special "dot" vectorize operator that looks like this:
>>
>>  L .+ 1   # adds 1 to each item in L
>>
>>  func.(L)   # calls func on each item in L
>>
>> https://julialang.org/blog/2017/01/moredots
>>
>> The beauty of this is that you can apply it to any function or operator
>> and the compiler will automatically vectorize it. The function doesn't
>> have to be written to specifically support vectorization.
>>
>>
> IMO, the beauty of a vector type is that it contains homogeneous data.
> Therefore, it allows you to ensure that the method is present for each
> element in the vector. The first given example is what numpy is all about,
> and without some guarantee that L consists of homogeneous data it hardly
> makes sense. The second one is just `map`. So I can't tell what you are
> proposing:
>
> 1. To make an operator form of `map`.
> 2. To pull numpy into stdlib.
> 3. Or something else, which is not obvious to me from the examples given.
>
> With kind regards,
> -gdg
>
>
>>
>> > So what does this have to do with the topic at hand?
>> >
>> > I know that when I'm used to working with numpy and then need to do some
>> > string processing or some such, I find myself missing this "vectorization"
>> > -- if I want to do the same operation on a whole bunch of strings, why do I
>> > need to write a loop or comprehension or map? that is:
>> >
>> > [s.lower() for s in a_list_of_strings]
>> >
>> > rather than:
>> >
>> > a_list_of_strings.lower()
>>
>> Using Julia syntax, that might become a_list_of_strings..lower(). If you
>> don't like the double dot, perhaps str.lower.(a_list_of_strings) would
>> be less ugly.
>>
>>
>>
>> --
>> Steven


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-02-01 Thread Kirill Balunov
On Fri, 1 Feb 2019 at 02:24, Steven D'Aprano wrote:

> On Thu, Jan 31, 2019 at 09:51:20AM -0800, Chris Barker via Python-ideas
> wrote:
>
> > I do a lot of numerical programming, and used to use MATLAB and now numpy a
> > lot. So I am very used to "vectorization" -- i.e. having operations that
> > work on a whole collection of items at once.
> [...]
> > You can imagine that for more complex expressions the "vectorized" approach
> > can make for much clearer and easier to parse code. Also much faster, which
> > is what is usually talked about, but I think the readability is the bigger
> > deal.
>
> Julia has a special "dot" vectorize operator that looks like this:
>
>  L .+ 1   # adds 1 to each item in L
>
>  func.(L)   # calls func on each item in L
>
> https://julialang.org/blog/2017/01/moredots
>
> The beauty of this is that you can apply it to any function or operator
> and the compiler will automatically vectorize it. The function doesn't
> have to be written to specifically support vectorization.
>
>
IMO, the beauty of a vector type is that it contains homogeneous data.
Therefore, it allows you to ensure that the method is present for each
element in the vector. The first given example is what numpy is all about,
and without some guarantee that L consists of homogeneous data it hardly
makes sense. The second one is just `map`. So I can't tell what you are
proposing:

1. To make an operator form of `map`.
2. To pull numpy into stdlib.
3. Or something else, which is not obvious to me from the examples given.

With kind regards,
-gdg


>
> > So what does this have to do with the topic at hand?
> >
> > I know that when I'm used to working with numpy and then need to do some
> > string processing or some such, I find myself missing this "vectorization"
> > -- if I want to do the same operation on a whole bunch of strings, why do I
> > need to write a loop or comprehension or map? that is:
> >
> > [s.lower() for s in a_list_of_strings]
> >
> > rather than:
> >
> > a_list_of_strings.lower()
>
> Using Julia syntax, that might become a_list_of_strings..lower(). If you
> don't like the double dot, perhaps str.lower.(a_list_of_strings) would
> be less ugly.
>
>
>
> --
> Steven


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-01-31 Thread Robert Vanden Eynde
I love moredots ❤️

With pip install funcoperators, one can implement the *dotmul* iff dotmul
can be implemented as a function.

L *dotmul* 1

Would work.

Or even a simple tweak to the library would allow L *dot* s to be [x*s for
x in L] and L /dot/ s to be [x/s for x in L].

I'd implement something like "if left is iterable and right is not, apply
[x*y for x in left] else if both are iterable, apply [x*y for x,y in
zip(left, right)] etc."
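
The dispatch rule described above can be sketched as a plain function,
without relying on funcoperators itself. The names `dot` and `_is_vector`
are hypothetical, the "right operand only" branch is my guess at the
"etc.", and treating strings as scalars is also an assumption:

```python
import operator
from collections.abc import Iterable


def _is_vector(obj):
    # Assumption: strings and bytes count as scalars even though they
    # are technically iterable.
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


def dot(op, left, right):
    """Apply the binary function `op` element-wise where operands are iterable."""
    if _is_vector(left) and _is_vector(right):
        return [op(x, y) for x, y in zip(left, right)]
    if _is_vector(left):
        return [op(x, right) for x in left]
    if _is_vector(right):
        return [op(left, y) for y in right]
    return op(left, right)


scaled = dot(operator.mul, [1, 2, 3], 2)
halved = dot(operator.truediv, [2, 4], 2)
pairwise = dot(operator.mul, [1, 2, 3], [4, 5, 6])
```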

Disclaimer : I'm the creator of funcoperators

On Fri, 1 Feb 2019, 00:23 Steven D'Aprano wrote:

> On Thu, Jan 31, 2019 at 09:51:20AM -0800, Chris Barker via Python-ideas
> wrote:
>
> > I do a lot of numerical programming, and used to use MATLAB and now numpy a
> > lot. So I am very used to "vectorization" -- i.e. having operations that
> > work on a whole collection of items at once.
> [...]
> > You can imagine that for more complex expressions the "vectorized" approach
> > can make for much clearer and easier to parse code. Also much faster, which
> > is what is usually talked about, but I think the readability is the bigger
> > deal.
>
> Julia has a special "dot" vectorize operator that looks like this:
>
>  L .+ 1   # adds 1 to each item in L
>
>  func.(L)   # calls func on each item in L
>
> https://julialang.org/blog/2017/01/moredots
>
> The beauty of this is that you can apply it to any function or operator
> and the compiler will automatically vectorize it. The function doesn't
> have to be written to specifically support vectorization.
>
>
> > So what does this have to do with the topic at hand?
> >
> > I know that when I'm used to working with numpy and then need to do some
> > string processing or some such, I find myself missing this "vectorization"
> > -- if I want to do the same operation on a whole bunch of strings, why do I
> > need to write a loop or comprehension or map? that is:
> >
> > [s.lower() for s in a_list_of_strings]
> >
> > rather than:
> >
> > a_list_of_strings.lower()
>
> Using Julia syntax, that might become a_list_of_strings..lower(). If you
> don't like the double dot, perhaps str.lower.(a_list_of_strings) would
> be less ugly.
>
>
>
> --
> Steven


Re: [Python-ideas] Vectorization [was Re: Add list.join() please]

2019-01-31 Thread David Allemang
I accidentally replied only to Steven - sorry! - this is what I said, with
a typo corrected:

> a_list_of_strings..lower()
>
> str.lower.(a_list_of_strings)

I much prefer this solution to any of the other things discussed so far. I
wonder, though, would it be general enough to simply have this new '.' operator
interact with __iter__, or would there have to be new magic methods like
__veccall__, __vecgetattr__, etc? Would a single __vectorize__ magic method
be enough?

For example, I would expect   (1, 2, 3) .** 2   to evaluate as a tuple and
 [1, 2, 3] .** 2   to evaluate as a list, and   some_generator() .** 2   to
still be a generator.

If there were a   __vectorize__(self, func)   which returned the iterable
result of applying func on each element of self:

class list:
    def __vectorize__(self, func):
        return [func(e) for e in self]

some_list .* other     becomes   some_list.__vectorize__(lambda e: e * other)
some_string..lower()   becomes   some_string.__vectorize__(str.lower)
some_list..attr        becomes   some_list.__vectorize__(operator.attrgetter('attr'))

Perhaps there would be a better name for such a magic method, but I believe
it would allow existing sequences to behave as one might expect, without
requiring each operator to have its own definition. I might also be
over-complicating this, but I'm not sure how else to allow different
sequences to give results of their own type.
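
Since built-in types can't be given new methods from Python code, here is a
minimal sketch of the same idea as a free function. The name `vectorize` and
the `__vectorize__` hook are hypothetical; nothing like them exists in
Python today:

```python
def vectorize(obj, func):
    """Apply func to each element, trying to keep the container's type."""
    # A user-defined class could opt in through the hypothetical hook.
    custom = getattr(type(obj), "__vectorize__", None)
    if custom is not None:
        return custom(obj, func)
    # Emulate what list and tuple might do if they defined the hook.
    if type(obj) in (list, tuple):
        return type(obj)(func(e) for e in obj)
    # Generators and any other iterable stay lazy.
    return (func(e) for e in obj)


squares_tuple = vectorize((1, 2, 3), lambda e: e ** 2)
squares_list = vectorize([1, 2, 3], lambda e: e ** 2)
squares_gen = vectorize((n for n in range(1, 4)), lambda e: e ** 2)
```

With a single hook like this, `(1, 2, 3) .** 2` would only need to expand to
one call, whatever the operator.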

On Thu, Jan 31, 2019 at 6:24 PM Steven D'Aprano  wrote:

> On Thu, Jan 31, 2019 at 09:51:20AM -0800, Chris Barker via Python-ideas
> wrote:
>
> > I do a lot of numerical programming, and used to use MATLAB and now numpy a
> > lot. So I am very used to "vectorization" -- i.e. having operations that
> > work on a whole collection of items at once.
> [...]
> > You can imagine that for more complex expressions the "vectorized" approach
> > can make for much clearer and easier to parse code. Also much faster, which
> > is what is usually talked about, but I think the readability is the bigger
> > deal.
>
> Julia has a special "dot" vectorize operator that looks like this:
>
>  L .+ 1   # adds 1 to each item in L
>
>  func.(L)   # calls func on each item in L
>
> https://julialang.org/blog/2017/01/moredots
>
> The beauty of this is that you can apply it to any function or operator
> and the compiler will automatically vectorize it. The function doesn't
> have to be written to specifically support vectorization.
>
>
> > So what does this have to do with the topic at hand?
> >
> > I know that when I'm used to working with numpy and then need to do some
> > string processing or some such, I find myself missing this "vectorization"
> > -- if I want to do the same operation on a whole bunch of strings, why do I
> > need to write a loop or comprehension or map? that is:
> >
> > [s.lower() for s in a_list_of_strings]
> >
> > rather than:
> >
> > a_list_of_strings.lower()
>
> Using Julia syntax, that might become a_list_of_strings..lower(). If you
> don't like the double dot, perhaps str.lower.(a_list_of_strings) would
> be less ugly.
>
>
>
> --
> Steven


[Python-ideas] Vectorization [was Re: Add list.join() please]

2019-01-31 Thread Steven D'Aprano
On Thu, Jan 31, 2019 at 09:51:20AM -0800, Chris Barker via Python-ideas wrote:

> I do a lot of numerical programming, and used to use MATLAB and now numpy a
> lot. So I am very used to "vectorization" -- i.e. having operations that
> work on a whole collection of items at once.
[...]
> You can imagine that for more complex expressions the "vectorized" approach
> can make for much clearer and easier to parse code. Also much faster, which
> is what is usually talked about, but I think the readability is the bigger
> deal.

Julia has a special "dot" vectorize operator that looks like this:

 L .+ 1   # adds 1 to each item in L

 func.(L)   # calls func on each item in L

https://julialang.org/blog/2017/01/moredots

The beauty of this is that you can apply it to any function or operator 
and the compiler will automatically vectorize it. The function doesn't 
have to be written to specifically support vectorization.


> So what does this have to do with the topic at hand?
> 
> I know that when I'm used to working with numpy and then need to do some
> string processing or some such, I find myself missing this "vectorization"
> -- if I want to do the same operation on a whole bunch of strings, why do I
> need to write a loop or comprehension or map? that is:
> 
> [s.lower() for s in a_list_of_strings]
> 
> rather than:
> 
> a_list_of_strings.lower()

Using Julia syntax, that might become a_list_of_strings..lower(). If you 
don't like the double dot, perhaps str.lower.(a_list_of_strings) would 
be less ugly.
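
In today's Python the two spellings would correspond roughly to the
following. This is only a sketch of possible semantics, not a defined
translation of the hypothetical syntax:

```python
from operator import methodcaller

a_list_of_strings = ["Spam", "EGGS"]

# a_list_of_strings..lower(): look up .lower on each element, so it works
# for any element type that has such a method.
via_method = list(map(methodcaller("lower"), a_list_of_strings))

# str.lower.(a_list_of_strings): call one fixed function on each element,
# so every element must be a str.
via_function = [str.lower(s) for s in a_list_of_strings]
```

The two differ only when the elements are not all `str`: the first
dispatches per element, the second raises `TypeError` on a non-string.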



-- 
Steven