[Numpy-discussion] Proposal to support __format__

2017-02-14 Thread Gustav Larsson
Hi everyone!

I want to discuss adding support for __format__ in ndarray and I am willing to
contribute code-wise once consensus has been reached. It was briefly
discussed on GitHub two years ago (https://github.com/numpy/numpy/issues/5543)
and I will re-iterate some of the points made there and build off of that. I
have been thinking about this a lot in the last few weeks and my thoughts turned
into a fairly fleshed out proposal. The discussion should probably start more
high-level, so I apologize if the level of detail is inappropriate at this
point in time.

I decided on a gist, since the email got too long and clear formatting helps:

https://gist.github.com/gustavla/2783543be1204d2b5d368f6a1fb4d069

OK, those are my thoughts for now. What do you think?

Cheers,
Gustav
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal to support __format__

2017-02-14 Thread Gustav Larsson
>
> I encourage you to submit it as a pull request to the NumPy repository as
> a "NumPy Enhancement Proposal", either now or after we've discussed it:
> https://docs.scipy.org/doc/numpy-dev/neps/index.html


OK, I will let it go through one iteration of comments and then I'll submit
one. Thanks!

> 1. For object arrays, I would default to calling format on each element
> (your "map principle") rather than raising an error.


I'm glad you brought this up as a possibility. It might be possible, but
there are some issues that would need to be resolved. First of all, {} and
{:} always work and give the same result as they do today, so this only
affects the case where the format spec is non-empty. I think there are
two main issues:

Heterogeneity: Let's say we have x = np.array([12.3, True, 'string',
Foo(10)], dtype=object). Then, presumably, {:.1f} should raise a
ValueError, since the string does not support format type 'f'. This could
create a lot of ValueError land mines for the user. For x[:2], however, it
should work and produce something like [12.3  1.0]. Note that the "map
principle" still can't be strictly true. Let's say we have an array with
dtype object and mostly string-like elements. Then {:5s} will still not
produce exactly {:5s} element-wise, because the string representations need
to be repr-based inside the array (otherwise newlines and the like could
break the output, and embedded spaces could make the boundary between
elements ambiguous). This brings me to the next issue.
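
To make the heterogeneity concern concrete, here is a minimal sketch in
plain Python, with no NumPy involved. The helper format_each and the class
Foo are hypothetical stand-ins for illustration only; they are not part of
the proposal or of any library.

```python
class Foo:
    """Hypothetical user type that does not define __format__."""

    def __init__(self, value):
        self.value = value


def format_each(items, spec):
    """Apply the same format spec to every element (the "map principle")."""
    return [format(item, spec) for item in items]


x = [12.3, True, "string", Foo(10)]

# The first two elements both accept the 'f' format type.
first_two = format_each(x[:2], ".1f")

# The full list fails as soon as the string element is reached.
try:
    format_each(x, ".1f")
except ValueError as exc:
    error = exc
```

Nothing here validates the spec up front; the ValueError comes from the
str element's own format call.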

Str vs. repr: If we have a homogeneous object array whose elements are all
of type Foo, and Foo implements __format__, it would be great if this
worked. However, one issue is that Foo.__format__ might return things like
newlines (or spaces), which would break (or confuse) the printed output
(unless it is made incredibly smart to support "vertical alignment"). This
issue is essentially the same as for strings in general, which is why they
use repr instead. I can think of two solutions: 1) Try to sanitize (or
repr-ify) the string returned by __format__ somehow; 2) Put the
responsibility on the user and simply let the rendering break if
Foo.__format__ does not play well.
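
The two options can be compared with a small sketch. Everything in it is an
assumption made for illustration: Foo is a hypothetical class whose
__format__ returns a newline, and render is a hypothetical helper that
mimics a bracketed array printout. Using repr() for option 1 is just one
possible way to sanitize.

```python
class Foo:
    """Hypothetical type whose __format__ output contains a newline."""

    def __init__(self, value):
        self.value = value

    def __format__(self, spec):
        return "Foo:\n" + format(self.value, spec)


def render(items, spec, sanitize):
    """Join formatted elements in brackets, optionally repr-ifying each one."""
    parts = [format(item, spec) for item in items]
    if sanitize:
        # Option 1: escape newlines and add quotes via repr().
        parts = [repr(part) for part in parts]
    # Option 2 (sanitize=False): use the strings as returned.
    return "[" + " ".join(parts) + "]"


items = [Foo(1.5), Foo(2.0)]
raw = render(items, ".1f", sanitize=False)   # spans several lines
safe = render(items, ".1f", sanitize=True)   # stays on one line
```

With sanitize=False the output contains real newlines, so element
boundaries are hard to see; with sanitize=True each element is quoted and
the newline appears as an escape sequence.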

> 2. It's absolutely OK to leave functionality unimplemented and not
> immediately nail down every edge case. As a default, I would suggest
> raising errors whenever non-empty type specifications are provided rather
> than raising errors in every case.
>

I agree.

Gustav


On Tue, Feb 14, 2017 at 3:59 PM, Stephan Hoyer <sho...@gmail.com> wrote:

> On Tue, Feb 14, 2017 at 3:34 PM, Gustav Larsson <lars...@cs.uchicago.edu>
> wrote:
>
>> Hi everyone!
>>
>> I want to discuss adding support for __format__ in ndarray and I am willing
>> to contribute code-wise once consensus has been reached. It was briefly
>> discussed on GitHub two years ago (https://github.com/numpy/numpy/issues/5543)
>> and I will re-iterate some of the points made there and build off of that. I
>> have been thinking about this a lot in the last few weeks and my thoughts
>> turned into a fairly fleshed out proposal. The discussion should probably
>> start more high-level, so I apologize if the level of detail is inappropriate
>> at this point in time.
>>
>> I decided on a gist, since the email got too long and clear formatting helps:
>>
>> https://gist.github.com/gustavla/2783543be1204d2b5d368f6a1fb4d069
>
>
> This is a lovely and clearly written document. Thanks for taking the time
> to think through this!
>
> I encourage you to submit it as a pull request to the NumPy repository as
> a "NumPy Enhancement Proposal", either now or after we've discussed it:
> https://docs.scipy.org/doc/numpy-dev/neps/index.html
>
>
>> OK, those are my thoughts for now. What do you think?
>>
>
> Two thoughts for now:
> 1. For object arrays, I would default to calling format on each element
> (your "map principle") rather than raising an error.
> 2. It's absolutely OK to leave functionality unimplemented and not
> immediately nail down every edge case. As a default, I would suggest
> raising errors whenever non-empty type specifications are provided rather
> than raising errors in every case.
>


Re: [Numpy-discussion] Proposal to support __format__

2017-02-15 Thread Gustav Larsson
>
> This is great!


Thanks! Glad to be met by enthusiasm about this.

> 1. You basically have a NEP already! Making a PR from it allows to
> give line-by-line comments, so would help!


I will do this soon.

> 2. Don't worry about supporting python2 specifics; just try to ensure
> it doesn't break; I would not say more about it!


Sounds good to me.

> 3. On `set_printoptions` -- ideally, it will become possible to use
> this as a context (i.e., `with set_printoptions(...)`). It might make
> sense to have an `override_format` keyword argument to it.


Having a `with np.printoptions(...)` context manager is a great idea. It
does sound orthogonal to __format__ though, so it could be addressed
separately.

> 4. Otherwise, my main suggestion is to start small with the more
> obvious ones, and not worry too much about format validation, but
> rather about getting the simple ones to work well (e.g., for an object
> array, just apply the format given; if it doesn't work, it will error
> out on its own, which is OK).


Sounds good to me. I was thinking of approaching the implementation by
writing unit tests first and grouping them into different priority tiers.
That way, the unit tests can go through another review before
implementation gets going. I agree that __format__ doesn't have to validate
the format spec itself if a ValueError is going to be raised anyway by
sub-calls.
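
As a rough sketch of what such tiered tests could look like, assuming a
hypothetical helper format_elements that stands in for the not-yet-written
ndarray.__format__ (the tier names and test cases are illustrative only,
not an agreed plan):

```python
import unittest


def format_elements(items, spec):
    """Hypothetical stand-in: apply spec to each element and join the results."""
    return "[" + " ".join(format(item, spec) for item in items) + "]"


class Tier1SimpleNumericSpecs(unittest.TestCase):
    """Highest priority: common numeric specs on homogeneous data."""

    def test_fixed_precision(self):
        self.assertEqual(format_elements([1.0, 2.5], ".1f"), "[1.0 2.5]")


class Tier2ErrorPropagation(unittest.TestCase):
    """Lower priority: bad specs fail through the element's own format call."""

    def test_unsupported_type_raises(self):
        with self.assertRaises(ValueError):
            format_elements(["string"], ".1f")
```

Saved as a test module, these can be run with python -m unittest, and
tiers can be reviewed or enabled one class at a time.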

> 5. One bit of detail: the "g" one does confuse me.


I will rewrite this a bit to make it clearer. Basically, the 'g' behavior,
with its mix of 'e' and 'f' depending on whether max/min > 1000, is taken
from current numpy behavior, so it is not something I had much creative
input on at all, although the way it is written right now may suggest
otherwise. That is, the goal is to have {:} == {:g} for float arrays,
analogous to how {:} behaves much like {:g} for built-in floats. Then, if
the user departs a bit, as with {:.2g}, the result will simply be identical
to calling np.set_printoptions(precision=2) first.
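
For reference, this is how the analogy looks for built-in Python floats.
The sketch only exercises the standard format() built-in; it says nothing
about what the array version would print. Note that for built-in floats the
empty spec is close to, but not identical to, 'g': it always keeps at least
one digit after the decimal point.

```python
# Empty spec versus 'g' on a built-in float.
default_one = format(1.0, "")    # keeps a trailing ".0"
general_one = format(1.0, "g")   # drops insignificant trailing zeros

# '.2g' means two significant digits; 'g' switches to exponent notation
# when the decimal exponent is >= the precision (or < -4).
large = format(1234.5678, ".2g")
small = format(0.00012345, ".2g")
```

Here large comes out in exponent notation and small in fixed notation,
which is the 'e'/'f' mix that 'g' selects per value.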

Gustav

On Wed, Feb 15, 2017 at 8:03 AM, Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> Hi Gustav,
>
> This is great!  A few quick comments (mostly echo-ing Stephan's).
>
> 1. You basically have a NEP already! Making a PR from it allows to
> give line-by-line comments, so would help!
>
> 2. Don't worry about supporting python2 specifics; just try to ensure
> it doesn't break; I would not say more about it!
>
> 3. On `set_printoptions` -- ideally, it will become possible to use
> this as a context (i.e., `with set_printoptions(...)`). It might make
> sense to have an `override_format` keyword argument to it.
>
> 4. Otherwise, my main suggestion is to start small with the more
> obvious ones, and not worry too much about format validation, but
> rather about getting the simple ones to work well (e.g., for an object
> array, just apply the format given; if it doesn't work, it will error
> out on its own, which is OK).
>
> 5. One bit of detail: the "g" one does confuse me.
>
> All the best,
>
> Marten


Re: [Numpy-discussion] Arrays and format()

2017-02-28 Thread Gustav Larsson
I am hoping to submit a PR for a __format__ NumPy enhancement proposal this
weekend. It will be a slightly revised version of my original draft posted
here two weeks ago. Ryan, if you have any thoughts on the writeup so far,
I'd love to hear them.
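
The failure Ryan reports in the quoted message below can be reproduced
without NumPy. In this sketch ZeroDim is a hypothetical stand-in for a 0-d
array: it converts to float but defines no __format__, so a non-empty
format spec falls through to object.__format__. Converting to float first
is one possible workaround, not a statement about how NumPy should fix it.

```python
class ZeroDim:
    """Hypothetical stand-in for a 0-d array: float-convertible, no __format__."""

    def __init__(self, value):
        self._value = value

    def __float__(self):
        return float(self._value)


obj = ZeroDim(2)

# %-formatting with 'g' converts the operand via float(), so this works.
percent_style = "%0.3g" % obj

# str.format passes the spec to object.__format__, which rejects any
# non-empty spec with a TypeError.
try:
    "{0:0.3g}".format(obj)
except TypeError as exc:
    format_error = exc

# Possible workaround: convert explicitly before formatting.
workaround = "{0:0.3g}".format(float(obj))
```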

On Tue, Feb 28, 2017 at 9:38 AM, Nathan Goldbaum wrote:

> See this issue:
>
> https://github.com/numpy/numpy/issues/5543
>
> There was also a very thorough discussion of this recently on this mailing
> list:
>
> http://numpy-discussion.10968.n7.nabble.com/Proposal-to-support-format-td43931.html
>
> On Tue, Feb 28, 2017 at 11:32 AM Ryan May wrote:
>
>> Hi,
>>
>> Can someone take a look at: https://github.com/numpy/numpy/issues/7978
>>
>> The crux of the issue is that this:
>>
>> # This works
>> a = "%0.3g" % np.array(2)
>> a
>> '2'
>>
>> # This does not
>> a = "{0:0.3g}".format(np.array(2))
>> TypeError: non-empty format string passed to object.__format__
>>
>> I've now hit this in my code. If someone can even point me in the general
>> direction of the code to dig into for this (please let it be python, please
>> let it be python...), I'll dig in more.
>>
>> Ryan
>>
>> --
>> Ryan May
>>