Re: [Numpy-discussion] Preserving NumPy views when pickling

2016-10-25 Thread Nathaniel Smith
On Tue, Oct 25, 2016 at 12:38 PM, Stephan Hoyer  wrote:
> With a custom wrapper class, it's possible to preserve NumPy views when
> pickling:
> https://stackoverflow.com/questions/13746601/preserving-numpy-view-when-pickling
>
> This can result in significant time/space savings when pickling views along
> with base arrays and brings the behavior of NumPy more in line with Python
> proper. Is this something that we can/should port into NumPy itself?

Concretely, what would you suggest should happen with:

base = np.zeros(1)
view = base[:10]

# case 1
pickle.dump(view, file)

# case 2
pickle.dump(base, file)
pickle.dump(view, file)

# case 3
pickle.dump(view, file)
pickle.dump(base, file)

?

-- 
Nathaniel J. Smith -- https://vorpus.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Preserving NumPy views when pickling

2016-10-25 Thread Stephan Hoyer
On Tue, Oct 25, 2016 at 1:07 PM, Nathaniel Smith  wrote:

> Concretely, what would you suggest should happen with:
>
> base = np.zeros(1)
> view = base[:10]
>
> # case 1
> pickle.dump(view, file)
>
> # case 2
> pickle.dump(base, file)
> pickle.dump(view, file)
>
> # case 3
> pickle.dump(view, file)
> pickle.dump(base, file)
>
> ?
>

I see what you're getting at here. We would need a rule for when to include
the base in the pickle and when not to. Otherwise, pickle.dump(view, file)
always contains data from the base pickle, even when view is much smaller
than base.

The safe answer is "only use views in the pickle when base is already being
pickled", but that isn't possible to check unless all the arrays are
together in a custom container. So, this isn't really feasible for NumPy.


[Numpy-discussion] Intel random number package

2016-10-25 Thread Charles R Harris
Hi All,

There is a proposed random number package PR now up on github:
https://github.com/numpy/numpy/pull/8209. It is from
oleksandr-pavlyk and implements the random number package using MKL
for increased speed. I think we are definitely interested in the
improved speed, but I'm not sure numpy is the best place to put the
package. I'd welcome any comments on the PR itself, as well as any
thoughts on the best way to organize or use this work. Maybe
scikit-random?

Chuck


Re: [Numpy-discussion] Intel random number package

2016-10-25 Thread Charles R Harris
On Tue, Oct 25, 2016 at 10:41 PM, Robert Kern  wrote:

> On Tue, Oct 25, 2016 at 9:34 PM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
> >
> > Hi All,
> >
> > There is a proposed random number package PR now up on github:
> > https://github.com/numpy/numpy/pull/8209. It is from
> > oleksandr-pavlyk and implements the random number package using
> > MKL for increased speed. I think we are definitely interested in the
> > improved speed, but I'm not sure numpy is the best place to put the
> > package. I'd welcome any comments on the PR itself, as well as any
> > thoughts on the best way to organize or use this work. Maybe
> > scikit-random
>
> This is what ng-numpy-randomstate is for.
>
> https://github.com/bashtage/ng-numpy-randomstate
>

Interesting, despite old fashioned original ziggurat implementation of the
normal and gnu c style... Does that project seek to preserve all the
bytestreams or is it still in flux?

Chuck


Re: [Numpy-discussion] Preserving NumPy views when pickling

2016-10-25 Thread Robert Kern
On Tue, Oct 25, 2016 at 3:07 PM, Stephan Hoyer  wrote:
>
> On Tue, Oct 25, 2016 at 1:07 PM, Nathaniel Smith  wrote:
>>
>> Concretely, what would you suggest should happen with:
>>
>> base = np.zeros(1)
>> view = base[:10]
>>
>> # case 1
>> pickle.dump(view, file)
>>
>> # case 2
>> pickle.dump(base, file)
>> pickle.dump(view, file)
>>
>> # case 3
>> pickle.dump(view, file)
>> pickle.dump(base, file)
>>
>> ?
>
> I see what you're getting at here. We would need a rule for when to
> include the base in the pickle and when not to. Otherwise,
> pickle.dump(view, file) always contains data from the base pickle, even
> when view is much smaller than base.
>
> The safe answer is "only use views in the pickle when base is already
> being pickled", but that isn't possible to check unless all the arrays are
> together in a custom container. So, this isn't really feasible for NumPy.

It would be possible with a custom Pickler/Unpickler since they already
keep track of objects previously (un)pickled. That would handle [base,
view] okay but not [view, base], so it's probably not going to be all that
useful outside of special situations. It would make a neat recipe, but I
probably would not provide it in numpy itself.
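
A minimal sketch of such a recipe, assuming Python 3.8+ (for
Pickler.reducer_override). The names ViewPreservingPickler and roundtrip are
made up for illustration and nothing like this ships with NumPy. It only
handles plain ndarrays whose base is a C-contiguous ndarray that the same
pickler has already written; everything else falls back to the normal
copy-the-data behaviour.

```python
import io
import pickle

import numpy as np


class ViewPreservingPickler(pickle.Pickler):
    """Hypothetical recipe: rebuild a view on a base that was pickled earlier."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._seen = {}  # id(array) -> array, for arrays written so far

    def reducer_override(self, obj):
        if type(obj) is not np.ndarray or obj.dtype.hasobject:
            return NotImplemented
        base = obj.base
        if (type(base) is np.ndarray and id(base) in self._seen
                and base.flags.c_contiguous):
            # Byte offset of the view's first element inside the base buffer.
            offset = (obj.__array_interface__["data"][0]
                      - base.__array_interface__["data"][0])
            # Re-create the view with the public constructor:
            # ndarray(shape, dtype, buffer, offset, strides).
            # `base` is already in the pickle memo, so only a reference
            # to it is written here, not a second copy of its data.
            return np.ndarray, (obj.shape, obj.dtype, base, offset, obj.strides)
        self._seen[id(obj)] = obj
        return NotImplemented  # normal NumPy pickling (copies the data)


def roundtrip(objs):
    buf = io.BytesIO()
    ViewPreservingPickler(buf, protocol=4).dump(objs)
    return pickle.loads(buf.getvalue())  # a stock Unpickler is enough


base = np.arange(10.0)
view = base[2:5]

base2, view2 = roundtrip([base, view])   # base first: view is rebuilt on it
view3, base3 = roundtrip([view, base])   # view first: view was copied

print(np.shares_memory(base2, view2))
print(np.shares_memory(base3, view3))
```

The second round trip shows the [view, base] limitation: when the view is
written first, the pickler has not seen its base yet and can only copy.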

--
Robert Kern


Re: [Numpy-discussion] Intel random number package

2016-10-25 Thread Robert Kern
On Tue, Oct 25, 2016 at 9:34 PM, Charles R Harris wrote:
>
> Hi All,
>
> There is a proposed random number package PR now up on github:
> https://github.com/numpy/numpy/pull/8209. It is from
> oleksandr-pavlyk and implements the random number package using
> MKL for increased speed. I think we are definitely interested in the
> improved speed, but I'm not sure numpy is the best place to put the
> package. I'd welcome any comments on the PR itself, as well as any
> thoughts on the best way to organize or use this work. Maybe
> scikit-random

This is what ng-numpy-randomstate is for.

https://github.com/bashtage/ng-numpy-randomstate

--
Robert Kern


Re: [Numpy-discussion] Intel random number package

2016-10-25 Thread Robert Kern
On Tue, Oct 25, 2016 at 10:22 PM, Charles R Harris <
charlesr.har...@gmail.com> wrote:
>
> On Tue, Oct 25, 2016 at 10:41 PM, Robert Kern wrote:
>>
>> On Tue, Oct 25, 2016 at 9:34 PM, Charles R Harris <
>> charlesr.har...@gmail.com> wrote:
>> >
>> > Hi All,
>> >
>> > There is a proposed random number package PR now up on github:
>> > https://github.com/numpy/numpy/pull/8209. It is from
>> > oleksandr-pavlyk and implements the random number package using
>> > MKL for increased speed. I think we are definitely interested in the
>> > improved speed, but I'm not sure numpy is the best place to put the
>> > package. I'd welcome any comments on the PR itself, as well as any
>> > thoughts on the best way to organize or use this work. Maybe
>> > scikit-random
>>
>> This is what ng-numpy-randomstate is for.
>>
>> https://github.com/bashtage/ng-numpy-randomstate
>
> Interesting, despite old fashioned original ziggurat implementation of
> the normal and gnu c style... Does that project seek to preserve all the
> bytestreams or is it still in flux?

I would assume some flux for now, but you can ask the author by submitting
a corrected ziggurat PR as a trial balloon. ;-)

--
Robert Kern


Re: [Numpy-discussion] Preserving NumPy views when pickling

2016-10-25 Thread Nathaniel Smith
On Tue, Oct 25, 2016 at 5:09 PM, Matthew Harrigan
 wrote:
> It seems pickle keeps track of references for basic python types.
>
> x = [1]
> y = [x]
> x,y = pickle.loads(pickle.dumps((x,y)))
> x.append(2)
> print(y)
> >>> [[1,2]]

Yes, but the problem is: suppose I have a 10 gigabyte array, and then
take a 20 byte slice of it, and then pickle that slice. Do you expect
the pickle file to be 20 bytes, or 10 gigabytes? Both options are
possible, but you have to pick one, and numpy picks 20 bytes. The
advantage is obviously that you don't have mysterious 10 gigabyte
pickle files; the disadvantage is that you can't reconstruct the view
relationships afterwards. (You might think: oh, but we can be clever,
and only record the view relationships if the user pickles both
objects together. But while pickle might know whether the user is
pickling both objects together, it unfortunately doesn't tell numpy,
so we can't really do anything clever or different in this case.)
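
A small sketch of that trade-off, scaled down so it runs quickly (the array
sizes here are arbitrary stand-ins for the 10 gigabyte / 20 byte case):

```python
import pickle

import numpy as np

base = np.zeros(1_000_000)  # roughly 8 MB of float64 data
view = base[:10]            # 80 bytes of data, sharing base's buffer

base_size = len(pickle.dumps(base))
view_size = len(pickle.dumps(view))
print(base_size, view_size)  # the view's pickle carries only its own elements

# The price: after a round trip the two arrays no longer share memory.
base2, view2 = pickle.loads(pickle.dumps((base, view)))
print(np.shares_memory(base, view), np.shares_memory(base2, view2))
```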

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Numpy-discussion] Preserving NumPy views when pickling

2016-10-25 Thread Robert Kern
On Tue, Oct 25, 2016 at 5:09 PM, Matthew Harrigan <
harrigan.matt...@gmail.com> wrote:
>
> It seems pickle keeps track of references for basic python types.
>
> x = [1]
> y = [x]
> x,y = pickle.loads(pickle.dumps((x,y)))
> x.append(2)
> print(y)
> >>> [[1,2]]
>
> NumPy arrays are different: references are forgotten after
> pickle/unpickle.  Shared objects do not remain shared.  Based on the quote
> below, it could be considered a bug with numpy/pickle.

Not a bug, but an explicit design decision on numpy's part.

--
Robert Kern


Re: [Numpy-discussion] Preserving NumPy views when pickling

2016-10-25 Thread Robert Kern
On Tue, Oct 25, 2016 at 7:05 PM, Feng Yu  wrote:
>
> Hi,
>
> Just another perspective. 'base' and 'data' in PyArrayObject are two
> separate variables.
>
> base can point to any PyObject, but it is `data` that defines where
> data is accessed in memory.
>
> 1. There is no clear way to pickle a pointer (`data`) in a meaningful
> way. In order for `data` member to make sense we still need to
> 'readout' the values stored at `data` pointer in the pickle.
>
> 2. By definition base is not necessarily a numpy array but it is just
> some other object for managing the memory.

In general, yes, but most often it's another ndarray, and the child is
related to the parent by a slice operation that could be computed by
comparing the `data` tuples. The exercise here isn't to always represent
the general case in this way, but to see what can be done opportunistically
and if that actually helps solve a practical problem.
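
A minimal sketch of that comparison for the 1-D case, assuming the `data`
tuples are the `data` entries of `__array_interface__` (address, read-only
flag). The helper name recover_slice is hypothetical:

```python
import numpy as np


def recover_slice(base, view):
    """Return a slice s such that base[s] aliases `view` (1-D arrays only)."""
    step_bytes = base.strides[0]
    # Distance in bytes from the start of base to the view's first element.
    offset = (view.__array_interface__["data"][0]
              - base.__array_interface__["data"][0])
    start = offset // step_bytes
    step = view.strides[0] // step_bytes
    stop = start + step * len(view)
    # A negative stop would wrap around, so use None to mean "run off the front".
    return slice(start, stop if stop >= 0 else None, step)


base = np.arange(10)
for view in (base[2:8:3], base[7:1:-2], base[::-1]):
    s = recover_slice(base, view)
    print(s, np.array_equal(base[s], view))
```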

> 3. One can surely pickle the `base` object as a reference, but it is
> useless if the data memory has been reconstructed independently during
> unpickling.
>
> 4. Unless there is a clear way to notify the referencing numpy array of
> the new data pointer. There probably isn't.
>
> BTW, is the stride information lost during pickling, too? The
> behavior should probably be documented if it is not yet.

The stride information may be lost, yes. We reserve the right to retain it,
though (for example, if .T is contiguous then we might well serialize the
transposed data linearly and return a view on that data upon
deserialization). I don't believe that we guarantee that the unpickled
result is contiguous.
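
One way to see what a given NumPy version does is simply to print the strides
before and after a round trip. This sketch deliberately states no expected
strides, since the result is not guaranteed; only the values are checked:

```python
import pickle

import numpy as np

a = np.arange(12).reshape(3, 4)
candidates = {"transposed": a.T, "every other column": a[:, ::2]}

results = {}
for name, arr in candidates.items():
    restored = pickle.loads(pickle.dumps(arr))
    results[name] = restored
    # Values survive the round trip; the memory layout may or may not.
    print(name, arr.strides, "->", restored.strides,
          "equal values:", np.array_equal(arr, restored))
```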

--
Robert Kern


Re: [Numpy-discussion] Preserving NumPy views when pickling

2016-10-25 Thread Matthew Harrigan
It seems pickle keeps track of references for basic python types.

import pickle

x = [1]
y = [x]
x, y = pickle.loads(pickle.dumps((x, y)))
x.append(2)
print(y)
>>> [[1, 2]]

NumPy arrays are different: references are forgotten after
pickle/unpickle.  Shared objects do not remain shared.  Based on the quote
below, it could be considered a bug with numpy/pickle.

Object sharing (references to the same object in different places): This is
similar to self-referencing objects; pickle stores the object once, and
ensures that all other references point to the master copy. Shared objects
remain shared, which can be very important for mutable objects.  link


Another example with ndarrays:


import numpy as np

x = np.arange(5)
y = x[::-1]
x, y = pickle.loads(pickle.dumps((x, y)))
x[0] = 9
print(y)
>>> [4 3 2 1 0]

In this case the two arrays share the exact same object for the data buffer
before pickling (although object might not be the right word here), but the
unpickled y no longer reflects the change made to x.

On Tue, Oct 25, 2016 at 7:28 PM, Robert Kern  wrote:

> On Tue, Oct 25, 2016 at 3:07 PM, Stephan Hoyer  wrote:
> >
> > On Tue, Oct 25, 2016 at 1:07 PM, Nathaniel Smith  wrote:
> >>
> >> Concretely, what would you suggest should happen with:
> >>
> >> base = np.zeros(1)
> >> view = base[:10]
> >>
> >> # case 1
> >> pickle.dump(view, file)
> >>
> >> # case 2
> >> pickle.dump(base, file)
> >> pickle.dump(view, file)
> >>
> >> # case 3
> >> pickle.dump(view, file)
> >> pickle.dump(base, file)
> >>
> >> ?
> >
> > I see what you're getting at here. We would need a rule for when to
> > include the base in the pickle and when not to. Otherwise,
> > pickle.dump(view, file) always contains data from the base pickle, even
> > when view is much smaller than base.
> >
> > The safe answer is "only use views in the pickle when base is already
> > being pickled", but that isn't possible to check unless all the arrays are
> > together in a custom container. So, this isn't really feasible for NumPy.
>
> It would be possible with a custom Pickler/Unpickler since they already
> keep track of objects previously (un)pickled. That would handle [base,
> view] okay but not [view, base], so it's probably not going to be all that
> useful outside of special situations. It would make a neat recipe, but I
> probably would not provide it in numpy itself.
>
> --
> Robert Kern


Re: [Numpy-discussion] padding options for diff

2016-10-25 Thread Peter Creasey
> Date: Mon, 24 Oct 2016 08:44:46 -0400
> From: Matthew Harrigan 
>
> I posted a pull request  which
> adds optional padding kwargs "to_begin" and "to_end" to diff.  Those
> options are based on what's available in ediff1d.  It closes this issue
> 

I like the proposal, though I suspect that making it general has
obscured that the most common use-case for padding is to make the
inverse of np.cumsum (at least that’s what I frequently need), and now
in the multidimensional case you have the somewhat unwieldy:

>>> np.diff(a, axis=axis, to_begin=np.take(a, 0, axis=axis))

rather than

>>> np.diff(a, axis=axis, keep_left=True)

which of course could just be an option upon what you already have.
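
For reference, a sketch of the behaviour I have in mind, written with
existing functions only. The helper name diff_keep_left is hypothetical and
is not part of the pull request:

```python
import numpy as np


def diff_keep_left(a, axis=-1):
    """First differences along `axis`, keeping the first element in front."""
    a = np.asarray(a)
    first = np.take(a, [0], axis=axis)  # index list keeps the axis (length 1)
    return np.concatenate([first, np.diff(a, axis=axis)], axis=axis)


x = np.arange(12).reshape(3, 4)
a = np.cumsum(x, axis=1)
d = diff_keep_left(a, axis=1)
print(np.array_equal(d, x))  # differencing undoes the cumulative sum
```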

Best,
Peter


[Numpy-discussion] Preserving NumPy views when pickling

2016-10-25 Thread Stephan Hoyer
With a custom wrapper class, it's possible to preserve NumPy views when
pickling:
https://stackoverflow.com/questions/13746601/preserving-numpy-view-when-pickling

This can result in significant time/space savings when pickling views along
with base arrays and brings the behavior of NumPy more in line with Python
proper. Is this something that we can/should port into NumPy itself?