Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-19 Thread Nathaniel Smith
On Sun, Oct 18, 2015 at 9:35 PM,   wrote:
> >>> np.column_stack((np.ones(10), np.ones(10))).flags
>   C_CONTIGUOUS : True
>   F_CONTIGUOUS : False
>
> >>> np.__version__
> '1.9.2rc1'
>
>
> on my notebook which has numpy 1.6.1 it is f_contiguous
>
>
> I was just trying to optimize a loop over variable adjustment in regression,
> and found out that we lost fortran contiguity.
>
> I always thought column_stack is for fortran usage (linalg)
>
> What's the alternative?
> column_stack was one of my favorite commands, and I always assumed we have
> in statsmodels the right memory layout to call the linalg libraries.
>
> ("assumed" means we don't have timing nor unit tests for it.)

In general practice no numpy functions make any guarantee about memory
layout, unless that's explicitly a documented part of their contract
(e.g. 'ascontiguousarray', or some functions that take an order=
argument -- I say "some" b/c there are functions like 'reshape' that
take an argument called order= that doesn't actually refer to memory
layout). This isn't so much an official policy as just a fact of life --
if no-one has any idea that someone is depending on some memory
layout detail then there's no way to realize that we've broken
something. (But it is a good policy IMO.)

If this kind of problem gets caught during a pre-release cycle then we
generally do try to fix it, because we try not to break code, but if
it's been broken for 2 full releases then there's not much we can do --
we can't go back in time to fix it, so it sounds like you're stuck
working around the problem no matter what (unless you want to refuse
to support 1.9.0 through 1.10.1, which I assume you don't... worst
case, you just have to do a global search-and-replace of
np.column_stack with statsmodels.utils.column_stack_f, right?).
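
(A minimal sketch of such a helper -- "column_stack_f" is the
hypothetical name suggested above, not an existing statsmodels
function:)

import numpy as np

def column_stack_f(tup):
    # np.column_stack, but with a guaranteed Fortran-ordered result;
    # np.asfortranarray copies only when the stacked result isn't
    # already F-contiguous
    return np.asfortranarray(np.column_stack(tup))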

And the regression issue seems like the only real argument for
changing it back -- we'd never guarantee f-contiguity here if starting
from a blank slate, I think?

-n

-- 
Nathaniel J. Smith -- http://vorpus.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-19 Thread josef.pktd
On Mon, Oct 19, 2015 at 2:14 AM, Nathaniel Smith  wrote:

> On Sun, Oct 18, 2015 at 9:35 PM,   wrote:
> > >>> np.column_stack((np.ones(10), np.ones(10))).flags
> >   C_CONTIGUOUS : True
> >   F_CONTIGUOUS : False
> >
> > >>> np.__version__
> > '1.9.2rc1'
> >
> >
> > on my notebook which has numpy 1.6.1 it is f_contiguous
> >
> >
> > I was just trying to optimize a loop over variable adjustment in
> regression,
> > and found out that we lost fortran contiguity.
> >
> > I always thought column_stack is for fortran usage (linalg)
> >
> > What's the alternative?
> > column_stack was one of my favorite commands, and I always assumed we
> have
> > in statsmodels the right memory layout to call the linalg libraries.
> >
> > ("assumed" means we don't have timing nor unit tests for it.)
>
> In general practice no numpy functions make any guarantee about memory
> layout, unless that's explicitly a documented part of their contract
> (e.g. 'ascontiguousarray', or some functions that take an order= argument
> -- I say "some" b/c there are functions like 'reshape' that take an
> argument called order= that doesn't actually refer to memory layout).
> This isn't so much an official policy as just a fact of life -- if
> no-one has any idea that someone is depending on some memory
> layout detail then there's no way to realize that we've broken
> something. (But it is a good policy IMO.)
>

I understand that in general.

However, I always thought column_stack is an array creation function with a
guaranteed memory layout. And since it's stacking by columns, I thought
that the order is always Fortran.
And I thought the fact that it doesn't have an order keyword yet was just a
missing extension.



>
> If this kind of problem gets caught during a pre-release cycle then we
> generally do try to fix it, because we try not to break code, but if
> it's been broken for 2 full releases then there's not much we can do --
> we can't go back in time to fix it, so it sounds like you're stuck
> working around the problem no matter what (unless you want to refuse
> to support 1.9.0 through 1.10.1, which I assume you don't... worst
> case, you just have to do a global search-and-replace of
> np.column_stack with statsmodels.utils.column_stack_f, right?).
>
> And the regression issue seems like the only real argument for
> changing it back -- we'd never guarantee f-contiguity here if starting
> from a blank slate, I think?
>

When the cat is out of the bag, the downstream developer writes
compatibility code or helper functions.

I will do that for at least the parts I know are intentionally designed for
F memory order.

---

statsmodels doesn't really check or consistently optimize the memory order,
except in some cython functions.
But I thought we were doing quite well at getting Fortran-ordered arrays. I
only paid attention where we have more extensive loops internally.

Nathaniel, does patsy guarantee memory layout (F-contiguous) when creating
design matrices?

Josef



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-19 Thread josef.pktd
On Mon, Oct 19, 2015 at 5:16 AM, Sebastian Berg 
wrote:

> On Mo, 2015-10-19 at 01:34 -0400, josef.p...@gmail.com wrote:
> >
>
> 
>
>
> >
> > It looks like in 1.9 it depends on the order of the 2-d arrays, which
> > it didn't do in 1.6
> >
>
> Yes, it uses concatenate, and concatenate probably changed in 1.7 to use
> "K" (since "K" did not really exist before 1.7, IIRC).
> Not sure what we can do about it; the order is not something that is
> easily fixed unless explicitly given. It might be optimized (as in this
> case, I would guess).
> Whether taking the fastest route in these kinds of functions is
> faster for the user is of course impossible to know; we can only hope
> that in most cases it is better.
> If someone has an idea how to decide, I am all ears, but I think all we
> can do is put asserts/tests in the downstream code if it relies
> heavily on the order (or just copy, if the order is wrong) :(. Another
> example is the change of the output order in advanced indexing in some
> cases; it makes it faster sometimes, and probably slower in others, and
> what is right seems very much non-trivial.
>

To understand the reason:

Is this to have more efficient memory access during copying?

AFAIU, column_stack needs to create a new array which has to be either F or
C contiguous, so we always have to pick one of the two. With a large number
of 1d arrays it seemed more "intuitive" to me to copy them by columns.

Josef



>
> - Sebastian
>
>
> >
> > >>> np.column_stack((np.ones(10), np.ones((10, 2), order='F'))).flags
> >   C_CONTIGUOUS : False
> >   F_CONTIGUOUS : True
> >   OWNDATA : True
> >   WRITEABLE : True
> >   ALIGNED : True
> >   UPDATEIFCOPY : False
> >
> >
> >
> >
> > which means the default order looks more like "K" now, not "C", IIUC
> >
> >
> > Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy-1.11.0.dev0 windows wheels compiled with mingwpy available

2015-10-19 Thread Olivier Grisel
> Is it possible to test this with py35 as well?

Unfortunately not yet.

> For MSVC, py35 requires a new compiler toolchain (VS2015) -- is that
> something mingwpy/mingw-w64 can handle?

I am pretty sure that mingwpy does not support Python 3.5 yet.

I don't know the status of the interop of mingw-w64 w.r.t. VS2015 but as
far as I know it's not supported yet either. Once the issue is fixed at the
upstream level, I think mingwpy could be rebuilt to benefit from the fix.

-- 
Olivier Grisel
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] [Feature Suggestion] More comparison functions for floating point numbers

2015-10-19 Thread cy18
I think these would be useful and easy to implement.

greater_close(a, b) = greater_equal(a, b) | isclose(a, b)
less_close(a, b) = less_equal(a, b) | isclose(a, b)
greater_no_close = greater(a, b) & ~isclose(a, b)
less_no_close = less(a, b) & ~isclose(a, b)

The results are element-wise, just like the original functions.
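
(A minimal sketch of the four helpers in terms of existing ufuncs --
the names are the ones proposed above, and the rtol/atol defaults
mirror np.isclose; none of this is an existing numpy API:)

import numpy as np

def greater_close(a, b, rtol=1e-05, atol=1e-08):
    # a >= b, or a within tolerance of b
    return np.greater_equal(a, b) | np.isclose(a, b, rtol=rtol, atol=atol)

def less_close(a, b, rtol=1e-05, atol=1e-08):
    return np.less_equal(a, b) | np.isclose(a, b, rtol=rtol, atol=atol)

def greater_no_close(a, b, rtol=1e-05, atol=1e-08):
    # a > b by more than the tolerance
    return np.greater(a, b) & ~np.isclose(a, b, rtol=rtol, atol=atol)

def less_no_close(a, b, rtol=1e-05, atol=1e-08):
    return np.less(a, b) & ~np.isclose(a, b, rtol=rtol, atol=atol)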

I'm not sure whether they are useful enough to be part of numpy. If so, I
will try to implement them and make a pull request.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-19 Thread Sebastian Berg
On Mo, 2015-10-19 at 01:34 -0400, josef.p...@gmail.com wrote:
> 




> 
> It looks like in 1.9 it depends on the order of the 2-d arrays, which
> it didn't do in 1.6
> 

Yes, it uses concatenate, and concatenate probably changed in 1.7 to use
"K" (since "K" did not really exist before 1.7, IIRC).
Not sure what we can do about it; the order is not something that is
easily fixed unless explicitly given. It might be optimized (as in this
case, I would guess).
Whether taking the fastest route in these kinds of functions is
faster for the user is of course impossible to know; we can only hope
that in most cases it is better.
If someone has an idea how to decide, I am all ears, but I think all we
can do is put asserts/tests in the downstream code if it relies
heavily on the order (or just copy, if the order is wrong) :(. Another
example is the change of the output order in advanced indexing in some
cases; it makes it faster sometimes, and probably slower in others, and
what is right seems very much non-trivial.
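
(A sketch of the downstream guard described above -- either assert on
the flags in a test, or copy only when the order is wrong:)

import numpy as np

def fix_fortran(arr):
    # copy-style fix: np.asfortranarray is a no-op for arrays that
    # are already F-contiguous, and copies otherwise
    return np.asfortranarray(arr)

def require_fortran(arr):
    # test-style check: fail loudly if an order assumption broke
    assert arr.flags['F_CONTIGUOUS']
    return arr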

- Sebastian


> 
> >>> np.column_stack((np.ones(10), np.ones((10, 2), order='F'))).flags
>   C_CONTIGUOUS : False
>   F_CONTIGUOUS : True
>   OWNDATA : True
>   WRITEABLE : True
>   ALIGNED : True
>   UPDATEIFCOPY : False
> 
> 
> 
> 
> which means the default order looks more like "K" now, not "C", IIUC
> 
> 
> Josef



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-19 Thread josef.pktd
On Mon, Oct 19, 2015 at 9:00 AM,  wrote:

>
>
> On Mon, Oct 19, 2015 at 5:16 AM, Sebastian Berg <
> sebast...@sipsolutions.net> wrote:
>
>> <snip>
>
> To understand the reason:
>
> Is this to have more efficient memory access during copying?
>
> AFAIU, column_stack needs to create a new array which has to be either F
> or C contiguous, so we always have to pick one of the two. With a large
> number of 1d arrays it seemed more "intuitive" to me to copy them by
> columns.
>


just as background

I was mainly surprised last night about having my long-held beliefs
shattered. I skipped numpy 1.7 and 1.8 in my development environment and
still need to catch up now that I use 1.9 as my main numpy version.

I might have to update my "folk wisdom" a bit, which is not codified
anywhere and doesn't have unit tests.

For example, the improved iteration over Fortran-contiguous (or neither C
nor F contiguous) arrays sounded very useful, but I never checked whether
it would affect us.

Josef



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Workshop tonight, expect GitHub activity

2015-10-19 Thread Jaime Fernández del Río
Hi all,

As mentioned a few weeks ago, I am organizing a "Become an Open Source
Contributor" workshop tonight, for the Data Science Student Society at UCSD.

This morning I will be creating a few ridiculously simple issues,
e.g. "missing space, arrayobject --> array object", for participants to
work on as part of the workshop. So there may also be a surge of simple PRs
starting at around 7 PM PST.

Please, bear with us. And refrain from fixing those issues if you are not a
workshop participant!

Thanks,

Jaime

-- 
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his
plans for world domination.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [Feature Suggestion] More comparison functions for floating point numbers

2015-10-19 Thread Robert Kern
On Mon, Oct 19, 2015 at 9:51 PM, cy18  wrote:
>
> It would be useful when we need to subtract a bit before comparing by
> greater or less. By subtracting a bit, we only have an absolute error
> tolerance; with the new functions, we can have both absolute and
> relative error tolerance. This is how isclose(a, b) is better than
> abs(a-b) <= atol.

You just adjust the value by whichever tolerance is greatest in magnitude.
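
(For example -- a sketch folding np.isclose's default tolerance
combination, atol + rtol*|b|, into a plain comparison; the helper name
is made up:)

import numpy as np

def greater_with_tol(a, b, rtol=1e-05, atol=1e-08):
    # shift the comparison point down by the combined tolerance,
    # mirroring what np.isclose treats as "close"
    return np.greater(a, b - (atol + rtol * np.abs(b)))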

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [Feature Suggestion] More comparison functions for floating point numbers

2015-10-19 Thread cy18
It would be useful when we need to subtract a bit before comparing by
greater or less. By subtracting a bit, we only have an absolute error
tolerance; with the new functions, we can have both absolute and
relative error tolerance. This is how isclose(a, b) is better than
abs(a-b) <= atol.


2015-10-19 15:46 GMT-04:00 Chris Barker :

>
>
> On Mon, Oct 19, 2015 at 3:06 AM, cy18  wrote:
>
>> I think these would be useful and easy to implement.
>>
>> greater_close(a, b) = greater_equal(a, b) | isclose(a, b)
>> less_close(a, b) = less_equal(a, b) | isclose(a, b)
>> greater_no_close = greater(a, b) & ~isclose(a, b)
>> less_no_close = less(a, b) & ~isclose(a, b)
>>
>
> What's the use-case here? We need isclose because we want to test
> equality, but precision errors are such that two floats may be as close to
> equal as they can be, given the computations done. And the assumption is
> that you don't care about the precision to the point you specify.
>
> But for a greater_than (or equiv) comparison, if the precision is not
> important beyond a certain level, then it's generally not important whether
> you get greater-than or less-than when it's that close.
>
> And this would create a weird property: some values would be greater
> than, less than, and equal to a target value -- pretty weird!
>
> Note that you can get the same effect by subtracting a bit from your
> comparison value for a greater-than check...
>
> But maybe there is a common use-case that I'm not thinking of...
>
> -CHB
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R   (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [Feature Suggestion] More comparison functions for floating point numbers

2015-10-19 Thread Chris Barker
On Mon, Oct 19, 2015 at 3:06 AM, cy18  wrote:

> I think these would be useful and easy to implement.
>
> greater_close(a, b) = greater_equal(a, b) | isclose(a, b)
> less_close(a, b) = less_equal(a, b) | isclose(a, b)
> greater_no_close = greater(a, b) & ~isclose(a, b)
> less_no_close = less(a, b) & ~isclose(a, b)
>

What's the use-case here? We need isclose because we want to test
equality, but precision errors are such that two floats may be as close to
equal as they can be, given the computations done. And the assumption is
that you don't care about the precision to the point you specify.

But for a greater_than (or equiv) comparison, if the precision is not
important beyond a certain level, then it's generally not important whether
you get greater-than or less-than when it's that close.

And this would create a weird property: some values would be greater
than, less than, and equal to a target value -- pretty weird!

Note that you can get the same effect by subtracting a bit from your
comparison value for a greater-than check...

But maybe there is a common use-case that I'm not thinking of...

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R   (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Workshop tonight, expect GitHub activity

2015-10-19 Thread Jaime Fernández del Río
On Mon, Oct 19, 2015 at 9:11 AM, Jaime Fernández del Río <
jaime.f...@gmail.com> wrote:

> Hi all,
>
> As mentioned a few weeks ago, I am organizing a "Become an Open Source
> Contributor" workshop tonight, for the Data Science Student Society at UCSD.
>
> This morning I will be creating a few ridiculously simple issues,
> e.g. "missing space, arrayobject --> array object", for participants to
> work on as part of the workshop. So there may also be a surge of simple PRs
> starting at around 7 PM PST.
>

Ok, so issues 6515 up to and including 6525 are mine, should have them all
fixed and closed by the end of today.

Jaime

-- 
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his
plans for world domination.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Making datetime64 timezone naive

2015-10-19 Thread Chris Barker
On Sun, Oct 18, 2015 at 12:20 PM, Alexander Belopolsky 
wrote:

>
> On Sat, Oct 17, 2015 at 6:59 PM, Chris Barker 
> wrote:
>
>> Off the top of my head, I think allowing a 60th second makes more sense
>> -- just like we do leap years.
>
>
> Yet we don't implement DST by allowing the 24th hour.  Even the countries
> that adjust the clocks at midnight don't do that.
>

Well, isn't that about conforming to already existing standards? DST is a
civil construct -- and most (all?) implementations use the convention of
having repeated times -- so that's what software has to deal with.

IIUC, at least *some* standards handle leap seconds by adding a 60th (61st)
second, rather than having a repeated one. So it's at least an option to do
it that way. And it can then fit into the already existing standards for
representing datetimes, etc.

Does the "fold" flag approach for representing, well, "folds" exist in a
widely used standards? It's my impression that it doesn't since we had to
argue a lot about what to call it :-)


> In some sense leap seconds are more similar to timezone changes (DST or
> political) because they are irregular and unpredictable.
>

in that regard, yes -- you need a constantly updating database to use them,
but I don't know that that has any impact on how you represent them. They
seem a lot more like leap years to me -- some Februaries have a 29th day --
some hours on some days have a 61st second.


> Furthermore, the notion of "fold" is not tied to a particular 24/60/60
> system of encoding times and thus more applicable to numpy where
> times are encoded as binary integers.
>

but there are no folds in the underlying integer representation -- that is
the "continuous" time scale -- the folds (or leap seconds, or leap years,
or any of the 24/60/60 business) come in only when you want to go to/from
the "datetime" representation.

> If anyone decides to actually get around to leap seconds support in numpy
> datetime, s/he can decide ...
>
> This attitude is the reason why we will probably never have bug-free
> software when it comes to civil time reckoning.

OK -- fair enough -- good to think about it sooner than later.


> Similarly, current numpy.datetime64 design ties arithmetic with encoding.
> This makes arithmetic easier, but in the long run may preclude designs that
> better match the problem domain.


I don't follow here -- how can you NOT tie arithmetic to encoding? Sure,
you could decide that you are going to overload the arithmetic, and it's up
to the object that encodes the data to do that math -- but that's pretty
much what datetime64 is doing -- defining an encoding so that it can do
math -- numpy dtypes are very much about binary representation. No reason
one couldn't make a different numpy dtype for datetimes that encoded it a
different way, and then it would have to implement math, too.



> Note how the development of PEP 495 has highlighted the fact that allowing
> binary operations (subtraction, comparison etc.) between times in different
> timezones was a design mistake.  It will be wise to learn from such
> mistakes when redesigning numpy.datetime64.

So was not considering folds -- frankly, and I think this may be your point,
I don't think timezones were well thought out at all when datetime
was first introduced -- and however well thought out it was, if you don't
provide an implementation, you are not going to find the limitations. And
despite Tim's articulate defense of the original implementation decisions,
I think encoding the datetime in the local "calendar/clock" just invites a
mess. And I'm quite convinced that it wouldn't be the way to go for numpy
use-cases.

> If you ever plan to support civil time in some form, you should think
> about it now.

Well, the goal for now is naive time -- and unlike the original datetime,
we are not adding a "you can implement your own timezone handling this
way" hook yet.

> In Python 3.6, datetime.now() will return different values in the first
> and the second repeated hour in the "fall-back fold." If you allow
> datetime.datetime to numpy.datetime64 conversion, you should decide what
> you do with that difference.

Indeed. Though will that only occur with timezones that have DST? I know
I'd be fine with NOT being able to create a numpy datetime64 from a
non-naive datetime object, which would force the user to think about and
convert to the timezone they want before passing off to numpy.

Unless you can suggest a sensible default way to handle this. At first
blush, I think naive time does not have folds, so there is no way to handle
them "properly"

Also -- I think we are at phase one of an (at least) two-step process:

1) clean up datetime64 just enough that it is useful, and less error-prone
-- i.e. have it not pretend to support anything other than naive datetimes.

2) Do it right -- perhaps adding some time zone support. This is going to
wait until the numpy dtype machinery is cleaned up some.

Phase 2 is 

Re: [Numpy-discussion] Making datetime64 timezone naive

2015-10-19 Thread Chris Barker - NOAA Federal
> This is fine.  Just be aware that *naive* datetimes will also have the PEP
> 495 "fold" attribute in Python 3.6.  You are free to ignore it, but you will
> lose the ability to round-trip between naive stdlib datetimes and
> numpy.datetime64.

Sigh. I can see why it's there (primarily to support now(), I
suppose). But a naive datetime doesn't have a timezone, so how could
you know what time one actually corresponds to if fold is True? And
what could you do with it if you did know?

I've always figured that if you are using naive time for times in a
timezone that has DST, then you'd better know whether you were in DST
or not.

(Which fold tells you, I guess.) But the fold isn't guaranteed to be an
hour, is it? So without more info, what can you do? And if the fold bit
is False, then you still have no idea if you are in DST or not.

And then what if you attach a timezone to it? Then the fold bit could
be wrong...

I take it back, I can't see why the fold bit could be anything but
confusing for a naive datetime. :-)

Anyway, all I can see to do here is for the datetime64 docs to say
that fold is ignored if it's there.

But what should datetime64 do when provided with a datetime with a timezone?

- Raise an exception?
- ignore the timezone?
- Convert to UTC?

If the time zone is ignored, then you could get DST and non DST times
in the same array - that could be ugly.

Is there any way to query a timezone object to ask if it has a constant offset?
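
(One heuristic, as a sketch -- it assumes a stdlib-style tzinfo whose
utcoffset() accepts a naive datetime, and it only probes two dates, so
it is not a proof:)

import datetime

def is_constant_offset(tz, year=2015):
    # a zone with DST will report different UTC offsets at two dates
    # half a year apart (in most regions)
    jan = datetime.datetime(year, 1, 15, 12, 0)
    jul = datetime.datetime(year, 7, 15, 12, 0)
    return tz.utcoffset(jan) == tz.utcoffset(jul)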

And yes, I did mean "most". There is no way I'm ever going to
introduce a three letter "timezone" abbreviation in one of these
threads!

-CHB
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy-1.11.0.dev0 windows wheels compiled with mingwpy available

2015-10-19 Thread Nathaniel Smith
On Mon, Oct 19, 2015 at 2:26 AM, Olivier Grisel
 wrote:
>> Is it possible to test this with py35 as well?
>
> Unfortunately not yet.
>
>> For MSVC, py35 requires a new compiler toolchain (VS2015) -- is that
>> something mingwpy/mingw-w64 can handle?
>
> I am pretty sure that mingwpy does not support Python 3.5 yet.

Correct.

> I don't know the status of the interop of mingw-w64 w.r.t. VS2015 but as far
> as I know it's not supported yet either. Once the issue is fixed at the
> upstream level, I think mingwpy could be rebuilt to benefit from the fix.

Upstream mingw-w64 doesn't support interop with any version of visual
studio that was released this millennium -- all the interop stuff is
new in mingwpy.

VS2015 had a major reorganization of how it handles runtime libraries,
so it's not quite so trivial as just adding support the same way as
was done for VS2008 and VS2010. Or rather, IIUC: we *could* just add
support the same way as before, but there are undocumented rules about
which parts of the new runtime are considered stable and which are
not, so if we did this willy-nilly then we might end up using some of
the "unstable" parts. And then in 2017 the Windows team does some
internal refactoring and pushes it out through windows update and
suddenly NumPy / R / Julia / git / ... all start segfaulting at
startup on Windows, which would be a disaster from everyone's point of
view. We've pointed this out to the Python team at Microsoft and
they've promised to try and put Carl and the relevant mingw-w64 folks
in touch with the relevant internal folks at MS to hopefully tell us
how to do this correctly... fingers crossed :-).

Aside from that, the main challenge for mingwpy in general is exactly
the issue of upstream support: if we don't get the interop stuff
pushed upstream from mingwpy to mingw-w64, then it will rot and break.
And upstream would love to have this interoperability as an officially
supported feature... but upstream doesn't consider what we have right
now to be maintainable, so they won't take it as is. (And honestly,
this is a reasonable opinion.) So what I've been trying to do is to
scrounge up some funding to support Carl and upstream doing this right
(the rough estimate is ~3 person-months of work).

The original goal was to get MS to pay for this, on the theory that
they should be cleaning up their own messes, but after 6 months of
back-and-forth we've pretty much given up on that at this point, and
I'm in the process of emailing everyone I can think of who might be
convinced to donate some money to the cause. Maybe we should have a
kickstarter or something, I dunno :-).

-n

-- 
Nathaniel J. Smith -- http://vorpus.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Behavior of numpy.copy with sub-classes

2015-10-19 Thread Jonathan Helmus
In GitHub issue #3474, a number of us have started a conversation on how
NumPy's copy function should behave when passed an instance which is a
sub-class of the array class.  Specifically, the issue began by noting
that when a MaskedArray is passed to np.copy, the sub-class is not
passed through but rather an ndarray is returned.


I suggested adding a "subok" parameter which controls how sub-classes 
are handled and others suggested having the function call a copy method 
on duck arrays.  The "subok" parameter is implemented in PR #6509 as an 
example. Both of these options would change the API of numpy.copy and 
possibly break backwards compatibility.  Do others have an opinion of 
how np.copy should handle sub-classes?
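
(For reference, np.copy is essentially a thin wrapper around np.array;
a sketch of what a "subok" variant could look like -- illustrative, not
necessarily the code in PR #6509:)

import numpy as np

def copy(a, order='K', subok=False):
    # subok=True asks np.array to pass sub-classes through instead of
    # forcing a base ndarray
    return np.array(a, order=order, copy=True, subok=subok)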


For a concrete example of this behavior and possible changes, what type 
should copy_x be in the following snippet:


import numpy as np
x = np.ma.array([1,2,3])
copy_x = np.copy(x)
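
(As it stands, the two spellings disagree:)

type(copy_x)    # numpy.ndarray -- the sub-class is not passed through
type(x.copy())  # numpy.ma.core.MaskedArray -- the method preserves it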


Cheers,

- Jonathan Helmus
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-19 Thread josef.pktd
On Mon, Oct 19, 2015 at 9:15 PM, Nathaniel Smith  wrote:

> On Mon, Oct 19, 2015 at 5:55 AM,   wrote:
> > On Mon, Oct 19, 2015 at 2:14 AM, Nathaniel Smith  wrote:
> >> <snip>
> >
> >
> > I understand that in general.
> >
> > However, I always thought column_stack is an array creation function
> > with a guaranteed memory layout. And since it's stacking by columns, I
> > thought that the order is always Fortran.
> > And I thought the fact that it doesn't have an order keyword yet was
> > just a missing extension.
>
> I guess I don't know what to say except that I'm sorry to hear that
> and sorry that no-one noticed until several releases later.
>


Were there more contiguity changes in 1.10?
I just saw a large number of test errors and failures in statespace models
which are heavily based on cython code where it's not just a question of
performance.

I don't know yet what's going on, but I just saw that we have some explicit
tests for fortran contiguity which just started to fail.




>
> > <snip>
> >
> > Nathaniel, does patsy guarantee memory layout (F-contiguous) when creating
> > design matrices?
>
> I never thought about it :-). So: no, it looks like right now patsy
> usually returns C-order matrices (or really, whatever np.empty or
> np.repeat returns), and there aren't any particular guarantees that
> this will continue to be the case in the future.
>
> Is returning matrices in F-contiguous layout really important? Should
> there be a return_type="fortran_matrix" option or something like that?
>

I don't know yet. My intuition was that it would be better because we feed
the arrays directly to pinv/SVD or QR, which, I think, require
Fortran-contiguous input by default.

However, my intuition might not be correct, and it might not make much
difference in a single OLS estimation.

There are a few critical loops in variable selection that I'm planning to
investigate to see how much it matters.
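
(A quick way to check whether it matters -- a sketch comparing both
layouts; the sizes are made up:)

import timeit
import numpy as np

X = np.random.randn(5000, 50)   # C-ordered design matrix
Xf = np.asfortranarray(X)       # same values, Fortran order
for name, a in [("C", X), ("F", Xf)]:
    t = timeit.timeit(lambda a=a: np.linalg.qr(a), number=20)
    print(name, t)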
Memory optimization was never high on our priority list compared to
expanding the functionality overall, but reading the Julia mailing list is
starting to worry me a bit. :)

(I'm even starting to see the reason for multiple dispatch.)

Josef


>
> -n
>
> --
> Nathaniel J. Smith -- http://vorpus.org

Re: [Numpy-discussion] Behavior of numpy.copy with sub-classes

2015-10-19 Thread Nathan Goldbaum
On Mon, Oct 19, 2015 at 7:23 PM, Jonathan Helmus  wrote:

> In GitHub issue #3474, a number of us have started a conversation on how
> NumPy's copy function should behave when passed an instance which is a
> sub-class of the array class.  Specifically, the issue began by noting that
> when a MaskedArray is passed to np.copy, the sub-class is not passed
> through but rather an ndarray is returned.
>
> I suggested adding a "subok" parameter which controls how sub-classes are
> handled and others suggested having the function call a copy method on duck
> arrays.  The "subok" parameter is implemented in PR #6509 as an example.
> Both of these options would change the API of numpy.copy and possibly break
> backwards compatibility.  Do others have an opinion of how np.copy should
> handle sub-classes?
>
> For a concrete example of this behavior and possible changes, what type
> should copy_x be in the following snippet:
>
> import numpy as np
> x = np.ma.array([1,2,3])
> copy_x = np.copy(x)
>

FWIW, it looks like np.copy() is never used in our code to work with the
ndarray subclass we maintain in yt. Instead we use the copy() method much
more often, and that returns the appropriate type. I guess it makes sense
to have the type of the return value of np.copy() agree with the type of
the copy() member function.

That said, breaking backwards compatibility here before numpy 2.0 might
very well break real code. It might be worth searching e.g. GitHub for all
instances of np.copy() to see if they're dealing with subclasses.


>
>
> Cheers,
>
> - Jonathan Helmus
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] correct sizeof for ndarray

2015-10-19 Thread Jason Newton
Hi folks,

I noticed an unexpected behavior of itemsize for structures with offsets
that are larger than those of a packed structure in memory.  This matters
when parsing in-memory structures from C and elsewhere (recently an
HDF5/h5py detail got me for a bit).

So what is the correct way to get "sizeof" a structure?  AFAIK this is the
size of the last item + its offset.  If this doesn't exist... shouldn't it?
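
(As far as I know there is no single attribute for that; a minimal
sketch of the definition above -- field tuples are (dtype, offset[,
title]), hence the indexing:)

import numpy as np

def sizeof(dt):
    # end of the field that finishes last: offset + field itemsize;
    # can be smaller than dt.itemsize when there is trailing padding
    return max(v[1] + v[0].itemsize for v in dt.fields.values())

# example: dtype padded out to 16 bytes, but the fields end at byte 12
dt = np.dtype({'names': ['a', 'b'], 'formats': ['<f8', '<i4'],
               'offsets': [0, 8], 'itemsize': 16})
print(dt.itemsize, sizeof(dt))   # 16 12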

Thanks,
Jason
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-19 Thread Nathaniel Smith
On Mon, Oct 19, 2015 at 5:55 AM,   wrote:
>
>
> On Mon, Oct 19, 2015 at 2:14 AM, Nathaniel Smith  wrote:
>> <snip>
>
>
> I understand that in general.
>
> However, I always thought column_stack is an array creation function with
> a guaranteed memory layout. And since it's stacking by columns, I thought
> that the order is always Fortran.
> And I thought the fact that it doesn't have an order keyword yet was just
> a missing extension.

I guess I don't know what to say except that I'm sorry to hear that
and sorry that no-one noticed until several releases later.

>> <snip>
>
>
> When the cat is out of the bag, the downstream developer writes
> compatibility code or helper functions.
>
> I will do that for at least the parts I know are intentionally designed for
> F memory order.
>
> ---
>
> statsmodels doesn't really check or consistently optimize the memory order,
> except in some cython functions.
> But I thought we were doing quite well at getting Fortran-ordered arrays. I
> only paid attention where we have more extensive loops internally.
>
> Nathaniel, does patsy guarantee memory layout (F-contiguous) when creating
> design matrices?

I never thought about it :-). So: no, it looks like right now patsy
usually returns C-order matrices (or really, whatever np.empty or
np.repeat returns), and there aren't any particular guarantees that
this will continue to be the case in the future.

Is returning matrices in F-contiguous layout really important? Should
there be a return_type="fortran_matrix" option or something like that?

-n

-- 
Nathaniel J. Smith -- http://vorpus.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy 1.10.1 reduce operation on recarrays

2015-10-19 Thread josef.pktd
On Fri, Oct 16, 2015 at 9:31 PM, Allan Haldane 
wrote:

> On 10/16/2015 09:17 PM, josef.p...@gmail.com wrote:
>
>> On Fri, Oct 16, 2015 at 8:56 PM, Allan Haldane wrote:
>>
>>> On 10/16/2015 05:31 PM, josef.p...@gmail.com wrote:
>>>
>>>> On Fri, Oct 16, 2015 at 2:21 PM, Charles R Harris wrote:
>>>>
>>>>> On Fri, Oct 16, 2015 at 12:20 PM, Charles R Harris wrote:
>>>>>
>>>>>> On Fri, Oct 16, 2015 at 11:58 AM, josef.p...@gmail.com wrote:
>>>>>>
>>>>>>> was there a change with reduce operations with recarrays in
>>>>>>> 1.10 or 1.10.1?
>>>>>>>
>>>>>>> Travis shows a new test failure in the statsmodels testsuite
>>>>>>> with 1.10.1:
>>>>>>>
>>>>>>> ERROR: test suite for <class
>>>>>>> 'statsmodels.base.tests.test_data.TestRecarrays'>
>>>>>>>
>>>>>>>   File
>>>>>>> "/home/travis/miniconda/envs/statsmodels-test/lib/python2.7/site-packages/statsmodels-0.8.0-py2.7-linux-x86_64.egg/statsmodels/base/data.py",
>>>>>>> line 131, in _handle_constant
>>>>>>>     const_idx = np.where(self.exog.ptp(axis=0) == 0)[0].squeeze()
>>>>>>> TypeError: cannot perform reduce with flexible type
>>>>>>>
>>>>>>> Sorry for asking so late.
>>>>>>> (statsmodels is short on maintainers, and I'm distracted)
>>>>>>>
>>>>>>> statsmodels still has code to support recarrays and structured
>>>>>>> dtypes from the time before pandas became popular, but I don't
>>>>>>> think anyone is using them together with statsmodels anymore.
>>>>>>
>>>>>> There were several commits dealing with both recarrays and
>>>>>> ufuncs, so this might well be a regression.
>>>>>
>>>>> A bisection would be helpful. Also, open an issue.
>>>>
>>>> The reason for the test failure might be somewhere else, hiding
>>>> behind several layers of statsmodels, but it only started to show
>>>> up with numpy 1.10.1.
>>>>
>>>> I already have the reduce exception with my currently installed
>>>> numpy '1.9.2rc1':
>>>>
>>>> >>> x = np.random.random(9*3).view([('const', 'f8'), ('x_1', 'f8'),
>>>> ('x_2', 'f8')]).view(np.recarray)
>>>> >>> np.ptp(x, axis=0)
>>>> Traceback (most recent call last):
>>>>   File "<stdin>", line 1, in <module>
>>>>   File
>>>> "C:\programs\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py",
>>>> line 2047, in ptp
>>>>     return ptp(axis, out)
>>>> TypeError: cannot perform reduce with flexible type
>>>>
>>>> Sounds like fun, and I don't even know how to automatically bisect.
>>>>
>>>> Josef
>>>
>>> That example isn't the problem (ptp should definitely fail on
>>> structured arrays), but I've tracked down what is - it has to do
>>> with views of record arrays.
>>>
>>> The fix looks simple, I'll get it in for the next release.
>>
>> Thanks,
>>
>> I realized that at that point in the statsmodels code we should have
>> only regular ndarrays, so the array conversion fails somewhere.
>>
>> AFAICS, the main helper function to convert is
>>
>> def struct_to_ndarray(arr):
>>     return arr.view((float, len(arr.dtype.names)))
>>
>> which doesn't look like it will handle other dtypes than float64.
>> Nobody ever complained, so maybe our test suite is the only user of
>> this.
>>
>> What is now the recommended way of converting structured
>> dtypes/recarrays to ndarrays?
>>
>> Josef
>
> Yes, that's the code I narrowed it down to as well. I think the code in
> statsmodels is fine; the problem is actually a bug I must admit I
> introduced in changes to the way views of recarrays work.
>
> If you are curious, the bug is in this line:
>
> https://github.com/numpy/numpy/blob/master/numpy/core/records.py#L467
>
> This line was intended to fix the problem that accessing a nested record
> array field would lose the 'np.record' dtype. I only considered void
> structured arrays, and had forgotten about sub-arrays which statsmodels
> uses.
>
> I think the fix is to replace `issubclass(val.type, 

Re: [Numpy-discussion] Behavior of numpy.copy with sub-classes

2015-10-19 Thread Charles R Harris
On Mon, Oct 19, 2015 at 8:28 PM, Nathan Goldbaum 
wrote:

>
>
> On Mon, Oct 19, 2015 at 7:23 PM, Jonathan Helmus 
> wrote:
>
>> <snip>
>
> FWIW, it looks like np.copy() is never used in our code to work with the
> ndarray subclass we maintain in yt. Instead we use the copy() method much
> more often, and that returns the appropriate type. I guess it makes sense
> to have the type of the return value of np.copy() agree with the type of
> the copy() member function.
>
> That said, breaking backwards compatibility here before numpy 2.0 might
> very well break real code. It might be worth searching e.g. GitHub for all
> instances of np.copy() to see if they're dealing with subclasses.
>

The problem with github searches is that there are a ton of numpy forks.
ISTR once finding a method to avoid them, but can't remember what it was.
If anyone knows how to do that, I'd appreciate learning.

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion