Re: [Numpy-discussion] Possible bug in Numpy.convolve

2017-03-22 Thread josef . pktd
On Wed, Mar 22, 2017 at 10:06 PM, Thøger Emil Rivera-Thorsen <
thoger.e...@gmail.com> wrote:

> Dear list;
>
> I am honestly not certain whether this, or the SciPy list, is the
> appropriate place to post this; please let me know if I got it wrong.
>
> I am convolving a 1D data set containing a relatively narrow peak, with a
> relatively narrow Gaussian kernel, in order to emulate the effect of
> atmospheric seeing on astrophysical observations.
>
> I have a 1D data array 45 pixels long and a Gaussian kernel; when I run
> np.convolve(data, kernel, mode='same') on the two arrays, the resulting
> array's peak is shifted relative to the origin. I have attached a plot to
> illustrate.
>
> The original data is shown in blue. When I convolve it with a symmetric
> kernel (black), I get an offset resulting peak (magenta). If I flip the
> kernel -- even though it is perfectly symmetric -- the resulting curve is
> offset in the opposite direction (yellow). However, if I offset the kernel
> so it is centered exactly one pixel below the central value, the output
> array gets centered correct (red), even if I flip the (now no longer
> symmetric) kernel.
>
> This is using Numpy 1.11.3, python 2.7.13, on Anaconda 4.3.0 64-bit on
> Ubuntu 16.10
>
> Using astropy.convolution reproduces the correct red curve, so I can use
> that for now, but it seems to me this is either a bug or, if it is indeed
> the intended behavior, a word of caution would be merited in the docstring.
>
> Cheers,
>
> Emil Rivera-Thorsen
>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
Can you provide an example to replicate?

I haven't seen this behavior; it looks centered to me, at least for odd
window lengths.
AFAIR, I had to experiment a bit in the past with how to set the window and
its location for even window lengths.

>>> np.__version__
'1.11.2'

>>> x = np.linspace(-1, 1, 21)
>>> w = stats.norm.pdf(np.linspace(-3, 3, 5))
>>> np.column_stack((x, np.convolve(x, w, mode='same')))[8:13]
array([[ -2.e-01,  -1.33368234e-01],
       [ -1.e-01,  -6.66841169e-02],
       [  0.e+00,   1.51788304e-17],
       [  1.e-01,   6.66841169e-02],
       [  2.e-01,   1.33368234e-01]])


>>> x = np.abs(np.linspace(-1, 1, 21))
>>> w = stats.norm.pdf(np.linspace(-3, 3, 4))
>>> np.column_stack((x, np.convolve(x, w, mode='same')))[8:13]
array([[ 0.2   ,  0.12320129],
       [ 0.1   ,  0.07392077],
       [ 0.    ,  0.02552663],
       [ 0.1   ,  0.02552663],
       [ 0.2   ,  0.07392077]])


>>> w = stats.norm.pdf(np.linspace(-3, 3, 5))
>>> np.column_stack((x, np.convolve(x, w, mode='same')))[8:13]
array([[ 0.2   ,  0.13336823],
       [ 0.1   ,  0.06757049],
       [ 0.    ,  0.02767626],
       [ 0.1   ,  0.06757049],
       [ 0.2   ,  0.13336823]])

Josef


Re: [Numpy-discussion] Fortran order in recarray.

2017-02-22 Thread josef . pktd
On Wed, Feb 22, 2017 at 11:57 AM, Alex Rogozhnikov <
alex.rogozhni...@yandex.ru> wrote:

> Hi Matthew,
> maybe it is not the best place to discuss problems of pandas, but to show
> that I am not missing something, let's consider a simple example.
>
> # simplest DataFrame
> x = pandas.DataFrame(dict(a=numpy.arange(10), b=numpy.arange(10, 20)))
> # simplest indexing. Can you predict results without looking at comments?
> x[:2] # returns two first rows, as expected
> x[[0, 1]]  # returns copy of x, whole dataframe
> x[numpy.array(2)]  # fails with IndexError: indices are out-of-bounds (can you guess why?)
> x[[0, 1], :]  # unhashable type: list
>
>
> just in case - I know about .loc and .iloc, but when you write code with
> many subroutines, you concentrate on numpy inputs, and at some point you
> simply *forget* to convert some of the data you operated with to numpy
> and it *continues* to work, but it yields wrong results (while you tested
> everything, but you tested this for numpy). Checking all the inputs in each
> small subroutine is strange.
>
> Ok, a bit more:
>
> x[x['a'] > 5]  # works as expected
> x[x['a'] > 5, :]  # 'Series' objects are mutable, thus they cannot be hashed
> lookup = numpy.arange(10)
> x[lookup[x['a']] > 5]  # works as expected
> x[lookup[x['a']] > 5, :]  # TypeError: unhashable type: 'numpy.ndarray'
>
> x[lookup]['a']   # IndexError
> x['a'][lookup]   # works as expected
>
>
> Now let's go a bit further: a train/test split of the data for machine
> learning (again, the most frequent operation)
>
> from sklearn.model_selection import train_test_split
> x1, x2 = train_test_split(x, random_state=42)
> # compare next to operations with pandas.DataFrame
> col = x1['a']
> print col[:2]  # first two elements
> print col[[0, 1]]  # doesn't fail (while there is no row with index 0), fills it with NaN
> print col[numpy.arange(2)]  # same as previous
> print col[col > 4]  # as expected
> print col[col.values > 4]  # as expected
> print col.values[col > 4]  # converts boolean to int, uses int indexing, but at least raises warning
>
>
> Mistakes caused by such silent misbehavior are not easy to detect (when
> your data pipeline consists of several steps); it is quite hard to locate
> the source of a problem and almost impossible to be sure that you indeed
> avoided all such caveats. Code review turns into a paranoid process (if you
> care about the result, of course).
>
> Things are even worse, because I've demonstrated this for my installation,
> and probably if you run this with some other pandas installation, you get
> some other results (and these were really basic operations). So things that
> worked fine in one version may work differently in another; this becomes
> completely intractable.
>
> Pandas may be nice if you need a report and you need to get it done
> tomorrow; then you'll throw away the code. When we initially used pandas as
> the main data storage in yandex/rep, it looked like a good idea, but a year
> later it was obvious this was a wrong decision. In cases where you build a
> data pipeline / research that should still work several years later (using
> some other installation by someone else), usage of pandas should be *minimal*.
>
> That's why I am looking for a reliable pandas substitute, which should be:
> - completely consistent with numpy, and it should fail when something isn't
> implemented / is impossible
> - fewer new abstractions; nobody wants to learn
> one-more-way-to-manipulate-the-data, specifically other researchers
> - it may be less convenient for interactive data munging
>   - in particular, fewer methods is ok
> - written code should be interpretable, and can hardly be misinterpreted
> - not super slow; 1-10 gigabyte datasets are a normal situation
>

Just to the pandas part:

statsmodels has supported pandas almost from the very beginning (or maybe
after 1.5 years), when the new pandas was still very young.

However, what I insisted on is that pandas stays in the wrapper/interface
code, and internally only numpy arrays are used. Besides the confusing
"magic" indexing of early pandas, there were a lot of details that silently
produced different results, e.g. default iteration over axis=1, and ddof=1
in std and var instead of numpy's ddof=0.

Essentially, every interface does the equivalent of np.asarray, but we store
the DataFrame information, mainly the index and column names, so we can
return the appropriate pandas object if a pandas object was used for the
input.

This has worked pretty well. Users can have their dataframes, and we have
pure numpy algorithms.
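The interface pattern described above can be sketched roughly as follows;
the names `unwrap`, `rewrap`, and `demean` are illustrative, not
statsmodels' actual internals:

```python
import numpy as np

def unwrap(data):
    """np.asarray plus remembering pandas-style metadata, if any."""
    index = getattr(data, 'index', None)  # present on Series/DataFrame
    return np.asarray(data), index

def rewrap(result, index, original):
    """Re-attach the stored index when the input was a pandas object."""
    if index is None:
        return result  # plain ndarray in, plain ndarray out
    # reconstruct the same pandas type as the input (Series, DataFrame, ...)
    return type(original)(result, index=index)

def demean(data):
    """Toy 'algorithm': pure-numpy internally, pandas-aware at the edges."""
    arr, index = unwrap(data)
    out = arr - arr.mean(axis=0)
    return rewrap(out, index, data)

print(demean(np.array([1.0, 2.0, 3.0])))  # [-1.  0.  1.]
```

With a pandas Series as input, the same `demean` would return a Series with
the original index, without the numpy core ever touching pandas.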

Recently we have started to use pandas inside a few functions or classes
that are less tightly integrated into the overall setup. We also use pandas
for some things that are not convenient or not available in numpy. Our
internal use of pandas groupby and similar will most likely increase over
time.
(One of the main issues we had was the date and time index, because that was
a moving target in both numpy and pandas.)


One issue for c

Re: [Numpy-discussion] PowerPC testing servers

2017-02-16 Thread josef . pktd
On Thu, Feb 16, 2017 at 3:55 AM, Ralf Gommers 
wrote:

>
>
> On Thu, Feb 16, 2017 at 3:53 PM, Sandro Tosi  wrote:
>
>> > A recent post to the wheel-builders mailing list pointed out some
>> > links to places providing free PowerPC hosting for open source
>> > projects, if they agree to a submitted request:
>>
>> The debian project has some powerpc machines (and we still build numpy
>> on those boxes when i upload a new revision to our archives) and they
>> also have hosts dedicated to let debian developers login and debug
>> issues with their packages on that architecture. I can sponsor access
>> to those machines for some of you, but it is not a place where you can
>> host a CI instance.
>>
>> Just keep it in mind more broadly than powerpc, f.e. these are all the
>> archs where numpy was built after the last upload
>> https://buildd.debian.org/status/package.php?p=python-numpy&suite=unstable
>> (the grayed out archs are the ones non release critical, so packages
>> are built as best effort and if missing is not a big deal)
>
>
> Thanks Sandro. It looks like even for the release-critical ones, it's just
> the build that has to succeed and failures are not detected? For example,
> armel is green but has 9 failures: https://buildd.debian.org/status/fetch.php?pkg=python-numpy&arch=armel&ver=1%3A1.12.0-2&stamp=1484889563&raw=0
>
> Ralf
>


More general questions on this:

Are there any overviews of which packages in the python-for-science or
python-for-data-analysis areas work correctly on different platforms?
Are there any platforms/processors, besides the standard x86/x64, where
this is important?

For example, for statsmodels:
In early releases of statsmodels, maybe 5 to 7 years ago, Yarik and I were
still debugging problems on several machines like ppc and s390x during
Debian testing. Since then I haven't heard much about specific problems.
The current status for statsmodels on Debian machines is pretty mixed. On
several of them some dependencies are not available, and in some cases we
have errors that might be caused by errors in dependencies, e.g. cvxopt.

The ppc64el test run for statsmodels has a large number of failures,
but checking scipy, it looks like it's also not working properly:
https://buildd.debian.org/status/fetch.php?pkg=python-scipy&arch=ppc64el&ver=0.18.1-2&stamp=1477075663&raw=0

In those cases it would be impossible to start debugging if we had to debug
through the entire dependency chain.

CI testing for Windows, Apple and Linux, mainly on x64, seems to be working
pretty well, with some delays while version incompatibilities are fixed.
But anything that is not in a CI testing setup looks pretty random to me.

(I'm mainly curious what the status of those machines is. I'm not really
eager to create more debugging work, but sometimes failures on a machine
point to code that is "fragile".)

Josef



>
>


[Numpy-discussion] offset in fill diagonal

2017-01-21 Thread josef . pktd
Is there a simple way to fill in the diagonal elements of an array for a
diagonal other than the main one?

As far as I can see, the diagxxx functions that take an offset can only read
and not modify in place, and the functions for modifying don't take an
offset and only allow changing the main diagonal.

Usecase: creating banded matrices (2-D arrays) similar to toeplitz.
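For what it's worth, two workarounds via index arithmetic; this is a sketch,
not an official numpy API for offset diagonals:

```python
import numpy as np

n = 5
a = np.zeros((n, n), dtype=int)

# 1) boolean mask from np.eye, which does take an offset k
a[np.eye(n, k=1, dtype=bool)] = 2   # first superdiagonal

# 2) strided flat indexing: superdiagonal k occupies
#    a.flat[k : n*(n-k) : n+1]; subdiagonal -k occupies a.flat[k*n :: n+1]
a.flat[1 * n :: n + 1] = 3          # first subdiagonal

np.fill_diagonal(a, 1)              # main diagonal, for comparison
print(a)
```

Repeating this over several offsets gives a banded / toeplitz-like 2-D
array without any explicit Python loop over elements.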


Josef


Re: [Numpy-discussion] Question about numpy.random.choice with probabilties

2017-01-18 Thread josef . pktd
On Wed, Jan 18, 2017 at 8:53 AM,  wrote:

>
>
> On Wed, Jan 18, 2017 at 4:52 AM, Nadav Har'El  wrote:
>
>>
>> On Wed, Jan 18, 2017 at 11:00 AM, aleba...@gmail.com 
>> wrote:
>>
>>> Let's look at what the user asked this function, and what it returns:
>>>

 User asks: please give me random pairs of the three items, where item 1
 has probability 0.2, item 2 has 0.4, and 3 has 0.4.

 Function returns: random pairs, where if you make many random returned
 results (as in the law of large numbers) and look at the items they
 contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is
 0.38333.
 These are not (quite) the probabilities the user asked for...

 Can you explain a sense where the user's requested probabilities (0.2,
 0.4, 0.4) are actually adhered to in the results which random.choice returns?

>>>
>>> I think that the question the user is asking by specifying p is a
>>> slightly different one:
>>>  "please give me random pairs of the three items extracted from a
>>> population of 3 items where item 1 has probability of being extracted of
>>> 0.2, item 2 has 0.4, and 3 has 0.4. Also please remove items once
>>> extracted."
>>>
>>
>> You are right, if that is what the user wants, numpy.random.choice does
>> the right thing.
>>
>> I'm just wondering whether this is actually what users want, and whether
>> they understand this is what they are getting.
>>
>> As I said, I expected it to generate pairs with, empirically, the desired
>> distribution of individual items. The documentation of numpy.random.choice
>> seemed to me (wrongly) to imply that that's what it does. So I was
>> surprised to realize that it does not.
>>
>
> As Alessandro and you showed, the function returns something that makes
> sense. If the user wants something different, then they need to look for a
> different function, which is however difficult if it doesn't have a
> solution in general.
>
> Sounds to me a bit like a Monty Hall problem. Whether we like it or not,
> or find it counter intuitive, it is what it is given the sampling scheme.
>
> Having more sampling schemes would be useful, but it's not possible to
> implement sampling schemes with impossible properties.
>

BTW: sampling 3 out of 3 without replacement is even worse

No matter what sampling scheme and what selection probabilities we use, we
always have every element with probability 1 in the sample.


(Which in survey statistics implies that the sampling error or standard
deviation of any estimate of a population mean or total is zero. Which I
found weird. How can you do statistics and get an estimate that doesn't
have any uncertainty associated with it?)

Josef



>
> Josef
>
>
>
>>
>> Nadav.
>>


Re: [Numpy-discussion] Question about numpy.random.choice with probabilties

2017-01-18 Thread josef . pktd
On Wed, Jan 18, 2017 at 4:52 AM, Nadav Har'El  wrote:

>
> On Wed, Jan 18, 2017 at 11:00 AM, aleba...@gmail.com 
> wrote:
>
>> Let's look at what the user asked this function, and what it returns:
>>
>>>
>>> User asks: please give me random pairs of the three items, where item 1
>>> has probability 0.2, item 2 has 0.4, and 3 has 0.4.
>>>
>>> Function returns: random pairs, where if you make many random returned
>>> results (as in the law of large numbers) and look at the items they
>>> contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is
>>> 0.38333.
>>> These are not (quite) the probabilities the user asked for...
>>>
>>> Can you explain a sense where the user's requested probabilities (0.2,
>>> 0.4, 0.4) are actually adhered to in the results which random.choice returns?
>>>
>>
>> I think that the question the user is asking by specifying p is a
>> slightly different one:
>>  "please give me random pairs of the three items extracted from a
>> population of 3 items where item 1 has probability of being extracted of
>> 0.2, item 2 has 0.4, and 3 has 0.4. Also please remove items once
>> extracted."
>>
>
> You are right, if that is what the user wants, numpy.random.choice does
> the right thing.
>
> I'm just wondering whether this is actually what users want, and whether
> they understand this is what they are getting.
>
> As I said, I expected it to generate pairs with, empirically, the desired
> distribution of individual items. The documentation of numpy.random.choice
> seemed to me (wrongly) to imply that that's what it does. So I was
> surprised to realize that it does not.
>

As Alessandro and you showed, the function returns something that makes
sense. If the user wants something different, then they need to look for a
different function, which is however difficult if it doesn't have a
solution in general.

Sounds to me a bit like a Monty Hall problem. Whether we like it or not, or
find it counter intuitive, it is what it is given the sampling scheme.

Having more sampling schemes would be useful, but it's not possible to
implement sampling schemes with impossible properties.
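A quick simulation, added here for illustration, confirms the skewed
marginals discussed above (the exact values are 7/30 ≈ 0.233 and
23/60 ≈ 0.383, not the requested 0.2, 0.4, 0.4):

```python
import numpy as np

rng = np.random.RandomState(12345)
p = [0.2, 0.4, 0.4]
counts = np.zeros(3)
ndraws = 10000
for _ in range(ndraws):
    # draw a pair without replacement, as in the thread's example
    pair = rng.choice(3, size=2, replace=False, p=p)
    counts[pair] += 1

# item marginals among the drawn pairs: roughly [0.233, 0.383, 0.383]
print(counts / (2 * ndraws))
```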

Josef



>
> Nadav.
>


Re: [Numpy-discussion] Question about numpy.random.choice with probabilties

2017-01-17 Thread josef . pktd
On Tue, Jan 17, 2017 at 6:58 PM, aleba...@gmail.com 
wrote:

>
>
> 2017-01-17 22:13 GMT+01:00 Nadav Har'El :
>
>>
>> On Tue, Jan 17, 2017 at 7:18 PM, aleba...@gmail.com 
>> wrote:
>>
>>> Hi Nadav,
>>>
>>> I may be wrong, but I think that the result of the current
>>> implementation is actually the expected one.
>>> Using your example: probabilities for item 1, 2 and 3 are: 0.2, 0.4 and
>>> 0.4
>>>
>>> P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2])
>>>
>>
>> Yes, this formula does fit well with the actual algorithm in the code.
>> But, my question is *why* we want this formula to be correct:
>>
>> Just a note: this formula is correct, and it is one of the fundamental
> laws of statistics: https://en.wikipedia.org/wiki/Law_of_total_probability +
> https://en.wikipedia.org/wiki/Bayes%27_theorem
> Thus, the result we get from random.choice IMHO definitely makes sense. Of
> course, I think we could always discuss about implementing other sampling
> methods if they are useful to some application.
>
>
>>
>>> Now, P([1]) = 0.2 and P([2]) = 0.4. However:
>>> P([2] | 1st=[1]) = 0.5 (2 and 3 have the same sampling probability)
>>> P([1] | 1st=[2]) = 1/3 (1 and 3 have probability 0.2 and 0.4 that,
>>> once normalised, translate into 1/3 and 2/3 respectively)
>>> Therefore P([1,2]) = 0.7/3 = 0.2
>>> Similarly, P([1,3]) = 0.2 and P([2,3]) = 1.6/3 = 0.53
>>>
>>
>> Right, these are the numbers that the algorithm in the current code, and
>> the formula above, produce:
>>
>> P([1,2]) = P([1,3]) = 0.2
>> P([2,3]) = 0.5
>>
>> What I'm puzzled about is that these probabilities do not really fulfill
>> the given probability vector 0.2, 0.4, 0.4...
>> Let me try to explain:
>>
>> Why did the user choose the probabilities 0.2, 0.4, 0.4 for the three
>> items in the first place?
>>
>> One reasonable interpretation is that the user wants in his random picks
>> to see item 1 half the time of item 2 or 3.
>> For example, maybe item 1 costs twice as much as item 2 or 3, so picking
>> it half as often will result in an equal expenditure on each item.
>>
>> If the user randomly picks the items individually (a single item at a
>> time), he indeed gets exactly this distribution: 0.2 of the time item 1,
>> 0.4 of the time item 2, 0.4 of the time item 3.
>>
>> Now, what happens if he picks not individual items, but pairs of
>> different items using numpy.random.choice with two items, replace=false?
>> Suddenly, the distribution of the individual items in the results get
>> skewed: If we look at the expected number of times we'll see each item in
>> one draw of a random pair, we will get:
>>
>> E(1) = P([1,2]) + P([1,3]) = 0.4
>> E(2) = P([1,2]) + P([2,3]) = 0.7
>> E(3) = P([1,3]) + P([2,3]) = 0.7
>>
>> Or renormalizing by dividing by 2:
>>
>> P(1) = 0.23
>> P(2) = 0.38
>> P(3) = 0.38
>>
>> As you can see this is not quite the probabilities we wanted (which were
>> 0.2, 0.4, 0.4)! In the random pairs we picked, item 1 was used a bit more
>> often than we wanted, and item 2 and 3 were used a bit less often!
>>
>
> p is not the probability of the output but the one of the source finite
> population. I think that if you want to preserve that distribution, as
> Josef pointed out, you have to make extractions independent, that is either
> sample with replacement or approximate an infinite population (that is
> basically the same thing).  But of course in this case you will also end up
> with events [X,X].
>

With replacement and keeping duplicates, the results might also show a
similar pattern in the marginal probabilities:
https://onlinecourses.science.psu.edu/stat506/node/17

Another approach in survey sampling is to drop duplicates in
with-replacement sampling, but then the sample size itself is random.
(Again, I didn't try to understand the small print.)

(Another related aside: the problem with a discrete sample space in small
samples also shows up in calculating hypothesis tests, e.g. Fisher's exact
test or similar. Because we only get a few discrete possibilities in the
sample space, it is not possible to construct a test that has exactly the
desired type 1 error.)


Josef



>
>
>> So that brought my question of why we consider these numbers right.
>>
>> In this example, it's actually possible to get the right item
>> distribution, if we pick the pair outcomes with the following probabilties:
>>
>>P([1,2]) = 0.2(not 0.23 as above)
>>P([1,3]) = 0.2
>>P([2,3]) = 0.6(not 0.53 as above)
>>
>> Then, we get exactly the right P(1), P(2), P(3): 0.2, 0.4, 0.4
>>
>> Interestingly, fixing things like I suggest is not always possible.
>> Consider a different probability-vector example for three items - 0.99,
>> 0.005, 0.005. Now, no matter which algorithm we use for randomly picking
>> pairs from these three items, *each* returned pair will inevitably contain
>> one of the two very-low-probability items, so each of those items will
>> appear in roughly 

Re: [Numpy-discussion] Question about numpy.random.choice with probabilties

2017-01-17 Thread josef . pktd
On Tue, Jan 17, 2017 at 4:13 PM, Nadav Har'El  wrote:
>
>
> On Tue, Jan 17, 2017 at 7:18 PM, aleba...@gmail.com 
wrote:
>>
>> Hi Nadav,
>>
>> I may be wrong, but I think that the result of the current
implementation is actually the expected one.
>> Using your example: probabilities for item 1, 2 and 3 are: 0.2, 0.4 and
0.4
>>
>> P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2])
>
>
> Yes, this formula does fit well with the actual algorithm in the code.
But, my question is *why* we want this formula to be correct:
>
>>
>> Now, P([1]) = 0.2 and P([2]) = 0.4. However:
>> P([2] | 1st=[1]) = 0.5 (2 and 3 have the same sampling probability)
>> P([1] | 1st=[2]) = 1/3 (1 and 3 have probability 0.2 and 0.4 that,
once normalised, translate into 1/3 and 2/3 respectively)
>> Therefore P([1,2]) = 0.7/3 = 0.2
>> Similarly, P([1,3]) = 0.2 and P([2,3]) = 1.6/3 = 0.53
>
>
> Right, these are the numbers that the algorithm in the current code, and
the formula above, produce:
>
> P([1,2]) = P([1,3]) = 0.2
> P([2,3]) = 0.5
>
> What I'm puzzled about is that these probabilities do not really fulfill
the given probability vector 0.2, 0.4, 0.4...
> Let me try to explain:
>
> Why did the user choose the probabilities 0.2, 0.4, 0.4 for the three
items in the first place?
>
> One reasonable interpretation is that the user wants in his random picks
to see item 1 half the time of item 2 or 3.
> For example, maybe item 1 costs twice as much as item 2 or 3, so picking
it half as often will result in an equal expenditure on each item.
>
> If the user randomly picks the items individually (a single item at a
time), he indeed gets exactly this distribution: 0.2 of the time item 1,
0.4 of the time item 2, 0.4 of the time item 3.
>
> Now, what happens if he picks not individual items, but pairs of
different items using numpy.random.choice with two items, replace=false?
> Suddenly, the distribution of the individual items in the results get
skewed: If we look at the expected number of times we'll see each item in
one draw of a random pair, we will get:
>
> E(1) = P([1,2]) + P([1,3]) = 0.4
> E(2) = P([1,2]) + P([2,3]) = 0.7
> E(3) = P([1,3]) + P([2,3]) = 0.7
>
> Or renormalizing by dividing by 2:
>
> P(1) = 0.23
> P(2) = 0.38
> P(3) = 0.38
>
> As you can see this is not quite the probabilities we wanted (which were
0.2, 0.4, 0.4)! In the random pairs we picked, item 1 was used a bit more
often than we wanted, and item 2 and 3 were used a bit less often!
>
> So that brought my question of why we consider these numbers right.
>
> In this example, it's actually possible to get the right item
distribution, if we pick the pair outcomes with the following probabilties:
>
>P([1,2]) = 0.2(not 0.23 as above)
>P([1,3]) = 0.2
>P([2,3]) = 0.6(not 0.53 as above)
>
> Then, we get exactly the right P(1), P(2), P(3): 0.2, 0.4, 0.4
>
> Interestingly, fixing things like I suggest is not always possible.
Consider a different probability-vector example for three items - 0.99,
0.005, 0.005. Now, no matter which algorithm we use for randomly picking
pairs from these three items, *each* returned pair will inevitably contain
one of the two very-low-probability items, so each of those items will
appear in roughly half the pairs, instead of in a vanishingly small
percentage as we hoped.
>
> But in other choices of probabilities (like the one in my original
example), there is a solution. For 2-out-of-3 sampling we can actually show
a system of three linear equations in three variables, so there is always
one solution but if this solution has components not valid as probabilities
(not in [0,1]) we end up with no solution - as happens in the 0.99, 0.005,
0.005 example.


I think the underlying problem is that in the sampling space the events
(1, 2), (1, 3), (2, 3) are correlated, and because of the discreteness an
arbitrary marginal distribution on the individual events 1, 2, 3 is not
possible.
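Nadav's three-equation system has a closed form; a small sketch, added for
illustration (`pair_probs` is a made-up helper, not a numpy function):

```python
import numpy as np

def pair_probs(p1, p2, p3):
    """Pair probabilities whose item marginals are (p1, p2, p3) when
    drawing 2 of 3 items; assumes p1 + p2 + p3 == 1.

    From P12 + P13 = 2*p1 etc., the closed form is P_jk = 1 - 2*p_i
    for the left-out item i; there is no valid solution when any
    entry would be negative.
    """
    q = np.array([1 - 2 * p3, 1 - 2 * p2, 1 - 2 * p1])  # P12, P13, P23
    return q if (q >= 0).all() else None

print(pair_probs(0.2, 0.4, 0.4))       # approx [0.2, 0.2, 0.6]
print(pair_probs(0.99, 0.005, 0.005))  # None: infeasible, as in the thread
```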

Related aside:
I'm not able (or willing) to spend the time on the math, but I just went
through something similar for survey sampling in a finite population (e.g.
surveying two out of 3 individuals, where 3 is the population size), leading
to the Horvitz–Thompson estimator. The books have chapters on different
sampling schemes and the derivation of the marginal and joint probabilities
of being surveyed.

(I gave up on sampling without replacement, and assume we have a large
population where it doesn't make a difference.)

In some of the sampling schemes they pick sequentially and adjust the
probabilities for the remaining individuals. That seems to provide more
flexibility to create a desired or optimal sampling scheme.

Josef




>
>
>
>>
>> What am I missing?
>>
>> Alessandro
>>
>>
>> 2017-01-17 13:00 GMT+01:00 :
>>>
>>> Hi, I'm looking for a way to find a random sample of C different items out
>>> of N items, with some desired probability Pi for each item i.
>>>
>>> I saw that numpy has a fun

Re: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve

2017-01-09 Thread josef . pktd
On Mon, Jan 9, 2017 at 6:27 AM, Ilhan Polat  wrote:

> > Note that you're proposing a new scipy feature (right?) on the numpy
> list
>
> > This sounds like a good idea to me. As a former heavy Matlab user I
> remember a lot of things to dislike, but "\" behavior was quite nice.
>
> Correct, I am not sure where this might go in. It seemed like a NumPy
> array operation (touching array elements rapidly etc. can also be added for
> similar functionalities other than solve) hence the NumPy list. But of
> course it can be pushed as an exclusive SciPy feature. I'm not sure what
> the outlook on np.linalg.solve is.
>
>
> > How much is a noticeable slowdown? Note that we still have the current
> interfaces available for users that know what they need, so a nice
> convenience function that is say 5-10% slower would not be the end of the
> world.
>
> the fastest case was around 150-400% slower but of course it might be the
> case that I'm not using the fastest methods. It was mostly shuffling things
> around and using np.any on them in the pure python3 case. I will cook up
> something again for the baseline as soon as I have time.
>
>
>
All these checks sound a bit expensive if we almost always have completely
unstructured arrays that don't satisfy any special matrix pattern.

In analogy to the type proliferation in Julia to handle those cases: is
there a way to attach information to numpy arrays that, for example, signals
that a 2d array is hermitian, banded, diagonal, ...?

(On second thought: maybe completely unstructured is not too expensive to
detect if the checks are short-circuited: one off-diagonal element nonzero
means not diagonal, two opposite off-diagonal elements different means not
symmetric, ...)
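A toy version of such short-circuited checks, for illustration only
(pure-python loops for clarity; a real implementation would live in
compiled code, and these function names are made up):

```python
import numpy as np

def is_diagonal(a):
    n = a.shape[0]
    for i in range(n):
        for j in range(n):
            if i != j and a[i, j] != 0:
                return False  # one nonzero off-diagonal element suffices
    return True

def is_symmetric(a):
    n = a.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            if a[i, j] != a[j, i]:
                return False  # one mismatched opposite pair suffices
    return True

a = np.array([[1.0, 2.0], [0.0, 3.0]])
print(is_diagonal(a), is_symmetric(a))  # False False
print(is_diagonal(np.eye(3)))           # True
```

For an unstructured matrix both checks typically bail out after inspecting
only a handful of elements, which is the cheap case described above.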

Josef




>
>
>
>


Re: [Numpy-discussion] Deprecating matrices.

2017-01-06 Thread josef . pktd
On Fri, Jan 6, 2017 at 8:28 PM, Ralf Gommers  wrote:

>
>
> On Sat, Jan 7, 2017 at 2:21 PM, CJ Carey 
> wrote:
>
>>
>> On Fri, Jan 6, 2017 at 6:19 PM, Ralf Gommers 
>> wrote:
>>
>>> This sounds like a reasonable idea. Timeline could be something like:
>>>
>>> 1. Now: create new package, deprecate np.matrix in docs.
>>> 2. In say 1.5 years: start issuing visible deprecation warnings in numpy
>>> 3. After 2020: remove matrix from numpy.
>>>
>>> Ralf
>>>
>>
>> I think this sounds reasonable, and reminds me of the deliberate
>> deprecation process taken for scipy.weave. I guess we'll see how successful
>> it was when 0.19 is released.
>>
>> The major problem I have with removing numpy matrices is the effect on
>> scipy.sparse, which mostly-consistently mimics numpy.matrix semantics and
>> often produces numpy.matrix results when densifying. The two are coupled
>> tightly enough that if numpy matrices go away, all of the existing sparse
>> matrix classes will have to go at the same time.
>>
>> I don't think that would be the end of the world,
>>
>
> Not the end of the world literally, but the impact would be pretty major.
> I think we're stuck with scipy.sparse, and may at some point will add a new
> sparse *array* implementation next to it. For scipy we will have to add a
> dependency on the new npmatrix package or vendor it.
>

That sounds to me like moving the maintenance of numpy.matrix from numpy to
scipy, if scipy.sparse is one of its main users and still depends on it.
Josef



>
> Ralf
>
>
>
>> but it's definitely something that should happen while scipy is still
>> pre-1.0, if it's ever going to happen.
>>


Re: [Numpy-discussion] Deprecating matrices.

2017-01-02 Thread josef . pktd
On Mon, Jan 2, 2017 at 9:00 PM, Ralf Gommers  wrote:

>
>
> On Tue, Jan 3, 2017 at 2:36 PM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>> Hi All,
>>
>> Just throwing this click bait out for discussion. Now that the `@`
>> operator is available and things seem to be moving towards Python 3,
>> especially in the classroom, we should consider the real possibility of
>> deprecating the matrix type and later removing it. No doubt there are old
>> scripts that require them, but older versions of numpy are available for
>> those who need to run old scripts.
>>
>> Thoughts?
>>
>
> Clearly deprecate in the docs now, and warn only later imho. We can't warn
> before we have a good solution for scipy.sparse matrices, which have matrix
> semantics and return matrix instances.
>
> Ralf
>

How about dropping python 2 support at the same time? Then we can all be
in an `@` world.

Josef
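[Editorial aside: the `@` point above can be made concrete. A minimal
sketch, not from the thread: with plain ndarrays, `*` stays elementwise
and `@` does matrix multiplication, which removes the main reason to
reach for np.matrix.]

```python
import numpy as np

# With plain ndarrays, `*` is elementwise and `@` is matrix
# multiplication, so np.matrix's overloaded `*` is no longer needed.
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b   # [[ 5, 12], [21, 32]]
matmul = a @ b        # [[19, 22], [43, 50]]

# np.matrix redefines `*` to mean matrix multiplication instead:
legacy = np.matrix(a) * np.matrix(b)
assert (np.asarray(legacy) == matmul).all()
```

(The `@` operator requires Python 3.5 or later, which is part of why the
two questions were linked in this thread.)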



>
>
>


Re: [Numpy-discussion] Numpy integers to integer powers again again

2016-10-26 Thread josef . pktd
On Wed, Oct 26, 2016 at 3:57 PM, Charles R Harris  wrote:

>
>
> On Wed, Oct 26, 2016 at 1:39 PM,  wrote:
>
>>
>>
>> On Wed, Oct 26, 2016 at 3:23 PM, Charles R Harris <
>> charlesr.har...@gmail.com> wrote:
>>
>>>
>>>
>>> On Tue, Oct 25, 2016 at 10:14 AM, Stephan Hoyer 
>>> wrote:
>>>
 I am also concerned about adding more special cases for NumPy scalars
 vs arrays. These cases are already confusing (e.g., making no distinction
 between 0d arrays and scalars) and poorly documented.

 On Mon, Oct 24, 2016 at 4:30 PM, Nathaniel Smith  wrote:

> On Mon, Oct 24, 2016 at 3:41 PM, Charles R Harris
>  wrote:
> > Hi All,
> >
> > I've been thinking about this some (a lot) more and have an alternate
> > proposal for the behavior of the `**` operator
> >
> > if both base and power are numpy/python scalar integers, convert to
> python
> > integers and call the `**` operator. That would solve both the
> precision and
> > compatibility problems and I think is the option of least surprise.
> For
> > those who need type preservation and modular arithmetic, the np.power
> > function remains, although the type conversions can be surprising
> as it
> > seems that the base and power should  play different roles in
> determining
> > the type, at least to me.
> > Arrays, 0-d or not, are treated differently from scalars, and integers
> raised
> > to negative integer powers always raise an error.
> >
> > I think this solves most problems and would not be difficult to
> implement.
> >
> > Thoughts?
>
> My main concern about this is that it adds more special cases to numpy
> scalars, and a new behavioral deviation between 0d arrays and scalars,
> when ideally we should be trying to reduce the
> duplication/discrepancies between these. It's also inconsistent with
> how other operations on integer scalars work, e.g. regular addition
> overflows rather than promoting to Python int:
>
> In [8]: np.int64(2 ** 63 - 1) + 1
> /home/njs/.user-python3.5-64bit/bin/ipython:1: RuntimeWarning:
> overflow encountered in long_scalars
>   #!/home/njs/.user-python3.5-64bit/bin/python3.5
> Out[8]: -9223372036854775808
>
> So I'm inclined to try and keep it simple, like in your previous
> proposal... theoretically of course it would be nice to have the
> perfect solution here, but at this point it feels like we might be
> overthinking this trying to get that last 1% of improvement. The thing
> where 2 ** -1 returns 0 is just broken and bites people so we should
> definitely fix it, but beyond that I'm not sure it really matters
> *that* much what we do, and "special cases aren't special enough to
> break the rules" and all that.
>
>
>>> What I have been concerned about are the following combinations that
>>> currently return floats
>>>
>>> [num/exp/res type table lost in archiving: the angle-bracketed type
>>> reprs were stripped as HTML tags; three of the listed combinations
>>> returned numpy.float32 and thirteen returned numpy.float64]
>>>
>>> The other combinations of signed and unsigned integers to signed powers
>>> currently raise ValueError due to the change to the power ufunc. The
>>> exceptions that aren't covered by uint64 + signed (which won't change) seem
>>> to occur when the exponent can be safely cast to the base type. I suspect
>>> that people have already come to depend on that, especially as python
>>> integers on 64 bit linux convert to int64. So in those cases we should
>>> perhaps raise a FutureWarning instead of an error.
>>>
>>
>>
>> >>> np.int64(2)**np.array(-1, np.int64)
>> 0.5
>> >>> np.__version__
>> '1.10.4'
>> >>> np.int64(2)**np.array([-1, 2], np.int64)
>> array([0, 4], dtype=int64)
>> >>> np.array(2, np.uint64)**np.array([-1, 2], np.int64)
>> array([0, 4], dtype=int64)
>> >>> np.array([2], np.uint64)**np.array([-1, 2], np.int64)
>> array([ 0.5,  4. ])
>> >>> np.array([2], np.uint64).squeeze()**np.array([-1, 2], np.int64)
>> array([0, 4], dtype=int64)
>>
>>
>> (IMO: If you have to break backwards compatibility, break forwards not
>> backwards.)
>>
>
> Current master is different. I'm not too worried in the array cases as the
> results for negative exponents were zero except then raising -1 to a power.
> Since that result is incorrect raising an error  falls on the fine line
> between bug fix and compatibility break. If the pre-rele

Re: [Numpy-discussion] Numpy integers to integer powers again again

2016-10-26 Thread josef . pktd
On Wed, Oct 26, 2016 at 3:39 PM, Nathaniel Smith  wrote:

> On Wed, Oct 26, 2016 at 12:23 PM, Charles R Harris
>  wrote:
> [...]
> > What I have been concerned about are the following combinations that
> > currently return floats
> >
> > [num/exp/res type table lost in archiving: the angle-bracketed type
> > reprs were stripped as HTML tags; three of the listed combinations
> > returned numpy.float32 and thirteen returned numpy.float64]
>
> What's this referring to? For both arrays and scalars I get:
>
> In [8]: (np.array(2, dtype=np.int8) ** np.array(2, dtype=np.int8)).dtype
> Out[8]: dtype('int8')
>
> In [9]: (np.int8(2) ** np.int8(2)).dtype
> Out[9]: dtype('int8')
>


>>> (np.array([2], dtype=np.int8) ** np.array(-1, dtype=np.int8).squeeze()).dtype
dtype('int8')
>>> (np.array([2], dtype=np.int8)[0] ** np.array(-1, dtype=np.int8).squeeze()).dtype
dtype('float32')

>>> (np.int8(2)**np.int8(-1)).dtype
dtype('float32')
>>> (np.int8(2)**np.int8(2)).dtype
dtype('int8')

The last one looks like value-dependent scalar dtype behavior

Josef
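[Editorial aside: under the resolution eventually adopted (NumPy 1.12
and later), the value dependence for scalars went away: negative integer
exponents raise an error, and non-negative ones preserve the integer
type. A minimal check against a recent NumPy, not the 1.10/1.11 versions
quoted above:]

```python
import numpy as np

# Non-negative integer exponent: the integer dtype is preserved.
res = np.int8(2) ** np.int8(2)
assert res.dtype == np.dtype('int8')

# Negative integer exponent: raises instead of silently returning
# 0 (arrays) or a float (scalars), the inconsistency discussed above.
try:
    np.int8(2) ** np.int8(-1)
except ValueError as exc:
    print("refused:", exc)
```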


>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org


Re: [Numpy-discussion] Numpy integers to integer powers again again

2016-10-26 Thread josef . pktd
On Wed, Oct 26, 2016 at 3:23 PM, Charles R Harris  wrote:

>
>
> On Tue, Oct 25, 2016 at 10:14 AM, Stephan Hoyer  wrote:
>
>> I am also concerned about adding more special cases for NumPy scalars vs
>> arrays. These cases are already confusing (e.g., making no distinction
>> between 0d arrays and scalars) and poorly documented.
>>
>> On Mon, Oct 24, 2016 at 4:30 PM, Nathaniel Smith  wrote:
>>
>>> On Mon, Oct 24, 2016 at 3:41 PM, Charles R Harris
>>>  wrote:
>>> > Hi All,
>>> >
>>> > I've been thinking about this some (a lot) more and have an alternate
>>> > proposal for the behavior of the `**` operator
>>> >
>>> > if both base and power are numpy/python scalar integers, convert to
>>> python
>>> > integers and call the `**` operator. That would solve both the
>>> precision and
>>> > compatibility problems and I think is the option of least surprise. For
>>> > those who need type preservation and modular arithmetic, the np.power
>>> > function remains, although the type conversions can be surprising as
>>> it
>>> > seems that the base and power should  play different roles in
>>> determining
>>> > the type, at least to me.
>>> > Arrays, 0-d or not, are treated differently from scalars, and integers
>>> raised
>>> > to negative integer powers always raise an error.
>>> >
>>> > I think this solves most problems and would not be difficult to
>>> implement.
>>> >
>>> > Thoughts?
>>>
>>> My main concern about this is that it adds more special cases to numpy
>>> scalars, and a new behavioral deviation between 0d arrays and scalars,
>>> when ideally we should be trying to reduce the
>>> duplication/discrepancies between these. It's also inconsistent with
>>> how other operations on integer scalars work, e.g. regular addition
>>> overflows rather than promoting to Python int:
>>>
>>> In [8]: np.int64(2 ** 63 - 1) + 1
>>> /home/njs/.user-python3.5-64bit/bin/ipython:1: RuntimeWarning:
>>> overflow encountered in long_scalars
>>>   #!/home/njs/.user-python3.5-64bit/bin/python3.5
>>> Out[8]: -9223372036854775808
>>>
>>> So I'm inclined to try and keep it simple, like in your previous
>>> proposal... theoretically of course it would be nice to have the
>>> perfect solution here, but at this point it feels like we might be
>>> overthinking this trying to get that last 1% of improvement. The thing
>>> where 2 ** -1 returns 0 is just broken and bites people so we should
>>> definitely fix it, but beyond that I'm not sure it really matters
>>> *that* much what we do, and "special cases aren't special enough to
>>> break the rules" and all that.
>>>
>>>
> What I have been concerned about are the following combinations that
> currently return floats
>
> [num/exp/res type table lost in archiving: the angle-bracketed type
> reprs were stripped as HTML tags; three of the listed combinations
> returned numpy.float32 and thirteen returned numpy.float64]
>
> The other combinations of signed and unsigned integers to signed powers
> currently raise ValueError due to the change to the power ufunc. The
> exceptions that aren't covered by uint64 + signed (which won't change) seem
> to occur when the exponent can be safely cast to the base type. I suspect
> that people have already come to depend on that, especially as python
> integers on 64 bit linux convert to int64. So in those cases we should
> perhaps raise a FutureWarning instead of an error.
>


>>> np.int64(2)**np.array(-1, np.int64)
0.5
>>> np.__version__
'1.10.4'
>>> np.int64(2)**np.array([-1, 2], np.int64)
array([0, 4], dtype=int64)
>>> np.array(2, np.uint64)**np.array([-1, 2], np.int64)
array([0, 4], dtype=int64)
>>> np.array([2], np.uint64)**np.array([-1, 2], np.int64)
array([ 0.5,  4. ])
>>> np.array([2], np.uint64).squeeze()**np.array([-1, 2], np.int64)
array([0, 4], dtype=int64)


(IMO: If you have to break backwards compatibility, break forwards not
backwards.)


Josef
http://www.stanlaurelandoliverhardy.com/nicemess.htm



>
> Chuck
>


Re: [Numpy-discussion] Combining covariance and correlation coefficient into one numpy.cov call

2016-10-26 Thread josef . pktd
On Wed, Oct 26, 2016 at 3:11 PM, Mathew S. Madhavacheril <
mathewsyr...@gmail.com> wrote:

>
>
> On Wed, Oct 26, 2016 at 2:56 PM, Nathaniel Smith  wrote:
>
>> On Wed, Oct 26, 2016 at 11:13 AM, Stephan Hoyer  wrote:
>> > On Wed, Oct 26, 2016 at 11:03 AM, Mathew S. Madhavacheril
>> >  wrote:
>> >>
>> >> On Wed, Oct 26, 2016 at 1:46 PM, Stephan Hoyer 
>> wrote:
>> >>>
>> >>> I wonder if the goals of this addition could be achieved by simply
>> adding
>> >>> an optional `cov` argument
>> >>>
>> >>> to np.corr, which would provide a pre-computed covariance.
>> >>
>> >>
>> >> That's a fair suggestion which I'm happy to switch to. This eliminates
>> the
>> >> need for two new functions.
>> >> I'll add an optional `cov = False` argument to numpy.corrcoef that
>> returns
>> >> a tuple (corr, cov) instead.
>> >>
>> >>>
>> >>>
>> >>> Either way, `covcorr` feels like a helper function that could exist in
>> >>> user code rather than numpy proper.
>> >>
>> >>
>> >> The user would have to re-implement the part that converts the
>> covariance
>> >> matrix to a correlation
>> >> coefficient. I made this PR to avoid that code duplication.
>> >
>> >
>> > With the API I was envisioning (or even your proposed API, for that
>> matter),
>> > this function would only be a few lines, e.g.,
>> >
>> > def covcorr(x):
>> > cov = np.cov(x)
>> > corr = np.corrcoef(x, cov=cov)
>>
>> IIUC, if you have a covariance matrix then you can compute the
>> correlation matrix directly, without looking at 'x', so corrcoef(x,
>> cov=cov) is a bit odd-looking. I think probably the API that makes the
>> most sense is just to expose something like the covtocorr function
>> (maybe it could have a less telegraphic name?)? And then, yeah, users
>> can use that to build their own covcorr or whatever if they want it.
>>
>
> Right, agreed, this is why I said `x` becomes redundant when `cov` is
> specified
> when calling `numpy.corrcoef`.  So we have two alternatives:
>
> 1) Have `np.corrcoef` accept a boolean optional argument `covmat = False`
> that lets
> one obtain a tuple containing the covariance and the correlation matrices
> in the same call
> 2) Modify my original PR so that `np.covtocorr` remains (with possibly a
> better
> name) but remove `np.covcorr` since this is easy for the user to add.
>
> My preference is option 2.
>

cov2corr is a useful function
http://www.statsmodels.org/dev/generated/statsmodels.stats.moment_helpers.cov2corr.html
I also wrote the inverse function corr2cov, but AFAIR I use it only in some
test cases.
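[Editorial aside: a minimal sketch of such a cov-to-corr helper,
assuming the statsmodels semantics linked above, i.e. dividing by the
outer product of the standard deviations:]

```python
import numpy as np

def cov2corr(cov):
    # Standard deviations are the square roots of the diagonal;
    # dividing by their outer product normalizes the diagonal to 1.
    cov = np.asarray(cov)
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)

cov = np.array([[4.0, 2.0],
                [2.0, 9.0]])
corr = cov2corr(cov)
# diagonal is exactly 1; off-diagonal is 2 / (2 * 3) = 1/3
```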


I don't think adding any of the options to corrcoef or covcorr is useful
since there is no computational advantage to it.
What I'm missing are functions that return the intermediate results, e.g.
var and mean or cov and mean.

(For statsmodels I decided to return mean and cov or mean and var in the
related functions. Some R packages return the mean as an option.)

Josef



>
>


Re: [Numpy-discussion] how to name "contagious" keyword in np.ma.convolve

2016-10-18 Thread josef . pktd
On Tue, Oct 18, 2016 at 1:30 PM,   wrote:
> On Tue, Oct 18, 2016 at 1:25 PM,   wrote:
>> On Mon, Oct 17, 2016 at 1:01 PM, Pierre Haessig
>>  wrote:
>>> Hi,
>>>
>>>
>>> Le 16/10/2016 à 11:52, Hanno Klemm a écrit :
 When I have similar situations, I usually interpolate between the valid 
 values. I assume there are a lot of use cases for convolutions but I have 
 difficulties imagining that ignoring a missing value and, for the purpose 
 of the computation, treating it as zero is useful in many of them.
>>> When estimating the autocorrelation of a signal, it makes sense to drop
>>> missing pairs of values. Only in this use case, it opens the question of
>>> correcting or not correcting for the number of missing elements  when
>>> computing the mean. I don't remember what R function "acf" is doing.
>
> as aside: statsmodels has now an option for acf and similar
>
> missing : str
> A string in ['none', 'raise', 'conservative', 'drop']
> specifying how the NaNs
> are to be treated.

aside to the aside: statsmodels was just catching up in this

The original for masked array acf including correct counting of "valid" terms is

https://github.com/pierregm/scikits.timeseries/blob/master/scikits/timeseries/lib/avcf.py

(which I looked at way before statsmodels had any acf)

Josef
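[Editorial aside: the pair-dropping idea can be sketched roughly like
this. `masked_autocov` is a hypothetical helper, not the
scikits.timeseries implementation, and the normalization (divide by the
number of valid pairs) is one of several reasonable conventions:]

```python
import numpy as np

def masked_autocov(x, lag):
    # Autocovariance at one lag for a masked series: any pair with a
    # masked member is dropped from both the sum and the count.
    x = np.ma.asarray(x)
    xd = x - x.mean()  # mean over unmasked values only
    prod = xd[lag:] * xd[:-lag] if lag else xd * xd
    return prod.sum() / prod.count()

data = np.ma.masked_invalid([1.0, 2.0, np.nan, 4.0, 5.0, 6.0])
acov1 = masked_autocov(data, 1)  # only the 3 fully valid pairs count
```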

>
> Josef
>
>>>
>>>
>>> Also, coming back to the initial question, I feel that it is necessary
>>> that the name "mask" (or "na" or similar) appears in the parameter name.
>>> Otherwise, people will wonder : "what on earth is contagious/being
>>> propagated"
>>>
>>> just thinking of yet another keyword name  : ignore_masked (or drop_masked)
>>>
>>> If I remember well, in R it is dropna. It would be nice if the boolean
>>> switch followed the same logic.
>>>
>>> Now of course the convolution function is more general than just
>>> autocorrelation...
>>
>> I think "drop" or "ignore" is too generic, for correlation it would be
>> for example ignore pairs versus ignore cases.
>>
>> Propagate sounds ok to me, but something with `valid` might be
>> more explicit for convolution or `correlate`; however, `valid` also
>> refers to the end points, so maybe valid_na or valid_masked=True
>>
>> Josef
>>
>>>
>>> best,
>>> Pierre
>>>
>>>


Re: [Numpy-discussion] how to name "contagious" keyword in np.ma.convolve

2016-10-18 Thread josef . pktd
On Tue, Oct 18, 2016 at 1:25 PM,   wrote:
> On Mon, Oct 17, 2016 at 1:01 PM, Pierre Haessig
>  wrote:
>> Hi,
>>
>>
>> Le 16/10/2016 à 11:52, Hanno Klemm a écrit :
>>> When I have similar situations, I usually interpolate between the valid 
>>> values. I assume there are a lot of use cases for convolutions but I have 
>>> difficulties imagining that ignoring a missing value and, for the purpose 
>>> of the computation, treating it as zero is useful in many of them.
>> When estimating the autocorrelation of a signal, it makes sense to drop
>> missing pairs of values. Only in this use case, it opens the question of
>> correcting or not correcting for the number of missing elements  when
>> computing the mean. I don't remember what R function "acf" is doing.

as aside: statsmodels has now an option for acf and similar

missing : str
A string in ['none', 'raise', 'conservative', 'drop']
specifying how the NaNs
are to be treated.

Josef

>>
>>
>> Also, coming back to the initial question, I feel that it is necessary
>> that the name "mask" (or "na" or similar) appears in the parameter name.
>> Otherwise, people will wonder : "what on earth is contagious/being
>> propagated"
>>
>> just thinking of yet another keyword name  : ignore_masked (or drop_masked)
>>
>> If I remember well, in R it is dropna. It would be nice if the boolean
>> switch followed the same logic.
>>
>> Now of course the convolution function is more general than just
>> autocorrelation...
>
> I think "drop" or "ignore" is too generic, for correlation it would be
> for example ignore pairs versus ignore cases.
>
> To me propagate sounds ok to me, but something with `valid` might be
> more explicit for convolution or `correlate`, however `valid` also
> refers to the end points, so maybe valid_na or valid_masked=True
>
> Josef
>
>>
>> best,
>> Pierre
>>
>>


Re: [Numpy-discussion] how to name "contagious" keyword in np.ma.convolve

2016-10-18 Thread josef . pktd
On Mon, Oct 17, 2016 at 1:01 PM, Pierre Haessig
 wrote:
> Hi,
>
>
> Le 16/10/2016 à 11:52, Hanno Klemm a écrit :
>> When I have similar situations, I usually interpolate between the valid 
>> values. I assume there are a lot of use cases for convolutions but I have 
>> difficulties imagining that ignoring a missing value and, for the purpose of 
>> the computation, treating it as zero is useful in many of them.
> When estimating the autocorrelation of a signal, it makes sense to drop
> missing pairs of values. Only in this use case, it opens the question of
> correcting or not correcting for the number of missing elements  when
> computing the mean. I don't remember what R function "acf" is doing.
>
>
> Also, coming back to the initial question, I feel that it is necessary
> that the name "mask" (or "na" or similar) appears in the parameter name.
> Otherwise, people will wonder : "what on earth is contagious/being
> propagated"
>
> just thinking of yet another keyword name  : ignore_masked (or drop_masked)
>
> If I remember well, in R it is dropna. It would be nice if the boolean
> switch followed the same logic.
>
> Now of course the convolution function is more general than just
> autocorrelation...

I think "drop" or "ignore" is too generic, for correlation it would be
for example ignore pairs versus ignore cases.

Propagate sounds ok to me, but something with `valid` might be
more explicit for convolution or `correlate`; however, `valid` also
refers to the end points, so maybe valid_na or valid_masked=True

Josef

>
> best,
> Pierre
>
>


Re: [Numpy-discussion] Integers to negative integer powers, time for a decision.

2016-10-07 Thread josef . pktd
On Fri, Oct 7, 2016 at 9:12 PM, Charles R Harris
 wrote:
> Hi All,
>
> The time for NumPy 1.12.0 approaches and I like to have a final decision on
> the treatment of integers to negative integer powers with the `**` operator.
> The two alternatives looked to be
>
> Raise an error for arrays and numpy scalars, including 1 and -1 to negative
> powers.
>
> Pluses
>
> Backward compatible
> Allows common powers to be integer, e.g., arange(3)**2
> Consistent with inplace operators
> Fixes current wrong behavior.
> Preserves type
>
>
> Minuses
>
> Integer overflow
> Computational inconvenience
> Inconsistent with Python integers
>
>
> Always return a float
>
> Pluses
>
> Computational convenience
>
>
> Minuses
>
> Loss of type
> Possible backward incompatibilities
> Not applicable to inplace operators
>
>
>
> Thoughts?

2: +1
I'm still in favor of number 2: less buggy code and less mental
gymnastics (watch out for that int, or which int do I need)

(upcasting is not applicable for any inplace operators, AFAIU   *=0.5 ?
zz = np.arange(5)
zz**(-1)
zz *= 0.5

tried in
>>> np.__version__
'1.9.2rc1'
>>> np.__version__
'1.10.4'
backwards compatibility ?
)
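[Editorial aside: the in-place point in the parenthetical can be made
concrete. A sketch under a current NumPy, where the unsafe in-place cast
is refused outright rather than silently truncating:]

```python
import numpy as np

zz = np.arange(5)          # integer dtype
# An in-place operation cannot change the array's dtype, so an
# upcasting `**` would have no in-place counterpart; recent NumPy
# refuses the analogous in-place float multiply:
try:
    zz *= 0.5
except TypeError as exc:   # numpy's casting error subclasses TypeError
    print("refused:", exc)
```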


Josef


>
> Chuck
>
>
>


Re: [Numpy-discussion] Which NumPy/Numpy/numpy spelling?

2016-08-29 Thread josef . pktd
https://mail.scipy.org/pipermail/scipy-user/2010-June/025756.html

"NumPy and SciPy to refer to the projects. numpy and scipy to refer to
the packages, specifically. When in doubt, use the former."

I thought there was also another discussion about capital letters, but
I don't find it.

Josef


On Mon, Aug 29, 2016 at 9:22 AM, Bartosz Telenczuk  wrote:
> Hi,
>
> I would not mind any choice as long as it's consistent.
>
> I agree that using all-lowercase spelling may avoid some common errors. 
> However,
> PEP8 requires all module/package names to be lower case [1].  If we force the
> name of the library and the corresponding package to be the same, all Python
> libraries would be named in lowercase. This would not be the best choice for
> libraries, which have multi-component names (like NumPy = Numerical Python).
>
> Note also that both the Wikipedia page [2] and the official NumPy logo [3] use
> "NumPy" spelling.
>
> Some other popular [4] libraries use similar dichotomies:
>
> - Django - import django
> - Cython - import cython
> - PyYAML - import yaml
> - scikit-learn - import sklearn
>
> On the other hand all standard Python libraries are lower-case named.
>
> Cheers,
>
> Bartosz
>
> [1] https://www.python.org/dev/peps/pep-0008/#package-and-module-names
> [2] https://en.wikipedia.org/wiki/NumPy
> [3] http://www.numpy.org/_static/numpy_logo.png
> [4] http://pypi-ranking.info/alltime
>
>> On 08/29/2016 07:43 AM, m...@telenczuk.pl wrote:
>> > What is the official spelling of NumPy/Numpy/numpy?
>>
>> IMHO it should be written numpy, because ...
>>
>>  >>> import NumPy
>> Traceback (most recent call last):
>>File "", line 1, in 
>> ImportError: No module named NumPy
>>  >>> import Numpy
>> Traceback (most recent call last):
>>File "", line 1, in 
>> ImportError: No module named Numpy
>>  >>> import numpy
>>  >>>
>>
>> Phil


Re: [Numpy-discussion] StackOverflow documentation

2016-07-21 Thread josef . pktd
On Thu, Jul 21, 2016 at 3:10 PM, Pauli Virtanen  wrote:
> Thu, 21 Jul 2016 16:24:15 +0200, mail kirjoitti:
> [clip]
>> Since it is run by the community, perhaps it's not a bad idea to
>> encourage people to share their examples.
>
> I would perhaps rather encourage people to improve the "Numpy User Guide"
> or the main documentation. Of course, working on that requires a somewhat
> different level of committment than editing what's essentially
> Stackexchange-provided wiki (where content gets relicensed with
> attribution clauses where you have to reference stackexchange and not the
> original author directly).


The way it looks right now, it is more of an example collection by
topic, similar to the old scipy wiki examples for numpy.
http://stackoverflow.com/documentation/python/196/comprehensions#t=201607212153155953853

It doesn't look like it will be a substitute for proper API
documentation, i.e. docstrings following the numpy standard.
Longer examples or recipes on stackoverflow also have the problem that
the code doesn't get a separate license and falls under the
documentation license.
So, I don't think it will become a substitute for good notebook or
blog post examples either.

(the license is the usual one-way street: they can copy from us, but we
cannot copy back under our license)

Josef

>
> --
> Pauli Virtanen
>


Re: [Numpy-discussion] Added atleast_nd, request for clarification/cleanup of atleast_3d

2016-07-06 Thread josef . pktd
On Wed, Jul 6, 2016 at 3:29 AM,  wrote:

>
>
> On Wed, Jul 6, 2016 at 2:21 AM, Ralf Gommers 
> wrote:
>
>>
>>
>> On Wed, Jul 6, 2016 at 7:06 AM, Nathaniel Smith  wrote:
>>
>> On Jul 5, 2016 9:09 PM, "Joseph Fox-Rabinovitz" 
>>> wrote:
>>> >
>>> > Hi,
>>> >
>>> > I have generalized np.atleast_1d, np.atleast_2d, np.atleast_3d with a
>>> > function np.atleast_nd in PR#7804
>>> > (https://github.com/numpy/numpy/pull/7804).
>>> >
>>> > As a result of this PR, I have a couple of questions about
>>> > `np.atleast_3d`. `np.atleast_3d` appears to do something weird with
>>> > the dimensions: If the input is 1D, it prepends and appends a size-1
>>> > dimension. If the input is 2D, it appends a size-1 dimension. This is
>>> > inconsistent with `np.atleast_2d`, which always prepends (as does
>>> > `np.atleast_nd`).
>>> >
>>> >   - Is there any reason for this behavior?
>>> >   - Can it be cleaned up (e.g., by reimplementing `np.atleast_3d` in
>>> > terms of `np.atleast_nd`, which is actually much simpler)? This would
>>> > be a slight API change since the output would not be exactly the same.
>>>
>>> Changing atleast_3d seems likely to break a bunch of stuff...
>>>
>>> Beyond that, I find it hard to have an opinion about the best design for
>>> these functions, because I don't think I've ever encountered a situation
>>> where they were actually what I wanted. I'm not a big fan of coercing
>>> dimensions in the first place, for the usual "refuse to guess" reasons. And
>>> then generally if I do want to coerce an array to another dimension, then I
>>> have some opinion about where the new dimensions should go, and/or I have
>>> some opinion about the minimum acceptable starting dimension, and/or I have
>>> a maximum dimension in mind. (E.g. "coerce 1d inputs into a column matrix;
>>> 0d or 3d inputs are an error" -- atleast_2d is zero-for-three on that
>>> requirements list.)
>>>
>>> I don't know how typical I am in this. But it does make me wonder if the
>>> atleast_* functions act as an attractive nuisance, where new users take
>>> their presence as an implicit recommendation that they are actually a
>>> useful thing to reach for, even though they... aren't that. And maybe we
>>> should be recommending folk move away from them rather than trying to
>>> extend them further?
>>>
>>> Or maybe they're totally useful and I'm just missing it. What's your use
>>> case that motivates atleast_nd?
>>>
>> I think you're just missing it:) atleast_1d/2d are used quite a bit in
>> Scipy and Statsmodels (those are the only ones I checked), and in the large
>> majority of cases it's the best thing to use there. There's a bunch of
>> atleast_2d calls with a transpose appended because the input needs to be
>> treated as columns instead of rows, but that's still efficient and readable
>> enough.
>>
>
>
> As Ralf pointed out, they're used in statsmodels. I do find them useful
> as a replacement for several lines of ifs and reshapes
>
> We still need in many cases an atleast_2d_cols that appends the new axis
> if necessary.
>
> roughly the equivalent of
>
> if x.ndim == 1:
> x = x[:, None]
> else:
> x = np.atleast_2d(x)
>
> Josef
>
>
>>
>> For 3D/nD I can see that you'd need more control over where the
>> dimensions go, but 1D/2D are fine.
>>
>

statsmodels currently has very little code with ndim > 2, so I have no
overview of possible use cases, but it would be necessary to have full
control over the added axis, since axes have a strict meaning and stats
still prefers Fortran order to numpy's default C ordering.

Josef
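[Editorial aside: the column-appending variant described in this thread
can be written as a small helper. `atleast_2d_cols` is a hypothetical
name; this is the quoted if/else pattern packaged up, not a NumPy
function:]

```python
import numpy as np

def atleast_2d_cols(x):
    # Like np.atleast_2d, but a 1-D input becomes a column (n, 1)
    # instead of a row (1, n).
    x = np.asarray(x)
    if x.ndim == 1:
        return x[:, None]
    return np.atleast_2d(x)

assert atleast_2d_cols([1, 2, 3]).shape == (3, 1)
assert atleast_2d_cols(5).shape == (1, 1)          # 0-d promoted as usual
assert atleast_2d_cols(np.eye(2)).shape == (2, 2)  # 2-D passes through
```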



>
>>
>> Ralf
>>
>>


Re: [Numpy-discussion] Added atleast_nd, request for clarification/cleanup of atleast_3d

2016-07-06 Thread josef . pktd
On Wed, Jul 6, 2016 at 2:21 AM, Ralf Gommers  wrote:

>
>
> On Wed, Jul 6, 2016 at 7:06 AM, Nathaniel Smith  wrote:
>
> On Jul 5, 2016 9:09 PM, "Joseph Fox-Rabinovitz" 
>> wrote:
>> >
>> > Hi,
>> >
>> > I have generalized np.atleast_1d, np.atleast_2d, np.atleast_3d with a
>> > function np.atleast_nd in PR#7804
>> > (https://github.com/numpy/numpy/pull/7804).
>> >
>> > As a result of this PR, I have a couple of questions about
>> > `np.atleast_3d`. `np.atleast_3d` appears to do something weird with
>> > the dimensions: If the input is 1D, it prepends and appends a size-1
>> > dimension. If the input is 2D, it appends a size-1 dimension. This is
>> > inconsistent with `np.atleast_2d`, which always prepends (as does
>> > `np.atleast_nd`).
>> >
>> >   - Is there any reason for this behavior?
>> >   - Can it be cleaned up (e.g., by reimplementing `np.atleast_3d` in
>> > terms of `np.atleast_nd`, which is actually much simpler)? This would
>> > be a slight API change since the output would not be exactly the same.
>>
>> Changing atleast_3d seems likely to break a bunch of stuff...
>>
>> Beyond that, I find it hard to have an opinion about the best design for
>> these functions, because I don't think I've ever encountered a situation
>> where they were actually what I wanted. I'm not a big fan of coercing
>> dimensions in the first place, for the usual "refuse to guess" reasons. And
>> then generally if I do want to coerce an array to another dimension, then I
>> have some opinion about where the new dimensions should go, and/or I have
>> some opinion about the minimum acceptable starting dimension, and/or I have
>> a maximum dimension in mind. (E.g. "coerce 1d inputs into a column matrix;
>> 0d or 3d inputs are an error" -- atleast_2d is zero-for-three on that
>> requirements list.)
>>
>> I don't know how typical I am in this. But it does make me wonder if the
>> atleast_* functions act as an attractive nuisance, where new users take
>> their presence as an implicit recommendation that they are actually a
>> useful thing to reach for, even though they... aren't that. And maybe we
>> should be recommending folk move away from them rather than trying to
>> extend them further?
>>
>> Or maybe they're totally useful and I'm just missing it. What's your use
>> case that motivates atleast_nd?
>>
> I think you're just missing it:) atleast_1d/2d are used quite a bit in
> Scipy and Statsmodels (those are the only ones I checked), and in the large
> majority of cases it's the best thing to use there. There's a bunch of
> atleast_2d calls with a transpose appended because the input needs to be
> treated as columns instead of rows, but that's still efficient and readable
> enough.
>


As Ralf pointed out, they are used in statsmodels. I do find them useful
as a replacement for several lines of ifs and reshapes.

We still need in many cases an atleast_2d_cols, which appends the newaxis
if necessary.

roughly the equivalent of

if x.ndim == 1:
    x = x[:, None]
else:
    x = np.atleast_2d(x)

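`atleast_2d_cols` is Josef's shorthand, not an actual NumPy function; a minimal sketch of the helper he describes, alongside what `np.atleast_2d` does by default:

```python
import numpy as np

def atleast_2d_cols(x):
    """Promote to 2-D, treating a 1-D input as a single column.

    np.atleast_2d turns shape (n,) into (1, n) by prepending an axis;
    here we want (n, 1) instead, i.e. append the new axis.
    """
    x = np.asarray(x)
    if x.ndim == 1:
        return x[:, None]
    return np.atleast_2d(x)

print(atleast_2d_cols(np.arange(3)).shape)  # (3, 1)
print(np.atleast_2d(np.arange(3)).shape)    # (1, 3)
print(atleast_2d_cols(np.eye(2)).shape)     # (2, 2) -- 2-D input passes through
```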
Josef


>
> For 3D/nD I can see that you'd need more control over where the dimensions
> go, but 1D/2D are fine.
>
> Ralf
>
>


Re: [Numpy-discussion] numpy.dtype has the wrong size, try recompiling

2016-07-05 Thread josef . pktd
On Tue, Jul 5, 2016 at 4:22 AM, Jaime Fernández del Río <
jaime.f...@gmail.com> wrote:

> This question on Stack Overflow:
>
>
> http://stackoverflow.com/questions/38197086/sklearn-numpy-dtype-has-the-wrong-size-try-recompiling-in-both-pycharm-and-te
>
> If I remember correctly this has something to do with Cython, right? Can't
> remember the details though; can someone help that poor soul out?
>


from what I understood many years ago (based mainly on what David
Cournapeau was saying)
http://stackoverflow.com/questions/17709641/valueerror-numpy-dtype-has-the-wrong-size-try-recompiling

Josef


>
> Jaime
>
> --
> (\__/)
> ( O.o)
> ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
> de dominación mundial.
>


Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key

2016-07-04 Thread josef . pktd
On Tue, Jul 5, 2016 at 1:03 AM, Juan Nunez-Iglesias 
wrote:

> On 4 July 2016 at 7:27:47 PM, Skip Montanaro (skip.montan...@gmail.com)
> wrote:
>
> Hashing it probably wouldn't work, too
> great a chance for collisions.
>
>
> If the string is ASCII, you can always interpret the bytes as part of an 8
> byte integer. Or, you can map unique values to consecutive integers.
>
IIUC

np.nonzero(a[1:] != a[:-1]) gives all change points independent of dtype;
add or subtract 1 to the indices to adjust which element of each run is
indexed.

(IIRC from a long time ago, arraysetops used/uses something like this.)
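A sketch of the idea (illustrative names only; assumes the rows are already sorted by key, with `keys` standing in for the key column):

```python
import numpy as np

# Rows sorted by key; pick the first row of each run of equal keys.
keys = np.array([10, 10, 10, 20, 20, 30, 30, 30])

# True where a key differs from its predecessor -> start of a new run.
is_first = np.empty(len(keys), dtype=bool)
is_first[0] = True
is_first[1:] = keys[1:] != keys[:-1]

first_idx = np.nonzero(is_first)[0]
print(first_idx)  # [0 3 5]

# Last occurrence of each key: the same comparison, shifted by one.
is_last = np.empty(len(keys), dtype=bool)
is_last[-1] = True
is_last[:-1] = keys[1:] != keys[:-1]
print(np.nonzero(is_last)[0])  # [2 4 7]
```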

Josef


>


Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-20 Thread josef . pktd
On Mon, Jun 20, 2016 at 4:31 PM, Alan Isaac  wrote:
> On 6/13/2016 1:54 PM, Marten van Kerkwijk wrote:
>>
>> 1. What in principle is the best return type for int ** int (which
>> Josef I think most properly rephrased as whether `**` should be
>> thought of as a float operator, like `/` in python3 and `sqrt` etc.);
>
>
>
> Perhaps the question is somewhat different.  Maybe it is: what type
> should a user expect when the exponent is a Python int?  The obvious
> choices seem to be an object array of Python ints, or an array of
> floats.  So far, nobody has proposed the former, and concerns have
> been expressed about the latter.  More important, either would break
> the rule that the scalar type is not important in array operations,
> which seems like a good general rule (useful and easy to remember).
>
> How much commitment is there to such a rule?  E.g.,
> np.int64(2**7)*np.arange(5,dtype=np.int8)
> violates this.  One thing that has come out of this
> discussion for me is that the actual rules in play are
> hard to keep track of.  Are they all written down in
> one place?
>
> I suspect there is general support for the idea that if someone
> explicitly specifies the same dtype for the base and the
> exponent then the result should also have that dtype.
> I think this is already true for array exponentiation
> and for scalar exponentiation.
>
> One other thing that a user might expect, I believe, is that
> any type promotion rules for scalars and arrays will be the same.
> This is not currently the case, and that feels like an
> inconsistency.  But is it an inconsistency?  If the rule is that
> that array type dominates the scalar type, that may
> be understandable, but then it should be a firm rule.
> In this case, an exponent that is a Python int should not
> affect the dtype of the (array) result.
>
> In sum, as a user, I've come around to Chuck's original proposal:
> integers raised to negative integer powers raise an error.
> My reason for coming around is that I believe it meshes
> well with a general rule that in binary operations the
> scalar dtypes should not influence the dtype of an array result.
> Otoh, it is unclear to me how much commitment there is to that rule.
>
> Thanks in advance to anyone who can help me understand better
> the issues in play.

the main thing I get out of the discussion in this thread is that this
is way too complicated.

which ints do I have?

is it Python or one of the many numpy int types, or two different
(u)int types, or maybe one is a scalar so it shouldn't count?


scalar dominates here

>>> (np.ones(5, np.int8) *1.0).dtype
dtype('float64')

otherwise a huge amount of code that uses the *1. trick would be broken
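The promotion at work (a small illustration added here, not from the original message):

```python
import numpy as np

a = np.ones(3, np.int8)
print((a * a).dtype)    # int8 -- int8 * int8 stays integer
print((a * 1.0).dtype)  # float64 -- the "*1." trick upcasts to float
print((a * 1).dtype)    # int8 -- a Python int scalar does not upcast
```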

Josef


>
> Cheers,
> Alan Isaac
>
>
>


Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-13 Thread josef . pktd
On Mon, Jun 13, 2016 at 12:07 PM,  wrote:

>
>
> On Mon, Jun 13, 2016 at 11:51 AM,  wrote:
>
>>
>>
>> On Mon, Jun 13, 2016 at 11:25 AM, Antoine Pitrou 
>> wrote:
>>
>>> On Mon, 13 Jun 2016 10:49:44 -0400
>>> josef.p...@gmail.com wrote:
>>> >
>>> > My argument is that `**` is like integer division and sqrt where the
>>> domain
>>> > where integer return are the correct numbers is too small to avoid
>>> > headaches by users.
>>>
>>> float64 has less integer precision than int64:
>>>
>>> >>> math.pow(3, 39) == 3**39
>>> False
>>> >>> np.int64(3)**39 == 3**39
>>> True
>>>
>>
>> but if a user does this, then ???  (headaches or head scratching)
>>
>> >>> np.array([3])**39
>> RuntimeWarning: invalid value encountered in power
>>
>> array([-2147483648], dtype=int32)
>>
>
> I forgot to add
>
> the real headaches start in the second call, when we don't get the
> RuntimeWarning anymore
>
> >>> np.array([4])**39
> array([-2147483648], dtype=int32)
>
>
> ("Now, why do I owe so much money, when I made a huge profit all year." )
>

(grumpy off-topic complaint:
The Canadian tax system is like this. They make a mistake in transferring
information to a new computerized system, and then they send a bill for
taxes based on reassessment of something that happened 5 years ago because
their computerized record is wrong.
)



>
> Josef
>
>
>
>>
>> Josef
>>
>>
>>>
>>>
>>> (as a sidenote, np.float64's equality operator seems to be slightly
>>> broken:
>>>
>>> >>> np.float64(3)**39 == 3**39
>>> True
>>> >>> int(np.float64(3)**39) == 3**39
>>> False
>>> >>> float(np.float64(3)**39) == 3**39
>>> False
>>> )
>>>
>>> Regards
>>>
>>> Antoine.
>>>
>>>


Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-13 Thread josef . pktd
On Mon, Jun 13, 2016 at 11:51 AM,  wrote:

>
>
> On Mon, Jun 13, 2016 at 11:25 AM, Antoine Pitrou 
> wrote:
>
>> On Mon, 13 Jun 2016 10:49:44 -0400
>> josef.p...@gmail.com wrote:
>> >
>> > My argument is that `**` is like integer division and sqrt where the
>> domain
>> > where integer return are the correct numbers is too small to avoid
>> > headaches by users.
>>
>> float64 has less integer precision than int64:
>>
>> >>> math.pow(3, 39) == 3**39
>> False
>> >>> np.int64(3)**39 == 3**39
>> True
>>
>
> but if a user does this, then ???  (headaches or head scratching)
>
> >>> np.array([3])**39
> RuntimeWarning: invalid value encountered in power
>
> array([-2147483648], dtype=int32)
>

I forgot to add

the real headaches start in the second call, when we don't get the
RuntimeWarning anymore

>>> np.array([4])**39
array([-2147483648], dtype=int32)


("Now, why do I owe so much money, when I made a huge profit all year." )

Josef



>
> Josef
>
>
>>
>>
>> (as a sidenote, np.float64's equality operator seems to be slightly
>> broken:
>>
>> >>> np.float64(3)**39 == 3**39
>> True
>> >>> int(np.float64(3)**39) == 3**39
>> False
>> >>> float(np.float64(3)**39) == 3**39
>> False
>> )
>>
>> Regards
>>
>> Antoine.
>>
>>


Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-13 Thread josef . pktd
On Mon, Jun 13, 2016 at 11:25 AM, Antoine Pitrou 
wrote:

> On Mon, 13 Jun 2016 10:49:44 -0400
> josef.p...@gmail.com wrote:
> >
> > My argument is that `**` is like integer division and sqrt where the
> domain
> > where integer return are the correct numbers is too small to avoid
> > headaches by users.
>
> float64 has less integer precision than int64:
>
> >>> math.pow(3, 39) == 3**39
> False
> >>> np.int64(3)**39 == 3**39
> True
>

but if a user does this, then ???  (headaches or head scratching)

>>> np.array([3])**39
RuntimeWarning: invalid value encountered in power

array([-2147483648], dtype=int32)

Josef
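The trade-offs in this exchange can be checked directly with current NumPy and the standard library (a sketch; the ValueError shown is the behavior NumPy eventually adopted, so it will differ in the old versions discussed here):

```python
import math
import numpy as np

# float64 cannot represent 3**39 exactly, while int64 can:
print(math.pow(3, 39) == 3 ** 39)         # False: float loses integer precision
print(int(np.int64(3) ** 39) == 3 ** 39)  # True: int64 is exact here

# The decision this thread led to: int ** negative int raises.
try:
    np.arange(1, 5) ** -1
except ValueError as exc:
    print(exc)  # Integers to negative integer powers are not allowed.
```

The `array([-2147483648], dtype=int32)` in the messages above is what an out-of-range result looked like once it no longer fit the platform-default int32.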


>
>
> (as a sidenote, np.float64's equality operator seems to be slightly
> broken:
>
> >>> np.float64(3)**39 == 3**39
> True
> >>> int(np.float64(3)**39) == 3**39
> False
> >>> float(np.float64(3)**39) == 3**39
> False
> )
>
> Regards
>
> Antoine.
>
>


Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-13 Thread josef . pktd
On Mon, Jun 13, 2016 at 10:05 AM, Alan Isaac  wrote:

> On 6/13/2016 4:47 AM, Antoine Pitrou wrote:
>
>> Currently, the choice is simple: if you want an int output,
>> have an int input; if you want a float output, have a float output.
>>
>
>
> That is a misunderstanding, which may be influencing the discussion.
> Examples of complications:
>
> >>> type(np.int8(2)**2)
> 
> >>> type(np.uint64(2)**np.int8(2))
> 
>
> I don't think anyone has proposed first principles
> from which the desirable behavior could be deduced.
> I do think reference to the reasoning used by other
> languages in making this decision could be helpful.


I think the main principle is whether an operator is a "float" operator.

for example, I don't think anyone would expect sqrt(int) to return int,
even if it would have exact results in a countably infinite number of cases
(theoretically)

another case is division, which moved from an integer return to a float
return in the move from py2 to py3.

My argument is that `**` is like integer division and sqrt, where the domain
in which integer returns are the correct numbers is too small to avoid
headaches for users.
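The analogy in code (illustrative only):

```python
import numpy as np

# Python 3 made `/` a "float operator" even for int inputs:
print(1 / 2)   # 0.5
print(1 // 2)  # 0 -- floor division keeps the int

# NumPy follows suit for `/` and for sqrt on integer input:
print((np.arange(4) / 2).dtype)     # float64
print(np.sqrt(np.arange(4)).dtype)  # float64
```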

Josef


>
>
> Alan Isaac
> (on Windows)
>


Re: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling

2016-06-11 Thread josef . pktd
On Sat, Jun 11, 2016 at 2:49 PM, Mark Gawron  wrote:

> Thanks, Josef.  This is very helpful.  And I will direct this
> to one of the other mailing lists, once I read the previous posts.
>
> Regarding your remark:  Maybe I'm having a terminology problem.  It seems
> to me once you do
>
> osm = dist.ppf(osm_uniform)
>>>
>>>
> you’re back in the value space for the particular distribution. So this
> gives you known probability intervals, but not UNIFORM probability
> intervals (the interval between 0 and 1 STD covers a bigger prob interval
> than the the interval between 1 and 2).  And the idea of a quantile is
> that it’s a division point in a UNIFORM division of the probability axis.
>


Yes and no: quantiles, i.e. what you get from ppf, are in units of the
random variable. So they are on the scale of the random variable, not on a
probability scale. The axis labels are in units of the random variable.

pp-plots have probabilities on the axes and are uniformly scaled in
probabilities but non-uniformly in the values of the random variable.

The part that is difficult to follow is when the plot is drawn uniformly in
one scale but the axis is labeled non-uniformly in the other scale. That's
what Paul's probscale does, and what you have in mind, AFAIU.

Josef
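To make the spacing concrete, here is a standard-library sketch of the normal theoretical quantiles that end up on a qq-plot x-axis. It uses Filliben's approximation of the uniform order statistic medians (an assumption: scipy's internal helper is similar but not necessarily identical), and `statistics.NormalDist` as the ppf so no SciPy is needed:

```python
import numpy as np
from statistics import NormalDist  # stdlib normal ppf (inv_cdf)

def theoretical_quantiles(n):
    """Normal theoretical quantiles, Filliben's order statistic medians."""
    m = (np.arange(1, n + 1) - 0.3175) / (n + 0.365)
    m[0] = 1 - 0.5 ** (1 / n)
    m[-1] = 0.5 ** (1 / n)
    ppf = NormalDist().inv_cdf
    return np.array([ppf(p) for p in m])

osm = theoretical_quantiles(11)
print(osm.round(2))
# Uniformly spaced in probability, not in the variate: the values crowd
# together near 0 and spread out in the tails, which is exactly why the
# x-axis ticks are not equally populated intervals.
```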


>
> Mark
>
> On Jun 11, 2016, at 10:03 AM, josef.p...@gmail.com wrote:
>
>
>
> On Sat, Jun 11, 2016 at 8:53 AM, Ralf Gommers 
> wrote:
>
>> Hi Mark,
>>
>> Note that the scipy-dev or scipy-user mailing list would have been more
>> appropriate for this question.
>>
>>
>> On Fri, Jun 10, 2016 at 9:06 AM, Mark Gawron 
>> wrote:
>>
>>>
>>>
>>> The scipy.stats.qqplot and scipy.stats.probplot  functions plot expected
>>> values versus actual data values for visualization of fit to a
>>> distribution.  First a one-D array of expected percentiles is generated for
>>>  a sample of size N; then that is passed to  dist.ppf, the per cent point
>>> function for the chosen distribution, to return an array of expected
>>> values.  The visualized data points are pairs of expected and actual
>>> values, and a linear regression is done on these to produce the line data
>>> points in this distribution should lie on.
>>>
>>> Where x is the input data array and dist the chosen distribution we have:
>>>
>>> osr = np.sort(x)
>>> osm_uniform = _calc_uniform_order_statistic_medians(len(x))
>>> osm = dist.ppf(osm_uniform)
>>> slope, intercept, r, prob, sterrest = stats.linregress(osm, osr)
>>>
>>>
>>> My question concerns the plot display.
>>>
>>> ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-')
>>>
>>>
>>> The x-axis of the resulting plot is labeled quantiles, but the xticks
>>> and xticklabels produced produced by qqplot and problplot do not seem
>>> correct for the their intended interpretations.  First the numbers on the
>>> x-axis do not represent quantiles; the intervals between them do not in
>>> general contain equal numbers of points.  For a normal distribution with
>>> sigma=1, they represent standard deviations.  Changing the label on the
>>> x-axis does not seem like a very good solution, because the interpretation
>>> of the values on the x-axis will be different for different distributions.
>>> Rather the right solution seems to be to actually show quantiles on the
>>> x-axis. The numbers on the x-axis can stay as they are, representing
>>> quantile indexes, but they need to be spaced so as to show the actual
>>> division points that carve the population up into  groups of the same
>>> size.  This can be done in something like the following way.
>>>
>>
>> The ticks are correct I think, but they're theoretical quantiles and not
>> sample quantiles. This was discussed in [1] and is consistent with R [2]
>> and statsmodels [3]. I see that we just forgot to add "theoretical" to the
>> x-axis label (mea culpa). Does adding that resolve your concern?
>>
>> [1] https://github.com/scipy/scipy/issues/1821
>> [2] http://data.library.virginia.edu/understanding-q-q-plots/
>> [3]
>> http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot
>>
>> Ralf
>>
>>
> as related link
> http://phobson.github.io/mpl-probscale/tutorial/closer_look_at_viz.html
>
> Paul Hobson has done a lot of work for getting different probability
> scales attached to pp-plots or generalized versions of probability plots. I
> think qqplots are less ambiguous because they are on the original or
> standardized scale.
>
> I haven't worked my way through the various interpretation of probability
> axis yet because I find it "not obvious". It might be easier for fields
> that have a tradition of using probability papers.
>
> It's planned to be added to the statsmodels probability plots so that
> there will be a large choice of axis labels and scales.
>
> Josef
>
>
>>
>>
>>

Re: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling

2016-06-11 Thread josef . pktd
On Sat, Jun 11, 2016 at 8:53 AM, Ralf Gommers 
wrote:

> Hi Mark,
>
> Note that the scipy-dev or scipy-user mailing list would have been more
> appropriate for this question.
>
>
> On Fri, Jun 10, 2016 at 9:06 AM, Mark Gawron  wrote:
>
>>
>>
>> The scipy.stats.qqplot and scipy.stats.probplot  functions plot expected
>> values versus actual data values for visualization of fit to a
>> distribution.  First a one-D array of expected percentiles is generated for
>>  a sample of size N; then that is passed to  dist.ppf, the per cent point
>> function for the chosen distribution, to return an array of expected
>> values.  The visualized data points are pairs of expected and actual
>> values, and a linear regression is done on these to produce the line data
>> points in this distribution should lie on.
>>
>> Where x is the input data array and dist the chosen distribution we have:
>>
>> osr = np.sort(x)
>> osm_uniform = _calc_uniform_order_statistic_medians(len(x))
>> osm = dist.ppf(osm_uniform)
>> slope, intercept, r, prob, sterrest = stats.linregress(osm, osr)
>>
>>
>> My question concerns the plot display.
>>
>> ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-')
>>
>>
>> The x-axis of the resulting plot is labeled quantiles, but the xticks and
>> xticklabels produced produced by qqplot and problplot do not seem correct
>> for the their intended interpretations.  First the numbers on the x-axis do
>> not represent quantiles; the intervals between them do not in general
>> contain equal numbers of points.  For a normal distribution with sigma=1,
>> they represent standard deviations.  Changing the label on the x-axis does
>> not seem like a very good solution, because the interpretation of the
>> values on the x-axis will be different for different distributions.  Rather
>> the right solution seems to be to actually show quantiles on the x-axis.
>> The numbers on the x-axis can stay as they are, representing quantile
>> indexes, but they need to be spaced so as to show the actual division
>> points that carve the population up into  groups of the same size.  This
>> can be done in something like the following way.
>>
>
> The ticks are correct I think, but they're theoretical quantiles and not
> sample quantiles. This was discussed in [1] and is consistent with R [2]
> and statsmodels [3]. I see that we just forgot to add "theoretical" to the
> x-axis label (mea culpa). Does adding that resolve your concern?
>
> [1] https://github.com/scipy/scipy/issues/1821
> [2] http://data.library.virginia.edu/understanding-q-q-plots/
> [3]
> http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot
>
> Ralf
>
>
as related link
http://phobson.github.io/mpl-probscale/tutorial/closer_look_at_viz.html

Paul Hobson has done a lot of work on attaching different probability
scales to pp-plots or generalized versions of probability plots. I
think qqplots are less ambiguous because they are on the original or
standardized scale.

I haven't worked my way through the various interpretations of probability
axes yet because I find them "not obvious". It might be easier for fields
that have a tradition of using probability papers.

It's planned to be added to the statsmodels probability plots so that there
will be a large choice of axis labels and scales.

Josef


>
>
>


Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-10 Thread josef . pktd
On Fri, Jun 10, 2016 at 2:00 PM, Nathaniel Smith  wrote:

> On Jun 10, 2016 10:50, "Alan Isaac"  wrote:
> >
> > On 6/10/2016 1:34 PM, Nathaniel Smith wrote:
> >>
> >> You keep pounding on this example. It's a fine example, but, c'mon. **2
> is probably at least 100x more common in real source code. Maybe 1000x more
> common. Why should we break the
> >> common case for your edge case?
> >
> >
> >
> > It is hardly an "edge case".
> > Again, **almost all** integer combinations overflow: that's the point.
>
> When you say "almost all", you're assuming inputs that are uniformly
> sampled integers. I'm much more interested in what proportion of calls to
> the ** operator involve inputs that can overflow, and in real life those
> inputs are very heavily biased towards small numbers.
>
> (I also think we should default to raising an error on overflow in
> general, with a seterr switch to turn it off when desired. But that's
> another discussion...)
>

but x**2 is just x*x which some seem to recommend (I have no idea why), and
then there are not so many "common" cases left.


(However, I find integers pretty useless except in some very specific
cases. When I started to clean up scipy.stats.distribution, I threw out
integers for discrete distributions and replaced all or most `**` by
np.power, IIRC mainly because of the old python behavior and better numpy
behavior.)

(I'd rather use robust calculations that provide correct numbers than chase
individual edge cases to save a bit of memory in some common cases.
scipy.stats also avoids factorial in almost all cases, because
special.gamma and its variants are more robust.)
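The same principle can be shown with only the standard library: working with log-gamma avoids the overflow that naive factorial ratios hit (an illustration of the idea, not scipy or statsmodels code):

```python
import math

def log_binom(n, k):
    """log of n-choose-k via lgamma -- stable even when factorials overflow."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

# math.factorial(1000) has thousands of digits and overflows as a float,
# but the log-space version is a modest number:
print(log_binom(1000, 500))               # roughly 690
print(round(math.exp(log_binom(10, 3))))  # 120
```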


Josef


> -n
>


Re: [Numpy-discussion] ENH: compute many inner products quickly

2016-06-05 Thread josef . pktd
On Sun, Jun 5, 2016 at 8:41 PM, Stephan Hoyer  wrote:

> If possible, I'd love to add new functions for "generalized ufunc" linear
> algebra, and then deprecate (or at least discourage) using the older
> versions with inferior broadcasting rules. Adding a new keyword arg means
> we'll be stuck with an awkward API for a long time to come.
>
> There are three types of matrix/vector products for which ufuncs would be
> nice:
> 1. matrix-matrix product (covered by matmul)
> 2. matrix-vector product
> 3. vector-vector (inner) product
>
>> It's straightforward to implement either of the latter two options by inserting
> dummy dimensions and then calling matmul, but that's a pretty awkward API,
> especially for inner products. Unfortunately, we already use the two most
> obvious one word names for vector inner products (inner and dot). But on
> the other hand, one word names are not very descriptive, and the short name
> "dot" probably mostly exists because of the lack of an infix operator.
>
> So I'll start by throwing out some potential new names:
>
> For matrix-vector products:
> matvecmul (if it's worth making a new operator)
>
> For inner products:
> vecmul (similar to matmul, but probably too ambiguous)
> dot_product
> inner_prod
> inner_product
>
>
How about names in the plural, as in the PR? I thought the `s` in
inner_prods would better signal the broadcasting behavior.

dot_products
...

"dots" ?  (I guess not)

Josef
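Whatever the name ends up being, the computation under discussion can already be written with einsum or the matmul trick quoted below (a sketch with illustrative sizes; it assumes `X` holds the vectors as rows):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 50
X = rng.standard_normal((n, d))  # n vectors of length d, one per row
A = rng.standard_normal((d, d))

# Naive loop: [x1 @ A @ x1, ..., xn @ A @ xn]
loop = np.array([x @ A @ x for x in X])

# The matrix-multiplication trick from the PR:
fast = (X.T * (A @ X.T)).sum(axis=0)

# The same contraction spelled with einsum:
ein = np.einsum('ni,ij,nj->n', X, A, X)

print(np.allclose(loop, fast), np.allclose(loop, ein))  # True True
```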


>
>
>
>
> On Sat, May 28, 2016 at 8:53 PM, Scott Sievert 
> wrote:
>
>> I recently ran into an application where I had to compute many inner
>> products quickly (roughly 50k inner products in less than a second). I
>> wanted a vector of inner products over the 50k vectors, or `[x1.T @ A @ x1,
>> …, xn.T @ A @ xn]` with A.shape = (1k, 1k).
>>
>> My first instinct was to look for a NumPy function to quickly compute
>> this, such as np.inner. However, it looks like np.inner has some other
>> behavior and I couldn’t get tensordot/einsum to work for me.
>>
>> Then a labmate pointed out that I can just do some slick matrix
>> multiplication to compute the same quantity, `(X.T * A @ X.T).sum(axis=0)`.
>> I opened [a PR] with this, and proposed that we define a new function
>> called `inner_prods` for this.
>>
>> However, in the PR, @shoyer pointed out
>>
>> > The main challenge is to figure out how to transition the behavior of
>> all these operations, while preserving backwards compatibility. Quite
>> likely, we need to pick new names for these functions, though we should try
>> to pick something that doesn't suggest that they are second class
>> alternatives.
>>
>> Do we choose new function names? Do we add a keyword arg that changes
>> what np.inner returns?
>>
>> [a PR]:https://github.com/numpy/numpy/pull/7690
>>
>>
>>
>>


Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-04 Thread josef . pktd
On Sat, Jun 4, 2016 at 9:16 PM, Charles R Harris 
wrote:

>
>
> On Sat, Jun 4, 2016 at 6:17 PM,  wrote:
>
>>
>>
>> On Sat, Jun 4, 2016 at 8:07 PM, Charles R Harris <
>> charlesr.har...@gmail.com> wrote:
>>
>>>
>>>
>>> On Sat, Jun 4, 2016 at 5:27 PM,  wrote:
>>>


 On Sat, Jun 4, 2016 at 6:10 PM, Nathaniel Smith  wrote:

> On Sat, Jun 4, 2016 at 2:07 PM, V. Armando Sole  wrote:
> > Also in favor of 2. Always return a float for '**'
>
> Even if we did want to switch to this, it's such a major
> backwards-incompatible change that I'm not sure how we could actually
> make the transition without first making it an error for a while.
>

 AFAIU, only the dtype for int**int would change. So, what would be the
 problem with FutureWarnings as with other dtype changes that were done in
 recent releases.


>>> The main problem I see with that is that numpy integers would behave
>>> differently than Python integers, and the difference would be silent. With
>>> option 1 it is possible to write code that behaves the same up to overflow
>>> and the error message would supply a warning when the exponent should be
>>> float. One could argue that numpy scalar integer types could be made to
>>> behave like python integers, but then their behavior would differ from
>>> numpy arrays and numpy scalar arrays.
>>>
>>
>> I'm not sure I understand.
>>
>> Do you mean
>>
>> np.arange(5)**2 would behave differently than np.arange(5)**np.int_(2)
>>
>> or 2**2 would behave differently than np.int_(2)**np.int(2)
>>
>
> The second case. Python returns ints for non-negative integer powers of
> ints.
>
>
>>
>> ?
>>
>>
>> AFAICS, there are many cases where numpy scalars don't behave like python
>> scalars. Also, does different behavior mean different type/dtype or
>> different numbers?  (The first I can live with; the second requires human
>> memory usage, which is a scarce resource.)
>>
>> >>> 2**(-2)
>> 0.25
>>
>>
> But we can't mix types in np.arrays and we can't depend on the element
> values of arrays in the exponent, but only on their type, so 2 ** array([1,
> -1]) must contain a single type and making that type float would surely
> break code.  Scalar arrays, which are arrays, have the same problem. We
> can't do what Python does with ndarrays and numpy scalars, and it would be
> best to be consistent. Division was a simpler problem to deal with, as
> there were two operators, `//` and `/`. If there were two exponential
> operators life would be simpler.
>

What bothers me with the entire argument is that you are putting higher
priority on returning a dtype than on returning the correct numbers.

Reverse the argument: because we cannot make the return type value-dependent,
we **have** to return float in order to get the correct number.
(It's an argument, not something we really have to do.)


Which code really breaks? Code that gets a float instead of an int; and
with some advance warning, users who really need to watch their memory can
use np.power.

My argument before was that I think a simple operator like `**` should work
for 90+% of the users and match their expectations, and the users who need
to watch dtypes can just as well use the function.

(I can also live with the exception from case 1., but I really think this
is like the Python 2 integer division "surprise".)
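For illustration, a sketch of the two behaviors (using a recent NumPy,
where proposal 1, raising on negative integer exponents, was eventually
adopted; older versions differ):

```python
import numpy as np

a = np.arange(5)

# int ** non-negative int stays integer (the proposal-1 behavior):
print((a ** 2).dtype)        # a native integer dtype, e.g. int64

# In NumPy >= 1.12, a negative integer exponent raises instead of
# silently producing wrong integers:
try:
    a ** -1
except ValueError as e:
    print("raised:", e)

# An explicit float exponent gives the float result:
print(a[1:] ** -1.0)
```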

Josef


>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-04 Thread josef . pktd
On Sat, Jun 4, 2016 at 8:07 PM, Charles R Harris 
wrote:

>
>
> On Sat, Jun 4, 2016 at 5:27 PM,  wrote:
>
>>
>>
>> On Sat, Jun 4, 2016 at 6:10 PM, Nathaniel Smith  wrote:
>>
>>> On Sat, Jun 4, 2016 at 2:07 PM, V. Armando Sole  wrote:
>>> > Also in favor of 2. Always return a float for '**'
>>>
>>> Even if we did want to switch to this, it's such a major
>>> backwards-incompatible change that I'm not sure how we could actually
>>> make the transition without first making it an error for a while.
>>>
>>
>> AFAIU, only the dtype for int**int would change. So, what would be the
>> problem with FutureWarnings as with other dtype changes that were done in
>> recent releases.
>>
>>
> The main problem I see with that is that numpy integers would behave
> differently than Python integers, and the difference would be silent. With
> option 1 it is possible to write code that behaves the same up to overflow
> and the error message would supply a warning when the exponent should be
> float. One could argue that numpy scalar integer types could be made to
> behave like python integers, but then their behavior would differ from
> numpy arrays and numpy scalar arrays.
>

I'm not sure I understand.

Do you mean

np.arange(5)**2 would behave differently than np.arange(5)**np.int_(2)

or 2**2 would behave differently than np.int_(2)**np.int_(2)

?


AFAICS, there are many cases where numpy scalars don't behave like python
scalars. Also, does different behavior mean a different type/dtype or
different numbers?  (The first I can live with; the second requires human
memory, which is a scarce resource.)

>>> 2**(-2)
0.25
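The contrast can be shown directly (a small sketch, current NumPy; the
point is that an array's result dtype can only depend on the input dtypes,
not on the element values):

```python
import numpy as np

# Python ints are value-dependent: the result type switches to float
# exactly when the exponent is negative.
print(type(2 ** 2))    # <class 'int'>
print(2 ** -2)         # 0.25

# A numpy array must pick one result dtype from the input dtypes alone,
# so 2 ** np.array([1, -1]) cannot mimic the Python behavior:
print((2 ** np.array([1, 2])).dtype)
```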

Josef



>
> Chuck
>


Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-04 Thread josef . pktd
On Sat, Jun 4, 2016 at 6:10 PM, Nathaniel Smith  wrote:

> On Sat, Jun 4, 2016 at 2:07 PM, V. Armando Sole  wrote:
> > Also in favor of 2. Always return a float for '**'
>
> Even if we did want to switch to this, it's such a major
> backwards-incompatible change that I'm not sure how we could actually
> make the transition without first making it an error for a while.
>

AFAIU, only the dtype for int**int would change. So what would be the
problem with FutureWarnings, as with other dtype changes that were done in
recent releases?

Josef



>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org


Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-04 Thread josef . pktd
On Sat, Jun 4, 2016 at 3:49 PM, Matthew Brett 
wrote:

> On Sat, Jun 4, 2016 at 12:47 PM,   wrote:
> >
> >
> > On Sat, Jun 4, 2016 at 3:43 PM, Charles R Harris <
> charlesr.har...@gmail.com>
> > wrote:
> >>
> >>
> >>
> >> On Sat, Jun 4, 2016 at 11:22 AM, Charles R Harris
> >>  wrote:
> >>>
> >>> Hi All,
> >>>
> >>> I've made a new post so that we can make an explicit decision. AFAICT,
> >>> the two proposals are
> >>>
> >>> Integers to negative integer powers raise an error.
> >>> Integers to integer powers always results in floats.
> >>>
> >>> My own sense is that 1. would be closest to current behavior and using
> a
> >>> float exponential when a float is wanted is an explicit way to
> indicate that
> >>> desire. OTOH, 2. would be the most convenient default for everyday
> numerical
> >>> computation, but I think would more likely break current code. I am
> going to
> >>> come down on the side of 1., which I don't think should cause too many
> >>> problems if we start with a {Future, Deprecation}Warning explaining the
> >>> workaround.
> >
> >
> > I'm in favor of 2.  always float for `**`
> > I don't see enough pure integer usecases to throw away a nice operator.
>
> I can't make sense of 'throw away a nice operator' - you still have
> arr ** 2.0 if you want floats.
>


But if we have x**y, then we always need to check the dtype. If we don't,
we get RuntimeErrors or overflows, where we might have forgotten to include
the relevant cases in the unit tests.

numpy has gotten pickier about using only integers in some areas (indexing, ...).
Now we have to watch out that we convert back to floats for power.

Not a serious problem for a library with unit tests and enough users who
run into the dtype issues and report them.
But I'm sure I will have to fix any scripts or interactive work that I'm
writing.
It's just another thing to watch out for, after we managed to get rid of
the integer division surprise (1/2 == 0).
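The dtype-watching has real consequences: with an integer dtype, `**` can
overflow silently, while the float version returns the intended number. A
small sketch (my example, not from the thread):

```python
import numpy as np

xi = np.array([10], dtype=np.int64)
xf = np.array([10.0])

# 10**20 does not fit in int64 (max is about 9.2e18), so the integer
# power silently wraps around to a wrong value, with no warning:
print((xi ** 20)[0])

# The float power gives the expected 1e+20:
print((xf ** 20)[0])
```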

Josef




>
> Matthew


Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-04 Thread josef . pktd
On Sat, Jun 4, 2016 at 3:43 PM, Charles R Harris 
wrote:

>
>
> On Sat, Jun 4, 2016 at 11:22 AM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>> Hi All,
>>
>> I've made a new post so that we can make an explicit decision. AFAICT,
>> the two proposals are
>>
>>
>>1. Integers to negative integer powers raise an error.
>>2. Integers to integer powers always results in floats.
>>
>> My own sense is that 1. would be closest to current behavior and using a
>> float exponential when a float is wanted is an explicit way to indicate
>> that desire. OTOH, 2. would be the most convenient default for everyday
>> numerical computation, but I think would more likely break current code. I
>> am going to come down on the side of 1., which I don't think should cause
>> too many problems if we start with a {Future, Deprecation}Warning
>> explaining the workaround.
>>
>
I'm in favor of 2: always float for `**`.
I don't see enough pure integer use cases to throw away a nice operator.

Josef



>
> Note that current behavior in 1.11 is such a mess
> ```
> In [5]: array([0], dtype=int64) ** -1
> Out[5]: array([-9223372036854775808])
>
> In [6]: array([0], dtype=uint64) ** -1
> Out[6]: array([ inf])
> ```
> That the simplest approach might be to start by raising an error rather
> than by trying to maintain current behavior and issuing a warning.
>
> Chuck
>
>


Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style

2014-07-18 Thread josef . pktd
On Fri, Jul 18, 2014 at 5:04 PM, Joseph Martinot-Lagarde <
joseph.martinot-laga...@m4x.org> wrote:

> Le 18/07/2014 22:46, Chris Barker a écrit :
> > On Fri, Jul 18, 2014 at 1:15 PM, Joseph Martinot-Lagarde
> >  > > wrote:
> >
> > In addition,
> > you have to use AltGr on some keyboards to get the brackets.
> >
> >
> > If it's hard to type square brackets -- you're kind of dead in the water
> > with Python anyway -- this is not going to help.
> >
> > -Chris
> >
> Welcome to the azerty world ! ;)
>
> It's not that hard to type, just a bit more involved. My biggest problem
> is that you have to type the opening and closing bracket for each line,
> with a comma in between. It will always be harder and more error prone
> than a single semicolon, whatever the keyboard.
>
> My use case is not teaching but doing quick'n'dirty computations with a
> few values. Sometimes these values are copy-pasted from a space
> separated file, or from a printed array in another console. Having to
> add comas and bracket makes simple computations less easy. That's why I
> often use Octave for these.
>

my copy-paste approaches for almost quick'n'dirty work (no semicolons):

given:

a b c
1 2 3
4 5 6
7 8 9

(select & Ctrl-C)

>>> pandas.read_clipboard(sep=' ')
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9

>>> np.asarray(pandas.read_clipboard())
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]], dtype=int64)

>>> pandas.read_clipboard().values
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]], dtype=int64)

arr = np.array('''\
1 2 3
4 5 6
7 8 9'''.split(), float).reshape(-1, 3)

the last one is not so quick and dirty, but it is reusable and reused.
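In the same spirit (my addition, not from the thread), np.loadtxt with a
file-like object reads the same whitespace-separated text:

```python
import io
import numpy as np

text = """\
1 2 3
4 5 6
7 8 9"""

# loadtxt parses whitespace-separated columns and returns floats:
arr = np.loadtxt(io.StringIO(text))
print(arr.shape)   # (3, 3)
print(arr.dtype)   # float64
```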


Josef





>
>


Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style

2014-07-18 Thread josef . pktd
On Fri, Jul 18, 2014 at 4:21 PM, Charles G. Waldman 
wrote:

> Joseph Martinot-Lagarde writes:
>
> > Compare what's comparable:
>
> That's fair.
>
> > In addition, you have to use AltGr on some keyboards to get the brackets
>
> Wow, it must be rather painful to do any real programming on such a
> keyboard!
>
>  - C
>
>
> On Fri, Jul 18, 2014 at 1:15 PM, Joseph Martinot-Lagarde
>  wrote:
> > Le 18/07/2014 20:42, Charles G. Waldman a écrit :
> >> Well, if the goal is "shorthand", typing numpy.array(numpy.mat())
> >> won't please many users.
> >>
> >> But the more I think about it, the less I think Numpy should support
> >> this (non-Pythonic) input mode.  Too much molly-coddling of new users!
> >> When doing interactive work I usually just type:
> >>
> > np.array([[1,2,3],
> >> ...   [4,5,6],
> >> ...   [7,8,9]])
> >>
> >> which is (IMO) easier to read:  e.g. it's not totally obvious that
> >> "1,0,0;0,1,0;0,0,1" represents a 3x3 identity matrix, but
> >>
> >> [[1,0,0],
> >>[0,1,0],
> >>[0,0,1]]
> >>
> >> is pretty obvious.
> >>
> > Compare what's comparable:
> >
> > [[1,0,0],
> >   [0,1,0],
> >   [0,0,1]]
> >
> > vs
> >
> > "1 0 0;"
> > "0 1 0;"
> > "0 0 1"
> >
> > or
> >
> > """
> > 1 0 0;
> > 0 1 0;
> > 0 0 1
> > """
> >
> > [[1,0,0], [0,1,0], [0,0,1]]
> > vs
> > "1 0 0; 0 1 0; 0 0 1"
> >
> >> The difference in (non-whitespace) chars is 19 vs 25, so the
> >> "shorthand" doesn't seem to save that much.
> >
> > Well, it's easier to type "" (twice the same character) than [], and you
> > have no risk in swapping en opening and a closing bracket. In addition,
> > you have to use AltGr on some keyboards to get the brackets. It doesn't
> > boils down to a number of characters.
> >
> >>
> >> Just my €0.02,
>


It's the year of the notebook.

Notebooks are reusable.
Notebooks correctly align the brackets in the second and third lines,
and it looks pretty, just like a matrix.


(But I don't have to teach newbies, and I often even correct whitespace on
the command line, because it looks ugly and I will eventually copy it to a
script file.)

Josef
no broken windows!
well, except for the ones I don't feel like fixing right now.






> >>
> >> - C
> >>
> >>
> >>
> >>
> >> On Fri, Jul 18, 2014 at 10:05 AM, Alan G Isaac 
> wrote:
> >>> On 7/18/2014 12:45 PM, Mark Miller wrote:
>  If the true goal is to just allow quick entry of a 2d array, why not
> just advocate using
>  a = numpy.array(numpy.mat("1 2 3; 4 5 6; 7 8 9"))
> >>>
> >>>
> >>> It's even simpler:
> >>> a = np.mat(' 1 2 3;4 5 6;7 8 9').A
> >>>
> >>> I'm not putting a dog in this race.  Still I would say that
> >>> the reason why such proposals miss the point is that
> >>> there are introductory settings where one would like
> >>> to explain as few complications as possible.  In
> >>> particular, one might prefer *not* to discuss the
> >>> existence of a matrix type.  As an additional downside,
> >>> this is only good for 2d, and there have been proposals
> >>> for the new array builder to handle other dimensions.
> >>>
> >>> fwiw,
> >>> Alan Isaac
> >>>
> >
> >


Re: [Numpy-discussion] `allclose` vs `assert_allclose`

2014-07-18 Thread josef . pktd
On Fri, Jul 18, 2014 at 2:44 PM, Nathaniel Smith  wrote:

> On 18 Jul 2014 19:31,  wrote:
> >>
> >>
> >> > Making the behavior of assert_allclose depending on whether desired is
> >> > exactly zero or 1e-20 looks too difficult to remember, and which
> desired I
> >> > use would depend on what I get out of R or Stata.
> >>
> >> I thought your whole point here was that 1e-20 and zero are
> >> qualitatively different values that you would not want to accidentally
> >> confuse? Surely R and Stata aren't returning exact zeros for small
> >> non-zero values like probability tails?
> >>
> >
> > I was thinking of the case when we only see "pvalue < 1e-16" or
> something like this, and we replace this by assert close to zero.
> > which would translate to `assert_allclose(pvalue, 0, atol=1e-16)`
> > with maybe an additional rtol=1e-11 if we have an array of pvalues where
> some are "large" (>0.5).
>
> This example is also handled correctly by my proposal :-)
>
That depends on the details of your proposal.

An alternative: desired exactly zero means assert_equal.

(Pdb) self.res_reg.params[m:]
array([ 0.,  0.,  0.])
(Pdb) assert_allclose(0, self.res_reg.params[m:])
(Pdb) assert_allclose(0, self.res_reg.params[m:], rtol=0, atol=0)
(Pdb)

This test currently uses assert_almost_equal with decimal=4   :(

Regularized estimation with hard thresholding: the first m values are
estimated not equal to zero, and the elements from m to the end are
"exactly zero".

This is discrete models' fit_regularized, which predates numpy's
assert_allclose.  I haven't checked what the unit tests of Kerby's current
additions for fit_regularized look like.

Unit testing is serious business:
I'd rather have good unit tests in SciPy-related packages than convince a
few more newbies that they can use the defaults for everything.
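The alternative ("desired exactly zero means assert_equal") could be
sketched as a small wrapper; assert_close_or_zero is a hypothetical name,
not a numpy function:

```python
import numpy as np
from numpy.testing import assert_allclose

def assert_close_or_zero(actual, desired, rtol=1e-7):
    # Hypothetical helper: a desired value that is exactly zero is
    # checked for exact equality; everything else gets a pure relative
    # check (atol=0).
    actual = np.asarray(actual)
    desired_arr = np.asarray(desired)
    if np.all(desired_arr == 0):
        assert np.all(actual == desired_arr), "expected exact zeros"
    else:
        assert_allclose(actual, desired_arr, rtol=rtol, atol=0)

assert_close_or_zero([0.0, 0.0, 0.0], 0)   # exact zeros pass
assert_close_or_zero([1e-30], 1e-30)       # relative check passes
```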

Josef



> -n
>


Re: [Numpy-discussion] `allclose` vs `assert_allclose`

2014-07-18 Thread josef . pktd
On Fri, Jul 18, 2014 at 2:29 PM, Nathaniel Smith  wrote:

> On Fri, Jul 18, 2014 at 7:03 PM,   wrote:
> >
> > On Fri, Jul 18, 2014 at 12:53 PM, Nathaniel Smith  wrote:
> >>
> >> For cases like this, you need *some* non-zero atol or the thing just
> >> doesn't work, and one could quibble over the exact value as long as
> >> it's larger than "normal" floating point error. These calculations
> >> usually involve "normal" sized numbers, so atol should be comparable
> >> to eps * these values.  eps is 2e-16, so atol=1e-8 works for values up
> >> to around 1e8, which is a plausible upper bound for where people might
> >> expect assert_allclose to just work. I'm trying to figure out some way
> >> to support your use cases while also supporting other use cases.
> >
> >
> > my problem is that there is no "normal" floating point error.
> > If I have units in 1000 or units in 0.0001 depends on the example and
> > dataset that we use for testing.
> >
> > this test two different functions/methods that calculate the same thing
> >
> > (Pdb) pval
> > array([  3.01270184e-42,   5.90847367e-02,   3.00066946e-12])
> > (Pdb) res2.pvalues
> > array([  3.01270184e-42,   5.90847367e-02,   3.00066946e-12])
> > (Pdb) assert_allclose(pval, res2.pvalues, rtol=5 * rtol, atol=1e-25)
> >
> > I don't care about errors that are smaller that 1e-25
> >
> > for example testing p-values against Stata
> >
> > (Pdb) tt.pvalue
> > array([  5.70315140e-30,   6.24662551e-02,   5.86024090e-11])
> > (Pdb) res2.pvalues
> > array([  5.70315140e-30,   6.24662551e-02,   5.86024090e-11])
> > (Pdb) tt.pvalue - res2.pvalues
> > array([  2.16612016e-40,   2.51187959e-15,   4.30027936e-21])
> > (Pdb) tt.pvalue / res2.pvalues - 1
> > array([  3.79811738e-11,   4.01900735e-14,   7.33806349e-11])
> > (Pdb) rtol
> > 1e-10
> > (Pdb) assert_allclose(tt.pvalue, res2.pvalues, rtol=5 * rtol)
> >
> >
> > I could find a lot more and maybe nicer examples, since I spend quite a
> bit
> > of time fine tuning unit tests.
>
> ...these are all cases where there are not exact zeros, so my proposal
> would not affect them?
>
> I can see the argument that we shouldn't provide any default rtol/atol
> at all because there is no good default, but... I don't think putting
> that big of a barrier in front of newbies writing their first tests is
> a good idea.
>

I think atol=0 is **very** good for newbies, and for everyone else.
If expected is really zero or very small, then it immediately causes a test
failure, and it's relatively obvious how to fix it.

I worry a lot more about unit tests that don't "bite", written by newbies
or not-so-newbies who just use a default.

That's one of the problems we had with assert_almost_equal, and why I was
very happy to switch to assert_allclose with its emphasis on relative
tolerance.
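For illustration (a sketch; the magnitudes mimic the p-value examples in
this thread): with atol=0 a grossly wrong tiny value fails immediately,
while a nonzero default atol silently accepts it:

```python
import numpy as np
from numpy.testing import assert_allclose

desired = np.array([3.0e-42, 5.9e-2])
wrong = np.array([1.0e-9, 5.9e-2])   # first entry off by ~33 orders of magnitude

# allclose's default atol=1e-8 swamps the tiny value and reports "close":
print(np.allclose(wrong, desired))   # True

# assert_allclose's default atol=0 catches it via the relative check:
try:
    assert_allclose(wrong, desired, rtol=1e-10)
    print("passed")
except AssertionError:
    print("failed, as it should")
```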

Josef



>
> -n
>
> --
> Nathaniel J. Smith
> Postdoctoral researcher - Informatics - University of Edinburgh
> http://vorpus.org


Re: [Numpy-discussion] `allclose` vs `assert_allclose`

2014-07-18 Thread josef . pktd
>
>
> > Making the behavior of assert_allclose depending on whether desired is
> > exactly zero or 1e-20 looks too difficult to remember, and which desired
> I
> > use would depend on what I get out of R or Stata.
>
> I thought your whole point here was that 1e-20 and zero are
> qualitatively different values that you would not want to accidentally
> confuse? Surely R and Stata aren't returning exact zeros for small
> non-zero values like probability tails?
>
>
I was thinking of the case when we only see "pvalue < 1e-16" or something
like this, and we replace this by an assert close to zero,
which would translate to `assert_allclose(pvalue, 0, atol=1e-16)`,
with maybe an additional rtol=1e-11 if we have an array of pvalues where
some are "large" (>0.5).

It's not a very frequent case, mainly when we don't have access to the
underlying float numbers and only have the print representation.

Josef


Re: [Numpy-discussion] `allclose` vs `assert_allclose`

2014-07-18 Thread josef . pktd
On Fri, Jul 18, 2014 at 2:03 PM,  wrote:

>
>
>
> On Fri, Jul 18, 2014 at 12:53 PM, Nathaniel Smith  wrote:
>
>> On Fri, Jul 18, 2014 at 12:38 PM,   wrote:
>> >
>> > On Thu, Jul 17, 2014 at 4:07 PM,  wrote:
>> >
>> >> If you mean by this to add atol=1e-8 as default, then I'm against it.
>> >>
>> >> At least it will change the meaning of many of our tests in
>> statsmodels.
>> >>
>> >> I'm using rtol to check for correct 1e-15 or 1e-30, which would be
>> >> completely swamped if you change the default atol=0.
>> >> Adding atol=0 to all assert_allclose that currently use only rtol is a
>> lot
>> >> of work.
>> >> I think I almost never use a default rtol, but I often leave atol at
>> the
>> >> default = 0.
>> >>
>> >> If we have zeros, then I don't think it's too much work to decide
>> whether
>> >> this should be atol=1e-20, or 1e-8.
>> >
>> >
>> > copied from
>> > http://mail.scipy.org/pipermail/numpy-discussion/2014-July/070639.html
>> > since I didn't get any messages here
>> >
>> > This is a compelling use-case, but there are also lots of compelling
>> > usecases that want some non-zero atol (i.e., comparing stuff to 0).
>> > Saying that allclose is for one of those use cases and assert_allclose
>> > is for the other is... not a very felicitious API design, I think. So
>> > we really should do *something*.
>> >
>> > Are there really any cases where you want non-zero atol= that don't
>> > involve comparing something against a 'desired' value of zero? It's a
>> > little wacky, but I'm wondering if we ought to change the rule (for
>> > all versions of allclose) to
>> >
>> > if desired == 0:
>> > tol = atol
>> > else:
>> > tol = rtol * desired
>> >
>> > In particular, means that np.allclose(x, 1e-30) would reject x values
>> > of 0 or 2e-30, but np.allclose(x, 0) will accept x == 1e-30 or 2e-30.
>> >
>> > -n
>> >
>> >
>> > That's much too confusing.
>> > I don't know what the usecases for np.allclose are since I don't have
>> any.
>>
>> I wrote allclose because it's shorter, but my point is that
>> assert_allclose and allclose should use the same criterion, and was
>> making a suggestion for what that shared criterion might be.
>>
>> > assert_allclose is one of our (statsmodels) most frequently used numpy
>> > function
>> >
>> > this is not informative:
>> >
>> > `np.allclose(x, 1e-30)`
>> >
>> >
>> > since there are keywords
>> > either np.assert_allclose(x, atol=1e-30)
>>
>> I think we might be talking past each other here -- 1e-30 here is my
>> "gold" p-value that I'm hoping x will match, not a tolerance argument.
>>
>
> my mistake
>
>
>
>>
>> > if I want to be "close" to zero
>> > or
>> >
>> > np.assert_allclose(x, rtol=1e-11, atol=1e-25)
>> >
>> > if we have a mix of large numbers and "zeros" in an array.
>> >
>> > Making the behavior of assert_allclose depending on whether desired is
>> > exactly zero or 1e-20 looks too difficult to remember, and which
>> desired I
>> > use would depend on what I get out of R or Stata.
>>
>> I thought your whole point here was that 1e-20 and zero are
>> qualitatively different values that you would not want to accidentally
>> confuse? Surely R and Stata aren't returning exact zeros for small
>> non-zero values like probability tails?
>>
>> > atol=1e-8 is not close to zero in most cases in my experience.
>>
>> If I understand correctly (Tony?) the problem here is that another
>> common use case for assert_allclose is in cases like
>>
>> assert_allclose(np.sin(some * complex ** calculation / (that - should
>> - be * zero)), 0)
>>
>> For cases like this, you need *some* non-zero atol or the thing just
>> doesn't work, and one could quibble over the exact value as long as
>> it's larger than "normal" floating point error. These calculations
>> usually involve "normal" sized numbers, so atol should be comparable
>> to eps * these values.  eps is 2e-16, so atol=1e-8 works for values up
>> to around 1e8, which is a plausible upper bound for where people might
>> expect assert_allclose to just work. I'm trying to figure out some way
>> to support your use cases while also supporting other use cases.
>>
>
> my problem is that there is no "normal" floating point error.
> If I have units in 1000 or units in 0.0001 depends on the example and
> dataset that we use for testing.
>
> this test two different functions/methods that calculate the same thing
>
> (Pdb) pval
> array([  3.01270184e-42,   5.90847367e-02,   3.00066946e-12])
> (Pdb) res2.pvalues
> array([  3.01270184e-42,   5.90847367e-02,   3.00066946e-12])
> (Pdb) assert_allclose(pval, res2.pvalues, rtol=5 * rtol, atol=1e-25)
>
> I don't care about errors that are smaller that 1e-25
>
> for example testing p-values against Stata
>
> (Pdb) tt.pvalue
> array([  5.70315140e-30,   6.24662551e-02,   5.86024090e-11])
> (Pdb) res2.pvalues
> array([  5.70315140e-30,   6.24662551e-02,   5.86024090e-11])
> (Pdb) tt.pvalue - res2.pvalues
> array([  2.16612016e-40,   2.51187959e-15,   4.30027936e-21])
> (Pdb) tt.pvalue / res2.pv

Re: [Numpy-discussion] `allclose` vs `assert_allclose`

2014-07-18 Thread josef . pktd
On Fri, Jul 18, 2014 at 12:53 PM, Nathaniel Smith  wrote:

> On Fri, Jul 18, 2014 at 12:38 PM,   wrote:
> >
> > On Thu, Jul 17, 2014 at 4:07 PM,  wrote:
> >
> >> If you mean by this to add atol=1e-8 as default, then I'm against it.
> >>
> >> At least it will change the meaning of many of our tests in statsmodels.
> >>
> >> I'm using rtol to check for correct 1e-15 or 1e-30, which would be
> >> completely swamped if you change the default atol=0.
> >> Adding atol=0 to all assert_allclose that currently use only rtol is a
> lot
> >> of work.
> >> I think I almost never use a default rtol, but I often leave atol at the
> >> default = 0.
> >>
> >> If we have zeros, then I don't think it's too much work to decide
> whether
> >> this should be atol=1e-20, or 1e-8.
> >
> >
> > copied from
> > http://mail.scipy.org/pipermail/numpy-discussion/2014-July/070639.html
> > since I didn't get any messages here
> >
> > This is a compelling use-case, but there are also lots of compelling
> > usecases that want some non-zero atol (i.e., comparing stuff to 0).
> > Saying that allclose is for one of those use cases and assert_allclose
> > is for the other is... not a very felicitious API design, I think. So
> > we really should do *something*.
> >
> > Are there really any cases where you want non-zero atol= that don't
> > involve comparing something against a 'desired' value of zero? It's a
> > little wacky, but I'm wondering if we ought to change the rule (for
> > all versions of allclose) to
> >
> > if desired == 0:
> > tol = atol
> > else:
> > tol = rtol * desired
> >
> > In particular, means that np.allclose(x, 1e-30) would reject x values
> > of 0 or 2e-30, but np.allclose(x, 0) will accept x == 1e-30 or 2e-30.
> >
> > -n
> >
> >
> > That's much too confusing.
> > I don't know what the usecases for np.allclose are since I don't have
> any.
>
> I wrote allclose because it's shorter, but my point is that
> assert_allclose and allclose should use the same criterion, and was
> making a suggestion for what that shared criterion might be.
>
> > assert_allclose is one of our (statsmodels) most frequently used numpy
> > function
> >
> > this is not informative:
> >
> > `np.allclose(x, 1e-30)`
> >
> >
> > since there are keywords
> > either np.assert_allclose(x, atol=1e-30)
>
> I think we might be talking past each other here -- 1e-30 here is my
> "gold" p-value that I'm hoping x will match, not a tolerance argument.
>

my mistake



>
> > if I want to be "close" to zero
> > or
> >
> > np.assert_allclose(x, rtol=1e-11, atol=1e-25)
> >
> > if we have a mix of large numbers and "zeros" in an array.
> >
> > Making the behavior of assert_allclose depending on whether desired is
> > exactly zero or 1e-20 looks too difficult to remember, and which desired
> I
> > use would depend on what I get out of R or Stata.
>
> I thought your whole point here was that 1e-20 and zero are
> qualitatively different values that you would not want to accidentally
> confuse? Surely R and Stata aren't returning exact zeros for small
> non-zero values like probability tails?
>
> > atol=1e-8 is not close to zero in most cases in my experience.
>
> If I understand correctly (Tony?) the problem here is that another
> common use case for assert_allclose is in cases like
>
> assert_allclose(np.sin(some * complex ** calculation / (that - should
> - be * zero)), 0)
>
> For cases like this, you need *some* non-zero atol or the thing just
> doesn't work, and one could quibble over the exact value as long as
> it's larger than "normal" floating point error. These calculations
> usually involve "normal" sized numbers, so atol should be comparable
> to eps * these values.  eps is 2e-16, so atol=1e-8 works for values up
> to around 1e8, which is a plausible upper bound for where people might
> expect assert_allclose to just work. I'm trying to figure out some way
> to support your use cases while also supporting other use cases.
>

my problem is that there is no "normal" floating point error.
Whether I have units in 1000 or units in 0.0001 depends on the example and
dataset that we use for testing.

this tests two different functions/methods that calculate the same thing

(Pdb) pval
array([  3.01270184e-42,   5.90847367e-02,   3.00066946e-12])
(Pdb) res2.pvalues
array([  3.01270184e-42,   5.90847367e-02,   3.00066946e-12])
(Pdb) assert_allclose(pval, res2.pvalues, rtol=5 * rtol, atol=1e-25)

I don't care about errors that are smaller than 1e-25

for example testing p-values against Stata

(Pdb) tt.pvalue
array([  5.70315140e-30,   6.24662551e-02,   5.86024090e-11])
(Pdb) res2.pvalues
array([  5.70315140e-30,   6.24662551e-02,   5.86024090e-11])
(Pdb) tt.pvalue - res2.pvalues
array([  2.16612016e-40,   2.51187959e-15,   4.30027936e-21])
(Pdb) tt.pvalue / res2.pvalues - 1
array([  3.79811738e-11,   4.01900735e-14,   7.33806349e-11])
(Pdb) rtol
1e-10
(Pdb) assert_allclose(tt.pvalue, res2.pvalues, rtol=5 * rtol)


I could find a lot more and maybe nicer examples, since 

Re: [Numpy-discussion] `allclose` vs `assert_allclose`

2014-07-18 Thread josef . pktd
On Thu, Jul 17, 2014 at 11:37 AM, Nathaniel Smith  wrote:

> On Wed, Jul 16, 2014 at 7:47 PM, Ralf Gommers 
> wrote:
> >
> > On Wed, Jul 16, 2014 at 6:37 AM, Tony Yu  wrote:
> >> It seems like the defaults for  `allclose` and `assert_allclose` should
> >> match, and an absolute tolerance of 0 is probably not ideal. I guess
> this is
> >> a pretty big behavioral change, but the current default for
> >> `assert_allclose` doesn't seem ideal.
> >
> > I agree, current behavior quite annoying. It would make sense to change
> the
> > atol default to 1e-8, but technically it's a backwards compatibility
> break.
> > Would probably have a very minor impact though. Changing the default for
> > rtol in one of the functions may be much more painful though, I don't
> think
> > that should be done.
>
> Currently we have:
>
> allclose: rtol=1e-5, atol=1e-8
> assert_allclose: rtol=1e-7, atol=0
>
> Why would it be painful to change assert_allclose to match allclose?
> It would weaken some tests, but no code would break.
>

We might break our code if suddenly our test suite doesn't do what it is
supposed to do.

(Rough guess: 40% of the statsmodels code is unit tests.)
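The mismatch that started the thread can be demonstrated directly (the
defaults quoted above: allclose has rtol=1e-5, atol=1e-8; assert_allclose
has rtol=1e-7, atol=0):

```python
import numpy as np

# atol=1e-8 makes values near zero compare as "close":
print(np.allclose(0, 1e-14))   # True

# atol=0 means a pure relative check, which can never pass against 0:
try:
    np.testing.assert_allclose(0, 1e-14)
    print("passed")
except AssertionError:
    print("failed")
```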

Josef


>
> -n
>
> --
> Nathaniel J. Smith
> Postdoctoral researcher - Informatics - University of Edinburgh
> http://vorpus.org


Re: [Numpy-discussion] `allclose` vs `assert_allclose`

2014-07-18 Thread josef . pktd
On Thu, Jul 17, 2014 at 4:07 PM,  wrote:

>
>
>
> On Wed, Jul 16, 2014 at 9:52 AM, Nathaniel Smith  wrote:
>
>> On 16 Jul 2014 10:26, "Tony Yu"  wrote:
>> >
>> > Is there any reason why the defaults for `allclose` and
>> `assert_allclose` differ? This makes debugging a broken test much more
>> difficult. More importantly, using an absolute tolerance of 0 causes
>> failures for some common cases. For example, if two values are very close
>> to zero, a test will fail:
>> >
>> > np.testing.assert_allclose(0, 1e-14)
>> >
>> > Git blame suggests the change was made in the following commit, but I
>> guess that change only reverted to the original behavior.
>> >
>> >
>> https://github.com/numpy/numpy/commit/f43223479f917e404e724e6a3df27aa701e6d6bf
>> >
>> > It seems like the defaults for  `allclose` and `assert_allclose` should
>> match, and an absolute tolerance of 0 is probably not ideal. I guess this
>> is a pretty big behavioral change, but the current default for
>> `assert_allclose` doesn't seem ideal.
>>
>> What you say makes sense to me, and loosening the default tolerances
>> won't break any existing tests. (And I'm not too worried about people who
>> were counting on getting 1e-7 instead of 1e-5 or whatever... if it matters
>> that much to you exactly what tolerance you test, you should be setting the
>> tolerance explicitly!) I vote that unless someone comes up with some
>> terrible objection in the next few days then you should submit a PR :-)
>>
>
> If you mean by this to add atol=1e-8 as default, then I'm against it.
>
> At least it will change the meaning of many of our tests in statsmodels.
>
> I'm using rtol to check for correct 1e-15 or 1e-30, which would be
> completely swamped if you change the default atol=0.
> Adding atol=0 to all assert_allclose that currently use only rtol is a lot
> of work.
> I think I almost never use a default rtol, but I often leave atol at the
> default = 0.
>
> If we have zeros, then I don't think it's too much work to decide whether
> this should be atol=1e-20, or 1e-8.
>

copied from
http://mail.scipy.org/pipermail/numpy-discussion/2014-July/070639.html
since I didn't get any messages here

This is a compelling use-case, but there are also lots of compelling
usecases that want some non-zero atol (i.e., comparing stuff to 0).
Saying that allclose is for one of those use cases and assert_allclose
is for the other is... not a very felicitous API design, I think. So
we really should do *something*.

Are there really any cases where you want non-zero atol= that don't
involve comparing something against a 'desired' value of zero? It's a
little wacky, but I'm wondering if we ought to change the rule (for
all versions of allclose) to

if desired == 0:
    tol = atol
else:
    tol = rtol * desired

In particular, this means that np.allclose(x, 1e-30) would reject x values
of 0 or 2e-30, but np.allclose(x, 0) would accept x == 1e-30 or 2e-30.

-n


That's much too confusing.
I don't know what the use cases for np.allclose are, since I don't have any.

assert_allclose is one of our (statsmodels) most frequently used numpy
functions

this is not informative:

`np.allclose(x, 1e-30)`

since there are keywords:
either np.testing.assert_allclose(x, atol=1e-30)
if I want to be "close" to zero,
or

np.testing.assert_allclose(x, rtol=1e-11, atol=1e-25)

if we have a mix of large numbers and "zeros" in an array.

Making the behavior of assert_allclose depend on whether desired is
exactly zero or 1e-20 looks too difficult to remember, and which desired
value I use would depend on what I get out of R or Stata.

atol=1e-8 is not close to zero in most cases in my experience.
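To make the concern concrete, here is a hedged sketch (with made-up numbers) of why a default atol=1e-8 would swamp rtol at these scales:

```python
import numpy as np
from numpy.testing import assert_allclose

desired = 1.5e-30                 # e.g. a tail probability
actual = desired * (1 + 1e-12)    # relative error of 1e-12

# With the current default atol=0, rtol does real work at this scale:
assert_allclose(actual, desired, rtol=1e-10)

# With atol=1e-8 the comparison is vacuous at this scale -- even a
# completely wrong value of 0.0 passes:
assert_allclose(0.0, desired, rtol=1e-10, atol=1e-8)
```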


The numpy.testing assert functions are some of the most useful functions in
numpy, and heavily used "code".

Josef



>
> Josef
>
>
>
>> -n
>>
>>
>>
>


[Numpy-discussion] problems with mailing list ?

2014-07-18 Thread josef . pktd
Are there problems with sending out messages on the mailing lists?

I'm getting some replies without the original messages, and in some threads
I don't get the replies, so I'm missing part of the discussions.


Josef


Re: [Numpy-discussion] `allclose` vs `assert_allclose`

2014-07-18 Thread josef . pktd
On Thu, Jul 17, 2014 at 4:07 PM,  wrote:

>
>
>
> On Wed, Jul 16, 2014 at 9:52 AM, Nathaniel Smith  wrote:
>
>> On 16 Jul 2014 10:26, "Tony Yu"  wrote:
>> >
>> > Is there any reason why the defaults for `allclose` and
>> `assert_allclose` differ? This makes debugging a broken test much more
>> difficult. More importantly, using an absolute tolerance of 0 causes
>> failures for some common cases. For example, if two values are very close
>> to zero, a test will fail:
>> >
>> > np.testing.assert_allclose(0, 1e-14)
>> >
>> > Git blame suggests the change was made in the following commit, but I
>> guess that change only reverted to the original behavior.
>> >
>> >
>> https://github.com/numpy/numpy/commit/f43223479f917e404e724e6a3df27aa701e6d6bf
>> >
>> > It seems like the defaults for  `allclose` and `assert_allclose` should
>> match, and an absolute tolerance of 0 is probably not ideal. I guess this
>> is a pretty big behavioral change, but the current default for
>> `assert_allclose` doesn't seem ideal.
>>
>> What you say makes sense to me, and loosening the default tolerances
>> won't break any existing tests. (And I'm not too worried about people who
>> were counting on getting 1e-7 instead of 1e-5 or whatever... if it matters
>> that much to you exactly what tolerance you test, you should be setting the
>> tolerance explicitly!) I vote that unless someone comes up with some
>> terrible objection in the next few days then you should submit a PR :-)
>>
>
> If you mean by this to add atol=1e-8 as default, then I'm against it.
>
> At least it will change the meaning of many of our tests in statsmodels.
>
> I'm using rtol to check for correct 1e-15 or 1e-30, which would be
> completely swamped if you change the default atol=0.
> Adding atol=0 to all assert_allclose that currently use only rtol is a lot
> of work.
> I think I almost never use a default rtol, but I often leave atol at the
> default = 0.
>
> If we have zeros, then I don't think it's too much work to decide whether
> this should be atol=1e-20, or 1e-8.
>

Just to explain: p-values and sf (survival function) values of the
distributions are usually accurate at 1e-30 or 1e-50, or something like
that. And when we test the tails of the distributions we rely on the
relative error being small and the absolute error being "tiny".

We would need to do a grep to see how many cases there actually are in
scipy and statsmodels before we change it, because for some use cases we
only get atol of 1e-5 or 1e-7 (e.g. nonlinear optimization).
Linear algebra is usually at atol or rtol of 1e-11 to 1e-14 in my cases, AFAIR.

Josef


>
> Josef
>
>
>
>> -n
>>
>>
>>
>


Re: [Numpy-discussion] `allclose` vs `assert_allclose`

2014-07-18 Thread josef . pktd
On Wed, Jul 16, 2014 at 9:52 AM, Nathaniel Smith  wrote:

> On 16 Jul 2014 10:26, "Tony Yu"  wrote:
> >
> > Is there any reason why the defaults for `allclose` and
> `assert_allclose` differ? This makes debugging a broken test much more
> difficult. More importantly, using an absolute tolerance of 0 causes
> failures for some common cases. For example, if two values are very close
> to zero, a test will fail:
> >
> > np.testing.assert_allclose(0, 1e-14)
> >
> > Git blame suggests the change was made in the following commit, but I
> guess that change only reverted to the original behavior.
> >
> >
> https://github.com/numpy/numpy/commit/f43223479f917e404e724e6a3df27aa701e6d6bf
> >
> > It seems like the defaults for  `allclose` and `assert_allclose` should
> match, and an absolute tolerance of 0 is probably not ideal. I guess this
> is a pretty big behavioral change, but the current default for
> `assert_allclose` doesn't seem ideal.
>
> What you say makes sense to me, and loosening the default tolerances won't
> break any existing tests. (And I'm not too worried about people who were
> counting on getting 1e-7 instead of 1e-5 or whatever... if it matters that
> much to you exactly what tolerance you test, you should be setting the
> tolerance explicitly!) I vote that unless someone comes up with some
> terrible objection in the next few days then you should submit a PR :-)
>

If you mean by this to add atol=1e-8 as default, then I'm against it.

At least it will change the meaning of many of our tests in statsmodels.

I'm using rtol to check for correct 1e-15 or 1e-30, which would be
completely swamped if you change the default atol=0.
Adding atol=0 to all assert_allclose that currently use only rtol is a lot
of work.
I think I almost never use a default rtol, but I often leave atol at the
default = 0.

If we have zeros, then I don't think it's too much work to decide whether
this should be atol=1e-20, or 1e-8.

Josef



> -n
>
>
>


Re: [Numpy-discussion] `allclose` vs `assert_allclose`

2014-07-18 Thread josef . pktd
On Thu, Jul 17, 2014 at 4:21 PM,  wrote:

>
>
>
> On Thu, Jul 17, 2014 at 4:07 PM,  wrote:
>
>>
>>
>>
>> On Wed, Jul 16, 2014 at 9:52 AM, Nathaniel Smith  wrote:
>>
>>> On 16 Jul 2014 10:26, "Tony Yu"  wrote:
>>> >
>>> > Is there any reason why the defaults for `allclose` and
>>> `assert_allclose` differ? This makes debugging a broken test much more
>>> difficult. More importantly, using an absolute tolerance of 0 causes
>>> failures for some common cases. For example, if two values are very close
>>> to zero, a test will fail:
>>>
>>
And one more comment: I debug "broken tests" pretty often. My favorites in
pdb are

np.max(np.abs(x - y))

and

np.max(np.abs(x / y - 1))

to see how much I would have to adjust atol and rtol in assert_allclose in
the tests to make them pass, and to decide whether this is an acceptable
numerical difference or a bug.
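As a runnable sketch of those two diagnostics (the arrays are made up):

```python
import numpy as np

x = np.array([1.0, 2.0, 1e-12])   # hypothetical expected values
y = x * (1 + 3e-8)                # hypothetical computed values

max_abs = np.max(np.abs(x - y))      # worst absolute difference
max_rel = np.max(np.abs(x / y - 1))  # worst relative difference

# max_rel around 3e-8 suggests rtol=1e-7 would pass and rtol=1e-8
# would fail, which helps decide whether to adjust the tolerances.
```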

allclose doesn't tell me anything and I almost never use it.

Josef



> >
>>> > np.testing.assert_allclose(0, 1e-14)
>>> >
>>> > Git blame suggests the change was made in the following commit, but I
>>> guess that change only reverted to the original behavior.
>>> >
>>> >
>>> https://github.com/numpy/numpy/commit/f43223479f917e404e724e6a3df27aa701e6d6bf
>>> >
>>> > It seems like the defaults for  `allclose` and `assert_allclose`
>>> should match, and an absolute tolerance of 0 is probably not ideal. I guess
>>> this is a pretty big behavioral change, but the current default for
>>> `assert_allclose` doesn't seem ideal.
>>>
>>> What you say makes sense to me, and loosening the default tolerances
>>> won't break any existing tests. (And I'm not too worried about people who
>>> were counting on getting 1e-7 instead of 1e-5 or whatever... if it matters
>>> that much to you exactly what tolerance you test, you should be setting the
>>> tolerance explicitly!) I vote that unless someone comes up with some
>>> terrible objection in the next few days then you should submit a PR :-)
>>>
>>
>> If you mean by this to add atol=1e-8 as default, then I'm against it.
>>
>> At least it will change the meaning of many of our tests in statsmodels.
>>
>> I'm using rtol to check for correct 1e-15 or 1e-30, which would be
>> completely swamped if you change the default atol=0.
>> Adding atol=0 to all assert_allclose that currently use only rtol is a
>> lot of work.
>> I think I almost never use a default rtol, but I often leave atol at the
>> default = 0.
>>
>> If we have zeros, then I don't think it's too much work to decide whether
>> this should be atol=1e-20, or 1e-8.
>>
>
> Just to explain, p-values, sf of the distributions are usually accurate at
> 1e-30 or 1e-50 or something like that. And when we test the tails of the
> distributions we use that the relative error is small and the absolute
> error is "tiny".
>
> We would need to do a grep to see how many cases there actually are in
> scipy and statsmodels, before we change it because for some use cases we
> only get atol 1e-5 or 1e-7 (e.g. nonlinear optimization).
> Linear algebra is usually atol or rtol 1e-11 to 1e-14 in my cases, AFAIR.
>
> Josef
>
>
>>
>> Josef
>>
>>
>>
>>> -n
>>>
>>>
>>>
>>
>


Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style

2014-07-07 Thread josef . pktd
On Mon, Jul 7, 2014 at 1:58 PM, Nathaniel Smith  wrote:

> On Mon, Jul 7, 2014 at 3:28 PM, Sebastian Berg
>  wrote:
> > On Mo, 2014-07-07 at 09:50 -0400, josef.p...@gmail.com wrote:
> >>
> >> On Mon, Jul 7, 2014 at 9:11 AM, Sebastian Berg
> >>  wrote:
> >> On Mo, 2014-07-07 at 08:25 -0400, Alan G Isaac wrote:
> >> > On 7/7/2014 7:17 AM, Daπid wrote:
> >> > > How about a new one? np.matarray, for MATLAB array.
> >> >
> >> >
> >> > How about `str2arr` or even `build`, since teaching appears
> >> to be a focus.
> >> > Also, I agree '1 2 3' shd become 1d and '1 2 3;' shd become
> >> 2d.
> >> > It seems unambiguous to allow '1 2 3;;' to be 3d, or even
> >> > '1 2;3 4;;5 6;7 8' (two 2d arrays), but I'm just noting
> >> > that, not urging that it be implemented.
> >> >
> >>
> >> Probably overdoing it, but if we plan on more then just this,
> >> what about
> >> banning such functions to something like
> >> numpy.interactive/numpy.helpers
> >> which you can then import * (or better specific functions)
> >> from?
> >>
> >> I think the fact that you need many imports on startup should
> >> rather be
> >> fixed by an ipython scientific mode or other startup imports.
> >>
> >>
> >>
> >>
> >> Is this whole thing really worth it? We get back to a numpy pylab.
> >>
> >>
> >> First users learn the dirty shortcuts, and then they have to learn how
> >> to do it "properly".
> >>
> >
> > Yeah, you are right. Just a bit afraid of creating too many such
> > functions that I am not sure are very useful/used much. For example I am
> > not sure that many use np.r_ or np.c_
>
> Yeah, we definitely have too many random bits of API around overall.
> But I think this one is probably worthwhile. It doesn't add any real
> complexity (no new types, trivial for readers to understand the first
> time they encounter it, etc.), and it addresses a recurring perceived
> shortcoming of numpy that people run into in the first 5 minutes of
> use, at a time when it's pretty easy to give up and go back to Matlab.
> And, it removes one of the perceived advantages of np.matrix over
> np.ndarray, so it smooths our way for eventually phasing out
> np.matrix.
>
> I'm not sure that preserving np.arr is that important ( here
> only saves 1 character!), but some possible alternatives for short
> names:
>
>   np.marr  ("matlab-like array construction")
>   np.sarr   ("string array")
>   np.parse
>

short like np.s (I didn't know there is already s_)

something long like

>>> np.fromstring('1 2', sep=' ')
array([ 1.,  2.])


>>> np.fromstring2d('1 2 3; 5 3.4 7')
array([[ 1. ,  2. ,  3. ],
   [ 5. ,  3.4,  7. ]])
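np.fromstring2d does not exist; as a rough sketch, the proposed behavior could be emulated with a small helper (the name and semantics are hypothetical):

```python
import numpy as np

def fromstring2d(s):
    # Hypothetical helper: rows separated by ';', columns by whitespace.
    return np.array([row.split() for row in s.split(';')], dtype=float)

fromstring2d('1 2 3; 5 3.4 7')
# array([[ 1. ,  2. ,  3. ],
#        [ 5. ,  3.4,  7. ]])
```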


Josef





>
> -n
>
> --
> Nathaniel J. Smith
> Postdoctoral researcher - Informatics - University of Edinburgh
> http://vorpus.org
>


Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style

2014-07-07 Thread josef . pktd
On Mon, Jul 7, 2014 at 9:11 AM, Sebastian Berg 
wrote:

> On Mo, 2014-07-07 at 08:25 -0400, Alan G Isaac wrote:
> > On 7/7/2014 7:17 AM, Daπid wrote:
> > > How about a new one? np.matarray, for MATLAB array.
> >
> >
> > How about `str2arr` or even `build`, since teaching appears to be a
> focus.
> > Also, I agree '1 2 3' shd become 1d and '1 2 3;' shd become 2d.
> > It seems unambiguous to allow '1 2 3;;' to be 3d, or even
> > '1 2;3 4;;5 6;7 8' (two 2d arrays), but I'm just noting
> > that, not urging that it be implemented.
> >
>
> Probably overdoing it, but if we plan on more then just this, what about
> banning such functions to something like numpy.interactive/numpy.helpers
> which you can then import * (or better specific functions) from?
>
> I think the fact that you need many imports on startup should rather be
> fixed by an ipython scientific mode or other startup imports.
>


Is this whole thing really worth it? We get back to a numpy pylab.

First users learn the dirty shortcuts, and then they have to learn how to
do it "properly".

(I quite often use string split and reshape for copy-pasted text
tables.)
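For example, a minimal version of that split-and-reshape trick (with made-up numbers):

```python
import numpy as np

# A small table pasted as text, e.g. from console output or a paper.
raw = """
1.0  2.5  0.3
4.2  5.1  6.7
"""
# str.split handles arbitrary whitespace, including newlines.
arr = np.array(raw.split(), dtype=float).reshape(-1, 3)
# array([[ 1. ,  2.5,  0.3],
#        [ 4.2,  5.1,  6.7]])
```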

Josef



>
> - Sebastian
>
> > Alan Isaac
> >
>
>
>
>


Re: [Numpy-discussion] numpy.random selection from a loglogistic distribution?

2014-06-16 Thread josef . pktd
On Mon, Jun 16, 2014 at 1:38 PM, Liz VanWormer  wrote:
> Dear all,
>
> I'm a novice Python user, and I need to draw random variables from a
> loglogistic distribution. I've used the numpy.random command in the past to
> select variables from supported distributions (beta, normal, lognormal,
> etc). Is there a way to add in a distribution to be used with the
> numpy.random command?

You or someone else could add a new distribution to numpy.random.

However, scipy has the fisk distribution, which according to Wikipedia
is the same as the log-logistic.
fisk is implemented as a special case of burr and has an explicit
inverse cdf (ppf).

So using fisk.rvs should be reasonably fast for vectorized calls, even
though the per-call overhead is much larger than for the numpy.random
functions.

http://en.wikipedia.org/wiki/Log-logistic_distribution
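scipy.stats.fisk.rvs is the ready-made route; for illustration, here is a numpy-only sketch of the same inverse-cdf (inverse-transform) sampling, with a hypothetical helper name:

```python
import numpy as np

def loglogistic_rvs(alpha, beta, size, seed=0):
    # Log-logistic cdf: F(x) = 1 / (1 + (x / alpha)**(-beta)),
    # which inverts to x = alpha * (u / (1 - u))**(1 / beta).
    rng = np.random.RandomState(seed)
    u = rng.uniform(size=size)
    return alpha * (u / (1.0 - u)) ** (1.0 / beta)

samples = loglogistic_rvs(alpha=2.0, beta=3.0, size=100000)
# The median of a log-logistic distribution equals alpha.
```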

to make distribution names more fun:
Wikipedia:"generalized log-logistic distribution"  "These include the
Burr Type XII distribution (also known as the Singh-Maddala
distribution) "

Josef

>
> Thank you for your insight,
> Liz
>
>


Re: [Numpy-discussion] Switch to using ATLAS for OSX binary wheels

2014-06-16 Thread josef . pktd
On Mon, Jun 16, 2014 at 7:10 AM, Matthew Brett  wrote:
> Hi,
>
> On Sun, Jun 15, 2014 at 10:51 PM, Ralf Gommers  wrote:
>>
>>
>>
>> On Sat, Jun 14, 2014 at 11:56 AM, Matthew Brett 
>> wrote:
>>>
>>>
>>>
>>> On Friday, June 13, 2014, Ralf Gommers  wrote:




 On Fri, Jun 13, 2014 at 2:07 PM, Matthew Brett 
 wrote:
>
> Hi,
>
> Summary : I'm planning to upload OSX wheels for numpy and scipy using
> the ATLAS blas / lapack library instead of the default OSX Accelerate
> framework.
>
> We've run into some trouble with a segfault in recent OSX Accelerate:
>
> https://github.com/numpy/numpy/issues/4007
>
> and Accelerate also doesn't play well with multiprocessing.
>
> https://github.com/numpy/numpy/issues/4776
>
> Because there's nothing I love more than half-day compilation runs on
> my laptop, I've built ATLAS binaries with gcc 4.8, and linked numpy
> and scipy to them to make OSX wheels.  These pass all tests in i386
> and x86_64 mode, including numpy, scipy, matplotlib, pandas:
>
>
> https://travis-ci.org/matthew-brett/scipy-stack-osx-testing/builds/27442987
>
> The build process needs some automating, but it's recorded here:
>
> https://github.com/matthew-brett/numpy-atlas-binaries
>
> It's possible to get travis-ci to build these guys from a bare machine
> and then upload them somewhere, but I haven't tried to do that.
>
> Meanwhile Sturla kindly worked up a patch to numpy to work round the
> Accelerate segfault [1].  I haven't tested that, but given I'd already
> built the wheels, I prefer the ATLAS builds because they work with
> multiprocessing.
>
> I propose uploading these wheels as the default for numpy and scipy.
> Does anyone have any objection or comments before I go ahead and do
> that?


 From your README and wscript I don't see what numpy version you're using
 to compile scipy against. I got the impression that you used 1.8.1, but it
 should be numpy 1.5.1 for the 2.7 build, and 1.7.1 for 3.x.

 I've tried the scipy 0.14.0 python2.7 wheel, but I get import errors (see
 below). Your wheels should work with all common Python installs (mine is
 homebrew) right?

 Ralf


 $ python2.7 -c "import scipy; scipy.test()"
 Running unit tests for scipy
 NumPy version 1.9.0.dev-056ab73
 NumPy is installed in
 /usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy
 SciPy version 0.14.0
 SciPy is installed in
 /usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy
 Python version 2.7.5 (default, Jun 18 2013, 21:21:44) [GCC 4.2.1
 Compatible Apple LLVM 4.2 (clang-425.0.28)]
 nose version 1.3.0
 E...EEEE

 ==
 ERROR: Failure: ImportError
 (dlopen(/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/cluster/_hierarchy_wrap.so,
 2): Symbol not found: _PyModule_Create2
   Referenced from:
 /usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/cluster/_hierarchy_wrap.so
   Expected in: flat namespace
  in
 /usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/cluster/_hierarchy_wrap.so)

 --
 Traceback (most recent call last):
   File
 "/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/loader.py",
 line 413, in loadTestsFromName
 addr.filename, addr.module)
   File
 "/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py",
 line 47, in importFromPath
 return self.importFromDir(dir_path, fqname)
   File
 "/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py",
 line 94, in importFromDir
 mod = load_module(part_fqname, fh, filename, desc)
   File
 "/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/cluster/__init__.py",
 line 27, in 
 from . import vq, hierarchy
   File
 "/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/cluster/hierarchy.py",
 line 175, in 
 from . import _hierarchy_wrap
 ImportError:
 dlopen(/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/li

Re: [Numpy-discussion] ANN: NumPy 1.9.0 beta release

2014-06-09 Thread josef . pktd
On Mon, Jun 9, 2014 at 11:10 PM,   wrote:
> On Mon, Jun 9, 2014 at 8:49 PM, Charles R Harris
>  wrote:
>>
>>
>>
>> On Mon, Jun 9, 2014 at 6:47 PM, Charles R Harris 
>> wrote:
>>>
>>>
>>>
>>>
>>> On Mon, Jun 9, 2014 at 6:23 PM, Charles R Harris
>>>  wrote:




 On Mon, Jun 9, 2014 at 6:21 PM, Jeff Reback  wrote:
>
> The one pandas test failure that is valid: ERROR: test_interp_regression
> (pandas.tests.test_generic.TestSeries)
>
> has been fixed in pandas master / 0.14.1 (prob releasing in 1 month).
>
> (the other test failures are for clipboard / network issues)
>
>
>

 Thanks for the feedback.

 

>>>
>>> Looks to me like a fair number of the other test failures are actually
>>> bugs in the tests, or could be argued to be so.
>>
>>
>> Or rather, actual bugs, but not in numpy. I can see this will take a while
>> to settle ;)
>
> not really a bug on our side given the old numpy behavior

forgot to define: our side = statsmodels

> I was surprised about the failure because it still produces the correct 
> result.
>
> params[nuis_param_index] = nuisance_params
> ValueError: shape mismatch: value array of shape (2,) could not be
> broadcast to indexing result of shape (0,)
>
> As far as I have figured out based on Christoph's test failures
>
> `nuis_param_index`  is array([], dtype=int32)
> and because numpy didn't assign anything, `nuisance_params` was picked
> essentially arbitrary along this code path
>
> >>> x = np.arange(5.)
> >>> we_dont_care_what_this_is = np.random.randn(10)
> >>> x[[]] = we_dont_care_what_this_is
> >>> x
> array([ 0.,  1.,  2.,  3.,  4.])
>
> Does x[[]] = [] work with numpy 1.9?
>
> Josef
>
>>
>> Chuck
>>
>>


Re: [Numpy-discussion] ANN: NumPy 1.9.0 beta release

2014-06-09 Thread josef . pktd
On Mon, Jun 9, 2014 at 8:49 PM, Charles R Harris
 wrote:
>
>
>
> On Mon, Jun 9, 2014 at 6:47 PM, Charles R Harris 
> wrote:
>>
>>
>>
>>
>> On Mon, Jun 9, 2014 at 6:23 PM, Charles R Harris
>>  wrote:
>>>
>>>
>>>
>>>
>>> On Mon, Jun 9, 2014 at 6:21 PM, Jeff Reback  wrote:

 The one pandas test failure that is valid: ERROR: test_interp_regression
 (pandas.tests.test_generic.TestSeries)

 has been fixed in pandas master / 0.14.1 (prob releasing in 1 month).

 (the other test failures are for clipboard / network issues)



>>>
>>> Thanks for the feedback.
>>>
>>> 
>>>
>>
>> Looks to me like a fair number of the other test failures are actually
>> bugs in the tests, or could be argued to be so.
>
>
> Or rather, actual bugs, but not in numpy. I can see this will take a while
> to settle ;)

not really a bug on our side given the old numpy behavior
I was surprised about the failure because it still produces the correct result.

params[nuis_param_index] = nuisance_params
ValueError: shape mismatch: value array of shape (2,) could not be
broadcast to indexing result of shape (0,)

As far as I have figured out based on Christoph's test failures,

`nuis_param_index` is array([], dtype=int32),
and because numpy didn't assign anything, `nuisance_params` was picked
essentially arbitrarily along this code path:

>>> x = np.arange(5.)
>>> we_dont_care_what_this_is = np.random.randn(10)
>>> x[[]] = we_dont_care_what_this_is
>>> x
array([ 0.,  1.,  2.,  3.,  4.])

Does x[[]] = [] work with numpy 1.9?
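For what it's worth, a quick check (in NumPy 1.9 and later the empty-to-empty assignment remains a no-op; the shape-mismatch error above fires only when a non-empty value array meets an empty index result):

```python
import numpy as np

x = np.arange(5.)
x[[]] = []   # empty fancy index, empty value: shapes match, nothing assigned
assert (x == np.arange(5.)).all()
```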

Josef

>
> Chuck
>
>


Re: [Numpy-discussion] big-bangs versus incremental improvements (was: Re: SciPy 2014 BoF NumPy Participation)

2014-06-05 Thread josef . pktd
On Thu, Jun 5, 2014 at 8:40 AM, David Cournapeau  wrote:

>
>
>
> On Thu, Jun 5, 2014 at 3:36 AM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>>
>>
>>
>> On Wed, Jun 4, 2014 at 7:29 PM, Travis Oliphant 
>> wrote:
>>
>>> Believe me, I'm all for incremental changes if it is actually possible
>>> and doesn't actually cost more.  It's also why I've been silent until now
>>> about anything we are doing being a candidate for a NumPy 2.0.  I
>>> understand the challenges of getting people to change.  But, features and
>>> solid improvements *will* get people to change --- especially if their new
>>> library can be used along with the old library and the transition can be
>>> done gradually. Python 3's struggle is the lack of features.
>>>
>>> At some point there *will* be a NumPy 2.0.   What features go into NumPy
>>> 2.0, how much backward compatibility is provided, and how much porting is
>>> needed to move your code from NumPy 1.X to NumPy 2.X is the real user
>>> question --- not whether it is characterized as "incremental" change or
>>> "re-write". What I call a re-write and what you call an
>>> "incremental-change" are two points on a spectrum and likely overlap
>>> signficantly if we really compared what we are thinking about.
>>>
>>> One huge benefit that came out of the numeric / numarray / numpy
>>> transition that we mustn't forget about was actually the extended buffer
>>> protocol and memory view objects.  This really does allow multiple array
>>> objects to co-exist and libraries to use the object that they prefer in a
>>> way that did not exist when Numarray / numeric / numpy came out.So, we
>>> shouldn't be afraid of that world.   The existence of easy package managers
>>> to update environments to try out new features and have applications on a
>>> single system that use multiple versions of the same library is also
>>> something that didn't exist before and that will make any transition easier
>>> for users.
>>>
>>> One thing I regret about my working on NumPy originally is that I didn't
>>> have the foresight, skill, and understanding to work more on a more
>>> extended and better designed multiple-dispatch system so that multiple
>>> array objects could participate together in an expression flow.   The
>>> __numpy_ufunc__ mechanism gives enough capability in that direction that it
>>> may be better now.
>>>
>>> Ultimately, I don't disagree that NumPy can continue to exist in
>>> "incremental" change mode ( though if you are swapping out whole swaths of
>>> C-code for Cython code --- it sounds a lot like a "re-write") as long as
>>> there are people willing to put the effort into changing it.   I think this
>>> is actually benefited by the existence of other array objects that are
>>> pushing the feature envelope without the constraints --- in much the same
>>> way that the Python standard library is benefitted by many versions of
>>> different capabilities being tried out before moving into the standard
>>> library.
>>>
>>> I remain optimistic that things will continue to improve in multiple
>>> ways --- if a little "messier" than any of us would conceive individually.
>>>   It *is* great to see all the PR's coming from multiple people on NumPy
>>> and all the new energy around improving things whether great or small.
>>>
>>
>> @nathaniel IIRC, one of the objections to the missing values work was
>> that it changed the underlying array object by adding a couple of variables
>> to the structure. I'm willing to do that sort of thing, but it would be
>> good to have general agreement that that is acceptable.
>>
>
>
> I think changing the ABI for some versions of numpy (2.0 , whatever) is
> acceptable. There is little doubt that the ABI will need to change to
> accommodate a better and more flexible architecture.
>
> Changing the C API is more tricky: I am not up to date to the changes from
> the last 2-3 years, but at that time, most things could have been changed
> internally without breaking much, though I did not go far enough to
> estimate what the performance impact could be (if any).
>

My impression is that you can do it once in a while, so that no more than
two incompatible versions of numpy are alive at the same time.

It doesn't look worse to me than supporting a new python version, but it
doubles the number of binaries and wheels.

(Supporting python 3.4 for cython-based projects meant hoping that cython
would take care of it. And the cython developers did take care of it.)

Josef


>
>
>
>> As to blaze/dynd, I'd like to steal bits here and there, and maybe in the
>> long term base numpy on top of it with a compatibility layer. There is a
>> lot of thought and effort that has gone into those projects and we should
>> use what we can. As is, I think numpy is good for another five to ten years
>> and will probably hang on for fifteen, but it will be outdated by the end
>> of that period. Like great whites, we need to keep swimming just to have
>> oxygen. Software projects

Re: [Numpy-discussion] big-bangs versus incremental improvements (was: Re: SciPy 2014 BoF NumPy Participation)

2014-06-05 Thread josef . pktd
On Thu, Jun 5, 2014 at 8:58 AM, Todd  wrote:

>
> On 5 Jun 2014 14:28, "David Cournapeau"  wrote:
> >
> >
> >
> >
> > On Thu, Jun 5, 2014 at 9:44 AM, Todd  wrote:
> >>
> >>
> >> On 5 Jun 2014 02:57, "Nathaniel Smith"  wrote:
> >> >
> >> > On Wed, Jun 4, 2014 at 7:18 AM, Travis Oliphant 
> wrote:
> >> > And numpy will be much harder to replace than numeric --
> >> > numeric wasn't the most-imported package in the pythonverse ;-).
> >>
> >> If numpy is really such a core part of the python ecosystem, does it
> really make sense to keep it as a stand-alone package?  Rather than
> thinking about a numpy 2, might it be better to be focusing on getting
> ndarray and dtype to a level of quality where acceptance upstream might be
> plausible?
> >
> >
> > There have been discussions about integrating numpy a long time ago
> (can't find a reference right now), and the consensus was that this was
> neither possible in its current shape nor advisable. The situation has not changed.
> >
> > Putting something in the stdlib means it basically cannot change
> anymore: API compatibility requirements would be stronger than what we
> provide even now. NumPy is also a large codebase which would need some
> major clean up to be accepted, etc...
> >
> > David
>
> I am not suggesting merging all of numpy, only ndarray and dtype (which I
> know is a huge job itself).  And perhaps not even all of what is currently
> included in those, some methods could be split out to their own functions.
>
> And any numpy 2.0 would also imply a major code cleanup.  So although
> ndarray and dtype are certainly not ready for such a thing right now, if
> you are talking about numpy 2.0 already, perhaps part of that discussion
> could involve a plan to get the code into a state where such a move might
> be plausible.  Even if the merge doesn't actually happen, having the code
> at that quality level would still be a good thing.
>
> I agree that the relationship between numpy and python has not changed
> very much in the last few years, but I think the scientific computing
> landscape is changing.  The latter issue is where my primary concern lies.
>
I don't think it would have any effect on scientific computing users. It
might be useful for other users that occasionally want to do a bit of array
processing.

Scientific users need the extended SciPy stack and not a part of numpy that
can be imported from the standard library.
For example in "Data Science", where I pay more attention and where Python
is getting pretty popular, the usual recommended list requires numpy, scipy,
and 5 to 10 more python libraries.

Should pandas also go into the python standard library?
Python 3.4 got a statistics library, but I don't think it has any effect on
the potential statsmodels user base.

Josef


>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


Re: [Numpy-discussion] ANN: Pandas 0.14.0 released

2014-05-31 Thread josef . pktd
No comment,

Josef

On Fri, May 30, 2014 at 6:16 PM, Neal Becker  wrote:
> pip install --user --up pandas
> Downloading/unpacking pandas from
> https://pypi.python.org/packages/source/p/pandas/pandas-0.14.0.tar.gz#md5=b775987c0ceebcc8d5ace4a1241c967a
> ...
>
> Downloading/unpacking numpy>=1.6.1 from
> https://pypi.python.org/packages/source/n/numpy/numpy-1.8.1.tar.gz#md5=be95babe263bfa3428363d6db5b64678
> (from pandas)
>   Downloading numpy-1.8.1.tar.gz (3.8MB): 3.8MB downloaded
>   Running setup.py egg_info for package numpy
> Running from numpy source directory.
>
> warning: no files found matching 'tools/py3tool.py'
> warning: no files found matching '*' under directory 'doc/f2py'
> warning: no previously-included files matching '*.pyc' found anywhere in
> distribution
> warning: no previously-included files matching '*.pyo' found anywhere in
> distribution
> warning: no previously-included files matching '*.pyd' found anywhere in
> distribution
> Downloading/unpacking six from
> https://pypi.python.org/packages/source/s/six/six-1.6.1.tar.gz#md5=07d606ac08595d795bf926cc9985674f
> (from python-dateutil->pandas)
>   Downloading six-1.6.1.tar.gz
>   Running setup.py egg_info for package six
>
> no previously-included directories found matching 'documentation/_build'
> Installing collected packages: pandas, pytz, numpy, six
> 
>
> What?  I already have numpy-1.8.0 installed (also have six, pytz).
>


Re: [Numpy-discussion] smoothing function

2014-05-15 Thread josef . pktd
On Thu, May 15, 2014 at 12:17 PM, Nathaniel Smith  wrote:

> On Thu, May 15, 2014 at 1:04 PM, rodrigo koblitz
>  wrote:
> > Buenos,
> > I'm reading the Zuur book (ecology models with R) and trying to reproduce
> > it entirely in python.
> > Have this function in R:
> > M4 <- gam(So ∼ s(De) + factor(ID), subset = I1)
> >
> > the 's' term indicates that So is modelled as a smoothing function of De
> >
> > I'm looking for something close to this in python.
>
> The closest thing that doesn't require writing your own code is
> probably to use patsy's [1] support for (simple unpenalized) spline
> basis transformations [2]. I think using statsmodels this works like:
>
> import statsmodels.formula.api as smf
> # adjust '5' to taste -- bigger = wigglier, less bias, more overfitting
> results = smf.ols("So ~ bs(De, 5) + C(ID)", data=my_df).fit()
> print results.summary()
>

Nice


>
> To graph the resulting curve you'll want to use the results to somehow
> do "prediction" -- I'm not sure what the API for that looks like in
> statsmodels. If you need help figuring it out then the asking on the
> statsmodels list or stackoverflow is probably the quickest way to get
> help.
>

seems to work (in a very simple made up example)

results.predict({'De':np.arange(1,5), 'ID':['a']*4}, transform=True)
#array([ 0.75 , 1.0833, 0.75 , 0.4167])

Josef
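For readers without patsy/statsmodels at hand, the same idea can be sketched with numpy alone: regress on an unpenalized spline basis by least squares. The truncated power basis below is an illustrative stand-in for `bs()`, not the patsy/statsmodels implementation, and all names and data are made up:

```python
import numpy as np

def truncated_power_basis(x, knots, degree=3):
    # Design matrix [1, x, ..., x^degree, (x - k)_+^degree for each knot],
    # a simple unpenalized spline basis (illustrative stand-in for bs()).
    cols = [x ** d for d in range(degree + 1)]
    cols += [np.clip(x - k, 0.0, None) ** degree for k in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

knots = np.percentile(x, [10, 20, 30, 40, 50, 60, 70, 80, 90])
X = truncated_power_basis(x, knots)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
print("residual std:", np.std(y - fitted))
```

More knots mean a wigglier fit, mirroring the degrees-of-freedom knob in `bs(De, 5)` above.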


> -n
>
> [1] http://patsy.readthedocs.org/en/latest/
> [2]
> http://patsy.readthedocs.org/en/latest/builtins-reference.html#patsy.builtins.bs
>
> --
> Nathaniel J. Smith
> Postdoctoral researcher - Informatics - University of Edinburgh
> http://vorpus.org


Re: [Numpy-discussion] smoothing function

2014-05-15 Thread josef . pktd
On Thu, May 15, 2014 at 8:04 AM, rodrigo koblitz
wrote:

> Buenos,
> I'm reading the Zuur book (ecology models with R) and trying to reproduce
> it entirely in python.
> Have this function in R:
> M4 <- gam(So ∼ s(De) + factor(ID), subset = I1)
>
> the 's' term indicates that So is modelled as a smoothing function of De
>
> I'm looking for something close to this in python.
>

These kinds of general questions are better asked on the scipy-user mailing
list, which covers more general topics than numpy-discussion.

As far as I know, GAMs are not available in python, at least I never came
across any.

statsmodels has an ancient GAM in the sandbox that has never been connected
to any smoother, since lowess, spline and kernel regression support was
missing. Nobody is working on that right now.
If you have only a single nonparametric variable, then statsmodels also has
a partial linear model based on kernel regression that is not cleaned up or
verified, but Padarn is currently working on it.

I think in this case using a penalized linear model with spline basis
functions would be more efficient, but there is also nothing clean
available, AFAIK.
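A toy version of that penalized approach, assuming nothing beyond numpy (a P-spline-flavored sketch with an illustrative ridge penalty on the hinge coefficients; this is not code from statsmodels):

```python
import numpy as np

def penalized_spline_fit(x, y, n_knots=15, lam=0.1, degree=3):
    # Truncated power basis at evenly spaced interior knots.
    knots = np.linspace(x.min(), x.max(), n_knots + 2)[1:-1]
    X = np.column_stack(
        [x ** d for d in range(degree + 1)]
        + [np.clip(x - k, 0.0, None) ** degree for k in knots])
    # Ridge penalty on the knot (hinge) coefficients only: larger lam
    # shrinks them toward zero, i.e. toward a plain cubic polynomial.
    pen = np.zeros(X.shape[1])
    pen[degree + 1:] = 1.0
    beta = np.linalg.solve(X.T @ X + lam * np.diag(pen), X.T @ y)
    return X @ beta

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 300))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
smooth = penalized_spline_fit(x, y)
```

The penalty trades wiggliness against bias, which is what separates this from the unpenalized spline basis approach.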

It's not too difficult to write the basic models, but it takes time to
figure out the last 10% and to verify the results and write unit tests.


If you make your code publicly available, then I would be very interested
in a link. I'm trying to collect examples from books that have a python
solution.

Josef


>
> Someone can help me?
>
> abraços,
> Koblitz
>


Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing

2014-04-27 Thread josef . pktd
On Sun, Apr 27, 2014 at 6:06 PM, Carl Kleffner  wrote:

> A possible option is to install the toolchain inside site-packages and to
> deploy it as PYPI wheel or wininst packages. The PATH to the toolchain
> could be extended during import of the package. But I have no idea what's
> the best strategy to additionally install ATLAS or other third party
> libraries.
>

IIRC, what I did in the past was just to download the ATLAS binaries from the
scipy/numpy wiki and move them into the Python/Dlls directory.

My impression was that finding ATLAS binaries was the difficult part, not
moving them into the right directory.


>
> Cheers,
>
> Carl
>
>
> 2014-04-27 23:46 GMT+02:00 Matthew Brett :
>
> Hi,
>>
>> On Sun, Apr 27, 2014 at 2:34 PM, Carl Kleffner 
>> wrote:
>> > Hi,
>> >
>> > I definitely won't have time until Thursday this week to work out
>> > the github workflow for a numpy pull request. So feel free to do it for
>> > me.
>>
>> OK - I will have a go at this tomorrow.
>>
>> > BTW: There is a missing feature in the mingw-w64 toolchain. So far it
>> > links to the msvcrt90 runtime only. I have to extend the specs file
>> > to allow linking to msvcr100 with an additional flag. Or create a
>> > dedicated toolchain - what do you think?
>>
>> I don't know.
>>
>> Is this a discussion that should go to the mingw-w64 list do you
>> think?  It must be a very common feature.
>>
>> As you know, I'm really hoping it will be possible make a devkit for
>> Python similar to the Ruby devkits [1].
>>
>
I got my entire initial setup on the computer I'm using right now through
python-xy, including MingW 32.

The only thing I ever had to do was to create the `distutils.cfg` in a new
python install.
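For reference, that `distutils.cfg` is a two-line file; the usual location and contents are shown below as the common convention (adjust the path to your install):

```ini
; <python-install>\Lib\distutils\distutils.cfg
[build]
compiler = mingw32
```

This tells distutils to build extensions with the MinGW compiler instead of looking for MSVC.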

python-xy relies on the availability of an open source development
environment for numpy and scipy, and has been restricted so far to 32-bit
python versions.
winpython is only a python distribution and is also available for 64-bit
(with Gohlke binaries, I think).

I think it would be very helpful to get python-xy set up for development
for 64 bit versions, now that the toolchain with MingW is available.

I'm skeptical about having lots of distributions that each install their
own full toolchain. (I always worry about which one is actually on the path;
I deleted my first git for Windows version because it came with a built-in
MSYS/MingW toolchain and now just use the nice and small portable version.)



>
>> The ideal would be a devkit that transparently picked up 32 vs 64 bit,
>> and MSVC runtime according to the Python version.  For example, OSX
>> compilation automatically picks up the OSX SDK with which the relevant
>> Python was built.  Do you think something like this is possible?  That
>> would be a great improvement for people building extensions and wheels
>> on Windows.
>>
>
How does MingW64 decide whether to build 32-bit or 64-bit versions?
Does the python version matter for MingW?

or should this pick up one of the Visual SDK's that the user needs to
install?

Josef



>
>> Cheers,
>>
>> Matthew
>>
>> [1] http://rubyinstaller.org/add-ons/devkit/


Re: [Numpy-discussion] Slightly off-topic - accuracy of C exp function?

2014-04-26 Thread josef . pktd
On Sat, Apr 26, 2014 at 8:31 PM,  wrote:

>
>
>
> On Sat, Apr 26, 2014 at 8:05 PM,  wrote:
>
>>
>>
>>
>> On Sat, Apr 26, 2014 at 6:37 PM, Matthew Brett 
>> wrote:
>>
>>> Hi,
>>>
>>> On Wed, Apr 23, 2014 at 11:59 AM, Matthew Brett 
>>> wrote:
>>> > Hi,
>>> >
>>> > On Wed, Apr 23, 2014 at 1:43 AM, Nathaniel Smith 
>>> wrote:
>>> >> On Wed, Apr 23, 2014 at 6:22 AM, Matthew Brett <
>>> matthew.br...@gmail.com> wrote:
>>> >>> Hi,
>>> >>>
>>> >>> I'm exploring Mingw-w64 for numpy building, and I've found it gives a
>>> >>> slightly different answer for 'exp' than - say - gcc on OSX.
>>> >>>
>>> >>> The difference is of the order of the eps value for the output number
>>> >>> (2 * eps for a result of ~2.0).
>>> >>>
>>> >>> Is accuracy somewhere specified for C functions like exp?  Or is
>>> >>> accuracy left as an implementation detail for the C library author?
>>> >>
>>> >> C99 says (sec 5.2.4.2.2) that "The accuracy of the floating point
>>> >> operations ... and of the library functions in <math.h> and
>>> >> <complex.h> that return floating point results is implementation
>>> >> defined. The implementation may state that the accuracy is unknown."
>>> >> (This last sentence is basically saying that with regard to some
>>> >> higher up clauses that required all conforming implementations to
>>> >> document this stuff, saying "eh, who knows" counts as documenting it.
>>> >> Hooray for standards!)
>>> >>
>>> >> Presumably the accuracy in this case is a function of the C library
>>> >> anyway, not the compiler?
>>> >
>>> > Mingw-w64 implementation is in assembly:
>>> >
>>> >
>>> http://sourceforge.net/p/mingw-w64/code/HEAD/tree/trunk/mingw-w64-crt/math/exp.def.h
>>> >
>>> >> Numpy has its own implementations for a
>>> >> bunch of the math functions, and it's been unclear in the past whether
>>> >> numpy or the libc implementations were better in any particular case.
>>> >
>>> > I only investigated this particular value, in which case it looked as
>>> > though the OSX value was closer to the exact value (via sympy.mpmath)
>>> > - by ~1 unit-at-the-last-place.  This was causing a divergence in the
>>> > powell optimization path and therefore a single scipy test failure.  I
>>> > haven't investigated further - was wondering what investigation I
>>> > should do, more than running the numpy / scipy test suites.
>>>
>>> Investigating further, with this script:
>>>
>>> https://gist.github.com/matthew-brett/11301221
>>>
>>> The following are tests of np.exp accuracy for input values between 0
>>> and 10, for numpy 1.8.1.
>>>
>>> If np.exp(x) performs perfectly, it will return the nearest floating
>>> point value to the exact value of exp(x).  If it does, this scores a
>>> zero for error in the tables below.  If 'proportion of zeros' is 1 -
>>> then np.exp performs perfectly for all tested values of exp (as is the
>>> case for linux here).
>>>
>>> OSX 10.9
>>>
>>> Proportion of zeros: 0.99789
>>> Sum of error: 2.15021267458e-09
>>> Sum of squared error: 2.47149370032e-14
>>> Max / min error: 5.96046447754e-08 -2.98023223877e-08
>>> Sum of squared relative error: 5.22456992025e-30
>>> Max / min relative error: 2.19700100681e-16 -2.2098803255e-16
>>> eps:  2.22044604925e-16
>>> Proportion of relative err >= eps: 0.0
>>>
>>> Debian Jessie / Sid
>>>
>>> Proportion of zeros: 1.0
>>> Sum of error: 0.0
>>> Sum of squared error: 0.0
>>> Max / min error: 0.0 0.0
>>> Sum of squared relative error: 0.0
>>> Max / min relative error: 0.0 0.0
>>> eps:  2.22044604925e-16
>>> Proportion of relative err >= eps: 0.0
>>>
>>> Mingw-w64 Windows 7
>>>
>>> Proportion of zeros: 0.82089
>>> Sum of error: 8.08415331122e-07
>>> Sum of squared error: 2.90045099615e-12
>>> Max / min error: 5.96046447754e-08 -5.96046447754e-08
>>> Sum of squared relative error: 4.18466468175e-28
>>> Max / min relative error: 2.22041308226e-16 -2.22042100773e-16
>>> eps:  2.22044604925e-16
>>> Proportion of relative err >= eps: 0.0
>>>
>>> Take-home : exp implementation for mingw-w64 is exactly (floating
>>> point) correct 82% of the time, and one unit-at-the-last-place off for
>>> the rest [1].  OSX is off by 1 ULP only 0.2% of the time.
>>>
>>
>>
>> Windows 64 with MKL
>>
>> \WinPython-64bit-3.3.2.2\python-3.3.2.amd64>python
>> "E:\Josef\eclipsegworkspace\statsmodels-git\local_scripts\local_scripts\try_exp_error.py"
>> Proportion of zeros: 0.99793
>> Sum of error: -2.10546855506e-07
>> Sum of squared error: 3.33304327526e-14
>> Max / min error: 5.96046447754e-08 -5.96046447754e-08
>> Sum of squared relative error: 4.98420694339e-30
>> Max / min relative error: 2.20881302691e-16 -2.18321571939e-16
>> eps:  2.22044604925e-16
>> Proportion of relative err >= eps: 0.0
>>
>>
>> Windows 32 bit python with official MingW binaries
>>
>> Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit
>> (Intel)] on win32
>>
>> Proportion of zeros: 0.99464
>> Sum of error: -3.91621083118e-07
>> Sum of squared error: 9.2239247812e-14
>>  Max / min error: 5.96046447754e-08 -5.96046447754e-08
>

Re: [Numpy-discussion] Slightly off-topic - accuracy of C exp function?

2014-04-26 Thread josef . pktd
On Sat, Apr 26, 2014 at 8:05 PM,  wrote:

>
>
>
> On Sat, Apr 26, 2014 at 6:37 PM, Matthew Brett wrote:
>
>> Hi,
>>
>> On Wed, Apr 23, 2014 at 11:59 AM, Matthew Brett 
>> wrote:
>> > Hi,
>> >
>> > On Wed, Apr 23, 2014 at 1:43 AM, Nathaniel Smith  wrote:
>> >> On Wed, Apr 23, 2014 at 6:22 AM, Matthew Brett <
>> matthew.br...@gmail.com> wrote:
>> >>> Hi,
>> >>>
>> >>> I'm exploring Mingw-w64 for numpy building, and I've found it gives a
>> >>> slightly different answer for 'exp' than - say - gcc on OSX.
>> >>>
>> >>> The difference is of the order of the eps value for the output number
>> >>> (2 * eps for a result of ~2.0).
>> >>>
>> >>> Is accuracy somewhere specified for C functions like exp?  Or is
>> >>> accuracy left as an implementation detail for the C library author?
>> >>
>> >> C99 says (sec 5.2.4.2.2) that "The accuracy of the floating point
>> >> operations ... and of the library functions in <math.h> and
>> >> <complex.h> that return floating point results is implementation
>> >> defined. The implementation may state that the accuracy is unknown."
>> >> (This last sentence is basically saying that with regard to some
>> >> higher up clauses that required all conforming implementations to
>> >> document this stuff, saying "eh, who knows" counts as documenting it.
>> >> Hooray for standards!)
>> >>
>> >> Presumably the accuracy in this case is a function of the C library
>> >> anyway, not the compiler?
>> >
>> > Mingw-w64 implementation is in assembly:
>> >
>> >
>> http://sourceforge.net/p/mingw-w64/code/HEAD/tree/trunk/mingw-w64-crt/math/exp.def.h
>> >
>> >> Numpy has its own implementations for a
>> >> bunch of the math functions, and it's been unclear in the past whether
>> >> numpy or the libc implementations were better in any particular case.
>> >
>> > I only investigated this particular value, in which case it looked as
>> > though the OSX value was closer to the exact value (via sympy.mpmath)
>> > - by ~1 unit-at-the-last-place.  This was causing a divergence in the
>> > powell optimization path and therefore a single scipy test failure.  I
>> > haven't investigated further - was wondering what investigation I
>> > should do, more than running the numpy / scipy test suites.
>>
>> Investigating further, with this script:
>>
>> https://gist.github.com/matthew-brett/11301221
>>
>> The following are tests of np.exp accuracy for input values between 0
>> and 10, for numpy 1.8.1.
>>
>> If np.exp(x) performs perfectly, it will return the nearest floating
>> point value to the exact value of exp(x).  If it does, this scores a
>> zero for error in the tables below.  If 'proportion of zeros' is 1 -
>> then np.exp performs perfectly for all tested values of exp (as is the
>> case for linux here).
>>
>> OSX 10.9
>>
>> Proportion of zeros: 0.99789
>> Sum of error: 2.15021267458e-09
>> Sum of squared error: 2.47149370032e-14
>> Max / min error: 5.96046447754e-08 -2.98023223877e-08
>> Sum of squared relative error: 5.22456992025e-30
>> Max / min relative error: 2.19700100681e-16 -2.2098803255e-16
>> eps:  2.22044604925e-16
>> Proportion of relative err >= eps: 0.0
>>
>> Debian Jessie / Sid
>>
>> Proportion of zeros: 1.0
>> Sum of error: 0.0
>> Sum of squared error: 0.0
>> Max / min error: 0.0 0.0
>> Sum of squared relative error: 0.0
>> Max / min relative error: 0.0 0.0
>> eps:  2.22044604925e-16
>> Proportion of relative err >= eps: 0.0
>>
>> Mingw-w64 Windows 7
>>
>> Proportion of zeros: 0.82089
>> Sum of error: 8.08415331122e-07
>> Sum of squared error: 2.90045099615e-12
>> Max / min error: 5.96046447754e-08 -5.96046447754e-08
>> Sum of squared relative error: 4.18466468175e-28
>> Max / min relative error: 2.22041308226e-16 -2.22042100773e-16
>> eps:  2.22044604925e-16
>> Proportion of relative err >= eps: 0.0
>>
>> Take-home : exp implementation for mingw-w64 is exactly (floating
>> point) correct 82% of the time, and one unit-at-the-last-place off for
>> the rest [1].  OSX is off by 1 ULP only 0.2% of the time.
>>
>
>
> Windows 64 with MKL
>
> \WinPython-64bit-3.3.2.2\python-3.3.2.amd64>python
> "E:\Josef\eclipsegworkspace\statsmodels-git\local_scripts\local_scripts\try_exp_error.py"
> Proportion of zeros: 0.99793
> Sum of error: -2.10546855506e-07
> Sum of squared error: 3.33304327526e-14
> Max / min error: 5.96046447754e-08 -5.96046447754e-08
> Sum of squared relative error: 4.98420694339e-30
> Max / min relative error: 2.20881302691e-16 -2.18321571939e-16
> eps:  2.22044604925e-16
> Proportion of relative err >= eps: 0.0
>
>
> Windows 32 bit python with official MingW binaries
>
> Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit
> (Intel)] on win32
>
> Proportion of zeros: 0.99464
> Sum of error: -3.91621083118e-07
> Sum of squared error: 9.2239247812e-14
> Max / min error: 5.96046447754e-08 -5.96046447754e-08
> Sum of squared relative error: 1.3334972729e-29
> Max / min relative error: 2.21593462148e-16 -2.2098803255e-16
> eps:  2.22044604925e-16
> Proportion of relative err >= eps: 0.0
>
>
>
>>

Re: [Numpy-discussion] Slightly off-topic - accuracy of C exp function?

2014-04-26 Thread josef . pktd
On Sat, Apr 26, 2014 at 6:37 PM, Matthew Brett wrote:

> Hi,
>
> On Wed, Apr 23, 2014 at 11:59 AM, Matthew Brett 
> wrote:
> > Hi,
> >
> > On Wed, Apr 23, 2014 at 1:43 AM, Nathaniel Smith  wrote:
> >> On Wed, Apr 23, 2014 at 6:22 AM, Matthew Brett 
> wrote:
> >>> Hi,
> >>>
> >>> I'm exploring Mingw-w64 for numpy building, and I've found it gives a
> >>> slightly different answer for 'exp' than - say - gcc on OSX.
> >>>
> >>> The difference is of the order of the eps value for the output number
> >>> (2 * eps for a result of ~2.0).
> >>>
> >>> Is accuracy somewhere specified for C functions like exp?  Or is
> >>> accuracy left as an implementation detail for the C library author?
> >>
> >> C99 says (sec 5.2.4.2.2) that "The accuracy of the floating point
> >> operations ... and of the library functions in <math.h> and
> >> <complex.h> that return floating point results is implementation
> >> defined. The implementation may state that the accuracy is unknown."
> >> (This last sentence is basically saying that with regard to some
> >> higher up clauses that required all conforming implementations to
> >> document this stuff, saying "eh, who knows" counts as documenting it.
> >> Hooray for standards!)
> >>
> >> Presumably the accuracy in this case is a function of the C library
> >> anyway, not the compiler?
> >
> > Mingw-w64 implementation is in assembly:
> >
> >
> http://sourceforge.net/p/mingw-w64/code/HEAD/tree/trunk/mingw-w64-crt/math/exp.def.h
> >
> >> Numpy has its own implementations for a
> >> bunch of the math functions, and it's been unclear in the past whether
> >> numpy or the libc implementations were better in any particular case.
> >
> > I only investigated this particular value, in which case it looked as
> > though the OSX value was closer to the exact value (via sympy.mpmath)
> > - by ~1 unit-at-the-last-place.  This was causing a divergence in the
> > powell optimization path and therefore a single scipy test failure.  I
> > haven't investigated further - was wondering what investigation I
> > should do, more than running the numpy / scipy test suites.
>
> Investigating further, with this script:
>
> https://gist.github.com/matthew-brett/11301221
>
> The following are tests of np.exp accuracy for input values between 0
> and 10, for numpy 1.8.1.
>
> If np.exp(x) performs perfectly, it will return the nearest floating
> point value to the exact value of exp(x).  If it does, this scores a
> zero for error in the tables below.  If 'proportion of zeros' is 1 -
> then np.exp performs perfectly for all tested values of exp (as is the
> case for linux here).
>
> OSX 10.9
>
> Proportion of zeros: 0.99789
> Sum of error: 2.15021267458e-09
> Sum of squared error: 2.47149370032e-14
> Max / min error: 5.96046447754e-08 -2.98023223877e-08
> Sum of squared relative error: 5.22456992025e-30
> Max / min relative error: 2.19700100681e-16 -2.2098803255e-16
> eps:  2.22044604925e-16
> Proportion of relative err >= eps: 0.0
>
> Debian Jessie / Sid
>
> Proportion of zeros: 1.0
> Sum of error: 0.0
> Sum of squared error: 0.0
> Max / min error: 0.0 0.0
> Sum of squared relative error: 0.0
> Max / min relative error: 0.0 0.0
> eps:  2.22044604925e-16
> Proportion of relative err >= eps: 0.0
>
> Mingw-w64 Windows 7
>
> Proportion of zeros: 0.82089
> Sum of error: 8.08415331122e-07
> Sum of squared error: 2.90045099615e-12
> Max / min error: 5.96046447754e-08 -5.96046447754e-08
> Sum of squared relative error: 4.18466468175e-28
> Max / min relative error: 2.22041308226e-16 -2.22042100773e-16
> eps:  2.22044604925e-16
> Proportion of relative err >= eps: 0.0
>
> Take-home : exp implementation for mingw-w64 is exactly (floating
> point) correct 82% of the time, and one unit-at-the-last-place off for
> the rest [1].  OSX is off by 1 ULP only 0.2% of the time.
>


Windows 64 with MKL

\WinPython-64bit-3.3.2.2\python-3.3.2.amd64>python
"E:\Josef\eclipsegworkspace\statsmodels-git\local_scripts\local_scripts\try_exp_error.py"
Proportion of zeros: 0.99793
Sum of error: -2.10546855506e-07
Sum of squared error: 3.33304327526e-14
Max / min error: 5.96046447754e-08 -5.96046447754e-08
Sum of squared relative error: 4.98420694339e-30
Max / min relative error: 2.20881302691e-16 -2.18321571939e-16
eps:  2.22044604925e-16
Proportion of relative err >= eps: 0.0


Windows 32 bit python with official MingW binaries

Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit
(Intel)] on win32

Proportion of zeros: 0.99464
Sum of error: -3.91621083118e-07
Sum of squared error: 9.2239247812e-14
Max / min error: 5.96046447754e-08 -5.96046447754e-08
Sum of squared relative error: 1.3334972729e-29
Max / min relative error: 2.21593462148e-16 -2.2098803255e-16
eps:  2.22044604925e-16
Proportion of relative err >= eps: 0.0



>
> Is mingw-w64 accurate enough?  Do we have any policy on this?
>

I wouldn't worry about a missing or an extra eps in our applications, but
the competition is more accurate.

Josef
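The ULP-style measurement discussed in this thread can be sketched with numpy alone, using `np.longdouble` as a (platform-dependent) higher-precision reference. On Windows/MSVC builds `np.longdouble` is just `float64`, so the comparison degenerates there; treat this as an illustration, and the mpmath-based gist in the thread as the real yardstick:

```python
import numpy as np

x = np.linspace(0.0, 10.0, 100001)
got = np.exp(x)                        # double-precision np.exp under test
ref = np.exp(x.astype(np.longdouble))  # higher-precision reference

# Error measured in units of the last place (ULP) of the double result.
err_ulp = np.abs(got.astype(np.longdouble) - ref) / np.spacing(got)
prop_exact = np.mean(err_ulp < 0.5)    # fraction that is correctly rounded
print("proportion correctly rounded:", prop_exact)
print("max error (ULP):", float(err_ulp.max()))
```

This reproduces the "proportion of zeros" and max/min error numbers reported above, up to the precision of the reference.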


>
> Cheers,
>
> Matthew
>
> [1] http://matthe

Re: [Numpy-discussion] 1.9.x branch

2014-04-26 Thread josef . pktd
On Sat, Apr 26, 2014 at 1:12 PM, Sebastian Berg
wrote:

> On Mi, 2014-04-23 at 10:11 -0400, josef.p...@gmail.com wrote:
> > On Wed, Apr 23, 2014 at 5:32 AM, Sebastian Berg
> >  wrote:
> > > On Di, 2014-04-22 at 15:35 -0600, Charles R Harris wrote:
> > >> Hi All,
> > >>
> > >>
> > >> I'd like to branch 1.9.x at the end of the month. There are a couple
> > >> of reasons for the timing. First, we have a lot of new stuff in the
> > >> development branch. Second, there is work ongoing in masked arrays
> > >> that I'd like to keep out of the release so that it has more time to
> > >> settle. Third, it's past time ;)
> > >
> > > Sounds good.
> > >
> > >> There are currently a number of 1.9.0 blockers, which can be seen
> > >> here.
> > >>
> > >> Datetime timezone handling broken in 1.7.x
> > >>
> > >> I don't think there is time to get this done for 1.9.0 and it needs to
> > >> be pushed off to 1.10.0.
> > >>
> > >> Return multiple field selection as ro view
> > >>
> > >> I have a branch for this, but because the returned type differs from a
> > >> copy by alignment spacing there was a test failure. Merging that
> > >> branch might cause some incompatibilities.
> > >>
> > >
> > > I am a bit worried here that comparisons might make trouble.
> > >
> > >> Object array creation new conversion to int
> > >>
> > >>
> > >> This one needs a decision. Julian, Sebastian, thoughts?
> > >>
> > >
> > > Maybe for all to consider this is about what happens for object arrays
> > > if you do things like:
> > >
> > > # Array cast to object array (np.array(arr) would be identical):
> > > a = np.arange(10).astype(object)
> > > # Array passed into new array creation (not just *one* array):
> > > b = np.array([np.arange(10)], dtype=object)
> > > # Numerical array is assigned to object array:
> > > c = np.empty(10, dtype=object)
> > > c[...] = np.arange(10)
> > >
> > > Before this change, the result was:
> > > type(a[0]) is int
> > > type(b[0,0]) is np.int_  # Note the numpy type
> > > type(c[0]) is int
> > >
> > > After this change, they are all `int`. Though note that the numpy type
> > > is preserved for example for long double. On the one hand preserving
> the
> > > numpy type might be nice, but on the other hand we don't care much
> about
> > > the dtypes of scalars and in practice the python types are probably
> more
> > > often wanted.
> >
> > what if I don't like python?
> >
>
> Fair point :). I also think it is more consistent if we use the numpy
> types (and the user has to cast to the python type explicitly). Could
> argue that if someone casts to object, they might like python objects,
> but if you don't want them that is tricky, too.
>
> There is the option that the numerical array -> object array cast always
> returns an array of numpy scalars. Making it consistent (opposite to
> what is currently in numpy master).
>
> This would mean that `numerical_arr.astype(object)` would give an array
> of numpy scalars always. Getting python scalars would only be possible
> using `arr.item()` (or `tolist`). I guess that change is less likely to
> cause problems, because the numpy types can do more things normally
> though they are slower.
>
> So a (still a bit unsure) +1 from me for making numeric -> object casts
> return arrays of numpy scalars unless we have reason to expect that to
> cause problems. Not sure how easy that would be to change, it wasn't a
> one line change when I tried, so there is something slightly tricky to
> clear out, but probably not too hard either.
>

More general background question.

Why is there casting involved in object arrays?

I thought object arrays would just take and return whatever you put in.
If I use python ints, I might want python ints.
If I use numpy int_s, I might want numpy ints.

b1 = np.array([np.arange(10)], dtype=object)
versus
b2 = np.array([list(range(10))], dtype=object)


>>> b1 = np.array([np.arange(10)], dtype=object)
>>> type(b1[0,2])

>>> type(b1[0][2])

>>>

>>> b2 = np.array([list(range(10))], dtype=object)
>>> type(b2[0,2])
<type 'int'>
>>> type(b2[0][2])
<type 'int'>

another version

>>> type(np.array(np.arange(10).tolist(), dtype=object)[2])
<type 'int'>
>>> type(np.array(list(np.arange(10)), dtype=object)[2])
<type 'numpy.int32'>

Josef
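The behavior Josef expects — object arrays returning whatever you put in — does hold for element assignment, since an object array just stores references. A small sketch (names are illustrative):

```python
import numpy as np

# An object array filled by element assignment keeps the exact objects
# you assign, so the element types follow the inputs.
c = np.empty(3, dtype=object)
c[0] = 1             # a Python int stays a Python int
c[1] = np.int32(1)   # a NumPy scalar stays a NumPy scalar
c[2] = "one"         # any Python object is fine

print([type(x).__name__ for x in c])
```

The casting question in the thread only arises when a *numeric array* is converted to object dtype wholesale (via `astype(object)` or array construction), where NumPy must choose a scalar type for each element.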


>
> - Sebastian
>
> > >>> np.int_(0)**(-1)
> > inf
> > >>> 0**-1
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> > 0**-1
> > ZeroDivisionError: 0.0 cannot be raised to a negative power
> >
> >
> > >>> type(np.arange(5)[0])
> > <type 'numpy.int32'>
> > >>> np.arange(5)[0]**-1
> > inf
> >
> > >>> type(np.arange(5)[0].item())
> > <type 'int'>
> > >>> np.arange(5)[0].item()**-1
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> > np.arange(5)[0].item()**-1
> > ZeroDivisionError: 0.0 cannot be raised to a negative power
> >
> > >>> np.__version__
> > '1.6.1'
> >
> >
> > I remember struggling through this (avoiding python operations) quite
> > a bit in my early bugfixes to scipy.stats.distributions.
> >
> > (IIRC I ended up avoiding most ints.)
> >
> > Josef
> >
> > >
> > > Since I jus

Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing

2014-04-26 Thread josef . pktd
On Sat, Apr 26, 2014 at 10:20 AM,  wrote:

>
>
>
> On Sat, Apr 26, 2014 at 10:10 AM,  wrote:
>
>>
>>
>>
>> On Fri, Apr 25, 2014 at 1:21 AM, Matthew Brett 
>> wrote:
>>
>>> Hi,
>>>
>>> On Thu, Apr 24, 2014 at 5:26 PM,   wrote:
>>> >
>>> >
>>> >
>>> > On Thu, Apr 24, 2014 at 7:29 PM,  wrote:
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> On Thu, Apr 24, 2014 at 7:20 PM, Charles R Harris
>>> >>  wrote:
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Thu, Apr 24, 2014 at 5:08 PM,  wrote:
>>> 
>>> 
>>> 
>>> 
>>>  On Thu, Apr 24, 2014 at 7:00 PM, Charles R Harris
>>>   wrote:
>>> >
>>> >
>>> > Hi Matthew,
>>> >
>>> > On Thu, Apr 24, 2014 at 3:56 PM, Matthew Brett
>>> >  wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> Thanks to Carl Kleffner's toolchain and some help from Clint
>>> Whaley
>>> >> (main author of ATLAS), I've built 64-bit windows numpy and scipy
>>> >> wheels for testing.
>>> >>
>>> >> The build uses Carl's custom mingw-w64 build with static linking.
>>> >>
>>> >> There are two harmless test failures on scipy (being discussed on
>>> the
>>> >> list at the moment) - tests otherwise clean.
>>> >>
>>> >> Wheels are here:
>>> >>
>>> >>
>>> >>
>>> https://nipy.bic.berkeley.edu/scipy_installers/numpy-1.8.1-cp27-none-win_amd64.whl
>>> >>
>>> >>
>>> https://nipy.bic.berkeley.edu/scipy_installers/scipy-0.13.3-cp27-none-win_amd64.whl
>>> >>
>>> >> You can test with:
>>> >>
>>> >> pip install -U pip # to upgrade pip to latest
>>> >> pip install -f https://nipy.bic.berkeley.edu/scipy_installers numpy
>>> >> scipy
>>> >>
>>> >> Please do send feedback.
>>> >>
>>> >> ATLAS binary here:
>>> >>
>>> >>
>>> >>
>>> https://nipy.bic.berkeley.edu/scipy_installers/atlas_builds/atlas-64-full-sse2.tar.bz2
>>> >>
>>> >> Many thanks for Carl in particular for doing all the hard work,
>>> >>
>>> >
>>> > Cool. After all these long years... Now all we need is a box
>>> running
>>> > tests for CI.
>>> >
>>> > Chuck
>>> >
>>> > ___
>>> > NumPy-Discussion mailing list
>>> > NumPy-Discussion@scipy.org
>>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>> >
>>> 
>>>  I get two test failures with numpy
>>> 
>>>  Josef
>>> 
>>>  >>> np.test()
>>>  Running unit tests for numpy
>>>  NumPy version 1.8.1
>>>  NumPy is installed in C:\Python27\lib\site-packages\numpy
>>>  Python version 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500
>>> 64 bit
>>>  (AMD64)]
>>>  nose version 1.1.2
>>> 
>>> 
>>> ==
>>>  FAIL: test_iterator.test_iter_broadcasting_errors
>>> 
>>> --
>>>  Traceback (most recent call last):
>>>    File "C:\Python27\lib\site-packages\nose\case.py", line 197, in
>>>  runTest
>>>  self.test(*self.arg)
>>>    File
>>>  "C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py",
>>> line 657,
>>>  in test_iter_broadcasting_errors
>>>  '(2)->(2,newaxis)') % msg)
>>>    File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line
>>> 44,
>>>  in assert_
>>>  raise AssertionError(msg)
>>>  AssertionError: Message "operands could not be broadcast together
>>> with
>>>  remapped shapes [original->remapped]: (2,3)->(2,3)
>>> (2,)->(2,newaxis) and
>>>  requested shape (4,3)" doesn't contain remapped operand
>>>  shape(2)->(2,newaxis)
>>> 
>>> 
>>> ==
>>>  FAIL: test_iterator.test_iter_array_cast
>>> 
>>> --
>>>  Traceback (most recent call last):
>>>    File "C:\Python27\lib\site-packages\nose\case.py", line 197, in
>>>  runTest
>>>  self.test(*self.arg)
>>>    File
>>>  "C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py",
>>> line 836,
>>>  in test_iter_array_cast
>>>  assert_equal(i.operands[0].strides, (-96,8,-32))
>>>    File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line
>>> 255,
>>>  in assert_equal
>>>  assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k,
>>> err_msg),
>>>  verbose)
>>>    File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line
>>> 317,
>>>  in assert_equal
>>>  raise AssertionError(msg)
>>>  AssertionError:
>>>  Items are not equal:
>>>  item=0
>>> 
>>>   ACTUAL: 96L
>>>   DESIRED: -96
>>> 
>>> 
>>> --
>>>  Ran 4828 tests in 46.306s
>>> 
>>>  FAILED (KNOWNFAIL=10, SKIP=8, failures=2)
>>>  
>>> 
>>>

Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing

2014-04-26 Thread josef . pktd
On Sat, Apr 26, 2014 at 10:10 AM,  wrote:

>
>
>
> On Fri, Apr 25, 2014 at 1:21 AM, Matthew Brett wrote:
>
>> Hi,
>>
>> On Thu, Apr 24, 2014 at 5:26 PM,   wrote:
>> >
>> >
>> >
>> > On Thu, Apr 24, 2014 at 7:29 PM,  wrote:
>> >>
>> >>
>> >>
>> >>
>> >> On Thu, Apr 24, 2014 at 7:20 PM, Charles R Harris
>> >>  wrote:
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On Thu, Apr 24, 2014 at 5:08 PM,  wrote:
>> 
>> 
>> 
>> 
>>  On Thu, Apr 24, 2014 at 7:00 PM, Charles R Harris
>>   wrote:
>> >
>> >
>> > Hi Matthew,
>> >
>> > On Thu, Apr 24, 2014 at 3:56 PM, Matthew Brett
>> >  wrote:
>> >>
>> >> Hi,
>> >>
>> >> Thanks to Carl Kleffner's toolchain and some help from Clint Whaley
>> >> (main author of ATLAS), I've built 64-bit windows numpy and scipy
>> >> wheels for testing.
>> >>
>> >> The build uses Carl's custom mingw-w64 build with static linking.
>> >>
>> >> There are two harmless test failures on scipy (being discussed on
>> the
>> >> list at the moment) - tests otherwise clean.
>> >>
>> >> Wheels are here:
>> >>
>> >>
>> >>
>> https://nipy.bic.berkeley.edu/scipy_installers/numpy-1.8.1-cp27-none-win_amd64.whl
>> >>
>> >>
>> https://nipy.bic.berkeley.edu/scipy_installers/scipy-0.13.3-cp27-none-win_amd64.whl
>> >>
>> >> You can test with:
>> >>
>> >> pip install -U pip # to upgrade pip to latest
>> >> pip install -f https://nipy.bic.berkeley.edu/scipy_installers numpy
>> >> scipy
>> >>
>> >> Please do send feedback.
>> >>
>> >> ATLAS binary here:
>> >>
>> >>
>> >>
>> https://nipy.bic.berkeley.edu/scipy_installers/atlas_builds/atlas-64-full-sse2.tar.bz2
>> >>
>> >> Many thanks for Carl in particular for doing all the hard work,
>> >>
>> >
>> > Cool. After all these long years... Now all we need is a box running
>> > tests for CI.
>> >
>> > Chuck
>> >
>> > ___
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion@scipy.org
>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >
>> 
>>  I get two test failures with numpy
>> 
>>  Josef
>> 
>>  >>> np.test()
>>  Running unit tests for numpy
>>  NumPy version 1.8.1
>>  NumPy is installed in C:\Python27\lib\site-packages\numpy
>>  Python version 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64
>> bit
>>  (AMD64)]
>>  nose version 1.1.2
>> 
>> 
>> ==
>>  FAIL: test_iterator.test_iter_broadcasting_errors
>> 
>> --
>>  Traceback (most recent call last):
>>    File "C:\Python27\lib\site-packages\nose\case.py", line 197, in
>>  runTest
>>  self.test(*self.arg)
>>    File
>>  "C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py",
>> line 657,
>>  in test_iter_broadcasting_errors
>>  '(2)->(2,newaxis)') % msg)
>>    File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line
>> 44,
>>  in assert_
>>  raise AssertionError(msg)
>>  AssertionError: Message "operands could not be broadcast together
>> with
>>  remapped shapes [original->remapped]: (2,3)->(2,3) (2,)->(2,newaxis)
>> and
>>  requested shape (4,3)" doesn't contain remapped operand
>>  shape(2)->(2,newaxis)
>> 
>> 
>> ==
>>  FAIL: test_iterator.test_iter_array_cast
>> 
>> --
>>  Traceback (most recent call last):
>>    File "C:\Python27\lib\site-packages\nose\case.py", line 197, in
>>  runTest
>>  self.test(*self.arg)
>>    File
>>  "C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py",
>> line 836,
>>  in test_iter_array_cast
>>  assert_equal(i.operands[0].strides, (-96,8,-32))
>>    File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line
>> 255,
>>  in assert_equal
>>  assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k, err_msg),
>>  verbose)
>>    File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line
>> 317,
>>  in assert_equal
>>  raise AssertionError(msg)
>>  AssertionError:
>>  Items are not equal:
>>  item=0
>> 
>>   ACTUAL: 96L
>>   DESIRED: -96
>> 
>> 
>> --
>>  Ran 4828 tests in 46.306s
>> 
>>  FAILED (KNOWNFAIL=10, SKIP=8, failures=2)
>>  
>> 
>> >>>
>> >>> Strange. That second one looks familiar, at least the "-96" part.
>> Wonder
>> >>> why this doesn't show up with the MKL builds.
>> >>
>> >>
>> >> ok tried again, this time deleting the old numpy directories before
>>

Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing

2014-04-26 Thread josef . pktd
On Fri, Apr 25, 2014 at 1:21 AM, Matthew Brett wrote:

> Hi,
>
> On Thu, Apr 24, 2014 at 5:26 PM,   wrote:
> >
> >
> >
> > On Thu, Apr 24, 2014 at 7:29 PM,  wrote:
> >>
> >>
> >>
> >>
> >> On Thu, Apr 24, 2014 at 7:20 PM, Charles R Harris
> >>  wrote:
> >>>
> >>>
> >>>
> >>>
> >>> On Thu, Apr 24, 2014 at 5:08 PM,  wrote:
> 
> 
> 
> 
>  On Thu, Apr 24, 2014 at 7:00 PM, Charles R Harris
>   wrote:
> >
> >
> > Hi Matthew,
> >
> > On Thu, Apr 24, 2014 at 3:56 PM, Matthew Brett
> >  wrote:
> >>
> >> Hi,
> >>
> >> Thanks to Carl Kleffner's toolchain and some help from Clint Whaley
> >> (main author of ATLAS), I've built 64-bit windows numpy and scipy
> >> wheels for testing.
> >>
> >> The build uses Carl's custom mingw-w64 build with static linking.
> >>
> >> There are two harmless test failures on scipy (being discussed on
> the
> >> list at the moment) - tests otherwise clean.
> >>
> >> Wheels are here:
> >>
> >>
> >>
> https://nipy.bic.berkeley.edu/scipy_installers/numpy-1.8.1-cp27-none-win_amd64.whl
> >>
> >>
> https://nipy.bic.berkeley.edu/scipy_installers/scipy-0.13.3-cp27-none-win_amd64.whl
> >>
> >> You can test with:
> >>
> >> pip install -U pip # to upgrade pip to latest
> >> pip install -f https://nipy.bic.berkeley.edu/scipy_installers numpy
> >> scipy
> >>
> >> Please do send feedback.
> >>
> >> ATLAS binary here:
> >>
> >>
> >>
> https://nipy.bic.berkeley.edu/scipy_installers/atlas_builds/atlas-64-full-sse2.tar.bz2
> >>
> >> Many thanks for Carl in particular for doing all the hard work,
> >>
> >
> > Cool. After all these long years... Now all we need is a box running
> > tests for CI.
> >
> > Chuck
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> 
>  I get two test failures with numpy
> 
>  Josef
> 
>  >>> np.test()
>  Running unit tests for numpy
>  NumPy version 1.8.1
>  NumPy is installed in C:\Python27\lib\site-packages\numpy
>  Python version 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64
> bit
>  (AMD64)]
>  nose version 1.1.2
> 
>  ==
>  FAIL: test_iterator.test_iter_broadcasting_errors
>  --
>  Traceback (most recent call last):
>    File "C:\Python27\lib\site-packages\nose\case.py", line 197, in
>  runTest
>  self.test(*self.arg)
>    File
>  "C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py",
> line 657,
>  in test_iter_broadcasting_errors
>  '(2)->(2,newaxis)') % msg)
>    File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line
> 44,
>  in assert_
>  raise AssertionError(msg)
>  AssertionError: Message "operands could not be broadcast together with
>  remapped shapes [original->remapped]: (2,3)->(2,3) (2,)->(2,newaxis)
> and
>  requested shape (4,3)" doesn't contain remapped operand
>  shape(2)->(2,newaxis)
> 
>  ==
>  FAIL: test_iterator.test_iter_array_cast
>  --
>  Traceback (most recent call last):
>    File "C:\Python27\lib\site-packages\nose\case.py", line 197, in
>  runTest
>  self.test(*self.arg)
>    File
>  "C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py",
> line 836,
>  in test_iter_array_cast
>  assert_equal(i.operands[0].strides, (-96,8,-32))
>    File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line
> 255,
>  in assert_equal
>  assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k, err_msg),
>  verbose)
>    File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line
> 317,
>  in assert_equal
>  raise AssertionError(msg)
>  AssertionError:
>  Items are not equal:
>  item=0
> 
>   ACTUAL: 96L
>   DESIRED: -96
> 
>  --
>  Ran 4828 tests in 46.306s
> 
>  FAILED (KNOWNFAIL=10, SKIP=8, failures=2)
>  
> 
> >>>
> >>> Strange. That second one looks familiar, at least the "-96" part.
> Wonder
> >>> why this doesn't show up with the MKL builds.
> >>
> >>
> >> ok tried again, this time deleting the old numpy directories before
> >> installing
> >>
> >> Ran 4760 tests in 42.124s
> >>
> >> OK (KNOWNFAIL=10, SKIP=8)
> >> 
> >>
> >>
> >> so pip also seems to be reusing leftover files.
> >>
> >> all clear.
> >
> >
> > Running the statsmodels test suite, I

Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing

2014-04-24 Thread josef . pktd
On Thu, Apr 24, 2014 at 7:00 PM, Charles R Harris  wrote:

>
> Hi Matthew,
>
> On Thu, Apr 24, 2014 at 3:56 PM, Matthew Brett wrote:
>
>> Hi,
>>
>> Thanks to Carl Kleffner's toolchain and some help from Clint Whaley
>> (main author of ATLAS), I've built 64-bit windows numpy and scipy
>> wheels for testing.
>>
>> The build uses Carl's custom mingw-w64 build with static linking.
>>
>> There are two harmless test failures on scipy (being discussed on the
>> list at the moment) - tests otherwise clean.
>>
>> Wheels are here:
>>
>>
>> https://nipy.bic.berkeley.edu/scipy_installers/numpy-1.8.1-cp27-none-win_amd64.whl
>>
>> https://nipy.bic.berkeley.edu/scipy_installers/scipy-0.13.3-cp27-none-win_amd64.whl
>>
>> You can test with:
>>
>> pip install -U pip # to upgrade pip to latest
>> pip install -f https://nipy.bic.berkeley.edu/scipy_installers numpy scipy
>>
>> Please do send feedback.
>>
>> ATLAS binary here:
>>
>>
>> https://nipy.bic.berkeley.edu/scipy_installers/atlas_builds/atlas-64-full-sse2.tar.bz2
>>
>> Many thanks for Carl in particular for doing all the hard work,
>>
>>
> Cool. After all these long years... Now all we need is a box running tests
> for CI.
>

Very good news: after a 3-year interruption I might be able to build scipy
again, and switch now to a 64-bit development version.

Thanks for pushing for this, and doing all the hard work.

Josef


>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing

2014-04-24 Thread josef . pktd
On Thu, Apr 24, 2014 at 7:29 PM,  wrote:

>
>
>
> On Thu, Apr 24, 2014 at 7:20 PM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>>
>>
>>
>> On Thu, Apr 24, 2014 at 5:08 PM,  wrote:
>>
>>>
>>>
>>>
>>> On Thu, Apr 24, 2014 at 7:00 PM, Charles R Harris <
>>> charlesr.har...@gmail.com> wrote:
>>>

 Hi Matthew,

 On Thu, Apr 24, 2014 at 3:56 PM, Matthew Brett >>> > wrote:

> Hi,
>
> Thanks to Carl Kleffner's toolchain and some help from Clint Whaley
> (main author of ATLAS), I've built 64-bit windows numpy and scipy
> wheels for testing.
>
> The build uses Carl's custom mingw-w64 build with static linking.
>
> There are two harmless test failures on scipy (being discussed on the
> list at the moment) - tests otherwise clean.
>
> Wheels are here:
>
>
> https://nipy.bic.berkeley.edu/scipy_installers/numpy-1.8.1-cp27-none-win_amd64.whl
>
> https://nipy.bic.berkeley.edu/scipy_installers/scipy-0.13.3-cp27-none-win_amd64.whl
>
> You can test with:
>
> pip install -U pip # to upgrade pip to latest
> pip install -f https://nipy.bic.berkeley.edu/scipy_installers numpy
> scipy
>
> Please do send feedback.
>
> ATLAS binary here:
>
>
> https://nipy.bic.berkeley.edu/scipy_installers/atlas_builds/atlas-64-full-sse2.tar.bz2
>
> Many thanks for Carl in particular for doing all the hard work,
>
>
 Cool. After all these long years... Now all we need is a box running
 tests for CI.

 Chuck

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


>>> I get two test failures with numpy
>>>
>>> Josef
>>>
>>> >>> np.test()
>>> Running unit tests for numpy
>>> NumPy version 1.8.1
>>> NumPy is installed in C:\Python27\lib\site-packages\numpy
>>> Python version 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit
>>> (AMD64)]
>>> nose version 1.1.2
>>>
>>> ==
>>> FAIL: test_iterator.test_iter_broadcasting_errors
>>> --
>>> Traceback (most recent call last):
>>>   File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest
>>> self.test(*self.arg)
>>>   File
>>> "C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py", line
>>> 657, in test_iter_broadcasting_errors
>>> '(2)->(2,newaxis)') % msg)
>>>   File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 44,
>>> in assert_
>>> raise AssertionError(msg)
>>> AssertionError: Message "operands could not be broadcast together with
>>> remapped shapes [original->remapped]: (2,3)->(2,3) (2,)->(2,newaxis) and
>>> requested shape (4,3)" doesn't contain remapped operand
>>> shape(2)->(2,newaxis)
>>>
>>> ==
>>> FAIL: test_iterator.test_iter_array_cast
>>> --
>>> Traceback (most recent call last):
>>>   File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest
>>> self.test(*self.arg)
>>>   File
>>> "C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py", line
>>> 836, in test_iter_array_cast
>>> assert_equal(i.operands[0].strides, (-96,8,-32))
>>>   File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 255,
>>> in assert_equal
>>> assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k, err_msg),
>>> verbose)
>>>   File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 317,
>>> in assert_equal
>>> raise AssertionError(msg)
>>> AssertionError:
>>> Items are not equal:
>>> item=0
>>>
>>>  ACTUAL: 96L
>>>  DESIRED: -96
>>>
>>> --
>>> Ran 4828 tests in 46.306s
>>>
>>> FAILED (KNOWNFAIL=10, SKIP=8, failures=2)
>>> 
>>>
>>>
>>  Strange. That second one looks familiar, at least the "-96" part. Wonder
>> why this doesn't show up with the MKL builds.
>>
>
> ok tried again, this time deleting the old numpy directories before
> installing
>
> Ran 4760 tests in 42.124s
>
> OK (KNOWNFAIL=10, SKIP=8)
> 
>
>
> so pip also seems to be reusing leftover files.
>
> all clear.
>

Running the statsmodels test suite, I get a failure
in test_discrete.TestProbitCG where fmin_cg converges to something that
differs in the 3rd decimal.

I usually only test the 32-bit version, so I don't know if this is specific
to this scipy version, but we haven't seen this in a long time.
I used our nightly binaries http://statsmodels.sourceforge.net/binaries/

Josef




>
> Josef
>
>
>
>
>>
>> Chuck
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>

Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing

2014-04-24 Thread josef . pktd
>
>
>
OT:   Oh, I hate pip and packages that require numpy

E:\tmp>C:\Python27\python C:\Python27\Scripts\pip-script.py install -U patsy
Downloading/unpacking patsy
  Running setup.py
(path:c:\users\josef\appdata\local\temp\pip_build_josef\patsy\setup.py)
egg_info for package patsy

no previously-included directories found matching 'doc\_build'
Downloading/unpacking numpy from
https://pypi.python.org/packages/source/n/numpy/numpy-1.8.1.tar.gz#md5=be95babe263bfa3428363d6db5b64678
(from patsy)


  Found existing installation: numpy 1.6.2
Uninstalling numpy:
  Successfully uninstalled numpy
  Running setup.py install for numpy
...

...

C:\Python27\lib\distutils\dist.py:267: UserWarning: Unknown distribution
option: 'define_macros'

  warnings.warn(msg)

error: Unable to find vcvarsall.bat


  Rolling back uninstall of numpy
Cleaning up...


-
user error ?
I have a stale numpy-1.6.2-py2.7.egg-info file in site-packages

but that's a nice new feature of pip   "Rolling back uninstall of numpy"
numpy is still here

Josef


Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing

2014-04-24 Thread josef . pktd
On Thu, Apr 24, 2014 at 7:20 PM, Charles R Harris  wrote:

>
>
>
> On Thu, Apr 24, 2014 at 5:08 PM,  wrote:
>
>>
>>
>>
>> On Thu, Apr 24, 2014 at 7:00 PM, Charles R Harris <
>> charlesr.har...@gmail.com> wrote:
>>
>>>
>>> Hi Matthew,
>>>
>>> On Thu, Apr 24, 2014 at 3:56 PM, Matthew Brett 
>>> wrote:
>>>
 Hi,

 Thanks to Carl Kleffner's toolchain and some help from Clint Whaley
 (main author of ATLAS), I've built 64-bit windows numpy and scipy
 wheels for testing.

 The build uses Carl's custom mingw-w64 build with static linking.

 There are two harmless test failures on scipy (being discussed on the
 list at the moment) - tests otherwise clean.

 Wheels are here:


 https://nipy.bic.berkeley.edu/scipy_installers/numpy-1.8.1-cp27-none-win_amd64.whl

 https://nipy.bic.berkeley.edu/scipy_installers/scipy-0.13.3-cp27-none-win_amd64.whl

 You can test with:

 pip install -U pip # to upgrade pip to latest
 pip install -f https://nipy.bic.berkeley.edu/scipy_installers numpy
 scipy

 Please do send feedback.

 ATLAS binary here:


 https://nipy.bic.berkeley.edu/scipy_installers/atlas_builds/atlas-64-full-sse2.tar.bz2

 Many thanks for Carl in particular for doing all the hard work,


>>> Cool. After all these long years... Now all we need is a box running
>>> tests for CI.
>>>
>>> Chuck
>>>
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>>
>> I get two test failures with numpy
>>
>> Josef
>>
>> >>> np.test()
>> Running unit tests for numpy
>> NumPy version 1.8.1
>> NumPy is installed in C:\Python27\lib\site-packages\numpy
>> Python version 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit
>> (AMD64)]
>> nose version 1.1.2
>>
>> ==
>> FAIL: test_iterator.test_iter_broadcasting_errors
>> --
>> Traceback (most recent call last):
>>   File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest
>> self.test(*self.arg)
>>   File "C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py",
>> line 657, in test_iter_broadcasting_errors
>> '(2)->(2,newaxis)') % msg)
>>   File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 44,
>> in assert_
>> raise AssertionError(msg)
>> AssertionError: Message "operands could not be broadcast together with
>> remapped shapes [original->remapped]: (2,3)->(2,3) (2,)->(2,newaxis) and
>> requested shape (4,3)" doesn't contain remapped operand
>> shape(2)->(2,newaxis)
>>
>> ==
>> FAIL: test_iterator.test_iter_array_cast
>> --
>> Traceback (most recent call last):
>>   File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest
>> self.test(*self.arg)
>>   File "C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py",
>> line 836, in test_iter_array_cast
>> assert_equal(i.operands[0].strides, (-96,8,-32))
>>   File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 255,
>> in assert_equal
>> assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k, err_msg),
>> verbose)
>>   File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 317,
>> in assert_equal
>> raise AssertionError(msg)
>> AssertionError:
>> Items are not equal:
>> item=0
>>
>>  ACTUAL: 96L
>>  DESIRED: -96
>>
>> --
>> Ran 4828 tests in 46.306s
>>
>> FAILED (KNOWNFAIL=10, SKIP=8, failures=2)
>> 
>>
>>
> Strange. That second one looks familiar, at least the "-96" part. Wonder
> why this doesn't show up with the MKL builds.
>

ok tried again, this time deleting the old numpy directories before
installing

Ran 4760 tests in 42.124s

OK (KNOWNFAIL=10, SKIP=8)



so pip also seems to be reusing leftover files.

all clear.

Josef




>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing

2014-04-24 Thread josef . pktd
On Thu, Apr 24, 2014 at 7:08 PM,  wrote:

>
>
>
> On Thu, Apr 24, 2014 at 7:00 PM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>>
>> Hi Matthew,
>>
>> On Thu, Apr 24, 2014 at 3:56 PM, Matthew Brett 
>> wrote:
>>
>>> Hi,
>>>
>>> Thanks to Carl Kleffner's toolchain and some help from Clint Whaley
>>> (main author of ATLAS), I've built 64-bit windows numpy and scipy
>>> wheels for testing.
>>>
>>> The build uses Carl's custom mingw-w64 build with static linking.
>>>
>>> There are two harmless test failures on scipy (being discussed on the
>>> list at the moment) - tests otherwise clean.
>>>
>>> Wheels are here:
>>>
>>>
>>> https://nipy.bic.berkeley.edu/scipy_installers/numpy-1.8.1-cp27-none-win_amd64.whl
>>>
>>> https://nipy.bic.berkeley.edu/scipy_installers/scipy-0.13.3-cp27-none-win_amd64.whl
>>>
>>> You can test with:
>>>
>>> pip install -U pip # to upgrade pip to latest
>>> pip install -f https://nipy.bic.berkeley.edu/scipy_installers numpy
>>> scipy
>>>
>>> Please do send feedback.
>>>
>>> ATLAS binary here:
>>>
>>>
>>> https://nipy.bic.berkeley.edu/scipy_installers/atlas_builds/atlas-64-full-sse2.tar.bz2
>>>
>>> Many thanks for Carl in particular for doing all the hard work,
>>>
>>>
>> Cool. After all these long years... Now all we need is a box running
>> tests for CI.
>>
>> Chuck
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
> I get two test failures with numpy
>

scipy looks good, just two powell trace failures

Josef


>
> Josef
>
> >>> np.test()
> Running unit tests for numpy
> NumPy version 1.8.1
> NumPy is installed in C:\Python27\lib\site-packages\numpy
> Python version 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit
> (AMD64)]
> nose version 1.1.2
>
> ==
> FAIL: test_iterator.test_iter_broadcasting_errors
> --
> Traceback (most recent call last):
>   File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest
> self.test(*self.arg)
>   File "C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py",
> line 657, in test_iter_broadcasting_errors
> '(2)->(2,newaxis)') % msg)
>   File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 44, in
> assert_
> raise AssertionError(msg)
> AssertionError: Message "operands could not be broadcast together with
> remapped shapes [original->remapped]: (2,3)->(2,3) (2,)->(2,newaxis) and
> requested shape (4,3)" doesn't contain remapped operand
> shape(2)->(2,newaxis)
>
> ==
> FAIL: test_iterator.test_iter_array_cast
> --
> Traceback (most recent call last):
>   File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest
> self.test(*self.arg)
>   File "C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py",
> line 836, in test_iter_array_cast
> assert_equal(i.operands[0].strides, (-96,8,-32))
>   File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 255,
> in assert_equal
> assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k, err_msg),
> verbose)
>   File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 317,
> in assert_equal
> raise AssertionError(msg)
> AssertionError:
> Items are not equal:
> item=0
>
>  ACTUAL: 96L
>  DESIRED: -96
>
> --
> Ran 4828 tests in 46.306s
>
> FAILED (KNOWNFAIL=10, SKIP=8, failures=2)
> 
>
>
>


Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing

2014-04-24 Thread josef . pktd
On Thu, Apr 24, 2014 at 7:00 PM, Charles R Harris  wrote:

>
> Hi Matthew,
>
> On Thu, Apr 24, 2014 at 3:56 PM, Matthew Brett wrote:
>
>> Hi,
>>
>> Thanks to Carl Kleffner's toolchain and some help from Clint Whaley
>> (main author of ATLAS), I've built 64-bit windows numpy and scipy
>> wheels for testing.
>>
>> The build uses Carl's custom mingw-w64 build with static linking.
>>
>> There are two harmless test failures on scipy (being discussed on the
>> list at the moment) - tests otherwise clean.
>>
>> Wheels are here:
>>
>>
>> https://nipy.bic.berkeley.edu/scipy_installers/numpy-1.8.1-cp27-none-win_amd64.whl
>>
>> https://nipy.bic.berkeley.edu/scipy_installers/scipy-0.13.3-cp27-none-win_amd64.whl
>>
>> You can test with:
>>
>> pip install -U pip # to upgrade pip to latest
>> pip install -f https://nipy.bic.berkeley.edu/scipy_installers numpy scipy
>>
>> Please do send feedback.
>>
>> ATLAS binary here:
>>
>>
>> https://nipy.bic.berkeley.edu/scipy_installers/atlas_builds/atlas-64-full-sse2.tar.bz2
>>
>> Many thanks for Carl in particular for doing all the hard work,
>>
>>
> Cool. After all these long years... Now all we need is a box running tests
> for CI.
>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
I get two test failures with numpy

Josef

>>> np.test()
Running unit tests for numpy
NumPy version 1.8.1
NumPy is installed in C:\Python27\lib\site-packages\numpy
Python version 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit
(AMD64)]
nose version 1.1.2

==
FAIL: test_iterator.test_iter_broadcasting_errors
--
Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest
self.test(*self.arg)
  File "C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py",
line 657, in test_iter_broadcasting_errors
'(2)->(2,newaxis)') % msg)
  File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 44, in
assert_
raise AssertionError(msg)
AssertionError: Message "operands could not be broadcast together with
remapped shapes [original->remapped]: (2,3)->(2,3) (2,)->(2,newaxis) and
requested shape (4,3)" doesn't contain remapped operand
shape(2)->(2,newaxis)

==
FAIL: test_iterator.test_iter_array_cast
--
Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest
self.test(*self.arg)
  File "C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py",
line 836, in test_iter_array_cast
assert_equal(i.operands[0].strides, (-96,8,-32))
  File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 255, in
assert_equal
assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k, err_msg),
verbose)
  File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 317, in
assert_equal
raise AssertionError(msg)
AssertionError:
Items are not equal:
item=0

 ACTUAL: 96L
 DESIRED: -96

--
Ran 4828 tests in 46.306s

FAILED (KNOWNFAIL=10, SKIP=8, failures=2)



Re: [Numpy-discussion] 1.9.x branch

2014-04-23 Thread josef . pktd
On Wed, Apr 23, 2014 at 5:32 AM, Sebastian Berg
 wrote:
> On Di, 2014-04-22 at 15:35 -0600, Charles R Harris wrote:
>> Hi All,
>>
>>
>> I'd like to branch 1.9.x at the end of the month. There are a couple
>> of reasons for the timing. First, we have a lot of new stuff in the
>> development branch. Second, there is work ongoing in masked arrays
>> that I'd like to keep out of the release so that it has more time to
>> settle. Third, it's past time ;)
>
> Sounds good.
>
>> There are currently a number of 1.9.0 blockers, which can be seen
>> here.
>>
>> Datetime timezone handling broken in 1.7.x
>>
>> I don't think there is time to get this done for 1.9.0 and it needs to
>> be pushed off to 1.10.0.
>>
>> Return multiple field selection as ro view
>>
>> I have a branch for this, but because the returned type differs from a
>> copy by alignment spacing there was a test failure. Merging that
>> branch might cause some incompatibilities.
>>
>
> I am a bit worried here that comparisons might make trouble.
>
>> Object array creation new conversion to int
>>
>>
>> This one needs a decision. Julian, Sebastian, thoughts?
>>
>
> Maybe for all to consider this is about what happens for object arrays
> if you do things like:
>
> # Array cast to object array (np.array(arr) would be identical):
> a = np.arange(10).astype(object)
> # Array passed into new array creation (not just *one* array):
> b = np.array([np.arange(10)], dtype=object)
> # Numerical array is assigned to object array:
> c = np.empty(10, dtype=object)
> c[...] = np.arange(10)
>
> Before this change, the result was:
> type(a[0]) is int
> type(b[0,0]) is np.int_  # Note the numpy type
> type(c[0]) is int
>
> After this change, they are all `int`. Though note that the numpy type
> is preserved for example for long double. On the one hand preserving the
> numpy type might be nice, but on the other hand we don't care much about
> the dtypes of scalars and in practice the python types are probably more
> often wanted.

what if I don't like python?

>>> np.int_(0)**(-1)
inf
>>> 0**-1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
0**-1
ZeroDivisionError: 0.0 cannot be raised to a negative power


>>> type(np.arange(5)[0])
<type 'numpy.int32'>
>>> np.arange(5)[0]**-1
inf

>>> type(np.arange(5)[0].item())
<type 'int'>
>>> np.arange(5)[0].item()**-1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
np.arange(5)[0].item()**-1
ZeroDivisionError: 0.0 cannot be raised to a negative power

>>> np.__version__
'1.6.1'


I remember struggling through this (avoiding python operations) quite
a bit in my early bugfixes to scipy.stats.distributions.

(IIRC I ended up avoiding most ints.)

Josef
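The distinction Josef relies on, a numpy integer scalar versus a plain Python int, can still be demonstrated. Note this is a sketch of only the stable parts of the behavior: on numpy >= 1.12 a negative integer power of a numpy integer raises ValueError rather than returning inf as in the numpy 1.6 transcript above.

```python
import numpy as np

a = np.arange(5)
np_scalar = a[0]       # numpy integer scalar (np.int32/np.int64 depending on platform)
py_int = a[0].item()   # plain Python int

assert isinstance(np_scalar, np.integer)
assert isinstance(py_int, int)

# A plain Python int refuses a negative power of zero outright.
try:
    py_int ** -1
    raised = False
except ZeroDivisionError:
    raised = True
assert raised
```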

>
> Since I just realized that things are safe (float128 does not cast to
> float after all), I changed my mind and am tempted to keep the new
> behaviour. That is, if it does not create any problems (there was some
> issue in scipy, not sure how bad).
>
> - Sebastian
>
>> Median of np.matrix is broken
>>
>>
>> Not sure what the status of this one is.
>>
>> 1.8 deprecations: Follow-up ticket
>>
>>
>> Things that should perhaps be removed.
>>
>> ERROR: test_big_arrays (test_io.TestSavezLoad) on OS X + Python 3.3
>>
>>
>> I believe this one was fixed. For general problems reading/writing big
>> files on OS X, I believe they were fixed in Maverick and I'm inclined
>> to recommend an OS upgrade rather than work to chunk all the io.
>>
>> Deprecate NPY_CHAR
>> This one is waiting on a fix from Pearu to make f2py use numpy
>> strings. I think we will need to do this ourselves if we want to carry
>> through the deprecation. In any case it probably needs to be pushed
>> off to 1.10.
>>
>> 1.7 deprecations: Follow-up ticket
>> Some of these changes have been made, ro diagonal view for instance,
>> some still remain.
>>
>>
>>
>> Additions, updates, and thoughts welcome.
>>
>>
>> Chuck
>>
>>
>>
>>
>>
>>


Re: [Numpy-discussion] PEP 465 has been accepted / volunteers needed

2014-04-10 Thread josef . pktd
On Thu, Apr 10, 2014 at 4:43 AM, Francesc Alted  wrote:
> On 4/9/14, 10:46 PM, Chris Barker wrote:
>> On Tue, Apr 8, 2014 at 11:14 AM, Nathaniel Smith wrote:
>>
>> Thank you! Though I suspect that the most important part of my
>> contribution may have just been my high tolerance for writing emails
>> ;-).
>>
>>
>> no -- it's your high tolerance for _reading_ emails...
>>
>> Far too many of us have a high tolerance for writing them!
>
> Ha ha, very true!

I'm trying, but it's hard.
Sometimes avoiding the `send` button helps. I still have a collection
of draft replies.

Josef
"I'm not going to send this reply" -- oops, did it again

>
> --
> Francesc Alted
>


Re: [Numpy-discussion] Where to put versionadded

2014-04-04 Thread josef . pktd
On Fri, Apr 4, 2014 at 3:33 PM,   wrote:
> On Fri, Apr 4, 2014 at 3:07 PM, Matthew Brett  wrote:
>> Hi,
>>
>> On Fri, Apr 4, 2014 at 11:54 AM, Charles R Harris
>>  wrote:
>>>
>>>
>>>
>>> On Fri, Apr 4, 2014 at 12:48 PM,  wrote:

 On Fri, Apr 4, 2014 at 2:12 PM, Charles R Harris
  wrote:
 > Hi All,
 >
 > Currently there are several placements of the '.. versionadded::'
 > directive
 > and I'd like to settle
 > on a proper style for consistency. There are two occasions on which it
 > is
 > used, first, when a new function or class is added and second, when a
 > new
 > keyword is added to an existing function or method. The options are as
 > follows.
 >
 > New Function
 >
 > 1) Originally, the directive was added in the notes section.
 >
 > Notes
 > -
 > .. versionadded:: 1.5.0
 >
 >  2) Alternatively, it is placed after the extended summary.
 >
 > blah, blah
 >
 > ..versionadded:: 1.5.0
 >
 > Between these two, I vote for 2) because the version is easily found
 > when
 > reading the documentation either in a terminal or rendered into HTML.
 >
 > New Parameter
 >
 > 1) It is placed before the parameter description
 >
 > newoption : int, optional
 > .. versionadded:: 1.5.0
 > blah.
 >
 > 2) It is placed after the parameter description.
 >
 > newoption : int, optional
 > blah.
 >
 > .. versionadded:: 1.5.0
 >
 > Both of these render correctly, but the first is more compact while the
 > second puts the version
 > after the description where it doesn't interrupt the reading. I'm
 > tending
 > towards 1) on account of its compactness.
 >
 > Thoughts?

 I'm in favor of putting them only in the Notes section.

 Most of the time they are not "crucial" information and it's
 distracting.  I usually only look for them when I'm working explicitly
 across several numpy versions.

 like in python: versionadded 2.1 is only interesting for historians.

since I like history:

AFAICS:

arraysetops was changed in 1.4

histogram was added in 0.4
corrcoef was added in 0.9.2

numpy 0.9.2 is 8 years old
python 2.1 will soon have its 13th anniversary

Josef


>>>
>>>
>>> I find the opposite to be true. Because numpy needs to maintain compatibility
>>> with a number of python versions, I often check the python documentation to see
>>> in which version a function was added.
>>
>> I agree; versionadded 2.1 is not likely interesting but versionadded
>> 2.7 is very interesting.
>
> That's true, but it's a mess for maintainers because we support now 5
> python versions.
>
> numpy doesn't have a long history of versionadded yet, I didn't find
> anything for 1.3 in a quick search.
> statsmodels has now numpy 1.6 as minimum requirement and I'm
> interested in the features that become available with a minimum
> version increase.
> Once I know what I'm allowed to use, I only care about the "real"
> documentation, "How does einsum really work?".
>
> But as a numpy user, I was never really interested in the information
> that arraysetops were enhanced and renamed in numpy 1.x (x=?<4), or
> that take was added in 0.y, ...  Even the first part of polynomial is
> already in 1.4
> (It might just make me feel old if I remember when it was changed.)
>
> versionadded is not very distracting in the html rendering, so I'm
> just +0.1 on Notes.
>
> Josef
>
>>
>> Cheers,
>>
>> Matthew


Re: [Numpy-discussion] Where to put versionadded

2014-04-04 Thread josef . pktd
On Fri, Apr 4, 2014 at 3:07 PM, Matthew Brett  wrote:
> Hi,
>
> On Fri, Apr 4, 2014 at 11:54 AM, Charles R Harris
>  wrote:
>>
>>
>>
>> On Fri, Apr 4, 2014 at 12:48 PM,  wrote:
>>>
>>> On Fri, Apr 4, 2014 at 2:12 PM, Charles R Harris
>>>  wrote:
>>> > Hi All,
>>> >
>>> > Currently there are several placements of the '.. versionadded::'
>>> > directive
>>> > and I'd like to settle
>>> > on a proper style for consistency. There are two occasions on which it
>>> > is
>>> > used, first, when a new function or class is added and second, when a
>>> > new
>>> > keyword is added to an existing function or method. The options are as
>>> > follows.
>>> >
>>> > New Function
>>> >
>>> > 1) Originally, the directive was added in the notes section.
>>> >
>>> > Notes
>>> > -
>>> > .. versionadded:: 1.5.0
>>> >
>>> >  2) Alternatively, it is placed after the extended summary.
>>> >
>>> > blah, blah
>>> >
>>> > ..versionadded:: 1.5.0
>>> >
>>> > Between these two, I vote for 2) because the version is easily found
>>> > when
>>> > reading the documentation either in a terminal or rendered into HTML.
>>> >
>>> > New Parameter
>>> >
>>> > 1) It is placed before the parameter description
>>> >
>>> > newoption : int, optional
>>> > .. versionadded:: 1.5.0
>>> > blah.
>>> >
>>> > 2) It is placed after the parameter description.
>>> >
>>> > newoption : int, optional
>>> > blah.
>>> >
>>> > .. versionadded:: 1.5.0
>>> >
>>> > Both of these render correctly, but the first is more compact while the
>>> > second puts the version
>>> > after the description where it doesn't interrupt the reading. I'm
>>> > tending
>>> > towards 1) on account of its compactness.
>>> >
>>> > Thoughts?
>>>
>>> I'm in favor of putting them only in the Notes section.
>>>
>>> Most of the time they are not "crucial" information and it's
>>> distracting.  I usually only look for them when I'm working explicitly
>>> across several numpy versions.
>>>
>>> like in python: versionadded 2.1 is only interesting for historians.
>>
>>
>> I find the opposite to be true. Because numpy needs to maintain compatibility
>> with a number of python versions, I often check the python documentation to see
>> in which version a function was added.
>
> I agree; versionadded 2.1 is not likely interesting but versionadded
> 2.7 is very interesting.

That's true, but it's a mess for maintainers because we support now 5
python versions.

numpy doesn't have a long history of versionadded yet, I didn't find
anything for 1.3 in a quick search.
statsmodels has now numpy 1.6 as minimum requirement and I'm
interested in the features that become available with a minimum
version increase.
Once I know what I'm allowed to use, I only care about the "real"
documentation, "How does einsum really work?".

But as a numpy user, I was never really interested in the information
that arraysetops were enhanced and renamed in numpy 1.x (x=?<4), or
that take was added in 0.y, ...  Even the first part of polynomial is
already in 1.4
(It might just make me feel old if I remember when it was changed.)

versionadded is not very distracting in the html rendering, so I'm
just +0.1 on Notes.

Josef

>
> Cheers,
>
> Matthew


Re: [Numpy-discussion] Where to put versionadded

2014-04-04 Thread josef . pktd
On Fri, Apr 4, 2014 at 2:12 PM, Charles R Harris
 wrote:
> Hi All,
>
> Currently there are several placements of the '.. versionadded::' directive
> and I'd like to settle
> on a proper style for consistency. There are two occasions on which it is
> used, first, when a new function or class is added and second, when a new
> keyword is added to an existing function or method. The options are as
> follows.
>
> New Function
>
> 1) Originally, the directive was added in the notes section.
>
> Notes
> -
> .. versionadded:: 1.5.0
>
>  2) Alternatively, it is placed after the extended summary.
>
> blah, blah
>
> ..versionadded:: 1.5.0
>
> Between these two, I vote for 2) because the version is easily found when
> reading the documentation either in a terminal or rendered into HTML.
>
> New Parameter
>
> 1) It is placed before the parameter description
>
> newoption : int, optional
> .. versionadded:: 1.5.0
> blah.
>
> 2) It is placed after the parameter description.
>
> newoption : int, optional
> blah.
>
> .. versionadded:: 1.5.0
>
> Both of these render correctly, but the first is more compact while the
> second puts the version
> after the description where it doesn't interrupt the reading. I'm tending
> towards 1) on account of its compactness.
>
> Thoughts?

I'm in favor of putting them only in the Notes section.

Most of the time they are not "crucial" information and it's
distracting.  I usually only look for them when I'm working explicitly
across several numpy versions.

like in python: versionadded 2.1 is only interesting for historians.

Josef
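For concreteness, a hypothetical docstring (function name and version number invented for illustration) combining the "after the extended summary" placement for a new function with the "after the description" placement for a new parameter might look like:

```python
def fancy_option(a, newoption=0):
    """Do something with `a`.

    Extended summary, blah, blah.

    .. versionadded:: 1.5.0

    Parameters
    ----------
    a : array_like
        Input data.
    newoption : int, optional
        Description of the new keyword.

        .. versionadded:: 1.5.0
    """
    return a + newoption
```

Both directives render correctly under numpydoc; only their position relative to the surrounding text differs between the options discussed above.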

>
> Chuck
>
>
>


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for "ddof" default value

2014-04-04 Thread josef . pktd
On Fri, Apr 4, 2014 at 8:50 AM, Daπid  wrote:
>
> On 2 April 2014 16:06, Sturla Molden  wrote:
>>
>>  wrote:
>>
>> > pandas came later and thought ddof=1 is worth more than consistency.
>>
>> Pandas is a data analysis package. NumPy is a numerical array package.
>>
>> I think ddof=1 is justified for Pandas, for consistency with statistical
>> software (SPSS et al.)
>>
>> For NumPy, there are many computational tasks where the Bessel correction
>> is not wanted, so providing an uncorrected result is the correct thing to
>> do. NumPy should be a low-level array library that does very little magic.
>
>
> All this discussion reminds me of the book "Numerical Recipes":
>
> "if the difference between N and N − 1 ever matters to you, then you
> are probably up to no good anyway — e.g., trying to substantiate a
> questionable
> hypothesis with marginal data."
>
> For any reasonably sized data set, it is a correction in the second
> significant figure.

I fully agree, but sometimes you don't have much choice.

`big data` == `statistics with negative degrees of freedom` ?

or maybe

`machine learning` == `statistics with negative degrees of freedom` ?

Josef

>


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for "ddof" default value

2014-04-03 Thread josef . pktd
On Thu, Apr 3, 2014 at 2:21 PM, Bago  wrote:
>
>
>
>>
>> Sturla
>>
>> P.S. Personally I am not convinced "unbiased" is ever a valid argument, as
>> the biased estimator has smaller error. This is from experience in
>> marksmanship: I'd rather shoot a tight series with small systematic error
>> than scatter my bullets wildly but "unbiased" on the target. It is the
>> total error that counts. The series with smallest total error gets the
>> best
>> score. It is better to shoot two series and calibrate the sight in between
>> than use a calibration-free sight that doesn't allow us to aim. It is the
>> think classical statistics got this one wrong. Unbiased is never a virtue,
>> but the smallest error is. Thus, if we are to repeat an experiment, we
>> should calibrate our estimator just like a marksman calibrates his sight.
>> But the aim should always be calibrated to give the smallest error, not an
>> unbiased scatter. No one in their right mind would claim a shotgun is more
>> precise than a rifle because it has smaller bias. But that is what
>> applying
>> the Bessel correction implies.
>>
>
> I agree with the point, and what makes it even worse is that ddof=1 does not
> even produce an unbiased standard deviation estimate. I produces an unbiased
> variance estimate but the sqrt of this variance estimate is a biased
> standard deviation estimate,
> http://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation.

But ddof=1 still produces a smaller bias than ddof=0

I think the main point in stats is that without ddof, the variance
will be too small and t-test or similar will be liberal in small
samples, or confidence intervals will be too short.
(for statisticians that prefer to have tests that maintain their level
and prefer to err on the "conservative" side.)

Josef
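Bago's point, that ddof=1 unbiases the variance but not the standard deviation, is easy to verify by simulation. The sketch below uses the modern numpy Generator API, which postdates this thread; the thresholds are loose enough to hold with high probability for this seed and sample size.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 200000

# Many small samples from a N(0, 1) population (true variance = true std = 1).
samples = rng.normal(0.0, 1.0, size=(reps, n))
var1 = samples.var(axis=1, ddof=1)  # Bessel-corrected variance estimates
std1 = samples.std(axis=1, ddof=1)  # their square roots

# The ddof=1 variance is (approximately) unbiased ...
assert abs(var1.mean() - 1.0) < 0.02
# ... but its square root still underestimates the true std (Jensen's
# inequality: E[sqrt(V)] < sqrt(E[V]) for a non-degenerate V).
assert std1.mean() < 0.98
```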


>
> Bago
>


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for "ddof" default value

2014-04-03 Thread josef . pktd
On Wed, Apr 2, 2014 at 10:06 AM, Sturla Molden  wrote:
>  wrote:
>
>> pandas came later and thought ddof=1 is worth more than consistency.
>
> Pandas is a data analysis package. NumPy is a numerical array package.
>
> I think ddof=1 is justified for Pandas, for consistency with statistical
> software (SPSS et al.)
>
> For NumPy, there are many computational tasks where the Bessel correction
> is not wanted, so providing an uncorrected result is the correct thing to
> do. NumPy should be a low-level array library that does very little magic.
>
> Those who need the Bessel correction can multiply with sqrt(n/float(n-1))
> or specify ddof. Bu that belongs in the docs.
>
>
> Sturla
>
> P.S. Personally I am not convinced "unbiased" is ever a valid argument, as
> the biased estimator has smaller error. This is from experience in
> marksmanship: I'd rather shoot a tight series with small systematic error
> than scatter my bullets wildly but "unbiased" on the target. It is the
> total error that counts. The series with smallest total error gets the best
> score. It is better to shoot two series and calibrate the sight in between
> than use a calibration-free sight that doesn't allow us to aim.

calibration == bias correction ?

That's why I
> think classical statistics got this one wrong. Unbiased is never a virtue,
> but the smallest error is. Thus, if we are to repeat an experiment, we
> should calibrate our estimator just like a marksman calibrates his sight.
> But the aim should always be calibrated to give the smallest error, not an
> unbiased scatter. No one in their right mind would claim a shotgun is more
> precise than a rifle because it has smaller bias. But that is what applying
> the Bessel correction implies.

https://www.youtube.com/watch?v=i4xcEZZDW_I


I spent several days trying to figure out what Stata is doing for
small sample corrections to reduce the bias of the rejection interval
with "uncorrected" variance estimates.

Josef
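Sturla's suggested correction factor sqrt(n/(n-1)) is an exact algebraic identity with ddof=1, not an approximation, which is easy to check on a small made-up data set:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = x.size

# numpy's default is the uncorrected (ddof=0) estimate ...
s0 = np.std(x)            # sqrt(sum((x - mean)**2) / n); equals 2.0 here
# ... the Bessel-corrected one can be requested explicitly ...
s1 = np.std(x, ddof=1)    # sqrt(sum((x - mean)**2) / (n - 1))
# ... or recovered from the default by the factor quoted above.
assert np.isclose(s1, s0 * np.sqrt(n / (n - 1.0)))
```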

>


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for "ddof" default value

2014-04-01 Thread josef . pktd
On Tue, Apr 1, 2014 at 5:11 PM, Nathaniel Smith  wrote:
> On Tue, Apr 1, 2014 at 9:51 PM, Ralf Gommers  wrote:
>>
>>
>>
>> On Tue, Apr 1, 2014 at 10:08 PM, Nathaniel Smith  wrote:
>>>
>>> On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden 
>>> wrote:
>>> > Haslwanter Thomas  wrote:
>>> >
>>> >> Personally I cannot think of many applications where it would be
>>> >> desired
>>> >> to calculate the standard deviation with ddof=0. In addition, I feel
>>> >> that
>>> >> there should be consistency between standard modules such as numpy,
>>> >> scipy, and pandas.
>>> >
>>> > ddof=0 is the maxiumum likelihood estimate. It is also needed in
>>> > Bayesian
>>> > estimation.
>>>
>>> It's true, but the counter-arguments are also strong. And regardless
>>> of whether ddof=1 or ddof=0 is better, surely the same one is better
>>> for both numpy and scipy.
>>
>> If we could still choose here without any costs, obviously that's true. This
>> particular ship sailed a long time ago though. By the way, there isn't even
>> a `scipy.stats.std`, so we're comparing with differently named functions
>> (nanstd for example).
>
> Presumably nanstd is a lot less heavily used than std, and presumably
> people expect 'nanstd' to be a 'nan' version of 'std' -- what do you
> think of changing nanstd to ddof=0 to match numpy? (With appropriate
> FutureWarning transition, etc.)

numpy is numpy, a numerical library
scipy.stats is stats and behaves differently.  (axis=0)

nanstd in scipy.stats will hopefully also go away soon, so I don't
think it's worth changing there either.

pandas came later and thought ddof=1 is worth more than consistency.

I don't think ddof defaults are worth jumping through deprecation hoops.

(bias in cov, corrcoef is "non-standard" ddof)

Josef


>
> --
> Nathaniel J. Smith
> Postdoctoral researcher - Informatics - University of Edinburgh
> http://vorpus.org


Re: [Numpy-discussion] Changes to np.vander

2014-03-29 Thread josef . pktd
On Sat, Mar 29, 2014 at 7:31 AM,   wrote:
> On Sat, Mar 29, 2014 at 12:12 AM, Jaime Fernández del Río
>  wrote:
>> Hi,
>>
>> I have submitted a PR (https://github.com/numpy/numpy/pull/4568) that speeds
>> up `np.vander` by using accumulated multiplication instead of exponentiation
>> to compute the Vandermonde matrix. For largish matrices the speed-ups can be
>> quite dramatic, over an order of magnitude.
>>
>> Julian has raised concerns on numerical stability and loss of precision,
>> which don't seem to be all that relevant. Do speak up if you think
>> otherwise.
>>
>> We are also discussing replacing a recently added kwarg, "order", which now
>> accepts a string, either "increasing" or "decreasing", to indicate the
>> ordering of the matrix columns. This was not present in 1.8, so can still be
>> modified. The proposal is to replace it with a "reversed" boolean flag.
>> Unfortunately, the return of np.vander in 1.8 and before is the opposite
>> (i.e. its reversed) from the standard definition, which has powers
>> increasing from left to right. So it is not clear what the reversed keyword
>> should refer to:
>>
>> 1. If it refers to the standard definition, then it would default to False
>> for backwards compatibility, but be consistent with the conventional
>> definition.
>>
>> 2. If it refers to the existing behavior of numpy's vander, then it would
>> default to True, and not be consistent with the conventional definition.
>>
>> I prefer option 1, but would like to hear other's opinions. Which could of
>> course include naming the boolean flag more ingeniously, or keeping the
>> string flag. If he's reading, I'd especially like to hear Warren Weckesser's
>> thoughts, as he is the one who added the "order" kwarg.
>
> "order" is not a good name, I would find it very confusing (I'm
> usually mixing up order and degree)
> http://en.wikipedia.org/wiki/Order_of_a_polynomial
>
> how about calling the keyword "increasing=False" ?
> which would avoid defining what it's reversed against.

Obviously I didn't read the PR before answering.

But it shows that `increasing` might be obvious, or that Warren and I
think the same way.

Josef

>
>
> I don't know about precision loss.
> There's a nasty NIST problem for polynomial regression. If that
> doesn't change much, I wouldn't worry about differences in precision
> for statistical applications.
>
> But the problem is for the regression part, and might not be affected
> much by the vander precision. (Besides it's an example for "Don't do
> that at home.")
> http://jpktd.blogspot.ca/2012/03/numerical-accuracy-in-linear-least.html
> http://en.wikibooks.org/wiki/Statistics:Numerical_Methods/Numerical_Comparison_of_Statistical_Software#Linear_Regression
>
> Josef
>
>>
>> Jaime
>>
>> --
>> (\__/)
>> ( O.o)
>> ( > <) This is Bunny. Copy Bunny into your signature and help him with his
>> plans for world domination.
>>


Re: [Numpy-discussion] Changes to np.vander

2014-03-29 Thread josef . pktd
On Sat, Mar 29, 2014 at 12:12 AM, Jaime Fernández del Río
 wrote:
> Hi,
>
> I have submitted a PR (https://github.com/numpy/numpy/pull/4568) that speeds
> up `np.vander` by using accumulated multiplication instead of exponentiation
> to compute the Vandermonde matrix. For largish matrices the speed-ups can be
> quite dramatic, over an order of magnitude.
>
> Julian has raised concerns on numerical stability and loss of precision,
> which don't seem to be all that relevant. Do speak up if you think
> otherwise.
>
> We are also discussing replacing a recently added kwarg, "order", which now
> accepts a string, either "increasing" or "decreasing", to indicate the
> ordering of the matrix columns. This was not present in 1.8, so can still be
> modified. The proposal is to replace it with a "reversed" boolean flag.
> Unfortunately, the return of np.vander in 1.8 and before is the opposite
> (i.e., it's reversed) from the standard definition, which has powers
> increasing from left to right. So it is not clear what the reversed keyword
> should refer to:
>
> 1. If it refers to the standard definition, then it would default to False
> for backwards compatibility, but be consistent with the conventional
> definition.
>
> 2. If it refers to the existing behavior of numpy's vander, then it would
> default to True, and not be consistent with the conventional definition.
>
> I prefer option 1, but would like to hear other's opinions. Which could of
> course include naming the boolean flag more ingeniously, or keeping the
> string flag. If he's reading, I'd especially like to hear Warren Weckesser's
> thoughts, as he is the one who added the "order" kwarg.

"order" is not a good name, I would find it very confusing (I'm
usually mixing up order and degree)
http://en.wikipedia.org/wiki/Order_of_a_polynomial

how about calling the keyword "increasing=False" ?
which would avoid defining what it's reversed against.


I don't know about precision loss.
There's a nasty NIST problem for polynomial regression. If that
doesn't change much, I wouldn't worry about differences in precision
for statistical applications.

But the problem is for the regression part, and might not be affected
much by the vander precision. (Besides it's an example for "Don't do
that at home.")
http://jpktd.blogspot.ca/2012/03/numerical-accuracy-in-linear-least.html
http://en.wikibooks.org/wiki/Statistics:Numerical_Methods/Numerical_Comparison_of_Statistical_Software#Linear_Regression

Josef
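The speed-up under discussion comes from building each Vandermonde column as a running product of the previous one instead of computing independent powers. A minimal sketch of the idea (not the actual PR implementation), checked against the `increasing` keyword that numpy eventually adopted:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 5.0])
n = 4

# Naive construction: independent exponentiations, powers increasing
# left to right (the "standard" definition in this thread).
v_pow = x[:, None] ** np.arange(n)

# Accumulated multiplication: seed column 0 with ones, put x in the
# remaining columns, then take a running product across the columns,
# so column j holds x**j without ever calling a power routine.
v_acc = np.ones((x.size, n))
v_acc[:, 1:] = x[:, None]
v_acc = np.cumprod(v_acc, axis=1)

assert np.allclose(v_acc, v_pow)
assert np.allclose(v_acc, np.vander(x, n, increasing=True))
```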

>
> Jaime
>
> --
> (\__/)
> ( O.o)
> ( > <) This is Bunny. Copy Bunny into your signature and help him with his
> plans for world domination.
>


Re: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe

2014-03-27 Thread josef . pktd
On Thu, Mar 27, 2014 at 9:59 AM, Olivier Grisel
 wrote:
> 2014-03-27 14:55 GMT+01:00  :
>> On Wed, Mar 26, 2014 at 5:17 PM, Olivier Grisel
>>  wrote:
>>> My understanding of Carl's effort is that the long term goal is to
>>> have official windows whl packages for both numpy and scipy published
>>> on PyPI with a builtin BLAS / LAPACK implementation so that users can
>>> do `pip install scipy` under windows and get something that just works
>>> without having to install any compiler (fortran or C) or any additional
>>> library manually.
>>>
>>> Most windows users are beginners and you cannot really expect them to
>>> understand how to build the whole scipy stack from source.
>>>
>>> The current solution (executable setup installers) is not optimal as
>>> it requires Administrator rights to run, does not resolve dependencies
>>> as pip does and cannot be installed in virtualenvs.
>>
>> as small related point:
>>
>> The official installers can be used to install in virtualenv
>> The way I do it:
>> Run the superpack, official installer, wait until it extracts the
>> correct (SSE) install exe, then cancel
>> Then easy_install the install exe file that has been extracted to the
>> temp folder into the virtualenv.
>>
>> I don't remember if the extraction already requires admin rights, but
>> I think not.
>> easy_install doesn't require any, IIRC.
>
> Hackish but interesting. Maybe the extraction can be done with generic
> tools like winzip?

I tried to open and unzip with WinRAR but couldn't make sense of the content.

BTW: easy_install for other installers like matplotlib also works
nicely for virtualenv



However, the official installers are only for 32-bit python, and I
appreciate all the efforts to "modernize" the numpy and scipy builds.

Josef

>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel


Re: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe

2014-03-27 Thread josef . pktd
On Wed, Mar 26, 2014 at 5:17 PM, Olivier Grisel
 wrote:
> My understanding of Carl's effort is that the long term goal is to
> have official windows whl packages for both numpy and scipy published
> on PyPI with a builtin BLAS / LAPACK implementation so that users can
> do `pip install scipy` under windows and get something that just works
> without having to install any compiler (fortran or C) or any additional
> library manually.
>
> Most windows users are beginners and you cannot really expect them to
> understand how to build the whole scipy stack from source.
>
> The current solution (executable setup installers) is not optimal as
> it requires Administrator rights to run, does not resolve dependencies
> as pip does and cannot be installed in virtualenvs.

as small related point:

The official installers can be used to install into a virtualenv.
The way I do it:
Run the official superpack installer, wait until it extracts the
correct (SSE) installer exe, then cancel.
Then easy_install the installer exe that was extracted to the temp
folder into the virtualenv.

I don't remember if the extraction already requires admin rights, but
I think not.
easy_install doesn't require any, IIRC.

Josef

>
> If we can build numpy / scipy whl packages for windows with the Atlas
> dlls then fine embedded in the numpy package then good. It does not
> need to be the
> fastest BLAS / LAPACK lib in my opinion. Just something that works.
>
> The problem with ATLAS is that you need to select the number of thread
> at build time AFAIK. But we could set it to a reasonable default (e.g.
> 4 threads) for the default windows package.
>
> --
> Olivier


Re: [Numpy-discussion] Resolving the associativity/precedence debate for @

2014-03-26 Thread josef . pktd
On Mon, Mar 24, 2014 at 8:33 PM, Nathaniel Smith  wrote:
> On Mon, Mar 24, 2014 at 11:58 PM, Charles R Harris
>  wrote:
>> On Mon, Mar 24, 2014 at 5:56 PM, Nathaniel Smith  wrote:
>>>
>>> On Sat, Mar 22, 2014 at 6:13 PM, Nathaniel Smith  wrote:
>>> > After 88 emails we don't have a conclusion in the other thread (see
>>> > [1] for background). But we have to come to some conclusion or another
>>> > if we want @ to exist :-). So I'll summarize where the discussion
>>> > stands and let's see if we can find some way to resolve this.
>>>
>>> Response in this thread so far seems (AFAICT) to have pretty much
>>> converged on same-left.
>>>
>>> If you think that this would be terrible and there is some compelling
>>> argument against it, then please speak up! Otherwise, if no-one
>>> objects, then I'll go ahead in the next few days and put same-left
>>> into the PEP.
>>
>>
>> I think we should take a close look at broadcasting before deciding on the
>> precedence.
>
> Can you elaborate? Like what, concretely, do you think we need to do now?

???


"In examples like this, parenthesizing the code aggressively to spell
out the logic, not
only to Stata but also to yourself and anybody else reading it, should
cause no embarrassment.
You need not assume knowledge of Stata's precedence rules that determine
interpretation when several operators are used in one expression. More
importantly,
you may avoid some horrible little bugs." Nicholas J. Cox

Trying to figure out what Stata is using: elementwise operations are
just below their matrix versions in operator precedence.
But Stata came late to matrix algebra, and is definitely not like
Matlab, Gauss, or numpy.

Josef



>
> -n
>
> --
> Nathaniel J. Smith
> Postdoctoral researcher - Informatics - University of Edinburgh
> http://vorpus.org


Re: [Numpy-discussion] why sort does not accept a key?

2014-03-24 Thread josef . pktd
On Mon, Mar 24, 2014 at 12:08 PM, Alan G Isaac  wrote:
>> On Mon, Mar 24, 2014 at 11:32 AM, Alan G Isaac wrote:
>>> I'm wondering if `sort` intentionally does not accept
>>> a `key`
>>> or if this is just a missing feature?
>
>
> On 3/24/2014 11:47 AM, Alexander Belopolsky wrote:
>> It would be very inefficient to call a key function on
>> every element compared during the sort.   See np.argsort
>> and np.lexsort for faster alternatives.
>
>
> But the keys could be as in `lexsort`.
>
> I am currently using `argsort`, but I can do so because
> I don't need a lexicographically determined sort order
> for the indexes.
>
>
> To close with a related question:
> what is the preferred idiom for a descending sort?

adding [::-1] just creates a new view, so it's pretty low cost.

Josef
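A minimal sketch of the idiom: reversing with [::-1] is a view, so a
descending sort costs essentially the same as an ascending one (with the
caveat that reversing an argsort also reverses the stable order of ties).

```python
import numpy as np

a = np.array([3, 1, 2])

# Descending sort: sort ascending, then reverse with a cheap view.
desc = np.sort(a)[::-1]

# Descending argsort: reverse the ascending argsort indices.
# (Note: for tied values, reversing also reverses their stable order.)
order = np.argsort(a)[::-1]

print(desc)      # [3 2 1]
print(a[order])  # [3 2 1]
```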


>
> Thanks,
> Alan
>
>
>


Re: [Numpy-discussion] unique return_index order?

2014-03-21 Thread josef . pktd
On Fri, Mar 21, 2014 at 9:27 PM,  wrote:

>
>
>
> On Fri, Mar 21, 2014 at 9:01 PM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>>
>>
>>
>> On Fri, Mar 21, 2014 at 6:49 PM,  wrote:
>>
>>>
>>>
>>>
>>> On Fri, Mar 21, 2014 at 8:46 PM, Charles R Harris <
>>> charlesr.har...@gmail.com> wrote:
>>>



 On Fri, Mar 21, 2014 at 6:26 PM, Alan G Isaac wrote:

> The documentation of numpy.unique
> http://docs.scipy.org/doc/numpy/reference/generated/numpy.unique.html
> does not seem to promise that return_index=True will always index the
> *first* occurrence of each unique item, which I believe is the current
> behavior.
>
> A promise would be nice.
> Is it intended?
>
>
 Yes, it is intended, although the required mergesort wasn't available
 for all types before numpy 1.7.

>>>
Summary, AFAICS: since numpy 1.6.2, np.unique has used mergesort when
return_index=True, which provides a stable sort.

Josef
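A small check of that behavior, assuming the stable first-occurrence
indexing described above:

```python
import numpy as np

x = np.array([3, 1, 3, 2, 1])
uni, idx = np.unique(x, return_index=True)

print(uni)  # [1 2 3]
print(idx)  # [1 3 0] -- index of the *first* occurrence of each value

# The returned indices recover the unique values from the original array.
assert np.all(x[idx] == uni)
```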



>
>>> Does this mean return_inverse works again for all cases, even with
>>> return_index?
>>>
>>> I removed return_index from my code in statsmodels because I make
>>> frequent use of return_inverse, which was broken. We don't have any
>>> unittests in statsmodels anymore that use both return_xxx.
>>>
>>>
>> I don't know, needs checking. Seems to work now with a simple trial array
>> of integers.
>>
>
> my example from may 2012,  thread "1.6.2 no more unique for rows"
> works fine on  python 3.3  numpy 1.7.1
>
> >>> groups = np.random.randint(0,4,size=(10,2))
> >>> groups_ = groups.view([('',groups.dtype)]*groups.shape[1]).flatten()
> >>> uni, uni_idx, uni_inv = np.unique(groups_, return_index=True,
> return_inverse=True)
> >>> uni
> array([(0, 2), (0, 3), (1, 0), (2, 1), (2, 2), (3, 2), (3, 3)],
>   dtype=[('f0', '<i4'), ('f1', '<i4')])
> >>> uni_inv
> array([1, 6, 3, 4, 5, 3, 2, 5, 0, 2], dtype=int32)
> >>> np.__version__
> '1.7.1'
>
> Thanks,
>
> Josef
>
>
>> Chuck
>>


Re: [Numpy-discussion] unique return_index order?

2014-03-21 Thread josef . pktd
On Fri, Mar 21, 2014 at 9:01 PM, Charles R Harris  wrote:

>
>
>
> On Fri, Mar 21, 2014 at 6:49 PM,  wrote:
>
>>
>>
>>
>> On Fri, Mar 21, 2014 at 8:46 PM, Charles R Harris <
>> charlesr.har...@gmail.com> wrote:
>>
>>>
>>>
>>>
>>> On Fri, Mar 21, 2014 at 6:26 PM, Alan G Isaac wrote:
>>>
 The documentation of numpy.unique
 http://docs.scipy.org/doc/numpy/reference/generated/numpy.unique.html
 does not seem to promise that return_index=True will always index the
 *first* occurrence of each unique item, which I believe is the current
 behavior.

 A promise would be nice.
 Is it intended?


>>> Yes, it is intended, although the required mergesort wasn't available
>>> for all types before numpy 1.7.
>>>
>>
>> Does this mean return_inverse works again for all cases, even with
>> return_index?
>>
>> I removed return_index from my code in statsmodels because I make
>> frequent use of return_inverse, which was broken. We don't have any
>> unittests in statsmodels anymore that use both return_xxx.
>>
>>
> I don't know, needs checking. Seems to work now with a simple trial array
> of integers.
>

my example from may 2012,  thread "1.6.2 no more unique for rows"
works fine on  python 3.3  numpy 1.7.1

>>> groups = np.random.randint(0,4,size=(10,2))
>>> groups_ = groups.view([('',groups.dtype)]*groups.shape[1]).flatten()
>>> uni, uni_idx, uni_inv = np.unique(groups_, return_index=True,
return_inverse=True)
>>> uni
array([(0, 2), (0, 3), (1, 0), (2, 1), (2, 2), (3, 2), (3, 3)],
  dtype=[('f0', '<i4'), ('f1', '<i4')])
>>> uni_inv
array([1, 6, 3, 4, 5, 3, 2, 5, 0, 2], dtype=int32)
>>> np.__version__
'1.7.1'

Thanks,

Josef
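Since NumPy 1.13, unique rows can also be obtained directly with the
axis argument, without the structured-view trick used above; a minimal
sketch:

```python
import numpy as np

groups = np.array([[0, 1], [2, 3], [0, 1], [2, 2]])

# Unique rows directly, no structured view needed (NumPy >= 1.13).
uni, uni_idx = np.unique(groups, axis=0, return_index=True)

print(uni)      # [[0 1] [2 2] [2 3]]
print(uni_idx)  # [0 3 1] -- first occurrence of each unique row
```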


> Chuck
>


Re: [Numpy-discussion] unique return_index order?

2014-03-21 Thread josef . pktd
On Fri, Mar 21, 2014 at 8:46 PM, Charles R Harris  wrote:

>
>
>
> On Fri, Mar 21, 2014 at 6:26 PM, Alan G Isaac wrote:
>
>> The documentation of numpy.unique
>> http://docs.scipy.org/doc/numpy/reference/generated/numpy.unique.html
>> does not seem to promise that return_index=True will always index the
>> *first* occurrence of each unique item, which I believe is the current
>> behavior.
>>
>> A promise would be nice.
>> Is it intended?
>>
>>
> Yes, it is intended, although the required mergesort wasn't available for
> all types before numpy 1.7.
>

Does this mean return_inverse works again for all cases, even with
return_index?

I removed return_index from my code in statsmodels because I make frequent
use of return_inverse, which was broken. We don't have any unittests in
statsmodels anymore that use both return_xxx.

Josef



>
> Chuck
>
>


Re: [Numpy-discussion] unique return_index order?

2014-03-21 Thread josef . pktd
On Fri, Mar 21, 2014 at 8:26 PM, Alan G Isaac  wrote:

> The documentation of numpy.unique
> http://docs.scipy.org/doc/numpy/reference/generated/numpy.unique.html
> does not seem to promise that return_index=True will always index the
> *first* occurrence of each unique item, which I believe is the current
> behavior.
>
> A promise would be nice.
> Is it intended?
>

AFAIU it's not, or it was in some version, but shouldn't be.  ??

I think this broke return_inverse in some cases when both were set to True.

I haven't kept track of the problems, and the code still seems to be the
same code that was changed to use stable sorting.

Josef




>
> Alan Isaac


Re: [Numpy-discussion] [help needed] associativity and precedence of '@'

2014-03-20 Thread josef . pktd
On Thu, Mar 20, 2014 at 1:25 PM, Nathaniel Smith  wrote:

> On Wed, Mar 19, 2014 at 7:45 PM, Nathaniel Smith  wrote:
>
>> Okay, I wrote a little script [1] to scan Python source files look for
>> things like 'dot(a, dot(b, c))' or 'dot(dot(a, b), c)', or the ndarray.dot
>> method equivalents. So what we get out is:
>> - a count of how many 'dot' calls there are
>> - a count of how often we see left-associative nestings: dot(dot(a, b), c)
>> - a count of how often we see right-associative nestings: dot(a, dot(b,
>> c))
>>
>> Running it on a bunch of projects, I get:
>>
>> | project  | dots | left | right | right/left |
>> |--+--+--+---+|
>> | scipy|  796 |   53 |27 |   0.51 |
>> | nipy |  275 |3 |19 |   6.33 |
>> | scikit-learn |  472 |   11 |10 |   0.91 |
>> | statsmodels  |  803 |   46 |38 |   0.83 |
>> | astropy  |   17 |0 | 0 |nan |
>> | scikit-image |   15 |1 | 0 |   0.00 |
>> |--+--+--+---+|
>> | total| 2378 |  114 |94 |   0.82 |
>>
>
> Another way to visualize this, converting each contiguous "chain" of calls
> to np.dot into a parenthesized expression, and then counting how often we
> see each pattern.
>
>   1943  (_ @ _)
>100  ((_ @ _) @ _) # left
> 86  (_ @ (_ @ _)) # right
>  2  (_ @ ((_ @ _) @ _))
>  2  (((_ @ _) @ _) @ _) # left
>  1  ((_ @ (_ @ _)) @ _)
>  1  ((_ @ _) @ (_ @ _))
>  1  (((_ @ _) @ _) @ (_ @ _))
>  1  ((_ @ ((_ @ _) @ _)) @ _)
>  1  ((_ @ _) @ (_ @ (_ @ _)))
>
> (This is pooling scipy/nipy/scikit-learn/statsmodels.) I've noted the 3
> different patterns that have a consistent associativity.
>
> From this I'm leaning towards the conclusions that:
>
> - Expressions with complex parenthesization do happen, but probably not
> often enough to justify elaborate stuff like my 'chaining' proposal -- only
> 8.7% of these cases involve more than one @.
>

just for statsmodels

We do have a very large amount of chaining, but in many cases this has been
taken out of a single expression into a temporary or permanent variable for
parts of the chain. (similar to the quadratic form example in the PEP),
either for clarity (a temp variable), or because one dot product shows up
several times in the same expression (quadratic forms) or because we need
to keep it around for reuse in other expressions.

That's what I tried to explain before: chaining and breaking up larger
multi-dot expressions is most of the time an intentional choice, not just
random because the dot function forces us.

The most convincing argument for me for @ is that it makes parentheses
visible (until I realized that I didn't really care about @).
This reduces the cases where we separate out a dot product for clarity and
readability, but still leaves us with the other two cases, where our
chaining won't change whatever numpy provides additionally.

Josef



>
> - There's very little support here for the intuition that
> right-associativity is more useful than left-associativity on a day-to-day
> basis.
>
> --
> Nathaniel J. Smith
> Postdoctoral researcher - Informatics - University of Edinburgh
> http://vorpus.org
>


Re: [Numpy-discussion] [help needed] associativity and precedence of '@'

2014-03-19 Thread josef . pktd
On Wed, Mar 19, 2014 at 2:24 PM, Nathaniel Smith  wrote:

> On Tue, Mar 18, 2014 at 9:14 AM, Robert Kern 
> wrote:
> > On Tue, Mar 18, 2014 at 12:54 AM, Nathaniel Smith  wrote:
> >> On Sat, Mar 15, 2014 at 6:28 PM, Nathaniel Smith  wrote:
> >>> Mathematica: instead of having an associativity, a @ b @ c gets
> >>> converted into mdot([a, b, c])
> >>
> >> So, I've been thinking about this (thanks to @rfateman for pointing it
> >> out), and wondering if Mathematica's approach is worth following up
> >> more. (It would need to make it past python-dev, of course, but worst
> >> case is just that they say no and we're back where we are now, so we
> >> might as well think it through.)
> >
> > I predict with near-certainty that this will be rejected,
>
> I guess that's what everyone thought about @ too? ;-)
>
> > but that
> > doesn't prevent it from derailing the discussion. This proposal is
> > unlike anything else in Python. Chained comparisons are *not* similar
> > to this proposal. The chaining only happens at the syntax level, not
> > the semantics. `a < b < c` gets compiled down to `a.__lt__(b) and
> > b.__lt__(c)`, not `do_comparison([a, b, c], [lt, lt])`.
>
> Yes, the syntax is the same as chained comparisons, and the dispatch
> is a generalization of regular operators. It is unusual; OTOH, @ is
> unusual in that no other operators in Python have the property that
> evaluating in the wrong order can cost you seconds of time and
> gigabytes of memory. Perhaps.
>
> > We have approval for a binary @ operator. Take the win.
>
> We have approval, and we have a request: that we figure out how @
> should work in detail to be most useful to us. Maybe that's this
> proposal; maybe not. Ultimately rejected-or-not-rejected comes down to
> how strong the arguments for something are. And while we can make some
> guesses about that, it's impossible to know how strong an argument
> will be until one sits down and works it out. So I still would like to
> hear what people think, even if it just ends in the conclusion that
> it's a terrible idea ;-).
>


What happens if you have 5 @ in a row?

My head hurts when I have to think about what would actually be going on,
and don't forget, the sparse matrix is stuck in the middle.

But I would be happy to have an optimizing multi_dot or chain_dot function
once it feels safe enough.
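NumPy later gained exactly such a function, np.linalg.multi_dot (since
1.11), which picks the cheapest parenthesization automatically; a minimal
sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 3))
B = rng.standard_normal((3, 100))
C = rng.standard_normal((100, 5))

# multi_dot chooses the evaluation order with the fewest operations;
# here A @ (B @ C) is much cheaper than (A @ B) @ C because the inner
# dimension is only 3 wide.
out = np.linalg.multi_dot([A, B, C])

# Same result as either explicit parenthesization, up to rounding.
assert np.allclose(out, A @ (B @ C))
```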



>
> As for arguments against the "grouping" semantics, I did think of
> another case where @ is not associative, though it's pretty weird:
>
> In [9]: a = np.arange(16, dtype=np.int8).reshape((4, 4))
>
> In [10]: np.dot(a, np.dot(a, a.astype(float)))
> Out[10]:
> array([[  1680.,   1940.,   2200.,   2460.],
>[  4880.,   5620.,   6360.,   7100.],
>[  8080.,   9300.,  10520.,  11740.],
>[ 11280.,  12980.,  14680.,  16380.]])
>
> In [12]: np.dot(np.dot(a, a), a.astype(float))
> Out[12]:
> array([[ 1680.,  1940.,  2200.,  2460.],
>[-1264., -1548., -1832., -2116.],
>[ 1936.,  2132.,  2328.,  2524.],
>[-1008., -1100., -1192., -1284.]])
>
> (What's happening is that we have int8 @ int8 @ float, so (int8 @
> int8) @ float has overflows in the first computation, but int8 @ (int8
> @ float) does all the computations in float, with no overflows.)
>

That's similar to my example before that mixes in some scalar *.

I thought of it as an argument for same-left

Josef



>
> --
> Nathaniel J. Smith
> Postdoctoral researcher - Informatics - University of Edinburgh
> http://vorpus.org


Re: [Numpy-discussion] [help needed] associativity and precedence of '@'

2014-03-18 Thread josef . pktd
I'm still bothered by what Nathaniel mentioned about mixing 1d and 2d arrays


>>> c = np.arange(4)
>>> a = np.arange(16).reshape(4,4)
>>> cc = c[:,None]

>>> a.dot(c).dot(c.T)
420

>>> a.dot(c.dot(c.T))
array([[  0,  14,  28,  42],
   [ 56,  70,  84,  98],
   [112, 126, 140, 154],
   [168, 182, 196, 210]])


>>> a.dot(cc).dot(cc.T)
array([[  0,  14,  28,  42],
   [  0,  38,  76, 114],
   [  0,  62, 124, 186],
   [  0,  86, 172, 258]])

>>> a.dot(cc.dot(cc.T))
array([[  0,  14,  28,  42],
   [  0,  38,  76, 114],
   [  0,  62, 124, 186],
   [  0,  86, 172, 258]])


hint:
>>> c.dot(c.T)
14

and I expect it will be a lot more fun if we mix in some 3d or nd arrays.


I think some of the decisions should not be driven by what is the most
convenient for the usual cases, but by how easy it is to read your code and
find the bugs where we made "silly" mistakes.


A biased view from someone who learned how to use numpy and scipy by
debugging.


Matlab and GAUSS are user friendly: they don't allow for reduced dimensions
and never steal an axis in a reduce operation.

I didn't manage to come up with more difficult examples (and ran out of
time)

>>> (a.dot(c).dot(c.T)).dot(2*a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'numpy.int32' object has no attribute 'dot'

If we make too many mistakes, then numpy tells us.
In the above examples, the scalar-dot-matrix case would raise according to
the PEP.

I cannot come up with examples where we mix 3d and 2d and 1d because dot
currently doesn't do it.


"sane-left"

Josef


Re: [Numpy-discussion] [help needed] associativity and precedence of '@'

2014-03-17 Thread josef . pktd
On Mon, Mar 17, 2014 at 6:33 PM, Christophe Bal  wrote:

> I think that weak-left is a little strange; just think a little about the
> operators used by mathematicians, which always follow a hierarchy.
>
> A parser is mostly done using grammars : see
> http://docs.python.org/3.1/reference/grammar.html.
>
> Defining the *-product to have stronger priority than the @-product, and
> the latter stronger priority than +, will make the changes in the grammar
> easier.
>
> I'm now convinced of the usefulness of @ and @@ too, but I also think that
> you must think of uses beyond numpy. In other words, numpy is the good
> argument for these new operators, but they can also open new
> perspectives for other uses.
>


My main problem with weak-left (* higher) and tight-left (@ higher)
compared to same-left is that I don't see any obvious choice between the
weak and tight.
I don't think I would have problems with readability.

Wikipedia doesn't say anything about precedence of Hadamard versus matrix
product.

matlab, IDL and Gauss (I checked the manual) all use same-left, as
Nathaniel pointed out.

For scalar * together with dot product which is more common in formulas, we
would just read it sequentially, i.e. same-left.

I don't remember when I have seen dot-in-a-circle in a paper, but I don't
think there was any precedence either.

---
I guess the same applies for other (mis)uses of @

from math import sqrt

class MyOp(object):

def __init__(self, func):
self.func = func

def __at__(self, x):
return [self.func(xi) for xi in x]


myop = MyOp(lambda x: sqrt(x))

print myop.__at__(range(3))   # myop @ range(3)
print myop.__at__(range(3) * 2)  # myop @ (range(3) * 2)
print myop.__at__(range(3)) * 3  # myop @ range(3) * 3

'''
[0.0, 1.0, 1.4142135623730951]
[0.0, 1.0, 1.4142135623730951, 0.0, 1.0, 1.4142135623730951]
[0.0, 1.0, 1.4142135623730951, 0.0, 1.0, 1.4142135623730951, 0.0, 1.0,
1.4142135623730951]
'''

-


Josef




>

