Re: [Numpy-discussion] Possible bug in Numpy.convolve

2017-03-22 Thread josef . pktd
On Wed, Mar 22, 2017 at 10:06 PM, Thøger Emil Rivera-Thorsen <
thoger.e...@gmail.com> wrote:

> Dear list;
>
> I am honestly not certain whether this, or the SciPy list, is the
> appropriate place to post this; please let me know if I got it wrong.
>
> I am convolving a 1D data set containing a relatively narrow peak, with a
> relatively narrow Gaussian kernel, in order to emulate the effect of
> atmospheric seeing on astrophysical observations.
>
> I have a 1D data array 45 pixels long and a Gaussian kernel; when I run
> np.convolve(data, kernel, mode='same') on the two arrays, the resulting
> array's peak is shifted relative to the origin. I have attached a plot to
> illustrate.
>
> The original data is shown in blue. When I convolve it with a symmetric
> kernel (black), I get an offset resulting peak (magenta). If I flip the
> kernel -- even though it is perfectly symmetric -- the resulting curve is
> offset in the opposite direction (yellow). However, if I offset the kernel
> so it is centered exactly one pixel below the central value, the output
> array gets centered correctly (red), even if I flip the (now no longer
> symmetric) kernel.
>
> This is using Numpy 1.11.3, python 2.7.13, on Anaconda 4.3.0 64-bit on
> Ubuntu 16.10
>
> Using astropy.convolution reproduces the correct red curve, so I can use
> that for now, but it seems to me this is either a bug or, if it is indeed
> the intended behavior, a word of caution would be merited in the docstring.
>
> Cheers,
>
> Emil Rivera-Thorsen
>
>
>
Can you provide an example to replicate?

I haven't seen this behavior; it looks centered to me, at least for odd
window lengths.
AFAIR, I had to experiment a bit in the past to figure out how to set the
window and its location for even window lengths.

>>> import numpy as np
>>> from scipy import stats
>>> np.__version__
'1.11.2'

>>> x = np.linspace(-1, 1, 21)
>>> w = stats.norm.pdf(np.linspace(-3, 3, 5))
>>> np.column_stack((x, np.convolve(x, w, mode='same')))[8:13]
array([[ -2.00000000e-01,  -1.33368234e-01],
       [ -1.00000000e-01,  -6.66841169e-02],
       [  0.00000000e+00,   1.51788304e-17],
       [  1.00000000e-01,   6.66841169e-02],
       [  2.00000000e-01,   1.33368234e-01]])


>>> x = np.abs(np.linspace(-1, 1, 21))
>>> w = stats.norm.pdf(np.linspace(-3, 3, 4))
>>> np.column_stack((x, np.convolve(x, w, mode='same')))[8:13]
array([[ 0.2       ,  0.12320129],
       [ 0.1       ,  0.07392077],
       [ 0.        ,  0.02552663],
       [ 0.1       ,  0.02552663],
       [ 0.2       ,  0.07392077]])


>>> w = stats.norm.pdf(np.linspace(-3, 3, 5))
>>> np.column_stack((x, np.convolve(x, w, mode='same')))[8:13]
array([[ 0.2       ,  0.13336823],
       [ 0.1       ,  0.06757049],
       [ 0.        ,  0.02767626],
       [ 0.1       ,  0.06757049],
       [ 0.2       ,  0.13336823]])
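A small sketch added for illustration (not part of the original message):
with an even-length kernel there is no central sample, so mode='same' must
place the peak half a sample off, while an odd-length symmetric kernel
leaves a centered peak centered.

import numpy as np

data = np.zeros(45)
data[22] = 1.0                               # unit impulse at pixel 22
for n in (5, 6):                             # odd vs. even kernel length
    kern = np.exp(-0.5 * np.linspace(-3, 3, n) ** 2)
    kern /= kern.sum()                       # normalized Gaussian kernel
    out = np.convolve(data, kern, mode='same')
    centroid = (np.arange(45) * out).sum() / out.sum()
    print(n, centroid)                       # 5 -> 22.0, 6 -> 22.5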

Josef


Re: [Numpy-discussion] Fortran order in recarray.

2017-02-22 Thread josef . pktd
On Wed, Feb 22, 2017 at 11:57 AM, Alex Rogozhnikov <
alex.rogozhni...@yandex.ru> wrote:

> Hi Matthew,
> maybe it is not the best place to discuss problems of pandas, but to show
> that I am not missing something, let's consider a simple example.
>
> # simplest DataFrame
> x = pandas.DataFrame(dict(a=numpy.arange(10), b=numpy.arange(10, 20)))
> # simplest indexing. Can you predict results without looking at comments?
> x[:2]             # returns first two rows, as expected
> x[[0, 1]]         # returns a copy of x, the whole dataframe
> x[numpy.array(2)] # fails with IndexError: indices are out-of-bounds (can you guess why?)
> x[[0, 1], :]      # unhashable type: list
>
>
> just in case - I know about .loc and .iloc, but when you write code with
> many subroutines, you concentrate on numpy inputs, and at some point you
> simply *forget* to convert some of the data you operated with to numpy
> and it *continues* to work, but it yields wrong results (while you tested
> everything, but you tested this for numpy). Checking all the inputs in each
> small subroutine is strange.
>
> Ok, a bit more:
>
> x[x['a'] > 5]            # works as expected
> x[x['a'] > 5, :]         # 'Series' objects are mutable, thus they cannot be hashed
> lookup = numpy.arange(10)
> x[lookup[x['a']] > 5]    # works as expected
> x[lookup[x['a']] > 5, :] # TypeError: unhashable type: 'numpy.ndarray'
>
> x[lookup]['a']   # IndexError
> x['a'][lookup]   # works as expected
>
>
> Now let's go a bit further: train/test splitted the data for machine
> learning (again, the most frequent operation)
>
> from sklearn.model_selection import train_test_split
> x1, x2 = train_test_split(x, random_state=42)
> # compare next to operations with pandas.DataFrame
> col = x1['a']
> print col[:2]              # first two elements
> print col[[0, 1]]          # doesn't fail (while there is no row with index 0), fills it with NaN
> print col[numpy.arange(2)] # same as previous
> print col[col > 4]         # as expected
> print col[col.values > 4]  # as expected
> print col.values[col > 4]  # converts boolean to int, uses int indexing, but at least raises warning
>
>
> Mistakes caused by such silent misbehavior are not easy to detect (when
> your data pipeline consists of several steps), it is quite hard to locate
> the source of the problem, and it is almost impossible to be sure that you
> indeed avoided all such caveats. Code review turns into a paranoid process
> (if you care about the result, of course).
>
> Things are even worse, because I've demonstrated this for my installation,
> and if you run this with some other pandas installation, you probably get
> some other results (these were really basic operations). So things that
> worked ok in one version may work a different way in another; this becomes
> completely intractable.
>
> Pandas may be nice if you need a report, and you need to get it done
> tomorrow. Then you'll throw away the code. When we initially used pandas as
> the main data storage in yandex/rep, it looked like a good idea, but a year
> later it was obvious this was a wrong decision. In cases where you build a
> data pipeline / research that should be working several years later (using
> some other installation by someone else), usage of pandas shall be *minimal*.
>
> That's why I am looking for a reliable pandas substitute, which should be:
> - completely consistent with numpy, and should fail when something isn't
> implemented / is impossible
> - fewer new abstractions; nobody wants to learn one-more-way-to-manipulate-the-data,
> specifically other researchers
> - it may be less convenient for interactive data munging
>   - in particular, fewer methods is ok
> - written code should be interpretable, and can hardly be misinterpreted
> - not super slow; 1-10 gigabyte datasets are a normal situation
>

Just to the pandas part

statsmodels supported pandas almost from the very beginning (or maybe after
1.5 years) when the new pandas was still very young.

However, what I insisted on is that pandas stays in the wrapper/interface
code, and internally only numpy arrays are used. Besides the confusing
"magic" indexing of early pandas, there were a lot of details that silently
produced different results, e.g. default iteration on axis=1, and ddof=1 in
std and var instead of numpy's ddof=0.

Essentially, every interface goes through np.asarray, but we store the
DataFrame information, mainly the index and column names, so we can return
the appropriate pandas object if a pandas object was used for the input.

This has worked pretty well. Users can have their dataframes, and we have
pure numpy algorithms.
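A minimal sketch of that wrapper pattern (an illustration with a made-up
helper name, not actual statsmodels code):

import numpy as np
import pandas as pd

def demean(data):
    # the pure-numpy core works on a plain ndarray
    x = np.asarray(data)
    result = x - x.mean(axis=0)
    # if the input carried pandas metadata, restore it on the way out
    if isinstance(data, pd.DataFrame):
        return pd.DataFrame(result, index=data.index, columns=data.columns)
    return result

df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [10.0, 20.0, 30.0]})
demean(df)         # returns a DataFrame with the original index/columns
demean(df.values)  # returns a plain ndarray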

Recently we have started to use pandas inside a few functions or classes
that are less tightly integrated into the overall setup. We also use pandas
for some things that are not convenient or not available in numpy. Our
internal use of pandas groupby and similar will most likely increase over
time.
(One of the main issues we had was date and time index because that was a
moving target in both numpy and pandas.)


One issue for 

Re: [Numpy-discussion] PowerPC testing servers

2017-02-16 Thread josef . pktd
On Thu, Feb 16, 2017 at 3:55 AM, Ralf Gommers 
wrote:

>
>
> On Thu, Feb 16, 2017 at 3:53 PM, Sandro Tosi  wrote:
>
>> > A recent post to the wheel-builders mailing list pointed out some
>> > links to places providing free PowerPC hosting for open source
>> > projects, if they agree to a submitted request:
>>
>> The debian project has some powerpc machines (and we still build numpy
>> on those boxes when i upload a new revision to our archives) and they
>> also have hosts dedicated to let debian developers login and debug
>> issues with their packages on that architecture. I can sponsor access
>> to those machines for some of you, but it is not a place where you can
>> host a CI instance.
>>
>> Just keep it in mind more broadly than powerpc, f.e. these are all the
>> archs where numpy was built after the last upload
>> https://buildd.debian.org/status/package.php?p=python-numpy&suite=unstable
>> (the grayed out archs are the ones non release critical, so packages
>> are built as best effort and if missing is not a big deal)
>
>
> Thanks Sandro. It looks like even for the release-critical ones, it's just
> the build that has to succeed and failures are not detected? For example,
> armel is green but has 9 failures: https://buildd.debian.org/status/fetch.php?pkg=python-numpy&arch=armel&ver=1%3A1.12.0-2&stamp=1484889563&raw=0
>
> Ralf
>


More general questions on this:

Are there any overviews of which packages in the python-for-science or
python-for-data-analysis areas work correctly on different platforms?
Are there any platforms/processors, besides standard 32- and 64-bit x86,
where this is important?

for example for statsmodels:
In early releases of statsmodels, maybe 5 to 7 years ago, Yarik and I were
still debugging problems on several machines like ppc and s390x during
Debian testing. Since then I haven't heard much about specific problems.
The current status for statsmodels on Debian machines is pretty mixed. On
several of them some dependencies are not available, and in some cases we
have errors that might be caused by errors in dependencies, e.g. cvxopt.

The ppc64el test run for statsmodels has a large number of failures,
but checking scipy, it looks like it's also not working properly:
https://buildd.debian.org/status/fetch.php?pkg=python-scipy&arch=ppc64el&ver=0.18.1-2&stamp=1477075663&raw=0

In those cases it would be impossible to start debugging if we had to
debug through the entire dependency chain.

CI-testing for Windows, Apple and Linux for mainly x64 seems to be working
pretty well, with some delays while version incompatibilities are fixed.
But anything that is not in a CI testing setup looks pretty random to me.

(I'm mainly curious what the status for those machines is. I'm not really
eager to create more debugging work, but sometimes failures on a machine
point to code that is "fragile".)

Josef





[Numpy-discussion] offset in fill diagonal

2017-01-21 Thread josef . pktd
Is there a simple way to fill in the diagonal elements of an array for a
diagonal other than the main one?

As far as I can see, the diagxxx functions that take an offset can only read
and not modify in place, and the functions for modifying don't take an offset
and only allow changing the main diagonal.

Use case: creating banded matrices (2-D arrays) similar to toeplitz.
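For reference, one possible workaround (my own sketch, not from the thread):
index the row and column ranges of the shifted diagonal directly.

import numpy as np

def fill_offset_diagonal(a, val, offset=0):
    # hypothetical helper: fill the diagonal shifted by `offset`
    # (positive offset = superdiagonal, negative = subdiagonal)
    n, m = a.shape
    if offset >= 0:
        rows = np.arange(min(n, m - offset))
        cols = rows + offset
    else:
        cols = np.arange(min(m, n + offset))
        rows = cols - offset
    a[rows, cols] = val

a = np.zeros((4, 4))
fill_offset_diagonal(a, 1.0, offset=1)   # first superdiagonal
fill_offset_diagonal(a, 2.0, offset=-1)  # first subdiagonal

Since basic slices are views, np.fill_diagonal(a[:, k:], val) should also
reach the k-th superdiagonal.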


Josef


Re: [Numpy-discussion] Question about numpy.random.choice with probabilties

2017-01-18 Thread josef . pktd
On Wed, Jan 18, 2017 at 8:53 AM,  wrote:

>
>
> On Wed, Jan 18, 2017 at 4:52 AM, Nadav Har'El  wrote:
>
>>
>> On Wed, Jan 18, 2017 at 11:00 AM, aleba...@gmail.com 
>> wrote:
>>
>>> Let's look at what the user asked this function, and what it returns:
>>>

 User asks: please give me random pairs of the three items, where item 1
 has probability 0.2, item 2 has 0.4, and 3 has 0.4.

 Function returns: random pairs, where if you make many random returned
 results (as in the law of large numbers) and look at the items they
 contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is
 0.38333.
 These are not (quite) the probabilities the user asked for...

 Can you explain a sense in which the user's requested probabilities (0.2,
 0.4, 0.4) are actually adhered to in the results which random.choice returns?

>>>
>>> I think that the question the user is asking by specifying p is a
>>> slightly different one:
>>>  "please give me random pairs of the three items extracted from a
>>> population of 3 items where item 1 has probability of being extracted of
>>> 0.2, item 2 has 0.4, and 3 has 0.4. Also please remove extracted items once
>>> extracted."
>>>
>>
>> You are right, if that is what the user wants, numpy.random.choice does
>> the right thing.
>>
>> I'm just wondering whether this is actually what users want, and whether
>> they understand this is what they are getting.
>>
>> As I said, I expected it to generate pairs with, empirically, the desired
>> distribution of individual items. The documentation of numpy.random.choice
>> seemed to me (wrongly) to imply that that's what it does. So I was
>> surprised to realize that it does not.
>>
>
> As Alessandro and you showed, the function returns something that makes
> sense. If the user wants something different, then they need to look for a
> different function, which is however difficult if it doesn't have a
> solution in general.
>
> Sounds to me a bit like a Monty Hall problem. Whether we like it or not,
> or find it counter intuitive, it is what it is given the sampling scheme.
>
> Having more sampling schemes would be useful, but it's not possible to
> implement sampling schemes with impossible properties.
>

BTW: sampling 3 out of 3 without replacement is even worse

No matter what sampling scheme and what selection probabilities we use, we
always have every element with probability 1 in the sample.


(Which in survey statistics implies that the sampling error or standard
deviation of any estimate of a population mean or total is zero. Which I
found weird. How can you do statistics and get an estimate that doesn't
have any uncertainty associated with it?)

Josef



>
> Josef
>
>
>
>>
>> Nadav.
>>


Re: [Numpy-discussion] Question about numpy.random.choice with probabilties

2017-01-18 Thread josef . pktd
On Wed, Jan 18, 2017 at 4:52 AM, Nadav Har'El  wrote:

>
> On Wed, Jan 18, 2017 at 11:00 AM, aleba...@gmail.com 
> wrote:
>
>> Let's look at what the user asked this function, and what it returns:
>>
>>>
>>> User asks: please give me random pairs of the three items, where item 1
>>> has probability 0.2, item 2 has 0.4, and 3 has 0.4.
>>>
>>> Function returns: random pairs, where if you make many random returned
>>> results (as in the law of large numbers) and look at the items they
>>> contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is
>>> 0.38333.
>>> These are not (quite) the probabilities the user asked for...
>>>
>>> Can you explain a sense in which the user's requested probabilities (0.2,
>>> 0.4, 0.4) are actually adhered to in the results which random.choice returns?
>>>
>>
>> I think that the question the user is asking by specifying p is a
>> slightly different one:
>>  "please give me random pairs of the three items extracted from a
>> population of 3 items where item 1 has probability of being extracted of
>> 0.2, item 2 has 0.4, and 3 has 0.4. Also please remove extracted items once
>> extracted."
>>
>
> You are right, if that is what the user wants, numpy.random.choice does
> the right thing.
>
> I'm just wondering whether this is actually what users want, and whether
> they understand this is what they are getting.
>
> As I said, I expected it to generate pairs with, empirically, the desired
> distribution of individual items. The documentation of numpy.random.choice
> seemed to me (wrongly) to imply that that's what it does. So I was
> surprised to realize that it does not.
>

As Alessandro and you showed, the function returns something that makes
sense. If the user wants something different, then they need to look for a
different function, which is however difficult if it doesn't have a
solution in general.

Sounds to me a bit like a Monty Hall problem. Whether we like it or not, or
find it counter intuitive, it is what it is given the sampling scheme.

Having more sampling schemes would be useful, but it's not possible to
implement sampling schemes with impossible properties.

Josef



>
> Nadav.
>


Re: [Numpy-discussion] Question about numpy.random.choice with probabilties

2017-01-17 Thread josef . pktd
On Tue, Jan 17, 2017 at 6:58 PM, aleba...@gmail.com 
wrote:

>
>
> 2017-01-17 22:13 GMT+01:00 Nadav Har'El :
>
>>
>> On Tue, Jan 17, 2017 at 7:18 PM, aleba...@gmail.com 
>> wrote:
>>
>>> Hi Nadav,
>>>
>>> I may be wrong, but I think that the result of the current
>>> implementation is actually the expected one.
>>> Using your example: probabilities for items 1, 2 and 3 are: 0.2, 0.4 and
>>> 0.4
>>>
>>> P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2])
>>>
>>
>> Yes, this formula does fit well with the actual algorithm in the code.
>> But, my question is *why* we want this formula to be correct:
>>
>> Just a note: this formula is correct and it is one of the fundamental
> laws of statistics: https://en.wikipedia.org/wiki/Law_of_total_probability +
> https://en.wikipedia.org/wiki/Bayes%27_theorem
> Thus, the result we get from random.choice IMHO definitely makes sense. Of
> course, I think we could always discuss about implementing other sampling
> methods if they are useful to some application.
>
>
>>
>>> Now, P([1]) = 0.2 and P([2]) = 0.4. However:
>>> P([2] | 1st=[1]) = 0.5 (2 and 3 have the same sampling probability)
>>> P([1] | 1st=[2]) = 1/3 (1 and 3 have probability 0.2 and 0.4 that,
>>> once normalised, translate into 1/3 and 2/3 respectively)
>>> Therefore P([1,2]) = 0.7/3 = 0.2333
>>> Similarly, P([1,3]) = 0.2333 and P([2,3]) = 1.6/3 = 0.5333
>>>
>>
>> Right, these are the numbers that the algorithm in the current code, and
>> the formula above, produce:
>>
>> P([1,2]) = P([1,3]) = 0.2333
>> P([2,3]) = 0.5333
>>
>> What I'm puzzled about is that these probabilities do not really fulfill
>> the given probability vector 0.2, 0.4, 0.4...
>> Let me try to explain:
>>
>> Why did the user choose the probabilities 0.2, 0.4, 0.4 for the three
>> items in the first place?
>>
>> One reasonable interpretation is that the user wants in his random picks
>> to see item 1 half the time of item 2 or 3.
>> For example, maybe item 1 costs twice as much as item 2 or 3, so picking
>> it half as often will result in an equal expenditure on each item.
>>
>> If the user randomly picks the items individually (a single item at a
>> time), he indeed gets exactly this distribution: 0.2 of the time item 1,
>> 0.4 of the time item 2, 0.4 of the time item 3.
>>
>> Now, what happens if he picks not individual items, but pairs of
>> different items using numpy.random.choice with two items, replace=false?
>> Suddenly, the distribution of the individual items in the results get
>> skewed: If we look at the expected number of times we'll see each item in
>> one draw of a random pair, we will get:
>>
>> E(1) = P([1,2]) + P([1,3]) = 0.4667
>> E(2) = P([1,2]) + P([2,3]) = 0.7667
>> E(3) = P([1,3]) + P([2,3]) = 0.7667
>>
>> Or renormalizing by dividing by 2:
>>
>> P(1) = 0.23
>> P(2) = 0.38
>> P(3) = 0.38
>>
>> As you can see this is not quite the probabilities we wanted (which were
>> 0.2, 0.4, 0.4)! In the random pairs we picked, item 1 was used a bit more
>> often than we wanted, and item 2 and 3 were used a bit less often!
>>
>
> p is not the probability of the output but the one of the source finite
> population. I think that if you want to preserve that distribution, as
> Josef pointed out, you have to make extractions independent, that is either
> sample with replacement or approximate an infinite population (that is
> basically the same thing).  But of course in this case you will also end up
> with events [X,X].
>

With replacement and keeping duplicates, the results might also be similar
in the pattern of the marginal probabilities:
https://onlinecourses.science.psu.edu/stat506/node/17

Another approach in survey sampling is to drop duplicates in with-replacement
sampling, but then the sample size itself is random. (Again, I didn't try to
understand the small print.)

(Another related aside: the problem with discrete sample spaces in small
samples also shows up in calculating hypothesis tests, e.g. Fisher's exact
test or similar. Because we only get a few discrete possibilities in the
sample space, it is not possible to construct a test that has exactly the
desired type 1 error.)


Josef



>
>
>> So that brought my question of why we consider these numbers right.
>>
>> In this example, it's actually possible to get the right item
>> distribution, if we pick the pair outcomes with the following probabilities:
>>
>>    P([1,2]) = 0.2    (not 0.23 as above)
>>    P([1,3]) = 0.2
>>    P([2,3]) = 0.6    (not 0.53 as above)
>>
>> Then, we get exactly the right P(1), P(2), P(3): 0.2, 0.4, 0.4
>>
>> Interestingly, fixing things like I suggest is not always possible.
>> Consider a different probability-vector example for three items - 0.99,
>> 0.005, 0.005. Now, no matter which algorithm we use for randomly picking
>> pairs from these three items, *each* returned pair will inevitably contain
>> one of the two very-low-probability items, so each of those items will
>> appear in roughly half the pairs, instead of in a vanishingly small
>> percentage as we hoped.

Re: [Numpy-discussion] Question about numpy.random.choice with probabilties

2017-01-17 Thread josef . pktd
On Tue, Jan 17, 2017 at 4:13 PM, Nadav Har'El  wrote:
>
>
> On Tue, Jan 17, 2017 at 7:18 PM, aleba...@gmail.com 
wrote:
>>
>> Hi Nadav,
>>
>> I may be wrong, but I think that the result of the current
implementation is actually the expected one.
>> Using your example: probabilities for items 1, 2 and 3 are: 0.2, 0.4 and
0.4
>>
>> P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2])
>
>
> Yes, this formula does fit well with the actual algorithm in the code.
But, my question is *why* we want this formula to be correct:
>
>>
>> Now, P([1]) = 0.2 and P([2]) = 0.4. However:
>> P([2] | 1st=[1]) = 0.5 (2 and 3 have the same sampling probability)
>> P([1] | 1st=[2]) = 1/3 (1 and 3 have probability 0.2 and 0.4 that,
once normalised, translate into 1/3 and 2/3 respectively)
>> Therefore P([1,2]) = 0.7/3 = 0.2333
>> Similarly, P([1,3]) = 0.2333 and P([2,3]) = 1.6/3 = 0.5333
>
>
> Right, these are the numbers that the algorithm in the current code, and
the formula above, produce:
>
> P([1,2]) = P([1,3]) = 0.2333
> P([2,3]) = 0.5333
>
> What I'm puzzled about is that these probabilities do not really fulfill
the given probability vector 0.2, 0.4, 0.4...
> Let me try to explain:
>
> Why did the user choose the probabilities 0.2, 0.4, 0.4 for the three
items in the first place?
>
> One reasonable interpretation is that the user wants in his random picks
to see item 1 half the time of item 2 or 3.
> For example, maybe item 1 costs twice as much as item 2 or 3, so picking
it half as often will result in an equal expenditure on each item.
>
> If the user randomly picks the items individually (a single item at a
time), he indeed gets exactly this distribution: 0.2 of the time item 1,
0.4 of the time item 2, 0.4 of the time item 3.
>
> Now, what happens if he picks not individual items, but pairs of
different items using numpy.random.choice with two items, replace=false?
> Suddenly, the distribution of the individual items in the results get
skewed: If we look at the expected number of times we'll see each item in
one draw of a random pair, we will get:
>
> E(1) = P([1,2]) + P([1,3]) = 0.4667
> E(2) = P([1,2]) + P([2,3]) = 0.7667
> E(3) = P([1,3]) + P([2,3]) = 0.7667
>
> Or renormalizing by dividing by 2:
>
> P(1) = 0.23
> P(2) = 0.38
> P(3) = 0.38
>
> As you can see this is not quite the probabilities we wanted (which were
0.2, 0.4, 0.4)! In the random pairs we picked, item 1 was used a bit more
often than we wanted, and item 2 and 3 were used a bit less often!
>
> So that brought my question of why we consider these numbers right.
>
> In this example, it's actually possible to get the right item
distribution, if we pick the pair outcomes with the following probabilities:
>
>    P([1,2]) = 0.2    (not 0.23 as above)
>    P([1,3]) = 0.2
>    P([2,3]) = 0.6    (not 0.53 as above)
>
> Then, we get exactly the right P(1), P(2), P(3): 0.2, 0.4, 0.4
>
> Interestingly, fixing things like I suggest is not always possible.
Consider a different probability-vector example for three items - 0.99,
0.005, 0.005. Now, no matter which algorithm we use for randomly picking
pairs from these three items, *each* returned pair will inevitably contain
one of the two very-low-probability items, so each of those items will
appear in roughly half the pairs, instead of in a vanishingly small
percentage as we hoped.
>
> But in other choices of probabilities (like the one in my original
example), there is a solution. For 2-out-of-3 sampling we can actually write
down a system of three linear equations in three variables, so there is always
one solution; but if this solution has components not valid as probabilities
(not in [0,1]) we end up with no solution - as happens in the 0.99, 0.005,
0.005 example.


I think the underlying problem is that in the sampling space the events
(1,2), (1,3), (2,3) are correlated, and because of the discreteness an
arbitrary marginal distribution on the individual events 1, 2, 3 is not
possible.
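A quick simulation (my sketch, not from the thread) confirms the derived
pair probabilities for p = (0.2, 0.4, 0.4); items are labeled 0, 1, 2 here
instead of 1, 2, 3:

import numpy as np

rng = np.random.RandomState(0)
n_draws = 100000
counts = {}
for _ in range(n_draws):
    pair = tuple(sorted(rng.choice(3, size=2, replace=False,
                                   p=[0.2, 0.4, 0.4])))
    counts[pair] = counts.get(pair, 0) + 1
for pair, c in sorted(counts.items()):
    print(pair, c / float(n_draws))
# roughly (0, 1): 0.233, (0, 2): 0.233, (1, 2): 0.533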

related aside:
I'm not able (or willing) to spend the time on the math, but I just went
through something similar for survey sampling in finite populations (e.g.
survey two out of 3 individuals, where 3 is the population), leading to the
Horvitz–Thompson estimator. The books have chapters on different sampling
schemes and derivations of the marginal and joint probabilities of being
surveyed.

(I gave up on sampling without replacement, and assume we have a large
population where it doesn't make a difference.)

In some of the sampling schemes they pick sequentially and adjust the
probabilities for the remaining individuals. That seems to provide more
flexibility to create a desired or optimal sampling scheme.

Josef




>
>
>
>>
>> What am I missing?
>>
>> Alessandro
>>
>>
>> 2017-01-17 13:00 GMT+01:00 :
>>>
>>> Hi, I'm looking for a way to find a random sample of C different items
out
>>> of N items, with a 

Re: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve

2017-01-09 Thread josef . pktd
On Mon, Jan 9, 2017 at 6:27 AM, Ilhan Polat  wrote:

> > Note that you're proposing a new scipy feature (right?) on the numpy
> list
>
> > This sounds like a good idea to me. As a former heavy Matlab user I
> remember a lot of things to dislike, but "\" behavior was quite nice.
>
> Correct, I am not sure where this might go in. It seemed like a NumPy
> array operation (touching array elements rapidly etc. can also be added for
> similar functionalities other than solve) hence the NumPy list. But of
> course it can be pushed as an exclusive SciPy feature. I'm not sure what
> the outlook on np.linalg.solve is.
>
>
> > How much is a noticeable slowdown? Note that we still have the current
> interfaces available for users that know what they need, so a nice
> convenience function that is say 5-10% slower would not be the end of the
> world.
>
> the fastest case was around 150-400% slower but of course it might be the
> case that I'm not using the fastest methods. It was mostly shuffling things
> around and using np.any on them in the pure python3 case. I will cook up
> something again for the baseline as soon as I have time.
>
>
>
All these checks sound a bit expensive if we almost always have completely
unstructured arrays that don't satisfy any special matrix pattern.

In analogy to the type proliferation in Julia to handle those cases: is
there a way to attach information to numpy arrays that, for example, signals
that a 2d array is hermitian, banded, diagonal or ...?

(On second thought: maybe completely unstructured is not too expensive to
detect if the checks are short-circuited: one off-diagonal element nonzero -
not diagonal; two opposite off-diagonal elements different - not symmetric;
...)
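A rough sketch of the short-circuit idea (my illustration, not a proposed
implementation):

import numpy as np

def quick_structure_check(a, tol=1e-12):
    # probe element by element and bail out as soon as both structure
    # hypotheses are refuted, so unstructured input is cheap to reject
    n = a.shape[0]
    is_diagonal = True
    is_symmetric = True
    for i in range(n):
        for j in range(i + 1, n):
            if a[i, j] != 0.0 or a[j, i] != 0.0:
                is_diagonal = False
            if abs(a[i, j] - a[j, i]) > tol:
                is_symmetric = False
            if not (is_diagonal or is_symmetric):
                return 'general'   # early exit on the first counterexamples
    if is_diagonal:
        return 'diagonal'
    return 'symmetric' if is_symmetric else 'general'

a = np.random.RandomState(0).randn(500, 500)
print(quick_structure_check(a))   # 'general', after only a few elements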

Josef






Re: [Numpy-discussion] Deprecating matrices.

2017-01-06 Thread josef . pktd
On Fri, Jan 6, 2017 at 8:28 PM, Ralf Gommers  wrote:

>
>
> On Sat, Jan 7, 2017 at 2:21 PM, CJ Carey 
> wrote:
>
>>
>> On Fri, Jan 6, 2017 at 6:19 PM, Ralf Gommers 
>> wrote:
>>
>>> This sounds like a reasonable idea. Timeline could be something like:
>>>
>>> 1. Now: create new package, deprecate np.matrix in docs.
>>> 2. In say 1.5 years: start issuing visible deprecation warnings in numpy
>>> 3. After 2020: remove matrix from numpy.
>>>
>>> Ralf
>>>
>>
>> I think this sounds reasonable, and reminds me of the deliberate
>> deprecation process taken for scipy.weave. I guess we'll see how successful
>> it was when 0.19 is released.
>>
>> The major problem I have with removing numpy matrices is the effect on
>> scipy.sparse, which mostly-consistently mimics numpy.matrix semantics and
>> often produces numpy.matrix results when densifying. The two are coupled
>> tightly enough that if numpy matrices go away, all of the existing sparse
>> matrix classes will have to go at the same time.
>>
>> I don't think that would be the end of the world,
>>
>
> Not the end of the world literally, but the impact would be pretty major.
> I think we're stuck with scipy.sparse, and may at some point add a new
> sparse *array* implementation next to it. For scipy we will have to add a
> dependency on the new npmatrix package or vendor it.
>

That sounds to me like moving maintenance of numpy.matrix from numpy to
scipy, if scipy.sparse is one of the main users and still depends on it.

Josef



>
> Ralf
>
>
>
>> but it's definitely something that should happen while scipy is still
>> pre-1.0, if it's ever going to happen.
>>


Re: [Numpy-discussion] Deprecating matrices.

2017-01-02 Thread josef . pktd
On Mon, Jan 2, 2017 at 9:00 PM, Ralf Gommers  wrote:

>
>
> On Tue, Jan 3, 2017 at 2:36 PM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>> Hi All,
>>
>> Just throwing this click bait out for discussion. Now that the `@`
>> operator is available and things seem to be moving towards Python 3,
>> especially in the classroom, we should consider the real possibility of
>> deprecating the matrix type and later removing it. No doubt there are old
>> scripts that require them, but older versions of numpy are available for
>> those who need to run old scripts.
>>
>> Thoughts?
>>
>
> Clearly deprecate in the docs now, and warn only later imho. We can't warn
> before we have a good solution for scipy.sparse matrices, which have matrix
> semantics and return matrix instances.
>
> Ralf
>

How about dropping Python 2 support at the same time? Then we can all be in
an @ world.

Josef





Re: [Numpy-discussion] Numpy integers to integer powers again again

2016-10-26 Thread josef . pktd
On Wed, Oct 26, 2016 at 3:39 PM, Nathaniel Smith  wrote:

> On Wed, Oct 26, 2016 at 12:23 PM, Charles R Harris
>  wrote:
> [...]
> > What I have been concerned about are the following combinations that
> > currently return floats:
> >
> > num: , exp: , res:  > 'numpy.float32'>
> > num: , exp: , res:  > 'numpy.float32'>
> > num: , exp: , res:  > 'numpy.float32'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
>
> What's this referring to? For both arrays and scalars I get:
>
> In [8]: (np.array(2, dtype=np.int8) ** np.array(2, dtype=np.int8)).dtype
> Out[8]: dtype('int8')
>
> In [9]: (np.int8(2) ** np.int8(2)).dtype
> Out[9]: dtype('int8')
>


>>> (np.array([2], dtype=np.int8) ** np.array(-1, dtype=np.int8).squeeze()).dtype
dtype('int8')
>>> (np.array([2], dtype=np.int8)[0] ** np.array(-1, dtype=np.int8).squeeze()).dtype
dtype('float32')

>>> (np.int8(2)**np.int8(-1)).dtype
dtype('float32')
>>> (np.int8(2)**np.int8(2)).dtype
dtype('int8')

The last one looks like a value-dependent scalar dtype.

Josef


>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org


Re: [Numpy-discussion] Numpy integers to integer powers again again

2016-10-26 Thread josef . pktd
On Wed, Oct 26, 2016 at 3:23 PM, Charles R Harris  wrote:

>
>
> On Tue, Oct 25, 2016 at 10:14 AM, Stephan Hoyer  wrote:
>
>> I am also concerned about adding more special cases for NumPy scalars vs
>> arrays. These cases are already confusing (e.g., making no distinction
>> between 0d arrays and scalars) and poorly documented.
>>
>> On Mon, Oct 24, 2016 at 4:30 PM, Nathaniel Smith  wrote:
>>
>>> On Mon, Oct 24, 2016 at 3:41 PM, Charles R Harris
>>>  wrote:
>>> > Hi All,
>>> >
>>> > I've been thinking about this some (a lot) more and have an alternate
>>> > proposal for the behavior of the `**` operator
>>> >
>>> > if both base and power are numpy/python scalar integers, convert to
>>> > python integers and call the `**` operator. That would solve both the
>>> > precision and compatibility problems and I think is the option of least
>>> > surprise. For those who need type preservation and modular arithmetic,
>>> > the np.power function remains, although the type conversions can be
>>> > surprising as it seems that the base and power should play different
>>> > roles in determining the type, at least to me.
>>> > Arrays, 0-d or not, are treated differently from scalars, and integers
>>> > raised to negative integer powers always raise an error.
>>> >
>>> > I think this solves most problems and would not be difficult to
>>> implement.
>>> >
>>> > Thoughts?
>>>
>>> My main concern about this is that it adds more special cases to numpy
>>> scalars, and a new behavioral deviation between 0d arrays and scalars,
>>> when ideally we should be trying to reduce the
>>> duplication/discrepancies between these. It's also inconsistent with
>>> how other operations on integer scalars work, e.g. regular addition
>>> overflows rather than promoting to Python int:
>>>
>>> In [8]: np.int64(2 ** 63 - 1) + 1
>>> /home/njs/.user-python3.5-64bit/bin/ipython:1: RuntimeWarning:
>>> overflow encountered in long_scalars
>>>   #!/home/njs/.user-python3.5-64bit/bin/python3.5
>>> Out[8]: -9223372036854775808
>>>
>>> So I'm inclined to try and keep it simple, like in your previous
>>> proposal... theoretically of course it would be nice to have the
>>> perfect solution here, but at this point it feels like we might be
>>> overthinking this trying to get that last 1% of improvement. The thing
>>> where 2 ** -1 returns 0 is just broken and bites people so we should
>>> definitely fix it, but beyond that I'm not sure it really matters
>>> *that* much what we do, and "special cases aren't special enough to
>>> break the rules" and all that.
>>>
>>>
> What I have been concerned about are the following combinations that
> currently return floats:
>
> [list of num/exp dtype combinations, garbled in the archive; the first
> three show res: numpy.float32, the rest res: numpy.float64]
>
> The other combinations of signed and unsigned integers to signed powers
> currently raise ValueError due to the change to the power ufunc. The
> exceptions that aren't covered by uint64 + signed (which won't change) seem
> to occur when the exponent can be safely cast to the base type. I suspect
> that people have already come to depend on that, especially as python
> integers on 64 bit linux convert to int64. So in those cases we should
> perhaps raise a FutureWarning instead of an error.
>


>>> np.int64(2)**np.array(-1, np.int64)
0.5
>>> np.__version__
'1.10.4'
>>> np.int64(2)**np.array([-1, 2], np.int64)
array([0, 4], dtype=int64)
>>> np.array(2, np.uint64)**np.array([-1, 2], np.int64)
array([0, 4], dtype=int64)
>>> np.array([2], np.uint64)**np.array([-1, 2], np.int64)
array([ 0.5,  4. ])
>>> np.array([2], np.uint64).squeeze()**np.array([-1, 2], np.int64)
array([0, 4], dtype=int64)


(IMO: If you have to break backwards compatibility, break forwards not
backwards.)


Josef
http://www.stanlaurelandoliverhardy.com/nicemess.htm



>
> Chuck
>


Re: [Numpy-discussion] how to name "contagious" keyword in np.ma.convolve

2016-10-18 Thread josef . pktd
On Tue, Oct 18, 2016 at 1:30 PM,   wrote:
> On Tue, Oct 18, 2016 at 1:25 PM,   wrote:
>> On Mon, Oct 17, 2016 at 1:01 PM, Pierre Haessig
>>  wrote:
>>> Hi,
>>>
>>>
>>> On 16/10/2016 at 11:52, Hanno Klemm wrote:
 When I have similar situations, I usually interpolate between the valid 
 values. I assume there are a lot of use cases for convolutions but I have 
 difficulties imagining that ignoring a missing value and, for the purpose 
 of the computation, treating it as zero is useful in many of them.
>>> When estimating the autocorrelation of a signal, it makes sense to drop
>>> missing pairs of values. Only in this use case, it opens the question of
>>> correcting or not for the number of missing elements when
>>> computing the mean. I don't remember what the R function "acf" is doing.
>
> as aside: statsmodels has now an option for acf and similar
>
> missing : str
>     A string in ['none', 'raise', 'conservative', 'drop'] specifying
>     how the NaNs are to be treated.

aside to the aside: statsmodels was just catching up in this

The original for masked-array acf, including correct counting of "valid" terms, is

https://github.com/pierregm/scikits.timeseries/blob/master/scikits/timeseries/lib/avcf.py

(which I looked at way before statsmodels had any acf)

Josef

>
> Josef
>
>>>
>>>
>>> Also, coming back to the initial question, I feel that it is necessary
>>> that the name "mask" (or "na" or similar) appears in the parameter name.
>>> Otherwise, people will wonder : "what on earth is contagious/being
>>> propagated"
>>>
>>> just thinking of yet another keyword name  : ignore_masked (or drop_masked)
>>>
>>> If I remember well, in R it is dropna. It would be nice if the boolean
>>> switch followed the same logic.
>>>
>>> Now of course the convolution function is more general than just
>>> autocorrelation...
>>
>> I think "drop" or "ignore" is too generic, for correlation it would be
>> for example ignore pairs versus ignore cases.
>>
>> To me propagate sounds ok, but something with `valid` might be
>> more explicit for convolution or `correlate`, however `valid` also
>> refers to the end points, so maybe valid_na or valid_masked=True
>>
>> Josef
>>
>>>
>>> best,
>>> Pierre
>>>
>>>


Re: [Numpy-discussion] how to name "contagious" keyword in np.ma.convolve

2016-10-18 Thread josef . pktd
On Tue, Oct 18, 2016 at 1:25 PM,   wrote:
> On Mon, Oct 17, 2016 at 1:01 PM, Pierre Haessig
>  wrote:
>> Hi,
>>
>>
>> On 16/10/2016 at 11:52, Hanno Klemm wrote:
>>> When I have similar situations, I usually interpolate between the valid 
>>> values. I assume there are a lot of use cases for convolutions but I have 
>>> difficulties imagining that ignoring a missing value and, for the purpose 
>>> of the computation, treating it as zero is useful in many of them.
>> When estimating the autocorrelation of a signal, it makes sense to drop
>> missing pairs of values. Only in this use case, it opens the question of
>> correcting or not for the number of missing elements when
>> computing the mean. I don't remember what the R function "acf" is doing.

as aside: statsmodels has now an option for acf and similar

missing : str
    A string in ['none', 'raise', 'conservative', 'drop'] specifying
    how the NaNs are to be treated.

Josef

>>
>>
>> Also, coming back to the initial question, I feel that it is necessary
>> that the name "mask" (or "na" or similar) appears in the parameter name.
>> Otherwise, people will wonder : "what on earth is contagious/being
>> propagated"
>>
>> just thinking of yet another keyword name  : ignore_masked (or drop_masked)
>>
>> If I remember well, in R it is dropna. It would be nice if the boolean
>> switch followed the same logic.
>>
>> Now of course the convolution function is more general than just
>> autocorrelation...
>
> I think "drop" or "ignore" is too generic, for correlation it would be
> for example ignore pairs versus ignore cases.
>
> To me propagate sounds ok, but something with `valid` might be
> more explicit for convolution or `correlate`, however `valid` also
> refers to the end points, so maybe valid_na or valid_masked=True
>
> Josef
>
>>
>> best,
>> Pierre
>>
>>


Re: [Numpy-discussion] how to name "contagious" keyword in np.ma.convolve

2016-10-18 Thread josef . pktd
On Mon, Oct 17, 2016 at 1:01 PM, Pierre Haessig
 wrote:
> Hi,
>
>
> On 16/10/2016 at 11:52, Hanno Klemm wrote:
>> When I have similar situations, I usually interpolate between the valid 
>> values. I assume there are a lot of use cases for convolutions but I have 
>> difficulties imagining that ignoring a missing value and, for the purpose of 
>> the computation, treating it as zero is useful in many of them.
> When estimating the autocorrelation of a signal, it makes sense to drop
> missing pairs of values. Only in this use case, it opens the question of
> correcting or not for the number of missing elements when
> computing the mean. I don't remember what the R function "acf" is doing.
>
>
> Also, coming back to the initial question, I feel that it is necessary
> that the name "mask" (or "na" or similar) appears in the parameter name.
> Otherwise, people will wonder : "what on earth is contagious/being
> propagated"
>
> just thinking of yet another keyword name  : ignore_masked (or drop_masked)
>
> If I remember well, in R it is dropna. It would be nice if the boolean
> switch followed the same logic.
>
> Now of course the convolution function is more general than just
> autocorrelation...

I think "drop" or "ignore" is too generic, for correlation it would be
for example ignore pairs versus ignore cases.

To me propagate sounds ok to me, but something with `valid` might be
more explicit for convolution or `correlate`, however `valid` also
refers to the end points, so maybe valid_na or valid_masked=True
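For the record, the keyword eventually shipped as `propagate_mask` in
np.ma.convolve / np.ma.correlate; a small sketch against the released API
(my illustration, not part of this thread):

import numpy as np
import numpy.ma as ma

a = ma.array([1.0, 1.0, 1.0, 1.0], mask=[False, True, False, False])
v = np.ones(2)
# True: the mask spreads to every output element touched by a masked input
print(ma.convolve(a, v, propagate_mask=True))
# False: masked input values are ignored instead of contaminating the output
print(ma.convolve(a, v, propagate_mask=False))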

Josef

>
> best,
> Pierre
>
>


Re: [Numpy-discussion] Integers to negative integer powers, time for a decision.

2016-10-07 Thread josef . pktd
On Fri, Oct 7, 2016 at 9:12 PM, Charles R Harris
 wrote:
> Hi All,
>
> The time for NumPy 1.12.0 approaches and I'd like to have a final decision
> on the treatment of integers to negative integer powers with the `**`
> operator. The two alternatives looked to be:
>
> Raise an error for arrays and numpy scalars, including 1 and -1 to negative
> powers.
>
> Pluses
>
> Backward compatible
> Allows common powers to be integer, e.g., arange(3)**2
> Consistent with inplace operators
> Fixes current wrong behavior.
> Preserves type
>
>
> Minuses
>
> Integer overflow
> Computational inconvenience
> Inconsistent with Python integers
>
>
> Always return a float
>
> Pluses
>
> Computational convenience
>
>
> Minuses
>
> Loss of type
> Possible backward incompatibilities
> Not applicable to inplace operators
>
>
>
> Thoughts?

2: +1
I'm still in favor of number 2: less buggy code and less mental
gymnastics (watch out for that int, or which int do I need).

(Upcasting is not applicable for any inplace operators, AFAIU: *= 0.5?

zz = np.arange(5)
zz**(-1)
zz *= 0.5

tried in
>>> np.__version__
'1.9.2rc1'
>>> np.__version__
'1.10.4'
backwards compatibility?)


Josef


>
> Chuck


Re: [Numpy-discussion] Which NumPy/Numpy/numpy spelling?

2016-08-29 Thread josef . pktd
https://mail.scipy.org/pipermail/scipy-user/2010-June/025756.html

"NumPy and SciPy to refer to the projects. numpy and scipy to refer to
the packages, specifically. When in doubt, use the former."

I thought there was also another discussion about capital letters, but I
can't find it.

Josef


On Mon, Aug 29, 2016 at 9:22 AM, Bartosz Telenczuk  wrote:
> Hi,
>
> I would not mind any choice as long as it's consistent.
>
> I agree that using all-lowercase spelling may avoid some common errors. 
> However,
> PEP8 requires all module/package names to be lower case [1].  If we force the
> name of the library and the corresponding package to be the same, all Python
> libraries would be named in lowercase. This would not be the best choice for
> libraries, which have multi-component names (like NumPy = Numerical Python).
>
> Note also that both the Wikipedia page [2] and the official NumPy logo [3] use
> "NumPy" spelling.
>
> Some other popular [4] libraries use similar dichotomies:
>
> - Django - import django
> - Cython - import cython
> - PyYAML - import yaml
> - scikit-learn - import sklearn
>
> On the other hand all standard Python libraries are lower-case named.
>
> Cheers,
>
> Bartosz
>
> [1] https://www.python.org/dev/peps/pep-0008/#package-and-module-names
> [2] https://en.wikipedia.org/wiki/NumPy
> [3] http://www.numpy.org/_static/numpy_logo.png
> [4] http://pypi-ranking.info/alltime
>
>> On 08/29/2016 07:43 AM, m...@telenczuk.pl wrote:
>> > What is the official spelling of NumPy/Numpy/numpy?
>>
>> IMHO it should be written numpy, because ...
>>
>>  >>> import NumPy
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> ImportError: No module named NumPy
>>  >>> import Numpy
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> ImportError: No module named Numpy
>>  >>> import numpy
>>  >>>
>>
>> Phil


Re: [Numpy-discussion] StackOverflow documentation

2016-07-21 Thread josef . pktd
On Thu, Jul 21, 2016 at 3:10 PM, Pauli Virtanen  wrote:
> Thu, 21 Jul 2016 16:24:15 +0200, mail kirjoitti:
> [clip]
>> Since it is run by the community, perhaps it's not a bad idea to
>> encourage people to share their examples.
>
> I would perhaps rather encourage people to improve the "Numpy User Guide"
> or the main documentation. Of course, working on that requires a somewhat
> different level of commitment than editing what's essentially a
> Stackexchange-provided wiki (where content gets relicensed with
> attribution clauses where you have to reference stackexchange and not the
> original author directly).


The way it looks right now, it is more of an example collection by
topic, similar to the old scipy wiki examples for numpy.
http://stackoverflow.com/documentation/python/196/comprehensions#t=201607212153155953853

It doesn't look like it will be a substitute for proper API
documentation, i.e. docstrings in the numpy standard.
Longer examples or recipes on stackoverflow also have the problem that
the code doesn't get a different license and falls under the
documentation license.
So I don't think it will become a substitute for good notebook or
blog post examples either.

(The license is the usual one-way street: they copy, we cannot copy back
under our license.)

Josef

>
> --
> Pauli Virtanen
>


Re: [Numpy-discussion] Added atleast_nd, request for clarification/cleanup of atleast_3d

2016-07-06 Thread josef . pktd
On Wed, Jul 6, 2016 at 3:29 AM,  wrote:

>
>
> On Wed, Jul 6, 2016 at 2:21 AM, Ralf Gommers 
> wrote:
>
>>
>>
>> On Wed, Jul 6, 2016 at 7:06 AM, Nathaniel Smith  wrote:
>>
>> On Jul 5, 2016 9:09 PM, "Joseph Fox-Rabinovitz" 
>>> wrote:
>>> >
>>> > Hi,
>>> >
>>> > I have generalized np.atleast_1d, np.atleast_2d, np.atleast_3d with a
>>> > function np.atleast_nd in PR#7804
>>> > (https://github.com/numpy/numpy/pull/7804).
>>> >
>>> > As a result of this PR, I have a couple of questions about
>>> > `np.atleast_3d`. `np.atleast_3d` appears to do something weird with
>>> > the dimensions: If the input is 1D, it prepends and appends a size-1
>>> > dimension. If the input is 2D, it appends a size-1 dimension. This is
>>> > inconsistent with `np.atleast_2d`, which always prepends (as does
>>> > `np.atleast_nd`).
>>> >
>>> >   - Is there any reason for this behavior?
>>> >   - Can it be cleaned up (e.g., by reimplementing `np.atleast_3d` in
>>> > terms of `np.atleast_nd`, which is actually much simpler)? This would
>>> > be a slight API change since the output would not be exactly the same.
>>>
>>> Changing atleast_3d seems likely to break a bunch of stuff...
>>>
>>> Beyond that, I find it hard to have an opinion about the best design for
>>> these functions, because I don't think I've ever encountered a situation
>>> where they were actually what I wanted. I'm not a big fan of coercing
>>> dimensions in the first place, for the usual "refuse to guess" reasons. And
>>> then generally if I do want to coerce an array to another dimension, then I
>>> have some opinion about where the new dimensions should go, and/or I have
>>> some opinion about the minimum acceptable starting dimension, and/or I have
>>> a maximum dimension in mind. (E.g. "coerce 1d inputs into a column matrix;
>>> 0d or 3d inputs are an error" -- atleast_2d is zero-for-three on that
>>> requirements list.)
>>>
>>> I don't know how typical I am in this. But it does make me wonder if the
>>> atleast_* functions act as an attractive nuisance, where new users take
>>> their presence as an implicit recommendation that they are actually a
>>> useful thing to reach for, even though they... aren't that. And maybe we
>>> should be recommending folk move away from them rather than trying to
>>> extend them further?
>>>
>>> Or maybe they're totally useful and I'm just missing it. What's your use
>>> case that motivates atleast_nd?
>>>
>> I think you're just missing it:) atleast_1d/2d are used quite a bit in
>> Scipy and Statsmodels (those are the only ones I checked), and in the large
>> majority of cases it's the best thing to use there. There's a bunch of
>> atleast_2d calls with a transpose appended because the input needs to be
>> treated as columns instead of rows, but that's still efficient and readable
>> enough.
>>
>
>
> As Ralf pointed out, it's used in statsmodels. I do find them useful as
> replacements for several lines of ifs and reshapes.
>
> We still need in many cases an atleast_2d_cols that appends the newaxis
> if necessary,
>
> roughly the equivalent of
>
> if x.ndim == 1:
>     x = x[:, None]
> else:
>     x = np.atleast_2d(x)
>
> Josef
>
>
>>
>> For 3D/nD I can see that you'd need more control over where the
>> dimensions go, but 1D/2D are fine.
>>
>

statsmodels currently has very little code with ndim > 2, so I have no
overview of possible use cases, but it would be necessary to have full
control over the added axis, since axes have a strict meaning and stats
still prefers Fortran order to the default numpy/C ordering.

Josef



>
>>
>> Ralf
>>
>>


Re: [Numpy-discussion] Added atleast_nd, request for clarification/cleanup of atleast_3d

2016-07-06 Thread josef . pktd
On Wed, Jul 6, 2016 at 2:21 AM, Ralf Gommers  wrote:

>
>
> On Wed, Jul 6, 2016 at 7:06 AM, Nathaniel Smith  wrote:
>
> On Jul 5, 2016 9:09 PM, "Joseph Fox-Rabinovitz" 
>> wrote:
>> >
>> > Hi,
>> >
>> > I have generalized np.atleast_1d, np.atleast_2d, np.atleast_3d with a
>> > function np.atleast_nd in PR#7804
>> > (https://github.com/numpy/numpy/pull/7804).
>> >
>> > As a result of this PR, I have a couple of questions about
>> > `np.atleast_3d`. `np.atleast_3d` appears to do something weird with
>> > the dimensions: If the input is 1D, it prepends and appends a size-1
>> > dimension. If the input is 2D, it appends a size-1 dimension. This is
>> > inconsistent with `np.atleast_2d`, which always prepends (as does
>> > `np.atleast_nd`).
>> >
>> >   - Is there any reason for this behavior?
>> >   - Can it be cleaned up (e.g., by reimplementing `np.atleast_3d` in
>> > terms of `np.atleast_nd`, which is actually much simpler)? This would
>> > be a slight API change since the output would not be exactly the same.
>>
>> Changing atleast_3d seems likely to break a bunch of stuff...
>>
>> Beyond that, I find it hard to have an opinion about the best design for
>> these functions, because I don't think I've ever encountered a situation
>> where they were actually what I wanted. I'm not a big fan of coercing
>> dimensions in the first place, for the usual "refuse to guess" reasons. And
>> then generally if I do want to coerce an array to another dimension, then I
>> have some opinion about where the new dimensions should go, and/or I have
>> some opinion about the minimum acceptable starting dimension, and/or I have
>> a maximum dimension in mind. (E.g. "coerce 1d inputs into a column matrix;
>> 0d or 3d inputs are an error" -- atleast_2d is zero-for-three on that
>> requirements list.)
>>
>> I don't know how typical I am in this. But it does make me wonder if the
>> atleast_* functions act as an attractive nuisance, where new users take
>> their presence as an implicit recommendation that they are actually a
>> useful thing to reach for, even though they... aren't that. And maybe we
>> should be recommending folk move away from them rather than trying to
>> extend them further?
>>
>> Or maybe they're totally useful and I'm just missing it. What's your use
>> case that motivates atleast_nd?
>>
> I think you're just missing it:) atleast_1d/2d are used quite a bit in
> Scipy and Statsmodels (those are the only ones I checked), and in the large
> majority of cases it's the best thing to use there. There's a bunch of
> atleast_2d calls with a transpose appended because the input needs to be
> treated as columns instead of rows, but that's still efficient and readable
> enough.
>


As Ralf pointed out, they are used in statsmodels. I do find them useful as
a replacement for several lines of ifs and reshapes.

We still need in many cases an atleast_2d_cols that appends the new axis
if necessary.

roughly the equivalent of

if x.ndim == 1:
    x = x[:, None]
else:
    x = np.atleast_2d(x)

Josef


>
> For 3D/nD I can see that you'd need more control over where the dimensions
> go, but 1D/2D are fine.
>
> Ralf
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.dtype has the wrong size, try recompiling

2016-07-05 Thread josef . pktd
On Tue, Jul 5, 2016 at 4:22 AM, Jaime Fernández del Río <
jaime.f...@gmail.com> wrote:

> This question on Stack Overflow:
>
>
> http://stackoverflow.com/questions/38197086/sklearn-numpy-dtype-has-the-wrong-size-try-recompiling-in-both-pycharm-and-te
>
> If I remember correctly this has something to do with Cython, right? Can't
> remember the details though, can someone help that poor soul out?
>


from what I understood many years ago (based mainly on what David
Cournapeau was saying)
http://stackoverflow.com/questions/17709641/valueerror-numpy-dtype-has-the-wrong-size-try-recompiling

Josef


>
> Jaime
>
> --
> (\__/)
> ( O.o)
> ( > <) This is Bunny. Copy Bunny into your signature and help him with his
> plans for world domination.
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key

2016-07-05 Thread josef . pktd
On Tue, Jul 5, 2016 at 1:03 AM, Juan Nunez-Iglesias 
wrote:

> On 4 July 2016 at 7:27:47 PM, Skip Montanaro (skip.montan...@gmail.com)
> wrote:
>
> Hashing it probably wouldn't work, too
> great a chance for collisions.
>
>
> If the string is ASCII, you can always interpret the bytes as part of an 8
> byte integer. Or, you can map unique values to consecutive integers.
>
IIUC,

np.nonzero(a[1:] != a[:-1])   gives all changes independent of dtype; add or
remove a 1 to adjust which element is indexed.

(IIRC from a long time ago, arraysetops used/uses something like this.)
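
As a sketch of using that to pick first/last occurrences (a 1d array of keys
for simplicity, assuming it is sorted, or at least grouped, by key):

import numpy as np

a = np.array(['a', 'a', 'b', 'b', 'b', 'c'])
change = np.nonzero(a[1:] != a[:-1])[0]   # index just before each key change
first = np.r_[0, change + 1]              # first occurrence of each key
last = np.r_[change, len(a) - 1]          # last occurrence of each key
# a[first] -> ['a' 'b' 'c'],  a[last] -> ['a' 'b' 'c']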

Josef


>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-20 Thread josef . pktd
On Mon, Jun 20, 2016 at 4:31 PM, Alan Isaac  wrote:
> On 6/13/2016 1:54 PM, Marten van Kerkwijk wrote:
>>
>> 1. What in principle is the best return type for int ** int (which
>> Josef I think most properly rephrased as whether `**` should be
>> thought of as a float operator, like `/` in python3 and `sqrt` etc.);
>
>
>
> Perhaps the question is somewhat different.  Maybe it is: what type
> should a user expect when the exponent is a Python int?  The obvious
> choices seem to be an object array of Python ints, or an array of
> floats.  So far, nobody has proposed the former, and concerns have
> been expressed about the latter.  More important, either would break
> the rule that the scalar type is not important in array operations,
> which seems like a good general rule (useful and easy to remember).
>
> How much commitment is there to such a rule?  E.g.,
> np.int64(2**7)*np.arange(5,dtype=np.int8)
> violates this.  One thing that has come out of this
> discussion for me is that the actual rules in play are
> hard to keep track of.  Are they all written down in
> one place?
>
> I suspect there is general support for the idea that if someone
> explicitly specifies the same dtype for the base and the
> exponent then the result should also have that dtype.
> I think this is already true for array exponentiation
> and for scalar exponentiation.
>
> One other thing that a user might expect, I believe, is that
> any type promotion rules for scalars and arrays will be the same.
> This is not currently the case, and that feels like an
> inconsistency.  But is it an inconsistency?  If the rule is that
> that array type dominates the scalar type, that may
> be understandable, but then it should be a firm rule.
> In this case, an exponent that is a Python int should not
> affect the dtype of the (array) result.
>
> In sum, as a user, I've come around to Chuck's original proposal:
> integers raised to negative integer powers raise an error.
> My reason for coming around is that I believe it meshes
> well with a general rule that in binary operations the
> scalar dtypes should not influence the dtype of an array result.
> Otoh, it is unclear to me how much commitment there is to that rule.
>
> Thanks in advance to anyone who can help me understand better
> the issues in play.

the main thing I get out of the discussion in this thread is that this
is way too complicated.

Which ints do I have?

Is it Python or one of the many numpy int types, or two different
(u)int types, or maybe one is a scalar so it shouldn't count?


scalar dominates here

>>> (np.ones(5, np.int8) *1.0).dtype
dtype('float64')

otherwise a huge amount of code would be broken that uses the *1. trick

Josef


>
> Cheers,
> Alan Isaac
>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-13 Thread josef . pktd
On Mon, Jun 13, 2016 at 12:07 PM,  wrote:

>
>
> On Mon, Jun 13, 2016 at 11:51 AM,  wrote:
>
>>
>>
>> On Mon, Jun 13, 2016 at 11:25 AM, Antoine Pitrou 
>> wrote:
>>
>>> On Mon, 13 Jun 2016 10:49:44 -0400
>>> josef.p...@gmail.com wrote:
>>> >
>>> > My argument is that `**` is like integer division and sqrt where the
>>> domain
>>> > where integer return are the correct numbers is too small to avoid
>>> > headaches by users.
>>>
>>> float64 has less integer precision than int64:
>>>
>>> >>> math.pow(3, 39) == 3**39
>>> False
>>> >>> np.int64(3)**39 == 3**39
>>> True
>>>
>>
>> but if a user does this, then ???  (headaches or head scratching)
>>
>> >>> np.array([3])**39
>> RuntimeWarning: invalid value encountered in power
>>
>> array([-2147483648], dtype=int32)
>>
>
> I forgot to add
>
> the real headaches start in the second call, when we don't get the
> RuntimeWarning anymore
>
> >>> np.array([4])**39
> array([-2147483648], dtype=int32)
>
>
> ("Now, why do I owe so much money, when I made a huge profit all year." )
>

(grumpy off-topic complaint:
The Canadian tax system is like this. They make a mistake in transferring
information to a new computerized system, and then they send a bill for
taxes based on reassessment of something that happened 5 years ago because
their computerized record is wrong.
)



>
> Josef
>
>
>
>>
>> Josef
>>
>>
>>>
>>>
>>> (as a sidenote, np.float64's equality operator seems to be slightly
>>> broken:
>>>
>>> >>> np.float64(3)**39 == 3**39
>>> True
>>> >>> int(np.float64(3)**39) == 3**39
>>> False
>>> >>> float(np.float64(3)**39) == 3**39
>>> False
>>> )
>>>
>>> Regards
>>>
>>> Antoine.
>>>
>>>
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>
>>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-13 Thread josef . pktd
On Mon, Jun 13, 2016 at 11:51 AM,  wrote:

>
>
> On Mon, Jun 13, 2016 at 11:25 AM, Antoine Pitrou 
> wrote:
>
>> On Mon, 13 Jun 2016 10:49:44 -0400
>> josef.p...@gmail.com wrote:
>> >
>> > My argument is that `**` is like integer division and sqrt where the
>> domain
>> > where integer return are the correct numbers is too small to avoid
>> > headaches by users.
>>
>> float64 has less integer precision than int64:
>>
>> >>> math.pow(3, 39) == 3**39
>> False
>> >>> np.int64(3)**39 == 3**39
>> True
>>
>
> but if a user does this, then ???  (headaches or head scratching)
>
> >>> np.array([3])**39
> RuntimeWarning: invalid value encountered in power
>
> array([-2147483648], dtype=int32)
>

I forgot to add

the real headaches start in the second call, when we don't get the
RuntimeWarning anymore

>>> np.array([4])**39
array([-2147483648], dtype=int32)


("Now, why do I owe so much money, when I made a huge profit all year." )

Josef



>
> Josef
>
>
>>
>>
>> (as a sidenote, np.float64's equality operator seems to be slightly
>> broken:
>>
>> >>> np.float64(3)**39 == 3**39
>> True
>> >>> int(np.float64(3)**39) == 3**39
>> False
>> >>> float(np.float64(3)**39) == 3**39
>> False
>> )
>>
>> Regards
>>
>> Antoine.
>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-13 Thread josef . pktd
On Mon, Jun 13, 2016 at 11:25 AM, Antoine Pitrou 
wrote:

> On Mon, 13 Jun 2016 10:49:44 -0400
> josef.p...@gmail.com wrote:
> >
> > My argument is that `**` is like integer division and sqrt where the
> domain
> > where integer return are the correct numbers is too small to avoid
> > headaches by users.
>
> float64 has less integer precision than int64:
>
> >>> math.pow(3, 39) == 3**39
> False
> >>> np.int64(3)**39 == 3**39
> True
>

but if a user does this, then ???  (headaches or head scratching)

>>> np.array([3])**39
RuntimeWarning: invalid value encountered in power

array([-2147483648], dtype=int32)

Josef


>
>
> (as a sidenote, np.float64's equality operator seems to be slightly
> broken:
>
> >>> np.float64(3)**39 == 3**39
> True
> >>> int(np.float64(3)**39) == 3**39
> False
> >>> float(np.float64(3)**39) == 3**39
> False
> )
>
> Regards
>
> Antoine.
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-13 Thread josef . pktd
On Mon, Jun 13, 2016 at 10:05 AM, Alan Isaac  wrote:

> On 6/13/2016 4:47 AM, Antoine Pitrou wrote:
>
>> Currently, the choice is simple: if you want an int output,
>> have an int input; if you want a float output, have a float output.
>>
>
>
> That is a misunderstanding, which may be influencing the discussion.
> Examples of complications:
>
> >>> type(np.int8(2)**2)
> 
> >>> type(np.uint64(2)**np.int8(2))
> 
>
> I don't think anyone has proposed first principles
> from which the desirable behavior could be deduced.
> I do think reference to the reasoning used by other
> languages in making this decision could be helpful.


I think the main principle is whether an operator is a "float" operator.

for example, I don't think anyone would expect sqrt(int) to return int,
even if it would have exact results in a countable infinite number of cases
(theoretically)

another case is division which moved from return-int to return-float
definition in the py2 - py3 move.

My argument is that `**` is like integer division and sqrt where the domain
where integer return are the correct numbers is too small to avoid
headaches for users.
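
For comparison, plain Python already treats these as "float" operators:

>>> 2 ** -2     # int ** negative int returns float in Python
0.25
>>> 9 ** 0.5    # like sqrt, the result is naturally float
3.0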

Josef


>
>
> Alan Isaac
> (on Windows)
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling

2016-06-11 Thread josef . pktd
On Sat, Jun 11, 2016 at 2:49 PM, Mark Gawron  wrote:

> Thanks, Josef.  This is very helpful.  And I will direct this
> to one of the other mailing lists, once I read the previous posts.
>
> Regarding your remark:  Maybe I'm having a terminology problem.  It seems
> to me once you do
>
> osm = dist.ppf(osm_uniform)
>>>
>>>
> you’re back in the value space for the particular distribution. So this
> gives you known probability intervals, but not UNIFORM probability
> intervals (the interval between 0 and 1 STD covers a bigger prob interval
> than the interval between 1 and 2).  And the idea of a quantile is
> that it’s a division point in a UNIFORM division of the probability axis.
>


Yes and no: quantiles, i.e. what you get from ppf, are in units of the random
variable. So it is on the scale of the random variable, not on a probability
scale. The axis labels are in units of the random variable.

pp-plots have probabilities on the axes and are uniformly scaled in
probabilities but non-uniformly in the values of the random variable.

The difficult part to follow is when the plot is drawn uniformly in one scale,
but the axes are labeled non-uniformly in the other scale. That's what Paul's
probscale does and what you have in mind, AFAIU.
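
For reference, a sketch with scipy's probplot, which returns the theoretical
quantiles in the units of the random variable:

from scipy import stats

x = stats.norm.rvs(size=200, random_state=0)
(osm, osr), (slope, intercept, r) = stats.probplot(x, dist='norm')
# osm: theoretical quantiles (units of the variable), osr: sorted data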

Josef


>
> Mark
>
> On Jun 11, 2016, at 10:03 AM, josef.p...@gmail.com wrote:
>
>
>
> On Sat, Jun 11, 2016 at 8:53 AM, Ralf Gommers 
> wrote:
>
>> Hi Mark,
>>
>> Note that the scipy-dev or scipy-user mailing list would have been more
>> appropriate for this question.
>>
>>
>> On Fri, Jun 10, 2016 at 9:06 AM, Mark Gawron 
>> wrote:
>>
>>>
>>>
>>> The scipy.stats.qqplot and scipy.stats.probplot  functions plot expected
>>> values versus actual data values for visualization of fit to a
>>> distribution.  First a one-D array of expected percentiles is generated for
>>>  a sample of size N; then that is passed to  dist.ppf, the per cent point
>>> function for the chosen distribution, to return an array of expected
>>> values.  The visualized data points are pairs of expected and actual
>>> values, and a linear regression is done on these to produce the line that
>>> data points from this distribution should lie on.
>>>
>>> Where x is the input data array and dist the chosen distribution we have:
>>>
>>> osr = np.sort(x)
>>> osm_uniform = _calc_uniform_order_statistic_medians(len(x))
>>> osm = dist.ppf(osm_uniform)
>>> slope, intercept, r, prob, sterrest = stats.linregress(osm, osr)
>>>
>>>
>>> My question concerns the plot display.
>>>
>>> ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-')
>>>
>>>
>>> The x-axis of the resulting plot is labeled quantiles, but the xticks
>>> and xticklabels produced by qqplot and probplot do not seem
>>> correct for their intended interpretations.  First the numbers on the
>>> x-axis do not represent quantiles; the intervals between them do not in
>>> general contain equal numbers of points.  For a normal distribution with
>>> sigma=1, they represent standard deviations.  Changing the label on the
>>> x-axis does not seem like a very good solution, because the interpretation
>>> of the values on the x-axis will be different for different distributions.
>>> Rather the right solution seems to be to actually show quantiles on the
>>> x-axis. The numbers on the x-axis can stay as they are, representing
>>> quantile indexes, but they need to be spaced so as to show the actual
>>> division points that carve the population up into  groups of the same
>>> size.  This can be done in something like the following way.
>>>
>>
>> The ticks are correct I think, but they're theoretical quantiles and not
>> sample quantiles. This was discussed in [1] and is consistent with R [2]
>> and statsmodels [3]. I see that we just forgot to add "theoretical" to the
>> x-axis label (mea culpa). Does adding that resolve your concern?
>>
>> [1] https://github.com/scipy/scipy/issues/1821
>> [2] http://data.library.virginia.edu/understanding-q-q-plots/
>> [3]
>> http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot
>>
>> Ralf
>>
>>
> as related link
> http://phobson.github.io/mpl-probscale/tutorial/closer_look_at_viz.html
>
> Paul Hobson has done a lot of work on getting different probability
> scales attached to pp-plots or generalized versions of probability plots. I
> think qqplots are less ambiguous because they are on the original or
> standardized scale.
>
> I haven't worked my way through the various interpretations of probability
> axes yet because I find them "not obvious". It might be easier for fields
> that have a tradition of using probability papers.
>
> It's planned to be added to the statsmodels probability plots so that
> there will be a large choice of axis labels and scales.
>
> Josef
>
>
>>
>>
>>
>> ___
>> NumPy-Discussion mailing list
>> 

Re: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling

2016-06-11 Thread josef . pktd
On Sat, Jun 11, 2016 at 8:53 AM, Ralf Gommers 
wrote:

> Hi Mark,
>
> Note that the scipy-dev or scipy-user mailing list would have been more
> appropriate for this question.
>
>
> On Fri, Jun 10, 2016 at 9:06 AM, Mark Gawron  wrote:
>
>>
>>
>> The scipy.stats.qqplot and scipy.stats.probplot  functions plot expected
>> values versus actual data values for visualization of fit to a
>> distribution.  First a one-D array of expected percentiles is generated for
>>  a sample of size N; then that is passed to  dist.ppf, the per cent point
>> function for the chosen distribution, to return an array of expected
>> values.  The visualized data points are pairs of expected and actual
>> values, and a linear regression is done on these to produce the line that
>> data points from this distribution should lie on.
>>
>> Where x is the input data array and dist the chosen distribution we have:
>>
>> osr = np.sort(x)
>> osm_uniform = _calc_uniform_order_statistic_medians(len(x))
>> osm = dist.ppf(osm_uniform)
>> slope, intercept, r, prob, sterrest = stats.linregress(osm, osr)
>>
>>
>> My question concerns the plot display.
>>
>> ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-')
>>
>>
>> The x-axis of the resulting plot is labeled quantiles, but the xticks and
>> xticklabels produced by qqplot and probplot do not seem correct
>> for their intended interpretations.  First the numbers on the x-axis do
>> not represent quantiles; the intervals between them do not in general
>> contain equal numbers of points.  For a normal distribution with sigma=1,
>> they represent standard deviations.  Changing the label on the x-axis does
>> not seem like a very good solution, because the interpretation of the
>> values on the x-axis will be different for different distributions.  Rather
>> the right solution seems to be to actually show quantiles on the x-axis.
>> The numbers on the x-axis can stay as they are, representing quantile
>> indexes, but they need to be spaced so as to show the actual division
>> points that carve the population up into  groups of the same size.  This
>> can be done in something like the following way.
>>
>
> The ticks are correct I think, but they're theoretical quantiles and not
> sample quantiles. This was discussed in [1] and is consistent with R [2]
> and statsmodels [3]. I see that we just forgot to add "theoretical" to the
> x-axis label (mea culpa). Does adding that resolve your concern?
>
> [1] https://github.com/scipy/scipy/issues/1821
> [2] http://data.library.virginia.edu/understanding-q-q-plots/
> [3]
> http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot
>
> Ralf
>
>
as related link
http://phobson.github.io/mpl-probscale/tutorial/closer_look_at_viz.html

Paul Hobson has done a lot of work on getting different probability
scales attached to pp-plots or generalized versions of probability plots. I
think qqplots are less ambiguous because they are on the original or
standardized scale.

I haven't worked my way through the various interpretations of probability
axes yet because I find them "not obvious". It might be easier for fields
that have a tradition of using probability papers.

It's planned to be added to the statsmodels probability plots so that there
will be a large choice of axis labels and scales.

Josef


>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-10 Thread josef . pktd
On Fri, Jun 10, 2016 at 2:00 PM, Nathaniel Smith  wrote:

> On Jun 10, 2016 10:50, "Alan Isaac"  wrote:
> >
> > On 6/10/2016 1:34 PM, Nathaniel Smith wrote:
> >>
> >> You keep pounding on this example. It's a fine example, but, c'mon. **2
> is probably at least 100x more common in real source code. Maybe 1000x more
> common. Why should we break the
> >> common case for your edge case?
> >
> >
> >
> > It is hardly an "edge case".
> > Again, **almost all** integer combinations overflow: that's the point.
>
> When you say "almost all", you're assuming inputs that are uniformly
> sampled integers. I'm much more interested in what proportion of calls to
> the ** operator involve inputs that can overflow, and in real life those
> inputs are very heavily biased towards small numbers.
>
> (I also think we should default to raising an error on overflow in
> general, with a seterr switch to turn it off when desired. But that's
> another discussion...)
>

but x**2 is just x*x which some seem to recommend (I have no idea why), and
then there are not so many "common" cases left.


(However, I find integers pretty useless except in some very specific
cases. When I started to clean up scipy.stats.distributions, I threw out
integers for discrete distributions and replaced all or most `**` by
np.power, IIRC mainly because of the old python behavior and better numpy
behavior.)

(I'd rather use robust calculations that provide correct numbers than
chase individual edge cases to save a bit of memory in some common
cases. scipy.stats also avoids factorial in almost all cases, because
special.gamma and variants are more robust.)


Josef


> -n
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ENH: compute many inner products quickly

2016-06-05 Thread josef . pktd
On Sun, Jun 5, 2016 at 8:41 PM, Stephan Hoyer  wrote:

> If possible, I'd love to add new functions for "generalized ufunc" linear
> algebra, and then deprecate (or at least discourage) using the older
> versions with inferior broadcasting rules. Adding a new keyword arg means
> we'll be stuck with an awkward API for a long time to come.
>
> There are three types of matrix/vector products for which ufuncs would be
> nice:
> 1. matrix-matrix product (covered by matmul)
> 2. matrix-vector product
> 3. vector-vector (inner) product
>
> It's straightforward to implement either of the latter two options by inserting
> dummy dimensions and then calling matmul, but that's a pretty awkward API,
> especially for inner products. Unfortunately, we already use the two most
> obvious one word names for vector inner products (inner and dot). But on
> the other hand, one word names are not very descriptive, and the short name
> "dot" probably mostly exists because of the lack of an infix operator.
>
> So I'll start by throwing out some potential new names:
>
> For matrix-vector products:
> matvecmul (if it's worth making a new operator)
>
> For inner products:
> vecmul (similar to matmul, but probably too ambiguous)
> dot_product
> inner_prod
> inner_product
>
>
How about names in plural, as in the PR? I thought the `s` in inner_prods
would better signal the broadcasting behavior.

dot_products
...

"dots" ?  (I guess not)

Josef


>
>
>
>
> On Sat, May 28, 2016 at 8:53 PM, Scott Sievert 
> wrote:
>
>> I recently ran into an application where I had to compute many inner
>> products quickly (roughy 50k inner products in less than a second). I
>> wanted a vector of inner products over the 50k vectors, or `[x1.T @ A @ x1,
>> …, xn.T @ A @ xn]` with A.shape = (1k, 1k).
>>
>> My first instinct was to look for a NumPy function to quickly compute
>> this, such as np.inner. However, it looks like np.inner has some other
>> behavior and I couldn’t get tensordot/einsum to work for me.
>>
>> Then a labmate pointed out that I can just do some slick matrix
>> multiplication to compute the same quantity, `(X.T * A @ X.T).sum(axis=0)`.
>> I opened [a PR] with this, and proposed that we define a new function
>> called `inner_prods` for this.
>>
>> However, in the PR, @shoyer pointed out
>>
>> > The main challenge is to figure out how to transition the behavior of
>> all these operations, while preserving backwards compatibility. Quite
>> likely, we need to pick new names for these functions, though we should try
>> to pick something that doesn't suggest that they are second class
>> alternatives.
>>
>> Do we choose new function names? Do we add a keyword arg that changes
>> what np.inner returns?
>>
>> [a PR]:https://github.com/numpy/numpy/pull/7690
>>
>>
>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-04 Thread josef . pktd
On Sat, Jun 4, 2016 at 9:16 PM, Charles R Harris 
wrote:

>
>
> On Sat, Jun 4, 2016 at 6:17 PM,  wrote:
>
>>
>>
>> On Sat, Jun 4, 2016 at 8:07 PM, Charles R Harris <
>> charlesr.har...@gmail.com> wrote:
>>
>>>
>>>
>>> On Sat, Jun 4, 2016 at 5:27 PM,  wrote:
>>>


 On Sat, Jun 4, 2016 at 6:10 PM, Nathaniel Smith  wrote:

> On Sat, Jun 4, 2016 at 2:07 PM, V. Armando Sole  wrote:
> > Also in favor of 2. Always return a float for '**'
>
> Even if we did want to switch to this, it's such a major
> backwards-incompatible change that I'm not sure how we could actually
> make the transition without first making it an error for a while.
>

 AFAIU, only the dtype for int**int would change. So, what would be the
 problem with FutureWarnings as with other dtype changes that were done in
 recent releases.


>>> The main problem I see with that is that numpy integers would behave
>>> differently than Python integers, and the difference would be silent. With
>>> option 1 it is possible to write code that behaves the same up to overflow
>>> and the error message would supply a warning when the exponent should be
>>> float. One could argue that numpy scalar integer types could be made to
>>> behave like python integers, but then their behavior would differ from
>>> numpy arrays and numpy scalar arrays.
>>>
>>
>> I'm not sure I understand.
>>
>> Do you mean
>>
>> np.arange(5)**2 would behave differently than np.arange(5)**np.int_(2)
>>
>> or 2**2 would behave differently than np.int_(2)**np.int(2)
>>
>
> The second case. Python returns ints for non-negative integer powers of
> ints.
>
>
>>
>> ?
>>
>>
>> AFAICS, there are many cases where numpy scalars don't behave like python
>> scalars. Also, does different behavior mean different type/dtype or
>> different numbers.  (The first I can live with, the second requires human
>> memory usage, which is a scarce resource.)
>>
>> >>> 2**(-2)
>> 0.25
>>
>>
> But we can't mix types in np.arrays and we can't depend on the element
> values of arrays in the exponent, but only on their type, so 2 ** array([1,
> -1]) must contain a single type and making that type float would surely
> break code.  Scalar arrays, which are arrays, have the same problem. We
> can't do what Python does with ndarrays and numpy scalars, and it would be
> best to be consistent. Division was a simpler problem to deal with, as
> there were two operators, `//` and `/`. If there were two exponential
> operators life would be simpler.
>

What bothers me with the entire argument is that you are putting higher
priority on returning a dtype than on returning the correct numbers.

Reverse the argument: because we cannot make the return type value-dependent,
we **have** to return float in order to get the correct number.
(It's an argument, not what we really have to do.)


Which code really breaks? Code that gets a float instead of an int; and
with some advance warning, users that really need to watch their memory can
use np.power.

My argument before was that I think a simple operator like `**` should work
for 90+% of users and match their expectations, and the users that need
to watch dtypes can just as well use the function.

(I can also live with the exception from case 1., but I really think this
is like the python 2 integer division "surprise")

Josef


>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-04 Thread josef . pktd
On Sat, Jun 4, 2016 at 8:07 PM, Charles R Harris 
wrote:

>
>
> On Sat, Jun 4, 2016 at 5:27 PM,  wrote:
>
>>
>>
>> On Sat, Jun 4, 2016 at 6:10 PM, Nathaniel Smith  wrote:
>>
>>> On Sat, Jun 4, 2016 at 2:07 PM, V. Armando Sole  wrote:
>>> > Also in favor of 2. Always return a float for '**'
>>>
>>> Even if we did want to switch to this, it's such a major
>>> backwards-incompatible change that I'm not sure how we could actually
>>> make the transition without first making it an error for a while.
>>>
>>
>> AFAIU, only the dtype for int**int would change. So, what would be the
>> problem with FutureWarnings as with other dtype changes that were done in
>> recent releases.
>>
>>
> The main problem I see with that is that numpy integers would behave
> differently than Python integers, and the difference would be silent. With
> option 1 it is possible to write code that behaves the same up to overflow
> and the error message would supply a warning when the exponent should be
> float. One could argue that numpy scalar integer types could be made to
> behave like python integers, but then their behavior would differ from
> numpy arrays and numpy scalar arrays.
>

I'm not sure I understand.

Do you mean

np.arange(5)**2 would behave differently than np.arange(5)**np.int_(2)

or 2**2 would behave differently than np.int_(2)**np.int(2)

?


AFAICS, there are many cases where numpy scalars don't behave like python
scalars. Also, does different behavior mean different type/dtype or
different numbers.  (The first I can live with, the second requires human
memory usage, which is a scarce resource.)

>>> 2**(-2)
0.25

Josef



>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-04 Thread josef . pktd
On Sat, Jun 4, 2016 at 6:10 PM, Nathaniel Smith  wrote:

> On Sat, Jun 4, 2016 at 2:07 PM, V. Armando Sole  wrote:
> > Also in favor of 2. Always return a float for '**'
>
> Even if we did want to switch to this, it's such a major
> backwards-incompatible change that I'm not sure how we could actually
> make the transition without first making it an error for a while.
>

AFAIU, only the dtype for int**int would change. So, what would be the
problem with FutureWarnings as with other dtype changes that were done in
recent releases.

Josef



>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-04 Thread josef . pktd
On Sat, Jun 4, 2016 at 3:49 PM, Matthew Brett 
wrote:

> On Sat, Jun 4, 2016 at 12:47 PM,   wrote:
> >
> >
> > On Sat, Jun 4, 2016 at 3:43 PM, Charles R Harris <
> charlesr.har...@gmail.com>
> > wrote:
> >>
> >>
> >>
> >> On Sat, Jun 4, 2016 at 11:22 AM, Charles R Harris
> >>  wrote:
> >>>
> >>> Hi All,
> >>>
> >>> I've made a new post so that we can make an explicit decision. AFAICT,
> >>> the two proposals are
> >>>
> >>> Integers to negative integer powers raise an error.
> >>> Integers to integer powers always results in floats.
> >>>
> >>> My own sense is that 1. would be closest to current behavior and using
> a
> >>> float exponential when a float is wanted is an explicit way to
> indicate that
> >>> desire. OTOH, 2. would be the most convenient default for everyday
> numerical
> >>> computation, but I think would more likely break current code. I am
> going to
> >>> come down on the side of 1., which I don't think should cause too many
> >>> problems if we start with a {Future, Deprecation}Warning explaining the
> >>> workaround.
> >
> >
> > I'm in favor of 2.  always float for `**`
> > I don't see enough pure integer usecases to throw away a nice operator.
>
> I can't make sense of 'throw away a nice operator' - you still have
> arr ** 2.0 if you want floats.
>


but if we have x**y, then we always need to check the dtype. If we don't, we
get RuntimeErrors or overflow where we might have forgotten to include the
relevant cases in the unit tests.

numpy has gotten pickier about using only integers in some areas (indexing, ...).
Now we have to remember to convert back to floats for power.

Not a serious problem for a library with unit tests and enough users who
run into the dtype issues and report them.
But I'm sure I will have to fix any scripts or interactive work that I'm
writing.
It's just another thing to watch out for, after we managed to get rid of
the integer division surprise (1/2 == 0).
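
The kind of defensive rewrite I mean, as a sketch:

import numpy as np

x = np.arange(1, 6)                   # integer dtype
y = np.power(x.astype(float), -2)     # convert to float before a negative power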

Josef




>
> Matthew
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-04 Thread josef . pktd
On Sat, Jun 4, 2016 at 3:43 PM, Charles R Harris 
wrote:

>
>
> On Sat, Jun 4, 2016 at 11:22 AM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>> Hi All,
>>
>> I've made a new post so that we can make an explicit decision. AFAICT,
>> the two proposals are
>>
>>
>>1. Integers to negative integer powers raise an error.
>>2. Integers to integer powers always results in floats.
>>
>> My own sense is that 1. would be closest to current behavior and using a
>> float exponential when a float is wanted is an explicit way to indicate
>> that desire. OTOH, 2. would be the most convenient default for everyday
>> numerical computation, but I think would more likely break current code. I
>> am going to come down on the side of 1., which I don't think should cause
>> too many problems if we start with a {Future, Deprecation}Warning
>> explaining the workaround.
>>
>
I'm in favor of 2: always float for `**`.
I don't see enough pure-integer use cases to throw away a nice operator.

Josef



>
> Note that current behavior in 1.11 is such a mess
> ```
> In [5]: array([0], dtype=int64) ** -1
> Out[5]: array([-9223372036854775808])
>
> In [6]: array([0], dtype=uint64) ** -1
> Out[6]: array([ inf])
> ```
> That the simplest approach might be to start by raising an error rather
> than by trying to maintain current behavior and issuing a warning.
>
> Chuck
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] `allclose` vs `assert_allclose`

2014-07-18 Thread josef . pktd
On Thu, Jul 17, 2014 at 4:21 PM, josef.p...@gmail.com wrote:




 On Thu, Jul 17, 2014 at 4:07 PM, josef.p...@gmail.com wrote:




 On Wed, Jul 16, 2014 at 9:52 AM, Nathaniel Smith n...@pobox.com wrote:

 On 16 Jul 2014 10:26, Tony Yu tsy...@gmail.com wrote:
 
  Is there any reason why the defaults for `allclose` and
 `assert_allclose` differ? This makes debugging a broken test much more
 difficult. More importantly, using an absolute tolerance of 0 causes
 failures for some common cases. For example, if two values are very close
 to zero, a test will fail:


And one more comment: I debug broken tests pretty often. My favorites in
pdb are

np.max(np.abs(x - y))

and

np.max(np.abs(x / y - 1))

to see how much I would have to adjust atol and rtol in assert_allclose in
the tests to make them pass, and to decide whether this is an acceptable
numerical difference or a bug.

allclose doesn't tell me anything and I almost never use it.
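
As a self-contained sketch of that debugging workflow:

import numpy as np

x = np.array([1e-8, 1.0, 1e8])
y = x * (1 + 1e-9) + 1e-12

print(np.max(np.abs(x - y)))       # worst absolute error -> candidate atol
print(np.max(np.abs(x / y - 1)))   # worst relative error -> candidate rtol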

Josef



 
  np.testing.assert_allclose(0, 1e-14)
 
  Git blame suggests the change was made in the following commit, but I
 guess that change only reverted to the original behavior.
 
 
 https://github.com/numpy/numpy/commit/f43223479f917e404e724e6a3df27aa701e6d6bf
 
  It seems like the defaults for  `allclose` and `assert_allclose`
 should match, and an absolute tolerance of 0 is probably not ideal. I guess
 this is a pretty big behavioral change, but the current default for
 `assert_allclose` doesn't seem ideal.

 What you say makes sense to me, and loosening the default tolerances
 won't break any existing tests. (And I'm not too worried about people who
 were counting on getting 1e-7 instead of 1e-5 or whatever... if it matters
 that much to you exactly what tolerance you test, you should be setting the
 tolerance explicitly!) I vote that unless someone comes up with some
 terrible objection in the next few days then you should submit a PR :-)


 If you mean by this to add atol=1e-8 as default, then I'm against it.

 At least it will change the meaning of many of our tests in statsmodels.

 I'm using rtol to check for correct 1e-15 or 1e-30, which would be
 completely swamped if you change the default atol=0.
 Adding atol=0 to all assert_allclose that currently use only rtol is a
 lot of work.
 I think I almost never use a default rtol, but I often leave atol at the
 default = 0.

 If we have zeros, then I don't think it's too much work to decide whether
 this should be atol=1e-20, or 1e-8.


 Just to explain, p-values, sf of the distributions are usually accurate at
 1e-30 or 1e-50 or something like that. And when we test the tails of the
 distributions we use that the relative error is small and the absolute
 error is tiny.

 We would need to do a grep to see how many cases there actually are in
 scipy and statsmodels, before we change it because for some use cases we
 only get atol 1e-5 or 1e-7 (e.g. nonlinear optimization).
 Linear algebra is usually atol or rtol 1e-11 to 1e-14 in my cases, AFAIR.

 Josef



 Josef



 -n

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] `allclose` vs `assert_allclose`

2014-07-18 Thread josef . pktd
On Wed, Jul 16, 2014 at 9:52 AM, Nathaniel Smith n...@pobox.com wrote:

 On 16 Jul 2014 10:26, Tony Yu tsy...@gmail.com wrote:
 
  Is there any reason why the defaults for `allclose` and
 `assert_allclose` differ? This makes debugging a broken test much more
 difficult. More importantly, using an absolute tolerance of 0 causes
 failures for some common cases. For example, if two values are very close
 to zero, a test will fail:
 
  np.testing.assert_allclose(0, 1e-14)
 
  Git blame suggests the change was made in the following commit, but I
 guess that change only reverted to the original behavior.
 
 
 https://github.com/numpy/numpy/commit/f43223479f917e404e724e6a3df27aa701e6d6bf
 
  It seems like the defaults for  `allclose` and `assert_allclose` should
 match, and an absolute tolerance of 0 is probably not ideal. I guess this
 is a pretty big behavioral change, but the current default for
 `assert_allclose` doesn't seem ideal.

 What you say makes sense to me, and loosening the default tolerances won't
 break any existing tests. (And I'm not too worried about people who were
 counting on getting 1e-7 instead of 1e-5 or whatever... if it matters that
 much to you exactly what tolerance you test, you should be setting the
 tolerance explicitly!) I vote that unless someone comes up with some
 terrible objection in the next few days then you should submit a PR :-)


If you mean by this to add atol=1e-8 as default, then I'm against it.

At least it will change the meaning of many of our tests in statsmodels.

I'm using rtol to check values around 1e-15 or 1e-30 for correctness, which
would be completely swamped if you change the default away from atol=0.
Adding atol=0 to all assert_allclose that currently use only rtol is a lot
of work.
I think I almost never use a default rtol, but I often leave atol at the
default = 0.

If we have zeros, then I don't think it's too much work to decide whether
this should be atol=1e-20, or 1e-8.

Josef



 -n

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] `allclose` vs `assert_allclose`

2014-07-18 Thread josef . pktd
On Thu, Jul 17, 2014 at 4:07 PM, josef.p...@gmail.com wrote:




 On Wed, Jul 16, 2014 at 9:52 AM, Nathaniel Smith n...@pobox.com wrote:

 On 16 Jul 2014 10:26, Tony Yu tsy...@gmail.com wrote:
 
  Is there any reason why the defaults for `allclose` and
 `assert_allclose` differ? This makes debugging a broken test much more
 difficult. More importantly, using an absolute tolerance of 0 causes
 failures for some common cases. For example, if two values are very close
 to zero, a test will fail:
 
  np.testing.assert_allclose(0, 1e-14)
 
  Git blame suggests the change was made in the following commit, but I
 guess that change only reverted to the original behavior.
 
 
 https://github.com/numpy/numpy/commit/f43223479f917e404e724e6a3df27aa701e6d6bf
 
  It seems like the defaults for  `allclose` and `assert_allclose` should
 match, and an absolute tolerance of 0 is probably not ideal. I guess this
 is a pretty big behavioral change, but the current default for
 `assert_allclose` doesn't seem ideal.

 What you say makes sense to me, and loosening the default tolerances
 won't break any existing tests. (And I'm not too worried about people who
 were counting on getting 1e-7 instead of 1e-5 or whatever... if it matters
 that much to you exactly what tolerance you test, you should be setting the
 tolerance explicitly!) I vote that unless someone comes up with some
 terrible objection in the next few days then you should submit a PR :-)


 If you mean by this to add atol=1e-8 as default, then I'm against it.

 At least it will change the meaning of many of our tests in statsmodels.

 I'm using rtol to check for correct 1e-15 or 1e-30, which would be
 completely swamped if you change the default atol=0.
 Adding atol=0 to all assert_allclose that currently use only rtol is a lot
 of work.
 I think I almost never use a default rtol, but I often leave atol at the
 default = 0.

 If we have zeros, then I don't think it's too much work to decide whether
 this should be atol=1e-20, or 1e-8.


Just to explain, p-values, sf of the distributions are usually accurate at
1e-30 or 1e-50 or something like that. And when we test the tails of the
distributions we use that the relative error is small and the absolute
error is tiny.

We would need to do a grep to see how many cases there actually are in
scipy and statsmodels, before we change it because for some use cases we
only get atol 1e-5 or 1e-7 (e.g. nonlinear optimization).
Linear algebra is usually atol or rtol 1e-11 to 1e-14 in my cases, AFAIR.
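
A sketch of the swamping effect (values illustrative):

import numpy as np

desired = 3.0e-30        # e.g. a tail probability checked against R or Stata
actual = 3.3e-30         # 10% relative error, which should fail

# with atol=0 the relative criterion bites:
#   assert_allclose(actual, desired, rtol=1e-10)  ->  AssertionError
# with a default atol=1e-8 the same comparison passes silently, because
# the tolerance atol + rtol * |desired| is dominated by atol:
np.testing.assert_allclose(actual, desired, rtol=1e-10, atol=1e-8)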

Josef



 Josef



 -n

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] problems with mailing list ?

2014-07-18 Thread josef . pktd
Are there problems with sending out the messages on the mailing lists?

I'm getting some replies without the original messages, and in some threads I
don't get the replies, so I'm missing part of the discussions.


Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] `allclose` vs `assert_allclose`

2014-07-18 Thread josef . pktd
On Thu, Jul 17, 2014 at 4:07 PM, josef.p...@gmail.com wrote:




 On Wed, Jul 16, 2014 at 9:52 AM, Nathaniel Smith n...@pobox.com wrote:

 On 16 Jul 2014 10:26, Tony Yu tsy...@gmail.com wrote:
 
  Is there any reason why the defaults for `allclose` and
 `assert_allclose` differ? This makes debugging a broken test much more
 difficult. More importantly, using an absolute tolerance of 0 causes
 failures for some common cases. For example, if two values are very close
 to zero, a test will fail:
 
  np.testing.assert_allclose(0, 1e-14)
 
  Git blame suggests the change was made in the following commit, but I
 guess that change only reverted to the original behavior.
 
 
 https://github.com/numpy/numpy/commit/f43223479f917e404e724e6a3df27aa701e6d6bf
 
  It seems like the defaults for  `allclose` and `assert_allclose` should
 match, and an absolute tolerance of 0 is probably not ideal. I guess this
 is a pretty big behavioral change, but the current default for
 `assert_allclose` doesn't seem ideal.

 What you say makes sense to me, and loosening the default tolerances
 won't break any existing tests. (And I'm not too worried about people who
 were counting on getting 1e-7 instead of 1e-5 or whatever... if it matters
 that much to you exactly what tolerance you test, you should be setting the
 tolerance explicitly!) I vote that unless someone comes up with some
 terrible objection in the next few days then you should submit a PR :-)


 If you mean by this to add atol=1e-8 as default, then I'm against it.

 At least it will change the meaning of many of our tests in statsmodels.

 I'm using rtol to check for correct 1e-15 or 1e-30, which would be
 completely swamped if you change the default atol=0.
 Adding atol=0 to all assert_allclose that currently use only rtol is a lot
 of work.
 I think I almost never use a default rtol, but I often leave atol at the
 default = 0.

 If we have zeros, then I don't think it's too much work to decide whether
 this should be atol=1e-20, or 1e-8.


copied from
http://mail.scipy.org/pipermail/numpy-discussion/2014-July/070639.html
since I didn't get any messages here

This is a compelling use-case, but there are also lots of compelling
usecases that want some non-zero atol (i.e., comparing stuff to 0).
Saying that allclose is for one of those use cases and assert_allclose
is for the other is... not a very felicitous API design, I think. So
we really should do *something*.

Are there really any cases where you want non-zero atol= that don't
involve comparing something against a 'desired' value of zero? It's a
little wacky, but I'm wondering if we ought to change the rule (for
all versions of allclose) to

if desired == 0:
    tol = atol
else:
    tol = rtol * desired

In particular, this means that np.allclose(x, 1e-30) would reject x values
of 0 or 2e-30, but np.allclose(x, 0) will accept x == 1e-30 or 2e-30.

-n
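
For concreteness, a sketch of that proposed rule (hypothetical, not what
numpy actually does):

import numpy as np

def proposed_close(actual, desired, rtol=1e-7, atol=1e-8):
    # atol applies only where desired is exactly zero;
    # everywhere else the criterion is purely relative
    desired = np.asarray(desired, dtype=float)
    tol = np.where(desired == 0, atol, rtol * np.abs(desired))
    return bool(np.all(np.abs(np.asarray(actual) - desired) <= tol))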


That's much too confusing.
I don't know what the use cases for np.allclose are since I don't have any.

assert_allclose is one of our (statsmodels) most frequently used numpy
functions.

this is not informative:

`np.allclose(x, 1e-30)`


since there are keywords: either

np.assert_allclose(x, atol=1e-30)

if I want to be close to zero, or

np.assert_allclose(x, rtol=1e-11, atol=1e-25)

if we have a mix of large numbers and zeros in an array.

Making the behavior of assert_allclose depend on whether desired is
exactly zero or 1e-20 looks too difficult to remember, and which desired I
use would depend on what I get out of R or Stata.

atol=1e-8 is not close to zero in most cases in my experience.


The numpy.testing assert functions are some of the most useful functions in
numpy, and some of the most heavily used code.

Josef




 Josef



 -n

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] `allclose` vs `assert_allclose`

2014-07-18 Thread josef . pktd
On Thu, Jul 17, 2014 at 11:37 AM, Nathaniel Smith n...@pobox.com wrote:

 On Wed, Jul 16, 2014 at 7:47 PM, Ralf Gommers ralf.gomm...@gmail.com
 wrote:
 
  On Wed, Jul 16, 2014 at 6:37 AM, Tony Yu tsy...@gmail.com wrote:
  It seems like the defaults for  `allclose` and `assert_allclose` should
  match, and an absolute tolerance of 0 is probably not ideal. I guess
 this is
  a pretty big behavioral change, but the current default for
  `assert_allclose` doesn't seem ideal.
 
  I agree, current behavior quite annoying. It would make sense to change
 the
  atol default to 1e-8, but technically it's a backwards compatibility
 break.
  Would probably have a very minor impact though. Changing the default for
  rtol in one of the functions may be much more painful though, I don't
 think
  that should be done.

 Currently we have:

 allclose: rtol=1e-5, atol=1e-8
 assert_allclose: rtol=1e-7, atol=0

 Why would it be painful to change assert_allclose to match allclose?
 It would weaken some tests, but no code would break.


We might break our code if suddenly our test suite doesn't do what it is
supposed to do.

(rough guess: 40% of the statsmodels code is unit tests.)

Josef



 -n

 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] `allclose` vs `assert_allclose`

2014-07-18 Thread josef . pktd
On Fri, Jul 18, 2014 at 2:03 PM, josef.p...@gmail.com wrote:




 On Fri, Jul 18, 2014 at 12:53 PM, Nathaniel Smith n...@pobox.com wrote:

 On Fri, Jul 18, 2014 at 12:38 PM,  josef.p...@gmail.com wrote:
 
  On Thu, Jul 17, 2014 at 4:07 PM, josef.p...@gmail.com wrote:
 
  If you mean by this to add atol=1e-8 as default, then I'm against it.
 
  At least it will change the meaning of many of our tests in
 statsmodels.
 
  I'm using rtol to check for correct 1e-15 or 1e-30, which would be
  completely swamped if you change the default atol=0.
  Adding atol=0 to all assert_allclose that currently use only rtol is a
 lot
  of work.
  I think I almost never use a default rtol, but I often leave atol at
 the
  default = 0.
 
  If we have zeros, then I don't think it's too much work to decide
 whether
  this should be atol=1e-20, or 1e-8.
 
 
  copied from
  http://mail.scipy.org/pipermail/numpy-discussion/2014-July/070639.html
  since I didn't get any messages here
 
  This is a compelling use-case, but there are also lots of compelling
  usecases that want some non-zero atol (i.e., comparing stuff to 0).
  Saying that allclose is for one of those use cases and assert_allclose
  is for the other is... not a very felicitious API design, I think. So
  we really should do *something*.
 
  Are there really any cases where you want non-zero atol= that don't
  involve comparing something against a 'desired' value of zero? It's a
  little wacky, but I'm wondering if we ought to change the rule (for
  all versions of allclose) to
 
  if desired == 0:
      tol = atol
  else:
      tol = rtol * desired
 
  In particular, this means that np.allclose(x, 1e-30) would reject x values
  of 0 or 2e-30, but np.allclose(x, 0) will accept x == 1e-30 or 2e-30.
 
  -n
 
 
  That's much too confusing.
  I don't know what the usecases for np.allclose are since I don't have
 any.

 I wrote allclose because it's shorter, but my point is that
 assert_allclose and allclose should use the same criterion, and was
 making a suggestion for what that shared criterion might be.

  assert_allclose is one of our (statsmodels) most frequently used numpy
  function
 
  this is not informative:
 
  `np.allclose(x, 1e-30)`
 
 
  since there are keywords
  either np.assert_allclose(x, atol=1e-30)

 I think we might be talking past each other here -- 1e-30 here is my
 gold p-value that I'm hoping x will match, not a tolerance argument.


 my mistake




  if I want to be close to zero
  or
 
  np.assert_allclose(x, rtol=1e-11, atol=1e-25)
 
  if we have a mix of large numbers and zeros in an array.
 
  Making the behavior of assert_allclose depending on whether desired is
  exactly zero or 1e-20 looks too difficult to remember, and which
 desired I
  use would depend on what I get out of R or Stata.

 I thought your whole point here was that 1e-20 and zero are
 qualitatively different values that you would not want to accidentally
 confuse? Surely R and Stata aren't returning exact zeros for small
 non-zero values like probability tails?

  atol=1e-8 is not close to zero in most cases in my experience.

 If I understand correctly (Tony?) the problem here is that another
 common use case for assert_allclose is in cases like

 assert_allclose(np.sin(some * complex ** calculation / (that - should
 - be * zero)), 0)

 For cases like this, you need *some* non-zero atol or the thing just
 doesn't work, and one could quibble over the exact value as long as
 it's larger than normal floating point error. These calculations
 usually involve normal sized numbers, so atol should be comparable
 to eps * these values.  eps is 2e-16, so atol=1e-8 works for values up
 to around 1e8, which is a plausible upper bound for where people might
 expect assert_allclose to just work. I'm trying to figure out some way
 to support your use cases while also supporting other use cases.


 my problem is that there is no "normal" floating point error.
 Whether I have units in 1000 or units in 0.0001 depends on the example and
 dataset that we use for testing.

 this tests two different functions/methods that calculate the same thing

 (Pdb) pval
 array([  3.01270184e-42,   5.90847367e-02,   3.00066946e-12])
 (Pdb) res2.pvalues
 array([  3.01270184e-42,   5.90847367e-02,   3.00066946e-12])
 (Pdb) assert_allclose(pval, res2.pvalues, rtol=5 * rtol, atol=1e-25)

 I don't care about errors that are smaller that 1e-25

 for example testing p-values against Stata

 (Pdb) tt.pvalue
 array([  5.70315140e-30,   6.24662551e-02,   5.86024090e-11])
 (Pdb) res2.pvalues
 array([  5.70315140e-30,   6.24662551e-02,   5.86024090e-11])
 (Pdb) tt.pvalue - res2.pvalues
 array([  2.16612016e-40,   2.51187959e-15,   4.30027936e-21])
 (Pdb) tt.pvalue / res2.pvalues - 1
 array([  3.79811738e-11,   4.01900735e-14,   7.33806349e-11])
 (Pdb) rtol
 1e-10
 (Pdb) assert_allclose(tt.pvalue, res2.pvalues, rtol=5 * rtol)


 I could find a lot more and maybe nicer examples, since I spend quite a
 bit of time fine tuning unit tests.
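
A quick numerical check of the eps-scaling argument above (an editorial
sketch, not part of the original thread):

    import numpy as np

    eps = np.finfo(float).eps   # ~2.2e-16
    # atol=1e-8 corresponds to an eps-sized relative error for values
    # around 1e-8 / eps, i.e. roughly 4.5e7:
    print(1e-8 / eps)           # ~45035996.3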

Re: [Numpy-discussion] `allclose` vs `assert_allclose`

2014-07-18 Thread josef . pktd
On Fri, Jul 18, 2014 at 2:29 PM, Nathaniel Smith n...@pobox.com wrote:

 On Fri, Jul 18, 2014 at 7:03 PM,  josef.p...@gmail.com wrote:
 
  On Fri, Jul 18, 2014 at 12:53 PM, Nathaniel Smith n...@pobox.com wrote:
 
  snip -- requoting the atol/eps argument and the p-value examples above

 ...these are all cases where there are not exact zeros, so my proposal
 would not affect them?

 I can see the argument that we shouldn't provide any default rtol/atol
 at all because there is no good default, but... I don't think putting
 that big of a barrier in front of newbies writing their first tests is
 a good idea.


I think atol=0 is **very** good for newbies, and everyone else.
If expected is really zero or very small, then it immediately causes a test
failure, and it's relatively obvious how to fix it.

I worry a lot more about unit tests that don't bite, written by newbies or
not-so-newbies who just use a default.

That's one of the problems we had with assert_almost_equal, and why I was
very happy to switch to assert_allclose with its emphasis on relative
tolerance.

Josef
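
To make the disputed criterion concrete, here is a minimal sketch (an
editorial illustration, not code from the thread) of the rule Nathaniel
proposed, with atol consulted only where `desired` is exactly zero:

    import numpy as np

    def proposed_isclose(actual, desired, rtol=1e-7, atol=0.0):
        # hypothetical helper: the tolerance is atol where desired == 0,
        # and rtol * |desired| everywhere else
        actual = np.asarray(actual, dtype=float)
        desired = np.asarray(desired, dtype=float)
        tol = np.where(desired == 0, atol, rtol * np.abs(desired))
        return np.abs(actual - desired) <= tol

Under this rule a desired value of 1e-30 rejects both 0.0 and 2e-30,
while a desired value of exactly 0 accepts anything up to atol -- which
is exactly the distinction the thread is arguing about.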




 -n



Re: [Numpy-discussion] `allclose` vs `assert_allclose`

2014-07-18 Thread josef . pktd
On Fri, Jul 18, 2014 at 2:44 PM, Nathaniel Smith n...@pobox.com wrote:

 On 18 Jul 2014 19:31, josef.p...@gmail.com wrote:
 
 
   Making the behavior of assert_allclose depend on whether desired is
   exactly zero or 1e-20 looks too difficult to remember, and which
   desired I use would depend on what I get out of R or Stata.
 
  I thought your whole point here was that 1e-20 and zero are
  qualitatively different values that you would not want to accidentally
  confuse? Surely R and Stata aren't returning exact zeros for small
  non-zero values like probability tails?
 
 
  I was thinking of the case when we only see pvalue < 1e-16 or
 something like this, and we replace this by assert close to zero,
  which would translate to `assert_allclose(pvalue, 0, atol=1e-16)`
  with maybe an additional rtol=1e-11 if we have an array of pvalues where
 some are large (> 0.5).

 This example is also handled correctly by my proposal :-)

depends on the details of your proposal

alternative: desired is exactly zero means assert_equal

(Pdb) self.res_reg.params[m:]
array([ 0.,  0.,  0.])
(Pdb) assert_allclose(0, self.res_reg.params[m:])
(Pdb) assert_allclose(0, self.res_reg.params[m:], rtol=0, atol=0)
(Pdb)

This test currently uses assert_almost_equal with decimal=4   :(

regularized estimation with hard thresholding: the first m values are
estimates not equal to zero, the elements from m to the end are exactly zero.
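
A minimal sketch (editorial illustration with made-up numbers) of how
such a test can separate the two regimes:

    import numpy as np
    from numpy.testing import assert_allclose, assert_equal

    m = 3
    params = np.array([1.2, -0.5, 0.03, 0.0, 0.0, 0.0])  # hypothetical fit
    expected = np.array([1.2, -0.5, 0.03])

    # estimated part: a relative tolerance does the work
    assert_allclose(params[:m], expected, rtol=1e-10)
    # hard-thresholded part: exactly zero by construction,
    # so assert_equal is the sharper check
    assert_equal(params[m:], np.zeros(3))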

This is discrete models fit_regularized which predates numpy
assert_allclose.  I haven't checked what the unit test of Kerby's current
additions for fit_regularized looks like.

unit testing is serious business:
I'd rather have good unit tests in SciPy-related packages than convince a
few more newbies that they can use the defaults for everything.

Josef



 -n



Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style

2014-07-18 Thread josef . pktd
On Fri, Jul 18, 2014 at 4:21 PM, Charles G. Waldman char...@crunch.io
wrote:

 Joseph Martinot-Lagarde writes:

  Compare what's comparable:

 That's fair.

  In addition, you have to use AltGr on some keyboards to get the brackets

 Wow, it must be rather painful to do any real programming on such a
 keyboard!

  - C


 On Fri, Jul 18, 2014 at 1:15 PM, Joseph Martinot-Lagarde
 joseph.martinot-laga...@m4x.org wrote:
  On 18/07/2014 20:42, Charles G. Waldman wrote:
  Well, if the goal is shorthand, typing numpy.array(numpy.mat())
  won't please many users.
 
  But the more I think about it, the less I think Numpy should support
  this (non-Pythonic) input mode.  Too much molly-coddling of new users!
  When doing interactive work I usually just type:
 
  np.array([[1,2,3],
  ...   [4,5,6],
  ...   [7,8,9]])
 
  which is (IMO) easier to read:  e.g. it's not totally obvious that
  "1,0,0;0,1,0;0,0,1" represents a 3x3 identity matrix, but
 
  [[1,0,0],
   [0,1,0],
   [0,0,1]]
 
  is pretty obvious.
 
  Compare what's comparable:
 
  [[1,0,0],
   [0,1,0],
   [0,0,1]]
 
  vs
 
  "1 0 0;
  0 1 0;
  0 0 1"
 
  or

  """
  1 0 0;
  0 1 0;
  0 0 1
  """
 
  [[1,0,0], [0,1,0], [0,0,1]]
  vs
  "1 0 0; 0 1 0; 0 0 1"
 
  The difference in (non-whitespace) chars is 19 vs 25, so the
  shorthand doesn't seem to save that much.
 
  Well, it's easier to type "" (twice the same character) than [], and you
  have no risk of swapping an opening and a closing bracket. In addition,
  you have to use AltGr on some keyboards to get the brackets. It doesn't
  boil down to a number of characters.
 
 
  Just my €0.02,



It's the year of the notebook.

notebooks are reusable.
notebooks correctly align the brackets in the second and third line
and it looks pretty, just like a matrix


(But, I don't have to teach newbies, and often I even correct whitespace on
the commandline, because it looks ugly and I will eventually copy it to a
script file.)

Josef
no broken windows!
well, except for the ones I don't feel like fixing right now.






 
  - C
 
 
 
 
  On Fri, Jul 18, 2014 at 10:05 AM, Alan G Isaac alan.is...@gmail.com
 wrote:
  On 7/18/2014 12:45 PM, Mark Miller wrote:
  If the true goal is to just allow quick entry of a 2d array, why not
 just advocate using
  a = numpy.array(numpy.mat("1 2 3; 4 5 6; 7 8 9"))
 
 
  It's even simpler:
  a = np.mat(' 1 2 3;4 5 6;7 8 9').A
 
  I'm not putting a dog in this race.  Still I would say that
  the reason why such proposals miss the point is that
  there are introductory settings where one would like
  to explain as few complications as possible.  In
  particular, one might prefer *not* to discuss the
  existence of a matrix type.  As an additional downside,
  this is only good for 2d, and there have been proposals
  for the new array builder to handle other dimensions.
 
  fwiw,
  Alan Isaac
 


Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style

2014-07-18 Thread josef . pktd
On Fri, Jul 18, 2014 at 5:04 PM, Joseph Martinot-Lagarde 
joseph.martinot-laga...@m4x.org wrote:

 On 18/07/2014 22:46, Chris Barker wrote:
  On Fri, Jul 18, 2014 at 1:15 PM, Joseph Martinot-Lagarde
  joseph.martinot-laga...@m4x.org
  mailto:joseph.martinot-laga...@m4x.org wrote:
 
  In addition,
  you have to use AltGr on some keyboards to get the brackets.
 
 
  If it's hard to type square brackets -- you're kind of dead in the water
  with Python anyway -- this is not going to help.
 
  -Chris
 
 Welcome to the azerty world ! ;)

 It's not that hard to type, just a bit more involved. My biggest problem
 is that you have to type the opening and closing bracket for each line,
 with a comma in between. It will always be harder and more error prone
 than a single semicolon, whatever the keyboard.

 My use case is not teaching but doing quick'n'dirty computations with a
 few values. Sometimes these values are copy-pasted from a space
 separated file, or from a printed array in another console. Having to
 add commas and brackets makes simple computations less easy. That's why I
 often use Octave for these.


my copy paste approaches for almost quick'n'dirty (no semicolons):

given:

a b c
1 2 3
4 5 6
7 8 9

(select & Ctrl-C)

>>> pandas.read_clipboard(sep=' ')
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9

>>> np.asarray(pandas.read_clipboard())
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]], dtype=int64)

>>> pandas.read_clipboard().values
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]], dtype=int64)

>>> arr = np.array('''\
1 2 3
4 5 6
7 8 9'''.split(), float).reshape(-1, 3)


the last is not so quick and dirty but reusable and reused.
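
Wrapped up as a tiny helper, the same trick becomes reusable (a sketch;
arr_from_text is a hypothetical name, not a numpy function):

    import numpy as np

    def arr_from_text(text, dtype=float):
        # one row per line, whitespace-separated values
        rows = [line.split() for line in text.strip().splitlines()]
        return np.array(rows, dtype=dtype)

    arr_from_text('''
    1 2 3
    4 5 6
    7 8 9
    ''')
    # array([[ 1.,  2.,  3.],
    #        [ 4.,  5.,  6.],
    #        [ 7.,  8.,  9.]])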


Josef









Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style

2014-07-07 Thread josef . pktd
On Mon, Jul 7, 2014 at 9:11 AM, Sebastian Berg sebast...@sipsolutions.net
wrote:

 On Mo, 2014-07-07 at 08:25 -0400, Alan G Isaac wrote:
  On 7/7/2014 7:17 AM, Daπid wrote:
   How about a new one? np.matarray, for MATLAB array.
 
 
  How about `str2arr` or even `build`, since teaching appears to be a
 focus.
  Also, I agree '1 2 3' shd become 1d and '1 2 3;' shd become 2d.
  It seems unambiguous to allow '1 2 3;;' to be 3d, or even
  '1 2;3 4;;5 6;7 8' (two 2d arrays), but I'm just noting
  that, not urging that it be implemented.
 

 Probably overdoing it, but if we plan on more then just this, what about
 banning such functions to something like numpy.interactive/numpy.helpers
 which you can then import * (or better specific functions) from?

 I think the fact that you need many imports on startup should rather be
 fixed by an ipython scientific mode or other startup imports.



Is this whole thing really worth it? We get back to a numpy pylab.

First users learn the dirty shortcuts, and then they have to learn how to
do it properly.

(I quite often use string split and reshape for copy-pasted text
tables.)

Josef




 - Sebastian

  Alan Isaac
 


Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style

2014-07-07 Thread josef . pktd
On Mon, Jul 7, 2014 at 1:58 PM, Nathaniel Smith n...@pobox.com wrote:

 On Mon, Jul 7, 2014 at 3:28 PM, Sebastian Berg
 sebast...@sipsolutions.net wrote:
  On Mo, 2014-07-07 at 09:50 -0400, josef.p...@gmail.com wrote:
 
  On Mon, Jul 7, 2014 at 9:11 AM, Sebastian Berg
  sebast...@sipsolutions.net wrote:
   snip -- requoting Alan's and Sebastian's messages from above
 
 
 
 
  Is this whole thing really worth it? We get back to a numpy pylab.
 
 
  First users learn the dirty shortcuts, and then they have to learn how
  to do it properly.
 
 
  Yeah, you are right. Just a bit afraid of creating too many such
  functions that I am not sure are very useful/used much. For example I am
  not sure that many use np.r_ or np.c_

 Yeah, we definitely have too many random bits of API around overall.
 But I think this one is probably worthwhile. It doesn't add any real
 complexity (no new types, trivial for readers to understand the first
 time they encounter it, etc.), and it addresses a recurring perceived
 shortcoming of numpy that people run into in the first 5 minutes of
 use, at a time when it's pretty easy to give up and go back to Matlab.
 And, it removes one of the perceived advantages of np.matrix over
 np.ndarray, so it smooths our way for eventually phasing out
 np.matrix.

 I'm not sure that preserving np.arr<TAB> is that important (tab here
 only saves 1 character!), but some possible alternatives for short
 names:

   np.marr  (matlab-like array construction)
   np.sarr   (string array)
   np.parse


short like np.s   (didn't know there is already s_)

something long like

>>> np.fromstring('1 2', sep=' ')
array([ 1.,  2.])


>>> np.fromstring2d('1 2 3; 5 3.4 7')
array([[ 1. ,  2. ,  3. ],
       [ 5. ,  3.4,  7. ]])


Josef
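
A minimal sketch (editorial illustration; fromstring2d above is only a
proposed name, not an existing numpy function) of what such a parser
could look like, built on the same split-and-stack idea:

    import numpy as np

    def fromstring2d(s, dtype=float):
        # rows separated by ';', values by whitespace,
        # in the numpy.mat string style
        return np.array([row.split() for row in s.split(';')], dtype=dtype)

    fromstring2d('1 2 3; 5 3.4 7')
    # array([[ 1. ,  2. ,  3. ],
    #        [ 5. ,  3.4,  7. ]])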






 -n



Re: [Numpy-discussion] Switch to using ATLAS for OSX binary wheels

2014-06-16 Thread josef . pktd
On Mon, Jun 16, 2014 at 7:10 AM, Matthew Brett matthew.br...@gmail.com wrote:
 Hi,

 On Sun, Jun 15, 2014 at 10:51 PM, Ralf Gommers ralf.gomm...@gmail.com wrote:



 On Sat, Jun 14, 2014 at 11:56 AM, Matthew Brett matthew.br...@gmail.com
 wrote:



 On Friday, June 13, 2014, Ralf Gommers ralf.gomm...@gmail.com wrote:




 On Fri, Jun 13, 2014 at 2:07 PM, Matthew Brett matthew.br...@gmail.com
 wrote:

 Hi,

 Summary : I'm planning to upload OSX wheels for numpy and scipy using
 the ATLAS blas / lapack library instead of the default OSX Accelerate
 framework.

 We've run into some trouble with a segfault in recent OSX Accelerate:

 https://github.com/numpy/numpy/issues/4007

 and Accelerate also doesn't play well with multiprocessing.

 https://github.com/numpy/numpy/issues/4776

 Because there's nothing I love more than half-day compilation runs on
 my laptop, I've built ATLAS binaries with gcc 4.8, and linked numpy
 and scipy to them to make OSX wheels.  These pass all tests in i386
 and x86_64 mode, including numpy, scipy, matplotlib, pandas:


 https://travis-ci.org/matthew-brett/scipy-stack-osx-testing/builds/27442987

 The build process needs some automating, but it's recorded here:

 https://github.com/matthew-brett/numpy-atlas-binaries

 It's possible to get travis-ci to build these guys from a bare machine
 and then upload them somewhere, but I haven't tried to do that.

 Meanwhile Sturla kindly worked up a patch to numpy to work round the
 Accelerate segfault [1].  I haven't tested that, but given I'd already
 built the wheels, I prefer the ATLAS builds because they work with
 multiprocessing.

 I propose uploading these wheels as the default for numpy and scipy.
 Does anyone have any objection or comments before I go ahead and do
 that?


 From your README and wscript I don't see what numpy version you're using
 to compile scipy against. I got the impression that you used 1.8.1, but it
 should be numpy 1.5.1 for the 2.7 build, and 1.7.1 for 3.x.

 I've tried the scipy 0.14.0 python2.7 wheel, but I get import errors (see
 below). Your wheels should work with all common Python installs (mine is
 homebrew) right?

 Ralf


 $ python2.7 -c import scipy; scipy.test()
 Running unit tests for scipy
 NumPy version 1.9.0.dev-056ab73
 NumPy is installed in
 /usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy
 SciPy version 0.14.0
 SciPy is installed in
 /usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy
 Python version 2.7.5 (default, Jun 18 2013, 21:21:44) [GCC 4.2.1
 Compatible Apple LLVM 4.2 (clang-425.0.28)]
 nose version 1.3.0
 E...EEEE

 ==
 ERROR: Failure: ImportError
 (dlopen(/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/cluster/_hierarchy_wrap.so,
 2): Symbol not found: _PyModule_Create2
   Referenced from:
 /usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/cluster/_hierarchy_wrap.so
   Expected in: flat namespace
  in
 /usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/cluster/_hierarchy_wrap.so)

 --
 Traceback (most recent call last):
   File
 /usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/loader.py,
 line 413, in loadTestsFromName
 addr.filename, addr.module)
   File
 /usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py,
 line 47, in importFromPath
 return self.importFromDir(dir_path, fqname)
   File
 /usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py,
 line 94, in importFromDir
 mod = load_module(part_fqname, fh, filename, desc)
   File
 /usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/cluster/__init__.py,
 line 27, in module
 from . import vq, hierarchy
   File
 /usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/cluster/hierarchy.py,
 line 175, in module
 from . import _hierarchy_wrap
 ImportError:
 dlopen(/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/cluster/_hierarchy_wrap.so,
 2): Symbol not found: _PyModule_Create2
   Referenced from:
 /usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/cluster/_hierarchy_wrap.so
   Expected in: flat namespace
  in
 

Re: [Numpy-discussion] numpy.random selection from a loglogistic distribution?

2014-06-16 Thread josef . pktd
On Mon, Jun 16, 2014 at 1:38 PM, Liz VanWormer evanwor...@ucdavis.edu wrote:
 Dear all,

 I'm a novice Python user, and I need to draw random variables from a
 loglogistic distribution. I've used the numpy.random command in the past to
 select variables from supported distributions (beta, normal, lognormal,
 etc). Is there a way to add in a distribution to be used with the
 numpy.random command?

You or someone else could add a new distribution to numpy.random.

However, scipy has the fisk distribution which according to wikipedia
is the same as log-logistic.
fisk is implemented as a special case of burr and has an explicit
inverse cdf (ppf)

So using fisk.rvs should be reasonably fast for vectorized calls,
since the overhead is much larger than for numpy.random functions.

http://en.wikipedia.org/wiki/Log-logistic_distribution

to make distribution names more fun:
Wikipedia, "generalized log-logistic distribution": "These include the
Burr Type XII distribution (also known as the Singh-Maddala
distribution)"

Josef
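
A minimal sketch of that suggestion (the shape parameter c=2.5 and the
sample size are arbitrary illustration values):

    import numpy as np
    from scipy import stats

    # log-logistic (= Fisk) draws; c is the shape parameter
    samples = stats.fisk.rvs(c=2.5, scale=1.0, size=1000)

    # equivalently, by hand through the explicit inverse cdf (ppf)
    u = np.random.uniform(size=1000)
    samples2 = stats.fisk.ppf(u, c=2.5, scale=1.0)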


 Thank you for your insight,
 Liz



Re: [Numpy-discussion] ANN: NumPy 1.9.0 beta release

2014-06-09 Thread josef . pktd
On Mon, Jun 9, 2014 at 8:49 PM, Charles R Harris
charlesr.har...@gmail.com wrote:



 On Mon, Jun 9, 2014 at 6:47 PM, Charles R Harris charlesr.har...@gmail.com
 wrote:




 On Mon, Jun 9, 2014 at 6:23 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:




 On Mon, Jun 9, 2014 at 6:21 PM, Jeff Reback jeffreb...@gmail.com wrote:

 The one pandas test failure that is valid: ERROR: test_interp_regression
 (pandas.tests.test_generic.TestSeries)

 has been fixed in pandas master / 0.14.1 (prob releasing in 1 month).

 (the other test failures are for clipboard / network issues)




 Thanks for the feedback.

 snip


 Looks to me like a fair number of the other test failures are actually
 bugs in the tests, or could be argued to be so.


 Or rather, actual bugs, but not in numpy. I can see this will take a while
 to settle ;)

not really a bug on our side given the old numpy behavior
I was surprised about the failure because it still produces the correct result.

params[nuis_param_index] = nuisance_params
ValueError: shape mismatch: value array of shape (2,) could not be
broadcast to indexing result of shape (0,)

As far as I have figured out based on Christoph's test failures

`nuis_param_index`  is array([], dtype=int32)
and because numpy didn't assign anything, `nuisance_params` was picked
essentially arbitrarily along this code path

>>> x = np.arange(5.)
>>> we_dont_care_what_this_is = np.random.randn(10)
>>> x[[]] = we_dont_care_what_this_is
>>> x
array([ 0.,  1.,  2.,  3.,  4.])

Does x[[]] = [] work with numpy 1.9?

Josef
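
A minimal sketch of the distinction being asked about (behavior as in
recent numpy versions; not verified against the 1.9 beta specifically):

    import numpy as np

    x = np.arange(5.)
    x[[]] = []   # zero values assigned to zero positions: allowed

    try:
        x[[]] = np.random.randn(10)   # 10 values into 0 positions
    except ValueError as exc:
        # 1.9-style strictness: "shape mismatch: value array of shape
        # (10,) could not be broadcast to indexing result of shape (0,)"
        print(exc)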


 Chuck



Re: [Numpy-discussion] ANN: NumPy 1.9.0 beta release

2014-06-09 Thread josef . pktd
On Mon, Jun 9, 2014 at 11:10 PM,  josef.p...@gmail.com wrote:
 snip -- full requote of the 1.9.0 beta test-failure discussion above
 not really a bug on our side given the old numpy behavior

forgot to define: our side = statsmodels

 snip -- rest of the requoted message and list footer


Re: [Numpy-discussion] big-bangs versus incremental improvements (was: Re: SciPy 2014 BoF NumPy Participation)

2014-06-05 Thread josef . pktd
On Thu, Jun 5, 2014 at 8:58 AM, Todd toddr...@gmail.com wrote:


 On 5 Jun 2014 14:28, David Cournapeau courn...@gmail.com wrote:
 
 
 
 
  On Thu, Jun 5, 2014 at 9:44 AM, Todd toddr...@gmail.com wrote:
 
 
  On 5 Jun 2014 02:57, Nathaniel Smith n...@pobox.com wrote:
  
   On Wed, Jun 4, 2014 at 7:18 AM, Travis Oliphant tra...@continuum.io
 wrote:
   And numpy will be much harder to replace than numeric --
   numeric wasn't the most-imported package in the pythonverse ;-).
 
  If numpy is really such a core part of  python ecosystem, does it
 really make sense to keep it as a stand-alone package?  Rather than
 thinking about a numpy 2, might it be better to be focusing on getting
 ndarray and dtype to a level of quality where acceptance upstream might be
 plausible?
 
 
  There have been discussions about integrating numpy a long time ago
 (can't find a reference right now), and the consensus was that this was
 neither possible in its current shape nor advisable. The situation has not
 changed.
 
  Putting something in the stdlib means it basically cannot change
 anymore: API compatibility requirements would be stronger than what we
 provide even now. NumPy is also a large codebase which would need some
 major clean up to be accepted, etc...
 
  David

 I am not suggesting merging all of numpy, only ndarray and dtype (which I
 know is a huge job itself).  And perhaps not even all of what is currently
 included in those, some methods could be split out to their own functions.

 And any numpy 2.0 would also imply a major code cleanup.  So although
 ndarray and dtype are certainly not ready for such a thing right now, if
 you are talking about numpy 2.0 already, perhaps part of that discussion
 could involve a plan to get the code into a state where such a move might
 be plausible.  Even if the merge doesn't actually happen, having the code
 at that quality level would still be a good thing.

 I agree that the relationship between numpy and python has not changed
 very much in the last few years, but I think the scientific computing
 landscape is changing.  The latter issue is where my primary concern lies.

I don't think it would have any effect on scientific computing users. It
might be useful for other users that occasionally want to do a bit of array
processing.

Scientific users need the extended SciPy stack and not a part of numpy that
can be imported from the standard library.
For example in Data Science, where I pay more attention and where Python
is getting pretty popular, the usual recommended list requires numpy, scipy,
and 5 to 10 more python libraries.

Should pandas also go into the python standard library?
Python 3.4 got a statistics library, but I don't think it has any effect on
the potential statsmodels user base.

Josef





Re: [Numpy-discussion] big-bangs versus incremental improvements (was: Re: SciPy 2014 BoF NumPy Participation)

2014-06-05 Thread josef . pktd
On Thu, Jun 5, 2014 at 8:40 AM, David Cournapeau courn...@gmail.com wrote:




 On Thu, Jun 5, 2014 at 3:36 AM, Charles R Harris 
 charlesr.har...@gmail.com wrote:




 On Wed, Jun 4, 2014 at 7:29 PM, Travis Oliphant tra...@continuum.io
 wrote:

 Believe me, I'm all for incremental changes if it is actually possible
 and doesn't actually cost more.  It's also why I've been silent until now
 about anything we are doing being a candidate for a NumPy 2.0.  I
 understand the challenges of getting people to change.  But, features and
 solid improvements *will* get people to change --- especially if their new
 library can be used along with the old library and the transition can be
 done gradually. Python 3's struggle is the lack of features.

 At some point there *will* be a NumPy 2.0.   What features go into NumPy
 2.0, how much backward compatibility is provided, and how much porting is
 needed to move your code from NumPy 1.X to NumPy 2.X is the real user
 question --- not whether it is characterized as incremental change or
 re-write. What I call a re-write and what you call an
 incremental-change are two points on a spectrum and likely overlap
 signficantly if we really compared what we are thinking about.

 One huge benefit that came out of the numeric / numarray / numpy
 transition that we mustn't forget about was actually the extended buffer
 protocol and memory view objects.  This really does allow multiple array
 objects to co-exist and libraries to use the object that they prefer in a
 way that did not exist when Numarray / numeric / numpy came out.So, we
 shouldn't be afraid of that world.   The existence of easy package managers
 to update environments to try out new features and have applications on a
 single system that use multiple versions of the same library is also
 something that didn't exist before and that will make any transition easier
 for users.

 One thing I regret about my working on NumPy originally is that I didn't
 have the foresight, skill, and understanding to work more on a more
 extended and better designed multiple-dispatch system so that multiple
 array objects could participate together in an expression flow.   The
 __numpy_ufunc__ mechanism gives enough capability in that direction that it
 may be better now.

 Ultimately, I don't disagree that NumPy can continue to exist in
 incremental change mode ( though if you are swapping out whole swaths of
 C-code for Cython code --- it sounds a lot like a re-write) as long as
 there are people willing to put the effort into changing it.   I think this
 is actually benefited by the existence of other array objects that are
 pushing the feature envelope without the constraints --- in much the same
 way that the Python standard library is benefitted by many versions of
 different capabilities being tried out before moving into the standard
 library.

 I remain optimistic that things will continue to improve in multiple
 ways --- if a little messier than any of us would conceive individually.
   It *is* great to see all the PR's coming from multiple people on NumPy
 and all the new energy around improving things whether great or small.


 @nathaniel IIRC, one of the objections to the missing values work was
 that it changed the underlying array object by adding a couple of variables
 to the structure. I'm willing to do that sort of thing, but it would be
 good to have general agreement that that is acceptable.



 I think changing the ABI for some versions of numpy (2.0 , whatever) is
 acceptable. There is little doubt that the ABI will need to change to
 accommodate a better and more flexible architecture.

 Changing the C API is more tricky: I am not up to date to the changes from
 the last 2-3 years, but at that time, most things could have been changed
 internally without breaking much, though I did not go far enough to
 estimate what the performance impact could be (if any).


My impression is that you can do it once (in a while) so that no more than
two incompatible versions of numpy are alive at the same time.

It doesn't look worse to me than supporting a new python version, but
doubles the number of binaries and wheels.

(Supporting Python 3.4 for Cython-based projects meant hoping that Cython
would take care of it. And the Cython developers did take care of it.)

Josef





 As to blaze/dynd, I'd like to steal bits here and there, and maybe in the
 long term base numpy on top of it with a compatibility layer. There is a
 lot of thought and effort that has gone into those projects and we should
 use what we can. As is, I think numpy is good for another five to ten years
 and will probably hang on for fifteen, but it will be outdated by the end
 of that period. Like great whites, we need to keep swimming just to have
 oxygen. Software projects tend to be obligate ram ventilators.

 The Python 3 experience is definitely something we want to avoid. And
 while blaze does big data and offers some nice features, I don't know 

Re: [Numpy-discussion] ANN: Pandas 0.14.0 released

2014-05-31 Thread josef . pktd
No comment,

Josef

On Fri, May 30, 2014 at 6:16 PM, Neal Becker ndbeck...@gmail.com wrote:
 pip install --user --up pandas
 Downloading/unpacking pandas from
 https://pypi.python.org/packages/source/p/pandas/pandas-0.14.0.tar.gz#md5=b775987c0ceebcc8d5ace4a1241c967a
 ...

 Downloading/unpacking numpy=1.6.1 from
 https://pypi.python.org/packages/source/n/numpy/numpy-1.8.1.tar.gz#md5=be95babe263bfa3428363d6db5b64678
 (from pandas)
   Downloading numpy-1.8.1.tar.gz (3.8MB): 3.8MB downloaded
   Running setup.py egg_info for package numpy
 Running from numpy source directory.

 warning: no files found matching 'tools/py3tool.py'
 warning: no files found matching '*' under directory 'doc/f2py'
 warning: no previously-included files matching '*.pyc' found anywhere in
 distribution
 warning: no previously-included files matching '*.pyo' found anywhere in
 distribution
 warning: no previously-included files matching '*.pyd' found anywhere in
 distribution
 Downloading/unpacking six from
 https://pypi.python.org/packages/source/s/six/six-1.6.1.tar.gz#md5=07d606ac08595d795bf926cc9985674f
 (from python-dateutil-pandas)
   Downloading six-1.6.1.tar.gz
   Running setup.py egg_info for package six

 no previously-included directories found matching 'documentation/_build'
 Installing collected packages: pandas, pytz, numpy, six
 

 What?  I already have numpy-1.8.0 installed (also have six, pytz).



Re: [Numpy-discussion] smoothing function

2014-05-15 Thread josef . pktd
On Thu, May 15, 2014 at 8:04 AM, rodrigo koblitz
rodrigokobl...@gmail.com wrote:

 Buenos,
 I'm reading Zuur book (ecology models with R) and try make it entire in
 python.
 Have this function in R:
 M4 <- gam(So ∼ s(De) + factor(ID), subset = I1)

 the 's' term indicates that So is modelled as a smoothing function of De

 I'm looking for something close to this in python.


These kind of general questions are better asked on the scipy-user mailing
list which covers more general topics than numpy-discussion.

As far as I know, GAMs are not available in python, at least I never came
across any.

statsmodels has an ancient GAM in the sandbox that has never been connected
to any smoother, since lowess, spline, and kernel regression support was
missing. Nobody is working on that right now.
If you have only a single nonparametric variable, then statsmodels also has
partial linear model based on kernel regression, that is not cleaned up or
verified, but Padarn is currently working on this.

I think in this case using a penalized linear model with spline basis
functions would be more efficient, but there is also nothing clean
available, AFAIK.

It's not too difficult to write the basic models, but it takes time to
figure out the last 10% and to verify the results and write unit tests.


If you make your code publicly available, then I would be very interested
in a link. I'm trying to collect examples from books that have a python
solution.

Josef



 Someone can help me?

 abraços,
 Koblitz



Re: [Numpy-discussion] smoothing function

2014-05-15 Thread josef . pktd
On Thu, May 15, 2014 at 12:17 PM, Nathaniel Smith n...@pobox.com wrote:

 On Thu, May 15, 2014 at 1:04 PM, rodrigo koblitz
 rodrigokobl...@gmail.com wrote:
  Buenos,
  I'm reading Zuur book (ecology models with R) and try make it entire in
  python.
  Have this function in R:
  M4 <- gam(So ∼ s(De) + factor(ID), subset = I1)
 
  the 's' term indicates that So is modelled as a smoothing function of De
 
  I'm looking for something close to this in python.

 The closest thing that doesn't require writing your own code is
 probably to use patsy's [1] support for (simple unpenalized) spline
 basis transformations [2]. I think using statsmodels this works like:

 import statsmodels.formula.api as smf
 # adjust '5' to taste -- bigger = wigglier, less bias, more overfitting
 results = smf.ols("So ~ bs(De, 5) + C(ID)", data=my_df).fit()
 print results.summary()


Nice



 To graph the resulting curve you'll want to use the results to somehow
 do prediction -- I'm not sure what the API for that looks like in
 statsmodels. If you need help figuring it out then the asking on the
 statsmodels list or stackoverflow is probably the quickest way to get
 help.


seems to work (in a very simple made up example)

results.predict({'De':np.arange(1,5), 'ID':['a']*4}, transform=True)
#array([ 0.75 , 1.0833, 0.75 , 0.4167])

Josef
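
A sketch of the natural follow-up (editorial illustration; reuses the
hypothetical `results`, `De` and `ID` names from the snippet above) for
evaluating the fitted smooth on a grid, e.g. to plot the curve:

    import numpy as np
    import pandas as pd

    # fixed factor level, fine grid over the smoothed variable
    grid = pd.DataFrame({'De': np.linspace(1, 5, 50), 'ID': ['a'] * 50})
    curve = results.predict(grid)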


 -n

 [1] http://patsy.readthedocs.org/en/latest/
 [2]
 http://patsy.readthedocs.org/en/latest/builtins-reference.html#patsy.builtins.bs



Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing

2014-04-27 Thread josef . pktd
On Sun, Apr 27, 2014 at 6:06 PM, Carl Kleffner cmkleff...@gmail.com wrote:

 A possible option is to install the toolchain inside site-packages and to
 deploy it as PYPI wheel or wininst packages. The PATH to the toolchain
 could be extended during import of the package. But I have no idea, whats
 the best strategy to additionaly install ATLAS or other third party
 libraries.


What I did in the past is just to download the ATLAS binaries from the
scipy/numpy wiki and move them into the Python/Dlls directory. IIRC

My impression was that finding ATLAS binaries was the difficult part, not
moving them into the right directory.



 Cheers,

 Carl


 2014-04-27 23:46 GMT+02:00 Matthew Brett matthew.br...@gmail.com:

 Hi,

 On Sun, Apr 27, 2014 at 2:34 PM, Carl Kleffner cmkleff...@gmail.com
 wrote:
  Hi,
 
  I definitely won't have time until Thursday this week to work out
  the github workflow for a numpy pull request. So feel free to do it
  for me.

 OK - I will have a go at this tomorrow.

  BTW: There is a missing feature in the mingw-w64 toolchain. Right now it
  supports linking to the msvcr90 runtime only. I have to extend the specs
  file to allow linking to msvcr100 with an additional flag. Or create a
  dedicated toolchain - what do you think?

 I don't know.

 Is this a discussion that should go to the mingw-w64 list do you
 think?  It must be a very common feature.

 As you know, I'm really hoping it will be possible make a devkit for
 Python similar to the Ruby devkits [1].


I got my entire initial setup on the computer I'm using right now through
python-xy, including MingW 32.

The only thing I ever had to do was to create the `distutils.cfg` in a new
python install.
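
For reference, that file is tiny; a sketch (assuming the usual location
Lib\distutils\distutils.cfg inside the Python install):

    [build]
    compiler = mingw32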

python-xy relies on the availability of an open source development
environment for numpy and scipy, and has been restricted so far to python
32 versions.
winpython is only a python distribution and is also available for 64bit
(with Gohlke binaries I think)

I think it would be very helpful to get python-xy set up for development
for 64 bit versions, now that the toolchain with MingW is available.

I'm skeptical about having lots of distributions that install all their
own full toolchain (I always worry about which one is actually on the path.
 I deleted my first git for Windows version because it came with a built-in
MSYS/MingW toolchain and now just use the nice and small portable version.)




 The ideal would be a devkit that transparently picked up 32 vs 64 bit,
 and MSVC runtime according to the Python version.  For example, OSX
 compilation automatically picks up the OSX SDK with which the relevant
 Python was built.  Do you think something like this is possible?  That
 would be a great improvement for people building extensions and wheels
 on Windows.


How does MingW64 decide whether to build 32-bit or 64-bit versions?
Does the python version matter for MingW?

or should this pick up one of the Visual SDK's that the user needs to
install?

Josef




 Cheers,

 Matthew

 [1] http://rubyinstaller.org/add-ons/devkit/


Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing

2014-04-26 Thread josef . pktd
On Fri, Apr 25, 2014 at 1:21 AM, Matthew Brett matthew.br...@gmail.com wrote:

 Hi,

 On Thu, Apr 24, 2014 at 5:26 PM,  josef.p...@gmail.com wrote:
 
 
 
  On Thu, Apr 24, 2014 at 7:29 PM, josef.p...@gmail.com wrote:
 
 
 
 
  On Thu, Apr 24, 2014 at 7:20 PM, Charles R Harris
  charlesr.har...@gmail.com wrote:
 
 
 
 
  On Thu, Apr 24, 2014 at 5:08 PM, josef.p...@gmail.com wrote:
 
 
 
 
  On Thu, Apr 24, 2014 at 7:00 PM, Charles R Harris
  charlesr.har...@gmail.com wrote:
 
 
  Hi Matthew,
 
  On Thu, Apr 24, 2014 at 3:56 PM, Matthew Brett
  matthew.br...@gmail.com wrote:
 
  Hi,
 
  Thanks to Cark Kleffner's toolchain and some help from Clint Whaley
  (main author of ATLAS), I've built 64-bit windows numpy and scipy
  wheels for testing.
 
  The build uses Carl's custom mingw-w64 build with static linking.
 
  There are two harmless test failures on scipy (being discussed on
 the
  list at the moment) - tests otherwise clean.
 
  Wheels are here:
 
 
 
 https://nipy.bic.berkeley.edu/scipy_installers/numpy-1.8.1-cp27-none-win_amd64.whl
 
 
 https://nipy.bic.berkeley.edu/scipy_installers/scipy-0.13.3-cp27-none-win_amd64.whl
 
  You can test with:
 
  pip install -U pip # to upgrade pip to latest
  pip install -f https://nipy.bic.berkeley.edu/scipy_installers numpy
  scipy
 
  Please do send feedback.
 
  ATLAS binary here:
 
 
 
 https://nipy.bic.berkeley.edu/scipy_installers/atlas_builds/atlas-64-full-sse2.tar.bz2
 
  Many thanks for Carl in particular for doing all the hard work,
 
 
  Cool. After all these long years... Now all we need is a box running
  tests for CI.
 
  Chuck
 
 
 
  I get two test failures with numpy
 
  Josef
 
   np.test()
  Running unit tests for numpy
  NumPy version 1.8.1
  NumPy is installed in C:\Python27\lib\site-packages\numpy
  Python version 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64
 bit
  (AMD64)]
  nose version 1.1.2
 
  ==
  FAIL: test_iterator.test_iter_broadcasting_errors
  --
  Traceback (most recent call last):
File C:\Python27\lib\site-packages\nose\case.py, line 197, in
  runTest
  self.test(*self.arg)
File
  C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py,
 line 657,
  in test_iter_broadcasting_errors
  '(2)-(2,newaxis)') % msg)
File C:\Python27\lib\site-packages\numpy\testing\utils.py, line
 44,
  in assert_
  raise AssertionError(msg)
  AssertionError: Message operands could not be broadcast together with
  remapped shapes [original-remapped]: (2,3)-(2,3) (2,)-(2,newaxis)
 and
  requested shape (4,3) doesn't contain remapped operand
  shape(2)-(2,newaxis)
 
  ==
  FAIL: test_iterator.test_iter_array_cast
  --
  Traceback (most recent call last):
File C:\Python27\lib\site-packages\nose\case.py, line 197, in
  runTest
  self.test(*self.arg)
File
  C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py,
 line 836,
  in test_iter_array_cast
  assert_equal(i.operands[0].strides, (-96,8,-32))
File C:\Python27\lib\site-packages\numpy\testing\utils.py, line
 255,
  in assert_equal
  assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k, err_msg),
  verbose)
File C:\Python27\lib\site-packages\numpy\testing\utils.py, line
 317,
  in assert_equal
  raise AssertionError(msg)
  AssertionError:
  Items are not equal:
  item=0
 
   ACTUAL: 96L
   DESIRED: -96
 
  --
  Ran 4828 tests in 46.306s
 
  FAILED (KNOWNFAIL=10, SKIP=8, failures=2)
  nose.result.TextTestResult run=4828 errors=0 failures=2
 
 
  Strange. That second one looks familiar, at least the -96 part.
 Wonder
  why this doesn't show up with the MKL builds.
 
 
  ok tried again, this time deleting the old numpy directories before
  installing
 
  Ran 4760 tests in 42.124s
 
  OK (KNOWNFAIL=10, SKIP=8)
  nose.result.TextTestResult run=4760 errors=0 failures=0
 
 
  so pip also seems to be reusing leftover files.
 
  all clear.
 
 
  Running the statsmodels test suite, I get a failure in
  test_discrete.TestProbitCG where fmin_cg converges to something that
 differs
  in the 3rd decimal.
 
  I usually only test the 32-bit version, so I don't know if this is
 specific
  to this scipy version, but we haven't seen this in a long time.
  I used our nightly binaries http://statsmodels.sourceforge.net/binaries/

 That's interesting, you saw also we're getting failures on the tests
 for powell optimization because of small unit-at-last-place
 differences in the exp function in mingw-w64.  Is there any chance you
 can track down 

Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing

2014-04-26 Thread josef . pktd
On Sat, Apr 26, 2014 at 10:10 AM, josef.p...@gmail.com wrote:




 snip -- full requote of the wheels/ATLAS test thread above; truncated in
 the archive before any new content

Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing

2014-04-26 Thread josef . pktd
On Sat, Apr 26, 2014 at 10:20 AM, josef.p...@gmail.com wrote:




 On Sat, Apr 26, 2014 at 10:10 AM, josef.p...@gmail.com wrote:




 On Fri, Apr 25, 2014 at 1:21 AM, Matthew Brett 
 matthew.br...@gmail.comwrote:

 Hi,

 On Thu, Apr 24, 2014 at 5:26 PM,  josef.p...@gmail.com wrote:
 
 
 
  On Thu, Apr 24, 2014 at 7:29 PM, josef.p...@gmail.com wrote:
 
 
 
 
  On Thu, Apr 24, 2014 at 7:20 PM, Charles R Harris
  charlesr.har...@gmail.com wrote:
 
 
 
 
  On Thu, Apr 24, 2014 at 5:08 PM, josef.p...@gmail.com wrote:
 
 
 
 
  On Thu, Apr 24, 2014 at 7:00 PM, Charles R Harris
  charlesr.har...@gmail.com wrote:
 
 
  Hi Matthew,
 
  On Thu, Apr 24, 2014 at 3:56 PM, Matthew Brett
  matthew.br...@gmail.com wrote:
 
  Hi,
 
  Thanks to Cark Kleffner's toolchain and some help from Clint
 Whaley
  (main author of ATLAS), I've built 64-bit windows numpy and scipy
  wheels for testing.
 
  The build uses Carl's custom mingw-w64 build with static linking.
 
  There are two harmless test failures on scipy (being discussed on
 the
  list at the moment) - tests otherwise clean.
 
  Wheels are here:
 
 
 
 https://nipy.bic.berkeley.edu/scipy_installers/numpy-1.8.1-cp27-none-win_amd64.whl
 
 
 https://nipy.bic.berkeley.edu/scipy_installers/scipy-0.13.3-cp27-none-win_amd64.whl
 
  You can test with:
 
  pip install -U pip # to upgrade pip to latest
  pip install -f https://nipy.bic.berkeley.edu/scipy_installersnumpy
  scipy
 
  Please do send feedback.
 
  ATLAS binary here:
 
 
 
 https://nipy.bic.berkeley.edu/scipy_installers/atlas_builds/atlas-64-full-sse2.tar.bz2
 
  Many thanks for Carl in particular for doing all the hard work,
 
 
  Cool. After all these long years... Now all we need is a box
 running
  tests for CI.
 
  Chuck
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
  I get two test failures with numpy
 
  Josef
 
   np.test()
  Running unit tests for numpy
  NumPy version 1.8.1
  NumPy is installed in C:\Python27\lib\site-packages\numpy
  Python version 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500
 64 bit
  (AMD64)]
  nose version 1.1.2
 
 
 ==
  FAIL: test_iterator.test_iter_broadcasting_errors
 
 --
  Traceback (most recent call last):
File C:\Python27\lib\site-packages\nose\case.py, line 197, in
  runTest
  self.test(*self.arg)
File
  C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py,
 line 657,
  in test_iter_broadcasting_errors
  '(2)-(2,newaxis)') % msg)
File C:\Python27\lib\site-packages\numpy\testing\utils.py, line
 44,
  in assert_
  raise AssertionError(msg)
  AssertionError: Message operands could not be broadcast together
 with
  remapped shapes [original-remapped]: (2,3)-(2,3)
 (2,)-(2,newaxis) and
  requested shape (4,3) doesn't contain remapped operand
  shape(2)-(2,newaxis)
 
 
 ==
  FAIL: test_iterator.test_iter_array_cast
 
 --
  Traceback (most recent call last):
File C:\Python27\lib\site-packages\nose\case.py, line 197, in
  runTest
  self.test(*self.arg)
File
  C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py,
 line 836,
  in test_iter_array_cast
  assert_equal(i.operands[0].strides, (-96,8,-32))
File C:\Python27\lib\site-packages\numpy\testing\utils.py, line
 255,
  in assert_equal
  assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k,
 err_msg),
  verbose)
File C:\Python27\lib\site-packages\numpy\testing\utils.py, line
 317,
  in assert_equal
  raise AssertionError(msg)
  AssertionError:
  Items are not equal:
  item=0
 
   ACTUAL: 96L
   DESIRED: -96
 
 
 --
  Ran 4828 tests in 46.306s
 
  FAILED (KNOWNFAIL=10, SKIP=8, failures=2)
  nose.result.TextTestResult run=4828 errors=0 failures=2
 
 
  Strange. That second one looks familiar, at least the -96 part.
 Wonder
  why this doesn't show up with the MKL builds.
 
 
  ok tried again, this time deleting the old numpy directories before
  installing
 
  Ran 4760 tests in 42.124s
 
  OK (KNOWNFAIL=10, SKIP=8)
  <nose.result.TextTestResult run=4760 errors=0 failures=0>
 
 
  so pip also seems to be reusing leftover files.
 
  all clear.
 
 
  Running the statsmodels test suite, I get a failure in
  test_discrete.TestProbitCG where fmin_cg converges to something that
 differs
  in the 3rd decimal.
 
  I usually only test the 32-bit version, so I don't know if this is
 specific
  to this scipy version, but we haven't seen this in a long time.
  I used our nightly binaries
 http://statsmodels.sourceforge.net/binaries/

 That's interesting - you saw we're also getting failures on the tests
 for 

Re: [Numpy-discussion] 1.9.x branch

2014-04-26 Thread josef . pktd
On Sat, Apr 26, 2014 at 1:12 PM, Sebastian Berg
sebast...@sipsolutions.net wrote:

 On Mi, 2014-04-23 at 10:11 -0400, josef.p...@gmail.com wrote:
  On Wed, Apr 23, 2014 at 5:32 AM, Sebastian Berg
  sebast...@sipsolutions.net wrote:
   On Di, 2014-04-22 at 15:35 -0600, Charles R Harris wrote:
   Hi All,
  
  
   I'd like to branch 1.9.x at the end of the month. There are a couple
   of reasons for the timing. First, we have a lot of new stuff in the
   development branch. Second, there is work ongoing in masked arrays
   that I'd like to keep out of the release so that it has more time to
   settle. Third, it's past time ;)
  
   Sounds good.
  
   There are currently a number of 1.9.0 blockers, which can be seen
   here.
  
   Datetime timezone handling broken in 1.7.x
  
   I don't think there is time to get this done for 1.9.0 and it needs to
   be pushed off to 1.10.0.
  
   Return multiple field selection as ro view
  
   I have a branch for this, but because the returned type differs from a
   copy by alignment spacing there was a test failure. Merging that
   branch might cause some incompatibilities.
  
  
   I am a bit worried here that comparisons might make trouble.
  
   Object array creation new conversion to int
  
  
   This one needs a decision. Julian, Sebastian, thoughts?
  
  
   Maybe for all to consider this is about what happens for object arrays
   if you do things like:
  
   # Array cast to object array (np.array(arr) would be identical):
   a = np.arange(10).astype(object)
   # Array passed into new array creation (not just *one* array):
   b = np.array([np.arange(10)], dtype=object)
   # Numerical array is assigned to object array:
   c = np.empty(10, dtype=object)
   c[...] = np.arange(10)
  
   Before this change, the result was:
   type(a[0]) is int
   type(b[0,0]) is np.int_  # Note the numpy type
   type(c[0]) is int
  
   After this change, they are all `int`. Though note that the numpy type
   is preserved for example for long double. On the one hand preserving
 the
   numpy type might be nice, but on the other hand we don't care much
 about
   the dtypes of scalars and in practice the python types are probably
 more
   often wanted.
 
  what if I don't like python?
 

 Fair point :). I also think it is more consistent if we use the numpy
 types (and the user has to cast to the python type explicitly). Could
 argue that if someone casts to object, they might like python objects,
 but if you don't want them that is tricky, too.

 There is the option that the numerical array -> object array cast always
 returns an array of numpy scalars. Making it consistent (opposite to
 what is currently in numpy master).

 This would mean that `numerical_arr.astype(object)` would give an array
 of numpy scalars always. Getting python scalars would only be possible
 using `arr.item()` (or `tolist`). I guess that change is less likely to
 cause problems, because the numpy types can do more things normally
 though they are slower.

 So a (still a bit unsure) +1 from me for making numeric -> object casts
 return arrays of numpy scalars unless we have reason to expect that to
 cause problems. Not sure how easy that would be to change, it wasn't a
 one line change when I tried, so there is something slightly tricky to
 clear out, but probably not too hard either.
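
As a concrete illustration of the two explicit escape hatches mentioned
above (a minimal sketch; the exact scalar types depend on platform and
numpy version):

    from __future__ import print_function
    import numpy as np

    a = np.arange(3)
    print(type(a[0]))           # numpy scalar, e.g. <type 'numpy.int32'>
    print(type(a[0].item()))    # builtin Python int
    print(type(a.tolist()[0]))  # tolist() also returns builtin ints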


More general background question.

Why is there casting involved in object arrays?

I thought object arrays will just take and return whatever you put in.
If I use python ints, I might want python ints.
If I use numpy int_s, I might want numpy ints.

b1 = np.array([np.arange(10)], dtype=object)
versus
b2 = np.array([list(range(10))], dtype=object)


>>> b1 = np.array([np.arange(10)], dtype=object)
>>> type(b1[0,2])
<type 'numpy.int32'>
>>> type(b1[0][2])
<type 'numpy.int32'>


>>> b2 = np.array([list(range(10))], dtype=object)
>>> type(b2[0,2])
<type 'int'>
>>> type(b2[0][2])
<type 'int'>

another version

>>> type(np.array(np.arange(10).tolist(), dtype=object)[2])
<type 'int'>
>>> type(np.array(list(np.arange(10)), dtype=object)[2])
<type 'numpy.int32'>

Josef



 - Sebastian

  >>> np.int_(0)**(-1)
  inf
  >>> 0**-1
  Traceback (most recent call last):
    File "<pyshell#28>", line 1, in <module>
      0**-1
  ZeroDivisionError: 0.0 cannot be raised to a negative power


  >>> type(np.arange(5)[0])
  <type 'numpy.int32'>
  >>> np.arange(5)[0]**-1
  inf

  >>> type(np.arange(5)[0].item())
  <type 'int'>
  >>> np.arange(5)[0].item()**-1
  Traceback (most recent call last):
    File "<pyshell#40>", line 1, in <module>
      np.arange(5)[0].item()**-1
  ZeroDivisionError: 0.0 cannot be raised to a negative power

  >>> np.__version__
  '1.6.1'
 
 
  I remember struggling through this (avoiding python operations) quite
  a bit in my early bugfixes to scipy.stats.distributions.
 
  (IIRC I ended up avoiding most ints.)
 
  Josef
 
  
   Since I just realized that things are safe (float128 does not cast to
   float after all), I changed my mind and am tempted to keep the new
   behaviour. That is, if it does 

Re: [Numpy-discussion] Slightly off-topic - accuracy of C exp function?

2014-04-26 Thread josef . pktd
On Sat, Apr 26, 2014 at 6:37 PM, Matthew Brett matthew.br...@gmail.com wrote:

 Hi,

 On Wed, Apr 23, 2014 at 11:59 AM, Matthew Brett matthew.br...@gmail.com
 wrote:
  Hi,
 
  On Wed, Apr 23, 2014 at 1:43 AM, Nathaniel Smith n...@pobox.com wrote:
  On Wed, Apr 23, 2014 at 6:22 AM, Matthew Brett matthew.br...@gmail.com
 wrote:
  Hi,
 
  I'm exploring Mingw-w64 for numpy building, and I've found it gives a
  slightly different answer for 'exp' than - say - gcc on OSX.
 
  The difference is of the order of the eps value for the output number
  (2 * eps for a result of ~2.0).
 
  Is accuracy somewhere specified for C functions like exp?  Or is
  accuracy left as an implementation detail for the C library author?
 
  C99 says (sec 5.2.4.2.2) that "The accuracy of the floating point
  operations ... and of the library functions in <math.h> and
  <complex.h> that return floating point results is implementation
  defined. The implementation may state that the accuracy is unknown."
  (This last sentence is basically saying that with regard to some
  higher up clauses that required all conforming implementations to
  document this stuff, saying "eh, who knows" counts as documenting it.
  Hooray for standards!)
 
  Presumably the accuracy in this case is a function of the C library
  anyway, not the compiler?
 
  Mingw-w64 implementation is in assembly:
 
 
 http://sourceforge.net/p/mingw-w64/code/HEAD/tree/trunk/mingw-w64-crt/math/exp.def.h
 
  Numpy has its own implementations for a
  bunch of the math functions, and it's been unclear in the past whether
  numpy or the libc implementations were better in any particular case.
 
  I only investigated this particular value, in which case it looked as
  though the OSX value was closer to the exact value (via sympy.mpmath)
  - by ~1 unit-at-the-last-place.  This was causing a divergence in the
  powell optimization path and therefore a single scipy test failure.  I
  haven't investigated further - was wondering what investigation I
  should do, more than running the numpy / scipy test suites.

 Investigating further, with this script:

 https://gist.github.com/matthew-brett/11301221
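
 For readers who don't want to open the gist, a minimal sketch of this
 kind of ULP check (assuming mpmath is installed; this is not the
 linked script):

     from __future__ import print_function
     import numpy as np
     import mpmath

     mpmath.mp.prec = 100           # plenty of guard bits for the reference

     def exp_ulp_error(x):
         got = np.exp(np.float64(x))
         exact = mpmath.exp(mpmath.mpf(float(x)))
         best = float(exact)        # correctly rounded double for exp(x)
         return (got - best) / np.spacing(best)

     xs = np.linspace(0, 10, 10001)
     errs = np.array([exp_ulp_error(x) for x in xs])
     print("Proportion of zeros:", np.mean(errs == 0))
     print("Max / min error (ULP):", errs.max(), errs.min())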

 The following are tests of np.exp accuracy for input values between 0
 and 10, for numpy 1.8.1.

 If np.exp(x) performs perfectly, it will return the nearest floating
 point value to the exact value of exp(x).  If it does, this scores a
 zero for error in the tables below.  If 'proportion of zeros' is 1 -
 then np.exp performs perfectly for all tested values of exp (as is the
 case for linux here).

 OSX 10.9

 Proportion of zeros: 0.99789
 Sum of error: 2.15021267458e-09
 Sum of squared error: 2.47149370032e-14
 Max / min error: 5.96046447754e-08 -2.98023223877e-08
 Sum of squared relative error: 5.22456992025e-30
 Max / min relative error: 2.19700100681e-16 -2.2098803255e-16
 eps:  2.22044604925e-16
 Proportion of relative err >= eps: 0.0

 Debian Jessie / Sid

 Proportion of zeros: 1.0
 Sum of error: 0.0
 Sum of squared error: 0.0
 Max / min error: 0.0 0.0
 Sum of squared relative error: 0.0
 Max / min relative error: 0.0 0.0
 eps:  2.22044604925e-16
 Proportion of relative err >= eps: 0.0

 Mingw-w64 Windows 7

 Proportion of zeros: 0.82089
 Sum of error: 8.08415331122e-07
 Sum of squared error: 2.90045099615e-12
 Max / min error: 5.96046447754e-08 -5.96046447754e-08
 Sum of squared relative error: 4.18466468175e-28
 Max / min relative error: 2.22041308226e-16 -2.22042100773e-16
 eps:  2.22044604925e-16
 Proportion of relative err >= eps: 0.0

 Take-home : exp implementation for mingw-w64 is exactly (floating
 point) correct 82% of the time, and one unit-at-the-last-place off for
 the rest [1].  OSX is off by 1 ULP only 0.2% of the time.



Windows 64 with MKL

\WinPython-64bit-3.3.2.2\python-3.3.2.amd64>python
E:\Josef\eclipsegworkspace\statsmodels-git\local_scripts\local_scripts\try_exp_error.py
Proportion of zeros: 0.99793
Sum of error: -2.10546855506e-07
Sum of squared error: 3.33304327526e-14
Max / min error: 5.96046447754e-08 -5.96046447754e-08
Sum of squared relative error: 4.98420694339e-30
Max / min relative error: 2.20881302691e-16 -2.18321571939e-16
eps:  2.22044604925e-16
Proportion of relative err >= eps: 0.0


Windows 32 bit python with official MingW binaries

Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit
(Intel)] on win32

Proportion of zeros: 0.99464
Sum of error: -3.91621083118e-07
Sum of squared error: 9.2239247812e-14
Max / min error: 5.96046447754e-08 -5.96046447754e-08
Sum of squared relative error: 1.3334972729e-29
Max / min relative error: 2.21593462148e-16 -2.2098803255e-16
eps:  2.22044604925e-16
Proportion of relative err >= eps: 0.0




 Is mingw-w64 accurate enough?  Do we have any policy on this?


I wouldn't worry about a missing or an extra eps in our applications, but
the competition is more accurate.

Josef



 Cheers,

 Matthew

 [1] http://matthew-brett.github.io/pydagogue/floating_error.html
 ___
 

Re: [Numpy-discussion] Slightly off-topic - accuracy of C exp function?

2014-04-26 Thread josef . pktd
On Sat, Apr 26, 2014 at 8:05 PM, josef.p...@gmail.com wrote:




 On Sat, Apr 26, 2014 at 6:37 PM, Matthew Brett matthew.br...@gmail.com wrote:

 Hi,

 On Wed, Apr 23, 2014 at 11:59 AM, Matthew Brett matthew.br...@gmail.com
 wrote:
  Hi,
 
  On Wed, Apr 23, 2014 at 1:43 AM, Nathaniel Smith n...@pobox.com wrote:
  On Wed, Apr 23, 2014 at 6:22 AM, Matthew Brett 
 matthew.br...@gmail.com wrote:
  Hi,
 
  I'm exploring Mingw-w64 for numpy building, and I've found it gives a
  slightly different answer for 'exp' than - say - gcc on OSX.
 
  The difference is of the order of the eps value for the output number
  (2 * eps for a result of ~2.0).
 
  Is accuracy somewhere specified for C functions like exp?  Or is
  accuracy left as an implementation detail for the C library author?
 
  C99 says (sec 5.2.4.2.2) that "The accuracy of the floating point
  operations ... and of the library functions in <math.h> and
  <complex.h> that return floating point results is implementation
  defined. The implementation may state that the accuracy is unknown."
  (This last sentence is basically saying that with regard to some
  higher up clauses that required all conforming implementations to
  document this stuff, saying "eh, who knows" counts as documenting it.
  Hooray for standards!)
 
  Presumably the accuracy in this case is a function of the C library
  anyway, not the compiler?
 
  Mingw-w64 implementation is in assembly:
 
 
 http://sourceforge.net/p/mingw-w64/code/HEAD/tree/trunk/mingw-w64-crt/math/exp.def.h
 
  Numpy has its own implementations for a
  bunch of the math functions, and it's been unclear in the past whether
  numpy or the libc implementations were better in any particular case.
 
  I only investigated this particular value, in which case it looked as
  though the OSX value was closer to the exact value (via sympy.mpmath)
  - by ~1 unit-at-the-last-place.  This was causing a divergence in the
  powell optimization path and therefore a single scipy test failure.  I
  haven't investigated further - was wondering what investigation I
  should do, more than running the numpy / scipy test suites.

 Investigating further, with this script:

 https://gist.github.com/matthew-brett/11301221

 The following are tests of np.exp accuracy for input values between 0
 and 10, for numpy 1.8.1.

 If np.exp(x) performs perfectly, it will return the nearest floating
 point value to the exact value of exp(x).  If it does, this scores a
 zero for error in the tables below.  If 'proportion of zeros' is 1 -
 then np.exp performs perfectly for all tested values of exp (as is the
 case for linux here).

 OSX 10.9

 Proportion of zeros: 0.99789
 Sum of error: 2.15021267458e-09
 Sum of squared error: 2.47149370032e-14
 Max / min error: 5.96046447754e-08 -2.98023223877e-08
 Sum of squared relative error: 5.22456992025e-30
 Max / min relative error: 2.19700100681e-16 -2.2098803255e-16
 eps:  2.22044604925e-16
 Proportion of relative err >= eps: 0.0

 Debian Jessie / Sid

 Proportion of zeros: 1.0
 Sum of error: 0.0
 Sum of squared error: 0.0
 Max / min error: 0.0 0.0
 Sum of squared relative error: 0.0
 Max / min relative error: 0.0 0.0
 eps:  2.22044604925e-16
 Proportion of relative err >= eps: 0.0

 Mingw-w64 Windows 7

 Proportion of zeros: 0.82089
 Sum of error: 8.08415331122e-07
 Sum of squared error: 2.90045099615e-12
 Max / min error: 5.96046447754e-08 -5.96046447754e-08
 Sum of squared relative error: 4.18466468175e-28
 Max / min relative error: 2.22041308226e-16 -2.22042100773e-16
 eps:  2.22044604925e-16
 Proportion of relative err >= eps: 0.0

 Take-home : exp implementation for mingw-w64 is exactly (floating
 point) correct 82% of the time, and one unit-at-the-last-place off for
 the rest [1].  OSX is off by 1 ULP only 0.2% of the time.



 Windows 64 with MKL

 \WinPython-64bit-3.3.2.2\python-3.3.2.amd64>python
 E:\Josef\eclipsegworkspace\statsmodels-git\local_scripts\local_scripts\try_exp_error.py
 Proportion of zeros: 0.99793
 Sum of error: -2.10546855506e-07
 Sum of squared error: 3.33304327526e-14
 Max / min error: 5.96046447754e-08 -5.96046447754e-08
 Sum of squared relative error: 4.98420694339e-30
 Max / min relative error: 2.20881302691e-16 -2.18321571939e-16
 eps:  2.22044604925e-16
 Proportion of relative err >= eps: 0.0


 Windows 32 bit python with official MingW binaries

 Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit
 (Intel)] on win32

 Proportion of zeros: 0.99464
 Sum of error: -3.91621083118e-07
 Sum of squared error: 9.2239247812e-14
 Max / min error: 5.96046447754e-08 -5.96046447754e-08
 Sum of squared relative error: 1.3334972729e-29
 Max / min relative error: 2.21593462148e-16 -2.2098803255e-16
 eps:  2.22044604925e-16
 Proportion of relative err >= eps: 0.0




 Is mingw-w64 accurate enough?  Do we have any policy on this?


 I wouldn't worry about a missing or an extra eps in our applications, but
 the competition is more accurate.



Just for comparison, I increased `until` to 300
the 

Re: [Numpy-discussion] Slightly off-topic - accuracy of C exp function?

2014-04-26 Thread josef . pktd
On Sat, Apr 26, 2014 at 8:31 PM, josef.p...@gmail.com wrote:




 On Sat, Apr 26, 2014 at 8:05 PM, josef.p...@gmail.com wrote:




 On Sat, Apr 26, 2014 at 6:37 PM, Matthew Brett 
 matthew.br...@gmail.com wrote:

 Hi,

 On Wed, Apr 23, 2014 at 11:59 AM, Matthew Brett matthew.br...@gmail.com
 wrote:
  Hi,
 
  On Wed, Apr 23, 2014 at 1:43 AM, Nathaniel Smith n...@pobox.com
 wrote:
  On Wed, Apr 23, 2014 at 6:22 AM, Matthew Brett 
 matthew.br...@gmail.com wrote:
  Hi,
 
  I'm exploring Mingw-w64 for numpy building, and I've found it gives a
  slightly different answer for 'exp' than - say - gcc on OSX.
 
  The difference is of the order of the eps value for the output number
  (2 * eps for a result of ~2.0).
 
  Is accuracy somewhere specified for C functions like exp?  Or is
  accuracy left as an implementation detail for the C library author?
 
  C99 says (sec 5.2.4.2.2) that "The accuracy of the floating point
  operations ... and of the library functions in <math.h> and
  <complex.h> that return floating point results is implementation
  defined. The implementation may state that the accuracy is unknown."
  (This last sentence is basically saying that with regard to some
  higher up clauses that required all conforming implementations to
  document this stuff, saying "eh, who knows" counts as documenting it.
  Hooray for standards!)
 
  Presumably the accuracy in this case is a function of the C library
  anyway, not the compiler?
 
  Mingw-w64 implementation is in assembly:
 
 
 http://sourceforge.net/p/mingw-w64/code/HEAD/tree/trunk/mingw-w64-crt/math/exp.def.h
 
  Numpy has its own implementations for a
  bunch of the math functions, and it's been unclear in the past whether
  numpy or the libc implementations were better in any particular case.
 
  I only investigated this particular value, in which case it looked as
  though the OSX value was closer to the exact value (via sympy.mpmath)
  - by ~1 unit-at-the-last-place.  This was causing a divergence in the
  powell optimization path and therefore a single scipy test failure.  I
  haven't investigated further - was wondering what investigation I
  should do, more than running the numpy / scipy test suites.

 Investigating further, with this script:

 https://gist.github.com/matthew-brett/11301221

 The following are tests of np.exp accuracy for input values between 0
 and 10, for numpy 1.8.1.

 If np.exp(x) performs perfectly, it will return the nearest floating
 point value to the exact value of exp(x).  If it does, this scores a
 zero for error in the tables below.  If 'proportion of zeros' is 1 -
 then np.exp performs perfectly for all tested values of exp (as is the
 case for linux here).

 OSX 10.9

 Proportion of zeros: 0.99789
 Sum of error: 2.15021267458e-09
 Sum of squared error: 2.47149370032e-14
 Max / min error: 5.96046447754e-08 -2.98023223877e-08
 Sum of squared relative error: 5.22456992025e-30
 Max / min relative error: 2.19700100681e-16 -2.2098803255e-16
 eps:  2.22044604925e-16
 Proportion of relative err >= eps: 0.0

 Debian Jessie / Sid

 Proportion of zeros: 1.0
 Sum of error: 0.0
 Sum of squared error: 0.0
 Max / min error: 0.0 0.0
 Sum of squared relative error: 0.0
 Max / min relative error: 0.0 0.0
 eps:  2.22044604925e-16
 Proportion of relative err >= eps: 0.0

 Mingw-w64 Windows 7

 Proportion of zeros: 0.82089
 Sum of error: 8.08415331122e-07
 Sum of squared error: 2.90045099615e-12
 Max / min error: 5.96046447754e-08 -5.96046447754e-08
 Sum of squared relative error: 4.18466468175e-28
 Max / min relative error: 2.22041308226e-16 -2.22042100773e-16
 eps:  2.22044604925e-16
 Proportion of relative err >= eps: 0.0

 Take-home : exp implementation for mingw-w64 is exactly (floating
 point) correct 82% of the time, and one unit-at-the-last-place off for
 the rest [1].  OSX is off by 1 ULP only 0.2% of the time.



 Windows 64 with MKL

 \WinPython-64bit-3.3.2.2\python-3.3.2.amd64>python
 E:\Josef\eclipsegworkspace\statsmodels-git\local_scripts\local_scripts\try_exp_error.py
 Proportion of zeros: 0.99793
 Sum of error: -2.10546855506e-07
 Sum of squared error: 3.33304327526e-14
 Max / min error: 5.96046447754e-08 -5.96046447754e-08
 Sum of squared relative error: 4.98420694339e-30
 Max / min relative error: 2.20881302691e-16 -2.18321571939e-16
 eps:  2.22044604925e-16
 Proportion of relative err >= eps: 0.0


 Windows 32 bit python with official MingW binaries

 Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit
 (Intel)] on win32

 Proportion of zeros: 0.99464
 Sum of error: -3.91621083118e-07
 Sum of squared error: 9.2239247812e-14
  Max / min error: 5.96046447754e-08 -5.96046447754e-08
 Sum of squared relative error: 1.3334972729e-29
 Max / min relative error: 2.21593462148e-16 -2.2098803255e-16
 eps:  2.22044604925e-16
 Proportion of relative err >= eps: 0.0




 Is mingw-w64 accurate enough?  Do we have any policy on this?


 I wouldn't worry about a missing or an extra eps in our applications, but
 the competition is 

Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing

2014-04-24 Thread josef . pktd
On Thu, Apr 24, 2014 at 7:00 PM, Charles R Harris charlesr.har...@gmail.com
 wrote:


 Hi Matthew,

 On Thu, Apr 24, 2014 at 3:56 PM, Matthew Brett matthew.br...@gmail.com wrote:

 Hi,

 Thanks to Carl Kleffner's toolchain and some help from Clint Whaley
 (main author of ATLAS), I've built 64-bit windows numpy and scipy
 wheels for testing.

 The build uses Carl's custom mingw-w64 build with static linking.

 There are two harmless test failures on scipy (being discussed on the
 list at the moment) - tests otherwise clean.

 Wheels are here:


 https://nipy.bic.berkeley.edu/scipy_installers/numpy-1.8.1-cp27-none-win_amd64.whl

 https://nipy.bic.berkeley.edu/scipy_installers/scipy-0.13.3-cp27-none-win_amd64.whl

 You can test with:

 pip install -U pip # to upgrade pip to latest
 pip install -f https://nipy.bic.berkeley.edu/scipy_installers numpy scipy

 Please do send feedback.

 ATLAS binary here:


 https://nipy.bic.berkeley.edu/scipy_installers/atlas_builds/atlas-64-full-sse2.tar.bz2

 Many thanks to Carl in particular for doing all the hard work,


 Cool. After all these long years... Now all we need is a box running tests
 for CI.

 Chuck

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


I get two test failures with numpy

Josef

>>> np.test()
Running unit tests for numpy
NumPy version 1.8.1
NumPy is installed in C:\Python27\lib\site-packages\numpy
Python version 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit
(AMD64)]
nose version 1.1.2

==
FAIL: test_iterator.test_iter_broadcasting_errors
--
Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py",
line 657, in test_iter_broadcasting_errors
    '(2)->(2,newaxis)') % msg)
  File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 44, in
assert_
    raise AssertionError(msg)
AssertionError: Message "operands could not be broadcast together with
remapped shapes [original->remapped]: (2,3)->(2,3) (2,)->(2,newaxis) and
requested shape (4,3)" doesn't contain remapped operand
shape(2)->(2,newaxis)

==
FAIL: test_iterator.test_iter_array_cast
--
Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py",
line 836, in test_iter_array_cast
    assert_equal(i.operands[0].strides, (-96,8,-32))
  File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 255, in
assert_equal
    assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k, err_msg),
verbose)
  File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 317, in
assert_equal
    raise AssertionError(msg)
AssertionError:
Items are not equal:
item=0

 ACTUAL: 96L
 DESIRED: -96

--
Ran 4828 tests in 46.306s

FAILED (KNOWNFAIL=10, SKIP=8, failures=2)
<nose.result.TextTestResult run=4828 errors=0 failures=2>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing

2014-04-24 Thread josef . pktd
On Thu, Apr 24, 2014 at 7:08 PM, josef.p...@gmail.com wrote:




 On Thu, Apr 24, 2014 at 7:00 PM, Charles R Harris 
 charlesr.har...@gmail.com wrote:


 Hi Matthew,

 On Thu, Apr 24, 2014 at 3:56 PM, Matthew Brett 
 matthew.br...@gmail.com wrote:

 Hi,

 Thanks to Carl Kleffner's toolchain and some help from Clint Whaley
 (main author of ATLAS), I've built 64-bit windows numpy and scipy
 wheels for testing.

 The build uses Carl's custom mingw-w64 build with static linking.

 There are two harmless test failures on scipy (being discussed on the
 list at the moment) - tests otherwise clean.

 Wheels are here:


 https://nipy.bic.berkeley.edu/scipy_installers/numpy-1.8.1-cp27-none-win_amd64.whl

 https://nipy.bic.berkeley.edu/scipy_installers/scipy-0.13.3-cp27-none-win_amd64.whl

 You can test with:

 pip install -U pip # to upgrade pip to latest
 pip install -f https://nipy.bic.berkeley.edu/scipy_installers numpy
 scipy

 Please do send feedback.

 ATLAS binary here:


 https://nipy.bic.berkeley.edu/scipy_installers/atlas_builds/atlas-64-full-sse2.tar.bz2

 Many thanks to Carl in particular for doing all the hard work,


 Cool. After all these long years... Now all we need is a box running
 tests for CI.

 Chuck

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


 I get two test failures with numpy


scipy looks good, just two powell trace failures

Josef



 Josef

 >>> np.test()
 Running unit tests for numpy
 NumPy version 1.8.1
 NumPy is installed in C:\Python27\lib\site-packages\numpy
 Python version 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit
 (AMD64)]
 nose version 1.1.2

 ==
 FAIL: test_iterator.test_iter_broadcasting_errors
 --
 Traceback (most recent call last):
   File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest
     self.test(*self.arg)
   File "C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py",
 line 657, in test_iter_broadcasting_errors
     '(2)->(2,newaxis)') % msg)
   File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 44, in
 assert_
     raise AssertionError(msg)
 AssertionError: Message "operands could not be broadcast together with
 remapped shapes [original->remapped]: (2,3)->(2,3) (2,)->(2,newaxis) and
 requested shape (4,3)" doesn't contain remapped operand
 shape(2)->(2,newaxis)

 ==
 FAIL: test_iterator.test_iter_array_cast
 --
 Traceback (most recent call last):
   File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest
     self.test(*self.arg)
   File "C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py",
 line 836, in test_iter_array_cast
     assert_equal(i.operands[0].strides, (-96,8,-32))
   File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 255,
 in assert_equal
     assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k, err_msg),
 verbose)
   File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 317,
 in assert_equal
     raise AssertionError(msg)
 AssertionError:
 Items are not equal:
 item=0

  ACTUAL: 96L
  DESIRED: -96

 --
 Ran 4828 tests in 46.306s

 FAILED (KNOWNFAIL=10, SKIP=8, failures=2)
 <nose.result.TextTestResult run=4828 errors=0 failures=2>



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing

2014-04-24 Thread josef . pktd
On Thu, Apr 24, 2014 at 7:20 PM, Charles R Harris charlesr.har...@gmail.com
 wrote:




 On Thu, Apr 24, 2014 at 5:08 PM, josef.p...@gmail.com wrote:




 On Thu, Apr 24, 2014 at 7:00 PM, Charles R Harris 
 charlesr.har...@gmail.com wrote:


 Hi Matthew,

 On Thu, Apr 24, 2014 at 3:56 PM, Matthew Brett 
 matthew.br...@gmail.com wrote:

 Hi,

 Thanks to Carl Kleffner's toolchain and some help from Clint Whaley
 (main author of ATLAS), I've built 64-bit windows numpy and scipy
 wheels for testing.

 The build uses Carl's custom mingw-w64 build with static linking.

 There are two harmless test failures on scipy (being discussed on the
 list at the moment) - tests otherwise clean.

 Wheels are here:


 https://nipy.bic.berkeley.edu/scipy_installers/numpy-1.8.1-cp27-none-win_amd64.whl

 https://nipy.bic.berkeley.edu/scipy_installers/scipy-0.13.3-cp27-none-win_amd64.whl

 You can test with:

 pip install -U pip # to upgrade pip to latest
 pip install -f https://nipy.bic.berkeley.edu/scipy_installers numpy
 scipy

 Please do send feedback.

 ATLAS binary here:


 https://nipy.bic.berkeley.edu/scipy_installers/atlas_builds/atlas-64-full-sse2.tar.bz2

 Many thanks to Carl in particular for doing all the hard work,


 Cool. After all these long years... Now all we need is a box running
 tests for CI.

 Chuck

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


 I get two test failures with numpy

 Josef

 >>> np.test()
 Running unit tests for numpy
 NumPy version 1.8.1
 NumPy is installed in C:\Python27\lib\site-packages\numpy
 Python version 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit
 (AMD64)]
 nose version 1.1.2

 ==
 FAIL: test_iterator.test_iter_broadcasting_errors
 --
 Traceback (most recent call last):
   File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest
     self.test(*self.arg)
   File "C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py",
 line 657, in test_iter_broadcasting_errors
     '(2)->(2,newaxis)') % msg)
   File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 44,
 in assert_
     raise AssertionError(msg)
 AssertionError: Message "operands could not be broadcast together with
 remapped shapes [original->remapped]: (2,3)->(2,3) (2,)->(2,newaxis) and
 requested shape (4,3)" doesn't contain remapped operand
 shape(2)->(2,newaxis)

 ==
 FAIL: test_iterator.test_iter_array_cast
 --
 Traceback (most recent call last):
   File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest
     self.test(*self.arg)
   File "C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py",
 line 836, in test_iter_array_cast
     assert_equal(i.operands[0].strides, (-96,8,-32))
   File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 255,
 in assert_equal
     assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k, err_msg),
 verbose)
   File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 317,
 in assert_equal
     raise AssertionError(msg)
 AssertionError:
 Items are not equal:
 item=0

  ACTUAL: 96L
  DESIRED: -96

 --
 Ran 4828 tests in 46.306s

 FAILED (KNOWNFAIL=10, SKIP=8, failures=2)
 <nose.result.TextTestResult run=4828 errors=0 failures=2>


 Strange. That second one looks familiar, at least the -96 part. Wonder
 why this doesn't show up with the MKL builds.
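
As background for the -96: strides are byte steps and can be negative
for reversed views, which is the kind of result this assert expects.
A quick illustration (a sketch, not the failing test itself):

    import numpy as np

    a = np.arange(12, dtype=np.float64).reshape(3, 4)
    print(a.strides)        # (32, 8): rows are 4 doubles, 8 bytes each
    print(a[::-1].strides)  # (-32, 8): reversed rows step backwards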


ok tried again, this time deleting the old numpy directories before
installing

Ran 4760 tests in 42.124s

OK (KNOWNFAIL=10, SKIP=8)
<nose.result.TextTestResult run=4760 errors=0 failures=0>


so pip also seems to be reusing leftover files.

all clear.

Josef





 Chuck

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing

2014-04-24 Thread josef . pktd



OT:   Oh, I hate pip and packages that require numpy

E:\tmpC:\Python27\python C:\Python27\Scripts\pip-script.py install -U patsy
Downloading/unpacking patsy
  Running setup.py
(path:c:\users\josef\appdata\local\temp\pip_build_josef\patsy\setup.py)
egg_info for package patsy

no previously-included directories found matching 'doc\_build'
Downloading/unpacking numpy from
https://pypi.python.org/packages/source/n/numpy/numpy-1.8.1.tar.gz#md5=be95babe263bfa3428363d6db5b64678 (from
patsy)


  Found existing installation: numpy 1.6.2
Uninstalling numpy:
  Successfully uninstalled numpy
  Running setup.py install for numpy
...

...

C:\Python27\lib\distutils\dist.py:267: UserWarning: Unknown distribution
option: 'define_macros'

  warnings.warn(msg)

error: Unable to find vcvarsall.bat


  Rolling back uninstall of numpy
Cleaning up...


-
user error ?
I have a stale numpy-1.6.2-py2.7.egg-info file in site-packages

but that's a nice new feature of pip: "Rolling back uninstall of numpy".
numpy is still here.

Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing

2014-04-24 Thread josef . pktd
On Thu, Apr 24, 2014 at 7:29 PM, josef.p...@gmail.com wrote:




 On Thu, Apr 24, 2014 at 7:20 PM, Charles R Harris 
 charlesr.har...@gmail.com wrote:




 On Thu, Apr 24, 2014 at 5:08 PM, josef.p...@gmail.com wrote:




 On Thu, Apr 24, 2014 at 7:00 PM, Charles R Harris 
 charlesr.har...@gmail.com wrote:


 Hi Matthew,

 On Thu, Apr 24, 2014 at 3:56 PM, Matthew Brett matthew.br...@gmail.com
  wrote:

 Hi,

 Thanks to Carl Kleffner's toolchain and some help from Clint Whaley
 (main author of ATLAS), I've built 64-bit windows numpy and scipy
 wheels for testing.

 The build uses Carl's custom mingw-w64 build with static linking.

 There are two harmless test failures on scipy (being discussed on the
 list at the moment) - tests otherwise clean.

 Wheels are here:


 https://nipy.bic.berkeley.edu/scipy_installers/numpy-1.8.1-cp27-none-win_amd64.whl

 https://nipy.bic.berkeley.edu/scipy_installers/scipy-0.13.3-cp27-none-win_amd64.whl

 You can test with:

 pip install -U pip # to upgrade pip to latest
 pip install -f https://nipy.bic.berkeley.edu/scipy_installers numpy
 scipy

 Please do send feedback.

 ATLAS binary here:


 https://nipy.bic.berkeley.edu/scipy_installers/atlas_builds/atlas-64-full-sse2.tar.bz2

 Many thanks to Carl in particular for doing all the hard work,


 Cool. After all these long years... Now all we need is a box running
 tests for CI.

 Chuck

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


 I get two test failures with numpy

 Josef

 >>> np.test()
 Running unit tests for numpy
 NumPy version 1.8.1
 NumPy is installed in C:\Python27\lib\site-packages\numpy
 Python version 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit
 (AMD64)]
 nose version 1.1.2

 ==
 FAIL: test_iterator.test_iter_broadcasting_errors
 --
 Traceback (most recent call last):
   File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest
     self.test(*self.arg)
   File "C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py",
 line 657, in test_iter_broadcasting_errors
     '(2)->(2,newaxis)') % msg)
   File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 44,
 in assert_
     raise AssertionError(msg)
 AssertionError: Message "operands could not be broadcast together with
 remapped shapes [original->remapped]: (2,3)->(2,3) (2,)->(2,newaxis) and
 requested shape (4,3)" doesn't contain remapped operand
 shape(2)->(2,newaxis)

 ==
 FAIL: test_iterator.test_iter_array_cast
 --
 Traceback (most recent call last):
   File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest
     self.test(*self.arg)
   File "C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py",
 line 836, in test_iter_array_cast
     assert_equal(i.operands[0].strides, (-96,8,-32))
   File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 255,
 in assert_equal
     assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k, err_msg),
 verbose)
   File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 317,
 in assert_equal
     raise AssertionError(msg)
 AssertionError:
 Items are not equal:
 item=0

  ACTUAL: 96L
  DESIRED: -96

 --
 Ran 4828 tests in 46.306s

 FAILED (KNOWNFAIL=10, SKIP=8, failures=2)
 <nose.result.TextTestResult run=4828 errors=0 failures=2>


  Strange. That second one looks familiar, at least the -96 part. Wonder
 why this doesn't show up with the MKL builds.


 ok tried again, this time deleting the old numpy directories before
 installing

 Ran 4760 tests in 42.124s

 OK (KNOWNFAIL=10, SKIP=8)
 <nose.result.TextTestResult run=4760 errors=0 failures=0>


 so pip also seems to be reusing leftover files.

 all clear.


Running the statsmodels test suite, I get a failure
in test_discrete.TestProbitCG where fmin_cg converges to something that
differs in the 3rd decimal.

I usually only test the 32-bit version, so I don't know if this is specific
to this scipy version, but we haven't seen this in a long time.
I used our nightly binaries http://statsmodels.sourceforge.net/binaries/

Josef





 Josef





 Chuck

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing

2014-04-24 Thread josef . pktd
On Thu, Apr 24, 2014 at 7:00 PM, Charles R Harris charlesr.har...@gmail.com
 wrote:


 Hi Matthew,

 On Thu, Apr 24, 2014 at 3:56 PM, Matthew Brett matthew.br...@gmail.com wrote:

 Hi,

 Thanks to Carl Kleffner's toolchain and some help from Clint Whaley
 (main author of ATLAS), I've built 64-bit windows numpy and scipy
 wheels for testing.

 The build uses Carl's custom mingw-w64 build with static linking.

 There are two harmless test failures on scipy (being discussed on the
 list at the moment) - tests otherwise clean.

 Wheels are here:


 https://nipy.bic.berkeley.edu/scipy_installers/numpy-1.8.1-cp27-none-win_amd64.whl

 https://nipy.bic.berkeley.edu/scipy_installers/scipy-0.13.3-cp27-none-win_amd64.whl

 You can test with:

 pip install -U pip # to upgrade pip to latest
 pip install -f https://nipy.bic.berkeley.edu/scipy_installers numpy scipy

 Please do send feedback.

 ATLAS binary here:


 https://nipy.bic.berkeley.edu/scipy_installers/atlas_builds/atlas-64-full-sse2.tar.bz2

 Many thanks to Carl in particular for doing all the hard work,


 Cool. After all these long years... Now all we need is a box running tests
 for CI.


Very good news, after 3 years interruption I might be able to build scipy
again, and switch now to a 64bit development version.

Thanks for pushing for this, and doing all the hard work.

Josef



 Chuck

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] 1.9.x branch

2014-04-23 Thread josef . pktd
On Wed, Apr 23, 2014 at 5:32 AM, Sebastian Berg
sebast...@sipsolutions.net wrote:
 On Di, 2014-04-22 at 15:35 -0600, Charles R Harris wrote:
 Hi All,


 I'd like to branch 1.9.x at the end of the month. There are a couple
 of reasons for the timing. First, we have a lot of new stuff in the
 development branch. Second, there is work ongoing in masked arrays
 that I'd like to keep out of the release so that it has more time to
 settle. Third, it's past time ;)

 Sounds good.

 There are currently a number of 1.9.0 blockers, which can be seen
 here.

 Datetime timezone handling broken in 1.7.x

 I don't think there is time to get this done for 1.9.0 and it needs to
 be pushed off to 1.10.0.

 Return multiple field selection as ro view

 I have a branch for this, but because the returned type differs from a
 copy by alignment spacing there was a test failure. Merging that
 branch might cause some incompatibilities.


 I am a bit worried here that comparisons might make trouble.

 Object array creation new conversion to int


 This one needs a decision. Julian, Sebastian, thoughts?


 Maybe for all to consider this is about what happens for object arrays
 if you do things like:

 # Array cast to object array (np.array(arr) would be identical):
 a = np.arange(10).astype(object)
 # Array passed into new array creation (not just *one* array):
 b = np.array([np.arange(10)], dtype=object)
 # Numerical array is assigned to object array:
 c = np.empty(10, dtype=object)
 c[...] = np.arange(10)

 Before this change, the result was:
 type(a[0]) is int
 type(b[0,0]) is np.int_  # Note the numpy type
 type(c[0]) is int

 After this change, they are all `int`. Though note that the numpy type
 is preserved for example for long double. On the one hand preserving the
 numpy type might be nice, but on the other hand we don't care much about
 the dtypes of scalars and in practice the python types are probably more
 often wanted.

what if I don't like python?

>>> np.int_(0)**(-1)
inf
>>> 0**-1
Traceback (most recent call last):
  File "<pyshell#28>", line 1, in <module>
    0**-1
ZeroDivisionError: 0.0 cannot be raised to a negative power


>>> type(np.arange(5)[0])
<type 'numpy.int32'>
>>> np.arange(5)[0]**-1
inf

>>> type(np.arange(5)[0].item())
<type 'int'>
>>> np.arange(5)[0].item()**-1
Traceback (most recent call last):
  File "<pyshell#40>", line 1, in <module>
    np.arange(5)[0].item()**-1
ZeroDivisionError: 0.0 cannot be raised to a negative power

>>> np.__version__
'1.6.1'


I remember struggling through this (avoiding python operations) quite
a bit in my early bugfixes to scipy.stats.distributions.

(IIRC I ended up avoiding most ints.)

Josef


 Since I just realized that things are safe (float128 does not cast to
 float after all), I changed my mind and am tempted to keep the new
 behaviour. That is, if it does not create any problems (there was some
 issue in scipy, not sure how bad).

 - Sebastian

 Median of np.matrix is broken


 Not sure what the status of this one is.

 1.8 deprecations: Follow-up ticket


 Things that might should be removed.

 ERROR: test_big_arrays (test_io.TestSavezLoad) on OS X + Python 3.3


 I believe this one was fixed. For general problems reading/writing big
 files on OS X, I believe they were fixed in Maverick and I'm inclined
 to recommend an OS upgrade rather than work to chunk all the io.

 Deprecate NPY_CHAR
 This one is waiting on a fix from Pearu to make f2py use numpy
 strings. I think we will need to do this ourselves if we want to carry
 through the deprecation. In any case it probably needs to be pushed
 off to 1.10.

 1.7 deprecations: Follow-up ticket
 Some of these changes have been made, ro diagonal view for instance,
 some still remain.



 Additions, updates, and thoughts welcome.


 Chuck






 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] PEP 465 has been accepted / volunteers needed

2014-04-10 Thread josef . pktd
On Thu, Apr 10, 2014 at 4:43 AM, Francesc Alted fal...@gmail.com wrote:
 On 4/9/14, 10:46 PM, Chris Barker wrote:
 On Tue, Apr 8, 2014 at 11:14 AM, Nathaniel Smith n...@pobox.com wrote:

 Thank you! Though I suspect that the most important part of my
 contribution may have just been my high tolerance for writing emails
 ;-).


 no -- it's your high tolerance for _reading_ emails...

 Far too many of us have a high tolerance for writing them!

 Ha ha, very true!

I'm trying, but it's hard.
Sometimes avoiding the `send` button helps. I still have a collection
of draft replies.

Josef
I'm not going to send this reply -- oops, did it again


 --
 Francesc Alted

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-04 Thread josef . pktd
On Fri, Apr 4, 2014 at 8:50 AM, Daπid davidmen...@gmail.com wrote:

 On 2 April 2014 16:06, Sturla Molden sturla.mol...@gmail.com wrote:

 josef.p...@gmail.com wrote:

  pandas came later and thought ddof=1 is worth more than consistency.

 Pandas is a data analysis package. NumPy is a numerical array package.

 I think ddof=1 is justified for Pandas, for consistency with statistical
 software (SPSS et al.)

 For NumPy, there are many computational tasks where the Bessel correction
 is not wanted, so providing a uncorrected result is the correct thing to
 do. NumPy should be a low-level array library that does very little magic.


 All this discussion reminds me of the book Numerical Recipes:

 if the difference between N and N − 1 ever matters to you, then you
 are probably up to no good anyway — e.g., trying to substantiate a
 questionable
 hypothesis with marginal data.

 For any reasonably sized data set, it is a correction in the second
 significant figure.

I fully agree, but sometimes you don't have much choice.

`big data` == `statistics with negative degrees of freedom` ?

or maybe

`machine learning` == `statistics with negative degrees of freedom` ?

Josef
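
For concreteness, the size of the N versus N - 1 correction for a
moderate sample (a minimal sketch; numbers are illustrative):

    from __future__ import print_function
    import numpy as np

    x = np.random.RandomState(0).standard_normal(100)
    s0 = x.std(ddof=0)        # numpy default, divides by N
    s1 = x.std(ddof=1)        # Bessel-corrected, divides by N - 1
    print(s0, s1, s1 / s0)    # ratio is sqrt(N/(N-1)), about 1.005 here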


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Where to put versionadded

2014-04-04 Thread josef . pktd
On Fri, Apr 4, 2014 at 2:12 PM, Charles R Harris
charlesr.har...@gmail.com wrote:
 Hi All,

 Currently there are several placements of the '.. versionadded::' directive
 and I'd like to settle
 on a proper style for consistency. There are two occasions on which it is
 used, first, when a new function or class is added and second, when a new
 keyword is added to an existing function or method. The options are as
 follows.

 New Function

 1) Originally, the directive was added in the notes section.

 Notes
 -
 .. versionadded:: 1.5.0

  2) Alternatively, it is placed after the extended summary.

 blah, blah

 .. versionadded:: 1.5.0

 Between these two, I vote for 2) because the version is easily found when
 reading the documentation either in a terminal or rendered into HTML.

 New Parameter

 1) It is placed before the parameter description

 newoption : int, optional
 .. versionadded:: 1.5.0
 blah.

 2) It is placed after the parameter description.

 newoption : int, optional
 blah.

 .. versionadded:: 1.5.0

 Both of these render correctly, but the first is more compact while the
 second puts the version
 after the description where it doesn't interrupt the reading. I'm tending
 towards 1) on account of its compactness.

 Thoughts?

I'm in favor of putting them only in the Notes section.

Most of the time they are not crucial information and it's
distracting.  I usually only look for them when I'm working explicitly
across several numpy versions.

like in python: versionadded 2.1 is only interesting for historians.

Josef


 Chuck



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Where to put versionadded

2014-04-04 Thread josef . pktd
On Fri, Apr 4, 2014 at 3:07 PM, Matthew Brett matthew.br...@gmail.com wrote:
 Hi,

 On Fri, Apr 4, 2014 at 11:54 AM, Charles R Harris
 charlesr.har...@gmail.com wrote:



 On Fri, Apr 4, 2014 at 12:48 PM, josef.p...@gmail.com wrote:

 On Fri, Apr 4, 2014 at 2:12 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
  Hi All,
 
  Currently there are several placements of the '.. versionadded::'
  directive
  and I'd like to settle
  on a proper style for consistency. There are two occasions on which it
  is
  used, first, when a new function or class is added and second, when a
  new
  keyword is added to an existing function or method. The options are as
  follows.
 
  New Function
 
  1) Originally, the directive was added in the notes section.
 
  Notes
  -
  .. versionadded:: 1.5.0
 
   2) Alternatively, it is placed after the extended summary.
 
  blah, blah
 
  .. versionadded:: 1.5.0
 
  Between these two, I vote for 2) because the version is easily found
  when
  reading the documentation either in a terminal or rendered into HTML.
 
  New Parameter
 
  1) It is placed before the parameter description
 
  newoption : int, optional
  .. versionadded:: 1.5.0
  blah.
 
  2) It is placed after the parameter description.
 
  newoption : int, optional
  blah.
 
  .. versionadded:: 1.5.0
 
  Both of these render correctly, but the first is more compact while the
  second puts the version
  after the description where it doesn't interrupt the reading. I'm
  tending
  towards 1) on account of its compactness.
 
  Thoughts?

 I'm in favor of putting them only in the Notes section.

 Most of the time they are not crucial information and it's
 distracting.  I usually only look for them when I'm working explicitly
 across several numpy versions.

 like in python: versionadded 2.1 is only interesting for historians.


 I find the opposite to be true. Because numpy needs maintain compatibility
 with a number python versions, I often check the python documentation to see
 in which version a function was added.

 I agree; versionadded 2.1 is not likely interesting but versionadded
 2.7 is very interesting.

That's true, but it's a mess for maintainers because we now support 5
python versions.

numpy doesn't have a long history of versionadded yet; I didn't find
anything for 1.3 in a quick search.
statsmodels now has numpy 1.6 as its minimum requirement, and I'm
interested in the features that become available with a minimum
version increase.
Once I know what I'm allowed to use, I only care about the real
documentation: "How does einsum really work?".

But as a numpy user, I was never really interested in the information
that arraysetops were enhanced and renamed in numpy 1.x (x=?4), or
that take was added in 0.y, ...  Even the first part of polynomial is
already in 1.4.
(It might just make me feel old if I remember when it was changed.)

versionadded is not very distracting in the html rendering, so I'm
just +0.1 on Notes.

Josef


 Cheers,

 Matthew
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Where to put versionadded

2014-04-04 Thread josef . pktd
On Fri, Apr 4, 2014 at 3:33 PM,  josef.p...@gmail.com wrote:
 On Fri, Apr 4, 2014 at 3:07 PM, Matthew Brett matthew.br...@gmail.com wrote:
 Hi,

 On Fri, Apr 4, 2014 at 11:54 AM, Charles R Harris
 charlesr.har...@gmail.com wrote:



 On Fri, Apr 4, 2014 at 12:48 PM, josef.p...@gmail.com wrote:

 On Fri, Apr 4, 2014 at 2:12 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
  Hi All,
 
  Currently there are several placements of the '.. versionadded::'
  directive
  and I'd like to settle
  on a proper style for consistency. There are two occasions on which it
  is
  used, first, when a new function or class is added and second, when a
  new
  keyword is added to an existing function or method. The options are as
  follows.
 
  New Function
 
  1) Originally, the directive was added in the notes section.
 
  Notes
  -
  .. versionadded:: 1.5.0
 
   2) Alternatively, it is placed after the extended summary.
 
  blah, blah
 
  .. versionadded:: 1.5.0
 
  Between these two, I vote for 2) because the version is easily found
  when
  reading the documentation either in a terminal or rendered into HTML.
 
  New Parameter
 
  1) It is placed before the parameter description
 
  newoption : int, optional
  .. versionadded:: 1.5.0
  blah.
 
  2) It is placed after the parameter description.
 
  newoption : int, optional
  blah.
 
  .. versionadded:: 1.5.0
 
  Both of these render correctly, but the first is more compact while the
  second puts the version
  after the description where it doesn't interrupt the reading. I'm
  tending
  towards 1) on account of its compactness.
 
  Thoughts?

 I'm in favor of putting them only in the Notes section.

 Most of the time they are not crucial information and it's
 distracting.  I usually only look for them when I'm working explicitly
 across several numpy versions.

 like in python: versionadded 2.1 is only interesting for historians.

since I like history:

AFAICS:

arraysetops was changed in 1.4

histogram was added in 0.4
corrcoef was added in 0.9.2

numpy 0.9.2 is 8 years old
python 2.1 will soon have its 13th anniversary

Josef




 I find the opposite to be true. Because numpy needs maintain compatibility
 with a number python versions, I often check the python documentation to see
 in which version a function was added.

 I agree; versionadded 2.1 is not likely interesting but versionadded
 2.7 is very interesting.

 That's true, but it's a mess for maintainers because we now support 5
 python versions.

 numpy doesn't have a long history of versionadded yet; I didn't find
 anything for 1.3 in a quick search.
 statsmodels now has numpy 1.6 as its minimum requirement, and I'm
 interested in the features that become available with a minimum
 version increase.
 Once I know what I'm allowed to use, I only care about the real
 documentation: "How does einsum really work?".

 But as a numpy user, I was never really interested in the information
 that arraysetops were enhanced and renamed in numpy 1.x (x=?4), or
 that take was added in 0.y, ...  Even the first part of polynomial is
 already in 1.4.
 (It might just make me feel old if I remember when it was changed.)

 versionadded is not very distracting in the html rendering, so I'm
 just +0.1 on Notes.

 Josef


 Cheers,

 Matthew
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-03 Thread josef . pktd
On Wed, Apr 2, 2014 at 10:06 AM, Sturla Molden sturla.mol...@gmail.com wrote:
 josef.p...@gmail.com wrote:

 pandas came later and thought ddof=1 is worth more than consistency.

 Pandas is a data analysis package. NumPy is a numerical array package.

 I think ddof=1 is justified for Pandas, for consistency with statistical
 software (SPSS et al.)

 For NumPy, there are many computational tasks where the Bessel correction
 is not wanted, so providing a uncorrected result is the correct thing to
 do. NumPy should be a low-level array library that does very little magic.

 Those who need the Bessel correction can multiply with sqrt(n/float(n-1))
 or specify ddof. But that belongs in the docs.


 Sturla

 P.S. Personally I am not convinced unbiased is ever a valid argument, as
 the biased estimator has smaller error. This is from experience in
 marksmanship: I'd rather shoot a tight series with small systematic error
 than scatter my bullets wildly but unbiased on the target. It is the
 total error that counts. The series with smallest total error gets the best
 score. It is better to shoot two series and calibrate the sight in between
 than use a calibration-free sight that don't allow us to aim.

calibration == bias correction ?

That's why I
 think classical statistics got this one wrong. Unbiased is never a virtue,
 but the smallest error is. Thus, if we are to repeat an experiment, we
 should calibrate our estimator just like a marksman calibrates his sight.
 But the aim should always be calibrated to give the smallest error, not an
 unbiased scatter. Noone in their right mind would claim a shotgun is more
 precise than a rifle because it has smaller bias. But that is what applying
 the Bessel correction implies.

https://www.youtube.com/watch?v=i4xcEZZDW_I


I spent several days trying to figure out what Stata is doing for
small sample corrections to reduce the bias of the rejection interval
with uncorrected variance estimates.

Josef
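
Sturla's total-error point is easy to check by simulation (a sketch,
not from the thread; for normal data the ddof=0 variance estimate has
the smaller mean squared error, despite its bias):

    from __future__ import print_function
    import numpy as np

    rng = np.random.RandomState(0)
    x = rng.standard_normal((200000, 5))       # true variance is 1

    for ddof in (0, 1):
        v = x.var(axis=1, ddof=ddof)
        print(ddof, ((v - 1.0) ** 2).mean())   # MSE around the truth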




Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-03 Thread josef . pktd
On Thu, Apr 3, 2014 at 2:21 PM, Bago mrb...@gmail.com wrote:




 Sturla

 P.S. Personally I am not convinced unbiased is ever a valid argument, as
 the biased estimator has smaller error. This is from experience in
 marksmanship: I'd rather shoot a tight series with small systematic error
 than scatter my bullets wildly but unbiased on the target. It is the
 total error that counts. The series with smallest total error gets the
 best
 score. It is better to shoot two series and calibrate the sight in between
 than use a calibration-free sight that doesn't allow us to aim. That's why I
 think classical statistics got this one wrong. Unbiased is never a virtue,
 but the smallest error is. Thus, if we are to repeat an experiment, we
 should calibrate our estimator just like a marksman calibrates his sight.
 But the aim should always be calibrated to give the smallest error, not an
 unbiased scatter. No one in their right mind would claim a shotgun is more
 precise than a rifle because it has smaller bias. But that is what
 applying
 the Bessel correction implies.


 I agree with the point, and what makes it even worse is that ddof=1 does not
 even produce an unbiased standard deviation estimate. It produces an unbiased
 variance estimate, but the sqrt of this variance estimate is a biased
 standard deviation estimate,
 http://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation.

But ddof=1 still produces a smaller bias than ddof=0

I think the main point in stats is that without ddof, the variance
will be too small and t-tests or similar will be liberal in small
samples, or confidence intervals will be too short.
(For statisticians who prefer tests that maintain their level
and prefer to err on the conservative side.)

Josef
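
A quick simulation sketch of Bago's point (not from the thread; it uses
the newer numpy random API):

import numpy as np

rng = np.random.default_rng(0)
n = 5
samples = rng.normal(0.0, 1.0, size=(100000, n))  # true sigma = 1

var1 = samples.var(axis=1, ddof=1)
print(var1.mean())           # close to 1.0: the variance estimate is unbiased
print(np.sqrt(var1).mean())  # close to 0.94: its sqrt underestimates sigma

With ddof=0 both numbers drop further, which is the "too small, liberal
tests" direction described above.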



 Bago



Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-01 Thread josef . pktd
On Tue, Apr 1, 2014 at 5:11 PM, Nathaniel Smith n...@pobox.com wrote:
 On Tue, Apr 1, 2014 at 9:51 PM, Ralf Gommers ralf.gomm...@gmail.com wrote:



 On Tue, Apr 1, 2014 at 10:08 PM, Nathaniel Smith n...@pobox.com wrote:

 On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden sturla.mol...@gmail.com
 wrote:
  Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote:
 
  Personally I cannot think of many applications where it would be
  desired
  to calculate the standard deviation with ddof=0. In addition, I feel
  that
  there should be consistency between standard modules such as numpy,
  scipy, and pandas.
 
  ddof=0 is the maximum likelihood estimate. It is also needed in
  Bayesian
  estimation.

 It's true, but the counter-arguments are also strong. And regardless
 of whether ddof=1 or ddof=0 is better, surely the same one is better
 for both numpy and scipy.

 If we could still choose here without any costs, obviously that's true. This
 particular ship sailed a long time ago though. By the way, there isn't even
 a `scipy.stats.std`, so we're comparing with differently named functions
 (nanstd for example).

 Presumably nanstd is a lot less heavily used than std, and presumably
 people expect 'nanstd' to be a 'nan' version of 'std' -- what do you
 think of changing nanstd to ddof=0 to match numpy? (With appropriate
 FutureWarning transition, etc.)

numpy is numpy, a numerical library;
scipy.stats is stats and behaves differently (axis=0 default).

nanstd in scipy.stats will hopefully also go away soon, so I don't
think it's worth changing there either.

pandas came later and thought ddof=1 is worth more than consistency.

I don't think ddof defaults are worth jumping through deprecation hoops for.

(bias in cov, corrcoef is non-standard ddof)

Josef
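
The parenthetical is easy to miss, so a minimal sketch of that split
inside numpy itself (plain numpy):

>>> import numpy as np
>>> x = np.array([1.0, 2.0, 3.0, 4.0])
>>> x.var()               # ddof=0 default
1.25
>>> np.cov(x)             # but cov defaults to the n - 1 normalization
array(1.66666667)
>>> np.cov(x, bias=True)  # back to the ddof=0 convention
array(1.25)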



 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org


Re: [Numpy-discussion] Changes to np.vander

2014-03-29 Thread josef . pktd
On Sat, Mar 29, 2014 at 12:12 AM, Jaime Fernández del Río
jaime.f...@gmail.com wrote:
 Hi,

 I have submitted a PR (https://github.com/numpy/numpy/pull/4568) that speeds
 up `np.vander` by using accumulated multiplication instead of exponentiation
 to compute the Vandermonde matrix. For largish matrices the speed-ups can be
 quite dramatic, over an order of magnitude.

 Julian has raised concerns on numerical stability and loss of precision,
 which don't seem to be all that relevant. Do speak up if you think
 otherwise.

 We are also discussing replacing a recently added kwarg, order, which now
 accepts a string, either increasing or decreasing, to indicate the
 ordering of the matrix columns. This was not present in 1.8, so can still be
 modified. The proposal is to replace it with a reversed boolean flag.
 Unfortunately, the return of np.vander in 1.8 and before is the opposite
 (i.e. it's reversed) from the standard definition, which has powers
 increasing from left to right. So it is not clear what the reversed keyword
 should refer to:

 1. If it refers to the standard definition, then it would default to False
 for backwards compatibility, but be consistent with the conventional
 definition.

 2. If it refers to the existing behavior of numpy's vander, then it would
 default to True, and not be consistent with the conventional definition.

 I prefer option 1, but would like to hear other's opinions. Which could of
 course include naming the boolean flag more ingeniously, or keeping the
 string flag. If he's reading, I'd specially like to hear Warren Weckesser's
 thoughts, as he is the one who added the order kwarg.

"order" is not a good name; I would find it very confusing (I'm
usually mixing up order and degree)
http://en.wikipedia.org/wiki/Order_of_a_polynomial

how about calling the keyword increasing=False ?
which would avoid defining what it's reversed against.


I don't know about precision loss.
There's a nasty NIST problem for polynomial regression. If that
doesn't change much, I wouldn't worry about differences in precision
for statistical applications.

But the problem is in the regression part, and might not be affected
much by the vander precision. (Besides, it's an example of "don't do
that at home.")
http://jpktd.blogspot.ca/2012/03/numerical-accuracy-in-linear-least.html
http://en.wikibooks.org/wiki/Statistics:Numerical_Methods/Numerical_Comparison_of_Statistical_Software#Linear_Regression

Josef
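
For readers who want to see the idea rather than the PR diff, a sketch
of the accumulated-multiplication approach, checked against np.vander
(the increasing flag used here is the name the keyword eventually got
in released numpy; it is not in 1.8):

import numpy as np

def vander_cumprod(x, n):
    # build column k as column k-1 times x, instead of computing x**k
    x = np.asarray(x, dtype=float)
    out = np.empty((len(x), n))
    out[:, 0] = 1.0
    for k in range(1, n):
        out[:, k] = out[:, k - 1] * x
    return out

x = np.array([1.0, 2.0, 3.0])
print(np.allclose(vander_cumprod(x, 4), np.vander(x, 4, increasing=True)))
# True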


 Jaime

 --
 (\__/)
 ( O.o)
 (  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de
 dominación mundial.



Re: [Numpy-discussion] Changes to np.vander

2014-03-29 Thread josef . pktd
On Sat, Mar 29, 2014 at 7:31 AM,  josef.p...@gmail.com wrote:
 On Sat, Mar 29, 2014 at 12:12 AM, Jaime Fernández del Río
 jaime.f...@gmail.com wrote:
 Hi,

 I have submitted a PR (https://github.com/numpy/numpy/pull/4568) that speeds
 up `np.vander` by using accumulated multiplication instead of exponentiation
 to compute the Vandermonde matrix. For largish matrices the speed-ups can be
 quite dramatic, over an order of magnitude.

 Julian has raised concerns on numerical stability and loss of precision,
 which don't seem to be all that relevant. Do speak up if you think
 otherwise.

 We are also discussing replacing a recently added kwarg, order, which now
 accepts a string, either increasing or decreasing, to indicate the
 ordering of the matrix columns. This was not present in 1.8, so can still be
 modified. The proposal is to replace it with a reversed boolean flag.
 Unfortunately, the return of np.vander in 1.8 and before is the opposite
 (i.e. it's reversed) from the standard definition, which has powers
 increasing from left to right. So it is not clear what the reversed keyword
 should refer to:

 1. If it refers to the standard definition, then it would default to False
 for backwards compatibility, but be consistent with the conventional
 definition.

 2. If it refers to the existing behavior of numpy's vander, then it would
 default to True, and not be consistent with the conventional definition.

 I prefer option 1, but would like to hear other's opinions. Which could of
 course include naming the boolean flag more ingeniously, or keeping the
 string flag. If he's reading, I'd specially like to hear Warren Weckesser's
 thoughts, as he is the one who added the order kwarg.

 order is not a good name, I would find it very confusing (I'm
 usually mixing up order and degree)
 http://en.wikipedia.org/wiki/Order_of_a_polynomial

 how about calling the keyword increasing=False ?
 which would avoid defining what it's reversed against.

Obviously I didn't read the PR before answering.

But it shows that `increasing` might be obvious, or that Warren and I
think the same way.

Josef



 I don't know about precision loss.
 There's a nasty NIST problem for polynomial regression. If that
 doesn't change much, I wouldn't worry about differences in precision
 for statistical applications.

 But the problem is for the regression part, and might not be affected
 much by the vander precision. (Besides it's an example for Don't do
 that at home.)
 http://jpktd.blogspot.ca/2012/03/numerical-accuracy-in-linear-least.html
 http://en.wikibooks.org/wiki/Statistics:Numerical_Methods/Numerical_Comparison_of_Statistical_Software#Linear_Regression

 Josef


 Jaime

 --
 (\__/)
 ( O.o)
 (  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de
 dominación mundial.



Re: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe

2014-03-27 Thread josef . pktd
On Wed, Mar 26, 2014 at 5:17 PM, Olivier Grisel
olivier.gri...@ensta.org wrote:
 My understanding of Carl's effort is that the long term goal is to
 have official windows whl packages for both numpy and scipy published
 on PyPI with a builtin BLAS / LAPACK implementation so that users can
 do `pip install scipy` under windows and get something that just works
 without having to install any compiler (Fortran or C) or any additional
 library manually.

 Most windows users are beginners and you cannot really expect them to
 understand how to build the whole scipy stack from source.

 The current solution (executable setup installers) is not optimal as
 it requires Administrator rights to run, does not resolve dependencies
 as pip does and cannot be installed in virtualenvs.

as small related point:

The official installers can be used to install in virtualenv
The way I do it:
Run the superpack, official installer, wait until it extracts the
correct (SSE) install exe, then cancel
Then easy_install the install exe file that has been extracted to the
temp folder into the virtualenv.

I don't remember if the extraction already requires admin rights, but
I think not.
easy_install doesn't require any, IIRC.

Josef


 If we can build numpy / scipy whl packages for windows with the Atlas
 dlls embedded in the numpy package, then good. It does not need to be the
 fastest BLAS / LAPACK lib in my opinion. Just something that works.

 The problem with ATLAS is that you need to select the number of threads
 at build time AFAIK. But we could set it to a reasonable default (e.g.
 4 threads) for the default windows package.

 --
 Olivier


Re: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe

2014-03-27 Thread josef . pktd
On Thu, Mar 27, 2014 at 9:59 AM, Olivier Grisel
olivier.gri...@ensta.org wrote:
 2014-03-27 14:55 GMT+01:00  josef.p...@gmail.com:
 On Wed, Mar 26, 2014 at 5:17 PM, Olivier Grisel
 olivier.gri...@ensta.org wrote:
 My understanding of Carl's effort is that the long term goal is to
 have official windows whl packages for both numpy and scipy published
 on PyPI with a builtin BLAS / LAPACK implementation so that users can
 do `pip install scipy` under windows and get something that just works
 without having to install any compiler (Fortran or C) or any additional
 library manually.

 Most windows users are beginners and you cannot really expect them to
 understand how to build the whole scipy stack from source.

 The current solution (executable setup installers) is not optimal as
 it requires Administrator rights to run, does not resolve dependencies
 as pip does and cannot be installed in virtualenvs.

 as small related point:

 The official installers can be used to install in virtualenv
 The way I do it:
 Run the superpack, official installer, wait until it extracts the
 correct (SSE) install exe, then cancel.
 Then easy_install the install exe file that has been extracted to the
 temp folder into the virtualenv.

 I don't remember if the extraction already requires admin rights, but
 I think not.
 easy_install doesn't require any, IIRC.

 Hackish but interesting. Maybe the extraction can be done with generic
 tools like winzip?

I tried to open and unzip with WinRAR but couldn't make sense of the content.

BTW: easy_install for other installers like matplotlib also works
nicely for virtualenv



However, the official installers are only for 32-bit python, and I
appreciate all the efforts to modernize the numpy and scipy builds.

Josef


 --
 Olivier
 http://twitter.com/ogrisel - http://github.com/ogrisel


Re: [Numpy-discussion] Resolving the associativity/precedence debate for @

2014-03-26 Thread josef . pktd
On Mon, Mar 24, 2014 at 8:33 PM, Nathaniel Smith n...@pobox.com wrote:
 On Mon, Mar 24, 2014 at 11:58 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
 On Mon, Mar 24, 2014 at 5:56 PM, Nathaniel Smith n...@pobox.com wrote:

 On Sat, Mar 22, 2014 at 6:13 PM, Nathaniel Smith n...@pobox.com wrote:
  After 88 emails we don't have a conclusion in the other thread (see
  [1] for background). But we have to come to some conclusion or another
  if we want @ to exist :-). So I'll summarize where the discussion
  stands and let's see if we can find some way to resolve this.

 Response in this thread so far seems (AFAICT) to have pretty much
 converged on same-left.

 If you think that this would be terrible and there is some compelling
 argument against it, then please speak up! Otherwise, if no-one
 objects, then I'll go ahead in the next few days and put same-left
 into the PEP.


 I think we should take a close look at broadcasting before deciding on the
 precedence.

 Can you elaborate? Like what, concretely, do you think we need to do now?

???


"In examples like this, parenthesizing the code aggressively to spell
out the logic, not only to Stata but also to yourself and anybody else
reading it, should cause no embarrassment. You need not assume knowledge
of Stata's precedence rules that determine interpretation when several
operators are used in one expression. More importantly, you may avoid
some horrible little bugs." -- Nicholas J. Cox

Trying to figure out what Stata is using: elementwise operations are
just below their matrix version in operator precedence.
But Stata came late to matrix algebra, and is definitely not like
Matlab or Gauss, or numpy.

Josef




 -n

 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org


Re: [Numpy-discussion] why sort does not accept a key?

2014-03-24 Thread josef . pktd
On Mon, Mar 24, 2014 at 12:08 PM, Alan G Isaac alan.is...@gmail.com wrote:
 On Mon, Mar 24, 2014 at 11:32 AM, Alan G Isaac wrote:
 I'm wondering if `sort` intentionally does not accept
 a `key`
 or if this is just a missing feature?


 On 3/24/2014 11:47 AM, Alexander Belopolsky wrote:
 It would be very inefficient to call a key function on
 every element compared during the sort.   See np.argsort
 and np.lexsort for faster alternatives.


 But the keys could be as in `lexsort`.

 I am currently using `argsort`, but I can do so because
 I don't need a lexicographically determined sort order
 for the indexes.


 To close with a related question:
 what is the preferred idiom for a descending sort?

adding [::-1] just creates a new view, pretty low cost.

Josef
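
For reference, the usual descending-sort idioms (a sketch, plain numpy):

>>> import numpy as np
>>> a = np.array([3, 1, 2])
>>> np.sort(a)[::-1]        # descending values; the reversal is only a view
array([3, 2, 1])
>>> a[np.argsort(a)[::-1]]  # the same via indices
array([3, 2, 1])
>>> np.argsort(-a)          # descending-order indices, for numeric data
array([0, 2, 1])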



 Thanks,
 Alan





Re: [Numpy-discussion] unique return_index order?

2014-03-21 Thread josef . pktd
On Fri, Mar 21, 2014 at 9:01 PM, Charles R Harris charlesr.har...@gmail.com
 wrote:




 On Fri, Mar 21, 2014 at 6:49 PM, josef.p...@gmail.com wrote:




 On Fri, Mar 21, 2014 at 8:46 PM, Charles R Harris 
 charlesr.har...@gmail.com wrote:




 On Fri, Mar 21, 2014 at 6:26 PM, Alan G Isaac alan.is...@gmail.comwrote:

 The documentation of numpy.unique
 http://docs.scipy.org/doc/numpy/reference/generated/numpy.unique.html
 does not seem to promise that return_index=True will always index the
 *first* occurrence of each unique item, which I believe is the current
 behavior.

 A promise would be nice.
 Is it intended?


 Yes, it is intended, although the required mergesort wasn't available
 for all types before numpy 1.7.


 Does this mean return_inverse works again for all cases, even with
 return_index?

 I removed return_index from my code in statsmodels because I make
 frequent use of return_inverse, which was broken. We don't have any
 unittests in statsmodels anymore that use both return_xxx.


 I don't know, needs checking. Seems to work now with a simple trial array
 of integers.


my example from May 2012 (thread "1.6.2 no more unique for rows")
works fine on Python 3.3, numpy 1.7.1

>>> groups = np.random.randint(0,4,size=(10,2))
>>> groups_ = groups.view([('',groups.dtype)]*groups.shape[1]).flatten()
>>> uni, uni_idx, uni_inv = np.unique(groups_, return_index=True,
...                                   return_inverse=True)
>>> uni
array([(0, 2), (0, 3), (1, 0), (2, 1), (2, 2), (3, 2), (3, 3)],
      dtype=[('f0', '<i4'), ('f1', '<i4')])
>>> uni_inv
array([1, 6, 3, 4, 5, 3, 2, 5, 0, 2], dtype=int32)
>>> np.__version__
'1.7.1'

Thanks,

Josef
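
A side note for readers on newer numpy: since 1.13 np.unique accepts an
axis keyword, so the view trick above is no longer needed for unique
rows (a sketch):

import numpy as np

groups = np.random.randint(0, 4, size=(10, 2))
uni, uni_idx, uni_inv = np.unique(groups, axis=0,
                                  return_index=True, return_inverse=True)
# uni holds the unique rows, and uni[uni_inv] reconstructs groups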


 Chuck



Re: [Numpy-discussion] unique return_index order?

2014-03-21 Thread josef . pktd
On Fri, Mar 21, 2014 at 9:27 PM, josef.p...@gmail.com wrote:




 On Fri, Mar 21, 2014 at 9:01 PM, Charles R Harris 
 charlesr.har...@gmail.com wrote:




 On Fri, Mar 21, 2014 at 6:49 PM, josef.p...@gmail.com wrote:




 On Fri, Mar 21, 2014 at 8:46 PM, Charles R Harris 
 charlesr.har...@gmail.com wrote:




 On Fri, Mar 21, 2014 at 6:26 PM, Alan G Isaac alan.is...@gmail.comwrote:

 The documentation of numpy.unique
 http://docs.scipy.org/doc/numpy/reference/generated/numpy.unique.html
 does not seem to promise that return_index=True will always index the
 *first* occurrence of each unique item, which I believe is the current
 behavior.

 A promise would be nice.
 Is it intended?


 Yes, it is intended, although the required mergesort wasn't available
 for all types before numpy 1.7.


summary, AFAICS: since numpy 1.6.2, np.unique has used mergesort if
return_index=True, which provides a stable sort.

Josef




 Does this mean return_inverse works again for all cases, even with
 return_index?

 I removed return_index from my code in statsmodels because I make
 frequent use of return_inverse, which was broken. We don't have any
 unittests in statsmodels anymore that use both return_xxx.


 I don't know, needs checking. Seems to work now with a simple trial array
 of integers.


 my example from may 2012,  thread 1.6.2 no more unique for rows
 works fine on  python 3.3  numpy 1.7.1

 >>> groups = np.random.randint(0,4,size=(10,2))
 >>> groups_ = groups.view([('',groups.dtype)]*groups.shape[1]).flatten()
 >>> uni, uni_idx, uni_inv = np.unique(groups_, return_index=True,
 ...                                   return_inverse=True)
 >>> uni
 array([(0, 2), (0, 3), (1, 0), (2, 1), (2, 2), (3, 2), (3, 3)],
       dtype=[('f0', '<i4'), ('f1', '<i4')])
 >>> uni_inv
 array([1, 6, 3, 4, 5, 3, 2, 5, 0, 2], dtype=int32)
 >>> np.__version__
 '1.7.1'

 Thanks,

 Josef


 Chuck



Re: [Numpy-discussion] [help needed] associativity and precedence of '@'

2014-03-20 Thread josef . pktd
On Thu, Mar 20, 2014 at 1:25 PM, Nathaniel Smith n...@pobox.com wrote:

 On Wed, Mar 19, 2014 at 7:45 PM, Nathaniel Smith n...@pobox.com wrote:

 Okay, I wrote a little script [1] to scan Python source files look for
 things like 'dot(a, dot(b, c))' or 'dot(dot(a, b), c)', or the ndarray.dot
 method equivalents. So what we get out is:
 - a count of how many 'dot' calls there are
 - a count of how often we see left-associative nestings: dot(dot(a, b), c)
 - a count of how often we see right-associative nestings: dot(a, dot(b,
 c))

 Running it on a bunch of projects, I get:

  | project      | dots | left | right | right/left |
  |--------------+------+------+-------+------------|
  | scipy        |  796 |   53 |    27 |       0.51 |
  | nipy         |  275 |    3 |    19 |       6.33 |
  | scikit-learn |  472 |   11 |    10 |       0.91 |
  | statsmodels  |  803 |   46 |    38 |       0.83 |
  | astropy      |   17 |    0 |     0 |        nan |
  | scikit-image |   15 |    1 |     0 |       0.00 |
  |--------------+------+------+-------+------------|
  | total        | 2378 |  114 |    94 |       0.82 |


 Another way to visualize this, converting each contiguous chain of calls
 to np.dot into a parenthesized expression, and then counting how often we
 see each pattern.

    1943  (_ @ _)
     100  ((_ @ _) @ _)              # left
      86  (_ @ (_ @ _))              # right
       2  (_ @ ((_ @ _) @ _))
       2  (((_ @ _) @ _) @ _)        # left
       1  ((_ @ (_ @ _)) @ _)
       1  ((_ @ _) @ (_ @ _))
       1  (((_ @ _) @ _) @ (_ @ _))
       1  ((_ @ ((_ @ _) @ _)) @ _)
       1  ((_ @ _) @ (_ @ (_ @ _)))

 (This is pooling scipy/nipy/scikit-learn/statsmodels.) I've noted the 3
 different patterns that have a consistent associativity.

 From this I'm leaning towards the conclusions that:

 - Expressions with complex parenthesization do happen, but probably not
 often enough to justify elaborate stuff like my 'chaining' proposal -- only
 8.7% of these cases involve more than one @.


just for statsmodels

We do have a very large amount of chaining, but in many cases this has been
taken out of a single expression into a temporary or permanent variable for
parts of the chain. (similar to the quadratic form example in the PEP),
either for clarity (a temp variable), or because one dot product shows up
several times in the same expression (quadratic forms) or because we need
to keep it around for reuse in other expressions.

That's what I tried to explain before: chaining and breaking up larger
multi-dot expressions is most of the time an intentional choice, not just
something the dot function forces on us.

The most convincing argument for me for @ is that it makes parentheses
visible (until I realized that I didn't really care about @).
This reduces the cases where we separate out a dot product for clarity and
readability, but still leaves us with the other two cases, where our
chaining won't change whatever numpy provides additionally.

Josef




 - There's very little support here for the intuition that
 right-associativity is more useful than left-associativity on a day-to-day
 basis.

 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org



Re: [Numpy-discussion] [help needed] associativity and precedence of '@'

2014-03-19 Thread josef . pktd
On Wed, Mar 19, 2014 at 2:24 PM, Nathaniel Smith n...@pobox.com wrote:

 On Tue, Mar 18, 2014 at 9:14 AM, Robert Kern robert.k...@gmail.com
 wrote:
  On Tue, Mar 18, 2014 at 12:54 AM, Nathaniel Smith n...@pobox.com wrote:
  On Sat, Mar 15, 2014 at 6:28 PM, Nathaniel Smith n...@pobox.com wrote:
  Mathematica: instead of having an associativity, a @ b @ c gets
  converted into mdot([a, b, c])
 
  So, I've been thinking about this (thanks to @rfateman for pointing it
  out), and wondering if Mathematica's approach is worth following up
  more. (It would need to make it past python-dev, of course, but worst
  case is just that they say no and we're back where we are now, so we
  might as well think it through.)
 
  I predict with near-certainty that this will be rejected,

 I guess that's what everyone thought about @ too? ;-)

  but that
  doesn't prevent it from derailing the discussion. This proposal is
  unlike anything else in Python. Chained comparisons are *not* similar
  to this proposal. The chaining only happens at the syntax level, not
  the semantics. `a  b  c` gets compiled down to `a.__lt__(b) and
  b.__lt__(c)`, not `do_comparison([a, b, c], [lt, lt])`.

 Yes, the syntax is the same as chained comparisons, and the dispatch
 is a generalization of regular operators. It is unusual; OTOH, @ is
 unusual in that no other operators in Python have the property that
 evaluating in the wrong order can cost you seconds of time and
 gigabytes of memory. Perhaps.

  We have approval for a binary @ operator. Take the win.

 We have approval, and we have a request: that we figure out how @
 should work in detail to be most useful to us. Maybe that's this
 proposal; maybe not. Ultimately rejected-or-not-rejected comes down to
 how strong the arguments for something are. And while we can make some
 guesses about that, it's impossible to know how strong an argument
 will be until one sits down and works it out. So I still would like to
 hear what people think, even if it just ends in the conclusion that
 it's a terrible idea ;-).



What happens if you have 5 @ in a row?

My head hurts if I have to think about what would actually be going on.
And don't forget, the sparse matrix is stuck in the middle.

But I would be happy to have an optimizing multi_dot or chain_dot function
when it feels safe enough.




 As for arguments against the grouping semantics, I did think of one
 another case where @ is not associative, though it's pretty weird:

 In [9]: a = np.arange(16, dtype=np.int8).reshape((4, 4))

 In [10]: np.dot(a, np.dot(a, a.astype(float)))
 Out[10]:
 array([[  1680.,   1940.,   2200.,   2460.],
[  4880.,   5620.,   6360.,   7100.],
[  8080.,   9300.,  10520.,  11740.],
[ 11280.,  12980.,  14680.,  16380.]])

 In [12]: np.dot(np.dot(a, a), a.astype(float))
 Out[12]:
 array([[ 1680.,  1940.,  2200.,  2460.],
[-1264., -1548., -1832., -2116.],
[ 1936.,  2132.,  2328.,  2524.],
[-1008., -1100., -1192., -1284.]])

 (What's happening is that we have int8 @ int8 @ float, so (int8 @
 int8) @ float has overflows in the first computation, but int8 @ (int8
 @ float) does all the computations in float, with no overflows.)


That's similar to my example before that mixes in some scalar *.

I thought of it as an argument for same-left

Josef
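
The optimizing function mentioned above did arrive later as
np.linalg.multi_dot, which picks the parenthesization from the shapes;
a minimal sketch (the regression-style shapes are made up for
illustration):

import numpy as np

x = np.random.randn(1000, 5)
y = np.random.randn(1000, 1)

# equivalent to x . inv(x'x) . x' . y, but multi_dot decides where the
# parentheses go, so the small k-by-k path is taken automatically
h = np.linalg.multi_dot([x, np.linalg.inv(x.T.dot(x)), x.T, y])
print(h.shape)  # (1000, 1)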




 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org


Re: [Numpy-discussion] [help needed] associativity and precedence of '@'

2014-03-18 Thread josef . pktd
I'm still bothered by what Nathaniel mentioned about mixing 1d and 2d arrays


>>> c = np.arange(4)
>>> a = np.arange(16).reshape(4,4)
>>> cc = c[:,None]

>>> a.dot(c).dot(c.T)
420

>>> a.dot(c.dot(c.T))
array([[  0,  14,  28,  42],
       [ 56,  70,  84,  98],
       [112, 126, 140, 154],
       [168, 182, 196, 210]])


>>> a.dot(cc).dot(cc.T)
array([[  0,  14,  28,  42],
       [  0,  38,  76, 114],
       [  0,  62, 124, 186],
       [  0,  86, 172, 258]])

>>> a.dot(cc.dot(cc.T))
array([[  0,  14,  28,  42],
       [  0,  38,  76, 114],
       [  0,  62, 124, 186],
       [  0,  86, 172, 258]])


hint:
>>> c.dot(c.T)
14

and I expect it will be a lot more fun if we mix in some 3d or nd arrays.
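
Spelling the hint out (same arrays as above):

>>> (c.T == c).all()   # transposing a 1d array is a no-op
True
>>> c.dot(c.T)         # so this is just the scalar inner product 0 + 1 + 4 + 9
14
>>> a.dot(14)          # and np.dot with a scalar only scales: same as 14 * a
array([[  0,  14,  28,  42],
       [ 56,  70,  84,  98],
       [112, 126, 140, 154],
       [168, 182, 196, 210]])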


I think some of the decisions should not be driven by what is the most
convenient for the usual cases, but by how easy it is to read your code and
find the bugs where we made silly mistakes.


A biased view from someone who learned how to use numpy and scipy by
debugging.


Matlab and GAUSS are user-friendly: they don't allow reduced dimensions
and never steal an axis in a reduce operation.

I didn't manage to come up with more difficult examples (and ran out of
time)

>>> (a.dot(c).dot(c.T)).dot(2*a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'numpy.int32' object has no attribute 'dot'

If we make too many mistakes, then numpy tells us.
In the above examples, the scalar dot matrix would raise according to the
PEP.

I cannot come up with examples where we mix 3d and 2d and 1d because dot
currently doesn't do it.


sane-left

Josef


Re: [Numpy-discussion] [help needed] associativity and precedence of '@'

2014-03-17 Thread josef . pktd
On Mon, Mar 17, 2014 at 12:50 PM, Alexander Belopolsky ndar...@mac.comwrote:


 On Mon, Mar 17, 2014 at 12:13 PM, Nathaniel Smith n...@pobox.com wrote:

 In practice all
 well-behaved classes have to make sure that they implement __special__
 methods in such a way that all the different variations work, no
 matter which class ends up actually handling the operation.


 Well-behaved classes are hard to come by in practice.  The @ operator
 may fix the situation with np.matrix, but take a look at MaskedArray with
 its 40-line __array_wrap__ and no end of bugs.

 Requiring superclass __method__ to handle creation of subclass results
 correctly is turning Liskov principle on its head.  With enough clever
 tricks and tight control over the full class hierarchy you can make it work
 in some cases, but it is not a good design.

 I am afraid that making @ special among other binary operators that
 implement mathematically associative operations will create a lot of
 confusion.  (The pow operator is special because the corresponding
 mathematical operation is non-associative.)

 Imagine teaching someone that a % b % c = (a % b) % c, but a @ b @ c = a @
 (b @ c).  What are the chances that they will correctly figure out what a
 // b // c means after this?


One case where we need to keep track of left or right is type promotion

>>> a.shape
(100,)
>>> 1. * a.dot(a)
-98.0
>>> (1.*a).dot(a)
328350.0
>>> a.dtype
dtype('int8')

>>> 1. * a @ a
???

similar to
>>> 1. * 2 / 3
0.6666666666666666
>>> 1. * (2 / 3)   # I'm not in the `future`
0.0

Josef
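
(The setup was not shown; presumably something like the sketch below,
which reproduces the numbers. The point is where the upcast to float
happens:)

>>> import numpy as np
>>> a = np.arange(100, dtype=np.int8)
>>> 1. * a.dot(a)    # dot accumulates in int8 and wraps: 328350 becomes -98
-98.0
>>> (1. * a).dot(a)  # upcast first, then the exact sum of squares
328350.0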






Re: [Numpy-discussion] [help needed] associativity and precedence of '@'

2014-03-17 Thread josef . pktd
On Mon, Mar 17, 2014 at 1:18 PM, josef.p...@gmail.com wrote:




 On Mon, Mar 17, 2014 at 12:50 PM, Alexander Belopolsky ndar...@mac.comwrote:


 On Mon, Mar 17, 2014 at 12:13 PM, Nathaniel Smith n...@pobox.com wrote:

 In practice all
 well-behaved classes have to make sure that they implement __special__
 methods in such a way that all the different variations work, no
 matter which class ends up actually handling the operation.


 Well-behaved classes are hard to come by in practice.  The @ operator
 may fix the situation with np.matrix, so take a look at MaskedArray with
 its 40-line __array_wrap__ and no end of bugs.

 Requiring superclass __method__ to handle creation of subclass results
 correctly is turning Liskov principle on its head.  With enough clever
 tricks and tight control over the full class hierarchy you can make it work
 in some cases, but it is not a good design.

 I am afraid that making @ special among other binary operators that
 implement mathematically associative operations will create a lot of
 confusion.  (The pow operator is special because the corresponding
 mathematical operation is non-associative.)

 Imagine teaching someone that a % b % c = (a % b) % c, but a @ b @ c = a
 @ (b @ c).  What are the chances that they will correctly figure out what a
 // b // c means after this?


 One case where we need to keep track of left or right is type promotion

  >>> a.shape
 (100,)
  >>> 1. * a.dot(a)
 -98.0
  >>> (1.*a).dot(a)
 328350.0
  >>> a.dtype
 dtype('int8')

  >>> 1. * a @ a
 ???

 similar to
  >>> 1. * 2 / 3
 0.6666666666666666
  >>> 1. * (2 / 3)   # I'm not in the `future`
 0.0


I thought of sending a message saying "I'm +-1 on either", but I'm not.

I'm again in favor of left, because it's the simplest to understand:
A.dot(B).dot(C) with some * mixed in.

I understand now the computational argument in favor of right

x @ inv(x.T @ x) @ x.T @ y   (with shapes T,k  k,k  k,T  T,1)
or
x @ pinv(x) @ y   (with shapes T,k  k,T  T,1)

with T >> k  (the last 1 could be an m > 1 with T > m)

However, we don't write code like that most of the time.
Alan's students won't care much if some intermediate arrays blow up.
In library code like in statsmodels it's almost always a conscious choice
of where to set the parentheses and, more often, which part of a long array
expression is taken out as a temporary or permanent variable.

I think almost the only use of chain_dot(A, B, C) (which is right) is
for quadratic forms

xtxi = pinv(np.dot(exog.T, exog))   # k,k
xtdx = np.dot(exog.T * d[np.newaxis, :], exog)   # k,k
vcov = chain_dot(xtxi, xtdx, xtxi)  # kk, kk, kk
(from Quantreg)

I think optimizing this way is relatively easy


On the other hand, I worry a lot more about messy cases with different
dtypes or different classes involved as Alexander has pointed out. Cases
that might trip up medium to medium-advanced numpy users.

(Let's see, I have to read @ back to front, and * front to back, and why
did I put a sparse matrix in the middle and a masked array at the end. Oh
no, that's not a masked array, it's a panda.)
compared to
(Somewhere there is a mistake, let's go through all terms from the
beginning to the end)

Josef




 Josef






Re: [Numpy-discussion] [help needed] associativity and precedence of '@'

2014-03-17 Thread josef . pktd
On Mon, Mar 17, 2014 at 6:33 PM, Christophe Bal projet...@gmail.com wrote:

 I think that weak-left is a little strange, just think a little of the
 operators used by mathematicians that always follow a hierarchy.

 A parser is mostly done using grammars : see
 http://docs.python.org/3.1/reference/grammar.html.

 Defining the *-product to have stronger priority than the @-product, and the
 latter having stronger priority than +, will make the changes in the grammar
 easier.

 I'm now convinced of the usefulness of @ and @@ too but I also think that
 you must think of other uses than only for numpy. In other words, numpy is
 a the good argument for this new operators, but this can also open new
 perspectives for other uses.



My main problem with weak-left (* higher) and tight-left (@ higher)
compared to same-left is that I don't see any obvious choice between the
weak and tight.
I don't think I would have problems with readability.

Wikipedia doesn't say anything about precedence of Hadamard versus matrix
product.

matlab, IDL and Gauss (I checked the manual) all use same-left, as
Nathaniel pointed out.

For scalar * together with the dot product, which is more common in formulas,
we would just read it sequentially, i.e. same-left.

I don't remember when I have seen dot-in-a-circle in a paper, but I don't
think there was any precedence either.

---
I guess the same applies for other (mis)uses of @

from math import sqrt

class MyOp(object):

def __init__(self, func):
self.func = func

def __at__(self, x):
return [self.func(xi) for xi in x]


myop = MyOp(lambda x: sqrt(x))

print myop.__at__(range(3))      # myop @ range(3)
print myop.__at__(range(3) * 2)  # myop @ (range(3) * 2)
print myop.__at__(range(3)) * 3  # myop @ range(3) * 3

'''
[0.0, 1.0, 1.4142135623730951]
[0.0, 1.0, 1.4142135623730951, 0.0, 1.0, 1.4142135623730951]
[0.0, 1.0, 1.4142135623730951, 0.0, 1.0, 1.4142135623730951, 0.0, 1.0,
1.4142135623730951]
'''

-


Josef







Re: [Numpy-discussion] It looks like Py 3.5 will include a dedicated infix matrix multiply operator

2014-03-16 Thread josef . pktd
On Sun, Mar 16, 2014 at 10:54 AM, Nathaniel Smith n...@pobox.com wrote:

 On Sun, Mar 16, 2014 at 2:39 PM, Eelco Hoogendoorn
 hoogendoorn.ee...@gmail.com wrote:
  Note that I am not opposed to extra operators in python, and only mildly
  opposed to a matrix multiplication operator in numpy; but let me lay out
 the
  case against, for your consideration.
 
  First of all, the use of matrix semantics relative to arrays semantics is
  extremely rare; even in linear algebra heavy code, arrays semantics often
  dominate. As such, the default of array semantics for numpy has been a
 great
  choice. Ive never looked back at MATLAB semantics.

 Different people work on different code and have different experiences
  here -- yours may or may not be typical. Pauli did some quick checks
  on scikit-learn & nipy & scipy, and found that in their test suites,
  uses of np.dot and uses of elementwise-multiplication are ~equally
  common: https://github.com/numpy/numpy/pull/4351#issuecomment-37717330

  Secondly, I feel the urge to conform to a historical mathematical
 notation
  is misguided, especially for the problem domain of linear algebra.
 Perhaps
  in the world of mathematics your operation is associative or commutes,
 but
  on your computer, the order of operations will influence both outcomes
 and
  performance. Even for products, we usually care not only about the
 outcome,
  but also how that outcome is arrived at. And along the same lines, I
 don't
  suppose I need to explain how I feel about A@@-1 and the likes. Sure, it
  isn't to hard to learn or infer this implies a matrix inverse, but why on
  earth would I want to pretend the rich complexity of numerical matrix
  inversion can be mangled into one symbol? Id much rather write inv or
 pinv,
  or whatever particular algorithm happens to be called for given the
  situation. Considering this isn't the num-lisp discussion group, I
 suppose I
  am hardly the only one who feels so.
 

 My impression from the other thread is that @@ probably won't end up
 existing, so you're safe here ;-).

  On the whole, I feel the @ operator is mostly superfluous. I prefer to be
  explicit about where I place my brackets. I prefer to be explicit about
 the
  data layout and axes that go into a (multi)linear product, rather than
 rely
  on obtuse row/column conventions which are not transparent across
 function
  calls. When I do linear algebra, it is almost always vectorized over
  additional axes; how does a special operator which is only well defined
 for
  a few special cases of 2d and 1d tensors help me with that?

 Einstein notation is coming up on its 100th birthday and is just as
 blackboard-friendly as matrix product notation. Yet there's still a
 huge number of domains where the matrix notation dominates. It's cool
 if you aren't one of the people who find it useful, but I don't think
 it's going anywhere soon.

  Note that I don't think there is much harm in an @ operator; but I don't
 see
  myself using it either. Aside from making textbook examples like a
  gram-schmidt orthogonalization more compact to write, I don't see it
 having
  much of an impact in the real world.

 The analysis in the PEP found ~780 calls to np.dot, just in the two
 projects I happened to look at. @ will get tons of use in the real
 world. Maybe all those people who will be using it would be happier if
 they were using einsum instead, I dunno, but it's an argument you'll
 have to convince them of, not me :-).


Just as example

I just read, for the first time, two journal articles in econometrics that
use einsum notation.
I have no idea what their formulas are supposed to mean: no sum signs and
no matrix algebra.
I would need a strong incentive to stare at those formulas again.

(statsmodels search finds 1520 dot, including sandbox and examples)

Josef
TODO: learn how to use einsums
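
(For that TODO, a two-line starter; a sketch of how einsum spells out
the summed indices:)

>>> import numpy as np
>>> a = np.arange(6).reshape(2, 3)
>>> np.einsum('ij,kj->ik', a, a)   # same as a.dot(a.T)
array([[ 5, 14],
       [14, 50]])
>>> np.einsum('ij,ij->', a, a)     # sum of squares, no transposes needed
55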



 -n

 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org


Re: [Numpy-discussion] [RFC] should we argue for a matrix power operator, @@?

2014-03-15 Thread josef . pktd
I think I wouldn't use anything like @@ often enough to remember its
meaning. I'd rather see English names for anything that is not **very**
common.

I find A@@-1 pretty ugly compared to inv(A).
A@@(-0.5) might be nice (do we have a matrix_sqrt?)

Josef


On Sat, Mar 15, 2014 at 5:11 PM, Stephan Hoyer sho...@gmail.com wrote:

 Speaking only for myself (and as someone who has regularly used matrix
 powers), I would not expect matrix power as @@ to follow from matrix
 multiplication as @. I do agree that matrix power is the only reasonable
 use for @@ (given @), but it's still not something I would be confident
 enough to know without looking up.

 We should keep in mind that each new operator imposes some (small)
 cognitive burden on everyone who encounters them for the first time, and,
 in this case, this will include a large fraction of all Python users,
 whether they do numerical computation or not.

 Guido has given us a tremendous gift in the form of @. Let's not insist on
  @@, when it is unclear if the burden of figuring out what @@ means would
  be worth it, even for heavily numeric code. I would certainly prefer to
 encounter norm(A), inv(A), matrix_power(A, n), fractional_matrix_power(A,
 n) and expm(A) rather than their infix equivalents. It will certainly not
 be obvious which of these @@ will support for objects from any given
 library.

 One useful data point might be to consider whether matrix power is
 available as an infix operator in other languages commonly used for
 numerical work. AFAICT from some quick searches:
 MATLAB: Yes
 R: No
 IDL: No

 All of these languages do, of course, implement infix matrix
 multiplication, but it is apparently not clear at all whether the matrix
 power is useful.

 Best,
 Stephan




 On Sat, Mar 15, 2014 at 9:03 AM, Olivier Delalleau sh...@keba.be wrote:

 2014-03-15 11:18 GMT-04:00 Charles R Harris charlesr.har...@gmail.com:




 On Fri, Mar 14, 2014 at 10:32 PM, Nathaniel Smith n...@pobox.com wrote:

 Hi all,

 Here's the second thread for discussion about Guido's concerns about
 PEP 465. The issue here is that PEP 465 as currently written proposes
 two new operators, @ for matrix multiplication and @@ for matrix power
 (analogous to * and **):
   http://legacy.python.org/dev/peps/pep-0465/

 The main thing we care about of course is @; I pushed for including @@
 because I thought it was nicer to have than not, and I thought the
 analogy between * and ** might make the overall package more appealing
 to Guido's aesthetic sense.

 It turns out I was wrong :-). Guido is -0 on @@, but willing to be
 swayed if we think it's worth the trouble to make a solid case.

 Note that question now is *not*, how will @@ affect the reception of
 @. @ itself is AFAICT a done deal, regardless of what happens with @@.
 For this discussion let's assume @ can be taken for granted, and that
 we can freely choose to either add @@ or not add @@ to the language.
 The question is: which do we think makes Python a better language (for
 us and in general)?

 Some thoughts to start us off:

 Here are the interesting use cases for @@ that I can think of:
 - 'vector @@ 2' gives the squared Euclidean length (because it's the
 same as vector @ vector). Kind of handy.
 - 'matrix @@ n' of course gives the matrix power, which is of marginal
 use but does come in handy sometimes, e.g., when looking at graph
 connectivity.
 - 'matrix @@ -1' provides a very transparent notation for translating
 textbook formulas (with all their inverses) into code. It's a bit
 unhelpful in practice, because (a) usually you should use solve(), and
 (b) 'matrix @@ -1' is actually more characters than 'inv(matrix)'. But
 sometimes transparent notation may be important. (And in some cases,
 like using numba or theano or whatever, 'matrix @@ -1 @ foo' could be
 compiled into a call to solve() anyway.)

 (Did I miss any?)

 In practice it seems to me that the last use case is the one that's
 might matter a lot practice, but then again, it might not -- I'm not
 sure. For example, does anyone who teaches programming with numpy have
 a feeling about whether the existence of '@@ -1' would make a big
 difference to you and your students? (Alan? I know you were worried
 about losing the .I attribute on matrices if switching to ndarrays for
 teaching -- given that ndarray will probably not get a .I attribute,
 how much would the existence of @@ -1 affect you?)

 On a more technical level, Guido is worried about how @@'s precedence
 should work (and this is somewhat related to the other thread about
 @'s precedence and associativity, because he feels that if we end up
 giving @ and * different precedence, then that makes it much less
 clear what to do with @@, and reduces the strength of the */**/@/@@
 analogy). In particular, if we want to argue for @@ then we'll need to
 figure out what expressions like
a @@ b @@ c
 and
a ** b @@ c
 and
a @@ b ** c
 should do.

 A related question is what @@ should do 

Re: [Numpy-discussion] [help needed] associativity and precedence of '@'

2014-03-15 Thread josef . pktd
On Fri, Mar 14, 2014 at 11:41 PM, Nathaniel Smith n...@pobox.com wrote:

 Hi all,

 Here's the main blocker for adding a matrix multiply operator '@' to
 Python: we need to decide what we think its precedence and associativity
 should be. I'll explain what that means so we're on the same page, and what
 the choices are, and then we can all argue about it. But even better would
 be if we could get some data to guide our decision, and this would be a lot
 easier if some of you all can help; I'll suggest some ways you might be
 able to do that.

 So! Precedence and left- versus right-associativity. If you already know
 what these are you can skim down until you see CAPITAL LETTERS.

 We all know what precedence is. Code like this:
   a + b * c
 gets evaluated as:
   a + (b * c)
 because * has higher precedence than +. It binds more tightly, as they
  say. Python's complete precedence table is here:
   http://docs.python.org/3/reference/expressions.html#operator-precedence

 Associativity, in the parsing sense, is less well known, though it's just
 as important. It's about deciding how to evaluate code like this:
   a * b * c
 Do we use
   a * (b * c)# * is right associative
 or
   (a * b) * c# * is left associative
 ? Here all the operators have the same precedence (because, uh... they're
 the same operator), so precedence doesn't help. And mostly we can ignore
 this in day-to-day life, because both versions give the same answer, so who
 cares. But a programming language has to pick one (consider what happens if
 one of those objects has a non-default __mul__ implementation). And of
 course it matters a lot for non-associative operations like
   a - b - c
 or
   a / b / c
 So when figuring out order of evaluations, what you do first is check the
 precedence, and then if you have multiple operators next to each other with
 the same precedence, you check their associativity. Notice that this means
 that if you have different operators that share the same precedence level
 (like + and -, or * and /), then they have to all have the same
 associativity. All else being equal, it's generally considered nice to have
 fewer precedence levels, because these have to be memorized by users.

 Right now in Python, every precedence level is left-associative, except
 for '**'. If you write these formulas without any parentheses, then what
 the interpreter will actually execute is:
   (a * b) * c
   (a - b) - c
   (a / b) / c
 but
   a ** (b ** c)

 Okay, that's the background. Here's the question. We need to decide on
 precedence and associativity for '@'. In particular, there are three
 different options that are interesting:

 OPTION 1 FOR @:
 Precedence: same as *
 Associativity: left
 My shorthand name for it: same-left (yes, very creative)

 This means that if you don't use parentheses, you get:
 a @ b @ c  ->  (a @ b) @ c
 a * b @ c  ->  (a * b) @ c
 a @ b * c  ->  (a @ b) * c

 OPTION 2 FOR @:
 Precedence: more-weakly-binding than *
 Associativity: right
 My shorthand name for it: weak-right

 This means that if you don't use parentheses, you get:
 a @ b @ c  ->  a @ (b @ c)
 a * b @ c  ->  (a * b) @ c
 a @ b * c  ->  a @ (b * c)

 OPTION 3 FOR @:
 Precedence: more-tightly-binding than *
 Associativity: right
 My shorthand name for it: tight-right

 This means that if you don't use parentheses, you get:
 a @ b @ c  ->  a @ (b @ c)
 a * b @ c  ->  a * (b @ c)
 a @ b * c  ->  (a @ b) * c

 We need to pick which of which options we think is best, based on whatever
 reasons we can think of, ideally more than hmm, weak-right gives me warm
 fuzzy feelings ;-). (In principle the other 2 possible options are
 tight-left and weak-left, but there doesn't seem to be any argument in
 favor of either, so we'll leave them out of the discussion.)

 Some things to consider:

 * and @ are actually not associative (in the math sense) with respect to
 each other, i.e., (a * b) @ c and a * (b @ c) in general give different
 results when 'a' is not a scalar. So considering the two expressions 'a * b
 @ c' and 'a @ b * c', we can see that each of these three options gives
 produces different results in some cases.

 Same-left is the easiest to explain and remember, because it's just, @
 acts like * and /. So we already have to know the rule in order to
 understand other non-associative expressions like a / b / c or a - b - c,
 and it'd be nice if the same rule applied to things like a * b @ c so we
 only had to memorize *one* rule. (Of course there's ** which uses the
 opposite rule, but I guess everyone internalized that one in secondary
 school; that's not true for * versus @.) This is definitely the default we
 should choose unless we have a good reason to do otherwise.

 BUT: there might indeed be a good reason to do otherwise, which is the
 whole reason this has come up. Consider:
 Mat1 @ Mat2 @ vec
 Obviously this will execute much more quickly if we do
 Mat1 @ (Mat2 @ vec)
 because that results in 

Re: [Numpy-discussion] [RFC] should we argue for a matrix power operator, @@?

2014-03-15 Thread josef . pktd
On Sat, Mar 15, 2014 at 8:47 PM, Warren Weckesser 
warren.weckes...@gmail.com wrote:


 On Sat, Mar 15, 2014 at 8:38 PM, josef.p...@gmail.com wrote:

  I think I wouldn't use anything like @@ often enough to remember its
  meaning. I'd rather see English names for anything that is not **very**
 common.

 I find A@@-1 pretty ugly compared to inv(A)
 A@@(-0.5)  might be nice   (do we have matrix_sqrt ?)



 scipy.linalg.sqrtm:
 http://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.sqrtm.html


maybe a good example: I could never figure that one out

M = sqrtm(A)

A = M @ M

but what we use in stats is

A = R.T @ R
(eigenvectors dot diag(sqrt of eigenvalues))

which sqrt is A@@(0.5)?

Josef
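
The two square roots in question, side by side (a sketch using
scipy.linalg):

import numpy as np
from scipy.linalg import sqrtm, cholesky

A = np.array([[2.0, 1.0], [1.0, 2.0]])
M = sqrtm(A)     # symmetric square root: M.dot(M) == A
R = cholesky(A)  # upper-triangular factor: R.T.dot(R) == A

print(np.allclose(M.dot(M), A), np.allclose(R.T.dot(R), A))  # True True
print(np.allclose(M, R))  # False: both square roots of A, different factors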





 Warren



 Josef



 On Sat, Mar 15, 2014 at 5:11 PM, Stephan Hoyer sho...@gmail.com wrote:

 Speaking only for myself (and as someone who has regularly used matrix
 powers), I would not expect matrix power as @@ to follow from matrix
 multiplication as @. I do agree that matrix power is the only reasonable
 use for @@ (given @), but it's still not something I would be confident
 enough to know without looking up.

 We should keep in mind that each new operator imposes some (small)
 cognitive burden on everyone who encounters them for the first time, and,
 in this case, this will include a large fraction of all Python users,
 whether they do numerical computation or not.

 Guido has given us a tremendous gift in the form of @. Let's not insist
  on @@, when it is unclear if the burden of figuring out what @@ means
  would be worth it, even for heavily numeric code. I would certainly
 prefer to encounter norm(A), inv(A), matrix_power(A, n),
 fractional_matrix_power(A, n) and expm(A) rather than their infix
 equivalents. It will certainly not be obvious which of these @@ will
 support for objects from any given library.

 One useful data point might be to consider whether matrix power is
 available as an infix operator in other languages commonly used for
 numerical work. AFAICT from some quick searches:
 MATLAB: Yes
 R: No
 IDL: No

 All of these languages do, of course, implement infix matrix
 multiplication, but it is apparently not clear at all whether the matrix
 power is useful.

 Best,
 Stephan




 On Sat, Mar 15, 2014 at 9:03 AM, Olivier Delalleau sh...@keba.bewrote:

 2014-03-15 11:18 GMT-04:00 Charles R Harris charlesr.har...@gmail.com
 :




 On Fri, Mar 14, 2014 at 10:32 PM, Nathaniel Smith n...@pobox.comwrote:

 Hi all,

 Here's the second thread for discussion about Guido's concerns about
 PEP 465. The issue here is that PEP 465 as currently written proposes
 two new operators, @ for matrix multiplication and @@ for matrix power
 (analogous to * and **):
   http://legacy.python.org/dev/peps/pep-0465/

 The main thing we care about of course is @; I pushed for including @@
 because I thought it was nicer to have than not, and I thought the
 analogy between * and ** might make the overall package more appealing
 to Guido's aesthetic sense.

 It turns out I was wrong :-). Guido is -0 on @@, but willing to be
 swayed if we think it's worth the trouble to make a solid case.

 Note that question now is *not*, how will @@ affect the reception of
 @. @ itself is AFAICT a done deal, regardless of what happens with @@.
 For this discussion let's assume @ can be taken for granted, and that
 we can freely choose to either add @@ or not add @@ to the language.
 The question is: which do we think makes Python a better language (for
 us and in general)?

 Some thoughts to start us off:

 Here are the interesting use cases for @@ that I can think of:
 - 'vector @@ 2' gives the squared Euclidean length (because it's the
 same as vector @ vector). Kind of handy.
 - 'matrix @@ n' of course gives the matrix power, which is of marginal
 use but does come in handy sometimes, e.g., when looking at graph
 connectivity.
 - 'matrix @@ -1' provides a very transparent notation for translating
 textbook formulas (with all their inverses) into code. It's a bit
 unhelpful in practice, because (a) usually you should use solve(), and
 (b) 'matrix @@ -1' is actually more characters than 'inv(matrix)'. But
 sometimes transparent notation may be important. (And in some cases,
 like using numba or theano or whatever, 'matrix @@ -1 @ foo' could be
 compiled into a call to solve() anyway.)

 (Did I miss any?)

 In practice it seems to me that the last use case is the one that's
 might matter a lot practice, but then again, it might not -- I'm not
 sure. For example, does anyone who teaches programming with numpy have
 a feeling about whether the existence of '@@ -1' would make a big
 difference to you and your students? (Alan? I know you were worried
 about losing the .I attribute on matrices if switching to ndarrays for
 teaching -- given that ndarray will probably not get a .I attribute,
 how much would the existence of @@ -1 affect you?)

 On a more technical level, Guido is worried about how @@'s precedence
 should work 
