Re: [Numpy-discussion] Possible bug in Numpy.convolve
On Wed, Mar 22, 2017 at 10:06 PM, Thøger Emil Rivera-Thorsen < thoger.e...@gmail.com> wrote: > Dear list; > > I am honestly not certain whether this, or the SciPy list, is the > appropriate place to post this; please let me know if I got it wrong. > > I am convolving a 1D data set containing a relatively narrow peak with a > relatively narrow Gaussian kernel, in order to emulate the effect of > atmospheric seeing on astrophysical observations. > > I have a 1D data array 45 pixels long and a Gaussian kernel, and run > np.convolve(data, kernel, mode='same') on the two arrays; the resulting > array's peak is shifted relative to the origin. I have attached a plot to > illustrate. > > The original data is shown in blue. When I convolve it with a symmetric > kernel (black), I get an offset resulting peak (magenta). If I flip the > kernel -- even though it is perfectly symmetric -- the resulting curve is > offset in the opposite direction (yellow). However, if I offset the kernel > so it is centered exactly one pixel below the central value, the output > array gets centered correctly (red), even if I flip the (now no longer > symmetric) kernel. > > This is using Numpy 1.11.3, Python 2.7.13, on Anaconda 4.3.0 64-bit on > Ubuntu 16.10. > > Using astropy.convolution reproduces the correct red curve, so I can use > that for now, but it seems to me this is either a bug or, if it is indeed > the intended behavior, a word of caution would be merited in the docstring. > > Cheers, > > Emil Rivera-Thorsen > > > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > Can you provide an example to replicate? I haven't seen this behavior; it looks centered to me, at least for odd window lengths. AFAIR, I had to experiment a bit in the past with how to set the window and its location for even window lengths.
>>> np.__version__
'1.11.2'
>>> x = np.linspace(-1, 1, 21)
>>> w = stats.norm.pdf(np.linspace(-3, 3, 5))
>>> np.column_stack((x, np.convolve(x, w, mode='same')))[8:13]
array([[ -2.00000000e-01,  -1.33368234e-01],
       [ -1.00000000e-01,  -6.66841169e-02],
       [  0.00000000e+00,   1.51788304e-17],
       [  1.00000000e-01,   6.66841169e-02],
       [  2.00000000e-01,   1.33368234e-01]])
>>> x = np.abs(np.linspace(-1, 1, 21))
>>> w = stats.norm.pdf(np.linspace(-3, 3, 4))
>>> np.column_stack((x, np.convolve(x, w, mode='same')))[8:13]
array([[ 0.2       ,  0.12320129],
       [ 0.1       ,  0.07392077],
       [ 0.        ,  0.02552663],
       [ 0.1       ,  0.02552663],
       [ 0.2       ,  0.07392077]])
>>> w = stats.norm.pdf(np.linspace(-3, 3, 5))
>>> np.column_stack((x, np.convolve(x, w, mode='same')))[8:13]
array([[ 0.2       ,  0.13336823],
       [ 0.1       ,  0.06757049],
       [ 0.        ,  0.02767626],
       [ 0.1       ,  0.06757049],
       [ 0.2       ,  0.13336823]])

Josef
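A small self-contained sketch of the behavior under discussion (not from the thread; the spike data and kernel sizes are illustrative): with mode='same', an odd-length symmetric kernel keeps a centered peak symmetric, while an even-length one has no central sample, so the output is shifted by half a pixel, which shows up as a one-pixel asymmetry.

```python
import numpy as np

# Illustrative data: a spike at the central pixel of a 45-pixel array,
# matching the array length mentioned in the original post.
data = np.zeros(45)
data[22] = 1.0

def gaussian_kernel(n, sigma=1.0):
    # normalized Gaussian sampled symmetrically on n points
    x = np.linspace(-3, 3, n)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

out_odd = np.convolve(data, gaussian_kernel(9), mode='same')
out_even = np.convolve(data, gaussian_kernel(8), mode='same')

# odd-length kernel: output stays symmetric about pixel 22
print(np.isclose(out_odd[21], out_odd[23]))    # True
# even-length kernel: symmetry about pixel 22 is broken
print(np.isclose(out_even[21], out_even[23]))  # False
```

Since an even-length kernel has no central sample, mode='same' has to favor one side; requiring odd-sized kernels (as astropy.convolution does) sidesteps the ambiguity entirely.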
Re: [Numpy-discussion] Fortran order in recarray.
On Wed, Feb 22, 2017 at 11:57 AM, Alex Rogozhnikov < alex.rogozhni...@yandex.ru> wrote: > Hi Matthew, > maybe it is not the best place to discuss problems of pandas, but to show that I am not missing something, let's consider a simple example.
>
> # simplest DataFrame
> x = pandas.DataFrame(dict(a=numpy.arange(10), b=numpy.arange(10, 20)))
> # simplest indexing. Can you predict the results without looking at the comments?
> x[:2]              # returns the first two rows, as expected
> x[[0, 1]]          # returns a copy of x, the whole dataframe
> x[numpy.array(2)]  # fails with IndexError: indices are out-of-bounds (can you guess why?)
> x[[0, 1], :]       # unhashable type: list
>
> just in case - I know about .loc and .iloc, but when you write code with many subroutines, you concentrate on numpy inputs, and at some point you simply *forget* to convert some of the data you operated with to numpy, and it *continues* to work, but it yields wrong results (while you tested everything, but you tested this for numpy). Checking all the inputs in each small subroutine is strange.
> Ok, a bit more:
>
> x[x['a'] > 5]             # works as expected
> x[x['a'] > 5, :]          # 'Series' objects are mutable, thus they cannot be hashed
> lookup = numpy.arange(10)
> x[lookup[x['a']] > 5]     # works as expected
> x[lookup[x['a']] > 5, :]  # TypeError: unhashable type: 'numpy.ndarray'
>
> x[lookup]['a']   # IndexError
> x['a'][lookup]   # works as expected
>
> Now let's go a bit further: train/test split the data for machine learning (again, the most frequent operation)
>
> from sklearn.model_selection import train_test_split
> x1, x2 = train_test_split(x, random_state=42)
> # compare the next operations with pandas.DataFrame
> col = x1['a']
> print col[:2]               # first two elements
> print col[[0, 1]]           # doesn't fail (while there is no row with index 0), fills it with NaN
> print col[numpy.arange(2)]  # same as previous
> print col[col > 4]          # as expected
> print col[col.values > 4]   # as expected
> print col.values[col > 4]   # converts boolean to int, uses int indexing, but at least raises a warning
>
> Mistakes caused by such silent misbehavior are not easy to detect (when your data pipeline consists of several steps), it is quite hard to locate the source of the problem, and it is almost impossible to be sure that you indeed avoided all such caveats. Code review turns into a paranoid process (if you care about the result, of course). > > Things are even worse, because I've demonstrated this on my installation, and probably if you run this with some other pandas installation, you get some other results (these were really basic operations). So things that worked one way in one version may work a different way in another; this becomes completely intractable. > > Pandas may be nice if you need a report, and you need to get it done tomorrow. Then you'll throw away the code. When we initially used pandas as the main data storage in yandex/rep, it looked like a good idea, but a year later it was obvious this was a wrong decision.
In case you build a data > pipeline / research that should still be working several years later (using some > other installation by someone else), usage of pandas should be *minimal*. > > That's why I am looking for a reliable pandas substitute, which should be: > - completely consistent with numpy, and should fail when something wasn't > implemented / is impossible > - fewer new abstractions; nobody wants to learn > one-more-way-to-manipulate-the-data, > specifically other researchers > - it may be less convenient for interactive data munging > - in particular, fewer methods is ok > - written code should be interpretable, and hardly can be misinterpreted > - not super slow; 1-10 gigabyte datasets are a normal situation > Just to the pandas part: statsmodels has supported pandas almost from the very beginning (or maybe after 1.5 years), when the new pandas was still very young. However, what I insisted on is that pandas stays in the wrapper/interface code, and internally only numpy arrays are used. Besides the confusing "magic" indexing of early pandas, there were a lot of details that silently produced different results, e.g. default iteration on axis=1, ddof in std and var =1 instead of numpy's =0. Essentially, every interface corresponds to np.asarray, but we store the DataFrame information, mainly the index and column names, so we can return the appropriate pandas object if a pandas object was used for the input. This has worked pretty well. Users can have their dataframes, and we have pure numpy algorithms. Recently we have started to use pandas inside a few functions or classes that are less tightly integrated into the overall setup. We also use pandas for some things that are not convenient or not available in numpy. Our internal use of pandas groupby and similar will most likely increase over time. (One of the main issues we had was the date and time index, because that was a moving target in both numpy and pandas.) One issue for
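The interface pattern described above (pandas at the boundary, numpy inside, metadata restored on output) can be sketched roughly like this; the function name and the statistic are made up for illustration, this is not statsmodels code:

```python
import numpy as np
import pandas as pd

def demean(data):
    """Subtract column means; an illustrative stand-in for a numpy algorithm."""
    values = np.asarray(data, dtype=float)  # the pure-numpy core sees only ndarrays
    result = values - values.mean(axis=0)
    if isinstance(data, pd.DataFrame):      # re-attach index/columns on the way out
        return pd.DataFrame(result, index=data.index, columns=data.columns)
    return result

df = pd.DataFrame({'a': [1.0, 2.0, 3.0]}, index=['x', 'y', 'z'])
print(demean(df).index.tolist())          # the index survives: ['x', 'y', 'z']
print(demean(np.array([1.0, 2.0, 3.0])))  # plain arrays stay plain
```

The point is that the algorithm itself never branches on the input type; only the thin boundary layer does.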
Re: [Numpy-discussion] PowerPC testing servers
On Thu, Feb 16, 2017 at 3:55 AM, Ralf Gommers wrote: > > > On Thu, Feb 16, 2017 at 3:53 PM, Sandro Tosi wrote: > >> > A recent post to the wheel-builders mailing list pointed out some >> > links to places providing free PowerPC hosting for open source >> > projects, if they agree to a submitted request: >> >> The debian project has some powerpc machines (and we still build numpy >> on those boxes when i upload a new revision to our archives) and they >> also have hosts dedicated to letting debian developers log in and debug >> issues with their packages on that architecture. I can sponsor access >> to those machines for some of you, but it is not a place where you can >> host a CI instance. >> >> Just keep it in mind more broadly than powerpc, f.e. these are all the >> archs where numpy was built after the last upload: >> https://buildd.debian.org/status/package.php?p=python-numpy&suite=unstable >> (the grayed out archs are the non-release-critical ones, so packages >> are built as best effort and a missing build is not a big deal) > > > Thanks Sandro. It looks like even for the release-critical ones, it's just the build that has to succeed and test failures are not detected? For example, armel is green but has 9 failures: https://buildd.debian.org/status/fetch.php?pkg=python-numpy=armel=1%3A1.12.0-2&stamp=1484889563=0 > > Ralf > More general questions on this: Are there any overviews of which packages in the python-for-science or python-for-data-analysis areas work correctly on different platforms? Are there any platforms/processors, besides the standard x86/x64, where this is important? For example for statsmodels: In early releases of statsmodels, maybe 5 to 7 years ago, Yarik and I were still debugging problems on several machines like ppc and s390x during Debian testing. Since then I haven't heard much about specific problems. The current status for statsmodels on Debian machines is pretty mixed.
In several of them some dependencies are not available, and in some cases we have errors that might be caused by errors in dependencies, e.g. cvxopt. The ppc64el test run for statsmodels has a large number of failures, but checking scipy, it looks like it's also not working properly: https://buildd.debian.org/status/fetch.php?pkg=python-scipy=ppc64el=0.18.1-2=1477075663=0 In those cases it would be impossible to start debugging if we had to debug through the entire dependency chain. CI testing for Windows, Apple and Linux, mainly on x64, seems to be working pretty well, with some delays while version incompatibilities are fixed. But anything that is not in a CI testing setup looks pretty random to me. (I'm mainly curious what the status of those machines is. I'm not really eager to create more debugging work, but sometimes failures on a machine point to code that is "fragile".) Josef
[Numpy-discussion] offset in fill diagonal
Is there a simple way to fill in the diagonal elements of an array for a diagonal other than the main one? As far as I can see, the diagxxx functions that have an offset argument can only read and not modify in place, and the functions for modifying don't have an offset and only allow changing the main diagonal. Use case: creating banded matrices (2-D arrays) similar to toeplitz. Josef
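One possible workaround (my sketch, not an existing numpy function): build a boolean mask for the k-th diagonal with np.eye(..., k=...) and assign through it, which is enough for banded/toeplitz-like construction:

```python
import numpy as np

n = 5
a = np.zeros((n, n))
# fill the first subdiagonal, the main diagonal and the first
# superdiagonal via boolean masks
for k, val in [(-1, 1.0), (0, 2.0), (1, 3.0)]:
    a[np.eye(n, k=k, dtype=bool)] = val
print(a)  # tridiagonal: 2 on the main diagonal, 3 above, 1 below
```

This builds an n-by-n mask per diagonal, so for very large matrices a strided slice into a.ravel() would be cheaper, but the mask version is hard to get wrong.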
Re: [Numpy-discussion] Question about numpy.random.choice with probabilties
On Wed, Jan 18, 2017 at 8:53 AM, wrote: > > > On Wed, Jan 18, 2017 at 4:52 AM, Nadav Har'El wrote: > >> >> On Wed, Jan 18, 2017 at 11:00 AM, aleba...@gmail.com >> wrote: >> >>> Let's look at what the user asked this function, and what it returns: >>> User asks: please give me random pairs of the three items, where item 1 has probability 0.2, item 2 has 0.4, and 3 has 0.4. Function returns: random pairs, where if you generate many returned results (as in the law of large numbers) and look at the items they contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is 0.38333. These are not (quite) the probabilities the user asked for... Can you explain a sense in which the user's requested probabilities (0.2, 0.4, 0.4) are actually adhered to in the results which random.choice returns? >>> >>> I think that the question the user is asking by specifying p is a >>> slightly different one: >>> "please give me random pairs of the three items extracted from a >>> population of 3 items where item 1 has probability of being extracted of >>> 0.2, item 2 has 0.4, and 3 has 0.4. Also please remove items once >>> extracted." >>> >> >> You are right, if that is what the user wants, numpy.random.choice does >> the right thing. >> >> I'm just wondering whether this is actually what users want, and whether >> they understand this is what they are getting. >> >> As I said, I expected it to generate pairs with, empirically, the desired >> distribution of individual items. The documentation of numpy.random.choice >> seemed to me (wrongly) to imply that that's what it does. So I was >> surprised to realize that it does not. >> > > As Alessandro and you showed, the function returns something that makes > sense. If the user wants something different, then they need to look for a > different function, which is however difficult if the problem doesn't have a > solution in general. > > Sounds to me a bit like a Monty Hall problem.
Whether we like it or not, > or find it counterintuitive, it is what it is, given the sampling scheme. > > Having more sampling schemes would be useful, but it's not possible to > implement sampling schemes with impossible properties. > BTW: sampling 3 out of 3 without replacement is even worse: no matter what sampling scheme and what selection probabilities we use, we always have every element with probability 1 in the sample. (Which in survey statistics implies that the sampling error or standard deviation of any estimate of a population mean or total is zero. Which I found weird. How can you do statistics and get an estimate that doesn't have any uncertainty associated with it?) Josef > > Josef > > > >> >> Nadav. >>
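To make the numbers from the thread concrete, a quick simulation (my sketch; the seed and draw count are arbitrary) of the 2-out-of-3 case with p = [0.2, 0.4, 0.4], using the legacy RandomState interface being discussed:

```python
import numpy as np

rng = np.random.RandomState(12345)
p = [0.2, 0.4, 0.4]
ndraws = 20000
counts = np.zeros(3)
for _ in range(ndraws):
    # draw a pair without replacement; each pair has two distinct items
    pair = rng.choice(3, size=2, replace=False, p=pair_p) if False else rng.choice(3, size=2, replace=False, p=p)
    counts[pair] += 1

# empirical marginal frequency of each item among all drawn items:
print(counts / (2 * ndraws))  # close to [0.233, 0.383, 0.383], not [0.2, 0.4, 0.4]
```

The exact marginals for this scheme are 7/30 ≈ 0.2333 and 23/60 ≈ 0.3833, matching the 0.2333/0.38333 figures quoted earlier in the thread.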
Re: [Numpy-discussion] Question about numpy.random.choice with probabilties
On Wed, Jan 18, 2017 at 4:52 AM, Nadav Har'El wrote: > > On Wed, Jan 18, 2017 at 11:00 AM, aleba...@gmail.com > wrote: > >> Let's look at what the user asked this function, and what it returns: >> >>> >>> User asks: please give me random pairs of the three items, where item 1 >>> has probability 0.2, item 2 has 0.4, and 3 has 0.4. >>> >>> Function returns: random pairs, where if you generate many returned >>> results (as in the law of large numbers) and look at the items they >>> contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is >>> 0.38333. >>> These are not (quite) the probabilities the user asked for... >>> >>> Can you explain a sense in which the user's requested probabilities (0.2, >>> 0.4, 0.4) are actually adhered to in the results which random.choice returns? >>> >> >> I think that the question the user is asking by specifying p is a >> slightly different one: >> "please give me random pairs of the three items extracted from a >> population of 3 items where item 1 has probability of being extracted of >> 0.2, item 2 has 0.4, and 3 has 0.4. Also please remove items once >> extracted." >> > > You are right, if that is what the user wants, numpy.random.choice does > the right thing. > > I'm just wondering whether this is actually what users want, and whether > they understand this is what they are getting. > > As I said, I expected it to generate pairs with, empirically, the desired > distribution of individual items. The documentation of numpy.random.choice > seemed to me (wrongly) to imply that that's what it does. So I was > surprised to realize that it does not. > As Alessandro and you showed, the function returns something that makes sense. If the user wants something different, then they need to look for a different function, which is however difficult if the problem doesn't have a solution in general. Sounds to me a bit like a Monty Hall problem.
Whether we like it or not, or find it counterintuitive, it is what it is, given the sampling scheme. Having more sampling schemes would be useful, but it's not possible to implement sampling schemes with impossible properties. Josef > > Nadav. >
Re: [Numpy-discussion] Question about numpy.random.choice with probabilties
On Tue, Jan 17, 2017 at 6:58 PM, aleba...@gmail.com wrote: > > > 2017-01-17 22:13 GMT+01:00 Nadav Har'El : > >> >> On Tue, Jan 17, 2017 at 7:18 PM, aleba...@gmail.com >> wrote: >> >>> Hi Nadav, >>> >>> I may be wrong, but I think that the result of the current >>> implementation is actually the expected one. >>> Using your example: probabilities for items 1, 2 and 3 are: 0.2, 0.4 and >>> 0.4 >>> >>> P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2]) >>> >> >> Yes, this formula does fit well with the actual algorithm in the code. >> But, my question is *why* we want this formula to be correct: >> >> Just a note: this formula is correct and it is one of the fundamental laws of > probability: https://en.wikipedia.org/wiki/Law_of_total_probability + > https://en.wikipedia.org/wiki/Bayes%27_theorem > Thus, the result we get from random.choice IMHO definitely makes sense. Of > course, I think we could always discuss implementing other sampling > methods if they are useful for some applications. > > >> >>> Now, P([1]) = 0.2 and P([2]) = 0.4. However: >>> P([2] | 1st=[1]) = 0.5 (2 and 3 have the same sampling probability) >>> P([1] | 1st=[2]) = 1/3 (1 and 3 have probabilities 0.2 and 0.4 that, >>> once normalised, translate into 1/3 and 2/3 respectively) >>> Therefore P([1,2]) = 0.7/3 = 0.2 >>> Similarly, P([1,3]) = 0.2 and P([2,3]) = 1.6/3 = 0.53 >>> >> >> Right, these are the numbers that the algorithm in the current code, and >> the formula above, produce: >> >> P([1,2]) = P([1,3]) = 0.2 >> P([2,3]) = 0.5 >> >> What I'm puzzled about is that these probabilities do not really fulfill >> the given probability vector 0.2, 0.4, 0.4... >> Let me try to explain: >> >> Why did the user choose the probabilities 0.2, 0.4, 0.4 for the three >> items in the first place? >> >> One reasonable interpretation is that the user wants his random picks >> to contain item 1 half as often as item 2 or 3.
>> For example, maybe item 1 costs twice as much as item 2 or 3, so picking >> it half as often will result in an equal expenditure on each item. >> >> If the user randomly picks the items individually (a single item at a >> time), he indeed gets exactly this distribution: 0.2 of the time item 1, >> 0.4 of the time item 2, 0.4 of the time item 3. >> >> Now, what happens if he picks not individual items, but pairs of >> different items using numpy.random.choice with two items, replace=False? >> Suddenly, the distribution of the individual items in the results gets >> skewed: If we look at the expected number of times we'll see each item in >> one draw of a random pair, we will get: >> >> E(1) = P([1,2]) + P([1,3]) = 0.4 >> E(2) = P([1,2]) + P([2,3]) = 0.7 >> E(3) = P([1,3]) + P([2,3]) = 0.7 >> >> Or renormalizing by dividing by 2: >> >> P(1) = 0.23 >> P(2) = 0.38 >> P(3) = 0.38 >> >> As you can see, these are not quite the probabilities we wanted (which were >> 0.2, 0.4, 0.4)! In the random pairs we picked, item 1 was used a bit more >> often than we wanted, and items 2 and 3 were used a bit less often! >> > > p is not the probability of the output but the one of the source finite > population. I think that if you want to preserve that distribution, as > Josef pointed out, you have to make extractions independent, that is either > sample with replacement or approximate an infinite population (that is > basically the same thing). But of course in this case you will also end up > with events [X,X]. > With replacement and keeping duplicates, the results might also show a similar pattern in the marginal probabilities: https://onlinecourses.science.psu.edu/stat506/node/17 Another approach in survey sampling is to drop duplicates in with-replacement sampling, but then the sample size itself is random.
(again I didn't try to understand the small print) (another related aside: The problem with a discrete sample space in small samples shows up also in calculating hypothesis tests, e.g. Fisher's exact test or similar. Because we only get a few discrete possibilities in the sample space, it is not possible to construct a test that has exactly the desired type 1 error.) Josef > > >> So that brought my question of why we consider these numbers right. >> >> In this example, it's actually possible to get the right item >> distribution, if we pick the pair outcomes with the following probabilities: >> >> P([1,2]) = 0.2 (not 0.23 as above) >> P([1,3]) = 0.2 >> P([2,3]) = 0.6 (not 0.53 as above) >> >> Then, we get exactly the right P(1), P(2), P(3): 0.2, 0.4, 0.4 >> >> Interestingly, fixing things like I suggest is not always possible. >> Consider a different probability-vector example for three items - 0.99, >> 0.005, 0.005. Now, no matter which algorithm we use for randomly picking >> pairs from these three items, *each* returned pair will inevitably contain >> one of the two
Re: [Numpy-discussion] Question about numpy.random.choice with probabilties
On Tue, Jan 17, 2017 at 4:13 PM, Nadav Har'El wrote: > > > On Tue, Jan 17, 2017 at 7:18 PM, aleba...@gmail.com wrote: >> >> Hi Nadav, >> >> I may be wrong, but I think that the result of the current implementation is actually the expected one. >> Using your example: probabilities for items 1, 2 and 3 are: 0.2, 0.4 and 0.4 >> >> P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2]) > > > Yes, this formula does fit well with the actual algorithm in the code. But, my question is *why* we want this formula to be correct: > >> >> Now, P([1]) = 0.2 and P([2]) = 0.4. However: >> P([2] | 1st=[1]) = 0.5 (2 and 3 have the same sampling probability) >> P([1] | 1st=[2]) = 1/3 (1 and 3 have probabilities 0.2 and 0.4 that, once normalised, translate into 1/3 and 2/3 respectively) >> Therefore P([1,2]) = 0.7/3 = 0.2 >> Similarly, P([1,3]) = 0.2 and P([2,3]) = 1.6/3 = 0.53 > > > Right, these are the numbers that the algorithm in the current code, and the formula above, produce: > > P([1,2]) = P([1,3]) = 0.2 > P([2,3]) = 0.5 > > What I'm puzzled about is that these probabilities do not really fulfill the given probability vector 0.2, 0.4, 0.4... > Let me try to explain: > > Why did the user choose the probabilities 0.2, 0.4, 0.4 for the three items in the first place? > > One reasonable interpretation is that the user wants his random picks to contain item 1 half as often as item 2 or 3. > For example, maybe item 1 costs twice as much as item 2 or 3, so picking it half as often will result in an equal expenditure on each item. > > If the user randomly picks the items individually (a single item at a time), he indeed gets exactly this distribution: 0.2 of the time item 1, 0.4 of the time item 2, 0.4 of the time item 3. > > Now, what happens if he picks not individual items, but pairs of different items using numpy.random.choice with two items, replace=False?
> Suddenly, the distribution of the individual items in the results gets skewed: If we look at the expected number of times we'll see each item in one draw of a random pair, we will get: > > E(1) = P([1,2]) + P([1,3]) = 0.4 > E(2) = P([1,2]) + P([2,3]) = 0.7 > E(3) = P([1,3]) + P([2,3]) = 0.7 > > Or renormalizing by dividing by 2: > > P(1) = 0.23 > P(2) = 0.38 > P(3) = 0.38 > > As you can see, these are not quite the probabilities we wanted (which were 0.2, 0.4, 0.4)! In the random pairs we picked, item 1 was used a bit more often than we wanted, and items 2 and 3 were used a bit less often! > > So that brought my question of why we consider these numbers right. > > In this example, it's actually possible to get the right item distribution, if we pick the pair outcomes with the following probabilities: > > P([1,2]) = 0.2 (not 0.23 as above) > P([1,3]) = 0.2 > P([2,3]) = 0.6 (not 0.53 as above) > > Then, we get exactly the right P(1), P(2), P(3): 0.2, 0.4, 0.4 > > Interestingly, fixing things like I suggest is not always possible. Consider a different probability-vector example for three items - 0.99, 0.005, 0.005. Now, no matter which algorithm we use for randomly picking pairs from these three items, *each* returned pair will inevitably contain one of the two very-low-probability items, so each of those items will appear in roughly half the pairs, instead of in a vanishingly small percentage as we hoped. > > But for other choices of probabilities (like the one in my original example), there is a solution. For 2-out-of-3 sampling we can actually write down a system of three linear equations in three variables, so there is always one solution, but if this solution has components not valid as probabilities (not in [0,1]) we end up with no solution - as happens in the 0.99, 0.005, 0.005 example.
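The system of three linear equations mentioned above can be written down directly (my sketch): for 2-out-of-3 sampling, pair probabilities P12, P13, P23 that reproduce target marginals p must satisfy E(i) = 2*p_i.

```python
import numpy as np

# rows: E(1) = P12 + P13, E(2) = P12 + P23, E(3) = P13 + P23
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

for p in ([0.2, 0.4, 0.4], [0.99, 0.005, 0.005]):
    pairs = np.linalg.solve(A, 2 * np.array(p))
    print(p, '->', np.round(pairs, 3))
# [0.2, 0.4, 0.4]      -> [0.2, 0.2, 0.6]  (a valid pair distribution)
# [0.99, 0.005, 0.005] -> has a negative entry, so no valid scheme exists
```

This reproduces both cases from the thread: the first target is achievable with P([1,2]) = P([1,3]) = 0.2, P([2,3]) = 0.6, while the second has no solution inside [0, 1].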
I think the underlying problem is that in the sampling space the events (1, 2), (1, 3), (2, 3) are correlated, and because of the discreteness an arbitrary marginal distribution on the individual events 1, 2, 3 is not possible. A related aside: I'm not able to (or willing to spend the time to) work through the math, but I just went through something similar for survey sampling in a finite population (e.g. survey two out of 3 individuals, where 3 is the population), leading to the Horvitz–Thompson estimator. The books have chapters on different sampling schemes and derivations of the marginal and joint probabilities of being surveyed. (I gave up on sampling without replacement, and assume we have a large population where it doesn't make a difference.) In some of the sampling schemes they pick sequentially and adjust the probabilities for the remaining individuals. That seems to provide more flexibility to create a desired or optimal sampling scheme. Josef > > > >> >> What am I missing? >> >> Alessandro >> >> >> 2017-01-17 13:00 GMT+01:00 : >>> >>> Hi, I'm looking for a way to find a random sample of C different items out >>> of N items, with a
Re: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve
On Mon, Jan 9, 2017 at 6:27 AM, Ilhan Polat wrote: > > Note that you're proposing a new scipy feature (right?) on the numpy > list > > > This sounds like a good idea to me. As a former heavy Matlab user I > remember a lot of things to dislike, but "\" behavior was quite nice. > > Correct, I am not sure where this might go in. It seemed like a NumPy > array operation (touching array elements rapidly etc. can also be added for > similar functionalities other than solve), hence the NumPy list. But of > course it can be pushed as an exclusive SciPy feature. I'm not sure what > the outlook on np.linalg.solve is. > > > > How much is a noticeable slowdown? Note that we still have the current > interfaces available for users that know what they need, so a nice > convenience function that is say 5-10% slower would not be the end of the > world. > > The fastest case was around 150-400% slower, but of course it might be the > case that I'm not using the fastest methods. It was mostly shuffling things > around and using np.any on them in the pure Python 3 case. I will cook up > something again for the baseline as soon as I have time. > > All these checks sound a bit expensive if the arrays are almost always completely unstructured and don't satisfy any special matrix pattern. In analogy to the type proliferation in Julia to handle those cases: Is there a way to attach information to numpy arrays that, for example, signals that a 2-D array is hermitian, banded or diagonal or ...? (After second thought: maybe completely unstructured is not too expensive to detect if the checks are short-circuited: one off-diagonal element nonzero - not diagonal; two opposite off-diagonal elements different - not symmetric, ...) Josef
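A rough sketch of such structure checks (illustrative only, not a proposed scipy API; a real implementation would avoid building full masks and bail out at the first counterexample element):

```python
import numpy as np

def classify(a):
    """Classify a 2-D array as diagonal, symmetric or general."""
    if a.ndim != 2 or a.shape[0] != a.shape[1]:
        return 'not square'
    off_diag = a[~np.eye(a.shape[0], dtype=bool)]
    if not off_diag.any():          # all off-diagonal elements zero
        return 'diagonal'
    if np.array_equal(a, a.T):      # symmetric (for real matrices)
        return 'symmetric'
    return 'general'

print(classify(np.diag([1.0, 2.0, 3.0])))            # diagonal
print(classify(np.array([[1.0, 2.0], [2.0, 1.0]])))  # symmetric
print(classify(np.array([[1.0, 2.0], [3.0, 4.0]])))  # general
```

A solver dispatcher could then route 'diagonal' to elementwise division and 'symmetric' to a Cholesky or LDL factorization, falling back to LU for 'general'.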
Re: [Numpy-discussion] Deprecating matrices.
On Fri, Jan 6, 2017 at 8:28 PM, Ralf Gommers wrote: > > > On Sat, Jan 7, 2017 at 2:21 PM, CJ Carey > wrote: > >> >> On Fri, Jan 6, 2017 at 6:19 PM, Ralf Gommers >> wrote: >> >>> This sounds like a reasonable idea. The timeline could be something like: >>> >>> 1. Now: create new package, deprecate np.matrix in docs. >>> 2. In say 1.5 years: start issuing visible deprecation warnings in numpy. >>> 3. After 2020: remove matrix from numpy. >>> >>> Ralf >>> >> >> I think this sounds reasonable, and reminds me of the deliberate >> deprecation process taken for scipy.weave. I guess we'll see how successful >> it was when 0.19 is released. >> >> The major problem I have with removing numpy matrices is the effect on >> scipy.sparse, which mostly-consistently mimics numpy.matrix semantics and >> often produces numpy.matrix results when densifying. The two are coupled >> tightly enough that if numpy matrices go away, all of the existing sparse >> matrix classes will have to go at the same time. >> >> I don't think that would be the end of the world, >> > > Not the end of the world literally, but the impact would be pretty major. > I think we're stuck with scipy.sparse, and may at some point add a new > sparse *array* implementation next to it. For scipy we will have to add a > dependency on the new npmatrix package or vendor it. > That sounds to me like moving the maintenance of numpy.matrix from numpy to scipy, if scipy.sparse is one of the main users and still depends on it. Josef > > Ralf > > > >> but it's definitely something that should happen while scipy is still >> pre-1.0, if it's ever going to happen.
Re: [Numpy-discussion] Deprecating matrices.
On Mon, Jan 2, 2017 at 9:00 PM, Ralf Gommers wrote: > > > On Tue, Jan 3, 2017 at 2:36 PM, Charles R Harris < > charlesr.har...@gmail.com> wrote: > >> Hi All, >> >> Just throwing this click bait out for discussion. Now that the `@` >> operator is available and things seem to be moving towards Python 3, >> especially in the classroom, we should consider the real possibility of >> deprecating the matrix type and later removing it. No doubt there are old >> scripts that require them, but older versions of numpy are available for >> those who need to run old scripts. >> >> Thoughts? >> > > Clearly deprecate in the docs now, and warn only later imho. We can't warn > before we have a good solution for scipy.sparse matrices, which have matrix > semantics and return matrix instances. > > Ralf > How about dropping Python 2 support at the same time, then we can all be in an `@` world. Josef
Re: [Numpy-discussion] Numpy integers to integer powers again again
On Wed, Oct 26, 2016 at 3:39 PM, Nathaniel Smithwrote: > On Wed, Oct 26, 2016 at 12:23 PM, Charles R Harris > wrote: > [...] > > What I have been concerned about are the follow combinations that > currently > > return floats > > > > num: , exp: , res: > 'numpy.float32'> > > num: , exp: , res: > 'numpy.float32'> > > num: , exp: , res: > 'numpy.float32'> > > num: , exp: , res: > 'numpy.float64'> > > num: , exp: , res: > 'numpy.float64'> > > num: , exp: , res: > 'numpy.float64'> > > num: , exp: , res: > 'numpy.float64'> > > num: , exp: , res: > 'numpy.float64'> > > num: , exp: , res: > 'numpy.float64'> > > num: , exp: , res: > 'numpy.float64'> > > num: , exp: , res: > 'numpy.float64'> > > num: , exp: , res: > 'numpy.float64'> > > num: , exp: , res: > 'numpy.float64'> > > num: , exp: , res: > 'numpy.float64'> > > num: , exp: , res: > 'numpy.float64'> > > num: , exp: , res: > 'numpy.float64'> > > What's this referring to? For both arrays and scalars I get: > > In [8]: (np.array(2, dtype=np.int8) ** np.array(2, dtype=np.int8)).dtype > Out[8]: dtype('int8') > > In [9]: (np.int8(2) ** np.int8(2)).dtype > Out[9]: dtype('int8') > >>> (np.array([2], dtype=np.int8) ** np.array(-1, dtype=np.int8).squeeze()).dtype dtype('int8') >>> (np.array([2], dtype=np.int8)[0] ** np.array(-1, dtype=np.int8).squeeze()).dtype dtype('float32') >>> (np.int8(2)**np.int8(-1)).dtype dtype('float32') >>> (np.int8(2)**np.int8(2)).dtype dtype('int8') The last one looks like value dependent scalar dtype Josef > > -n > > -- > Nathaniel J. Smith -- https://vorpus.org > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Numpy integers to integer powers again again
On Wed, Oct 26, 2016 at 3:23 PM, Charles R Harriswrote: > > > On Tue, Oct 25, 2016 at 10:14 AM, Stephan Hoyer wrote: > >> I am also concerned about adding more special cases for NumPy scalars vs >> arrays. These cases are already confusing (e.g., making no distinction >> between 0d arrays and scalars) and poorly documented. >> >> On Mon, Oct 24, 2016 at 4:30 PM, Nathaniel Smith wrote: >> >>> On Mon, Oct 24, 2016 at 3:41 PM, Charles R Harris >>> wrote: >>> > Hi All, >>> > >>> > I've been thinking about this some (a lot) more and have an alternate >>> > proposal for the behavior of the `**` operator >>> > >>> > if both base and power are numpy/python scalar integers, convert to >>> python >>> > integers and call the `**` operator. That would solve both the >>> precision and >>> > compatibility problems and I think is the option of least surprise. For >>> > those who need type preservation and modular arithmetic, the np.power >>> > function remains, although the type conversions can be surpirising as >>> it >>> > seems that the base and power should play different roles in >>> determining >>> > the type, at least to me. >>> > Array, 0-d or not, are treated differently from scalars and integers >>> raised >>> > to negative integer powers always raise an error. >>> > >>> > I think this solves most problems and would not be difficult to >>> implement. >>> > >>> > Thoughts? >>> >>> My main concern about this is that it adds more special cases to numpy >>> scalars, and a new behavioral deviation between 0d arrays and scalars, >>> when ideally we should be trying to reduce the >>> duplication/discrepancies between these. It's also inconsistent with >>> how other operations on integer scalars work, e.g. 
regular addition >>> overflows rather than promoting to Python int: >>> >>> In [8]: np.int64(2 ** 63 - 1) + 1 >>> /home/njs/.user-python3.5-64bit/bin/ipython:1: RuntimeWarning: >>> overflow encountered in long_scalars >>> #!/home/njs/.user-python3.5-64bit/bin/python3.5 >>> Out[8]: -9223372036854775808 >>> >>> So I'm inclined to try and keep it simple, like in your previous >>> proposal... theoretically of course it would be nice to have the >>> perfect solution here, but at this point it feels like we might be >>> overthinking this trying to get that last 1% of improvement. The thing >>> where 2 ** -1 returns 0 is just broken and bites people so we should >>> definitely fix it, but beyond that I'm not sure it really matters >>> *that* much what we do, and "special cases aren't special enough to >>> break the rules" and all that. >>> >>> > What I have been concerned about are the follow combinations that > currently return floats > > num: , exp: , res: 'numpy.float32'> > num: , exp: , res: 'numpy.float32'> > num: , exp: , res: 'numpy.float32'> > num: , exp: , res: 'numpy.float64'> > num: , exp: , res: 'numpy.float64'> > num: , exp: , res: 'numpy.float64'> > num: , exp: , res: 'numpy.float64'> > num: , exp: , res: 'numpy.float64'> > num: , exp: , res: 'numpy.float64'> > num: , exp: , res: 'numpy.float64'> > num: , exp: , res: 'numpy.float64'> > num: , exp: , res: 'numpy.float64'> > num: , exp: , res: 'numpy.float64'> > num: , exp: , res: 'numpy.float64'> > num: , exp: , res: 'numpy.float64'> > num: , exp: , res: 'numpy.float64'> > > The other combinations of signed and unsigned integers to signed powers > currently raise ValueError due to the change to the power ufunc. The > exceptions that aren't covered by uint64 + signed (which won't change) seem > to occur when the exponent can be safely cast to the base type. I suspect > that people have already come to depend on that, especially as python > integers on 64 bit linux convert to int64. 
So in those cases we should > perhaps raise a FutureWarning instead of an error. > >>> np.int64(2)**np.array(-1, np.int64) 0.5 >>> np.__version__ '1.10.4' >>> np.int64(2)**np.array([-1, 2], np.int64) array([0, 4], dtype=int64) >>> np.array(2, np.uint64)**np.array([-1, 2], np.int64) array([0, 4], dtype=int64) >>> np.array([2], np.uint64)**np.array([-1, 2], np.int64) array([ 0.5, 4. ]) >>> np.array([2], np.uint64).squeeze()**np.array([-1, 2], np.int64) array([0, 4], dtype=int64) (IMO: If you have to break backwards compatibility, break forwards not backwards.) Josef http://www.stanlaurelandoliverhardy.com/nicemess.htm > > Chuck > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
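The "uint64 + signed (which won't change)" case follows from the promotion table: no integer dtype can hold both uint64 and int64, so NumPy falls back to float64. The value-dependent scalar results in Josef's 1.10 session come from the old value-based casting, which, if I recall correctly, was later removed by NEP 50 in NumPy 2.0; the array-level promotion below holds either way:

```python
import numpy as np

# no integer dtype spans both uint64 and int64, so numpy promotes to float64
assert np.result_type(np.uint64, np.int64) == np.float64

u = np.array([2], np.uint64)
i = np.array([3], np.int64)
assert (u ** i).dtype == np.float64
```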
Re: [Numpy-discussion] how to name "contagious" keyword in np.ma.convolve
On Tue, Oct 18, 2016 at 1:30 PM,wrote: > On Tue, Oct 18, 2016 at 1:25 PM, wrote: >> On Mon, Oct 17, 2016 at 1:01 PM, Pierre Haessig >> wrote: >>> Hi, >>> >>> >>> Le 16/10/2016 à 11:52, Hanno Klemm a écrit : When I have similar situations, I usually interpolate between the valid values. I assume there are a lot of use cases for convolutions but I have difficulties imagining that ignoring a missing value and, for the purpose of the computation, treating it as zero is useful in many of them. >>> When estimating the autocorrelation of a signal, it make sense to drop >>> missing pairs of values. Only in this use case, it opens the question of >>> correcting or not correcting for the number of missing elements when >>> computing the mean. I don't remember what R function "acf" is doing. > > as aside: statsmodels has now an option for acf and similar > > missing : str > A string in ['none', 'raise', 'conservative', 'drop'] > specifying how the NaNs > are to be treated. aside to the aside: statsmodels was just catching up in this The original for masked array acf including correct counting of "valid" terms is https://github.com/pierregm/scikits.timeseries/blob/master/scikits/timeseries/lib/avcf.py (which I looked at way before statsmodels had any acf) Josef > > Josef > >>> >>> >>> Also, coming back to the initial question, I feel that it is necessary >>> that the name "mask" (or "na" or similar) appears in the parameter name. >>> Otherwise, people will wonder : "what on earth is contagious/being >>> propagated" >>> >>> just thinking of yet another keyword name : ignore_masked (or drop_masked) >>> >>> If I remember well, in R it is dropna. It would be nice if the boolean >>> switch followed the same logic. >>> >>> Now of course the convolution function is more general than just >>> autocorrelation... >> >> I think "drop" or "ignore" is too generic, for correlation it would be >> for example ignore pairs versus ignore cases. 
>> >> To me propagate sounds ok to me, but something with `valid` might be >> more explicit for convolution or `correlate`, however `valid` also >> refers to the end points, so maybe valid_na or valid_masked=True >> >> Josef >> >>> >>> best, >>> Pierre >>> >>> >>> ___ >>> NumPy-Discussion mailing list >>> NumPy-Discussion@scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>> ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
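For reference, the keyword that appears to have eventually shipped in np.ma.convolve (NumPy 1.12) is `propagate_mask`; a sketch of the two behaviors debated above:

```python
import numpy as np

a = np.ma.array([1.0, 1.0, 1.0, 1.0], mask=[False, True, False, False])
w = np.array([1.0, 1.0])

# propagate_mask=True (the default): any output element touched by a
# masked input is itself masked -- the "contagious" behavior
out = np.ma.convolve(a, w, propagate_mask=True)
assert out.mask[1] and out.mask[2]

# propagate_mask=False: masked entries are excluded from the sums and
# the output stays unmasked wherever some valid input contributed
out2 = np.ma.convolve(a, w, propagate_mask=False)
assert not np.any(out2.mask)
```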
Re: [Numpy-discussion] how to name "contagious" keyword in np.ma.convolve
On Tue, Oct 18, 2016 at 1:25 PM,wrote: > On Mon, Oct 17, 2016 at 1:01 PM, Pierre Haessig > wrote: >> Hi, >> >> >> Le 16/10/2016 à 11:52, Hanno Klemm a écrit : >>> When I have similar situations, I usually interpolate between the valid >>> values. I assume there are a lot of use cases for convolutions but I have >>> difficulties imagining that ignoring a missing value and, for the purpose >>> of the computation, treating it as zero is useful in many of them. >> When estimating the autocorrelation of a signal, it make sense to drop >> missing pairs of values. Only in this use case, it opens the question of >> correcting or not correcting for the number of missing elements when >> computing the mean. I don't remember what R function "acf" is doing. as aside: statsmodels has now an option for acf and similar missing : str A string in ['none', 'raise', 'conservative', 'drop'] specifying how the NaNs are to be treated. Josef >> >> >> Also, coming back to the initial question, I feel that it is necessary >> that the name "mask" (or "na" or similar) appears in the parameter name. >> Otherwise, people will wonder : "what on earth is contagious/being >> propagated" >> >> just thinking of yet another keyword name : ignore_masked (or drop_masked) >> >> If I remember well, in R it is dropna. It would be nice if the boolean >> switch followed the same logic. >> >> Now of course the convolution function is more general than just >> autocorrelation... > > I think "drop" or "ignore" is too generic, for correlation it would be > for example ignore pairs versus ignore cases. 
> > To me propagate sounds ok to me, but something with `valid` might be > more explicit for convolution or `correlate`, however `valid` also > refers to the end points, so maybe valid_na or valid_masked=True > > Josef > >> >> best, >> Pierre >> >> >> ___ >> NumPy-Discussion mailing list >> NumPy-Discussion@scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] how to name "contagious" keyword in np.ma.convolve
On Mon, Oct 17, 2016 at 1:01 PM, Pierre Haessig wrote: > Hi, > > > On 16/10/2016 at 11:52, Hanno Klemm wrote: >> When I have similar situations, I usually interpolate between the valid >> values. I assume there are a lot of use cases for convolutions but I have >> difficulties imagining that ignoring a missing value and, for the purpose of >> the computation, treating it as zero is useful in many of them. > When estimating the autocorrelation of a signal, it makes sense to drop > missing pairs of values. Only in this use case, it opens the question of > correcting or not correcting for the number of missing elements when > computing the mean. I don't remember what the R function "acf" is doing. > > > Also, coming back to the initial question, I feel that it is necessary > that the name "mask" (or "na" or similar) appears in the parameter name. > Otherwise, people will wonder: "what on earth is contagious/being > propagated" > > just thinking of yet another keyword name: ignore_masked (or drop_masked) > > If I remember well, in R it is dropna. It would be nice if the boolean > switch followed the same logic. > > Now of course the convolution function is more general than just > autocorrelation... I think "drop" or "ignore" is too generic; for correlation it would be, for example, ignore pairs versus ignore cases. Propagate sounds ok to me, but something with `valid` might be more explicit for convolution or `correlate`; however, `valid` also refers to the end points, so maybe valid_na or valid_masked=True Josef > > best, > Pierre ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Integers to negative integer powers, time for a decision.
On Fri, Oct 7, 2016 at 9:12 PM, Charles R Harriswrote: > Hi All, > > The time for NumPy 1.12.0 approaches and I like to have a final decision on > the treatment of integers to negative integer powers with the `**` operator. > The two alternatives looked to be > > Raise an error for arrays and numpy scalars, including 1 and -1 to negative > powers. > > Pluses > > Backward compatible > Allows common powers to be integer, e.g., arange(3)**2 > Consistent with inplace operators > Fixes current wrong behavior. > Preserves type > > > Minuses > > Integer overflow > Computational inconvenience > Inconsistent with Python integers > > > Always return a float > > Pluses > > Computational convenience > > > Minuses > > Loss of type > Possible backward incompatibilities > Not applicable to inplace operators > > > > Thoughts? 2: +1 I'm still in favor of number 2: less buggy code and less mental gymnastics (watch out for that int, or which int do I need) (upcasting is not applicable for any inplace operators, AFAIU *=0.5 ? zz = np.arange(5) zz**(-1) zz *= 0.5 tried in >>> np.__version__ '1.9.2rc1' >>> np.__version__ '1.10.4' backwards compatibility ? ) Josef > > Chuck > > > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
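Alternative 1's trade-offs can be checked directly; this sketch reflects the behavior NumPy adopted from 1.12 onward (assuming a recent NumPy), so it doubles as a record of where the decision landed:

```python
import numpy as np

# common powers keep their integer dtype (a plus of alternative 1)
assert np.issubdtype((np.arange(3) ** 2).dtype, np.integer)

# integers to negative integer powers raise instead of silently returning 0
raised = False
try:
    np.arange(1, 3) ** -1
except ValueError:
    raised = True
assert raised

# in-place operators cannot upcast either, so raising is consistent with them
inplace_raised = False
zz = np.arange(5)
try:
    zz *= 0.5
except TypeError:  # numpy's UFuncTypeError subclasses TypeError
    inplace_raised = True
assert inplace_raised
```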
Re: [Numpy-discussion] Which NumPy/Numpy/numpy spelling?
https://mail.scipy.org/pipermail/scipy-user/2010-June/025756.html "NumPy and SciPy to refer to the projects. numpy and scipy to refer to the packages, specifically. When in doubt, use the former." I thought there was also another discussion about capital letters, but I don't find it. Josef On Mon, Aug 29, 2016 at 9:22 AM, Bartosz Telenczukwrote: > Hi, > > I would not mind any choice as long as it's consistent. > > I agree that using all-lowercase spelling may avoid some common errors. > However, > PEP8 requires all module/package names to be lower case [1]. If we force the > name of the library and the corresponding package to be the same, all Python > libraries would be named in lowercase. This would not be the best choice for > libraries, which have multi-component names (like NumPy = Numerical Python). > > Note also that both the Wikipedia page [2] and the official NumPy logo [3] use > "NumPy" spelling. > > Some other popular [4] libraries use similar dichotomies: > > - Django - import django > - Cython - import cython > - PyYAML - import yaml > - scikit-learn - import sklearn > > On the other hand all standard Python libraries are lower-case named. > > Cheers, > > Bartosz > > [1] https://www.python.org/dev/peps/pep-0008/#package-and-module-names > [2] https://en.wikipedia.org/wiki/NumPy > [3] http://www.numpy.org/_static/numpy_logo.png > [4] http://pypi-ranking.info/alltime > >> On 08/29/2016 07:43 AM, m...@telenczuk.pl wrote: >> > What is the official spelling of NumPy/Numpy/numpy? >> >> IMHO it should be written numpy, because ... 
>> >> >>> import NumPy >> Traceback (most recent call last): >>   File "<stdin>", line 1, in <module> >> ImportError: No module named NumPy >> >>> import Numpy >> Traceback (most recent call last): >>   File "<stdin>", line 1, in <module> >> ImportError: No module named Numpy >> >>> import numpy >> >>> >> >> Phil ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] StackOverflow documentation
On Thu, Jul 21, 2016 at 3:10 PM, Pauli Virtanenwrote: > Thu, 21 Jul 2016 16:24:15 +0200, mail kirjoitti: > [clip] >> Since it is run by the community, perhaps it's not a bad idea to >> encourage people to share their examples. > > I would perhaps rather encourage people to improve the "Numpy User Guide" > or the main documentation. Of course, working on that requires a somewhat > different level of committment than editing what's essentially > Stackexchange-provided wiki (where content gets relicensed with > attribution clauses where you have to reference stackexchange and not the > original author directly). The way it looks right now, it is more of an example collection by topic, similar to the old scipy wiki examples for numpy. http://stackoverflow.com/documentation/python/196/comprehensions#t=201607212153155953853 It doesn't look like it will be a substitute for proper API documentation, docstrings with numpy standard. Longer examples or recipes on stackoverflow also have the problem that code doesn't have a different license and falls under the documentation license. So, I don't think it will become a substitute for good notebook or blog post examples either. (license is the usual one way street, they copy, we cannot copy back under our license) Josef > > -- > Pauli Virtanen > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Added atleast_nd, request for clarification/cleanup of atleast_3d
On Wed, Jul 6, 2016 at 3:29 AM,wrote: > > > On Wed, Jul 6, 2016 at 2:21 AM, Ralf Gommers > wrote: > >> >> >> On Wed, Jul 6, 2016 at 7:06 AM, Nathaniel Smith wrote: >> >> On Jul 5, 2016 9:09 PM, "Joseph Fox-Rabinovitz" >>> wrote: >>> > >>> > Hi, >>> > >>> > I have generalized np.atleast_1d, np.atleast_2d, np.atleast_3d with a >>> > function np.atleast_nd in PR#7804 >>> > (https://github.com/numpy/numpy/pull/7804). >>> > >>> > As a result of this PR, I have a couple of questions about >>> > `np.atleast_3d`. `np.atleast_3d` appears to do something weird with >>> > the dimensions: If the input is 1D, it prepends and appends a size-1 >>> > dimension. If the input is 2D, it appends a size-1 dimension. This is >>> > inconsistent with `np.atleast_2d`, which always prepends (as does >>> > `np.atleast_nd`). >>> > >>> > - Is there any reason for this behavior? >>> > - Can it be cleaned up (e.g., by reimplementing `np.atleast_3d` in >>> > terms of `np.atleast_nd`, which is actually much simpler)? This would >>> > be a slight API change since the output would not be exactly the same. >>> >>> Changing atleast_3d seems likely to break a bunch of stuff... >>> >>> Beyond that, I find it hard to have an opinion about the best design for >>> these functions, because I don't think I've ever encountered a situation >>> where they were actually what I wanted. I'm not a big fan of coercing >>> dimensions in the first place, for the usual "refuse to guess" reasons. And >>> then generally if I do want to coerce an array to another dimension, then I >>> have some opinion about where the new dimensions should go, and/or I have >>> some opinion about the minimum acceptable starting dimension, and/or I have >>> a maximum dimension in mind. (E.g. "coerce 1d inputs into a column matrix; >>> 0d or 3d inputs are an error" -- atleast_2d is zero-for-three on that >>> requirements list.) >>> >>> I don't know how typical I am in this. 
But it does make me wonder if the >>> atleast_* functions act as an attractive nuisance, where new users take >>> their presence as an implicit recommendation that they are actually a >>> useful thing to reach for, even though they... aren't that. And maybe we >>> should be recommending folk move away from them rather than trying to >>> extend them further? >>> >>> Or maybe they're totally useful and I'm just missing it. What's your use >>> case that motivates atleast_nd? >>> >> I think you're just missing it:) atleast_1d/2d are used quite a bit in >> Scipy and Statsmodels (those are the only ones I checked), and in the large >> majority of cases it's the best thing to use there. There's a bunch of >> atleast_2d calls with a transpose appended because the input needs to be >> treated as columns instead of rows, but that's still efficient and readable >> enough. >> > > > As Ralph pointed out its usage in statsmodels. I do find them useful as > replacement for several lines of ifs and reshapes > > We stilll need in many cases the atleast_2d_cols, that appends the newaxis > if necessary. > > roughly the equivalent of > > if x.ndim == 1: > x = x[:, None] > else: > x = np.atleast_2d(x) > > Josef > > >> >> For 3D/nD I can see that you'd need more control over where the >> dimensions go, but 1D/2D are fine. >> > statsmodels has currently very little code with ndim >2, so I have no overview of possible use cases, but it would be necessary to have full control over the added axis since axis have a strict meaning and stats still prefer Fortran order to default numpy/C ordering. Josef > >> >> Ralf >> >> >> ___ >> NumPy-Discussion mailing list >> NumPy-Discussion@scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
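The inconsistency Joseph describes is quick to verify: atleast_2d always prepends the new axis, while atleast_3d places a 1-D input on the middle axis and appends to a 2-D input:

```python
import numpy as np

x = np.arange(3)

# atleast_2d prepends: shape (1, 3)
assert np.atleast_2d(x).shape == (1, 3)

# atleast_3d prepends *and* appends for 1-D input: shape (1, 3, 1)
assert np.atleast_3d(x).shape == (1, 3, 1)

# for 2-D input, atleast_3d appends instead
assert np.atleast_3d(np.ones((2, 3))).shape == (2, 3, 1)
```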
Re: [Numpy-discussion] Added atleast_nd, request for clarification/cleanup of atleast_3d
On Wed, Jul 6, 2016 at 2:21 AM, Ralf Gommerswrote: > > > On Wed, Jul 6, 2016 at 7:06 AM, Nathaniel Smith wrote: > > On Jul 5, 2016 9:09 PM, "Joseph Fox-Rabinovitz" >> wrote: >> > >> > Hi, >> > >> > I have generalized np.atleast_1d, np.atleast_2d, np.atleast_3d with a >> > function np.atleast_nd in PR#7804 >> > (https://github.com/numpy/numpy/pull/7804). >> > >> > As a result of this PR, I have a couple of questions about >> > `np.atleast_3d`. `np.atleast_3d` appears to do something weird with >> > the dimensions: If the input is 1D, it prepends and appends a size-1 >> > dimension. If the input is 2D, it appends a size-1 dimension. This is >> > inconsistent with `np.atleast_2d`, which always prepends (as does >> > `np.atleast_nd`). >> > >> > - Is there any reason for this behavior? >> > - Can it be cleaned up (e.g., by reimplementing `np.atleast_3d` in >> > terms of `np.atleast_nd`, which is actually much simpler)? This would >> > be a slight API change since the output would not be exactly the same. >> >> Changing atleast_3d seems likely to break a bunch of stuff... >> >> Beyond that, I find it hard to have an opinion about the best design for >> these functions, because I don't think I've ever encountered a situation >> where they were actually what I wanted. I'm not a big fan of coercing >> dimensions in the first place, for the usual "refuse to guess" reasons. And >> then generally if I do want to coerce an array to another dimension, then I >> have some opinion about where the new dimensions should go, and/or I have >> some opinion about the minimum acceptable starting dimension, and/or I have >> a maximum dimension in mind. (E.g. "coerce 1d inputs into a column matrix; >> 0d or 3d inputs are an error" -- atleast_2d is zero-for-three on that >> requirements list.) >> >> I don't know how typical I am in this. 
But it does make me wonder if the >> atleast_* functions act as an attractive nuisance, where new users take >> their presence as an implicit recommendation that they are actually a >> useful thing to reach for, even though they... aren't that. And maybe we >> should be recommending folk move away from them rather than trying to >> extend them further? >> >> Or maybe they're totally useful and I'm just missing it. What's your use >> case that motivates atleast_nd? >> > I think you're just missing it:) atleast_1d/2d are used quite a bit in > Scipy and Statsmodels (those are the only ones I checked), and in the large > majority of cases it's the best thing to use there. There's a bunch of > atleast_2d calls with a transpose appended because the input needs to be > treated as columns instead of rows, but that's still efficient and readable > enough. > As Ralph pointed out its usage in statsmodels. I do find them useful as replacement for several lines of ifs and reshapes We stilll need in many cases the atleast_2d_cols, that appends the newaxis if necessary. roughly the equivalent of if x.ndim == 1: x = x[:, None] else: x = np.atleast_2d(x) Josef > > For 3D/nD I can see that you'd need more control over where the dimensions > go, but 1D/2D are fine. > > Ralf > > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
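The atleast_2d_cols helper Josef describes is not a NumPy function; a minimal sketch under that thread-local name, wrapping the if/reshape he quotes:

```python
import numpy as np

def atleast_2d_cols(x):
    """Promote 1-D input to a single column; otherwise defer to atleast_2d."""
    x = np.asarray(x)
    if x.ndim == 1:
        return x[:, None]
    return np.atleast_2d(x)

assert atleast_2d_cols(np.arange(3)).shape == (3, 1)  # column, not row
assert np.atleast_2d(np.arange(3)).shape == (1, 3)    # the default prepends
assert atleast_2d_cols(5.0).shape == (1, 1)           # 0-D still promoted
assert atleast_2d_cols(np.ones((2, 4))).shape == (2, 4)
```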
Re: [Numpy-discussion] numpy.dtype has the wrong size, try recompiling
On Tue, Jul 5, 2016 at 4:22 AM, Jaime Fernández del Río < jaime.f...@gmail.com> wrote: > This question on Stack Overflow: > > > http://stackoverflow.com/questions/38197086/sklearn-numpy-dtype-has-the-wrong-size-try-recompiling-in-both-pycharm-and-te > > If I remember correctly this has something to do with Cython, right? Can't > remember the details though, can someone help that poor soul out? > From what I understood many years ago (based mainly on what David Cournapeau was saying) http://stackoverflow.com/questions/17709641/valueerror-numpy-dtype-has-the-wrong-size-try-recompiling Josef > > Jaime > > -- > (\__/) > ( O.o) > ( > <) This is Rabbit. Copy Rabbit into your signature and help him with his plans for world domination. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key
On Tue, Jul 5, 2016 at 1:03 AM, Juan Nunez-Iglesias wrote: > On 4 July 2016 at 7:27:47 PM, Skip Montanaro (skip.montan...@gmail.com) > wrote: > > Hashing it probably wouldn't work, too > great a chance for collisions. > > > If the string is ASCII, you can always interpret the bytes as part of an 8 > byte integer. Or, you can map unique values to consecutive integers. > IIUC np.nonzero(a[1:] != a[:-1]) gives all change points independent of dtype. Add or subtract 1 to adjust which element is indexed. (IIRC from a long time ago, arraysetops used/uses something like this.) Josef ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
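Josef's suggestion spelled out for a sorted key column: adjacent-element comparison finds the change points for any comparable dtype, strings included, so no hashing is needed:

```python
import numpy as np

keys = np.array(["a", "a", "b", "b", "b", "c"])

# first occurrence of each key: position 0 plus every change point
first = np.r_[0, np.flatnonzero(keys[1:] != keys[:-1]) + 1]
assert first.tolist() == [0, 2, 5]

# last occurrence of each key: change points plus the final position
last = np.r_[np.flatnonzero(keys[1:] != keys[:-1]), len(keys) - 1]
assert last.tolist() == [1, 4, 5]
```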
Re: [Numpy-discussion] Integers to integer powers, let's make a decision
On Mon, Jun 20, 2016 at 4:31 PM, Alan Isaacwrote: > On 6/13/2016 1:54 PM, Marten van Kerkwijk wrote: >> >> 1. What in principle is the best return type for int ** int (which >> Josef I think most properly rephrased as whether `**` should be >> thought of as a float operator, like `/` in python3 and `sqrt` etc.); > > > > Perhaps the question is somewhat different. Maybe it is: what type > should a user expect when the exponent is a Python int? The obvious > choices seem to be an object array of Python ints, or an array of > floats. So far, nobody has proposed the former, and concerns have > been expressed about the latter. More important, either would break > the rule that the scalar type is not important in array operations, > which seems like a good general rule (useful and easy to remember). > > How much commitment is there to such a rule? E.g., > np.int64(2**7)*np.arange(5,dtype=np.int8) > violates this. One thing that has come out of this > discussion for me is that the actual rules in play are > hard to keep track of. Are they all written down in > one place? > > I suspect there is general support for the idea that if someone > explicitly specifies the same dtype for the base and the > exponent then the result should also have that dtype. > I think this is already true for array exponentiation > and for scalar exponentiation. > > One other thing that a user might expect, I believe, is that > any type promotion rules for scalars and arrays will be the same. > This is not currently the case, and that feels like an > inconsistency. But is it an inconsistency? If the rule is that > that array type dominates the scalar type, that may > be understandable, but then it should be a firm rule. > In this case, an exponent that is a Python int should not > affect the dtype of the (array) result. > > In sum, as a user, I've come around to Chuck's original proposal: > integers raised to negative integer powers raise an error. 
> My reason for coming around is that I believe it meshes > well with a general rule that in binary operations the > scalar dtypes should not influence the dtype of an array result. > Otoh, it is unclear to me how much commitment there is to that rule. > > Thanks in advance to anyone who can help me understand better > the issues in play. the main thing I get out of the discussion in this thread is that this is way to complicated. which ints do I have? is it python or one of the many numpy int types, or two different (u)int types or maybe one is a scalar so it shouldn't count? scalar dominates here >>> (np.ones(5, np.int8) *1.0).dtype dtype('float64') otherwise a huge amount of code would be broken that uses the *1. trick Josef > > Cheers, > Alan Isaac > > > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
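Josef's `*1.` trick, concretely: the Python float scalar does change the dtype of an integer array, while a same-kind Python int does not, which is exactly the scalar influence Alan's proposed rule would forbid:

```python
import numpy as np

a = np.ones(5, np.int8)

# the float scalar wins: result is float64, not int8
assert (a * 1.0).dtype == np.float64

# a Python int scalar leaves the array dtype alone
assert (a * 2).dtype == np.int8
```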
Re: [Numpy-discussion] Integers to integer powers, let's make a decision
On Mon, Jun 13, 2016 at 12:07 PM,wrote: > > > On Mon, Jun 13, 2016 at 11:51 AM, wrote: > >> >> >> On Mon, Jun 13, 2016 at 11:25 AM, Antoine Pitrou >> wrote: >> >>> On Mon, 13 Jun 2016 10:49:44 -0400 >>> josef.p...@gmail.com wrote: >>> > >>> > My argument is that `**` is like integer division and sqrt where the >>> domain >>> > where integer return are the correct numbers is too small to avoid >>> > headaches by users. >>> >>> float64 has less integer precision than int64: >>> >>> >>> math.pow(3, 39) == 3**39 >>> False >>> >>> np.int64(3)**39 == 3**39 >>> True >>> >> >> but if a user does this, then ??? (headaches or head scratching) >> >> >>> np.array([3])**39 >> RuntimeWarning: invalid value encountered in power >> >> array([-2147483648], dtype=int32) >> > > I forgot to add > > the real headaches start in the second call, when we don't get the > RuntimeWarning anymore > > >>> np.array([4])**39 > array([-2147483648], dtype=int32) > > > ("Now, why do I owe so much money, when I made a huge profit all year." ) > (grumpy off-topic complaint: The Canadian tax system is like this. They make a mistake in transferring information to a new computerized system, and then they send a bill for taxes based on reassessment of something that happened 5 years ago because their computerized record is wrong. ) > > Josef > > > >> >> Josef >> >> >>> >>> >>> (as a sidenote, np.float64's equality operator seems to be slightly >>> broken: >>> >>> >>> np.float64(3)**39 == 3**39 >>> True >>> >>> int(np.float64(3)**39) == 3**39 >>> False >>> >>> float(np.float64(3)**39) == 3**39 >>> False >>> ) >>> >>> Regards >>> >>> Antoine. >>> >>> >>> ___ >>> NumPy-Discussion mailing list >>> NumPy-Discussion@scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Integers to integer powers, let's make a decision
On Mon, Jun 13, 2016 at 11:51 AM,wrote: > > > On Mon, Jun 13, 2016 at 11:25 AM, Antoine Pitrou > wrote: > >> On Mon, 13 Jun 2016 10:49:44 -0400 >> josef.p...@gmail.com wrote: >> > >> > My argument is that `**` is like integer division and sqrt where the >> domain >> > where integer return are the correct numbers is too small to avoid >> > headaches by users. >> >> float64 has less integer precision than int64: >> >> >>> math.pow(3, 39) == 3**39 >> False >> >>> np.int64(3)**39 == 3**39 >> True >> > > but if a user does this, then ??? (headaches or head scratching) > > >>> np.array([3])**39 > RuntimeWarning: invalid value encountered in power > > array([-2147483648], dtype=int32) > I forgot to add the real headaches start in the second call, when we don't get the RuntimeWarning anymore >>> np.array([4])**39 array([-2147483648], dtype=int32) ("Now, why do I owe so much money, when I made a huge profit all year." ) Josef > > Josef > > >> >> >> (as a sidenote, np.float64's equality operator seems to be slightly >> broken: >> >> >>> np.float64(3)**39 == 3**39 >> True >> >>> int(np.float64(3)**39) == 3**39 >> False >> >>> float(np.float64(3)**39) == 3**39 >> False >> ) >> >> Regards >> >> Antoine. >> >> >> ___ >> NumPy-Discussion mailing list >> NumPy-Discussion@scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Integers to integer powers, let's make a decision
On Mon, Jun 13, 2016 at 11:25 AM, Antoine Pitrouwrote: > On Mon, 13 Jun 2016 10:49:44 -0400 > josef.p...@gmail.com wrote: > > > > My argument is that `**` is like integer division and sqrt where the > domain > > where integer return are the correct numbers is too small to avoid > > headaches by users. > > float64 has less integer precision than int64: > > >>> math.pow(3, 39) == 3**39 > False > >>> np.int64(3)**39 == 3**39 > True > but if a user does this, then ??? (headaches or head scratching) >>> np.array([3])**39 RuntimeWarning: invalid value encountered in power array([-2147483648], dtype=int32) Josef > > > (as a sidenote, np.float64's equality operator seems to be slightly > broken: > > >>> np.float64(3)**39 == 3**39 > True > >>> int(np.float64(3)**39) == 3**39 > False > >>> float(np.float64(3)**39) == 3**39 > False > ) > > Regards > > Antoine. > > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
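Antoine's precision point can be reproduced without NumPy at all: 3**39 is roughly 4.05e18, which still fits in a signed 64-bit integer but is far beyond 2**53, the largest range in which float64 represents every integer exactly, so any route through a float silently rounds away the low digits. A stdlib-only sketch:

```python
import math

exact = 3 ** 39                  # Python ints are arbitrary precision
assert exact.bit_length() <= 63  # ...and this value still fits in int64

# float64 holds integers exactly only up to 2**53; 3**39 is far beyond
# that, so the float round trip loses the low digits.
assert exact > 2 ** 53
assert int(float(exact)) != exact

# math.pow returns a (rounded) float, hence the False in the thread.
assert math.pow(3, 39) != exact
```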
Re: [Numpy-discussion] Integers to integer powers, let's make a decision
On Mon, Jun 13, 2016 at 10:05 AM, Alan Isaacwrote: > On 6/13/2016 4:47 AM, Antoine Pitrou wrote: > >> Currently, the choice is simple: if you want an int output, >> have an int input; if you want a float output, have a float output. >> > > > That is a misunderstanding, which may be influencing the discussion. > Examples of complications: > > >>> type(np.int8(2)**2) > > >>> type(np.uint64(2)**np.int8(2)) > > > I don't think anyone has proposed first principles > from which the desirable behavior could be deduced. > I do think reference to the reasoning used by other > languages in making this decision could be helpful. I think the main principle is whether an operator is a "float" operator. for example, I don't think anyone would expect sqrt(int) to return int, even if it would have exact results in a countable infinite number of cases (theoretically) another case is division which moved from return-int to return-float definition in the py2 - py3 move. My argument is that `**` is like integer division and sqrt where the domain where integer return are the correct numbers is too small to avoid headaches by users. Josef > > > Alan Isaac > (on Windows) > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling
On Sat, Jun 11, 2016 at 2:49 PM, Mark Gawronwrote: > Thanks, Jozef. This is very helpful. And I will direct this > to one of the other mailing lists, once I read the previous posts. > > Regarding your remark: Maybe Im having a terminology problem. It seems > to me once you do > > osm = dist.ppf(osm_uniform) >>> >>> > you’re back in the value space for the particular distribution. So this > gives you known probability intervals, but not UNIFORM probability > intervals (the interval between 0 and 1 STD covers a bigger prob interval > than the the interval between 1 and 2). And the idea of a quantile is > that it’s a division point in a UNIFORM division of the probability axis. > Yes and No, quantile, i.e. what you get from ppf, are units of the random variable. So it is on the scale of the random variable not on a probability scale. The axis labels are in units of the random variable. pp-plots have probabilities on the axis and are uniform scaled in probabilities but non-uniform in the values of the random variable. The difficult part to follow is if the plot is done uniform in one scale, but the axis are labeled non-uniform in the other scale. That's what Paul's probscale does and what you have in mind, AFAIU. Josef > > Mark > > On Jun 11, 2016, at 10:03 AM, josef.p...@gmail.com wrote: > > > > On Sat, Jun 11, 2016 at 8:53 AM, Ralf Gommers > wrote: > >> Hi Mark, >> >> Note that the scipy-dev or scipy-user mailing list would have been more >> appropriate for this question. >> >> >> On Fri, Jun 10, 2016 at 9:06 AM, Mark Gawron >> wrote: >> >>> >>> >>> The scipy.stats.qqplot and scipy.stats.probplot functions plot expected >>> values versus actual data values for visualization of fit to a >>> distribution. First a one-D array of expected percentiles is generated for >>> a sample of size N; then that is passed to dist.ppf, the per cent point >>> function for the chosen distribution, to return an array of expected >>> values. 
The visualized data points are pairs of expected and actual >>> values, and a linear regression is done on these to produce the line the data >>> points in this distribution should lie on. >>> >>> Where x is the input data array and dist the chosen distribution we have: >>> >>> osr = np.sort(x) >>> osm_uniform = _calc_uniform_order_statistic_medians(len(x)) >>> osm = dist.ppf(osm_uniform) >>> slope, intercept, r, prob, sterrest = stats.linregress(osm, osr) >>> >>> >>> My question concerns the plot display. >>> >>> ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-') >>> >>> >>> The x-axis of the resulting plot is labeled quantiles, but the xticks >>> and xticklabels produced by qqplot and probplot do not seem >>> correct for their intended interpretations. First the numbers on the >>> x-axis do not represent quantiles; the intervals between them do not in >>> general contain equal numbers of points. For a normal distribution with >>> sigma=1, they represent standard deviations. Changing the label on the >>> x-axis does not seem like a very good solution, because the interpretation >>> of the values on the x-axis will be different for different distributions. >>> Rather the right solution seems to be to actually show quantiles on the >>> x-axis. The numbers on the x-axis can stay as they are, representing >>> quantile indexes, but they need to be spaced so as to show the actual >>> division points that carve the population up into groups of the same >>> size. This can be done in something like the following way. >>> >> >> The ticks are correct I think, but they're theoretical quantiles and not >> sample quantiles. This was discussed in [1] and is consistent with R [2] >> and statsmodels [3]. I see that we just forgot to add "theoretical" to the >> x-axis label (mea culpa). Does adding that resolve your concern? 
>> >> [1] https://github.com/scipy/scipy/issues/1821 >> [2] http://data.library.virginia.edu/understanding-q-q-plots/ >> [3] >> http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot >> >> Ralf >> >> > as related link > http://phobson.github.io/mpl-probscale/tutorial/closer_look_at_viz.html > > Paul Hobson has done a lot of work for getting different probability > scales attached to pp-plots or generalized versions of probability plots. I > think qqplots are less ambiguous because they are on the original or > standardized scale. > > I haven't worked my way through the various interpretations of probability > axis yet because I find it "not obvious". It might be easier for fields > that have a tradition of using probability papers. > > It's planned to be added to the statsmodels probability plots so that > there will be a large choice of axis labels and scales. > > Josef > > >> >> >> >> ___ >> NumPy-Discussion mailing list >>
Re: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling
On Sat, Jun 11, 2016 at 8:53 AM, Ralf Gommers wrote: > Hi Mark, > > Note that the scipy-dev or scipy-user mailing list would have been more > appropriate for this question. > > > On Fri, Jun 10, 2016 at 9:06 AM, Mark Gawron wrote: > >> >> >> The scipy.stats.qqplot and scipy.stats.probplot functions plot expected >> values versus actual data values for visualization of fit to a >> distribution. First a one-D array of expected percentiles is generated for >> a sample of size N; then that is passed to dist.ppf, the per cent point >> function for the chosen distribution, to return an array of expected >> values. The visualized data points are pairs of expected and actual >> values, and a linear regression is done on these to produce the line the data >> points in this distribution should lie on. >> >> Where x is the input data array and dist the chosen distribution we have: >> >> osr = np.sort(x) >> osm_uniform = _calc_uniform_order_statistic_medians(len(x)) >> osm = dist.ppf(osm_uniform) >> slope, intercept, r, prob, sterrest = stats.linregress(osm, osr) >> >> >> My question concerns the plot display. >> >> ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-') >> >> >> The x-axis of the resulting plot is labeled quantiles, but the xticks and >> xticklabels produced by qqplot and probplot do not seem correct >> for their intended interpretations. First the numbers on the x-axis do >> not represent quantiles; the intervals between them do not in general >> contain equal numbers of points. For a normal distribution with sigma=1, >> they represent standard deviations. Changing the label on the x-axis does >> not seem like a very good solution, because the interpretation of the >> values on the x-axis will be different for different distributions. Rather >> the right solution seems to be to actually show quantiles on the x-axis. 
>> The numbers on the x-axis can stay as they are, representing quantile >> indexes, but they need to be spaced so as to show the actual division >> points that carve the population up into groups of the same size. This >> can be done in something like the following way. >> > > The ticks are correct I think, but they're theoretical quantiles and not > sample quantiles. This was discussed in [1] and is consistent with R [2] > and statsmodels [3]. I see that we just forgot to add "theoretical" to the > x-axis label (mea culpa). Does adding that resolve your concern? > > [1] https://github.com/scipy/scipy/issues/1821 > [2] http://data.library.virginia.edu/understanding-q-q-plots/ > [3] > http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot > > Ralf > > as related link http://phobson.github.io/mpl-probscale/tutorial/closer_look_at_viz.html Paul Hobson has done a lot of work for getting different probability scales attached to pp-plots or generalized versions of probability plots. I think qqplots are less ambiguous because they are on the original or standardized scale. I haven't worked my way through the various interpretations of probability axis yet because I find it "not obvious". It might be easier for fields that have a tradition of using probability papers. It's planned to be added to the statsmodels probability plots so that there will be a large choice of axis labels and scales. Josef > > > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
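Josef's distinction above — uniform in probability but non-uniform on the value axis — can be sketched with the standard library alone. As far as I recall, scipy's private `_calc_uniform_order_statistic_medians` uses Filliben's estimate, reproduced here as an assumption, with `statistics.NormalDist.inv_cdf` standing in for `dist.ppf`:

```python
from statistics import NormalDist

def uniform_order_statistic_medians(n):
    # Filliben's estimate of the medians of uniform order statistics
    # (assumed here to match scipy's private helper).
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    return m

n = 9
osm_uniform = uniform_order_statistic_medians(n)
# The "theoretical quantiles" on the x-axis: probabilities mapped
# through the ppf (here the standard normal inverse cdf).
osm = [NormalDist().inv_cdf(p) for p in osm_uniform]

# Nearly uniformly spaced in probability, but not in the value of the
# variate: the same probability step covers more x in the tails.
gaps = [b - a for a, b in zip(osm, osm[1:])]
assert gaps[0] > gaps[len(gaps) // 2]
```

This is exactly why a plot that is uniform on one scale carries non-uniform labels on the other, which is the part Josef calls "not obvious".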
Re: [Numpy-discussion] Integers to integer powers, let's make a decision
On Fri, Jun 10, 2016 at 2:00 PM, Nathaniel Smithwrote: > On Jun 10, 2016 10:50, "Alan Isaac" wrote: > > > > On 6/10/2016 1:34 PM, Nathaniel Smith wrote: > >> > >> You keep pounding on this example. It's a fine example, but, c'mon. **2 > is probably at least 100x more common in real source code. Maybe 1000x more > common. Why should we break the > >> common case for your edge case? > > > > > > > > It is hardly an "edge case". > > Again, **almost all** integer combinations overflow: that's the point. > > When you say "almost all", you're assuming inputs that are uniformly > sampled integers. I'm much more interested in what proportion of calls to > the ** operator involve inputs that can overflow, and in real life those > inputs are very heavily biased towards small numbers. > > (I also think we should default to raising an error on overflow in > general, with a seterr switch to turn it off when desired. But that's > another discussion...) > but x**2 is just x*x which some seem to recommend (I have no idea why), and then there are not so many "common" cases left. (However, I find integers pretty useless except in some very specific cases. When I started to cleanup scipy.stats.distribution, I threw out integers for discrete distributions and replaced all or most `**` by np.power, IIRC mainly because of the old python behavior and better numpy behavior.) (I'd rather use robust calculations that provide correct numbers, than chasing individual edge cases and save a bit of memory in some common cases. scipy stats also doesn't use factorial in almost all cases, because special.gamma and variants are more robust ) Josef > -n > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
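A sketch of the two workarounds mentioned here (NumPy assumed available): spelling `x**2` as `x*x`, and forcing a float computation through `np.power`'s `dtype` argument, which sidesteps integer overflow for large exponents:

```python
import numpy as np

x = np.arange(5)

# x**2 as plain multiplication, dtype preserved:
assert ((x * x) == x ** 2).all()

# np.power is a ufunc, so the output dtype can be forced to float:
y = np.power(np.array([3, 4]), 39, dtype=np.float64)
assert y.dtype == np.float64
assert np.all(np.isfinite(y))   # no integer wraparound
```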
Re: [Numpy-discussion] ENH: compute many inner products quickly
On Sun, Jun 5, 2016 at 8:41 PM, Stephan Hoyer wrote: > If possible, I'd love to add new functions for "generalized ufunc" linear > algebra, and then deprecate (or at least discourage) using the older > versions with inferior broadcasting rules. Adding a new keyword arg means > we'll be stuck with an awkward API for a long time to come. > > There are three types of matrix/vector products for which ufuncs would be > nice: > 1. matrix-matrix product (covered by matmul) > 2. matrix-vector product > 3. vector-vector (inner) product > > It's straightforward to implement either of the latter two options by inserting > dummy dimensions and then calling matmul, but that's a pretty awkward API, > especially for inner products. Unfortunately, we already use the two most > obvious one word names for vector inner products (inner and dot). But on > the other hand, one word names are not very descriptive, and the short name > "dot" probably mostly exists because of the lack of an infix operator. > > So I'll start by throwing out some potential new names: > > For matrix-vector products: > matvecmul (if it's worth making a new operator) > > For inner products: > vecmul (similar to matmul, but probably too ambiguous) > dot_product > inner_prod > inner_product > > how about names in plural as in the PR I thought the `s` in inner_prods would signal better the broadcasting behavior dot_products ... "dots" ? (I guess not) Josef > > > > > On Sat, May 28, 2016 at 8:53 PM, Scott Sievert > wrote: > >> I recently ran into an application where I had to compute many inner >> products quickly (roughly 50k inner products in less than a second). I >> wanted a vector of inner products over the 50k vectors, or `[x1.T @ A @ x1, >> …, xn.T @ A @ xn]` with A.shape = (1k, 1k). >> >> My first instinct was to look for a NumPy function to quickly compute >> this, such as np.inner. However, it looks like np.inner has some other >> behavior and I couldn’t get tensordot/einsum to work for me. 
>> >> Then a labmate pointed out that I can just do some slick matrix >> multiplication to compute the same quantity, `(X.T * A @ X.T).sum(axis=0)`. >> I opened [a PR] with this, and proposed that we define a new function >> called `inner_prods` for this. >> >> However, in the PR, @shoyer pointed out >> >> > The main challenge is to figure out how to transition the behavior of >> all these operations, while preserving backwards compatibility. Quite >> likely, we need to pick new names for these functions, though we should try >> to pick something that doesn't suggest that they are second class >> alternatives. >> >> Do we choose new function names? Do we add a keyword arg that changes >> what np.inner returns? >> >> [a PR]:https://github.com/numpy/numpy/pull/7690 >> >> >> >> >> ___ >> NumPy-Discussion mailing list >> NumPy-Discussion@scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
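Scott's batched quadratic form `x_i.T @ A @ x_i` can be spelled several ways. A sketch comparing a plain loop, einsum, and a parenthesized form of the matrix trick quoted above, with the rows of X as the vectors (the variable names are illustrative, not NumPy API):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 4
X = rng.standard_normal((n, d))   # one vector per row
A = rng.standard_normal((d, d))

# Reference: explicit loop over x_i.T @ A @ x_i
loop = np.array([x @ A @ x for x in X])

# einsum spells out the contraction directly:
ein = np.einsum('nd,de,ne->n', X, A, X)

# The "slick matrix multiplication" trick from the thread
# (parenthesized, since * and @ share precedence in Python):
trick = (X.T * (A @ X.T)).sum(axis=0)

assert np.allclose(loop, ein)
assert np.allclose(loop, trick)
```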
Re: [Numpy-discussion] Integers to integer powers, let's make a decision
On Sat, Jun 4, 2016 at 9:16 PM, Charles R Harriswrote: > > > On Sat, Jun 4, 2016 at 6:17 PM, wrote: > >> >> >> On Sat, Jun 4, 2016 at 8:07 PM, Charles R Harris < >> charlesr.har...@gmail.com> wrote: >> >>> >>> >>> On Sat, Jun 4, 2016 at 5:27 PM, wrote: >>> On Sat, Jun 4, 2016 at 6:10 PM, Nathaniel Smith wrote: > On Sat, Jun 4, 2016 at 2:07 PM, V. Armando Sole wrote: > > Also in favor of 2. Always return a float for '**' > > Even if we did want to switch to this, it's such a major > backwards-incompatible change that I'm not sure how we could actually > make the transition without first making it an error for a while. > AFAIU, only the dtype for int**int would change. So, what would be the problem with FutureWarnings as with other dtype changes that were done in recent releases. >>> The main problem I see with that is that numpy integers would behave >>> differently than Python integers, and the difference would be silent. With >>> option 1 it is possible to write code that behaves the same up to overflow >>> and the error message would supply a warning when the exponent should be >>> float. One could argue that numpy scalar integer types could be made to >>> behave like python integers, but then their behavior would differ from >>> numpy arrays and numpy scalar arrays. >>> >> >> I'm not sure I understand. >> >> Do you mean >> >> np.arange(5)**2 would behave differently than np.arange(5)**np.int_(2) >> >> or 2**2 would behave differently than np.int_(2)**np.int(2) >> > > The second case. Python returns ints for non-negative integer powers of > ints. > > >> >> ? >> >> >> AFAICS, there are many cases where numpy scalars don't behave like python >> scalars. Also, does different behavior mean different type/dtype or >> different numbers. (The first I can live with, the second requires human >> memory usage, which is a scarce resource.) 
>> >> >>> 2**(-2) >> 0.25 >> >> > But we can't mix types in np.arrays and we can't depend on the element > values of arrays in the exponent, but only on their type, so 2 ** array([1, > -1]) must contain a single type and making that type float would surely > break code. Scalar arrays, which are arrays, have the same problem. We > can't do what Python does with ndarrays and numpy scalars, and it would be > best to be consistent. Division was a simpler problem to deal with, as > there were two operators, `//` and `/`. If there were two exponential > operators life would be simpler. > What bothers me with the entire argument is that you are putting higher priority on returning a dtype than on returning the correct numbers. Reverse the argument: Because we cannot make the return type value dependent we **have** to return float, in order to get the correct number. (It's an argument not what we really have to do.) Which code really breaks, code that gets a float instead of an int, and with some advance warning users that really need to watch their memory can use np.power. My argument before was that I think a simple operator like `**` should work for 90+% of the users and match their expectation, and the users that need to watch dtypes can as well use the function. (I can also live with the exception from case 1., but I really think this is like the python 2 integer division "surprise") Josef > > Chuck > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Integers to integer powers, let's make a decision
On Sat, Jun 4, 2016 at 8:07 PM, Charles R Harriswrote: > > > On Sat, Jun 4, 2016 at 5:27 PM, wrote: > >> >> >> On Sat, Jun 4, 2016 at 6:10 PM, Nathaniel Smith wrote: >> >>> On Sat, Jun 4, 2016 at 2:07 PM, V. Armando Sole wrote: >>> > Also in favor of 2. Always return a float for '**' >>> >>> Even if we did want to switch to this, it's such a major >>> backwards-incompatible change that I'm not sure how we could actually >>> make the transition without first making it an error for a while. >>> >> >> AFAIU, only the dtype for int**int would change. So, what would be the >> problem with FutureWarnings as with other dtype changes that were done in >> recent releases. >> >> > The main problem I see with that is that numpy integers would behave > differently than Python integers, and the difference would be silent. With > option 1 it is possible to write code that behaves the same up to overflow > and the error message would supply a warning when the exponent should be > float. One could argue that numpy scalar integer types could be made to > behave like python integers, but then their behavior would differ from > numpy arrays and numpy scalar arrays. > I'm not sure I understand. Do you mean np.arange(5)**2 would behave differently than np.arange(5)**np.int_(2) or 2**2 would behave differently than np.int_(2)**np.int(2) ? AFAICS, there are many cases where numpy scalars don't behave like python scalars. Also, does different behavior mean different type/dtype or different numbers. (The first I can live with, the second requires human memory usage, which is a scarce resource.) >>> 2**(-2) 0.25 Josef > > Chuck > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Integers to integer powers, let's make a decision
On Sat, Jun 4, 2016 at 6:10 PM, Nathaniel Smithwrote: > On Sat, Jun 4, 2016 at 2:07 PM, V. Armando Sole wrote: > > Also in favor of 2. Always return a float for '**' > > Even if we did want to switch to this, it's such a major > backwards-incompatible change that I'm not sure how we could actually > make the transition without first making it an error for a while. > AFAIU, only the dtype for int**int would change. So, what would be the problem with FutureWarnings as with other dtype changes that were done in recent releases. Josef > > -n > > -- > Nathaniel J. Smith -- https://vorpus.org > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Integers to integer powers, let's make a decision
On Sat, Jun 4, 2016 at 3:49 PM, Matthew Brettwrote: > On Sat, Jun 4, 2016 at 12:47 PM, wrote: > > > > > > On Sat, Jun 4, 2016 at 3:43 PM, Charles R Harris < > charlesr.har...@gmail.com> > > wrote: > >> > >> > >> > >> On Sat, Jun 4, 2016 at 11:22 AM, Charles R Harris > >> wrote: > >>> > >>> Hi All, > >>> > >>> I've made a new post so that we can make an explicit decision. AFAICT, > >>> the two proposals are > >>> > >>> Integers to negative integer powers raise an error. > >>> Integers to integer powers always results in floats. > >>> > >>> My own sense is that 1. would be closest to current behavior and using > a > >>> float exponential when a float is wanted is an explicit way to > indicate that > >>> desire. OTOH, 2. would be the most convenient default for everyday > numerical > >>> computation, but I think would more likely break current code. I am > going to > >>> come down on the side of 1., which I don't think should cause too many > >>> problems if we start with a {Future, Deprecation}Warning explaining the > >>> workaround. > > > > > > I'm in favor of 2. always float for `**` > > I don't see enough pure integer usecases to throw away a nice operator. > > I can't make sense of 'throw away a nice operator' - you still have > arr ** 2.0 if you want floats. > but if we have x**y, then we always need to check the dtype. If we don't we get RuntimeErrors or overflow, where we might have forgotten to include the relevant cases in the unit tests. numpy has got pickier with using only integers in some areas (index, ...). Now we have to watch out that we convert back to floats for power. Not a serious problem for a library with unit tests and enough users who run into the dtype issues and report them. But I'm sure I will have to fix any scripts or interactive work that I'm writing. It's just another thing to watch out for, after we managed to get rid of integer division 1/2=?. 
Josef > > Matthew > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Integers to integer powers, let's make a decision
On Sat, Jun 4, 2016 at 3:43 PM, Charles R Harriswrote: > > > On Sat, Jun 4, 2016 at 11:22 AM, Charles R Harris < > charlesr.har...@gmail.com> wrote: > >> Hi All, >> >> I've made a new post so that we can make an explicit decision. AFAICT, >> the two proposals are >> >> >>1. Integers to negative integer powers raise an error. >>2. Integers to integer powers always results in floats. >> >> My own sense is that 1. would be closest to current behavior and using a >> float exponential when a float is wanted is an explicit way to indicate >> that desire. OTOH, 2. would be the most convenient default for everyday >> numerical computation, but I think would more likely break current code. I >> am going to come down on the side of 1., which I don't think should cause >> too many problems if we start with a {Future, Deprecation}Warning >> explaining the workaround. >> > I'm in favor of 2. always float for `**` I don't see enough pure integer usecases to throw away a nice operator. Josef > > Note that current behavior in 1.11 is such a mess > ``` > In [5]: array([0], dtype=int64) ** -1 > Out[5]: array([-9223372036854775808]) > > In [6]: array([0], dtype=uint64) ** -1 > Out[6]: array([ inf]) > ``` > That the simplest approach might be to start by raising an error rather > than by trying to maintain current behavior and issuing a warning. > > Chuck > > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
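For reference, option 1 is what NumPy ended up shipping (in releases after this discussion, 1.12 onward, to my understanding): integer arrays raised to negative integer powers raise an error, and a float exponent is the explicit opt-in. A sketch:

```python
import numpy as np

# NumPy integer arrays raise on negative integer exponents (option 1):
try:
    np.array([2]) ** -1
except ValueError:
    raised = True
else:
    raised = False
assert raised

# An explicit float exponent opts in to the float result:
assert (np.array([2]) ** -1.0)[0] == 0.5
```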
Re: [Numpy-discussion] `allclose` vs `assert_allclose`
On Thu, Jul 17, 2014 at 4:21 PM, josef.p...@gmail.com wrote: On Thu, Jul 17, 2014 at 4:07 PM, josef.p...@gmail.com wrote: On Wed, Jul 16, 2014 at 9:52 AM, Nathaniel Smith n...@pobox.com wrote: On 16 Jul 2014 10:26, Tony Yu tsy...@gmail.com wrote: Is there any reason why the defaults for `allclose` and `assert_allclose` differ? This makes debugging a broken test much more difficult. More importantly, using an absolute tolerance of 0 causes failures for some common cases. For example, if two values are very close to zero, a test will fail: And one more comment: I debug broken tests pretty often. My favorites in pdb are np.max(np.abs(x - y)) and np.max(np.abs(x / y - 1)) to see how much I would have to adjust atol and rtol in assert_allclose in the tests to make them pass, and to decide whether this is an acceptable numerical difference or a bug. allclose doesn't tell me anything and I almost never use it. Josef np.testing.assert_allclose(0, 1e-14) Git blame suggests the change was made in the following commit, but I guess that change only reverted to the original behavior. https://github.com/numpy/numpy/commit/f43223479f917e404e724e6a3df27aa701e6d6bf It seems like the defaults for `allclose` and `assert_allclose` should match, and an absolute tolerance of 0 is probably not ideal. I guess this is a pretty big behavioral change, but the current default for `assert_allclose` doesn't seem ideal. What you say makes sense to me, and loosening the default tolerances won't break any existing tests. (And I'm not too worried about people who were counting on getting 1e-7 instead of 1e-5 or whatever... if it matters that much to you exactly what tolerance you test, you should be setting the tolerance explicitly!) I vote that unless someone comes up with some terrible objection in the next few days then you should submit a PR :-) If you mean by this to add atol=1e-8 as default, then I'm against it. At least it will change the meaning of many of our tests in statsmodels. 
I'm using rtol to check for correct 1e-15 or 1e-30, which would be completely swamped if you change the default atol=0. Adding atol=0 to all assert_allclose that currently use only rtol is a lot of work. I think I almost never use a default rtol, but I often leave atol at the default = 0. If we have zeros, then I don't think it's too much work to decide whether this should be atol=1e-20, or 1e-8. Just to explain, p-values, sf of the distributions are usually accurate at 1e-30 or 1e-50 or something like that. And when we test the tails of the distributions we use that the relative error is small and the absolute error is tiny. We would need to do a grep to see how many cases there actually are in scipy and statsmodels, before we change it because for some use cases we only get atol 1e-5 or 1e-7 (e.g. nonlinear optimization). Linear algebra is usually atol or rtol 1e-11 to 1e-14 in my cases, AFAIR. Josef Josef -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
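The rtol/atol interaction Josef describes can be made concrete. A stdlib-only sketch of the comparison rule both functions use, |actual - desired| <= atol + rtol * |desired|, showing how a nonzero default atol swamps rtol for tiny values while atol=0 makes comparisons against exactly zero fail:

```python
def close(actual, desired, rtol, atol):
    # The criterion used by numpy.allclose / assert_allclose.
    return abs(actual - desired) <= atol + rtol * abs(desired)

# With atol=0 (assert_allclose's default), rtol does the work even at
# 1e-30 scale: a 100% relative error is correctly rejected.
assert not close(2e-30, 1e-30, rtol=1e-7, atol=0)

# With allclose's defaults (rtol=1e-5, atol=1e-8) the same pair passes,
# because atol dwarfs both numbers entirely -- Josef's objection.
assert close(2e-30, 1e-30, rtol=1e-5, atol=1e-8)

# Conversely, atol=0 makes any nonzero value fail against exactly 0,
# which is Tony's original complaint.
assert not close(1e-14, 0.0, rtol=1e-7, atol=0)
assert close(1e-14, 0.0, rtol=1e-7, atol=1e-8)
```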
Re: [Numpy-discussion] `allclose` vs `assert_allclose`
On Wed, Jul 16, 2014 at 9:52 AM, Nathaniel Smith n...@pobox.com wrote: On 16 Jul 2014 10:26, Tony Yu tsy...@gmail.com wrote: Is there any reason why the defaults for `allclose` and `assert_allclose` differ? This makes debugging a broken test much more difficult. More importantly, using an absolute tolerance of 0 causes failures for some common cases. For example, if two values are very close to zero, a test will fail: np.testing.assert_allclose(0, 1e-14) Git blame suggests the change was made in the following commit, but I guess that change only reverted to the original behavior. https://github.com/numpy/numpy/commit/f43223479f917e404e724e6a3df27aa701e6d6bf It seems like the defaults for `allclose` and `assert_allclose` should match, and an absolute tolerance of 0 is probably not ideal. I guess this is a pretty big behavioral change, but the current default for `assert_allclose` doesn't seem ideal. What you say makes sense to me, and loosening the default tolerances won't break any existing tests. (And I'm not too worried about people who were counting on getting 1e-7 instead of 1e-5 or whatever... if it matters that much to you exactly what tolerance you test, you should be setting the tolerance explicitly!) I vote that unless someone comes up with some terrible objection in the next few days then you should submit a PR :-) If you mean by this to add atol=1e-8 as default, then I'm against it. At least it will change the meaning of many of our tests in statsmodels. I'm using rtol to check for correct 1e-15 or 1e-30, which would be completely swamped if you change the default atol=0. Adding atol=0 to all assert_allclose that currently use only rtol is a lot of work. I think I almost never use a default rtol, but I often leave atol at the default = 0. If we have zeros, then I don't think it's too much work to decide whether this should be atol=1e-20, or 1e-8. 
Josef -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] `allclose` vs `assert_allclose`
On Thu, Jul 17, 2014 at 4:07 PM, josef.p...@gmail.com wrote: On Wed, Jul 16, 2014 at 9:52 AM, Nathaniel Smith n...@pobox.com wrote: On 16 Jul 2014 10:26, Tony Yu tsy...@gmail.com wrote: Is there any reason why the defaults for `allclose` and `assert_allclose` differ? This makes debugging a broken test much more difficult. More importantly, using an absolute tolerance of 0 causes failures for some common cases. For example, if two values are very close to zero, a test will fail: np.testing.assert_allclose(0, 1e-14) Git blame suggests the change was made in the following commit, but I guess that change only reverted to the original behavior. https://github.com/numpy/numpy/commit/f43223479f917e404e724e6a3df27aa701e6d6bf It seems like the defaults for `allclose` and `assert_allclose` should match, and an absolute tolerance of 0 is probably not ideal. I guess this is a pretty big behavioral change, but the current default for `assert_allclose` doesn't seem ideal. What you say makes sense to me, and loosening the default tolerances won't break any existing tests. (And I'm not too worried about people who were counting on getting 1e-7 instead of 1e-5 or whatever... if it matters that much to you exactly what tolerance you test, you should be setting the tolerance explicitly!) I vote that unless someone comes up with some terrible objection in the next few days then you should submit a PR :-) If you mean by this to add atol=1e-8 as default, then I'm against it. At least it will change the meaning of many of our tests in statsmodels. I'm using rtol to check for correct 1e-15 or 1e-30, which would be completely swamped if you change the default atol=0. Adding atol=0 to all assert_allclose that currently use only rtol is a lot of work. I think I almost never use a default rtol, but I often leave atol at the default = 0. If we have zeros, then I don't think it's too much work to decide whether this should be atol=1e-20, or 1e-8. 
Just to explain, p-values, sf of the distributions are usually accurate at 1e-30 or 1e-50 or something like that. And when we test the tails of the distributions we use that the relative error is small and the absolute error is tiny. We would need to do a grep to see how many cases there actually are in scipy and statsmodels, before we change it because for some use cases we only get atol 1e-5 or 1e-7 (e.g. nonlinear optimization). Linear algebra is usually atol or rtol 1e-11 to 1e-14 in my cases, AFAIR. Josef Josef -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] problems with mailing list ?
Are there problems with the mailing lists sending out messages? I'm getting some replies without the original messages, and in some threads I don't get replies, so I'm missing part of the discussions. Josef ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] `allclose` vs `assert_allclose`
On Thu, Jul 17, 2014 at 4:07 PM, josef.p...@gmail.com wrote: On Wed, Jul 16, 2014 at 9:52 AM, Nathaniel Smith n...@pobox.com wrote: On 16 Jul 2014 10:26, Tony Yu tsy...@gmail.com wrote: Is there any reason why the defaults for `allclose` and `assert_allclose` differ? This makes debugging a broken test much more difficult. More importantly, using an absolute tolerance of 0 causes failures for some common cases. For example, if two values are very close to zero, a test will fail: np.testing.assert_allclose(0, 1e-14) Git blame suggests the change was made in the following commit, but I guess that change only reverted to the original behavior. https://github.com/numpy/numpy/commit/f43223479f917e404e724e6a3df27aa701e6d6bf It seems like the defaults for `allclose` and `assert_allclose` should match, and an absolute tolerance of 0 is probably not ideal. I guess this is a pretty big behavioral change, but the current default for `assert_allclose` doesn't seem ideal. What you say makes sense to me, and loosening the default tolerances won't break any existing tests. (And I'm not too worried about people who were counting on getting 1e-7 instead of 1e-5 or whatever... if it matters that much to you exactly what tolerance you test, you should be setting the tolerance explicitly!) I vote that unless someone comes up with some terrible objection in the next few days then you should submit a PR :-) If you mean by this to add atol=1e-8 as default, then I'm against it. At least it will change the meaning of many of our tests in statsmodels. I'm using rtol to check for correct 1e-15 or 1e-30, which would be completely swamped if you change the default atol=0. Adding atol=0 to all assert_allclose that currently use only rtol is a lot of work. I think I almost never use a default rtol, but I often leave atol at the default = 0. If we have zeros, then I don't think it's too much work to decide whether this should be atol=1e-20, or 1e-8. 
copied from http://mail.scipy.org/pipermail/numpy-discussion/2014-July/070639.html since I didn't get any messages here This is a compelling use-case, but there are also lots of compelling usecases that want some non-zero atol (i.e., comparing stuff to 0). Saying that allclose is for one of those use cases and assert_allclose is for the other is... not a very felicitious API design, I think. So we really should do *something*. Are there really any cases where you want non-zero atol= that don't involve comparing something against a 'desired' value of zero? It's a little wacky, but I'm wondering if we ought to change the rule (for all versions of allclose) to if desired == 0: tol = atol else: tol = rtol * desired In particular, means that np.allclose(x, 1e-30) would reject x values of 0 or 2e-30, but np.allclose(x, 0) will accept x == 1e-30 or 2e-30. -n That's much too confusing. I don't know what the usecases for np.allclose are since I don't have any. assert_allclose is one of our (statsmodels) most frequently used numpy function this is not informative: `np.allclose(x, 1e-30)` since there are keywords either np.assert_allclose(x, atol=1e-30) if I want to be close to zero or np.assert_allclose(x, rtol=1e-11, atol=1e-25) if we have a mix of large numbers and zeros in an array. Making the behavior of assert_allclose depending on whether desired is exactly zero or 1e-20 looks too difficult to remember, and which desired I use would depend on what I get out of R or Stata. atol=1e-8 is not close to zero in most cases in my experience. The numpy.testing assert functions are some of the most useful functions in numpy, and heavily used code. Josef Josef -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
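Nathaniel's proposed rule above -- apply atol only where `desired` is exactly zero, and a purely relative criterion everywhere else -- could be sketched like this (`proposed_close` is a hypothetical helper written for illustration, not an existing NumPy function):

```python
import numpy as np

def proposed_close(actual, desired, rtol=1e-7, atol=1e-8):
    """Hypothetical criterion: relative tolerance everywhere, except that
    entries where desired is exactly 0 are compared against atol."""
    actual = np.asarray(actual, dtype=float)
    desired = np.asarray(desired, dtype=float)
    tol = np.where(desired == 0, atol, rtol * np.abs(desired))
    return bool(np.all(np.abs(actual - desired) <= tol))

print(proposed_close(1e-30, 0))   # True: exact-zero desired uses atol
print(proposed_close(0, 1e-30))   # False: tiny nonzero desired stays relative
```

This reproduces the behavior described in the quote: comparing against a desired value of 1e-30 rejects 0, while comparing against an exact 0 accepts 1e-30.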
Re: [Numpy-discussion] `allclose` vs `assert_allclose`
On Thu, Jul 17, 2014 at 11:37 AM, Nathaniel Smith n...@pobox.com wrote: On Wed, Jul 16, 2014 at 7:47 PM, Ralf Gommers ralf.gomm...@gmail.com wrote: On Wed, Jul 16, 2014 at 6:37 AM, Tony Yu tsy...@gmail.com wrote: It seems like the defaults for `allclose` and `assert_allclose` should match, and an absolute tolerance of 0 is probably not ideal. I guess this is a pretty big behavioral change, but the current default for `assert_allclose` doesn't seem ideal. I agree, current behavior quite annoying. It would make sense to change the atol default to 1e-8, but technically it's a backwards compatibility break. Would probably have a very minor impact though. Changing the default for rtol in one of the functions may be much more painful though, I don't think that should be done. Currently we have: allclose: rtol=1e-5, atol=1e-8 assert_allclose: rtol=1e-7, atol=0 Why would it be painful to change assert_allclose to match allclose? It would weaken some tests, but no code would break. We might break our code, if suddenly our test suite doesn't do what it is supposed to do. (rough guess: 40% of the statsmodels code are unit tests.) Josef -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] `allclose` vs `assert_allclose`
On Fri, Jul 18, 2014 at 2:03 PM, josef.p...@gmail.com wrote: On Fri, Jul 18, 2014 at 12:53 PM, Nathaniel Smith n...@pobox.com wrote: On Fri, Jul 18, 2014 at 12:38 PM, josef.p...@gmail.com wrote: On Thu, Jul 17, 2014 at 4:07 PM, josef.p...@gmail.com wrote: If you mean by this to add atol=1e-8 as default, then I'm against it. At least it will change the meaning of many of our tests in statsmodels. I'm using rtol to check for correct 1e-15 or 1e-30, which would be completely swamped if you change the default atol=0. Adding atol=0 to all assert_allclose that currently use only rtol is a lot of work. I think I almost never use a default rtol, but I often leave atol at the default = 0. If we have zeros, then I don't think it's too much work to decide whether this should be atol=1e-20, or 1e-8. copied from http://mail.scipy.org/pipermail/numpy-discussion/2014-July/070639.html since I didn't get any messages here This is a compelling use-case, but there are also lots of compelling usecases that want some non-zero atol (i.e., comparing stuff to 0). Saying that allclose is for one of those use cases and assert_allclose is for the other is... not a very felicitious API design, I think. So we really should do *something*. Are there really any cases where you want non-zero atol= that don't involve comparing something against a 'desired' value of zero? It's a little wacky, but I'm wondering if we ought to change the rule (for all versions of allclose) to if desired == 0: tol = atol else: tol = rtol * desired In particular, means that np.allclose(x, 1e-30) would reject x values of 0 or 2e-30, but np.allclose(x, 0) will accept x == 1e-30 or 2e-30. -n That's much too confusing. I don't know what the usecases for np.allclose are since I don't have any. I wrote allclose because it's shorter, but my point is that assert_allclose and allclose should use the same criterion, and was making a suggestion for what that shared criterion might be. 
assert_allclose is one of our (statsmodels) most frequently used numpy function this is not informative: `np.allclose(x, 1e-30)` since there are keywords either np.assert_allclose(x, atol=1e-30) I think we might be talking past each other here -- 1e-30 here is my gold p-value that I'm hoping x will match, not a tolerance argument. my mistake if I want to be close to zero or np.assert_allclose(x, rtol=1e-11, atol=1e-25) if we have a mix of large numbers and zeros in an array. Making the behavior of assert_allclose depending on whether desired is exactly zero or 1e-20 looks too difficult to remember, and which desired I use would depend on what I get out of R or Stata. I thought your whole point here was that 1e-20 and zero are qualitatively different values that you would not want to accidentally confuse? Surely R and Stata aren't returning exact zeros for small non-zero values like probability tails? atol=1e-8 is not close to zero in most cases in my experience. If I understand correctly (Tony?) the problem here is that another common use case for assert_allclose is in cases like assert_allclose(np.sin(some * complex ** calculation / (that - should - be * zero)), 0) For cases like this, you need *some* non-zero atol or the thing just doesn't work, and one could quibble over the exact value as long as it's larger than normal floating point error. These calculations usually involve normal sized numbers, so atol should be comparable to eps * these values. eps is 2e-16, so atol=1e-8 works for values up to around 1e8, which is a plausible upper bound for where people might expect assert_allclose to just work. I'm trying to figure out some way to support your use cases while also supporting other use cases. my problem is that there is no normal floating point error. If I have units in 1000 or units in 0.0001 depends on the example and dataset that we use for testing. 
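The mixed case Josef describes -- ordinary-sized values and (near-)zeros in the same array -- is typically handled by passing both tolerances explicitly, as in his `rtol=1e-11, atol=1e-25` example (the array values below are invented for illustration):

```python
import numpy as np
from numpy.testing import assert_allclose

desired = np.array([0.5, 3.0e-12, 0.0])              # mixed scales plus a zero
actual = desired + np.array([5e-14, 3e-24, 1e-26])   # tiny perturbations

# rtol covers the nonzero entries, atol covers the zero entry:
assert_allclose(actual, desired, rtol=1e-10, atol=1e-25)
print("ok")
```

With the default atol=0 the zero entry would fail, which is exactly why the explicit atol is needed here.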
this tests two different functions/methods that calculate the same thing:

(Pdb) pval
array([ 3.01270184e-42, 5.90847367e-02, 3.00066946e-12])
(Pdb) res2.pvalues
array([ 3.01270184e-42, 5.90847367e-02, 3.00066946e-12])
(Pdb) assert_allclose(pval, res2.pvalues, rtol=5 * rtol, atol=1e-25)

I don't care about errors that are smaller than 1e-25.

for example, testing p-values against Stata:

(Pdb) tt.pvalue
array([ 5.70315140e-30, 6.24662551e-02, 5.86024090e-11])
(Pdb) res2.pvalues
array([ 5.70315140e-30, 6.24662551e-02, 5.86024090e-11])
(Pdb) tt.pvalue - res2.pvalues
array([ 2.16612016e-40, 2.51187959e-15, 4.30027936e-21])
(Pdb) tt.pvalue / res2.pvalues - 1
array([ 3.79811738e-11, 4.01900735e-14, 7.33806349e-11])
(Pdb) rtol
1e-10
(Pdb) assert_allclose(tt.pvalue, res2.pvalues, rtol=5 * rtol)

I could find a lot more and maybe nicer examples, since I spend quite a bit of time fine tuning unit tests.
Re: [Numpy-discussion] `allclose` vs `assert_allclose`
On Fri, Jul 18, 2014 at 2:29 PM, Nathaniel Smith n...@pobox.com wrote: On Fri, Jul 18, 2014 at 7:03 PM, josef.p...@gmail.com wrote: On Fri, Jul 18, 2014 at 12:53 PM, Nathaniel Smith n...@pobox.com wrote: For cases like this, you need *some* non-zero atol or the thing just doesn't work, and one could quibble over the exact value as long as it's larger than normal floating point error. These calculations usually involve normal sized numbers, so atol should be comparable to eps * these values. eps is 2e-16, so atol=1e-8 works for values up to around 1e8, which is a plausible upper bound for where people might expect assert_allclose to just work. I'm trying to figure out some way to support your use cases while also supporting other use cases. my problem is that there is no normal floating point error. If I have units in 1000 or units in 0.0001 depends on the example and dataset that we use for testing. this test two different functions/methods that calculate the same thing (Pdb) pval array([ 3.01270184e-42, 5.90847367e-02, 3.00066946e-12]) (Pdb) res2.pvalues array([ 3.01270184e-42, 5.90847367e-02, 3.00066946e-12]) (Pdb) assert_allclose(pval, res2.pvalues, rtol=5 * rtol, atol=1e-25) I don't care about errors that are smaller that 1e-25 for example testing p-values against Stata (Pdb) tt.pvalue array([ 5.70315140e-30, 6.24662551e-02, 5.86024090e-11]) (Pdb) res2.pvalues array([ 5.70315140e-30, 6.24662551e-02, 5.86024090e-11]) (Pdb) tt.pvalue - res2.pvalues array([ 2.16612016e-40, 2.51187959e-15, 4.30027936e-21]) (Pdb) tt.pvalue / res2.pvalues - 1 array([ 3.79811738e-11, 4.01900735e-14, 7.33806349e-11]) (Pdb) rtol 1e-10 (Pdb) assert_allclose(tt.pvalue, res2.pvalues, rtol=5 * rtol) I could find a lot more and maybe nicer examples, since I spend quite a bit of time fine tuning unit tests. ...these are all cases where there are not exact zeros, so my proposal would not affect them? 
I can see the argument that we shouldn't provide any default rtol/atol at all because there is no good default, but... I don't think putting that big of a barrier in front of newbies writing their first tests is a good idea. I think atol=0 is **very** good for newbies, and everyone else. If expected is really zero or very small, then it immediately causes a test failure, and it's relatively obvious how to fix it. I worry a lot more about unit tests that don't bite written by newbies or not so newbies who just use a default. That's one of the problems we had with assert_almost_equal, and why I was very happy to switch to assert_allclose with it's emphasis on relative tolerance. Josef -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
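Josef's "tests that don't bite" failure mode is concrete: a nonzero default atol can silently pass a value that is wrong by orders of magnitude (the numbers below are invented for illustration):

```python
import numpy as np
from numpy.testing import assert_allclose

computed = 3.0e-12    # hypothetical tail probability, wrong by a factor of 1000
reference = 3.0e-9

# allclose's default atol=1e-8 swallows the error completely:
print(np.allclose(computed, reference))     # True

# assert_allclose's default atol=0 checks the relative error, so the test bites:
try:
    assert_allclose(computed, reference, rtol=1e-7)
except AssertionError:
    print("caught the factor-1000 error")
```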
Re: [Numpy-discussion] `allclose` vs `assert_allclose`
On Fri, Jul 18, 2014 at 2:44 PM, Nathaniel Smith n...@pobox.com wrote: On 18 Jul 2014 19:31, josef.p...@gmail.com wrote: Making the behavior of assert_allclose depending on whether desired is exactly zero or 1e-20 looks too difficult to remember, and which desired I use would depend on what I get out of R or Stata. I thought your whole point here was that 1e-20 and zero are qualitatively different values that you would not want to accidentally confuse? Surely R and Stata aren't returning exact zeros for small non-zero values like probability tails? I was thinking of the case when we only see pvalue 1e-16 or something like this, and we replace this by assert close to zero. which would translate to `assert_allclose(pvalue, 0, atol=1e-16)` with maybe an additional rtol=1e-11 if we have an array of pvalues where some are large (0.5). This example is also handled correctly by my proposal :-) depends on the details of your proposal alternative: desired is exactly zero means assert_equal (Pdb) self.res_reg.params[m:] array([ 0., 0., 0.]) (Pdb) assert_allclose(0, self.res_reg.params[m:]) (Pdb) assert_allclose(0, self.res_reg.params[m:], rtol=0, atol=0) (Pdb) This test uses currently assert_almost_equal with decimal=4 :( regularized estimation with hard thresholding: the first m values are estimate not equal zero, the m to the end elements are exactly zero. This is discrete models fit_regularized which predates numpy assert_allclose. I haven't checked what the unit test of Kerby's current additions for fit_regularized looks like. unit testing is serious business: I'd rather have good unit test in SciPy related packages than convincing a few more newbies that they can use the defaults for everything. Josef -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
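The "desired exactly zero means assert_equal" alternative Josef sketches already works with today's atol=0 default, and the intent can be stated explicitly with assert_equal (the parameter array below is hypothetical, mimicking the hard-thresholded fit_regularized output he describes):

```python
import numpy as np
from numpy.testing import assert_allclose, assert_equal

params = np.array([0.73, -0.41, 0.0, 0.0, 0.0])   # tail is exactly zero
m = 2

# With the default atol=0, comparing against 0 is an exact check ...
assert_allclose(params[m:], 0.0)
# ... which assert_equal states directly:
assert_equal(params[m:], np.zeros(3))
print("ok")
```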
Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style
On Fri, Jul 18, 2014 at 4:21 PM, Charles G. Waldman char...@crunch.io wrote: Joseph Martinot-Lagarde writes: Compare what's comparable: That's fair. In addition, you have to use AltGr on some keyboards to get the brackets Wow, it must be rather painful to do any real programming on such a keyboard! - C On Fri, Jul 18, 2014 at 1:15 PM, Joseph Martinot-Lagarde joseph.martinot-laga...@m4x.org wrote: On 18/07/2014 20:42, Charles G. Waldman wrote: Well, if the goal is shorthand, typing numpy.array(numpy.mat()) won't please many users. But the more I think about it, the less I think Numpy should support this (non-Pythonic) input mode. Too much molly-coddling of new users! When doing interactive work I usually just type: np.array([[1,2,3], ... [4,5,6], ... [7,8,9]]) which is (IMO) easier to read: e.g. it's not totally obvious that 1,0,0;0,1,0;0,0,1 represents a 3x3 identity matrix, but [[1,0,0], [0,1,0], [0,0,1]] is pretty obvious. Compare what's comparable: [[1,0,0], [0,1,0], [0,0,1]] vs 1 0 0; 0 1 0; 0 0 1 The difference in (non-whitespace) chars is 19 vs 25, so the shorthand doesn't seem to save that much. Well, it's easier to type (twice the same character) than [], and you have no risk of swapping an opening and a closing bracket. In addition, you have to use AltGr on some keyboards to get the brackets. It doesn't boil down to a number of characters. Just my €0.02, It's the year of the notebook. notebooks are reusable. notebooks correctly align the brackets in the second and third line and it looks pretty, just like a matrix (But, I don't have to teach newbies, and often I even correct whitespace on the command line, because it looks ugly and I will eventually copy it to a script file.) Josef no broken windows! well, except for the ones I don't feel like fixing right now.
- C On Fri, Jul 18, 2014 at 10:05 AM, Alan G Isaac alan.is...@gmail.com wrote: On 7/18/2014 12:45 PM, Mark Miller wrote: If the true goal is to just allow quick entry of a 2d array, why not just advocate using a = numpy.array(numpy.mat('1 2 3; 4 5 6; 7 8 9')) It's even simpler: a = np.mat(' 1 2 3;4 5 6;7 8 9').A I'm not putting a dog in this race. Still I would say that the reason why such proposals miss the point is that there are introductory settings where one would like to explain as few complications as possible. In particular, one might prefer *not* to discuss the existence of a matrix type. As an additional downside, this is only good for 2d, and there have been proposals for the new array builder to handle other dimensions. fwiw, Alan Isaac ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style
On Fri, Jul 18, 2014 at 5:04 PM, Joseph Martinot-Lagarde joseph.martinot-laga...@m4x.org wrote: Le 18/07/2014 22:46, Chris Barker a écrit : On Fri, Jul 18, 2014 at 1:15 PM, Joseph Martinot-Lagarde joseph.martinot-laga...@m4x.org mailto:joseph.martinot-laga...@m4x.org wrote: In addition, you have to use AltGr on some keyboards to get the brackets. If it's hard to type square brackets -- you're kind of dead in the water with Python anyway -- this is not going to help. -Chris Welcome to the azerty world ! ;) It's not that hard to type, just a bit more involved. My biggest problem is that you have to type the opening and closing bracket for each line, with a comma in between. It will always be harder and more error prone than a single semicolon, whatever the keyboard. My use case is not teaching but doing quick'n'dirty computations with a few values. Sometimes these values are copy-pasted from a space separated file, or from a printed array in another console. Having to add comas and bracket makes simple computations less easy. That's why I often use Octave for these. my copy paste approaches for almost quick'n'dirty (no semicolons): given: a b c 1 2 3 4 5 6 7 8 9 (select Ctrl-C) pandas.read_clipboard(sep=' ') a b c 0 1 2 3 1 4 5 6 2 7 8 9 np.asarray(pandas.read_clipboard()) array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=int64) pandas.read_clipboard().values array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=int64) arr = np.array('''\ 1 2 3 4 5 6 7 8 9'''.split(), float).reshape(-1, 3) the last is not so quick and dirty but reusable and reused. Josef ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style
On Mon, Jul 7, 2014 at 9:11 AM, Sebastian Berg sebast...@sipsolutions.net wrote: On Mo, 2014-07-07 at 08:25 -0400, Alan G Isaac wrote: On 7/7/2014 7:17 AM, Daπid wrote: How about a new one? np.matarray, for MATLAB array. How about `str2arr` or even `build`, since teaching appears to be a focus. Also, I agree '1 2 3' shd become 1d and '1 2 3;' shd become 2d. It seems unambiguous to allow '1 2 3;;' to be 3d, or even '1 2;3 4;;5 6;7 8' (two 2d arrays), but I'm just noting that, not urging that it be implemented. Probably overdoing it, but if we plan on more than just this, what about banishing such functions to something like numpy.interactive/numpy.helpers which you can then import * (or better, specific functions) from? I think the fact that you need many imports on startup should rather be fixed by an ipython scientific mode or other startup imports. Is this whole thing really worth it? We get back to a numpy pylab. First users learn the dirty shortcuts, and then they have to learn how to do it properly. (I quite often use string split and reshape for copy-pasted text tables.) Josef - Sebastian Alan Isaac ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style
On Mon, Jul 7, 2014 at 1:58 PM, Nathaniel Smith n...@pobox.com wrote: On Mon, Jul 7, 2014 at 3:28 PM, Sebastian Berg sebast...@sipsolutions.net wrote: On Mo, 2014-07-07 at 09:50 -0400, josef.p...@gmail.com wrote: On Mon, Jul 7, 2014 at 9:11 AM, Sebastian Berg sebast...@sipsolutions.net wrote: On Mo, 2014-07-07 at 08:25 -0400, Alan G Isaac wrote: On 7/7/2014 7:17 AM, Daπid wrote: How about a new one? np.matarray, for MATLAB array. How about `str2arr` or even `build`, since teaching appears to be a focus. Also, I agree '1 2 3' shd become 1d and '1 2 3;' shd become 2d. It seems unambiguous to allow '1 2 3;;' to be 3d, or even '1 2;3 4;;5 6;7 8' (two 2d arrays), but I'm just noting that, not urging that it be implemented. Probably overdoing it, but if we plan on more then just this, what about banning such functions to something like numpy.interactive/numpy.helpers which you can then import * (or better specific functions) from? I think the fact that you need many imports on startup should rather be fixed by an ipython scientific mode or other startup imports. Is this whole thing really worth it? We get back to a numpy pylab. First users learn the dirty shortcuts, and then they have to learn how to do it properly. Yeah, you are right. Just a bit afraid of creating too many such functions that I am not sure are very useful/used much. For example I am not sure that many use np.r_ or np.c_ Yeah, we definitely have too many random bits of API around overall. But I think this one is probably worthwhile. It doesn't add any real complexity (no new types, trivial for readers to understand the first time they encounter it, etc.), and it addresses a recurring perceived shortcoming of numpy that people run into in the first 5 minutes of use, at a time when it's pretty easy to give up and go back to Matlab. And, it removes one of the perceived advantages of np.matrix over np.ndarray, so it smooths our way for eventually phasing out np.matrix. 
I'm not sure that preserving np.arrtab is that important (tab here only saves 1 character!), but some possible alternatives for short names: np.marr (matlab-like array construction) np.sarr (string array) np.parse short like np.s (didn't know there is already s_) something long like np.fromstring('1 2', sep=' ') array([ 1., 2.]) np.fromstring2d('1 2 3; 5 3.4 7') array([[ 1. , 2. , 3. ], [ 5. , 3.4, 7. ]]) Josef -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
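A minimal sketch of what a `str2arr`-style builder (one of the hypothetical names floated in this thread, not an existing NumPy function) might look like, using only string splitting plus np.array, with the '1 2 3' → 1-d and '1 2 3;' → 2-d convention discussed above:

```python
import numpy as np

def str2arr(s, dtype=float):
    # Hypothetical helper: whitespace separates entries, ';' separates rows.
    # A string without any ';' builds a 1-d array; any ';' forces 2-d.
    rows = [r.split() for r in s.split(';') if r.strip()]
    arr = np.array(rows, dtype=dtype)
    return arr if ';' in s else arr[0]

print(str2arr('1 2 3'))            # 1-d, shape (3,)
print(str2arr('1 2 3;'))           # 2-d, shape (1, 3)
print(str2arr('1 2 3; 5 3.4 7'))   # 2-d, shape (2, 3)
```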
Re: [Numpy-discussion] Switch to using ATLAS for OSX binary wheels
On Mon, Jun 16, 2014 at 7:10 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sun, Jun 15, 2014 at 10:51 PM, Ralf Gommers ralf.gomm...@gmail.com wrote: On Sat, Jun 14, 2014 at 11:56 AM, Matthew Brett matthew.br...@gmail.com wrote: On Friday, June 13, 2014, Ralf Gommers ralf.gomm...@gmail.com wrote: On Fri, Jun 13, 2014 at 2:07 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, Summary : I'm planning to upload OSX wheels for numpy and scipy using the ATLAS blas / lapack library instead of the default OSX Accelerate framework. We've run into some trouble with a segfault in recent OSX Accelerate: https://github.com/numpy/numpy/issues/4007 and Accelerate also doesn't play well with multiprocessing. https://github.com/numpy/numpy/issues/4776 Because there's nothing I love more than half-day compilation runs on my laptop, I've built ATLAS binaries with gcc 4.8, and linked numpy and scipy to them to make OSX wheels. These pass all tests in i386 and x86_64 mode, including numpy, scipy, matplotlib, pandas: https://travis-ci.org/matthew-brett/scipy-stack-osx-testing/builds/27442987 The build process needs some automating, but it's recorded here: https://github.com/matthew-brett/numpy-atlas-binaries It's possible to get travis-ci to build these guys from a bare machine and then upload them somewhere, but I haven't tried to do that. Meanwhile Sturla kindly worked up a patch to numpy to work round the Accelerate segfault [1]. I haven't tested that, but given I'd already built the wheels, I prefer the ATLAS builds because they work with multiprocessing. I propose uploading these wheels as the default for numpy and scipy. Does anyone have any objection or comments before I go ahead and do that? From your README and wscript I don't see what numpy version you're using to compile scipy against. I got the impression that you used 1.8.1, but it should be numpy 1.5.1 for the 2.7 build, and 1.7.1 for 3.x. 
I've tried the scipy 0.14.0 python2.7 wheel, but I get import errors (see below). Your wheels should work with all common Python installs (mine is homebrew) right? Ralf $ python2.7 -c import scipy; scipy.test() Running unit tests for scipy NumPy version 1.9.0.dev-056ab73 NumPy is installed in /usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy SciPy version 0.14.0 SciPy is installed in /usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy Python version 2.7.5 (default, Jun 18 2013, 21:21:44) [GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] nose version 1.3.0 E...EEEE == ERROR: Failure: ImportError (dlopen(/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/cluster/_hierarchy_wrap.so, 2): Symbol not found: _PyModule_Create2 Referenced from: /usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/cluster/_hierarchy_wrap.so Expected in: flat namespace in /usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/cluster/_hierarchy_wrap.so) -- Traceback (most recent call last): File /usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/loader.py, line 413, in loadTestsFromName addr.filename, addr.module) File /usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py, line 47, in importFromPath return self.importFromDir(dir_path, fqname) File /usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/importer.py, line 94, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File /usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/cluster/__init__.py, line 27, in module from . 
import vq, hierarchy File /usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/cluster/hierarchy.py, line 175, in module from . import _hierarchy_wrap ImportError: dlopen(/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/cluster/_hierarchy_wrap.so, 2): Symbol not found: _PyModule_Create2 Referenced from: /usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/cluster/_hierarchy_wrap.so Expected in: flat namespace in
Re: [Numpy-discussion] numpy.random selection from a loglogistic distribution?
On Mon, Jun 16, 2014 at 1:38 PM, Liz VanWormer evanwor...@ucdavis.edu wrote: Dear all, I'm a novice Python user, and I need to draw random variables from a loglogistic distribution. I've used the numpy.random command in the past to select variables from supported distributions (beta, normal, lognormal, etc). Is there a way to add in a distribution to be used with the numpy.random command? You or someone else could add a new distribution to numpy.random. However, scipy has the fisk distribution, which according to Wikipedia is the same as the log-logistic. fisk is implemented as a special case of burr and has an explicit inverse cdf (ppf), so fisk.rvs should be reasonably fast for vectorized calls, even though its per-call overhead is larger than that of the numpy.random functions. http://en.wikipedia.org/wiki/Log-logistic_distribution to make distribution names more fun: Wikipedia: generalized log-logistic distribution: These include the Burr Type XII distribution (also known as the Singh-Maddala distribution) Josef Thank you for your insight, Liz ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
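To make the suggestion concrete, here is a small sketch (the shape parameter value is made up for illustration, and the `random_state` keyword assumes a reasonably recent scipy): scipy's `fisk` distribution is the log-logistic, and because its ppf is explicit, the same draws can also be produced by hand with plain numpy via inverse-CDF sampling.

```python
import numpy as np
from scipy import stats

c = 2.5  # log-logistic shape parameter (illustrative value)

# draws from scipy's fisk (= log-logistic) distribution
x = stats.fisk.rvs(c, size=10000, random_state=0)

# equivalent inverse-CDF sampling by hand: with scale 1 the
# log-logistic inverse cdf is ppf(q) = (q / (1 - q))**(1/c)
u = np.random.RandomState(1).uniform(size=10000)
y = (u / (1.0 - u)) ** (1.0 / c)

# both samples have median 1.0, the median of the standard log-logistic
print(np.median(x), np.median(y))
```

Both medians should come out close to 1.0, since the standard log-logistic has median equal to its scale.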
Re: [Numpy-discussion] ANN: NumPy 1.9.0 beta release
On Mon, Jun 9, 2014 at 8:49 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Jun 9, 2014 at 6:47 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Jun 9, 2014 at 6:23 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Jun 9, 2014 at 6:21 PM, Jeff Reback jeffreb...@gmail.com wrote: The one pandas test failure that is valid: ERROR: test_interp_regression (pandas.tests.test_generic.TestSeries) has been fixed in pandas master / 0.14.1 (prob releasing in 1 month). (the other test failures are for clipboard / network issues) Thanks for the feedback. snip Looks to me like a fair number of the other test failures are actually bugs in the tests, or could be argued to be so. Or rather, actual bugs, but not in numpy. I can see this will take a while to settle ;) not really a bug on our side given the old numpy behavior I was surprised about the failure because it still produces the correct result. params[nuis_param_index] = nuisance_params ValueError: shape mismatch: value array of shape (2,) could not be broadcast to indexing result of shape (0,) As far as I have figured out based on Christoph's test failures `nuis_param_index` is array([], dtype=int32) and because numpy didn't assign anything, `nuisance_params` was picked essentially arbitrary along this code path x = np.arange(5.) we_dont_care_what_this_is = np.random.randn(10) x[[]] = we_dont_care_what_this_is x array([ 0., 1., 2., 3., 4.]) Does x[[]] = [] work with numpy 1.9? Josef Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
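A minimal illustration of the indexing behavior under discussion (whether the mismatched case raises depends on the numpy version: 1.9+ raises, older releases silently ignored the value array, which is what hid this code path):

```python
import numpy as np

x = np.arange(5.)

# assigning through an empty fancy index touches zero elements; this is
# always legal when the value array is also empty
x[[]] = []
print(x)  # array unchanged

# a value array whose shape cannot broadcast to the empty indexing result
# raises on numpy >= 1.9; older numpy silently did nothing
try:
    x[[]] = np.random.randn(10)
    print("no error (old-numpy behavior)")
except ValueError as e:
    print("ValueError:", e)
```

In either branch the array is left untouched, since zero elements are assigned.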
Re: [Numpy-discussion] ANN: NumPy 1.9.0 beta release
On Mon, Jun 9, 2014 at 11:10 PM, josef.p...@gmail.com wrote: On Mon, Jun 9, 2014 at 8:49 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Jun 9, 2014 at 6:47 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Jun 9, 2014 at 6:23 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Jun 9, 2014 at 6:21 PM, Jeff Reback jeffreb...@gmail.com wrote: The one pandas test failure that is valid: ERROR: test_interp_regression (pandas.tests.test_generic.TestSeries) has been fixed in pandas master / 0.14.1 (prob releasing in 1 month). (the other test failures are for clipboard / network issues) Thanks for the feedback. snip Looks to me like a fair number of the other test failures are actually bugs in the tests, or could be argued to be so. Or rather, actual bugs, but not in numpy. I can see this will take a while to settle ;) not really a bug on our side given the old numpy behavior forgot to define: our side = statsmodels I was surprised about the failure because it still produces the correct result. params[nuis_param_index] = nuisance_params ValueError: shape mismatch: value array of shape (2,) could not be broadcast to indexing result of shape (0,) As far as I have figured out based on Christoph's test failures `nuis_param_index` is array([], dtype=int32) and because numpy didn't assign anything, `nuisance_params` was picked essentially arbitrary along this code path x = np.arange(5.) we_dont_care_what_this_is = np.random.randn(10) x[[]] = we_dont_care_what_this_is x array([ 0., 1., 2., 3., 4.]) Does x[[]] = [] work with numpy 1.9? Josef Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] big-bangs versus incremental improvements (was: Re: SciPy 2014 BoF NumPy Participation)
On Thu, Jun 5, 2014 at 8:58 AM, Todd toddr...@gmail.com wrote: On 5 Jun 2014 14:28, David Cournapeau courn...@gmail.com wrote: On Thu, Jun 5, 2014 at 9:44 AM, Todd toddr...@gmail.com wrote: On 5 Jun 2014 02:57, Nathaniel Smith n...@pobox.com wrote: On Wed, Jun 4, 2014 at 7:18 AM, Travis Oliphant tra...@continuum.io wrote: And numpy will be much harder to replace than numeric -- numeric wasn't the most-imported package in the pythonverse ;-). If numpy is really such a core part of the python ecosystem, does it really make sense to keep it as a stand-alone package? Rather than thinking about a numpy 2, might it be better to focus on getting ndarray and dtype to a level of quality where acceptance upstream might be plausible? There have been discussions about integrating numpy a long time ago (can't find a reference right now), and the consensus was that this was neither possible in its current shape nor advisable. The situation has not changed. Putting something in the stdlib means it basically cannot change anymore: API compatibility requirements would be stronger than what we provide even now. NumPy is also a large codebase which would need some major clean up to be accepted, etc... David I am not suggesting merging all of numpy, only ndarray and dtype (which I know is a huge job itself). And perhaps not even all of what is currently included in those; some methods could be split out to their own functions. And any numpy 2.0 would also imply a major code cleanup. So although ndarray and dtype are certainly not ready for such a thing right now, if you are talking about numpy 2.0 already, perhaps part of that discussion could involve a plan to get the code into a state where such a move might be plausible. Even if the merge doesn't actually happen, having the code at that quality level would still be a good thing.
I agree that the relationship between numpy and python has not changed very much in the last few years, but I think the scientific computing landscape is changing. The latter issue is where my primary concern lies. I don't think it would have any effect on scientific computing users. It might be useful for other users that occasionally want to do a bit of array processing. Scientific users need the extended SciPy stack and not a part of numpy that can be imported from the standard library. For example in Data Science, where I pay more attention and where Python is getting pretty popular, the usual recommended list requires numpy scipy and 5 to 10 more python libraries. Should pandas also go into the python standard library? Python 3.4 got a statistics library, but I don't think it has any effect on the potential statsmodels user base. Josef ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] big-bangs versus incremental improvements (was: Re: SciPy 2014 BoF NumPy Participation)
On Thu, Jun 5, 2014 at 8:40 AM, David Cournapeau courn...@gmail.com wrote: On Thu, Jun 5, 2014 at 3:36 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Wed, Jun 4, 2014 at 7:29 PM, Travis Oliphant tra...@continuum.io wrote: Believe me, I'm all for incremental changes if it is actually possible and doesn't actually cost more. It's also why I've been silent until now about anything we are doing being a candidate for a NumPy 2.0. I understand the challenges of getting people to change. But, features and solid improvements *will* get people to change --- especially if their new library can be used along with the old library and the transition can be done gradually. Python 3's struggle is the lack of features. At some point there *will* be a NumPy 2.0. What features go into NumPy 2.0, how much backward compatibility is provided, and how much porting is needed to move your code from NumPy 1.X to NumPy 2.X is the real user question --- not whether it is characterized as incremental change or re-write. What I call a re-write and what you call an incremental-change are two points on a spectrum and likely overlap significantly if we really compared what we are thinking about. One huge benefit that came out of the numeric / numarray / numpy transition that we mustn't forget about was actually the extended buffer protocol and memory view objects. This really does allow multiple array objects to co-exist and libraries to use the object that they prefer in a way that did not exist when Numarray / numeric / numpy came out. So, we shouldn't be afraid of that world. The existence of easy package managers to update environments to try out new features and have applications on a single system that use multiple versions of the same library is also something that didn't exist before and that will make any transition easier for users.
One thing I regret about my working on NumPy originally is that I didn't have the foresight, skill, and understanding to work more on a more extended and better designed multiple-dispatch system so that multiple array objects could participate together in an expression flow. The __numpy_ufunc__ mechanism gives enough capability in that direction that it may be better now. Ultimately, I don't disagree that NumPy can continue to exist in incremental change mode ( though if you are swapping out whole swaths of C-code for Cython code --- it sounds a lot like a re-write) as long as there are people willing to put the effort into changing it. I think this is actually benefited by the existence of other array objects that are pushing the feature envelope without the constraints --- in much the same way that the Python standard library is benefitted by many versions of different capabilities being tried out before moving into the standard library. I remain optimistic that things will continue to improve in multiple ways --- if a little messier than any of us would conceive individually. It *is* great to see all the PR's coming from multiple people on NumPy and all the new energy around improving things whether great or small. @nathaniel IIRC, one of the objections to the missing values work was that it changed the underlying array object by adding a couple of variables to the structure. I'm willing to do that sort of thing, but it would be good to have general agreement that that is acceptable. I think changing the ABI for some versions of numpy (2.0 , whatever) is acceptable. There is little doubt that the ABI will need to change to accommodate a better and more flexible architecture. Changing the C API is more tricky: I am not up to date to the changes from the last 2-3 years, but at that time, most things could have been changed internally without breaking much, though I did not go far enough to estimate what the performance impact could be (if any). 
My impression is that you can do it once (in a while) so that no more than two incompatible versions of numpy are alive at the same time. It doesn't look worse to me than supporting a new python version, but it doubles the number of binaries and wheels. (Supporting Python 3.4 for Cython-based projects meant hoping that Cython would take care of it, and the Cython developers did take care of it.) Josef As to blaze/dynd, I'd like to steal bits here and there, and maybe in the long term base numpy on top of it with a compatibility layer. There is a lot of thought and effort that has gone into those projects and we should use what we can. As is, I think numpy is good for another five to ten years and will probably hang on for fifteen, but it will be outdated by the end of that period. Like great whites, we need to keep swimming just to have oxygen. Software projects tend to be obligate ram ventilators. The Python 3 experience is definitely something we want to avoid. And while blaze does big data and offers some nice features, I don't know
Re: [Numpy-discussion] ANN: Pandas 0.14.0 released
No comment, Josef On Fri, May 30, 2014 at 6:16 PM, Neal Becker ndbeck...@gmail.com wrote: pip install --user --up pandas Downloading/unpacking pandas from https://pypi.python.org/packages/source/p/pandas/pandas-0.14.0.tar.gz#md5=b775987c0ceebcc8d5ace4a1241c967a ... Downloading/unpacking numpy=1.6.1 from https://pypi.python.org/packages/source/n/numpy/numpy-1.8.1.tar.gz#md5=be95babe263bfa3428363d6db5b64678 (from pandas) Downloading numpy-1.8.1.tar.gz (3.8MB): 3.8MB downloaded Running setup.py egg_info for package numpy Running from numpy source directory. warning: no files found matching 'tools/py3tool.py' warning: no files found matching '*' under directory 'doc/f2py' warning: no previously-included files matching '*.pyc' found anywhere in distribution warning: no previously-included files matching '*.pyo' found anywhere in distribution warning: no previously-included files matching '*.pyd' found anywhere in distribution Downloading/unpacking six from https://pypi.python.org/packages/source/s/six/six-1.6.1.tar.gz#md5=07d606ac08595d795bf926cc9985674f (from python-dateutil-pandas) Downloading six-1.6.1.tar.gz Running setup.py egg_info for package six no previously-included directories found matching 'documentation/_build' Installing collected packages: pandas, pytz, numpy, six What? I already have numpy-1.8.0 installed (also have six, pytz). ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] smoothing function
On Thu, May 15, 2014 at 8:04 AM, rodrigo koblitz rodrigokobl...@gmail.com wrote: Buenos, I'm reading Zuur book (ecology models with R) and trying to make it entirely in python. Have this function in R: M4 <- gam(So ∼ s(De) + factor(ID), subset = I1) the 's' term indicates that So is modelled as a smoothing function of De. I'm looking for something close to this in python. These kinds of general questions are better asked on the scipy-user mailing list, which covers more general topics than numpy-discussion. As far as I know, GAMs are not available in python, at least I never came across any. statsmodels has an ancient GAM in the sandbox that has never been connected to any smoother, since lowess, spline, and kernel regression support was missing. Nobody is working on that right now. If you have only a single nonparametric variable, then statsmodels also has a partial linear model based on kernel regression, which is not cleaned up or verified, but Padarn is currently working on this. I think in this case using a penalized linear model with spline basis functions would be more efficient, but there is also nothing clean available, AFAIK. It's not too difficult to write the basic models, but it takes time to figure out the last 10% and to verify the results and write unit tests. If you make your code publicly available, then I would be very interested in a link. I'm trying to collect examples from books that have a python solution. Josef Can someone help me? abraços, Koblitz ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] smoothing function
On Thu, May 15, 2014 at 12:17 PM, Nathaniel Smith n...@pobox.com wrote: On Thu, May 15, 2014 at 1:04 PM, rodrigo koblitz rodrigokobl...@gmail.com wrote: Buenos, I'm reading Zuur book (ecology models with R) and trying to make it entirely in python. Have this function in R: M4 <- gam(So ∼ s(De) + factor(ID), subset = I1) the 's' term indicates that So is modelled as a smoothing function of De. I'm looking for something close to this in python. The closest thing that doesn't require writing your own code is probably to use patsy's [1] support for (simple unpenalized) spline basis transformations [2]. I think using statsmodels this works like: import statsmodels.formula.api as smf # adjust '5' to taste -- bigger = wigglier, less bias, more overfitting results = smf.ols("So ~ bs(De, 5) + C(ID)", data=my_df).fit() print results.summary() Nice To graph the resulting curve you'll want to use the results to somehow do prediction -- I'm not sure what the API for that looks like in statsmodels. If you need help figuring it out then asking on the statsmodels list or stackoverflow is probably the quickest way to get help. seems to work (in a very simple made up example) results.predict({'De':np.arange(1,5), 'ID':['a']*4}, transform=True) #array([ 0.75 , 1.0833, 0.75 , 0.4167]) Josef -n [1] http://patsy.readthedocs.org/en/latest/ [2] http://patsy.readthedocs.org/en/latest/builtins-reference.html#patsy.builtins.bs -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
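For readers without patsy at hand, the unpenalized spline regression that `bs(De, 5)` sets up can be sketched with plain numpy using a truncated-power cubic basis. The data and knot placement below are made up for illustration, and patsy's actual basis is a B-spline, not this one; the point is only that the "spline term" is just extra columns in an ordinary least-squares design matrix.

```python
import numpy as np

rng = np.random.RandomState(0)
de = np.sort(rng.uniform(0, 10, size=200))
so = np.sin(de) + rng.normal(scale=0.3, size=de.size)

# truncated-power cubic spline basis with knots at the quartiles
knots = np.percentile(de, [25, 50, 75])
columns = [np.ones_like(de), de, de**2, de**3]
columns += [np.clip(de - k, 0, None) ** 3 for k in knots]
X = np.column_stack(columns)

# ordinary least squares, the same fit smf.ols performs on its design matrix
coef, _, _, _ = np.linalg.lstsq(X, so, rcond=None)
fitted = X @ coef
print("residual std:", (so - fitted).std())
```

The residual standard deviation should come out near the noise level (0.3 here), showing the spline columns absorb the smooth signal.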
Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing
On Sun, Apr 27, 2014 at 6:06 PM, Carl Kleffner cmkleff...@gmail.com wrote: A possible option is to install the toolchain inside site-packages and to deploy it as PYPI wheel or wininst packages. The PATH to the toolchain could be extended during import of the package. But I have no idea what's the best strategy to additionally install ATLAS or other third party libraries. What I did in the past is just to download the ATLAS binaries from the scipy/numpy wiki and move them into the Python/Dlls directory. IIRC My impression was that finding ATLAS binaries was the difficult part, not moving them into the right directory. Cheers, Carl 2014-04-27 23:46 GMT+02:00 Matthew Brett matthew.br...@gmail.com: Hi, On Sun, Apr 27, 2014 at 2:34 PM, Carl Kleffner cmkleff...@gmail.com wrote: Hi, I definitely won't have time until Thursday this week to work out the github workflow for a numpy pull request. So feel free to do it for me. OK - I will have a go at this tomorrow. BTW: There is a missing feature in the mingw-w64 toolchain. So far it supports linking only to the msvcr90 runtime. I have to extend the specs file to allow linking to msvcr100 with an additional flag. Or create a dedicated toolchain - what do you think? I don't know. Is this a discussion that should go to the mingw-w64 list do you think? It must be a very common feature. As you know, I'm really hoping it will be possible to make a devkit for Python similar to the Ruby devkits [1]. I got my entire initial setup on the computer I'm using right now through python-xy, including MingW 32. The only thing I ever had to do was to create the `distutils.cfg` in a new python install. python-xy relies on the availability of an open source development environment for numpy and scipy, and has been restricted so far to 32-bit python versions.
winpython is only a python distribution and is also available for 64bit (with Gohlke binaries I think) I think it would be very helpful to get python-xy set up for development for 64 bit versions, now that the toolchain with MingW is available. I'm skeptical about having lots of distributions that install all their own full toolchain (I always worry about which one is actually on the path. I deleted my first git for Windows version because it came with a built-in MSYS/MingW toolchain and now just use the nice and small portable version.) The ideal would be a devkit that transparently picked up 32 vs 64 bit, and MSVC runtime according to the Python version. For example, OSX compilation automatically picks up the OSX SDK with which the relevant Python was built. Do you think something like this is possible? That would be a great improvement for people building extensions and wheels on Windows. How does MingW64 decide whether to build 32 or 64 bit versions? Does the python version matter for MingW, or should this pick up one of the Visual SDK's that the user needs to install? Josef Cheers, Matthew [1] http://rubyinstaller.org/add-ons/devkit/ ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
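The `distutils.cfg` mentioned above is, in the usual MinGW setup, just a short file in the Python install's `Lib/distutils/` directory telling distutils which compiler to use. This is a sketch from common practice, not quoted from the thread; verify against your own install:

```ini
[build]
compiler = mingw32
```

With that in place, `python setup.py build` picks up the MinGW toolchain from the PATH instead of looking for MSVC.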
Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing
On Fri, Apr 25, 2014 at 1:21 AM, Matthew Brett matthew.br...@gmail.comwrote: Hi, On Thu, Apr 24, 2014 at 5:26 PM, josef.p...@gmail.com wrote: On Thu, Apr 24, 2014 at 7:29 PM, josef.p...@gmail.com wrote: On Thu, Apr 24, 2014 at 7:20 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Thu, Apr 24, 2014 at 5:08 PM, josef.p...@gmail.com wrote: On Thu, Apr 24, 2014 at 7:00 PM, Charles R Harris charlesr.har...@gmail.com wrote: Hi Matthew, On Thu, Apr 24, 2014 at 3:56 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, Thanks to Cark Kleffner's toolchain and some help from Clint Whaley (main author of ATLAS), I've built 64-bit windows numpy and scipy wheels for testing. The build uses Carl's custom mingw-w64 build with static linking. There are two harmless test failures on scipy (being discussed on the list at the moment) - tests otherwise clean. Wheels are here: https://nipy.bic.berkeley.edu/scipy_installers/numpy-1.8.1-cp27-none-win_amd64.whl https://nipy.bic.berkeley.edu/scipy_installers/scipy-0.13.3-cp27-none-win_amd64.whl You can test with: pip install -U pip # to upgrade pip to latest pip install -f https://nipy.bic.berkeley.edu/scipy_installers numpy scipy Please do send feedback. ATLAS binary here: https://nipy.bic.berkeley.edu/scipy_installers/atlas_builds/atlas-64-full-sse2.tar.bz2 Many thanks for Carl in particular for doing all the hard work, Cool. After all these long years... Now all we need is a box running tests for CI. 
Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion I get two test failures with numpy Josef np.test() Running unit tests for numpy NumPy version 1.8.1 NumPy is installed in C:\Python27\lib\site-packages\numpy Python version 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] nose version 1.1.2 == FAIL: test_iterator.test_iter_broadcasting_errors -- Traceback (most recent call last): File C:\Python27\lib\site-packages\nose\case.py, line 197, in runTest self.test(*self.arg) File C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py, line 657, in test_iter_broadcasting_errors '(2)-(2,newaxis)') % msg) File C:\Python27\lib\site-packages\numpy\testing\utils.py, line 44, in assert_ raise AssertionError(msg) AssertionError: Message operands could not be broadcast together with remapped shapes [original-remapped]: (2,3)-(2,3) (2,)-(2,newaxis) and requested shape (4,3) doesn't contain remapped operand shape(2)-(2,newaxis) == FAIL: test_iterator.test_iter_array_cast -- Traceback (most recent call last): File C:\Python27\lib\site-packages\nose\case.py, line 197, in runTest self.test(*self.arg) File C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py, line 836, in test_iter_array_cast assert_equal(i.operands[0].strides, (-96,8,-32)) File C:\Python27\lib\site-packages\numpy\testing\utils.py, line 255, in assert_equal assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k, err_msg), verbose) File C:\Python27\lib\site-packages\numpy\testing\utils.py, line 317, in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: item=0 ACTUAL: 96L DESIRED: -96 -- Ran 4828 tests in 46.306s FAILED (KNOWNFAIL=10, SKIP=8, failures=2) nose.result.TextTestResult run=4828 errors=0 failures=2 Strange. That second one looks familiar, at least the -96 part. Wonder why this doesn't show up with the MKL builds. 
ok tried again, this time deleting the old numpy directories before installing Ran 4760 tests in 42.124s OK (KNOWNFAIL=10, SKIP=8) nose.result.TextTestResult run=4760 errors=0 failures=0 so pip also seems to be reusing leftover files. all clear. Running the statsmodels test suite, I get a failure in test_discrete.TestProbitCG where fmin_cg converges to something that differs in the 3rd decimal. I usually only test the 32-bit version, so I don't know if this is specific to this scipy version, but we haven't seen this in a long time. I used our nightly binaries http://statsmodels.sourceforge.net/binaries/ That's interesting, you saw also we're getting failures on the tests for powell optimization because of small unit-at-last-place differences in the exp function in mingw-w64. Is there any chance you can track down
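The -96 in the failing assertion above is a negative stride, which numpy produces whenever a view reverses an axis (the failing test itself is internal to the iterator machinery, so this is only an illustration of where such values come from):

```python
import numpy as np

# a C-ordered float64 array of shape (2, 3, 4): the strides are
# (96, 32, 8) because each element is 8 bytes
a = np.arange(24.0).reshape(2, 3, 4)
print(a.strides)  # (96, 32, 8)

# reversing an axis flips the sign of that axis's stride:
# the view walks the same memory backwards
b = a[::-1]
print(b.strides)  # (-96, 32, 8)
```

Whether a test sees 96 or -96 therefore depends on whether the array under test is a forward or reversed view of its buffer.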
Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing
On Sat, Apr 26, 2014 at 10:20 AM, josef.p...@gmail.com wrote: On Sat, Apr 26, 2014 at 10:10 AM, josef.p...@gmail.com wrote: On Fri, Apr 25, 2014 at 1:21 AM, Matthew Brett matthew.br...@gmail.comwrote: Hi, On Thu, Apr 24, 2014 at 5:26 PM, josef.p...@gmail.com wrote: On Thu, Apr 24, 2014 at 7:29 PM, josef.p...@gmail.com wrote: On Thu, Apr 24, 2014 at 7:20 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Thu, Apr 24, 2014 at 5:08 PM, josef.p...@gmail.com wrote: On Thu, Apr 24, 2014 at 7:00 PM, Charles R Harris charlesr.har...@gmail.com wrote: Hi Matthew, On Thu, Apr 24, 2014 at 3:56 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, Thanks to Cark Kleffner's toolchain and some help from Clint Whaley (main author of ATLAS), I've built 64-bit windows numpy and scipy wheels for testing. The build uses Carl's custom mingw-w64 build with static linking. There are two harmless test failures on scipy (being discussed on the list at the moment) - tests otherwise clean. Wheels are here: https://nipy.bic.berkeley.edu/scipy_installers/numpy-1.8.1-cp27-none-win_amd64.whl https://nipy.bic.berkeley.edu/scipy_installers/scipy-0.13.3-cp27-none-win_amd64.whl You can test with: pip install -U pip # to upgrade pip to latest pip install -f https://nipy.bic.berkeley.edu/scipy_installersnumpy scipy Please do send feedback. ATLAS binary here: https://nipy.bic.berkeley.edu/scipy_installers/atlas_builds/atlas-64-full-sse2.tar.bz2 Many thanks for Carl in particular for doing all the hard work, Cool. After all these long years... Now all we need is a box running tests for CI. 
Chuck

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

I get two test failures with numpy.

Josef

>>> np.test()
Running unit tests for numpy
NumPy version 1.8.1
NumPy is installed in C:\Python27\lib\site-packages\numpy
Python version 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)]
nose version 1.1.2

FAIL: test_iterator.test_iter_broadcasting_errors
Traceback (most recent call last):
  File C:\Python27\lib\site-packages\nose\case.py, line 197, in runTest
    self.test(*self.arg)
  File C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py, line 657, in test_iter_broadcasting_errors
    '(2)->(2,newaxis)') % msg)
  File C:\Python27\lib\site-packages\numpy\testing\utils.py, line 44, in assert_
    raise AssertionError(msg)
AssertionError: Message operands could not be broadcast together with remapped shapes [original->remapped]: (2,3)->(2,3) (2,)->(2,newaxis) and requested shape (4,3) doesn't contain remapped operand shape (2)->(2,newaxis)

FAIL: test_iterator.test_iter_array_cast
Traceback (most recent call last):
  File C:\Python27\lib\site-packages\nose\case.py, line 197, in runTest
    self.test(*self.arg)
  File C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py, line 836, in test_iter_array_cast
    assert_equal(i.operands[0].strides, (-96,8,-32))
  File C:\Python27\lib\site-packages\numpy\testing\utils.py, line 255, in assert_equal
    assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k, err_msg), verbose)
  File C:\Python27\lib\site-packages\numpy\testing\utils.py, line 317, in assert_equal
    raise AssertionError(msg)
AssertionError: Items are not equal:
item=0
ACTUAL: 96L
DESIRED: -96

Ran 4828 tests in 46.306s
FAILED (KNOWNFAIL=10, SKIP=8, failures=2)
<nose.result.TextTestResult run=4828 errors=0 failures=2>

Strange. That second one looks familiar, at least the -96 part. Wonder why this doesn't show up with the MKL builds.
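The 96L vs. -96 disagreement in that second failure is about array strides. As a minimal illustration of where negative strides like -96 come from (this sketch only shows the stride mechanics; it is not the logic of test_iter_array_cast itself):

```python
import numpy as np

# A C-contiguous (2, 3, 4) float64 array has strides (96, 32, 8):
# 3*4*8 bytes to step along axis 0, 4*8 along axis 1, 8 along axis 2.
a = np.zeros((2, 3, 4), dtype=np.float64)
print(a.strides)   # (96, 32, 8)

# Reversing an axis with a [::-1] slice produces a view whose stride
# along that axis has the opposite sign.
b = a[::-1, :, ::-1]
print(b.strides)   # (-96, 32, -8)
```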
ok tried again, this time deleting the old numpy directories before installing:

Ran 4760 tests in 42.124s
OK (KNOWNFAIL=10, SKIP=8)
<nose.result.TextTestResult run=4760 errors=0 failures=0>

so pip also seems to be reusing leftover files. All clear.

Running the statsmodels test suite, I get a failure in test_discrete.TestProbitCG, where fmin_cg converges to something that differs in the 3rd decimal. I usually only test the 32-bit version, so I don't know whether this is specific to this scipy version, but we haven't seen this in a long time. I used our nightly binaries: http://statsmodels.sourceforge.net/binaries/

That's interesting; you saw we're also getting failures on the tests for powell optimization because of small unit-at-the-last-place differences in exp.
Re: [Numpy-discussion] 1.9.x branch
On Sat, Apr 26, 2014 at 1:12 PM, Sebastian Berg sebast...@sipsolutions.net wrote:
On Wed, 2014-04-23 at 10:11 -0400, josef.p...@gmail.com wrote:
On Wed, Apr 23, 2014 at 5:32 AM, Sebastian Berg sebast...@sipsolutions.net wrote:
On Tue, 2014-04-22 at 15:35 -0600, Charles R Harris wrote:

Hi All,

I'd like to branch 1.9.x at the end of the month. There are a couple of reasons for the timing. First, we have a lot of new stuff in the development branch. Second, there is work ongoing in masked arrays that I'd like to keep out of the release so that it has more time to settle. Third, it's past time ;)

Sounds good.

There are currently a number of 1.9.0 blockers, which can be seen here.

Datetime timezone handling broken in 1.7.x: I don't think there is time to get this done for 1.9.0 and it needs to be pushed off to 1.10.0.

Return multiple field selection as ro view: I have a branch for this, but because the returned type differs from a copy by alignment spacing there was a test failure. Merging that branch might cause some incompatibilities.

I am a bit worried here that comparisons might make trouble.

Object array creation new conversion to int: This one needs a decision. Julian, Sebastian, thoughts?

Maybe for all to consider, this is about what happens for object arrays if you do things like:

# Array cast to object array (np.array(arr) would be identical):
a = np.arange(10).astype(object)
# Array passed into new array creation (not just *one* array):
b = np.array([np.arange(10)], dtype=object)
# Numerical array is assigned to object array:
c = np.empty(10, dtype=object)
c[...] = np.arange(10)

Before this change, the result was:

type(a[0]) is int
type(b[0,0]) is np.int_   # Note the numpy type
type(c[0]) is int

After this change, they are all `int`. Though note that the numpy type is preserved, for example, for long double.
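Sebastian's three construction paths can be tried directly. The element types below reflect the post-change behaviour described in this thread (all Python int for default integer dtypes); the 1.8-era releases under discussion returned np.int_ for the second and behaviour varied by path:

```python
import numpy as np

# (a) cast of a numerical array to object dtype
a = np.arange(10).astype(object)

# (b) array creation from a list containing a numerical array
b = np.array([np.arange(10)], dtype=object)

# (c) assignment of a numerical array into an empty object array
c = np.empty(10, dtype=object)
c[...] = np.arange(10)

# Inspect which scalar type each path stored in the object array.
print(type(a[0]), type(b[0, 0]), type(c[0]))
```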
On the one hand preserving the numpy type might be nice, but on the other hand we don't care much about the dtypes of scalars, and in practice the python types are probably more often wanted.

what if I don't like python?

Fair point :). I also think it is more consistent if we use the numpy types (and the user has to cast to the python type explicitly). One could argue that if someone casts to object, they might like python objects, but if you don't want them, that is tricky too.

There is the option that the numerical array -> object array cast always returns an array of numpy scalars, making it consistent (opposite to what is currently in numpy master). This would mean that `numerical_arr.astype(object)` would always give an array of numpy scalars. Getting python scalars would only be possible using `arr.item()` (or `tolist`). I guess that change is less likely to cause problems, because the numpy types can do more things normally, though they are slower. So a (still a bit unsure) +1 from me for making numeric -> object casts return arrays of numpy scalars, unless we have reason to expect that to cause problems. Not sure how easy that would be to change; it wasn't a one-line change when I tried, so there is something slightly tricky to clear out, but probably not too hard either.

More general background question: why is there casting involved in object arrays at all? I thought object arrays would just take and return whatever you put in. If I use python ints, I might want python ints. If I use numpy int_s, I might want numpy ints.
b1 = np.array([np.arange(10)], dtype=object) versus b2 = np.array([list(range(10))], dtype=object):

>>> b1 = np.array([np.arange(10)], dtype=object)
>>> type(b1[0,2])
<type 'numpy.int32'>
>>> type(b1[0][2])
<type 'numpy.int32'>
>>> b2 = np.array([list(range(10))], dtype=object)
>>> type(b2[0,2])
<type 'int'>
>>> type(b2[0][2])
<type 'int'>

another version:

>>> type(np.array(np.arange(10).tolist(), dtype=object)[2])
<type 'int'>
>>> type(np.array(list(np.arange(10)), dtype=object)[2])
<type 'numpy.int32'>

Josef

- Sebastian

>>> np.int_(0)**(-1)
inf
>>> 0**-1
Traceback (most recent call last):
  File "<pyshell#28>", line 1, in <module>
    0**-1
ZeroDivisionError: 0.0 cannot be raised to a negative power
>>> type(np.arange(5)[0])
<type 'numpy.int32'>
>>> np.arange(5)[0]**-1
inf
>>> type(np.arange(5)[0].item())
<type 'int'>
>>> np.arange(5)[0].item()**-1
Traceback (most recent call last):
  File "<pyshell#40>", line 1, in <module>
    np.arange(5)[0].item()**-1
ZeroDivisionError: 0.0 cannot be raised to a negative power
>>> np.__version__
'1.6.1'

I remember struggling through this (avoiding python operations) quite a bit in my early bugfixes to scipy.stats.distributions. (IIRC I ended up avoiding most ints.)

Josef

Since I just realized that things are safe (float128 does not cast to float after all), I changed my mind and am tempted to keep the new behaviour. That is, if it does
Re: [Numpy-discussion] Slightly off-topic - accuracy of C exp function?
On Sat, Apr 26, 2014 at 6:37 PM, Matthew Brett matthew.br...@gmail.com wrote:
Hi,
On Wed, Apr 23, 2014 at 11:59 AM, Matthew Brett matthew.br...@gmail.com wrote:
Hi,
On Wed, Apr 23, 2014 at 1:43 AM, Nathaniel Smith n...@pobox.com wrote:
On Wed, Apr 23, 2014 at 6:22 AM, Matthew Brett matthew.br...@gmail.com wrote:

Hi,

I'm exploring Mingw-w64 for numpy building, and I've found it gives a slightly different answer for 'exp' than, say, gcc on OSX. The difference is of the order of the eps value for the output number (2 * eps for a result of ~2.0). Is accuracy somewhere specified for C functions like exp? Or is accuracy left as an implementation detail for the C library author?

C99 says (sec 5.2.4.2.2) that "The accuracy of the floating point operations ... and of the library functions in math.h and complex.h that return floating point results is implementation-defined. The implementation may state that the accuracy is unknown." (This last sentence is basically saying that, with regard to some higher-up clauses that required all conforming implementations to document this stuff, saying "eh, who knows" counts as documenting it. Hooray for standards!)

Presumably the accuracy in this case is a function of the C library anyway, not the compiler? The Mingw-w64 implementation is in assembly:

http://sourceforge.net/p/mingw-w64/code/HEAD/tree/trunk/mingw-w64-crt/math/exp.def.h

Numpy has its own implementations for a bunch of the math functions, and it's been unclear in the past whether numpy or the libc implementations were better in any particular case.

I only investigated this particular value, in which case it looked as though the OSX value was closer to the exact value (via sympy.mpmath), by ~1 unit-at-the-last-place. This was causing a divergence in the powell optimization path and therefore a single scipy test failure. I haven't investigated further; I was wondering what investigation I should do, more than running the numpy / scipy test suites.
Investigating further, with this script: https://gist.github.com/matthew-brett/11301221

The following are tests of np.exp accuracy for input values between 0 and 10, for numpy 1.8.1. If np.exp(x) performs perfectly, it will return the nearest floating point value to the exact value of exp(x). If it does, this scores a zero for error in the tables below. If 'proportion of zeros' is 1, then np.exp performs perfectly for all tested values of exp (as is the case for linux here).

OSX 10.9
  Proportion of zeros: 0.99789
  Sum of error: 2.15021267458e-09
  Sum of squared error: 2.47149370032e-14
  Max / min error: 5.96046447754e-08 -2.98023223877e-08
  Sum of squared relative error: 5.22456992025e-30
  Max / min relative error: 2.19700100681e-16 -2.2098803255e-16
  eps: 2.22044604925e-16
  Proportion of relative err >= eps: 0.0

Debian Jessie / Sid
  Proportion of zeros: 1.0
  Sum of error: 0.0
  Sum of squared error: 0.0
  Max / min error: 0.0 0.0
  Sum of squared relative error: 0.0
  Max / min relative error: 0.0 0.0
  eps: 2.22044604925e-16
  Proportion of relative err >= eps: 0.0

Mingw-w64 Windows 7
  Proportion of zeros: 0.82089
  Sum of error: 8.08415331122e-07
  Sum of squared error: 2.90045099615e-12
  Max / min error: 5.96046447754e-08 -5.96046447754e-08
  Sum of squared relative error: 4.18466468175e-28
  Max / min relative error: 2.22041308226e-16 -2.22042100773e-16
  eps: 2.22044604925e-16
  Proportion of relative err >= eps: 0.0

Take-home: the exp implementation for mingw-w64 is exactly (floating point) correct 82% of the time, and one unit-at-the-last-place off for the rest [1]. OSX is off by 1 ULP only 0.2% of the time.
Windows 64 with MKL

\WinPython-64bit-3.3.2.2\python-3.3.2.amd64>python E:\Josef\eclipsegworkspace\statsmodels-git\local_scripts\local_scripts\try_exp_error.py
  Proportion of zeros: 0.99793
  Sum of error: -2.10546855506e-07
  Sum of squared error: 3.33304327526e-14
  Max / min error: 5.96046447754e-08 -5.96046447754e-08
  Sum of squared relative error: 4.98420694339e-30
  Max / min relative error: 2.20881302691e-16 -2.18321571939e-16
  eps: 2.22044604925e-16
  Proportion of relative err >= eps: 0.0

Windows 32 bit python with official MingW binaries

Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)] on win32
  Proportion of zeros: 0.99464
  Sum of error: -3.91621083118e-07
  Sum of squared error: 9.2239247812e-14
  Max / min error: 5.96046447754e-08 -5.96046447754e-08
  Sum of squared relative error: 1.3334972729e-29
  Max / min relative error: 2.21593462148e-16 -2.2098803255e-16
  eps: 2.22044604925e-16
  Proportion of relative err >= eps: 0.0

Is mingw-w64 accurate enough? Do we have any policy on this? I wouldn't worry about a missing or an extra eps in our applications, but the competition is more accurate.

Josef

Cheers,

Matthew

[1] http://matthew-brett.github.io/pydagogue/floating_error.html
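The comparisons above are counted in units at the last place (ULP). Matthew's gist uses sympy.mpmath as the exact reference; as a self-contained sketch of the underlying measure, the ULP distance between two finite doubles of the same sign can be read directly off their bit patterns:

```python
import math
import struct

def ulp_diff(a, b):
    """ULP distance between two finite doubles of the same sign.

    Reinterpreting an IEEE 754 double's bits as a signed integer gives a
    value that increases by exactly 1 per representable step, so the
    integer difference is the ULP distance.
    """
    ia = struct.unpack("<q", struct.pack("<d", a))[0]
    ib = struct.unpack("<q", struct.pack("<d", b))[0]
    return abs(ia - ib)

x = math.exp(1.0)
# The next representable double after x is exactly 1 ULP away:
print(ulp_diff(x, math.nextafter(x, math.inf)))  # 1
```

(`math.nextafter` needs Python 3.9+; the same-sign restriction avoids the discontinuity of the bit pattern across zero.)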
Re: [Numpy-discussion] Slightly off-topic - accuracy of C exp function?
On Sat, Apr 26, 2014 at 8:05 PM, josef.p...@gmail.com wrote:

[snip: full quote of the previous message, including the accuracy tables and the question whether mingw-w64 is accurate enough]

Just for comparison, I increased `until` to 300 the
Re: [Numpy-discussion] Slightly off-topic - accuracy of C exp function?
On Sat, Apr 26, 2014 at 8:31 PM, josef.p...@gmail.com wrote:
On Sat, Apr 26, 2014 at 8:05 PM, josef.p...@gmail.com wrote:

[snip: full quote of the earlier messages in the thread]
Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing
On Thu, Apr 24, 2014 at 7:00 PM, Charles R Harris charlesr.har...@gmail.com wrote:

Hi Matthew,

[snip: quote of Matthew Brett's announcement of the 64-bit windows numpy / scipy wheels]

Cool. After all these long years... Now all we need is a box running tests for CI.
Chuck

I get two test failures with numpy.

Josef

[snip: np.test() log with the two test_iterator failures]
Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing
On Thu, Apr 24, 2014 at 7:08 PM, josef.p...@gmail.com wrote:

[snip: quote of Matthew Brett's wheel announcement and Chuck's reply]
Chuck

I get two test failures with numpy.

scipy looks good, just two powell trace failures.

Josef

[snip: np.test() log with the two test_iterator failures]
Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing
On Thu, Apr 24, 2014 at 7:20 PM, Charles R Harris charlesr.har...@gmail.com wrote:
On Thu, Apr 24, 2014 at 5:08 PM, josef.p...@gmail.com wrote:

[snip: quote of Matthew Brett's wheel announcement and the earlier replies]
Chuck

I get two test failures with numpy.

Josef

[snip: np.test() log with the two test_iterator failures]

Strange. That second one looks familiar, at least the -96 part. Wonder why this doesn't show up with the MKL builds.
ok tried again, this time deleting the old numpy directories before installing:

Ran 4760 tests in 42.124s
OK (KNOWNFAIL=10, SKIP=8)
<nose.result.TextTestResult run=4760 errors=0 failures=0>

so pip also seems to be reusing leftover files. All clear.

Josef

Chuck
Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing
OT: Oh, I hate pip and packages that require numpy

E:\tmp>C:\Python27\python C:\Python27\Scripts\pip-script.py install -U patsy
Downloading/unpacking patsy
  Running setup.py (path:c:\users\josef\appdata\local\temp\pip_build_josef\patsy\setup.py) egg_info for package patsy
    no previously-included directories found matching 'doc\_build'
Downloading/unpacking numpy from https://pypi.python.org/packages/source/n/numpy/numpy-1.8.1.tar.gz#md5=be95babe263bfa3428363d6db5b64678 (from patsy)
  Found existing installation: numpy 1.6.2
    Uninstalling numpy: Successfully uninstalled numpy
  Running setup.py install for numpy
    ...
    C:\Python27\lib\distutils\dist.py:267: UserWarning: Unknown distribution option: 'define_macros'
      warnings.warn(msg)
    error: Unable to find vcvarsall.bat
  Rolling back uninstall of numpy
Cleaning up...

- user error? I have a stale numpy-1.6.2-py2.7.egg-info file in site-packages
- but that's a nice new feature of pip: "Rolling back uninstall of numpy" -- numpy is still here

Josef ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing
On Thu, Apr 24, 2014 at 7:29 PM, josef.p...@gmail.com wrote: On Thu, Apr 24, 2014 at 7:20 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Thu, Apr 24, 2014 at 5:08 PM, josef.p...@gmail.com wrote: On Thu, Apr 24, 2014 at 7:00 PM, Charles R Harris charlesr.har...@gmail.com wrote: Hi Matthew, On Thu, Apr 24, 2014 at 3:56 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, Thanks to Cark Kleffner's toolchain and some help from Clint Whaley (main author of ATLAS), I've built 64-bit windows numpy and scipy wheels for testing. The build uses Carl's custom mingw-w64 build with static linking. There are two harmless test failures on scipy (being discussed on the list at the moment) - tests otherwise clean. Wheels are here: https://nipy.bic.berkeley.edu/scipy_installers/numpy-1.8.1-cp27-none-win_amd64.whl https://nipy.bic.berkeley.edu/scipy_installers/scipy-0.13.3-cp27-none-win_amd64.whl You can test with: pip install -U pip # to upgrade pip to latest pip install -f https://nipy.bic.berkeley.edu/scipy_installers numpy scipy Please do send feedback. ATLAS binary here: https://nipy.bic.berkeley.edu/scipy_installers/atlas_builds/atlas-64-full-sse2.tar.bz2 Many thanks for Carl in particular for doing all the hard work, Cool. After all these long years... Now all we need is a box running tests for CI. 
Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion I get two test failures with numpy Josef np.test() Running unit tests for numpy NumPy version 1.8.1 NumPy is installed in C:\Python27\lib\site-packages\numpy Python version 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] nose version 1.1.2 == FAIL: test_iterator.test_iter_broadcasting_errors -- Traceback (most recent call last): File C:\Python27\lib\site-packages\nose\case.py, line 197, in runTest self.test(*self.arg) File C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py, line 657, in test_iter_broadcasting_errors '(2)-(2,newaxis)') % msg) File C:\Python27\lib\site-packages\numpy\testing\utils.py, line 44, in assert_ raise AssertionError(msg) AssertionError: Message operands could not be broadcast together with remapped shapes [original-remapped]: (2,3)-(2,3) (2,)-(2,newaxis) and requested shape (4,3) doesn't contain remapped operand shape(2)-(2,newaxis) == FAIL: test_iterator.test_iter_array_cast -- Traceback (most recent call last): File C:\Python27\lib\site-packages\nose\case.py, line 197, in runTest self.test(*self.arg) File C:\Python27\lib\site-packages\numpy\core\tests\test_iterator.py, line 836, in test_iter_array_cast assert_equal(i.operands[0].strides, (-96,8,-32)) File C:\Python27\lib\site-packages\numpy\testing\utils.py, line 255, in assert_equal assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k, err_msg), verbose) File C:\Python27\lib\site-packages\numpy\testing\utils.py, line 317, in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: item=0 ACTUAL: 96L DESIRED: -96 -- Ran 4828 tests in 46.306s FAILED (KNOWNFAIL=10, SKIP=8, failures=2) nose.result.TextTestResult run=4828 errors=0 failures=2 Strange. That second one looks familiar, at least the -96 part. Wonder why this doesn't show up with the MKL builds. 
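The second failure above is an assertion on operand strides (expected `(-96, 8, -32)`). As context for readers unfamiliar with negative strides, here is a minimal sketch of how reversed views acquire them; this illustrates the concept only, not the `nditer` code path the failing test actually exercises:

```python
import numpy as np

# A C-contiguous (4, 3) float64 array: row stride 3*8 = 24 bytes,
# column stride 8 bytes.
a = np.zeros((4, 3), dtype=np.float64)
assert a.strides == (24, 8)

# Reversing an axis flips the sign of that axis's stride; the failing
# test asserted negative strides like this on an nditer operand.
b = a[::-1]
assert b.strides == (-24, 8)
```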
ok tried again, this time deleting the old numpy directories before installing Ran 4760 tests in 42.124s OK (KNOWNFAIL=10, SKIP=8) nose.result.TextTestResult run=4760 errors=0 failures=0 so pip also seems to be reusing leftover files. all clear. Running the statsmodels test suite, I get a failure in test_discrete.TestProbitCG where fmin_cg converges to something that differs in the 3rd decimal. I usually only test the 32-bit version, so I don't know if this is specific to this scipy version, but we haven't seen this in a long time. I used our nightly binaries http://statsmodels.sourceforge.net/binaries/ Josef Josef Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing
On Thu, Apr 24, 2014 at 7:00 PM, Charles R Harris charlesr.har...@gmail.com wrote: Hi Matthew, On Thu, Apr 24, 2014 at 3:56 PM, Matthew Brett matthew.br...@gmail.comwrote: Hi, Thanks to Cark Kleffner's toolchain and some help from Clint Whaley (main author of ATLAS), I've built 64-bit windows numpy and scipy wheels for testing. The build uses Carl's custom mingw-w64 build with static linking. There are two harmless test failures on scipy (being discussed on the list at the moment) - tests otherwise clean. Wheels are here: https://nipy.bic.berkeley.edu/scipy_installers/numpy-1.8.1-cp27-none-win_amd64.whl https://nipy.bic.berkeley.edu/scipy_installers/scipy-0.13.3-cp27-none-win_amd64.whl You can test with: pip install -U pip # to upgrade pip to latest pip install -f https://nipy.bic.berkeley.edu/scipy_installers numpy scipy Please do send feedback. ATLAS binary here: https://nipy.bic.berkeley.edu/scipy_installers/atlas_builds/atlas-64-full-sse2.tar.bz2 Many thanks for Carl in particular for doing all the hard work, Cool. After all these long years... Now all we need is a box running tests for CI. Very good news, after 3 years interruption I might be able to build scipy again, and switch now to a 64bit development version. Thanks for pushing for this, and doing all the hard work. Josef Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] 1.9.x branch
On Wed, Apr 23, 2014 at 5:32 AM, Sebastian Berg sebast...@sipsolutions.net wrote: On Di, 2014-04-22 at 15:35 -0600, Charles R Harris wrote: Hi All, I'd like to branch 1.9.x at the end of the month. There are a couple of reasons for the timing. First, we have a lot of new stuff in the development branch. Second, there is work ongoing in masked arrays that I'd like to keep out of the release so that it has more time to settle. Third, it's past time ;) Sounds good. There are currently a number of 1.9.0 blockers, which can be seen here. Datetime timezone handling broken in 1.7.x I don't think there is time to get this done for 1.9.0 and it needs to be pushed off to 1.10.0. Return multiple field selection as ro view I have a branch for this, but because the returned type differs from a copy by alignment spacing there was a test failure. Merging that branch might cause some incompatibilities. I am a bit worried here that comparisons might make trouble. Object array creation new conversion to int This one needs a decision. Julian, Sebastian, thoughts? Maybe for all to consider this is about what happens for object arrays if you do things like: # Array cast to object array (np.array(arr) would be identical): a = np.arange(10).astype(object) # Array passed into new array creation (not just *one* array): b = np.array([np.arange(10)], dtype=object) # Numerical array is assigned to object array: c = np.empty(10, dtype=object) c[...] = np.arange(10) Before this change, the result was: type(a[0]) is int type(b[0,0]) is np.int_ # Note the numpy type type(c[0]) is int After this change, they are all `int`. Though note that the numpy type is preserved for example for long double. On the one hand preserving the numpy type might be nice, but on the other hand we don't care much about the dtypes of scalars and in practice the python types are probably more often wanted. what if I don't like python? 
>>> np.int_(0)**(-1)
inf
>>> 0**-1
Traceback (most recent call last):
  File "<pyshell#28>", line 1, in <module>
    0**-1
ZeroDivisionError: 0.0 cannot be raised to a negative power
>>> type(np.arange(5)[0])
<type 'numpy.int32'>
>>> np.arange(5)[0]**-1
inf
>>> type(np.arange(5)[0].item())
<type 'int'>
>>> np.arange(5)[0].item()**-1
Traceback (most recent call last):
  File "<pyshell#40>", line 1, in <module>
    np.arange(5)[0].item()**-1
ZeroDivisionError: 0.0 cannot be raised to a negative power
>>> np.__version__
'1.6.1'

I remember struggling through this (avoiding python operations) quite a bit in my early bugfixes to scipy.stats.distributions. (IIRC I ended up avoiding most ints.)

Josef

Since I just realized that things are safe (float128 does not cast to float after all), I changed my mind and am tempted to keep the new behaviour. That is, if it does not create any problems (there was some issue in scipy, not sure how bad). - Sebastian

Median of np.matrix is broken
Not sure what the status of this one is.

1.8 deprecations: Follow-up ticket
Things that should perhaps be removed.

ERROR: test_big_arrays (test_io.TestSavezLoad) on OS X + Python 3.3
I believe this one was fixed. For general problems reading/writing big files on OS X, I believe they were fixed in Maverick and I'm inclined to recommend an OS upgrade rather than work to chunk all the io.

Deprecate NPY_CHAR
This one is waiting on a fix from Pearu to make f2py use numpy strings. I think we will need to do this ourselves if we want to carry through the deprecation. In any case it probably needs to be pushed off to 1.10.

1.7 deprecations: Follow-up ticket
Some of these changes have been made, the ro diagonal view for instance; some still remain. Additions, updates, and thoughts welcome.
Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
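For readers trying Sebastian's examples on a current NumPy: the "all `int`" behavior discussed above is the one that was kept for object arrays, while the power result for numpy integer scalars has changed across versions (it no longer returns `inf`), so the sketch below only asserts the Python-level behavior:

```python
import numpy as np

# Casting a numerical array to object dtype yields plain Python ints
# (the "after the change" behavior from the thread).
a = np.arange(10).astype(object)
assert type(a[0]) is int

# Same for assignment of a numerical array into an object array.
c = np.empty(10, dtype=object)
c[...] = np.arange(10)
assert type(c[0]) is int

# Indexing a numerical array still returns a numpy scalar type;
# .item() converts it to the corresponding Python type.
x = np.arange(5)[0]
assert isinstance(x, np.integer)
assert type(x.item()) is int

# A Python int raises on 0 ** -1; np.int_(0) ** -1 gave inf in
# numpy 1.6 and behaves differently again in later versions.
try:
    0 ** -1
    raised = False
except ZeroDivisionError:
    raised = True
assert raised
```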
Re: [Numpy-discussion] PEP 465 has been accepted / volunteers needed
On Thu, Apr 10, 2014 at 4:43 AM, Francesc Alted fal...@gmail.com wrote: On 4/9/14, 10:46 PM, Chris Barker wrote: On Tue, Apr 8, 2014 at 11:14 AM, Nathaniel Smith n...@pobox.com mailto:n...@pobox.com wrote: Thank you! Though I suspect that the most important part of my contribution may have just been my high tolerance for writing emails ;-). no -- it's your high tolerance for _reading_ emails... Far too many of us have a high tolerance for writing them! Ha ha, very true! I'm trying, but it's hard. Sometimes avoiding the `send` button helps. I still have a collection of draft replies. Josef I'm not going to send this reply -- oops, did it again -- Francesc Alted ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value
On Fri, Apr 4, 2014 at 8:50 AM, Daπid davidmen...@gmail.com wrote: On 2 April 2014 16:06, Sturla Molden sturla.mol...@gmail.com wrote: josef.p...@gmail.com wrote: pandas came later and thought ddof=1 is worth more than consistency. Pandas is a data analysis package. NumPy is a numerical array package. I think ddof=1 is justified for Pandas, for consistency with statistical software (SPSS et al.) For NumPy, there are many computational tasks where the Bessel correction is not wanted, so providing an uncorrected result is the correct thing to do. NumPy should be a low-level array library that does very little magic. All this discussion reminds me of the book Numerical Recipes: if the difference between N and N − 1 ever matters to you, then you are probably up to no good anyway — e.g., trying to substantiate a questionable hypothesis with marginal data. For any reasonably sized data set, it is a correction in the second significant figure. I fully agree, but sometimes you don't have much choice. `big data` == `statistics with negative degrees of freedom` ? or maybe `machine learning` == `statistics with negative degrees of freedom` ? Josef ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
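To make the default under discussion concrete, a small sketch (values computed by hand for the 4-point sample [1, 2, 3, 4]):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# NumPy's default is ddof=0: divide by N, the uncorrected
# (maximum-likelihood) estimate.
assert np.isclose(np.var(x), 1.25)

# ddof=1 divides by N - 1, the Bessel-corrected estimate that pandas
# and most statistical packages use as their default.
assert np.isclose(np.var(x, ddof=1), 5.0 / 3.0)
assert np.isclose(np.std(x, ddof=1), np.sqrt(5.0 / 3.0))
```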
Re: [Numpy-discussion] Where to put versionadded
On Fri, Apr 4, 2014 at 2:12 PM, Charles R Harris charlesr.har...@gmail.com wrote: Hi All, Currently there are several placements of the '.. versionadded::' directive and I'd like to settle on a proper style for consistency. There are two occasions on which it is used, first, when a new function or class is added and second, when a new keyword is added to an existing function or method. The options are as follows. New Function 1) Originally, the directive was added in the notes section. Notes - .. versionadded:: 1.5.0 2) Alternatively, it is placed after the extended summary. blah, blah ..versionadded:: 1.5.0 Between these two, I vote for 2) because the version is easily found when reading the documentation either in a terminal or rendered into HTML. New Parameter 1) It is placed before the parameter description newoption : int, optional .. versionadded:: 1.5.0 blah. 2) It is placed after the parameter description. newoption : int, optional blah. .. versionadded:: 1.5.0 Both of these render correctly, but the first is more compact while the second puts the version after the description where it doesn't interrupt the reading. I'm tending towards 1) on account of its compactness. Thoughts? I'm in favor of putting them only in the Notes section. Most of the time they are not crucial information and it's distracting. I usually only look for them when I'm working explicitly across several numpy versions. like in python: versionadded 2.1 is only interesting for historians. Josef Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
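As a concrete sketch of the placements being debated (the function name and version number here are hypothetical, for illustration only):

```python
def frobnicate(x, newoption=0):
    """Do something to x.

    Extended summary, blah, blah.

    .. versionadded:: 1.5.0

    Parameters
    ----------
    x : array_like
        Input data.
    newoption : int, optional
        Controls the frobnication.

        .. versionadded:: 1.5.0

    Notes
    -----
    More detail here.
    """
    return x

# Both directives are discoverable in the rendered docstring.
assert frobnicate.__doc__.count(".. versionadded:: 1.5.0") == 2
```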
Re: [Numpy-discussion] Where to put versionadded
On Fri, Apr 4, 2014 at 3:07 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Apr 4, 2014 at 11:54 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Apr 4, 2014 at 12:48 PM, josef.p...@gmail.com wrote: On Fri, Apr 4, 2014 at 2:12 PM, Charles R Harris charlesr.har...@gmail.com wrote: Hi All, Currently there are several placements of the '.. versionadded::' directive and I'd like to settle on a proper style for consistency. There are two occasions on which it is used, first, when a new function or class is added and second, when a new keyword is added to an existing function or method. The options are as follows. New Function 1) Originally, the directive was added in the notes section. Notes - .. versionadded:: 1.5.0 2) Alternatively, it is placed after the extended summary. blah, blah ..versionadded:: 1.5.0 Between these two, I vote for 2) because the version is easily found when reading the documentation either in a terminal or rendered into HTML. New Parameter 1) It is placed before the parameter description newoption : int, optional .. versionadded:: 1.5.0 blah. 2) It is placed after the parameter description. newoption : int, optional blah. .. versionadded:: 1.5.0 Both of these render correctly, but the first is more compact while the second puts the version after the description where it doesn't interrupt the reading. I'm tending towards 1) on account of its compactness. Thoughts? I'm in favor of putting them only in the Notes section. Most of the time they are not crucial information and it's distracting. I usually only look for them when I'm working explicitly across several numpy versions. like in python: versionadded 2.1 is only interesting for historians. I find the opposite to be true. Because numpy needs maintain compatibility with a number python versions, I often check the python documentation to see in which version a function was added. 
I agree; versionadded 2.1 is not likely interesting but versionadded 2.7 is very interesting. That's true, but it's a mess for maintainers because we support now 5 python versions. numpy doesn't have a long history of versionadded yet, I didn't find anything for 1.3 in a quick search. statsmodels has now numpy 1.6 as minimum requirement and I'm interested in the features that become available with a minimum version increase. Once I know what I'm allowed to use, I only care about the real documentation, How does einsum really work?. But as a numpy user, I was never really interested in the information that arraysetops where enhanced and renamed in numpy 1.x (x=?4), or that take was added in 0.y, ... Even the first part of polynomial is already in 1.4 (It might just make me feel old if I remember when it was changed.) versionadded is not very distracting in the html rendering, so I'm just +0.1 on Notes. Josef Cheers, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Where to put versionadded
On Fri, Apr 4, 2014 at 3:33 PM, josef.p...@gmail.com wrote: On Fri, Apr 4, 2014 at 3:07 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Apr 4, 2014 at 11:54 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Apr 4, 2014 at 12:48 PM, josef.p...@gmail.com wrote: On Fri, Apr 4, 2014 at 2:12 PM, Charles R Harris charlesr.har...@gmail.com wrote: Hi All, Currently there are several placements of the '.. versionadded::' directive and I'd like to settle on a proper style for consistency. There are two occasions on which it is used, first, when a new function or class is added and second, when a new keyword is added to an existing function or method. The options are as follows. New Function 1) Originally, the directive was added in the notes section. Notes - .. versionadded:: 1.5.0 2) Alternatively, it is placed after the extended summary. blah, blah ..versionadded:: 1.5.0 Between these two, I vote for 2) because the version is easily found when reading the documentation either in a terminal or rendered into HTML. New Parameter 1) It is placed before the parameter description newoption : int, optional .. versionadded:: 1.5.0 blah. 2) It is placed after the parameter description. newoption : int, optional blah. .. versionadded:: 1.5.0 Both of these render correctly, but the first is more compact while the second puts the version after the description where it doesn't interrupt the reading. I'm tending towards 1) on account of its compactness. Thoughts? I'm in favor of putting them only in the Notes section. Most of the time they are not crucial information and it's distracting. I usually only look for them when I'm working explicitly across several numpy versions. like in python: versionadded 2.1 is only interesting for historians. since I like history: AFAICS: arraysetops was changed in 1.4 histogram was added in 0.4 corrcoef was added in 0.9.2 numpy 0.9.2 is 8 years old python 2.1 has soon it's 13th anniversary Josef I find the opposite to be true. 
Because numpy needs maintain compatibility with a number python versions, I often check the python documentation to see in which version a function was added. I agree; versionadded 2.1 is not likely interesting but versionadded 2.7 is very interesting. That's true, but it's a mess for maintainers because we support now 5 python versions. numpy doesn't have a long history of versionadded yet, I didn't find anything for 1.3 in a quick search. statsmodels has now numpy 1.6 as minimum requirement and I'm interested in the features that become available with a minimum version increase. Once I know what I'm allowed to use, I only care about the real documentation, How does einsum really work?. But as a numpy user, I was never really interested in the information that arraysetops where enhanced and renamed in numpy 1.x (x=?4), or that take was added in 0.y, ... Even the first part of polynomial is already in 1.4 (It might just make me feel old if I remember when it was changed.) versionadded is not very distracting in the html rendering, so I'm just +0.1 on Notes. Josef Cheers, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value
On Wed, Apr 2, 2014 at 10:06 AM, Sturla Molden sturla.mol...@gmail.com wrote: josef.p...@gmail.com wrote: pandas came later and thought ddof=1 is worth more than consistency. Pandas is a data analysis package. NumPy is a numerical array package. I think ddof=1 is justified for Pandas, for consistency with statistical software (SPSS et al.) For NumPy, there are many computational tasks where the Bessel correction is not wanted, so providing an uncorrected result is the correct thing to do. NumPy should be a low-level array library that does very little magic. Those who need the Bessel correction can multiply with sqrt(n/float(n-1)) or specify ddof. But that belongs in the docs. Sturla P.S. Personally I am not convinced unbiased is ever a valid argument, as the biased estimator has smaller error. This is from experience in marksmanship: I'd rather shoot a tight series with small systematic error than scatter my bullets wildly but unbiased on the target. It is the total error that counts. The series with smallest total error gets the best score. It is better to shoot two series and calibrate the sight in between than use a calibration-free sight that doesn't allow us to aim. calibration == bias correction ? That's why I think classical statistics got this one wrong. Unbiased is never a virtue, but the smallest error is. Thus, if we are to repeat an experiment, we should calibrate our estimator just like a marksman calibrates his sight. But the aim should always be calibrated to give the smallest error, not an unbiased scatter. No one in their right mind would claim a shotgun is more precise than a rifle because it has smaller bias. But that is what applying the Bessel correction implies. https://www.youtube.com/watch?v=i4xcEZZDW_I I spent several days trying to figure out what Stata is doing for small sample corrections to reduce the bias of the rejection interval with uncorrected variance estimates.
Josef ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
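Sturla's "multiply with sqrt(n/float(n-1))" remark can be checked directly; the Bessel factor relating the two estimates is exact, so either convention can be recovered from the other:

```python
import numpy as np

x = np.random.default_rng(0).normal(size=50)
n = x.size

# The ddof=1 result equals the ddof=0 result rescaled by n/(n-1)
# (for the variance), or by sqrt(n/(n-1)) for the standard deviation.
assert np.isclose(np.var(x, ddof=1), np.var(x) * n / (n - 1.0))
assert np.isclose(np.std(x, ddof=1), np.std(x) * np.sqrt(n / (n - 1.0)))
```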
Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value
On Thu, Apr 3, 2014 at 2:21 PM, Bago mrb...@gmail.com wrote: Sturla P.S. Personally I am not convinced unbiased is ever a valid argument, as the biased estimator has smaller error. This is from experience in marksmanship: I'd rather shoot a tight series with small systematic error than scatter my bullets wildly but unbiased on the target. It is the total error that counts. The series with smallest total error gets the best score. It is better to shoot two series and calibrate the sight in between than use a calibration-free sight that doesn't allow us to aim. That's why I think classical statistics got this one wrong. Unbiased is never a virtue, but the smallest error is. Thus, if we are to repeat an experiment, we should calibrate our estimator just like a marksman calibrates his sight. But the aim should always be calibrated to give the smallest error, not an unbiased scatter. No one in their right mind would claim a shotgun is more precise than a rifle because it has smaller bias. But that is what applying the Bessel correction implies. I agree with the point, and what makes it even worse is that ddof=1 does not even produce an unbiased standard deviation estimate. It produces an unbiased variance estimate, but the sqrt of this variance estimate is a biased standard deviation estimate, http://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation. But ddof=1 still produces a smaller bias than ddof=0. I think the main point in stats is that without ddof, the variance will be too small and t-test or similar will be liberal in small samples, or confidence intervals will be too short. (for statisticians that prefer to have tests that maintain their level and prefer to err on the conservative side.) Josef Bago ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
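The point that the square root of the unbiased variance is still a biased standard-deviation estimate is easy to see by simulation (seeded for reproducibility; for n = 5 normal samples, E[s] is about 0.94 σ, the c4 constant from the Wikipedia article linked above):

```python
import numpy as np

rng = np.random.default_rng(12345)
n, reps, sigma = 5, 20000, 1.0
samples = rng.normal(0.0, sigma, size=(reps, n))

vars_ = samples.var(axis=1, ddof=1)  # unbiased variance estimates
stds = samples.std(axis=1, ddof=1)   # sqrt of the unbiased variance

# The variance estimate is unbiased on average ...
assert abs(vars_.mean() - sigma**2) < 0.02

# ... but its square root systematically underestimates sigma:
# E[s] = c4 * sigma, with c4 ~ 0.94 for n = 5.
assert 0.92 < stds.mean() < 0.96
```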
Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value
On Tue, Apr 1, 2014 at 5:11 PM, Nathaniel Smith n...@pobox.com wrote: On Tue, Apr 1, 2014 at 9:51 PM, Ralf Gommers ralf.gomm...@gmail.com wrote: On Tue, Apr 1, 2014 at 10:08 PM, Nathaniel Smith n...@pobox.com wrote: On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden sturla.mol...@gmail.com wrote: Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote: Personally I cannot think of many applications where it would be desired to calculate the standard deviation with ddof=0. In addition, I feel that there should be consistency between standard modules such as numpy, scipy, and pandas. ddof=0 is the maxiumum likelihood estimate. It is also needed in Bayesian estimation. It's true, but the counter-arguments are also strong. And regardless of whether ddof=1 or ddof=0 is better, surely the same one is better for both numpy and scipy. If we could still choose here without any costs, obviously that's true. This particular ship sailed a long time ago though. By the way, there isn't even a `scipy.stats.std`, so we're comparing with differently named functions (nanstd for example). Presumably nanstd is a lot less heavily used than std, and presumably people expect 'nanstd' to be a 'nan' version of 'std' -- what do you think of changing nanstd to ddof=0 to match numpy? (With appropriate FutureWarning transition, etc.) numpy is numpy, a numerical library scipy.stats is stats and behaves differently. (axis=0) nanstd in scipy.stats will hopefully also go away soon, so I don't think it's worth changing there either. pandas came later and thought ddof=1 is worth more than consistency. I don't think ddof defaults's are worth jumping through deprecation hoops. (bias in cov, corrcoef is non-standard ddof) Josef -- Nathaniel J. 
Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
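Since the thread compares `np.std` with scipy's `nanstd`: NumPy itself gained `np.nanstd` (with the same ddof=0 default as `np.std`) in 1.8, so on a current NumPy the comparison can be sketched as:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, np.nan])

# np.nanstd ignores NaNs and, like np.std, defaults to ddof=0.
assert np.isclose(np.nanstd(x), np.std(x[:4]))
assert np.isclose(np.nanstd(x, ddof=1), np.std(x[:4], ddof=1))
```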
Re: [Numpy-discussion] Changes to np.vander
On Sat, Mar 29, 2014 at 12:12 AM, Jaime Fernández del Río jaime.f...@gmail.com wrote: Hi, I have submitted a PR (https://github.com/numpy/numpy/pull/4568) that speeds up `np.vander` by using accumulated multiplication instead of exponentiation to compute the Vandermonde matrix. For largish matrices the speed-ups can be quite dramatic, over an order of magnitude. Julian has raised concerns on numerical stability and loss of precision, which don't seem to be all that relevant. Do speak up if you think otherwise. We are also discussing replacing a recently added kwarg, order, which now accepts a string, either increasing or decreasing, to indicate the ordering of the matrix columns. This was not present in 1.8, so can still be modified. The proposal is to replace it with a reversed boolean flag. Unfortunately, the return of np.vander in 1.8 and before is the opposite (i.e. its reversed) from the standard definition, which has powers increasing from left to right. So it is not clear what the reversed keyword should refer to: 1. If it refers to the standard definition, then it would default to False for backwards compatibility, but be consistent with the conventional definition. 2. If it refers to the existing behavior of numpy's vander, then it would default to True, and not be consistent with the conventional definition. I prefer option 1, but would like to hear other's opinions. Which could of course include naming the boolean flag more ingeniously, or keeping the string flag. If he's reading, I'd specially like to hear Warren Weckesser's thoughts, as he is the one who added the order kwarg. order is not a good name, I would find it very confusing (I'm usually mixing up order and degree) http://en.wikipedia.org/wiki/Order_of_a_polynomial how about calling the keyword increasing=False ? which would avoid defining what it's reversed against. I don't know about precision loss. There's a nasty NIST problem for polynomial regression. 
If that doesn't change much, I wouldn't worry about differences in precision for statistical applications. But the problem is for the regression part, and might not be affected much by the vander precision. (Besides it's an example for Don't do that at home.) http://jpktd.blogspot.ca/2012/03/numerical-accuracy-in-linear-least.html http://en.wikibooks.org/wiki/Statistics:Numerical_Methods/Numerical_Comparison_of_Statistical_Software#Linear_Regression Josef Jaime -- (\__/) ( O.o) ( ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
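For the record, the keyword that eventually shipped in NumPy is `increasing` (default False, preserving the historical column order), and the accumulated-multiplication trick from the PR is easy to sketch:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])

# Default: decreasing powers, the historical NumPy order.
V_dec = np.vander(x, 4)
# increasing=True gives the textbook order x**0, x**1, x**2, x**3.
V_inc = np.vander(x, 4, increasing=True)
assert np.array_equal(V_inc, V_dec[:, ::-1])

# Accumulated multiplication: each column is the previous one times x,
# avoiding a separate exponentiation per column.
W = np.ones((x.size, 4))
for j in range(1, 4):
    W[:, j] = W[:, j - 1] * x
assert np.array_equal(W, V_inc)
```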
Re: [Numpy-discussion] Changes to np.vander
On Sat, Mar 29, 2014 at 7:31 AM, josef.p...@gmail.com wrote: On Sat, Mar 29, 2014 at 12:12 AM, Jaime Fernández del Río jaime.f...@gmail.com wrote: Hi, I have submitted a PR (https://github.com/numpy/numpy/pull/4568) that speeds up `np.vander` by using accumulated multiplication instead of exponentiation to compute the Vandermonde matrix. For largish matrices the speed-ups can be quite dramatic, over an order of magnitude. Julian has raised concerns on numerical stability and loss of precision, which don't seem to be all that relevant. Do speak up if you think otherwise. We are also discussing replacing a recently added kwarg, order, which now accepts a string, either increasing or decreasing, to indicate the ordering of the matrix columns. This was not present in 1.8, so can still be modified. The proposal is to replace it with a reversed boolean flag. Unfortunately, the return of np.vander in 1.8 and before is the opposite (i.e. its reversed) from the standard definition, which has powers increasing from left to right. So it is not clear what the reversed keyword should refer to: 1. If it refers to the standard definition, then it would default to False for backwards compatibility, but be consistent with the conventional definition. 2. If it refers to the existing behavior of numpy's vander, then it would default to True, and not be consistent with the conventional definition. I prefer option 1, but would like to hear other's opinions. Which could of course include naming the boolean flag more ingeniously, or keeping the string flag. If he's reading, I'd specially like to hear Warren Weckesser's thoughts, as he is the one who added the order kwarg. order is not a good name, I would find it very confusing (I'm usually mixing up order and degree) http://en.wikipedia.org/wiki/Order_of_a_polynomial how about calling the keyword increasing=False ? which would avoid defining what it's reversed against. Obviously I didn't read the PR before answering. 
But it shows that `increasing` might be obvious, or that Warren and I think the same way. Josef I don't know about precision loss. There's a nasty NIST problem for polynomial regression. If that doesn't change much, I wouldn't worry about differences in precision for statistical applications. But the problem is for the regression part, and might not be affected much by the vander precision. (Besides it's an example for Don't do that at home.) http://jpktd.blogspot.ca/2012/03/numerical-accuracy-in-linear-least.html http://en.wikibooks.org/wiki/Statistics:Numerical_Methods/Numerical_Comparison_of_Statistical_Software#Linear_Regression Josef Jaime -- (\__/) ( O.o) ( ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
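[Editor's note: current NumPy did end up with a boolean `increasing` keyword on `np.vander`. A minimal sketch of the two column orderings under discussion:]

```python
import numpy as np

x = np.array([1, 2, 3])

# Default: powers decrease left to right (NumPy's historical behavior).
dec = np.vander(x, 3)
# Conventional definition: powers increase left to right.
inc = np.vander(x, 3, increasing=True)

print(dec)
# [[1 1 1]
#  [4 2 1]
#  [9 3 1]]
print(inc)
# [[1 1 1]
#  [1 2 4]
#  [1 3 9]]
```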
Re: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe
On Wed, Mar 26, 2014 at 5:17 PM, Olivier Grisel olivier.gri...@ensta.org wrote: My understanding of Carl's effort is that the long term goal is to have official windows whl packages for both numpy and scipy published on PyPI with a builtin BLAS / LAPACK implementation so that users can do `pip install scipy` under windows and get something that just works without having to install any compiler (fortran or C) nor any additional library manually. Most windows users are beginners and you cannot really expect them to understand how to build the whole scipy stack from source. The current solution (executable setup installers) is not optimal as it requires Administrator rights to run, does not resolve dependencies as pip does and cannot be installed in virtualenvs. As a small related point: the official installers can be used to install in a virtualenv. The way I do it: run the superpack, official installer, wait until it extracts the correct (SSE) install exe, then cancel. Then easy_install the install exe file that has been extracted to the temp folder into the virtualenv. I don't remember if the extraction already requires admin rights, but I think not. easy_install doesn't require any, IIRC. Josef If we can build numpy / scipy whl packages for windows with the Atlas dlls embedded in the numpy package then good. It does not need to be the fastest BLAS / LAPACK lib in my opinion. Just something that works. The problem with ATLAS is that you need to select the number of threads at build time AFAIK. But we could set it to a reasonable default (e.g. 4 threads) for the default windows package. -- Olivier ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe
On Thu, Mar 27, 2014 at 9:59 AM, Olivier Grisel olivier.gri...@ensta.org wrote: 2014-03-27 14:55 GMT+01:00 josef.p...@gmail.com: On Wed, Mar 26, 2014 at 5:17 PM, Olivier Grisel olivier.gri...@ensta.org wrote: My understanding of Carl's effort is that the long term goal is to have official windows whl packages for both numpy and scipy published on PyPI with a builtin BLAS / LAPACK implementation so that users can do `pip install scipy` under windows and get something that just works without have to install any compiler (fortran or C) nor any additional library manually. Most windows users are beginners and you cannot really expect them to understand how to build the whole scipy stack from source. The current solution (executable setup installers) is not optimal as it requires Administrator rights to run, does not resolve dependencies as pip does and cannot be installed in virtualenvs. as small related point: The official installers can be used to install in virtualenv The way I do it: Run the superpack, official installer, wait until it extracts the correct (SSE) install exe, then cancel Then easy_install the install exe file that has been extracted to the temp folder into the virtualenv. I don't remember if the extraction already requires admin rights, but I think not. easy_install doesn't require any, IIRC. Hackish but interesting. Maybe the extraction can be done with generic tools like winzip? I tried to open and unzip with WinRAR but couldn't make sense of the content. BTW: easy_install for other installers like matplotlib also works nicely for virtualenv However, the official installers are only for 32-bit python, and I appreciate all the efforts to modernize the numpy and scipy builds. 
Josef -- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Resolving the associativity/precedence debate for @
On Mon, Mar 24, 2014 at 8:33 PM, Nathaniel Smith n...@pobox.com wrote: On Mon, Mar 24, 2014 at 11:58 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Mar 24, 2014 at 5:56 PM, Nathaniel Smith n...@pobox.com wrote: On Sat, Mar 22, 2014 at 6:13 PM, Nathaniel Smith n...@pobox.com wrote: After 88 emails we don't have a conclusion in the other thread (see [1] for background). But we have to come to some conclusion or another if we want @ to exist :-). So I'll summarize where the discussion stands and let's see if we can find some way to resolve this. Response in this thread so far seems (AFAICT) to have pretty much converged on same-left. If you think that this would be terrible and there is some compelling argument against it, then please speak up! Otherwise, if no-one objects, then I'll go ahead in the next few days and put same-left into the PEP. I think we should take a close look at broadcasting before deciding on the precedence. Can you elaborate? Like what, concretely, do you think we need to do now? ??? In examples like this, parenthesizing the code aggressively to spell out the logic, not only to Stata but also to yourself and anybody else reading it, should cause no embarrassment. You need not assume knowledge of Stata's precedence rules that determine interpretation when several operators are used in one expression. More importantly, you may avoid some horrible little bugs. Nicholas J. Cox Trying to figure out what Stata is using: elementwise operations are just below their matrix version in operator precedence. But Stata came late to matrix algebra, and is definitely not like Matlab or Gauss, or numpy. Josef -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] why sort does not accept a key?
On Mon, Mar 24, 2014 at 12:08 PM, Alan G Isaac alan.is...@gmail.com wrote: On Mon, Mar 24, 2014 at 11:32 AM, Alan G Isaac wrote: I'm wondering if `sort` intentionally does not accept a `key` or if this is just a missing feature? On 3/24/2014 11:47 AM, Alexander Belopolsky wrote: It would be very inefficient to call a key function on every element compared during the sort. See np.argsort and np.lexsort for faster alternatives. But the keys could be as in `lexsort`. I am currently using `argsort`, but I can do so because I don't need a lexicographically determined sort order for the indexes. To close with a related question: what is the preferred idiom for a descending sort? adding [::-1] just creates a new view, pretty low cost. Josef Thanks, Alan ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
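[Editor's note: spelling out the `[::-1]` idiom Josef mentions as a runnable sketch. One caveat worth knowing: reversing `argsort` output is not a stable descending sort, since tied elements come out in reversed order.]

```python
import numpy as np

a = np.array([3, 1, 2, 5])

# np.sort returns a sorted copy; [::-1] is just a reversed view of it, no extra copy
desc = np.sort(a)[::-1]

# indices that order `a` descending, via the same trick on argsort
idx_desc = np.argsort(a)[::-1]

print(desc)         # [5 3 2 1]
print(a[idx_desc])  # [5 3 2 1]
```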
Re: [Numpy-discussion] unique return_index order?
On Fri, Mar 21, 2014 at 9:01 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Mar 21, 2014 at 6:49 PM, josef.p...@gmail.com wrote: On Fri, Mar 21, 2014 at 8:46 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Mar 21, 2014 at 6:26 PM, Alan G Isaac alan.is...@gmail.comwrote: The documentation of numpy.unique http://docs.scipy.org/doc/numpy/reference/generated/numpy.unique.html does not seem to promise that return_index=True will always index the *first* occurrence of each unique item, which I believe is the current behavior. A promise would be nice. Is it intended? Yes, it is intended, although the required mergesort wasn't available for all types before numpy 1.7. Does this mean return_inverse works again for all cases, even with return_index? I removed return_index from my code in statsmodels because I make frequent use of return_inverse, which was broken. We don't have any unittests in statsmodels anymore that use both return_xxx. I don't know, needs checking. Seems to work now with a simple trial array of integers. my example from may 2012, thread 1.6.2 no more unique for rows works fine on python 3.3 numpy 1.7.1 groups = np.random.randint(0,4,size=(10,2)) groups_ = groups.view([('',groups.dtype)]*groups.shape[1]).flatten() uni, uni_idx, uni_inv = np.unique(groups_, return_index=True, return_inverse=True) uni array([(0, 2), (0, 3), (1, 0), (2, 1), (2, 2), (3, 2), (3, 3)], dtype=[('f0', 'i4'), ('f1', 'i4')]) uni_inv array([1, 6, 3, 4, 5, 3, 2, 5, 0, 2], dtype=int32) np.__version__ '1.7.1' Thanks, Josef Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] unique return_index order?
On Fri, Mar 21, 2014 at 9:27 PM, josef.p...@gmail.com wrote: On Fri, Mar 21, 2014 at 9:01 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Mar 21, 2014 at 6:49 PM, josef.p...@gmail.com wrote: On Fri, Mar 21, 2014 at 8:46 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Mar 21, 2014 at 6:26 PM, Alan G Isaac alan.is...@gmail.comwrote: The documentation of numpy.unique http://docs.scipy.org/doc/numpy/reference/generated/numpy.unique.html does not seem to promise that return_index=True will always index the *first* occurrence of each unique item, which I believe is the current behavior. A promise would be nice. Is it intended? Yes, it is intended, although the required mergesort wasn't available for all types before numpy 1.7. summary, AFAICS: since numpy 1.6.2 np.unique used mergesort if return_index=True and provides a stable sort. Josef Does this mean return_inverse works again for all cases, even with return_index? I removed return_index from my code in statsmodels because I make frequent use of return_inverse, which was broken. We don't have any unittests in statsmodels anymore that use both return_xxx. I don't know, needs checking. Seems to work now with a simple trial array of integers. my example from may 2012, thread 1.6.2 no more unique for rows works fine on python 3.3 numpy 1.7.1 groups = np.random.randint(0,4,size=(10,2)) groups_ = groups.view([('',groups.dtype)]*groups.shape[1]).flatten() uni, uni_idx, uni_inv = np.unique(groups_, return_index=True, return_inverse=True) uni array([(0, 2), (0, 3), (1, 0), (2, 1), (2, 2), (3, 2), (3, 3)], dtype=[('f0', 'i4'), ('f1', 'i4')]) uni_inv array([1, 6, 3, 4, 5, 3, 2, 5, 0, 2], dtype=int32) np.__version__ '1.7.1' Thanks, Josef Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
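[Editor's note: Josef's structured-view trick for unique rows, restated as a self-contained sketch. Newer NumPy also offers `np.unique(arr, axis=0)` for this; the random generator below is illustrative, not from the original thread.]

```python
import numpy as np

rng = np.random.default_rng(0)
groups = rng.integers(0, 4, size=(10, 2))

# View each row as one structured element so np.unique compares whole rows.
g = groups.view([('', groups.dtype)] * groups.shape[1]).ravel()
uni, uni_idx, uni_inv = np.unique(g, return_index=True, return_inverse=True)

# return_index points at the *first* occurrence of each unique row (stable
# mergesort), and the inverse indices reconstruct the original row sequence:
assert np.array_equal(groups[uni_idx][uni_inv], groups)
```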
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
On Thu, Mar 20, 2014 at 1:25 PM, Nathaniel Smith n...@pobox.com wrote: On Wed, Mar 19, 2014 at 7:45 PM, Nathaniel Smith n...@pobox.com wrote: Okay, I wrote a little script [1] to scan Python source files look for things like 'dot(a, dot(b, c))' or 'dot(dot(a, b), c)', or the ndarray.dot method equivalents. So what we get out is: - a count of how many 'dot' calls there are - a count of how often we see left-associative nestings: dot(dot(a, b), c) - a count of how often we see right-associative nestings: dot(a, dot(b, c)) Running it on a bunch of projects, I get: | project | dots | left | right | right/left | |--+--+--+---+| | scipy| 796 | 53 |27 | 0.51 | | nipy | 275 |3 |19 | 6.33 | | scikit-learn | 472 | 11 |10 | 0.91 | | statsmodels | 803 | 46 |38 | 0.83 | | astropy | 17 |0 | 0 |nan | | scikit-image | 15 |1 | 0 | 0.00 | |--+--+--+---+| | total| 2378 | 114 |94 | 0.82 | Another way to visualize this, converting each contiguous chain of calls to np.dot into a parenthesized expression, and then counting how often we see each pattern. 1943 (_ @ _) 100 ((_ @ _) @ _) # left 86 (_ @ (_ @ _)) # right 2 (_ @ ((_ @ _) @ _)) 2 (((_ @ _) @ _) @ _) # left 1 ((_ @ (_ @ _)) @ _) 1 ((_ @ _) @ (_ @ _)) 1 (((_ @ _) @ _) @ (_ @ _)) 1 ((_ @ ((_ @ _) @ _)) @ _) 1 ((_ @ _) @ (_ @ (_ @ _))) (This is pooling scipy/nipy/scikit-learn/statsmodels.) I've noted the 3 different patterns that have a consistent associativity. From this I'm leaning towards the conclusions that: - Expressions with complex parenthesization do happen, but probably not often enough to justify elaborate stuff like my 'chaining' proposal -- only 8.7% of these cases involve more than one @. just for statsmodels We do have a very large amount of chaining, but in many cases this has been taken out of a single expression into a temporary or permanent variable for parts of the chain. 
(similar to the quadratic form example in the PEP), either for clarity (a temp variable), or because one dot product shows up several times in the same expression (quadratic forms) or because we need to keep it around for reuse in other expressions. That's what I tried to explain before: chaining and breaking up larger multi-dot expressions is most of the time an intentional choice and not just random because the dot function forces us. The most convincing argument for me for @ is that it makes parentheses visible (until I realized that I didn't really care about @). This reduces the cases where we separate out a dot product for clarity and readability, but still leaves us with the other two cases, where our chaining won't change whatever numpy provides additionally. Josef - There's very little support here for the intuition that right-associativity is more useful than left-associativity on a day-to-day basis. -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
On Wed, Mar 19, 2014 at 2:24 PM, Nathaniel Smith n...@pobox.com wrote: On Tue, Mar 18, 2014 at 9:14 AM, Robert Kern robert.k...@gmail.com wrote: On Tue, Mar 18, 2014 at 12:54 AM, Nathaniel Smith n...@pobox.com wrote: On Sat, Mar 15, 2014 at 6:28 PM, Nathaniel Smith n...@pobox.com wrote: Mathematica: instead of having an associativity, a @ b @ c gets converted into mdot([a, b, c]) So, I've been thinking about this (thanks to @rfateman for pointing it out), and wondering if Mathematica's approach is worth following up more. (It would need to make it past python-dev, of course, but worst case is just that they say no and we're back where we are now, so we might as well think it through.) I predict with near-certainty that this will be rejected, I guess that's what everyone thought about @ too? ;-) but that doesn't prevent it from derailing the discussion. This proposal is unlike anything else in Python. Chained comparisons are *not* similar to this proposal. The chaining only happens at the syntax level, not the semantics. `a b c` gets compiled down to `a.__lt__(b) and b.__lt__(c)`, not `do_comparison([a, b, c], [lt, lt])`. Yes, the syntax is the same as chained comparisons, and the dispatch is a generalization of regular operators. It is unusual; OTOH, @ is unusual in that no other operators in Python have the property that evaluating in the wrong order can cost you seconds of time and gigabytes of memory. Perhaps. We have approval for a binary @ operator. Take the win. We have approval, and we have a request: that we figure out how @ should work in detail to be most useful to us. Maybe that's this proposal; maybe not. Ultimately rejected-or-not-rejected comes down to how strong the arguments for something are. And while we can make some guesses about that, it's impossible to know how strong an argument will be until one sits down and works it out. So I still would like to hear what people think, even if it just ends in the conclusion that it's a terrible idea ;-). 
What happens if you have 5 @ in a row? My head hurts when I have to think about what would actually be going on. And don't forget, the sparse matrix is stuck in the middle. But I would be happy to have an optimizing multi_dot or chain_dot function when it feels safe enough. As for arguments against the grouping semantics, I did think of another case where @ is not associative, though it's pretty weird: In [9]: a = np.arange(16, dtype=np.int8).reshape((4, 4)) In [10]: np.dot(a, np.dot(a, a.astype(float))) Out[10]: array([[ 1680., 1940., 2200., 2460.], [ 4880., 5620., 6360., 7100.], [ 8080., 9300., 10520., 11740.], [ 11280., 12980., 14680., 16380.]]) In [12]: np.dot(np.dot(a, a), a.astype(float)) Out[12]: array([[ 1680., 1940., 2200., 2460.], [-1264., -1548., -1832., -2116.], [ 1936., 2132., 2328., 2524.], [-1008., -1100., -1192., -1284.]]) (What's happening is that we have int8 @ int8 @ float, so (int8 @ int8) @ float has overflows in the first computation, but int8 @ (int8 @ float) does all the computations in float, with no overflows.) That's similar to my example before that mixes in some scalar *. I thought of it as an argument for same-left. Josef -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
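[Editor's note: Nathaniel's overflow example can be checked directly; a minimal runnable sketch, with the row-0 values taken from the session above:]

```python
import numpy as np

a = np.arange(16, dtype=np.int8).reshape(4, 4)

# (int8 @ int8) wraps around in int8 before the float conversion can help...
left = np.dot(np.dot(a, a), a.astype(float))
# ...while grouping to the right does every product in float64.
right = np.dot(a, np.dot(a, a.astype(float)))

assert np.allclose(right[0], [1680., 1940., 2200., 2460.])
assert not np.allclose(left, right)  # the two groupings genuinely differ
```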
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
I'm still bothered by what Nathaniel mentioned about mixing 1d and 2d arrays:

>>> c = np.arange(4)
>>> a = np.arange(16).reshape(4,4)
>>> cc = c[:,None]
>>> a.dot(c).dot(c.T)
420
>>> a.dot(c.dot(c.T))
array([[  0,  14,  28,  42],
       [ 56,  70,  84,  98],
       [112, 126, 140, 154],
       [168, 182, 196, 210]])
>>> a.dot(cc).dot(cc.T)
array([[  0,  14,  28,  42],
       [  0,  38,  76, 114],
       [  0,  62, 124, 186],
       [  0,  86, 172, 258]])
>>> a.dot(cc.dot(cc.T))
array([[  0,  14,  28,  42],
       [  0,  38,  76, 114],
       [  0,  62, 124, 186],
       [  0,  86, 172, 258]])

hint:

>>> c.dot(c.T)
14

and I expect it will be a lot more fun if we mix in some 3d or nd arrays. I think some of the decisions should not be driven by what is the most convenient for the usual cases, but by how easy it is to read your code and find the bugs where we made silly mistakes. A biased view from someone who learned how to use numpy and scipy by debugging. Matlab and GAUSS are user friendly, they don't allow for reduced dimension and never steal an axis in a reduce operation. I didn't manage to come up with more difficult examples (and ran out of time):

>>> (a.dot(c).dot(c.T)).dot(2*a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'numpy.int32' object has no attribute 'dot'

If we make too many mistakes, then numpy tells us. In the above examples, the scalar dot matrix would raise according to the PEP. I cannot come up with examples where we mix 3d and 2d and 1d because dot currently doesn't do it. same-left Josef ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
On Mon, Mar 17, 2014 at 12:50 PM, Alexander Belopolsky ndar...@mac.com wrote: On Mon, Mar 17, 2014 at 12:13 PM, Nathaniel Smith n...@pobox.com wrote: In practice all well-behaved classes have to make sure that they implement __special__ methods in such a way that all the different variations work, no matter which class ends up actually handling the operation. Well-behaved classes are hard to come by in practice. The @ operator may fix the situation with np.matrix, but take a look at MaskedArray with its 40-line __array_wrap__ and no end of bugs. Requiring superclass __method__ to handle creation of subclass results correctly is turning the Liskov principle on its head. With enough clever tricks and tight control over the full class hierarchy you can make it work in some cases, but it is not a good design. I am afraid that making @ special among other binary operators that implement mathematically associative operations will create a lot of confusion. (The pow operator is special because the corresponding mathematical operation is non-associative.) Imagine teaching someone that a % b % c = (a % b) % c, but a @ b @ c = a @ (b @ c). What are the chances that they will correctly figure out what a // b // c means after this? One case where we need to keep track of left or right is type promotion:

>>> a.shape
(100,)
>>> 1. * a.dot(a)
-98.0
>>> (1.*a).dot(a)
328350.0
>>> a.dtype
dtype('int8')
>>> 1. * a @ a
???

similar to

>>> 1. * 2 / 3
0.6666666666666666
>>> 1. * (2 / 3)  # I'm not in the `future`
0.0

Josef ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
On Mon, Mar 17, 2014 at 1:18 PM, josef.p...@gmail.com wrote: On Mon, Mar 17, 2014 at 12:50 PM, Alexander Belopolsky ndar...@mac.com wrote: On Mon, Mar 17, 2014 at 12:13 PM, Nathaniel Smith n...@pobox.com wrote: In practice all well-behaved classes have to make sure that they implement __special__ methods in such a way that all the different variations work, no matter which class ends up actually handling the operation. Well-behaved classes are hard to come by in practice. The @ operator may fix the situation with np.matrix, but take a look at MaskedArray with its 40-line __array_wrap__ and no end of bugs. Requiring superclass __method__ to handle creation of subclass results correctly is turning the Liskov principle on its head. With enough clever tricks and tight control over the full class hierarchy you can make it work in some cases, but it is not a good design. I am afraid that making @ special among other binary operators that implement mathematically associative operations will create a lot of confusion. (The pow operator is special because the corresponding mathematical operation is non-associative.) Imagine teaching someone that a % b % c = (a % b) % c, but a @ b @ c = a @ (b @ c). What are the chances that they will correctly figure out what a // b // c means after this? One case where we need to keep track of left or right is type promotion:

>>> a.shape
(100,)
>>> 1. * a.dot(a)
-98.0
>>> (1.*a).dot(a)
328350.0
>>> a.dtype
dtype('int8')
>>> 1. * a @ a
???

similar to

>>> 1. * 2 / 3
0.6666666666666666
>>> 1. * (2 / 3)  # I'm not in the `future`
0.0

I thought of sending a message with "I'm +-1 on either", but I'm not. I'm again in favor of left, because it's the simplest to understand: A.dot(B).dot(C) with some * mixed in. I understand now the computational argument in favor of right: x @ inv(x.T @ x) @ x.T @ y (with shapes T,k k,k k,T T,1) or x @ pinv(x) @ y (with shapes T,k k,T T,1) with T >> k (the last 1 could be an m > 1, with shape T,m). However, we don't write code like that most of the time.
Alan's students won't care much if some intermediate arrays blow up. In library code like in statsmodels it's almost always a conscious choice of where to set the parentheses and, more often, which part of a long array expression is taken out as a temporary or permanent variable. I think almost the only uses of chain_dot(A, B, C) (which is right) are for quadratic forms:

xtxi = pinv(np.dot(exog.T, exog))  # k,k
xtdx = np.dot(exog.T * d[np.newaxis, :], exog)  # k,k
vcov = chain_dot(xtxi, xtdx, xtxi)  # kk, kk, kk

(from Quantreg) I think optimizing this way is relatively easy. On the other hand, I worry a lot more about messy cases with different dtypes or different classes involved, as Alexander has pointed out. Cases that might trip up medium to medium-advanced numpy users. (Let's see, I have to read @ back to front, and * front to back, and why did I put a sparse matrix in the middle and a masked array at the end. Oh no, that's not a masked array, it's a panda.) compared to (Somewhere there is a mistake, let's go through all terms from the beginning to the end.) Josef ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
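[Editor's note: the optimizing chain_dot asked for here now exists as `np.linalg.multi_dot`, which picks the cheapest parenthesization from the operand shapes. A sketch with the quadratic-form shapes from the thread; the random data is illustrative, not from statsmodels:]

```python
import numpy as np

rng = np.random.default_rng(42)
T, k = 100, 3
exog = rng.normal(size=(T, k))
d = rng.uniform(0.5, 1.5, size=T)

# the quadratic-form pattern from the thread, each k by k
xtxi = np.linalg.pinv(np.dot(exog.T, exog))
xtdx = np.dot(exog.T * d[np.newaxis, :], exog)

# multi_dot chooses the multiplication order automatically
vcov = np.linalg.multi_dot([xtxi, xtdx, xtxi])

# same values as explicit left-to-right chaining
assert np.allclose(vcov, xtxi.dot(xtdx).dot(xtxi))
```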
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
On Mon, Mar 17, 2014 at 6:33 PM, Christophe Bal projet...@gmail.com wrote: I think that weak-left is a little strange, just think a little of the operators used by mathematicians that always follow a hierarchy. A parser is mostly done using grammars : see http://docs.python.org/3.1/reference/grammar.html. Defining *-product to have stronger priority than the @-product, and this last having stronger priority than +, will make the changes in the grammar easier. I'm now convinced of the usefulness of @ and @@ too but I also think that you must think of other uses than only for numpy. In other words, numpy is a the good argument for this new operators, but this can also open new perspectives for other uses. My main problem with weak-left (* higher) and tight-left (@ higher) compared to same-left is that I don't see any obvious choice between the weak and tight. I don't think I would have problems with readability. Wikipedia doesn't say anything about precedence of Hadamard versus matrix product. matlab, IDL and Gauss (I checked the manual) all use same-left, as Nathaniel pointed out. For scalar * together with dot product which is more common in formulas, we would just read it sequentially, i.e. same-left. I don't remember when I have seen dot-in-a-circle in a paper, but I don't think there was any precedence either. 
--- I guess the same applies for other (mis)uses of @:

from math import sqrt

class MyOp(object):
    def __init__(self, func):
        self.func = func
    def __at__(self, x):
        return [self.func(xi) for xi in x]

myop = MyOp(lambda x: sqrt(x))
print(myop.__at__(range(3)))            # myop @ range(3)
print(myop.__at__(list(range(3)) * 2))  # myop @ (range(3) * 2)
print(myop.__at__(range(3)) * 3)        # myop @ range(3) * 3

'''
[0.0, 1.0, 1.4142135623730951]
[0.0, 1.0, 1.4142135623730951, 0.0, 1.0, 1.4142135623730951]
[0.0, 1.0, 1.4142135623730951, 0.0, 1.0, 1.4142135623730951, 0.0, 1.0, 1.4142135623730951]
'''

- Josef ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] It looks like Py 3.5 will include a dedicated infix matrix multiply operator
On Sun, Mar 16, 2014 at 10:54 AM, Nathaniel Smith n...@pobox.com wrote: On Sun, Mar 16, 2014 at 2:39 PM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: Note that I am not opposed to extra operators in python, and only mildly opposed to a matrix multiplication operator in numpy; but let me lay out the case against, for your consideration. First of all, the use of matrix semantics relative to array semantics is extremely rare; even in linear algebra heavy code, array semantics often dominate. As such, the default of array semantics for numpy has been a great choice. I've never looked back at MATLAB semantics. Different people work on different code and have different experiences here -- yours may or may not be typical. Pauli did some quick checks on scikit-learn, nipy and scipy, and found that in their test suites, uses of np.dot and uses of elementwise-multiplication are ~equally common: https://github.com/numpy/numpy/pull/4351#issuecomment-37717330h Secondly, I feel the urge to conform to a historical mathematical notation is misguided, especially for the problem domain of linear algebra. Perhaps in the world of mathematics your operation is associative or commutes, but on your computer, the order of operations will influence both outcomes and performance. Even for products, we usually care not only about the outcome, but also how that outcome is arrived at. And along the same lines, I don't suppose I need to explain how I feel about A@@-1 and the likes. Sure, it isn't too hard to learn or infer this implies a matrix inverse, but why on earth would I want to pretend the rich complexity of numerical matrix inversion can be mangled into one symbol? I'd much rather write inv or pinv, or whatever particular algorithm happens to be called for given the situation. Considering this isn't the num-lisp discussion group, I suppose I am hardly the only one who feels so. My impression from the other thread is that @@ probably won't end up existing, so you're safe here ;-). 
On the whole, I feel the @ operator is mostly superfluous. I prefer to be explicit about where I place my brackets. I prefer to be explicit about the data layout and axes that go into a (multi)linear product, rather than rely on obtuse row/column conventions which are not transparent across function calls. When I do linear algebra, it is almost always vectorized over additional axes; how does a special operator which is only well defined for a few special cases of 2d and 1d tensors help me with that? Einstein notation is coming up on its 100th birthday and is just as blackboard-friendly as matrix product notation. Yet there's still a huge number of domains where the matrix notation dominates. It's cool if you aren't one of the people who find it useful, but I don't think it's going anywhere soon. Note that I don't think there is much harm in an @ operator; but I don't see myself using it either. Aside from making textbook examples like a gram-schmidt orthogonalization more compact to write, I don't see it having much of an impact in the real world. The analysis in the PEP found ~780 calls to np.dot, just in the two projects I happened to look at. @ will get tons of use in the real world. Maybe all those people who will be using it would be happier if they were using einsum instead, I dunno, but it's an argument you'll have to convince them of, not me :-). Just as example I just read for the first time two journal articles in econometrics that use einsum notation. I have no idea what their formulas are supposed to mean, no sum signs and no matrix algebra. I need to have a strong incentive to stare at those formulas again. (statsmodels search finds 1520 dot, including sandbox and examples) Josef TODO: learn how to use einsums -n -- Nathaniel J. 
Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
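[Editor's note: for readers in the same position as Josef's "TODO: learn how to use einsums", the Einstein-notation calls map directly onto familiar dot products. A minimal sketch:]

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.arange(12).reshape(3, 4)
v = np.arange(3)

# 'ij,jk->ik': sum over the repeated index j, i.e. the matrix product
assert np.array_equal(np.einsum('ij,jk->ik', a, b), a.dot(b))
# 'ij,j->i': matrix-vector product
assert np.array_equal(np.einsum('ij,j->i', a, v), a.dot(v))
# 'ij,ij->': full contraction, elementwise multiply then sum
assert np.einsum('ij,ij->', a, a) == (a * a).sum()
```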
Re: [Numpy-discussion] [RFC] should we argue for a matrix power operator, @@?
I think I wouldn't use anything like @@ often enough to remember its meaning. I'd rather see English names for anything that is not **very** common. I find A@@-1 pretty ugly compared to inv(A). A@@(-0.5) might be nice (do we have matrix_sqrt ?) Josef On Sat, Mar 15, 2014 at 5:11 PM, Stephan Hoyer sho...@gmail.com wrote: Speaking only for myself (and as someone who has regularly used matrix powers), I would not expect matrix power as @@ to follow from matrix multiplication as @. I do agree that matrix power is the only reasonable use for @@ (given @), but it's still not something I would be confident enough to know without looking up. We should keep in mind that each new operator imposes some (small) cognitive burden on everyone who encounters it for the first time, and, in this case, this will include a large fraction of all Python users, whether they do numerical computation or not. Guido has given us a tremendous gift in the form of @. Let's not insist on @@ when it is unclear whether the burden of figuring out what @@ means would be worth it, even for heavily numeric code. I would certainly prefer to encounter norm(A), inv(A), matrix_power(A, n), fractional_matrix_power(A, n) and expm(A) rather than their infix equivalents. It will certainly not be obvious which of these @@ will support for objects from any given library. One useful data point might be to consider whether matrix power is available as an infix operator in other languages commonly used for numerical work. AFAICT from some quick searches: MATLAB: Yes R: No IDL: No All of these languages do, of course, implement infix matrix multiplication, but it is apparently not clear at all that the matrix power is similarly useful. 
Best, Stephan On Sat, Mar 15, 2014 at 9:03 AM, Olivier Delalleau sh...@keba.be wrote: 2014-03-15 11:18 GMT-04:00 Charles R Harris charlesr.har...@gmail.com: On Fri, Mar 14, 2014 at 10:32 PM, Nathaniel Smith n...@pobox.com wrote: Hi all, Here's the second thread for discussion about Guido's concerns about PEP 465. The issue here is that PEP 465 as currently written proposes two new operators, @ for matrix multiplication and @@ for matrix power (analogous to * and **): http://legacy.python.org/dev/peps/pep-0465/ The main thing we care about of course is @; I pushed for including @@ because I thought it was nicer to have than not, and I thought the analogy between * and ** might make the overall package more appealing to Guido's aesthetic sense. It turns out I was wrong :-). Guido is -0 on @@, but willing to be swayed if we think it's worth the trouble to make a solid case. Note that the question now is *not* how @@ will affect the reception of @. @ itself is AFAICT a done deal, regardless of what happens with @@. For this discussion let's assume @ can be taken for granted, and that we can freely choose to either add @@ or not add @@ to the language. The question is: which do we think makes Python a better language (for us and in general)? Some thoughts to start us off: Here are the interesting use cases for @@ that I can think of: - 'vector @@ 2' gives the squared Euclidean length (because it's the same as vector @ vector). Kind of handy. - 'matrix @@ n' of course gives the matrix power, which is of marginal use but does come in handy sometimes, e.g., when looking at graph connectivity. - 'matrix @@ -1' provides a very transparent notation for translating textbook formulas (with all their inverses) into code. It's a bit unhelpful in practice, because (a) usually you should use solve(), and (b) 'matrix @@ -1' is actually more characters than 'inv(matrix)'. But sometimes transparent notation may be important. 
(And in some cases, like using numba or theano or whatever, 'matrix @@ -1 @ foo' could be compiled into a call to solve() anyway.) (Did I miss any?) In practice it seems to me that the last use case is the one that might matter a lot in practice, but then again, it might not -- I'm not sure. For example, does anyone who teaches programming with numpy have a feeling about whether the existence of '@@ -1' would make a big difference to you and your students? (Alan? I know you were worried about losing the .I attribute on matrices if switching to ndarrays for teaching -- given that ndarray will probably not get a .I attribute, how much would the existence of @@ -1 affect you?) On a more technical level, Guido is worried about how @@'s precedence should work (and this is somewhat related to the other thread about @'s precedence and associativity, because he feels that if we end up giving @ and * different precedence, then that makes it much less clear what to do with @@, and reduces the strength of the */**/@/@@ analogy). In particular, if we want to argue for @@ then we'll need to figure out what expressions like a @@ b @@ c and a ** b @@ c and a @@ b ** c should do. A related question is what @@ should do
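For readers following along, the three use cases listed in the message above already have spellings in today's NumPy; a hedged sketch with made-up values:

```python
import numpy as np

v = np.array([3.0, 4.0])
A = np.array([[2.0, 1.0],
              [0.0, 1.0]])
b = np.array([1.0, 1.0])

# 'vector @@ 2' -- the squared Euclidean length, today spelled v @ v
assert v @ v == 25.0

# 'matrix @@ n' -- integer matrix power via np.linalg.matrix_power
A3 = np.linalg.matrix_power(A, 3)
assert np.allclose(A3, A @ A @ A)

# 'matrix @@ -1 @ foo' -- in practice prefer solve() over forming the inverse
x_inv = np.linalg.inv(A) @ b
x_solve = np.linalg.solve(A, b)
assert np.allclose(x_inv, x_solve)
```

(np.linalg.matrix_power also accepts negative integers, computing the inverse first, which is the closest existing analogue of 'matrix @@ -1'.)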
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
On Fri, Mar 14, 2014 at 11:41 PM, Nathaniel Smith n...@pobox.com wrote: Hi all, Here's the main blocker for adding a matrix multiply operator '@' to Python: we need to decide what we think its precedence and associativity should be. I'll explain what that means so we're on the same page, and what the choices are, and then we can all argue about it. But even better would be if we could get some data to guide our decision, and this would be a lot easier if some of you all can help; I'll suggest some ways you might be able to do that. So! Precedence and left- versus right-associativity. If you already know what these are you can skim down until you see CAPITAL LETTERS. We all know what precedence is. Code like this: a + b * c gets evaluated as: a + (b * c) because * has higher precedence than +. It binds more tightly, as they say. Python's complete precedence table is here: http://docs.python.org/3/reference/expressions.html#operator-precedence Associativity, in the parsing sense, is less well known, though it's just as important. It's about deciding how to evaluate code like this: a * b * c Do we use a * (b * c) (* is right associative) or (a * b) * c (* is left associative)? Here all the operators have the same precedence (because, uh... they're the same operator), so precedence doesn't help. And mostly we can ignore this in day-to-day life, because both versions give the same answer, so who cares. But a programming language has to pick one (consider what happens if one of those objects has a non-default __mul__ implementation). And of course it matters a lot for non-associative operations like a - b - c or a / b / c So when figuring out order of evaluations, what you do first is check the precedence, and then if you have multiple operators next to each other with the same precedence, you check their associativity. 
Notice that this means that if you have different operators that share the same precedence level (like + and -, or * and /), then they have to all have the same associativity. All else being equal, it's generally considered nice to have fewer precedence levels, because these have to be memorized by users. Right now in Python, every precedence level is left-associative, except for '**'. If you write these formulas without any parentheses, then what the interpreter will actually execute is: (a * b) * c (a - b) - c (a / b) / c but a ** (b ** c) Okay, that's the background. Here's the question. We need to decide on precedence and associativity for '@'. In particular, there are three different options that are interesting: OPTION 1 FOR @: Precedence: same as * Associativity: left My shorthand name for it: same-left (yes, very creative) This means that if you don't use parentheses, you get: a @ b @ c -> (a @ b) @ c a * b @ c -> (a * b) @ c a @ b * c -> (a @ b) * c OPTION 2 FOR @: Precedence: more-weakly-binding than * Associativity: right My shorthand name for it: weak-right This means that if you don't use parentheses, you get: a @ b @ c -> a @ (b @ c) a * b @ c -> (a * b) @ c a @ b * c -> a @ (b * c) OPTION 3 FOR @: Precedence: more-tightly-binding than * Associativity: right My shorthand name for it: tight-right This means that if you don't use parentheses, you get: a @ b @ c -> a @ (b @ c) a * b @ c -> a * (b @ c) a @ b * c -> (a @ b) * c We need to pick which of these options we think is best, based on whatever reasons we can think of, ideally more than "hmm, weak-right gives me warm fuzzy feelings" ;-). (In principle the other 2 possible options are tight-left and weak-left, but there doesn't seem to be any argument in favor of either, so we'll leave them out of the discussion.) 
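The grouping rules described above are easy to verify with plain numbers; a small sketch:

```python
# Left- vs right-associativity, checked with plain scalars: every Python
# operator groups to the left, except ** which groups to the right.
a, b, c = 2.0, 3.0, 4.0

assert a - b - c == (a - b) - c       # left:  gives -5.0
assert a - b - c != a - (b - c)       # not 3.0
assert a / b / c == (a / b) / c       # left
assert a ** b ** c == a ** (b ** c)   # right: 2 ** 81
assert a ** b ** c != (a ** b) ** c   # not (2 ** 3) ** 4 == 4096
```

The question in the thread is which of these two existing patterns '@' should follow.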
Some things to consider: * and @ are actually not associative (in the math sense) with respect to each other, i.e., (a * b) @ c and a * (b @ c) in general give different results when 'a' is not a scalar. So considering the two expressions 'a * b @ c' and 'a @ b * c', we can see that each of these three options produces different results in some cases. Same-left is the easiest to explain and remember, because it's just "@ acts like * and /". So we already have to know the rule in order to understand other non-associative expressions like a / b / c or a - b - c, and it'd be nice if the same rule applied to things like a * b @ c so we only had to memorize *one* rule. (Of course there's ** which uses the opposite rule, but I guess everyone internalized that one in secondary school; that's not true for * versus @.) This is definitely the default we should choose unless we have a good reason to do otherwise. BUT: there might indeed be a good reason to do otherwise, which is the whole reason this has come up. Consider: Mat1 @ Mat2 @ vec Obviously this will execute much more quickly if we do Mat1 @ (Mat2 @ vec) because that results in
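Both halves of this argument can be checked numerically: matrix multiplication itself is associative, so the two groupings of Mat1 @ Mat2 @ vec agree in value (though at very different cost), while * and @ do not associate with each other. A sketch with made-up arrays:

```python
import numpy as np

n = 300
Mat1 = np.ones((n, n))
Mat2 = np.ones((n, n))
vec = np.ones(n)

# Same value either way, but (Mat1 @ Mat2) @ vec builds an n x n
# intermediate (O(n^3) work) while Mat1 @ (Mat2 @ vec) needs only
# two matrix-vector products (O(n^2) work).
assert np.allclose((Mat1 @ Mat2) @ vec, Mat1 @ (Mat2 @ vec))

# By contrast, elementwise * and @ are not associative with each other:
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[5.0, 6.0], [7.0, 8.0]])
c = np.array([[0.0, 1.0], [1.0, 0.0]])
assert not np.allclose((a * b) @ c, a * (b @ c))
```

So the parenthesization never changes the value of a pure chain of @, only its cost; mixing in * is where the precedence choice changes results.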
Re: [Numpy-discussion] [RFC] should we argue for a matrix power operator, @@?
On Sat, Mar 15, 2014 at 8:47 PM, Warren Weckesser warren.weckes...@gmail.com wrote: On Sat, Mar 15, 2014 at 8:38 PM, josef.p...@gmail.com wrote: I think I wouldn't use anything like @@ often enough to remember its meaning. I'd rather see English names for anything that is not **very** common. I find A@@-1 pretty ugly compared to inv(A). A@@(-0.5) might be nice (do we have matrix_sqrt ?) scipy.linalg.sqrtm: http://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.sqrtm.html maybe a good example: I could never figure that one out M = sqrtm(A) A = M @ M but what we use in stats is A = R.T @ R (eigenvectors dot diag(sqrt of eigenvalues)), so which sqrt is A@@(0.5) ? Josef
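Josef's question about the two square roots can be illustrated without scipy: for a symmetric positive definite A, the symmetric root M (what sqrtm returns in this case) satisfies M @ M == A, while the "statistics" root R built from sqrt-eigenvalues times eigenvectors satisfies R.T @ R == A but not R @ R == A. A minimal sketch (the matrix is made up for illustration):

```python
import numpy as np

# A small symmetric positive definite matrix.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

w, V = np.linalg.eigh(A)                 # A == V @ diag(w) @ V.T

# The symmetric square root (sqrtm's answer here): M @ M == A
M = V @ np.diag(np.sqrt(w)) @ V.T
assert np.allclose(M @ M, A)

# The "statistics" square root R with A == R.T @ R (not R @ R)
R = np.diag(np.sqrt(w)) @ V.T
assert np.allclose(R.T @ R, A)
assert not np.allclose(R @ R, A)
```

So a hypothetical A@@(0.5) would presumably mean M; the R used in statistics is a different, non-symmetric factorization of the same matrix.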