Re: [Numpy-discussion] performance of numpy.array()

2015-04-30 Thread Ryan Nelson
I have had good luck with Continuum's Miniconda Python distributions on
Linux.
http://conda.pydata.org/miniconda.html
The `conda` command makes it very easy to create specific testing
environments for Python 2 and 3 with many different packages. Everything is
precompiled, so you won't have to worry about system library differences
between the two clusters.

Hope that helps.

Ryan

On Thu, Apr 30, 2015 at 10:03 AM, simona bellavista  wrote:

> I have seen a big improvement in performance with numpy 1.9.2 under python
> 2.7.8: numpy.array takes 5 s instead of 300 s.
>
> On the other hand, I have also tried numpy 1.9.2 and 1.9.0 with python 3.4
> and the results are terrible: numpy.array takes 20 s, and the other routines
> are slowed down as well, for example concatenate, astype, copy, and uniform.
> Most of all, the sort function of numpy.ndarray is slowed down by a factor
> of at least 10.
>
> On the other cluster I am using python 3.3 with numpy 1.9.0 and it is
> working very well (though I think that is partly due to the hardware). I
> was trying to install python 3.3 on this cluster, but because of other
> issues (an error at compile time in the h5py library and a runtime bug in
> the dill library) I cannot test it right now.
>
> 2015-04-29 17:47 GMT+02:00 Sebastian Berg :
>
>> There was a major improvement to np.array in some cases.
>>
>> You can probably work around this by using np.concatenate instead of
>> np.array in your case (this depends on the use case, but I will guess you
>> have code doing:
>>
>> np.array([arr1, arr2, arr3])
>>
>> or similar). If your use case is different, you may be out of luck and
>> only an upgrade would help.
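
A minimal sketch of that workaround (array names are illustrative, not from
the original post): for equal-length 1-D arrays, one concatenate plus a cheap
reshape gives the same result as np.array() without the slow per-element
inspection on older NumPy versions.

###
import numpy as np

arr1, arr2, arr3 = (np.random.rand(1000) for _ in range(3))

# Slow path on older NumPy: np.array() inspects every element.
stacked = np.array([arr1, arr2, arr3])

# Usually faster: concatenate once, then reshape to (3, 1000).
stacked2 = np.concatenate([arr1, arr2, arr3]).reshape(3, -1)

assert np.array_equal(stacked, stacked2)
###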
>>
>>
>> On Mi, 2015-04-29 at 17:41 +0200, Nick Papior Andersen wrote:
>> > You could try and install your own numpy to check whether that
>> > resolves the problem.
>> >
>> > 2015-04-29 17:40 GMT+02:00 simona bellavista :
>> > on cluster A 1.9.0 and on cluster B 1.8.2
>> >
>> > 2015-04-29 17:18 GMT+02:00 Nick Papior Andersen :
>> >
>> > Compile it yourself to know the limitations/benefits of the
>> > dependency libraries.
>> >
>> > Otherwise, have you checked which versions of numpy they are, i.e.
>> > are they the same version?
>> >
>> > 2015-04-29 17:05 GMT+02:00 simona bellavista :
>> >
>> > I work on two distinct scientific clusters. I have run the same
>> > python code on the two clusters and I have noticed that one is
>> > faster by an order of magnitude than the other (1 min vs 10 min;
>> > this is important because I run this function many times).
>> >
>> > I have investigated with a profiler and I have found that the cause
>> > is the function numpy.array (same code and same data), which is
>> > being called 10^5 times. On cluster A it takes 2 s in total, whereas
>> > on cluster B it takes ~6 min. As for the other functions, they are
>> > generally faster on cluster A. I understand that the clusters are
>> > quite different, both in hardware and in installed libraries. It
>> > strikes me that on this particular function the performance is so
>> > different. I would have thought that this is due to a difference in
>> > the available memory, but actually, looking with `top`, the memory
>> > seems to be used only at 0.1% on cluster B. In theory numpy is
>> > compiled with atlas on cluster B; on cluster A it is not clear,
>> > because numpy.__config__.show() returns NOT AVAILABLE for anything.
>> >
>> > Does anybody have any insight on that, and whether I can improve the
>> > performance on cluster B?
>> >
>> > --
>> > Kind regards Nick

Re: [Numpy-discussion] numpy pickling problem - python 2 vs. python 3

2015-03-06 Thread Ryan Nelson
Arnd,

I can see where this is an issue. If you are trying to update your code for
Py3, I still think that it would really help to add a version attribute of
some sort to your new HDF files. You can then write a little check in your
access code that looks for this variable. If it is not present, you know
that it is an old file, and you can use the trick that I gave you.
Otherwise, it will process the file as normal. It could even throw a little
error saying that the file is outdated. You could write a small conversion
script that could run through old files and reprocess them into the new
format. Fortunately, Python is pretty good at automating tasks, even for
hundreds of files :)
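
A minimal sketch of that version check (the attribute name and helper are
hypothetical, not from the original code; assumes PyTables 3.x):

###
import pickle as pkl
import tables as tb

def load_list_of_arrays(fname):
    with tb.open_file(fname, mode="r") as fpt:
        raw = fpt.get_node_attr("/", "list_of_arrays")
        try:
            fpt.get_node_attr("/", "format_version")
        except AttributeError:
            # Old Py2-era file: the attribute comes back as pickled bytes.
            return pkl.loads(raw, encoding="latin1")
        return raw
###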
It might be informative to ask on the PyTables list to see what they've
done. The Pandas folks also do a lot with HDF files, and they have
certainly worked their way through the Py2-3 transition. Also, because this
is an issue with Python's pickle, a quick note on SO might get some hits.

I tried your script using a list of lists rather than a list of arrays, and
the same problem persists, so, as Pauli notes, this is going to be a
problem regardless of the type of the attributes you set. I think you're
just going to have to hard-code some kind of check into your code to switch
behavior.

I recently switched to using Py3 exclusively, and although it was painful
at first, I'm quite happy with Py3 overall. I also use the Anaconda Python
distribution, which makes it very easy to have Py2 and Py3 environments if
you need to switch back and forth.

Ryan



On Fri, Mar 6, 2015 at 9:48 AM, Arnd Baecker  wrote:

> On Fri, 6 Mar 2015, Pauli Virtanen wrote:
>
> > Arnd Baecker  writes:
> > [clip]
> >> Still I would have thought that this should be working out-of-the box,
> >> i.e. without the pickle.loads trick?
> >
> > Pickle files should be considered incompatible between Python 2 and
> > Python 3.
> >
> > Python 3 interprets all bytes objects saved by Python 2 as str and
> > attempts to decode them under some unicode locale. The default locale
> > is ASCII, so it will simply just fail in most cases if the files
> > contain any binary data.
> >
> > Failing by default is also the right thing to do, since the saved bytes
> > objects might actually represent strings in some locale, and ASCII is the
> > safest guess.
> >
> > This behavior is that of Python's pickle module, and does not depend on
> > Numpy.
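
For reference, Python 3's pickle.load/loads accept an encoding argument for
exactly this case; "latin1" passes Py2 byte strings through unchanged (the
file name below is hypothetical):

###
import pickle

with open("old_py2.pkl", "rb") as fh:
    data = pickle.load(fh, encoding="latin1")
###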
>
> Thanks a lot for the explanation!
>
> So what is then the recommended way to save data under python 2 so that
> it can still be loaded under python 3?
>
> For example using np.save with a list of arrays works fine
> either on python 2 or on python 3.
> However it does not work if one tries to open under python 3
> a file generated before on python 2.
> (Again, because pickle is involved internally:
>File "python3.4/site-packages/numpy/lib/npyio.py",
>line 393, in load:  return format.read_array(fid)
>File "python34/lib/python3.4/site-packages/numpy/lib/format.py",
>line 602, in read_array:  array = pickle.load(fp)
>UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 ... )
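
The matching knob for .npy files: assuming a NumPy new enough (1.10+) that
np.load accepts allow_pickle/encoding arguments, a sketch (with a
hypothetical file name) would be:

###
import numpy as np

data = np.load("old_py2.npy", allow_pickle=True, encoding="latin1")
###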
>
> Just to be clear: I don't want to beat a dead horse here - for my usage
> via pytables I was able to solve the loading of old files following
> Ryan's solutions. Personally I don't use .npy files.
> Maybe saving a list containing arrays is an unusual example ...
>
> Still, I am a little bit worried about backwards-compatibility:
> being able to load old data files is an important issue
> as by this it is possible to check whether current code still
> reproduces previously obtained (maybe also published) results.
>
> Best, Arnd
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy pickling problem - python 2 vs. python 3

2015-03-05 Thread Ryan Nelson
This works if run from Py3. Don't know if it will *always* work. From that
GH discussion you linked, it sounds like that is a bit of a hack.
##
"""Illustrate problem with pytables data - python 2 to python 3."""

from __future__ import print_function

import sys
import numpy as np
import tables as tb
import pickle as pkl


def main():
    """Run the example."""
    print("np.__version__=", np.__version__)
    check_on_same_version = False

    arr1 = np.linspace(0.0, 5.0, 6)
    arr2 = np.linspace(0.0, 10.0, 11)
    data = [arr1, arr2]

    # Only generate on python 2.X or check on the same python version:
    if sys.version < "3.0" or check_on_same_version:
        fpt = tb.open_file("tstdat.h5", mode="w")
        fpt.set_node_attr(fpt.root, "list_of_arrays", data)
        fpt.close()

    # Load the saved file:
    fpt = tb.open_file("tstdat.h5", mode="r")
    result = fpt.get_node_attr("/", "list_of_arrays")
    fpt.close()
    print("Loaded:", pkl.loads(result, encoding="latin1"))

main()
###
However, I would consider defining some sort of v2 of your HDF file format
that converts all of the lists of arrays to CArrays or EArrays in the HDF
file (https://pytables.github.io/usersguide/libref/homogenous_storage.html).
Otherwise, what is the advantage of using HDF files over just plain
shelves? Just a thought.
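
A minimal sketch of such a "v2" layout (node and attribute names are
illustrative; assumes PyTables 3.x, where create_carray accepts an obj
argument):

###
import numpy as np
import tables as tb

arrs = [np.linspace(0.0, 5.0, 6), np.linspace(0.0, 10.0, 11)]

with tb.open_file("tstdat_v2.h5", mode="w") as fpt:
    fpt.set_node_attr(fpt.root, "format_version", 2)
    # One homogeneous CArray node per array, instead of one pickled list:
    for i, arr in enumerate(arrs):
        fpt.create_carray(fpt.root, "arr_%d" % i, obj=arr)
###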
Ryan

On Thu, Mar 5, 2015 at 2:52 AM, Arnd Baecker  wrote:

> Dear all,
>
> when preparing the transition of our repositories from python 2
> to python 3, I encountered a problem loading pytables (.h5) files
> generated using python 2.
> I suspect that it is caused by a problem with pickling numpy arrays
> under python 3:
>
> The code appended at the end of this mail works
> fine on either python 2.7 or python 3.4, however,
> generating the data on python 2 and trying to load
> them on python 3 gives some strange string
> ( b'(lp1\ncnumpy.core.multiarray\n_reconstruct\np2\n(cnumpy\nndarray ...)
> instead of
> [array([ 0.,  1.,  2.,  3.,  4.,  5.]),
>  array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])]
>
> The problem sounds very similar to the one reported here
>https://github.com/numpy/numpy/issues/4879
> which was fixed with numpy 1.9.
>
> I tried different versions/combinations of numpy (including 1.9.2)
> and always end up with the above result.
> Also I tried to reduce the problem down to the level of pure numpy
> and pickle (as in the above bug report):
>
>import numpy as np
>import pickle
>arr1 = np.linspace(0.0, 1.0, 2)
>arr2 = np.linspace(0.0, 2.0, 3)
>data = [arr1, arr2]
>
>p = pickle.dumps(data)
>print(pickle.loads(p))
>p
>
> Using the resulting string for p as an input string
> (with b added at the beginning) under python 3 gives
>UnicodeDecodeError: 'ascii' codec can't decode
>byte 0xf0 in position 14: ordinal not in range(128)
>
>
> Can someone reproduce the problem with pytables?
> Is there maybe a work-around?
> (And no: I can't re-generate the "old" data files - it's
> hundreds of .h5 files ... ;-).
>
> Many thanks, best, Arnd
>
>
> ##
> """Illustrate problem with pytables data - python 2 to python 3."""
>
> from __future__ import print_function
>
> import sys
> import numpy as np
> import tables as tb
>
>
> def main():
>     """Run the example."""
>     print("np.__version__=", np.__version__)
>     check_on_same_version = False
>
>     arr1 = np.linspace(0.0, 5.0, 6)
>     arr2 = np.linspace(0.0, 10.0, 11)
>     data = [arr1, arr2]
>
>     # Only generate on python 2.X or check on the same python version:
>     if sys.version < "3.0" or check_on_same_version:
>         fpt = tb.open_file("tstdat.h5", mode="w")
>         fpt.set_node_attr(fpt.root, "list_of_arrays", data)
>         fpt.close()
>
>     # Load the saved file:
>     fpt = tb.open_file("tstdat.h5", mode="r")
>     result = fpt.get_node_attr("/", "list_of_arrays")
>     fpt.close()
>     print("Loaded:", result)
>
> main()
>
>
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Matrix Class

2015-02-11 Thread Ryan Nelson
Colin,

I currently use Py3.4 and Numpy 1.9.1. However, I built a quick test conda
environment with Python2.7 and Numpy 1.7.0, and I get the same:


Python 2.7.9 |Continuum Analytics, Inc.| (default, Dec 18 2014, 16:57:52)
[MSC v
.1500 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.

IPython 2.3.1 -- An enhanced Interactive Python.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help  -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import numpy as np

In [2]: np.__version__
Out[2]: '1.7.0'

In [3]: np.mat([4,'5',6])
Out[3]:
matrix([['4', '5', '6']],
   dtype='|S1')

In [4]: np.mat([4,'5',6], dtype=int)
Out[4]: matrix([[4, 5, 6]])
###
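
For what it's worth, this is plain ndarray type promotion at work, not
something specific to the matrix class -- a quick sketch:

###
import numpy as np

print(np.array([4, '5', 6]))             # ['4' '5' '6'], promoted to strings
print(np.array([4, '5', 6], dtype=int))  # [4 5 6], dtype forced
###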

As to your comment about coordinating with Statsmodels, you should see the
links in the thread that Alan posted:
http://permalink.gmane.org/gmane.comp.python.numeric.general/56516
http://permalink.gmane.org/gmane.comp.python.numeric.general/56517
Josef's comments at the time seem to echo the issues the devs (and others)
have with the matrix class. Maybe things have changed with Statsmodels.

I know I mentioned Sage and SageMathCloud before. I'll just point out that
there are folks who use this for real research problems, not just as a
pedagogical tool. They have Matrix/vector/column_matrix classes that do
what you were expecting in the problems you posted above. Indeed, below is
a (truncated) cut and paste from a Sage Worksheet. (See
http://www.sagemath.org/doc/tutorial/tour_linalg.html)
##
In : Matrix([1,'2',3])
Error in lines 1-1
Traceback (most recent call last):
TypeError: unable to find a common ring for all elements

In : Matrix([[1,2,3],[4,5]])
ValueError: List of rows is not valid (rows are wrong types or lengths)

In : vector([1,2,3])
(1, 2, 3)

In : column_matrix([1,2,3])
[1]
[2]
[3]
##

Large portions of the custom code and wrappers in Sage are written in
Python. I don't think their Matrix object is a subclass of ndarray, so
perhaps you could strip the Matrix code out of Sage into a separate
project, if you don't want to go through the Sage interface.


On Wed, Feb 11, 2015 at 11:54 AM, cjw  wrote:

>
> On 11-Feb-15 10:21 AM, Ryan Nelson wrote:
>
> So:
>
> In [2]: np.mat([4,'5',6])
> Out[2]:
> matrix([['4', '5', '6']], dtype='
> In [3]: np.mat([4,'5',6], dtype=int)
> Out[3]: matrix([[4, 5, 6]])
>
>
>  Thanks Ryan,
>
> We are not singing from the same hymn book.
>
> Using PyScripter, I get:
>
> *** Python 2.7.9 (default, Dec 10 2014, 12:28:03) [MSC v.1500 64 bit
> (AMD64)] on win32. ***
> >>> import numpy as np
> >>> print('Numpy version: ', np.__version__)
> ('Numpy version: ', '1.9.0')
> >>>
>
> Could you say which version you are using please?
>
> Colin W
>
>
> On Tue, Feb 10, 2015 at 5:07 PM, cjw   wrote:
>
>
>  It seems to be agreed that there are weaknesses in the existing Numpy
> Matrix
> Class.
>
> Some problems are illustrated below.
>
> I'll try to put some suggestions over the coming weeks and would appreciate
> comments.
>
> Colin W.
>
> Test Script:
>
> from numpy import mat
>
> if __name__ == '__main__':
>     a = mat([4, 5, 6])   # Good
>     print('a: ', a)
>     b = mat([4, '5', 6]) # Not the expected result
>     print('b: ', b)
>     c = mat([[4, 5, 6], [7, 8]]) # Wrongly accepted as rectangular
>     print('c: ', c)
>     d = mat([[1, 2, 3]])
>     try:
>         d[0, 1] = 'b'    # Correctly flagged, not numeric
>     except ValueError:
>         print("d[0, 1]= 'b' # Correctly flagged, not numeric",
>               'ValueError')
>     print('d: ', d)
>
> Result:
>
> *** Python 2.7.9 (default, Dec 10 2014, 12:28:03) [MSC v.1500 64 bit
> (AMD64)] on win32. ***
>
> a:  [[4 5 6]]
> b:  [['4' '5' '6']]
> c:  [[[4, 5, 6] [7, 8]]]
> d[0, 1]= 'b' # Correctly flagged, not numeric  ValueError
> d:  [[1 2 3]]
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Matrix Class

2015-02-11 Thread Ryan Nelson
So:

In [2]: np.mat([4,'5',6])
Out[2]:
matrix([['4', '5', '6']], dtype='
In [3]: np.mat([4,'5',6], dtype=int)
Out[3]: matrix([[4, 5, 6]])

On Tue, Feb 10, 2015 at 5:07 PM, cjw  wrote:

> It seems to be agreed that there are weaknesses in the existing Numpy
> Matrix
> Class.
>
> Some problems are illustrated below.
>
> I'll try to put some suggestions over the coming weeks and would appreciate
> comments.
>
> Colin W.
>
> Test Script:
>
> from numpy import mat
>
> if __name__ == '__main__':
>     a = mat([4, 5, 6])   # Good
>     print('a: ', a)
>     b = mat([4, '5', 6]) # Not the expected result
>     print('b: ', b)
>     c = mat([[4, 5, 6], [7, 8]]) # Wrongly accepted as rectangular
>     print('c: ', c)
>     d = mat([[1, 2, 3]])
>     try:
>         d[0, 1] = 'b'    # Correctly flagged, not numeric
>     except ValueError:
>         print("d[0, 1]= 'b' # Correctly flagged, not numeric",
>               'ValueError')
>     print('d: ', d)
>
> Result:
>
> *** Python 2.7.9 (default, Dec 10 2014, 12:28:03) [MSC v.1500 64 bit
> (AMD64)] on win32. ***
> >>>
> a:  [[4 5 6]]
> b:  [['4' '5' '6']]
> c:  [[[4, 5, 6] [7, 8]]]
> d[0, 1]= 'b' # Correctly flagged, not numeric  ValueError
> d:  [[1 2 3]]
> >>>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Characteristic of a Matrix.

2015-01-08 Thread Ryan Nelson
Colin,

I'll second the endorsement of Sage; however, for teaching purposes, I
would suggest Sage Math Cloud. It is a free, web-based version of Sage, and
it does not require you or the students to install any software (besides a
new-ish web browser). It also makes sharing and collaborative work quite
easy. I've used this a bit for demos, and it's great. The author, William
Stein, is good at correcting bugs/issues very quickly.

Sage implements its own Matrix and Vector classes, and the Vector class
has a "column" method that returns a column vector (transpose).
http://www.sagemath.org/doc/tutorial/tour_linalg.html

For what it's worth, I agree with others about the benefits of avoiding a
Matrix class in Numpy. In my experience, it certainly makes things cleaner
in larger projects when I always use ndarray and just call the appropriate
linear algebra functions (e.g. np.dot) when that is the context I need.
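
A small sketch of that style -- plain ndarrays plus explicit linear-algebra
calls (the values here are illustrative):

###
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
v = np.array([1.0, -1.0])

print(np.dot(A, v))           # matrix-vector product
print(A.T)                    # transpose
print(np.linalg.solve(A, v))  # solve A x = v
###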

Anyway, just my two cents.

Ryan

On Wed, Jan 7, 2015 at 2:44 PM, cjw  wrote:

>  Thanks Alexander,
>
> I'll look at Sage.
>
> Colin W.
>
>
> On 06-Jan-15 8:38 PM, Alexander Belopolsky wrote:
>
> On Tue, Jan 6, 2015 at 8:20 PM, Nathaniel Smith  wrote:
>
> > Since matrices are now part of some high school curricula, I urge that
> > they be treated appropriately in Numpy. Further, I suggest that
> > consideration be given to establishing V and VT sub-classes, to cover
> > vectors and transposed vectors.
>
>  The numpy devs don't really have the interest or the skills to create
> a great library for pedagogical use in high schools. If you're
> interested in an interface like this, then I'd suggest creating a new
> package focused specifically on that (which might use numpy
> internally). There's really no advantage in glomming this into numpy
> proper.
>
>
> Sorry for taking this further off-topic, but I recently discovered the
> excellent SAGE package. While its targeted audience includes math graduate
> students and research mathematicians, parts of it are accessible to
> schoolchildren. SAGE is written in Python and integrates a number of
> packages including numpy.
>
> I would highly recommend anyone interested in using Python for education
> to take a look at SAGE.
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Guidance regarding build and testing

2014-12-22 Thread Ryan Nelson
Maniteja,

Ralf's suggestion for Numpy works very well. In a more general case,
though, you might want to play around with conda, the package manager for
Anaconda's Python distribution (http://continuum.io/downloads).

I use the Miniconda package, which is pretty much just conda, to create new
"environments," which are a lot like virtualenvs (
http://conda.pydata.org/docs/faq.html#env-creating). The nice thing here is
that all of the dependencies are only downloaded once, and you can make
Python 2 and 3 environments pretty easily.

For example, to make a Python 3 environment, you could use the following:
$ conda create -n npy3 python=3 numpy ipython
$ source activate npy3
That creates a Python3 environment called "npy3" with numpy, ipython, and
all the dependencies. Once activated, you can remove the conda version of
numpy and then install the development version:
[npy3]$ conda remove numpy
[npy3]$ python setup.py install
### Do dev stuff ###
[npy3]$ source deactivate

This is not necessary for what you are trying to do, but it might be
helpful to know about as you move along.

Ryan

On Sun, Dec 21, 2014 at 4:54 PM, Ralf Gommers  wrote:

>
>
> On Sun, Dec 21, 2014 at 7:17 PM, Maniteja Nandana <
> maniteja.modesty...@gmail.com> wrote:
>
>> Hello Ralf,
>> Thanks for the help. Now I am able to see the modifications in the
>> interpreter. As I was going through broadcasting and slicing, I was eager
>> to try out different modifications to understand the working.
>>
>> On Sun, Dec 21, 2014 at 10:57 PM, Ralf Gommers 
>> wrote:
>>
>>>
>>> Almost. test_xxx.py contains tests for all functions in the file xxx.py
>>>
>>>
>> Sorry was a bit confused then. Thanks for the correction  :)
>>
>>>
>>> Note that there is also a variant which does use virtualenvs, documented
>>> at https://github.com/scipy/scipy/blob/master/HACKING.rst.txt#faq
>>> (under "How do I set up a development version of SciPy in parallel to
>>> a released version that I use to do my job/research?").
>>>
>>>
 maniteja@ubuntu:~/FOSS/numpy$ echo $PYTHONPATH
 /home/maniteja/FOSS/numpy/numpy


>>> Maybe that's one /numpy too many? If it's right, you should have a dir
>>> /home/maniteja/FOSS/numpy/numpy/numpy/core.
>>>
>> No, I have setup.py in /home/maniteja/FOSS/numpy/numpy.
>> Hence, I also have core as /home/maniteja/FOSS/numpy/numpy/core.
>>
>>
>>> An easy way to check which numpy you're using is "import numpy;
>>> print(numpy.__file__)".
>>
>> Thanks, I didn't get the idea then. It now shows
>> '/home/maniteja/FOSS/numpy/numpy/__init__.pyc'
>>
>> The documentation says that the directory containing setup.py is to be
>> set as the PYTHONPATH variable.
>>
>
> That's correct. Note that setup.py's are hierarchical - you have one in
> .../FOSS/numpy (this is the main one), one in .../FOSS/numpy/numpy, one in
> .../FOSS/numpy/numpy/core and so on.
>
>>> This is fine. You should not develop directly on your own master branch.
>>> Rather, keep your master branch in sync with numpy master, and create a new
>>> feature branch for every new feature that you want to work on.
>>>
>>> Ralf
>>>
>> Oh thanks, I have only used git for my local repositories or
>> collaboration with peers. So I just wanted to clarify before I end up
>> messing anything up :), though I know that there needs to be write access
>> to modify the master branch.
>>
>> Lastly, it would be great if you could suggest whether I should learn
>> Cython or any other codebase to understand the source code
>>
>
> It depends on what you want to work on. There's not much Cython in numpy,
> only in numpy.random. There's a lot of things you can work on knowing only
> Python, but the numpy core (ndarray, dtypes, ufuncs, etc.) is written in C.
>
> I'd suggest diving right in and starting with something that can be
> fixed/implemented in Python, something from
> https://github.com/numpy/numpy/labels/Easy%20Fix perhaps. Then send a PR
> for that so you get some feedback and a feeling for how the process of
> contributing works.
>
>
>> and also the timings preferable for working and discussing on the mailing
>> lists, as I stay in India, which is in the GMT+5:30 timezone. These are my
>> winter holidays, so I could adjust my timings accordingly as I have no
>> schoolwork :)
>>
>
> I wouldn't worry about that. In many cases it takes a day or couple of
> days before someone replies, especially if the topic requires detailed
> knowledge of the codebase. And the people on this list are split roughly
> equally between the US and Europe with smaller representations from all
> other continents, so there's always someone awake :)
>
> Cheers,
> Ralf
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.spacing question

2014-12-05 Thread Ryan Nelson
Alok Singhal  writes:

> On Thu, Dec 4, 2014 at 4:25 PM, Ryan Nelson  wrote:
> >
> > I guess I'm a little confused about how the spacing values are calculated.
>
> np.spacing(x) is basically the same as np.nextafter(x, np.inf) - x,
> i.e., it returns the minimum positive number that can be added to x to
> get a number that's different from x.
>
> > My expectation is that the first logical test should give an output array
> > where all of the results are the same. But it is also very likely that I
> > don't have any idea what's going on. Can someone provide some clarification?
>
> For 1e-10, np.spacing() is 1.2924697071141057e-26.  1e-10 * eps is
> 2.2204460492503132e-26, which, when added to 1e-10, rounds to the
> closest number that can be represented in a 64-bit floating-point
> representation.  That happens to be 2*np.spacing(1e-10), and not
> 1*np.spacing(1e-10).
>
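
A quick sketch of the relation Alok describes (np.spacing and np.nextafter
are standard NumPy; the printed ratio shows that x*eps can span more than
one spacing):

###
import numpy as np

for val in (1.0, 1e-10, 1e-20):
    x = np.float64(val)
    assert np.spacing(x) == np.nextafter(x, np.inf) - x
    print(val, (x * np.finfo(np.float64).eps) / np.spacing(x))
# 1.0   -> 1.0   (x*eps is exactly one spacing)
# 1e-10 -> ~1.72 (x + x*eps rounds two spacings up)
# 1e-20 -> ~1.48 (x + x*eps rounds one spacing up)
###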

Thanks Nathaniel and Alok. Your explanations were very helpful. I was
expecting that all of those logical tests would come out True. It might
have been the example in the doc string for `assert_array_almost_equal_nulp`
that was throwing me off a little bit. The precision test in that function
is `np.abs(x-y) <= ref`, where `ref` is the spacing for the largest values
in the two arrays (which is `y` in my case). In the doc string, this
function is run comparing x to (x*eps + x), which seems like it shouldn't
throw an error given the logical test in the function. For example, if you
change the input to `x = np.array([1., 1e-9, 1e-20])`, then the assert
function does not throw an error for that example.

Anyway, I guess that is the problem with working at the last unit of
precision in these numbers... Pesky floating point values...



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] numpy.spacing question

2014-12-04 Thread Ryan Nelson
Hello everyone,

I was working through the example usage for the test function
`assert_array_almost_equal_nulp`, and it brought up a question regarding
the function `spacing`. Here's some example code:


import numpy as np
from numpy.testing import assert_array_almost_equal_nulp
np.set_printoptions(precision=50)

x = np.array([1., 1e-10, 1e-20])
eps = np.finfo(x.dtype).eps
y = x*eps + x # y must be larger than x


[In]: np.abs(x-y) <= np.spacing(y)
[Out]: array([ True, False,  True], dtype=bool)

[In]: np.spacing(y)
[Out]: array([  2.22044604925031308084726333618164062500e-16,
 1.29246970711410574198657608135931695869658142328262e-26,
 1.50463276905252801019998276764447446760789191266827e-36])

[In]: np.abs(x-y)
[Out]: array([  2.22044604925031308084726333618164062500e-16,
 2.58493941422821148397315216271863391739316284656525e-26,
 1.50463276905252801019998276764447446760789191266827e-36])



I guess I'm a little confused about how the spacing values are calculated.
My expectation is that the first logical test should give an output array
where all of the results are the same. But it is also very likely that I
don't have any idea what's going on. Can someone provide some
clarification?

Thanks

Ryan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] creation of ndarray with dtype=np.object : bug?

2014-12-02 Thread Ryan Nelson
Emanuele Olivetti  writes:

> 
> Hi,
> 
> I am using 2D arrays where only one dimension remains constant, e.g.:
> ---
> import numpy as np
> a = np.array([[1, 2, 3], [4, 5, 6]]) # 2 x 3
> b = np.array([[9, 8, 7]]) # 1 x 3
> c = np.array([[1, 3, 5], [7, 9, 8], [6, 4, 2]]) # 3 x 3
> d = np.array([[5, 5, 4], [4, 3, 3]]) # 2 x 3
> ---
> I have a large number of them and need to extract subsets of them
> through fancy indexing and then stack them together. For this reason
> I put them into an array of dtype=np.object, given their non-constant
> nature. Indexing works well :) but stacking does not :( , as you can
> see in the following example:
> ---
> # fancy indexing :)
> data = np.array([a, b, c, d], dtype=np.object)
> idx = [0, 1, 3]
> print(data[idx])
> In [1]:
> [[[1 2 3]
>   [4 5 6]] [[9 8 7]] [[5 5 4]
>   [4 3 3]]]
> 
> # stacking :(
> data2 = np.array([a, b, c], dtype=np.object)
> data3 = np.array([a, d], dtype=np.object)
> together = np.vstack([data2, data3])
> In [2]:
> ---
> ValueError                            Traceback (most recent call last)
>  in ()
> > 1 execfile(r'/tmp/python-3276515J.py') # PYTHON-MODE
> 
> /tmp/python-3276515J.py in ()
>1 data2 = np.array([a, b, c], dtype=np.object)
>2 data3 = np.array([a, d], dtype=np.object)
> > 3 together = np.vstack([data2, data3])
> 
> /usr/lib/python2.7/dist-packages/numpy/core/shape_base.pyc in vstack(tup)
>  224
>  225 """
> --> 226 return _nx.concatenate(map(atleast_2d,tup),0)
>  227
>  228 def hstack(tup):
> 
> ValueError: arrays must have same number of dimensions
> 
> The reason for the error is that data2.shape is "(2,)", while data3.shape
> is "(2, 2, 3)". This happens because the creation of ndarrays with
> dtype=np.object tries to be "smart" and infer the common dimensions
> between the objects you put in the array, instead of just creating an
> array of the objects you give. This leads to unexpected results when you
> use it, like the one in the example, because you cannot control the
> resulting shape, which is data dependent. Or at least I cannot find a way
> to create data3 with shape (2,)...
> 
> How should I address this issue? To me, it looks like a bug in the
> excellent NumPy.
> 
> Best,
> 
> Emanuele
> 

Emanuele,

This doesn't address your question directly. However, I wonder if you
could approach this problem in a different way to get what you want.

First of all, create an "index" array and then just vstack all of your
arrays at once.

-
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]]) # 2 x 3
b = np.array([[9, 8, 7]]) # 1 x 3
c = np.array([[1, 3, 5], [7, 9, 8], [6, 4, 2]]) # 3 x 3
d = np.array([[5, 5, 4], [4, 3, 3]]) # 2 x 3

all_array = [a, b, c, d]

z = []
for n, arr in enumerate(all_array):
    z.extend([n] * arr.shape[0])
z = np.array(z)

varrays = np.vstack(all_array)


Now z looks like this: `array([0, 0, 1, 2, 2, 2, 3, 3])`, and varrays is a
vstack of all your data.

To select one of your arrays, you can do something like the following.

-

[In]: varrays[ z == 2 ] # Array c

[Out]:
array([[1, 3, 5],
   [7, 9, 8],
   [6, 4, 2]])
-
Now, if you want to select both arrays b and d, for example, you would
need a boolean array that looks like this:
array([False, False, True, False, False, False, True, True])
I think there is some Numpy black magic that lets you do this easily
(e.g. `i_wish = z == [1, 3]`), but right now, I can only think of how
to do this with a loop:


idxs = np.zeros(z.shape, dtype=bool)
for i in [1, 3]:
    idxs = np.logical_or(idxs, z == i)
idxs



This lets you select from the large stacked array and get the vstacked
arrays back automatically.


[In]: varrays[idxs]
[Out]:
array([[9, 8, 7],
   [5, 5, 4],
   [4, 3, 3]])
-
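
For what it's worth, the "black magic" does exist: np.in1d performs the
membership test without the explicit loop. A quick sketch:

###
import numpy as np

z = np.array([0, 0, 1, 2, 2, 2, 3, 3])
idxs = np.in1d(z, [1, 3])
# array([False, False,  True, False, False, False,  True,  True])
###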

Sorry if this does not help. Just spit-balling...
Ryan


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Question about broadcasting vs for loop performance

2014-09-14 Thread Ryan Nelson
I think I figured out my own question. I guess that the broadcasting
approach generates a very large 2D array in memory, which takes a bit
of extra time. I gathered this from reading the last example on the
following site:
http://wiki.scipy.org/EricsBroadcastingDoc
I tried this again with a much smaller "xs" array (~100 points), and the
broadcasting version was much faster.
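
A back-of-the-envelope sketch of the allocation involved (the grid size is
an assumption, since the point count in the posted code was garbled in
archiving):

###
nx, npeaks = 10000, 100  # assumed sizes
# Each (nx, npeaks) float64 intermediate inside lorentz() costs:
print("%.1f MB per temporary" % (nx * npeaks * 8 / 1e6))  # ~8 MB
# The loop version only ever touches nx-sized arrays (~0.08 MB each).
###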
Thanks

Ryan

Note: The link to the Scipy wiki page above is broken at the bottom of
Numpy's broadcasting page, otherwise I would have seen that earlier. Sorry
for the noise.

On Sun, Sep 14, 2014 at 10:22 PM, Ryan Nelson  wrote:

> Hello all,
>
> I have a question about the performance of broadcasting versus Python for
> loops. I have the following sample code that approximates some simulation
> I'd like to do:
>
> ## Test Code ##
>
> import numpy as np
>
>
> def lorentz(x, pos, inten, hwhm):
>     return inten*( hwhm**2 / ( (x - pos)**2 + hwhm**2 ) )
>
>
> poss = np.random.rand(100)
> intens = np.random.rand(100)
> xs = np.linspace(0, 10, 10000)  # point count assumed; garbled in archive
>
>
> def first_try():
>     sim_inten = np.zeros(xs.shape)
>     for freq, inten in zip(poss, intens):
>         sim_inten += lorentz(xs, freq, inten, 5.0)
>     return sim_inten
>
>
> def second_try():
>     sim_inten2 = lorentz(xs.reshape((-1,1)), poss, intens, 5.0)
>     sim_inten2 = sim_inten2.sum(axis=1)
>     return sim_inten2
>
>
> print np.array_equal(first_try(), second_try())
>
> ## End Test ##
>
>
> Running this script prints "True" for the final equality test. However,
> IPython's %timeit magic gives ~10 ms for first_try and ~30 ms for
> second_try. I tried this on Windows 7 (Anaconda Python) and on a Linux
> machine, both with Python 2.7 and Numpy 1.8.2.
>
> I understand in principle why broadcasting should be faster than Python
> loops, but I'm wondering why I'm getting worse results with the pure Numpy
> function. Are there general rules for when broadcasting might give
> worse performance than a Python loop?
>
>
> Thanks
>
>
> Ryan
>
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Question about broadcasting vs for loop performance

2014-09-14 Thread Ryan Nelson
Hello all,

I have a question about the performance of broadcasting versus Python for
loops. I have the following sample code that approximates some simulation
I'd like to do:

## Test Code ##

import numpy as np


def lorentz(x, pos, inten, hwhm):
    return inten*( hwhm**2 / ( (x - pos)**2 + hwhm**2 ) )


poss = np.random.rand(100)
intens = np.random.rand(100)
xs = np.linspace(0, 10, 10000)  # point count assumed; garbled in archive


def first_try():
    sim_inten = np.zeros(xs.shape)
    for freq, inten in zip(poss, intens):
        sim_inten += lorentz(xs, freq, inten, 5.0)
    return sim_inten


def second_try():
    sim_inten2 = lorentz(xs.reshape((-1,1)), poss, intens, 5.0)
    sim_inten2 = sim_inten2.sum(axis=1)
    return sim_inten2


print np.array_equal(first_try(), second_try())


## End Test ##


Running this script prints "True" for the final equality test. However,
IPython's %timeit magic gives ~10 ms for first_try and ~30 ms for
second_try. I tried this on Windows 7 (Anaconda Python) and on a Linux
machine, both with Python 2.7 and Numpy 1.8.2.


I understand in principle why broadcasting should be faster than Python
loops, but I'm wondering why I'm getting worse results with the pure Numpy
function. Are there general rules for when broadcasting might give
worse performance than a Python loop?


Thanks


Ryan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion