Re: [Numpy-discussion] Fast Access to Container of Numpy Arrays on Disk?

2016-01-14 Thread Edison Gustavo Muenz
From what I know, this is the kind of use case that Dask seems designed to solve.

I think this blog post can help:
https://www.continuum.io/content/xray-dask-out-core-labeled-arrays-python

Notice that I haven't used any of these projects myself.
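
For a flavor of what that would look like, here is a minimal out-of-core
sketch with dask.array (untested on my side; the HDF5 file and dataset names
are made up for illustration):

import h5py
import dask.array as da

# Open a large on-disk dataset without loading it into memory
f = h5py.File('data.h5', 'r')          # hypothetical file
d = f['/array']                        # hypothetical dataset

# Wrap it as a chunked dask array; computations are evaluated lazily,
# one chunk at a time, so the whole array never has to fit in RAM
x = da.from_array(d, chunks=(1000, 1000))

print(x.mean().compute())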

On Thu, Jan 14, 2016 at 11:48 AM, Francesc Alted  wrote:

> Well, maybe something like a simple class emulating a dictionary that
> stores key-value pairs on disk would be more than enough.  Then you can use
> whatever persistence layer you want (even HDF5, but not necessarily).
>
> As a demonstration I did a quick-and-dirty implementation of such a
> persistent key-store thing (
> https://gist.github.com/FrancescAlted/8e87c8762a49cf5fc897).  In it, the
> KeyStore class (less than 40 lines long) is responsible for storing the
> value (two arrays) under a key (a directory).  As I am quite a big fan of
> compression, I implemented a couple of serialization flavors: one using the
> .npz format (so no dependencies other than NumPy are needed) and the other
> using the ctable object from the bcolz package (bcolz.blosc.org).  A rough
> sketch of the NPZ flavor is shown right below, followed by some performance
> numbers:
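>
> (Just an illustrative sketch of the idea -- the real code is in the gist
> above, and the class name here is simplified:)
>
> import os
> import numpy as np
>
> class NPZKeyStore(object):
>     """Toy persistent key-value store: one .npz file per key."""
>
>     def __init__(self, rootdir):
>         self.rootdir = rootdir
>         if not os.path.exists(rootdir):
>             os.makedirs(rootdir)
>
>     def __setitem__(self, key, arrays):
>         # `arrays` is a dict mapping names to ndarrays; np.savez_compressed
>         # would give the zlib-compressed variant (the -l 9 runs below)
>         np.savez(os.path.join(self.rootdir, key + '.npz'), **arrays)
>
>     def __getitem__(self, key):
>         f = np.load(os.path.join(self.rootdir, key + '.npz'))
>         return {name: f[name] for name in f.files}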
>
> python key-store.py -f numpy -d __test -l 0
> ## Checking method: numpy (via .npz files) 
> Building database.  Wait please...
> Time (creation) --> 1.906
> Retrieving 100 keys in arbitrary order...
> Time (   query) --> 0.191
> Number of elements out of getitem: 10518976
> faltet@faltet-Latitude-E6430:~/blosc/bcolz$ du -sh __test
>
> 75M __test
>
> So, with the NPZ format we can deal with the 75 MB quite easily.  But NPZ
> can compress data as well, so let's see how it goes:
>
> $ python key-store.py -f numpy -d __test -l 9
> ## Checking method: numpy (via .npz files) 
> Building database.  Wait please...
> Time (creation) --> 6.636
> Retrieving 100 keys in arbitrary order...
> Time (   query) --> 0.384
> Number of elements out of getitem: 10518976
> faltet@faltet-Latitude-E6430:~/blosc/bcolz$ du -sh __test
> 28M __test
>
> Ok, in this case we get almost a 3x compression ratio, which is not bad.
> However, the performance has degraded a lot.  Let's now use bcolz, first in
> non-compressed mode:
>
> $ python key-store.py -f bcolz -d __test -l 0
> ## Checking method: bcolz (via ctable(clevel=0, cname='blosclz'))
> Building database.  Wait please...
> Time (creation) --> 0.479
> Retrieving 100 keys in arbitrary order...
> Time (   query) --> 0.103
> Number of elements out of getitem: 10518976
> faltet@faltet-Latitude-E6430:~/blosc/bcolz$ du -sh __test
> 82M __test
>
> Without compression, bcolz takes a bit more (~10%) space than NPZ.
> However, bcolz is actually meant to be used with compression on by default:
>
> $ python key-store.py -f bcolz -d __test -l 9
> ## Checking method: bcolz (via ctable(clevel=9, cname='blosclz'))
> Building database.  Wait please...
> Time (creation) --> 0.487
> Retrieving 100 keys in arbitrary order...
> Time (   query) --> 0.98
> Number of elements out of getitem: 10518976
> faltet@faltet-Latitude-E6430:~/blosc/bcolz$ du -sh __test
> 29M __test
>
> So, the final disk usage is quite similar to NPZ, but it can store and
> retrieve data a lot faster.  Also, the decompression speed is on par with
> the non-compressed case.  This is because bcolz uses Blosc behind the
> scenes, which is much faster than zlib (used by NPZ) -- and sometimes even
> faster than a memcpy().  However, even though we are doing I/O against the
> disk, this dataset is so small that it fits in the OS filesystem cache, so
> the benchmark is actually measuring I/O at memory speeds, not disk speeds.
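>
> (Again just an illustrative sketch of the bcolz flavor -- the real code is
> in the gist, and the directory name below is made up:)
>
> import numpy as np
> import bcolz
>
> x = np.arange(int(1e6))
> y = np.random.rand(int(1e6))
>
> # Store the two arrays as a compressed, on-disk ctable (one directory per key)
> ct = bcolz.ctable(columns=[x, y], names=['x', 'y'],
>                   rootdir='__test_key0',
>                   cparams=bcolz.cparams(clevel=9, cname='blosclz'))
>
> # Re-open and read back; Blosc decompresses on the fly
> ct2 = bcolz.open('__test_key0')
> x2 = ct2['x'][:]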
>
> In order to do a more real-life comparison, let's use a dataset that is
> much larger than the amount of memory in my laptop (8 GB):
>
> $ PYTHONPATH=. python key-store.py -f bcolz -m 100 -k 5000 -d
> /media/faltet/docker/__test -l 0
> ## Checking method: bcolz (via ctable(clevel=0, cname='blosclz'))
> Building database.  Wait please...
> Time (creation) --> 133.650
> Retrieving 100 keys in arbitrary order...
> Time (   query) --> 2.881
> Number of elements out of getitem: 91907396
> faltet@faltet-Latitude-E6430:~/blosc/bcolz$ du -sh
> /media/faltet/docker/__test
>
> 39G /media/faltet/docker/__test
>
> and now, with compression on:
>
> $ PYTHONPATH=. python key-store.py -f bcolz -m 100 -k 5000 -d
> /media/faltet/docker/__test -l 9
> ## Checking method: bcolz (via ctable(clevel=9, cname='blosclz'))
> Building database.  Wait please...
> Time (creation) --> 145.633
> Retrieving 100 keys in arbitrary order...
> Time (   query) --> 1.339
> Number of elements out of getitem: 91907396
> faltet@faltet-Latitude-E6430:~/blosc/bcolz$ du -sh
> /media/faltet/docker/__test
>
> 12G /media/faltet/docker/__test
>
> So, we are still seeing 

Re: [Numpy-discussion] performance solving system of equations in numpy and MATLAB

2015-12-16 Thread Edison Gustavo Muenz
Sometime ago I saw this: https://software.intel.com/sites/campaigns/nest/

I don't know if the "community" license applies in your case though. It is
worth taking a look at.

On Wed, Dec 16, 2015 at 4:30 PM, Francesc Alted  wrote:

> Sorry, I have to correct myself, as per:
> http://docs.continuum.io/mkl-optimizations/index it seems that Anaconda
> is not linking with MKL by default (I thought that was the case before?).
> After installing MKL (conda install mkl), I am getting:
>
> In [1]: import numpy as np
> Vendor:  Continuum Analytics, Inc.
> Package: mkl
> Message: trial mode expires in 30 days
>
> In [2]: testA = np.random.randn(15000, 15000)
>
> In [3]: testb = np.random.randn(15000)
>
> In [4]: %time testx = np.linalg.solve(testA, testb)
> CPU times: user 1min, sys: 468 ms, total: 1min 1s
> Wall time: 15.3 s
>
>
> so, it looks like you will need to buy an MKL license separately (which
> makes sense for a commercial product).
>
> Sorry for the confusion.
> Francesc
>
>
> 2015-12-16 18:59 GMT+01:00 Francesc Alted :
>
>> Hi,
>>
>> MATLAB is probably shipping with Intel MKL enabled, which is likely the
>> fastest LAPACK implementation out there.  NumPy supports linking with MKL,
>> and Anaconda actually does that by default, so switching to Anaconda would
>> be a good option for you.
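>>
>> (A quick way to check which BLAS/LAPACK your NumPy build is linked
>> against is to inspect the build configuration:
>>
>> import numpy as np
>> np.__config__.show()   # look for 'mkl' in blas_opt_info / lapack_opt_info
>>
>> If MKL shows up there, you are already getting the fast path.)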
>>
>> Here is what I am getting with Anaconda's NumPy on a machine with
>> 8 cores:
>>
>> In [1]: import numpy as np
>>
>> In [2]: testA = np.random.randn(15000, 15000)
>>
>> In [3]: testb = np.random.randn(15000)
>>
>> In [4]: %time testx = np.linalg.solve(testA, testb)
>> CPU times: user 5min 36s, sys: 4.94 s, total: 5min 41s
>> Wall time: 46.1 s
>>
>> This is not 20 sec, but it is not 3 min either (but of course that
>> depends on your machine).
>>
>> Francesc
>>
>> 2015-12-16 18:34 GMT+01:00 Edward Richards :
>>
>>> I recently did a conceptual experiment to estimate the computational
>>> time required to solve an exact expression in contrast to an approximate
>>> solution (Helmholtz vs. Helmholtz-Kirchhoff integrals). The exact solution
>>> requires a matrix inversion, and in my case the matrix would contain ~15000
>>> rows.
>>>
>>> On my machine MATLAB seems to perform this matrix inversion with random
>>> matrices about 9x faster (20 sec vs 3 mins). I thought the performance
>>> would be roughly the same because I presume both rely on the same
>>> LAPACK solvers.
>>>
>>> I will not actually need to solve this problem (even at 20 sec it is
>>> prohibitive for broadband simulation), but if I needed to I would
>>> reluctantly choose MATLAB.  I am simply wondering why there is this
>>> performance gap, and whether there is a better way to solve this problem
>>> in numpy?
>>>
>>> Thank you,
>>>
>>> Ned
>>>
>>> #Python version
>>>
>>> import numpy as np
>>>
>>> testA = np.random.randn(15000, 15000)
>>>
>>> testb = np.random.randn(15000)
>>>
>>> %time testx = np.linalg.solve(testA, testb)
>>>
>>> %MATLAB version
>>>
>>> testA = randn(15000);
>>>
>>> testb = randn(15000, 1);
>>> tic(); testx = testA \ testb; toc();
>>>
>>
>>
>> --
>> Francesc Alted
>>
>
>
>
> --
> Francesc Alted
>


Re: [Numpy-discussion] Proposal: stop supporting 'setup.py install'; start requiring 'pip install .' instead

2015-10-27 Thread Edison Gustavo Muenz
I'm sorry if this is off-topic, but I'm curious why nobody has mentioned
Conda yet.

Is there any particular reason for not using it?

On Tue, Oct 27, 2015 at 11:48 AM, James E.H. Turner 
wrote:

> Apparently it is not well known that if you have a Python project
>> source tree (e.g., a numpy checkout), then the correct way to install
>> it is NOT to type
>>
>>python setup.py install   # bad and broken!
>>
>> but rather to type
>>
>>pip install .
>>
>
> Though I haven't studied it exhaustively, it always seems to me that
> pip is bad & broken, whereas python setup.py install does what I
> expect (even if it's a mess internally). In particular, when
> maintaining a distribution of Python packages, you try to have some
> well-defined, reproducible build from source tarballs and then you
> find that pip is going off and downloading stuff under the radar
> without being asked (etc.). Stopping that can be a pain & I always
> groan whenever some package insists on using pip. Maybe I don't
> understand it well enough but in this role its dependency handling
> is an unnecessary complication with no purpose. Just a comment that
> not every installation is someone trying to get numpy on their
> laptop...
>
> Cheers,
>
> James.
>
>


Re: [Numpy-discussion] Problems using add_npy_pkg_config

2015-08-12 Thread Edison Gustavo Muenz
Why don't you use CMake? It's pretty standard for C/C++.

On Wed, Aug 12, 2015 at 2:35 PM, Ralf Gommers ralf.gomm...@gmail.com
wrote:



 On Wed, Aug 12, 2015 at 7:23 PM, Charles R Harris 
 charlesr.har...@gmail.com wrote:



 On Wed, Aug 12, 2015 at 10:50 AM, Ralf Gommers ralf.gomm...@gmail.com
 wrote:



 On Wed, Aug 12, 2015 at 6:23 PM, Christian Engwer 
 christian.eng...@uni-muenster.de wrote:

 Dear all,

 I'm trying to use the numpy distutils to install native C
 libraries. These are part of a larger project and should be usable
 standalone. I managed to install headers and libs, but now I am
 experiencing problems writing the corresponding pkg file. I first tried
 to do the trick without numpy, but getting all the paths right in all
 the different setups is really a mess.


 This doesn't answer your question but: why? If you're not distributing a
 Python project, there is no reason to use distutils instead of a sane build
 system.


 Believe it or not, distutils *is* one of the saner build systems when you
 want something cross platform. Sad, isn't it...


 Come on. We don't take it seriously, and neither do the Python core devs.
 It's also pretty much completely unsupported. Numpy.distutils is a bit
 better in that respect than Python distutils, which doesn't even get sane
 patches merged.

 Try Scons, Tup, Gradle, Shake, Waf or anything else that's at least
 somewhat modern and supported. Do not use numpy.distutils unless there's no
 other mature choice (i.e. you're developing a Python project).

 Ralf





Re: [Numpy-discussion] IDE's for numpy development?

2015-04-01 Thread Edison Gustavo Muenz
PTVS (Python Tools for Visual Studio) can debug into native code.

On Wed, Apr 1, 2015 at 2:21 PM, josef.p...@gmail.com wrote:

 On Wed, Apr 1, 2015 at 12:04 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
  Hi All,
 
  In a recent exchange Mark Wiebe suggested that the lack of support for
  numpy development in Visual Studio might limit the number of developers
  attracted to the project. I'm a vim/console developer myself and make no
  claim of familiarity with modern development tools, but I wonder if such
  tools might now be available for Numpy. A quick Google search turns up a
  beta plugin for Visual Studio, and there is an Xcode IDE for the Mac that
  apparently offers some Python support. The two things that I think are
  required are: 1) support for mixed C/Python development and 2) support
  for building and testing numpy. I'd be interested in information from
  anyone with experience in using such an IDE, and ideas of how Numpy might
  make using some of the common IDEs easier.
 
  Thoughts?

 I have no experience with the C/C++ part, but I'm using the C/C++
 version of Eclipse with PyDev.

 It should have all the extra features available, but I don't use them
 and don't have compiler, debugger and so on for C/C++ connected to
 Eclipse. It looks like it supports Visual C++ and the MinGW GCC toolchain.
 (I'm not sure the same project can be a C/C++ and a PyDev project at
 the same time.)


 Josef

 
  Chuck
 


Re: [Numpy-discussion] Initializing array from buffer

2014-11-17 Thread Edison Gustavo Muenz
Have you tried using the C-API to create the array? This link might be of
help:
http://docs.scipy.org/doc/numpy/reference/c-api.array.html#creating-arrays

I know that Boost.Python can handle this.
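
As a pure-Python side note (I have not tried it against your exact layout),
np.asarray -- unlike np.frombuffer -- does seem to honor the multidimensional
information of objects exposing the new-style buffer protocol. For the 2D
example further down in your message:

import numpy as np

a = np.ones((10, 20))
m = memoryview(a)     # or any other object exposing the buffer protocol
c = np.asarray(m)     # shape (10, 20) and strides (160, 8) are preserved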

On Sun, Nov 16, 2014 at 3:42 PM, Andrea Arteaga andyspi...@gmail.com
wrote:

 Hello.
 Using the numpy.frombuffer function [1] one can initialize a numpy array
 using an existing python object that implements the buffer protocol [2].
 This is great, but currently this function supports only 1D buffers, even
 if the provided buffer is multidimensional and it exposes all information
 about its structure (shape, strides, data type).

 Apparently, one can extract every kind of buffer information out of a
 buffer of a numpy array (pointer, number of dimensions, shape, strides,
 suboffsets, ...), but the other way around is only partially implemented:
 providing a multidimensional buffer does not mean being able to create a
 numpy array that uses that buffer with the desired structure.

 My use case is the following: we have some 3D arrays in our C++
 framework. The ordering of the elements in these arrays is neither C nor
 Fortran style: it might be IJK (i.e. C style, 3rd dimension contiguous in
 memory), KJI (i.e. Fortran style, first dimension contiguous) or, e.g., IKJ.
 Moreover, we add some padding to optimize aligned access. This kind of
 memory structure cannot simply be expressed as 'C' or 'Fortran', but it can
 be expressed perfectly well using the Python buffer protocol by providing
 the shape and the strides. We would like to export this structure to a
 numpy array that would be able to access the same memory locations in a
 consistent way and perform some operations on them, like initializing the
 contents or plotting them.

 Is this currently possible?
 If not, is it planned to implement such a feature?

 ==

 Maybe just to clarify, I can show an example entirely in Python. Assume a
 is a 2D numpy array:

 a = np.ones((10,20))

 It contains information about its structure which can be portably accessed
 using its data member:

 a.data.format == 'd'
 a.data.ndim == 2
 a.data.shape == (10,20)
 a.data.strides == (160,8)

 Unfortunately, when initializing an array b from this buffer, the
 structure of the buffer is downgraded to a one-dimensional shape:

 b = np.frombuffer(a.data)

 b.ndim == 1
 b.shape == (200,)
 b.strides == (8,)

 I wish b had the same multi-dimensional structure as a.

 (This is of course a very simple example. In my use case I would
 initialize b with my own buffer instead of that of another numpy array).

 Best regards

 [1]
 http://docs.scipy.org/doc/numpy/reference/generated/numpy.frombuffer.html
 [2] https://docs.python.org/3/c-api/buffer.html



[Numpy-discussion] Accept numpy arrays on arguments of numpy.testing.assert_approx_equal()

2014-10-27 Thread Edison Gustavo Muenz
I’ve implemented support for numpy arrays in the arguments of
numpy.testing.assert_approx_equal() and have opened a pull request
(https://github.com/numpy/numpy/pull/5219) on GitHub.

I don’t know if I should be sending a message to the list to announce this,
but since I’m new to the *numpy-dev* list I think it never hurts to say hi :)