Re: [Numpy-discussion] PyData Barcelona this May

2017-03-21 Thread Daπid
On 20 March 2017 at 19:58, Jaime Fernández del Río  wrote:
>
> Taking NumPy In Stride
> This workshop is aimed at users already familiar with NumPy. We will dissect
> the NumPy memory model with the help of a very powerful abstraction:
> strides.
> Participants will learn how to create different views out of the same data,
> including multidimensional ones, get a new angle on how and why broadcasting
> works, and explore related techniques to write faster, more efficient code.

I think I only understand this abstract because I know what views are.
Maybe you could add a line explaining what they are? (I cannot think
of one myself).
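
For what it is worth, a minimal illustration of a view (plain NumPy, not
taken from the workshop material): a view is a new array object that reuses
the same memory buffer as the original, just with different shape/strides
metadata.

import numpy as np

a = np.arange(12)
b = a.reshape(3, 4)   # view: same buffer, different shape and strides
c = a[::2]            # view: same buffer, stride doubled
c[0] = 99             # writing through the view...
print(a[0])           # ...changes the original as well: prints 99
print(b.strides, c.strides)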


Re: [Numpy-discussion] Move scipy.org docs to Github?

2017-03-15 Thread Daπid
What about readthedocs? I haven't seen any explicit limit in traffic.

On 15 March 2017 at 16:33, Matthew Brett  wrote:
> Hi,
>
> On Wed, Mar 15, 2017 at 2:47 AM, Ralf Gommers  wrote:
>>
>>
>> On Wed, Mar 15, 2017 at 3:21 PM, Matthew Brett 
>> wrote:
>>>
>>> Hi,
>>>
>>> The scipy.org site is down at the moment, and has been for more than 36
>>> hours:
>>>
>>> https://github.com/numpy/numpy/issues/8779#issue-213781439
>>>
>>> This has happened before:
>>>
>>> https://github.com/scipy/scipy.org/issues/187#issue-186426408
>>>
>>> I think it was down for about 24 hours that time.
>>>
>>> From the number of people opening issues or commenting on the
>>> scipy.org website this time, it seems to be causing quite a bit of
>>> disruption.
>>>
>>> It seems to me that we would have a much better chances of avoiding
>>> significant down-time, if we switched to hosting the docs on github
>>> pages.
>>>
>>> What do y'all think?
>>
>>
>> Once the site is back up we should look at migrating to a better (hosted)
>> infrastructure. I suspect that Github Pages won't work, we'll exceed or be
>> close to exceeding both the 1 GB site size limit and the 100 GB/month
>> bandwidth limit [1].
>>
>> Rough bandwidth estimate (using page size from
>> http://scipy.github.io/devdocs/ and Alexa stats): 2 million visits per
>> month, 2.5 page views per visit, 5 kb/page = 25 GB/month (html). Add to that
>> pdf docs, which are ~20 MB in size: if only a small fraction of visitors
>> download those, we'll be at >100 GB.
>
> Maybe we could host the PDF docs somewhere else?   I wonder if Github
> would consider allowing us to go a bit over if necessary?
>
> Cheers,
>
> Matthew


Re: [Numpy-discussion] ANN: NumExpr3 Alpha

2017-02-17 Thread Daπid
This is very nice indeed!

On 17 February 2017 at 12:15, Robert McLeod  wrote:
> * bytes and unicode support
> * reductions (mean, sum, prod, std)

I use both a lot, maybe I can help you get them working.

Also, regarding "Vectorization hasn't been done yet with cmath
functions for real numbers (such as sqrt(), exp(), etc.), only for
complex functions": what is the bottleneck? Is it in GCC, or does
someone just have to sit down and adapt it?


Re: [Numpy-discussion] array comprehension

2016-11-04 Thread Daπid
On 4 November 2016 at 16:04, Stephan Hoyer  wrote:
>
> But, we also don't have an unstack function. This would mostly be syntactic
> sugar, but I think it would be a nice addition. Such a function actually
> exists in TensorFlow:
> https://g3doc.corp.google.com/third_party/tensorflow/g3doc/api_docs/python/array_ops.md?cl=head#unstack

That link is behind a login wall. This is the public version:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/api_docs/python/array_ops.md


Re: [Numpy-discussion] Multi-distribution Linux wheels - please test

2016-02-09 Thread Daπid
On 8 February 2016 at 20:25, Matthew Brett  wrote:

>
> I used the latest release, v0.2.15:
>
> https://github.com/matthew-brett/manylinux-builds/blob/master/build_openblas.sh#L5
>
> Is there a later version that we should try?
>
> Cheers,
>

That is the one in the Fedora repos that is working for me. How are you
compiling it?

Mine is compiled with GCC 5 with the options seen in the source rpm:
http://koji.fedoraproject.org/koji/packageinfo?packageID=15277


Re: [Numpy-discussion] resizeable arrays using shared memory?

2016-02-09 Thread Daπid
On 6 February 2016 at 23:56, Elliot Hallmark  wrote:
> Now, I would like to have these arrays shared between processes spawned via 
> multiprocessing (for fast interprocess communication purposes, not for 
> parallelizing work on an array).  I don't care about mapping to a file on 
> disk, and I don't want disk I/O happening.  I don't care (really) about data 
> being copied in memory on resize.  I *do* want the array to be resized "in 
> place", so that the child processes can still access the arrays from the 
> object they were initialized with.

If you are only reading in parallel, and you can afford the extra
dependency, one alternative way to do this would be to use an
expandable array from HDF5:

http://www.pytables.org/usersguide/libref/homogenous_storage.html#earrayclassdescr

To avoid I/O, your file can live in RAM.

http://www.pytables.org/cookbook/inmemory_hdf5_files.html
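
A rough sketch of what that could look like (names here are illustrative;
check the PyTables docs linked above for the exact API and options). This
only covers the growable in-memory container, not the sharing between
processes:

import numpy as np
import tables

# In-memory HDF5 file: with driver_core_backing_store=0 nothing is written to disk.
h5 = tables.open_file("in-memory.h5", mode="w", driver="H5FD_CORE",
                      driver_core_backing_store=0)
# EArray: extendable along the first (0-length) dimension.
arr = h5.create_earray(h5.root, "data", atom=tables.Float64Atom(), shape=(0,))
arr.append(np.arange(10.0))   # grows the array
arr.append(np.arange(5.0))
print(arr[:])                 # read back as a regular NumPy array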


Re: [Numpy-discussion] Linking other libm-Implementation

2016-02-09 Thread Daπid
On 8 February 2016 at 18:36, Nathaniel Smith  wrote:
> I would be highly suspicious that this speed comes at the expense of
> accuracy... My impression is that there's a lot of room to make
> speed/accuracy tradeoffs in these functions, and modern glibc's libm has
> seen a fair amount of scrutiny by people who have access to the same code
> that openlibm is based off of. But then again, maybe not :-).


I did some digging, and I found this:

http://julia-programming-language.2336112.n4.nabble.com/Is-the-accuracy-of-Julia-s-elementary-functions-exp-sin-known-td32736.html

In short: according to their devs, most openlibm functions are
accurate to less than 1 ulp, while GNU libm results are rounded to the
closest float.


/David.


Re: [Numpy-discussion] Multi-distribution Linux wheels - please test

2016-02-08 Thread Daπid
On 6 February 2016 at 21:26, Matthew Brett  wrote:

>
> pip install -f https://nipy.bic.berkeley.edu/manylinux numpy scipy
> python -c 'import numpy; numpy.test()'
> python -c 'import scipy; scipy.test()'
>


All the tests pass on my Fedora 23 with Python 2.7, but it seems to be
linking to the system openblas:

numpy.show_config()
lapack_opt_info:
    libraries = ['openblas']
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
blas_opt_info:
    libraries = ['openblas']
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
openblas_info:
    libraries = ['openblas']
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
openblas_lapack_info:
    libraries = ['openblas']
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
blas_mkl_info:
    NOT AVAILABLE

I can also reproduce Ogrisel's segfault.


Re: [Numpy-discussion] Downstream integration testing

2016-01-31 Thread Daπid
On 31 Jan 2016 13:08, "Pauli Virtanen"  wrote:
> For example, automated test rig that does the following:
>
> - run tests of a given downstream project version, against
>   previous numpy version, record output
>
> - run tests of a given downstream project version, against
>   numpy master, record output
>
> - determine which failures were added by the new numpy version
>
> - make this happen with just a single command, eg "python run.py",
>   and implement it for several downstream packages and versions.
>   (Probably good to steal ideas from travis-ci dependency matrix etc.)

A simpler idea: build the master branch of a series of projects and run the
tests. In case of failure, we can compare with Travis's logs from the
project when they use the released numpy. In most cases, the master branch
is clean, so an error will likely be a change in behaviour.

This can be run automatically once a week, so as not to hog too much of
Travis, and, counting the cost in hours of work, it is very cheap to set up
and free to maintain.

/David


Re: [Numpy-discussion] array_equal too strict?

2015-12-17 Thread Daπid
On 17 December 2015 at 14:43, Nico Schlömer 
wrote:

> I'm not sure where I'm going wrong here. Any hints?


You are dancing around the boundary between close floating point numbers,
and when you are dealing with ULPs, the number of decimal places is a bad
measure. Working with plain numbers, instead of arrays (just so that the
numbers are printed in full detail):

a1 = np.float64(3.1415926535897930)
a2 = np.float64(3.1415926535897939)

They are numerically different:
a2 - a1
8.8817841970012523e-16

In epsilons (defined as the smallest number such that (1 + eps) - 1 > 0):
(a2 - a1) / np.finfo(np.float64).eps
4.0

In fact, there is one number in between, two epsilons away from each one:
np.nextafter(a1, a2)
3.1415926535897936

np.nextafter(np.nextafter(a1, 10), 10) - a2
0.0

The next number on the other side:
np.nextafter(a1, 0)
3.1415926535897927

For more information:

print np.finfo(np.float64)
Machine parameters for float64
---
precision= 15   resolution= 1.0001e-15
machep=   -52   eps=2.2204460492503131e-16
negep =   -53   epsneg= 1.1102230246251565e-16
minexp= -1022   tiny=   2.2250738585072014e-308
maxexp=  1024   max=1.7976931348623157e+308
nexp  =11   min=-max
---


/David


Re: [Numpy-discussion] performance solving system of equations in numpy and MATLAB

2015-12-17 Thread Daπid
On 16 December 2015 at 18:59, Francesc Alted  wrote:

> Probably MATLAB is shipping with Intel MKL enabled, which probably is the
> fastest LAPACK implementation out there.  NumPy supports linking with MKL,
> and actually Anaconda does that by default, so switching to Anaconda would
> be a good option for you.


A free alternative is OpenBLAS. I am getting 20 s on an i7 Haswell with 8
cores.


Re: [Numpy-discussion] Build broken

2015-12-15 Thread Daπid
On 15 December 2015 at 07:25, Jaime Fernández del Río 
wrote:

> Can anyone who actually knows what Travis is doing take a look at the log:
>
> https://travis-ci.org/numpy/numpy/builds/96836128
>

I don't claim to understand what is happening there, but I believe the
function setup_chroot is missing the actual installation of Numpy, or
perhaps a call to setup_base.

https://github.com/numpy/numpy/blob/master/tools/travis-test.sh

What is CHROOT install anyway?


/David.


Re: [Numpy-discussion] loadtxt and usecols

2015-11-10 Thread Daπid
On 10 November 2015 at 16:07, Irvin Probst 
wrote:

> I know this new behavior might break a lot of existing code as
> usecol=(42,) used to return a 1-D array, but usecol=42, also
> returns a 1-D array so the current behavior is not consistent imho.


42, is exactly the same as (42,). If you want a tuple of tuples,
you have to do ((42,),), but then it raises: TypeError: list indices must
be integers, not tuple.

What numpy cares about is that whatever object you give it is iterable, and
its entries are ints, so usecol={0:'a', 5:'b'} is perfectly valid.

I think loadtxt should be a tool to read text files in the least surprising
fashion, and a text file is a 1 or 2D container, so it shouldn't return any
other shapes. Any fancy stuff one may want to do with the output should be
done with the typical indexing tricks. If I want a single column, I would
first be very surprised if I got a 2D array (I was bitten by this design in
MATLAB many many times). For the rare cases where I do want a "fake" 2D
array, I can make it explicit by expanding it with arr[:, np.newaxis], and
then I know that the shape will be (N, 1) and not (1, N). Thus, usecols
should be int or sequence of ints, and the result 1 or 2D.


In your example:

a=[[[2,],[],[],],[],[],[]]
foo=np.loadtxt("CONCARNEAU_2010.txt", usecols=a)

What would the shape of foo be?


/David.


Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-10-06 Thread Daπid
On 30 September 2015 at 18:20, Nathaniel Smith  wrote:

> - parallel code in general is not very composable. If someone is calling a
> numpy operation from one thread, great, transparently using multiple
> threads internally is a win. If they're exploiting some higher-level
> structure in their problem to break it into pieces and process each in
> parallel, and then using numpy on each piece, then numpy spawning threads
> internally will probably destroy performance. And numpy is too low-level to
> know which case it's in. This problem exists to some extent already with
> multi-threaded BLAS, so people use various BLAS-specific knobs to manage it
> in ad hoc ways, but this doesn't scale.
>
One idea: what about creating a "parallel numpy"? There are a few
algorithms that can benefit from parallelisation. This library would mimic
Numpy's signature, and the user would be responsible for choosing the
single threaded or the parallel one by just changing np.function(x, y) to
pnp.function(x, y)

If that were deemed a good one, what would be the best parallelisation
scheme? OpenMP? Threads?


Re: [Numpy-discussion] Numpy 1.10.0 release

2015-10-06 Thread Daπid
I don't get any failures on Fedora 22. I have installed it with pip,
setting my CFLAGS to "-march=core-avx-i -O2 -pipe -mtune=native" and
linking against openblas.

With the new Numpy, Scipy full suite shows two errors, I am sorry I didn't
think of running that in the RC phase:

==
FAIL: test_weighting (test_stats.TestHistogram)
--
Traceback (most recent call last):
  File
"/home/david/.local/virtualenv/py27/lib/python2.7/site-packages/scipy/stats/tests/test_stats.py",
line 892, in test_weighting
decimal=2)
  File
"/home/david/.local/virtualenv/py27/lib/python2.7/site-packages/numpy/testing/utils.py",
line 886, in assert_array_almost_equal
precision=decimal)
  File
"/home/david/.local/virtualenv/py27/lib/python2.7/site-packages/numpy/testing/utils.py",
line 708, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Arrays are not almost equal to 2 decimals

(mismatch 40.0%)
 x: array([   4. ,0. ,4.5,   -0.9,0. ,0.3,  110.2,0. ,
  0. ,   42. ])
 y: array([   4. ,0. ,4.5,   -0.9,0.3,0. ,7. ,  103.2,
  0. ,   42. ])

==
FAIL: test_nanmedian_all_axis (test_stats.TestNanFunc)
--
Traceback (most recent call last):
  File
"/home/david/.local/virtualenv/py27/lib/python2.7/site-packages/scipy/stats/tests/test_stats.py",
line 226, in test_nanmedian_all_axis
assert_equal(len(w), 4)
  File
"/home/david/.local/virtualenv/py27/lib/python2.7/site-packages/numpy/testing/utils.py",
line 354, in assert_equal
raise AssertionError(msg)
AssertionError:
Items are not equal:
 ACTUAL: 1
 DESIRED: 4

I am almost sure these errors weren't there before.

On 6 October 2015 at 13:53, Neal Becker  wrote:

> 1 test failure:
>
> FAIL: test_blasdot.test_blasdot_used
> --
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in
> runTest
> self.test(*self.arg)
>   File "/home/nbecker/.local/lib/python2.7/site-
> packages/numpy/testing/decorators.py", line 146, in skipper_func
> return f(*args, **kwargs)
>   File "/home/nbecker/.local/lib/python2.7/site-
> packages/numpy/core/tests/test_blasdot.py", line 31, in test_blasdot_used
> assert_(dot is _dotblas.dot)
>   File "/home/nbecker/.local/lib/python2.7/site-
> packages/numpy/testing/utils.py", line 53, in assert_
> raise AssertionError(smsg)
> AssertionError
>
>


Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-10-02 Thread Daπid
On 2 October 2015 at 11:58, Juha Jeronen  wrote:

>
>>
> First version done and uploaded:
>
>
> https://yousource.it.jyu.fi/jjrandom2/miniprojects/trees/master/misc/polysolve_for_numpy
>

Small comment: now you are checking if the input is a scalar or a ndarray,
but it should also accept any array-like. If I pass a list, I expect it to
work, internally converting it into an array.
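
For instance, the usual idiom is to funnel the input through np.asarray at
the top of the function; poly_eval below is just an illustrative stand-in,
not the actual function from the linked code:

import numpy as np

def poly_eval(coeffs, x):
    # np.asarray is a no-op for ndarrays and converts lists, tuples,
    # scalars, etc., so any array-like input works.
    coeffs = np.asarray(coeffs, dtype=np.float64)
    x = np.asarray(x, dtype=np.float64)
    return np.polyval(coeffs, x)

print(poly_eval([1, 0, -1], [0.0, 1.0, 2.0]))   # plain lists are fine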


Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-10-02 Thread Daπid
On 1 October 2015 at 09:05, Nathaniel Smith  wrote:

>
> >> - gcc + OpenMP on linux still breaks multiprocessing. There's a patch to
> >> fix this but they still haven't applied it; alternatively there's a
> >> workaround you can use in multiprocessing (not using fork mode), but
> this
> >> requires every user update their code and the workaround has other
> >> limitations. We're unlikely to use OpenMP while this is the case.
> >
> > Any idea when is this going to be released?
>
> Which? The gcc patch? I spent 2 full release cycles nagging them and
> they still can't be bothered to make a decision either way, so :-(. If
> anyone has some ideas for how to get traction in gcc-land then I'm
> happy to pass on details...
>

:(

Have you tried asking Python-dev for help with this? Hopefully they would
have some weight there.


Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-10-01 Thread Daπid
On 30 September 2015 at 18:20, Nathaniel Smith <n...@pobox.com> wrote:

> On Sep 30, 2015 2:28 AM, "Daπid" <davidmen...@gmail.com> wrote:
> [...]
> > Is there a nice way to ship both versions? After all, most
> implementations of BLAS and friends do spawn OpenMP threads, so I don't
> think it would be outrageous to take advantage of it in more places;
> provided there is a nice way to fallback to a serial version when it is not
> available.
>
> This is incorrect -- the only common implementation of BLAS that uses
> *OpenMP* threads is OpenBLAS, and even then it's not the default -- it only
> happens if you run it in a special non-default configuration.
>
Right, sorry. I wanted to say they spawn parallel threads. What do you mean
by a non-default configuration? Setting the OMP_NUM_THREADS?

> The challenges to providing transparent multithreading in numpy generally
> are:
>
> - gcc + OpenMP on linux still breaks multiprocessing. There's a patch to
> fix this but they still haven't applied it; alternatively there's a
> workaround you can use in multiprocessing (not using fork mode), but this
> requires every user update their code and the workaround has other
> limitations. We're unlikely to use OpenMP while this is the case.
>
Any idea when is this going to be released?

As I understand it, OpenBLAS doesn't have this problem, am I right?

> - parallel code in general is not very composable. If someone is calling a
> numpy operation from one thread, great, transparently using multiple
> threads internally is a win. If they're exploiting some higher-level
> structure in their problem to break it into pieces and process each in
> parallel, and then using numpy on each piece, then numpy spawning threads
> internally will probably destroy performance. And numpy is too low-level to
> know which case it's in. This problem exists to some extent already with
> multi-threaded BLAS, so people use various BLAS-specific knobs to manage it
> in ad hoc ways, but this doesn't scale.
>
> (Ironically OpenMP is more composable then most approaches to threading,
> but only if everyone is using it and, as per above, not everyone is and we
> currently can't.)
>
That is what I meant by also providing a single-threaded version. The user
can choose if they want the parallel or the serial one, depending on the
case.


Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Daπid
On 30 September 2015 at 10:20, Juha Jeronen  wrote:

>
> Are we putting Cython in Numpy now? I lost track.
>>
>
> I thought so? But then again, I'm just a user :)


As of this moment, there are three Cython modules in Numpy, so yes.
Releases ship the generated C modules, so Cython is not a build dependency.

./numpy/distutils/tests/pyrex_ext/primes.pyx
./numpy/random/mtrand/mtrand.pyx
./tools/allocation_tracking/alloc_hook.pyx

In any case, we should always include a single threaded version because
sometimes the computations are already parallelised at a higher level.

Is there a nice way to ship both versions? After all, most implementations
of BLAS and friends do spawn OpenMP threads, so I don't think it would be
outrageous to take advantage of it in more places; provided there is a nice
way to fallback to a serial version when it is not available.


/David.


Re: [Numpy-discussion] ANN: Numpy 1.10.0rc1 released.

2015-09-24 Thread Daπid
On 24 September 2015 at 09:17, Jens Jørgen Mortensen 
wrote:

>
> >>> np.vdot(a[:, :, 0], b[:, :, 0]).real
> 84.0
> >>> np.__version__
> '1.10.0rc1'
>
> The result should be 42.  This is on Ubuntu 15.04.


I can reproduce on Fedora with Numpy linked against Openblas. Numpy 1.9 in
the same configuration works. The full test suite raises one error,  but
only due to lack of space when saving a large array.


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-25 Thread Daπid
On 24 May 2015 at 22:30, Sturla Molden sturla.mol...@gmail.com wrote:

 Personally I think we should only make guarantees about the data types,
 array shapes, and things like that, but not about the values. Those who
 need a particular version of NumPy for exact reproducibility should
 install the version of Python and NumPy they need. That is why virtual
 environments exist.


But there is a lot of legacy code out there that doesn't specify the
version required; and in most cases the original author cannot even be
asked.

Tests are a particularly annoying case. For example, when testing an
algorithm, it is usually good practice to record the number of iterations as
well as the result; consider it an early warning that we have changed
something we possibly didn't mean to, even if the result is correct. If we
want to support several NumPy versions, and the algorithm has any
randomness, the tests would have to be duplicated, or we would have to find
a seed that gives the exact same results. Thus, keeping different versions
lets us compare the results against the old API, without needing to
duplicate the tests. A lot fewer people will get annoyed.


/David.


Re: [Numpy-discussion] np.diag(np.dot(A, B))

2015-05-22 Thread Daπid
On 22 May 2015 at 12:15, Mathieu Blondel math...@mblondel.org wrote:

 Right now I am using np.sum(A * B.T, axis=1) for dense data and I have
 implemented a Cython routine for sparse data.
 I haven't benched np.sum(A * B.T, axis=1) vs. np.einsum('ij,ji->i', A, B)
 yet since I am mostly interested in the sparse case right now.



In my system, einsum seems to be faster.


In [3]: N = 256

In [4]: A = np.random.random((N, N))

In [5]: B = np.random.random((N, N))

In [6]: %timeit np.sum(A * B.T, axis=1)
1000 loops, best of 3: 260 µs per loop

In [7]: %timeit  np.einsum('ij,ji->i', A, B)
1 loops, best of 3: 147 µs per loop


In [9]: N = 1023

In [10]: A = np.random.random((N, N))

In [11]: B = np.random.random((N, N))

In [12]: %timeit np.sum(A * B.T, axis=1)
100 loops, best of 3: 14 ms per loop

In [13]: %timeit  np.einsum('ij,ji->i', A, B)
100 loops, best of 3: 10.7 ms per loop


I have ATLAS installed from the Fedora repos, so not tuned; but einsum is
only using one thread anyway, so probably it is not using it (definitely
not computing the full dot, because that already takes 200 ms).

If B is in FORTRAN order, it is much faster (for N=5000).

In [25]: Bf = B.copy(order='F')

In [26]: %timeit  np.einsum('ij,ji->i', A, Bf)
10 loops, best of 3: 25.7 ms per loop

In [27]: %timeit  np.einsum('ij,ji->i', A, B)
1 loops, best of 3: 404 ms per loop

In [29]: %timeit np.sum(A * Bf.T, axis=1)
10 loops, best of 3: 118 ms per loop

In [30]: %timeit np.sum(A * B.T, axis=1)
1 loops, best of 3: 517 ms per loop

But the copy is not worth it:

In [31]: %timeit Bf = B.copy(order='F')
1 loops, best of 3: 463 ms per loop



/David.


Re: [Numpy-discussion] Introductory mail and GSoc Project Vector math library integration

2015-03-11 Thread Daπid
On 11 March 2015 at 16:51, Dp Docs sdpa...@gmail.com wrote:



 On Wed, Mar 11, 2015 at 7:52 PM, Sturla Molden sturla.mol...@gmail.com
 wrote:
 
 Hi Sturla,
 Thanks for the suggestion.

  There are several vector math libraries NumPy could use, e.g. MKL/VML,
  Apple Accelerate (vecLib), ACML, and probably others.

 Are these libraries fast enough in comparison to C maths libraries?


These are the fastest beast out there, written in C, Assembly, and arcane
incantations.


  There are at least two ways to proceed here. One is to only use vector
  math when strides and alignment allow it.
 I didn't got it. can you explain in detail?


One example, you can create a numpy 2D array using only the odd columns of
a matrix.

odd_matrix = full_matrix[::2, ::2]

This is just a view of the original data, so you save the time and the
memory of making a copy. The drawback is that you trash memory locality, as
the elements are not contiguous in memory. If the memory is guaranteed to
be contiguous, a compiler can apply extra optimisations, and this is what
vector libraries usually assume. What I think Sturla is suggesting with
"when strides and alignment allow it" is to use the fast version if the
array is contiguous, and fall back to the present implementation otherwise.
Another would be to make an optimally aligned copy, but that could eat up
whatever time we save from using the faster library, and other problems.

The difficulty with Numpy's strides is that they allow so many ways of
manipulating the data... (alternating elements, transpositions, different
precisions...).
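
As a very rough sketch of that kind of dispatch, at the Python level rather
than in the C internals (fast_exp here is a made-up stand-in for a call into
a vector math library, not a real API):

import numpy as np

def fast_exp(x):
    # Stand-in for a vector-library kernel (MKL/VML, vecLib, ...);
    # it just delegates to np.exp so that the sketch runs.
    return np.exp(x)

def exp_dispatch(x):
    x = np.asarray(x)
    # Vectorised kernels typically want contiguous, aligned, native float64.
    if x.flags['C_CONTIGUOUS'] and x.flags['ALIGNED'] and x.dtype == np.float64:
        return fast_exp(x)
    return np.exp(x)    # fall back to the present element-wise loop

a = np.random.random((100, 100))
print(exp_dispatch(a).shape, exp_dispatch(a[:, ::2]).shape)  # 2nd call takes the fallback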


 I think the actual problem is not to choose which library to integrate,
 it is how to integrate these libraries? as I have seen the code base and
 been told the current implementation uses the c math library, Can we just
 use the current  implementation and whenever it is calling ​C Maths
 functions, we will replace by these above fast library functions?Then we
 have to modify the Numpy library (which usually get imported for maths
 operation) by using some if else conditions like first work with the faster
 one  and if it is not available the look for the Default one.


At the moment, we are linking to whichever LAPACK is available at compile
time, so no need for a runtime check. I guess it could (should?) be the
same.


 Moreover, I have Another Doubt also. are we suppose to integrate just one
 fast library or more than one so that if one is not available, look for the
 second one and if second is not available then either go to default are
 look for the third one if available?
 Are we suppose to think like this: Let say exp is faster in sleef
 library so integrate sleef library for this operation and let say sin is
 faster in any other library, so integrate that library for sin operation? I
 mean, it may be possible that different operations are faster in different
 libraries So the implementation should be operation oriented or just
 integrate one complete library?​Thanks​


Which one is faster depends on the hardware, the version of the library,
and even the size of the problem:

http://s3.postimg.org/wz0eis1o3/single.png

I don't think you can reliably decide ahead of time which one should go for
each operation. But, on the other hand, whichever one you go for will
probably be fast enough for anyone using Python. Most of the work here is
adapting Numpy's machinery to dispatch a call to the vector library, once
that is ready, adding another one will hopefully be easier. At least, at
the moment Numpy can use one of several linear algebra packages (MKL,
ATLAS, CBLAS...) and they are added, I think, without too much pain (but
maybe I am just far away from the screams of whoever did it).


/David.


Re: [Numpy-discussion] dot and MKL memory

2015-02-27 Thread Daπid
On 27 February 2015 at 17:50, Charles R Harris
charlesr.har...@gmail.com wrote:


 No, but I don't know why that is happening with MKL. Can anyone else
 reproduce this?

 Chuck

I can't reproduce. I have checked on my system python (ATLAS) and
Conda with MKL running in parallel with vmstat. The difference between
them is under 1 MB.

I am on Fedora 20, system python using GCC 4.8 and ATLAS 3.8.4. Conda is
compiled with GCC 4.4 and linked to MKL 11.1.


/David.


Re: [Numpy-discussion] Using numpy on hadoop streaming: ImportError: cannot import name multiarray

2015-02-11 Thread Daπid
On 11 February 2015 at 08:06, Kartik Kumar Perisetla
kartik.p...@gmail.com wrote:
 Thanks David. But do I need to install virtualenv on every node in hadoop
 cluster? Actually I am not very sure whether same namenodes are assigned for
 my every hadoop job. So how shall I proceed on such scenario.

I have never used hadoop, but in the clusters I have used, you have a
home folder on the central node, and each and every computing node has
access to it. You can then install Python in your home folder and make
every node run that, or pull a local copy.

Probably the cluster support can clear this up further and adapt it to
your particular case.

/David.


Re: [Numpy-discussion] Using numpy on hadoop streaming: ImportError: cannot import name multiarray

2015-02-10 Thread Daπid
On 11 February 2015 at 03:38, Kartik Kumar Perisetla
kartik.p...@gmail.com wrote:
 Also, I don't have root access thus, can't install numpy or any other
 package on cluster

You can create a virtualenv, and install packages on it without
needing root access. To minimize trouble, you can ensure it uses the
system packages when available. Here are instructions on how to
install it:

https://stackoverflow.com/questions/9348869/how-to-install-virtualenv-without-using-sudo
http://opensourcehacker.com/2012/09/16/recommended-way-for-sudo-free-installation-of-python-software-with-virtualenv/

This does not require root access, but it is probably good to check
with the sysadmins to make sure they are fine with it.


/David.


Re: [Numpy-discussion] Any interest in a 'heaviside' ufunc?

2015-02-04 Thread Daπid
On 4 February 2015 at 11:05, Sturla Molden sturla.mol...@gmail.com wrote:
 On 04/02/15 06:18, Warren Weckesser wrote:

 By discrete form, do you mean discrete time (i.e. a function defined
 on the integers)?  Then I agree, the discrete time unit step function is
 defined as

 It is the cumulative integral of the delta function, and thus it can
 never obtain the value 0.5. The delta function is defined to have an
 integral of 0 or 1.

 Sturla

There are several definitions. Abramowitz and Stegun
(http://people.math.sfu.ca/~cbm/aands/page_1020.htm) assign the value
0.5 at x=0. It can also be defined as:

H(x) = 1/2 * (1 + sign(x))

Where sign(0) = 0, and therefore H(0) = 1/2.
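
In NumPy terms that definition is a one-liner (just an illustration; there
is no heaviside ufunc in numpy as of this thread):

import numpy as np

def heaviside(x):
    # H(x) = 1/2 * (1 + sign(x)), with sign(0) = 0, so H(0) = 0.5
    return 0.5 * (1.0 + np.sign(x))

print(heaviside(np.array([-2.0, 0.0, 3.0])))   # [ 0.   0.5  1. ]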


Actually, the Heaviside function is better seen as a distribution than as a
function, and then there is no problem with the value at 0, as long as it
is finite.


Re: [Numpy-discussion] Sorting refactor

2015-01-16 Thread Daπid
On 16 January 2015 at 13:15, Eelco Hoogendoorn
hoogendoorn.ee...@gmail.com wrote:
 Perhaps an algorithm can be made faster, but often these multicore
 algorithms are also less efficient, and a less data-dependent way of putting
 my cores to good use would have been preferable. Potentially, other code
 could slow down due to cache trashing if too many parallel tasks run in
 parallel. Id rather be in charge of such matters myself; but I imagine
 adding a keyword arg for these matters would not be much trouble?

As I understand it, that is where the strength of Cilk+ lies. It does
not force parallelisation, just suggests it. The decision to actually
spawn parallel is decided at runtime depending on the load of the
other cores.


/David.


Re: [Numpy-discussion] Characteristic of a Matrix.

2015-01-05 Thread Daπid
On 5 January 2015 at 20:40, Colin J. Williams cjwilliam...@gmail.com
wrote:

 This illustrates a failure, which is reported later in the calculation:

 A2= np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2 -6]])

 Here 2 - 6 is treated as an expression.

There should be a comma between 2 and -6. The rectangularity is checked,
and in this case, it is not fulfilled. As such, NumPy creates a square
matrix of size 1x1 of dtype object.

If you want to make sure what you have typed in manually is correct, you
should include a couple of assertions afterwards.
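
For example, something along these lines would catch the missing comma right
away (a generic sanity check, not code from the original post):

import numpy as np

A2 = np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2, -6]])
# A missing comma turns the input ragged and the dtype becomes object.
assert A2.dtype != object
assert A2.shape == (3, 3)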

/David.


Re: [Numpy-discussion] median filtering a masked array

2014-11-06 Thread Daπid
On 5 November 2014 19:11, Moroney, Catherine M (398E)
catherine.m.moro...@jpl.nasa.gov wrote:
 What is the recommended way of doing a fast median filter on an array where 
 only
 certain elements of the array need to be calculated?  I'm trying to avoid a
 nested loop over all (I,J) elements.

Since you are using FORTRAN, I believe the simplest way is to do this
double loop in Cython. While you are at it, if you want to squeeze
cycles, you can implement the median in Cython too. Here is some code:

http://numpy-discussion.10968.n7.nabble.com/A-faster-median-Wirth-s-method-td14267.html
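
Before moving it to Cython, the pure Python/NumPy version of that double
loop could look something like this (illustrative only; border handling is
simplistic and all the names are placeholders):

import numpy as np

def selective_median_filter(arr, compute_mask, size=3):
    # Visit only the (i, j) positions flagged in compute_mask and take the
    # median of the surrounding size x size window.
    half = size // 2
    out = arr.copy()
    for i, j in zip(*np.nonzero(compute_mask)):
        i0, i1 = max(i - half, 0), i + half + 1
        j0, j1 = max(j - half, 0), j + half + 1
        out[i, j] = np.median(arr[i0:i1, j0:j1])
    return out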


/David.


Re: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions)

2014-10-29 Thread Daπid
On 29 October 2014 10:48, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote:
 My point isn't about speed; its about the scope of numpy. typing np.fft.fft
 isn't more or less convenient than using some other symbol from the
 scientific python stack.

The problem is in distribution. For many users, installing a new
library is not easy (computing cluster, company regulations...). And
this assuming the alternative library is held to the same quality
standards as Numpy.

Another argument is that this should live only in Scipy, which is,
after all, quite standard; but Scipy requires a FORTRAN compiler, which is
quite a big dependency.

/David.


Re: [Numpy-discussion] higher accuracy in diagonialzation

2014-10-27 Thread Daπid
On 27 October 2014 09:37, Sunghwan Choi sunghwancho...@gmail.com wrote:

 One of them is inaccurate or both two of them are inaccurate within that
 range. Which one is more accurate?


You can check it yourself using the eigenvectors. The cosine distance
between v and M.dot(v) will give you the error in the eigenvectors, and the
difference between ||lambda*v|| and ||M.dot(v)|| the error in the
eigenvalue. I would also check the condition numbers, maybe your matrix is
just not well conditioned. You would have to look at preconditioners.
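
Spelled out, the check could look like this (with a random symmetric matrix
standing in for the actual problem):

import numpy as np

M = np.random.random((50, 50))
M = M + M.T                       # symmetric test matrix
vals, vecs = np.linalg.eigh(M)
v, lam = vecs[:, 0], vals[0]
Mv = M.dot(v)

# eigenvector error: 1 - |cos(angle between v and M.dot(v))|
vec_err = 1.0 - abs(np.dot(v, Mv)) / (np.linalg.norm(v) * np.linalg.norm(Mv))
# eigenvalue error: || M v || vs || lambda v ||
val_err = abs(np.linalg.norm(Mv) - np.linalg.norm(lam * v))
print(vec_err, val_err)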


/David.


Re: [Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt

2014-10-26 Thread Daπid
On 26 October 2014 12:54, Jeff Reback jeffreb...@gmail.com wrote:

 you should have a read here/
 http://wesmckinney.com/blog/?p=543

 going below the 2x memory usage on read in is non trivial and costly in
 terms of performance



If you know in advance the number of rows (because it is in the header,
counted with wc -l, or any other prior information) you can preallocate the
array and fill in the numbers as you read, with virtually no overhead.
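
A bare-bones sketch of that approach (assuming whitespace-separated floats,
no comments or missing values):

import numpy as np

def load_known_rows(fname, n_rows, n_cols):
    # Preallocate once, then fill row by row while reading the file, so the
    # only large allocation is the output array itself.
    out = np.empty((n_rows, n_cols))
    with open(fname) as f:
        for i, line in enumerate(f):
            out[i] = [float(x) for x in line.split()]
    return out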

If the number of rows is unknown, an alternative is to use a chunked data
container like Bcolz [1] (former carray) instead of Python structures. It
may be used as such, or copied back to a ndarray if we want the memory to
be aligned. Including a bit of compression we can get the memory overhead
to somewhere under 2x (depending on the dataset), at the cost of not so
much CPU time, and this could be very useful for large data and slow
filesystems.


/David.

[1] http://bcolz.blosc.org/


Re: [Numpy-discussion] numpy log and exceptions

2014-10-22 Thread Daπid
On 22 October 2014 15:43, Nathaniel Smith n...@pobox.com wrote:

 I guess we could make this more consistent by hand if we wanted - isnan is
 pretty cheap?



Can it be done while avoiding storing the full bool array? The 1/8 memory
overhead can be problematic for large arrays.


Re: [Numpy-discussion] [SciPy-Dev] Hamming etc. windows are wrong

2014-09-26 Thread Daπid
On 26 September 2014 10:41, Jerry lancebo...@qwest.net wrote:

  I am just learning Python-NumPy-SciPy but it appears as though the
 SciPy situation is that the documentation page above *** mentions the flag
 but the flag has not been implemented into SciPy itself. I would be glad to
 stand corrected.


Regarding this, you can look at the source:

odd = M % 2
if not sym and not odd:
    w = w[:-1]

so it is implemented in Scipy, explicitly limited to even windows. It is
not in Numpy, though (but that can be fixed).

I think a reference is worth adding to the documentation, to make the
intent and application of the sym flag clear. I am sure a PR will be most
appreciated.


/David.


Re: [Numpy-discussion] numpy ignores OPT/FOPT under Python3

2014-09-10 Thread Daπid
On 9 September 2014 17:23, Thomas Unterthiner thomas_unterthi...@web.de
wrote:


 I want to use the OPT/FOPT environment viariables to set compiler flags
 when compiling numpy. However it seems that they get ignored under
 python3. Using Ubuntu 14.04 and numpy 1.9.0, I did the following:

  export OPT=-march=native
  export FOPT = -march=native
   python setup.py build   # python executes python2.7
 [...snip...]
 C compiler: x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing
 -march=native -fPIC
 ^C


Running the same on my computer (Fedora 20, python 2.7) doesn't seem to
process the flags:

C compiler: -fno-strict-aliasing -O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
--param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic
-D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
--param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic
-D_GNU_SOURCE -fPIC -fwrapv -fPIC

Double-checking:

$ echo $OPT
-march=native


Re: [Numpy-discussion] Does a `mergesorted` function make sense?

2014-08-27 Thread Daπid
On 27 August 2014 19:02, Jaime Fernández del Río jaime.f...@gmail.com
wrote:


 Since there is at least one other person out there that likes it, is there
 any more interest in such a function? If yes, any comments on what the
 proper interface for extra output should be? Although perhaps the best is
 to leave that out for starters and see what use people make of it, if any.


I think a perhaps more useful thing would be to implement timsort. I
understand it is capable of taking full advantage of partially sorted
arrays, with the extra safety of not making the assumption that the
individual arrays are sorted. This will also be useful for other real-world
cases where the data is already partially sorted.


Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread Daπid
On 28 July 2014 14:46, Sebastian Berg sebast...@sipsolutions.net wrote:

  To rephrase my most pressing question: may np.ones((N,2)).mean(0) and
  np.ones((2,N)).mean(1) produce different results with the
  implementation in the current master? If so, I think that would be
  very much regrettable; and if this is a minority opinion, I do hope
  that at least this gets documented in a most explicit manner.
 

 This will always give you different results. Though in master. the
 difference is more likely to be large, since (often the second one)
 maybe be less likely to run into bigger numerical issues.



An example using float16 on Numpy 1.8.1 (I haven't seen diferences with
float32):

for N in np.logspace(2, 6):
    print N, (np.ones((N, 2), dtype=np.float16).mean(0),
              np.ones((2, N), dtype=np.float16).mean(1))

The first one gives correct results up to 2049, from where the values start
to fall. The second one, on the other hand, gives correct results up to
65519, where it blows to infinity.

Interestingly, in the second case there are fluctuations. For example, for
N = 65424, the mean is 0.99951172, but 1 for the next and previous numbers.
But I think they are just an effect of the rounding, as:

In [33]: np.ones(N+1, dtype=np.float16).sum() - N
Out[33]: 16.0

In [35]: np.ones(N+1, dtype=np.float16).sum() - (N +1)
Out[35]: 15.0

In [36]: np.ones(N-1, dtype=np.float16).sum() - (N -1)
Out[36]: -15.0

In [37]: N = 65519 - 20

In [38]: np.ones(N, dtype=np.float16).sum() - N
Out[38]: 5.0

In [39]: np.ones(N+1, dtype=np.float16).sum() - (N +1)
Out[39]: 4.0

In [40]: np.ones(N-1, dtype=np.float16).sum() - (N -1)
Out[40]: 6.0


Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style

2014-07-13 Thread Daπid
On 11 July 2014 22:30, Daniel da Silva var.mail.dan...@gmail.com wrote:

 I think the idea at hand is not that it would be used everyday, but it
 would be there when needed. What people do everyday is with *real* data.
 They are using functions to load the data.


But sometimes we have to hard-code a few values, and it is true that making
a list (or nested list) is quite verbose; one example is unittests. Having
a MATLAB-style array creation would be convenient for those cases.
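
For reference, the matrix class already accepts a MATLAB-style string; the
wish is essentially for something similar that returns a plain ndarray
(illustration only):

import numpy as np

m = np.mat('1 2 -2; -3 -1 4; 4 2 -6')                 # MATLAB-style, returns np.matrix
a = np.array([[1, 2, -2], [-3, -1, 4], [4, 2, -6]])   # today's verbose ndarray spelling
print(np.array_equal(np.asarray(m), a))               # True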


Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style

2014-07-07 Thread Daπid
On 7 July 2014 08:48, Jacco Hoekstra - LR j.m.hoeks...@tudelft.nl wrote:

 How about using the old name np.mat() for this type of array creation?


How about a new one? np.matarray, for MATLAB array.

/David.


Re: [Numpy-discussion] Cython requirement?

2014-07-06 Thread Daπid
On 6 July 2014 20:40, Benjamin Root ben.r...@ou.edu wrote:

 When did Cython become a build requirement? I remember discussing the use
 of Cython a while back, and IIRC the agreement was that both the cython
 code and the generated C files would be included in version control so that
 cython wouldn't be a build requirement, only a developer requirement when
 modifying those files.



The policy was changed to not include them in VC, but to include them in
the releases, so as not to pollute the repository and to avoid having C
files not matching the pyx, IIRC. The change was fairly recent; I was only
able to dig up this email mentioning it:

http://numpy-discussion.10968.n7.nabble.com/numpy-git-master-requiring-cython-for-build-td37250.html


/David.


Re: [Numpy-discussion] Switch to using ATLAS for OSX binary wheels

2014-06-13 Thread Daπid
On 13 June 2014 18:00, Ralf Gommers ralf.gomm...@gmail.com wrote:

 pip install numpy[mkl]
  ?


 I think that's possible.


MKL are fairly famous, but perhaps it would be legally safer to use
[mkl-nonfree] (or something of the sort) to signal the licence.

But maybe I am bikeshedding here.


Re: [Numpy-discussion] fftw supported?

2014-06-05 Thread Daπid
On 4 June 2014 23:34, Alexander Eberspächer alex.eberspaec...@gmail.com
wrote:

 If you feel pyfftw bothers you with too many FFTW details, you may try
 something like https://github.com/aeberspaecher/transparent_pyfftw
 (be careful, it's a hack that has seen only little testing).


pyFFTW provides a drop-in replacement for Numpy's and Scipy's fft:

https://hgomersall.github.io/pyFFTW/pyfftw/interfaces/interfaces.html
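
A minimal sketch of what using that interface looks like (module path as per
the pyFFTW docs; default settings assumed for threads and planning):

import numpy as np
from pyfftw.interfaces import numpy_fft as fft   # drop-in for numpy.fft

x = np.random.random(1024)
X = fft.rfft(x)                                  # same signature as np.fft.rfft
print(np.allclose(np.fft.irfft(X, n=x.size), x))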

You can still set the number of threads and other advanced parameters, but
you can just ignore them and view it as a very simple library. Does your
wrapper set these to reasonable values? If not, I am missing completely the
point.

I am starting to use pyFFTW, and maybe I can help you test tfftw.


/David.


Re: [Numpy-discussion] numpy quad and maple

2014-05-10 Thread Daπid
On 10 May 2014 01:04, Ihab Riad ifr...@gmail.com wrote:

 Hi,

 I did the following integral with numpy quad

 quad(lambda h: np.exp(-.5*2334.0090702455204 - 1936.9610182100055*5*log10(h) - (12.5*2132.5498892927189)*log10(h)*log10(h)), 0, inf)

 and I get the following values (1.8368139214123403e-126,
 3.3631976081491865e-126).


The first value is the integral, the second is the error. You have (1.8 +-
3.3) 10^-126. That is, your integral is almost zero.


 When I do the integral with maple and mathematica I get
 2.643019766*10^(-127)


That is inside the bracket Scipy said. And I wouldn't even trust them to be
correct without doing some checking.

Numbers so small are numerically non trustworthy. Your function is:

E^(A - B log(h) - C log^2(h)) = E^A * h^B' * h ^ (C' log(h))

where B' and C' depend on your numbers. E^A is already 10^-507 (are you
sure this numbers are correct?), so if you strip it out of your integral,
you have a simpler expression, and more reasonable values. This is what is
killing your integral and sending it to 0.

Now, you have to normalise the values. So, changing variables h -> s*x,
dh = s*dx:

sx^B' * (sx)^[C' (log(s) + log(x))]

Expand and simplify, find the value that makes more reasonable values
(order of one).


Last thing: giving quad a limit at infinity can make it behave crazily. If
you look at your second term, that is roughly x^(-log(x)), it decays very
quickly for large values of x (10^-5 for x=20), so you can, without
affecting your integral much, set a cut at some reasonable h (plot it to
be sure). You can even make an estimate by comparing with the analytical
result [1] of that particular piece.

Bottomline: don't use a lambda, use a function and expand your parameters.
That will make the expression simpler.

def f(h):
    log10 = np.log10(h)
    B = 1936.9610182100055*5
    ...
    return ...


/David.

[1] http://www.wolframalpha.com/input/?i=Integrate[x^-log%28x%29%2C+{x%2C+k%2C+infinity}]


Re: [Numpy-discussion] index partition

2014-04-15 Thread Daπid
On 14 April 2014 18:17, Alan G Isaac alan.is...@gmail.com wrote:

 I find it rather more convenient to use boolean arrays,
 but I wonder if arrays of indexes might have other
 advantages (which would suggest using the set operations
 instead). In particular, might a[boolean_array] be slower
 that a[indexes]?  (I'm just asking, not suggesting.)


Indexing is generally faster, but convert from boolean to indexes gets more
expensive:

In [2]: arr = np.random.random(1000)

In [3]: mask = arr > 0.7

In [4]: mask.sum()
Out[4]: 290

In [5]: %timeit arr[mask]
10 loops, best of 3: 4.01 µs per loop

In [6]: %%timeit
   ...: wh = np.where(mask)
   ...: arr[wh]
   ...:
10 loops, best of 3: 6.47 µs per loop

In [8]: wh = np.where(mask)

In [9]: %timeit arr[wh]
10 loops, best of 3: 2.57 µs per loop

In [10]: %timeit np.where(mask)
10 loops, best of 3: 3.89 µs per loop

In [14]: np.all(arr[wh] == arr[mask])
Out[14]: True


If you want to apply the same mask to several arrays, it is then worth
(performance-wise) converting it to indices first.


/David.


Re: [Numpy-discussion] match RNG numbers with R?

2014-04-07 Thread Daπid
On Apr 7, 2014 3:59 AM, Yaroslav Halchenko li...@onerussian.com wrote:
 so I would assume that the devil is indeed in R post-processing and would
look
 into it (if/when get a chance).

The devil here is the pigeonhole problem. Mersenne Twister
generates random integers in a certain range. The output is guaranteed to
be deterministic, uniform, and reproducible.

But when you want to cast those m possible inputs into n possible outputs,
you need to do magic (or maths) to keep the new distribution truly uniform.
Simply getting random bytes and viewing them as ints will give you
low-quality random numbers.

The algorithm that casts MT output to a random integer is probably what is
different, and perhaps you could find it documented somewhere.
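
To make the pigeonhole issue concrete, the textbook unbiased way of reducing
a raw 32-bit draw to the range [0, n) is rejection sampling; this is only a
generic illustration, not necessarily what either numpy or R actually does:

import random

def randint_below(n, nbits=32):
    # Discard draws from the incomplete top bucket so that each of the n
    # outputs keeps exactly the same probability (no modulo bias).
    limit = (2 ** nbits // n) * n
    while True:
        r = random.getrandbits(nbits)   # raw Mersenne Twister output in CPython
        if r < limit:
            return r % n

print([randint_below(10) for _ in range(20)])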

As a side note, C++11 includes in the standard a MT RNG guaranteed to be
the same across implementations, but there is no promise about the
distributions -not even across versions of the same implementation.


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-04 Thread Daπid
On 2 April 2014 16:06, Sturla Molden sturla.mol...@gmail.com wrote:

 josef.p...@gmail.com wrote:

  pandas came later and thought ddof=1 is worth more than consistency.

 Pandas is a data analysis package. NumPy is a numerical array package.

 I think ddof=1 is justified for Pandas, for consistency with statistical
 software (SPSS et al.)

 For NumPy, there are many computational tasks where the Bessel correction
 is not wanted, so providing a uncorrected result is the correct thing to
 do. NumPy should be a low-level array library that does very little magic.


All this discussion reminds me of the book Numerical Recipes:

if the difference between N and N - 1 ever matters to you, then you
are probably up to no good anyway -- e.g., trying to substantiate a
questionable
hypothesis with marginal data.

For any reasonably sized data set, it is a correction in the second
significant figure.
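
For the record, the difference under discussion (throwaway random data,
numbers will vary):

import numpy as np

x = np.random.random(1000)
print(np.std(x))           # ddof=0, numpy's default: divides by N
print(np.std(x, ddof=1))   # Bessel-corrected: divides by N - 1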


Re: [Numpy-discussion] Where to put versionadded

2014-04-04 Thread Daπid
On Apr 4, 2014 8:54 PM, Charles R Harris charlesr.har...@gmail.com
wrote:




 On Fri, Apr 4, 2014 at 12:48 PM, josef.p...@gmail.com wrote:

 On Fri, Apr 4, 2014 at 2:12 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
  Hi All,
 
  Currently there are several placements of the '.. versionadded::'
directive
  and I'd like to settle
  on a proper style for consistency. There are two occasions on which it
is
  used, first, when a new function or class is added and second, when a
new
  keyword is added to an existing function or method. The options are as
  follows.
 
  New Function
 
  1) Originally, the directive was added in the notes section.
 
  Notes
  -
  .. versionadded:: 1.5.0
 
   2) Alternatively, it is placed after the extended summary.
 
  blah, blah
 
  ..versionadded:: 1.5.0
 
  Between these two, I vote for 2) because the version is easily found
when
  reading the documentation either in a terminal or rendered into HTML.
 
  New Parameter
 
  1) It is placed before the parameter description
 
  newoption : int, optional
  .. versionadded:: 1.5.0
  blah.
 
  2) It is placed after the parameter description.
 
  newoption : int, optional
  blah.
 
  .. versionadded:: 1.5.0
 
  Both of these render correctly, but the first is more compact while the
  second puts the version
  after the description where it doesn't interrupt the reading. I'm
tending
  towards 1) on account of its compactness.
 
  Thoughts?

 I'm in favor of putting them only in the Notes section.

 Most of the time they are not crucial information and it's
 distracting.  I usually only look for them when I'm working explicitly
 across several numpy versions.

 like in python: versionadded 2.1 is only interesting for historians.


 I find the opposite to be true. Because numpy needs maintain
compatibility with a number python versions, I often check the python
documentation to see in which version a function was added.

 Chuck




My user perspective: I am developing a tool whose main use is to run on my
own computer, so I prefer to run the newest and sweetest version of the
libraries, and thus I report the minimum versions. But it would be good if
I could grep my code, see what numpy functions are being used and infer a
probable minimum version required.

If other libraries follow similar conventions, and one does not do
metaprogramming or other fancy things, it is relatively easy to get this
automatically.


Re: [Numpy-discussion] argsort speed

2014-02-16 Thread Daπid
On 16 February 2014 23:43, josef.p...@gmail.com wrote:

 What's the fastest argsort for a 1d array with around 28 Million
 elements, roughly uniformly distributed, random order?


On numpy latest version:

for kind in ['quicksort', 'mergesort', 'heapsort']:
print kind
%timeit np.sort(data, kind=kind)
%timeit np.argsort(data, kind=kind)


quicksort
1 loops, best of 3: 3.55 s per loop
1 loops, best of 3: 10.3 s per loop
mergesort
1 loops, best of 3: 4.84 s per loop
1 loops, best of 3: 9.49 s per loop
heapsort
1 loops, best of 3: 12.1 s per loop
1 loops, best of 3: 39.3 s per loop


It looks like quicksort is quicker at sorting, but mergesort is marginally faster
at sorting args. The difference is slim, but upon repetition, it remains
significant.

Why is that? Probably part of the reason is what Eelco said, and part is
that in sort comparison are done accessing the array elements directly, but
in argsort you have to index the array, introducing some overhead.

I seem unable to find the code for ndarray.sort, so I can't check. I have
tried to grep it trying all possible combinations of def ndarray,
self.sort, etc. Where is it?


/David.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-16 Thread Daπid
On 17 February 2014 00:12, Daπid davidmen...@gmail.com wrote:

 I seem unable to find the code for ndarray.sort, so I can't check. I have
 tried to grep it tring all possible combinations of def ndarray,
 self.sort, etc. Where is it?



Nevermind, it is in core/src/multiarray/methods.c
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Geometrically defined masking arrays; how to optimize?

2014-02-11 Thread Daπid
Here it is an example:

import numpy as np
import pylab as plt

N = 100
M = 200
x, y = np.meshgrid(np.arange(N), np.arange(M))

# Center at 40, 70, radius 20
x -= 40
y -= 70
out_circle = (x * x + y * y < 20**2)

out_ring = out_circle - (x * x + y * y < 10**2)

plt.imshow(out_circle)
plt.figure()
plt.imshow(out_ring)
plt.show()

Note that I have avoided taking the costly square root of each element by
just taking the square of the radius. It can also be generalised to
ellipses or rectangles, if you need it.
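
For instance, a minimal sketch of the same trick for an ellipse mask (the
semi-axes 30 and 15 are made-up values):

import numpy as np

N = 100
M = 200
x, y = np.meshgrid(np.arange(N) - 40, np.arange(M) - 70)

# Ellipse centred at (40, 70), semi-axes 30 along x and 15 along y
out_ellipse = ((x / 30.0) ** 2 + (y / 15.0) ** 2 < 1)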

Also, don't use 0 as a False value, and don't force it to be a 0. Instead,
use if not ring:


/David





On 11 February 2014 21:56, Wolfgang Draxinger 
wolfgang.draxin...@physik.uni-muenchen.de wrote:

 Hi,

 I implemented the following helper function to create masking arrays:

 def gen_mask(self, ring, sector):
 in_segment = None
 if ring == 0:
 radius = self.scales()[0]
 def in_center_circle(xy):
 dx = xy[0] - self.xy[0]
 dy = xy[1] - self.xy[1]
 r = math.sqrt( dx**2 + dy**2 )
  return r < radius
 in_segment = in_center_circle
 else:
 angle = ( self.a_sector(sector, ring),
   self.a_sector( (sector+1) % self.n_sectors(),
 ring) )
 radii = self.scales()[ring:ring+1]
 def in_segment_ring(xy):
 dx = xy[0] - self.xy[0]
 dy = xy[1] - self.xy[1]
 r = math.sqrt( dx**2 + dy**2 )
 a = math.atan2( dy, dx )
  return r >= radii[0] and r < radii[1] \
 and a >= angle[0] and a < angle[1]
 in_segment = in_segment_ring

 width,height = self.arr.shape

 mask = numpy.zeros(shape=(width, height), dtype=numpy.bool)
 for y in range(height):
 for x in range(width):
 mask[x][y] = in_segment((x,y))
 return mask

 self.scales() generates a list of radius scaling factors.
 self.a_sector() gives the dividing angle between sectors sector and
 sector+1 on the given ring.

 The function works, it generates the masks as I need it. The problem is
 - of course - that it's quite slow due to the inner loops the perform
 the geometrical test if the given element of the array self.arr is
 withing the bounds of the ring-sector for which the mask is generated.

 I wonder if you guys have some ideas, on how I could accelerate this.
 This can be perfectly well constructed by boolean combination of
 elementary geometrical objects. For example a ring would be

 ring(p, r0, r1): disk(p, r1) xor disk(p, r0) # where r0 < r1

 The sector would then be

 ring_sector(p, r, s): ring(p, r, r + ...) and sector(p, s, s+1)

 I'd like to avoid doing this using some C helper routine. I'm looking
 for something that is the most efficient when it comes to
 speedup/time invested to develop this.


 Cheers,

 Wolfgang
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Geometrically defined masking arrays; how to optimize?

2014-02-11 Thread Daπid
A small improvement:

# Dimensions
N = 100
M = 200

# Coordinates of the centre
x0 = 40
y0 = 70

x, y = np.meshgrid(np.arange(N) - x0, np.arange(M) - y0, sparse=True)

# Center at 40, 70, radius 20
out_circle = (x * x + y * y < 20**2)

out_ring = out_circle - (x * x + y * y < 10**2)

plt.imshow(out_circle)
plt.figure()
plt.imshow(out_ring)
plt.show()

Using sparse you can avoid the repetition of the arrays, getting the same
functionality. Also, if the image is big, you can delegate the computations
of out_circle and out_ring to numexpr for speed.
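
A sketch of how that delegation could look, assuming numexpr is installed and
x, y are the shifted coordinates from above:

import numexpr as ne

out_circle = ne.evaluate('x * x + y * y < 20**2')
out_ring = ne.evaluate('(x * x + y * y < 20**2) & (x * x + y * y >= 10**2)')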

/David.


On 11 February 2014 22:16, Daπid davidmen...@gmail.com wrote:

 Here it is an example:

 import numpy as np
 import pylab as plt

 N = 100
 M = 200
 x, y = np.meshgrid(np.arange(N), np.arange(M))

 # Center at 40, 70, radius 20
 x -= 40
 y -= 70
 out_circle = (x * x + y * y < 20**2)

 out_ring = out_circle - (x * x + y * y < 10**2)

 plt.imshow(out_circle)
 plt.figure()
 plt.imshow(out_ring)
 plt.show()

 Note that I have avoided taking the costly square root of each element by
 just taking the square of the radius. It can also be generalised to
 ellipses or rectangles, if you need it.

 Also, don't use 0 as a False value, and don't force it to be a 0. Instead,
 use if not ring:


 /David





 On 11 February 2014 21:56, Wolfgang Draxinger 
 wolfgang.draxin...@physik.uni-muenchen.de wrote:

 Hi,

 I implemented the following helper function to create masking arrays:

 def gen_mask(self, ring, sector):
 in_segment = None
 if ring == 0:
 radius = self.scales()[0]
 def in_center_circle(xy):
 dx = xy[0] - self.xy[0]
 dy = xy[1] - self.xy[1]
 r = math.sqrt( dx**2 + dy**2 )
  return r < radius
 in_segment = in_center_circle
 else:
 angle = ( self.a_sector(sector, ring),
   self.a_sector( (sector+1) % self.n_sectors(),
 ring) )
 radii = self.scales()[ring:ring+1]
 def in_segment_ring(xy):
 dx = xy[0] - self.xy[0]
 dy = xy[1] - self.xy[1]
 r = math.sqrt( dx**2 + dy**2 )
 a = math.atan2( dy, dx )
  return r >= radii[0] and r < radii[1] \
 and a >= angle[0] and a < angle[1]
 in_segment = in_segment_ring

 width,height = self.arr.shape

 mask = numpy.zeros(shape=(width, height), dtype=numpy.bool)
 for y in range(height):
 for x in range(width):
 mask[x][y] = in_segment((x,y))
 return mask

 self.scales() generates a list of radius scaling factors.
 self.a_sector() gives the dividing angle between sectors sector and
 sector+1 on the given ring.

 The function works, it generates the masks as I need it. The problem is
 - of course - that it's quite slow due to the inner loops the perform
 the geometrical test if the given element of the array self.arr is
 withing the bounds of the ring-sector for which the mask is generated.

 I wonder if you guys have some ideas, on how I could accelerate this.
 This can be perfectly well constructed by boolean combination of
 elementary geometrical objects. For example a ring would be

  ring(p, r0, r1): disk(p, r1) xor disk(p, r0) # where r0 < r1

 The sector would then be

 ring_sector(p, r, s): ring(p, r, r + ...) and sector(p, s, s+1)

 I'd like to avoid doing this using some C helper routine. I'm looking
 for something that is the most efficient when it comes to
 speedup/time invested to develop this.


 Cheers,

 Wolfgang
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast decrementation of indices

2014-02-03 Thread Daπid
On 2 February 2014 20:58, Mads Ipsen mads.ip...@gmail.com wrote:

 ie. bond 0 connects atoms 0 and 5, bond 1 connects atom 0 and 6, etc. In
 practical examples, the list can be much larger (N > 100.000 connections).



Perhaps you should consider an alternative approach. You could consider it
a graph, and you could use Networkx or Scipy to work with it (provided it
actually works well with the rest of your problem).

In the case of Scipy, the graph is described by its adjacency matrix, and
you just want to delete a row and a column.
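
A minimal sketch of that with scipy.sparse (the 5x5 size and the bond between
atoms 0 and 3 are made up):

import numpy as np
from scipy import sparse

A = sparse.lil_matrix((5, 5))
A[0, 3] = A[3, 0] = 1          # one made-up bond

k = 3                          # atom to remove
idx = np.flatnonzero(np.arange(A.shape[0]) != k)
# drop row k, then column k (CSC is convenient for column selection)
A_reduced = A.tocsr()[idx].tocsc()[:, idx]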

But, in any case, not knowing at all what your overall project is,
renumbering nodes is not something one usually has to do when working with
graphs, except for final results. The labels are just that, labels, with no
further meaning.


/David.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Catching out-of-memory error before it happens

2014-01-25 Thread Daπid
On 24 January 2014 23:09, Dinesh Vadhia dineshbvad...@hotmail.com wrote:

  Francesc: Thanks. I looked at numexpr a few years back but it didn't
 support array slicing/indexing.  Has that changed?


No, but you can do it yourself.

big_array = np.empty(2)
piece = big_array[30:-50]
ne.evaluate('sqrt(piece)')

Here, creating piece does not increase memory use, as slicing shares the
original data (well, actually, it adds a mere 80 bytes, the overhead of an
array).
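
A quick way to convince yourself that the slice is a view and not a copy (the
sizes here are just illustrative):

import numpy as np

big_array = np.empty(1000)
piece = big_array[30:-50]
print(np.may_share_memory(big_array, piece))  # True: same underlying buffer
print(piece.base is big_array)                # True as well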
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] using loadtxt to load a text file in to a numpy array

2014-01-15 Thread Daπid
On 15 January 2014 11:12, Hedieh Ebrahimi hedieh.ebrah...@amphos21.comwrote:

 I try to print my fileContent array after I read it and it looks like this
 :

 [b'C:UsersDocumentsProjectmytextfile1.txt'
  b'C:UsersDocumentsProjectmytextfile2.txt'
  b'C:UsersDocumentsProjectmytextfile3.txt']

 Why is this happening and how can I prevent it ?
 Also if I have a line that starts like this in my file, python will crash
 on me. how can i fix this ?


What is wrong with this case? If you are concerned about the multiple
backslashes, they are there because they are special symbols, and so they
have to be escaped (you actually want a backslash, not whatever else they
could mean).

Depending on what else is on the file, you may be better off reading the
file in pure python. Assuming there is nothing else, something like this
would work:

[line.strip() for line in open(filePath, 'r').readlines()]


/David.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] adding fused multiply and add to numpy

2014-01-09 Thread Daπid
On 8 January 2014 22:39, Julian Taylor jtaylor.deb...@googlemail.comwrote:

 As you can see even without real hardware support it is about 30% faster
 than inplace unblocked numpy due better use of memory bandwidth. Its
 even more than two times faster than unoptimized numpy.


I have an i5, and AVX crashes, even though it is supported by my CPU.
Here are my timings:

SSE2:

In [24]: %timeit npfma.fma(a, b, c)
10 loops, best of 3: 15 us per loop

In [28]: %timeit npfma.fma(a, b, c)
100 loops, best of 3: 2.36 ms per loop

In [29]: %timeit npfma.fms(a, b, c)
100 loops, best of 3: 2.36 ms per loop

In [31]: %timeit pure_numpy_fma(a, b, c)
100 loops, best of 3: 7.5 ms per loop

In [33]: %timeit pure_numpy_fma2(a, b, c)
100 loops, best of 3: 4.41 ms per loop

The model supports all the way to sse4_2

libc:

In [24]: %timeit npfma.fma(a, b, c)
1000 loops, best of 3: 883 us per loop

In [28]: %timeit npfma.fma(a, b, c)
10 loops, best of 3: 88.7 ms per loop

In [29]: %timeit npfma.fms(a, b, c)
10 loops, best of 3: 87.4 ms per loop

In [31]: %timeit pure_numpy_fma(a, b, c)
100 loops, best of 3: 7.94 ms per loop

In [33]: %timeit pure_numpy_fma2(a, b, c)
100 loops, best of 3: 3.03 ms per loop



 If you have a machine capable of fma instructions give it a spin to see
 if you get similar or better results. Please verify the assembly
 (objdump -d fma-suffix.o) to check if the compiler properly used the
 machine fma.


Following the instructions in the readme, there is only one compiled file,
npfma.so, but no .o.


/David.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] nasty bug in 1.8.0??

2013-12-02 Thread Daπid
I get:

In [4]: x.strides
Out[4]: (8,)

Same architecture and OS, Numpy installed via Pip on Python 2.7.5.


On 2 December 2013 20:08, Neal Becker ndbeck...@gmail.com wrote:

 This is np 1.8.0 on fedora x86_64:

 In [5]: x =np.array ((1,))

 In [6]: x.shape
 Out[6]: (1,)

 In [7]: x.strides
 Out[7]: (9223372036854775807,)

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy vbench-marking, compiler comparison

2013-11-26 Thread Daπid
Have you tried on an Intel CPU? I have both an i5 quad core and an i7 octo
core where I could run it over the weekend. One may expect some compiler
magic taking advantage of the advanced features, especially on the i7.

/David
On Nov 25, 2013 8:16 PM, Julian Taylor jtaylor.deb...@googlemail.com
wrote:

 On 25.11.2013 02:32, Yaroslav Halchenko wrote:
 
  On Tue, 15 Oct 2013, Nathaniel Smith wrote:
  What do you have to lose?
 
  btw -- fresh results are here
 http://yarikoptic.github.io/numpy-vbench/ .
 
  I have tuned benchmarking so it now reflects the best performance
 across
  multiple executions of the whole battery, thus eliminating spurious
  variance if estimate is provided from a single point in time.
  Eventually I
  expect many of those curves to become even cleaner.
 
  On another note, what do you think of moving the vbench benchmarks
  into the main numpy tree? We already require everyone who submits a
  bug fix to add a test; there are a bunch of speed enhancements coming
  in these days and it would be nice if we had some way to ask people to
  submit a benchmark along with each one so that we know that the
  enhancement stays enhanced...
 
  On this positive note (it is boring to start a new thread, isn't it?) --
  would you be interested in me transfering numpy-vbench over to
  github.com/numpy ?
 
  as of today, plots on http://yarikoptic.github.io/numpy-vbench  should
  be updating 24x7 (just a loop, thus no time guarantee after you submit
  new changes).
 
  Besides benchmarking new benchmarks (your PRs  would still be very
  welcome,  so far it was just me and Julian T) and revisions, that
  process also goes through a random sample of existing previously
  benchmarked revisions and re-runs the benchmarks thus improving upon the
  ultimate 'min' timing performance.  So you can see already that many
  plots became much 'cleaner', although now there might be a bit of bias
  in estimates for recent revisions since they hadn't accumulated yet as
  many of 'independent runs' as older revisions.
 

 using the vbench I created a comparison of gcc and clang with different
 options.
 Cliffnotes:
 * gcc -O2 performs 5-10% better than -O3 in most benchmarks, except in a
 few select cases where the vectorizer does its magic
 * gcc and clang are very close in performance, but the cases where a
 compiler wins by a large margin its mostly gcc that wins

 I have collected some interesting plots on this notebook:
 http://nbviewer.ipython.org/7646615
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] RuntimeWarning raised in the wrong line

2013-11-20 Thread Daπid
I have the following code, operating on some arrays.

pen = 1 - (real_sum[int(i1 + t)]) / (absolute[int(i1 + t)])
if np.isnan(pen):
pen = 0.0

I know that, sometimes, real_sum and absolute are 0 at the designated
point, so I should get a RuntimeWarning. But, the warning is puzzling:

RuntimeWarning: invalid value encountered in double_scalars
  pen = 0.0

Is the warning actually being raised at that point? Or can I not really
trust the line reported by the warning? I have tried to replicate it in an
interactive terminal, but without luck: I always get it on the previous
line.


I am using Python 2.7 and Numpy 1.8.0.


/David.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] savetxt fmt argument fails when using a unicode string

2013-11-14 Thread Daπid
On 14 November 2013 09:25, Pierre Haessig pierre.haes...@crans.org wrote:

 Le 13/11/2013 17:54, Daπid a écrit :
 
  np.savetxt('a.csv', [1], fmt=str('%.3f'))
 Thanks, that's what I did too.

 I'm just still wondering whether there is a cleaner solution...


I have a fix, I am submitting a PR. The only thing I am missing is where
the tests for IO are.

https://github.com/numpy/numpy/pull/4053
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Precision/value change moving from C to Python

2013-11-13 Thread Daπid
On 13 November 2013 02:40, Bart Baker bart...@gmail.com wrote:

  That is the order of the machine epsilon for double, that looks like
 roundoff
  errors to me.


 I'm trying to my head around this. So does that mean that neither of
 them is right, that it is just the result of doing the same
 calculation two different ways using different computational libraries?


Essentially, yes.

I am tempted to say that, depending on the compiler flags, the C version
*could* be more accurate, as the compiler can reorganise the operations and
reduce the number of steps. But also, if it is optimised for speed, it
could be using faster and less accurate functions and techniques.

In any case, if that 10^-16 matters to you, I'd say you are either doing
something wrong or using the wrong dtype; and without knowing the
specifics, I would bet on the first one. If you really need that precision,
you would have to use more bits, and make sure your library supports that
dtype. I believe the following proves that (my) numpy.cos can deal with 128
bits without converting it to float.

>>> a = np.array([1.2584568431575694895413875135786543], dtype=np.float128)
>>> np.cos(a)-np.cos(a.astype(np.float64))
array([ 7.2099444e-18], dtype=float128)


The bottom line is, don't believe nor trust the least significant bits of
your floating point numbers.


/David.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] savetxt fmt argument fails when using a unicode string

2013-11-13 Thread Daπid
The following works on Numpy 1.8.0:

from __future__ import unicode_literals
import numpy as np

np.savetxt('a.csv', [1], fmt=str('%.3f'))


Without the str, I get a clearer error:

Traceback (most recent call last):
  File a.py, line 4, in module
np.savetxt('a.csv', [1], fmt='%.3f')
  File .virtualenv/local/lib/python2.7/site-packages/numpy/lib/npyio.py,
line 1049, in savetxt
raise ValueError('invalid fmt: %r' % (fmt,))
ValueError: invalid fmt: u'%.3f'


/David.



On 13 November 2013 17:28, Pierre Haessig pierre.haes...@crans.org wrote:

 Hi,

 I just noticed (with numpy 1.7.1) that the following code

 import numpy as np
 np.savetxt('a.csv', [1], fmt=u'%.3f')

 fails with:

1045 else:
1046 for row in X:
 - 1047 fh.write(asbytes(format % tuple(row) + newline))
1048 if len(footer)  0:
1049 footer = footer.replace('\n', '\n' + comments)

 UnboundLocalError: local variable 'format' referenced before assignment

 (of course, the simple solution is to remove the u prefix, but in my
 real code, I have a from __future__ import unicode_literals)

 Since parts of the savetxt function were changed since that, I don't
 know if this bug is still applicable or not.
 I'm guessing that since Warren Weckesser commit

 https://github.com/numpy/numpy/commit/0d284764a855cae11699228ff1b81e6d18f38ed2
 the problem would be caught earlier with a much nicer ValueError.

 However, I still see

 isinstance(fmt, str): (on line 1035
 https://github.com/numpy/numpy/blob/master/numpy/lib/npyio.py#L1035 )

 How can I make the fmt argument work with unicode_literals ?

 best,
 Pierre




 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Precision/value change moving from C to Python

2013-11-12 Thread Daπid
On 12 November 2013 12:01, Bart Baker bart...@gmail.com wrote:

 The issue is that there are some minor (10^-16) differences in the
 values when I do the calculation in C vs Python.


That is the order of the machine epsilon for double, that looks like
roundoff errors to me.

 I found similar results when cythonising some code: everything was the same
until I changed some numpy functions for libc functions (exp, sin, cos...).
After some operations in float32, the results differed by about 1 part in 10^5
(the epsilon is 10^-6). I blame that on implementation differences
between numpy's functions and my system's libc functions.

To check equality, use np.allclose, it lets you define the relative and
absolute error.
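
A small illustration of the tolerances (the values are made up):

import numpy as np

a = np.array([1.0, 2.0])
b = a + 1e-9
print(np.allclose(a, b))                      # True with the default tolerances
print(np.allclose(a, b, rtol=0, atol=1e-12))  # False with a tighter atol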


/David.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] memmory management question

2013-10-28 Thread Daπid
On 28 October 2013 03:13, Georgios Exarchakis gexarc...@gmail.com wrote:

 If yes then how do you release memorry by slicing away parts of an array?


An array is a single Python object. In your example, there is always one
reference pointing to the array (either the whole array or only a view), so
the memory cannot be released.

In your case, a = a[:1].copy() releases the memory by creating a new
object and pointing a to this new one, so the big array gets unreferenced
and thus garbage collected. The price to pay is that, for some time, you
have both the original array and your new one living in memory: if they
were very big, you could run out of memory. You also have to perform a
memcpy (again, slow if you have lots of data).
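
A sketch of the pattern described above (the size is made up):

import numpy as np

a = np.empty(10**6)
a = a[:1].copy()   # the copy briefly coexists with the big array;
                   # after the rebinding, the big array can be freed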


/David.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] 1.8.0rc1

2013-10-01 Thread Daπid
On 30 September 2013 17:17, Charles R Harris charlesr.har...@gmail.comwrote:

 NumPy 1.8.0rc1 is up now on 
 sourceforgehttp://sourceforge.net/projects/numpy/files/NumPy/1.8.0rc1/.The 
 binary builds are included except for Python 3.3 on windows, which
 will arrive later. Many thanks to Ralf for the binaries, and to those who
 found and fixed the bugs in the last beta. Any remaining bugs are all my
 fault ;) I hope this will be the last release before final, so please test
 it thoroughly.


I installed it with

# python setup.py install

But something is wrong there:

 import numpy as np
Traceback (most recent call last):
  File stdin, line 1, in module
  File /usr/lib64/python2.7/site-packages/numpy/__init__.py, line 137, in
module
import add_newdocs
  File /usr/lib64/python2.7/site-packages/numpy/add_newdocs.py, line 13,
in module
from numpy.lib import add_newdoc
  File /usr/lib64/python2.7/site-packages/numpy/lib/__init__.py, line 4,
in module
from type_check import *
  File /usr/lib64/python2.7/site-packages/numpy/lib/type_check.py, line
8, in module
import numpy.core.numeric as _nx
  File /usr/lib64/python2.7/site-packages/numpy/core/__init__.py, line
45, in module
from numpy.testing import Tester
  File /usr/lib64/python2.7/site-packages/numpy/testing/__init__.py, line
10, in module
import decorators as dec
  File /usr/lib64/python2.7/site-packages/numpy/testing/decorators.py,
line 19, in module
from numpy.testing.utils import \
  File /usr/lib64/python2.7/site-packages/numpy/testing/utils.py, line
12, in module
from .nosetester import import_nose
  File /usr/lib64/python2.7/site-packages/numpy/testing/nosetester.py,
line 12, in module
from numpy.compat import basestring
ImportError: cannot import name basestring


I am using Python27 on Fedora 19.

$ gcc --version
gcc (GCC) 4.8.1 20130603 (Red Hat 4.8.1-1)
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] 1.8.0rc1

2013-10-01 Thread Daπid
Disregard that, I had not cleaned the previous installation properly.

Sorry for the noise.


On 1 October 2013 12:11, Daπid davidmen...@gmail.com wrote:


 On 30 September 2013 17:17, Charles R Harris charlesr.har...@gmail.comwrote:

 NumPy 1.8.0rc1 is up now on 
 sourceforgehttp://sourceforge.net/projects/numpy/files/NumPy/1.8.0rc1/.The 
 binary builds are included except for Python 3.3 on windows, which
 will arrive later. Many thanks to Ralf for the binaries, and to those who
 found and fixed the bugs in the last beta. Any remaining bugs are all my
 fault ;) I hope this will be the last release before final, so please test
 it thoroughly.


 I installed it with

 # python setup.py install

 But something is wrong there:

  import numpy as np

 Traceback (most recent call last):
   File stdin, line 1, in module
   File /usr/lib64/python2.7/site-packages/numpy/__init__.py, line 137,
 in module
 import add_newdocs
   File /usr/lib64/python2.7/site-packages/numpy/add_newdocs.py, line 13,
 in module
 from numpy.lib import add_newdoc
   File /usr/lib64/python2.7/site-packages/numpy/lib/__init__.py, line 4,
 in module
 from type_check import *
   File /usr/lib64/python2.7/site-packages/numpy/lib/type_check.py, line
 8, in module
 import numpy.core.numeric as _nx
   File /usr/lib64/python2.7/site-packages/numpy/core/__init__.py, line
 45, in module
 from numpy.testing import Tester
   File /usr/lib64/python2.7/site-packages/numpy/testing/__init__.py,
 line 10, in module
 import decorators as dec
   File /usr/lib64/python2.7/site-packages/numpy/testing/decorators.py,
 line 19, in module
 from numpy.testing.utils import \
   File /usr/lib64/python2.7/site-packages/numpy/testing/utils.py, line
 12, in module
 from .nosetester import import_nose
   File /usr/lib64/python2.7/site-packages/numpy/testing/nosetester.py,
 line 12, in module
 from numpy.compat import basestring
 ImportError: cannot import name basestring


 I am using Python27 on Fedora 19.

 $ gcc --version
 gcc (GCC) 4.8.1 20130603 (Red Hat 4.8.1-1)

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Valid algorithm for generating a 3D Wiener Process?

2013-09-27 Thread Daπid
On 26 September 2013 10:02, Daπid davidmen...@gmail.com wrote:

 The simplest way is to do it in cartesian coordinates: take x, y, and z
 independently from N(0,1). If you want to generate only one normal number
 per step, consider the jacobian in the angles.



Actually, this is wrong, as it would allow displacements (at 1 sigma) of 1
along the axes, but up to sqrt(3) along diagonals. What you actually want
is a multivariate normal distribution with covariance proportional to the
identity (uncorrelated axes and isotropy).

See formulae here:
http://en.wikipedia.org/wiki/Multivariate_normal_distribution
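
A minimal sketch of a 3D Wiener process along these lines (n_steps and dt are
made-up parameters):

import numpy as np

n_steps, dt = 1000, 0.01
steps = np.random.multivariate_normal(np.zeros(3), dt * np.eye(3), size=n_steps)
path = np.cumsum(steps, axis=0)   # walker positions, shape (n_steps, 3)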
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Valid algorithm for generating a 3D Wiener Process?

2013-09-26 Thread Daπid
On 25 September 2013 19:41, David Goldsmith d.l.goldsm...@gmail.com wrote:

 the 'angles' that describe the position undergo a random walk [actually,
 it would seem that they don't, since they too fail the
 varying-as-white-noise test], so the particle tends to move in the same
 direction over short intervals--is this just another way of saying that,
 since I was varying the angles by -1, 0, or 1 unit each time, the
 simulation is susceptible to unnaturally long strings of -1, 0, or 1
 increments?



In the 1D case, the white noise has a gaussian probability distribution of
being positive or negative. Translated to the Wiener process, it means you
would have to sum normally distributed values. When you go 3D you can do
the same thing, taking a random displacement from a N(0,1) and two random
angles.

The issue here is that the polar angles cannot be taken uniformly, but
instead they have to be distributed proportionally  to the jacobian. As you
have it now, your particle will tend to move towards the poles. If you want
to visualize it: take a sphere and imagine dots spaced evenly at angles
(intersection of meridians and parallels, for example): they are much more
dense at the poles.

The simplest way is to do it in cartesian coordinates: take x, y, and z
independently from N(0,1). If you want to generate only one normal number
per step, consider the jacobian in the angles.


David.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Really cruel draft of vbench setup for NumPy (.add.reduce benchmarks since 2011)

2013-09-06 Thread Daπid
On 6 September 2013 21:21, Yaroslav Halchenko li...@onerussian.com wrote:

 some old ones are
 still there, some might be specific to my CPU here


How long does one run take? Maybe I can run it in my machine (Intel i5) for
comparison.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Stick intersection path algorithm

2013-09-05 Thread Daπid
On 5 September 2013 13:14, Josè Luis Mietta joseluismie...@yahoo.com.arwrote:

 2. Using networkx or other tool: how can I obtain the 'clusters size'
 distribution (that is: number of clusters of size 'D', for all
 cluster-sizes)?


This is best asked in their mailing list. A possible way is:

np.bincount([len(i) for i in nx.connected_components(G)])

For example:
np.bincount([len(i) for i in
nx.connected_components(nx.erdos_renyi_graph(100, 0.01))])
array([ 0, 39,  7,  3,  1,  1,  0,  0,  0,  0,  1,  0,  0,  0,  0,  0,  0,
0,  0,  1])
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Stick intersection path algorithm

2013-09-05 Thread Daπid
On 5 September 2013 17:03, Josè Luis Mietta joseluismie...@yahoo.com.arwrote:

 The array ([ 0, 39,  7,  3,  1,  1,  0,  0,  0,  0,  1,  0,  0,  0,  0,
  0,  0, 0,  0,  1]) means that in the sistem (graph) are : 4 cluster of
 size 1, one cluster of size 3, one cluster of size 7 and one cluste of size
 39?


No, it means there are 39 of size 1, 7 of size 2 and so on.


David.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: Numpy 1.8.0 beta 1 release

2013-09-01 Thread Daπid
On 1 September 2013 18:54, Charles R Harris charlesr.har...@gmail.comwrote:

 I'm happy to announce the first beta release of Numpy 1.8.0. Please try
 this beta and report any issues on the numpy-dev mailing list.


In an old thread there was discussion about adding minmax and sincos; is
there a plan for implementing these?
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Stick intersection path algorithm

2013-09-01 Thread Daπid
On 2 September 2013 02:50, Josè Luis Mietta joseluismie...@yahoo.com.arwrote:

 I wanna see if two line segments are connected by a path formed by line
 segments intersected.


You could build a network where each segment is a node and an intersection
is a link. Then, you could use NetworkX connected_components to get groups
of segments together.
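
A hedged sketch of that idea, where sticks, intersects(s1, s2), a and b are
placeholders for your own list of segments, your geometric intersection test
and the two sticks of interest:

import networkx as nx

G = nx.Graph()
G.add_nodes_from(range(len(sticks)))
for i in range(len(sticks)):
    for j in range(i + 1, len(sticks)):
        if intersects(sticks[i], sticks[j]):
            G.add_edge(i, j)

# sticks a and b are connected by a path of intersections iff they
# are in the same connected component
print(nx.has_path(G, a, b))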
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] RAM problem during code execution - Numpya arrays

2013-08-23 Thread Daπid
On 23 August 2013 16:59, Benjamin Root ben.r...@ou.edu wrote:
 A lot of the code you have here can be greatly simplified. I would start
 with just trying to get rid of appends as much as possible and use
 preallocated arrays with np.empty() or np.ones() or the likes.

Also, if you don't know beforehand the final size of the array (I find it
difficult to follow the flow of the program, it is quite lengthy), you
can use lists as a temporary thing:

results = []
while SomeCondition:
   res = do_something()
   results.append(res)
results = np.array(results)

Also, to trace things down, it may help to encapsulate pieces of code
inside functions. There are two reasons for this: it will make the
code more readable and easier to test, and you can run pieces of code
independently to find where your memory is growing.


I hope it is of any help.

David.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Reading footer lines

2013-08-13 Thread Daπid
On 13 August 2013 14:20, Resmi l.re...@gmail.com wrote:
 As a workaround, I've tried using os.system along with grep. And I get the
 following output :

 os.system(grep -e 'tx' 'data.dat' )
  ## tx =2023.06
 0

 Why is there a 0 in the output? The file has no blank lines.

That 0 is the exit status of os.system: as there were no
errors, it returns 0.

My idea would be to read the file and store it into a StringIO. That
can be fed to np.loadtxt to extract the numbers, and look for your
number. My (untested) attempt:


import StringIO
import numpy as np

with open(datafile) as f:
    raw_data = StringIO.StringIO()
    raw_data.write(f.read())

raw_data.seek(0)
data = np.loadtxt(raw_data)   # '#' lines are skipped as comments
raw_data.seek(0)
numbers = [float(line.split('=')[1].strip()) for line in
           raw_data.readlines() if line[0] == '#']

If there is only one number there and it is after all the data, you
could skip them all. From the shape of data you know there are N lines
of data:

for _ in range(N):
    raw_data.readline()
while True:
    line = raw_data.readline()
    if line[0] == '#':
        number = float(line.split('=')[1].strip())
        break


Alternatively, you could also use seek to put the pointer a certain
distance from the end of the file and start from there, but this could
cause problems in Windows.
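
A rough sketch of the seek idea (the 200-byte tail and the 'tx' key are guesses;
adjust the tail to something safely larger than your footer):

with open('data.dat', 'rb') as f:
    f.seek(-200, 2)          # 2 means "relative to the end of the file"
    tail = f.read()

for line in tail.splitlines():
    if line.startswith('#') and 'tx' in line:
        number = float(line.split('=')[1])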




David.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] add .H attribute?

2013-07-24 Thread Daπid
An idea:

If .H is ideally going to be a view, and we want to keep it this way,
we could have a .h() method with the present implementation. This
would preserve the name .H for the conjugate view -- when someone finds
a way to do it.

This way we would increase the readability, simplify some matrix
algebra code, and keep the API consistent.
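
A minimal sketch of what such a method could look like on an ndarray subclass
(illustrative only, this is not an existing numpy API):

import numpy as np

class MyArray(np.ndarray):
    def h(self):
        # conjugate transpose; with the present tools this returns a
        # new array (the conjugate is a copy), not a view
        return self.conj().T

m = np.array([[1 + 2j, 3 - 1j]]).view(MyArray)
print(m.h())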

On 24 July 2013 13:08, Dave Hirschfeld dave.hirschf...@gmail.com wrote:
 Nathaniel Smith njs at pobox.com writes:



 As soon as you talk about attributes returning things you've already
 broken Python's mental model... attributes are things that sit there,
 not things that execute arbitrary code. Of course this is not how the
 actual implementation works, attribute access *can* in fact execute
 arbitrary code, but the mental model is important, so we should
 preserve it where-ever we can. Just mentioning an attribute should not
 cause unbounded memory allocations.


 Yep, sorry - sloppy use of terminology which I agree is important in helping
 understand what's happening.

 -Dave




 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.filled, again

2013-06-12 Thread Daπid
On 12 June 2013 17:29,  josef.p...@gmail.com wrote:
 +2 on this.
 I like a good name like `filled` for a function that I expect to use 
 regularly.
 it's short and descriptive.

FWIW, I am for this one too. That is not only clear, but a name that
one may try even before checking the docs.

Empty-filled sounds too much like an oxymoron to me.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Avoiding messy code in Python

2013-06-12 Thread Daπid
On 12 June 2013 18:27, Chris Barker - NOAA Federal
chris.bar...@noaa.gov wrote:
 without documentation and testing.

 This one is easy: add documentation and testing!

This lecture may help you. It uses unittest, but the idea is
applicable to whatever system you use:

https://python.g-node.org/python-summerschool-2012/software_carpentry

And for a brief introduction to best practices, you could see this:
https://python.g-node.org/python-summerschool-2012/best_practices


David.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] About speed vs math...

2013-06-03 Thread Daπid
On 3 June 2013 08:33, David Cournapeau courn...@gmail.com wrote:
 (around 50 on my own machine, but that's platform
 specific).

On my machine, it is around 10. You also have to be aware of the data
container: iterating over lists is not the same as iterating over arrays.

In [6]: a = np.arange(50)

In [7]: %timeit np.sin(a)
10 loops, best of 3: 3.95 us per loop

In [8]: %timeit np.array([math.sin(i) for i in a]) # We are working
with arrays, we keep the array output.
1 loops, best of 3: 32.9 us per loop

In [9]: %timeit [math.sin(i) for i in a]# Keep the list output.
1 loops, best of 3: 21.3 us per loop



But, if you stick to lists:

In [12]: b = range(50)

In [13]: %timeit [math.sin(i) for i in b]
10 loops, best of 3: 12.4 us per loop



In any case, switching between iterators and numpy functions is, in my
opinion, error prone. I would use math only for scalars, and numpy for
the rest, except if optimisation is needed (as a last step). For the
record, another interesting example:


In [18]: a = np.arange(10)

In [19]: b = range(10)

In [20]: %timeit a + b  ## Mixed container
10 loops, best of 3: 10.4 us per loop

In [21]: %timeit [i + j for i, j in zip(a, b)]
10 loops, best of 3: 12 us per loop

In [22]: %timeit a + a   ## Pure array
100 loops, best of 3: 1.23 us per loop

In [23]: %timeit [i + j for i, j in zip(a, a)]
10 loops, best of 3: 7.21 us per loop

In [24]: %timeit b + b  ## Pure list
100 loops, best of 3: 197 ns per loop ## PS

In [25]: %timeit [i + j for i, j in zip(b, b)]
10 loops, best of 3: 1.68 us per loop

Here, b + b is just concatenating the lists, not adding them element-wise. In a
program, that may be an error due to a forgotten conversion.

Out of curiosity, VPython, a visual package oriented to beginners, has
its own functions that redirect to math if applied on scalars and to
numpy otherwise.
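
A hedged sketch of that kind of dispatch (not VPython's actual code):

import math
import numpy as np

def sin(x):
    # scalar inputs go to math, everything else to numpy
    if isinstance(x, (int, float)):
        return math.sin(x)
    return np.sin(x)

print(sin(0.5))
print(sin(np.arange(3)))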



David.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Name change of the ptp() function

2013-05-10 Thread Daπid
On May 10, 2013 3:18 PM, Robert Kern robert.k...@gmail.com wrote:
 Sure, it's probably more readable

I am not sure about that. I would have to check the docs to see what it means.
The mathematical term is range, but that already has a meaning in Python, so
it is not a good way to go; valuerange is perhaps the compromise, but it is
not really clear by itself. In some areas, nevertheless, ptp is the
standard notation, as it is in electronics - and maybe that is why it made
its way into Numeric.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] printing array in tabular form

2013-05-09 Thread Daπid
On 9 May 2013 10:06, Sudheer Joseph sudheer.jos...@yahoo.com wrote:
 However writing a formatted out put looks to be bit tricky with python
 relative to other programing languages.

If performance is not an issue, you could do it by hand, as you can
always do in any programming language:


savefile = open('data.txt', 'w')
N = len(IL)

for start in xrange(0, N, 5):
   if start + 5 > N:
     end = N
   else:
     end = start + 5
   print >> savefile, IL[start : end]
savefile.close()


But this is actually more verbose, and once you get into the NumPy
workflow, it becomes simpler.
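
For instance, a sketch of the same 5-per-line output with numpy, assuming the
values are integers and len(IL) is a multiple of 5 (pad or handle the remainder
otherwise):

import numpy as np

IL = np.arange(20)                               # made-up data
np.savetxt('data.txt', IL.reshape(-1, 5), fmt='%d')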
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Really cruel draft of vbench setup for NumPy (.add.reduce benchmarks since 2011)

2013-05-07 Thread Daπid
On 7 May 2013 13:47, Sebastian Berg sebast...@sipsolutions.net wrote:
 Indexing/assignment was the first thing I thought of too (also because
 fancy indexing/assignment really could use some speedups...). Other then
 that maybe some timings for small arrays/scalar math, but that might be
 nice for that GSoC project.

Why not go bigger? Ufunc operations on big arrays, CPU and memory bound.

Also, what about interfacing with other packages? It may increase the
compiling overhead, but I would like to see Cython in action (say,
only the latest version; maybe it can be fixed).
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] nanmean(), nanstd() and other missing functions for 1.8

2013-05-01 Thread Daπid
On 1 May 2013 03:36, Benjamin Root ben.r...@ou.edu wrote:
 Are there any other functions that others feel are missing from numpy and
 would like to see for v1.8?  Let's discuss them here.

I would like to have sincos, to compute sin and cos of the same number
faster. According to some benchmarks, it is barely slower than just
computing one of them.



On 1 May 2013 07:13, Chris Barker - NOAA Federal chris.bar...@noaa.gov wrote:
 Of course, the documentation for discussed before: np.minmax().  My thinking 
 is that it would return a 2xN array

 How about a tuple: (min, max)?

Consider the case of np.minmax(matrix, axis=1), you will end up with a
tuple of two arrays. In that scenario, you probably want to do
computations with both numbers, so having them in an array seems more
convenient.

If there is enough reason, we could always add a unpack=True flag
and then return a tuple.
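
A pure-Python stand-in for the proposed function (minmax is a hypothetical
name; a real implementation would do a single pass in C instead of two):

import numpy as np

def minmax(a, axis=None):
    return np.array([np.min(a, axis=axis), np.max(a, axis=axis)])

X = np.arange(12).reshape(3, 4)
print(minmax(X, axis=1))   # 2x3 array: row 0 holds the mins, row 1 the maxes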
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] nanmean(), nanstd() and other missing functions for 1.8

2013-05-01 Thread Daπid
On 1 May 2013 17:13, Todd toddr...@gmail.com wrote:
 Speaking of which, I think there should be a function to construct a complex
 array out of two identically-shaped floating-point arrays, as well as
 perhaps an np.i class that converts a real array to an imaginary one (using
 __mul__ and such).

np.i would be exactly the same as array * 1j, or am I missing anything?

The same goes for constructing a complex, real + imag * 1j
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] GSoC proposal -- Numpy SciPy

2013-05-01 Thread Daπid
On 1 May 2013 20:12, Blake Griffith blake.a.griff...@gmail.com wrote:
 However, it would still be useful to have ufuncs working well with the
 sparse package.

How are you planning to deal with ufunc(0) != 0? cos(sparse) is actually dense.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] nanmean(), nanstd() and other missing functions for 1.8

2013-04-30 Thread Daπid
On 1 May 2013 03:36, Benjamin Root ben.r...@ou.edu wrote:
 There is one other non-trivial function that have been discussed before:
 np.minmax().  My thinking is that it would return a 2xN array (where N is
 whatever size of the result that would be returned if just np.min() was
 used).  This would allow one to do min, max = np.minmax(X).

I had been looking for this function in the past, I think it is a good
and necessary addition. It should also come with its companion,
np.argminmax.


David.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Pruned / sparse FFT 2D

2013-04-15 Thread Daπid
Hi,

I have a sparse cloud of ~10^4 points randomly scattered over a
triangular domain [1] from which I want to take the Fourier transform. I
have been looking for algorithms and found one library [2], but it only appears
to handle the 1-D case (and there seems to be no documentation). In [3] there is
C code for FFTW, but it seems the code is itself pruned (pun intended).

Does anyone have a clue about how to perform it? Speed is not a big issue,
but accuracy is quite important.


Thanks,

David.


[1] http://www.asaaf.org/~davidmh/coverage.png
[2] https://pypi.python.org/pypi/pynfftls/1.0
[3] http://www.fftw.org/pruned.html
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Sources more confusing in Python

2013-04-07 Thread Daπid
On 7 April 2013 16:53, Happyman bahtiyor_zohi...@mail.ru wrote:

 I completed my Registration..but still no idea how to use it..it is ,may
 be, lack of experience of python ..


You don't need registration. The easiest way to use it is via pip or
easy_install. If you are on Windows, download and execute the easy_install
installer; if you are on Linux, browse your repository for either of them.

Once installed, you can do:

$easy_install pip  # to install pip, easiest way to get it on Windows

$pip install numpy # to install package numpy

All this, in the terminal. To get it on Windows, run -> cmd
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] timezones and datetime64

2013-04-04 Thread Daπid
On 4 April 2013 11:03, Daniele Nicolodi dani...@grinta.net wrote:

 I think that generally the issue is not relevant for any practical use
 of a timebase: there are not many applications requiring sub-second
 accuracy over many years periods.


I agree. I think the basic datetime object should ignore this issue
(properly documented), leaving room for someone to write a
datetime_leapsecond object that would be aware of them. By avoiding this
issue we can achieve better performance and a simpler code base, with
enough functionality for most practical purposes.

Another point that should be noted: as stated earlier, leap seconds
cannot be predicted and require frequent updates, making replicability
difficult: the same code on two machines with the same numpy version, but
updated at different times, can produce different results.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy

2013-03-23 Thread Daπid
I am a bit worried about the differences in results. Just to be sure
you are comparing apples with apples, it may be a good idea to set the
seed at the beginning:

np.random.seed( SEED )

where SEED is an int. This way, you will be inverting always the same
matrix, regardless of the Python version. I think, even if the timing
is different, the results should be the same.

http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.seed.html#numpy.random.seed



David.


On 23 March 2013 15:39, Colin J. Williams cjwilliam...@gmail.com wrote:
 On 23/03/2013 7:21 AM, Ralf Gommers wrote:




 On Fri, Mar 22, 2013 at 10:39 PM, Colin J. Williams cjwilliam...@gmail.com
 wrote:

 On 20/03/2013 11:12 AM, Frédéric Bastien wrote:

 On Wed, Mar 20, 2013 at 11:01 AM, Colin J. Williams
 cjwilliam...@gmail.com wrote:

 On 20/03/2013 10:30 AM, Frédéric Bastien wrote:

 Hi,

 win32 do not mean it is a 32 bits windows. sys.platform always return
 win32 on 32bits and 64 bits windows even for python 64 bits.

 But that is a good question, is your python 32 or 64 bits?

 32 bits.

 That explain why you have memory problem but not other people with 64
 bits version. So if you want to work with bigger input, change to a
 python 64 bits.

 Fred

 Thanks to the people who responded to my report that numpy, with Python
 3.2 was significantly slower than with Python 2.7.

 I have updated to numpy 1.7.0 for each of the Pythons 2.7.3, 3.2.3 and
 3.3.0.

 The Pythons came from python.org and the Numpys from PyPi.  The SciPy site
 still points to Source Forge, I gathered from the responses that Source
 Forge is no longer recommended for downloads.


 That's not the case. The official binaries for NumPy and SciPy are on
 SourceForge. The Windows installers on PyPI are there to make easy_install
 work, but they're likely slower than the SF installers (no SSE2/SSE3
 instructions).

 Ralf

 Thanks, I'll read over Robert Kern's comments.  PyPi is the simpler process,
 but, if the result is unoptimized code, then easy_install is not the way to
 go.

 The code is available here(http://web.ncf.ca/cjw/testFPSpeed.py)
 and the most recent test results are
 here(http://web.ncf.ca/cjw/FP%2023-Mar-13%20Test%20Summary.txt).  These are
 using PyPi, I'll look into SourceForge.

 Colin W.

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy

2013-03-20 Thread Daπid
Without much detailed knowledge of the topic, I would expect both
versions to give very similar timings, as it is essentially a call to an
ATLAS function; not much is done in Python.

Given this, maybe the difference is in ATLAS itself. How have you
installed it? When you compile ATLAS, it will do some machine-specific
optimisation, but if you have installed a binary, chances are that your
version is optimised for a machine quite different from yours. So, two
different installations could have been compiled on different machines,
and one is better suited to your machine. If you want to be sure, I
would try to compile ATLAS (this may be difficult) or check the same
on a very different machine (like an AMD processor, different
architecture...).



Just for reference, on Linux Python 2.7 64 bits can deal with these
matrices easily.

%timeit mat=np.random.random((6143,6143)); matinv= np.linalg.inv(mat);
res = np.dot(mat, matinv); diff= res-np.eye(6143); print
np.sum(np.abs(diff))
2.41799631031e-05
1.13955868701e-05
3.64338191541e-05
1.13484781021e-05
1 loops, best of 3: 156 s per loop

Intel i5, 4 GB of RAM and SSD. ATLAS installed from Fedora repository
(I don't run heavy stuff on this computer).

On 20 March 2013 14:46, Colin J. Williams c...@ncf.ca wrote:
 I have a small program which builds random matrices for increasing matrix
 orders, inverts the matrix and checks the precision of the product.  At some
 point, one would expect operations to fail, when the memory capacity is
 exceeded.  In both Python 2.7 and 3.2 matrices of order 3,071 area handled,
 but not 6,143.

 Using wall-clock times, with win32, Python 3.2 is slower than Python 2.7.
 The profiler indicates a problem in the solver.

 Done on a Pentium, with 2.7 GHz processor, 2 GB of RAM and 221 GB of free
 disk space.  Both Python 3.2.3 and Python 2.7.3 use numpy 1.6.2.

 The results are show below.

 Colin W.

 _
 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)]
 order=2   measure ofimprecision= 0.097   Time elapsed (seconds)=
 0.004143
 order=5   measure ofimprecision= 2.207   Time elapsed (seconds)=
 0.001514
 order=   11   measure ofimprecision= 2.372   Time elapsed (seconds)=
 0.001455
 order=   23   measure ofimprecision= 3.318   Time elapsed (seconds)=
 0.001608
 order=   47   measure ofimprecision= 4.257   Time elapsed (seconds)=
 0.002339
 order=   95   measure ofimprecision= 4.986   Time elapsed (seconds)=
 0.005747
 order=  191   measure ofimprecision= 5.788   Time elapsed (seconds)=
 0.029974
 order=  383   measure ofimprecision= 6.765   Time elapsed (seconds)=
 0.145339
 order=  767   measure ofimprecision= 7.909   Time elapsed (seconds)=
 0.841142
 order= 1535   measure ofimprecision= 8.532   Time elapsed (seconds)=
 5.793630
 order= 3071   measure ofimprecision= 9.774   Time elapsed (seconds)=
 39.559540
 order=  6143 Process terminated by a MemoryError

 Above: 2.7.3  Below: Python 3.2.3

 bbb_bbb
 3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)]
 order=2   measure ofimprecision= 0.000   Time elapsed (seconds)=
 0.113930
 order=5   measure ofimprecision= 1.807   Time elapsed (seconds)=
 0.001373
 order=   11   measure ofimprecision= 2.395   Time elapsed (seconds)=
 0.001468
 order=   23   measure ofimprecision= 3.073   Time elapsed (seconds)=
 0.001609
 order=   47   measure ofimprecision= 5.642   Time elapsed (seconds)=
 0.002687
 order=   95   measure ofimprecision= 5.745   Time elapsed (seconds)=
 0.013510
 order=  191   measure ofimprecision= 5.866   Time elapsed (seconds)=
 0.061560
 order=  383   measure ofimprecision= 7.129   Time elapsed (seconds)=
 0.418490
 order=  767   measure ofimprecision= 8.240   Time elapsed (seconds)=
 3.815713
 order= 1535   measure ofimprecision= 8.735   Time elapsed (seconds)=
 27.877270
 order= 3071   measure ofimprecision= 9.996   Time elapsed
 (seconds)=212.545610
 order=  6143 Process terminated by a MemoryError



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] R: fast numpy.fromfile skipping data chunks

2013-03-13 Thread Daπid
On 13 March 2013 16:54, Andrea Cimatoribus andrea.cimatori...@nioz.nlwrote:

  Since I'm in the process of buying new hardware too, a slight OT (but
 definitely related).
 Does an ssd provide substantial improvement in these cases?


It should help. Nevertheless, when talking about performance, it is
difficult to predict, mainly because in a computer there are many things
going on and many layers involved.

I have a couple of computers equipped with SSDs; if you want, send me
some benchmarks and I can run them to see if I get any speedup.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.test('full') errors and failures

2013-02-12 Thread Daπid
On 12 February 2013 10:55, Dominic Steinitz domi...@steinitz.org wrote:
 Running unit tests for numpy
 NumPy version 1.8.0.dev-4600b2f

I can see this is not the stable version; try 1.7, which has
just been released.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] number of tests

2013-02-12 Thread Daπid
On 12 February 2013 04:54, Davide Del Vento ddve...@ucar.edu wrote:
 Ran 3587 tests in 22.211s
 FAILED (KNOWNFAIL=5, SKIP=11, failures=2)

 Whereas in a remote batch node (with a script) I get:

 Ran 3229 tests in 15.642s
 OK (KNOWNFAIL=5, SKIP=19)

On my machine (linux 64 bits)

In [3]: np.test('full')
Running unit tests for numpy
NumPy version 1.7.0
NumPy is installed in /usr/lib64/python2.7/site-packages/numpy
Python version 2.7.3 (default, Aug  9 2012, 17:23:57) [GCC 4.7.1
20120720 (Red Hat 4.7.1-5)]
nose version 1.2.1

--
Ran 4836 tests in 33.016s

OK (KNOWNFAIL=5, SKIP=1)
Out[3]: nose.result.TextTestResult run=4836 errors=0 failures=0
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] pip install numpy throwing a lot of output.

2013-02-12 Thread Daπid
I have just upgraded numpy with pip on Linux 64 bits with Python 2.7,
and I got *a lot* of output, so much that it doesn't fit in the terminal.
Most of it is gcc commands, but there are many different errors
thrown by the compiler. Is this expected?

I am not too worried, as the test suite passes, but pip is supposed to
give only meaningful output (or at least, this is what the creators
intended).


David.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] pip install numpy throwing a lot of output.

2013-02-12 Thread Daπid
On 12 February 2013 14:58, Francesc Alted franc...@continuum.io wrote:
 Yes, I think that's expected. Just to make sure, can you send some
 excerpts of the errors that you are getting?

Actually the errors are at the beginning of the process, so they are
out of reach of my terminal's scrollback right now. It seems pip doesn't
keep a log in case of success.

The ones I can see are mostly warnings about unused variables and
functions; maybe this is the expected behaviour for a library? These
errors come from a complete reinstall instead of the original upgrade
(the cat closed the terminal, worst excuse ever!):


compile options: '-Inumpy/core/include
-Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy
-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core
-Inumpy/core/src/npymath -Inumpy/core/src/multiarray
-Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include
-I/usr/include/python2.7
-Ibuild/src.linux-x86_64-2.7/numpy/core/src/multiarray
-Ibuild/src.linux-x86_64-2.7/numpy/core/src/umath -c'
gcc: build/src.linux-x86_64-2.7/numpy/core/src/npysort/quicksort.c
In file included from /usr/include/python2.7/pyconfig.h:6:0,
 from /usr/include/python2.7/Python.h:8,
 from numpy/core/src/private/npy_sort.h:5,
 from numpy/core/src/npysort/quicksort.c.src:32:
/usr/include/python2.7/pyconfig-64.h:1170:0: warning:
_POSIX_C_SOURCE redefined [enabled by default]
In file included from /usr/include/stdlib.h:24:0,
 from numpy/core/src/npysort/quicksort.c.src:31:
/usr/include/features.h:168:0: note: this is the location of the
previous definition
In file included from /usr/include/python2.7/pyconfig.h:6:0,
 from /usr/include/python2.7/Python.h:8,
 from numpy/core/src/private/npy_sort.h:5,
 from numpy/core/src/npysort/quicksort.c.src:32:
/usr/include/python2.7/pyconfig-64.h:1192:0: warning:
_XOPEN_SOURCE redefined [enabled by default]
In file included from /usr/include/stdlib.h:24:0,
 from numpy/core/src/npysort/quicksort.c.src:31:
/usr/include/features.h:170:0: note: this is the location of the
previous definition
gcc: build/src.linux-x86_64-2.7/numpy/core/src/npysort/mergesort.c
In file included from /usr/include/python2.7/pyconfig.h:6:0,
 from /usr/include/python2.7/Python.h:8,
 from numpy/core/src/private/npy_sort.h:5,
 from numpy/core/src/npysort/mergesort.c.src:32:
/usr/include/python2.7/pyconfig-64.h:1170:0: warning:
_POSIX_C_SOURCE redefined [enabled by default]
In file included from /usr/include/stdlib.h:24:0,
 from numpy/core/src/npysort/mergesort.c.src:31:
/usr/include/features.h:168:0: note: this is the location of the
previous definition
In file included from /usr/include/python2.7/pyconfig.h:6:0,
 from /usr/include/python2.7/Python.h:8,
 from numpy/core/src/private/npy_sort.h:5,
 from numpy/core/src/npysort/mergesort.c.src:32:
/usr/include/python2.7/pyconfig-64.h:1192:0: warning:
_XOPEN_SOURCE redefined [enabled by default]
In file included from /usr/include/stdlib.h:24:0,
 from numpy/core/src/npysort/mergesort.c.src:31:
/usr/include/features.h:170:0: note: this is the location of the
previous definition
gcc: build/src.linux-x86_64-2.7/numpy/core/src/npysort/heapsort.c
In file included from /usr/include/python2.7/pyconfig.h:6:0,
 from /usr/include/python2.7/Python.h:8,
 from numpy/core/src/private/npy_sort.h:5,
 from numpy/core/src/npysort/heapsort.c.src:32:
/usr/include/python2.7/pyconfig-64.h:1170:0: warning:
_POSIX_C_SOURCE redefined [enabled by default]
In file included from /usr/include/stdlib.h:24:0,
 from numpy/core/src/npysort/heapsort.c.src:31:
/usr/include/features.h:168:0: note: this is the location of the
previous definition
In file included from /usr/include/python2.7/pyconfig.h:6:0,
 from /usr/include/python2.7/Python.h:8,
 from numpy/core/src/private/npy_sort.h:5,
 from numpy/core/src/npysort/heapsort.c.src:32:
/usr/include/python2.7/pyconfig-64.h:1192:0: warning:
_XOPEN_SOURCE redefined [enabled by default]



compile options:
'-Ibuild/src.linux-x86_64-2.7/numpy/core/src/umath
-Inumpy/core/include
-Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy
-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core
-Inumpy/core/src/npymath -Inumpy/core/src/multiarray
-Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include
-I/usr/include/python2.7
-Ibuild/src.linux-x86_64-2.7/numpy/core/src/multiarray
-Ibuild/src.linux-x86_64-2.7/numpy/core/src/umath -c'
gcc: numpy/core/src/umath/umathmodule_onefile.c
In file included from 

Re: [Numpy-discussion] Issues to fix for 1.7.0rc2.

2013-02-06 Thread Daπid
On 6 February 2013 08:41, Charles R Harris charlesr.har...@gmail.com wrote:

 More Python craziness

 In [6]: print None or 0
 0

 In [7]: print 0 or None
 None

Just to clarify this behaviour:

In [1]: print None or 0
0

In [2]: print 0 or None
None

In [3]: val = 0 or None

In [4]: print val
None
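The rule behind this, for anyone puzzled: the expression x or y
short-circuits, evaluating to x if x is truthy and to y otherwise. Both None
and 0 are falsy, so in these examples the result is always the second
operand:

print(None or 0)       # 0     -> None is falsy, second operand returned
print(0 or None)       # None  -> 0 is falsy, second operand returned
print(5 or None)       # 5     -> 5 is truthy, short-circuits
print([] or 'empty')   # empty -> an empty list is falsy too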
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Insights / lessons learned from NumPy design

2013-01-09 Thread Daπid
On Jan 9, 2013 11:35 AM, Mike Anderson mike.r.anderson...@gmail.com
wrote:
 But I'm curious: what is the main use case for the alternative data types
in NumPy? Is it for columns of data of heterogeneous types? or something
else?
In my case, I have used 32-bit (or lower) arrays due to memory limitations,
and got some significant speedups in certain situations. This was particularly
useful when I was preprocessing numerous arrays down to Boolean data, which
saved a lot of disk space and I/O. I have used 128 bits when precision was
critical, as I was dealing with very small differences.
It is also nice to be able to repeat your computation with different
precision in order to spot possible numerical instabilities, even if the
performance is not great.
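To put rough numbers on the memory side, a quick sketch (the array length is
arbitrary, and np.longdouble maps to 128-bit storage only on platforms that
provide it):

import numpy as np

n = 10**7
for dt in (np.bool_, np.float32, np.float64):
    a = np.zeros(n, dtype=dt)
    print('%-8s %6.1f MiB' % (a.dtype, a.nbytes / 2.0**20))
# bool ~9.5 MiB, float32 ~38.1 MiB, float64 ~76.3 MiB for the same array

# Extended precision costs memory and speed, but shrinks the machine epsilon:
print('%g vs %g' % (np.finfo(np.float64).eps, np.finfo(np.longdouble).eps))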

David.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Efficient way of binning points and applying functions to these groups

2012-12-26 Thread Daπid
This looks like the perfect job for Cython; it is great at optimizing loops.
Another option is the new Numba, an automatic compiler.
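As a rough sketch of what the Numba route could look like (a sketch only,
assuming numba is installed; the jitted pass gathers per-bin counts and
moments, and the mean/std choice from the example is then done in plain
NumPy afterwards):

import numpy as np
from numba import njit   # assumes numba is available

@njit
def bin_stats(x, y, lx, ly):
    nx, ny = lx.size - 1, ly.size - 1
    count = np.zeros((nx, ny))
    s1 = np.zeros((nx, ny))          # per-bin sum of x
    s2 = np.zeros((nx, ny))          # per-bin sum of x**2
    for k in range(x.size):
        # edge handling differs slightly from np.digitize
        i = np.searchsorted(lx, x[k]) - 1
        j = np.searchsorted(ly, y[k]) - 1
        if 0 <= i and i < nx and 0 <= j and j < ny:
            count[i, j] += 1.0
            s1[i, j] += x[k]
            s2[i, j] += x[k] * x[k]
    return count, s1, s2

# count, s1, s2 = bin_stats(x, y, lx, ly)
# mean = s1 / np.maximum(count, 1)
# std = np.sqrt(np.maximum(s2 / np.maximum(count, 1) - mean**2, 0))
# result = np.where(count > 10, mean, np.where(count > 0, std, -999))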

David.
On 26/12/2012 10:09, Eric Emsellem eric.emsel...@eso.org wrote:

 Hi!

 I am looking for an efficient way of doing some simple binning of points
 and then applying some functions to points within each bin.

 I have tried several ways, including crude looping over the indices, or
 using digitize (see below) but I cannot manage to get it as efficient as
 I need it to be. I have a comparison with a similar (although complex)
 code in idl, and thought I would ask the forum. In idl there is a way to
 invert an histogram and get a reverse set of indices (via the
 histogram function) which seems to do the trick (or maybe it is faster
 for another reason).

 Below I provide a (dummy) example of what I wish to achieve. Any hint on
 how to do this EFFICIENTLY using numpy is most welcome. I need to speed
 things up quite a bit (at the moment, the version I have, see below, is
 10 times slower than the more complex idl routine..., I must be doing
 something wrong!).

 thanks!!
 Eric
 
 # I have a random set of data points in 2D with coordinates x,y :
 import numpy as np
 x = np.random.random(1000)
 y = np.random.random(1000)

 # I have now a 2D grid given by let's say 10x10 grid points:
 nx = 11
 ny = 21
 lx = np.linspace(0,1,nx)
 ly = np.linspace(0,1,ny)
 gx, gy = np.meshgrid(lx, ly)

 # So my set of 2D bins are (not needed in the solution I present but
 just for clarity)
 bins = np.dstack((gx.ravel(), gy.ravel()))[0]

 # Now I want to have the list of points in each bin and
 # if the number of points in that bin is larger than 10, apply (dummy)
 function func1 (see below)
 # If less than 10, apply (dummy) function func2
 # if 0, do nothing
 # for two dummy functions like (for example):
 def func1(x) : return x.mean()

 def func2(x) : return x.std()

 # One solution would be to use digitize in 1D and histogram in 2D (don't
 need gx, gy for this one):

 h = np.histogram2d(x, y, bins=[lx, ly])[0]

 digitX = np.digitize(x, lx)
 digitY = np.digitize(y, ly)

 # create the output array, with -999 values to make sure I see which
 ones are not filled in
 result = np.zeros_like(h) - 999

 for i in range(nx-1) :
     for j in range(ny-1) :
         selectionofpoints = (digitX == i+1) & (digitY == j+1)
         if h[i,j] > 10 : result[i,j] = func1(x[selectionofpoints])
         elif h[i,j] > 0 : result[i,j] = func2(x[selectionofpoints])
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy where function on different sized arrays

2012-11-24 Thread Daπid
A pure Python approach could be:

idx = []
for i, x in enumerate(a):
    for j, y in enumerate(x):
        if y in b:
            idx.append((i, j))

Of course, it is slow if the arrays are large, but it is very
readable, and probably very fast if cythonised.
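For completeness, a vectorized sketch of the same idea without the Python
loops (the A and B here are stand-ins shaped like the ones in the question;
np.in1d does the membership test in one shot):

import numpy as np

A = np.random.randint(0, 100, size=(40, 60))             # stand-in 40x60 array
B = np.random.choice(np.arange(100), 20, replace=False)  # 20 unique values

mask = np.in1d(A.ravel(), B).reshape(A.shape)   # True where A[i, j] is in B
idx = np.argwhere(mask)                         # (i, j) pairs, one per match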


David.

On Sat, Nov 24, 2012 at 10:19 PM, David Warde-Farley
d.warde.far...@gmail.com wrote:
 M = A[..., np.newaxis] == B

 will give you a 40x60x20 boolean 3d-array where M[..., i] gives you a
 boolean mask for all the occurrences of B[i] in A.

 If you wanted all the (i, j) pairs for each value in B, you could do
 something like

 import numpy as np
 from itertools import izip, groupby
 from operator import itemgetter

 id1, id2, id3 = np.where(A[..., np.newaxis] == B)
 order = np.argsort(id3)
 triples_iter = izip(id3[order], id1[order], id2[order])
 grouped = groupby(triples_iter, itemgetter(0))
 d = dict((b_value, [idx[1:] for idx in indices]) for b_value, indices in
 grouped)

 Then d[value] is a list of all the (i, j) pairs where A[i, j] == value, and
 the keys of d are every value in B.



 On Sat, Nov 24, 2012 at 3:36 PM, Siegfried Gonzi sgo...@staffmail.ed.ac.uk
 wrote:

 Hi all

 This must have been answered in the past but my google search capabilities
 are not the best.

 Given an array A say of dimension 40x60 and given another array/vector B
 of dimension 20 (the values in B occur only once).

 What I would like to do is the following which of course does not work (by
 the way doesn't work in IDL either):

 indx=where(A == B)

  I understand A and B are both of different dimensions. So my question:
  what would be the fastest or proper way to accomplish this? (I found a
  solution but think it is rather awkward and not very scipy/numpy-tonic, though.)

 Thanks
 --
 The University of Edinburgh is a charitable body, registered in
 Scotland, with registration number SC005336.

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


  1   2   >