Re: [Numpy-discussion] ANN: numexpr 2.4.3 released

2015-04-29 Thread Neil Girdhar
Sorry for the late reply.   I will definitely consider submitting a pull
request to numexpr if it's the direction I decide to go.  Right now I'm
still evaluating all of the many options for my project.

I am implementing a machine learning algorithm as part of my thesis work.
I'm in the "make it work" phase, but quickly approaching the "make it
fast" part.

With research, you usually want to iterate quickly, so whatever
solution I choose has to be automated.  I can't be coding things in an
intuitive, natural way and then porting them to a different implementation
to make them fast.  What I want is for that conversion to be automated.
I'm still evaluating how best to achieve that.
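A minimal sketch of the kind of automated conversion being discussed in
this thread: accept an expression already in the format of Python's `ast`
module and evaluate it against arrays. Note that `eval_ast` is hypothetical
(numexpr has no such function), and plain `compile()`/`eval()` stands in
here for numexpr's compiled kernels, so this illustrates only the
interface, not the speed:

```python
import ast
import numpy as np

# Hypothetical interface: take an ast.Expression instead of a string.
# compile()/eval() stand in for numexpr's compiled kernels.
def eval_ast(tree, namespace):
    code = compile(tree, filename="<expr>", mode="eval")
    return eval(code, {"__builtins__": {}}, namespace)

a = np.arange(5.0)
b = np.ones(5)
tree = ast.parse("2*a + 3*b", mode="eval")  # same format as Python's ast module
print(eval_ast(tree, {"a": a, "b": b}))
```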

On Tue, Apr 28, 2015 at 6:08 AM, Francesc Alted fal...@gmail.com wrote:

 2015-04-28 4:59 GMT+02:00 Neil Girdhar mistersh...@gmail.com:

 I don't think I'm asking for so much.  Somewhere inside numexpr it builds
 an AST of its own, which it converts into the optimized code.   It would be
 more useful to me if that AST were in the same format as the one returned
 by Python's ast module.  This way, I could glue in the bits of numexpr that
 I like with my code.  For my purpose, this would have been the more ideal
 design.


 I don't think implementing this for numexpr would be that complex. So for
 example, one could add a new numexpr.eval_ast(ast_expr) function.  Pull
 requests are welcome.

 At any rate, which is your use case?  I am curious.

 --
 Francesc Alted

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




Re: [Numpy-discussion] performance of numpy.array()

2015-04-29 Thread Nick Papior Andersen
You could try and install your own numpy to check whether that resolves the
problem.

2015-04-29 17:40 GMT+02:00 simona bellavista afy...@gmail.com:

 on cluster A 1.9.0 and on cluster B 1.8.2

 2015-04-29 17:18 GMT+02:00 Nick Papior Andersen nickpap...@gmail.com:

 Compile it yourself to know the limitations/benefits of the dependency
 libraries.

 Otherwise, have you checked which versions of numpy they are, i.e. are
 they the same version?

 2015-04-29 17:05 GMT+02:00 simona bellavista afy...@gmail.com:

 I work on two distinct scientific clusters. I have run the same python
 code on the two clusters and I have noticed that one is faster by an order
 of magnitude than the other (1min vs 10min, this is important because I run
 this function many times).

 I have investigated with a profiler and I have found that the cause is
 the function numpy.array (same code and same data), which is being
 called 10^5 times. On cluster A it takes 2 s in total, whereas on
 cluster B it takes ~6 min.  As for the other functions, they are
 generally faster on cluster A. I understand that the clusters are quite
 different, both in hardware and installed libraries. It strikes me that
 the performance is so different for this particular function. I would
 have thought that this was due to a difference in available memory, but
 actually, looking with `top`, memory usage is only 0.1% on cluster B. In
 theory numpy is compiled with ATLAS on cluster B; on cluster A it is not
 clear, because numpy.__config__.show() returns NOT AVAILABLE for
 everything.

 Does anybody have any insight into this, and whether I can improve the
 performance on cluster B?





 --
 Kind regards Nick





-- 
Kind regards Nick


Re: [Numpy-discussion] performance of numpy.array()

2015-04-29 Thread Sebastian Berg
There was a major improvement to np.array in numpy 1.9 for some cases.

You can probably work around this by using np.concatenate instead of
np.array in your case (this depends on the use case, but I would guess
you have code doing):

np.array([arr1, arr2, arr3])

or similar. If your use case is different, you may be out of luck and
only an upgrade would help.
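A hedged sketch of the suggested workaround, assuming the slow pattern
really is stacking a list of equal-length 1-D arrays (the array contents
here are made up):

```python
import numpy as np

arrs = [np.arange(3.0), np.arange(3.0, 6.0), np.arange(6.0, 9.0)]

stacked_slow = np.array(arrs)   # slow path on numpy < 1.9 for a list of arrays
stacked_fast = np.vstack(arrs)  # same result; already fast on 1.8.x

assert np.array_equal(stacked_slow, stacked_fast)
```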


On Mi, 2015-04-29 at 17:41 +0200, Nick Papior Andersen wrote:
 You could try and install your own numpy to check whether that
 resolves the problem.
 [...]





Re: [Numpy-discussion] performance of numpy.array()

2015-04-29 Thread simona bellavista
on cluster A 1.9.0 and on cluster B 1.8.2

2015-04-29 17:18 GMT+02:00 Nick Papior Andersen nickpap...@gmail.com:

 Compile it yourself to know the limitations/benefits of the dependency
 libraries.

 Otherwise, have you checked which versions of numpy they are, i.e. are
 they the same version?

 [...]


Re: [Numpy-discussion] performance of numpy.array()

2015-04-29 Thread Julian Taylor
numpy 1.9 makes array(list) performance similar to that of vstack;
in 1.8 it is very slow.

On 29.04.2015 17:40, simona bellavista wrote:
 on cluster A 1.9.0 and on cluster B 1.8.2
 [...]


Re: [Numpy-discussion] performance of numpy.array()

2015-04-29 Thread Julian Taylor
On 29.04.2015 17:50, Robert Kern wrote:
 [...]
 Check to see if you have the Transparent Hugepages (THP) Linux kernel
 feature enabled on each cluster. You may want to try turning it off.
 [...]
 

This issue has nothing to do with THP; it is a change in np.array in
numpy 1.9. It is now as fast as vstack, while before it was really slow.

But the memory compaction is indeed awful, especially in the backport
Red Hat did for their Enterprise Linux.

Typically it is enough to disable only the automatic defragmentation on
allocation, not THP entirely, e.g. via

echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag

(on Red Hat backports it is a different path).

You still have khugepaged running defrags at times of low load and in a
limited fashion, and you can also manually trigger a defrag by writing
to /proc/sys/vm/compact_memory. Though khugepaged, which runs only
occasionally, should already do a good job.



[Numpy-discussion] Weighted covariance.

2015-04-29 Thread Charles R Harris
The weighted covariance function in PR #4960
https://github.com/numpy/numpy/pull/4960 is evolving to the following,
where frequency weights are `f` and reliability weights are `a`.

Assume that the observations are in the columns of the observation matrix
``m``. The steps to compute the weighted covariance are as follows::

 w = f * a
 v1 = np.sum(w)
 v2 = np.sum(a * w)
 m -= np.sum(m * w, axis=1, keepdims=True) / v1
 cov = np.dot(m * w, m.T) * v1 / (v1**2 - ddof * v2)

Note that when ``a == 1``, the normalization factor ``v1 / (v1**2 -
ddof * v2)`` goes over to ``1 / (np.sum(f) - ddof)``
as it should.
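The steps above can be sketched as a runnable function; the name
``weighted_cov`` and the ``ddof=1`` default are assumptions for
illustration here, not numpy's API:

```python
import numpy as np

# Hypothetical helper implementing the steps from the PR description:
# f = frequency weights, a = reliability weights, observations in columns.
def weighted_cov(m, f=None, a=None, ddof=1):
    m = np.asarray(m, dtype=float)
    n = m.shape[1]
    f = np.ones(n) if f is None else np.asarray(f, dtype=float)
    a = np.ones(n) if a is None else np.asarray(a, dtype=float)
    w = f * a
    v1 = np.sum(w)
    v2 = np.sum(a * w)
    m = m - np.sum(m * w, axis=1, keepdims=True) / v1  # subtract weighted mean
    return np.dot(m * w, m.T) * v1 / (v1**2 - ddof * v2)

# With f == 1 and a == 1 the factor reduces to 1/(N - ddof), so the
# result matches the ordinary np.cov.
m = np.array([[1.0, 2.0, 4.0, 7.0],
              [0.5, 1.5, 1.0, 3.0]])
assert np.allclose(weighted_cov(m), np.cov(m))
```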

This is probably a good time for comments from all the kibitzers out there.

Chuck


Re: [Numpy-discussion] performance of numpy.array()

2015-04-29 Thread Robert Kern
On Wed, Apr 29, 2015 at 4:05 PM, simona bellavista afy...@gmail.com wrote:

 [...]

Check to see if you have the Transparent Hugepages (THP) Linux kernel
feature enabled on each cluster. You may want to try turning it off. I have
recently run into a problem with a large-memory multicore machine with THP
for programs that had many large numpy.array() memory allocations. Usually,
THP helps memory-hungry applications (you can Google for the reasons), but
it does require defragmenting the memory space to get contiguous hugepages.
The system can get into a state where the memory space is so fragmented
that trying to get each new hugepage requires a lot of extra work to
create the contiguous memory regions. In my case, a perfectly
well-performing program would suddenly slow down immensely during its
memory-allocation-intensive actions. When I turned THP off, it started
working normally again.

If you have root, try using `perf top` to see what C functions in user
space and kernel space are taking up the most time in your process. If you
see anything like `do_page_fault()`, this, or a similar issue, is your
problem.
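As a sketch of the suggested check: the helper name ``thp_mode`` is made
up, and the sysfs path is the usual mainline-kernel location (Red Hat's
backport uses a different path):

```python
from pathlib import Path

def thp_mode(path="/sys/kernel/mm/transparent_hugepage/enabled"):
    """Return the active THP mode ('always', 'madvise' or 'never'),
    or None if the sysfs knob is absent (non-Linux, or THP not built in)."""
    p = Path(path)
    if not p.exists():
        return None
    text = p.read_text()
    # The active mode is shown in brackets, e.g. "always [madvise] never".
    return text.split("[", 1)[1].split("]", 1)[0] if "[" in text else text.strip()

print(thp_mode())
```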

--
Robert Kern


[Numpy-discussion] performance of numpy.array()

2015-04-29 Thread simona bellavista
I work on two distinct scientific clusters. I have run the same python code
on the two clusters and I have noticed that one is faster by an order of
magnitude than the other (1min vs 10min, this is important because I run
this function many times).

I have investigated with a profiler and I have found that the cause is
the function numpy.array (same code and same data), which is being called
10^5 times. On cluster A it takes 2 s in total, whereas on cluster B it
takes ~6 min.  As for the other functions, they are generally faster on
cluster A. I understand that the clusters are quite different, both in
hardware and installed libraries. It strikes me that the performance is
so different for this particular function. I would have thought that this
was due to a difference in available memory, but actually, looking with
`top`, memory usage is only 0.1% on cluster B. In theory numpy is
compiled with ATLAS on cluster B; on cluster A it is not clear, because
numpy.__config__.show() returns NOT AVAILABLE for everything.

Does anybody have any insight into this, and whether I can improve the
performance on cluster B?
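A minimal, self-contained way to time the suspect call in isolation on
each cluster (a sketch; the array sizes here are made-up stand-ins for
the real data):

```python
import timeit

import numpy as np

# Time np.array over a list of arrays, the call the profiler flagged.
arrs = [np.random.rand(100) for _ in range(100)]
t = timeit.timeit(lambda: np.array(arrs), number=1000)
print("np.array over a list of %d arrays: %.3f s per 1000 calls" % (len(arrs), t))
```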


Re: [Numpy-discussion] performance of numpy.array()

2015-04-29 Thread Nick Papior Andersen
Compile it yourself to know the limitations/benefits of the dependency
libraries.

Otherwise, have you checked which versions of numpy they are, i.e. are they
the same version?
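The version and build check can be done directly from each cluster's
Python prompt, e.g.:

```python
import numpy as np

print(np.__version__)  # 1.8.x vs 1.9.x is the relevant difference here
np.show_config()       # which BLAS/LAPACK (e.g. ATLAS) numpy was built against
```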

2015-04-29 17:05 GMT+02:00 simona bellavista afy...@gmail.com:

 [...]




-- 
Kind regards Nick