[Numpy-discussion] saving the lower half of matrix

2013-06-17 Thread Bala subramanian
Friends,
I have to save only the lower half of a symmetric matrix to a file. I used
numpy.tril to extract the lower half. However, when I use 'numpy.savetxt',
numpy saves the whole matrix (with the upper-half values replaced by zeros)
rather than only the lower half. Is there a better way to achieve this?

As an example:

import numpy as np
d = np.arange(1, 26, 1)
d = d.reshape(5, 5)
np.tril(d)
array([[ 1,  0,  0,  0,  0],
       [ 6,  7,  0,  0,  0],
       [11, 12, 13,  0,  0],
       [16, 17, 18, 19,  0],
       [21, 22, 23, 24, 25]])

np.savetxt('test', np.tril(d))
The output file 'test' also contains the zeros; what I want is only the lower
half, like below:
1
6 7
11 12 13
... etc

Thanks,
Bala


-- 
C. Balasubramanian
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] saving the lower half of matrix

2013-06-17 Thread Kumar Appaiah
On Mon, Jun 17, 2013 at 01:07:34PM +0200, Bala subramanian wrote:
Friends,
I have to save only the lower half of a symmetric matrix to a file. I used
numpy.tril to extract the lower half. However, when I use 'numpy.savetxt',
numpy saves the whole matrix (with the upper-half values replaced by zeros)
rather than only the lower half. Is there a better way to achieve this?
As an example:
import numpy as np
d = np.arange(1, 26, 1)
d = d.reshape(5, 5)
np.tril(d)
array([[ 1,  0,  0,  0,  0],
       [ 6,  7,  0,  0,  0],
       [11, 12, 13,  0,  0],
       [16, 17, 18, 19,  0],
       [21, 22, 23, 24, 25]])

np.savetxt('test', np.tril(d))
The output file 'test' also contains the zeros; what I want is only the lower
half, like below:
1
6 7
11 12 13
... etc

How about saving numpy.concatenate([x[i,:i+1] for i in range(x.shape[0])])
instead? If you remove the concatenate, you'll get the individual
arrays that have the data you need as well.
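A minimal sketch of this approach, using the 5x5 matrix from the original post
(the ragged-row file writing at the end is my addition, not part of the
suggestion above):

```python
import numpy as np

d = np.arange(1, 26).reshape(5, 5)

# Row i of the lower triangle keeps its first i+1 entries; concatenating
# the per-row slices gives one flat array of just the lower half.
lower = np.concatenate([d[i, :i + 1] for i in range(d.shape[0])])
print(lower)

# savetxt cannot write ragged rows, so write each row by hand to keep
# the triangular layout in the file.
with open('test', 'w') as fh:
    for i in range(d.shape[0]):
        fh.write(' '.join(str(v) for v in d[i, :i + 1]) + '\n')
```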

Kumar
-- 
Kumar Appaiah


Re: [Numpy-discussion] saving the lower half of matrix

2013-06-17 Thread Matthew Brett
Hi,

On Mon, Jun 17, 2013 at 12:07 PM, Bala subramanian
bala.biophys...@gmail.com wrote:
 Friends,
 I have to save only the lower half of a symmetric matrix to a file. I used
 numpy.tril to extract the lower half. However, when I use 'numpy.savetxt',
 numpy saves the whole matrix (with the upper-half values replaced by zeros)
 rather than only the lower half. Is there a better way to achieve this?

 As an example:
 import numpy as np
 d = np.arange(1, 26, 1)
 d = d.reshape(5, 5)
 np.tril(d)
 array([[ 1,  0,  0,  0,  0],
        [ 6,  7,  0,  0,  0],
        [11, 12, 13,  0,  0],
        [16, 17, 18, 19,  0],
        [21, 22, 23, 24, 25]])

 np.savetxt('test', np.tril(d))
 The output file 'test' also contains the zeros; what I want is only the
 lower half, like below:
 1
 6 7
 11 12 13
 ... etc

Maybe you want something like:

e = d[np.tril_indices(5)]

?
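A short sketch of this, again with the 5x5 example from the original post
(note that flattening loses the triangular row structure, which may or may not
matter for the file format):

```python
import numpy as np

d = np.arange(1, 26).reshape(5, 5)

# tril_indices(5) gives the (row, col) index pairs of the lower triangle,
# so fancy indexing pulls out exactly those elements as a 1-D array.
e = d[np.tril_indices(5)]
print(e)

# Written as one line in the file; fmt='%d' avoids float formatting.
np.savetxt('test', e[None, :], fmt='%d')
```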

Best,

Matthew


[Numpy-discussion] low level optimization in NumPy and minivect

2013-06-17 Thread Frédéric Bastien
Hi,

I saw that Julian Taylor has recently been doing many low-level optimizations,
like using SSE instructions. I think it is great.

Last year, Mark Florisson released the minivect [1] project that he worked
on during his master's thesis. minivect is a compiler for element-wise
expressions that does some of the same low-level optimizations that Julian is
doing in NumPy right now.

Mark built minivect in a way that allows it to be reused by other projects. It
is now used by Cython and Numba, I think. I had planned to reuse it in Theano,
but I haven't had the time to integrate it so far.

What about reusing it in NumPy? I think that some of Julian's optimizations
aren't in minivect (I didn't check to confirm). But from what I heard, minivect
doesn't implement reductions, and there is a pull request to optimize these in
NumPy.

The advantage of concentrating some functionality in one common package is
that more projects benefit from optimizations made to it (after the initial
work to adopt it!).

How could this be done in NumPy? NumPy has its own code generator for many
dtypes. We could call minivect's code generator to replace some of it.

What do you think of this idea?


Sadly, I won't be able to spend time on the code for this, but I wanted to
raise the idea while people are working on this, in case it is helpful.

Frédéric


[1] https://github.com/markflorisson88/minivect


Re: [Numpy-discussion] Profiling (was GSoC : Performance parity between numpy arrays and Python scalars)

2013-06-17 Thread Arink Verma
Hi Nathaniel

It's a probabilistic sampling profiler, so if it doesn't have enough
 samples then it can miss things. 227 samples is way way too low. You need
 to run the profiled code for longer (a few seconds at least), and if that's
 not enough then maybe increase the sampling rate too (though be careful
 because setting this too high can also add noise).


I ran the code '10' times, which recorded 229115 samples. The callgraph [1]
that was generated converges to *PyArray_DESCR*, and the rest are unconnected.

Does it mean anything?
[1]
https://docs.google.com/file/d/0B3Pqyp8kuQw0MzRoTTNVbmlaNFE/edit?usp=sharing




-- 

Arink Verma
www.arinkverma.in


Re: [Numpy-discussion] Profiling (was GSoC : Performance parity between numpy arrays and Python scalars)

2013-06-17 Thread Charles R Harris
On Mon, Jun 17, 2013 at 9:29 AM, Arink Verma arinkve...@gmail.com wrote:

 Hi Nathaniel

 It's a probabilistic sampling profiler, so if it doesn't have enough
 samples then it can miss things. 227 samples is way way too low. You need
 to run the profiled code for longer (a few seconds at least), and if that's
 not enough then maybe increase the sampling rate too (though be careful
 because setting this too high can also add noise).


 I ran code '10' times, which record 229115 samples. Callgraph[1]
 generated is converging to *PyArray_DESCR*, and rest are unconnected.


Not sure what you are profiling. The PyArray_DESCR call just returns a
pointer to the descr contained in an ndarray instance, so probably has
little relevance here.

Chuck


Re: [Numpy-discussion] Profiling (was GSoC : Performance parity between numpy arrays and Python scalars)

2013-06-17 Thread Arink Verma


 Not sure what you are profiling. The PyArray_DESCR call just returns a
 pointer to the descr contained in an ndarray instance, so probably has
 little relevance here.


I am profiling the following code:

import timeit
timeit.timeit('x+y', number=10,
              setup='import numpy as np; x = np.asarray(1.0); y = np.asarray(2.0)')

Arink Verma
www.arinkverma.in


Re: [Numpy-discussion] Profiling (was GSoC : Performance parity between numpy arrays and Python scalars)

2013-06-17 Thread Nathaniel Smith
On Mon, Jun 17, 2013 at 4:29 PM, Arink Verma arinkve...@gmail.com wrote:
 Hi Nathaniel

 It's a probabilistic sampling profiler, so if it doesn't have enough
 samples then it can miss things. 227 samples is way way too low. You need to
 run the profiled code for longer (a few seconds at least), and if that's not
 enough then maybe increase the sampling rate too (though be careful because
 setting this too high can also add noise).


 I ran code '10' times, which record 229115 samples. Callgraph[1]
 generated is converging to PyArray_DESCR, and rest are unconnected.

 Does it mean anything?
 [1]https://docs.google.com/file/d/0B3Pqyp8kuQw0MzRoTTNVbmlaNFE/edit?usp=sharing

I think it means that pprof is failing to walk the stack to compute
callers. That's consistent with PyArray_DESCR being the only call
that it can find: PyArray_DESCR isn't a real function, it
always gets inlined, so detecting the caller doesn't require walking
the stack.

Is your numpy compiled with -fomit-frame-pointer or something like
that? Any other weird build options used while building it? Is your
binary stripped? If you're using a distro version, do you have the
debug symbols installed? Did you compile this numpy yourself? (If not,
the simplest thing to do would be to just build it yourself using the
default settings and see if that helps.) What OS are you on?

When I run your profiling code (but being lazier and only running
12000 samples), then do 'google-pprof --pdf /path/to/python
/path/to/my/profile.prof', then I get the graph attached below.

-n


[Attachment: prof.pdf (Adobe PDF document)]


Re: [Numpy-discussion] low level optimization in NumPy and minivect

2013-06-17 Thread Julian Taylor
On 17.06.2013 17:11, Frédéric Bastien wrote:
 Hi,
 
 I saw that recently Julian Taylor is doing many low level optimization
 like using SSE instruction. I think it is great.
 
 Last year, Mark Florisson released the minivect[1] project that he
 worked on during is master thesis. minivect is a compiler for
 element-wise expression that do some of the same low level optimization
 that Julian is doing in NumPy right now.
 
 Mark did minivect in a way that allow it to be reused by other project.
 It is used now by Cython and Numba I think. I had plan to reuse it in
 Theano, but I didn't got the time to integrate it up to now.
 
 What about reusing it in NumPy? I think that some of Julian optimization
 aren't in minivect (I didn't check to confirm). But from I heard,
 minivect don't implement reduction and there is a pull request to
 optimize this in NumPy.

Hi,
what I vectorized are just the really easy cases of unit-stride
contiguous operations, so the min/max reductions which are now in numpy
are in essence pretty trivial.
minivect goes much further in optimizing general strided access and
broadcasting via loop optimizations (it seems to have a lot of overlap
with the graphite loop optimizer available in GCC [0]), so my code is
probably not of very much use to minivect.

The most interesting part of minivect for numpy is probably the
optimization of broadcasting loops, which seem to be pretty inefficient
in numpy [1].

Concerning the rest, I'm not sure how much of a bottleneck general
strided operations really are in common numpy-using code.


I guess a similar discussion about adding an expression compiler to
numpy has already happened when numexpr was released?
If so, what was the outcome of that?


[0] http://gcc.gnu.org/wiki/Graphite
[1] ones((5000,100)) - ones((100,)) spends about 40% of its time copying
stuff around in buffers
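The case in footnote [1] can be reproduced with something like the following
sketch (the exact percentage will of course vary by machine and numpy
version):

```python
import timeit
import numpy as np

# Broadcasting a (100,) array against a (5000, 100) array, as in
# footnote [1]. The subtraction itself is trivially cheap, so a large
# fraction of the runtime goes into the iteration/buffering machinery.
a = np.ones((5000, 100))
b = np.ones((100,))

t = timeit.timeit(lambda: a - b, number=100)
print('a - b (broadcast): %.3f s for 100 runs' % t)
```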


Re: [Numpy-discussion] Profiling (was GSoC : Performance parity between numpy arrays and Python scalars)

2013-06-17 Thread Arink Verma
I am building numpy from source (python setup.py build --fcompiler=gnu95)
and then installing it (python setup.py install --user), on Ubuntu 13.04.

For the analysis results:
pprof --svg /usr/bin/python py.prof



On Mon, Jun 17, 2013 at 10:04 PM, Nathaniel Smith n...@pobox.com wrote:

 On Mon, Jun 17, 2013 at 4:29 PM, Arink Verma arinkve...@gmail.com wrote:
  Hi Nathaniel
 
  It's a probabilistic sampling profiler, so if it doesn't have enough
  samples then it can miss things. 227 samples is way way too low. You
 need to
  run the profiled code for longer (a few seconds at least), and if
 that's not
  enough then maybe increase the sampling rate too (though be careful
 because
  setting this too high can also add noise).
 
 
  I ran code '10' times, which record 229115 samples. Callgraph[1]
  generated is converging to PyArray_DESCR, and rest are unconnected.
 
  Does it mean anything?
  [1]
 https://docs.google.com/file/d/0B3Pqyp8kuQw0MzRoTTNVbmlaNFE/edit?usp=sharing

 I think it means that pprof is failing to walk to the stack to compute
 callers. That's consistent with PyArray_DESCR being the only call
 that it can find, because PyArray_DESCR isn't a real function, it
 always gets inlined, so detecting the caller doesn't require walking
 the stack.

 Is your numpy compiled with -fomit-frame-pointer or something like
 that? Any other weird build options used while building it? Is your
 binary stripped? If you're using a distro version, do you have the
 debug symbols installed? Did you compile this numpy yourself? (If not,
 the simplest thing to do would be to just build it yourself using the
 default settings and see if that helps.) What OS are you on?

 When I run your profiling code (but being lazier and only running
 12000 samples), then do 'google-pprof --pdf /path/to/python
 /path/to/my/profile.prof', then I get the graph attached below.

 -n





-- 

Arink Verma
www.arinkverma.in


[Attachment: test.py (binary data)]


Re: [Numpy-discussion] low level optimization in NumPy and minivect

2013-06-17 Thread Dag Sverre Seljebotn
On 06/17/2013 11:03 PM, Julian Taylor wrote:
 On 17.06.2013 17:11, Frédéric Bastien wrote:
 Hi,

 I saw that recently Julian Taylor is doing many low level optimization
 like using SSE instruction. I think it is great.

 Last year, Mark Florisson released the minivect[1] project that he
 worked on during is master thesis. minivect is a compiler for
 element-wise expression that do some of the same low level optimization
 that Julian is doing in NumPy right now.

 Mark did minivect in a way that allow it to be reused by other project.
 It is used now by Cython and Numba I think. I had plan to reuse it in
 Theano, but I didn't got the time to integrate it up to now.

 What about reusing it in NumPy? I think that some of Julian optimization
 aren't in minivect (I didn't check to confirm). But from I heard,
 minivect don't implement reduction and there is a pull request to
 optimize this in NumPy.

 Hi,
 what I vectorized is just the really easy cases of unit stride
 continuous operations, so the min/max reductions which is now in numpy
 is in essence pretty trivial.
 minivect goes much further in optimizing general strided access and
 broadcasting via loop optimizations (it seems to have a lot of overlap
 with the graphite loop optimizer available in GCC [0]) so my code is
 probably not of very much use to minivect.

 The most interesting part in minivect for numpy is probably the
 optimization of broadcasting loops which seem to be pretty inefficient
 in numpy [0].

There are also related things like

arr + arr.T

which have much less than optimal performance in NumPy (unless there have
been recent changes). This example was one of the motivating examples for
minivect.
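A quick way to see this effect is to time the transposed add against a fully
contiguous one (timings are illustrative; absolute numbers depend on the
machine):

```python
import timeit
import numpy as np

arr = np.random.rand(2000, 2000)

# arr + arr reads both operands contiguously; arr + arr.T forces a
# strided (column-order) read of one operand, which is cache-unfriendly.
contig = timeit.timeit(lambda: arr + arr, number=10)
strided = timeit.timeit(lambda: arr + arr.T, number=10)
print('contiguous: %.3fs, transposed: %.3fs' % (contig, strided))
```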

Dag Sverre