[Numpy-discussion] Bug in einsum?

2013-03-13 Thread Jaakko Luttinen
Hi,

I have encountered a very weird behaviour with einsum. I try to compute
something like R*A*R', where * denotes a kind of matrix
multiplication. However, for particular shapes of R and A, the results
are extremely bad.

I compare two einsum results:
First, I compute in two einsum calls as (R*A)*R'.
Second, I compute the whole result in one einsum call.
However, the results are significantly different for some shapes.

My test:
import numpy as np
for D in range(30):
    A = np.random.randn(100,D,D)
    R = np.random.randn(D,D)
    Y1 = np.einsum('...ik,...kj->...ij', R, A)
    Y1 = np.einsum('...ik,...kj->...ij', Y1, R.T)
    Y2 = np.einsum('...ik,...kl,...lj->...ij', R, A, R.T)
    print('D=%d' % D, np.allclose(Y1,Y2), np.linalg.norm(Y1-Y2))

Output:
D=0 True 0.0
D=1 True 0.0
D=2 True 8.40339658678e-15
D=3 True 8.09995399928e-15
D=4 True 3.59428803435e-14
D=5 False 34.755610184
D=6 False 28.3576558351
D=7 False 41.5402690906
D=8 True 2.31709582841e-13
D=9 False 36.0161112799
D=10 True 4.76237746912e-13
D=11 True 4.5790782e-13
D=12 True 4.90302218301e-13
D=13 True 6.96175851271e-13
D=14 True 1.10067181384e-12
D=15 True 1.29095933163e-12
D=16 True 1.3466837332e-12
D=17 True 1.52265065763e-12
D=18 True 2.05407923852e-12
D=19 True 2.33327630748e-12
D=20 True 2.96849358082e-12
D=21 True 3.31063706175e-12
D=22 True 4.28163620455e-12
D=23 True 3.58951880681e-12
D=24 True 4.69973694769e-12
D=25 True 5.47385264567e-12
D=26 True 5.49643316347e-12
D=27 True 6.75132988402e-12
D=28 True 7.86435437892e-12
D=29 True 7.85453681029e-12

So, for D={5,6,7,9}, allclose returns False and the error norm is huge.
It doesn't seem like just some small numerical inaccuracy because the
error norm is so large. I don't know which result is correct (Y1 or Y2),
but at least one of them must be wrong.

I ran the same test several times, and each time same values of D fail.
If I change the shapes somehow, the failing values of D might change
too, but I usually have several failing values.

I'm running the latest version from github (commit bd7104cef4) under
Python 3.2.3. With NumPy 1.6.1 under Python 2.7.3 the test crashes and
Python exits printing "Floating point exception".

This seems so weird to me that I wonder if I'm just doing something stupid..

Thanks a lot for any help!
Jaakko


Re: [Numpy-discussion] Adopt Mersenne Twister 64bit?

2013-03-13 Thread Neal Becker
Neal Becker wrote:

 Neal Becker wrote:
 
 I guess I talked to you about 100 years ago about sharing state between numpy
 rng and code I have in c++ that wraps boost::random.  So is there a C-api for
 this RandomState object I could use to call from c++?  Maybe I could do
 something with that.
 
 The c++ code could invoke via the python api, but that might be slower.  I'm
 just rambling here, I'd have to see the API to get some ideas.
 
 I think if I could just grab a long int from the underlying mersenne twister,
 through some c api?

Well, this at least appears to work (probably not the most efficient
approach): it calls the RandomState object via the python interface to get 4
bytes at a time:

int test1 (bp::object rs) {
  bp::str bytes = call_method<bp::str> (rs.ptr(), "bytes", 4); // get 4 bytes from the RandomState
  return *reinterpret_cast<int*> (PyString_AS_STRING (bytes.ptr()));
}

BOOST_PYTHON_MODULE (numpy_rand) {
  boost::numpy::initialize();

  def ("test1", test1);
}
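
For what it's worth, a minimal sketch of how the compiled module above might
be exercised from Python (the module and function names are simply the ones
defined in the snippet; the seed is arbitrary):

import numpy as np
import numpy_rand  # the boost::python extension built above

rs = np.random.RandomState(1234)
# test1() pulls rs.bytes(4) and reinterprets the 4 bytes as an int
print(numpy_rand.test1(rs))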




[Numpy-discussion] fast numpy.fromfile skipping data chunks

2013-03-13 Thread Andrea Cimatoribus
Hi everybody, I hope this has not been discussed before, I couldn't find a 
solution elsewhere.
I need to read some binary data, and I am using numpy.fromfile to do this. 
Since the files are huge, and would make me run out of memory, I need to read 
data skipping some records (I am reading data recorded at high frequency, so 
basically I want to subsample while reading).
At the moment, I came up with the code below, which is then compiled using 
cython. Despite the significant performance increase from the pure python 
version, the function is still much slower than numpy.fromfile, and only reads 
one kind of data (in this case uint32), otherwise I do not know how to define 
the array type in advance. I have basically no experience with cython nor c, so 
I am a bit stuck. How can I try to make this more efficient and possibly more 
generic?
Thanks

import numpy as np
#For cython!
cimport numpy as np
from libc.stdint cimport uint32_t

def cffskip32(fid, int count=1, int skip=0):

    cdef int k=0
    cdef np.ndarray[uint32_t, ndim=1] data = np.zeros(count, dtype=np.uint32)

    if skip >= 0:
        while k < count:
            try:
                data[k] = np.fromfile(fid, count=1, dtype=np.uint32)
                fid.seek(skip, 1)
                k += 1
            except ValueError:
                data = data[:k]
                break
    return data


Re: [Numpy-discussion] fast numpy.fromfile skipping data chunks

2013-03-13 Thread Nathaniel Smith
On Wed, Mar 13, 2013 at 1:45 PM, Andrea Cimatoribus
andrea.cimatori...@nioz.nl wrote:
 Hi everybody, I hope this has not been discussed before, I couldn't find a 
 solution elsewhere.
 I need to read some binary data, and I am using numpy.fromfile to do this. 
 Since the files are huge, and would make me run out of memory, I need to read 
 data skipping some records (I am reading data recorded at high frequency, so 
 basically I want to read subsampling).
 At the moment, I came up with the code below, which is then compiled using 
 cython. Despite the significant performance increase from the pure python 
 version, the function is still much slower than numpy.fromfile, and only 
 reads one kind of data (in this case uint32), otherwise I do not know how to 
 define the array type in advance. I have basically no experience with cython 
 nor c, so I am a bit stuck. How can I try to make this more efficient and 
 possibly more generic?

If your data is stored as fixed-format binary (as it seems it is),
then the easiest way is probably

# Exploit the operating system's virtual memory manager to get a
# virtual copy of the entire file in memory
# (This does not actually use any memory until accessed):
virtual_arr = np.memmap(path, np.uint32, "r")
# Get a numpy view onto every 20th entry:
virtual_arr_subsampled = virtual_arr[::20]
# Copy those bits into regular malloc'ed memory:
arr_subsampled = virtual_arr_subsampled.copy()

(Your data is probably large enough that this will only work if you're
using a 64-bit system, because of address space limitations; but if
you have data that's too large to fit into memory, then I assume
you're using a 64-bit system anyway...)

-n


[Numpy-discussion] numpy reference array

2013-03-13 Thread Matt U
Is it possible to create a numpy array which points to the same data in a
different numpy array (but in different order etc)?

For example:

Code: 
--
import numpy as np
a = np.arange(10)
ids = np.array([0,0,5,5,9,9,1,1])
b = a[ids]
a[0] = -1
b[0] #should be -1 if b[0] referenced the same data as a[0]
0
--

ctypes almost does it for me, but the access is inconvenient. I would like to
access b as a regular numpy array:

Code: 
--
import numpy as np
import ctypes
a = np.arange(10)
ids = np.array([0,0,5,5,9,9,1,1])
b = [a[id:id+1].ctypes.data_as(ctypes.POINTER(ctypes.c_long)) for id in ids]
a[0] = -1
b[0][0] #access is inconvenient
-1
--
Some more information: I've written a finite-element code, and I'm working on
optimizing the python implementation. Profiling shows the slowest operation is
the re-creation of an array that extracts edge degrees of freedom from the
volume of the element (similar to b above). So, I'm trying to avoid copying the
data every time, and just setting up 'b' once. The ctypes solution is
sub-optimal since my code is mostly vectorized; that is, later I'd like to do
something like

Code:
--
c[ids] = b[ids] + d[ids]
--

where c, and d are the same shape as b but contain different data.

Any thoughts? If it's not possible that will save me time searching.



[Numpy-discussion] R: fast numpy.fromfile skipping data chunks

2013-03-13 Thread Andrea Cimatoribus
This solution does not work for me since I have an offset before the data that 
is not a multiple of the datatype (it's a header containing various stuff).
I'll look at pytables.

# Exploit the operating system's virtual memory manager to get a
# virtual copy of the entire file in memory
# (This does not actually use any memory until accessed):
virtual_arr = np.memmap(path, np.uint32, "r")
# Get a numpy view onto every 20th entry:
virtual_arr_subsampled = virtual_arr[::20]
# Copy those bits into regular malloc'ed memory:
arr_subsampled = virtual_arr_subsampled.copy()


[Numpy-discussion] Dot/inner products with broadcasting?

2013-03-13 Thread Jaakko Luttinen
Hi!

How can I compute a dot product (or similar multiply-and-sum operations)
efficiently so that broadcasting is utilized?
For multi-dimensional arrays, NumPy's inner and dot functions do not
match the leading axes and use broadcasting, but instead the result has
first the leading axes of the first input array and then the leading
axes of the second input array.

For instance, I would like to compute the following inner-product:
np.sum(A*B, axis=-1)

But numpy.inner gives:
A = np.random.randn(2,3,4)
B = np.random.randn(3,4)
np.inner(A,B).shape
# -> (2, 3, 3) instead of (2, 3)

Similarly for dot product, I would like to compute for instance:
np.sum(A[...,:,:,np.newaxis]*B[...,np.newaxis,:,:], axis=-2)

But numpy.dot gives:
In [12]: A = np.random.randn(2,3,4); B = np.random.randn(2,4,5)
In [13]: np.dot(A,B).shape
# -> (2, 3, 2, 5) instead of (2, 3, 5)

I could use einsum for these operations, but I'm not sure whether that's
as efficient as using some BLAS-supported(?) dot products.
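
For illustration, the broadcasted versions can be written with einsum as in
the sketch below (shapes taken from the examples above); whether this matches
BLAS-backed dot performance is exactly the open question:

A = np.random.randn(2,3,4)
B = np.random.randn(3,4)
np.einsum('...i,...i->...', A, B).shape
# -> (2, 3), same as np.sum(A*B, axis=-1) with broadcasting

A = np.random.randn(2,3,4)
B = np.random.randn(2,4,5)
np.einsum('...ij,...jk->...ik', A, B).shape
# -> (2, 3, 5), the broadcasted matrix product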

I couldn't find any function which could perform this kind of
operation. NumPy's functions seem to either flatten the input arrays
(vdot, outer) or just use the axes of the input arrays separately (dot,
inner, tensordot).

Any help?

Best regards,
Jaakko


[Numpy-discussion] R: fast numpy.fromfile skipping data chunks

2013-03-13 Thread Andrea Cimatoribus
I see that pytables deals with hdf5 data. It would be very nice if the data 
were in such a standard format, but that is not the case, and that can't be 
changed.


From: numpy-discussion-boun...@scipy.org [numpy-discussion-boun...@scipy.org] on 
behalf of Frédéric Bastien [no...@nouiz.org]
Sent: Wednesday, 13 March 2013 15:03
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] fast numpy.fromfile skipping data chunks

Hi,

I would suggest that you look at pytables[1]. It uses a different file
format, but it seems to do exactly what you want and gives an object
that has a very similar interface to numpy.ndarray (but with fewer
functions). You would just ask for the slice/indices that you want and
it returns a numpy.ndarray.

HTH

Frédéric

[1] http://www.pytables.org/moin

On Wed, Mar 13, 2013 at 9:54 AM, Nathaniel Smith n...@pobox.com wrote:
 On Wed, Mar 13, 2013 at 1:45 PM, Andrea Cimatoribus
 andrea.cimatori...@nioz.nl wrote:
 Hi everybody, I hope this has not been discussed before, I couldn't find a 
 solution elsewhere.
 I need to read some binary data, and I am using numpy.fromfile to do this. 
 Since the files are huge, and would make me run out of memory, I need to 
 read data skipping some records (I am reading data recorded at high 
 frequency, so basically I want to read subsampling).
 At the moment, I came up with the code below, which is then compiled using 
 cython. Despite the significant performance increase from the pure python 
 version, the function is still much slower than numpy.fromfile, and only 
 reads one kind of data (in this case uint32), otherwise I do not know how to 
 define the array type in advance. I have basically no experience with cython 
 nor c, so I am a bit stuck. How can I try to make this more efficient and 
 possibly more generic?

 If your data is stored as fixed-format binary (as it seems it is),
 then the easiest way is probably

 # Exploit the operating system's virtual memory manager to get a
 # virtual copy of the entire file in memory
 # (This does not actually use any memory until accessed):
 virtual_arr = np.memmap(path, np.uint32, "r")
 # Get a numpy view onto every 20th entry:
 virtual_arr_subsampled = virtual_arr[::20]
 # Copy those bits into regular malloc'ed memory:
 arr_subsampled = virtual_arr_subsampled.copy()

 (Your data is probably large enough that this will only work if you're
 using a 64-bit system, because of address space limitations; but if
 you have data that's too large to fit into memory, then I assume
 you're using a 64-bit system anyway...)

 -n


Re: [Numpy-discussion] R: fast numpy.fromfile skipping data chunks

2013-03-13 Thread Nathaniel Smith
On Wed, Mar 13, 2013 at 2:18 PM, Andrea Cimatoribus
andrea.cimatori...@nioz.nl wrote:
 This solution does not work for me since I have an offset before the data 
 that is not a multiple of the datatype (it's a header containing various 
 stuff).

np.memmap takes an offset= argument.

-n
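
For example, a minimal sketch (the file name and the 100-byte header size are
placeholders for whatever the real file uses):

import numpy as np
path = 'data.bin'      # hypothetical file name
header_bytes = 100     # hypothetical header size
virtual_arr = np.memmap(path, dtype=np.uint32, mode='r', offset=header_bytes)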


[Numpy-discussion] R: R: fast numpy.fromfile skipping data chunks

2013-03-13 Thread Andrea Cimatoribus
Indeed, but that offset should be a multiple of the byte-size of dtype, as 
the help says.
Indeed, this is silly.


From: numpy-discussion-boun...@scipy.org [numpy-discussion-boun...@scipy.org] on 
behalf of Nathaniel Smith [n...@pobox.com]
Sent: Wednesday, 13 March 2013 15:32
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] R: fast numpy.fromfile skipping data chunks

On Wed, Mar 13, 2013 at 2:18 PM, Andrea Cimatoribus
andrea.cimatori...@nioz.nl wrote:
 This solution does not work for me since I have an offset before the data 
 that is not a multiple of the datatype (it's a header containing various 
 stuff).

np.memmap takes an offset= argument.

-n


[Numpy-discussion] R: R: R: fast numpy.fromfile skipping data chunks

2013-03-13 Thread Andrea Cimatoribus
On top of that, there is another issue: the amount of data available may itself 
not be a multiple of the dtype size, since there can be write errors at the end 
of the file, and I don't know how to deal with that.

From: numpy-discussion-boun...@scipy.org [numpy-discussion-boun...@scipy.org] on 
behalf of Andrea Cimatoribus
Sent: Wednesday, 13 March 2013 15:37
To: Discussion of Numerical Python
Subject: [Numpy-discussion] R: R: fast numpy.fromfile skipping data chunks

Indeed, but that offset should be a multiple of the byte-size of dtype, as 
the help says.
Indeed, this is silly.


Da: numpy-discussion-boun...@scipy.org [numpy-discussion-boun...@scipy.org] per 
conto di Nathaniel Smith [n...@pobox.com]
Inviato: mercoledì 13 marzo 2013 15.32
A: Discussion of Numerical Python
Oggetto: Re: [Numpy-discussion] R: fast numpy.fromfile skipping data chunks

On Wed, Mar 13, 2013 at 2:18 PM, Andrea Cimatoribus
andrea.cimatori...@nioz.nl wrote:
 This solution does not work for me since I have an offset before the data 
 that is not a multiple of the datatype (it's a header containing various 
 stuff).

np.memmap takes an offset= argument.

-n


[Numpy-discussion] R: R: R: fast numpy.fromfile skipping data chunks

2013-03-13 Thread Andrea Cimatoribus
Indeed, but that offset should be a multiple of the byte-size of dtype, as 
the help says.

My mistake, sorry, even if the help says so, it seems that this is not the case 
in the actual code. Still, the problem with the size of the available data 
(which is not necessarily a multiple of dtype byte-size) remains.
ac


Re: [Numpy-discussion] R: R: R: fast numpy.fromfile skipping data chunks

2013-03-13 Thread Nathaniel Smith
On Wed, Mar 13, 2013 at 2:46 PM, Andrea Cimatoribus
andrea.cimatori...@nioz.nl wrote:
Indeed, but that offset should be a multiple of the byte-size of dtype 
as the help says.

 My mistake, sorry, even if the help says so, it seems that this is not the 
 case in the actual code. Still, the problem with the size of the available 
 data (which is not necessarily a multiple of dtype byte-size) remains.

Worst case you can always work around such issues with an extra layer
of view manipulation:

# create a raw view onto the contents of the file
file_bytes = np.memmap(path, dtype=np.uint8, ...)
# cut out any arbitrary number of bytes from the beginning and end
data_bytes = file_bytes[...some slice expression...]
# switch to viewing the bytes as the proper data type
data = data_bytes.view(dtype=np.uint32)
# proceed as before

-n
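
Spelled out a bit more as a sketch, assuming a hypothetical 100-byte header and
a uint32 payload whose tail may be truncated by write errors:

import numpy as np

path = 'data.bin'                  # hypothetical file name
header_bytes = 100                 # hypothetical header size
itemsize = np.dtype(np.uint32).itemsize

file_bytes = np.memmap(path, dtype=np.uint8, mode='r')
payload = file_bytes[header_bytes:]
# drop any partial record left at the end of the file
payload = payload[:payload.size - payload.size % itemsize]
data = payload.view(dtype=np.uint32)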


Re: [Numpy-discussion] fast numpy.fromfile skipping data chunks

2013-03-13 Thread Francesc Alted
On 3/13/13 2:45 PM, Andrea Cimatoribus wrote:
 Hi everybody, I hope this has not been discussed before, I couldn't find a 
 solution elsewhere.
 I need to read some binary data, and I am using numpy.fromfile to do this. 
 Since the files are huge, and would make me run out of memory, I need to read 
 data skipping some records (I am reading data recorded at high frequency, so 
 basically I want to read subsampling).
[clip]

You can do a fid.seek(offset) prior to np.fromfile() and then it will 
read from offset.  See the docstrings for `file.seek()` on how to use it.

-- 
Francesc Alted



Re: [Numpy-discussion] fast numpy.fromfile skipping data chunks

2013-03-13 Thread Francesc Alted
On 3/13/13 3:53 PM, Francesc Alted wrote:
 On 3/13/13 2:45 PM, Andrea Cimatoribus wrote:
 Hi everybody, I hope this has not been discussed before, I couldn't 
 find a solution elsewhere.
 I need to read some binary data, and I am using numpy.fromfile to do 
 this. Since the files are huge, and would make me run out of memory, 
 I need to read data skipping some records (I am reading data recorded 
 at high frequency, so basically I want to read subsampling).
 [clip]

 You can do a fid.seek(offset) prior to np.fromfile() and then it will 
 read from offset.  See the docstrings for `file.seek()` on how to use it.


Oops, you were already using file.seek().  Disregard, please.

-- 
Francesc Alted



[Numpy-discussion] R: R: R: R: fast numpy.fromfile skipping data chunks

2013-03-13 Thread Andrea Cimatoribus
Ok, this seems to be working (well, as soon as I get the right offset and 
things like that, but that's a different story).
The problem is that it does not go any faster than my initial function compiled 
with cython, and it is still a lot slower than fromfile. Is there a reason why, 
even with compiled code, reading from a file skipping some records should be 
slower than reading the whole file?


From: numpy-discussion-boun...@scipy.org [numpy-discussion-boun...@scipy.org] on 
behalf of Nathaniel Smith [n...@pobox.com]
Sent: Wednesday, 13 March 2013 15:53
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] R: R: R: fast numpy.fromfile skipping data chunks

On Wed, Mar 13, 2013 at 2:46 PM, Andrea Cimatoribus
andrea.cimatori...@nioz.nl wrote:
Indeed, but that offset should be a multiple of the byte-size of dtype 
as the help says.

 My mistake, sorry, even if the help says so, it seems that this is not the 
 case in the actual code. Still, the problem with the size of the available 
 data (which is not necessarily a multiple of dtype byte-size) remains.

Worst case you can always work around such issues with an extra layer
of view manipulation:

# create a raw view onto the contents of the file
file_bytes = np.memmap(path, dtype=np.uint8, ...)
# cut out any arbitrary number of bytes from the beginning and end
data_bytes = file_bytes[...some slice expression...]
# switch to viewing the bytes as the proper data type
data = data_bytes.view(dtype=np.uint32)
# proceed as before

-n


Re: [Numpy-discussion] R: R: R: R: fast numpy.fromfile skipping data chunks

2013-03-13 Thread Nathaniel Smith
On 13 Mar 2013 15:16, Andrea Cimatoribus andrea.cimatori...@nioz.nl
wrote:

 Ok, this seems to be working (well, as soon as I get the right offset and
things like that, but that's a different story).
 The problem is that it does not go any faster than my initial function
compiled with cython, and it is still a lot slower than fromfile. Is there
a reason why, even with compiled code, reading from a file skipping some
records should be slower than reading the whole file?

Oh, in that case you're probably IO bound, not CPU bound, so Cython etc.
can't help.

Traditional spinning-disk hard drives can read quite quickly, but take a
long time to find the right place to read from and start reading. Your OS
has heuristics in it to detect sequential reads and automatically start the
setup for the next read while you're processing the previous read, so you
don't see the seek overhead. If your reads are widely separated enough,
these heuristics will get confused and you'll drop back to doing a new disk
seek on every call to read(), which is deadly. (And would explain what
you're seeing.) If this is what's going on, your best bet is to just write
a python loop that uses fromfile() to read some largeish (megabytes?)
chunk, subsample those and throw away the rest, and repeat.

-n
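
A rough sketch of that chunked approach (the helper name and parameters are
made up; the chunk size and step would need tuning to the real data):

import numpy as np

def subsample_file(fid, dtype=np.uint32, step=20, chunk_items=1000000):
    # Read large sequential chunks and keep every step-th item.
    out = []
    phase = 0  # offset of the next wanted item within the current chunk
    while True:
        chunk = np.fromfile(fid, dtype=dtype, count=chunk_items)
        if chunk.size == 0:
            break
        out.append(chunk[phase::step].copy())
        # carry the subsampling phase over to the next chunk
        phase = (phase - chunk.size) % step
    return np.concatenate(out) if out else np.zeros(0, dtype=dtype)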


[Numpy-discussion] Performance of einsum?

2013-03-13 Thread Jaakko Luttinen
Hi,

I was wondering if someone could provide some intuition on the
performance of einsum?

I have found that sometimes it is extremely efficient but sometimes it
is several orders of magnitudes slower compared to some other
approaches, for instance, using multiple dot-calls.

My intuition is that the computation time of einsum is linear with
respect to the size of the index space, that is, the product of the
maximums of the indices.

So, for instance, computing the product of three matrices A*B*C as
einsum('ij,jk,kl->il', A, B, C) would not be efficient because there are
four indices i=1,...,I, j=1,...,J, k=1,...,K and l=1,...,L, so the total
computation time is O(I*J*K*L), which is much worse than with two dot
products, O(I*J*K + I*K*L), or with two einsum calls for Y=A*B and then Y*C.

On the other hand, computing einsum('ij,ij,ij->i', A, B, C) would be
efficient because the computation time is only O(I*J).

Is this intuition roughly correct or how could I intuitively understand
when the usage of einsum is bad?

Best regards,
Jaakko
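
A small sketch of the comparison being described (the shapes are arbitrary;
the point is only the size of the iterated index space):

import numpy as np

I, J, K, L = 60, 60, 60, 60
A = np.random.randn(I, J)
B = np.random.randn(J, K)
C = np.random.randn(K, L)

# One call iterates over the full I*J*K*L index space:
Y_slow = np.einsum('ij,jk,kl->il', A, B, C)

# Two chained calls cost only O(I*J*K + I*K*L):
Y_fast = np.einsum('ij,jk->ik', A, B)
Y_fast = np.einsum('ik,kl->il', Y_fast, C)

print(np.allclose(Y_slow, Y_fast))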


[Numpy-discussion] R: fast numpy.fromfile skipping data chunks

2013-03-13 Thread Andrea Cimatoribus
Thanks a lot for the feedback, I'll try to modify my function to overcome this 
issue.
Since I'm in the process of buying new hardware too, a slightly OT (but 
definitely related) question:
does an SSD provide substantial improvement in these cases?

From: numpy-discussion-boun...@scipy.org [numpy-discussion-boun...@scipy.org] on 
behalf of Nathaniel Smith [n...@pobox.com]
Sent: Wednesday, 13 March 2013 16:43
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] R: R: R: R: fast numpy.fromfile skipping data chunks

On 13 Mar 2013 15:16, Andrea Cimatoribus andrea.cimatori...@nioz.nl wrote:

 Ok, this seems to be working (well, as soon as I get the right offset and 
 things like that, but that's a different story).
 The problem is that it does not go any faster than my initial function 
 compiled with cython, and it is still a lot slower than fromfile. Is there a 
 reason why, even with compiled code, reading from a file skipping some 
 records should be slower than reading the whole file?

Oh, in that case you're probably IO bound, not CPU bound, so Cython etc. can't 
help.

Traditional spinning-disk hard drives can read quite quickly, but take a long 
time to find the right place to read from and start reading. Your OS has 
heuristics in it to detect sequential reads and automatically start the setup 
for the next read while you're processing the previous read, so you don't see 
the seek overhead. If your reads are widely separated enough, these heuristics 
will get confused and you'll drop back to doing a new disk seek on every call 
to read(), which is deadly. (And would explain what you're seeing.) If this is 
what's going on, your best bet is to just write a python loop that uses 
fromfile() to read some largeish (megabytes?) chunk, subsample those and throw 
away the rest, and repeat.

-n



Re: [Numpy-discussion] (@Pat Marion) Re: Yes, this one again ImportError: No module named multiarray

2013-03-13 Thread Dinesh B Vadhia
Many thanks Pat - the numpy discussion list is brill.
Go ahead and see if the CPython developers would be interested as it is a 
problem that appears all the time on boards/lists.
Best ... Dinesh



From: Pat Marion 
Sent: Tuesday, March 12, 2013 7:23 AM
To: Aron Ahmadia 
Cc: Discussion of Numerical Python 
Subject: Re: [Numpy-discussion] (@Pat Marion) Re: Yes,this one again 
ImportError: No module named multiarray


Thanks for copying me, Aron.

Hi Dinesh, I have a github project which demonstrates how to use numpy with 
freeze.  The project's readme includes more information:

  https://github.com/patmarion/NumpyBuiltinExample

It does require a small patch to CPython's import.c file.  I haven't tried 
posting this patch to the CPython developers; perhaps there'd be interest in 
incorporating it upstream.

Pat


Re: [Numpy-discussion] Yes, this one again ImportError: No module named multiarray

2013-03-13 Thread Dinesh B Vadhia
Hi Chris
Darn!  It worked this morning and I don't know why.

Focused on PyInstaller because it creates a single executable.  Testing on 
all major versions of Windows (32-bit and 64-bit), Linux and OSX.  The 
problem OS is, unsurprisingly, Windows XP (SP3).

Numpy was upgraded to the mkl-version and maybe that did the trick.  Tried 
to replicate on an identical Windows XP machine using the standard 
sourceforge distribution but that resulted in a pyinstaller error.

Anyway, using the latest releases of all software ie. Python 2.7.3, Numpy 
1.7.0, Scipy 0.11.0, PyInstaller 2.0.

Will post back if run into problems again.  Best ...


--
From: Chris Barker - NOAA Federal chris.bar...@noaa.gov
Sent: Tuesday, March 12, 2013 2:50 PM
To: Discussion of Numerical Python numpy-discussion@scipy.org
Subject: Re: [Numpy-discussion] Yes,this one again ImportError: No module 
named multiarray

 On Tue, Mar 12, 2013 at 7:05 AM, Dinesh B Vadhia
 dineshbvad...@hotmail.com wrote:
 Does that mean numpy won't work with freeze/create_executable type of 
 tools
 or is there a workaround?

 I've used numpy with py2exe and py2app out of the box with no issues (
 actually, there is an issue with too much stuff getting bundled up,
 but it works)

 ImportError let alone what the solution is.  The Traceback, similar to
 others found on the web, is:

 Traceback (most recent call last):
   File test.py, ...
   File C:\Python27\lib\site-packages\numpy\__init__.py, line 154, in
 module

 This indicates that your code is importing the numpy that's inside the
 system installation -- it should be using one in your app bundle.

 What bundling tool are you using?
 How did you install python/numpy?
 What does your bundling tol config look like?
 And, of course, version numbers of everything.

 -Chris

 -- 

 Christopher Barker, Ph.D.
 Oceanographer

 Emergency Response Division
 NOAA/NOS/ORR(206) 526-6959   voice
 7600 Sand Point Way NE   (206) 526-6329   fax
 Seattle, WA  98115   (206) 526-6317   main reception

 chris.bar...@noaa.gov

 


Re: [Numpy-discussion] fast numpy.fromfile skipping data chunks

2013-03-13 Thread Richard Hattersley
 Since the files are huge, and would make me run out of memory, I need to
read data skipping some records

Is it possible to describe what you're doing with the data once you have
subsampled it? And if there were a way to work with the full resolution
data, would that be desirable?

I ask because I've been dabbling with a pure-Python library for handling
larger-than-memory datasets - https://github.com/SciTools/biggus - and it
uses similar chunking techniques to those mentioned in the other replies to
process data at the full streaming I/O rate. It's still in the early stages
of development and the design is still fluid, so maybe it's worth seeing if
there's enough in common with your needs to warrant adding your use case.

Richard


On 13 March 2013 13:45, Andrea Cimatoribus andrea.cimatori...@nioz.nlwrote:

 Hi everybody, I hope this has not been discussed before, I couldn't find a
 solution elsewhere.
 I need to read some binary data, and I am using numpy.fromfile to do this.
 Since the files are huge, and would make me run out of memory, I need to
 read data skipping some records (I am reading data recorded at high
 frequency, so basically I want to read subsampling).
 At the moment, I came up with the code below, which is then compiled using
 cython. Despite the significant performance increase from the pure python
 version, the function is still much slower than numpy.fromfile, and only
 reads one kind of data (in this case uint32), otherwise I do not know how
 to define the array type in advance. I have basically no experience with
 cython nor c, so I am a bit stuck. How can I try to make this more
 efficient and possibly more generic?
 Thanks

 import numpy as np
 #For cython!
 cimport numpy as np
 from libc.stdint cimport uint32_t

 def cffskip32(fid, int count=1, int skip=0):

 cdef int k=0
 cdef np.ndarray[uint32_t, ndim=1] data = np.zeros(count,
 dtype=np.uint32)

 if skip >= 0:
 while k < count:
 try:
 data[k] = np.fromfile(fid, count=1, dtype=np.uint32)
 fid.seek(skip, 1)
 k +=1
 except ValueError:
 data = data[:k]
 break
 return data


Re: [Numpy-discussion] R: fast numpy.fromfile skipping data chunks

2013-03-13 Thread Daπid
On 13 March 2013 16:54, Andrea Cimatoribus andrea.cimatori...@nioz.nlwrote:

  Since I'm in the process of buying new hardware too, a slight OT (but
 definitely related).
 Does an ssd provide substantial improvement in these cases?


It should help. Nevertheless, when talking about performance, it is
difficult to predict, mainly because in a computer there are many things
going on and many layers involved.

I have a couple of computers equipped with SSDs; if you want, send me some
benchmarks and I can run them to see if I get any speedup.


[Numpy-discussion] can't run cython on mtrand.pyx

2013-03-13 Thread Neal Becker
Grabbed numpy-1.7.0 source.
Cython is 0.18

 cython mtrand.pyx produces lots of errors.



Re: [Numpy-discussion] can't run cython on mtrand.pyx

2013-03-13 Thread Robert Kern
On Wed, Mar 13, 2013 at 6:40 PM, Neal Becker ndbeck...@gmail.com wrote:
 Grabbed numpy-1.7.0 source.
 Cython is 0.18

  cython mtrand.pyx produces lots of errors.

It helps to copy-and-paste the errors that you are seeing.

In any case, Cython 0.18 works okay on master's mtrand.pyx sources.

-- 
Robert Kern


Re: [Numpy-discussion] can't run cython on mtrand.pyx

2013-03-13 Thread Neal Becker
Robert Kern wrote:

 On Wed, Mar 13, 2013 at 6:40 PM, Neal Becker ndbeck...@gmail.com wrote:
 Grabbed numpy-1.7.0 source.
 Cython is 0.18

  cython mtrand.pyx produces lots of errors.
 
 It helps to copy-and-paste the errors that you are seeing.
 
 In any case, Cython 0.18 works okay on master's mtrand.pyx sources.
 

Well, this is the first error:

cython mtrand.pyx

Error compiling Cython file:

...
                                 PyArray_DIMS(oa), NPY_DOUBLE)
        length = PyArray_SIZE(array)
        array_data = <double *>PyArray_DATA(array)
        itera = <flatiter>PyArray_IterNew(<object>oa)
        for i from 0 <= i < length:
            array_data[i] = func(state, (<double *>(itera.dataptr))[0])
                                        ^


mtrand.pyx:177:41: Python objects cannot be cast to pointers of primitive types




Re: [Numpy-discussion] R: fast numpy.fromfile skipping data chunks

2013-03-13 Thread Charles R Harris
On Wed, Mar 13, 2013 at 9:54 AM, Andrea Cimatoribus 
andrea.cimatori...@nioz.nl wrote:

 Thanks a lot for the feedback, I'll try to modify my function to overcome
 this issue.
 Since I'm in the process of buying new hardware too, a slight OT (but
 definitely related).
 Does an ssd provide substantial improvement in these cases?


It should. Seek time on an ssd is quite low, and readout is fast. Skipping
over items will probably not be as fast as a sequential read but I expect
it will be substantially faster than a disk. Nathaniel's loop idea will
probably work faster also. The sequential readout rate of a modern ssd will
be about 500 MB/sec, so you can probably just divide that into your file
size to get an estimate of the time needed.
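(For example, a hypothetical 50 GB file read sequentially at ~500 MB/s works
out to roughly 100 seconds.)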

snip

Chuck


Re: [Numpy-discussion] numpy reference array

2013-03-13 Thread Chris Barker - NOAA Federal
On Wed, Mar 13, 2013 at 6:56 AM, Matt U mpuec...@mit.edu wrote:
 Is it possible to create a numpy array which points to the same data in a
 different numpy array (but in different order etc)?

You can do this (easily), but only if the different order can be
defined in terms of strides. A simple example is a transpose:

In [3]: a = np.arange(12).reshape((3,4))

In [4]: a
Out[4]:
array([[ 0,  1,  2,  3],
   [ 4,  5,  6,  7],
   [ 8,  9, 10, 11]])

In [5]: b = a.T

In [6]: b
Out[6]:
array([[ 0,  4,  8],
   [ 1,  5,  9],
   [ 2,  6, 10],
   [ 3,  7, 11]])

# b is the transpose of a
# but a view on the same data block:
# change a:
In [7]: a[2,1] = 44


In [8]: a
Out[8]:
array([[ 0,  1,  2,  3],
   [ 4,  5,  6,  7],
   [ 8, 44, 10, 11]])

# b is changed, too.
In [9]: b
Out[9]:
array([[ 0,  4,  8],
   [ 1,  5, 44],
   [ 2,  6, 10],
   [ 3,  7, 11]])

check out stride tricks for clever things you can do.

But numpy does require that the data in your array be a contiguous
block, in order, so you can't arbitrarily re-arrange it while keeping
a view.
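
For example, a toy stride-tricks sketch (as_strided must be used with care,
since inconsistent shapes/strides can read outside the buffer):

import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(10)
# a (10, 2) view in which each row repeats one element of a; no copy is made
b = as_strided(a, shape=(10, 2), strides=(a.strides[0], 0))
a[0] = -1
print(b[0])   # [-1 -1] -- the view sees the change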

HTH,
  -Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


[Numpy-discussion] Any help from Numpy community?

2013-03-13 Thread Birdada Simret
Any help from the NumPy community?
[[ 0.    1.54  0.    0.    0.    1.08  1.08  1.08]
 [ 1.54  0.    1.08  1.08  1.08  0.    0.    0.  ]
 [ 0.    1.08  0.    0.    0.    0.    0.    0.  ]
 [ 0.    1.08  0.    0.    0.    0.    0.    0.  ]
 [ 0.    1.08  0.    0.    0.    0.    0.    0.  ]
 [ 1.08  0.    0.    0.    0.    0.    0.    0.  ]
 [ 1.08  0.    0.    0.    0.    0.    0.    0.  ]
 [ 1.08  0.    0.    0.    0.    0.    0.    0.  ]]

The above is the numpy array (a bond-length matrix); the numbers represent
C-C = 1.54 and C-H = 1.08.
So I want to write this in the form:
C of index i is connected to C of index j
C of index i is connected to H of index j

(C(i), C(j))  # key C(i) and value C(j)
(C(i), H(j))  # key C(i) and value H(j); the key C(i) can be repeated for as
many H(j) values as needed
To summarize, the output may look like:
C1 is connected to C2
C1 is connected to H1
C1 is connected to H3
C2 is connected to H2   etc

Any guide is greatly appreciated,
thanks
birda
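
One possible sketch of that mapping (the atom labels and their ordering are
only an assumption about how the rows of the matrix are meant to be read):

import numpy as np

# Bond-length matrix from above; rows/columns assumed to be [C1, C2, H1..H6]
dist = np.zeros((8, 8))
dist[0, 1] = dist[1, 0] = 1.54          # C1-C2
dist[1, 2:5] = dist[2:5, 1] = 1.08      # C2-H1, C2-H2, C2-H3
dist[0, 5:8] = dist[5:8, 0] = 1.08      # C1-H4, C1-H5, C1-H6

labels = ['C1', 'C2', 'H1', 'H2', 'H3', 'H4', 'H5', 'H6']
rows, cols = np.nonzero(np.triu(dist))  # upper triangle lists each bond once
for i, j in zip(rows, cols):
    print('%s is connected to %s' % (labels[i], labels[j]))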



Re: [Numpy-discussion] Yes, this one again ImportError: No module named multiarray

2013-03-13 Thread Pat Marion
Glad you got it working!  For those who might be interested, the
distinction between the example I linked to and packaging tools like
PyInstaller or py2exe, is that
NumpyBuiltinExample (https://github.com/patmarion/NumpyBuiltinExample) uses
static linking to embed numpy as a builtin module.  At runtime, there
is no dynamic loading, and there is no filesystem access.  The technique is
targeted at HPC or embedded systems where you might want to avoid touching
the filesystem, or avoid dynamic loading.

Pat


On Thu, Mar 14, 2013 at 2:08 AM, Dinesh B Vadhia
dineshbvad...@hotmail.comwrote:

 Hi Chris
 Darn!  It worked this morning and I don't know why.

 Focused on PyInstaller because it creates a single executable.  Testing on
 all major versions of Windows (32-bit and 64-bit), Linux and OSX.  The
 problem OS is unsurprisingly, Windows XP (SP3).

 Numpy was upgraded to the mkl-version and maybe that did the trick.  Tried
 to replicate on an identical Windows XP machine using the standard
 sourceforge distribution but that resulted in a pyinstaller error.

 Anyway, using the latest releases of all software ie. Python 2.7.3, Numpy
 1.7.0, Scipy 0.11.0, PyInstaller 2.0.

 Will post back if run into problems again.  Best ...


 --
 From: Chris Barker - NOAA Federal chris.bar...@noaa.gov
 Sent: Tuesday, March 12, 2013 2:50 PM
 To: Discussion of Numerical Python numpy-discussion@scipy.org
 Subject: Re: [Numpy-discussion] Yes,this one again ImportError: No module
 named multiarray

  On Tue, Mar 12, 2013 at 7:05 AM, Dinesh B Vadhia
  dineshbvad...@hotmail.com wrote:
  Does that mean numpy won't work with freeze/create_executable type of
  tools
  or is there a workaround?
 
  I've used numpy with py2exe and py2app out of the box with no issues (
  actually, there is an issue with too much stuff getting bundled up,
  but it works)
 
  ImportError let alone what the solution is.  The Traceback, similar to
  others found on the web, is:
 
  Traceback (most recent call last):
File test.py, ...
File C:\Python27\lib\site-packages\numpy\__init__.py, line 154, in
  module
 
  This indicates that your code is importing the numpy that's inside the
  system installation -- it should be using one in your app bundle.
 
  What bundling tool are you using?
  How did you install python/numpy?
  What does your bundling tol config look like?
  And, of course, version numbers of everything.
 
  -Chris
 
  --
 
  Christopher Barker, Ph.D.
  Oceanographer
 
  Emergency Response Division
  NOAA/NOS/ORR(206) 526-6959   voice
  7600 Sand Point Way NE   (206) 526-6329   fax
  Seattle, WA  98115   (206) 526-6317   main reception
 
  chris.bar...@noaa.gov
 
 