Re: [Numpy-discussion] speeding up y[:, i] for y.shape = (big number, small number)

2006-10-04 Thread David Cournapeau
Tim Hochberg wrote:


 I guess the problem comes from the fact that, y being C order, y[:, i]
 has to access the data in a non-'linear' (strided) way. Is there a way
 to speed this up? I did something like this:

 y   = N.zeros((K, n))
 for i in range(K):
     y[i] = gauss_den(data, mu[i, :], va[i, :])
 return y.T

 which works, but I don't like it very much.
 Why not?

Mainly because using those transposes does not really reflect the 
intention, and it does not seem natural.
  Isn't there any other way?
 That depends on the details of gauss_den.

 A general note: for this kind of microoptimization puzzle, it's much 
 easier to help if you can post a self-contained example, preferably 
 something fairly simple that still illustrates the speed issue, that we 
 can experiment with.

Here we are (the difference may not seem that large between the two 
multiple_gauss_den variants, but in reality _diag_gauss_den is an internal 
function implemented in C, and is much faster... By writing this example, 
I've just realized that the function _diag_gauss_den may be slow for 
exactly the same reasons):


#! /usr/bin/env python
# Last Change: Wed Oct 04 07:00 PM 2006 J

import numpy as N
from numpy.random import randn

def _diag_gauss_den(x, mu, va):
    """This function is the actual implementation
    of the gaussian pdf in the scalar case. It assumes all args
    are conformant, so it should not be used directly.

    Call gauss_den instead."""
    # Diagonal covariance matrix case
    d    = mu.size
    inva = 1/va[0]
    fac  = (2*N.pi) ** (-d/2.0) * N.sqrt(inva)
    y    = (x[:, 0] - mu[0]) ** 2 * inva * -0.5
    for i in range(1, d):
        inva = 1/va[i]
        fac *= N.sqrt(inva)
        y   += (x[:, i] - mu[i]) ** 2 * inva * -0.5
    y = fac * N.exp(y)
    return y

def multiple_gauss_den1(data, mu, va):
    """Helper function to generate several Gaussian
    pdf (different parameters) from the same data: unoptimized version."""
    K = mu.shape[0]
    n = data.shape[0]
    d = data.shape[1]

    y = N.zeros((n, K))
    for i in range(K):
        y[:, i] = _diag_gauss_den(data, mu[i, :], va[i, :])

    return y

def multiple_gauss_den2(data, mu, va):
    """Helper function to generate several Gaussian
    pdf (different parameters) from the same data: optimized version."""
    K = mu.shape[0]
    n = data.shape[0]
    d = data.shape[1]

    y = N.zeros((K, n))
    for i in range(K):
        y[i] = _diag_gauss_den(data, mu[i, :], va[i, :])

    return y.T

def bench():
    #===========================================
    # GMM of 30 comp, 15 dimensions, 1e4 frames
    #===========================================
    d       = 15
    k       = 30
    nframes = int(1e4)
    niter   = 10
    mode    = 'diag'

    mu = randn(k, d)
    va = randn(k, d) ** 2
    X  = randn(nframes, d)

    print "=" * 38
    print "(%d dim, %d components) GMM with %d iterations, for %d frames" \
          % (d, k, niter, nframes)

    for i in range(niter):
        y1 = multiple_gauss_den1(X, mu, va)

    for i in range(niter):
        y2 = multiple_gauss_den2(X, mu, va)

    se = N.sum(y1 - y2)

    print se

if __name__ == '__main__':
    import hotshot, hotshot.stats
    profile_file = 'foo.prof'
    prof = hotshot.Profile(profile_file, lineevents=1)
    prof.runcall(bench)
    p = hotshot.stats.load(profile_file)
    print p.sort_stats('cumulative').print_stats(20)
    prof.close()
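
For what it is worth, the asymmetry can be seen directly in the strides 
(just a small sketch, not part of the profiled script above; the numbers 
assume float64 and the shapes used in bench()):

import numpy as N

n, K = 10000, 30
yc = N.zeros((n, K))   # C order, as in multiple_gauss_den1
yr = N.zeros((K, n))   # layout used by multiple_gauss_den2

# strides are in bytes: a column of yc jumps K*8 bytes per element,
# while a row of yr is one contiguous block (8 bytes per element)
print yc[:, 0].strides     # (240,)
print yr[0].strides        # (8,)
print yc[:, 0].flags['C_CONTIGUOUS'], yr[0].flags['C_CONTIGUOUS']

So every y[:, i] = ... assignment in the (n, K) version has to scatter its 
result across memory, whereas y[i] = ... in the (K, n) version writes one 
contiguous block.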

I am a bit puzzled by all those C vs F storage issues, though. In Matlab, 
where the storage is always F order as far as I know, I have never 
encountered such differences (e.g. between y(:, i) and y(i, :)); I don't 
know whether this is because I was doing it badly, because Matlab is much 
more clever than numpy at handling those cases, or whether that is the 
price to pay for the added flexibility of numpy...
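
One alternative I have not benchmarked here (just a sketch, assuming 
N.zeros accepts the order='F' keyword) would be to allocate y in Fortran 
order, so that each column y[:, i] is contiguous and the transpose is no 
longer needed:

def multiple_gauss_den3(data, mu, va):
    # Same as multiple_gauss_den1, but y is allocated in Fortran order,
    # so each column y[:, i] is a contiguous block of memory.
    K = mu.shape[0]
    n = data.shape[0]
    y = N.zeros((n, K), order='F')
    for i in range(K):
        y[:, i] = _diag_gauss_den(data, mu[i, :], va[i, :])
    return y

Whether this actually helps of course depends on what the caller then does 
with y: code that later iterates over rows of y would pay the strided-access 
price instead.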

David



Re: [Numpy-discussion] speeding up y[:, i] for y.shape = (big number, small number)

2006-10-04 Thread David Cournapeau
David Cournapeau wrote:
 Here we are (the difference may not seem that large between the two 
 multiple_gauss_den variants, but in reality _diag_gauss_den is an internal 
 function implemented in C, and is much faster... By writing this example, 
 I've just realized that the function _diag_gauss_den may be slow for 
 exactly the same reasons):

   
I checked my assumption about the _diag_gauss_den function being slow 
because of the same problem, and this is indeed the case:

If I replace X = randn(nframes, d) by X = randn(d, nframes); X = X.T, 
the function is more than twice as fast! This seems way too much of a 
difference to me.
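
A rough way to reproduce the effect outside the profiler (just a sketch; 
the exact numbers will obviously depend on the machine) is to time the 
same column-wise access on the two layouts:

import timeit

setup = """
import numpy as N
from numpy.random import randn
d, nframes = 15, 10000
Xc = randn(nframes, d)       # C order, as in bench()
Xf = randn(d, nframes).T     # same shape, but columns are contiguous
"""

for name in ('Xc', 'Xf'):
    # sum each column once, mimicking the x[:, i] loop in _diag_gauss_den
    stmt = "for i in range(d): s = N.sum(%s[:, i] ** 2)" % name
    print name, min(timeit.Timer(stmt, setup).repeat(3, 100))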

David
