Re: [Numpy-discussion] Fast decrementation of indices

2014-02-03 Thread Rick White
I think you'll find the algorithm below to be a lot faster, especially if the
arrays are big.  Checking each array index against the list of included or
excluded elements is much slower than simply creating a secondary boolean
array and looking up whether each element is included or not.

import numpy as np

b = np.array([
 [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3,  4, 4, 4, 5,  5, 5, 6, 7, 8, 9, 10, 11],
 [5, 6, 1, 0, 2, 7, 3, 8, 1, 4, 9, 2, 10, 5, 3, 4, 11, 0, 0, 1, 2, 3,  4,  5]
])

a = [1, 2, 3, 7, 8]

# Lookup table: True for atoms that survive the deletion
keepdata = np.ones(12, dtype=bool)
keepdata[a] = False
# Keep only the bonds whose endpoints are both retained
w = np.where(keepdata[b[0]] & keepdata[b[1]])
# Map each old atom index to its new, decremented index
newindex = keepdata.cumsum() - 1
c = newindex[b[:, w[0]]]
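
With the example arrays above, c reproduces the corrected bond list that Mads
expected (quoted below):

print(c)
# [[0 0 1 1 2 2 2 3 5 6]
#  [2 3 5 2 1 6 0 0 1 2]]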

Cheers,
Rick

On 2 February 2014 20:58, Mads Ipsen mads.ip...@gmail.com wrote:

 Hi,
 
 I have run into a potential 'for loop' bottleneck. Let me outline:
 
 The following array describes bonds (connections) in a benzene molecule
 
b = [[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3,  4, 4, 4, 5,  5, 5, 6, 7,
 8, 9, 10, 11],
 [5, 6, 1, 0, 2, 7, 3, 8, 1, 4, 9, 2, 10, 5, 3, 4, 11, 0, 0, 1,
 2, 3,  4,  5]]
 
 i.e. bond 0 connects atoms 0 and 5, bond 1 connects atoms 0 and 6, etc. In
 practical examples, the list can be much larger (N > 100,000 connections).
 
 Suppose atoms with indices a = [1,2,3,7,8] are deleted, then all bonds
 connecting those atoms must be deleted. I achieve this doing
 
 b = numpy.array(b)
 i_0 = numpy.in1d(b[0], a)
 i_1 = numpy.in1d(b[1], a)
 b_i = numpy.where(i_0 | i_1)[0]
 b = b[:, ~(i_0 | i_1)]
 
 If you find this approach lacking, feel free to comment.
 
 This results in the following updated bond list
 
 b = [[0,  0,  4,  4,  5,  5,  5,  6, 10, 11],
 [5,  6, 10,  5,  4, 11,  0,  0,  4,  5]]
 
 This list is however not correct: Since atoms [1,2,3,7,8] have been
 deleted, the remaining atoms with indices larger than the deleted atoms
 must be decremented. I do this as follows:
 
 for i in reversed(a):
     b = numpy.where(b > i, b - 1, b)  (*)
 
 yielding the correct result
 
 b = [[0, 0, 1, 1, 2, 2, 2, 3, 5, 6],
 [2, 3, 5, 2, 1, 6, 0, 0, 1, 2]]
 
 The Python for loop in (*) may easily run 50,000 iterations. Is there
 a smart way to utilize numpy functionality to avoid this?
 
 Thanks and best regards,
 
 Mads
 
 -- 
 +----------------------+----------------------------------+
 | Mads Ipsen           |                                  |
 +----------------------+----------------------------------+
 | Gåsebæksvej 7, 4. tv | phone:  +45-29716388             |
 | DK-2500 Valby        | email:  mads.ip...@gmail.com     |
 | Denmark              | map  :  www.tinyurl.com/ns52fpa  |
 +----------------------+----------------------------------+



Re: [Numpy-discussion] NumPy-Discussion Digest, Vol 38, Issue 11

2009-11-04 Thread Rick White
The difference between IDL and numpy is that IDL uses single precision  
floats by default while numpy uses doubles.  If you try it with  
doubles in IDL, you will see that it also returns false.

As David Cournapeau said, you should not expect two mathematically equivalent
sequences of floating-point operations to give exactly the same result.  It's
just luck if they do.  E.g., in IDL you'll find that 0.1+0.6 is not  
equal to 0.7.  That's because 0.1 and 0.6 are not exactly  
representable as floats.
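
The usual remedy is to compare floating-point results with a tolerance rather
than with ==.  A minimal sketch (the example values come from the question
below; the tolerance 1e-12 is an arbitrary choice):

import numpy as np

a = np.array([1.7])
b = np.array([0.8]) + np.array([0.9])

print(a == b)              # [False] -- exact equality fails
print(abs(a - b) < 1e-12)  # [ True] -- compare within a tolerance
print(np.allclose(a, b))   # True    -- numpy's built-in tolerance test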

On Nov 4, 2009, Jean-Luc Menut wrote:

 Hello all,



 If I define two variables a and b by doing the following:

 In [4]: a = array([1.7])

 In [5]: a
 Out[5]: array([ 1.7])

 In [6]: b = array([0.8])+array([0.9])

 In [7]: b
 Out[7]: array([ 1.7])


 if I test the equality of a and b, instead of obtaining True, I get:
 In [8]: a==b
 Out[8]: array([False], dtype=bool)

 I know it is not related only to numpy but also to python.
 How do you tackle this problem?

 (I also tried it in IDL and it works: I guess it truncates the
 number.)

 Thanks ,

 Jean-Luc



Re: [Numpy-discussion] exec: bad practice?

2009-09-15 Thread Rick White
You're not supposed to write to the locals() dictionary.  Sometimes  
it works, but sometimes it doesn't.  From the Python library docs:

locals()
Update and return a dictionary representing the current local symbol  
table.
Note: The contents of this dictionary should not be modified;  
changes may not affect the values of local variables used by the  
interpreter.

I think the only way to create a variable with a program-specified  
name in the local namespace is to use exec (but I'd be happy to be  
corrected).

Cheers,
Rick

On Sep 15, 2009 Sebastien Binet wrote:

 hi John,

 I have a bit of code where I create arrays with meaningful names via:

 meat = ['beef','lamb','pork']
 cut = ['ribs','cutlets']

  for m in meat:
      for c in cut:
          exec("consumed_%s_%s = np.zeros((numxgrid, numygrid, nummeasured))"
               % (m, c))

  Is this 'pythonic'? Or is it bad practice (and potentially slow) to use
  the 'exec' statement?

 usage of the exec statement is usually frowned upon and can be
 side-stepped, e.g.:

 for m in meat:
     for c in cut:
         locals()['consumed_%s_%s' % (m,c)] = some_array

 hth,
 sebastien.
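
The usual way to sidestep the problem entirely (rather than writing into
locals() or using exec) is to keep the arrays in a dictionary keyed by name.
A minimal sketch, not from the original thread; the grid dimensions are
placeholder values:

import numpy as np

meat = ['beef', 'lamb', 'pork']
cut = ['ribs', 'cutlets']
numxgrid, numygrid, nummeasured = 10, 10, 5   # placeholder sizes

# One dictionary entry per (meat, cut) combination instead of exec-created names
consumed = {}
for m in meat:
    for c in cut:
        consumed['%s_%s' % (m, c)] = np.zeros((numxgrid, numygrid, nummeasured))

print(consumed['beef_ribs'].shape)   # (10, 10, 5)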



Re: [Numpy-discussion] puzzle: generate index with many ranges

2009-01-30 Thread Rick White
Here's a technique that works:

Python 2.4.2 (#5, Nov 21 2005, 23:08:11)
[GCC 4.0.0 20041026 (Apple Computer, Inc. build 4061)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> a = np.array([0,4,0,11])
>>> b = np.array([-1,11,4,15])
>>> rangelen = b-a+1
>>> cumlen = rangelen.cumsum()
>>> c = np.arange(cumlen[-1],dtype=np.int32)
>>> c += np.repeat(a[1:]-c[cumlen[0:-1]], rangelen[1:])
>>> print c
[ 4  5  6  7  8  9 10 11  0  1  2  3  4 11 12 13 14 15]

The basic idea is that the difference of your desired output from a  
simple range is an array with a bunch of constant values appended  
together, and that is what repeat() does.  I'm assuming that you'll  
never have b < a.  Notice the slight ugliness of prepending the
elements at the beginning so that the cumsum starts with zero.   
(Maybe there is a cleaner way to do that.)
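
For readers who want the prepending handled internally, here is a minimal
sketch of the same repeat-based idea wrapped in a function (concat_ranges is a
hypothetical helper, not from the original post; it assumes stop >= start
everywhere):

import numpy as np

def concat_ranges(start, stop):
    # Concatenate the inclusive ranges start[i]..stop[i] into one index array.
    start = np.asarray(start)
    stop = np.asarray(stop)
    lens = stop - start + 1
    # Offsets where each range begins in the output array
    offsets = np.concatenate(([0], lens.cumsum()[:-1]))
    out = np.arange(lens.sum())
    # Shift each block so that it starts at the requested value
    out += np.repeat(start - offsets, lens)
    return out

print(concat_ranges([4, 0, 11], [11, 4, 15]))
# [ 4  5  6  7  8  9 10 11  0  1  2  3  4 11 12 13 14 15]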

This does create a second array (via the repeat) that is the same  
length as the result.  If that uses too much memory, you could break  
up the repeat and update of c into segments using a loop.  (You  
wouldn't need a loop for every a,b element -- do a bunch in each  
iteration.)

-- Rick

Raik Gruenberg wrote:

 Hi there,

 perhaps someone has a bright idea for this one:

 I want to concatenate ranges of numbers into a single array (for  
 indexing). So I
 have generated an array a with starting positions, for example:

 a = [4, 0, 11]

 I have an array b with stop positions:

 b = [11, 4, 15]

 and I would like to generate an index array that takes 4..11, then  
 0..4, then
 11..15.

 In reality, a and b have 1+ elements and the arrays to be  
 sliced are very
 large so I want to avoid any for loops etc. Any idea how this could  
 be done? I
 thought some combination of *repeat* and adding of *arange* should  
 do the trick
 but just cannot nail it down.

 Thanks in advance for any hints!

 Greetings,
 Raik



Re: [Numpy-discussion] Histograms of extremely large data sets

2006-12-14 Thread Rick White
On Dec 14, 2006, at 2:56 AM, Cameron Walsh wrote:

 At some point I might try and test
 different cache sizes for different data-set sizes and see what the
 effect is.  For now, 65536 seems a good number and I would be happy to
 see this replace the current numpy.histogram.

I experimented a little on my machine and found that 64k was a good  
size, but it is fairly insensitive to the size over a wide range  
(16000 to 1e6).  I'd be interested to hear how this scales on other  
machines -- I'm pretty sure that the ideal size will keep the piece  
of the array being sorted smaller than the on-chip cache.

Just so we don't get too smug about the speed, if I do this in IDL on  
the same machine it is 10 times faster (0.28 seconds instead of 4  
seconds).  I'm sure the IDL version uses the much faster approach of  
just sweeping through the array once, incrementing counts in the  
appropriate bins.  It only handles equal-sized bins, so it is not as  
general as the numpy version -- but equal-sized bins is a very common  
case.  I'd still like to see a C version of histogram (which I guess  
would need to be a ufunc) go into the core numpy.
Rick
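
For the equal-width-bin case, that single-sweep approach maps directly onto
np.bincount in present-day numpy (the minlength argument did not exist in
2006).  A minimal sketch of the idea, not the IDL implementation:

import numpy as np

def fast_fixed_width_hist(a, nbins, lo, hi):
    # Single sweep: map each value to a bin index, then count with bincount.
    a = np.asarray(a).ravel()
    idx = ((a - lo) * (nbins / (hi - lo))).astype(np.intp)
    idx = np.clip(idx, 0, nbins - 1)      # clamp values at the edges
    return np.bincount(idx, minlength=nbins)

data = np.random.random(10**6)
counts = fast_fixed_width_hist(data, 100, 0.0, 1.0)
assert counts.sum() == data.size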


Re: [Numpy-discussion] Histograms of extremely large data sets

2006-12-13 Thread Rick White
On Dec 12, 2006, at 10:27 PM, Cameron Walsh wrote:

 I'm trying to generate histograms of extremely large datasets.  I've
 tried a few methods, listed below, all with their own shortcomings.
 Mailing-list archive and google searches have not revealed any
 solutions.

The numpy.histogram function can be modified to use memory much more  
efficiently when the input array is large, and the modification turns  
out to be faster even for smallish arrays (in my tests, anyway).   
Below is a modified version of the histogram function from  
function_base.py.  It is almost identical, but it avoids doing the  
sort of the entire input array simply by dividing it into blocks.   
(It would be even better to avoid the call to ravel too.)  The only  
other messy detail is that the builtin range function is shadowed by  
the 'range' parameter.

In my timing tests this is about the same speed for arrays about the  
same size as the block size and is faster than the current version by  
30-40% for large arrays.  The speed difference increases as the array  
size increases.

I haven't compared this to Eric's weave function, but this has the  
advantages of being pure Python and of being much simpler.  On my  
machine (MacBook Pro) it takes about 4 seconds for an array with 100  
million elements.  The time increases perfectly linearly with array  
size for arrays larger than a million elements.
Rick

from numpy import *

lrange = range
def histogram(a, bins=10, range=None, normed=False):
    a = asarray(a).ravel()
    if not iterable(bins):
        if range is None:
            range = (a.min(), a.max())
        mn, mx = [mi+0.0 for mi in range]
        if mn == mx:
            mn -= 0.5
            mx += 0.5
        bins = linspace(mn, mx, bins, endpoint=False)

    # best block size probably depends on processor cache size
    block = 65536
    n = sort(a[:block]).searchsorted(bins)
    for i in lrange(block, len(a), block):
        n += sort(a[i:i+block]).searchsorted(bins)
    n = concatenate([n, [len(a)]])
    n = n[1:] - n[:-1]

    if normed:
        db = bins[1] - bins[0]
        return 1.0/(a.size*db) * n, bins
    else:
        return n, bins
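
A quick usage check for the blocked version above (the example data and bin
count are mine, not from the original message):

import numpy as np

data = np.random.random(10**7)      # ten million samples in [0, 1)
counts, edges = histogram(data, bins=50, range=(0.0, 1.0))
assert counts.sum() == data.size    # every sample falls into some bin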
