[Numpy-discussion] copying array to itself

2012-05-02 Thread Henry Gomersall
I need to do some shifting of data within an array and am using the
following code:

for p in numpy.arange(array.shape[0], dtype='int64'):
    for q in numpy.arange(array.shape[1]):
        # A positive shift is towards zero
        shift = shift_values[p, q]

        if shift >= 0:
            copy_len = array.shape[2] - shift
            array[p, q, :copy_len] = array[p, q, shift:]
            array[p, q, copy_len:] = 0

        else:
            neg_shift = -shift
            copy_len = array.shape[2] - neg_shift
            array[p, q, neg_shift:] = array[p, q, :copy_len]
            array[p, q, :neg_shift] = 0

In essence, if shift is positive, it copies the data to a block closer
to the zero index on the last dimension, and if it's negative, it copies it
to a block further away.

The problem I've encountered is that for some values of p and q
(consistently the same ones), it simply writes deterministic and consistent
garbage (a repeated pattern around -0.002) to the last dimension
to which I'm trying to copy. Most of the values of p and q work fine and
as expected. I can solve the problem with an interim copy.
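
A minimal sketch of that interim-copy workaround (the function name is made
up; it simply mirrors the loop above, copying the source block before it is
written back so the read never overlaps the write):

def shift_last_axis(array, shift_values):
    for p in range(array.shape[0]):
        for q in range(array.shape[1]):
            shift = int(shift_values[p, q])
            if shift >= 0:
                copy_len = array.shape[2] - shift
                block = array[p, q, shift:].copy()     # interim copy
                array[p, q, :copy_len] = block
                array[p, q, copy_len:] = 0
            else:
                neg_shift = -shift
                copy_len = array.shape[2] - neg_shift
                block = array[p, q, :copy_len].copy()  # interim copy
                array[p, q, neg_shift:] = block
                array[p, q, :neg_shift] = 0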

Is this some nuance of the way numpy does things? Or am I missing some
stupid bug in my code?

cheers,

Henry



[Numpy-discussion] sparse array data

2012-05-02 Thread Wolfgang Kerzendorf
Hi all,

I'm currently writing code that needs three-dimensional data (for the
physicists, its dimensions are atom, ion, level). The problem is that not all
combinations exist (a sparse array). Sparse matrices in scipy only deal with
two dimensions. The operations that I need to do on those are running functions
like exp(item/constant) on all of the items. I also want to sum them up in the
last dimension. What's the best way to make a class that takes this kind of
data and does the required operations fast? Maybe some physicists have
implemented these things already. Any thoughts?

Cheers
   Wolfgang


Re: [Numpy-discussion] sparse array data

2012-05-02 Thread Frédéric Bastien
What about numpy.ma? Those are masked arrays, but they won't be the fastest.

Fred

On Wed, May 2, 2012 at 12:16 PM, Wolfgang Kerzendorf
wkerzend...@gmail.com wrote:
 Hi all,

 I'm currently writing a code that needs three dimensional data (for the 
 physicists it's dimensions are atom, ion, level). The problem is that not all 
 combinations do exist (a sparse array). Sparse matrices in scipy only deal 
 with two dimensions. The operations that I need to do on those are running 
 functions like exp(item/constant) on all of the items. I also want to sum 
 them up in the last dimension. What's the best way to make a class that takes 
 this kind of data and does the required operations fast. Maybe some 
 phycisists have implemented these things already. Any thoughts?

 Cheers
   Wolfgang


[Numpy-discussion] record arrays and vectorizing

2012-05-02 Thread Moroney, Catherine M (388D)
Hello,

Can somebody give me some hints as to how to code up this function
in pure python, rather than dropping down to Fortran?

I will want to compare a 7-element vector (called element) to a large list of
similarly-dimensioned vectors (called target), and pick out the vector in
target that is closest to element (determined by minimizing the Euclidean
distance).

For instance, in (slow) brute force form it would look like:

element = numpy.array([1, 2, 3, 4, 5, 6, 7])
target  = numpy.array(range(0, 49)).reshape(7,7)*0.1

min_length = numpy.inf
min_index  = -1
for i in xrange(0, 7):
    distance = (element - target[i])**2
    distance = numpy.sqrt(distance.sum())
    if (distance < min_length):
        min_length = distance
        min_index  = i

Now of course, the actual problem will be of a much larger scale.  I will have
an array of elements, and a large number of potential targets.  

I was thinking of having element be an array where each element itself is
a numpy.ndarray, and then vectorizing the code above so that, as output, I
would have an array of the min_index and min_length values.

I can get the following simple test to work so I may be on the right track:

import numpy

dtype = [("x", numpy.ndarray)]

def single(data):
    return data[0].min()

multiple = numpy.vectorize(single)

if __name__ == "__main__":

    a = numpy.arange(0, 16).reshape(4,4)
    b = numpy.recarray((4), dtype=dtype)
    for i in xrange(0, b.shape[0]):
        b[i]["x"] = a[i,:]

    print a
    print b

    x = multiple(b)
    print x
  
What is the best way of constructing b from a?  I tried b = 
numpy.recarray((4), dtype=dtype, buf=a)
but I get a segmentation fault when I try to print b.

Is there a way to perform this larger task efficiently with record arrays and 
vectorization, or
am I off on the wrong track completely?  How can I do this efficiently without 
dropping
down to Fortran?

Thanks for any advice,

Catherine


Re: [Numpy-discussion] record arrays and vectorizing

2012-05-02 Thread Stéfan van der Walt
On Wed, May 2, 2012 at 11:06 AM, Moroney, Catherine M (388D)
catherine.m.moro...@jpl.nasa.gov wrote:
 I will want to compare a 7-element vector (called element) to a large list 
 of similarly-dimensioned
 vectors (called target, and pick out the vector in target that is the 
 closest to element
 (determined by minimizing the Euclidean distance).

It's not entirely clear what you mean from the description above.  In
the code example, you return a single index, but from the description
it sounds like you want to pick out a vector?

If you need multiple answers, one for each element, then you probably
need to do broadcasting as shown in the NumPy medkit:

http://mentat.za.net/numpy/numpy_advanced_slides/
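
In rough outline, the broadcasting pattern looks something like this (a
hedged sketch with made-up shapes and data, not the exact code from the
slides):

import numpy as np

elements = np.random.uniform(0, 8, (100, 7))    # many query vectors
targets = np.random.uniform(0, 8, (5000, 7))    # candidate vectors

# (100, 5000) matrix of squared distances via broadcasting,
# then one best-match index per query vector
d2 = ((elements[:, np.newaxis, :] - targets[np.newaxis, :, :]) ** 2).sum(-1)
best = d2.argmin(axis=1)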

Stéfan


Re: [Numpy-discussion] copying array to itself

2012-05-02 Thread Stéfan van der Walt
On Wed, May 2, 2012 at 9:03 AM, Henry Gomersall h...@cantab.net wrote:
 Is this some nuance of the way numpy does things? Or am I missing some
 stupid bug in my code?

Try playing with the parameters of the following code:

sz = 1
N = 10

import numpy as np

x = np.arange(sz)
y = x.copy()
x[:-N] = x[N:]

np.testing.assert_equal(x[:-N], y[N:])


For small values of sz this typically works, but as soon as numpy
needs to buffer, strange things happen, because you are reading from
memory locations that you've already written to.
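
One safe variant is to take an interim copy of the source slice, so the
assignment never reads memory it has already written (a sketch; the sizes
here are arbitrary):

import numpy as np

sz, N = 10**6, 10
x = np.arange(sz)
y = x.copy()
x[:-N] = x[N:].copy()    # the explicit copy avoids the overlapping read/write

np.testing.assert_equal(x[:-N], y[N:])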

Stéfan


Re: [Numpy-discussion] sparse array data

2012-05-02 Thread Francesc Alted
On 5/2/12 11:16 AM, Wolfgang Kerzendorf wrote:
 Hi all,

 I'm currently writing a code that needs three dimensional data (for the 
 physicists it's dimensions are atom, ion, level). The problem is that not all 
 combinations do exist (a sparse array). Sparse matrices in scipy only deal 
 with two dimensions. The operations that I need to do on those are running 
 functions like exp(item/constant) on all of the items. I also want to sum 
 them up in the last dimension. What's the best way to make a class that takes 
 this kind of data and does the required operations fast. Maybe some 
 phycisists have implemented these things already. Any thoughts?

Curiously enough, I have recently been discussing with Travis O. how
to represent sparse matrices with complete generality.  One of the
possibilities is to use what Travis calls "synthetic dimensions".  The
idea behind it is easy: use a table with as many columns as dimensions,
and add another one for the actual values of the array.  For a 3-D
sparse array, this looks like:

dim0 | dim1 | dim2 | value
==========================
   0 |    0 |    0 | val0
   0 |   10 |  100 | val1
  20 |    5 |  202 | val2

You can use any package that deals with tables for implementing such a
thing.  I'm going to quickly describe a raw implementation of this on
top of carray [1], not only because I'm the author, but also because it
adapts well to the needs you described.

[1] https://github.com/FrancescAlted/carray

Let's start with a small array with shape (5,2).  We are going to use a
dense array for this, mainly for comparison purposes with typical NumPy
arrays, but of course the logic behind this can be extended to
multidimensional sparse arrays with complete generality.

In [1]: import carray as ca

In [2]: import numpy as np

In [3]: syn_dtype = np.dtype([('dim0', np.uint32), ('dim1', np.uint32), 
('value', np.float64)])

In [4]: N = 10

In [6]: ct = ca.fromiter(((i/2, i%2, i*i) for i in xrange(N)), 
dtype=syn_dtype, count=N)

In [7]: ct
Out[7]:
ctable((10,), |V16) nbytes: 160; cbytes: 12.00 KB; ratio: 0.01
   cparams := cparams(clevel=5, shuffle=True)
[(0, 0, 0.0) (0, 1, 1.0) (1, 0, 4.0) (1, 1, 9.0) (2, 0, 16.0) (2, 1, 25.0)
  (3, 0, 36.0) (3, 1, 49.0) (4, 0, 64.0) (4, 1, 81.0)]

Okay, we have our small array. Now, let's apply a function to the
values (in this case log()):

In [8]: ct['value'][:] = ct.eval('log(value)')

In [9]: ct
Out[9]:
ctable((10,), |V16) nbytes: 160; cbytes: 12.00 KB; ratio: 0.01
   cparams := cparams(clevel=5, shuffle=True)
[(0, 0, -inf) (0, 1, 0.0) (1, 0, 1.3862943611198906)
  (1, 1, 2.1972245773362196) (2, 0, 2.772588722239781)
  (2, 1, 3.2188758248682006) (3, 0, 3.58351893845611)
  (3, 1, 3.8918202981106265) (4, 0, 4.1588830833596715)
  (4, 1, 4.394449154672439)]

carray uses numexpr behind the scenes, so these operations are very 
fast.  Also, for functions not supported inside numexpr, carray can also 
make use of the ones in NumPy (although these are typically not as 
efficient).

Let's see how to do sums along different axes.  For this, we will use the
selection capabilities of the ctable object.  Let's do the sum over the
last axis first:

In [10]: [ sum(row.value for row in ct.where('(dim0==%d)' % (i,))) for i 
in range(N/2) ]
Out[10]:
[-inf,
  3.58351893845611,
  5.991464547107982,
  7.475339236566736,
  8.55333223803211]

So, it is just a matter of summing over dim1 while keeping dim0 fixed.
One can check that the results are the same as for NumPy:

In [11]: t = np.fromiter((np.log(i*i) for i in xrange(N)), 
dtype='f8').reshape(N/2,2)

In [12]: t.sum(axis=1)
Out[12]: array([   -inf,  3.58351894,  5.99146455,  7.47533924,  
8.55333224])

Summing over the leading dimension means keeping dim1 fixed:

In [13]: [ sum(row.value for row in ct.where('(dim1==%d)' % (i,))) for i 
in range(2) ]
Out[13]: [-inf, 13.702369854987484]

and again, this is the same as using the `axis=0` parameter:

In [14]: t.sum(axis=0)
Out[14]: array([-inf,  13.70236985])

Summing everything is, as expected, the easiest:

In [15]: sum(row.value for row in ct.iter())
Out[15]: -inf

In [16]: t.sum()
Out[16]: -inf

Of course, the case for more dimensions requires a bit more complexity, 
but nothing fancy (this is left as an exercise for the reader ;).  In 
case you are going to use this in your package, you may want to create 
wrappers that would access the different functionality more easily.

Finally, you should note that I used 4-byte integers for representing 
the dimensions.  If this is not enough, you can use 8-byte integers 
too.  As the carray objects are compressed by default, this usually 
doesn't take a lot of space.  For example, for an array with 1 million 
elements:

In [31]: ct = ca.fromiter(((i/2, i%2, i*i) for i in xrange(N)), 
dtype=syn_dtype, count=N)

In [32]: ct
Out[32]:
ctable((1000000,), |V16) nbytes: 15.26 MB; cbytes: 1.76 MB; ratio: 8.67
   cparams := cparams(clevel=5, shuffle=True)
[(0, 0, 0.0), (0, 1, 1.0), (1, 

Re: [Numpy-discussion] Continuous Integration

2012-05-02 Thread David Froger
 I've been working on setting up a new buildbot for NumPy. Unfortunately, I 
 don't have much time to work on it, so it's slow going! Right now I am still 
 at 
 the stage of getting NumPy to pass all its tests on the machines I'm using as 
 test slaves. After that, I plan to transfer existing slaves to the new setup, 
 and then maybe ask for new volunteer slave machines (if people think the 
 buildbot setup is useful).
Hi,

If there are things one can contribute to help the development of the buildbot
for NumPy, I would be happy to participate!

David


Re: [Numpy-discussion] sparse array data

2012-05-02 Thread Stéfan van der Walt
Hi Francesc

On Wed, May 2, 2012 at 1:53 PM, Francesc Alted franc...@continuum.io wrote:
 and add another one for the actual values of the array.  For a 3-D
 sparse array, this looks like:

 dim0 | dim1 | dim2 | value
 ==========================
    0 |    0 |    0 | val0
    0 |   10 |  100 | val1
   20 |    5 |  202 | val2

What's the distinction between this and a coo_matrix?

Stéfan


Re: [Numpy-discussion] sparse array data

2012-05-02 Thread Nathaniel Smith
On Wed, May 2, 2012 at 9:53 PM, Francesc Alted franc...@continuum.io wrote:
 On 5/2/12 11:16 AM, Wolfgang Kerzendorf wrote:
 Hi all,

 I'm currently writing a code that needs three dimensional data (for the 
 physicists it's dimensions are atom, ion, level). The problem is that not 
 all combinations do exist (a sparse array). Sparse matrices in scipy only 
 deal with two dimensions. The operations that I need to do on those are 
 running functions like exp(item/constant) on all of the items. I also want 
 to sum them up in the last dimension. What's the best way to make a class 
 that takes this kind of data and does the required operations fast. Maybe 
 some phycisists have implemented these things already. Any thoughts?

 Curiously enough, I have recently been discussing with Travis O. about
 how to represent sparse matrices with complete generality.  One of the
 possibilities is to use what Travis call synthetic dimensions.  The
 idea behind it is easy: use a table with as many columns as dimensions,
 and add another one for the actual values of the array.  For a 3-D
 sparse array, this looks like:

 dim0 | dim1 | dim2 | value
 ==========================
    0 |    0 |    0 | val0
    0 |   10 |  100 | val1
   20 |    5 |  202 | val2

This coordinate format is also what's used by the MATLAB Tensor
Toolbox. They have a paper justifying this choice and describing some
tricks for how to work with them:
  http://epubs.siam.org/sisc/resource/1/sjoce3/v30/i1/p205_s1
(Spoiler: you use a lot of sort operations. Conveniently, timsort
appears to be perfectly adapted for their algorithmic requirements.)

I'm not sure why one would make up a new term like synthetic
dimensions though, it's just standard coordinate format...

Though, for the original poster, depending on their exact problem,
they might be better off just using a list or object ndarray of
scipy.sparse matrices. Or some coordinate arrays like above, plus
add.reduceat for the sums they mentioned.
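
As a rough, untested sketch of that coordinate-array route (the column names
follow the original post; the data and the grouping step are made up for
illustration):

import numpy as np

atom  = np.array([0, 0, 0, 1, 1])
ion   = np.array([0, 0, 1, 0, 0])
level = np.array([0, 1, 0, 0, 2])
value = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

constant = 2.0
value = np.exp(value / constant)        # elementwise function on the values

# Sum over the last dimension (level): group rows sharing (atom, ion).
# Assumes the entries are already sorted by (atom, ion), as above.
starts = np.flatnonzero(np.r_[True, (np.diff(atom) != 0) | (np.diff(ion) != 0)])
sums = np.add.reduceat(value, starts)   # one sum per (atom, ion) pair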

-- Nathaniel


Re: [Numpy-discussion] Issue Tracking

2012-05-02 Thread Wes McKinney
On Wed, May 2, 2012 at 9:48 AM, Charles R Harris
charlesr.har...@gmail.com wrote:


 On Tue, May 1, 2012 at 11:47 PM, Ralf Gommers ralf.gomm...@googlemail.com
 wrote:



 On Wed, May 2, 2012 at 1:48 AM, Pauli Virtanen p...@iki.fi wrote:

 On 01.05.2012 21:34, Ralf Gommers wrote:
 [clip]
  At this point it's probably good to look again at the problems we want
  to solve:
  1. responsive user interface (must absolutely have)

 Now that it comes too late: with some luck, I've possibly hit on what
 was ailing the Tracs (max_diff_bytes configured too large). Let's see if
 things work better from now on...


 That's amazing - not only does it not give errors anymore, it's also an
 order of magnitude faster.


 So maybe we could just stick with trac. Performance was really the sticking
 point.

 Chuck




FWIW I'm pretty strongly in favor of GitHub Issues (GHI) for NumPy/SciPy (I am going
to get involved in NumPy dev eventually, promise). While warty in some
of the places already mentioned, I have found it to be very
low-friction and low-annoyance in my own dev process (nearing 1000
issues closed in the last year in pandas). But there are fewer cooks
in the kitchen with pandas so perhaps this experience wouldn't be
identical with NumPy. The biggest benefit I've seen is community
involvement that you really wouldn't see if I were using a Trac or
something else hosted elsewhere. Users are on GitHub and it for some
reason gives people a feeling of engagement in the open source process
that I don't see anywhere else.

- Wes


[Numpy-discussion] record arrays initialization

2012-05-02 Thread Moroney, Catherine M (388D)
Thanks to Perry for some very useful off-list conversation.   I realize that
I wasn't being clear at all in my earlier description of the problem so here it 
is
in a nutshell:

Find the best match in an array t(5000, 7) for a single vector e(7).  Now scale
it up so e is (128, 512, 7) and I want to return a (128, 512) array of the 
t-identifiers
that are the best match for e.  Best match is defined as the minimum 
Euclidean distance.

I'm going to try three ways: (a) brute force and lots of looping in python,
(b) constructing a function to find the match for a single instance of e and
vectorizing it, and (c) coding it in Fortran.  I'll be curious to see the 
performance figures.

Two smaller questions:

A)  How do I most efficiently construct a record array from a single array?
I want to do the following, but it segfaults on me when I try to print b.

vtype = [("x", numpy.ndarray)]
a = numpy.arange(0, 16).reshape(4,4)
b = numpy.recarray((4), dtype=vtype, buf=a)

print a
print b

What is the most efficient way of constructing b from the values of a?  In
real life, a is (128*512*7) and I want b to be (128, 512) with the x component
being a 7-value numpy array.

and

B)  If I'm vectorizing a function (single) to find the best match for
a single element of e within t, how do I pass the entire array t into
the function without having it parcelled down to its individual elements?

i.e.

def single(element, targets):
    nlen = element.shape[0]
    nvec = targets.data.shape[0]
    x = element.reshape(1, nlen).repeat(nvec, axis=0)

    diffs = ((x - targets.data)**2).sum(axis=1)
    diffs = numpy.sqrt(diffs)
    return numpy.argmin(diffs, axis=0)

multiple = numpy.vectorize(single)
x = multiple(all_elements, target)

where all_elements is similar to b in my first example, and target
is a 2-d array.  The above code doesn't work because target gets reduced
to a single element when it gets down to single, and I need to see the
whole array when I'm down in single.

I found a work-around by encapsulating target into a single object and passing
in the object, but I'm curious if there's a better way of doing this.
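
Such a wrapper might look roughly like the following (a sketch only; the
actual work-around wasn't posted, and the class name and sample data here
are made up):

import numpy

class TargetHolder(object):
    def __init__(self, data):
        self.data = data               # the full (ntargets, 7) array

def single(element, targets):
    diffs = numpy.sqrt(((element - targets.data) ** 2).sum(axis=1))
    return numpy.argmin(diffs)

multiple = numpy.vectorize(single)

targets = TargetHolder(numpy.random.uniform(0, 8, (5000, 7)))

# stand-in for the structured/record array of 7-vectors
all_elements = numpy.empty(4, dtype=object)
for i in range(4):
    all_elements[i] = numpy.random.uniform(0, 8, 7)

x = multiple(all_elements, targets)    # targets is passed through whole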

I hope I've explained myself better this time around,

Catherine


Re: [Numpy-discussion] record arrays and vectorizing

2012-05-02 Thread Aronne Merrelli
On Wed, May 2, 2012 at 1:06 PM, Moroney, Catherine M (388D)
catherine.m.moro...@jpl.nasa.gov wrote:
 Hello,

 Can somebody give me some hints as to how to code up this function
 in pure python, rather than dropping down to Fortran?

 I will want to compare a 7-element vector (called element) to a large list 
 of similarly-dimensioned
 vectors (called target, and pick out the vector in target that is the 
 closest to element
 (determined by minimizing the Euclidean distance).

 For instance, in (slow) brute force form it would look like:

 element = numpy.array([1, 2, 3, 4, 5, 6, 7])
 target  = numpy.array(range(0, 49)).reshape(7,7)*0.1

 min_length = .0
 min_index  =
 for i in xrange(0, 7):
   distance = (element-target)**2
   distance = numpy.sqrt(distance.sum())
   if (distance  min_length):
      min_length = distance
      min_index  = i


If you are just trying to find the index to the vector in target
that is closest to element, then I think the default broadcasting
would work fine. Here is an example that should work (the broadcasting
is done for the subtraction element-targets):

In [39]: element = np.arange(1,8)
In [40]: targets = np.random.uniform(0,8,(1000,7))
In [41]: distance_squared = ((element-targets)**2).sum(1)
In [42]: min_index = distance_squared.argmin()
In [43]: element
Out[43]: array([1, 2, 3, 4, 5, 6, 7])
In [44]: targets[min_index,:]
Out[44]:
array([ 1.93625981,  2.56137284,  2.23395169,  4.15215253,  3.96478248,
5.21829915,  5.13049489])

Note - depending on the number of vectors in targets, it might be
better to have everything transposed if you are really worried about
the timing; you'd need to try that for your particular case.

Hope that helps,
Aronne


Re: [Numpy-discussion] Continuous Integration

2012-05-02 Thread Chris Ball
David Froger david.froger at gmail.com writes:
 
  I've been working on setting up a new buildbot for
  NumPy. Unfortunately, I don't have much time to work on it,
  so it's slow going! 
...
 Hi,
 
 If there are things one can contribute to help the development
 of the buildbot for NumPy, I would be happy to participate!

Great! I've sent you a message off the list so we can coordinate further.

Chris




Re: [Numpy-discussion] sparse array data

2012-05-02 Thread Francesc Alted
On 5/2/12 4:07 PM, Stéfan van der Walt wrote:
 Hi Francesc

 On Wed, May 2, 2012 at 1:53 PM, Francesc Alted franc...@continuum.io wrote:
 and add another one for the actual values of the array.  For a 3-D
 sparse array, this looks like:

 dim0 | dim1 | dim2 | value
 ==========================
    0 |    0 |    0 | val0
    0 |   10 |  100 | val1
   20 |    5 |  202 | val2
 What's the distinction between this and a coo_matrix?

Well, as the OP said, coo_matrix does not support dimensions larger than 
2, right?

In [4]: coo_matrix((3,4), dtype=np.int8).todense()
Out[4]:
matrix([[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]], dtype=int8)


In [5]: coo_matrix((2,3,2), dtype=np.int8).todense()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Users/faltet/<ipython-input-5-46b8ae62259f> in <module>()
----> 1 coo_matrix((2,3,2), dtype=np.int8).todense()

/Library/Frameworks/Python.framework/Versions/7.2/lib/python2.7/site-packages/scipy/sparse/coo.py
in __init__(self, arg1, shape, dtype, copy)
    127             obj, ij = arg1
    128         except:
--> 129             raise TypeError('invalid input format')
 130
 131 try:

TypeError: invalid input format

-- 
Francesc Alted



Re: [Numpy-discussion] record arrays initialization

2012-05-02 Thread Stéfan van der Walt
On Wed, May 2, 2012 at 2:45 PM, Moroney, Catherine M (388D)
catherine.m.moro...@jpl.nasa.gov wrote:
 Find the best match in an array t(5000, 7) for a single vector e(7).  Now 
 scale
 it up so e is (128, 512, 7) and I want to return a (128, 512) array of the 
 t-identifiers
 that are the best match for e.  Best match is defined as the minimum 
 Euclidean distance.

 I'm going to try three ways: (a) brute force and lots of looping in python,
 (b) constructing a function to find the match for a single instance of e and
 vectorizing it, and (c) coding it in Fortran.  I'll be curious to see the
 performance figures.

I'd use a mixture of (a) and (b):  break the t(N, 7) up into blocks
of, say, (1000, 7), compute the best match in each using broadcasting,
and then combine your results to find the best of the best.  This
strategy should be best for very large N.  For moderate N, where
broadcasting easily fits into memory, the answer given by the OP to
your original email would do the trick.
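
A rough sketch of the block-and-combine idea (hedged: here it is the
(128, 512, 7) query array that gets chunked rather than the target table,
since for these shapes that keeps the temporary distance arrays small;
names and data are made up):

import numpy as np

e = np.random.uniform(0, 8, (128, 512, 7))    # queries
t = np.random.uniform(0, 8, (5000, 7))        # target table

flat = e.reshape(-1, 7)
best = np.empty(flat.shape[0], dtype=np.intp)

chunk = 256
for start in range(0, flat.shape[0], chunk):
    q = flat[start:start + chunk]                               # (chunk, 7)
    d2 = ((q[:, None, :] - t[None, :, :]) ** 2).sum(axis=-1)    # (chunk, 5000)
    best[start:start + chunk] = d2.argmin(axis=1)

best_ids = best.reshape(128, 512)    # (128, 512) array of t-identifiers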

 A)  How do I most efficiently construct a record array from a single array?
 I want to do the following, but it segfaults on me when i try to print b.

 vtype = [(x, numpy.ndarray)]
 a = numpy.arange(0, 16).reshape(4,4)
 b = numpy.recarray((4), dtype=vtype, buf=a)

I prefer not to use record arrays, and stick to structured arrays:

In [11]: vtype = np.dtype([('x', (np.float, 4))])

In [12]: a = np.arange(16.).reshape((4,4))

In [13]: a.view(vtype)
Out[13]:
array([[([0.0, 1.0, 2.0, 3.0],)],
       [([4.0, 5.0, 6.0, 7.0],)],
       [([8.0, 9.0, 10.0, 11.0],)],
       [([12.0, 13.0, 14.0, 15.0],)]],
      dtype=[('x', 'f8', (4,))])

 B)  If I'm vectorizing a function (single) to find the best match for
 a single element of e within t, how do I pass the entire array t into
 the function without having it parcelled down to its individual elements?

I think the new dtype just makes your life more difficult here.  Simply do:

In [49]: np.sum(a - elements.T, axis=1)
Out[49]: array([  0.,  16.,  32.,  48.])

Stéfan


Re: [Numpy-discussion] sparse array data

2012-05-02 Thread Francesc Alted
On 5/2/12 4:20 PM, Nathaniel Smith wrote:
 On Wed, May 2, 2012 at 9:53 PM, Francesc Alted franc...@continuum.io wrote:
 On 5/2/12 11:16 AM, Wolfgang Kerzendorf wrote:
 Hi all,

 I'm currently writing a code that needs three dimensional data (for the 
 physicists it's dimensions are atom, ion, level). The problem is that not 
 all combinations do exist (a sparse array). Sparse matrices in scipy only 
 deal with two dimensions. The operations that I need to do on those are 
 running functions like exp(item/constant) on all of the items. I also want 
 to sum them up in the last dimension. What's the best way to make a class 
 that takes this kind of data and does the required operations fast. Maybe 
 some phycisists have implemented these things already. Any thoughts?
 Curiously enough, I have recently been discussing with Travis O. about
 how to represent sparse matrices with complete generality.  One of the
 possibilities is to use what Travis call synthetic dimensions.  The
 idea behind it is easy: use a table with as many columns as dimensions,
 and add another one for the actual values of the array.  For a 3-D
 sparse array, this looks like:

 dim0 | dim1 | dim2 | value
 ==========================
    0 |    0 |    0 | val0
    0 |   10 |  100 | val1
   20 |    5 |  202 | val2
 This coordinate format is also what's used by the MATLAB Tensor
 Toolbox. They have a paper justifying this choice and describing some
 tricks for how to work with them:
http://epubs.siam.org/sisc/resource/1/sjoce3/v30/i1/p205_s1
 (Spoiler: you use a lot of sort operations. Conveniently, timsort
 appears to be perfectly adapted for their algorithmic requirements.)

Uh, I do not have access to the article, but yes, sorting makes a lot of 
sense for these scenarios (this is why I was suggesting using indexes, 
which is sort of sorting too).


 I'm not sure why one would make up a new term like synthetic
 dimensions though, it's just standard coordinate format...

Yeah, this seems a bit weird.  Perhaps Travis was referring to other 
things and I mixed concepts.  Anyways.

-- 
Francesc Alted



Re: [Numpy-discussion] record arrays initialization

2012-05-02 Thread Kevin Jacobs jac...@bioinformed.com
On Wed, May 2, 2012 at 5:45 PM, Moroney, Catherine M (388D) 
catherine.m.moro...@jpl.nasa.gov wrote:

 Thanks to Perry for some very useful off-list conversation.   I realize
 that
 I wasn't being clear at all in my earlier description of the problem so
 here it is
 in a nutshell:

 Find the best match in an array t(5000, 7) for a single vector e(7).  Now
 scale
 it up so e is (128, 512, 7) and I want to return a (128, 512) array of the
 t-identifiers
 that are the best match for e.  Best match is defined as the minimum
 Euclidean distance.


It sounds like you want to find the nearest neighbor to a point in a
high-dimensional space. This sounds like a job for a spatial data structure
like a KD-tree.  See:

http://docs.scipy.org/doc/scipy/reference/spatial.html
http://mloss.org/software/view/143/
http://www.mrzv.org/software/pyANN/
etc.
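
A minimal sketch with scipy.spatial.cKDTree, using the shapes from the
original post (variable names and data are made up):

import numpy as np
from scipy.spatial import cKDTree

t = np.random.uniform(0, 8, (5000, 7))        # target table
e = np.random.uniform(0, 8, (128, 512, 7))    # query array

tree = cKDTree(t)
dist, idx = tree.query(e.reshape(-1, 7))      # nearest target per query vector
best_ids = idx.reshape(128, 512)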

-Kevin


Re: [Numpy-discussion] sparse array data

2012-05-02 Thread Stéfan van der Walt
On Wed, May 2, 2012 at 3:20 PM, Francesc Alted franc...@continuum.io wrote:
 On 5/2/12 4:07 PM, Stéfan van der Walt wrote:
 Well, as the OP said, coo_matrix does not support dimensions larger than
 2, right?

That's just an implementation detail, I would imagine--I'm trying to
figure out if there is a new principle behind synthetic dimensions?
By the way, David Cournapeau mentioned using b-trees for sparse ops a
while ago; did you ever talk to him about those ideas?

BTW, this coo-type storage is used in Stanford's probabilistic
graphical models course, but for dense data (like we have in the
course) it's a pain.  Writing code in both Octave and Python, I again
came to realize what a very elegant N-dimensional structure the numpy
array exposes!

Stéfan


Re: [Numpy-discussion] sparse array data

2012-05-02 Thread Francesc Alted
On 5/2/12 5:28 PM, Stéfan van der Walt wrote:
 On Wed, May 2, 2012 at 3:20 PM, Francesc Alted franc...@continuum.io wrote:
 On 5/2/12 4:07 PM, Stéfan van der Walt wrote:
 Well, as the OP said, coo_matrix does not support dimensions larger than
 2, right?
 That's just an implementation detail, I would imagine--I'm trying to
 figure out if there is a new principle behind synthetic dimensions?

No, no new things under the sun.  We were just talking about many 
different things and this suddenly appeared in our talk.  Nothing really 
serious.

 By the way, David Cournapeau mentioned using b-trees for sparse ops a
 while ago; did you ever talk to him about those ideas?

Yup, the b-tree idea fits very well for indexing the coordinates.  
Although one problem with b-trees is that they do not compress well in 
general.

-- 
Francesc Alted



Re: [Numpy-discussion] Test failures - which dependencies am I missing?

2012-05-02 Thread Chris Ball
Chris Ball ceball at gmail.com writes:

 
 Keith Hughitt keith.hughitt at gmail.com writes:
 
  Hi Chris,
  
  Try sudo apt-get build-dep python-numpy to install the dependencies for
  building NumPy. I believe it will install all of the optional dependencies 
  as well.
 
 Thanks for that, but I'd already tried it and found the same failures. 
 
 However, I also found that on my version of Ubuntu (10.04 LTS), which includes
 NumPy 1.3.0, running numpy.test(verbose=3) yielded the following:

(Above, numpy is the Ubuntu-supplied numpy in case that wasn't clear.)
 
 nose.config: INFO: Excluding tests matching 
['f2py_ext','f2py_f90_ext','gen_ext',
 'pyrex_ext', 'swig_ext', 'array_from_pyobj']

I've discovered that numpy itself explicitly excludes these tests (in 
numpy/testing/nosetester.py):

# Stuff to exclude from tests. These are from numpy.distutils   
 
excludes = ['f2py_ext',
'f2py_f90_ext',
'gen_ext',
'pyrex_ext',
'swig_ext']

So, all is explained now.

Chris



Re: [Numpy-discussion] sparse array data

2012-05-02 Thread Nathaniel Smith
On Wed, May 2, 2012 at 11:26 PM, Francesc Alted franc...@continuum.io wrote:
 On 5/2/12 4:20 PM, Nathaniel Smith wrote:
 On Wed, May 2, 2012 at 9:53 PM, Francesc Alted franc...@continuum.io wrote:
 On 5/2/12 11:16 AM, Wolfgang Kerzendorf wrote:
 Hi all,

 I'm currently writing a code that needs three dimensional data (for the 
 physicists it's dimensions are atom, ion, level). The problem is that not 
 all combinations do exist (a sparse array). Sparse matrices in scipy only 
 deal with two dimensions. The operations that I need to do on those are 
 running functions like exp(item/constant) on all of the items. I also want 
 to sum them up in the last dimension. What's the best way to make a class 
 that takes this kind of data and does the required operations fast. Maybe 
 some phycisists have implemented these things already. Any thoughts?
 Curiously enough, I have recently been discussing with Travis O. about
 how to represent sparse matrices with complete generality.  One of the
 possibilities is to use what Travis call synthetic dimensions.  The
 idea behind it is easy: use a table with as many columns as dimensions,
 and add another one for the actual values of the array.  For a 3-D
 sparse array, this looks like:

 dim0 | dim1 | dim2 | value
 ==========================
    0 |    0 |    0 | val0
    0 |   10 |  100 | val1
   20 |    5 |  202 | val2
 This coordinate format is also what's used by the MATLAB Tensor
 Toolbox. They have a paper justifying this choice and describing some
 tricks for how to work with them:
    http://epubs.siam.org/sisc/resource/1/sjoce3/v30/i1/p205_s1
 (Spoiler: you use a lot of sort operations. Conveniently, timsort
 appears to be perfectly adapted for their algorithmic requirements.)

 Uh, I do not have access to the article, but yes, sorting makes a lot of
 sense for these scenarios (this is why I was suggesting using indexes,
 which is sort of sorting too).

Doh, sorry. Sticking the title into google scholar gives me this:
  http://csmr.ca.sandia.gov/~tgkolda/pubs/bibtgkfiles/SIAM-67648.pdf

- N


Re: [Numpy-discussion] record arrays initialization

2012-05-02 Thread Aronne Merrelli
On Wed, May 2, 2012 at 5:27 PM, Kevin Jacobs jac...@bioinformed.com
bioinfor...@gmail.com wrote:
 On Wed, May 2, 2012 at 5:45 PM, Moroney, Catherine M (388D)
 catherine.m.moro...@jpl.nasa.gov wrote:

 Thanks to Perry for some very useful off-list conversation.   I realize
 that
 I wasn't being clear at all in my earlier description of the problem so
 here it is
 in a nutshell:

 Find the best match in an array t(5000, 7) for a single vector e(7).  Now
 scale
 it up so e is (128, 512, 7) and I want to return a (128, 512) array of the
 t-identifiers
 that are the best match for e.  Best match is defined as the minimum
 Euclidean distance.


 It sounds like you want to find the nearest neighbor to a point in a
 high-dimensional space. This sounds like a job for a spacial data structure
 like a KD-tree.  See:

 http://docs.scipy.org/doc/scipy/reference/spatial.html
 http://mloss.org/software/view/143/
 http://www.mrzv.org/software/pyANN/
 etc.

In general this is a good suggestion - I was going to mention it
earlier - but I think for this particular problem it is not better
than the brute force and argmin() NumPy approach. On my laptop, the
KDTree query is about a factor of 7 slower (ignoring the time cost to
create the KDTree)

In [45]: def foo1(element, targets):
    ...:     distance_squared = ((element-targets)**2).sum(1)
    ...:     min_index = distance_squared.argmin()
    ...:     return sqrt(distance_squared[min_index]), min_index
    ...:

In [46]: def foo2(element, T):
    ...:     return T.query(element)


In [47]: element = np.arange(1,8)
In [48]: targets = np.random.uniform(0,8,(5000,7))
In [49]: T = scipy.spatial.KDTree(targets)
In [50]: %timeit foo1(element, targets)
1000 loops, best of 3: 401 us per loop
In [51]: %timeit foo2(element, T)
100 loops, best of 3: 2.92 ms per loop

Just to make sure they say the same thing:

In [53]: foo1(element, targets)
Out[53]: (1.8173671152898632, 570)
In [54]: foo2(element, T)
Out[54]: (1.8173671152898632, 570)


I think KDTree is more optimal for larger searches (more than 5000
elements), and fewer dimensions. For example, with 500,000 elements
and 2 dimensions, I get 34 ms for NumPy and 2 ms for the KDtree.

Back to the original question, for 400 us per search, even over
128x512 elements that would be 26 seconds total, I think? That might
not be too bad.

Cheers,
Aronne


[Numpy-discussion] record arrays initialization

2012-05-02 Thread Moroney, Catherine M (388D)


On May 2, 2012, at 3:23 PM, numpy-discussion-requ...@scipy.org
 wrote:

 A)  How do I most efficiently construct a record array from a single array?
 I want to do the following, but it segfaults on me when i try to print b.
 
 vtype = [(x, numpy.ndarray)]
 a = numpy.arange(0, 16).reshape(4,4)
 b = numpy.recarray((4), dtype=vtype, buf=a)
 
 I prefer not to use record arrays, and stick to structured arrays:
 
 In [11]: vtype = np.dtype([('x', (np.float, 4))])
 
 In [12]: a = np.arange(16.).reshape((4,4))
 
 In [13]: a.view(vtype)
 Out[13]:
 array([[([0.0, 1.0, 2.0, 3.0],)],
   [([4.0, 5.0, 6.0, 7.0],)],
   [([8.0, 9.0, 10.0, 11.0],)],
   [([12.0, 13.0, 14.0, 15.0],)]],
  dtype=[('x', 'f8', (4,))])

Using structured arrays is making my code complex when I try to call the
vectorized function.  If I stick to the original record arrays, what's the
best way of initializing b from a without doing a row-by-row copy?

Catherine


Re: [Numpy-discussion] record arrays initialization

2012-05-02 Thread Kevin Jacobs jac...@bioinformed.com
On Wed, May 2, 2012 at 7:25 PM, Aronne Merrelli
aronne.merre...@gmail.comwrote:

 In general this is a good suggestion - I was going to mention it
 earlier - but I think for this particular problem it is not better
 than the brute force and argmin() NumPy approach. On my laptop, the
 KDTree query is about a factor of 7 slower (ignoring the time cost to
 create the KDTree)


The cKDTree implementation is more than 4 times faster than the brute-force
approach:

T = scipy.spatial.cKDTree(targets)

In [11]: %timeit foo1(element, targets)   # Brute force
1000 loops, best of 3: 385 us per loop

In [12]: %timeit foo2(element, T) # cKDTree
1 loops, best of 3: 83.5 us per loop

In [13]: 385/83.5
Out[13]: 4.610778443113772


A FLANN implementation should be even faster--perhaps by as much as another
factor of two.

-Kevin


Re: [Numpy-discussion] sparse array data

2012-05-02 Thread Travis Oliphant

On May 2, 2012, at 5:28 PM, Stéfan van der Walt wrote:

 On Wed, May 2, 2012 at 3:20 PM, Francesc Alted franc...@continuum.io wrote:
 On 5/2/12 4:07 PM, Stéfan van der Walt wrote:
 Well, as the OP said, coo_matrix does not support dimensions larger than
 2, right?
 
 That's just an implementation detail, I would imagine--I'm trying to
 figure out if there is a new principle behind synthetic dimensions?
 By the way, David Cournapeau mentioned using b-trees for sparse ops a
 while ago; did you ever talk to him about those ideas?

The only new principle (which is not strictly new --- but new to NumPy's 
world-view) is using one (or more) fields of a structured array as synthetic 
dimensions which replace 1 or more of the raw table dimensions. Thus, you 
could create a view of a NumPy array (or a group of NumPy arrays) where 1 or 
more dimensions is replaced with these sparse dimensions.  This is a 
fully-general way to handle a mixture of sparse and dense structures in one 
general array interface.  

However, you lose the O(1) lookup as now you must search for the non-zero items 
in order to implement algorithms (indexes are critical and Francesc has some 
nice indexes in PyTables).  
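
As a toy illustration of that lookup cost (this is not the proposed
interface; the packed key below is an assumption that only works while the
coordinates stay under the chosen radix):

import numpy as np

dims = np.array([(0, 0, 0), (0, 10, 100), (20, 5, 202)], dtype=np.int64)
vals = np.array([1.0, 2.0, 3.0])

key = dims[:, 0] * 10**6 + dims[:, 1] * 10**3 + dims[:, 2]
order = np.argsort(key)
key_sorted, vals_sorted = key[order], vals[order]

def lookup(d0, d1, d2):
    # O(log n) search of the sorted index instead of O(1) strided access
    k = d0 * 10**6 + d1 * 10**3 + d2
    pos = np.searchsorted(key_sorted, k)
    if pos < len(key_sorted) and key_sorted[pos] == k:
        return vals_sorted[pos]
    return 0.0                         # missing entries are implicit zeros

print(lookup(0, 10, 100))              # 2.0
print(lookup(1, 1, 1))                 # 0.0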

A group-by operation can be replaced by an operation on a sparse dimension 
where you have mapped attributes to 1 or more dimensions in the underlying 
array. 

coo_matrix is just a special case of this more general idea.  If you add the
ability to compress attributes, then you get csr, csc, and various other forms
of matrices as well.

More to come.  If you are interested in this sort of thing, please let me
know.


-Travis




Re: [Numpy-discussion] record arrays initialization

2012-05-02 Thread Stéfan van der Walt
On Wed, May 2, 2012 at 4:26 PM, Moroney, Catherine M (388D)
catherine.m.moro...@jpl.nasa.gov wrote:
 Using structured arrays is making my code complex when I try to call the
 vectorized function.  If I stick to the original record arrays, what's the
 best way of initializing b from a without doing an row-by-row copy?

What does your original data look like?  It seems like `a` is already
what you need after the reshape?

Stéfan


Re: [Numpy-discussion] record arrays initialization

2012-05-02 Thread Stéfan van der Walt
On Wed, May 2, 2012 at 4:46 PM, Kevin Jacobs jac...@bioinformed.com
bioinfor...@gmail.com wrote:
 A FLANN implementation should be even faster--perhaps by as much as another
 factor of two.

I guess it depends on whether you care about the "Approximate" in
"Fast Library for Approximate Nearest Neighbors".

Stéfan


Re: [Numpy-discussion] record arrays initialization

2012-05-02 Thread Benjamin Root
On Wednesday, May 2, 2012, Stéfan van der Walt wrote:

 On Wed, May 2, 2012 at 4:46 PM, Kevin Jacobs jac...@bioinformed.com
 bioinfor...@gmail.com wrote:
  A FLANN implementation should be even faster--perhaps by as much as
 another
  factor of two.

 I guess it depends on whether you care about the Approximate in
 Fast Library for Approximate Nearest Neighbors.

 Stéfan


This is why I love following these lists!  I don't think I ever would have
come across this method on my own. Nifty!

Ben Root


Re: [Numpy-discussion] record arrays initialization

2012-05-02 Thread Aronne Merrelli
On Wed, May 2, 2012 at 6:46 PM, Kevin Jacobs jac...@bioinformed.com
bioinfor...@gmail.com wrote:
 On Wed, May 2, 2012 at 7:25 PM, Aronne Merrelli aronne.merre...@gmail.com
 wrote:

 In general this is a good suggestion - I was going to mention it
 earlier - but I think for this particular problem it is not better
 than the brute force and argmin() NumPy approach. On my laptop, the
 KDTree query is about a factor of 7 slower (ignoring the time cost to
 create the KDTree)


 The cKDTree implementation is more than 4 times faster than the brute-force
 approach:

 T = scipy.spatial.cKDTree(targets)

 In [11]: %timeit foo1(element, targets)   # Brute force
 1000 loops, best of 3: 385 us per loop

 In [12]: %timeit foo2(element, T)         # cKDTree
 1 loops, best of 3: 83.5 us per loop

 In [13]: 385/83.5
 Out[13]: 4.610778443113772

Wow, not sure how I missed that! It even seems to scale better than
linear (some of that 83us is call overhead, I guess):

In [35]: %timeit foo2(element, T)
1 loops, best of 3: 115 us per loop
In [36]: elements = np.random.uniform(0,8,[128,512,7])
In [37]: %timeit foo2(elements.reshape((128*512,7)), T)
1 loops, best of 3: 2.66 s per loop

So only 2.7 seconds to search the whole set. Not bad!

Cheers,
Aronne


Re: [Numpy-discussion] sparse array data

2012-05-02 Thread Stéfan van der Walt
On Wed, May 2, 2012 at 6:25 PM, Travis Oliphant tra...@continuum.io wrote:
 The only new principle (which is not strictly new --- but new to NumPy's 
 world-view) is using one (or more) fields of a structured array as synthetic 
 dimensions which replace 1 or more of the raw table dimensions.

Ah, thanks--that's the detail I was missing.  I wonder if the
contiguity requirement will hamper us here, though.  E.g., I could
imagine that some tree structure might be more suitable to storing and
organizing indices, and for large arrays we wouldn't like to make a
copy for each operation.  I guess we can't wait for discontiguous
arrays to come along, though :)

 More to come  If you are interested in this sort of thing please let me 
 know

Definitely--if we can optimize this machinery it will be beneficial to
scipy.sparse as well.

Stéfan


Re: [Numpy-discussion] sparse array data

2012-05-02 Thread Charles R Harris
On Wed, May 2, 2012 at 3:20 PM, Nathaniel Smith n...@pobox.com wrote:

 On Wed, May 2, 2012 at 9:53 PM, Francesc Alted franc...@continuum.io
 wrote:
  On 5/2/12 11:16 AM, Wolfgang Kerzendorf wrote:
  Hi all,
 
  I'm currently writing a code that needs three dimensional data (for the
 physicists it's dimensions are atom, ion, level). The problem is that not
 all combinations do exist (a sparse array). Sparse matrices in scipy only
 deal with two dimensions. The operations that I need to do on those are
 running functions like exp(item/constant) on all of the items. I also want
 to sum them up in the last dimension. What's the best way to make a class
 that takes this kind of data and does the required operations fast. Maybe
 some phycisists have implemented these things already. Any thoughts?
 
  Curiously enough, I have recently been discussing with Travis O. about
  how to represent sparse matrices with complete generality.  One of the
  possibilities is to use what Travis call synthetic dimensions.  The
  idea behind it is easy: use a table with as many columns as dimensions,
  and add another one for the actual values of the array.  For a 3-D
  sparse array, this looks like:
 
  dim0 | dim1 | dim2 | value
  ==========================
     0 |    0 |    0 | val0
     0 |   10 |  100 | val1
    20 |    5 |  202 | val2

 This coordinate format is also what's used by the MATLAB Tensor
 Toolbox. They have a paper justifying this choice and describing some
 tricks for how to work with them:
  http://epubs.siam.org/sisc/resource/1/sjoce3/v30/i1/p205_s1
 (Spoiler: you use a lot of sort operations. Conveniently, timsort
 appears to be perfectly adapted for their algorithmic requirements.)


Timsort probably isn't the best choice here; it is optimized for python
lists of python objects, where there is at least one level of indirection
and compares are expensive, even more expensive for compound objects. If
the coordinates are stored in numpy structured arrays, an ordinary sort is
likely to be faster. Lexsort might even be faster, as it could access
aligned integer data and not have to move lists of indexes around.
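
For instance, a lexsort over separate coordinate columns might look like
this (a sketch with made-up data; the last key passed is the primary one):

import numpy as np

dim0 = np.array([20, 0, 0])
dim1 = np.array([5, 10, 0])
dim2 = np.array([202, 100, 0])
value = np.array([3.0, 2.0, 1.0])

order = np.lexsort((dim2, dim1, dim0))    # sort by dim0, then dim1, then dim2
print(dim0[order], dim1[order], dim2[order], value[order])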

I'm not sure why one would make up a new term like synthetic
 dimensions though, it's just standard coordinate format...

 Though, for the original poster, depending on their exact problem,
 they might be better off just using a list or object ndarray of
 scipy.sparse matrices. Or some coordinate arrays like above, plus
 add.reduceat for the sums they mentioned.


Chuck


Re: [Numpy-discussion] sparse array data

2012-05-02 Thread Travis Oliphant

On May 2, 2012, at 10:03 PM, Stéfan van der Walt wrote:

 On Wed, May 2, 2012 at 6:25 PM, Travis Oliphant tra...@continuum.io wrote:
 The only new principle (which is not strictly new --- but new to NumPy's 
 world-view) is using one (or more) fields of a structured array as 
 synthetic dimensions which replace 1 or more of the raw table dimensions.
 
 Ah, thanks--that's the detail I was missing.  I wonder if the
 contiguity requirement will hamper us here, though.  E.g., I could
 imagine that some tree structure might be more suitable to storing and
 organizing indices, and for large arrays we wouldn't like to make a
 copy for each operation.  I guess we can't wait for discontiguous
 arrays to come along, though :)

Actually, it's better to keep the actual data together as much as possible, I 
think, and simulate a tree structure with a layer on top --- i.e. an index.

Different algorithms will prefer different orderings of the underlying data 
just as today different algorithms prefer different striding patterns on the 
standard, strided view of a dense array. 

-Travis


 
 More to come  If you are interested in this sort of thing please let me 
 know
 
 Definitely--if we can optimize this machinery it will be beneficial to
 scipy.sparse as well.
 
 Stéfan