timlash wrote:
> Still fairly new to Python. I wrote a program that uses a class
> called RectangularArray, defined as follows:
>
>     class RectangularArray:
>         def __init__(self, rows, cols, value=0):
>             self.arr = [None]*rows
>             self.row = [value]*cols
>         def __getitem__(self, (i, j)):
>             return (self.arr[i] or self.row)[j]
>         def __setitem__(self, (i, j), value):
>             if self.arr[i] == None: self.arr[i] = self.row[:]
>             self.arr[i][j] = value
>
> I found this class in a 14-year-old post:
> http://www.python.org/search/hypermail/python-recent/0106.html
>
> It worked great and let me process a few hundred thousand data
> points with relative ease. However, I soon wanted to sort arbitrary
> portions of my arrays and to transpose others, so I turned to NumPy
> rather than reinventing the wheel with custom methods in the
> serviceable RectangularArray class. Once I refactored with NumPy,
> I was surprised to find that the execution time of my program
> doubled! I expected a purpose-built array module to be more
> efficient, not less.
>
> I'm not doing any linear algebra with my data. I'm working with
> rectangular datasets, evaluating individual rows, and grouping,
> sorting and summarizing various subsets of rows.
>
> Is a NumPy implementation overkill for my data-handling needs?
> Should I evaluate older array modules such as Numeric or Numarray?
> Are there any other modules suited to handling tabular data? Or
> would I be best off extending the RectangularArray class with the
> few data-transformation methods I need?
>
> Any guidance or suggestions would be greatly appreciated!
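Note that the quoted class uses Python 2 tuple parameters (`def __getitem__(self, (i, j))`), which were removed in Python 3 by PEP 3113. A rough sketch of a Python 3 port, keeping the original's trick of sharing one default row until a row is first written to, might look like this (same class name as the quoted post; the print at the end is just a demo):

```python
class RectangularArray:
    """Python 3 port of the quoted RectangularArray (a sketch)."""

    def __init__(self, rows, cols, value=0):
        self.arr = [None] * rows      # one slot per row; None means "still default"
        self.row = [value] * cols     # shared prototype row of default values

    def __getitem__(self, index):
        i, j = index                  # a[i, j] passes the tuple (i, j) here
        return (self.arr[i] or self.row)[j]

    def __setitem__(self, index, value):
        i, j = index
        if self.arr[i] is None:       # copy the prototype only on first write
            self.arr[i] = self.row[:]
        self.arr[i][j] = value


a = RectangularArray(1000, 100)
a[3, 7] = 42
print(a[3, 7], a[0, 0])  # -> 42 0
```

The lazy row allocation is the likely source of the speed the poster saw: rows that are never written cost a single shared list, whereas a dense NumPy array allocates and fills every cell up front.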
Do you have many rows with zeros? Rows that are never written to cost
almost nothing in your self-made class, which might be the reason it
outperforms a dense NumPy array. Googling for "numpy sparse" finds:
http://www.scipy.org/SciPy_Tutorial

Maybe one of the sparse matrix implementations in scipy works for you.

Peter
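The idea behind the sparse suggestion can be sketched without scipy: store only the cells that are actually written, keyed by `(row, col)` in a dict. This is essentially what `scipy.sparse.dok_matrix` does; the `SparseGrid` class below is an illustrative name, not a real library API:

```python
class SparseGrid:
    """Minimal dictionary-of-keys sparse grid (a sketch, not scipy)."""

    def __init__(self, rows, cols, default=0):
        self.shape = (rows, cols)
        self.default = default
        self.cells = {}               # (i, j) -> value, only for written cells

    def __getitem__(self, index):
        return self.cells.get(index, self.default)

    def __setitem__(self, index, value):
        if value == self.default:
            self.cells.pop(index, None)   # writing the default stores nothing
        else:
            self.cells[index] = value


g = SparseGrid(100000, 1000)          # huge logical shape, tiny real footprint
g[5, 5] = 3.14
print(g[5, 5], g[0, 0], len(g.cells))  # -> 3.14 0 1
```

For real workloads the scipy formats (dok for incremental writes, lil for row-wise edits, csr/csc for arithmetic) would be the better-tested choice; this sketch only shows why sparsity can beat a dense array when most entries stay at the default.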