[Numpy-discussion] lazy loading ndarrays

2011-07-26 Thread Craig Yoshioka
I want to subclass ndarray to create a class for image and volume data, and 
when referencing a file I'd like to have it load the data only when accessed.  
That way the class can be used to quickly set and manipulate header values, and 
won't load data unless necessary.  What is the best way to do this?  Are there 
any hooks I can use to load the data when an array's values are first accessed 
or manipulated?  I tried some trickery with __array_interface__ but couldn't 
get it to work very well.  Should I just use a memmapped array, and give up on 
a purely 'lazy' approach?

Thanks, and cheers!
-Craig
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] lazy loading ndarrays

2011-07-26 Thread Matthew Brett
Hi,

On Tue, Jul 26, 2011 at 5:11 PM, Craig Yoshioka crai...@me.com wrote:
> I want to subclass ndarray to create a class for image and volume data, and
> when referencing a file I'd like to have it load the data only when accessed.
> That way the class can be used to quickly set and manipulate header values,
> and won't load data unless necessary.  What is the best way to do this?  Are
> there any hooks I can use to load the data when an array's values are first
> accessed or manipulated?  I tried some trickery with __array_interface__ but
> couldn't get it to work very well.  Should I just use a memmapped array, and
> give up on a purely 'lazy' approach?

What kind of images are you loading?  We do lazy loading in nibabel,
for medical imaging formats:

http://nipy.sourceforge.net/nibabel/

- but our images _have_ arrays and headers, rather than (appearing to
be) arrays.  Thus something like:

import nibabel as nib

img = nib.load('my_image.img')
# data not loaded at this point
data = img.get_data()
# data loaded now.  Maybe memmapped if the format allows

If you think you might have similar needs, I'd be very happy to help
you get going in nibabel...

Best,

Matthew


Re: [Numpy-discussion] lazy loading ndarrays

2011-07-26 Thread Joe Kington
Similar to what Matthew said, I often find that it's cleaner to make a
separate class with a data (or somesuch) property that lazily loads the
numpy array.

For example, something like:

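class DataFormat(object):
    # _read_header() and _read_data() are left for the concrete format
    # to implement; they are not defined here.
    def __init__(self, filename):
        self.filename = filename
        # Cheap: only the header is read when the object is created.
        # .items() works on both Python 2 and Python 3.
        for key, value in self._read_header().items():
            setattr(self, key, value)

    @property
    def data(self):
        try:
            return self._data
        except AttributeError:
            # First access: read the array and cache it on the instance.
            self._data = self._read_data()
            return self._data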
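
As a rough, runnable sketch of how a class like the one above might be
used: ToyFormat, its header fields and the read counter are hypothetical
stand-ins for a real file reader, and the data is synthesized rather than
read from disk.

```python
import numpy as np


class DataFormat(object):
    def __init__(self, filename):
        self.filename = filename
        for key, value in self._read_header().items():
            setattr(self, key, value)

    @property
    def data(self):
        try:
            return self._data
        except AttributeError:
            self._data = self._read_data()
            return self._data


class ToyFormat(DataFormat):
    """Hypothetical format: fixed header, zeros in place of real file I/O."""

    reads = 0  # counts how often the (pretend) expensive read runs

    def _read_header(self):
        return {"shape": (2, 3), "voxel_size": 1.5}

    def _read_data(self):
        ToyFormat.reads += 1
        return np.zeros(self.shape)


img = ToyFormat("example.img")   # header attributes are set, no data read yet
reads_before = ToyFormat.reads
first = img.data                 # triggers the one and only read
second = img.data                # returns the cached array
```

Header values can be inspected or changed freely before the first access
to img.data; only that access pays the cost of reading.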

Hope that helps,
-Joe



Re: [Numpy-discussion] lazy loading ndarrays

2011-07-26 Thread Nadav Horesh
For lazy data loading I use memory-mapped arrays (numpy.memmap) to process
multi-image files that are much larger than the available RAM.
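
A minimal sketch of that approach, assuming a headerless raw file of
float32 frames; the file name, shape and dtype here are made up for
illustration, and a tiny temporary file stands in for a large one.

```python
import os
import tempfile

import numpy as np

# Write a small stack of "images" to a raw binary file.
shape = (4, 8, 8)  # 4 frames of 8x8 pixels (hypothetical layout)
path = os.path.join(tempfile.mkdtemp(), "stack.raw")
np.arange(np.prod(shape), dtype=np.float32).reshape(shape).tofile(path)

# Map the file read-only; pixel data is paged in only when a slice is touched.
stack = np.memmap(path, dtype=np.float32, mode="r", shape=shape)

# Work one frame at a time so the whole file never has to fit in RAM.
frame_means = [float(frame.mean()) for frame in stack]
```

For a file with a header, the offset argument of numpy.memmap can be used
to skip the header bytes.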

   Nadav.


From: numpy-discussion-boun...@scipy.org [numpy-discussion-boun...@scipy.org] 
On Behalf Of Craig Yoshioka [crai...@me.com]
Sent: 27 July 2011 05:41
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] lazy loading ndarrays

OK, that was an alternative strategy I was going to try, but it's not my
favorite: I'd have to explicitly perform all operations on the data portion
of the object and, given numpy's mechanics, assignment would also have to be
explicit, and creating new image objects implicitly would be trickier:

image3 = Image(image1)
image3.data = ( image1.data + 19.0 ) * image2.data

vs.

image3 = ( image1 + 19 ) * image2

I suppose the first option isn't that bad, though, and getting lazy loading
would be very straightforward.
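
One possible way to get the second spelling without subclassing ndarray is
a thin wrapper that loads on first use and forwards arithmetic to the
underlying array. This is only a sketch under stated assumptions:
LazyImage, its loader argument and its header dict are hypothetical names,
and only + and * are implemented.

```python
import numpy as np


class LazyImage:
    """Hypothetical container: header available at once, pixels on first use."""

    def __init__(self, loader, header=None):
        self._loader = loader            # zero-argument callable returning an array
        self._data = None
        self.header = dict(header or {})

    @property
    def data(self):
        if self._data is None:           # first access triggers the load
            self._data = np.asarray(self._loader())
        return self._data

    def __array__(self, dtype=None, copy=None):
        # Lets np.asarray(image) and NumPy functions see the pixel data.
        return self.data if dtype is None else self.data.astype(dtype)

    def _wrap(self, result):
        # The result is already in memory; it gets a copy of this header.
        return LazyImage(lambda: result, self.header)

    def __add__(self, other):
        return self._wrap(self.data + np.asarray(other))

    __radd__ = __add__

    def __mul__(self, other):
        return self._wrap(self.data * np.asarray(other))

    __rmul__ = __mul__


loads = []  # records which loaders have run


def load_first():
    loads.append("image1")
    return np.array([1.0, 2.0])


image1 = LazyImage(load_first, {"voxel_size": 1.5})
image2 = LazyImage(lambda: np.array([2.0, 3.0]))
loads_before = list(loads)            # nothing has been loaded yet
image3 = (image1 + 19) * image2       # loads both operands on demand
```

A full version would need the remaining operators, indexing and some rule
for merging headers, which is where most of the work in this design lies.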

--

On a side note, I prefer this construct for lazy operations. I'm curious to
see what people's reactions are (e.g. "that's horrible!"):

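class lazy_property(object):
    '''
    Meant to be used for lazy evaluation of object attributes.
    Should represent a non-mutable return value, as whatever is returned
    replaces the property permanently.
    '''

    def __init__(self, fget):
        self.fget = fget

    def __get__(self, obj, cls):
        if obj is None:
            # Accessed on the class rather than on an instance.
            return self
        value = self.fget(obj)
        # Shadow this descriptor with a plain instance attribute so that
        # later lookups never reach __get__ again.
        # (__name__ works on Python 2 and 3; func_name is Python 2 only.)
        setattr(obj, self.fget.__name__, value)
        return value


class DataFormat(object):
    def __init__(self, loader):
        self.loadData = loader

    @lazy_property
    def data(self):
        return self.loadData()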
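
On Python 3.8 and later the standard library provides
functools.cached_property, which behaves much like the lazy_property
descriptor above: the first access runs the function and stores the result
in the instance's __dict__, so later accesses bypass the descriptor. A
minimal sketch, where the loader and the call log are made up for
illustration:

```python
from functools import cached_property


class DataFormat:
    def __init__(self, loader):
        self.load_data = loader  # zero-argument callable supplied by the caller

    @cached_property
    def data(self):
        return self.load_data()


calls = []  # records each time the (hypothetical) loader runs


def fake_loader():
    calls.append("load")
    return [1, 2, 3]  # stand-in for an array read from disk


fmt = DataFormat(fake_loader)
first = fmt.data    # runs fake_loader
second = fmt.data   # served from fmt.__dict__, loader not called again
```

Deleting the attribute (del fmt.data) drops the cached value, so the next
access calls the loader again.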


