We're using HDF5 as the base file format for our data, and I've been
trying to write a reader/writer library in Python that hides all the
PyTables details (so the same objects can also be used with in-memory
data). But I've been having trouble determining which files can be
closed, and when.
The file layout is either one file per day, with three datasets in each
file, or one file per year, with two or more datasets for each day of
the year. (In both cases, there's only one dataset per day that we
actually use.)
In my current, simple implementation, our data objects have two
WeakValueDictionary caches, one for the files and one for the day
objects my library uses. The file cache holds tokens which wrap the
tables.File objects and close the associated file when reclaimed. By
giving a reference to the token to each day object that needs it, the
file closes automatically when all of those day objects disappear.
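For reference, here is a minimal sketch of the token scheme described above. It is written in Python 3 and is not our actual code. It assumes only that the wrapped handle has a close() method (a tables.File does); FileToken, FileCache, DayObject and the opener callable are hypothetical names. weakref.finalize closes the handle once the token is reclaimed, which under CPython's reference counting happens as soon as the last day object holding it goes away.

```python
import weakref


class FileToken:
    """Wraps an open file handle and closes it when the token is reclaimed.

    Assumption: `handle` is any object with a close() method, e.g. a tables.File.
    """

    def __init__(self, handle):
        self.handle = handle
        # The finalizer references handle.close, not self, so it does not keep
        # the token alive; it runs once the token is garbage collected.
        self._finalizer = weakref.finalize(self, handle.close)


class FileCache:
    """Hands out one shared FileToken per filename while anyone still holds it."""

    def __init__(self, opener):
        self._opener = opener  # hypothetical: callable(filename) -> handle
        self._tokens = weakref.WeakValueDictionary()

    def token_for(self, filename):
        token = self._tokens.get(filename)
        if token is None:
            token = FileToken(self._opener(filename))
            self._tokens[filename] = token
        return token


class DayObject:
    """Hypothetical day object; holding the token keeps the file open."""

    def __init__(self, token, nodename):
        self.token = token
        self.nodename = nodename
```

Since the WeakValueDictionary only holds the token weakly, the day objects alone decide the file's lifetime; the cache merely lets a second day object share an already-open file.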
Now, in trying to abstract the disk storage, I'm having trouble figuring
out how to keep track of the open files and when to close them. I'm
keeping a set of weakrefs to the nodes in each file, and when they all
go away I try to close the file, but I get the exception
"exceptions.AttributeError: AttributeError("'NoneType' object has no
attribute '_f_close'",) in <function remove at ...> ignored". (remove
doesn't call _f_close, but it does call file.close, so the error must be
coming from inside there.) I've attached this implementation.
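One workaround that may be worth trying rests on an unverified assumption: that the error comes from calling File.close from inside the weakref callback while a node is still being torn down. The idea is to let the callback only record that a file has become idle, and to do the actual close later from ordinary code. The sketch below is Python 3; NodeTracker, close_idle, and the opener/get_node stand-ins for tables.openFile and File.getNode are hypothetical.

```python
import weakref


class NodeTracker:
    """Tracks live nodes per file; closes a file only outside GC callbacks.

    Assumption: opener(filename) returns a handle with get_node(name) and
    close(); these are stand-ins for the PyTables calls, not its real API.
    """

    def __init__(self, opener):
        self._opener = opener
        self._files = {}    # filename -> open handle
        self._live = {}     # filename -> set of KeyedRefs to live nodes
        self._idle = set()  # files whose last node died but are not closed yet

    def _on_node_dead(self, ref):
        # Runs during garbage collection: only update bookkeeping here and
        # never call into the file object itself.
        refs = self._live.get(ref.key)
        if refs is not None:
            refs.discard(ref)
            if not refs:
                self._idle.add(ref.key)

    def close_idle(self):
        # Called from ordinary code, where closing a file is safe.
        for filename in list(self._idle):
            self._idle.discard(filename)
            if not self._live.get(filename):
                self._live.pop(filename, None)
                self._files.pop(filename).close()

    def get_node(self, filename, nodename):
        # A file that is about to be used again should not be closed.
        self._idle.discard(filename)
        self.close_idle()
        handle = self._files.get(filename)
        if handle is None:
            handle = self._files[filename] = self._opener(filename)
            self._live[filename] = set()
        node = handle.get_node(nodename)
        self._live[filename].add(
            weakref.KeyedRef(node, self._on_node_dead, filename))
        return node
```

A side effect of deferring the close is that an idle file stays open until the next get_node or close_idle call, so a day that is requested again soon does not force a reopen.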
I'm going through all this trouble because (1) I want to eliminate as
many "Closing remaining files" messages as possible, (2) I'd really like
to save the time spent reopening and rereading the file when possible,
and (3) if one object modifies a day object, the change should be
visible to anything that reads that day afterwards.
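Goal (3) is essentially an identity map. If every reader gets the same in-memory object for a given (filename, nodename), a change made through one reference is visible through all of them. Here is a minimal Python 3 sketch. It assumes a hypothetical loader(filename, nodename) callable standing in for the actual PyTables read; Day and DayCache are illustrative names, not classes from our library.

```python
import weakref


class Day:
    """Hypothetical in-memory representation of one day's dataset."""

    def __init__(self, filename, nodename, data):
        self.filename = filename
        self.nodename = nodename
        self.data = data


class DayCache:
    """Identity map: at most one live Day per (filename, nodename)."""

    def __init__(self, loader):
        self._loader = loader  # hypothetical: callable(filename, nodename) -> data
        self._days = weakref.WeakValueDictionary()

    def get(self, filename, nodename):
        key = (filename, nodename)
        day = self._days.get(key)
        if day is None:
            # Nobody holds this day right now: read it and remember it weakly.
            day = Day(filename, nodename, self._loader(filename, nodename))
            self._days[key] = day
        return day
```

Because the cache holds only weak references, a day is re-read once every reference to it is dropped, which works against goal (2). Keeping a few recently used days in a strong-reference LRU list would trade memory for fewer rereads.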
I can't help feeling there are better techniques for what I'm trying to
do. Any advice?
--
Anthony Foglia
Princeton Consultants
(609) 987-8787 x233
import sys
import weakref
import tables

class H5FileCache(object) :
    class CacheItem(object) :
        __slots__ = ("file","open_nodes")
        def __init__(self) :
            self.file = None
            # Should this be a dictionary from node name to node object?
            # Would make searching for pre-existing nodes easier, but the
            # keyed ref would need to store both the key in this
            # dictionary and the enclosing one...
            self.open_nodes = set()

    def __init__(self) :
        # Weakref callback: drops a dead node ref and closes the file once
        # no nodes from it remain. selfref avoids a cycle through the cache.
        def remove(wr, selfref=weakref.ref(self)) :
            self = selfref()
            print "Removing weakref:",wr
            print "weakref has key:",wr.key
            sys.stdout.flush()
            if self is not None :
                print self.data[wr.key].open_nodes
                sys.stdout.flush()
                self.data[wr.key].open_nodes.remove(wr)
                print self.data[wr.key].open_nodes
                sys.stdout.flush()
                print "not self.data[wr.key].open_nodes", \
                      (not self.data[wr.key].open_nodes)
                sys.stdout.flush()
                if not self.data[wr.key].open_nodes :
                    print "Testing..."
                    sys.stdout.flush()
                    print "Closing file %s..." % str(self.data[wr.key].file.filename)
                    sys.stdout.flush()
                    self.data[wr.key].file.close()
                    del self.data[wr.key]
        self._remove = remove
        self.data = {}

    def get_node(self, filename, nodename, mode='r') :
        # Need to handle root group better. That will never be freed.
        # Maybe whenever a dataset is freed go through the remaining and
        # remove the root node...
        try :
            cache_item = self.data[filename, mode]
        except KeyError :
            # open file
            cache_item = self.CacheItem()
            cache_item.file = tables.openFile(filename, mode=mode)
            self.data[filename, mode] = cache_item
        for n_ref in cache_item.open_nodes :
            n = n_ref()
            # The referent may already be dead if its callback hasn't run yet.
            if n is not None and n._v_pathname == nodename :
                return n
        else :
            n = cache_item.file.getNode(nodename)
            cache_item.open_nodes.add(
                weakref.KeyedRef(n, self._remove, (filename, mode)))
            return n
_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users