We're using HDF5 files as the base file format for our data, and I've been trying to write a reader/writer library in Python that hides all the PyTables details (so that the same objects can also be used with in-memory data). But I've been having trouble determining which files can be closed, and when.

The file layout is either one file per day, with three datasets in each file, or one file per year, with two or more datasets for each day in the year. (In both cases, there's only one dataset per day that we actually use.)

In my current simple implementation, our data objects have two WeakValueDictionary caches: one for the files and one for the day objects my library uses. The file cache holds tokens that wrap the tables.File objects and, when reclaimed, close the associated file. Each day object that needs a file holds a reference to that file's token, so when all the day objects disappear, the file closes automatically.
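To make that scheme concrete, here is a minimal sketch of the token idea. It is not the actual library code: the names FileToken, Day, Store and FakeFile are hypothetical, and FakeFile only stands in for tables.File so the example runs without an HDF5 file on disk. It also assumes CPython-style reference counting, where an object is reclaimed as soon as its last strong reference goes away.

```python
import weakref


class FileToken(object):
    """Wraps one open file handle and closes it when the token is reclaimed."""

    def __init__(self, handle):
        self.handle = handle

    def __del__(self):
        # Called once no day object holds the token any more.
        self.handle.close()


class Day(object):
    """Hypothetical day object; holding the token keeps its file open."""

    def __init__(self, token, name):
        self._token = token
        self.name = name


class Store(object):
    """Keeps weak caches of file tokens and day objects."""

    def __init__(self, opener):
        # opener is any callable that returns an object with close(),
        # for example a thin wrapper around the PyTables open call.
        self._opener = opener
        self._files = weakref.WeakValueDictionary()
        self._days = weakref.WeakValueDictionary()

    def get_day(self, filename, name):
        key = (filename, name)
        day = self._days.get(key)
        if day is None:
            token = self._files.get(filename)
            if token is None:
                token = FileToken(self._opener(filename))
                self._files[filename] = token
            day = Day(token, name)
            self._days[key] = day
        return day


class FakeFile(object):
    """Stand-in for tables.File, used only to keep the sketch self-contained."""

    def __init__(self, filename):
        self.filename = filename
        self.closed = False

    def close(self):
        self.closed = True
```

Two day objects requested from the same file share one token, and asking for the same day twice returns the same object, so a change made through one reference is seen through the other. The file is closed only after the last day object for it has been dropped.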

Now, in trying to abstract the disk storage, I'm having trouble keeping track of the open files and knowing when to close them. I'm keeping a set of weakrefs to nodes in the file, and when they have all gone away, I try to close the file, but I get an exception: "exceptions.AttributeError: AttributeError("'NoneType' object has no attribute '_f_close'",) in <function remove at ...> ignored". (remove doesn't call _f_close, only file.close, so the error must be coming from inside there.) I've attached this implementation.

I'm going through all this trouble because (1) I want to eliminate as many "Closing remaining files" messages as possible, (2) I'd really like to avoid the time spent reopening and rereading a file when possible, and (3) if one object modifies a day object, the change should be visible to anything else that reads that day.

I can't help feeling there are better techniques for what I'm trying to do. Any advice?

--
Anthony Foglia
Princeton Consultants
(609) 987-8787 x233
import sys
import weakref
import tables

class H5FileCache(object) :
   class CacheItem(object) :
      __slots__ = ("file","open_nodes")

      def __init__(self) :
         self.file = None
         # Should this be a dictionary from node name to node object?
         # Would make searching for pre-existing nodes easier, but the
         # keyed ref would need to store both the key in this
         # dictionary and the enclosing one...
         self.open_nodes = set()

   def __init__(self) :
      def remove(wr, selfref=weakref.ref(self)) :
         self = selfref()
         print "Removing weakref:",wr
         print "weakref has key:",wr.key
         sys.stdout.flush()
         if self is not None :
            print self.data[wr.key].open_nodes
            sys.stdout.flush()
            self.data[wr.key].open_nodes.remove(wr)
            print self.data[wr.key].open_nodes
            sys.stdout.flush()
            print "not self.data[wr.key].open_nodes", \
                (not self.data[wr.key].open_nodes)
            sys.stdout.flush()

            if not self.data[wr.key].open_nodes :
               print "Testing..."
               sys.stdout.flush()
               print "Closing file %s..." % str(self.data[wr.key].file.filename)
               sys.stdout.flush()
               self.data[wr.key].file.close()
               del self.data[wr.key]
      self._remove = remove
      self.data = {}

   def get_node(self, filename, nodename, mode='r') :
      # Need to handle root group better.  That will never be freed.
      # Maybe whenever a dataset is freed go through the remaining and
      # remove the root node...
      try :
         cache_item = self.data[filename, mode]
      except KeyError :
         # open file
         cache_item = self.CacheItem()
         cache_item.file = tables.openFile(filename, mode=mode)
         self.data[filename, mode] = cache_item
      for n_ref in cache_item.open_nodes :
         n = n_ref()
         # A weakref whose node was already reclaimed returns None; skip it.
         if n is not None and n._v_pathname == nodename :
            return n
      else :
         n = cache_item.file.getNode(nodename)
         cache_item.open_nodes.add(
            weakref.KeyedRef(n, self._remove, (filename, mode)))
         return n