[Pytables-users] Caching Nodes and Files

Anthony Foglia Fri, 31 Jul 2009 08:17:22 -0700

We're using HDF5 files for the base file format of our data, and I've been trying to write a reader/writer library in Python that hides all the PyTables stuff (so the same objects can be used on in-memory objects). But I've been having problems determining which files can be closed and when.

The file layout is either one file per day with three datasets in each file, or one file per year with two or more datasets for each day in the year. (In both cases, there's only one dataset per day that we actually use.)

In my current simple implementation, our data objects have two WeakValueDictionary caches, one for the files and one for the day objects my library uses. The file cache is full of tokens which wrap the tables.File objects and, when reclaimed, close the associated file. Because each day object that needs the file holds a reference to its token, the file closes automatically once all the day objects disappear.
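A minimal sketch of that token scheme, in case it helps to see it concretely. It is not my actual code: FakeFile stands in for tables.File so the example runs without PyTables, and the names FileToken, Day and Store are made up for illustration.

```python
import gc
import weakref


class FakeFile:
    """Stand-in for tables.File (assumption: only close() matters here)."""

    def __init__(self, filename):
        self.filename = filename
        self.isopen = True

    def close(self):
        self.isopen = False


class FileToken:
    """Owns one open file and closes it when the token is reclaimed."""

    def __init__(self, handle):
        self.handle = handle
        # The finalizer references handle.close, not the token itself,
        # so it does not keep the token alive.
        weakref.finalize(self, handle.close)


class Day:
    """Hypothetical day object; holding the token keeps the file open."""

    def __init__(self, token, date):
        self.token = token
        self.date = date


class Store:
    def __init__(self, opener=FakeFile):
        self._opener = opener
        # Entries vanish once no day object holds the token any more.
        self._files = weakref.WeakValueDictionary()

    def get_day(self, filename, date):
        token = self._files.get(filename)
        if token is None:
            token = FileToken(self._opener(filename))
            self._files[filename] = token
        return Day(token, date)


store = Store()
a = store.get_day("2009.h5", "2009-07-30")
b = store.get_day("2009.h5", "2009-07-31")
shared = a.token is b.token      # both days share one open file
handle = a.token.handle

del a
gc.collect()
open_with_one_day_left = handle.isopen

del b
gc.collect()
closed_after_last_day = not handle.isopen
```

The point of the design is that nothing ever calls close() explicitly; the lifetime of the file simply follows the lifetime of the day objects that use it.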

Now, in trying to abstract the disk storage, I'm having trouble figuring out how to keep track of the open files and knowing when to close them. I'm keeping a list of weakrefs to nodes in the file, and when they all go away, I try to close the file, but I get an exception: "exceptions.AttributeError: AttributeError("'NoneType' object has no attribute '_f_close'",) in <function remove at ...> ignored". (remove doesn't call _f_close, only file.close, so the error must be coming from inside there. I've attached this implementation.)
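One thing I'm sure of about weakref callbacks in general: the callback only runs once the referent is already gone, so calling the weakref inside it returns None, and only data stored alongside the ref (such as the key of a KeyedRef) is still usable. A small sketch with a stand-in Node class (hypothetical, not a PyTables node); it only illustrates the callback behaviour and does not establish that this is what goes wrong inside file.close.

```python
import gc
import weakref


class Node:
    """Stand-in for a PyTables node (hypothetical, for illustration only)."""

    def __init__(self, path):
        self.path = path


seen = []


def on_reclaim(wr):
    # The referent is already dead here: wr() returns None.
    # Only the key stored on the KeyedRef is still available.
    seen.append((wr.key, wr() is None))


node = Node("/day_2009_07_31")
ref = weakref.KeyedRef(node, on_reclaim, ("2009.h5", "r"))

del node        # drop the last strong reference
gc.collect()    # not needed on CPython, harmless elsewhere
```

Any exception raised inside such a callback is reported and then ignored rather than propagated, which matches the "ignored" wording in the message above.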

I'm going through all this trouble because (1) I want to eliminate as many "Closing remaining files" messages as possible, (2) I'd really like to save the time spent reopening and rereading the file when possible, and (3) if one object modifies a day object, the change should be visible to any other code that reads that day.

I can't help feeling there are better techniques for what I'm trying to do. Any advice?


--
Anthony Foglia
Princeton Consultants
(609) 987-8787 x233

import sys
import weakref
import tables

class H5FileCache(object) :
   class CacheItem(object) :
      __slots__ = ("file","open_nodes")

      def __init__(self) :
         self.file = None
         # Should this be a dictionary from node name to node object?
         # Would make searching for pre-existing nodes easier, but the
         # keyed ref would need to store both the key in this
         # dictionary and the enclosing one...
         self.open_nodes = set()

   def __init__(self) :
      def remove(wr, selfref=weakref.ref(self)) :
         self = selfref()
         print "Removing weakref:",wr
         print "weakref has key:",wr.key
         sys.stdout.flush()
         if self is not None :
            print self.data[wr.key].open_nodes
            sys.stdout.flush()
            self.data[wr.key].open_nodes.remove(wr)
            print self.data[wr.key].open_nodes
            sys.stdout.flush()
            print "not self.data[wr.key].open_nodes", \
                (not self.data[wr.key].open_nodes)
            sys.stdout.flush()

            if not self.data[wr.key].open_nodes :
               print "Testing..."
               sys.stdout.flush()
               print "Closing file %s..." % str(self.data[wr.key].file.filename)
               sys.stdout.flush()
               self.data[wr.key].file.close()
               del self.data[wr.key]
      self._remove = remove
      self.data = {}

   def get_node(self, filename, nodename, mode='r') :
      # Need to handle root group better.  That will never be freed.
      # Maybe whenever a dataset is freed go through the remaining and
      # remove the root node...
      try :
         cache_item = self.data[filename, mode]
      except KeyError :
         # open file
         cache_item = self.CacheItem()
         cache_item.file = tables.openFile(filename, mode=mode)
         self.data[filename, mode] = cache_item
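      for n_ref in cache_item.open_nodes :
         n = n_ref()
         # A dead weakref returns None; skip it instead of raising
         # AttributeError on n._v_pathname.
         if n is not None and n._v_pathname == nodename :
            return n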
      else :
         n = cache_item.file.getNode(nodename)
         cache_item.open_nodes.add(
            weakref.KeyedRef(n, self._remove, (filename, mode)))
         return n
