Thanks Laurence, this looks really helpful. The simplicity of ZODB's concept and the joy of using it apparently hide some of the complexity necessary to use it efficiently. I'll check this out when I circle back to data stuff tomorrow.
Have a great morning/day/evening!
-Ryan

On Tue, May 11, 2010 at 5:44 PM, Laurence Rowe <l...@lrowe.co.uk> wrote:
> I think this means that you are storing all of your data in a single
> persistent object, the database root PersistentMapping. You need to
> break up your data into persistent objects (instances of objects that
> inherit from persistent.Persistent) for the ZODB to have a chance of
> performing memory mapping. You want to do something like:
>
> import transaction
> from ZODB import FileStorage, DB
> from BTrees.LOBTree import BTree, TreeSet
> storage = FileStorage.FileStorage('/tmp/test-filestorage.fs')
> db = DB(storage)
> conn = db.open()
> root = conn.root()
> transaction.begin()
> index = root['index'] = BTree()
> # store the set under a key in the index (1 is just an example key)
> values = index[1] = TreeSet()
> values.add(42)
> transaction.commit()
>
> You should probably read:
> http://www.zodb.org/documentation/guide/modules.html#btrees-package.
> Since that was written, L variants of the BTree types have been
> introduced for storing 64bit integers. I'm using an LOBTree because
> that maps 64bit integers to python objects. For values I'm using an
> LOTreeSet, though you could also use an LLTreeSet (which has larger
> buckets).
>
> Laurence
>
> On 12 May 2010 00:37, Ryan Noon <rmn...@gmail.com> wrote:
> > Hi Jim,
> > I'm really sorry for the miscommunication, I thought I made that clear
> > in my last email:
> > "I'm wrapping ZODB in a 'ZMap' class that just forwards all the
> > dictionary methods to the ZODB root and allows easy interchangeability
> > with my old sqlite OODB abstraction."
> > wordid_to_docset is a "ZMap", which just wraps the ZODB
> > boilerplate/connection and forwards dictionary methods to the root. If
> > this seems superfluous, it was just to maintain backwards compatibility
> > with all of the code I'd already written for the sqlite OODB I was using
> > before I switched to ZODB.
> > Whenever you see something like wordid_to_docset[id] it's just doing
> > self.root[id] behind the scenes in a __setitem__ call inside the ZMap
> > class, which I've pasted below.
> > The db is just storing longs mapped to array('L')'s with a few thousand
> > longs in 'em. I'm going to try switching to the persistent data
> > structure that Laurence suggested (a pointer to relevant documentation
> > would be really useful), but I'm still sorta worried because in my
> > experimentation with ZODB so far I've never been able to observe it
> > sticking to any cache limits, no matter how often I tell it to garbage
> > collect (even when storing very small values that should give it
> > adequate granularity...see my experiment at the end of my last email).
> > If the memory reported to the OS by Python 2.6 is the problem I'd
> > understand, but memory usage goes up the second I start adding new
> > things (which indicates that Python is asking for more and not actually
> > freeing internally, no?).
> > If you feel there's something pathological about my memory access
> > patterns in this operation I can just do the actual inversion step in
> > Hadoop and load the output into ZODB for my application later; I was
> > just hoping to keep all of my data in OODBs the entire time.
> > Thanks again all of you for your collective time. I really like ZODB so
> > far, and it bugs me that I'm likely screwing it up somewhere.
> > Cheers,
> > Ryan
> >
> >
> > # Imports the class relies on. make_up_name() and NOT_IMPLEMENTED()
> > # are helpers not shown in this message; sep is assumed to be os.sep.
> > import transaction
> > from os import sep
> > from ZODB import DB
> > from ZODB.FileStorage import FileStorage
> >
> > class ZMap(object):
> >
> >     # can't hash this (must be set on the class; setting it on the
> >     # instance has no effect on hash())
> >     __hash__ = None
> >
> >     def __init__(self, name=None, dbfile=None, cache_size_mb=512,
> >                  autocommit=True):
> >         self.name = name
> >         self.dbfile = dbfile
> >         self.autocommit = autocommit
> >
> >         # first things first, figure out if we need to make up a name
> >         if self.name is None:
> >             self.name = make_up_name()
> >         if sep in self.name:
> >             if self.name[-1] == sep:
> >                 self.name = self.name[:-1]
> >             self.name = self.name.split(sep)[-1]
> >
> >         if self.dbfile is None:
> >             self.dbfile = self.name + '.zdb'
> >
> >         self.storage = FileStorage(self.dbfile, pack_keep_old=False)
> >         self.cache_size = cache_size_mb * 1024 * 1024
> >
> >         self.db = DB(self.storage, pool_size=1,
> >                      cache_size_bytes=self.cache_size,
> >                      historical_cache_size_bytes=self.cache_size,
> >                      database_name=self.name)
> >         self.connection = self.db.open()
> >         self.root = self.connection.root()
> >
> >         print ('Initializing ZMap "%s" in file "%s" with %dmb cache. '
> >                'Current %d items' % (self.name, self.dbfile,
> >                                      cache_size_mb, len(self.root)))
> >
> >     # basic operators
> >     def __eq__(self, y): # x == y
> >         return self.root.__eq__(y)
> >     def __ge__(self, y): # x >= y
> >         return len(self) >= len(y)
> >     def __gt__(self, y): # x > y
> >         return len(self) > len(y)
> >     def __le__(self, y): # x <= y
> >         return not self.__gt__(y)
> >     def __lt__(self, y): # x < y
> >         return not self.__ge__(y)
> >     def __len__(self): # len(x)
> >         return len(self.root)
> >
> >
> >     # dictionary stuff
> >     def __getitem__(self, key): # x[key]
> >         return self.root[key]
> >     def __setitem__(self, key, value): # x[key] = value
> >         self.root[key] = value
> >         self.__commit_check() # write back if necessary
> >
> >     def __delitem__(self, key): # del x[key]
> >         del self.root[key]
> >
> >     def get(self, key, default=None): # x[key] if key in x, else default
> >         return self.root.get(key, default)
> >     def has_key(self, key): # True if x has key, else False
> >         return self.root.has_key(key)
> >     def items(self): # list of key/val pairs
> >         return self.root.items()
> >     def keys(self):
> >         return self.root.keys()
> >     def pop(self, key, default=None): # remove key, return its value
> >         return self.root.pop(key, default)
> >     def popitem(self): #remove and return an arbitrary key/val pair
> >         return self.root.popitem()
> >     def setdefault(self, key, default=None):
> >         #D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D
> >         return self.root.setdefault(key, default)
> >     def values(self):
> >         return self.root.values()
> >
> >     def copy(self): #copy it? dubiously necessary at the moment
> >         NOT_IMPLEMENTED('copy')
> >
> >
> >     # iteration
> >     def __iter__(self): # iter(x)
> >         return self.root.iterkeys()
> >
> >     def iteritems(self): #iterator over items, this can be hellaoptimized
> >         return self.root.iteritems()
> >
> >     def itervalues(self):
> >         return self.root.itervalues()
> >     def iterkeys(self):
> >         return self.root.iterkeys()
> >
> >     # practical realities of the abstraction
> >     def garbage_collect(self):
> >         self.root._p_jar.cacheGC()
> >         #self.connection.cacheGC()
> >
> >     def commit(self):
> >         return self.__commit_check(force=True)
> >
> >     def __commit_check(self, force=False):
> >         if self.autocommit or force:
> >             transaction.commit()
> >
> >
> > On Tue, May 11, 2010 at 3:50 AM, Jim Fulton <j...@zope.com> wrote:
> >>
> >> On Mon, May 10, 2010 at 8:20 PM, Ryan Noon <rmn...@gmail.com> wrote:
> >> > P.S. About the data structures:
> >> > wordset is a freshly unpickled python set from my old sqlite oodb
> >> > thingy.
> >> > The new docsets I'm keeping are 'L' arrays from the stdlib array
> >> > module. I'm up for using ZODB's builtin persistent data structures
> >> > if it makes a lot of sense to do so, but it sorta breaks my
> >> > abstraction a bit and I feel like the memory issues I'm having are
> >> > somewhat independent of the container data structures (as I'm
> >> > having the same issue just with fixed size strings).
> >>
> >> This is getting tiresome. We can't really advise you because we can't
> >> see what data structures you're using and we're wasting too much time
> >> guessing. We wouldn't have to guess and grill you if you showed a
> >> complete demonstration program, or at least one that showed what the
> >> heck you're doing.
> >>
> >> The program you've shown so far is so incomplete, perhaps we're
> >> missing the obvious.
> >>
> >> In your original program, you never actually store anything in the
> >> database.
> >> You assign the database root to self.root, but never use
> >> self.root. (The variable self is not defined and we're left to assume
> >> that this disembodied code is part of a method definition.) In your
> >> most recent snippet, you don't show any database access. If you
> >> never actually store anything in the database, then nothing will be
> >> removed from memory.
> >>
> >> You're inserting data into wordid_to_docset, but you don't show its
> >> definition and won't tell us what it is.
> >>
> >> Jim
> >>
> >> --
> >> Jim Fulton
> >
> >
> >
> > --
> > Ryan Noon
> > Stanford Computer Science
> > BS '09, MS '10
> >
> > _______________________________________________
> > For more information about ZODB, see the ZODB Wiki:
> > http://www.zope.org/Wikis/ZODB/
> >
> > ZODB-Dev mailing list - ZODB-Dev@zope.org
> > https://mail.zope.org/mailman/listinfo/zodb-dev
> >
>
>

--
Ryan Noon
Stanford Computer Science
BS '09, MS '10
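
A rough, stdlib-only illustration of Laurence's point about storing everything in a single persistent object. This is not ZODB code: it only uses pickle to mimic the fact that ZODB writes one serialized record per persistent object, so plain values (such as array('L') instances) held directly in the root mapping end up inside the root's single record. The names build_index, monolithic_record_size and per_key_record_size, and the sizes chosen, are made up for the sketch.

```python
import pickle
from array import array

# Hypothetical stand-in data: 200 word ids, each mapped to an array('L')
# of 1000 document ids (roughly the shape described in the thread).
NUM_WORDS = 200
DOCS_PER_WORD = 1000


def build_index():
    """Plain dict of word id -> array('L') of doc ids."""
    return dict((word_id, array('L', range(DOCS_PER_WORD)))
                for word_id in range(NUM_WORDS))


def monolithic_record_size(index):
    """Bytes serialized when the whole mapping is one record."""
    return len(pickle.dumps(index, pickle.HIGHEST_PROTOCOL))


def per_key_record_size(index, word_id):
    """Bytes serialized when a single docset is its own record."""
    return len(pickle.dumps(index[word_id], pickle.HIGHEST_PROTOCOL))


index = build_index()
index[0].append(42)  # change a single docset

whole = monolithic_record_size(index)
single = per_key_record_size(index, 0)
print("one record for everything: %d bytes" % whole)
print("one record per docset:     %d bytes" % single)
```

With one big record, changing one docset means the whole mapping has to be held in memory and serialized again, and a cache that evicts whole objects has nothing smaller to evict. Splitting the data into many persistent objects (for example a BTree, whose buckets are stored as separate records) is what lets each piece be loaded, written, and dropped on its own.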