Jim Fulton wrote:
> On Mon, May 10, 2010 at 3:27 PM, Ryan Noon <rmn...@gmail.com> wrote:

<snip>

>> Here's my code:
>>
>>         self.storage = FileStorage(self.dbfile, pack_keep_old=False)
>>         cache_size = 512 * 1024 * 1024
>>
>>         self.db = DB(self.storage, pool_size=1, cache_size_bytes=cache_size,
>> historical_cache_size_bytes=cache_size, database_name=self.name)
>>         self.connection = self.db.open()
>>         self.root = self.connection.root()
>>
>> and the actual insertions...
>>             set_default = wordid_to_docset.root.setdefault #i can be kinda
>> pathological with loop operations
>>             array_append = array.append
>>             for docid, wordset in docid_to_wordset.iteritems(): #one of my
>> older sqlite oodb's, not maintaining a cache...just iterating (small
>> constant mem usage)
>>                 for wordid in wordset:
>>                     docset = set_default(wordid, array('L'))

Note that you are creating a new array on every pass through the inner
loop here, whether or not the key is already present.  I would nearly
always write that as::

                       docset = wordid_to_docset.root.get(wordid)
                       if docset is None:
                           docset = array('L')
                           wordid_to_docset.root[wordid] = docset
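
To make the difference concrete, here is a small sketch using a plain
dict in place of the ZODB root (that substitution is an assumption for
illustration, as are the names make_array, fill_eager, and fill_lazy).
It counts how many arrays each style constructs for the same input:

```python
from array import array

# Counts how many times each style builds a fresh array.
calls = {"eager": 0, "lazy": 0}

def make_array(counter_key):
    """Hypothetical helper: build an empty array and record the call."""
    calls[counter_key] += 1
    return array('L')

def fill_eager(pairs):
    """setdefault style: the default argument is evaluated on every call."""
    mapping = {}
    for wordid, docid in pairs:
        docset = mapping.setdefault(wordid, make_array("eager"))
        docset.append(docid)
    return mapping

def fill_lazy(pairs):
    """get-then-insert style: an array is built only for a missing key."""
    mapping = {}
    for wordid, docid in pairs:
        docset = mapping.get(wordid)
        if docset is None:
            docset = make_array("lazy")
            mapping[wordid] = docset
        docset.append(docid)
    return mapping

pairs = [(1, 10), (1, 11), (2, 10), (1, 12)]
eager = fill_eager(pairs)
lazy = fill_lazy(pairs)
```

Both functions produce the same mapping, but the setdefault version
builds one array per pair while the second builds one per distinct key.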

>>                     array_append(docset, docid)

Why are you using an unbound method here?  The following would be
clearer, and almost certainly not noticeably slower:

                       docset.append(docid)
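
If you want to check the cost on your own machine rather than take my
word for it, a rough timing sketch along these lines should do.  The
function names and loop sizes are made up for illustration, and the
numbers will vary by interpreter and platform:

```python
import timeit
from array import array

def append_unbound(n):
    """Append through the method looked up once on the array type."""
    array_append = array.append
    docset = array('L')
    for docid in range(n):
        array_append(docset, docid)
    return docset

def append_bound(n):
    """Append through the ordinary bound-method call."""
    docset = array('L')
    for docid in range(n):
        docset.append(docid)
    return docset

if __name__ == "__main__":
    # Print elapsed seconds for each style; compare them yourself.
    for fn in (append_unbound, append_bound):
        elapsed = timeit.timeit(lambda: fn(10000), number=10)
        print(fn.__name__, elapsed)
```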

>>                 n_docs_traversed += 1
>>                 if n_docs_traversed % 1000 == 1:
>>                     status_tick()
>>                 if n_docs_traversed % 25000 == 1:
>>                     self.do_commit() #just commits the oodb by calling
>> transaction.commit()

Don't forget the final commit. ;)  Also, I don't know what the 'array'
type is here, but if it doesn't manage its own persistence, then you
have a bug here:  mutating a non-persistent sub-object doesn't
automatically cause the persistent container to register as dirty with
the transaction, which means you may lose changes after the object is
evicted from the RAM cache, or at shutdown.
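
In ZODB the usual remedies are to set ``_p_changed = True`` on the
persistent container after the mutation, or to reassign the value into
the container.  The sketch below only simulates the idea with plain
Python: TrackingMapping and commit are hypothetical stand-ins, not ZODB
classes, and the "dirty" flag is a simplified model of what the real
machinery tracks:

```python
from array import array

class TrackingMapping(dict):
    """Hypothetical stand-in for a persistent mapping: it notices item
    assignment, but not in-place mutation of the values it holds."""

    def __init__(self):
        super().__init__()
        self.changed = False

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.changed = True

def commit(mapping):
    """Pretend commit: report whether anything would be written, then
    clear the dirty flag."""
    was_dirty = mapping.changed
    mapping.changed = False
    return was_dirty

root = TrackingMapping()
root[7] = array('L')
first_commit = commit(root)      # assignment marked the container dirty

root[7].append(42)               # in-place mutation of a plain sub-object
silent_commit = commit(root)     # the container never noticed the append

docset = root[7]
docset.append(43)
root[7] = docset                 # reassigning marks the container dirty
explicit_commit = commit(root)
```

The middle commit is the problem case: the array has changed, but
nothing told the container, so nothing would be written.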

<snip>

> Also note that memory allocated by Python is generally not returned to
> the OS when freed.

As of Python 2.6, Python's own internal heap management is noticeably
better about returning reclaimed chunks to the OS.


Tres.
- --
===================================================================
Tres Seaver          +1 540-429-0999          tsea...@palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
