Re: [spambayes-dev] A bit confused about ZODB changes

Tony Meyer Mon, 24 Apr 2006 00:05:37 -0700

> I finally cvs up'd last night, installed ZODB and tried things out.

Somewhat OT, but did you get a large number of warnings when you
installed ZODB?  I finally got around to installing ZODB 3.6 and there
were screens of warnings (ZODB 3.6.0, Python 2.4.1, OS X 10.4.6,
Apple's gcc 4.0.0).


> My tte.py script (in contrib) is still selecting BerkDB instead of ZODB.
> Looking at things, I see that it uses storage.database_type to determine
> the database type and name.  My storage options are
>
>     [Storage]
>     persistent_storage_file: ~/hammie.db
>
> I run my tte.py script like so:
>
>     .../tte.py -d ~/hammie.db ...
>
> so storage.database_type is called like so:
>
>     storage.database_type([('-d', '/Users/skip/hammie.db')],
>                           default_type="ZODB", default_name="~hammie.db")

"zodb", not "ZODB" (which I suppose it ought to have been), but yes.

> The _storage_options dictionary still says that -d means "dbm".  Shouldn't
> it say "zodb", since that's the new default?  (After making that change
> locally, it now dumps a ZODB database.)

Does "d" stand for "database" or "dbm" (or "default"!)?  I figured it  
stood for "dbm", so left that alone.  If people think that it should  
mean "zodb" or should be the default (i.e. ZODB if importable, dbm  
otherwise), that's easy to do.
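For concreteness, the "ZODB if importable, dbm otherwise" default could look roughly like the minimal sketch below. The function name default_database_type is hypothetical; it is not an existing spambayes function, just an illustration of the fallback rule.

```python
def default_database_type():
    """Return "zodb" if the ZODB package can be imported, else "dbm".

    Hypothetical helper: it only illustrates the fallback rule and is
    not part of spambayes.
    """
    try:
        import ZODB  # noqa: F401  (imported only to test availability)
    except ImportError:
        return "dbm"
    return "zodb"


print(default_database_type())
```

The import check happens at call time, so installing ZODB later changes the default without any config edit.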

At the moment, "-d NAME" is really the same as:

[Storage]
persistent_use_database: dbm
persistent_storage_file: NAME

or "-o Storage:persistent_use_database:dbm -o Storage:persistent_storage_file:NAME"

And "-p NAME" is really the same as:

[Storage]
persistent_use_database: pickle
persistent_storage_file: NAME

or "-o Storage:persistent_use_database:pickle -o Storage:persistent_storage_file:NAME"
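Put another way, each switch is shorthand for setting two Storage options at once.  A minimal sketch of that mapping, assuming a hypothetical SWITCH_TYPES table and resolve_storage function (the real storage.database_type and _storage_options may differ in detail, e.g. in how the defaults are read from the config file):

```python
import os

# Hypothetical stand-in for the switch table: each switch fixes the
# database type and takes the storage file name as its argument.
SWITCH_TYPES = {"-d": "dbm", "-p": "pickle"}


def resolve_storage(opts, default_type, default_name):
    """Return (name, type) chosen by the last -d/-p switch in opts.

    opts is a getopt-style list of (flag, value) pairs.  With no storage
    switch present, fall back to the defaults (which would normally come
    from the [Storage] section of the config file).
    """
    name, db_type = default_name, default_type
    for flag, value in opts:
        if flag in SWITCH_TYPES:
            db_type = SWITCH_TYPES[flag]
            name = value
    return os.path.expanduser(name), db_type


print(resolve_storage([("-d", "/Users/skip/hammie.db")], "zodb", "~/hammie.db"))
```

Under this reading, changing what "-d" means is a one-entry edit to the table.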

> Alternatively, should I even be using storage.database_type?

If you want to combine the command-line options and config file like  
the other scripts, then IMO yes.

> I need to use the -d flag because I write the database into a different
> spot then mv it into place so as to avoid problems with simultaneous
> reads and writes during database generation.

I presume this would work:

     .../tte.py -o Storage:persistent_storage_file:~/hammie.db ...

Or changing the meaning of "-d" would.  I don't use the -d/-p  
switches, so don't personally care what they mean.
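The "build elsewhere, then mv into place" approach is the usual write-to-a-temporary-file-then-rename pattern.  A minimal sketch, assuming a single-file database and a hypothetical build_db callback that writes the complete database to the path it is given.  os.replace is only atomic when both paths are on the same filesystem, which is why the temporary file is created in the target's directory.

```python
import os
import tempfile


def rebuild_in_place(target, build_db):
    """Write a fresh database next to target, then swap it in with one rename.

    build_db is a hypothetical callback that writes the complete database
    to the path it is given.  Readers opening target see either the old
    file or the new one, never a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(target))
    # Same directory as target, so os.replace is a same-filesystem rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        build_db(tmp_path)
        os.replace(tmp_path, target)
    except BaseException:
        # Don't leave a stray temporary file behind if the build fails.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

This only covers a database stored in one file; a backend that writes several files (as ZODB does) needs each of them handled.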

> If I'm using ZODB do I need to mv more than just one file into place?  I
> see that the process generated .index, .lock and .tmp files as well.

I'll leave this one for Tim.  I *think* that .lock and .tmp should  
disappear when the ZODB is closed, and that .index will just be  
recreated (so would be optional).
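If that guess is right, moving a closed FileStorage means moving the data file plus, optionally, its .index.  A minimal sketch under exactly that assumption (the storage is closed and .lock/.tmp are already gone); move_filestorage is a hypothetical helper.  It treats .index as a disposable cache, and the data file and index are not swapped atomically as a pair.

```python
import os


def move_filestorage(src, dst):
    """Move a closed ZODB FileStorage data file from src to dst.

    Assumes the storage has been closed, so only the data file and
    (optionally) its .index companion remain.  The .index file is treated
    as a cache that ZODB can rebuild if it is missing.
    """
    # Remove any old index at the destination first so it is never
    # paired with the new data file.
    if os.path.exists(dst + ".index"):
        os.remove(dst + ".index")
    os.replace(src, dst)
    # Carry the fresh index along if there is one, to avoid a rebuild.
    if os.path.exists(src + ".index"):
        os.replace(src + ".index", dst + ".index")
```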

> Finally, I don't understand how I'm supposed to get the spam and ham counts
> from a ZODB database.  My spamcounts.py script (see contrib dir) was making
> assumptions about the structure of the database, assuming it could directly
> access the keys of a dbm or dict (pickle).  Any thoughts about how to clean
> that up?  I think I should be calling db.spamprob(word), but I still don't
> know how to get the raw spam/ham counts that script wants to print.

This part of all of the classifiers is pretty messy, IMO.  What I do
is use the _wordinfokeys, _wordinfoget, etc. methods, as you (later)
changed spamcounts.py to do.  But these have prefixed underscores, so
I guess we really shouldn't be doing that.

IIRC, using keys() doesn't work for dbm, because Mark put in some  
clever caching code that means that hapaxes aren't in keys(), so if  
you want the whole list, you have to use _wordinfokeys().  Or maybe  
that's the other way around...
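For the raw counts themselves, the underscore route looks roughly like this.  A minimal sketch: FakeClassifier and WordInfo are hypothetical dict-backed stand-ins so the example runs on its own, and the assumption that each per-token record carries spamcount and hamcount attributes should be checked against the real classifier module.

```python
class WordInfo:
    """Stand-in for a per-token record holding raw counts (assumed shape)."""

    def __init__(self, spamcount=0, hamcount=0):
        self.spamcount = spamcount
        self.hamcount = hamcount


class FakeClassifier:
    """Hypothetical stand-in exposing the underscore methods named above."""

    def __init__(self, wordinfo):
        self._words = dict(wordinfo)

    def _wordinfokeys(self):
        return list(self._words.keys())

    def _wordinfoget(self, word):
        return self._words.get(word)


def raw_counts(db, word):
    """Return (spamcount, hamcount) for word, or (0, 0) if it is unknown."""
    record = db._wordinfoget(word)
    if record is None:
        return 0, 0
    return record.spamcount, record.hamcount


db = FakeClassifier({"viagra": WordInfo(42, 0), "python": WordInfo(1, 30)})
for word in sorted(db._wordinfokeys()):
    print(word, *raw_counts(db, word))
```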

If this were added to ZODBClassifier:

     def keys(self):
         return self.classifier.wordinfo.keys()

     def get(self, token):
         return self.classifier.wordinfo.get(token)

     def set(self, token, value):
         # Store via item assignment, which any mapping supports.
         self.classifier.wordinfo[token] = value

Would that be enough?  It seems like the proper interface to me.
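One attraction of those methods is that keys() and get() mirror a mapping's, so the counts-dumping loop spamcounts.py needs could be written once against that interface.  A minimal sketch, using a plain dict as a hypothetical stand-in backend and assuming each record has spamcount and hamcount attributes; count_table is an illustrative name, not existing code.

```python
from collections import namedtuple

# Assumed record shape; the real per-token record class may differ.
WordInfo = namedtuple("WordInfo", ["spamcount", "hamcount"])


def count_table(store, tokens=None):
    """Return [(token, spamcount, hamcount), ...] using only keys()/get().

    Works for anything offering that interface, including a plain dict.
    Tokens with no record are reported with zero counts.
    """
    if tokens is None:
        tokens = sorted(store.keys())
    rows = []
    for token in tokens:
        info = store.get(token)
        if info is None:
            rows.append((token, 0, 0))
        else:
            rows.append((token, info.spamcount, info.hamcount))
    return rows


backend = {"free": WordInfo(12, 3), "meeting": WordInfo(0, 9)}
for row in count_table(backend):
    print(*row)
```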

=Tony.Meyer
_______________________________________________
spambayes-dev mailing list
spambayes-dev@python.org
http://mail.python.org/mailman/listinfo/spambayes-dev
