[pygr] Re: First pass seqdb review FINISHED

C. Titus Brown Wed, 18 Mar 2009 12:34:12 -0700

On Wed, Mar 18, 2009 at 10:51:36AM -0700, Christopher Lee wrote:
-> Thanks for all your detailed analysis, Titus!  Comments below...


Welcome!

-> > The class names in this file still confuse me.  Do we have any rules
-> > on how to build names for classes in inheritance hierarcy?  e.g.
-> 
-> I believe Steve McConnell's books (e.g. Code Complete) cover these  
-> naming convention issues.  We could look to see what he says.
-> 
-> >  FileDBSequence and SequenceFileDB -- which one is the sequence db,
-> >  which one is the sequence??
-> 
-> I guess I'm just following English, e.g. "sequence database" is a  
-> database; vs. "database sequence" is a sequence.  If someone has a  
-> better convention, please describe it.

So this is more of a pragmatic issue for me than a style question: I
just get confused when reading those names in code.  At this point I've
mentally over-analyzed the situation so I can't think clearly about it
any more.

I think FileDB_Sequence and Sequence_FileDB would be clearer to me,
emphasizing that the adjectival associations are (FileDB)+Sequence and
Sequence+(FileDB).  But those are ugly names so I don't want to do that.

Is this a mental problem unique to me?  Or does anyone else get
confused?

-> > The invert behavior is confusing (__invert__ points to
-> > classutil.standard_invert which then redirects to self._inverse,
-> > creating it from self._inverseClass if necessary).  Can we use
-> > descriptors or something "simpler" here?
-> 
-> Please elaborate further; I don't understand what you're suggesting.   
-> The Python standard defines __invert__ as the invert operator method  
-> name; I follow that standard.  How can a descriptor help us here?

It's not __invert__ itself, it's the optimization that constructs
__invert__ only when needed.  I don't think

class SomeClass(...):
   __invert__ = classutil.standard_invert
   ...
   _inverseClass = FooBar

is an obvious construct, although I do understand why you defined things
that way.  How about,

   __invert__ = classutil.lazy_create(FooBar)

instead, for example?  Then FooBar would be obviously and directly
connected to __invert__.

-> > Have we decided anything about logging, as Istvan established in his
-> > branch?  I'd like to get rid of explicit writes to sys.stderr, in
-> > particular.
-> 
-> Seems like a worthy goal.  Has anyone entered an issue for this in the  
-> tracker?  I'm not aware that anyone has explicitly proposed this for  
-> 0.8.

Istvan?  How much work would this be, do you think?

Maybe we could at least put the mechanisms in, and then convert the code
base slowly?

-> > I'd like to establish some documentation or tests on exactly what
-> > db.seqInfoDict should do.  So far I know that it should contain info
-> > objects with a length attribute, for NLMSAs to use; and an id and
-> > sequence, presumably.  What else?  Ultimately it'd be good to
-> > establish a simple executable test for "proper" seqInfoDict behavior.

[ ... info on seqInfoDict ... ]

OK, I'll write this up in some nice way in the module itself, and then
you can check it over.  I think I understand seqInfoDict now, at any
rate ;)

-> So the executable test you're proposing only needs to look for the  
-> "length" attribute.

[ ... ]

OK.  I'll just put such a statement into the documentation on building
your own seqdb implementation, for now.

-> > I don't like some aspects of the 'ifile' handling in SequenceFileDB.
-> > In particular, I dislike the part where it re-opens the file to
-> > re-read it and then closes 'ifile'; this may not be expected behavior
-> > ('ifile' itself is only used to grab the filename!) and I don't like
-> > closing user-passed-in file handles.  Plus, pulling the filename out
-> > of a file handle disallows StringIO and other "file-like" handles such
-> > as sockets...  Is the ifile mechanism actually used anywhere?
-> 
-> This was used by some coordinator module functionality that is made  
-> obsolete by the much more powerful capabilities of pygr.Data.  We can  
-> certainly get rid of this ifile mechanism.

OK, that will definitely simplify the code -- fantastic!

-> > The 'set_seqtype' function on SequenceDB depends on things being a  
-> > FASTA file
-> > instead of just iterating.  Why??  Can we remove that?

[ ... __iter__ problem on bsddb ... ]

-> Conclusion: we could again get rid of read_fasta_one_line().

OK, great!

-> > Also, 'set_seqtype' returns the sequence type as well as setting it on
-> > the db object.  This behavior isn't used anywhere in the code base.   
-> > Can
-> > we trash it?
-> 
-> Sounds fine to me.

Great!

-> > In fact, it seems like 'set_seqtype' is used only within SequenceDB
-> > and its children.  Why not rename it to be private, i.e. prepend it
-> > with '_'?
-> 
-> Fine by me.  Actually, this seems like a topic ripe for code review.   
-> Python is not that zealous about the whole public vs. private  
-> distinction, and I've accordingly been lax about this.  We could go  
-> through the code and designate all purely internal methods, then  
-> prefix them with _ .

I'll just assume this as part of the module-level code review, then.  It
works well with systematically improving code coverage analysis, as you
can quickly discover where you forgot compensatory _ prefixes when you
have near-100% code coverage...

-> > Should SequenceDB._cacheMax be configurable?  Right now it's  
-> > hardcoded.
-> 
-> Sure, why not.

ok

-> > I don't know what idFilter does in _store_seqlen_dict, and it seems
-> > like it would be easier to just implement that in your own
-> > seqfmt-based reader, anyway.  Can we trash it?  It's not used anywhere
-> > in the code base.
-> 
-> [ ... ] We could get rid of idFilter in _store_seqlen_dict.

Excellent!

-> > The 'Sequence' classes in seqdb (e.g. FileDBSequence) should never be
-> > created directly by a user, right?  Should we make them private by
-> > prepending a '_'?
-> 
-> Throughout pygr we allow users to pass an itemClass to a database  
-> constructor, so the item classes are public.  The _init_subclass  
-> pattern tended to modularize all the backend details in the item  
-> class, making the database class very generic (works with any  
-> itemClass).  Based on your advice, I've reversed that policy, so that  
-> the backend details are on the database class.  But still, I can  
-> easily imagine scenarios in which a user would want to subclass  
-> FileDBSequence, so I don't see these as private.

OK, makes sense.  I was -0 on doing this and you've explained why ;)

-> > The SliceDB, VirtualSeq, and VirtualSeqDB classes are untested and
-> > unused in the rest of the code base.  I'd appreciate some guidance on
-> > what they do, and perhaps some tests to get started.  (@CTB issue)
-> 
-> I don't think anyone has used these classes in at least a couple of  
-> years.  We could delete them.

OK, great!

-> > The SQLSequence stuff at the bottom of seqdb.py could probably go in a
-> > different file, like sqlgraph.py.  They don't depend on anything in
-> > seqdb.py.
-> 
-> That seems like a reasonable idea.

I'll try it out & see what happens...

cheers,
--titus
-- 
C. Titus Brown, c...@msu.edu

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"pygr-dev" group.
To post to this group, send email to pygr-dev@googlegroups.com
To unsubscribe from this group, send email to 
pygr-dev+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/pygr-dev?hl=en
-~----------~----~----~----~------~----~------~--~---

[pygr] Re: First pass seqdb review FINISHED

Reply via email to