On Wed, Mar 18, 2009 at 10:51:36AM -0700, Christopher Lee wrote: -> Thanks for all your detailed analysis, Titus! Comments below...
Welcome! -> > The class names in this file still confuse me. Do we have any rules -> > on how to build names for classes in inheritance hierarcy? e.g. -> -> I believe Steve McConnell's books (e.g. Code Complete) cover these -> naming convention issues. We could look to see what he says. -> -> > FileDBSequence and SequenceFileDB -- which one is the sequence db, -> > which one is the sequence?? -> -> I guess I'm just following English, e.g. "sequence database" is a -> database; vs. "database sequence" is a sequence. If someone has a -> better convention, please describe it. So this is more of a pragmatic issue for me than a style question: I just get confused when reading those names in code. At this point I've mentally over-analyzed the situation so I can't think clearly about it any more. I think FileDB_Sequence and Sequence_FileDB would be clearer to me, emphasizing that the adjectival associations are (FileDB)+Sequence and Sequence+(FileDB). But those are ugly names so I don't want to do that. Is this a mental problem unique to me? Or does anyone else get confused? -> > The invert behavior is confusing (__invert__ points to -> > classutil.standard_invert which then redirects to self._inverse, -> > creating it from self._inverseClass if necessary). Can we use -> > descriptors or something "simpler" here? -> -> Please elaborate further; I don't understand what you're suggesting. -> The Python standard defines __invert__ as the invert operator method -> name; I follow that standard. How can a descriptor help us here? It's not __invert__ itself, it's the optimization that constructs __invert__ only when needed. I don't think class SomeClass(...): __invert__ = classutil.standard_invert ... _inverseClass = FooBar is an obvious construct, although I do understand why you defined things that way. How about, __invert__ = classutil.lazy_create(FooBar) instead, for example? Then FooBar would be obviously and directly connected to __invert__. -> > Have we decided anything about logging, as Istvan established in his -> > branch? I'd like to get rid of explicit writes to sys.stderr, in -> > particular. -> -> Seems like a worthy goal. Has anyone entered an issue for this in the -> tracker? I'm not aware that anyone has explicitly proposed this for -> 0.8. Istvan? How much work would this be, do you think? Maybe we could at least put the mechanisms in, and then convert the code base slowly? -> > I'd like to establish some documentation or tests on exactly what -> > db.seqInfoDict should do. So far I know that it should contain info -> > objects with a length attribute, for NLMSAs to use; and an id and -> > sequence, presumably. What else? Ultimately it'd be good to -> > establish a simple executable test for "proper" seqInfoDict behavior. [ ... info on seqInfoDict ... ] OK, I'll write this up in some nice way in the module itself, and then you can check it over. I think I understand seqInfoDict now, at any rate ;) -> So the executable test you're proposing only needs to look for the -> "length" attribute. [ ... ] OK. I'll just put such a statement into the documentation on building your own seqdb implementation, for now. -> > I don't like some aspects of the 'ifile' handling in SequenceFileDB. -> > In particular, I dislike the part where it re-opens the file to -> > re-read it and then closes 'ifile'; this may not be expected behavior -> > ('ifile' itself is only used to grab the filename!) and I don't like -> > closing user-passed-in file handles. Plus, pulling the filename out -> > of a file handle disallows StringIO and other "file-like" handles such -> > as sockets... Is the ifile mechanism actually used anywhere? -> -> This was used by some coordinator module functionality that is made -> obsolete by the much more powerful capabilities of pygr.Data. We can -> certainly get rid of this ifile mechanism. OK, that will definitely simplify the code -- fantastic! -> > The 'set_seqtype' function on SequenceDB depends on things being a -> > FASTA file -> > instead of just iterating. Why?? Can we remove that? [ ... __iter__ problem on bsddb ... ] -> Conclusion: we could again get rid of read_fasta_one_line(). OK, great! -> > Also, 'set_seqtype' returns the sequence type as well as setting it on -> > the db object. This behavior isn't used anywhere in the code base. -> > Can -> > we trash it? -> -> Sounds fine to me. Great! -> > In fact, it seems like 'set_seqtype' is used only within SequenceDB -> > and its children. Why not rename it to be private, i.e. prepend it -> > with '_'? -> -> Fine by me. Actually, this seems like a topic ripe for code review. -> Python is not that zealous about the whole public vs. private -> distinction, and I've accordingly been lax about this. We could go -> through the code and designate all purely internal methods, then -> prefix them with _ . I'll just assume this as part of the module-level code review, then. It works well with systematically improving code coverage analysis, as you can quickly discover where you forgot compensatory _ prefixes when you have near-100% code coverage... -> > Should SequenceDB._cacheMax be configurable? Right now it's -> > hardcoded. -> -> Sure, why not. ok -> > I don't know what idFilter does in _store_seqlen_dict, and it seems -> > like it would be easier to just implement that in your own -> > seqfmt-based reader, anyway. Can we trash it? It's not used anywhere -> > in the code base. -> -> [ ... ] We could get rid of idFilter in _store_seqlen_dict. Excellent! -> > The 'Sequence' classes in seqdb (e.g. FileDBSequence) should never be -> > created directly by a user, right? Should we make them private by -> > prepending a '_'? -> -> Throughout pygr we allow users to pass an itemClass to a database -> constructor, so the item classes are public. The _init_subclass -> pattern tended to modularize all the backend details in the item -> class, making the database class very generic (works with any -> itemClass). Based on your advice, I've reversed that policy, so that -> the backend details are on the database class. But still, I can -> easily imagine scenarios in which a user would want to subclass -> FileDBSequence, so I don't see these as private. OK, makes sense. I was -0 on doing this and you've explained why ;) -> > The SliceDB, VirtualSeq, and VirtualSeqDB classes are untested and -> > unused in the rest of the code base. I'd appreciate some guidance on -> > what they do, and perhaps some tests to get started. (@CTB issue) -> -> I don't think anyone has used these classes in at least a couple of -> years. We could delete them. OK, great! -> > The SQLSequence stuff at the bottom of seqdb.py could probably go in a -> > different file, like sqlgraph.py. They don't depend on anything in -> > seqdb.py. -> -> That seems like a reasonable idea. I'll try it out & see what happens... cheers, --titus -- C. Titus Brown, c...@msu.edu --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "pygr-dev" group. To post to this group, send email to pygr-dev@googlegroups.com To unsubscribe from this group, send email to pygr-dev+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/pygr-dev?hl=en -~----------~----~----~----~------~----~------~--~---