Hi Titus,
seqInfoDict's values should be objects with attributes. The Pygr
Developer's Guide gives a little background on what seqInfoDict is for:
"seqInfoDict: a dictionary-like object that for each valid sequence ID
returns an object with attributes providing information about that
sequence. This allows you, if you wish, to implement an efficient
mechanism for retrieving information about a sequence that does not
need to retrieve the sequence string itself."
I will update this description to say exactly what attributes a
seqInfoDict value object must have: the only one that is needed
currently is "length", which just gives the sequence length. I
believe this attribute is used only by NLMSA, so far.
Details below.
-- Chris
On Mar 12, 2009, at 2:15 PM, C. Titus Brown wrote:
> I thought I'd start by making my own in-memory seq db class; here's
> what I've got so far:
>
> ---
> class MemSequence(SequenceBase):
>     def _init_subclass(cls, db, **kwargs):
>         cls.db = db
>         db.seqInfoDict = kwargs['theDict']
>     _init_subclass = classmethod(_init_subclass)
Wouldn't it be clearer to make seqInfoDict an explicit named argument?
    class MemSequence(SequenceBase):
        def _init_subclass(cls, db, seqInfoDict, **kwargs):
            cls.db = db
            db.seqInfoDict = seqInfoDict
        _init_subclass = classmethod(_init_subclass)
>
>
>     def __init__(self, db, id):
>         self.id = id
>         SequenceBase.__init__(self)
>         self.seq = db.seqInfoDict[id]
The SequenceDB model assumes that there is some back-end that stores
the sequences, so when the user requests a sequence by ID, SequenceDB
will automatically construct sequence objects (using the itemClass)
that should initialize themselves from that back-end storage.
Your MemSequence is using seqInfoDict as that back-end (for the
moment, anyway), which is fine. However, the values in the
seqInfoDict are supposed to be objects with named attributes,
including at least "length" (NLMSA requires this from any sequence DB
it works with). So you could do something like this:
    class MySeqInfo(object):
        pass
    seqInfoA = MySeqInfo()
    seqInfoA.seq = 'ATCG'
    seqInfoA.length = len(seqInfoA.seq)
    seqInfoB = MySeqInfo()
    seqInfoB.seq = 'ATGGCAT'
    seqInfoB.length = len(seqInfoB.seq)
    d = dict(a=seqInfoA, b=seqInfoB)
    memdb = SequenceDB(itemClass=MemSequence, seqInfoDict=d)
Note that you don't have to subclass SequenceDB; you can just pass it
your desired itemClass.
You'd also change your MemSequence.__init__ line accordingly:
    self.seq = db.seqInfoDict[id].seq
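To see the whole pattern in one place without needing pygr installed,
here is a rough stand-alone sketch. ToySequenceDB and ToyMemSequence are
hypothetical stand-ins invented for this illustration; they only mimic
the idea that the database builds items via itemClass and that each item
initializes itself from db.seqInfoDict. The real SequenceDB and
SequenceBase do considerably more.

```python
class MySeqInfo(object):
    """Value object for seqInfoDict: holds the sequence and its length."""
    def __init__(self, seq):
        self.seq = seq
        self.length = len(seq)  # the one attribute NLMSA is said to need


class ToyMemSequence(object):
    """Hypothetical stand-in for an itemClass (not pygr's SequenceBase)."""
    def __init__(self, db, id):
        self.id = id
        # initialize from the back-end storage, here the seqInfoDict itself
        self.seq = db.seqInfoDict[id].seq

    def __len__(self):
        return len(self.seq)


class ToySequenceDB(object):
    """Hypothetical stand-in for SequenceDB: constructs items on request."""
    def __init__(self, itemClass, seqInfoDict):
        self.itemClass = itemClass
        self.seqInfoDict = seqInfoDict

    def __getitem__(self, id):
        # build a sequence object for this ID using the supplied itemClass
        return self.itemClass(self, id)


d = dict(a=MySeqInfo('ATCG'), b=MySeqInfo('ATGGCAT'))
memdb = ToySequenceDB(itemClass=ToyMemSequence, seqInfoDict=d)
```

With this, memdb['a'].seq gives the sequence string, while
memdb.seqInfoDict['a'].length gives its length without constructing a
sequence object at all.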
> In essence, all that's happening is that the dictionary 'd' becomes
> db.seqInfoDict; the rest is just machinery surrounding that.
>
> So, my question to you is, is this a reasonable start? I realize that
> MemSequenceDB
Yes, this is a reasonable start, with the corrections I listed above.
>
> - isn't pickleable; not sure exactly what I'd need to do to implement
> that;
All you would need is to add keys to MemSequenceDB._pickleAttrs
representing whatever additional attribute(s) need to be saved for
pickling your class. See SequenceFileDB for an example.
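As a rough illustration of the idea (not the actual pygr code), the
sketch below assumes pickling is driven by a class-level dict whose keys
name the attributes to save, and that unpickling passes them back to the
constructor. ToyDB, ToyMemDB and the attribute name theDict are
hypothetical, and the __getstate__ / __setstate__ shown are simplified
assumptions of mine.

```python
class ToyDB(object):
    """Hypothetical base class; stands in for a pickleable database."""
    # keys name the attributes to save; the values are unused in this sketch
    _pickleAttrs = dict(itemClass=0)

    def __init__(self, itemClass=None, **kwargs):
        self.itemClass = itemClass
        self.cache = {}  # transient state, deliberately not saved

    def __getstate__(self):
        # save only the attributes listed in _pickleAttrs
        return dict((attr, getattr(self, attr))
                    for attr in self._pickleAttrs if hasattr(self, attr))

    def __setstate__(self, state):
        # rebuild by passing the saved attributes back to the constructor
        self.__init__(**state)


class ToyMemDB(ToyDB):
    """Hypothetical subclass that needs one extra attribute saved."""
    # copy the parent's dict first so the parent class is left unchanged
    _pickleAttrs = ToyDB._pickleAttrs.copy()
    _pickleAttrs['theDict'] = 0

    def __init__(self, theDict=None, **kwargs):
        ToyDB.__init__(self, **kwargs)
        self.theDict = theDict


db = ToyMemDB(theDict={'a': 'ATCG'}, itemClass=str)
state = db.__getstate__()             # what would be written to the pickle
restored = ToyMemDB.__new__(ToyMemDB)
restored.__setstate__(state)          # what unpickling would do
```

The point is just that the subclass copies the parent's dict and adds
its own key, so the extra attribute gets saved and restored along with
everything the parent already handles.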
>
> - doesn't do any caching or anything clever with strslice;
Just copy FileDBSequence.strslice() to get the caching behavior for
free (it is really provided by SequenceDB), and modify it for
accessing your back-end storage.
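Roughly, the shape I have in mind is "try the database's cache first,
fall back to the back-end on a miss". The sketch below is stand-alone
and simplified: ToyDB, its dict-based sliceCache, and the exact
signature of strsliceCache are assumptions made for illustration, so
check them against FileDBSequence.strslice() before copying anything.

```python
class MySeqInfo(object):
    """Value object for seqInfoDict: holds the sequence and its length."""
    def __init__(self, seq):
        self.seq = seq
        self.length = len(seq)


class ToyDB(object):
    """Hypothetical stand-in for the database; the cache is a plain dict."""
    def __init__(self, seqInfoDict):
        self.seqInfoDict = seqInfoDict
        self.sliceCache = {}  # hypothetical layout: id -> (start, end, string)

    def strsliceCache(self, seq, start, end):
        """Return the slice from the cache, or raise IndexError on a miss."""
        try:
            c_start, c_end, s = self.sliceCache[seq.id]
        except KeyError:
            raise IndexError('no cache entry for %r' % seq.id)
        if start < c_start or end > c_end:
            raise IndexError('interval not covered by the cached region')
        return s[start - c_start:end - c_start]


class ToyMemSequence(object):
    """Hypothetical itemClass whose back-end is db.seqInfoDict."""
    def __init__(self, db, id):
        self.db = db
        self.id = id

    def strslice(self, start, end, useCache=True):
        if useCache:
            try:
                return self.db.strsliceCache(self, start, end)
            except IndexError:
                pass  # cache miss: fall through to the back-end
        # back-end access: here just slice the in-memory string
        return self.db.seqInfoDict[self.id].seq[start:end]


db = ToyDB(dict(a=MySeqInfo('ATGGCAT')))
s = ToyMemSequence(db, 'a')
```

Only the last line of strslice() is specific to the back-end; that is
the part you would replace when the sequences live somewhere else.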
>
>
> So, my next step is to provide a simple dict-like interface to Alex's
> code that can then replace 'd'. Anything else I need to note in order
> to proceed with linking this to an actual on-disk file of sequences?
Since your back-end storage will just be a file, I think you should
just subclass from our SequenceFileDB (instead of SequenceDB). The
only thing that SequenceFileDB adds is proper support for a "filepath"
argument (including pickling). In fact, if you don't need to add
extra pickle attributes, you won't need to subclass it at all. Just
use it as-is, passing it your itemClass as an argument to the
constructor.