On Thu, Mar 12, 2009 at 03:36:22PM -0700, Christopher Lee wrote:
-> seqInfoDict's values should be objects with attributes. The Pygr
-> Developer's Guide gives a little background on what seqInfoDict is for:
-> "seqInfoDict: a dictionary-like object that for each valid sequence ID
-> returns an object with attributes providing information about that
-> sequence. This allows you, if you wish, to implement an efficient
-> mechanism for retrieving information about a sequence that does not
-> need to retrieve the sequence string itself."
-> I will update this description to say exactly what attributes a
-> seqInfoDict value object must have: the only one that is needed
-> currently is "length", which just gives the sequence length. I
-> believe this attribute is used only by NLMSA, so far.
->
-> Details below.
OK, I have a simple seqdb2-based interface working, attached. (Note,
seqdb2 is going to be renamed to screed as soon as Alex gets back from
break ;)
Note that right now, screed doesn't permit partial retrieval of large
sequences, so there's no point in doing clever strslice stuff. Since
screed is designed around short Solexa-length reads, I don't see this as
a huge drawback. We'll add partial retrieval in the next version.
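
To illustrate the point about strslice: when the backing store can only
return whole records, any slice method ends up fetching the full string
and cutting it in memory. The sketch below is hypothetical. CountingStore
and WholeRecordSequence are made-up names, the strslice(start, end)
signature is assumed from the name mentioned above, and none of this is
pygr's or screed's actual code. It is written for Python 3.

```python
class CountingStore(object):
    """Hypothetical whole-record store that counts full-record fetches."""

    def __init__(self, records):
        self._records = records
        self.fetches = 0

    def __getitem__(self, seq_id):
        self.fetches += 1
        return self._records[seq_id]


class WholeRecordSequence(object):
    """Hypothetical sequence wrapper; not pygr's SequenceBase."""

    def __init__(self, store, seq_id):
        self.store = store
        self.id = seq_id

    def strslice(self, start, end):
        # The store only hands back the full string, so slice it in memory.
        return self.store[self.id][start:end]


store = CountingStore({"read1": "ACGTACGTAC"})
seq = WholeRecordSequence(store, "read1")
print(seq.strslice(2, 6))
print(store.fetches)
```

Each slice costs one full-record fetch regardless of how small the
requested interval is, which is cheap for short reads and only becomes a
concern for long sequences.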
Also, the screed database has to have been created by fqdbm or fadbm
first.
I post this mostly for posterity; I'll clean it up & make a more
official post once things have been renamed, and then maybe people can
try it out to see if it offers performance improvements over
SequenceFileDB in certain situations.
cheers,
--titus
import sys
sys.path.insert(0, '/Users/t/dev/pygr')
sys.path.insert(0, '/Users/t/dev/seqdb2/python')

import pygr
import seqdb2
from pygr.seqdb import *
import UserDict

###

class MemSequence(SequenceBase):
    def _init_subclass(cls, db, **kwargs):
        cls.db = db
        db.seqInfoDict = kwargs['seqInfoDict']
    _init_subclass = classmethod(_init_subclass)

    def __init__(self, db, id):
        self.id = id
        SequenceBase.__init__(self)
        self.seq = db.seqInfoDict[id].seq

class MemSequenceInfo(object):
    def __init__(self, seq, length):
        self.seq = seq
        self.length = length

class ScreedBasedSeqInfoDict(object, UserDict.DictMixin):
    def __init__(self, filename):
        self.sdb = seqdb2.dbread(filename)

    def __getitem__(self, k):
        v = self.sdb[k]
        info = MemSequenceInfo(v.sequence, len(v.sequence))
        return info

    def keys(self):
        return self.sdb.keys()

###

# build a proxy to the testdb
screed_db = ScreedBasedSeqInfoDict('/Users/t/dev/pygr/ctb/dnaseq.fasta_seqdb2')
memdb = SequenceDB(seqInfoDict=screed_db, itemClass=MemSequence)

for k in memdb:
    print k, repr(memdb[k])
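
For reference, the seqInfoDict contract quoted at the top of this message
(a dictionary-like object whose values expose a "length" attribute) can
be sketched without pygr or screed. The version below is a minimal,
hypothetical illustration written for Python 3: SeqInfo and
LengthIndexSeqInfoDict are made-up names, and the precomputed length
index is an assumption standing in for whatever metadata a real backend
stores. Unlike the script above, it never touches the sequence string at
lookup time, which is the use case the Developer's Guide describes.

```python
from collections.abc import Mapping


class SeqInfo(object):
    """Per-sequence metadata; `length` is the only attribute required so far."""

    def __init__(self, length):
        self.length = length


class LengthIndexSeqInfoDict(Mapping):
    """Read-only mapping from sequence ID to SeqInfo (hypothetical sketch)."""

    def __init__(self, lengths):
        # Assumed input: an ID -> length index built once, ahead of time.
        self._lengths = dict(lengths)

    def __getitem__(self, seq_id):
        # Raises KeyError for unknown IDs, like a normal dict.
        return SeqInfo(self._lengths[seq_id])

    def __iter__(self):
        return iter(self._lengths)

    def __len__(self):
        return len(self._lengths)


info = LengthIndexSeqInfoDict({"read1": 8, "read2": 3})
for seq_id in sorted(info):
    print(seq_id, info[seq_id].length)
```

Subclassing Mapping supplies keys(), items(), and "in" tests from the
three methods defined here, which plays the same role that
UserDict.DictMixin plays in the Python 2 script.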