[pygr] Re: Some questions on seqdb

C. Titus Brown Thu, 12 Mar 2009 20:44:41 -0700

On Thu, Mar 12, 2009 at 03:36:22PM -0700, Christopher Lee wrote:
-> seqInfoDict's values should be objects with attributes.  The Pygr  
-> Developer's Guide gives a little background on what seqInfoDict is for:
-> "seqInfoDict: a dictionary-like object that for each valid sequence ID  
-> returns an object with attributes providing information about that  
-> sequence. This allows you, if you wish, to implement an efficient  
-> mechanism for retrieving information about a sequence that does not  
-> need to retrieve the sequence string itself."
-> I will update this description to say exactly what attributes a  
-> seqInfoDict value object must have: the only one that is needed  
-> currently is "length", which just gives the sequence length.  I  
-> believe this attribute is used only by NLMSA, so far.
-> 
-> Details below.
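
To make the seqInfoDict contract quoted above concrete, here is a minimal pure-Python sketch of a read-only seqInfoDict. It assumes only what the quote states: values are objects with a `length` attribute. `SeqInfo` and `FastaLengthInfoDict` are hypothetical names, not pygr classes, and the sketch uses Python 3's `collections.abc.Mapping` rather than anything in pygr. It keeps only lengths, so looking up info for an ID never touches the sequence string.

```python
from collections.abc import Mapping

class SeqInfo(object):
    """Hypothetical per-sequence info object; 'length' is the attribute NLMSA uses."""
    def __init__(self, length):
        self.length = length

class FastaLengthInfoDict(Mapping):
    """Hypothetical read-only seqInfoDict built from FASTA lines.

    Only sequence lengths are stored; the sequence text is discarded
    as it is read.
    """
    def __init__(self, lines):
        self._lengths = {}
        current = None
        for line in lines:
            line = line.strip()
            if line.startswith('>'):
                # The ID is the first word of the header line.
                current = line[1:].split()[0]
                self._lengths[current] = 0
            elif current is not None:
                self._lengths[current] += len(line)

    def __getitem__(self, seq_id):
        # Raises KeyError for unknown IDs, as a dict would.
        return SeqInfo(self._lengths[seq_id])

    def __iter__(self):
        return iter(self._lengths)

    def __len__(self):
        return len(self._lengths)

fasta = """>seq1 first read
ACGTACGT
ACG
>seq2
TTTT
""".splitlines()

info = FastaLengthInfoDict(fasta)
print(info['seq1'].length)  # 11
print(info['seq2'].length)  # 4
print(sorted(info))         # ['seq1', 'seq2']
```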


OK, I have a simple seqdb2-based interface working, attached.  (Note:
seqdb2 is going to be renamed to screed as soon as Alex gets back from
break ;)

Note that right now, screed doesn't permit partial retrieval of large
sequences, so there's no point in doing clever strslice stuff (fetching
only the requested slice of a sequence).  Since screed is designed around
short Solexa-length reads, I don't see this as a huge drawback.  We'll
add partial retrieval in the next version.
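
For readers new to the idea, "partial retrieval" means reading only the requested slice of a stored sequence instead of loading the whole record, which is the kind of thing a clever `strslice` would do. Below is a minimal sketch under stated assumptions: `RawSeqStore` is a hypothetical backend (not screed and not pygr) that keeps raw bases at known byte offsets, and its `strslice(seq_id, start, end)` signature is illustrative only, not pygr's. With short Solexa reads the whole record is tiny, so the saving is negligible, which is the point made above.

```python
import io

class RawSeqStore(object):
    """Hypothetical store: raw bases concatenated in one file-like object,
    plus an in-memory {id: (offset, length)} index."""
    def __init__(self, seqs):
        self._fp = io.BytesIO()
        self._index = {}
        for seq_id, seq in seqs:
            self._index[seq_id] = (self._fp.tell(), len(seq))
            self._fp.write(seq.encode('ascii'))

    def strslice(self, seq_id, start, end):
        """Return bases in [start, end), clipped to the sequence bounds,
        reading only those bytes from the backing file."""
        offset, length = self._index[seq_id]
        start, end = max(0, start), min(end, length)
        if start >= end:
            return ''
        self._fp.seek(offset + start)
        return self._fp.read(end - start).decode('ascii')

store = RawSeqStore([('chr1', 'ACGTTGCAACGT'), ('read7', 'GGGCCC')])
print(store.strslice('chr1', 4, 8))     # TGCA
print(store.strslice('read7', 3, 100))  # CCC
```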

Also, the screed database has to have been created by fqdbm or fadbm
first.

I post this mostly for posterity; I'll clean it up & make a more
official post once things have been renamed, and then maybe people can
try it out to see if it offers performance improvements over
SequenceFileDB in certain situations.

cheers,
--titus

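import sys
# Local checkouts of pygr and seqdb2; adjust these paths for your machine.
sys.path.insert(0, '/Users/t/dev/pygr')
sys.path.insert(0, '/Users/t/dev/seqdb2/python')
import pygr
import seqdb2

from pygr.seqdb import *    # SequenceBase and SequenceDB used below come from here

import UserDict             # Python 2 only; provides DictMixin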

###

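class MemSequence(SequenceBase):
    """Sequence whose string is held in memory, fetched via the seqInfoDict."""

    @classmethod
    def _init_subclass(cls, db, **kwargs):
        # Class-level setup hook called by SequenceDB for its itemClass;
        # kwargs are the keyword arguments given to SequenceDB.
        cls.db = db
        db.seqInfoDict = kwargs['seqInfoDict']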

    def __init__(self, db, id):
        self.id = id
        SequenceBase.__init__(self)
        # Pull the full sequence string from this ID's seqInfoDict entry.
        self.seq = db.seqInfoDict[id].seq

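class MemSequenceInfo(object):
    # seqInfoDict value object.  'length' is the attribute pygr needs;
    # 'seq' is carried along so MemSequence can grab the string directly.
    def __init__(self, seq, length):
        self.seq = seq
        self.length = length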

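class ScreedBasedSeqInfoDict(object, UserDict.DictMixin):
    """Read-only seqInfoDict backed by a screed (seqdb2) database file."""
    def __init__(self, filename):
        self.sdb = seqdb2.dbread(filename)

    def __getitem__(self, k):
        # Build a fresh info object from the screed record on each lookup.
        v = self.sdb[k]
        return MemSequenceInfo(v.sequence, len(v.sequence))

    def keys(self):
        # DictMixin derives __iter__, __len__, __contains__, etc. from
        # __getitem__ and keys().
        return self.sdb.keys()
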
###

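# Build a seqInfoDict proxy for the test database (it must already have
# been created with fqdbm or fadbm).
screed_db = ScreedBasedSeqInfoDict('/Users/t/dev/pygr/ctb/dnaseq.fasta_seqdb2')

memdb = SequenceDB(seqInfoDict=screed_db, itemClass=MemSequence)

# Iterate over all sequence IDs and show each sequence object.
for k in memdb:
    print k, repr(memdb[k])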
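
The attachment targets Python 2 (`UserDict.DictMixin`, the `print` statement). As a hedged sketch of the same adapter pattern in Python 3, `collections.abc.Mapping` plays the role of `DictMixin`: implement `__getitem__`, `__iter__` and `__len__`, and `keys()`, `in`, `get()` and friends come for free. `FakeRecord` and `RecordBackedSeqInfoDict` are hypothetical stand-ins so the example runs without screed or pygr installed; the only thing assumed about a record is a `.sequence` attribute, as in the attached code.

```python
from collections import namedtuple
from collections.abc import Mapping

# Hypothetical stand-in for a screed record; only .sequence is used.
FakeRecord = namedtuple('FakeRecord', ['name', 'sequence'])

class MemSequenceInfo(object):
    def __init__(self, seq, length):
        self.seq = seq          # full sequence string, carried along
        self.length = length    # the attribute pygr's NLMSA relies on

class RecordBackedSeqInfoDict(Mapping):
    """Read-only seqInfoDict wrapping any mapping of id -> record with .sequence."""
    def __init__(self, records):
        self._records = records

    def __getitem__(self, k):
        # Like the attached code, builds a new info object on every lookup.
        v = self._records[k]
        return MemSequenceInfo(v.sequence, len(v.sequence))

    def __iter__(self):
        return iter(self._records)

    def __len__(self):
        return len(self._records)

backing = {
    'r1': FakeRecord('r1', 'ACGTAC'),
    'r2': FakeRecord('r2', 'GGT'),
}
infodict = RecordBackedSeqInfoDict(backing)
for k in sorted(infodict):
    print(k, infodict[k].length)   # prints "r1 6" then "r2 3"
```

Because each lookup rebuilds the info object from the backing record, every access reads the whole sequence; that is cheap for short reads but is exactly what a partial-retrieval backend would avoid for long ones.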
