Titus requested some usage examples, so here goes:

First, I'll summarize the new BlastMapping model that in pygr 0.8  
replaces the old BlastDB model.  Whereas BlastDB combined the  
functions of a sequence database and blast search functionality,  
BlastMapping separates the blast functionality completely from the  
sequence database functionality.  (BlastDB is still available for  
backwards compatibility).  Another class, MegablastMapping, provides  
the exact same interface as BlastMapping, but performs repeat-masked  
megablast nucleotide searches instead of standard BLAST.

BASIC USAGE: create a BlastMapping object for any seqDB that you want  
to search using BLAST. This BlastMapping object acts like a Pygr graph  
(i.e. dict) that takes any sequence object as a source node (key) and  
maps it to homologous sequence intervals from seqDB as target nodes,  
with edge information about each alignment, following the generic  
pattern blastmap[src][target] = edgeInfo.  You can use it either in  
the mapping style:

blastmap = BlastMapping(seqDB) # create the mapping object
# myquery can be any Pygr sequence interval object...
for src,target,edge in blastmap[myquery].edges(): # run blast query
     # print the aligned intervals or whatever you want to do...
     print '%s %s\n%s %s\n' %(src,repr(src),target,repr(target))

Or you can use it as a callable, so you can pass search parameters  
like expmax etc.  This also lets you save many search results into one  
NLMSA...

for myquery in lotsaQueries:
     blastmap(myquery, expmax=1e-10, al=myNLMSA)
myNLMSA.build() # build alignment indexes when done saving results

Since all this uses NLMSA to store the results, any of the NLMSASlice  
group-by operators can be applied as usual.

Just like BlastDB, BlastMapping determines which flavor of BLAST  
(blastp or blastn) to run based on the sequence type (nucleotide vs.  
protein) of the query and database.  Currently, tblastn and blastx  
won't work correctly, because of the translation issues outlined above.

The new proposal would make tblastn work exactly the same as blastp  
and blastn (i.e. as in the example code above), but with one extra  
feature: the aligned target intervals (which look like protein  
sequence intervals, even though the target database is nucleotide)  
would have a "sequence" attribute which yields the associated  
nucleotide sequence interval from the target sequence database, e.g.  
to print the target *nucleotide sequence coordinates* next to the  
target *protein sequence string*, you would just change the code  
trivially:

for src,target,edge in blastmap[myquery].edges():
     print '%s %s\n%s %s %2.1f\n' %(src,repr(src),target,
                                    repr(target.sequence),
                                    100.*edge.pIdentity())

This example also prints the percent identity based of the *protein*  
sequence of target vs. src.

Explanation: the target sequence interval is not an interval of seqDB  
of course, because tblastn translates the subject nucleotide sequence  
into a protein sequence.  target represents that protein sequence  
interval; you can slice it, extract its protein sequence string (by  
str()) etc. in all the usual ways.  But it is actually an Annotation  
sequence object, bound to a real sequence object, namely its  
associated interval of nucleotide sequence from seqDB, which as always  
with an annotation, you can obtain by requesting the "sequence"  
attribute.  This takes advantage of the fact that in Pygr an  
annotation is actually a subclass of sequence (SeqPath), so all the  
sequence operation methods come along for free...

I'll address blastx / tblastx in a separate posting.

-- Chris

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"pygr-dev" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/pygr-dev?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to