[pygr] Re: general pygr questions and help with error

Brent Pedersen Mon, 30 Mar 2009 17:54:55 -0700

On Mon, Mar 30, 2009 at 2:40 PM, Christopher Lee <l...@chem.ucla.edu> wrote:
>
> Hi Brent,
> the problem is that you are querying specifically with the union_seq
> that you constructed afresh in this script, which did not come from
> pygr.Data.  That's why it works when you use the msa constructed
> afresh from union_seq, but not when you use the msa from pygr.Data.
> The thing that obscured this was that the printing of the proper error
> message was screwed up by our recent seqdb_review.  I fixed this in
> the latest master branch in the public repository.
>
> The simple fix is to change is to change your line:
>
> for item, edge in msa[union_seq['sorghum.10'][1:40400]].items():
>
> to
>
> for item, edge in msa[msa.seqDict['sorghum.10'][1:40400]].items():
>
> In the case where msa was loaded from pygr.Data, msa.seqDict will be
> the object loaded from pygr.Data.  In the case where msa was
> constructed afresh from union_seq etc., msa.seqDict will be union_seq
> (just as you assigned it).
>
> Does this make sense to you, or do you think that users will find this
> confusing?
>
> FYI, pygr.Data (and pygr in general) does not attempt to analyze
> whether different objects (created on different occasions potentially
> by different users) are actually the same underlying data.  You might
> think that because you opened "the same file" on two occasions it
> should consider those two different objects to be equivalent.  Of
> course, all sorts of things might have changed in between different
> file open operations, so the only reliable way to test equivalence
> would be to compute a md5 hash or equivalent on the complete data
> represented by that object.  We've avoided such potentially
> performance-killing approaches in favor of the KISS approach of
> pygr.Data: if the user saved the object as name X, within a single
> interpreter session every request for name X is guaranteed to get the
> same object (unless you specifically turn off this behavior by using
> pygr.Data.clear_cache() to force future requests to reload...).
>
> Again, does this make sense to you, or do you think users will find
> this confusing?
>
> Thanks for reporting and analyzing this problem!!!
>
> -- Chris
>
> >
>


hi chris, yes, this makes sense to me ... now that you've explained
it. thanks much.
the improving docs and tests will help too, i'm sure.


so i changed it so that one script only creates that dataset and saves
it to pygr.Data.
then load/use pygr.Data in a separate script. as that seems to be the
pygr-ish way to
do things.

now i have another problem. all the blast intervals are mapped, and i
put the genes in an annotation map and
an MSA for each organism, and i have something like below where i have
simplified to organisms 'A' and 'B'.
and where i want to find all blast hits for each gene in A and find
the corresponding genes in B that overlap those blast
hits. that script minus import pasted below.

all goes well until the 2nd to last line. where i get an error:
KeyError: "no key 'b11' in database <SequenceFileDB
'/home/bpederse/work/learn_pygr/data/b.fasta'>"

where 'b11' is the name of the fake gene in the variable b_gene. but
apparently, it's looking for the keys in b.fasta which
has the chromsomal fasta headers of 'chr1' ... 'chr3'

i made a tar ball of my learning project here:
http://128.32.8.29/tinker/learn_pygr.tgz
just run load_simple_msa.py, then simple_msa.py <- where the error occurs.


thanks again for any pointers.
-brent

#################################

{{{
msa = pygr.Data.Bio.ABMSA()
a_gene_msa = pygr.Data.Bio.AGeneMSA()
b_gene_msa = pygr.Data.Bio.BGeneMSA()
a_anno = pygr.Data.Bio.AAnno()
b_anno = pygr.Data.Bio.BAnno()


for name, gene in a_anno.iteritems():

    # find all blast hits to this gene.
    found = msa[gene.sequence]
    if not found: continue

    print "hits to %s:" % name
    for bslice, edge in found.iteritems():
        # all b genes that hit this blast
        print bslice.id, bslice.start, bslice.stop

        bgslice = b_gene_msa.seqDict[bslice.id][bslice.start:bslice.stop]
        bgene_slice = b_gene_msa[bgslice]
        print bgene_slice.seq
        for bgene in bgene_slice.items():
            print bgene

}}}

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"pygr-dev" group.
To post to this group, send email to pygr-dev@googlegroups.com
To unsubscribe from this group, send email to 
pygr-dev+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/pygr-dev?hl=en
-~----------~----~----~----~------~----~------~--~---

[pygr] Re: general pygr questions and help with error

Reply via email to