On Mon, Mar 30, 2009 at 2:40 PM, Christopher Lee <l...@chem.ucla.edu> wrote: > > Hi Brent, > the problem is that you are querying specifically with the union_seq > that you constructed afresh in this script, which did not come from > pygr.Data. That's why it works when you use the msa constructed > afresh from union_seq, but not when you use the msa from pygr.Data. > The thing that obscured this was that the printing of the proper error > message was screwed up by our recent seqdb_review. I fixed this in > the latest master branch in the public repository. > > The simple fix is to change is to change your line: > > for item, edge in msa[union_seq['sorghum.10'][1:40400]].items(): > > to > > for item, edge in msa[msa.seqDict['sorghum.10'][1:40400]].items(): > > In the case where msa was loaded from pygr.Data, msa.seqDict will be > the object loaded from pygr.Data. In the case where msa was > constructed afresh from union_seq etc., msa.seqDict will be union_seq > (just as you assigned it). > > Does this make sense to you, or do you think that users will find this > confusing? > > FYI, pygr.Data (and pygr in general) does not attempt to analyze > whether different objects (created on different occasions potentially > by different users) are actually the same underlying data. You might > think that because you opened "the same file" on two occasions it > should consider those two different objects to be equivalent. Of > course, all sorts of things might have changed in between different > file open operations, so the only reliable way to test equivalence > would be to compute a md5 hash or equivalent on the complete data > represented by that object. We've avoided such potentially > performance-killing approaches in favor of the KISS approach of > pygr.Data: if the user saved the object as name X, within a single > interpreter session every request for name X is guaranteed to get the > same object (unless you specifically turn off this behavior by using > pygr.Data.clear_cache() to force future requests to reload...). > > Again, does this make sense to you, or do you think users will find > this confusing? > > Thanks for reporting and analyzing this problem!!! > > -- Chris > > > >
hi chris, yes, this makes sense to me ... now that you've explained it. thanks much. the improving docs and tests will help too, i'm sure. so i changed it so that one script only creates that dataset and saves it to pygr.Data. then load/use pygr.Data in a separate script. as that seems to be the pygr-ish way to do things. now i have another problem. all the blast intervals are mapped, and i put the genes in an annotation map and an MSA for each organism, and i have something like below where i have simplified to organisms 'A' and 'B'. and where i want to find all blast hits for each gene in A and find the corresponding genes in B that overlap those blast hits. that script minus import pasted below. all goes well until the 2nd to last line. where i get an error: KeyError: "no key 'b11' in database <SequenceFileDB '/home/bpederse/work/learn_pygr/data/b.fasta'>" where 'b11' is the name of the fake gene in the variable b_gene. but apparently, it's looking for the keys in b.fasta which has the chromsomal fasta headers of 'chr1' ... 'chr3' i made a tar ball of my learning project here: http://128.32.8.29/tinker/learn_pygr.tgz just run load_simple_msa.py, then simple_msa.py <- where the error occurs. thanks again for any pointers. -brent ################################# {{{ msa = pygr.Data.Bio.ABMSA() a_gene_msa = pygr.Data.Bio.AGeneMSA() b_gene_msa = pygr.Data.Bio.BGeneMSA() a_anno = pygr.Data.Bio.AAnno() b_anno = pygr.Data.Bio.BAnno() for name, gene in a_anno.iteritems(): # find all blast hits to this gene. found = msa[gene.sequence] if not found: continue print "hits to %s:" % name for bslice, edge in found.iteritems(): # all b genes that hit this blast print bslice.id, bslice.start, bslice.stop bgslice = b_gene_msa.seqDict[bslice.id][bslice.start:bslice.stop] bgene_slice = b_gene_msa[bgslice] print bgene_slice.seq for bgene in bgene_slice.items(): print bgene }}} --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "pygr-dev" group. To post to this group, send email to pygr-dev@googlegroups.com To unsubscribe from this group, send email to pygr-dev+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/pygr-dev?hl=en -~----------~----~----~----~------~----~------~--~---