Re: problems using sequence slices as index to NLMSA

Kenny Daily Fri, 30 Jan 2009 15:41:09 -0800

That would definitely work! There's nothing about cPickle that I'm
inherently attached to, just need a way to store our data structure.
Thanks!


On Jan 30, 11:05 am, Christopher Lee <[email protected]> wrote:
> Hi Kenny,
> Strictly speaking, the behavior you reported is not a bug but rather a
> usage that pygr.Data doesn't support.  The problem is that you're
> using regular pickling.  pygr.Data itself uses pickling, but the
> regular pickler doesn't know about pygr.Data.  I
> believe we have a pygr.Data-aware pickling method, but I'd have to find
> that information for you.
>
> This is an issue of modularity of design.  Pygr.Data is designed to  
> work with any class that is picklable; the code for those classes does  
> not need to know anything about pygr.Data or even that it exists.  If  
> you want to be able to pickle your objects with the regular python  
> pickler / cPickler and have that magically invoke pygr.Data, then the  
> code for those objects' classes would have to include special
> __getstate__, __setstate__, __reduce__ methods that specifically  
> invoke pygr.Data methods.  That would break the modularity principle  
> stated above, because each such class would have to be written to work  
> specifically with pygr.Data.
>
> Simple option: instead of using the standard pickle function, use a  
> variant that we supply that saves data in a pygr.Data aware way, and  
> unpickle using a variant method that we supply that again will invoke  
> pygr.Data as needed to reconstitute your data.
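
A minimal sketch of what such a pygr.Data-aware pickle variant could look like, using only the standard library's persistent_id / persistent_load hooks. This is not pygr's actual implementation: REGISTRY, Resource, AwarePickler, AwareUnpickler, dumps and loads are hypothetical names for illustration, and the registry dict merely stands in for a lookup by resource ID.

```python
import io
import pickle

# Hypothetical stand-in for a pygr.Data-style lookup: resource ID -> object.
REGISTRY = {}


class Resource(object):
    """Toy resource carrying a persistent ID, like a database loaded by name."""

    def __init__(self, persistent_id):
        self._persistent_id = persistent_id


class AwarePickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Objects that carry an ID are stored as a reference, not by value.
        # Returning None tells the pickler to pickle the object normally.
        return getattr(obj, "_persistent_id", None)


class AwareUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Resolve the stored reference back to the shared, registered object.
        try:
            return REGISTRY[pid]
        except KeyError:
            raise pickle.UnpicklingError("unknown resource ID: %r" % (pid,))


def dumps(obj):
    """Pickle obj, replacing registered resources with their IDs."""
    buf = io.BytesIO()
    AwarePickler(buf, protocol=2).dump(obj)
    return buf.getvalue()


def loads(data):
    """Unpickle data, looking up resource IDs in REGISTRY."""
    return AwareUnpickler(io.BytesIO(data)).load()
```

With this arrangement the classes being pickled need no special __getstate__ or __reduce__ methods; only the pickler and unpickler know about the registry. After registering a resource under its ID, loads(dumps(x)) hands back the same registered object wherever x referred to it, instead of a detached copy that has lost its ID.
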
>
> Would that work for you?
>
> -- Chris
>
> On Jan 28, 2009, at 11:41 AM, Kenny Daily wrote:
>
>
>
> > OK, so I re-ran my scripts double-checking everywhere that I used the
> > genome from pygr.Data, and it still fails. So I worked up a test case,
> > data and code follows. Before pickling, the _persistent_id is there.
> > After pickling, it's not.
>
> > *** START test_genome.fa ***
> > >test
> > ACGCAGACTGACCTACGATCAAATAAGCCGAGCTAGCAAGCCCGCCGTAATCGATCGACGTACGTCGATCGATCGACCCC
> > GAATAGACTCCGATAAGCGTAGTGTATATAGCGCGCCCGTATATAGGATGAGAAGAATATAAAGCTCCTCTCGAGATCGA
> > *** END test_genome.fa ***
>
> > *** START GENOME BUILD CODE ***
> > from pygr import seqdb
> > g = seqdb.BlastDB("/home/baldig/projects/genomics/nonsvn/results/yeast/Ty3/TEST/test_genome.fa")
> > g.__doc__ = "Genome for testing Ty3 read clustering pipeline"
> > import pygr.Data
> > pygr.Data.getResource.addResource("Bio.Seq.Genome.TESTING.test_ty3", g)
> > pygr.Data.save()
> > *** END GENOME BUILD CODE ***
>
> > *** START DATA CREATION CODE ***
> > import cPickle
> > import pygr.Data
> > from pygr.sequence import Sequence
>
> > # mimics the data structure I'm using
> > class HTS:
> >    def __init__(self, seqs=[]):
> >        self.seqs = seqs
>
> > genome = pygr.Data.getResource("Bio.Seq.Genome.TESTING.test_ty3")
> > foo = Sequence('CCCGCCGTAATCGATCGAC', 'foo')
> > b = genome.blast(foo)
> > s,d,e = b[foo].edges()[0]
>
> > # check that persistent_id
> > d.path.db._persistent_id
>
> > H = HTS(seqs=[d])
> > x = H.seqs[0]
>
> > # check that persistent_id, again
> > x.path.db._persistent_id
>
> > Hs = {1: H}
> > cPickle.dump(Hs, file("test_pickle_seqslice.pkl", "w"))
>
> > *** END DATA CREATION CODE ***
>
> > *** START DATA READ CODE ***
>
> > import cPickle
> > import pygr.Data
>
> > class HTS:
> >    def __init__(self, seqs=[]):
> >        self.seqs = seqs
>
> > d = cPickle.load(file("test_pickle_seqslice.pkl"))
> > x = d[1]
> > s = x.seqs[0]
>
> > # I get <type 'exceptions.AttributeError'>: 'BlastDB' object has no
> > # attribute '_persistent_id'
> > s.path.db._persistent_id
>
> > *** END DATA READ CODE ***
>
> > On Jan 28, 3:57 am, Christopher Lee <[email protected]> wrote:
> >> On Jan 28, 2009, at 1:53 PM, Kenny Daily wrote:
>
> >>> OK. These things make sense. However, I think what I'm doing is a
> >>> little more complicated, and I've left out some of the important
> >>> steps that may help explain. First, I'm sure that I'm using the
> >>> pygr.Data object every time, i.e. genome is always set by:
>
> >>> genome = pygr.Data.getResource("Bio.Seq.Genome.YEAST.sacCer")
>
> >> Kenny, could you check the c.sequence.path.db._persistent_id on the
> >> case from your example that gives the KeyError?  If this attribute is
> >> missing, the data was *not* loaded with a pygr.Data ID.  Let me know
> >> what you find.