Perfect, I'll re-run my script.

On Thu, Sep 3, 2009 at 12:47 PM, Namshin Kim <deepr...@gmail.com> wrote:
> Yes, there is a ponAbe2 genome.
>
> >>> import os
> >>> os.environ['WORLDBASEPATH'] = '., http://biodb2.bioinformatics.ucla.edu:5000'
> >>> from pygr import worldbase
> >>> ponAbe2 = worldbase.Bio.Seq.Genome.PONAB.ponAbe2(download=True)
> INFO downloader.download_unpickler: Beginning download of http://biodb.bioinformatics.ucla.edu/GENOMES/ponAbe2/ponAbe2.gz to /data/server/downloadable/pygr/tests/biodb2_update/t/ponAbe2.gz...
> INFO downloader.download_monitor: downloaded 100663296 bytes (10.0%)...
>
> On Fri, Sep 4, 2009 at 4:15 AM, Paul Rigor (gmail) <paulri...@gmail.com> wrote:
>
>> Thanks Namshin!!!
>> But don't I also have to build the individual genome resources as well?
>> In any case, it would be great if the non-existent ponAbe2 genome would be
>> made available through XMLRPC as well.
>>
>> Thanks,
>> Paul
>>
>> On Thu, Sep 3, 2009 at 5:07 AM, Namshin Kim <deepr...@gmail.com> wrote:
>>
>>> Hi Paul,
>>>
>>> I am now building the hg18_multiz44way NLMSA without any problems. Please
>>> send me the error message if you still have those problems. You may need to
>>> start over after you delete .pygr_data in your writable WORLDBASEPATH.
>>> If your WORLDBASEBUILDDIR is not the final repository, you can move all NLMSA
>>> files into your destination directory and update .seqDictP like this:
>>>
>>> You can open a genome using seqdb.SequenceFileDB (use an absolute path)
>>> or from worldbase.
>>> hg18 = seqdb.SequenceFileDB('hg18') or hg18 =
>>> worldbase.Bio.Seq.Genome.HUMAN.hg18()
>>>
>>> genomeDict = {'hg18':hg18, ...} # supply all 44 genomes
>>> genomeUnion = seqdb.PrefixUnionDict(genomeDict)
>>> msa = cnestedlist.NLMSA('hg18_multiz44way', 'r', seqDict=genomeUnion)
>>> msa.save_seq_dict()
>>>
>>> Then, .seqDictP will be updated and you can access it without any problems.
>>>
>>> chr1_slice = msa.seqDict['hg18.chr1'][1000:2000]
>>> edges = msa[chr1_slice].edges()
>>>
>>> --
>>> Namshin Kim
>>>
>>> On Thu, Sep 3, 2009 at 7:20 AM, Namshin Kim <deepr...@gmail.com> wrote:
>>>
>>>> Strange... The correct URL is
>>>> http://biodb.bioinformatics.ucla.edu/GENOMES/ponAbe2/ponAbe2.gz The URL
>>>> you used does not exist, so it gives a 404 error (an HTML document).
>>>> Hmm... I have never downloaded and built hg18_multiz44way via XMLRPC. I
>>>> will try that...
>>>>
>>>> Thanks,
>>>> Namshin Kim
>>>>
>>>> On Thu, Sep 3, 2009 at 6:54 AM, Paul Rigor (gmail) <paulri...@gmail.com> wrote:
>>>>
>>>>> Hi Namshin,
>>>>> Downloading the 44way alignment was successful. However, the
>>>>> persistent data (.pygr_data) seems to be unworkable. The metabase lists
>>>>> Bio.MSA, etc., but it cannot be loaded.
>>>>>
>>>>> Also, I've attempted to download the genomes from the UCLA metabase,
>>>>> but a genome might be corrupt. Specifically,
>>>>>
>>>>> http://biodb.bioinformatics.ucla.edu/GENOMES/ponAbe2/chromFa.tar.gz
>>>>>
>>>>> which gives the error message below. In fact, the file that
>>>>> is downloaded (ponAbe2.tar.gz) is an HTML document!
>>>>>
>>>>> $ file ponAbe2.tar.gz
>>>>> ponAbe2.tar.gz: HTML document text
>>>>>
>>>>> ....[error trace below]
>>>>> ....
>>>>> /home/dock/shared_libraries/lx64/pkgs/pythonsandbox/2.6.2/lib/python2.6/site-packages/pygr-0.8.0.beta1-py2.6-linux-x86_64.egg/pygr/downloader.pyc
>>>>> in do_untar(filepath, mode, newpath, singleFile, **kwargs)
>>>>>     105         newpath = filepath + '.out'
>>>>>     106     import tarfile
>>>>> --> 107     t = tarfile.open(filepath, mode)
>>>>>     108     try:
>>>>>     109         if singleFile: # extract to a single file
>>>>>
>>>>> /home/dock/shared_libraries/lx64/pkgs/pythonsandbox/2.6.2/lib/python2.6/tarfile.pyc
>>>>> in open(cls, name, mode, fileobj, bufsize, **kwargs)
>>>>>    1662             else:
>>>>>    1663                 raise CompressionError("unknown compression type %r" % comptype)
>>>>> -> 1664             return func(name, filemode, fileobj, **kwargs)
>>>>>    1665
>>>>>    1666         elif "|" in mode:
>>>>>
>>>>> /home/dock/shared_libraries/lx64/pkgs/pythonsandbox/2.6.2/lib/python2.6/tarfile.pyc
>>>>> in gzopen(cls, name, mode, fileobj, compresslevel, **kwargs)
>>>>>    1713                 **kwargs)
>>>>>    1714         except IOError:
>>>>> -> 1715             raise ReadError("not a gzip file")
>>>>>    1716         t._extfileobj = False
>>>>>    1717         return t
>>>>>
>>>>> ReadError: not a gzip file
>>>>>
>>>>> On Tue, Sep 1, 2009 at 9:55 PM, Paul Rigor (gmail) <paulri...@gmail.com> wrote:
>>>>>
>>>>>> Well, we have time, storage and bandwidth =)
>>>>>> I'll let you know how it goes? Maybe we can host an XMLRPC mirror
>>>>>> someday too.
>>>>>>
>>>>>> Thanks,
>>>>>> Paul
>>>>>>
>>>>>> On Tue, Sep 1, 2009 at 9:41 PM, Namshin Kim <deepr...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Paul,
>>>>>>> I just checked the size of hg18_multiz44way and it is 167GB for just
>>>>>>> NLMSA. If we consider genome assemblies you may not have, it would be ~
>>>>>>> 250GB. I think it would take a long time to download all files.
>>>>>>>
>>>>>>> --
>>>>>>> Namshin Kim
>>>>>>>
>>>>>>> On Wed, Sep 2, 2009 at 1:33 PM, Paul Rigor (gmail) <paulri...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Namshin,
>>>>>>>> I'm running this overnight =) Has anyone successfully pulled and
>>>>>>>> used this alignment?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Paul
>>>>>>>>
>>>>>>>> On Sun, Aug 2, 2009 at 4:40 PM, Namshin Kim <deepr...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> The downloadable resources are now available on the biodb2 XMLRPC
>>>>>>>>> server.
>>>>>>>>>
>>>>>>>>> There are two ways to build the NLMSA.
>>>>>>>>>
>>>>>>>>> 1. metabase
>>>>>>>>>
>>>>>>>>> >>> import os
>>>>>>>>> >>> os.environ['WORLDBASEPATH'] = '., http://biodb2.bioinformatics.ucla.edu:5000'
>>>>>>>>> >>> from pygr import metabase
>>>>>>>>> >>> mdb = metabase.MetabaseList()
>>>>>>>>> >>> hg18 = mdb('Bio.MSA.UCSC.hg18_multiz44way',download=True)
>>>>>>>>>
>>>>>>>>> 2. from text files
>>>>>>>>>
>>>>>>>>> Download the text files from
>>>>>>>>> http://biodb.bioinformatics.ucla.edu/PYGRDATA/
>>>>>>>>> and use the cnestedlist.textfile_to_binaries('hg18_multiz44way') function
>>>>>>>>> to convert them from text to binaries.
>>>>>>>>>
>>>>>>>>> If you want to see the script used to add these resources, visit
>>>>>>>>> this URL.
>>>>>>>>>
>>>>>>>>> http://github.com/deepreds/pygr/tree/d7ab9247dcd39b7d474029cb8749a53eb8582968/tests/biodb2_update

--
Paul Rigor
Graduate Student
Institute for Genomics and Bioinformatics
Donald Bren School of Information and Computer Sciences
University of California, Irvine
http://www.paulrigor.net/
http://www.ics.uci.edu/~prigor

You received this message because you are subscribed to the Google Groups "pygr-dev" group.
To post to this group, send email to pygr-dev@googlegroups.com
To unsubscribe from this group, send email to pygr-dev+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/pygr-dev?hl=en
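
On the "ReadError: not a gzip file" case in the thread: a 404 page saved under a .gz name can be caught before anything tries to untar it by looking at the first two bytes of the file, which are always 0x1f 0x8b for gzip data. Below is a minimal sketch of that check, assuming only the Python standard library. The names looks_like_gzip and demo are hypothetical helpers for illustration, not part of pygr, and the file names are made up.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_like_gzip(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def demo():
    """Write one real gzip file and one HTML error page, then check both."""
    workdir = tempfile.mkdtemp()
    good = os.path.join(workdir, "good.gz")     # hypothetical file name
    bad = os.path.join(workdir, "bad.tar.gz")   # hypothetical file name
    with gzip.open(good, "wb") as out:
        out.write(b">chr1\nACGT\n")
    with open(bad, "wb") as out:
        # stands in for the HTML document a server returns on a 404
        out.write(b"<html><body>404 Not Found</body></html>")
    return looks_like_gzip(good), looks_like_gzip(bad)


if __name__ == "__main__":
    print(demo())
```

This does the same job as running "file ponAbe2.tar.gz" by hand, so a download script could call it right after fetching and stop with a clear message instead of failing inside tarfile.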