Hi Paul,

I am now building the hg18_multiz44way NLMSA without any problems. Please send me the error message if you still have those problems. You may need to start over after deleting .pygr_data in your writable WORLDBASEPATH.

If your WORLDBASEBUILDDIR is not the final repository, you can move all the NLMSA files into your destination directory and then update .seqDictP like this:
You can open the genome either with seqdb.SequenceFileDB (use an absolute path) or from worldbase:

    hg18 = seqdb.SequenceFileDB('hg18')
    # or
    hg18 = worldbase.Bio.Seq.Genome.HUMAN.hg18()

    genomeDict = {'hg18': hg18, ...}  # supply all 44 genomes
    genomeUnion = seqdb.PrefixUnionDict(genomeDict)
    msa = cnestedlist.NLMSA('hg18_multiz44way', 'r', seqDict=genomeUnion)
    msa.save_seq_dict()

Then .seqDictP will be updated and you can access the alignment without any problems:

    chr1_slice = msa.seqDict['hg18.chr1'][1000:2000]
    edges = msa[chr1_slice].edges()

--
Namshin Kim

On Thu, Sep 3, 2009 at 7:20 AM, Namshin Kim <deepr...@gmail.com> wrote:

> Strange... The correct URL is
> http://biodb.bioinformatics.ucla.edu/GENOMES/ponAbe2/ponAbe2.gz
> The URL you used does not exist, so it gives a 404 error (an HTML document).
> Hmm... I have never downloaded and built hg18_multiz44way via XMLRPC. I will
> try that...
>
> Thanks,
> Namshin Kim
>
> On Thu, Sep 3, 2009 at 6:54 AM, Paul Rigor (gmail) <paulri...@gmail.com> wrote:
>
>> Hi Namshin,
>> Downloading the 44way alignment was successful. However, the persistent
>> data (.pygrdata) seems to be unworkable. The metabase lists Bio.MSA, etc.,
>> but it cannot be loaded.
>>
>> Also, I've attempted to download the genomes from the UCLA metabase, but a
>> genome might be corrupt. Specifically,
>>
>> http://biodb.bioinformatics.ucla.edu/GENOMES/ponAbe2/chromFa.tar.gz
>>
>> gives the error message below. In fact, the file that is downloaded
>> (ponAbe2.tar.gz) is an HTML document!
>>
>> $ file ponAbe2.tar.gz
>> ponAbe2.tar.gz: HTML document text
>>
>> ....[error trace below]
>> ....
>> /home/dock/shared_libraries/lx64/pkgs/pythonsandbox/2.6.2/lib/python2.6/site-packages/pygr-0.8.0.beta1-py2.6-linux-x86_64.egg/pygr/downloader.pyc
>> in do_untar(filepath, mode, newpath, singleFile, **kwargs)
>>     105         newpath = filepath + '.out'
>>     106     import tarfile
>> --> 107     t = tarfile.open(filepath, mode)
>>     108     try:
>>     109         if singleFile: # extract to a single file
>>
>> /home/dock/shared_libraries/lx64/pkgs/pythonsandbox/2.6.2/lib/python2.6/tarfile.pyc
>> in open(cls, name, mode, fileobj, bufsize, **kwargs)
>>    1662             else:
>>    1663                 raise CompressionError("unknown compression type %r" % comptype)
>> -> 1664             return func(name, filemode, fileobj, **kwargs)
>>    1665
>>    1666         elif "|" in mode:
>>
>> /home/dock/shared_libraries/lx64/pkgs/pythonsandbox/2.6.2/lib/python2.6/tarfile.pyc
>> in gzopen(cls, name, mode, fileobj, compresslevel, **kwargs)
>>    1713                                   **kwargs)
>>    1714         except IOError:
>> -> 1715             raise ReadError("not a gzip file")
>>    1716         t._extfileobj = False
>>    1717         return t
>>
>> ReadError: not a gzip file
>>
>> On Tue, Sep 1, 2009 at 9:55 PM, Paul Rigor (gmail) <paulri...@gmail.com> wrote:
>>
>>> Well, we have time, storage and bandwidth =)
>>> I'll let you know how it goes. Maybe we can host an XMLRPC mirror
>>> someday too.
>>>
>>> Thanks,
>>> Paul
>>>
>>> On Tue, Sep 1, 2009 at 9:41 PM, Namshin Kim <deepr...@gmail.com> wrote:
>>>
>>>> Hi Paul,
>>>> I just checked the size of hg18_multiz44way and it is 167GB for just
>>>> NLMSA. If we consider genome assemblies you may not have, it would be ~
>>>> 250GB. I think it would take a long time to download all files.
>>>>
>>>> --
>>>> Namshin Kim
>>>>
>>>> On Wed, Sep 2, 2009 at 1:33 PM, Paul Rigor (gmail) <paulri...@gmail.com> wrote:
>>>>
>>>>> Hi Namshin,
>>>>> I'm running this over night =) Has anyone successfully pulled and used
>>>>> this alignment?
>>>>>
>>>>> Thanks,
>>>>> Paul
>>>>>
>>>>> On Sun, Aug 2, 2009 at 4:40 PM, Namshin Kim <deepr...@gmail.com> wrote:
>>>>>
>>>>>> Now the downloadable resources are available on the biodb2 XMLRPC server.
>>>>>>
>>>>>> There are two ways to build the NLMSA.
>>>>>>
>>>>>> 1. metabase
>>>>>>
>>>>>> >>> import os
>>>>>> >>> os.environ['WORLDBASEPATH'] = '.,http://biodb2.bioinformatics.ucla.edu:5000'
>>>>>> >>> from pygr import metabase
>>>>>> >>> mdb = metabase.MetabaseList()
>>>>>> >>> hg18 = mdb('Bio.MSA.UCSC.hg18_multiz44way', download=True)
>>>>>>
>>>>>> 2. from text files
>>>>>>
>>>>>> Download the text files from
>>>>>> http://biodb.bioinformatics.ucla.edu/PYGRDATA/
>>>>>> and use the cnestedlist.textfile_to_binaries('hg18_multiz44way') function
>>>>>> to convert them from text to binaries.
>>>>>>
>>>>>> If you want to see the script used to add these resources, visit this
>>>>>> URL:
>>>>>>
>>>>>> http://github.com/deepreds/pygr/tree/d7ab9247dcd39b7d474029cb8749a53eb8582968/tests/biodb2_update
>>>
>>> --
>>> Paul Rigor
>>> Graduate Student
>>> Institute for Genomics and Bioinformatics
>>> Donald Bren School of Information and Computer Sciences
>>> University of California, Irvine
>>> http://www.paulrigor.net/
>>> http://www.ics.uci.edu/~prigor

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "pygr-dev" group.
To post to this group, send email to pygr-dev@googlegroups.com
To unsubscribe from this group, send email to pygr-dev+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/pygr-dev?hl=en
-~----------~----~----~----~------~----~------~--~---