[pygr] Re: multiz44way?

Paul Rigor (gmail) Thu, 03 Sep 2009 12:49:16 -0700

Perfect, I'll re-run my script.

On Thu, Sep 3, 2009 at 12:47 PM, Namshin Kim <deepr...@gmail.com> wrote:


> Yes, there is ponAbe2 genome.
>
> >>> import os
> >>> os.environ['WORLDBASEPATH'] = '.,http://biodb2.bioinformatics.ucla.edu:5000'
> >>> from pygr import worldbase
> >>> ponAbe2 = worldbase.Bio.Seq.Genome.PONAB.ponAbe2(download=True)
> INFO downloader.download_unpickler: Beginning download of
> http://biodb.bioinformatics.ucla.edu/GENOMES/ponAbe2/ponAbe2.gz to
> /data/server/downloadable/pygr/tests/biodb2_update/t/ponAbe2.gz...
> INFO downloader.download_monitor: downloaded 100663296 bytes (10.0%)...
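
A note on the WORLDBASEPATH value in the session above: it is a comma-separated list of locations, and long values tend to get wrapped across lines in mail. A minimal sketch of building the value from a list instead, so that no stray whitespace or newline ends up inside the string. The helper name `build_worldbase_path` is hypothetical and not part of pygr.

```python
import os


def build_worldbase_path(locations):
    """Join resource locations into one comma-separated string.

    Hypothetical helper, not part of pygr: it strips surrounding
    whitespace and drops empty entries before joining.
    """
    cleaned = [loc.strip() for loc in locations]
    return ",".join(loc for loc in cleaned if loc)


# '.' is the local writable location; the URL is the biodb2 XMLRPC
# server quoted in this thread.
worldbase_path = build_worldbase_path(
    [".", "http://biodb2.bioinformatics.ucla.edu:5000"]
)

# Set the variable before importing pygr, as the session above does.
os.environ["WORLDBASEPATH"] = worldbase_path
```
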
>
>
> On Fri, Sep 4, 2009 at 4:15 AM, Paul Rigor (gmail) <paulri...@gmail.com> wrote:
>
>> Thanks, Namshin!
>> But don't I have to build the individual genome resources as well?
>> In any case, it would be great if the missing ponAbe2 genome were
>> made available through XMLRPC too.
>>
>> Thanks,
>> Paul
>>
>>
>> On Thu, Sep 3, 2009 at 5:07 AM, Namshin Kim <deepr...@gmail.com> wrote:
>>
>>> Hi Paul,
>>>
>>> I am now building the hg18_multiz44way NLMSA without any problems. Please
>>> send me the error message if you still have those problems. You may need to
>>> start over after deleting .pygr_data in your writable WORLDBASEPATH.
>>> If your WORLDBASEBUILDDIR is not the final repository, you can move all NLMSA
>>> files into your destination directory and then update .seqDictP like this:
>>>
>>> You can open each genome with seqdb.SequenceFileDB (use an absolute path)
>>> or from worldbase:
>>>
>>> from pygr import seqdb, cnestedlist, worldbase
>>> hg18 = seqdb.SequenceFileDB('hg18')  # replace 'hg18' with the absolute path
>>> # or
>>> hg18 = worldbase.Bio.Seq.Genome.HUMAN.hg18()
>>>
>>> genomeDict = {'hg18':hg18, ...} # supply all 44 genomes
>>> genomeUnion = seqdb.PrefixUnionDict(genomeDict)
>>> # open read-only; the genome union is passed as the sequence dictionary
>>> msa = cnestedlist.NLMSA('hg18_multiz44way', 'r', seqDict=genomeUnion)
>>> msa.save_seq_dict()
>>>
>>> Then .seqDictP will be updated and you can access the alignment without
>>> any problems:
>>>
>>> chr1_slice = msa.seqDict['hg18.chr1'][1000:2000]
>>> edges = msa[chr1_slice].edges()
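
The "move all NLMSA files" step described above could be sketched as follows. It assumes that every file belonging to the alignment shares the `hg18_multiz44way` name prefix, which is an assumption rather than something stated in this thread. The function name and the directories are hypothetical.

```python
import glob
import os
import shutil


def move_nlmsa_files(build_dir, dest_dir, prefix="hg18_multiz44way"):
    """Move every regular file whose name starts with `prefix`.

    Assumption: all files of one NLMSA share the alignment name as a
    prefix. Returns the list of new paths, sorted by source name.
    """
    os.makedirs(dest_dir, exist_ok=True)
    pattern = os.path.join(glob.escape(build_dir), glob.escape(prefix) + "*")
    moved = []
    for src in sorted(glob.glob(pattern)):
        if os.path.isfile(src):
            target = os.path.join(dest_dir, os.path.basename(src))
            moved.append(shutil.move(src, target))
    return moved
```

After the move, the NLMSA would be re-opened from the destination directory and `msa.save_seq_dict()` called there, as described above.
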
>>>
>>> --
>>> Namshin Kim
>>>
>>>
>>>
>>>
>>> On Thu, Sep 3, 2009 at 7:20 AM, Namshin Kim <deepr...@gmail.com> wrote:
>>>
>>>> Strange... The correct URL is
>>>> http://biodb.bioinformatics.ucla.edu/GENOMES/ponAbe2/ponAbe2.gz
>>>> The URL you used does not exist, so it returns a 404 error page (an HTML
>>>> document). Hmm... I have never downloaded and built hg18_multiz44way via
>>>> XMLRPC. I will try that...
>>>>
>>>> Thanks,
>>>> Namshin Kim
>>>>
>>>>
>>>>
>>>> On Thu, Sep 3, 2009 at 6:54 AM, Paul Rigor (gmail) <paulri...@gmail.com> wrote:
>>>>
>>>>> Hi Namshin,
>>>>> Downloading the 44way alignment was successful.  However, the
>>>>> persistent data (.pygr_data) seems to be unusable.  The metabase lists
>>>>> Bio.MSA, etc., but it cannot be loaded.
>>>>>
>>>>> Also, I've attempted to download the genomes from the UCLA metabase,
>>>>> but one genome appears to be corrupt.  Specifically,
>>>>>
>>>>> http://biodb.bioinformatics.ucla.edu/GENOMES/ponAbe2/chromFa.tar.gz
>>>>>
>>>>> gives the error message below.  In fact, the downloaded file
>>>>> (ponAbe2.tar.gz) is an HTML document!
>>>>>
>>>>> $ file ponAbe2.tar.gz
>>>>> ponAbe2.tar.gz: HTML document text
>>>>>
>>>>>
>>>>> ....[error trace below]
>>>>> ....
>>>>> /home/dock/shared_libraries/lx64/pkgs/pythonsandbox/2.6.2/lib/python2.6/site-packages/pygr-0.8.0.beta1-py2.6-linux-x86_64.egg/pygr/downloader.pyc
>>>>> in do_untar(filepath, mode, newpath, singleFile, **kwargs)
>>>>>     105         newpath = filepath + '.out'
>>>>>     106     import tarfile
>>>>> --> 107     t = tarfile.open(filepath, mode)
>>>>>     108     try:
>>>>>     109         if singleFile: # extract to a single file
>>>>>
>>>>> /home/dock/shared_libraries/lx64/pkgs/pythonsandbox/2.6.2/lib/python2.6/tarfile.pyc
>>>>> in open(cls, name, mode, fileobj, bufsize, **kwargs)
>>>>>    1662             else:
>>>>>    1663                 raise CompressionError("unknown compression type %r" % comptype)
>>>>> -> 1664             return func(name, filemode, fileobj, **kwargs)
>>>>>    1665
>>>>>    1666         elif "|" in mode:
>>>>>
>>>>> /home/dock/shared_libraries/lx64/pkgs/pythonsandbox/2.6.2/lib/python2.6/tarfile.pyc
>>>>> in gzopen(cls, name, mode, fileobj, compresslevel, **kwargs)
>>>>>    1713                 **kwargs)
>>>>>    1714         except IOError:
>>>>> -> 1715             raise ReadError("not a gzip file")
>>>>>    1716         t._extfileobj = False
>>>>>    1717         return t
>>>>>
>>>>> ReadError: not a gzip file
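
The failure above comes from handing an HTML error page to tarfile. A minimal sketch of a pre-check that could be run on a download before unpacking it. The function names are hypothetical, and this is not what pygr's downloader does; it only inspects the first bytes of the file.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_like_gzip(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def describe_download(path):
    """Give a short hint about what a downloaded file really contains."""
    if looks_like_gzip(path):
        return "gzip data"
    with open(path, "rb") as handle:
        head = handle.read(512).lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "HTML document (probably an error page)"
    return "unknown"
```

Checking the result before calling `tarfile.open` would turn the "not a gzip file" traceback into an earlier and more specific message about a bad download URL.
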
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Sep 1, 2009 at 9:55 PM, Paul Rigor (gmail) <paulri...@gmail.com> wrote:
>>>>>
>>>>>> Well, we have time, storage and bandwidth =)
>>>>>> I'll let you know how it goes.  Maybe we can host an XMLRPC mirror
>>>>>> someday too.
>>>>>>
>>>>>> Thanks,
>>>>>> Paul
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 1, 2009 at 9:41 PM, Namshin Kim <deepr...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Paul,
>>>>>>> I just checked the size of hg18_multiz44way: it is 167 GB for the NLMSA
>>>>>>> alone. Including the genome assemblies you may not have, it would be
>>>>>>> about 250 GB. I think it would take a long time to download all the files.
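
To put the sizes above into rough transfer times, here is a small back-of-the-envelope sketch. The bandwidth figures are hypothetical examples, not measurements from either site, and the estimate ignores protocol overhead and server-side limits.

```python
def download_hours(size_gb, mbit_per_s):
    """Estimate the hours needed to transfer size_gb at mbit_per_s.

    Uses decimal units (1 GB = 8000 megabits) and ignores overhead.
    """
    if mbit_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    seconds = size_gb * 8000.0 / mbit_per_s
    return seconds / 3600.0


for size_gb in (167, 250):      # sizes quoted above
    for rate in (10, 100):      # hypothetical sustained rates in Mbit/s
        print("%d GB at %d Mbit/s: about %.1f hours"
              % (size_gb, rate, download_hours(size_gb, rate)))
```
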
>>>>>>>
>>>>>>> --
>>>>>>> Namshin Kim
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Sep 2, 2009 at 1:33 PM, Paul Rigor (gmail) <paulri...@gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Hi Namshin,
>>>>>>>> I'm running this overnight =)  Has anyone successfully pulled and
>>>>>>>> used this alignment?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Paul
>>>>>>>>
>>>>>>>> On Sun, Aug 2, 2009 at 4:40 PM, Namshin Kim <deepr...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> The downloadable resources are now available on the biodb2 XMLRPC
>>>>>>>>> server.
>>>>>>>>>
>>>>>>>>> There are two ways to build the NLMSA.
>>>>>>>>>
>>>>>>>>> 1. metabase
>>>>>>>>>
>>>>>>>>> >>> import os
>>>>>>>>> >>> os.environ['WORLDBASEPATH'] = '.,http://biodb2.bioinformatics.ucla.edu:5000'
>>>>>>>>> >>> from pygr import metabase
>>>>>>>>> >>> mdb = metabase.MetabaseList()
>>>>>>>>> >>> hg18 = mdb('Bio.MSA.UCSC.hg18_multiz44way',download=True)
>>>>>>>>>
>>>>>>>>> 2. from text files
>>>>>>>>>
>>>>>>>>> Download the text files from
>>>>>>>>> http://biodb.bioinformatics.ucla.edu/PYGRDATA/
>>>>>>>>> and use the cnestedlist.textfile_to_binaries('hg18_multiz44way')
>>>>>>>>> function to convert them from text to binaries.
>>>>>>>>>
>>>>>>>>> If you want to see the script used to add these resources, visit
>>>>>>>>> this URL.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://github.com/deepreds/pygr/tree/d7ab9247dcd39b7d474029cb8749a53eb8582968/tests/biodb2_update
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Paul Rigor
>>>>>> Graduate Student
>>>>>> Institute for Genomics and Bioinformatics
>>>>>> Donald Bren School of Information and Computer Sciences
>>>>>> University of California, Irvine
>>>>>> http://www.paulrigor.net/
>>>>>> http://www.ics.uci.edu/~prigor
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>
> >
>


-- 
Paul Rigor
Graduate Student
Institute for Genomics and Bioinformatics
Donald Bren School of Information and Computer Sciences
University of California, Irvine
http://www.paulrigor.net/
http://www.ics.uci.edu/~prigor

