[pygr] Re: multiz44way?

Namshin Kim Thu, 03 Sep 2009 12:47:05 -0700

Yes, there is a ponAbe2 genome.

>>> import os
>>> os.environ['WORLDBASEPATH'] = '.,http://biodb2.bioinformatics.ucla.edu:5000'
>>> from pygr import worldbase
>>> ponAbe2 = worldbase.Bio.Seq.Genome.PONAB.ponAbe2(download=True)
INFO downloader.download_unpickler: Beginning download of
http://biodb.bioinformatics.ucla.edu/GENOMES/ponAbe2/ponAbe2.gz to
/data/server/downloadable/pygr/tests/biodb2_update/t/ponAbe2.gz...
INFO downloader.download_monitor: downloaded 100663296 bytes (10.0%)...



On Fri, Sep 4, 2009 at 4:15 AM, Paul Rigor (gmail) <paulri...@gmail.com> wrote:

> Thanks Namshin!!!
> But don't I have to build the individual genome resources as well?  In
> any case, it would be great if the currently missing ponAbe2 genome could
> be made available through XMLRPC too.
>
> Thanks,
> Paul
>
>
> On Thu, Sep 3, 2009 at 5:07 AM, Namshin Kim <deepr...@gmail.com> wrote:
>
>> Hi Paul,
>>
>> I am now building the hg18_multiz44way NLMSA without any problems. Please
>> send me the error messages if you still have those problems. You may need
>> to start over after deleting .pygr_data in your writable WORLDBASEPATH.
>> If your WORLDBASEBUILDDIR is not the final repository, you can move all
>> NLMSA files into your destination directory and then update .seqDictP
>> like this:
>>
>> You can open a genome using seqdb.SequenceFileDB (use an absolute path)
>> or from worldbase:
>>
>> from pygr import seqdb, cnestedlist, worldbase
>> hg18 = seqdb.SequenceFileDB('hg18')  # pass the absolute path to the file
>> # or: hg18 = worldbase.Bio.Seq.Genome.HUMAN.hg18()
>>
>> genomeDict = {'hg18':hg18, ...} # supply all 44 genomes
>> genomeUnion = seqdb.PrefixUnionDict(genomeDict)
>> msa = cnestedlist.NLMSA('hg18_multiz44way', 'r', seqDict=genomeUnion)
>> msa.save_seq_dict()
>>
>> Then .seqDictP will be updated and you can access the alignment without
>> any problems.
>>
>> chr1_slice = msa.seqDict['hg18.chr1'][1000:2000]
>> edges = msa[chr1_slice].edges()
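
The 'hg18.chr1' key used above joins a genome prefix and a sequence ID with a dot. The following is a toy, pure-Python sketch of that lookup idea only. ToyPrefixUnion, its separator argument, and the short example sequences are hypothetical illustrations; this is not pygr's PrefixUnionDict implementation and the strings are not real genome data.

```python
class ToyPrefixUnion(object):
    """Toy stand-in (NOT pygr's PrefixUnionDict): resolves 'prefix.seqID' keys."""

    def __init__(self, prefix_dict, separator="."):
        self.prefix_dict = prefix_dict  # e.g. {'hg18': {...}, 'ponAbe2': {...}}
        self.separator = separator

    def __getitem__(self, key):
        # Split only on the first separator so sequence IDs may contain dots.
        prefix, sep, seq_id = key.partition(self.separator)
        if not sep or prefix not in self.prefix_dict:
            raise KeyError(key)
        return self.prefix_dict[prefix][seq_id]


# Made-up sequences standing in for real genome databases.
genomes = {
    "hg18": {"chr1": "ACGTACGTAC"},
    "ponAbe2": {"chr1": "TTGACCA"},
}
union = ToyPrefixUnion(genomes)
hg18_slice = union["hg18.chr1"][2:6]  # slice a sequence found via its prefix
```

A key without a known prefix raises KeyError in this sketch, so a genome left out of the dictionary shows up as a lookup failure rather than as a silent mismatch.
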
>>
>> --
>> Namshin Kim
>>
>>
>>
>>
>> On Thu, Sep 3, 2009 at 7:20 AM, Namshin Kim <deepr...@gmail.com> wrote:
>>
>>> Strange... The correct URL is
>>> http://biodb.bioinformatics.ucla.edu/GENOMES/ponAbe2/ponAbe2.gz
>>> The URL you used does not exist, so it returns a 404 error page (an
>>> HTML document).
>>> Hmm... I have never downloaded and built hg18_multiz44way via XMLRPC. I
>>> will try that...
>>>
>>> Thanks,
>>> Namshin Kim
>>>
>>>
>>>
>>> On Thu, Sep 3, 2009 at 6:54 AM, Paul Rigor (gmail) <paulri...@gmail.com> wrote:
>>>
>>>> Hi Namshin,
>>>> Downloading the 44way alignment was successful.  However, the persistent
>>>> data (.pygr_data) seems to be unusable.  The metabase lists Bio.MSA, etc.,
>>>> but it cannot be loaded.
>>>>
>>>> Also, I've attempted to download the genomes from the UCLA metabase, but
>>>> one genome might be corrupt.  Specifically,
>>>>
>>>> http://biodb.bioinformatics.ucla.edu/GENOMES/ponAbe2/chromFa.tar.gz
>>>>
>>>> which gives the error message below.  In fact, the downloaded file
>>>> (ponAbe2.tar.gz) is an HTML document!
>>>>
>>>> $ file ponAbe2.tar.gz
>>>> ponAbe2.tar.gz: HTML document text
>>>>
>>>>
>>>> ....[error trace below]
>>>> ....
>>>> /home/dock/shared_libraries/lx64/pkgs/pythonsandbox/2.6.2/lib/python2.6/site-packages/pygr-0.8.0.beta1-py2.6-linux-x86_64.egg/pygr/downloader.pyc
>>>> in do_untar(filepath, mode, newpath, singleFile, **kwargs)
>>>>     105         newpath = filepath + '.out'
>>>>     106     import tarfile
>>>> --> 107     t = tarfile.open(filepath, mode)
>>>>     108     try:
>>>>     109         if singleFile: # extract to a single file
>>>>
>>>> /home/dock/shared_libraries/lx64/pkgs/pythonsandbox/2.6.2/lib/python2.6/tarfile.pyc
>>>> in open(cls, name, mode, fileobj, bufsize, **kwargs)
>>>>    1662             else:
>>>>    1663                 raise CompressionError("unknown compression type %r" % comptype)
>>>> -> 1664             return func(name, filemode, fileobj, **kwargs)
>>>>    1665
>>>>    1666         elif "|" in mode:
>>>>
>>>> /home/dock/shared_libraries/lx64/pkgs/pythonsandbox/2.6.2/lib/python2.6/tarfile.pyc
>>>> in gzopen(cls, name, mode, fileobj, compresslevel, **kwargs)
>>>>    1713                 **kwargs)
>>>>    1714         except IOError:
>>>> -> 1715             raise ReadError("not a gzip file")
>>>>    1716         t._extfileobj = False
>>>>    1717         return t
>>>>
>>>> ReadError: not a gzip file
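
The "not a gzip file" failure above can be caught before untarring by checking the first two bytes of the download, which are 0x1f 0x8b for a gzip stream. This is a minimal sketch using only the Python standard library. looks_like_gzip is a hypothetical helper name and is not part of pygr's downloader.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_like_gzip(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC
```

An HTML error page saved under a .tar.gz name starts with text such as "<html" or "<!DOCTYPE", so the check returns False for it and the download can be retried with the correct URL instead of being passed to tarfile.open.
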
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Sep 1, 2009 at 9:55 PM, Paul Rigor (gmail) <paulri...@gmail.com> wrote:
>>>>
>>>>> Well, we have time, storage and bandwidth =)
>>>>> I'll let you know how it goes.  Maybe we can host an XMLRPC mirror
>>>>> someday too.
>>>>>
>>>>> Thanks,
>>>>> Paul
>>>>>
>>>>>
>>>>> On Tue, Sep 1, 2009 at 9:41 PM, Namshin Kim <deepr...@gmail.com> wrote:
>>>>>
>>>>>> Hi Paul,
>>>>>> I just checked the size of hg18_multiz44way and it is 167GB for the
>>>>>> NLMSA alone. Including genome assemblies you may not have, it would be
>>>>>> about 250GB. I think it would take a long time to download all the files.
>>>>>>
>>>>>> --
>>>>>> Namshin Kim
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Sep 2, 2009 at 1:33 PM, Paul Rigor (gmail) <paulri...@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> Hi Namshin,
>>>>>>> I'm running this overnight =)  Has anyone successfully pulled and
>>>>>>> used this alignment?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Paul
>>>>>>>
>>>>>>> On Sun, Aug 2, 2009 at 4:40 PM, Namshin Kim <deepr...@gmail.com> wrote:
>>>>>>>
>>>>>>>> The downloadable resources are now available on the biodb2 XMLRPC
>>>>>>>> server.
>>>>>>>>
>>>>>>>> There are two ways to build the NLMSA.
>>>>>>>>
>>>>>>>> 1. metabase
>>>>>>>>
>>>>>>>> >>> import os
>>>>>>>> >>> os.environ['WORLDBASEPATH'] = '.,http://biodb2.bioinformatics.ucla.edu:5000'
>>>>>>>> >>> from pygr import metabase
>>>>>>>> >>> mdb = metabase.MetabaseList()
>>>>>>>> >>> hg18 = mdb('Bio.MSA.UCSC.hg18_multiz44way',download=True)
>>>>>>>>
>>>>>>>> 2. from text files
>>>>>>>>
>>>>>>>> Download the text files from
>>>>>>>> http://biodb.bioinformatics.ucla.edu/PYGRDATA/
>>>>>>>> then use the cnestedlist.textfile_to_binaries('hg18_multiz44way')
>>>>>>>> function to convert them from text to binaries.
>>>>>>>>
>>>>>>>> If you want to see the script used to add these resources, visit
>>>>>>>> this URL.
>>>>>>>>
>>>>>>>>
>>>>>>>> http://github.com/deepreds/pygr/tree/d7ab9247dcd39b7d474029cb8749a53eb8582968/tests/biodb2_update
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Paul Rigor
>>>>> Graduate Student
>>>>> Institute for Genomics and Bioinformatics
>>>>> Donald Bren School of Information and Computer Sciences
>>>>> University of California, Irvine
>>>>> http://www.paulrigor.net/
>>>>> http://www.ics.uci.edu/~prigor
>>>>>
