[pygr] Re: multiz44way?

Namshin Kim Thu, 03 Sep 2009 05:07:27 -0700

Hi Paul,

I am now building the hg18_multiz44way NLMSA without any problems. If you
still have those problems, please send me the error messages. You may need
to start over after deleting .pygr_data in your writable WORLDBASEPATH.
If your WORLDBASEBUILDDIR is not the final repository, you can move all the
NLMSA files into your destination directory and then update .seqDictP like this:


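You can open each genome either with seqdb.SequenceFileDB (use an absolute
path) or from worldbase, then save the genome union as the NLMSA's seqDict:

from pygr import seqdb, cnestedlist, worldbase

hg18 = seqdb.SequenceFileDB('/absolute/path/to/hg18')  # either this...
hg18 = worldbase.Bio.Seq.Genome.HUMAN.hg18()           # ...or this

genomeDict = {'hg18': hg18, ...}  # supply all 44 genomes
genomeUnion = seqdb.PrefixUnionDict(genomeDict)
msa = cnestedlist.NLMSA('hg18_multiz44way', 'r', seqDict=genomeUnion)
msa.save_seq_dict()  # updates .seqDictP to point at genomeUnion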

This updates .seqDictP, and after that you can access the alignment without
any problems:

chr1_slice = msa.seqDict['hg18.chr1'][1000:2000]
edges = msa[chr1_slice].edges()
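
For the step of moving the NLMSA files out of a WORLDBASEBUILDDIR that is not
the final repository, here is a minimal Python sketch. It assumes every NLMSA
file name begins with the NLMSA path stem (hg18_multiz44way here);
move_nlmsa_files is a hypothetical helper, not part of pygr, so list the build
directory and check what matches before moving anything.

```python
import os
import shutil


def move_nlmsa_files(build_dir, dest_dir, pathstem='hg18_multiz44way'):
    """Move files whose names start with pathstem from build_dir to dest_dir.

    Hypothetical helper (not a pygr API). ASSUMPTION: all NLMSA files for
    one alignment share the pathstem prefix. Returns the moved file names.
    """
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    moved = []
    for name in sorted(os.listdir(build_dir)):
        src = os.path.join(build_dir, name)
        # only plain files carrying the NLMSA prefix are moved
        if name.startswith(pathstem) and os.path.isfile(src):
            shutil.move(src, os.path.join(dest_dir, name))
            moved.append(name)
    return moved
```

After the move, reopen the NLMSA from the destination directory and call
save_seq_dict() so that .seqDictP matches the new location.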

--
Namshin Kim




On Thu, Sep 3, 2009 at 7:20 AM, Namshin Kim <deepr...@gmail.com> wrote:

> Strange... The correct URL is
> http://biodb.bioinformatics.ucla.edu/GENOMES/ponAbe2/ponAbe2.gz
> The URL you used does not exist, so the server returns a 404 error page
> (an HTML document) instead of the archive.
> Hmm... I have never downloaded and built hg18_multiz44way via XMLRPC. I
> will try that...
>
> Thanks,
> Namshin Kim
>
>
>
> On Thu, Sep 3, 2009 at 6:54 AM, Paul Rigor (gmail) <paulri...@gmail.com> wrote:
>
>> Hi Namshin,
>> Downloading the 44way alignment was successful.  However, the persistent
>> data (.pygrdata) seems to be unusable. The metabase lists Bio.MSA, etc.,
>> but the resources cannot be loaded.
>>
>> Also, I tried to download the genomes from the UCLA metabase, but one
>> genome seems to be corrupt. Specifically,
>>
>> http://biodb.bioinformatics.ucla.edu/GENOMES/ponAbe2/chromFa.tar.gz
>>
>> gives the error message below. In fact, the downloaded file
>> (ponAbe2.tar.gz) is an HTML document!
>>
>> $ file ponAbe2.tar.gz
>> ponAbe2.tar.gz: HTML document text
>>
>>
>> ....[error trace below]
>> ....
>> /home/dock/shared_libraries/lx64/pkgs/pythonsandbox/2.6.2/lib/python2.6/site-packages/pygr-0.8.0.beta1-py2.6-linux-x86_64.egg/pygr/downloader.pyc
>> in do_untar(filepath, mode, newpath, singleFile, **kwargs)
>>     105         newpath = filepath + '.out'
>>     106     import tarfile
>> --> 107     t = tarfile.open(filepath, mode)
>>     108     try:
>>     109         if singleFile: # extract to a single file
>>
>> /home/dock/shared_libraries/lx64/pkgs/pythonsandbox/2.6.2/lib/python2.6/tarfile.pyc
>> in open(cls, name, mode, fileobj, bufsize, **kwargs)
>>    1662             else:
>>    1663                 raise CompressionError("unknown compression type %r" % comptype)
>> -> 1664             return func(name, filemode, fileobj, **kwargs)
>>    1665
>>    1666         elif "|" in mode:
>>
>> /home/dock/shared_libraries/lx64/pkgs/pythonsandbox/2.6.2/lib/python2.6/tarfile.pyc
>> in gzopen(cls, name, mode, fileobj, compresslevel, **kwargs)
>>    1713                 **kwargs)
>>    1714         except IOError:
>> -> 1715             raise ReadError("not a gzip file")
>>    1716         t._extfileobj = False
>>    1717         return t
>>
>> ReadError: not a gzip file
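
The ReadError above is what tarfile raises when the file is not gzip data at
all; here a 404 HTML page had been saved under the archive name. A small
Python check of the gzip magic bytes (0x1f 0x8b, from the gzip format
specification, RFC 1952) catches this before extraction. looks_like_gzip and
safe_untar are hypothetical helpers for illustration, not part of pygr's
downloader.

```python
import tarfile

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b'\x1f\x8b'


def looks_like_gzip(path):
    """Return True if the file at path begins with the gzip magic bytes."""
    with open(path, 'rb') as fh:
        return fh.read(2) == GZIP_MAGIC


def safe_untar(path, dest_dir):
    """Extract a .tar.gz archive, but fail early with a readable message
    when the download is not gzip data (e.g. an HTML error page)."""
    if not looks_like_gzip(path):
        with open(path, 'rb') as fh:
            head = fh.read(60)
        raise ValueError('%s is not gzip data; it starts with %r' % (path, head))
    with tarfile.open(path, 'r:gz') as tar:
        tar.extractall(dest_dir)
```

The same looks_like_gzip check applies to a plain .gz file such as
ponAbe2.gz; only the tarfile step is specific to .tar.gz archives.
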
>>
>>
>>
>>
>> On Tue, Sep 1, 2009 at 9:55 PM, Paul Rigor (gmail) <paulri...@gmail.com> wrote:
>>
>>> Well, we have time, storage and bandwidth =)
>>> I'll let you know how it goes. Maybe we can host an XMLRPC mirror
>>> someday too.
>>>
>>> Thanks,
>>> Paul
>>>
>>>
>>> On Tue, Sep 1, 2009 at 9:41 PM, Namshin Kim <deepr...@gmail.com> wrote:
>>>
>>>> Hi Paul,
>>>> I just checked the size of hg18_multiz44way: the NLMSA alone is 167 GB.
>>>> Including the genome assemblies you may not have yet, it would be
>>>> ~250 GB. I think downloading all the files would take a long time.
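
To put the size estimate in concrete terms, here is the arithmetic as a small
Python function. It uses decimal units (1 GB = 10**9 bytes, 1 MB = 10**6
bytes) and ignores protocol overhead; the transfer rates below are
hypothetical, so substitute your own sustained download rate.

```python
def download_hours(size_gb, rate_mb_per_s):
    """Hours to transfer size_gb gigabytes at a sustained rate of
    rate_mb_per_s megabytes per second (decimal units, no overhead)."""
    seconds = size_gb * 1e9 / (rate_mb_per_s * 1e6)
    return seconds / 3600.0
```

At a hypothetical 10 MB/s, the full ~250 GB takes about 6.9 hours
(25,000 seconds); at 1 MB/s, about 69 hours.
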
>>>>
>>>> --
>>>> Namshin Kim
>>>>
>>>>
>>>>
>>>> On Wed, Sep 2, 2009 at 1:33 PM, Paul Rigor (gmail) <paulri...@gmail.com> wrote:
>>>>
>>>>>
>>>>> Hi Namshin,
>>>>> I'm running this overnight =)  Has anyone successfully pulled and used
>>>>> this alignment?
>>>>>
>>>>> Thanks,
>>>>> Paul
>>>>>
>>>>> On Sun, Aug 2, 2009 at 4:40 PM, Namshin Kim <deepr...@gmail.com> wrote:
>>>>>
>>>>>> The downloadable resources are now available on the biodb2 XMLRPC server.
>>>>>>
>>>>>> There are two ways to build the NLMSA.
>>>>>>
>>>>>> 1. From the metabase
>>>>>>
>>>>>> >>> import os
>>>>>> >>> os.environ['WORLDBASEPATH'] = '.,http://biodb2.bioinformatics.ucla.edu:5000'
>>>>>> >>> from pygr import metabase
>>>>>> >>> mdb = metabase.MetabaseList()
>>>>>> >>> msa = mdb('Bio.MSA.UCSC.hg18_multiz44way', download=True)
>>>>>>
>>>>>> 2. From text files
>>>>>>
>>>>>> Download the text files from
>>>>>> http://biodb.bioinformatics.ucla.edu/PYGRDATA/
>>>>>> and convert them from text to binaries with the
>>>>>> cnestedlist.textfile_to_binaries('hg18_multiz44way') function.
>>>>>>
>>>>>> If you want to see the script used to add these resources, visit this
>>>>>> URL.
>>>>>>
>>>>>>
>>>>>> http://github.com/deepreds/pygr/tree/d7ab9247dcd39b7d474029cb8749a53eb8582968/tests/biodb2_update
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Paul Rigor
>>> Graduate Student
>>> Institute for Genomics and Bioinformatics
>>> Donald Bren School of Information and Computer Sciences
>>> University of California, Irvine
>>> http://www.paulrigor.net/
>>> http://www.ics.uci.edu/~prigor
>>>
>>
>>
>>
>> --
>> Paul Rigor
>> Graduate Student
>> Institute for Genomics and Bioinformatics
>> Donald Bren School of Information and Computer Sciences
>> University of California, Irvine
>> http://www.paulrigor.net/
>> http://www.ics.uci.edu/~prigor
>>
>> >>
>>
>

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"pygr-dev" group.
To post to this group, send email to pygr-dev@googlegroups.com
To unsubscribe from this group, send email to 
pygr-dev+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/pygr-dev?hl=en
-~----------~----~----~----~------~----~------~--~---
