Re: [Denovoassembler-users] RE : MetaRay inquiry

2012-09-27 Thread Sébastien Boisvert
Hi,

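You need to get RayPlatform from its git repository too. The RayPlatform
symbolic link in the ray tree points to ../RayPlatform, so clone the two
repositories side by side: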

mkdir git-clones
cd git-clones
git clone git://github.com/sebhtml/ray.git
git clone git://github.com/sebhtml/RayPlatform.git
cd ray
make
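
Before running make, it may help to confirm that the RayPlatform link inside
the ray tree actually resolves to the sibling clone. A minimal sketch,
assuming the side-by-side layout above; check_rayplatform is a hypothetical
helper name, not something shipped with Ray:

```shell
# Hypothetical helper (not part of Ray): check that <ray-dir>/RayPlatform
# resolves to a real directory. "[ -d ]" follows symbolic links, so a
# dangling link is reported as missing.
check_rayplatform() {
    ray_dir="$1"
    if [ -d "$ray_dir/RayPlatform" ]; then
        echo "ok"
    else
        echo "missing"
        return 1
    fi
}

# Example use from inside git-clones:
#   check_rayplatform ray && (cd ray && make)
```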


Sébastien

On 27/09/12 02:44 AM, Mike Peabody wrote:
> Thanks Sébastien! It looks like I got the CreateRayInputStructures.sh to run 
> properly.
> 
> However, I seem to be having a problem getting this version to run. It looks
> like RayPlatform is actually a symbolic link to another RayPlatform?
> 
> lrwxrwxrwx  1 mpeabody mpeabody 14 Sep 26 23:28 RayPlatform -> ../RayPlatform
> 
> I tried just deleting this and replacing it with a copy of the RayPlatform
> folder from Ray-v2.0.0, but when I tried "make PREFIX=ray-build" I got a bunch
> of errors. The output is below:
> 
> 
> Compilation options (you can change them of course)
> 
> PREFIX = ray-build
> MAXKMERLENGTH = 32
> FORCE_PACKING = n
> ASSERT = n
> HAVE_LIBZ = n
> HAVE_LIBBZ2 = n
> INTEL_COMPILER = n
> MPICXX = mpicxx
> GPROF = n
> OPTIMIZE = y
> DEBUG = n
> 
> Compilation and linking flags (generated automatically)
> 
> CXXFLAGS = -Wall -std=c++98 -O3 -D MAXKMERLENGTH=32 -D 
> RAY_VERSION=\"2.1.0-devel\"
> LDFLAGS =
> 
> make[1]: Entering directory `/home/mpeabody/programs/Ray/ray/code'
> mpicxx -Wall -std=c++98 -O3 -D MAXKMERLENGTH=32  -D 
> RAY_VERSION=\"2.1.0-devel\" -I ../RayPlatform -I. -c -o 
> application_core/ray_main.o application_core/ray_main.cpp
> icpc: command line warning #10159: invalid argument for option '-std'
> mpicxx -Wall -std=c++98 -O3 -D MAXKMERLENGTH=32  -D 
> RAY_VERSION=\"2.1.0-devel\" -I ../RayPlatform -I. -c -o 
> application_core/Machine.o application_core/Machine.cpp
> icpc: command line warning #10159: invalid argument for option '-std'
> mpicxx -Wall -std=c++98 -O3 -D MAXKMERLENGTH=32  -D 
> RAY_VERSION=\"2.1.0-devel\" -I ../RayPlatform -I. -c -o 
> application_core/Parameters.o application_core/Parameters.cpp
> icpc: command line warning #10159: invalid argument for option '-std'
> application_core/Parameters.cpp(2074): warning #68: integer conversion 
> resulted in a change of sign
> uint64_t value=-1;
>^
> 
> mpicxx -Wall -std=c++98 -O3 -D MAXKMERLENGTH=32  -D 
> RAY_VERSION=\"2.1.0-devel\" -I ../RayPlatform -I. -c -o 
> application_core/common_functions.o application_core/common_functions.cpp
> icpc: command line warning #10159: invalid argument for option '-std'
> mpicxx -Wall -std=c++98 -O3 -D MAXKMERLENGTH=32  -D 
> RAY_VERSION=\"2.1.0-devel\" -I ../RayPlatform -I. -c -o plugin_Amos/Amos.o 
> plugin_Amos/Amos.cpp
> icpc: command line warning #10159: invalid argument for option '-std'
> plugin_Amos/Amos.cpp(34): error #303: explicit type is missing ("int" assumed)
>   __CreatePlugin(Amos);
>   ^
> 
> plugin_Amos/Amos.cpp(37): error: identifier "RAY_MASTER_MODE_AMOS" is 
> undefined
>   __CreateMasterModeAdapter(Amos,RAY_MASTER_MODE_AMOS); /**/
>  ^
> 
> plugin_Amos/Amos.cpp(37): error #303: explicit type is missing ("int" assumed)
>   __CreateMasterModeAdapter(Amos,RAY_MASTER_MODE_AMOS); /**/
>   ^
> 
> plugin_Amos/Amos.cpp(39): error: identifier "RAY_SLAVE_MODE_AMOS" is undefined
>   __CreateSlaveModeAdapter(Amos,RAY_SLAVE_MODE_AMOS); /**/
> ^
> 
> plugin_Amos/Amos.cpp(39): error #303: explicit type is missing ("int" assumed)
>   __CreateSlaveModeAdapter(Amos,RAY_SLAVE_MODE_AMOS); /**/
>   ^
> 
> plugin_Amos/Amos.cpp(239): error: type name is not allowed
> core->setSlaveModeObjectHandler(plugin,RAY_SLAVE_MODE_AMOS, 
> __GetAdapter(Amos,RAY_SLAVE_MODE_AMOS));
>   
>^
> 
> plugin_Amos/Amos.cpp(239): error: identifier "__GetAdapter" is undefined
> core->setSlaveModeObjectHandler(plugin,RAY_SLAVE_MODE_AMOS, 
> __GetAdapter(Amos,RAY_SLAVE_MODE_AMOS));
> ^
> 
> plugin_Amos/Amos.cpp(243): error: type name is not allowed
> core->setMasterModeObjectHandler(plugin,RAY_MASTER_MODE_AMOS, 
> __GetAdapter(Amos,RAY_MASTER_MODE_AMOS));
>   
>  ^
> 
> plugin_Amos/Amos.cpp(259): error: type name is not allowed
> __BindPlugin(Amos);
>  ^
> 
> plugin_Amos/Amos.cpp(259): error: identifier "__BindPlugin" is undefined
> __BindPlugin(Amos);
> ^
> 
> compilation aborted for plugin_Amos/Amos.cpp (code 2)
> make[1]: *** [plugin_Amos/Amos.o] Error 2
> make[1]: Leaving directory `/home/mpeabody/programs/Ray/ray/code'
> mpicxx   code/TheRayGenomeAssembler.a RayPlatform/libRayPlatform.a -o Ray
> icpc: error #10236: File not found:  'code/TheRayGenomeAssembler.a'
> make: *** [Ray] Error 1
> 
> 
> 
> Do you happen to know the reason for this?
> 
> Thanks,
> Mike
> 
> - Original Message -
> From: "Sébast

Re: [Denovoassembler-users] RE : MetaRay inquiry

2012-09-19 Thread Sébastien Boisvert
Hi Mike,

(I CC'ed this to the mailing list).

Ray can be utilized to classify k-mers in a taxonomy. To do so,
Ray needs a taxonomy. You can use anything for the taxonomy.
At our center, we are using Greengenes and NCBI.

See these documents for general documentation about the graph coloring and
taxonomic profiling features (called Ray Communities):

- Documentation/Taxonomy.txt
- Documentation/BiologicalAbundances.txt


To download the NCBI taxonomy and generate required files:

Get a copy of ray:


   git clone git://github.com/sebhtml/ray.git


Add this to your PATH (this assumes the clone is in ~/git-clones/ray):

export PATH=~/git-clones/ray/scripts/NCBI-Taxonomy/:$PATH


Then, run this:

CreateRayInputStructures.sh 


This will generate these files:

- NCBI-taxonomy/NCBI-Finished-Bacterial-Genomes
- NCBI-taxonomy/Genome-to-Taxon.tsv
- NCBI-taxonomy/TreeOfLife-Edges.tsv
- NCBI-taxonomy/Taxon-Names.tsv
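
Since the script downloads a lot of data, it can be worth checking that all
four outputs listed above are present before launching a large job. A minimal
sketch, assuming the output directory is NCBI-taxonomy in the current
directory; check_taxonomy_outputs is a hypothetical helper, not part of the
Ray scripts:

```shell
# Hypothetical helper (not part of Ray): report which of the expected
# outputs are missing. Returns 0 when everything is there, 1 otherwise.
check_taxonomy_outputs() {
    dir="${1:-NCBI-taxonomy}"
    missing=0
    for name in NCBI-Finished-Bacterial-Genomes Genome-to-Taxon.tsv \
                TreeOfLife-Edges.tsv Taxon-Names.tsv; do
        # -e accepts files, directories and resolvable symbolic links
        if [ ! -e "$dir/$name" ]; then
            echo "missing: $dir/$name"
            missing=1
        fi
    done
    return "$missing"
}

# Example use:
#   check_taxonomy_outputs NCBI-taxonomy && echo "taxonomy files look complete"
```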



Now, you can run Ray as usual (including Ray Méta plugins), but with 
additional options to run Ray Communities plugins as well:


mpiexec -n 96 \
Ray \
-k 31 -o Ray-Communities \
-p SeqA_1.fastq SeqA_2.fastq \
-p SeqB_1.fastq SeqB_2.fastq \
-search NCBI-taxonomy/NCBI-Finished-Bacterial-Genomes \
-with-taxonomy NCBI-taxonomy/Genome-to-Taxon.tsv \
NCBI-taxonomy/TreeOfLife-Edges.tsv NCBI-taxonomy/Taxon-Names.tsv 



As usual, you can also put all the arguments in a configuration file like this:

mpiexec -n 96 Ray Ray.conf

where Ray.conf contains

-k 31 -o Ray-Communities 
-p SeqA_1.fastq SeqA_2.fastq 
-p SeqB_1.fastq SeqB_2.fastq 
-search NCBI-taxonomy/NCBI-Finished-Bacterial-Genomes 
-with-taxonomy NCBI-taxonomy/Genome-to-Taxon.tsv 
NCBI-taxonomy/TreeOfLife-Edges.tsv NCBI-taxonomy/Taxon-Names.tsv 
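
One way to produce that file from a job script is a here-document. This is
only a sketch that reproduces the options shown above; write_ray_conf is a
hypothetical helper name, and the read file names are the placeholders from
the example:

```shell
# Hypothetical helper (not part of Ray): write the options from the example
# above into a configuration file (default name: Ray.conf).
write_ray_conf() {
    conf="${1:-Ray.conf}"
    cat > "$conf" <<'EOF'
-k 31 -o Ray-Communities
-p SeqA_1.fastq SeqA_2.fastq
-p SeqB_1.fastq SeqB_2.fastq
-search NCBI-taxonomy/NCBI-Finished-Bacterial-Genomes
-with-taxonomy NCBI-taxonomy/Genome-to-Taxon.tsv
NCBI-taxonomy/TreeOfLife-Edges.tsv NCBI-taxonomy/Taxon-Names.tsv
EOF
}

# Example use:
#   write_ray_conf Ray.conf
#   mpiexec -n 96 Ray Ray.conf
```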



So basically, the whole thing builds a distributed de Bruijn graph really
fast (plugins for the distributed storage engine), assembles the data de novo
by distributed graph traversals (Ray Méta; plugin SeedExtender),
colors the graph with the reference genomes provided with the -search
option (Ray Communities, plugin Searcher), and computes taxonomic profiles
using the provided taxonomy (Ray Communities, -with-taxonomy, plugin
PhylogenyViewer).


All that stuff is heavily distributed -- each Ray process has 32768 user-space
threads (workers) and you can throw as many Ray processes at it as you want.


If you are running Ray on a buggy network (we had problems with Mellanox
Infiniband MT26428, revision a0), you can turn on virtual communications too.


Cheers, 

Sébastien

On 19/09/12 08:23 PM, Mike Peabody wrote:
> Thanks Sébastien!
> 
> -Mike
> 
> - Original Message -
> From: "Sébastien Boisvert" 
> To: "Mike Peabody" 
> Sent: Wednesday, September 19, 2012 6:46:19 AM
> Subject: Re: RE : MetaRay inquiry
> 
> Hi,
> 
> I should be done today I guess.
> 
> On Monday, we had a deadline for the Genome Canada bioinformatics competition.
> 
> Basically, the script will fetch all the finished bacterial genomes
> and all the draft bacterial genomes and create a bunch of symbolic links.
> 
> Each of these fasta files will already contain a >gi|something to classify
> it in the NCBI taxonomy.
> 
> For the NCBI taxonomy, there will be 3 files:
> 
> -with-taxonomy Genome-to-Taxon.tsv TreeOfLife-Edges.tsv Taxon-Names.tsv
> 
> 
> I added the script in 
> https://github.com/sebhtml/ray/tree/master/scripts/NCBI-Taxonomy
> 
>   You can get it with "git clone git://github.com/sebhtml/ray.git"
> 
> The documentation is in Documentation/NCBI-Taxonomy.txt
> 
> It is not complete yet though. I need to add some code to format the tree and
> taxon names.
> 
> I will let you know once I have finished and tested everything.
>  
> 
> On 19/09/12 01:50 AM, Mike Peabody wrote:
>> Hi Sébastien,
>>
>> Just wanted to see how the script was going.
>>
>> Cheers,
>> Mike
>>
>> - Original Message -
>> From: "Sébastien Boisvert" 
>> To: "Mike Peabody" 
>> Sent: Thursday, September 13, 2012 6:27:28 PM
>> Subject: Re: RE : MetaRay inquiry
>>
>> I will write you a script that downloads the required files and
>> converts them.
>>
>> I should get back to you by next Tuesday.
>>
>>
>> On 12/09/12 09:23 AM, Mike Peabody wrote:
>>> Hi Sébastien,
>>>
>>> Maybe you can upload the files to filedropper or another similar website?
>>> http://www.filedropper.com/
>>>
>>> Thanks!
>>> Mike
>>>
>>> - Original Message -
>>> From: "Sébastien Boisvert" 
>>> To: "Mike Peabody" 
>>> Sent: Wednesday, September 12, 2012 4:51:46 AM
>>> Subject: Re: RE : MetaRay inquiry
>>>
>>> Hi Mike,
>>>
>>> The 3 required files for taxonomy profiling (in addition to the reference
>>> genomes) are:
>>>
>>> -with-taxonomy \
>>> Genome-to-Taxon.tsv \
>>> TreeOfLife-Edges.tsv \
>>> Taxons.tsv
>>>
>>>
>>> There is documentation at Documentation/Taxonomy.txt, but it seems that
>>> since I wrote the initial version, NCBI has changed (once again!) the
>>> file formats on their FTP.
>>>
>>>
>>> The file ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdmp.zip use