Re: [galaxy-user] Fwd: Galaxy: RNA-seq analysis problems
Hi Roberta,

Here is a link to the documentation for replicate handling for the 'NGS: RNA Analysis' tool Cuffdiff:
http://cufflinks.cbcb.umd.edu/howitworks.html#reps

Other related areas of the documentation are:
http://cufflinks.cbcb.umd.edu/faq.html#cuffdiff
http://cufflinks.cbcb.umd.edu/howitworks.html#hdif

Also see (under 'RNA-seq analysis tools'):
http://wiki.g2.bx.psu.edu/Support#Interpreting_scientific_results

Good luck with your project!

Jen
Galaxy team

On 9/11/12 7:45 AM, James Taylor wrote:
Roberta, I'm traveling right now so I'm forwarding your message to our help list. Thanks.

-- Forwarded message --
From: Roberta Galletti roberta.galle...@ens-lyon.fr
Date: Tue, Sep 11, 2012 at 5:19 AM
Subject: Re: Galaxy: RNA-seq analysis problems
To: James Taylor ja...@jamestaylor.org

Hello James, sorry to bother you again, but I have one more question for you. I know that most existing methodologies for analyzing RNA-seq data depend strongly on sequencing depth for their differential expression calls, and that the results might therefore contain a considerable number of false positives. Unfortunately, 1 of the 3 biological replicates for one set of my samples has a much greater sequencing depth than the other two. Do the programs in the Galaxy NGS: RNA Analysis section take this problem into account and normalize for it? Thank you in advance for your help, Roberta Galletti.

On 6/11/2012 5:36 PM, James Taylor wrote:
Glad to hear it! Thanks!

On Jun 8, 2012, at 9:37 AM, Roberta Galletti wrote:
James, I managed to make it work. Thank you for your help. Roberta.

--
Jennifer Jackson
http://galaxyproject.org
___
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using reply all in your mail client.
For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list: http://lists.bx.psu.edu/listinfo/galaxy-dev To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
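
To make the sequencing-depth question in the thread above concrete, here is a minimal sketch of one generic way to compute per-library scaling factors (median of ratios to a per-gene geometric mean). This is not Cuffdiff's code; whether and how Cuffdiff applies a comparable scaling is covered by the linked Cufflinks documentation. The function name, the input layout, and the toy counts are assumptions made only for illustration.

```python
import math
from statistics import median


def size_factors(counts):
    """Return one scaling factor per sample (hypothetical helper).

    counts: dict mapping sample name -> list of raw per-gene counts,
    with genes in the same order for every sample.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Per-gene log geometric mean, using only genes seen in every sample.
    usable = []
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            usable.append(g)
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
    if not usable:
        raise ValueError("no gene has a nonzero count in every sample")

    # A sample's factor is the median ratio of its counts to the geometric means.
    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - lgm for g, lgm in zip(usable, log_geo_means)
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors


# Toy data: rep3 was sequenced about four times deeper than rep1 and rep2.
toy = {"rep1": [10, 20, 30], "rep2": [10, 20, 30], "rep3": [40, 80, 120]}
factors = size_factors(toy)
normalized = {s: [c / factors[s] for c in toy[s]] for s in toy}
```

Dividing each library by its factor puts the deeper replicate on the same scale as the other two, which is the kind of adjustment the question asks about.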
Re: [galaxy-user] How can I extract sequence information from cuffdiff files?
Hello,

By no annotation, do you mean species-specific annotation (GTF) was not used? And you want to compare to a protein database like Genbank NR or RefSeq? Then these are the instructions. Please let us know if you had something else in mind.

The sequence extraction can be done on Galaxy Main (if that is where you are working), but the BLAST will need to be run on a local or cloud install. To get set up (instance and data), start here:
http://getgalaxy.org
http://usegalaxy.org/cloud

The BLAST+ wrapper recently moved from the distribution to the Tool Shed, but there are installation tools integrated to help get this into your instance. See the latest News Brief for details (Sept 7, 2012). These are also good to follow as you maintain your instance:
http://wiki.g2.bx.psu.edu/News
http://wiki.g2.bx.psu.edu/DevNewsBriefs/2012_09_07

Questions about local/cloud installs are best directed to the galaxy-...@bx.psu.edu mailing list:
http://wiki.g2.bx.psu.edu/Mailing%20Lists

To extract the transcript sequences, use the tool 'Fetch Sequences - Extract Genomic DNA'. This will accept a custom reference genome from the history, if you have been using one, by changing the option 'Source for Genomic Data:' to 'History'.

Hopefully this helps,

Jen
Galaxy team

On 9/13/12 10:09 AM, Humberto Boncristiani wrote:
Hi. I got Cuffdiff files with gene differential expression in them. I don't have the annotation, so I need to extract the sequence information from the genome coordinates and then BLAST the sequences to identify them. What is the easiest way to do it? Thanks. Humberto

Dr. Humberto Boncristiani
National Research Council (NRC) Fellow
Adjunct Research Associate
Department of Biology
Univ. North Carolina at Greensboro
312 Eberhart Bldg
Greensboro, NC 27403, USA.
Tel.: (1) 336-256-2591
Fax: (1) 336-334-5839
email: hum...@gmail.com

--
Jennifer Jackson
http://galaxyproject.org
Re: [galaxy-user] How can I extract sequence information from cuffdiff files?
Hi Humberto,

Yes, my apologies, this should have been included in the original reply. The 'locus' field in the Cuffdiff files refers to a gene bound, not to individual transcripts. To get to the transcripts, the inputs to Cuffdiff need to be accessed. If you used Cuffmerge, the merged transcripts GTF file would be the correct file to use as input to Extract. If you used just Cuffcompare, use the combined transcripts GTF.

To know which transcript was associated with which gene bound, compare the Cuffmerge merged transcripts GTF attributes (9th column: gene_id, tss_id, etc.) with Cuffdiff's gene_id and tss_id values. Depending on the file, these values also appear in the test_id column. The Cuffcompare GTF comparisons will be similar.

You can gain access to the GTF attributes with the tool 'Filter and Sort - Filter GTF data by attribute values_list'. Cut out the column of interest in the Cuffdiff file (Text Manipulation - Cut), edit as desired, and use it as a list filter. Or explore the other GFF filter options in the same tool group.

Take care,

Jen
Galaxy team

On 9/13/12 11:14 AM, Humberto Boncristiani wrote:
Hi. 'Fetch Sequences - Extract Genomic DNA' does not accept Cuffdiff files. Should I convert this file to some specific format? Thanks, Humberto.

--
Jennifer Jackson
http://galaxyproject.org
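
The list-filter step described in the reply above can be sketched outside Galaxy as follows. This is only an illustration, not the Galaxy tools themselves: the function names are hypothetical, and the column names (`gene_id`, `significant` with the value `yes`) are assumptions about the Cuffdiff tabular output that should be checked against your own files.

```python
import re

# Matches GTF attribute pairs such as: gene_id "XLOC_000001";
_ATTR = re.compile(r'(\S+) "([^"]*)"')


def cuffdiff_ids(lines, column="gene_id", significant_only=True):
    """Collect the values of one column from a tab-separated Cuffdiff file."""
    header = None
    ids = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if header is None:
            header = fields  # the first line is taken to be the column header
            continue
        row = dict(zip(header, fields))
        if significant_only and row.get("significant") != "yes":
            continue
        ids.add(row[column])
    return ids


def filter_gtf(lines, wanted, attribute="gene_id"):
    """Yield GTF lines whose 9th-column attribute value is in `wanted`."""
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = dict(_ATTR.findall(fields[8]))
        if attrs.get(attribute) in wanted:
            yield line


# Usage sketch with placeholder file names:
# with open("gene_exp.diff") as diff, open("merged.gtf") as gtf:
#     wanted = cuffdiff_ids(diff)
#     kept = list(filter_gtf(gtf, wanted))
```

The filtered GTF lines are transcript-level records, which is the kind of input the extraction tool expects, rather than the gene-level 'locus' strings in the Cuffdiff output.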
[galaxy-user] Galaxy CloudMan - Nodes can't make their own qsub calls?
Hi guys,

I created a new Galaxy instance with the web launcher (https://biocloudcentral.herokuapp.com/launch) and then I ssh'd into the master node. I'm trying to run a Perl script that makes several qsub calls to other Perl scripts. The catch is that one of those Perl scripts makes its own qsub calls, and I'm getting this error when it tries to do that:

Unable to run job: denied: host ip-10-29-176-111.ec2.internal is no submit host.

This works fine on other clusters I've run this code on. Any idea what could be going on? Do I need to make all of the nodes submit hosts?

Thanks a bunch!

-Greg
[galaxy-user] Does Tophat output *.accepted hits file contain headers?
Dear All,

I want to use the Tophat output files with .accepted hits to do analysis outside Galaxy. However, the program I am using requires the Tophat output to be indexed, sorted BAM files that contain headers. Do the Tophat outputs with .accepted hits produced in Galaxy contain headers? Will the headers of BAM files generated by Tophat be universally the same?

Thanks,
Jianguang
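
One way to answer the header question for a particular downloaded file is to look at the file directly. The sketch below relies only on the BAM layout (BGZF-compressed data that starts with the magic bytes `BAM\1`, a 32-bit header length, then the plain-text header), so it needs nothing beyond the Python standard library. The function names are hypothetical, and the sketch says nothing about what Galaxy's Tophat wrapper actually writes. The index is a separate file and is not inspected here.

```python
import gzip
import struct


def read_bam_header_text(path):
    """Return the plain-text SAM header stored at the start of a BAM file."""
    # BGZF is a series of gzip members, which the gzip module reads as one stream.
    with gzip.open(path, "rb") as handle:
        if handle.read(4) != b"BAM\x01":
            raise ValueError("file does not start with the BAM magic bytes")
        (l_text,) = struct.unpack("<i", handle.read(4))
        text = handle.read(l_text)
    return text.rstrip(b"\x00").decode("ascii", errors="replace")


def header_summary(text):
    """Report which header record types are present and the declared sort order."""
    lines = [line for line in text.splitlines() if line.startswith("@")]
    sort_order = None
    for line in lines:
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    sort_order = field[3:]
    return {
        "record_types": sorted({line[:3] for line in lines}),
        "sort_order": sort_order,
    }


# Usage sketch with a placeholder file name:
# print(header_summary(read_bam_header_text("accepted_hits.bam")))
```

An empty `record_types` list would mean the file carries no text header, and `sort_order` shows whether the header declares coordinate sorting.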
Re: [galaxy-user] Galaxy CloudMan - Nodes can't make their own qsub calls?
You probably need to sudo to sgeadmin (CloudMan guys, correct me if you have this set up differently). I don't see any reason not to make worker nodes submit hosts by default in a future CloudMan release.

-- jt

On Thu, Sep 13, 2012 at 4:17 PM, greg margeem...@gmail.com wrote:
As a follow-up, I found a command that should add the new nodes as submit hosts, and I tried to run it, but I got this error:

$ qconf -as ip-10-28-164-178.ec2.internal
denied: ubuntu must be manager for this operation

What does it mean by manager? How would I run this command? My preference is for CloudMan to do this automatically, though, since I'll be distributing this program to 3rd party users using the built-in CloudMan sharing. I can't rightly ask users to be running qconf.

Thanks again,
Greg
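
As a sketch of the suggestion in this thread, the snippet below builds the `qconf -as` command from Greg's message and runs it as the `sgeadmin` account named in the reply. That `sgeadmin` is a Grid Engine manager on a CloudMan cluster, and that `sudo -u` preserves the environment qconf needs, are assumptions taken from the reply rather than verified facts. The helper names are hypothetical, and the default is a dry run that only returns the commands.

```python
import subprocess


def add_submit_host_command(hostname, admin_user="sgeadmin"):
    """Build the command that registers one host as a submit host."""
    return ["sudo", "-u", admin_user, "qconf", "-as", hostname]


def add_submit_hosts(hostnames, admin_user="sgeadmin", dry_run=True):
    """Return the commands, and run them too when dry_run is False."""
    commands = [add_submit_host_command(h, admin_user) for h in hostnames]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if qconf refuses the request.
            subprocess.run(command, check=True)
    return commands


for command in add_submit_hosts(["ip-10-28-164-178.ec2.internal"]):
    print(" ".join(command))
```

Keeping the dry run as the default makes it possible to review the exact commands before anything is changed on the cluster.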
Re: [galaxy-user] Counting RNA-seq reads per class.
Hello Mo,

This may be a coordinate problem with 0-based vs 1-based start files. Using tools from 'Operate on Genomic Intervals' might be an alternative, since they work with the coordinates appropriately. File formats can be converted as needed: BAM -> SAM -> Interval.

Alternatively, and this may sound simple, would the tool 'Join, Subtract and Group - Group' do the summary with enough specificity? These files (e.g. transcript/gene expression) have both a 'class_code' and a 'coverage' column. Coverage isn't exactly the same number, but it does quantify the read data Cufflinks actually used to create the assembled transcripts assigned to the various class_codes, if that is what you are looking for.

Please let us know if your question has been misunderstood. Others are also welcome to add more comments!

Best,

Jen
Galaxy team

On 9/10/12 8:52 AM, Mohammad Heydarian wrote:
Hi All,

I have been trying to count the number of RNA-seq reads that fall into the various Cufflinks class codes ('=', 'j', 'u', 'x', etc.) and I am curious how others are counting reads per class.

I first tried the BedTools tool that counts the number of reads overlapping another set of intervals, and later realized that each interval is extended 1 kb upstream and downstream prior to the analysis (by default, and not adjustable on Galaxy), so the number of reads counted across all of the classes was always much larger than the number of reads in my BAM file. I then tried to isolate the reads from each class into separate BAM files using the BedTools intersect tool, and there I consistently end up with significantly fewer reads than I have in my sample.

I am very curious to find out how others are tackling this problem on Galaxy. Thanks for any input!

Cheers,
Mo Heydarian

--
Jennifer Jackson
http://galaxyproject.org
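
The 0-based vs 1-based point raised in the reply can be illustrated with a small sketch. It assumes the usual conventions that BED-style intervals are 0-based and half-open while GTF coordinates are 1-based and closed. The function names are hypothetical, and the example is not a claim about what caused the count mismatch described in the question.

```python
def one_based_to_bed(start, end):
    """Convert a 1-based closed interval (GTF style) to 0-based half-open (BED style)."""
    return start - 1, end


def bed_to_one_based(start, end):
    """Convert a 0-based half-open interval (BED style) to 1-based closed (GTF style)."""
    return start + 1, end


def overlaps_half_open(a_start, a_end, b_start, b_end):
    """True if two 0-based half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


# A read covering only base 100 (1-based), stored BED style as (99, 100),
# and a GTF feature spanning bases 100-200.
read = (99, 100)
feature_gtf = (100, 200)

correct = overlaps_half_open(*read, *one_based_to_bed(*feature_gtf))
naive = overlaps_half_open(*read, *feature_gtf)  # GTF numbers used as if they were BED
```

With the conversion the read is counted as overlapping the feature; without it the read at the feature boundary is missed. Such edge cases can add up when many reads start or end at feature boundaries.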
[galaxy-user] No output produced.....
Hi,

I have my own image registration tool that I've created on my own local instance of Galaxy. The method takes in two images in *.nii.gz format and registers them together, and produces one registered *.nii.gz file and a *.trsf matrix file. The first issue encountered was that the method was expecting *.nii.gz files as inputs but was receiving *.dat files. I navigated around this problem as shown by the files below:

    <tool id="RegisterAliBabaAffine" name="RegisterAffine">
      <description>two images</description>
      <command interpreter="bash">$__root_dir__/tools/registration/reg-wrapper.sh $moving $fixed $outputTRSF $outputImage</command>
      <inputs>
        <param format="binary" name="moving" type="data" label="Moving Image" />
        <param format="binary" name="fixed" type="data" label="Fixed Image" />
        <param type="hidden" name="outputTRSF" value="output.trsf" label="trsf file" help="Output File must have .trsf extension" />
        <param type="hidden" name="outputImage" value="output.nii.gz" label="Image output file" help="Output Image File must have .nii.gz extension" />
      </inputs>
      <outputs>
        <data format="input" name="output_TRSF" from_work_dir="output.trsf" />
        <data format="input" name="output_Image" from_work_dir="output.nii.gz" />
      </outputs>
      <help>This tool uses Affine Registration to register two images.</help>
    </tool>

    #!/bin/bash
    MOVING=`mktemp --suffix .nii.gz`
    FIXED=`mktemp --suffix .nii.gz`
    cat $1 > $MOVING
    cat $2 > $FIXED
    /usr/local/MILXView.12.08.1/BashScripts/RegisterAliBabaAffine -m $MOVING -f $FIXED -t $3 -o $4
    RC=$?
    if [[ $RC == 0 ]]; then
        OUTPUTTRSF=`mktemp --suffix .trsf`
        OUTPUTIMG=`mktemp --suffix .nii.gz`
        cat $OUTPUTTRSF > $3
        cat $OUTPUTIMG > $4
        rm $OUTPUTTRSF
        rm $OUTPUTIMG
    fi
    rm $MOVING
    rm $FIXED
    exit $RC

This allows them to pass the *.nii.gz files that the registration method is expecting. Everything works fine and I can see output generated in the job_working_dir and the history turns green...

    galaxy@bmladmin-OptiPlex-745:~$ ls -lrt ~/galaxy-dist/database/job_working_directory/000/27/
    total 2940
    -rw--- 1 galaxy nogroup 0 Sep 13 10:15 tmpRfHsOP_stderr
    -rw-r--r-- 1 galaxy nogroup 241 Sep 13 10:35 output.trsf
    -rw--- 1 galaxy nogroup 80 Sep 13 10:35 tmplmK0V2_stdout
    -rw-r--r-- 1 galaxy nogroup 2998272 Sep 13 10:38 output.nii.gz

However, the problem occurs when the files are copied from ~/galaxy-dist/database/job_working_directory/000/27/ to ~/galaxy-dist/database/files/000/. When this happens the files become size = 0. Any ideas?

    -rw-r--r-- 1 galaxy nogroup 0 Sep 13 09:36 /home/galaxy/galaxy-dist/database/files/000/dataset_40.dat
    -rw-r--r-- 1 galaxy nogroup 0 Sep 13 09:36 /home/galaxy/galaxy-dist/database/files/000/dataset_41.dat
    -rw-r--r-- 1 galaxy nogroup 0 Sep 13 10:38 /home/galaxy/galaxy-dist/database/files/000/dataset_43.dat
    -rw-r--r-- 1 galaxy nogroup 0 Sep 13 10:38 /home/galaxy/galaxy-dist/database/files/000/dataset_42.dat

The output in galaxy.log indicates it is successful:

    /home/galaxy/galaxy-dist/tools/registration/reg-wrapper.sh /home/galaxy/galaxy-dist/database/files/000/dataset_23.dat /home/galaxy/galaxy-dist/database/files/000/dataset_20.dat output.trsf output.nii.gz
    galaxy.jobs DEBUG 2012-09-13 10:38:10,334 The tool did not define exit code or stdio handling; checking stderr for success
    galaxy.jobs DEBUG 2012-09-13 10:38:10,361 finish(): Moved /home/galaxy/galaxy-dist/database/job_working_directory/000/27/output.trsf to /home/galaxy/galaxy-dist/database/files/000/dataset_42.dat as directed by from_work_dir
    galaxy.jobs DEBUG 2012-09-13 10:38:10,380 finish(): Moved /home/galaxy/galaxy-dist/database/job_working_directory/000/27/output.nii.gz to /home/galaxy/galaxy-dist/database/files/000/dataset_43.dat as directed by from_work_dir
    galaxy.jobs DEBUG 2012-09-13 10:38:10,609 job 27 ended

Is the issue copying the *.nii.gz and *.trsf files into *.dat files? Any way around this?

I've also modified ~/galaxy-dist/lib/galaxy/jobs/__init__.py (line 363) to change shutil.move to shutil.copy2 (same results). I also put in a different output path to copy to. But essentially we have files with nonzero size in ~/galaxy-dist/database/job_working_directory/000/id/, but the files are size 0 after the move into ~/galaxy-dist/database/files/000.

Thanks,
Neil
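
The workaround described in this message (giving a *.dat dataset the extension the registration program expects) can also be sketched in Python, together with a size check before a result is copied to its destination. This is a minimal illustration with hypothetical function names. It is not Galaxy's job-finishing code and not a diagnosis of why the datasets end up empty.

```python
import os
import shutil
import tempfile


def stage_with_suffix(dataset_path, suffix):
    """Copy a dataset to a temporary file that carries the expected extension."""
    fd, staged_path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # only the path is needed; copyfile reopens the file itself
    shutil.copyfile(dataset_path, staged_path)
    return staged_path


def collect_output(produced_path, destination_path):
    """Copy a produced file to its destination, refusing to copy an empty result."""
    size = os.path.getsize(produced_path)
    if size == 0:
        raise RuntimeError("%s is empty; not copying" % produced_path)
    shutil.copyfile(produced_path, destination_path)
    return size


# Usage sketch with placeholder paths:
# moving = stage_with_suffix("dataset_23.dat", ".nii.gz")
# ... run the registration program on `moving` ...
# collect_output("output.nii.gz", "dataset_43.dat")
```

Checking the size at the point of the copy helps narrow down which step leaves a zero-byte file behind.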