Re: [galaxy-user] Fwd: Galaxy: RNA-seq analysis problems

2012-09-13 Thread Jennifer Jackson

Hi Roberta,

Here is a link to the documentation for replicate handling for the 'NGS: 
RNA Analysis' tool Cuffdiff:

http://cufflinks.cbcb.umd.edu/howitworks.html#reps

Other related areas of the documentation are:
http://cufflinks.cbcb.umd.edu/faq.html#cuffdiff
http://cufflinks.cbcb.umd.edu/howitworks.html#hdif

Also see (under 'RNA-seq analysis tools'):
http://wiki.g2.bx.psu.edu/Support#Interpreting_scientific_results
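
As a rough illustration of what depth normalization means in practice, here is a minimal Python sketch of median-of-ratios library-size scaling. It is one common approach and is not taken from the Cufflinks code; the documentation linked above is the reference for what Cuffdiff actually does. The function names and the example counts are made up for illustration.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one scaling factor per sample (median-of-ratios).

    counts maps a sample name to a list of raw read counts per gene,
    with genes in the same order for every sample.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log geometric mean of each gene across samples; genes with a zero
    # count in any sample are skipped (marked None).
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - lgm
            for g, lgm in enumerate(log_geo_means)
            if lgm is not None
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors


def normalize(counts):
    """Divide every raw count by its sample's size factor."""
    factors = size_factors(counts)
    return {s: [c / factors[s] for c in counts[s]] for s in counts}


# Hypothetical counts: rep3 was sequenced three times deeper than rep1/rep2.
raw = {"rep1": [10, 20, 30], "rep2": [10, 20, 30], "rep3": [30, 60, 90]}
normalized = normalize(raw)
```

In this toy case the deeper replicate gets a size factor three times larger than the other two, so the normalized counts agree across all three replicates.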

Good luck with your project!

Jen
Galaxy team

On 9/11/12 7:45 AM, James Taylor wrote:

Roberta, I'm traveling right now so I'm forwarding your message to our
help list. Thanks.

-- Forwarded message --
From: Roberta Galletti roberta.galle...@ens-lyon.fr
Date: Tue, Sep 11, 2012 at 5:19 AM
Subject: Re: Galaxy: RNA-seq analysis problems
To: James Taylor ja...@jamestaylor.org


Hello James,
sorry to bother you again, but I have one more question for you. I know
that most existing methods for analyzing RNA-seq data depend strongly
on sequencing depth for their differential expression calls, and that
the results may therefore include a considerable number of false
positives. Unfortunately, 1 of the 3 biological replicates for one set
of my samples has a much greater sequencing depth than the other two.
Do the programs in the Galaxy 'NGS: RNA Analysis' section take this
problem into account and normalize for it?
Thank you in advance for your help,
Roberta Galletti.


On 6/11/2012 5:36 PM, James Taylor wrote:

Glad to hear it! Thanks!

On Jun 8, 2012, at 9:37 AM, Roberta Galletti wrote:

James,
I managed to make it work. Thank you for your help.
Roberta.





--
Jennifer Jackson
http://galaxyproject.org
___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using reply all in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

 http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

 http://lists.bx.psu.edu/


Re: [galaxy-user] How can I extract sequence information from cuffdiff files?

2012-09-13 Thread Jennifer Jackson

Hello,

By no annotation, do you mean that no species-specific annotation (GTF)
was used, and that you want to compare against a protein database such
as GenBank NR or RefSeq? If so, the instructions below apply. Please let
us know if you had something else in mind.


The sequence extraction can be done on Galaxy Main (if that is where you 
are working), but the BLAST will need to be run on a local or cloud 
install. To get set up (instance and data), start here:

http://getgalaxy.org
http://usegalaxy.org/cloud

The BLAST+ wrapper recently moved from the distribution to the Tool 
Shed, but there are installation tools integrated to help get this into 
your instance. See the latest News Brief for details (Sept 7, 2012) - 
these are also good to follow as you maintain your instance:

http://wiki.g2.bx.psu.edu/News
http://wiki.g2.bx.psu.edu/DevNewsBriefs/2012_09_07

Questions about local/cloud installs are best directed to the 
galaxy-...@bx.psu.edu mailing list:

http://wiki.g2.bx.psu.edu/Mailing%20Lists

To extract the transcript sequences, use the tool 'Fetch Sequences -
Extract Genomic DNA'. If you have been using a custom reference genome
from the history, the tool will accept it when the option 'Source for
Genomic Data' is set to 'History'.


Hopefully this helps,

Jen
Galaxy team

On 9/13/12 10:09 AM, Humberto Boncristiani wrote:

Hi.

I have Cuffdiff files with differential gene expression results. I don't
have the annotation, so I need to extract the sequence information from
the genome coordinates and then BLAST the sequences to identify them.
What is the easiest way to do this?

Thanks.

Humberto



*Dr. Humberto Boncristiani*
National Research Council (NRC) Fellow
Adjunct Research Associate
Department of Biology
Univ. North Carolina at Greensboro
312 Eberhart Bldg
Greensboro, NC 27403, USA.
Tel.:(1) 336-256-2591
Fax: (1) 336-334-5839
email: hum...@gmail.com mailto:hum...@gmail.com









--
Jennifer Jackson
http://galaxyproject.org


Re: [galaxy-user] How can I extract sequence information from cuffdiff files?

2012-09-13 Thread Jennifer Jackson

Hi Humberto,

Yes, my apologies, this should have been included in the original reply.
The 'locus' field in the Cuffdiff files refers to the bounds of a gene,
not to individual transcripts. To get to the transcripts, you need to go
back to the inputs to Cuffdiff. If you used Cuffmerge, the merged
transcripts GTF file is the correct file to use as input to Extract. If
you used just Cuffcompare, use the combined transcripts GTF.


To find out which transcript is associated with which gene bound, compare
the attributes in the Cuffmerge merged transcripts GTF (9th column:
gene_id, tss_id, etc.) with the gene_id and tss_id values in the Cuffdiff
files. Depending on the file, the same identifier also appears in the
test_id column. The Cuffcompare GTF comparisons will be similar.


You can gain access to the GTF attributes with the tool Filter and Sort 
- Filter GTF data by attribute values_list. Cut out the column of 
interest in the Cuffdiff file (Text Manipulation - Cut), edit as 
desired, and use as a list filter. Or explore the other GFF filter 
options in the same tool group.
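
For anyone doing the same lookup outside Galaxy, the steps above can be sketched in Python roughly as follows. This is only an illustration: it assumes a standard tab-separated GTF whose 9th column holds `key "value";` pairs, and the function names and the example identifiers are hypothetical.

```python
import re

# Matches pairs such as: gene_id "XLOC_000001";
ATTRIBUTE_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def parse_attributes(field):
    """Turn the 9th GTF column into a dict of attribute name -> value."""
    return dict(ATTRIBUTE_RE.findall(field))


def transcripts_for_genes(gtf_lines, wanted_gene_ids):
    """Map each wanted gene_id to the set of transcript_ids listed under it."""
    wanted = set(wanted_gene_ids)
    hits = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9:
            continue  # not a complete GTF record
        attributes = parse_attributes(columns[8])
        gene_id = attributes.get("gene_id")
        if gene_id in wanted and "transcript_id" in attributes:
            hits.setdefault(gene_id, set()).add(attributes["transcript_id"])
    return hits


# Made-up records in the style of a merged transcripts GTF.
example_gtf = [
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1";\n',
    'chr1\tCufflinks\texon\t100\t250\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "1";\n',
    'chr2\tCufflinks\texon\t500\t900\t.\t-\t.\tgene_id "XLOC_000002"; transcript_id "TCONS_00000003"; exon_number "1";\n',
]
selected = transcripts_for_genes(example_gtf, ["XLOC_000001"])
```

The gene_id list would come from the column cut out of the Cuffdiff file; the GTF rows that survive the filter are the ones to pass on for sequence extraction.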


Take care,

Jen
Galaxy team

On 9/13/12 11:14 AM, Humberto Boncristiani wrote:

Hi

'Fetch Sequences - Extract Genomic DNA' does not accept Cuffdiff files.
Should I convert the file to some specific format?

Thanks,

Humberto.

*Dr. Humberto Boncristiani*
National Research Council (NRC) Fellow
Adjunct Research Associate
Department of Biology
Univ. North Carolina at Greensboro
312 Eberhart Bldg
Greensboro, NC 27403, USA.
Tel.:(1) 336-256-2591
Fax: (1) 336-334-5839
email: hum...@gmail.com mailto:hum...@gmail.com








--
Jennifer Jackson
http://galaxyproject.org


[galaxy-user] Galaxy CloudMan - Nodes can't make their own qsub calls?

2012-09-13 Thread greg
Hi guys,

I created a new Galaxy instance with the web launcher
(https://biocloudcentral.herokuapp.com/launch) and then I ssh'd into
the master node.

I'm trying to run a Perl script that makes several qsub calls to other
Perl scripts.  Now the catch is that one of those Perl scripts makes
its own qsub calls.

And I'm getting this error when it tries to do that:

Unable to run job: denied: host ip-10-29-176-111.ec2.internal is no
submit host.


Somehow this works fine on other clusters I've run this code on.  Any
idea what could be going on?  Do I need to make all of the nodes
submit hosts?

Thanks a bunch!

-Greg


[galaxy-user] Does Tophat output *.accepted hits file contain headers?

2012-09-13 Thread Du, Jianguang
Dear All,

I want to use the TopHat "accepted hits" output files for analysis
outside Galaxy. However, the program I am using requires the TopHat output
to be sorted, indexed BAM files that contain headers. Do the TopHat
"accepted hits" outputs produced in Galaxy contain headers? Are the headers
of BAM files generated by TopHat always the same?
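
One way to check a downloaded file directly is sketched below in Python. It assumes the standard BAM layout from the SAM/BAM specification (BGZF-compressed data that starts with the magic bytes `BAM\1`, followed by the length of the header text and the text itself). The helper names are hypothetical; `samtools view -H file.bam` prints the same header text.

```python
import gzip
import struct


def read_bam_header_text(path):
    """Return the plain-text SAM header stored at the start of a BAM file."""
    # BGZF blocks are valid gzip members, so the gzip module can read them.
    with gzip.open(path, "rb") as handle:
        magic = handle.read(4)
        if magic != b"BAM\x01":
            raise ValueError("%s does not look like a BAM file" % path)
        # l_text: little-endian 32-bit length of the header text.
        (text_length,) = struct.unpack("<i", handle.read(4))
        text = handle.read(text_length)
    return text.decode("ascii", "replace").rstrip("\x00")


def has_header_lines(path):
    """True if the header text contains at least one @HD or @SQ line."""
    lines = read_bam_header_text(path).splitlines()
    return any(line.startswith(("@HD", "@SQ")) for line in lines)
```

An empty result from `read_bam_header_text` would mean the file carries only the binary reference list and no text header lines.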

Thanks,

Jianguang

Re: [galaxy-user] Galaxy CloudMan - Nodes can't make their own qsub calls?

2012-09-13 Thread James Taylor
You probably need to sudo to sgeadmin (CloudMan guys, correct me if you
have this set up differently).

I don't see any reason not to make worker nodes submit hosts by
default in a future cloudman release.
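
A minimal Python sketch of that suggestion, under some loud assumptions: that an `sgeadmin` account exists on the master node and is a Grid Engine manager, that `sudo` is allowed for the current user, and that the Grid Engine environment is available under `sudo`. The function names are hypothetical; `qconf -as` is the standard Grid Engine command for adding a submit host.

```python
import re
import subprocess

# Pulls the host name out of the "is no submit host" error, even when the
# message has been wrapped across lines.
DENIED_HOST_RE = re.compile(r"denied: host (\S+) is no\s+submit host")


def host_from_error(message):
    """Return the host named in a 'no submit host' error, or None."""
    match = DENIED_HOST_RE.search(message)
    return match.group(1) if match else None


def add_submit_host_command(host, admin_user="sgeadmin"):
    """Build the argv for adding `host` as a submit host as `admin_user`."""
    return ["sudo", "-u", admin_user, "qconf", "-as", host]


def add_submit_host(host, admin_user="sgeadmin"):
    """Run the command; raises CalledProcessError if qconf refuses."""
    subprocess.run(add_submit_host_command(host, admin_user), check=True)


error_text = (
    "Unable to run job: denied: host ip-10-29-176-111.ec2.internal is no\n"
    "submit host."
)
command = add_submit_host_command(host_from_error(error_text))
```

Only the command list is built here; calling `add_submit_host` would need to happen on the master node, once per worker.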

-- jt


On Thu, Sep 13, 2012 at 4:17 PM, greg margeem...@gmail.com wrote:
 As a follow up I found a command that should add the new nodes as
 submit hosts and I tried to run it but I got this error:

 $ qconf -as ip-10-28-164-178.ec2.internal
 denied: ubuntu must be manager for this operation

 What does it mean by manager?  How would I run this command?


 I guess my preference is for CloudMan to do this automatically, though,
 since I'll be distributing this program to 3rd party users using the
 built-in CloudMan sharing.  I can't rightly ask users to run qconf.

 Thanks again,

 Greg




Re: [galaxy-user] Counting RNA-seq reads per class.

2012-09-13 Thread Jennifer Jackson

Hello Mo,

This may be a coordinate problem caused by mixing files with 0-based and
1-based start positions. The tools under Operate on Genomic Intervals
might be an alternative, since they handle the coordinates appropriately.
File formats can be converted as needed: BAM -> SAM -> Interval.
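
To make the off-by-one issue concrete, here is a small Python sketch. It assumes the usual conventions: GTF and SAM positions are 1-based and inclusive, while BED-style intervals are 0-based and half-open. The function names and coordinates are illustrative only.

```python
def one_based_to_bed(start, end):
    """Convert a 1-based inclusive interval to 0-based half-open."""
    return start - 1, end


def bed_to_one_based(start, end):
    """Convert a 0-based half-open interval to 1-based inclusive."""
    return start + 1, end


def overlap_length(a_start, a_end, b_start, b_end):
    """Number of shared bases between two 0-based half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


# A feature at 1-based 100..200 and a read at 1-based 200..230 share
# exactly one base (position 200).
feature = one_based_to_bed(100, 200)
read = one_based_to_bed(200, 230)
correct = overlap_length(*feature, *read)

# If the read's 1-based coordinates are used as if they were 0-based,
# the shared base is lost and the read is not counted.
mixed = overlap_length(*feature, 200, 230)
```

Reads that only touch a feature at its boundary are the ones that appear or disappear when the two conventions are mixed.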


Alternatively, and this may sound simple, would the tool Join, Subtract
and Group - Group produce the summary with enough specificity? These
files (e.g. transcript/gene expression) have both a 'class_code' and a
'coverage' column. Coverage is not exactly the same as a read count, but
it does quantify the read data that Cufflinks actually used to create
the assembled transcripts assigned to the various class codes, if that
is what you are looking for.
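
Outside Galaxy, the same grouping can be sketched in a few lines of Python. The sketch assumes a tab-separated file with a header row that names the 'class_code' and 'coverage' columns; the function name and the example table (including its 'tracking_id' column) are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def summarize_by_class(tsv_text):
    """Return {class_code: (number of rows, summed coverage)}."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    row_counts = defaultdict(int)
    coverage_sums = defaultdict(float)
    for row in reader:
        code = row["class_code"]
        row_counts[code] += 1
        coverage_sums[code] += float(row["coverage"])
    return {code: (row_counts[code], coverage_sums[code]) for code in row_counts}


# Hypothetical three-row table.
example = (
    "tracking_id\tclass_code\tcoverage\n"
    "T1\t=\t12.5\n"
    "T2\tj\t3.0\n"
    "T3\t=\t7.5\n"
)
summary = summarize_by_class(example)
```

This mirrors what Group does with 'class_code' as the grouping column and a count plus a sum on 'coverage' as the operations.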


Please let us know if your question has been misunderstood. Others are 
also welcome to add in more comments!


Best,

Jen
Galaxy team

On 9/10/12 8:52 AM, Mohammad Heydarian wrote:

Hi All,
I have been trying to count the number of RNA-seq reads that fall into
the various Cufflinks class codes ('=', 'j', 'u', 'x', etc.) and I am
curious how others count reads per class.

I first tried the BedTools tool that counts the number of reads
overlapping another set of intervals, and later realized that each
interval is extended 1 kb upstream and downstream prior to the analysis
(by default, and not adjustable on Galaxy), so the number of reads
counted across all of the classes was always much larger than the number
of reads in my BAM file. I then tried to isolate the reads from each
class into separate BAM files using the BedTools intersect tool, and
there I consistently end up with significantly fewer reads than I have
in my sample.

I am very curious to find out how others are tackling this problem on
Galaxy.

Thanks for any input!

Cheers,
Mo Heydarian








--
Jennifer Jackson
http://galaxyproject.org


[galaxy-user] No output produced.....

2012-09-13 Thread Neil.Burdett
Hi,
I have my own image registration tool that I've created on my own local 
instance of galaxy.

The method takes two images in *.nii.gz format, registers them, and
produces one registered *.nii.gz file and a *.trsf matrix file.

The first issue I encountered was that the method expected *.nii.gz files
as inputs but was receiving *.dat files. I worked around this problem as
shown by the files below:

<tool id="RegisterAliBabaAffine" name="RegisterAffine">
  <description>two images</description>
  <command interpreter="bash">$__root_dir__/tools/registration/reg-wrapper.sh $moving $fixed $outputTRSF $outputImage</command>
  <inputs>
    <param format="binary" name="moving" type="data" label="Moving Image" />
    <param format="binary" name="fixed" type="data" label="Fixed Image" />
    <param type="hidden" name="outputTRSF" value="output.trsf" label="trsf file" help="Output File must have .trsf extension" />
    <param type="hidden" name="outputImage" value="output.nii.gz" label="Image output file" help="Output Image File must have .nii.gz extension" />
  </inputs>
  <outputs>
    <data format="input" name="output_TRSF" from_work_dir="output.trsf" />
    <data format="input" name="output_Image" from_work_dir="output.nii.gz" />
  </outputs>
  <help>This tool uses Affine Registration to register two images.</help>
</tool>

#!/bin/bash
MOVING=`mktemp --suffix .nii.gz`
FIXED=`mktemp --suffix .nii.gz`
cat $1 > $MOVING
cat $2 > $FIXED
/usr/local/MILXView.12.08.1/BashScripts/RegisterAliBabaAffine -m $MOVING -f $FIXED -t $3 -o $4
RC=$?
if [[ $RC == 0 ]]; then
  OUTPUTTRSF=`mktemp --suffix .trsf`
  OUTPUTIMG=`mktemp --suffix .nii.gz`
  cat < $OUTPUTTRSF > $3
  cat < $OUTPUTIMG > $4
  rm $OUTPUTTRSF
  rm $OUTPUTIMG
fi

rm $MOVING
rm $FIXED

exit $RC

This allows them to pass the *.nii.gz files that the registration method is 
expecting.

Everything works fine and I can see output generated in the job_working_dir and 
the history turns green...


galaxy@bmladmin-OptiPlex-745:~$ ls -lrt ~/galaxy-dist/database/job_working_directory/000/27/
total 2940
-rw------- 1 galaxy nogroup       0 Sep 13 10:15 tmpRfHsOP_stderr
-rw-r--r-- 1 galaxy nogroup     241 Sep 13 10:35 output.trsf
-rw------- 1 galaxy nogroup      80 Sep 13 10:35 tmplmK0V2_stdout
-rw-r--r-- 1 galaxy nogroup 2998272 Sep 13 10:38 output.nii.gz

However, the problem occurs when the files are copied from 
~/galaxy-dist/database/job_working_directory/000/27/ to 
~/galaxy-dist/database/files/000/. When this happens the files become size = 0.

Any ideas?


-rw-r--r-- 1 galaxy nogroup   0 Sep 13 09:36 /home/galaxy/galaxy-dist/database/files/000/dataset_40.dat
-rw-r--r-- 1 galaxy nogroup   0 Sep 13 09:36 /home/galaxy/galaxy-dist/database/files/000/dataset_41.dat
-rw-r--r-- 1 galaxy nogroup   0 Sep 13 10:38 /home/galaxy/galaxy-dist/database/files/000/dataset_43.dat
-rw-r--r-- 1 galaxy nogroup   0 Sep 13 10:38 /home/galaxy/galaxy-dist/database/files/000/dataset_42.dat


The output in galaxy.log indicates it is successful:


/home/galaxy/galaxy-dist/tools/registration/reg-wrapper.sh /home/galaxy/galaxy-dist/database/files/000/dataset_23.dat /home/galaxy/galaxy-dist/database/files/000/dataset_20.dat output.trsf output.nii.gz
galaxy.jobs DEBUG 2012-09-13 10:38:10,334 The tool did not define exit code or stdio handling; checking stderr for success
galaxy.jobs DEBUG 2012-09-13 10:38:10,361 finish(): Moved /home/galaxy/galaxy-dist/database/job_working_directory/000/27/output.trsf to /home/galaxy/galaxy-dist/database/files/000/dataset_42.dat as directed by from_work_dir
galaxy.jobs DEBUG 2012-09-13 10:38:10,380 finish(): Moved /home/galaxy/galaxy-dist/database/job_working_directory/000/27/output.nii.gz to /home/galaxy/galaxy-dist/database/files/000/dataset_43.dat as directed by from_work_dir
galaxy.jobs DEBUG 2012-09-13 10:38:10,609 job 27 ended

Is the issue copying the *.nii.gz and *.trsf files into *.dat files? Is
there any way around this?


I've also modified ~/galaxy-dist/lib/galaxy/jobs/__init__.py (line 363) to
change shutil.move to shutil.copy2 (same results).

I also put in a different output path to copy to. But essentially we have
files with nonzero size in ~/galaxy-dist/database/job_working_directory/000/id/,
but the files have size 0 after the move into ~/galaxy-dist/database/files/000
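
A possible cause worth ruling out, on the assumption that the success branch of the wrapper writes the newly created mktemp files over $3 and $4: mktemp creates an empty file, so copying it onto an output would truncate that output to 0 bytes just before Galaxy moves it. The Python sketch below only reproduces that effect with temporary files; the function name is hypothetical and nothing here touches Galaxy itself.

```python
import os
import shutil
import tempfile


def overwrite_with_fresh_temp(output_path, suffix):
    """Copy a newly created (empty) temp file over output_path.

    Mirrors the shell pattern of creating a file with mktemp and then
    redirecting its contents into the output. Returns the resulting size.
    """
    fd, temp_path = tempfile.mkstemp(suffix=suffix)  # new file, 0 bytes
    os.close(fd)
    try:
        shutil.copyfile(temp_path, output_path)  # like: cat "$TMP" > "$OUT"
    finally:
        os.remove(temp_path)
    return os.path.getsize(output_path)
```

If this is what happens, listing the working directory while the registration is still running would show full-size outputs, and the sizes would drop to 0 only once the wrapper reaches its final copy step.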


Thanks

Neil