[galaxy-user] Make a vcf file

2012-02-14 Thread David Matthews
Hi,

This may be a dense question, but how do we generate a VCF file from the public 
version of Galaxy? Am I missing something obvious?


Best Wishes,
David.

__
Dr David A. Matthews

Senior Lecturer in Virology
Room E49
Department of Cellular and Molecular Medicine,
School of Medical Sciences
University Walk,
University of Bristol
Bristol.
BS8 1TD
U.K.

Tel. +44 117 3312058
Fax. +44 117 3312091

d.a.matth...@bristol.ac.uk






___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using reply all in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

[galaxy-user] filtering pile-up fails

2012-02-14 Thread Sebahattin Cirak
Dear All,

I have successfully used the online tool to align Illumina paired-end
reads (5 GB in each direction) and generated a pileup of 680,000,000
lines, but filtering the pileup always fails: it runs for several hours and
I get an empty file back. I have tried different options and different
pileups, always with the same result.
Could somebody please help, or is there a trick?

Thank you
Sebahattin

Re: [galaxy-user] Solution for: Error running cuffdiff. Error: cannot open reference GTF file CONDITION, CONTROL for reading

2012-02-14 Thread Jeremy Goecks
 The problem turned out to be the use of Perform Bias Correction (-b) with a
 GTF file that had no database/build associated. Looking at the cuffdiff
 wrapper, I found that if a FASTA reference is not selected from the
 history, the FASTA reference of the GTF file's associated build is used.
 If there is no build association, your cuffdiff run will fail with
 this not-so-helpful error.
 
 My feeling is that cuffdiff should check for a non-dashed string after
 '-b' and complain if it is absent, but this doesn't happen currently.

Agreed. I implemented the spirit of this functionality via argument checking in 
galaxy-central changeset 71031bf3105c.
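The wrapper behaviour described in this thread can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code in changeset 71031bf3105c: the helper names `resolve_bias_fasta` and `check_bias_flag`, and the dictionary of built-in FASTA paths keyed by build, are hypothetical.

```python
from typing import Dict, List, Optional


def resolve_bias_fasta(history_fasta: Optional[str],
                       gtf_build: Optional[str],
                       builtin_fastas: Dict[str, str]) -> str:
    """Hypothetical helper: choose the FASTA used for bias correction (-b).

    Prefer a FASTA selected from the history; otherwise fall back to the
    FASTA of the GTF file's associated database/build.
    """
    if history_fasta:
        return history_fasta
    if gtf_build and gtf_build in builtin_fastas:
        return builtin_fastas[gtf_build]
    # No history FASTA and no usable build: fail early with a clear message
    # instead of letting cuffdiff misread the next argument.
    raise ValueError("Bias correction (-b) needs a reference FASTA: select one "
                     "from the history or set a database/build on the GTF file")


def check_bias_flag(argv: List[str]) -> None:
    """Hypothetical check: '-b' must be followed by a non-dashed argument."""
    for i, arg in enumerate(argv):
        if arg == "-b" and (i + 1 >= len(argv) or argv[i + 1].startswith("-")):
            raise ValueError("'-b' must be followed by a FASTA file path")
```

Checking before the command line is built means the user sees a message about the missing reference rather than cuffdiff's "cannot open reference GTF file" error.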

Best,
J.


[galaxy-user] Clustering with cuffcompare or cuffdiff results

2012-02-14 Thread Zhang Xiaoyu
Dear Sir or Madam,

I am planning to cluster several libraries based on the output of 
cuffcompare or cuffdiff, since they allow me to construct a matrix whose columns 
represent the libraries and whose rows are the counts of transcripts or genes. I want 
to construct the matrix because it is the required input format of many RNA-seq 
clustering packages, e.g. baySeq and HTSCluster. However, the answer to the 
question "I want to find differentially expressed genes. Can I use Cufflinks 
in conjunction with count-based differential expression packages?" in the 
Cufflinks FAQ suggests not converting FPKM values to count data. 

Now my questions are:
1. It seems that it is better to run everything up to cuffdiff, but does 
cuffdiff allow multiple-sample comparison? I read somewhere that even 
for multiple samples it still compares them pairwise. Because I want 
to do clustering, which needs a quantitative data source to do the merging, 
will cuffdiff provide some quantitative measures rather than the test score 
and p-value, which are too qualitative to include? 
2. If I really need to get count data from the FPKM values, how do I obtain the 
effective length mentioned in the FAQ? Would it be better to treat each assembled 
transcript, rather than each gene, as an object in clustering? What does 
"you'd be throwing away Cufflinks' uncertainty" mean, even when using isoforms as 
objects? How should I include the uncertainty in my clustering?

Best,
Sherry



Re: [galaxy-user] Clustering with cuffcompare or cuffdiff results

2012-02-14 Thread Jeremy Goecks
 1. It seems that it is better to run everything up to cuffdiff, but does 
 cuffdiff allow multiple-sample comparison? I read somewhere that even 
 for multiple samples it still compares them pairwise.

Cuffdiff supports replicate analysis.

 Because I want to do clustering, which needs a quantitative 
 data source to do the merging, will cuffdiff provide some quantitative 
 measures rather than the test score and p-value, which are too qualitative to 
 include? 

Take a look at the Cuffdiff documentation for outputs: 
http://cufflinks.cbcb.umd.edu/manual.html#cuffdiff_output
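As a rough illustration of pulling quantitative values out of those outputs, the Python sketch below builds a feature-by-sample FPKM matrix from a tab-delimited tracking file. It assumes a header with a `tracking_id` column and one `<sample>_FPKM` column per sample; those column names are an assumption to check against the manual page above and your Cuffdiff version, and the function name `fpkm_matrix` is hypothetical.

```python
import csv
from typing import Dict, List, TextIO, Tuple


def fpkm_matrix(handle: TextIO) -> Tuple[List[str], Dict[str, List[float]]]:
    """Build a feature-by-sample FPKM matrix from a tab-delimited tracking file.

    Assumes a 'tracking_id' column plus one '<sample>_FPKM' column per sample.
    Returns the sample names and a dict mapping tracking_id -> FPKM values,
    in the same order as the sample names.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    fieldnames = reader.fieldnames or []
    fpkm_cols = [c for c in fieldnames if c.endswith("_FPKM")]
    samples = [c[: -len("_FPKM")] for c in fpkm_cols]
    matrix: Dict[str, List[float]] = {}
    for row in reader:
        matrix[row["tracking_id"]] = [float(row[c]) for c in fpkm_cols]
    return samples, matrix
```

Note that this keeps only point estimates; the confidence-interval columns that carry Cufflinks' uncertainty would have to be read separately.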

 2. If I really need to get count data from the FPKM values, how do I obtain 
 the effective length mentioned in the FAQ? Would it be better to treat each 
 assembled transcript, rather than each gene, as an object in clustering? What does 
 "you'd be throwing away Cufflinks' uncertainty" mean, even when using 
 isoforms as objects? How should I include the uncertainty in my clustering?

These FAQs from http://cufflinks.cbcb.umd.edu/faq.html address your questions:

--
I want to find differentially expressed genes. Can I use Cufflinks in 
conjunction with count-based differential expression packages?

It's possible, but we strongly advise against this. Current count-based 
differential expression tools are poorly suited to differential expression 
analysis in genomes with alternatively spliced genes. The main reason for this 
is that when a gene has multiple isoforms, a change in the total number of 
reads or fragments from that gene doesn't always correspond to a change in 
expression for that gene. Conversely, a gene's expression may change, but the 
total number of fragments generated by its isoforms may be very similar. In 
order to detect changes accurately, it's necessary to estimate how many 
fragments came from each individual splice variant in each sample. Current 
count-based tools don't do this (to our knowledge - please send us email if you 
know of one!). Even if they did, fragments that come from parts of genes that 
are shared by more than one splice variant can't generally be assigned to a single 
isoform, so the fragment counts for each isoform are only estimates, and there 
is some uncertainty in the counts. Isoforms that are very similar will have a 
great deal of uncertainty surrounding their fragment counts. This uncertainty 
needs to be accounted for when testing for differential expression. So while 
you could use Cufflinks to estimate isoform-level counts, you'd be throwing 
away Cufflinks' uncertainty, and thus have more confidence in the differences 
you see than you really should. This will probably lead to many false positives 
in your analysis. Furthermore, we do not normalize simply by the length to 
calculate FPKM but an effective length, as explained in our publications. 
Calculating counts from FPKM by multiplying by the length will give incorrect 
results. We strongly encourage you to consider using Cuffdiff to find 
differentially expressed genes and transcripts.

Will you please report how many fragments come from each transcript in a future 
release?

For the foreseeable future, we will not be reporting the number of fragments we 
think originated from each transcript. People who have asked for this almost 
always want to use Cufflinks in conjunction with count-based differential 
expression packages, which is not a good idea. We're trying to keep our output 
formats as simple as possible.
--
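To make the FAQ's point about effective length concrete, here is a small worked example in Python. It is a sketch under a simplifying assumption: if every fragment had exactly the same length mu, a transcript of length l would have an effective length of l - mu + 1. Cufflinks' publications instead average over the whole fragment-length distribution, so treat this only as an illustration of why back-calculating counts with the raw length is wrong, not as a recommended way to derive counts. The function names and the numbers (1000 bp transcript, 200 bp fragments, FPKM 10, 20 million mapped fragments) are hypothetical.

```python
def fpkm(count: float, eff_length: float, total_fragments: float) -> float:
    """FPKM = 1e9 * count / (effective length * total mapped fragments)."""
    return 1e9 * count / (eff_length * total_fragments)


def count_from_fpkm(fpkm_value: float, length: float, total_fragments: float) -> float:
    """Invert the FPKM formula for whichever length is supplied."""
    return fpkm_value * length * total_fragments / 1e9


def effective_length_fixed(length: int, fragment_length: int) -> int:
    """Effective length if every fragment were exactly fragment_length bases.

    Simplification: Cufflinks uses the full fragment-length distribution.
    """
    return max(length - fragment_length + 1, 0)


eff = effective_length_fixed(1000, 200)           # 801
right = count_from_fpkm(10.0, eff, 20_000_000)    # about 160.2
wrong = count_from_fpkm(10.0, 1000, 20_000_000)   # 200.0: raw length overestimates
```

Even in this idealised case, multiplying by the raw length inflates the estimate by about 25%, and the error grows for short transcripts whose length is close to the fragment length.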

Best,
J.



Re: [galaxy-user] filtering pile-up fails

2012-02-14 Thread Jennifer Jackson

Hello Sebahattin,

It may be that the data is too large to run on the Galaxy public 
instance, which means that a local or cloud instance would be the next 
recommendation, as explained in this wiki:

http://wiki.g2.bx.psu.edu/Big%20Picture/Choices

But before you make the investment of moving your analysis, I'd like to 
double-check whether any changes to how the tool is used would allow the 
job to run with the resources available on Main.


What I will need is for you to send in a tool error report by clicking 
on the green bug icon in the red error dataset. Please include this 
email address in the comments if it is not the one you also use with 
your galaxy account, so that I can link the two questions. If the tool 
just gave an empty set, but not an actual error, then use Options - 
Share or Publish, generate the share link, copy that and email it back 
to me directly. Please note the problem datasets if not obvious and 
leave all inputs and errors in an undeleted state (please undelete if 
necessary) so that I can examine the entire data path that led up to
the error.


If you want to try to troubleshoot on your own as well, some general 
advice is in this part of the Support wiki:

http://wiki.g2.bx.psu.edu/Support#Error_from_tools
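If the analysis does move to a local or cloud instance, one way to keep a very large pileup manageable is to filter it as a stream rather than loading it. The Python sketch below is not Galaxy's Filter pileup tool; it only filters on read depth and assumes the 6-column samtools pileup layout (chrom, pos, ref, depth, bases, quals), where depth is the fourth column. For the 10-column consensus layout the coverage is the eighth column, so `depth_col` would be 7. The function name is hypothetical.

```python
from typing import Iterable, Iterator


def filter_pileup(lines: Iterable[str], min_depth: int,
                  depth_col: int = 3) -> Iterator[str]:
    """Yield pileup lines whose read depth is at least min_depth.

    Processes one line at a time, so memory use stays flat regardless of
    how many lines the pileup has. depth_col is the 0-based index of the
    coverage column (3 for the assumed 6-column layout).
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= depth_col:
            continue  # skip blank or truncated lines
        try:
            depth = int(fields[depth_col])
        except ValueError:
            continue  # skip lines whose depth field is not an integer
        if depth >= min_depth:
            yield line
```

Usage: open the pileup file and pass the file object directly, e.g. `out.writelines(filter_pileup(pileup_file, 10))`, which never holds more than one line in memory.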

Hopefully we can work out a solution,

Best,

Jen
Galaxy team

On 2/14/12 4:11 AM, Sebahattin Cirak wrote:

Dear All,

I have successfully used the online tool to align Illumina paired-end
reads (5 GB in each direction) and generated a pileup of
680,000,000 lines, but filtering the pileup always fails: it runs for
several hours and I get an empty file back. I have tried different
options and different pileups, always with the same result.
Could somebody please help, or is there a trick?

Thank you
Sebahattin




--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/wiki/Support


Re: [galaxy-user] bug for mate pairs colorspace to sequence conversion

2012-02-14 Thread Jennifer Jackson

Hi Philipp,

Sorry for the delayed reply. The question is a bit confusing since the 
conversion tools allow the adapter base to be specified (and the 
default is a G). I am not sure we are talking about the same tool, so to 
clear things up, would you please send a shared history link and any 
other details that you think will help? Then I'll try to provide 
some more feedback, including opening an enhancement request ticket if 
that seems appropriate after we discuss the data and tools.


To share, use Options - Share or Publish, generate the share link, 
copy that, and email it directly back to me (not the entire list). Make 
certain that all inputs and any attempted tool runs, successful or failed, 
are present and undeleted. Please note the exact 
tools that you want to use, including the settings you think are the 
best fit, and what you think is missing, the more detail the better.


Thank you and I will watch for your reply,

Best,

Jen
Galaxy team

On 2/2/12 9:41 AM, philipp.bernin...@unibas.ch wrote:

Hi,

I have a problem converting paired-end reads from ABI. In color space
the reads start with a G followed by numbers, instead of T02102103...,
so I guess the program assumes a hard-coded T instead of using the G as
the last base.

best

Philipp




