Re: [galaxy-user] MACS out put files from Galaxy

2012-09-11 Thread Duclot, Florian
Hi Peter,

 

I will try to give you my basic understanding of how MACS works. But first,
I would recommend reading the MACS README
http://liulab.dfci.harvard.edu/MACS/README.html , where you will find
detailed explanations of how MACS works and what the different output files
are.

 

1.   It depends on what you want to do with the MACS analysis. For most
downstream analyses you will need the interval file (bed). It's just a
tab-separated tabular file, so it can be opened with MS Excel anyway. I
personally use Excel to quickly look at the different FDR values. Regarding
the slight shift between the positions in BED and XLS, this is normal and,
I believe, due to the XLS coordinate convention. The README explains it:
coordinates in the XLS file are 1-based, which differs from the BED format
(BED start positions are 0-based).

2.   See the MACS paper for a precise answer, but basically, the negative
peaks are used by MACS to calculate the FDR.

3.   The wig files contain both the position of your intervals (your
reads) and a score. For this reason I like to use them for visualizing my
data, although they can also be used as input for downstream tools like CEAS
(gene annotation). Unlike the wig files, the bed file does not include a
score, only the precise location of your peaks (not reads). Because these
peaks are detected by comparing the reads in your treatment against the
control, there is only one bed file: a peak corresponds to a significant
enrichment of reads at a locus that is not present in the control sample.
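
Because the MACS peak tables are plain tab-separated text, the quick FDR check from point 1 can also be done with a short script instead of Excel. This is only a minimal sketch: the column name "FDR(%)" and the '#' comment lines are assumptions about the file layout, so check the header of your own file first. The example rows are made up, not real MACS output.

```python
import csv
import io

def read_peaks(handle):
    """Yield one dict per peak from a tab-separated peaks table."""
    # Assumption: comment lines start with '#' and blank lines may
    # appear before the header row; both are skipped here.
    rows = (line for line in handle
            if line.strip() and not line.startswith("#"))
    yield from csv.DictReader(rows, delimiter="\t")

def filter_by_fdr(peaks, max_fdr_percent, fdr_column="FDR(%)"):
    """Keep peaks whose FDR (in percent) is at or below the threshold."""
    # fdr_column is a hypothetical default; adjust to your file's header.
    return [p for p in peaks if float(p[fdr_column]) <= max_fdr_percent]

# Tiny made-up table standing in for a downloaded peaks file.
example = io.StringIO(
    "# comment line\n"
    "\n"
    "chr\tstart\tend\tFDR(%)\n"
    "chr1\t100\t121\t0.5\n"
    "chr1\t500\t650\t12.0\n"
)
kept = filter_by_fdr(read_peaks(example), max_fdr_percent=5.0)
for peak in kept:
    print(peak["chr"], peak["start"], peak["end"], peak["FDR(%)"])
```

With a real file you would pass `open("your_peaks.xls")` (a hypothetical file name) in place of the `io.StringIO` object.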

 

I hope I was able to help you... I'm sure a lot of people on this list can
give you more (and more accurate) details if you need them.

 

Best,

From: galaxy-user-boun...@lists.bx.psu.edu
[mailto:galaxy-user-boun...@lists.bx.psu.edu] On Behalf Of peter scot
Sent: Tuesday, September 11, 2012 12:45 PM
To: galaxy-user@lists.bx.psu.edu
Subject: [galaxy-user] MACS out put files from Galaxy

 

I ran MACS on my ChIP-seq dataset and found various files:

1. Under the HTML report there are two files: one is negative peaks.xls and
the second is peaks.xls. The file peaks.xls is the same as the peaks.interval
file in the output flow on the right, with one bp added to each position,
e.g. if the peak coordinates under the HTML report are 99 to 120, then in
peaks.interval they are 100 to 121. Which one should be followed?

2. What is the meaning of the negative peak interval file?

3. I used ctrl and treated samples to run MACS. There are two wig files,
ctrl.wig and treatment.wig. Do these two files belong to the ctrl and
treated samples? If so, where are the corresponding bed files?

 

If someone can direct me to a description of the output we get in Galaxy
when using MACS, that would be helpful.

 

Thanks

___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using reply all in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-user] MACS out put files from Galaxy

2012-09-11 Thread Jennifer Jackson

Hi Peter,

Florian's answers are very good. I am not sure this will add much, but
perhaps a little, for the parts of the questions about the Galaxy output
datasets ...


The latest Using Galaxy paper, protocol 3, includes all of the optional
output that MACS in Galaxy will produce (in addition to the files linked
from the HTML report). Apart from the primary BED file and the HTML output,
there are four files, paired by tags/control: two interval and two wig.


The coordinate system used by each file specification can vary, as you
observed. See the documentation links for exactly how these files are
formatted. But regardless of the file coordinate system, a proper browser
that interprets the datatype correctly will display the start/stop
correctly, which is where the output datasets in Galaxy can be useful.
Meaning that whether the start in the file is 1-based or 0-based, the
actual start base will be displayed as the same base. Load the output into
the UCSC Browser or Trackster in Galaxy, scroll into one of the regions to
view this, and compare with the files (both the datasets in Galaxy and
those downloaded through the links) to better understand.
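
To make the 1-based versus 0-based point concrete, here is a minimal sketch of the usual conversion. It assumes the common conventions (1-based coordinates that include both ends, and BED-style 0-based starts with an exclusive end); the function names are made up for illustration, and the file specifications remain the authority on what each MACS or Galaxy file actually uses.

```python
def one_based_to_bed(start, end):
    """Convert a 1-based, inclusive interval to 0-based, half-open (BED style)."""
    # Only the start moves; the end number stays the same because
    # BED ends are exclusive.
    return start - 1, end

def bed_to_one_based(start, end):
    """Convert a 0-based, half-open (BED style) interval to 1-based, inclusive."""
    return start + 1, end

def length_one_based(start, end):
    """Number of bases in a 1-based, inclusive interval."""
    return end - start + 1

def length_bed(start, end):
    """Number of bases in a 0-based, half-open interval."""
    return end - start

# The same stretch of bases written both ways.
bed_start, bed_end = one_based_to_bed(100, 121)
print(bed_start, bed_end)
print(length_one_based(100, 121), length_bed(bed_start, bed_end))
```

Both forms describe the same bases, which is why a browser that knows the datatype draws them in the same place.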


Full documentation for core MACS output is in the MACS documentation 
(link given by Florian, also linked from MACS tool page).


Documentation/examples for the Galaxy output files is in our paper:
http://main.g2.bx.psu.edu/u/galaxyproject/p/using-galaxy-2012 (scroll to 
protocol 3)


http://onlinelibrary.wiley.com/doi/10.1002/0471250953.bi1005s38/full#bi1005-prot-0003 
(see step #6)


More help for datatypes:
http://wiki.g2.bx.psu.edu/Learn/Datatypes (bed, interval, wig are all 
covered with links to more resources)


Florian mostly covered these, but I'll also address them, to be clear:

On 9/11/12 9:45 AM, peter scot wrote:

I ran MACS on my ChIP-seq dataset and found various files:

1. Under the HTML report there are two files: one is negative peaks.xls and
the second is peaks.xls. The file peaks.xls is the same as the peaks.interval
file in the output flow on the right, with one bp added to each position,
e.g. if the peak coordinates under the HTML report are 99 to 120, then in
peaks.interval they are 100 to 121. Which one should be followed?

This is related to the different coordinate systems. See the file
specifications.



2. What is the meaning of the negative peak interval file?

This is a type of control data: basically, the inputs are flipped to
produce it. It may not be needed or useful for further downstream analysis.
The advice to read the MACS doc to fully understand it is a good one.
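
As a rough illustration of how flipped-input (negative) peaks can feed an FDR estimate, the sketch below uses the sample-swap idea described in the MACS paper: at a given score cutoff, divide the number of negative peaks by the number of positive peaks. This is a simplified sketch with made-up scores, not MACS's own code, and the function name is hypothetical.

```python
from bisect import bisect_left

def empirical_fdr(positive_scores, negative_scores):
    """Estimate an FDR for each positive peak score s as
    (# negative peaks scoring >= s) / (# positive peaks scoring >= s)."""
    pos_sorted = sorted(positive_scores)
    neg_sorted = sorted(negative_scores)
    result = []
    for s in positive_scores:
        # Count entries >= s in each sorted list.
        n_pos = len(pos_sorted) - bisect_left(pos_sorted, s)
        n_neg = len(neg_sorted) - bisect_left(neg_sorted, s)
        # n_pos is at least 1 because s itself is a positive score.
        result.append(n_neg / n_pos)
    return result

# Made-up scores (higher = more significant), for illustration only.
positive = [50.0, 100.0, 200.0, 400.0]   # treatment over control
negative = [60.0, 120.0]                 # control over treatment
fdrs = empirical_fdr(positive, negative)
for score, fdr in zip(positive, fdrs):
    print(score, round(100 * fdr, 1))
```

Weak positive peaks that are matched by many negative peaks get a high estimate, while strong peaks with no comparable negative peaks get an estimate of zero.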




3. I used ctrl and treated samples to run MACS. There are two wig files,
ctrl.wig and treatment.wig. Do these two files belong to the ctrl and
treated samples? If so, where are the corresponding bed files?

These show the data density (pileup) in a graphical format. There are no
corresponding bed files, although you can visualize these against the other
bed and/or interval peak data to see how density was interpreted when
calling peaks.


Hopefully this helps!

Jen
Galaxy team



If someone can direct me to a description of the output we get in Galaxy
when using MACS, that would be helpful.

Thanks






--
Jennifer Jackson
http://galaxyproject.org