David’s answer assumes a more complicated objective, but obviously we are both 
unclear as to what you want.  Are you trying to find out which clusters have a 
unique pattern of mutation? (probably all of them, with so few clusters and so 
many genes?)

For either objective, this is not a statistical test but a problem of 
identification.  For the simpler question, create a data frame in which each 
row holds the 150 1s and 0s for one cluster, and use duplicated() to flag 
repeated rows. (Rows that return FALSE are first occurrences; to flag every 
copy of a repeated pattern, also check duplicated(x, fromLast = TRUE).)

Untested
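
A minimal sketch of the duplicated() idea, using the five-gene toy data from 
the quoted message below rather than the real 8 x 150 table. The object names 
(mat, patterns, dup, unique_clusters) are placeholders of mine, and the 
not_mutated column is left out because it is not needed here.

```r
# Toy data from the thread (Cluster, Gene, Mutated only)
dat <- read.table(text = "Cluster Gene Mutated
1 G1 1
1 G2 1
1 G3 0
1 G4 0
1 G5 1
2 G1 0
2 G2 1
2 G3 1
2 G4 0
2 G5 1", header = TRUE, stringsAsFactors = FALSE)

# One row per cluster, one 0/1 column per gene
mat <- with(dat, tapply(Mutated, list(Cluster = Cluster, Gene = Gene), sum))
patterns <- as.data.frame(mat)

# duplicated() alone leaves the first copy of a repeated row as FALSE,
# so scan from both ends to flag every row that shares its pattern
dup <- duplicated(patterns) | duplicated(patterns, fromLast = TRUE)

# Clusters whose mutation pattern occurs exactly once
unique_clusters <- rownames(patterns)[!dup]
print(unique_clusters)
```

With only two clusters in the toy data the two patterns differ, so both 
clusters come back as unique; the same lines should apply unchanged to a 
table with more clusters and genes.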

On Aug 2, 2014, at 11:41 AM, David Winsemius <[email protected]> wrote:

> 
> On Aug 2, 2014, at 11:11 AM, Adrian Johnson wrote:
> 
>> Hi:
>> 
>> I am trying to identify mutually exclusive events from the following
>> example:
>> 
> #-------------
> dat <- read.table(text="Cluster      Gene      Mutated    not_mutated
>  1             G1             1              0
>  1             G2             1              0
>  1             G3             0              1
>  1             G4             0              1
>  1             G5             1              0
>  2             G1             0              1
>  2             G2             1              0
>  2             G3             1              0
>  2             G4             0              0
>  2             G5             1              0", header=TRUE, 
> stringsAsFactors=FALSE)
> 
> with(dat, table(Cluster, Gene, Mutated)  )
> #----------------
> , , Mutated = 0
> 
>       Gene
> Cluster G1 G2 G3 G4 G5
>      1  0  0  1  1  0
>      2  1  0  0  1  0
> 
> , , Mutated = 1
> 
>       Gene
> Cluster G1 G2 G3 G4 G5
>      1  1  1  0  0  1
>      2  0  1  1  0  1
> #--------------
> Or:
> xtabs(Mutated ~ Cluster+Gene, data=dat)
> #----------------
>       Gene
> Cluster G1 G2 G3 G4 G5
>      1  1  1  0  0  1
>      2  0  1  1  0  1
> 
> 
> I'm a bit unclear about your goals. Are you trying to identify the genes 
> that are mutated in only one cluster (the "G1-G3" events) and the genes 
> that are mutated in both clusters (the "G2-G5" events)?
> 
> If so, you can choose the columns that have a sum of 1 for the first and 
> the columns that have a sum of 2 for the second.
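
The column-sum idea can be sketched as follows. This is my own illustration 
on the toy data, not code from David's message, and the names n_mut, 
one_cluster and both_clusters are made up for the example. A column sum of 1 
means the gene is mutated in exactly one of the two clusters; a sum of 2 
means it is mutated in both.

```r
# Toy data from the thread (Cluster, Gene, Mutated only)
dat <- read.table(text = "Cluster Gene Mutated
1 G1 1
1 G2 1
1 G3 0
1 G4 0
1 G5 1
2 G1 0
2 G2 1
2 G3 1
2 G4 0
2 G5 1", header = TRUE, stringsAsFactors = FALSE)

# Cluster-by-gene table of mutation indicators
tab <- xtabs(Mutated ~ Cluster + Gene, data = dat)

# Number of clusters in which each gene is mutated
n_mut <- colSums(tab)

one_cluster   <- names(n_mut)[n_mut == 1]  # mutated in exactly one cluster
both_clusters <- names(n_mut)[n_mut == 2]  # mutated in both clusters
print(one_cluster)
print(both_clusters)
```

With 8 clusters the cutoffs would need rethinking (a sum of 8 for "all 
clusters", for instance), which depends on what the poster actually wants.
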
>> 
>> 
>> In cluster 1 :  G1, G2, G5 are mutated
>> 
>> In cluster 2:    G2, G3, G5 are mutated.
>> 
>> 
>> I am interested in finding such G2-G5 events and G1-G3 events.
>> 
>> In total I have 8 clusters and 150 genes (1200 rows x 4 columns).
>> 
>> What test would be appropriate to identify such pairs?
>> 
>> In my naive understanding, would a Fisher's exact test give such
>> combinations?
> 
> It's even less clear what sort of "test" you propose. `fisher.test` is a test 
> of association. It doesn't identify combinations.
>> 
>> Thanks a lot.
>> 
>> -Adrian
>> 
>>      [[alternative HTML version deleted]]
> 
> This is a plain text mailing list.
> 
>> 
>> ______________________________________________
>> [email protected] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> David Winsemius
> Alameda, CA, USA
> 

Don McKenzie
Research Ecologist
Pacific Wildland Fire Sciences Lab
US Forest Service

Affiliate Professor
School of Environmental and Forest Sciences
University of Washington
[email protected]




