Re: [R] association of multiple variables

Michael Friendly Wed, 19 Feb 2014 09:12:18 -0800

Below is a somewhat more general version of David's function,
which allows a choice of the association statistic from
vcd::assocstats().  Of course, only Cramer's V is scaled to 0-1
as an absolute-value measure of strength of association, but the
other statistics could be accommodated by rescaling so that the
diagonals = 1.
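
One possible reading of "scaling to diagonals = 1" is the same rescaling that
turns a covariance matrix into a correlation matrix: divide each entry by the
square root of the product of the two matching diagonal entries. The sketch
below assumes that reading. scale_diag1() is a hypothetical helper name, not
part of vcd, and the input is the phi matrix printed in the example further
down in this post.

```r
# Hypothetical helper (not part of vcd): rescale a symmetric association
# matrix so that every diagonal entry becomes 1, cov2cor-style.
scale_diag1 <- function(m) {
  d <- sqrt(diag(m))   # one scaling factor per variable
  m / outer(d, d)      # entry [i, j] is divided by sqrt(m[i, i] * m[j, j])
}

# The phi matrix shown in the example output of this post
vars <- c("v1", "v2", "v3")
phi <- matrix(c(2.000000, 1.073675, 0.942809,
                1.073675, 2.000000, 1.105542,
                0.942809, 1.105542, 2.000000),
              nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

scaled <- scale_diag1(phi)
print(round(scaled, 7))
```

Because all three variables in that example give the same diagonal value, the
rescaled matrix agrees with the Cramer's V output up to rounding. That need
not hold when the variables have different numbers of levels.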

The OP specified binary variables, so tetrachoric correlations
might be more appropriate here. John Fox's polycor package
provides a more general approach to this problem, including
polychoric and polyserial correlations, as well as a hetcor()
function to calculate correlation-like measures for mixtures
of different variable types, all of which provide standard errors
and therefore make it possible to compute p-values.
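
As a rough illustration of that approach, the sketch below runs hetcor() on
three binary factors, for which the polychoric correlation is the tetrachoric
one, and turns the standard errors into two-sided Wald p-values. The simulated
data, the variable names and the normal-approximation test are assumptions
made for this example; they are not from the original question.

```r
# Minimal sketch; runs only if the polycor package is installed.
if (requireNamespace("polycor", quietly = TRUE)) {
  set.seed(1)
  n  <- 200
  z1 <- rnorm(n)
  z2 <- 0.6 * z1 + 0.8 * rnorm(n)   # latent variable correlated with z1
  z3 <- rnorm(n)                    # independent latent variable

  # Dichotomize the latent variables into binary factors (made-up data)
  bin <- data.frame(b1 = factor(z1 > 0),
                    b2 = factor(z2 > 0),
                    b3 = factor(z3 > 0.3))

  h  <- polycor::hetcor(bin, std.err = TRUE)
  r  <- h$correlations              # tetrachoric correlations
  se <- h$std.errors                # their standard errors

  # Wald z statistics; the diagonal is not a test, so blank it out
  z <- r / se
  diag(z) <- NA
  p <- 2 * pnorm(-abs(z))           # two-sided p-values for H0: rho = 0

  print(round(r, 3))
  print(round(p, 4))
}
```

The Wald test is only one way to get a p-value from an estimate and its
standard error, and it relies on a large-sample normal approximation.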

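catcor <- function(x, type=c("cramer", "phi", "contingency")) {
        # fail early with a clear error if vcd is not installed
        stopifnot(requireNamespace("vcd", quietly=TRUE))
        type <- match.arg(type)
        nc <- ncol(x)
        # all (i, j) column pairs, in the column-major order matrix() expects
        v <- expand.grid(seq_len(nc), seq_len(nc))
        res <- matrix(mapply(function(i1, i2) vcd::assocstats(table(x[,i1],
                x[,i2]))[[type]], v[,1], v[,2]), nc, nc)
        rownames(res) <- colnames(res) <- colnames(x)
        res
}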

e.g.

dat <- data.frame(
 v1=sample(LETTERS[1:5], 15, replace=TRUE),
 v2=sample(LETTERS[1:5], 15, replace=TRUE),
 v3=sample(LETTERS[1:5], 15, replace=TRUE))

> catcor(dat, type="phi")
         v1       v2       v3
v1 2.000000 1.073675 0.942809
v2 1.073675 2.000000 1.105542
v3 0.942809 1.105542 2.000000
> catcor(dat, type="cramer")
          v1        v2        v3
v1 1.0000000 0.5368374 0.4714045
v2 0.5368374 1.0000000 0.5527708
v3 0.4714045 0.5527708 1.0000000
> catcor(dat, type="contingency")
          v1        v2        v3
v1 0.8944272 0.7317676 0.6859943
v2 0.7317676 0.8944272 0.7416198
v3 0.6859943 0.7416198 0.8944272
>
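
Since the original question also asked for p-values, a companion function
along the same lines might look like the sketch below. catcor_p() is a
hypothetical name, and it uses base R's chisq.test() rather than vcd, so the
p-values are those of the ordinary Pearson chi-squared test. With sparse
tables such as the 15-row example above, the chi-squared approximation is
poor, which is why this illustration uses a larger made-up sample.

```r
# Hypothetical companion to catcor(): matrix of pairwise Pearson
# chi-squared p-values for the columns of a data frame of categorical
# variables. The diagonal is left as NA.
catcor_p <- function(x) {
  nc <- ncol(x)
  res <- matrix(NA_real_, nc, nc,
                dimnames = list(colnames(x), colnames(x)))
  for (i in seq_len(nc - 1)) {
    for (j in (i + 1):nc) {
      tab <- table(x[[i]], x[[j]])
      # suppressWarnings() hides the small-expected-count warning
      pv <- suppressWarnings(chisq.test(tab, correct = FALSE)$p.value)
      res[i, j] <- res[j, i] <- pv
    }
  }
  res
}

set.seed(42)
dat2 <- data.frame(
  v1 = sample(LETTERS[1:3], 100, replace = TRUE),
  v2 = sample(LETTERS[1:3], 100, replace = TRUE),
  v3 = sample(LETTERS[1:3], 100, replace = TRUE))

pmat <- catcor_p(dat2)
print(round(pmat, 4))
```

The result has the same row and column names as the catcor() output, so the
two matrices can be read side by side.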

On 2/18/2014 9:38 AM, David Carlson wrote:

You might modify this function which computes Cramer's V using
the assocstats() function in package vcd:

catcor <- function(x) {
        require(vcd)
        nc <- ncol(x)
        v <- expand.grid(1:nc, 1:nc)
        matrix(mapply(function(i1, i2) assocstats(table(x[,i1],
                x[,i2]))$cramer, v[,1], v[,2]), nc, nc)
}

e.g.

dat <- data.frame(v1=sample(LETTERS[1:5], 15, replace=TRUE),
        v2=sample(LETTERS[1:5], 15, replace=TRUE),
        v3=sample(LETTERS[1:5], 15, replace=TRUE))

catcor(dat)

           [,1]      [,2]      [,3]
[1,] 1.0000000 0.5633481 0.5773503
[2,] 0.5633481 1.0000000 0.6831301
[3,] 0.5773503 0.6831301 1.0000000

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Skála, Zdenek
(INCOMA GfK)
Sent: Tuesday, February 18, 2014 3:33 AM
To: r-help@r-project.org
Subject: [R] association of multiple variables

Dear all,

Please, is there a way in R to calculate association statistics
over more than 2 categorical (binary) variables?
I mean something similar to what

cor(my.dataframe)

does for continuous variables, i.e. to have a matrix of
statistics and/or p-values as an output.

Many thanks!

Zdenek

- -
Zdenek Skala
INCOMA GfK




--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University      Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
