Re: [R] : unusual combinations of categorical data

2010-11-09 Thread Jim Lemon

On 11/09/2010 09:25 AM, Alan Chalk wrote:

> Regarding unusual combinations of factors in categorical data.
> Are there any R packages that can be used to identify the outliers i.e.
> unusual combinations in categorical datasets ?


Hi Alan,
If your factors are dichotomous and you are looking for common patterns
of intersection, try intersectDiagram in the plotrix package.
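intersectDiagram (plotrix) plots how often each combination of dichotomous factors occurs. The counts behind such a plot can also be tabulated in base R, which makes the rare combinations easy to list. A minimal sketch only; the data frame `d` and its columns `A` to `D` are hypothetical and simulated purely for illustration:

```r
# Hypothetical dichotomous data: four yes/no attributes for 200 cases
set.seed(1)
d <- data.frame(A = runif(200) < 0.5, B = runif(200) < 0.3,
                C = runif(200) < 0.2, D = runif(200) < 0.1)

# Encode each row's combination of TRUE attributes as a label such as "A+C"
pattern <- apply(d, 1, function(r) paste(names(d)[r], collapse = "+"))
pattern[pattern == ""] <- "(none)"

# Count how often each combination occurs, most common first
counts <- sort(table(pattern), decreasing = TRUE)
print(counts)

# Combinations seen only once are candidates for "unusual"
rare_patterns <- names(counts)[counts == 1]
print(rare_patterns)
```

The rare tail of `counts` is what would show up as the thinnest bars in an intersection diagram.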


Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] : unusual combinations of categorical data

2010-11-09 Thread Michael Friendly

On 11/8/2010 5:25 PM, Alan Chalk wrote:

> Regarding unusual combinations of factors in categorical data.
> Are there any R packages that can be used to identify the outliers i.e.
> unusual combinations in categorical datasets ?


Unusual combinations of factors are those that have large residuals in
some loglinear model (or a glm with the Poisson family): positive if the
observed frequencies are greater than expected, negative otherwise.
The most basic 'null' loglinear model is that of mutual independence;
however, if some of the factors are predictors, it makes sense to
include their highest-order interaction in the null model.
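The residual approach can be sketched in base R with glm(). This is a minimal illustration under stated assumptions: the built-in HairEyeColor table stands in for your own data, treating Hair and Sex as "predictors" in the second model is a hypothetical choice, and the |residual| > 2 cutoff is an arbitrary rule of thumb rather than part of the advice above.

```r
# Built-in 4 x 4 x 2 table (Hair x Eye x Sex), used only as example data
tab <- HairEyeColor
cells <- as.data.frame(tab)   # one row per cell; the count is in Freq

# 'Null' model of mutual independence: main effects only
fit_indep <- glm(Freq ~ Hair + Eye + Sex, family = poisson, data = cells)

# Hypothetical: if Hair and Sex were predictors, keep their full
# interaction in the null model so only associations with Eye remain
fit_pred <- glm(Freq ~ Hair * Sex + Eye, family = poisson, data = cells)

# Pearson residuals (observed - expected) / sqrt(expected):
# positive when observed > expected, negative when observed < expected
cells$expected <- fitted(fit_indep)
cells$resid <- residuals(fit_indep, type = "pearson")
cells$resid_pred <- residuals(fit_pred, type = "pearson")

# Flag cells with large residuals under independence, largest first
# (the cutoff of 2 is arbitrary)
unusual <- cells[abs(cells$resid) > 2, ]
print(unusual[order(-abs(unusual$resid)), ])
```

Under mutual independence the fitted counts are simply the product of the one-way margins divided by n squared, so the glm is mainly useful once interaction terms such as `Hair * Sex` enter the null model.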

Fit the model with loglm() or glm(), and use vcd::mosaic() to visualize
the outliers.
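A sketch of the loglm() route, again with HairEyeColor as stand-in data. loglm() comes from the MASS package; the plotting step runs only if vcd is installed, and it assumes vcd's `expected` argument accepts a loglm-style model formula (see ?vcd::strucplot). The cutoff of 2 is arbitrary.

```r
library(MASS)   # loglm(); MASS is a recommended package shipped with R

tab <- HairEyeColor   # built-in Hair x Eye x Sex table, example data only

# Mutual-independence loglinear model; terms name the table's dimensions
fit <- loglm(~ Hair + Eye + Sex, data = tab)
print(fit)   # likelihood-ratio and Pearson goodness-of-fit tests

# Pearson residuals, one per cell of the table
res <- residuals(fit, type = "pearson")

# List the cells with large absolute residuals (cutoff 2 is arbitrary)
res_df <- as.data.frame(as.table(res))
names(res_df)[names(res_df) == "Freq"] <- "resid"
print(res_df[abs(res_df$resid) > 2, ])

# Mosaic plot shaded by residuals, only if vcd is available
if (requireNamespace("vcd", quietly = TRUE)) {
  vcd::mosaic(tab, expected = ~ Hair + Eye + Sex, shade = TRUE)
}
```

In the shaded mosaic, tiles for cells with large positive or negative residuals are coloured, which is where the unusual combinations stand out.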

HTH

--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University  Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA



[R] : unusual combinations of categorical data

2010-11-08 Thread Alan Chalk
Regarding unusual combinations of factors in categorical data.
Are there any R packages that can be used to identify the outliers, i.e.,
unusual combinations, in categorical datasets?

Thanks.




Notice of Confidentiality

This transmission contains information that may be confidential and that may 
also be privileged. Unless you are the intended recipient of the message (or 
authorised to receive it for the intended recipient) you may not copy, forward, 
or otherwise use it, or disclose it or its contents to anyone else. If you have 
received this transmission in error please notify us immediately and delete it 
from your system.

RSA Insurance Group plc. Registered in England No. 2339826. The Registered 
Office is 9th Floor, One Plantation Place, 30 Fenchurch Street, London EC3M 3BD





Re: [R] : unusual combinations of categorical data

2010-11-08 Thread Joshua Wiley
Hi,

On Mon, Nov 8, 2010 at 2:25 PM, Alan Chalk alan.ch...@gcc.rsagroup.com wrote:
> Regarding unusual combinations of factors in categorical data.

Do you mean data where all variables are categorical?

> Are there any R packages that can be used to identify the outliers i.e.
> unusual combinations in categorical datasets ?

What counts as an outlier or as unusual tends to be rather variable;
that is, something unusual in one data set may not be in another. If
you are dealing with strictly categorical variables, I am not certain
how you would define an outlier. The categories only have the meaning
attached to them, so it seems they would only indicate outliers if you
decided that an entire category was an outlier (e.g., males, females,
half-man-half-ox). If you have in mind one continuous variable across
the different levels of a factor, then you could just use some simple
plots (e.g., ggplot() + geom_point() + facet_grid(factor ~ .) or
something similar). You could also z-score the values within each
factor level and then extract z-scores more extreme than +/- 3 or
whatever cutoff you like. It might be easier to give you feedback if
you have a more specific example.
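The z-score idea, sketched in base R. The built-in iris data (Sepal.Length within each Species) is only a stand-in for "one continuous variable across the levels of a factor", and the cutoff of 3 is the arbitrary value suggested above; the faceted plot runs only if ggplot2 is installed.

```r
dat <- iris   # built-in data, used only as an example

# Standardise within each factor level: (x - level mean) / level sd
dat$z <- ave(dat$Sepal.Length, dat$Species,
             FUN = function(x) (x - mean(x)) / sd(x))

cutoff <- 3   # "+/- 3 or whatever cutoff you like"
flagged <- dat[abs(dat$z) > cutoff, ]
print(flagged)

# Faceted look at the same variable, one panel per level, if available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  dat$id <- seq_len(nrow(dat))   # row index for the x axis
  p <- ggplot(dat, aes(x = id, y = Sepal.Length)) +
    geom_point() +
    facet_grid(Species ~ .)
  print(p)
}
```

Standardising within each level rather than over the whole column matters: a value typical of one level can look extreme against the pooled mean.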

Cheers,

Josh






-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/



Re: [R] : unusual combinations of categorical data

2010-11-08 Thread Michael Bedward
Perhaps just use the ftable function to generate a flat contingency
table and look for counts below some threshold.
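The ftable() approach in base R, as a sketch: mtcars is built-in stand-in data, treating cyl, gear and am as the categorical variables is an arbitrary choice, and the threshold of 2 is hypothetical. Empty cells (combinations that never occur) are left out, since they correspond to no rows.

```r
# Cross-classify three variables of the built-in mtcars data
tab <- xtabs(~ cyl + gear + am, data = mtcars)

# Flat contingency table for eyeballing all combinations at once
print(ftable(tab))

threshold <- 2   # hypothetical: combinations seen fewer than 2 times are "rare"

# Rare combinations that do occur (zero-count cells are excluded)
cells <- as.data.frame(tab)
rare <- cells[cells$Freq > 0 & cells$Freq < threshold, ]
print(rare)

# Map the counts back onto the original rows to list the unusual records
n_combo <- ave(rep(1, nrow(mtcars)), mtcars$cyl, mtcars$gear, mtcars$am,
               FUN = sum)
rare_rows <- mtcars[n_combo < threshold, ]
print(rare_rows)
```

A raw count threshold ignores the margins: a combination can be rare simply because one of its levels is rare. The loglinear residuals discussed elsewhere in this thread adjust for that.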

Michael


On 9 November 2010 09:25, Alan Chalk alan.ch...@gcc.rsagroup.com wrote:
> Regarding unusual combinations of factors in categorical data.
> Are there any R packages that can be used to identify the outliers i.e.
> unusual combinations in categorical datasets ?
