In reviewing this I found an error: it fails when an outlier in
one group has the same value as a non-outlier in another group.
Also, the iris example has no duplicate outliers, so it's not a
very good test. Here is a much shorter version that does not have
that problem, along with more suitable test data.

For each group g that has outliers, we find the indices idx in x
of the values in out$out that belong to that group, and then use
text() to display those indices next to the points. (Note that the
indices will overprint if a group has multiple outliers with the
same value. One could try jittering the x or y positions passed to
text() to address this.)
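
# test data: 100 is an outlier in group 1 but not in group 2;
# group 3 has a duplicated outlier (50, 50)
x <- c(1:49, 100, 51:100, 101:148, 50, 50)
grp <- gl(3, 50)
out <- boxplot(x ~ grp)
for(g in unique(out$group)) {
   # indices in x of the outliers that belong to group g
   idx <- which(x %in% out$out[out$group == g] & as.integer(grp) == g)
   text(g, x[idx], idx, pos = 4, col = 2, cex = .5)
}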
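
The jittering idea mentioned above could be sketched roughly as follows. This is only a sketch under my own assumptions: outlier_index() is a hypothetical helper (not a base R function), and the 0.1 horizontal offset per tied label is an arbitrary choice that may need tuning for a given plot.

```r
# Sketch: collect the indices of boxplot outliers per group, and number
# tied values within a group so their labels can be spread apart.
# outlier_index() is a hypothetical helper, not part of base R.
outlier_index <- function(x, grp, out) {
  grp <- as.factor(grp)
  res <- lapply(unique(out$group), function(g) {
    # indices in x of the outliers that belong to group g
    idx <- which(x %in% out$out[out$group == g] & as.integer(grp) == g)
    # k = 1, 2, ... counts repeated occurrences of the same value
    k <- ave(seq_along(idx), x[idx], FUN = seq_along)
    data.frame(group = g, index = idx, value = x[idx], k = k)
  })
  do.call(rbind, res)  # NULL if there are no outliers
}

x <- c(1:49, 100, 51:100, 101:148, 50, 50)
grp <- gl(3, 50)
out <- boxplot(x ~ grp)
lab <- outlier_index(x, grp, out)
if (!is.null(lab)) {
  # shift the k-th tied label to the right by 0.1 * (k - 1) (assumed offset)
  text(lab$group + 0.1 * (lab$k - 1), lab$value, lab$index,
       pos = 4, col = 2, cex = .5)
}
```

With the test data above, the two tied outliers in group 3 should be labelled side by side rather than on top of each other.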
On 8/18/06, Gabor Grothendieck <[EMAIL PROTECTED]> wrote:
> Try this:
>
> result <- boxplot(Petal.Length ~ Species, iris)
> if (length(result$out))
> text(result$group, result$out, match(result$out, iris$Petal.Length),
> pos = 4, col = "red")
>
> If the outliers can be non-unique then match is not enough.
> In that case assume that the nth occurrence of
> any value in result$out is also the nth occurrence in the
> vector boxplotted. (Sort the data frame by group if that is
> not the case.) This assumption is sufficient to allow us to write
> posof which gives the index into the data frame of any value in out.
>
> # determine position of x in y
> # assuming that if there are duplicates in x that
> # they occur the same number of times and in
> # the same order so that the 2nd occurrence of 37
> # in x would correspond to the 2nd occurrence of 37 in y
> posof <- function(x, y) {
> n <- sapply(seq(x), function(m) sum(x[m] == x[1:m]))
> mapply(function(x, n) which(y == x)[n], x, n)
> }
>
> result <- boxplot(Petal.Length ~ Species, iris)
> if (length(result$out))
> text(result$group, result$out, posof(result$out, iris$Petal.Length),
> pos = 4, col = "red")
>
>
>
> On 8/18/06, Ana Patricia Martins <[EMAIL PROTECTED]> wrote:
> > Hello R-users and developers,
> >
> > Once again, I'm asking for your help.
> >
> > I can identify outliers in boxplot with this instruction
> >
> > result <- boxplot( Income ~ Sex, col = "lightgray", data=dados)
> > if (length(result$out))
> >     text(result$group, result$out, result$out, pos = 4, col = "red")
> >
> > But I can not identify the outlier's id (variable names) in the boxplot.
> > Can anyone help me?
> >
> > Thanks in advance,
> >
> > Atenciosamente,
> >
> > Ana Patricia Martins
> >
> > -------------------------------------------
> >
> > Serviço Métodos Estatísticos
> >
> > Departamento de Metodologia Estatística
> >
> > INE - Portugal
> >
> > Telef: 218 426 100 - Ext: 3210
> >
> > E-mail: <mailto:[EMAIL PROTECTED]> [EMAIL PROTECTED]
> >
> > ______________________________________________
> > [email protected] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
>