Re: [R] Plotting league tables/ caterpillar plots

Chris Evans Mon, 24 Jul 2006 10:10:52 -0700

Eik Vettorazzi sent the following  at 24/07/2006 13:04:
> Dear list,
> I was wondering if there is a function to plot league tables, sometimes  
> also known as "caterpillar plots"?
> A league table is conceptually very similar to a box plot. One difference  
> is that the inter-quartile ranges are not shown. If there isn't such a  
> function a first attempt for a "selfmade" plot would be to tell boxplot  
> not to plot boxes (sounds silly isn't it?). I've tried the option  
> "boxwex=0" but the result is unsatisfactory.
> 
> An example for a league table can be found in Marshall, Spiegelhalter  
> [1998], Reliability of league tables of in vitro fertilisation clinics,  
> BMJ1998;316:1701-1705, you may find it at http://bmj.bmjjournals.com



Interesting, never heard the term "caterpillar plot" but I like that.  I
also call related things "biplane plots": the body is the parameter and
the wings (not really biplanes at all are they) are the 95% CI for the
parameter.  However, these caterpillar plots are rather like forest
plots or "trees plots" (I argue that you can see both the wood and the
trees!)

I tend to plot them rotated by 90 degrees and without such good
labelling as in the Spiegelhalter paper, but this code may help you.
The "traffic lights" in this superimpose overall quartiles which is used
in some colleagues' way of looking at league tables.  I've used similar
plots for proportions, means, correlations and even alpha reliability
parameters, it's fairly easy to substitute different parameters and
their CIs.  I prefer to use a more exact estimator of the CI of the
median than the one that, if I remember rightly, is the default in the
boxplot which is, I think a much quicker but less robust estimator of
the CI for each sample.

I'd appreciate gentle constructive criticism of my coding, I know I'm a
much better psychotherapist than a programmer!

C

-- 
Chris Evans <[EMAIL PROTECTED]>
Professor of Psychotherapy, Nottingham University;
Consultant Psychiatrist in Psychotherapy, Rampton Hospital;
Research Programmes Director, Nottinghamshire NHS Trust;
Hon. SL Institute of Psychiatry, Hon. Con., Tavistock & Portman Trust
**If I am writing from one of those roles, it will be clear. Otherwise**

**my views are my own and not representative of those institutions    **

## user controlled initialising
title.txt <- paste("")
dependent <- E1SESSPL
indeped <- DBSITE
### special temporary addition for planned number of sessions
dependent[dependent==0] <- NA
missing.crit <- 50 # if the percentage of non-missing values for each level of 
the factor falls below this, colour changes
lights.on <- 0
colour.not.in.final <- 1

# legend.pos <- "bottomright" # options are "topleft" or "bottomright"
# legend.pos <- "topleft" # options are "topleft" or "bottomright"
legend.pos <- "topright" # options are "topleft","topright" or "bottomright"
sort.ord.flag <- "median" # options are "site","size","median"
#sort.ord.flag <- "size" # options are "site","size","median"
#sort.ord.flag <- "site" # options are "site","size","median"


## get the values
#xlow <- min(indeped)
#xhigh <- max(indeped)
tmp.dat1 <- cbind(dependent,indeped)
tmp.dat2 <- na.omit(tmp.dat1)
tmp.val.dep <- tmp.dat2[,1]
tmp.val.indep <- tmp.dat2[,2]
tmp.n <- table(tmp.val.indep)
site.names <- as.numeric(names(tmp.n))
xlen <- length(site.names)
xlow <- 1
xhigh <- xlen
print(xlen)
## OK deal with, and count, missings
print(tmp.n)
print(sum(tmp.n))
grand.n <- length(indeped)
usable.n <- length(tmp.val.dep)
n.missing <- grand.n - usable.n
perc.missing <- round(100*round(n.missing/grand.n,2),2)
grand.n.site <- table(indeped)
miss.triggered <- 0

#ord.site <- order(as.numeric(names(tmp.n)))

## set up storage
lwr <- rep(0,xlen)
skip.site <- lwr
site.median <- lwr
upr <- lwr
n.site <- lwr
perc.site <- lwr
prop <- lwr
n.sig.low <- 0
n.sig.high <- 0

## need to find limits
yhigh <- max(tmp.val.dep)
ylow  <- min(tmp.val.dep)
#yhigh <- 4
#ylow <- 0

## now use it
ref.median <- median(tmp.val.dep)
overall.max <- round(max(tmp.val.dep),2)
overall.min <- round(min(tmp.val.dep),2)
for (i in 1:xlen) {
        tmpval.site <- site.names[i]
        tmp.site.dat <- tmp.val.dep[tmp.val.indep==tmpval.site]
        n.site[i] <- length(tmp.site.dat)
   perc.site[i] <- 100*n.site[i]/grand.n.site[i]
        if (n.site[i] <= 5) {
                prop[i] <- -1           ##### impossible prop < 0 useful for 
sort order and to omit these
                                                        ##### carried over from 
binconf version of this
        } else {
                tmp.ci.med <- conf.med(tmp.site.dat)
                if (is.na(tmp.ci.med$lower) | is.na(tmp.ci.med$upper)) {
                        prop[i] <- 1
                } else {
                        lwr[i] <- tmp.ci.med$lower
                        upr[i] <- tmp.ci.med$upper
                        site.median[i] <- tmp.ci.med$median
                        if (lwr[i] > ref.median) n.sig.high <- n.sig.high + 1 # 
increment counter for high scoring service
                if (upr[i] < ref.median) n.sig.low <- n.sig.low + 1  # ditto 
for low
                }
        }
}

min.median <- signif(min(site.median),3)
max.median <- signif(max(site.median),3)
#yhigh <- max(upr)
#ylow <- min(lwr)
## use those to set the extreme CIs where these weren't calculable
upr[prop==-1] <- yhigh
lwr[prop==-1] <- ylow
## end of doing that!

ylen <- yhigh-ylow
yinc <- ylen/100

### *** this scale handling really needs rewriting *** ###
## adjusting block ##
# set up increments for text etc.
ylen <- yhigh - ylow
xinc <- 1 # might want to toy with this
yinc <- ylen/100
if (legend.pos == "topleft") { legpos.x <- xlow ; legpos.y <- yhigh }
if (legend.pos == "bottomright") { legpos.x <- xhigh - 15*xinc ; legpos.y <- 
ylow + 30*yinc }
if (legend.pos == "topright") { legpos.x <- xhigh - 15*xinc ; legpos.y <- yhigh 
}

### *** end of scale handling block *** ###



### plotting block ###
par(adj=.5, col=1, pch=1, lwd=1, lty=1) # just resetting the graphic parameters
plot(upr,xlim=c(xlow,xhigh),ylim=c(ylow,yhigh),ylab="Site 
median",xaxt="n",xlab="Services",type="n")
title(title.txt)


## next section puts in the horizontal ablines and traffic light coding if 
that's required
if (lights.on) { # add the "traffic light" areas
        t.lights <- (quantile(site.median)) # gets the five quantile summary 
for the traffic light boundaries
        
polygon(c(xlow-1,xlow-1,xhigh+1,xhigh+1),c(t.lights[1],t.lights[2],t.lights[2],t.lights[1]),col=8)
        
polygon(c(xlow-1,xlow-1,xhigh+1,xhigh+1),c(t.lights[2],t.lights[3],t.lights[3],t.lights[2]),col=5)
        
polygon(c(xlow-1,xlow-1,xhigh+1,xhigh+1),c(t.lights[3],t.lights[4],t.lights[4],t.lights[3]),col=12)
        
polygon(c(xlow-1,xlow-1,xhigh+1,xhigh+1),c(t.lights[4],t.lights[5],t.lights[5],t.lights[4]),col=4)
        abline(h=t.lights)
} else {
        abline(h=min.median)
        abline(h=max.median)
}
# put in the overall proportion 
# (which isn't the median of the sites' proportions and rarely would be)
par(lwd=2)
if (lights.on) par(lty=2)
abline(h=ref.median)
if (lights.on) par(lty=1)
par(lwd=1)
## end of ablines and traffic lights ##

## sort order
if (sort.ord.flag == "site") sort.ord <- ord.site 
if (sort.ord.flag == "size") sort.ord <- order(apply(tmp.n,2,sum))
if (sort.ord.flag == "median") sort.ord <- order(site.median)

## plot the data for each level of the independent factor
for (i in 1:xlen) {
   ## depending on the data available, format the plotting for this site
        if (prop[sort.ord][i] >= 0) {  # n for the site was > 1
          if (perc.site[i] > missing.crit){  # and enough non-missing data
              par(col=1)
            } else {
              par(col=8) # n large enough but too many missings
                                 miss.triggered <- 1
                                 if (colour.not.in.final) par(lty=2)
            }
        } else {
                par(lty=2)  # n was tiny
                par(col=8)
        }               

        lines(c(i,i),c(lwr[sort.ord][i],upr[sort.ord][i]))      # the CI line
        par(pch=22)                                                             
                                        # CI limits
        points(i,lwr[sort.ord][i])
        points(i,upr[sort.ord][i])
        par(lwd=4)
        points(i,site.median[sort.ord][i])                                      
        # the median
        par(pch=1,lwd=1)
        par(crt=90,adj=0,cex=.75,col=8)
        text(i,upr[sort.ord][i]+1*yinc,site.names[sort.ord][i])
        par(col=6,crt=0,adj=.5)
        if (add.n) {                                                            
                                # add the n if required
                text(i,lwr[sort.ord][i]-2*yinc,n.site[sort.ord][i])
        }
        par(crt=0,adj=.5,cex=1,col=1,lty=1,lwd=1) # reset the graphic formatting
}



if (min.median != 0) ratio <- as.character(signif(max.median/min.median,3))
        else ratio <- "Inf."
ref.median <- round(ref.median,2)
tmp.kw <- unlist(kruskal.test(dependent,indeped)) # one day I'll remember how 
to paste variable names!
kw.chisq <- round(as.numeric(tmp.kw[1]),2)
kw.df <- as.numeric(tmp.kw[2])
kw.p <- round(as.numeric(tmp.kw[3]),4)
        
if (!lights.on) {
        par(cex=.75,adj=0)
        tmptxt <- paste("Overall median = ",ref.median," on range from 
",overall.min," to ",overall.max,sep="")
        text(legpos.x,legpos.y-2*yinc,tmptxt)
        tmptxt <- paste("From n = ",usable.n," n(miss) = ",n.missing,"  %(miss) 
= ",perc.missing,sep="")
        text(legpos.x,legpos.y-5*yinc,tmptxt)
        tmptxt <- paste("Highest median = ",max.median," lowest = 
",min.median,sep="")
        text(legpos.x,legpos.y-8*yinc,tmptxt)
        tmptxt <- paste("Ratio, max:min = ",ratio,sep="")
        text(legpos.x,legpos.y-11*yinc,tmptxt)
        tmptxt <- paste("Kruskal-Wallis chi square = ",kw.chisq," d.f. = 
",kw.df," p = ",kw.p)
        text(legpos.x,legpos.y-14*yinc,tmptxt)
        tmptxt <- paste("Number \"significantly\" high",n.sig.high)
        text(legpos.x+xinc,legpos.y-17*yinc,tmptxt)
        tmptxt <- paste("Number \"significantly\" low",n.sig.low)
        text(legpos.x+xinc,legpos.y-20*yinc,tmptxt)
        tmp <- n.sig.low+n.sig.high
        tmptxt <- paste("Number \"significantly\" different",tmp)
        text(legpos.x+xinc,legpos.y-23*yinc,tmptxt)
        if (miss.triggered) {
                tmp <- round(100 - missing.crit,2)
                if (colour.not.in.final) {
                        tmptxt <- paste("Dotted CI indicates %(miss) > 
",tmp,"%",sep="")
                        text(legpos.x,legpos.y-26*yinc,tmptxt)
                } else {
                        tmptxt <- paste("Dotted red CI indicates %(miss) > 
",tmp,"%",sep="")
                        text(legpos.x,legpos.y-26*yinc,tmptxt)
                }
        }
} else {
        # graph too cluttered for all that if doing traffic lights so just
        # list the traffic light boundaries
        tmptxt1 <- paste("Quantile boundaries at: ")
        tmptxt2 <- paste(round(t.lights, 1),collapse=", ")
        tmptxt <- paste(tmptxt1,tmptxt2)
        text(legpos.x,legpos.y-24*yinc,tmptxt)  
}

par(adj=.5, col=1, pch=1, lwd=1, lty=1) # just resetting the graphic parameters

conf.med <- function(x) {
# Luc Wouters ([EMAIL PROTECTED]) sent me
# the following function:
# ----------Message from Luc Wouters-------------------------
# A distribution-free confidence interval on the median assuming only iid
# is based on the order statistics and the binomial distribution. See 
#   Lehmann (Nonparametrics: Statistical Methods Based on Ranks, 
#   Holden-Day, 1975, p.182-183). 
# The following S-Plus function can be used to set up such a 95%
# distribution-free confidence interval on the median:
        v <- sort(x, na.last = NA)
        n <- length(x)
        if(n > 0) {
                m <- median(x)
                i <- qbinom(0.025, n, 0.5)
                if(i > 0)
                        r <- c(m, v[i], v[n - i + 1])
                else r <- c(m, NA, NA)
        }
        else r <- c(NA, NA, NA)
        r <- as.data.frame(list(median = r[1], lower = r[2], upper = r[3]))
        class(r) <- "table"
        r
}

#conf.med(na.omit(bdi1$CORE14A))
tmp <- 
read.table("d:\\Data\\TCs\\ATC_Quality_Assurance_Panel\\Data3\\SP2000\\BDI1CORE14A.txt",as.is=TRUE)
conf.med(na.omit(tmp$V1))

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Plotting league tables/ caterpillar plots

Reply via email to