[Rd] Rounding multinomial proportions

Arni Magnusson Thu, 11 Feb 2010 01:09:58 -0800

I present you with a function that solves a problem that has bugged me formany years. I think the problem may be general enough to at least consideradding this function, or a revamped version of it, to the 'stats' package,with the other multinomial functions reside.

I'm using R to export data to text files, which are input data for anexternal model written in C++. Parts of the data are age distributions, inthe form of relative frequency in each year:


  Year  Age1   Age2   ...  Age10
  1980  0.123  0.234  ...  0.001
  ...   ...    ...    ...  ...

Each row should sum to exactly 1. The problem is that when I preprocesseach line in R as p<-a/sum(a), occasionally a line will sum to 0.999,1.002, or the like. This could either crash the external model or lead towrong conclusions.

I believe similar partitioning is commonly used in a wide variety ofmodels, making this a general problem for many modellers.

In the past, I have checked every line manually, and then arbitrarilytweaked one or two values up or down to make the row sum to exactly one,but two people would tweak differently. Another semi-solution is to writethe values to the text file in a very long format, but this would (1) makeit harder to visually check the numbers and (2) the numbers in the articleor report would no longer match the data files exactly, so otherscientists could not repeat the analysis and get the same results.

Once I implemented a quick and dirty solution, simply setting the lastproportion (Age10 above) as 1 minus the sum of ages 1-9. I quickly stoppedusing that approach when I started seeing negative values.

After this introduction, the attached round_multinom.html should makesense. The algorithm I ended up choosing comes from allocating seats inelections, so I was tempted to provide that application as well, althoughit makes the interface and documentation slightly more confusing.

The working title of this function was a short and catchy vote(), but Ichanged it to round_multinom(), even though it's not matrix-oriented likethe other *multinom functions. That would probably be straightforward todo, but I'll keep it as a vector function during the initial discussion.

I'm curious to hear your impressions and ideas. In the worst case, this isa not-so-great solution to a marginal problem. In the best case, thismight be worth a short note in the Journal of Statistical Software.


Thanks for your time,

Arni

P.S. In case the mailing list doesn't handle attachments, I've placed thesame files on http://www.hafro.is/~arnima/ for your convenience.

\name{round_multinom}
\alias{round_multinom}
\encoding{UTF-8}
\title{
  Round Multinomial Proportions (or Allocate Seats from Election
  Results)
}
\description{
  This function can round multinomial proportions to a given number of
  decimal places, while making sure the rounded proportions sum to
  exactly one. This is achieved using one of three algorithms that were
  originally invented to allocate seats from election results.

  It is often necessary to round proportions, e.g. to produce legible
  percentages for an article or a presentation. Rounding also takes
  place when data are written to a text file, to be analyzed by an
  external model. The rounded proportions often fail to add to exactly
  one,
  \preformatted{
    a <- c(67630, 116558, 207536, 251555, 356721)
    p <- round(a/sum(a), 3)  # 0.068  0.117  0.208  0.252  0.357
    sum(p)                   # 1.002}
  which would make the rounded proportions illegal input data for many
  models. Instead of manually checking and arbitrarily tweaking
  proportions so they add to exactly one, this function can guarantee
  that condition, using an unbiased algorithm.
}
\usage{
round_multinom(x, digits = NULL, seats = NULL, method="SL", labels = names(x))
}
\arguments{
  \item{x}{vector containing multinomial proportions or counts.}
  \item{digits}{
    number of decimal places to use when rounding multinomial
    proportions.
  }
  \item{seats}{number of seats to allocate from election results.}
  \item{method}{
    string specifying the algorithm to use: \code{"DH"}, \code{"MSL"},
    or \code{"SL"}.
  }
  \item{labels}{optional vector of names for the output vector.}
}
\details{
  This function should be called \emph{either} with a \code{digits}
  argument to round multinomial proportions, \emph{or} with a
  \code{seats} argument to allocate seats from election results, not
  both.

  The algorithms are variations of the \dQuote{highest averages} method
  for allocating seats proportionally from multiparty election results:
  \describe{
    \item{\code{"DH"}}{
      d'Hondt method, involves the series 1, 2, 3, \ldots, \eqn{n}.
      Favors big parties.
    }
    \item{\code{"MSL"}}{
      Modified Sainte-LaguÃ«, involves the series 1, 2.4, 3.8, \ldots,
      1.4\eqn{n}--0.4. Favors big parties slightly.
    }
    \item{\code{"SL"}}{
      Sainte-LaguÃ«. Involves the series 1, 3, 5, \ldots, 2\eqn{n}--1.
      Does not favor big or small parties. A reasonable default method
      for rounding multinomial proportions.
    }
  }
}
\value{
  Vector of same length as \code{x}, with rounded numbers whose sum is
  one (or integers whose sum is \code{seats}).
}
\note{
  d'Hondt is used to allocate parliamentary seats in most of Europe and
  South America, East Timor, Israel, Japan, and Turkey.

  Modified Sainte-LaguÃ« is used in Norway and Sweden.

  Sainte-LaguÃ« is used in Bosnia and Herzegovina, Kosovo, Latvia, and
  New Zealand.
}
\author{Arni Magnusson.}
\references{
  Balinski, M. and V. RamÃrez. 1999. Parametric methods of
  apportionment, rounding and production. \emph{Mathematical Social
    Sciences} \bold{37}:107--122.

  Diaconis, P. and D. Freedman. 1979. On rounding percentages.
  \emph{Journal of the American Statistical Association}
  \bold{74}:359--364.

  Mosteller, F., C. Youtz, and D. Zahn. 1967. The distribution of sums
  of rounded percentages. \emph{Demography} \bold{4}:850--858.

  \url{http://en.wikipedia.org/wiki/Highest_averages_method} (accessed
  10 Feb 2010).
}
\seealso{
  \code{\link{round}} is the standard rounding function in \R.

  \code{\link{rmultinom}} generates random multinomial vectors.
}
\examples{
## Guarantee that rounded multinomial proportions sum to 1:

a <- c(67630, 116558, 207536, 251555, 356721)
p <- a / sum(a)
p1 <- round(p, 3)           # 0.068  0.117  0.208  0.252  0.357
sum(p1)                     # 1.002, no good
p2 <- round_multinom(p, 3)  # 0.068  0.117  0.207  0.251  0.357
sum(p2)                     # 1


## The multinomial "proportions" can also be raw counts (a, not p):

p3 <- round_multinom(a, 3)  # 0.068  0.117  0.207  0.251  0.357
sum(p3)                     # 1


## Allocate 9 seats from 178 votes using different methods:

votes <- c(red=66, green=80, blue=32)
round_multinom(votes, seats=9, method="DH")   # 4 4 1
round_multinom(votes, seats=9, method="MSL")  # 3 4 2
round_multinom(votes, seats=9, method="SL")   # 3 4 2
}
\keyword{arith}
\keyword{distribution}

round_multinom <- function(x, digits=NULL, seats=NULL, method="SL", 
labels=names(x))
{
  method <- match.arg(toupper(method), c("DH","MSL","SL"))

  if(is.null(digits) && is.null(seats) || !is.null(digits) && !is.null(seats))
    stop("Please pass a value as 'digits' or 'seats', not both")
  if(!is.null(digits))
  {
    if(digits<0 || digits>6)
      stop("Please pass a positive value (0-6) as 'digits'")
    n <- as.integer(10^digits)
  }
  else
    n <- seats

  party <- seq_along(x)
  series <- switch(method,
                   DH  = 1 + 1  *(seq_len(n)-1),  # 1, 2,   3,   ..., n
                   MSL = 1 + 1.4*(seq_len(n)-1),  # 1, 2.4, 3.8, ..., 1.4n-0.4
                   SL  = 1 + 2  *(seq_len(n)-1))  # 1, 3,   5,   ..., 2n-1

  output <- data.frame(party=rep(party,each=n), score=as.numeric(sapply(x, 
function(votes) votes/series)))
  output <- factor(output$party)[order(-output$score)][seq_len(n)]
  output <- as.integer(table(output))

  if(!is.null(digits))
    output <- output / n
  names(output) <- labels

  return(output)
}

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] Rounding multinomial proportions

Reply via email to