In a package that I am updating, I have a data documentation file
monoCyteSim.Rd.  In this file, two data sets are documented: bivarSim
and ccSim.  The usage section is:

> \usage{
>     bivarSim
>     ccSim
> }

Since the data are lazy-loaded I *don't* wrap the names of the data
sets in "data()".

I do this in another data documentation file (SydColDat.Rd) without
problem. However, when I check the package using --as-cran I get the warning:

> * checking for code/documentation mismatches ... WARNING
> Variables with usage in documentation object 'monoCyteSim' but not in
> code: ‘bivarSim’ ‘ccSim’

No such warning seems to arise in respect of SydColDat.Rd.

Can anyone explain what is going on, and what if anything I can do
about it?  I would be grateful for any insight.

I have attached the two help files, the one that triggers the
warning and the one that doesn't.  I have changed the extension from
.Rd to .txt so that (I hope!) the mailer doesn't strip them away.

The complete package, as it currently stands (if this is of any
interest), is available from my web page:

Scroll to near the bottom and click on "Eglhmm".



Rolf Turner

Honorary Research Fellow
Department of Statistics
University of Auckland
Stats. Dep't. (secretaries) phone:
         +64-9-373-7599 ext. 89622
Home phone: +64-9-480-4619
Simulated monocyte counts and psychosis symptoms.
Discretised values of monocyte counts, and ratings of level of
psychosis simulated from a model fitted to a data set consisting of
observations made on a number of patients from the Northland District
Health Board system.  The real data must be kept confidential due
to ethics constraints.
These data are \bold{not} immediately available in the \code{eglhmm}
package.  Their presence would cause the size of the \code{data}
directory to exceed 4.5 Mb, which is unacceptably large.
Consequently these data sets have been placed in a separate
\dQuote{data only} package called \code{monoCyteSim}, which is
available from \code{github}.  This package may be obtained by
executing the command:
After having installed the \code{monoCyteSim} package, you may load
it via \code{library(monoCyteSim)} and then access the data sets
in the usual way, e.g. \code{X <- ccSim}.

Alternatively (after having installed the \code{monoCyteSim}
package) you may use the \code{::} syntax to access a single data
set, e.g. \code{X <- monoCyteSim::ccSim}.

You can access the documentation via, e.g., \code{?monoCyteSim::ccSim}.
\title{Sydney coliform bacteria data}
   Transformed counts of faecal coliform bacteria in sea water
   at seven locations: Longreef, Bondi East, Port Hacking ``50'',
   and Port Hacking ``100'' (controls) and Bondi Offshore, Malabar
   Offshore and North Head Offshore (outfalls).  At each location
   measurements were made at four depths: 0, 20, 40, and 60 metres.
  Data frames with 5432 observations on the following 6 variables.
    \item{\code{y}}{Transformed measures of the number of faecal
    coliform bacteria in a sea-water sample of some specified
    volume.  The original measures were obtained by a repeated
    dilution process.

    For \code{SydColCount} the transformation used was essentially
    a square root transformation, with resulting values greater than
    150 set to \code{NA}.  The results are putatively compatible
    with a Poisson model for the emission probabilities.

    For \code{SydColDisc} the data were discretised
    using the \code{cut()} function with breaks given
    by \code{c(0,1,5,25,200,Inf)} and labels equal to

    Note that in the \code{SydColDisc} data there are 180 fewer
    missing values (\code{NA}s) in the \code{y} column than in
    the \code{SydColCount} data.  This is because in forming
    the \code{SydColCount} data (transforming the original data
    to a putative Poisson distribution) values that were greater
    than 150 were set equal to \code{NA}, and there were 180 such

    \item{\code{locn}}{a factor with levels \dQuote{LngRf}
    (Longreef), \dQuote{BondiE} (Bondi East), \dQuote{PH50}
    (Port Hacking 50), \dQuote{PH100}  (Port Hacking 100),
    \dQuote{BondiOff} (Bondi Offshore), \dQuote{MlbrOff} (Malabar
    Offshore) and \dQuote{NthHdOff} (North Head Offshore)}

    \item{\code{depth}}{a factor with levels \dQuote{0} (0 metres),
    \dQuote{20} (20 metres), \dQuote{40} (40 metres) and \dQuote{60}
    (60 metres).}

    \item{\code{}}{A factor with levels \code{no} and \code{yes},
    indicating whether the Malabar sewage outfall had been commissioned.}

    \item{\code{}}{A factor with levels \code{no} and \code{yes},
    indicating whether the North Head sewage outfall had been commissioned.}

    \item{\code{}}{A factor with levels \code{no} and \code{yes},
    indicating whether the Bondi Offshore sewage outfall had been

   The observations corresponding to each location-depth combination
   constitute a time series.  The sampling interval is ostensibly
   1 week; distinct time series are ostensibly synchronous.
   The measurements were made over a 194 week period.  See Turner
   et al. (1998) for more detail.
  Geoff Coade, of the New South Wales Environment Protection
  Authority (Australia) 
  T. Rolf Turner, Murray A. Cameron, and Peter J. Thomson.  Hidden
  Markov chains in generalized linear models.  Canadian J. Statist.,
  vol. 26, pp. 107 -- 125, 1998.
  Rolf Turner.  Direct maximization of the likelihood of a hidden
  Markov model. \emph{Computational Statistics and Data Analysis}
  \bold{52}, pp. 4147 -- 4160, 2008, doi:10.1016/j.csda.2008.01.029.

# Select out a subset of four locations:
loc4 <- c("LngRf","BondiE","BondiOff","MlbrOff")
SCC4 <- SydColCount[SydColCount$locn \%in\% loc4,] 
SCC4$locn <- factor(SCC4$locn) # Get rid of unused levels.
rownames(SCC4) <- 1:nrow(SCC4)