Re: [R] Request for users of my R-Tcl/Tk examples, limmaGUI or affylmGUI
I have used your R-Tcl/Tk Examples. (Thanks! The ideas were much appreciated.) I am from the United Kingdom. David -- Professor David Firth http://www.warwick.ac.uk/go/dfirth __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Request for users of my R-Tcl/Tk examples, limmaGUI or affylmGUI
On Thursday 02 February 2006 12:53, you wrote: ... [PLEASE REPLY _OFF_ THE LIST, i.e. DON'T CC to [EMAIL PROTECTED] oops! I just realised that I hit the wrong button and replied to r-help just now... Sorry! I guess that *is* one of the perils of sending a request such as this to a mailing list, where clumsy fat-fingered people such as me might reply. apologies all round, David __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] : Model formula question
On Wednesday 01 February 2006 02:37, maneesh deshpande wrote: Hi, I have a data set with a continuous predictor X, a factor A and a continuous dependent variable Y. I am trying to build a linear model of the form: Y = (b0 + b1*X1)*B(A) where B(A) is a constant for each level of the factor A. I am not quite sure how to formulate the appropriate model formula. If I write: Y ~ ( 1 + X)/A , I get estimates for as many constants and slopes as the number of levels of A. Yes, that's right: the / symbol has a special (non-arithmetic) meaning when used like this in a model formula. See for example p151 onwards in the reference that is given by ?formula. What I really need is an overall multiplicative constant which depends on the factor A. The gnm (generalized nonlinear models) package has facilities for this. The model above could be specified there as Y ~ -1 + Mult(X, -1 + A) (where the first -1 removes the intercept, and the second one says to estimate a separate multiplier for each level of A rather than using contrasts in A). Or, if you want to constrain all of your multipliers to have the same sign, you can use Y ~ -1 + Mult(X, Exp(-1 + A)) (note the capital E there!). It is unclear to me that using the *same* set of multipliers for both intercept and slope will typically be the right thing to do, though. It would not, for example, be invariant to transformation of X to X-c, with c constant. That is to say, your X variable needs to be on a scale for which the zero value has a special meaning, in order to allow the above model to make sense. But presumably you have thought about this already. Hoping that helps, David -- Professor David Firth http://www.warwick.ac.uk/go/dfirth __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] edit.data.frame
On 12 Jan 2006, at 22:17, Fredrik Lundgren wrote: Dear list, Sometimes I have huge data.frames and the small spreadsheetlike edit.data.frame is quite handy to get an overview of the data. However, when I close the editor all data are rolled over the console window, which takes time and clutters the window. Is there a way to avoid this? An alternative to the editor is showData() from the relimp package. It is modeless, meaning that your data window can be left open/ minimized while you work in R. I haven't tested it with _very_ large data frames though. David -- Professor David Firth http://www.warwick.ac.uk/go/dfirth __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Firths bias correction for log-linear models
On 12 Jan 2006, at 20:54, [EMAIL PROTECTED] wrote: Dear R-Help List, I'm trying to implement Firth's (1993) bias correction for log-linear models. Firth (1993) states that such a correction can be implemented by supplementing the data with a function of h_i, the diagonals from the hat matrix, but doesn't provide further details. I can see that for a saturated log-linear model, h_i=1 for all i, hence one just adds 1/2 to each count, which is equivalent to the Jeffrey's prior, but I'd also like to get bias corrected estimates for other log linear models. It appears that I need to iterate using GLM, with the weights option and h_i, which I can get from the function hatvalues. For logistic regression, this can be performed by splitting up each observation into response and nonresponse, and using weights as described in Heinze, G. and Schemper, M. (2002), but I'm unsure of how to implement the analogue for log-linear models. A procedure using IWLS is described by Firth (1992) in Dodge and Whittaker (1992), but this book isn't in the local library, and its $141+ on Amazon. I've tried looking at the code in the logistf and brlr libraries, but I haven't had any (successful) ideas. Can anyone help me in describing how to implement this in R? I don't recommend the adjusted IWLS approach in practice, because that algorithm is only first-order convergent. It is mainly of theoretical interest. The brlr function (in the brlr package) provides a template for a more direct approach in practice. The central operation there is an application of optim(), with objective function - (l + (0.5 * log(detinfo))) in which l is the log likelihood and detinfo is the determinant of the Fisher information matrix. In the case of a Poisson log-linear model, the Fisher information is, using standard GLM-type notation, t(X) %*% diag(mu) %*% X. It is straightforward to differentiate this penalized log-likelihood function, so (as in brlr) derivatives can be supplied for use with a second-order convergent optim() algorithm such as BFGS (see ?optim for a reference on the algorithm). I hope that helps. Please feel free to contact me off the list if anything is unclear. Kind regards, David -- Professor David Firth http://www.warwick.ac.uk/go/dfirth __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] how to predict with logistic model in package logistf ?
On 27 Oct 2005, at 09:18, jinlong li wrote: dear community, I am a beginer in R , and can't predict with logistic model in package logistf, Not exactly the answer to your question, but an alternative to the logistf package, which purports to do similar things, is brlr (which does provide a predict method). Does this work for you? library(brlr) data(sex2) fit - brlr(case ~ age + oc + vic + vicl + vis + dia, data = sex2) predict(fit, newdata = sex2) David could anyone help me ? thanks ! the following is my command and result : library(logistf) data(sex2) fit-logistf(case ~ age+oc+vic+vicl+vis+dia, data=sex2) predict(fit,newdata=sex2) Error in predict(fit, newdata = sex2) : no applicable method for predict __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Is it possible to use glm() with 30 observations?
On 2 Jul 2005, at 06:01, Spencer Graves wrote: The issue is not 30 observations but whether it is possible to perfectly separate the two possible outcomes. Consider the following: tst.glm - data.frame(x=1:3, y=c(0, 1, 0)) glm(y~x, family=binomial, data=tst.glm) tst2.glm - data.frame(x=1:1000, y=rep(0:1, each=500)) glm(y~x, family=binomial, data=tst2.glm) The algorithm fits y~x to tst.glm without complaining for tst.glm, but issues warnings for tst2.glm. This is called the Hauck-Donner effect, and RSiteSearch(Hauck-Donner) just now produced 8 hits. For more information, look for Hauck-Donnner in the index of Venables, W. N. and Ripley, B. D. (2002) _Modern Applied Statistics with S._ New York: Springer. Not exactly. The phenomenon that causes the warning for tst2.glm above is more commonly known as complete separation. For some comments on its implications you might look at another work by B D Ripley, the 1996 book Pattern Recognition and Neural Networks. There are some further references in the help files of the brlr package on CRAN. The problem noted by Hauck and Donner (1997, JASA) is slightly related, but not the same. See the aforementioned book by Venables and Ripley, for example. The glm function does not routinely warn us about the Hauck-Donner effect, afaik. The original poster did not say what was the purpose of the logistic regression was, so it is hard to advise. Depending on the purpose, the separation that was detected may or may not be a problem. Regards, David (If you don't already have this book, I recommend you give serious consideration to purchasing a copy. It is excellent on many issues relating to statistical analysis and R. Spencer Graves Kerry Bush wrote: I have a very simple problem. When using glm to fit binary logistic regression model, sometimes I receive the following warning: Warning messages: 1: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart, 2: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart, What does this output tell me? Since I only have 30 observations, i assume this is a small sample problem. Is it possible to fit this model in R with only 30 observations? Could any expert provide suggestions to avoid the warning? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Spencer Graves, PhD Senior Development Engineer PDF Solutions, Inc. 333 West San Carlos Street Suite 700 San Jose, CA 95110, USA [EMAIL PROTECTED] www.pdf.com http://www.pdf.com Tel: 408-938-4420 Fax: 408-280-7915 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] ranking predictive features in logsitic regression
On 30 Jun, 2005, at 21:20, Stephen Choularton wrote: Hi Is there some function R that multiplies each coefficient by the standard deviation of the corresponding variable and produces a ranking? Possibly you meant un-signed coefficients? In which case something like function(model) rank(abs(coef(model)) * apply(model.matrix(model), 2, sd)) should do what you asked about. The relimp package provides approximate inference for comparisons of this kind. I should say that I don't think that such a ranking will often be very useful, though. Some coefficients will be determined with greater precision than others, and there may be correlations to worry about, or variables may only make sense when considered in groups (eg factor effects, or interactions with corresponding main effects, etc.) David __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] mu^2(1-mu)^2 variance function for GLM
Dear Henric I do not have a ready stock of other examples, but I do have my own version of a family function for this, reproduced below. It differs from yours (apart from being a regular family function rather than using a modified quasi) in the definition of deviance residuals. These necessarily involve an arbitrary constant (see McCullagh and Nelder, 1989, p330); in my function that arbitrariness is in the choice eps - 0.0005. I don't think the deviance contributions as you specified in your code below will have the right derivative (with respect to mu) for observations where y=0 or y=1. Anyway, this at least gives you some kind of check. I hope it helps. This function will be part of a new package which Heather Turner and I will submit to CRAN in a few days' time. Please do let me know if you find any problems with it. Here is my wedderburn family function: wedderburn - function (link = logit) { linktemp - substitute(link) if (!is.character(linktemp)) { linktemp - deparse(linktemp) if (linktemp == link) linktemp - eval(link) } if (any(linktemp == c(logit, probit, cloglog))) stats - make.link(linktemp) else stop(paste(linktemp, link not available for wedderburn quasi-family;, available links are, \logit\, \probit\ and \cloglog\)) variance - function(mu) mu^2 * (1-mu)^2 validmu - function(mu) { all(mu 0) all(mu 1)} dev.resids - function(y, mu, wt){ eps - 0.0005 2 * wt * (y/mu + (1 - y)/(1 - mu) - 2 + (2 * y - 1) * log((y + eps)*(1 - mu)/((1- y + eps) * mu))) } aic - function(y, n, mu, wt, dev) NA initialize - expression({ if (any(y 0 | y 1)) stop(paste( Values for the wedderburn family must be in [0,1])) n - rep.int(1, nobs) mustart - (y + 0.1)/1.2 }) structure(list(family = wedderburn, link = linktemp, linkfun = stats$linkfun, linkinv = stats$linkinv, variance = variance, dev.resids = dev.resids, aic = aic, mu.eta = stats$mu.eta, initialize = initialize, validmu = validmu, valideta = stats$valideta), class = family) } Best wishes, David http://www.warwick.ac.uk/go/dfirth On 16 Jun 2005, at 09:27, Henric Nilsson wrote: Dear list, I'm trying to mimic the analysis of Wedderburn (1974) as cited by McCullagh and Nelder (1989) on p.328-332. This is the leaf-blotch on barley example, and the data is available in the `faraway' package. Wedderburn suggested using the variance function mu^2(1-mu)^2. This variance function isn't readily available in R's `quasi' family object, but it seems to me that the following definition could be used: }, mu^2(1-mu)^2 = { variance - function(mu) mu^2 * (1 - mu)^2 validmu - function(mu) all(mu 0) all(mu 1) dev.resids - function(y, mu, wt) 2 * wt * ((2 * y - 1) * (log(ifelse(y == 0, 1, y/mu)) - log(ifelse(y == 1, 1, (1 - y)/(1 - mu - 2 + y/mu + (1 - y)/(1 - mu)) I've modified the `quasi' function accordingly (into `quasi2' given below) and my results are very much in line with the ones cited by McCullagh and Nelder on p.330-331: data(leafblotch, package = faraway) summary(fit - glm(blotch ~ site + variety, + family = quasi2(link = logit, variance = mu^2(1-mu)^2), + data = leafblotch)) Call: glm(formula = blotch ~ site + variety, family = quasi2(link = logit, variance = mu^2(1-mu)^2), data = leafblotch) Deviance Residuals: Min1QMedian3Q Max -3.23175 -0.65385 -0.09426 0.46946 1.97152 Coefficients: Estimate Std. Error t value Pr(|t|) (Intercept) -7.922530.44463 -17.818 2e-16 *** site21.383080.44463 3.111 0.00268 ** site33.860130.44463 8.682 8.18e-13 *** site43.556970.44463 8.000 1.53e-11 *** site54.108410.44463 9.240 7.48e-14 *** site64.305410.44463 9.683 1.13e-14 *** site74.918110.44463 11.061 2e-16 *** site85.694920.44463 12.808 2e-16 *** site97.067620.44463 15.896 2e-16 *** variety2-0.467280.46868 -0.997 0.32210 variety3 0.078770.46868 0.168 0.86699 variety4 0.954180.46868 2.036 0.04544 * variety5 1.352760.46868 2.886 0.00514 ** variety6 1.328590.46868 2.835 0.00595 ** variety7 2.340660.46868 4.994 3.99e-06 *** variety8 3.262680.46868 6.961 1.30e-09 *** variety9 3.135560.46868 6.690 4.10e-09 *** variety103.887360.46868 8.294 4.33e-12 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '
Re: [R] error messages on R CMD check
Dear Johannes I have noticed the same complaint, and you are right that it can be unconnected with faulty S3 methods, etc., in your package. For example, in my relimp package (current version 0.9-1, new on CRAN today) the DESCRIPTION file has Suggests: tcltk and that works fine (ie it passes R CMD check). But if I change that to Depends: tcltk I get the same kind of errors that you got: david% R CMD check relimp [...] * checking S3 generic/method consistency ... WARNING Error in .try_quietly({ : Error: package 'tcltk' could not be loaded Execution halted See section 'Generic functions and methods' of the 'Writing R Extensions' manual. * checking replacement functions ... WARNING Error in .try_quietly({ : Error: package 'tcltk' could not be loaded Execution halted In R, the argument of a replacement function which corresponds to the right hand side must be named 'value'. * checking foreign function calls ... WARNING Error in .try_quietly({ : Error: package 'tcltk' could not be loaded Execution halted See section 'System and foreign language interfaces' of the 'Writing R Extensions' manual. * checking Rd files ... OK * checking for missing documentation entries ... ERROR Error in .try_quietly({ : Error: package 'tcltk' could not be loaded In this case it is because the calling environment for R CMD check did not have the DISPLAY variable set, so tcltk could not be loaded. If instead I do david% setenv DISPLAY :0 david% R CMD check relimp then all checks are passed. This is with david% R --version R 2.0.1 (2004-11-15). This is not really an explanation of what you got, just a confirmation that something here probably needs fixing (even if it's only the error message). Perhaps one of us should file a bug report (having checked first that a fix is not already made in the latest development version)? David On 30 Mar, 2005, at 13:34, Johannes Hüsing wrote: Dear all, I am trying to wrap up a package. On entering R CMD check, I get the following error messages: [...] * checking S3 generic/method consistency ... WARNING Error in .try_quietly({ : Error in library(package, lib.loc = lib.loc, character.only = TRUE, verbose = FALSE) : package/namespace load failed for 'resper' Execution halted See section 'Generic functions and methods' of the 'Writing R Extensions' manual. * checking replacement functions ... WARNING Error in .try_quietly({ : Error in library(package, lib.loc = lib.loc, character.only = TRUE, verbose = FALSE) : package/namespace load failed for 'resper' Execution halted In R, the argument of a replacement function which corresponds to the right hand side must be named 'value'. * checking foreign function calls ... WARNING Error in .try_quietly({ : Error in library(package, lib.loc = lib.loc, character.only = TRUE, verbose = FALSE) : package/namespace load failed for 'resper' Execution halted See section 'System and foreign language interfaces' of the 'Writing R Extensions' manual. * checking Rd files ... OK * checking for missing documentation entries ... ERROR Error in .try_quietly({ : Error in library(package, lib.loc = lib.loc, character.only = TRUE, verbose = FALSE) : I am not getting these messages, nor their relation to section 'Generic functions and methods' of the 'Writing R Extensions' manual, as I did not write any generic methods in my package. There has been some discussion of the error message around Christmas 2003: http://maths.newcastle.edu.au/~rking/R/devel/03b/1438.html, but I can't see how the circumstances described there apply to my situation (export a class name). Could somebody give me a clue on how to give my search a direction? Greetings Johannes __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] model.matrix for a factor effect with no intercept
I was surprised by this (in R 2.0.1): a - ordered(-1:1) a [1] -1 0 1 Levels: -1 0 1 model.matrix(~ a) (Intercept) a.La.Q 1 1 -7.071068e-01 0.4082483 2 1 -9.073800e-17 -0.8164966 3 1 7.071068e-01 0.4082483 attr(,assign) [1] 0 1 1 attr(,contrasts) attr(,contrasts)$a [1] contr.poly model.matrix(~ -1 + a) a-1 a0 a1 1 1 0 0 2 0 1 0 3 0 0 1 attr(,assign) [1] 1 1 1 attr(,contrasts) attr(,contrasts)$a [1] contr.poly Without the intercept, treatment contrasts seem to have been used (this despite the contr.poly in the contrasts attribute). It's not restricted to ordered factors. For example, if Helmert contrasts are used for nominal factors, the same sort of thing happens. I suppose it is a deliberate feature (perhaps to protect the user from accidentally fitting models that make no sense? or maybe some better reason?) -- is it explained somewhere? David __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] model.matrix for a factor effect with no intercept
Brian, many thanks for these helpful pointers: On 23 Feb 2005, at 15:45, Prof Brian Ripley wrote: MASS4 p.150 White Book p.38 Those are the only two reasonably comprehensive accounts that I am aware of (and they have only partial overlap). The underlying motivation is to span the _additional_ vector space covered by the term, the complement to what has gone before. Put another way, as each term is added, only enough columns are added to the model matrix to span the same space as if dummy coding had been used for that term and its predecessors. So think of this as a way to produce a parsimonious (usually full-rank) basis for the model space. Yes, indeed. My surprise was that *this* particular basis (dummy coding) was the one used here. I should have got a clue from what contrasts() does, and is documented to do, options(contrasts) $contrasts [1] contr.treatment contr.poly contrasts(a) .L .Q [1,] -7.071068e-01 0.4082483 [2,] -9.073800e-17 -0.8164966 [3,] 7.071068e-01 0.4082483 but when there's no intercept in the model the contrasts used appear to be contrasts(a, contrasts = FALSE) -1 0 1 -1 1 0 0 0 0 1 0 1 0 0 1 which are not the same as contr.poly(a, contrasts = FALSE) ^0^1 ^2 [1,] 1 -7.071068e-01 0.4082483 [2,] 1 -9.073800e-17 -0.8164966 [3,] 1 7.071068e-01 0.4082483 -- which is what I had naively expected to get in my model matrix. This is of course all a matter of convention. The present convention does seem a touch confusing though: the basis for the space spanned by a factor is determined by options(contrasts) or by the contrasts attribute of the factor or by the contrasts argument in the call, *except* when there's no intercept or other factor earlier in the model in which case all such settings are ignored (well, not *quite* ignored: they do get put in the contrasts attribute of the resultant model matrix). On the other hand, one good reason to use dummy coding for the first-encountered factor when there's no intercept is that the associated parameters are then often interpretable as group-specific intercepts. Would it be an improvement, though, if the contrasts attribute of the resultant model matrix contained contr.treatment in such cases instead of the name of a contrast function that was not actually used? Best wishes, David On Wed, 23 Feb 2005, David Firth wrote: I was surprised by this (in R 2.0.1): a - ordered(-1:1) a [1] -1 0 1 Levels: -1 0 1 model.matrix(~ a) (Intercept) a.La.Q 1 1 -7.071068e-01 0.4082483 2 1 -9.073800e-17 -0.8164966 3 1 7.071068e-01 0.4082483 attr(,assign) [1] 0 1 1 attr(,contrasts) attr(,contrasts)$a [1] contr.poly model.matrix(~ -1 + a) a-1 a0 a1 1 1 0 0 2 0 1 0 3 0 0 1 attr(,assign) [1] 1 1 1 attr(,contrasts) attr(,contrasts)$a [1] contr.poly Without the intercept, treatment contrasts seem to have been used (this despite the contr.poly in the contrasts attribute). It's not restricted to ordered factors. For example, if Helmert contrasts are used for nominal factors, the same sort of thing happens. I suppose it is a deliberate feature (perhaps to protect the user from accidentally fitting models that make no sense? or maybe some better reason?) -- is it explained somewhere? -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] CGIwithR
See below for the reply I sent you when you asked me this earlier today. David On 16 Jan 2005, at 20:22, ebashi wrote: Dear R users; I'm trying to use CGIwithR on a linux machine, I have followed the instructions on the package manual but still it does not run, the message that I get is as follows: The requested URL was not found on this server I used the example trivial, I put trivial.html under Web directory and trivial.R in cgi-bin directory, which itself is a subdirectory of Web directory, ( I have changed the modes of R.cgi and .Rprofile according to what package says) but i still get the same message, do you have any tips for me? my question is that where should for example myscript.R that is mentioned in the manual, be placed? (under Web directory or under cgi-bin). besides the path to R and GS, should anything else in the R.cgi be changed? many tanx in advance Sean __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Dear Sean Thanks for your email. It is hard to diagnose your problem at a distance, but here is some general stuff: -- the trivial.html file needs to be in a place that is served by your http server (is that the URL that is not found? ie can you see the form in your browser? it wasn't clear to me from your email.) -- your web server needs to be configured to allow CGI scripts, and probably to specify one or more directories where such scripts are allowed to live. This is often a directory called cgi-bin; but note that it is not enough just to create a directory with that name, you must tell your webserver through its configuration file(s) that it exists and that CGI scripts placed there can be run. How this is done depends on which HTTP server you are running, and I cannot help you with that I'm afraid. -- the trivial.R, R.cgi and .Rprofile files all need to be in one of the places where CGI scripts are allowed (by your HTTP server) to live. I hope that helps. Good luck. best wishes, David __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] link to a vignette/overview document in .Rd file?
Is there a reliable way to link to a vignette or other overview document, which has been included in the inst/doc directory of a source package, within a package help file in Rd format? or maybe link to an index of such documents? David Professor David Firth Dept of Statistics University of Warwick Coventry CV4 7AL United Kingdom Voice: +44 (0)247 657 2581 Fax: +44 (0)247 652 4532 Web: http://www.warwick.ac.uk/go/dfirth __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] R code for var-cov matrix given variances and correlations
On 21 Dec, 2004, at 13:09, Haynes, Maurice (NIH/NICHD) wrote: Dear list members, Where can I find code for computing the p*p variance-covariance matrix given a vector of p variances (ordered varA, varB, ..., varp) and a vector of all possible correlations (ordered corAB, corAC, ..., corp-1,p)? Something like this (not tested): sd.vec - sqrt(var.vec) n - length(sd.vec) cor.mat - matrix(1, n, n) for (i in 2:n){ for (j in 1:i-1){ cor.mat[i,j] - cor.vec[(i-1)*(i-2)/2 + j] cor.mat[j,i] - cor.mat[i,j]}} cov.mat - cor.mat * outer(sd.vec, sd.vec) I hope that helps. David I know that the covariance between 2 variables is equal to the product of their correlation and their standard deviations: corAB * varA^.5 * varB^.5 and so: covAB - function(corAB, varA, varB) { corAB * varA^.5 * varB^.5 } If the vector of variances were var.vec - c(14, 12, 7) and the vector of correlations were cor.vec - c(.4, .2, .5), then the vector of covariances would be: covAB(c(.4, .2, .5),c(14, 14, 12), c(12, 7, 7)) [1] 5.184593 1.979899 4.582576 and the variance-covariance matrix with covariances rounded to the first decimal place would be: vmat - matrix(c(14, 5.2, 2.0, 5.2, 12, 4.6, 2.0, 4.6, 7), + nrow=3) vmat [,1] [,2] [,3] [1,] 14.0 5.2 2.0 [2,] 5.2 12.0 4.6 [3,] 2.0 4.6 7.0 So the question is: How can I generate a p*p variance-covariance matrix from a vector of variances and a vector of correlations without resorting to a construction like: vmat - matrix(rep(0, p*p), nrow=p) if (p == 2) { vmat[1,1] - var[1] vmat[1,2] - cor[1] * (var[1]^.5 * var[2]^.5) vmat[2,1] - cor[1] * (var[2]^.5 * var[1]^.5) vmat[2,2] - var[2] } if (p == 3) { vmat[1,1] - var[1] vmat[1,2] - cor[1] * (var[1]^.5 * var[2]^.5) vmat[1,3] - cor[2] * (var[1]^.5 * var[3]^.5) vmat[2,1] - cor[1] * (var[2]^.5 * var[1]^.5) vmat[2,2] - var[2] vmat[2,3] - cor[3] * (var[2]^.5 * var[3]^.5) vmat[3,1] - cor[2] * (var[3]^.5 * var[1]^.5) vmat[3,2] - cor[3] * (var[3]^.5 * var[2]^.5) vmat[3,3] - var[3] } and so forth? Thanks, Maurice Haynes National Institute of Child Health and Human Development Child and Family Research Section 6705 Rockledge Drive, Suite 8030 Bethesda, MD 20892 Voice: 301-496-8180 Fax: 301-496-2766 E-mail: [EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] multinomial logistic regression
On 23 Sep 2004, at 01:36, array chip wrote: Hi, how can I do multinomial logistic regression in R? I think glm() can only handle binary response variable, and polr() can only handle ordinal response variable. how to do logistic regression with multinomial response variable? Perhaps multinom() in package nnet (in bundle VR) will do what you want? Regards, David Thanks __ Y! Messenger - Communicate in real time. Download now. __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] ordered probit and cauchit
On Wednesday, Sep 22, 2004, at 03:42 Europe/London, roger koenker wrote: What is the current state of the R-art for ordered probit models, and more esoterically is there any available R strategy for ordered cauchit models, i.e. ordered multinomial alternatives with a cauchy link function. MCMC is an option, obviously, but for a univariate latent variable model this seems to be overkill... standard mle methods should be preferable. (??) A quick look at polr (in the MASS package) suggests to me that it wouldn't be all that hard to extend it to link functions other than logistic. Is MLE known to be well behaved in these models if the latent variable is Cauchy? David Googling reveals that spss provides such functions... just to wave a red flag. url:www.econ.uiuc.edu/~rogerRoger Koenker email [EMAIL PROTECTED] Department of Economics vox:217-333-4558University of Illinois fax:217-244-6678Champaign, IL 61820 __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] cgi/servlets/httpd in R
On Thursday, May 6, 2004, at 16:01 Europe/London, Samuelson, Frank* wrote: Here's a related question: Do any of the mentioned R-web interfaces (Rweb, R-Online, CGIwithR, RSPerl) support reusing the same R process, eliminating the startup overhead? This would be useful to me as well. Currently I use such a method on my computing cluster: All 40 compute nodes run an R process/compute server that listens at a socket for any connection and subsequent commands from another computer. When the master process disconnects, the R processes go back to listening at the socket. Connecting to the R compute servers this way takes 2 milliseconds rather than the typical ~2 second R startup time. I only know about CGIwithR, which is very simple and starts a new R for every request. Startup time on my Mac is around 1.3 seconds; for my own purposes this is absolutely fine, but I can appreciate that it could well be problematic in certain kinds of application (eg where a very large amount of data has to be loaded). Best wishes, David __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] cgi/servlets/httpd in R
On Wednesday, May 5, 2004, at 18:09 Europe/London, foobar wrote: Hi R-helpers Has anyone had any experience doing CGI or Servlets or using an httpd server in R? yes. See the R FAQ, section 4. (Or maybe you already have, in which case I misunderstood the question...) Best wishes, David __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] glm questions
Dear Paul Here are some attempts at your questions. I hope it's of some help. On Tuesday, Mar 16, 2004, at 06:00 Europe/London, Paul Johnson wrote: Greetings, everybody. Can I ask some glm questions? 1. How do you find out -2*lnL(saturated model)? In the output from glm, I find: Null deviance: which I think is -2[lnL(null) - lnL(saturated)] Residual deviance: -2[lnL(fitted) - lnL(saturated)] The Null model is the one that includes the constant only (plus offset if specified). Right? I can use the Null and Residual deviance to calculate the usual model Chi-squared statistic -2[lnL(null) - lnL(fitted)]. But, just for curiosity's sake, what't the saturated model's -2lnL ? It's important to remember that lnL is defined only up to an additive constant. For example a Poisson model has lnL contributions -mu + y*log(mu) + constant, and the constant is arbitrary. The differencing in the deviance calculation eliminates it. What constant would you like to use?? 2. Why no 'scaled deviance' in output? Or, how are you supposed to tell if there is over-dispersion? I just checked andSAS gives us the scaled and nonscaled deviance. I have read the Venables Ripley (MASS 4ed) chapter on GLM . I believe I understand the cautionary point about overdispersion toward the end (p. 408). Since I'm comparing lots of other books at the moment, I believe I see people using the practice that is being criticized. The Pearson Chi-square based estimate of dispersion is recommended and one uses an F test to decide if the fitted model is significantly worse than the saturated model. But don't we still assess over-dispersion by looking at the scaled deviance (after it is calculated properly)? Can I make a guess why glm does not report scaled deviance? Are the glm authors trying to discourage us from making the lazy assessment in which one concludes over-dispersion is present if the scaled deviance exceeds the degrees of freedom? I am unclear what you are asking here. I assume by scaled deviance you mean deviance divided by phi, a (known) scale parameter? (I'm sorry, I don't know SAS's definition.)In many applications (eg binomial, Poisson) deviance and scaled deviance are the same thing, since phi is 1. Yes, if you wanted to judge overdispersion relative to some other value of phi you would scale the deviance. What other value of phi would you like? 3. When I run example(glm) at the end there's a Gamma model and the printout says: (Dispersion parameter for Gamma family taken to be 0.001813340) I don't find an estimate for the Gamma distribution's shape paremeter in the output. I'm uncertain what the reported dispersion parameter refers to. Its the denominator (phi) in the exponential family formula, isn't it? y*theta - c(theta) exp [ -- h(y,phi) ] phi Phi is the coefficient of variation, ie variance/(mean^2). Thus it is a shape parameter. If you are used to some other parameterization of the gamma family, just express the mean and variance in that parameterization to see the relation between your parameters and phi. 4. For GLM teaching purposes, can anybody point me at some good examples of GLM that do not use Normal, Poisson, Negative Binomial, and/or Logistic Regression? I want to justify the effort to understand the GLM as a framework. We have already in previous semesters followed the usual econometric approach in which OLS, Poisson/Count, and Logistic regression are treated as special cases. Some of the students don't see any benefit from tackling the GLM's new notation/terminology. McCullagh and Nelder (1989) has some I believe, eg gamma models. Also quasi-likelihood models, such as the Wedderburn (1974) approach to analysis of 2-component compositional data (the leaf blotch example in McCN). On the more general point: yes, if all that students need to know is OLS, Poisson rate models and logistic regression, then GLM is overkill. The point, surely, is that GLM opens up a way of thinking in which mean function and variance function are specified separately? This becomes most clear through a presentation of GLMs via quasi-likelihood (as a the right generalization of weighted least squares) rather than via the exponential-family likelihoods. In my opinion. 4. Is it possible to find all methods that an object inherits? I found out by reading the source code for J Fox's car package that model.matrix() returns the X matrix of coded input variables, so one can do fun things like calculate robust standard errors and such. That's really useful, because before I found that, I was recoding up a storm to re-create the X matrix used in a model. Is there a direct way to find a list of all the other methods that would apply to an object? methods(class=glm) methods(class=lm) is probably not as direct as you had in mind! But it's a start. Best wishes, David
Re: [R] glm questions --- saturated model
On Tuesday, Mar 16, 2004, at 14:51 Europe/London, BXC (Bendix Carstensen) wrote: -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of David Firth Sent: Tuesday, March 16, 2004 1:12 PM To: Paul Johnson Cc: [EMAIL PROTECTED] Subject: Re: [R] glm questions Dear Paul Here are some attempts at your questions. I hope it's of some help. On Tuesday, Mar 16, 2004, at 06:00 Europe/London, Paul Johnson wrote: Greetings, everybody. Can I ask some glm questions? 1. How do you find out -2*lnL(saturated model)? In the output from glm, I find: Null deviance: which I think is -2[lnL(null) - lnL(saturated)] Residual deviance: -2[lnL(fitted) - lnL(saturated)] The Null model is the one that includes the constant only (plus offset if specified). Right? I can use the Null and Residual deviance to calculate the usual model Chi-squared statistic -2[lnL(null) - lnL(fitted)]. But, just for curiosity's sake, what't the saturated model's -2lnL ? It's important to remember that lnL is defined only up to an additive constant. For example a Poisson model has lnL contributions -mu + y*log(mu) + constant, and the constant is arbitrary. The differencing in the deviance calculation eliminates it. What constant would you like to use?? I have always been und the impression that the constant chosen by glm is that which makes the deviance of the saturated model 0, the saturated model being the one with one parameter per observation in the dataset. ... But a look at the deviance formula above --- -2[lnL(fitted) - lnL(saturated)] --- shows us that *any* constant can be added to lnL, and the deviance for the saturated model will still be zero. David __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] glm questions
On Tuesday, Mar 16, 2004, at 15:15 Europe/London, Rolf Turner wrote: David Firth wrote (in response to a question from Paul Johnson): On the more general point: yes, if all that students need to know is OLS, Poisson rate models and logistic regression, then GLM is overkill. I couldn't agree less. The glm (not GLM!) framework gives a Well, I really did _intend_ GLM when I wrote GLM, meaning the sort of theoretical thing that Paul described, involving the presentation to (political-science etc) students of the general (linear) exponential family. All that stuff is not needed for a proper understanding of the three models mentioned, and certainly those three models are all meaningful without it. Your second paragraph below is one that I wholeheartedly agree with though (except that it should be linear function of the parameters, not the predictors), and it seems to agree also with what I wrote in the second part of the paragraph which you quote above (the part that you cut) from my earlier reply. David coherence to the structure and changes a collection of ad hoc (and thereby essentially meaningless cook-book) techniques into a single meaningful technique: A parameter (the mean) of a distribution is a transformation of a linear function of some predictors. One seeks to estimate the linear coefficients via maximum likelihood. In a broad array of circumstances the maximization can be carried out by the glm() function (using iteratively reweighted least squares). The process is quick and efficient and the notation is about as transparent as can be imagined. cheers, Rolf Turner [EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] glm questions --- saturated model
On Tuesday, Mar 16, 2004, at 20:28 Europe/London, Paul Johnson wrote: I'm confused going back and forth between the textbooks and these emails. Please pardon me that I seem so pedantic. I am pretty certain that -2lnL(saturated) is not 0 by definition. I'm pretty certain of that too. In the binomial model with groups of size=1, then the observed scores will be {0,1} but the predicted mean will be some number in [0,1], and so -2lnL will not be 0. I'm reading, for example, Annette Dobson, An Introduction to Generalized Linear Models, 2ed (2002 p. 77), where it gives the formula one can use for -2lnL(saturated) in the binomial model. For the Normal distribution, Dobson says -2lnL(saturated) = N log(2 pi sigma^2) She gives the saturated model -2lnL(saturated) for lots of distributions, actually. I thought the point in the first note from Prof. Firth was that the deviance is defined up to an additive constant because you can add or subtract from lnL in the deviance formula D = -2[lnL(full) - lnL(subset)] and the deviance is unaffected. But I don't think that means there is a completely free quantity in lnL(saturated). No, that's not what I said earlier. Nor what I meant. lnL(any model, including saturated) is defined only up to an additive constant. Dobson's formulae are presumably correct, and would remain so if we added a constant. For discrete distributions we can probably all agree to define likelihood as the probability of getting the observed data (ie to use counting measure on 0,1,2,... or whatever to define our density) --- in which case the arbitrariness is resolved. For continuous distributions we can't do that: the probability is zero for every set of data. So we use density with respect to some measure, the choice of which measure leads to an arbitrary constant (as Peter said). But the value of the constant doesn't matter -- and that really is the point. I hope that's somehow clearer. Best wishes, David __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] structured random effects
Are there facilities in R or packages to estimate the parameters of a (generalized) linear mixed model like this one: the design is crossed, and response $y_{ij}$ relates to fixed and random effects through a linear predictor \[ \eta_{ij} = \beta x_{ij} + U_i - U_j \] where $U_1, U_2, \ldots$ are iid $N(0, \tau^2)$. Any suggestions would be welcomed. David Professor David Firth Dept of Statistics University of Warwick Coventry CV4 7AL United Kingdom Voice: +44 (0)247 657 2581 Fax: +44 (0)247 652 4532 Web: http://www.warwick.ac.uk/go/dfirth __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] structured random effects
Thanks to Douglas and Spencer for their helpful replies. I take that as a fairly authoritative no to my question! The difficulty in the model I mentioned is not only the crossed design, but the homologous factors (ie i and j take the same values), and U_i - U_j with the *same* U variable appearing twice with different subscripts in the predictor. Thanks again -- I am happy to know that If I work on this I'm not reinventing what already exists. Best regards, David On Friday, Feb 6, 2004, at 18:28 Europe/London, Douglas Bates wrote: Spencer Graves [EMAIL PROTECTED] writes: Have you considered lme? For applications like this, I highly recommend Pinheiro and Bates (2000) Mixed-Effects Models in S and S-Plus (Springer). I failed to produce anything with lme until I read this book. hope this helps. spencer graves Actually lme is better suited to nested designs than to crossed designs and lme doesn't handle generalized linear mixed models. GLMM from package lme4 does handle generalized linear mixed models but does not handle crossed random effects easily. I'm working on code that will handle nested, crossed and partially crossed random effects but it will be some time before it appears in a finished package David Firth wrote: Are there facilities in R or packages to estimate the parameters of a (generalized) linear mixed model like this one: the design is crossed, and response $y_{ij}$ relates to fixed and random effects through a linear predictor \[ \eta_{ij} = \beta x_{ij} + U_i - U_j \] where $U_1, U_2, \ldots$ are iid $N(0, \tau^2)$. Any suggestions would be welcomed. David Professor David Firth Dept of Statistics University of Warwick Coventry CV4 7AL United Kingdom Voice: +44 (0)247 657 2581 Fax: +44 (0)247 652 4532 Web: http://www.warwick.ac.uk/go/dfirth __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Douglas Bates[EMAIL PROTECTED] Statistics Department608/262-2598 University of Wisconsin - Madison http://www.stat.wisc.edu/~bates/ __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] warning associated with Logistic Regression
On Sunday, Jan 25, 2004, at 18:06 Europe/London, (Ted Harding) wrote: On 25-Jan-04 Guillem Chust wrote: Hi All, When I tried to do logistic regression (with high maximum number of iterations) I got the following warning message Warning message: fitted probabilities numerically 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y, As I checked from the Archive R-Help mails, it seems that this happens when the dataset exhibits complete separation. This is so. Indeed, there is a sense in which you are experiencing unusually good fortune, since for values of your predictors in one region you are perfectly predicting the 0s in your reponse, and for values in another region your a perfectly predicting the 1s. What better could you hope for? However, you would respond that this is not realistic: your variables are not (in real life) such that P(Y=1|X=x) is ever exactly 1 or exactly 0, so this perfect prediction is not realistic. In that case, you are somewhat stuck. The plain fact is that your data (in particular the way the values of the X variables are distributed) are not adequate to tell you what is happening. There may be manipulative tricks (like penalised regression) which would inhibit the logistic regression from going all the way to a perfect fit; but, then, how would you know how far to let it go (because it will certainly go as far in that direction as you allow it to). The key parameter in this situation the dispersion parameter (sigma in the usual notation). When you get perfect fit in a completely separated situation, this corresponds to sigma=0. If you don't like this, then there must be reasons why you want sigma0 and this may imply that you have reasons for wanting sigma to be at least s0 (say), or, if you are prepared to be Bayesian about it, you may be satisfied that there is a prior distribution for sigma which would not allow sigma=0, and would attach high probability to a range of sigma values which you condisder to be realistic. Unless you have a fairly firm idea of what sort of values sigma is likely to havem then you are indeed stuck because you have no reason to prefer one positive value of sigma to a different positive value of sigma. In that case you cannot really object if the logistic regression tries to make it as small as possible! This seems arguable. Accepting that we are talking about point estimation (the desirability of which is of course open to question!!), then old-fashioned criteria like bias, variance and mean squared error can be used as a guide. For example, we might desire to use an estimation method for which the MSE of the estimated logistic regression coefficients (suitably standardized) is as small as possible; or some other such thing. The simplest case is estimation of log(pi/(1-pi)) given an observation r from binomial(n,pi). Suppose we find that r=n -- what then can we say about pi? Clearly not much if n is small, rather more if n is large. Better in terms of MSE than the MLE (whose MSE is infinite) is to use log(p/(1-p)), with p = (r+0.5)/(n+1). See for example Cox Snell's book on binary data. This corresponds to penalizing the likelihood by the Jeffreys prior, a penalty function which has good frequentist properties also in the more general logistic regression context. References given in the brlr package give the theory and some empirical evidence. The logistf package, also on CRAN, is another implementation. I do not mean to imply that the Jeffreys-prior penalty will be the right thing for all applications -- it will not. (eg if you really do have prior information, it would be better to use it.) In general I agree wholeheartedly that it is best to get more/better data! In the absence of such reasons, (cut) All good wishes, David __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] warning associated with Logistic Regression
On Sunday, Jan 25, 2004, at 13:59 Europe/London, Guillem Chust wrote: Hi All, When I tried to do logistic regression (with high maximum number of iterations) I got the following warning message Warning message: fitted probabilities numerically 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y, As I checked from the Archive R-Help mails, it seems that this happens when the dataset exhibits complete separation. Yes. correct. However, p-values tend to 1 The reported p-values cannot be trusted: the asymptotic theory on which they are based is not valid in such circumstances. , and residual deviance tends to 0. Yes, this happens under complete separation: the model fits the observed 0/1 data perfectly. My questions then is: -Is the converged model correct? Well, converged is not really the right word to use -- the iterative algorithm has diverged. At least one of the coefficients has its MLE at infinity (or minus infinity). In that sense what you see reported (ie large values of estimated log odds-ratios, which approximate infinity) is correct. Still more correct would be estimates reported as Inf or -Inf: but the algorithm is not programmed to detect such divergence. or -Can I limit the number of iterations in order to avoid this warning? Yes, probably, but this is not a sensible course of action. The iterations are iterations of an algorithm to compute the MLE. The MLE is not finite-valued, and the warning is a clue to that. If you *really* want finite parameter estimates, the answer is not to use maximum likelihood as the method of estimation. Various alternatives exist, mostly based on penalizing the likelihood [one such is in the brlr package, but there are others]. As a general principle surely it's better to maximize a different criterion (eg a penalized likelihood, with a purposefully chosen penalty function) rather than stop the MLE algorithm prematurely and arbitrarily? I hope this helps! David Professor David Firth Dept of Statistics University of Warwick Coventry CV4 7AL United Kingdom Email: [EMAIL PROTECTED] Voice: +44 (0)247 657 2581 Fax: +44 (0)247 652 4532 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] lm function
If the model is called mymodel, you could do summary(mymodel)$coefficients[,2] or sqrt(diag(vcov(mymodel))) This gives the estimated standard errors of the coefficients, which I think is what you wanted. David On Thursday, Jan 22, 2004, at 20:35 Europe/London, jenny smith wrote: Hi, How can I extract the standard deviation of the coefficients when using the lm function. I know $coefficient gives me the individual betas. thank you - [[alternative HTML version deleted]] __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] ? data.entry read-only ?
The function showData() in the relimp package does something like this (and has a go at your bonus question), in a Tk widget. It's far from a polished product, but gives some idea of what can be done using the tcltk package. Usage is, for example, data(trees) showData(trees) It will be very slow with very large datasets, but fine for smaller ones. I just discovered, though, that showData() has at least one serious bug, arising from changes (quite a while ago now! I should have mended this earlier) in the tcltk package. I'll put a new version of relimp on CRAN soon, and in the meantime I have put a fixed version of showData() at http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic/firth/ software/relimp/showdata.r -- In case it's of use. David On Monday, Jan 12, 2004, at 20:19 Europe/London, (Ted Harding) wrote: Hi Folks, The spreadsheet-like layout displayed when 'data.entry' is invoked is very useful simply for legible display of data, quite apart from its intended use for the purpose of entering or editing data. If one wants to use it _simply_ as a display device, so that one can look around inside a data-set while working on it, then it is not a good idea to have its _editing_ capabilities active: this is dangerous, since an inadvertent keystroke could change the data. So: is there a way to invoke 'data.entry' in read-only mode? Or some other function with equivalent display which can be run read-only? [Bonus question: is there a way to change its cosmetics, e.g. background colour, font, ... ?] With thanks, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 167 1972 Date: 12-Jan-04 Time: 20:19:58 -- XFMail -- __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] matching of arguments in ...?
I am puzzled by this (with R --vanilla): test - function(formula, ...) lm(formula, ...) test(1:4 ~ 1, offset=rep(1,4)) Error in eval(expr, envir, enclos) : ..1 used in an incorrect context, no ... to look in test(1:4 ~ 1, weights=rep(1,4)) Error in eval(expr, envir, enclos) : ..1 used in an incorrect context, no ... to look in test(1:4 ~ 1, x=TRUE) Call: lm(formula = formula, x = TRUE) Coefficients: (Intercept) 2.5 Some arguments (such as x) seem to pass willingly through ..., while others (such as offset and weights) do not. Same thing happens with glm. I haven't experimented more widely. Can some kind soul offer an explanation? Thanks, David version _ platform powerpc-apple-darwin6.7.5 arch powerpc os darwin6.7.5 system powerpc, darwin6.7.5 status major1 minor8.0 year 2003 month10 day 08 language R __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] logistic regression for a data set with perfect separation
On Wednesday, Sep 10, 2003, at 18:50 Europe/London, Christoph Lehmann wrote: Dear R experts I have the follwoing data V1 V2 1 -5.800 0 2 -4.800 0 3 -2.867 0 4 -0.867 0 5 -0.733 0 6 -1.667 0 7 -0.133 1 8 1.200 1 9 1.333 1 and I want to know, whether V1 can predict V2: of course it can, since there is a perfect separation between cases 1..6 and 7..9 How can I test, whether this conclusion (being able to assign an observation i to class j, only knowing its value on Variable V1) holds also for the population, our data were drawn from? For this you really need more data. The only way you'll ever be able to reject that hypothesis is by finding an instance of 010 or 101 in the (ordered by V1) sample. And if you find such then you can reject with certainty. Means, which inference procedure is recommended? Logistic regression, for obvious reasons makes no sense. Not so obvious to me! Logistic regression still makes sense, but care is needed in the method of estimation/inference. The maximum likelihood solution in the above case is a model which says V2 is 1 with certainty at some values of V1, and is zero with certainty at other values; and that seems an unwarranted inference with so little data. That's a criticism of maximum likelihood, rather than a criticism of logistic regression. (Think about the more extreme situation of tossing a coin once: if a head is observed, the ML solution is that the coin lands heads with certainty, ie that there no chance of tails.) There are alternative (Bayesian and pseudo-Bayesian) methods of inference which can yield more sensible answers in general. [One such is implemented in package brlr (bias reduced logistic regression) on CRAN.] To test the hypothesis described above, though, with the data you have, would seem to require a fully Bayesian analysis whose conclusions would depend strongly on the prior probability attached to the hypothesis. ie you need more data... I hope that helps in some way! Regards, David Many thanks for your help Christoph -- Christoph Lehmann [EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] Tobit analysis
Gaussian and normal are two different words for the same thing. The example at http://www.biostat.wustl.edu/archives/html/s-news/1999-06/msg00125.html should give you plenty of clues. David On Wednesday, Jul 16, 2003, at 16:26 Europe/London, Andrew Barnes wrote: Having read previous correspondance on this topic, am I right in using a gaussian distribution for a tobit model, one article suggests a normal distribution? Also, I want to censure at the upper bound, so, using the survival5 package I use: survreg(Surv(y,yc,type=right)~x) for a censored regression. Could anybody who's had experience of this, confirm whether I'm in the ballpark? Grateful thanks Andrew Barnes === Dr Andrew Barnes Agricultural Management Economist Modelling and Strategy Group Land Economy Research Department Research Division SAC West Mains Road Edinburgh EH9 3JG Tel: 0131 535 4042E.Mail: [EMAIL PROTECTED] Fax: 0131 667 2601Mobile: 07811 935 022 Website: http://www.sac.ac.uk/management/external/default.htm __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] packaged datasets in .csv format (David Firth)
Many thanks to those who replied to my question. Dirk's suggestion, to use a .R file in the data directory of the package, specifying how the .csv should be read, works fine as an answer to the question about making comma-separated files available. Uwe's answer to my other question (; vs ,), ie compatibility with existing R packages, is well taken! Cheers, David On Thursday, Jul 10, 2003, at 12:25 Europe/London, Uwe Ligges wrote: Andreas Christmann wrote: - - Message: 1 Date: Wed, 9 Jul 2003 10:53:27 +0100 From: David Firth [EMAIL PROTECTED] Subject: [R] packaged datasets in .csv format To: [EMAIL PROTECTED] Message-ID: [EMAIL PROTECTED] Content-Type: text/plain; charset=US-ASCII; format=flowed A couple of questions in connection with using .csv format to include data in a package: First, the background. The data() function loads data from .csv (comma-separated values) files using read.table(..., header = TRUE, sep = ;) But ?read.table says ## To write a CSV file for input to Excel one might use write.table(x, file = foo.csv, sep = ,, col.names = NA) ## and to read this file back into R one needs read.table(file.csv, header = TRUE, sep = ,, row.names=1) As a result, .csv files created by write.table() as above are not read in by data() in the way that might be expected [that is, expected by someone who had not read help(data)!] Two questions, then: -- is there some compelling reason for the use of `sep = ;' in place of `sep = ,, row.names=1'? Do you really want an answer? Today, one reason is compatibility to all the other packages on CRAN. I prefer ; instead of , , because in text variables there are often ,. That's why text variables can be quoted. -- if I want to maintain a dataset in .csv format, for use both in R and in other systems such as Excel, SPSS, etc, what is the best way to go about it? When regularly using that many systems on the same data sets, it might be worth using a database system, e.g. MySQL. BTW: R *and* Excel *and* (for sure, but I haven't tested) also SPSS can read a couple of different ASCII formatted files, so there are quite a lot possible formats. Uwe Ligges Depends. Perhaps it is best to check it out for the software packages and the versions of the software packages you are using. Andreas Christmann Any advice would be much appreciated. Cheers, David __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] packaged datasets in .csv format
A couple of questions in connection with using .csv format to include data in a package: First, the background. The data() function loads data from .csv (comma-separated values) files using read.table(..., header = TRUE, sep = ;) But ?read.table says ## To write a CSV file for input to Excel one might use write.table(x, file = foo.csv, sep = ,, col.names = NA) ## and to read this file back into R one needs read.table(file.csv, header = TRUE, sep = ,, row.names=1) As a result, .csv files created by write.table() as above are not read in by data() in the way that might be expected [that is, expected by someone who had not read help(data)!] Two questions, then: -- is there some compelling reason for the use of `sep = ;' in place of `sep = ,, row.names=1'? -- if I want to maintain a dataset in .csv format, for use both in R and in other systems such as Excel, SPSS, etc, what is the best way to go about it? Any advice would be much appreciated. Cheers, David __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] packaged datasets in .csv format
A couple of questions in connection with using .csv format to include data in a package: First, the background. The data() function loads data from .csv (comma-separated values) files using read.table(..., header = TRUE, sep = ;) But ?read.table says ## To write a CSV file for input to Excel one might use write.table(x, file = foo.csv, sep = ,, col.names = NA) ## and to read this file back into R one needs read.table(file.csv, header = TRUE, sep = ,, row.names=1) As a result, .csv files created by write.table() as above are not read in by data() in the way that might be expected [that is, expected by someone who had not read help(data)!] Two questions, then: -- is there some compelling reason for the use of `sep = ;' in place of `sep = ,, row.names=1'? -- if I want to maintain a dataset in .csv format, for use both in R and in other systems such as Excel, SPSS, etc, what is the best way to go about it? Any advice would be much appreciated. Cheers, David __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] CGIwithR for IIS
CGIwithR works only on unix-like operating systems. Not Windows. David On Wednesday, Apr 2, 2003, at 14:12 Europe/London, Gianluca Emireni wrote: Hi, I have a (maybe stupid) question... Does CGIwithR package can be installed in R for Windows to work with a Microsoft IIS web-server? Or I need other libraries? Thank you, Gianluca. __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] CGIwithR version 0.40
A new version of the CGIwithR package is now at CRAN. Version 0.40 has the following improvements: -- more comprehensive decoding of form data -- images are no longer cached by browsers -- buffering of the output page to avoid server/browser sync problems Still for unix-like systems only. David Firth Nuffield College Oxford OX1 1NF United Kingdom Phone +44 1865 278544 Fax +44 1865 278621 http://www.stats.ox.ac.uk/~firth __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] Overdispersed poisson - negative observation
Peter You are right to suppose that negative observations should not be a problem for a quasi-Poisson fit (though negative *fitted* values would be, in a log-linear model). The problem is that the quasipoisson function is overly restrictive, in requiring non-negativity of the y variable. (commenting out the error trap is not enough to solve the problem). There are two places where quasipoisson has problems: dev.resids - function(y, mu, wt) 2 * wt * (y * log(ifelse(y == 0, 1, y/mu)) - (y - mu)) fails for negative values of y, while initialize - expression({ if (any(y 0)) stop(paste(Negative values not allowed for, the quasiPoisson family)) n - rep(1, nobs) mustart - y + 0.1 }) would (without the error trap) return negative initial values for means, which won't work with a log link. The first problem is the hard one: the deviance isn't defined when y is negative. But an alternative is to use the Pearson X^2 statistic, which is defined for all values of y (and makes more sense anyway for quasi-likelhood models -- it's what you use to determine the dispersion constant). The second problem is easy to fix: for example, just take as starting values the global mean (corresponding to zero-valued effects in the model). I often find that this is a good starting point anyway -- better than the default choice, which typically is not in the model space (even when there are no negative values of y). I give below an amended quasipoisson which seems to work, ie it fits the model by the quasi-likelihood method with variance proportional to mean. It makes only the two changes just described. A *slightly* unsatisfactory aspect is that quantities reported as deviance are in fact Pearson X^2 statistics. But no other choice really makes sense there. David quasipoisson - function (link = log) ## Amended by David Firth, 2003.01.16, at points labelled ### ## to cope with negative y values ## ## Computes Pearson X^2 rather than Poisson deviance ## ## Starting values are all equal to the global mean { linktemp - substitute(link) if (!is.character(linktemp)) { linktemp - deparse(linktemp) if (linktemp == link) linktemp - eval(link) } if (any(linktemp == c(log, identity, sqrt))) stats - make.link(linktemp) else stop(paste(linktemp, link not available for poisson, family; available links are, \identity\, \log\ and \sqrt\)) variance - function(mu) mu validmu - function(mu) all(mu 0) dev.resids - function(y, mu, wt) wt*(y-mu)^2/mu ### aic - function(y, n, mu, wt, dev) NA initialize - expression({ n - rep(1, nobs) mustart - rep(mean(y), length(y)) ### }) structure(list(family = quasipoisson, link = linktemp, linkfun = stats$linkfun, linkinv = stats$linkinv, variance = variance, dev.resids = dev.resids, aic = aic, mu.eta = stats$mu.eta, initialize = initialize, validmu = validmu, valideta = stats$valideta), class = family) } On Thursday, Jan 16, 2003, at 15:45 Europe/London, [EMAIL PROTECTED] wrote: Dear R users I have been looking for functions that can deal with overdispersed poisson models. Some (one) of the observations are negative. According to actuarial literature (England Verall, Stochastic Claims Reserving in General Insurance , Institute of Actiuaries 2002) this can be handled through the use of quasi likelihoods instead of normal likelihoods. The presence of negatives is not normal in a poisson model, however, we see them frequently in this type of data, and we would like to be able to fit the model anyway. At the moment R is complaining about negative values and the link function = log. My code looks like this. Do any of you know if this problem can be solved in R? Any suggestions are welcomed. Kind regards, Peter Fledelius (new R user) *** Code paym - c(5012, 3257, 2638, 898, 1734, 2642, 1828, 599, 54, 172, 106, 4179, , 5270, 3116, 1817, -103, 673, 535, 3410, 5582, 4881, 2268, 2594, 3479, 649, 603, 5655, 5900, 4211, 5500, 2159, 2658, 984, 1092, 8473, 6271, 6333, 3786, 225, 1513, 4932, 5257, 1233, 2917, 557, 3463, 6956, 1368, 1351, 5596, 6165, 3133, 2262, 2063) alpha - factor(c(1,1,1,1,1,1,1,1,1,1, 2,2,2,2,2,2,2,2,2, 3,3,3,3,3,3,3,3, 4,4,4,4,4,4,4, 5,5,5,5,5,5, 6,6,6,6,6, 7,7,7,7, 8,8,8, 9,9, 10)) beta- factor(c(1,2,3,4,5,6,7,8,9,10, 1,2,3,4,5,6,7,8,9, 1,2,3,4,5,6,7,8, 1,2,3,4,5,6,7, 1,2,3,4,5,6, 1,2,3,4,5, 1,2,3,4, 1,2,3, 1,2, 1)) d.AD - data.frame(paym, alpha, beta) glm.qD93 - glm