Re: [R] Request for users of my R-Tcl/Tk examples, limmaGUI or affylmGUI

2006-02-02 Thread David Firth
I have used your R-Tcl/Tk Examples.  (Thanks!  The ideas 
were much appreciated.) 
I am from the United Kingdom. 

David

-- 
Professor David Firth
http://www.warwick.ac.uk/go/dfirth



Re: [R] Request for users of my R-Tcl/Tk examples, limmaGUI or affylmGUI

2006-02-02 Thread David Firth
On Thursday 02 February 2006 12:53, you wrote:
...
 [PLEASE REPLY _OFF_ THE LIST, i.e. DON'T CC to
 [EMAIL PROTECTED]

oops!  I just realised that I hit the wrong button and 
replied to r-help just now...

Sorry!

I guess that *is* one of the perils of sending a request 
such as this to a mailing list, where clumsy fat-fingered 
people such as me might reply.

apologies all round,
David



Re: [R] Model formula question

2006-02-01 Thread David Firth
On Wednesday 01 February 2006 02:37, maneesh deshpande 
wrote:
 Hi,

 I have a data set with a continuous predictor X, a factor
 A and a continuous dependent
 variable Y.
 I am trying to build a linear model of the form:

 Y = (b0 + b1*X)*B(A)

 where B(A) is a constant for each level of the factor A.
 I am not quite sure how to formulate the appropriate
 model formula. If I write:

 Y ~ ( 1 + X)/A

 , I get estimates for as many constants and slopes as the
 number of levels of A.

Yes, that's right: the / symbol has a special 
(non-arithmetic) meaning when used like this in a model 
formula.  See for example p151 onwards in the reference 
that is given by ?formula.
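
To see that expansion concretely (a small sketch with simulated data):

   set.seed(1)
   d <- data.frame(X = rnorm(30), A = gl(3, 10), Y = rnorm(30))
   coef(lm(Y ~ (1 + X)/A, data = d))
   ## six coefficients when A has three levels: in effect one constant
   ## and one slope for each level of A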

 What I really need is an overall multiplicative constant
 which depends on the factor A.

The gnm (generalized nonlinear models) package has 
facilities for this.  The model above could be specified 
there as
Y ~ -1 + Mult(X, -1 + A)
(where the first -1 removes the intercept, and the second 
one says to estimate a separate multiplier for each level 
of A rather than using contrasts in A).  Or, if you want to 
constrain all of your multipliers to have the same sign, 
you can use
Y ~ -1 + Mult(X, Exp(-1 + A))
(note the capital E there!).
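
If it helps, here is a small simulated sketch; it fits the same
multiplicative model directly with nls(), which can serve as a
cross-check on a gnm fit (the data, starting values and parameter names
here are all hypothetical):

   set.seed(1)
   dat <- data.frame(X = runif(60), A = gl(3, 20))
   B <- c(1, 2, 0.5)                  # true level-specific multipliers B(A)
   dat$Y <- B[dat$A] * (2 + 3 * dat$X) + rnorm(60, sd = 0.1)
   ## fix B = 1 at the first level of A to identify the model:
   fit <- nls(Y ~ (b0 + b1 * X) * c(1, g2, g3)[as.integer(A)],
              data = dat, start = list(b0 = 1, b1 = 1, g2 = 1, g3 = 1))
   coef(fit)   # should be near b0 = 2, b1 = 3, g2 = 2, g3 = 0.5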

It is unclear to me whether using the *same* set of multipliers 
for both intercept and slope will typically be the right 
thing to do, though.  It would not, for example, be 
invariant to transformation of X to X-c, with c constant. 
That is to say, your X variable needs to be on a scale for 
which the zero value has a special meaning, in order to 
allow the above model to make sense.  But presumably you 
have thought about this already.

Hoping that helps,
David

-- 
Professor David Firth
http://www.warwick.ac.uk/go/dfirth



Re: [R] edit.data.frame

2006-01-13 Thread David Firth
On 12 Jan 2006, at 22:17, Fredrik Lundgren wrote:

 Dear list,

 Sometimes I have huge data.frames, and the small spreadsheet-like
 edit.data.frame is quite handy for getting an overview of the data.
 However, when I close the editor all the data are rolled over the
 console window, which takes time and clutters the window.  Is there a
 way to avoid this?


An alternative to the editor is showData() from the relimp package.
It is modeless, meaning that your data window can be left open (or
minimized) while you work in R.  I haven't tested it with _very_ large
data frames, though.

David

--
Professor David Firth
http://www.warwick.ac.uk/go/dfirth



Re: [R] Firth's bias correction for log-linear models

2006-01-12 Thread David Firth
On 12 Jan 2006, at 20:54, [EMAIL PROTECTED] wrote:

 Dear R-Help List,

 I'm trying to implement Firth's (1993) bias correction for log-linear
 models.  Firth (1993) states that such a correction can be implemented
 by supplementing the data with a function of h_i, the diagonals of the
 hat matrix, but doesn't provide further details.  I can see that for a
 saturated log-linear model, h_i = 1 for all i, hence one just adds 1/2
 to each count, which is equivalent to the Jeffreys prior, but I'd also
 like to get bias-corrected estimates for other log-linear models.  It
 appears that I need to iterate using glm, with the weights option and
 h_i, which I can get from the function hatvalues.  For logistic
 regression, this can be performed by splitting up each observation
 into response and nonresponse, and using weights as described in
 Heinze, G. and Schemper, M. (2002), but I'm unsure of how to implement
 the analogue for log-linear models.  A procedure using IWLS is
 described by Firth (1992) in Dodge and Whittaker (1992), but this book
 isn't in the local library, and it's $141+ on Amazon.  I've tried
 looking at the code in the logistf and brlr libraries, but I haven't
 had any (successful) ideas.  Can anyone help me in describing how to
 implement this in R?

I don't recommend the adjusted IWLS approach in practice, because that 
algorithm is only first-order convergent.  It is mainly of theoretical 
interest.

The brlr function (in the brlr package) provides a template for a more 
direct approach in practice.  The central operation there is an 
application of optim(), with objective function
  - (l + (0.5 * log(detinfo)))
in which l is the log likelihood and detinfo is the determinant of the 
Fisher information matrix.  In the case of a Poisson log-linear model, 
the Fisher information is, using standard GLM-type notation, t(X) %*% 
diag(mu) %*% X.  It is straightforward to differentiate this penalized 
log-likelihood function, so (as in brlr) derivatives can be supplied 
for use with a second-order convergent optim() algorithm such as BFGS 
(see ?optim for a reference on the algorithm).
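
For concreteness, here is a minimal sketch of that recipe for a small
Poisson log-linear model, with simulated data (the function names are
mine, for illustration only; this is not the brlr code itself):

  set.seed(1)
  X <- cbind(1, x = rnorm(20))                # model matrix
  y <- rpois(20, exp(1 + 0.5 * X[, "x"]))     # simulated Poisson counts

  negpenll <- function(beta, X, y) {          # - (l + 0.5 * log(detinfo))
      mu <- c(exp(X %*% beta))
      l <- sum(y * log(mu) - mu)              # log-likelihood, up to a constant
      I <- t(X) %*% (mu * X)                  # t(X) %*% diag(mu) %*% X
      -(l + 0.5 * as.numeric(determinant(I)$modulus))
  }
  negpenll.gr <- function(beta, X, y) {       # score plus half the hat values
      mu <- c(exp(X %*% beta))
      XW <- sqrt(mu) * X
      h <- rowSums((XW %*% solve(t(XW) %*% XW)) * XW)
      -c(t(X) %*% (y - mu + h/2))
  }
  fit <- optim(rep(0, ncol(X)), negpenll, negpenll.gr,
               X = X, y = y, method = "BFGS")
  fit$par   # bias-reduced estimates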

I hope that helps.  Please feel free to contact me off the list if 
anything is unclear.

Kind regards,
David

--
Professor David Firth
http://www.warwick.ac.uk/go/dfirth



Re: [R] how to predict with logistic model in package logistf ?

2005-10-27 Thread David Firth
On 27 Oct 2005, at 09:18, jinlong li wrote:

 dear community,
 I am a beginner in R, and can't predict with a logistic model in the
 package logistf,

Not exactly the answer to your question, but an alternative to the 
logistf package, which purports to do similar things, is brlr (which 
does provide a predict method).  Does this work for you?

   library(brlr)
   data(sex2)
   fit <- brlr(case ~ age + oc + vic + vicl + vis + dia, data = sex2)
   predict(fit, newdata = sex2)

David

 could anyone help me?  Thanks!

 the following is my command and result:

 library(logistf)
 data(sex2)
 fit <- logistf(case ~ age + oc + vic + vicl + vis + dia, data = sex2)
 predict(fit, newdata = sex2)
Error in predict(fit, newdata = sex2) : no applicable method for
 predict




Re: [R] Is it possible to use glm() with 30 observations?

2005-07-02 Thread David Firth
On 2 Jul 2005, at 06:01, Spencer Graves wrote:

 The issue is not 30 observations but whether it is possible to
 perfectly separate the two possible outcomes.  Consider the following:

 tst.glm <- data.frame(x = 1:3, y = c(0, 1, 0))
 glm(y ~ x, family = binomial, data = tst.glm)

 tst2.glm <- data.frame(x = 1:1000,
                        y = rep(0:1, each = 500))
 glm(y ~ x, family = binomial, data = tst2.glm)

 The algorithm fits y ~ x to tst.glm without complaining, but issues
 warnings for tst2.glm.  This is called the Hauck-Donner effect, and
 RSiteSearch("Hauck-Donner") just now produced 8 hits.  For more
 information, look for Hauck-Donner in the index of Venables, W. N.
 and Ripley, B. D. (2002) _Modern Applied Statistics with S._ New
 York: Springer.

Not exactly.  The phenomenon that causes the warning for tst2.glm above 
is more commonly known as complete separation.  For some comments on 
its implications you might look at another work by B D Ripley, the 1996 
book Pattern Recognition and Neural Networks.  There are some further 
references in the help files of the brlr package on CRAN.

The problem noted by Hauck and Donner (1977, JASA) is slightly related, 
but not the same.  See the aforementioned book by Venables and Ripley, 
for example.  The glm function does not routinely warn us about the 
Hauck-Donner effect, afaik.

The original poster did not say what the purpose of the logistic 
regression was, so it is hard to advise.  Depending on the purpose, the 
separation that was detected may or may not be a problem.
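
(A quick look at what happens there, using tst2.glm from Spencer's
message above, just as a sketch:)

  fit2 <- glm(y ~ x, family = binomial, data = tst2.glm)
  range(fitted(fit2))   # fitted probabilities numerically 0 and 1
  coef(fit2)            # a huge slope: the MLE is diverging to infinity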

Regards,
David

 (If you don't already have this book, I recommend you
 give serious consideration to purchasing a copy.  It is excellent on
 many issues relating to statistical analysis and R.)

 Spencer Graves

 Kerry Bush wrote:

 I have a very simple problem. When using glm to fit
 binary logistic regression model, sometimes I receive
 the following warning:

 Warning messages:
 1: fitted probabilities numerically 0 or 1 occurred
 in: glm.fit(x = X, y = Y, weights = weights, start =
 start, etastart = etastart,
 2: fitted probabilities numerically 0 or 1 occurred
 in: glm.fit(x = X, y = Y, weights = weights, start =
 start, etastart = etastart,

 What does this output tell me? Since I only have 30
 observations, I assume this is a small-sample problem.
 Is it possible to fit this model in R with only 30
 observations? Could any expert provide suggestions to
 avoid the warning?


 -- 
 Spencer Graves, PhD
 Senior Development Engineer
 PDF Solutions, Inc.
 333 West San Carlos Street Suite 700
 San Jose, CA 95110, USA

 [EMAIL PROTECTED]
 http://www.pdf.com
 Tel:  408-938-4420
 Fax: 408-280-7915




Re: [R] ranking predictive features in logistic regression

2005-07-01 Thread David Firth
On 30 Jun, 2005, at 21:20, Stephen Choularton wrote:

 Hi

 Is there some function R that multiplies each coefficient by the
 standard deviation of the corresponding variable and produces a 
 ranking?



Possibly you meant un-signed coefficients?  In which case something like

   function(model) rank(abs(coef(model)) * apply(model.matrix(model), 2, sd))

should do what you asked about.
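
For example (a hypothetical illustration, not from the original message):

   fit <- glm(am ~ hp + wt, family = binomial, data = mtcars)
   rank(abs(coef(fit)) * apply(model.matrix(fit), 2, sd))
   ## the (Intercept) column has sd 0, so it always ranks lowest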

The relimp package provides approximate inference for comparisons of 
this kind.

I should say that I don't think that such a ranking will often be very 
useful, though.  Some coefficients will be determined with greater 
precision than others, and there may be correlations to worry about, or 
variables may only make sense when considered in groups (eg factor 
effects, or interactions with corresponding main effects, etc.).

David



Re: [R] mu^2(1-mu)^2 variance function for GLM

2005-06-16 Thread David Firth
Dear Henric

I do not have a ready stock of other examples, but I do have my own 
version of a family function for this, reproduced below.  It differs 
from yours (apart from being a regular family function rather than 
using a modified quasi) in the definition of deviance residuals.  
These necessarily involve an arbitrary constant (see McCullagh and 
Nelder, 1989, p330); in my function that arbitrariness is in the choice 
eps <- 0.0005.  I don't think the deviance contributions as you 
specified in your code below will have the right derivative (with 
respect to mu) for observations where y=0 or y=1.

Anyway, this at least gives you some kind of check.  I hope it helps.  
This function will be part of a new package which Heather Turner and I 
will submit to CRAN in a few days' time.  Please do let me know if you 
find any problems with it.

Here is my wedderburn family function:

wedderburn <- function (link = "logit")
{
    linktemp <- substitute(link)
    if (!is.character(linktemp)) {
        linktemp <- deparse(linktemp)
        if (linktemp == "link")
            linktemp <- eval(link)
    }
    if (any(linktemp == c("logit", "probit", "cloglog")))
        stats <- make.link(linktemp)
    else stop(paste(linktemp,
                    "link not available for wedderburn quasi-family;",
                    "available links are",
                    "\"logit\", \"probit\" and \"cloglog\""))
    variance <- function(mu) mu^2 * (1 - mu)^2
    validmu <- function(mu) {
        all(mu > 0) && all(mu < 1)
    }
    dev.resids <- function(y, mu, wt) {
        eps <- 0.0005
        2 * wt * (y/mu + (1 - y)/(1 - mu) - 2 +
                  (2 * y - 1) * log((y + eps) * (1 - mu)/((1 - y + eps) * mu)))
    }
    aic <- function(y, n, mu, wt, dev) NA
    initialize <- expression({
        if (any(y < 0 | y > 1)) stop(paste(
            "Values for the wedderburn family must be in [0,1]"))
        n <- rep.int(1, nobs)
        mustart <- (y + 0.1)/1.2
    })
    structure(list(family = "wedderburn",
                   link = linktemp,
                   linkfun = stats$linkfun,
                   linkinv = stats$linkinv,
                   variance = variance,
                   dev.resids = dev.resids,
                   aic = aic,
                   mu.eta = stats$mu.eta,
                   initialize = initialize,
                   validmu = validmu,
                   valideta = stats$valideta),
              class = "family")
}
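
It can be used like any other family; for instance, with the
leaf-blotch data mentioned in your message below (a sketch, assuming
the faraway package for the data):

  data(leafblotch, package = "faraway")
  summary(glm(blotch ~ site + variety, family = wedderburn,
              data = leafblotch))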

Best wishes,
David
http://www.warwick.ac.uk/go/dfirth

On 16 Jun 2005, at 09:27, Henric Nilsson wrote:

 Dear list,

 I'm trying to mimic the analysis of Wedderburn (1974) as cited by
 McCullagh and Nelder (1989) on p.328-332. This is the leaf-blotch on
 barley example, and the data is available in the `faraway' package.

 Wedderburn suggested using the variance function mu^2(1-mu)^2. This
 variance function isn't readily available in R's `quasi' family object,
 but it seems to me that the following definition could be used:

 }, "mu^2(1-mu)^2" = {
     variance <- function(mu) mu^2 * (1 - mu)^2
     validmu <- function(mu) all(mu > 0) && all(mu < 1)
     dev.resids <- function(y, mu, wt) 2 * wt * ((2 * y - 1) *
         (log(ifelse(y == 0, 1, y/mu)) - log(ifelse(y == 1, 1,
         (1 - y)/(1 - mu)))) - 2 + y/mu + (1 - y)/(1 - mu))

 I've modified the `quasi' function accordingly (into `quasi2' given
 below) and my results are very much in line with the ones cited by
 McCullagh and Nelder on p.330-331:

 > data(leafblotch, package = "faraway")
 > summary(fit <- glm(blotch ~ site + variety,
 +     family = quasi2(link = "logit", variance = "mu^2(1-mu)^2"),
 +     data = leafblotch))

 Call:
 glm(formula = blotch ~ site + variety, family = quasi2(link = "logit",
     variance = "mu^2(1-mu)^2"), data = leafblotch)

 Deviance Residuals:
      Min        1Q    Median        3Q       Max
 -3.23175  -0.65385  -0.09426   0.46946   1.97152

 Coefficients:
             Estimate Std. Error t value Pr(>|t|)
 (Intercept) -7.92253    0.44463 -17.818  < 2e-16 ***
 site2        1.38308    0.44463   3.111  0.00268 **
 site3        3.86013    0.44463   8.682 8.18e-13 ***
 site4        3.55697    0.44463   8.000 1.53e-11 ***
 site5        4.10841    0.44463   9.240 7.48e-14 ***
 site6        4.30541    0.44463   9.683 1.13e-14 ***
 site7        4.91811    0.44463  11.061  < 2e-16 ***
 site8        5.69492    0.44463  12.808  < 2e-16 ***
 site9        7.06762    0.44463  15.896  < 2e-16 ***
 variety2    -0.46728    0.46868  -0.997  0.32210
 variety3     0.07877    0.46868   0.168  0.86699
 variety4     0.95418    0.46868   2.036  0.04544 *
 variety5     1.35276    0.46868   2.886  0.00514 **
 variety6     1.32859    0.46868   2.835  0.00595 **
 variety7     2.34066    0.46868   4.994 3.99e-06 ***
 variety8     3.26268    0.46868   6.961 1.30e-09 ***
 variety9     3.13556    0.46868   6.690 4.10e-09 ***
 variety10    3.88736    0.46868   8.294 4.33e-12 ***
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Re: [R] error messages on R CMD check

2005-03-30 Thread David Firth
Dear Johannes
I have noticed the same complaint, and you are right that it can be 
unconnected with faulty S3 methods, etc., in your package.

For example, in my relimp package (current version 0.9-1, new on CRAN 
today) the DESCRIPTION file has
  Suggests: tcltk
and that works fine (ie it passes R CMD check).

But if I change that to
  Depends: tcltk
I get the same kind of errors that you got:
david% R CMD check relimp
[...]
* checking S3 generic/method consistency ... WARNING
Error in .try_quietly({ : Error: package 'tcltk' could not be loaded
Execution halted
See section 'Generic functions and methods' of the 'Writing R 
Extensions'
manual.
* checking replacement functions ... WARNING
Error in .try_quietly({ : Error: package 'tcltk' could not be loaded
Execution halted
In R, the argument of a replacement function which corresponds to the 
right
hand side must be named 'value'.
* checking foreign function calls ... WARNING
Error in .try_quietly({ : Error: package 'tcltk' could not be loaded
Execution halted
See section 'System and foreign language interfaces' of the 'Writing R
Extensions' manual.
* checking Rd files ... OK
* checking for missing documentation entries ... ERROR
Error in .try_quietly({ : Error: package 'tcltk' could not be loaded

In this case it is because the calling environment for R CMD check did 
not have the DISPLAY variable set, so tcltk could not be loaded.  If 
instead I do

david% setenv DISPLAY :0
david% R CMD check relimp
then all checks are passed.  This is with
david% R --version
R 2.0.1 (2004-11-15).
This is not really an explanation of what you got, just a confirmation 
that something here probably needs fixing (even if it's only the error 
message).  Perhaps one of us should file a bug report (having checked 
first that a fix is not already made in the latest development 
version)?

David
On 30 Mar, 2005, at 13:34, Johannes Hüsing wrote:
Dear all,
I am trying to wrap up a package. On entering
R CMD check, I get the following error messages:
[...]
* checking S3 generic/method consistency ... WARNING
Error in .try_quietly({ : Error in library(package, lib.loc = lib.loc,
character.only = TRUE, verbose = FALSE) :
	package/namespace load failed for 'resper'
Execution halted
See section 'Generic functions and methods' of the 'Writing R 
Extensions'
manual.
* checking replacement functions ... WARNING
Error in .try_quietly({ : Error in library(package, lib.loc = lib.loc,
character.only = TRUE, verbose = FALSE) :
	package/namespace load failed for 'resper'
Execution halted
In R, the argument of a replacement function which corresponds to the 
right
hand side must be named 'value'.
* checking foreign function calls ... WARNING
Error in .try_quietly({ : Error in library(package, lib.loc = lib.loc,
character.only = TRUE, verbose = FALSE) :
	package/namespace load failed for 'resper'
Execution halted
See section 'System and foreign language interfaces' of the 'Writing R
Extensions' manual.
* checking Rd files ... OK
* checking for missing documentation entries ... ERROR
Error in .try_quietly({ : Error in library(package, lib.loc = lib.loc,
character.only = TRUE, verbose = FALSE) :

I do not understand these messages, nor their relation to
section 'Generic functions and methods' of the 'Writing
R Extensions' manual, as I did not write any generic methods
in my package.
There has been some discussion of the error message around
Christmas 2003: 
http://maths.newcastle.edu.au/~rking/R/devel/03b/1438.html,
but I can't see how the circumstances described there apply to
my situation (export a class name).

Could somebody give me a clue on how to give my search a
direction?
Greetings
Johannes


[R] model.matrix for a factor effect with no intercept

2005-02-23 Thread David Firth
I was surprised by this (in R 2.0.1):
> a <- ordered(-1:1)
> a
[1] -1 0  1
Levels: -1 < 0 < 1
> model.matrix(~ a)
  (Intercept)           a.L        a.Q
1           1 -7.071068e-01  0.4082483
2           1 -9.073800e-17 -0.8164966
3           1  7.071068e-01  0.4082483
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$a
[1] "contr.poly"
> model.matrix(~ -1 + a)
  a-1 a0 a1
1   1  0  0
2   0  1  0
3   0  0  1
attr(,"assign")
[1] 1 1 1
attr(,"contrasts")
attr(,"contrasts")$a
[1] "contr.poly"
Without the intercept, treatment contrasts seem to have been used (this 
despite the "contr.poly" in the "contrasts" attribute).

It's not restricted to ordered factors.  For example, if Helmert 
contrasts are used for nominal factors, the same sort of thing happens.

I suppose it is a deliberate feature (perhaps to protect the user from 
accidentally fitting models that make no sense?  or maybe some better 
reason?) -- is it explained somewhere?

David


Re: [R] model.matrix for a factor effect with no intercept

2005-02-23 Thread David Firth
Brian, many thanks for these helpful pointers:
On 23 Feb 2005, at 15:45, Prof Brian Ripley wrote:
MASS4 p.150
White Book p.38
Those are the only two reasonably comprehensive accounts that I am 
aware of (and they have only partial overlap).

The underlying motivation is to span the _additional_ vector space 
covered by the term, the complement to what has gone before.  Put 
another way, as each term is added, only enough columns are added to 
the model matrix to span the same space as if dummy coding had been 
used for that term and its predecessors.  So think of this as a way to 
produce a parsimonious (usually full-rank) basis for the model space.
Yes, indeed.  My surprise was that *this* particular basis (dummy 
coding) was the one used here.  I should have got a clue from what 
contrasts() does, and is documented to do,

> options("contrasts")
$contrasts
[1] "contr.treatment" "contr.poly"

> contrasts(a)
                .L         .Q
[1,] -7.071068e-01  0.4082483
[2,] -9.073800e-17 -0.8164966
[3,]  7.071068e-01  0.4082483

but when there's no intercept in the model the contrasts used appear to
be

> contrasts(a, contrasts = FALSE)
   -1 0 1
-1  1 0 0
0   0 1 0
1   0 0 1

which are not the same as

> contr.poly(a, contrasts = FALSE)
     ^0            ^1         ^2
[1,]  1 -7.071068e-01  0.4082483
[2,]  1 -9.073800e-17 -0.8164966
[3,]  1  7.071068e-01  0.4082483
-- which is what I had naively expected to get in my model matrix. 

This is of course all a matter of convention.  The present convention 
does seem a touch confusing though: the basis for the space spanned by 
a factor is determined by options(contrasts) or by the contrasts 
attribute of the factor or by the contrasts argument in the call, 
*except* when there's no intercept or other factor earlier in the model 
in which case all such settings are ignored (well, not *quite* ignored: 
they do get put in the contrasts attribute of the resultant model 
matrix).

On the other hand, one good reason to use dummy coding for the 
first-encountered factor when there's no intercept is that the 
associated parameters are then often interpretable as group-specific 
intercepts.

Would it be an improvement, though, if the "contrasts" attribute of the 
resultant model matrix contained "contr.treatment" in such cases 
instead of the name of a contrast function that was not actually used?

Best wishes,
David



Re: [R] CGIwithR

2005-01-16 Thread David Firth
See below for the reply I sent you when you asked me this earlier today.
David
On 16 Jan 2005, at 20:22, ebashi wrote:
Dear R users;
I'm trying to use CGIwithR on a linux machine, I have
followed the instructions on the package manual but
still it does not run,
the message that I get is as follows:
The requested URL was not found on this server
 I used the example trivial: I put trivial.html under the Web
 directory and trivial.R in the cgi-bin directory, which itself is a
 subdirectory of the Web directory (I have changed the modes of R.cgi
 and .Rprofile according to what the package says), but I still get
 the same message.  Do you have any tips for me?  My question is:
 where should, for example, the myscript.R that is mentioned in the
 manual be placed (under the Web directory or under cgi-bin)?
 Besides the path to R and GS, should anything else in R.cgi be
 changed?
 Many thanks in advance
Sean

Dear Sean
Thanks for your email.  It is hard to diagnose your problem at a 
distance, but here is some general stuff:

-- the trivial.html file needs to be in a place that is served by your 
http server (is that the URL that is not found?  ie can you see the 
form in your browser?  it wasn't clear to me from your email.)

-- your web server needs to be configured to allow CGI scripts, and 
probably to specify one or more directories where such scripts are 
allowed to live.  This is often a directory called cgi-bin; but note 
that it is not enough just to create a directory with that name, you 
must tell your webserver through its configuration file(s) that it 
exists and that CGI scripts placed there can be run.  How this is done 
depends on which HTTP server you are running, and I cannot help you 
with that I'm afraid.

-- the trivial.R, R.cgi and .Rprofile files all need to be in one of 
the places where CGI scripts are allowed (by your HTTP server) to live.

I hope that helps.  Good luck.
best wishes,
David


[R] link to a vignette/overview document in .Rd file?

2004-12-21 Thread David Firth
Is there a reliable way to link to a vignette or other overview 
document, which has been included in the inst/doc directory of a 
source package, within a package help file in Rd format?

or maybe link to an index of such documents?
David
Professor David Firth
Dept of Statistics
University of Warwick
Coventry CV4 7AL
United Kingdom
Voice: +44 (0)247 657 2581
Fax:   +44 (0)247 652 4532
Web:   http://www.warwick.ac.uk/go/dfirth


Re: [R] R code for var-cov matrix given variances and correlations

2004-12-21 Thread David Firth
On 21 Dec, 2004, at 13:09, Haynes, Maurice (NIH/NICHD) wrote:
Dear list members,
Where can I find code for computing the p*p variance-covariance
matrix given a vector of p variances (ordered varA, varB, ...,
varp) and a vector of all possible correlations (ordered corAB,
corAC, ..., corp-1,p)?
Something like this (not tested):
sd.vec <- sqrt(var.vec)
n <- length(sd.vec)
cor.mat <- matrix(1, n, n)
for (i in 2:n) {
    for (j in seq_len(i - 1)) {   # note: 1:i-1 would mean (1:i)-1, a bug
        cor.mat[i, j] <- cor.vec[(i - 1) * (i - 2)/2 + j]
        cor.mat[j, i] <- cor.mat[i, j]
    }
}
cov.mat <- cor.mat * outer(sd.vec, sd.vec)
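
Equivalently, without the explicit loops (this assumes cor.vec is
ordered as the index formula above implies, ie down the columns of the
upper triangle):

cor.mat <- diag(n)
cor.mat[upper.tri(cor.mat)] <- cor.vec
cor.mat[lower.tri(cor.mat)] <- t(cor.mat)[lower.tri(cor.mat)]
cov.mat <- cor.mat * outer(sd.vec, sd.vec)
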
I hope that helps.
David

I know that the covariance between 2 variables is equal to the
product of their correlation and their standard deviations:
corAB * varA^.5 * varB^.5
and so:
 covAB <- function(corAB, varA, varB) {
     corAB * varA^.5 * varB^.5
 }
 If the vector of variances were
 var.vec <- c(14, 12, 7)
 and the vector of correlations were
 cor.vec <- c(.4, .2, .5),
 then the vector of covariances would be:
 covAB(c(.4, .2, .5), c(14, 14, 12), c(12, 7, 7))
 [1] 5.184593 1.979899 4.582576

and the variance-covariance matrix with covariances rounded to
the first decimal place would be:
 vmat <- matrix(c(14, 5.2, 2.0, 5.2, 12, 4.6, 2.0, 4.6, 7),
                nrow = 3)
 vmat
      [,1] [,2] [,3]
 [1,] 14.0  5.2  2.0
 [2,]  5.2 12.0  4.6
 [3,]  2.0  4.6  7.0

So the question is: How can I generate a p*p variance-covariance
matrix from a vector of variances and a vector of correlations
without resorting to a construction like:
 vmat <- matrix(rep(0, p*p), nrow = p)
 if (p == 2) {
     vmat[1,1] <- var[1]
     vmat[1,2] <- cor[1] * (var[1]^.5 * var[2]^.5)
     vmat[2,1] <- cor[1] * (var[2]^.5 * var[1]^.5)
     vmat[2,2] <- var[2]
 }
 if (p == 3) {
     vmat[1,1] <- var[1]
     vmat[1,2] <- cor[1] * (var[1]^.5 * var[2]^.5)
     vmat[1,3] <- cor[2] * (var[1]^.5 * var[3]^.5)
     vmat[2,1] <- cor[1] * (var[2]^.5 * var[1]^.5)
     vmat[2,2] <- var[2]
     vmat[2,3] <- cor[3] * (var[2]^.5 * var[3]^.5)
     vmat[3,1] <- cor[2] * (var[3]^.5 * var[1]^.5)
     vmat[3,2] <- cor[3] * (var[3]^.5 * var[2]^.5)
     vmat[3,3] <- var[3]
 }
and so forth?
Thanks,
Maurice Haynes
National Institute of Child Health and Human Development
Child and Family Research Section
6705 Rockledge Drive, Suite 8030
Bethesda, MD  20892
Voice: 301-496-8180
Fax: 301-496-2766
E-mail: [EMAIL PROTECTED]


Re: [R] multinomial logistic regression

2004-09-23 Thread David Firth
On 23 Sep 2004, at 01:36, array chip wrote:
Hi, how can I do multinomial logistic regression in R?
I think glm() can only handle binary response
variable, and polr() can only handle ordinal response
variable. how to do logistic regression with
multinomial response variable?
Perhaps multinom() in package nnet (in bundle VR) will do what you want?
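
For example (a sketch, with the iris data standing in for a response
with more than two unordered levels):

   library(nnet)
   fit <- multinom(Species ~ Sepal.Length + Sepal.Width, data = iris)
   head(predict(fit, type = "probs"))
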
Regards,
David
Thanks



Re: [R] ordered probit and cauchit

2004-09-22 Thread David Firth
On Wednesday, Sep 22, 2004, at 03:42 Europe/London, roger koenker wrote:
What is the current state of the R-art for ordered probit models, and 
more
esoterically is there any available R strategy for ordered cauchit 
models,
i.e. ordered multinomial alternatives with a cauchy link function.  
MCMC
is an option, obviously, but for a univariate latent variable model 
this seems
to be overkill... standard mle methods should be preferable.  (??)

A quick look at polr (in the MASS package) suggests to me that it 
wouldn't be all that hard to extend it to link functions other than 
logistic.
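
A sketch of what that might look like, using the housing data from
MASS (the "cauchit" method shown here is the extension in question,
and may not exist in all versions of polr):

   library(MASS)
   ## ordered probit:
   fit1 <- polr(Sat ~ Infl + Type + Cont, weights = Freq,
                data = housing, method = "probit")
   ## ordered cauchit, if/when polr offers it:
   fit2 <- polr(Sat ~ Infl + Type + Cont, weights = Freq,
                data = housing, method = "cauchit")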

Is MLE known to be well behaved in these models if the latent variable 
is Cauchy?

David

Googling reveals that spss provides such functions... just to wave a 
red
flag.

url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820


Re: [R] cgi/servlets/httpd in R

2004-05-06 Thread David Firth
On Thursday, May 6, 2004, at 16:01 Europe/London, Samuelson, Frank* 
wrote:

Here's a related question:  Do any of the mentioned R-web interfaces
(Rweb, R-Online, CGIwithR, RSPerl) support reusing the same R process,
eliminating the startup overhead?  This would be useful to me as well.
 Currently I use such a method on my computing cluster: all 40
 compute nodes run an R process/compute server that listens at a
 socket for any connection and subsequent commands from another
 computer.  When the master process disconnects, the R processes go
 back to listening at the socket.  Connecting to the R compute servers
 this way takes < 2 milliseconds rather than the typical ~2 second R
 startup time.


I only know about CGIwithR, which is very simple and starts a new R for 
every request.  Startup time on my Mac is around 1.3 seconds; for my 
own purposes this is absolutely fine, but I can appreciate that it 
could well be problematic in certain kinds of application (eg where a 
very large amount of data has to be loaded).

Best wishes,
David


Re: [R] cgi/servlets/httpd in R

2004-05-05 Thread David Firth
On Wednesday, May 5, 2004, at 18:09 Europe/London, foobar wrote:
Hi R-helpers
Has anyone had any experience doing CGI or Servlets or using an httpd 
server in R?


Yes.  See the R FAQ, section 4.  (Or maybe you already have, in which 
case I misunderstood the question...)

Best wishes,
David


Re: [R] glm questions

2004-03-16 Thread David Firth
Dear Paul

Here are some attempts at your questions.  I hope it's of some help.

On Tuesday, Mar 16, 2004, at 06:00 Europe/London, Paul Johnson wrote:

Greetings, everybody. Can I ask some glm questions?

1. How do you find out -2*lnL(saturated model)?

In the output from glm, I find:

Null deviance:  which I think is  -2[lnL(null) - lnL(saturated)]
Residual deviance:   -2[lnL(fitted) - lnL(saturated)]
The Null model is the one that includes the constant only (plus offset 
if specified). Right?

I can use the Null and Residual deviance to calculate the usual model 
Chi-squared statistic
-2[lnL(null) - lnL(fitted)].

But, just for curiosity's sake, what't the saturated model's -2lnL ?
It's important to remember that lnL is defined only up to an additive 
constant.  For example a Poisson model has lnL contributions -mu + 
y*log(mu) + constant, and the constant is arbitrary.  The differencing 
in the deviance calculation eliminates it.  What constant would you 
like to use??
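
A small check of that, with the usual counting-measure convention for
the Poisson log-likelihood (a sketch):

  y <- c(2, 5, 3)
  fit <- glm(y ~ 1, family = poisson)
  ## residual deviance = -2[lnL(fitted) - lnL(saturated)]; any constant
  ## added to both log-likelihoods cancels in the difference:
  2 * (sum(dpois(y, y, log = TRUE)) - sum(dpois(y, fitted(fit), log = TRUE)))
  deviance(fit)   # the same number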

2. Why no 'scaled deviance' in output?  Or, how are you supposed to 
tell if there is over-dispersion?
 I just checked and SAS gives us the scaled and nonscaled deviance.
 I have read the Venables & Ripley (MASS 4 ed) chapter on GLM.  I 
believe I understand the cautionary point about overdispersion toward 
the end (p. 408).  Since I'm comparing lots of other books at the 
moment, I believe I see people using the practice that is being 
criticized.  The Pearson Chi-square based estimate of dispersion is 
recommended and one uses an F test to decide if the fitted model is 
significantly worse than the saturated model.  But don't we still 
assess over-dispersion by looking at the scaled deviance (after it is 
calculated properly)?

Can I make a guess why glm does not report scaled deviance?  Are the 
glm authors trying to discourage us from making the lazy assessment in 
which one concludes over-dispersion is present if the scaled deviance 
exceeds the degrees of freedom?
I am unclear what you are asking here.  I assume by scaled deviance 
you mean deviance divided by phi, a (known) scale parameter?  (I'm 
sorry, I don't know SAS's definition.)In many applications (eg 
binomial, Poisson) deviance and scaled deviance are the same thing, 
since phi is 1.  Yes, if you wanted to judge overdispersion relative to 
some other value of phi you would scale the deviance.  What other value 
of phi would you like?

3. When I run example(glm) at the end there's a Gamma model and the 
printout says:

(Dispersion parameter for Gamma family taken to be 0.001813340)
I don't find an estimate for the Gamma distribution's shape paremeter 
in the output.  I'm uncertain what the reported dispersion parameter 
refers to.  Its the denominator (phi) in the exponential family 
formula, isn't it?
    exp[ (y*theta - c(theta)) / phi ] h(y, phi)

Phi is the squared coefficient of variation, ie variance/(mean^2).
Thus it is a shape parameter.  If you are used to some other
parameterization of the gamma family, just express the mean and
variance in that parameterization to see the relation between your
parameters and phi.
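
A quick simulation check of that reading of phi (hypothetical numbers):

   set.seed(1)
   y <- rgamma(1e5, shape = 4, rate = 2)   # mean 2, variance 1
   var(y)/mean(y)^2                        # close to 1/4 = 1/shape, ie phi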

4. For GLM teaching purposes, can anybody point me at some good 
examples of GLM that do not use Normal, Poisson, Negative Binomial, 
and/or Logistic Regression?  I want to justify the effort to 
understand the GLM as a framework.  We have already in previous 
semesters followed the usual econometric approach in which OLS, 
Poisson/Count, and Logistic regression are treated as special cases.  
Some of the students don't see any benefit from tackling the GLM's new 
notation/terminology.
McCullagh and Nelder (1989) has some I believe, eg gamma models.  Also 
quasi-likelihood models, such as the Wedderburn (1974) approach to 
analysis of 2-component compositional data (the leaf blotch example in 
McCN).

On the more general point: yes, if all that students need to know is 
OLS, Poisson rate models and logistic regression, then GLM is overkill. 
 The point, surely, is that GLM opens up a way of thinking in which 
mean function and variance function are specified separately?  This 
becomes most clear through a presentation of GLMs via quasi-likelihood 
(as a the right generalization of weighted least squares) rather than 
via the exponential-family likelihoods.  In my opinion.

 5. Is it possible to find all methods that an object inherits?

I found out by reading the source code for J Fox's car package that 
model.matrix() returns the X matrix of coded input variables, so one 
can do fun things like calculate robust standard errors and such. 
That's really useful, because before I found that, I was recoding up a 
storm to re-create the X matrix used in a model.

Is there a direct way to find a list of all the other methods that 
would apply to an object?
methods(class = "glm")
methods(class = "lm")
is probably not as direct as you had in mind!  But it's a start.

Best wishes,
David

Re: [R] glm questions --- saturated model

2004-03-16 Thread David Firth
On Tuesday, Mar 16, 2004, at 14:51 Europe/London, BXC (Bendix 
Carstensen) wrote:

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of David Firth
Sent: Tuesday, March 16, 2004 1:12 PM
To: Paul Johnson
Cc: [EMAIL PROTECTED]
Subject: Re: [R] glm questions
Dear Paul

Here are some attempts at your questions.  I hope it's of some help.

On Tuesday, Mar 16, 2004, at 06:00 Europe/London, Paul Johnson wrote:

Greetings, everybody. Can I ask some glm questions?

1. How do you find out -2*lnL(saturated model)?

In the output from glm, I find:

Null deviance:  which I think is  -2[lnL(null) - lnL(saturated)]
Residual deviance:   -2[lnL(fitted) - lnL(saturated)]
The Null model is the one that includes the constant only
(plus offset
if specified). Right?

I can use the Null and Residual deviance to calculate the
usual model
Chi-squared statistic
-2[lnL(null) - lnL(fitted)].
But, just for curiosity's sake, what't the saturated model's -2lnL ?
It's important to remember that lnL is defined only up to an additive
constant.  For example a Poisson model has lnL contributions -mu +
y*log(mu) + constant, and the constant is arbitrary.  The
differencing
in the deviance calculation eliminates it.  What constant would you
like to use??
I have always been under the impression that the constant chosen by glm
is that which makes the deviance of the saturated model 0, the saturated
model being the one with one parameter per observation in the dataset.
...
But a look at the deviance formula above ---
   -2[lnL(fitted) - lnL(saturated)]
--- shows us that *any* constant can be added to lnL, and the deviance 
for the saturated model will still be zero.

David



Re: [R] glm questions

2004-03-16 Thread David Firth
On Tuesday, Mar 16, 2004, at 15:15 Europe/London, Rolf Turner wrote:

David Firth wrote (in response to a question from Paul Johnson):

On the more general point: yes, if all that students need to know is
OLS, Poisson rate models and logistic regression, then GLM is 
overkill.
	I couldn't agree less.  The glm (not GLM!) framework gives a

Well, I really did _intend_ GLM when I wrote GLM, meaning the sort of 
theoretical thing that Paul described, involving the presentation to 
(political-science etc) students of the general (linear) exponential 
family.  All that stuff is not needed for a proper understanding of the 
three models mentioned, and certainly those three models are all 
meaningful without it.  Your second paragraph below is one that I 
wholeheartedly agree with though (except that it should be linear 
function of the parameters, not the predictors), and it seems to agree 
also with what I wrote in the second part of the paragraph which you 
quote above (the part that you cut) from my earlier reply.

David

coherence to the structure and changes a collection of ad hoc
(and thereby essentially meaningless cook-book) techniques
into a single meaningful technique:
A parameter (the mean) of a distribution is a transformation
of a linear function of some predictors.  One seeks to
estimate the linear coefficients via maximum likelihood.  In
a broad array of circumstances the maximization can be
carried out by the glm() function (using iteratively
reweighted least squares).  The process is quick and
efficient and the notation is about as transparent as can be
imagined.
	cheers,

Rolf Turner
[EMAIL PROTECTED]


Re: [R] glm questions --- saturated model

2004-03-16 Thread David Firth
On Tuesday, Mar 16, 2004, at 20:28 Europe/London, Paul Johnson wrote:

I'm confused going back and forth between the textbooks and these 
emails.  Please pardon me that I seem so pedantic.

I am pretty certain that -2lnL(saturated) is not 0 by definition.
I'm pretty certain of that too.

In the binomial model with groups of size 1, the observed scores
will be {0,1} but the predicted mean will be some number in [0,1], and
so -2lnL will not be 0.  I'm reading, for example, Annette Dobson, An
Introduction to Generalized Linear Models, 2nd ed. (2002), p. 77, where
it gives the formula one can use for -2lnL(saturated) in the binomial
model.

For the Normal distribution, Dobson says

-2lnL(saturated) = N log(2 pi sigma^2)

She gives the saturated model -2lnL(saturated) for lots of 
distributions, actually.

I thought the point in the first note from Prof. Firth was that the 
deviance is defined up to an additive constant because you can add or 
subtract from lnL in the deviance formula

D = -2[lnL(subset) - lnL(full)]

and the deviance is unaffected.  But I don't think that means there is 
a completely free quantity in lnL(saturated).
No, that's not what I said earlier.  Nor what I meant.

lnL(any model, including saturated) is defined only up to an additive 
constant.  Dobson's formulae are presumably correct, and would remain 
so if we added a constant.

For discrete distributions we can probably all agree to define 
likelihood as the probability of getting the observed data (ie to use 
counting measure on 0,1,2,... or whatever to define our density) --- 
in which case the arbitrariness is resolved.  For continuous 
distributions we can't do that: the probability is zero for every set 
of data.  So we use density with respect to some measure, the choice of 
which measure leads to an arbitrary constant (as Peter said).  But the 
value of the constant doesn't matter -- and that really is the point.
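
A small illustration in the Gaussian case (a sketch): rescaling y
changes lnL by a constant that depends only on the change of measure,
not on the model, so it cancels in any deviance-type difference:

   set.seed(1)
   y <- rnorm(10); x <- rnorm(10)
   logLik(lm(y ~ x)) - logLik(lm(I(1000 * y) ~ x))
   ## equals 10 * log(1000), whatever the covariates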

I hope that's somehow clearer.

Best wishes,
David


[R] structured random effects

2004-02-06 Thread David Firth
Are there facilities in R or packages to estimate the parameters of a 
(generalized) linear mixed model like this one: the design is crossed, 
and response $y_{ij}$ relates to fixed and random effects through a 
linear predictor
\[
  \eta_{ij} = \beta x_{ij} + U_i - U_j
\]
where $U_1, U_2, \ldots$ are iid $N(0, \tau^2)$.

Any suggestions would be welcomed.

David

Professor David Firth
Dept of Statistics
University of Warwick
Coventry CV4 7AL
United Kingdom
Voice: +44 (0)247 657 2581
Fax:   +44 (0)247 652 4532
Web:   http://www.warwick.ac.uk/go/dfirth


Re: [R] structured random effects

2004-02-06 Thread David Firth
Thanks to Douglas and Spencer for their helpful replies.

I take that as a fairly authoritative no to my question!  The 
difficulty
in the model I mentioned is not only the crossed design, but the
homologous factors (ie i and j take the same values),
and U_i - U_j with the *same* U variable
appearing twice with different subscripts in the predictor.

Thanks again -- I am happy to know that If I work on this I'm not
reinventing what already exists.
Best regards,
David
On Friday, Feb 6, 2004, at 18:28 Europe/London, Douglas Bates wrote:

Spencer Graves [EMAIL PROTECTED] writes:

  Have you considered lme?  For applications like this, I highly
recommend Pinheiro and Bates (2000) Mixed-Effects Models in S and
S-Plus (Springer).  I failed to produce anything with lme until I
read this book. hope this helps.  spencer graves
Actually lme is better suited to nested designs than to crossed
designs and lme doesn't handle generalized linear mixed models.
GLMM from package lme4 does handle generalized linear mixed models but
does not handle crossed random effects easily.
I'm working on code that will handle nested, crossed and partially
crossed random effects but it will be some time before it appears in a
finished package.


David Firth wrote:

Are there facilities in R or packages to estimate the parameters of
a (generalized) linear mixed model like this one: the design is
crossed, and response $y_{ij}$ relates to fixed and random effects
through a linear predictor

\[
  \eta_{ij} = \beta x_{ij} + U_i - U_j
\]
where $U_1, U_2, \ldots$ are iid $N(0, \tau^2)$.
Any suggestions would be welcomed.

David

Professor David Firth
Dept of Statistics
University of Warwick
Coventry CV4 7AL
United Kingdom
Voice: +44 (0)247 657 2581
Fax:   +44 (0)247 652 4532
Web:   http://www.warwick.ac.uk/go/dfirth
--
Douglas Bates[EMAIL PROTECTED]
Statistics Department608/262-2598
University of Wisconsin - Madison
http://www.stat.wisc.edu/~bates/


Re: [R] warning associated with Logistic Regression

2004-01-26 Thread David Firth
On Sunday, Jan 25, 2004, at 18:06 Europe/London, (Ted Harding) wrote:

On 25-Jan-04 Guillem Chust wrote:
Hi All,

When I tried to do logistic regression (with high maximum number of
iterations) I got the following warning message
Warning message:
fitted probabilities numerically 0 or 1 occurred in: (if
(is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
As I checked from the Archive R-Help mails, it seems that this happens
when the dataset exhibits complete separation.
This is so. Indeed, there is a sense in which you are experiencing
unusually good fortune, since for values of your predictors in one
region you are perfectly predicting the 0s in your response, and for
values in another region you are perfectly predicting the 1s. What
better could you hope for?
However, you would respond that this is not realistic: your variables
are not (in real life) such that P(Y=1|X=x) is ever exactly 1 or
exactly 0, so this perfect prediction is not realistic.
In that case, you are somewhat stuck. The plain fact is that your
data (in particular the way the values of the X variables are 
distributed)
are not adequate to tell you what is happening.

There may be manipulative tricks (like penalised regression) which
would inhibit the logistic regression from going all the way to a
perfect fit; but, then, how would you know how far to let it go
(because it will certainly go as far in that direction as you allow
it to).
The key parameter in this situation is the dispersion parameter (sigma
in the usual notation). When you get perfect fit in a completely
separated situation, this corresponds to sigma=0. If you don't like
this, then there must be reasons why you want sigma > 0, and this may
imply that you have reasons for wanting sigma to be at least s0 (say),
or, if you are prepared to be Bayesian about it, you may be satisfied
that there is a prior distribution for sigma which would not allow
sigma=0, and would attach high probability to a range of sigma values
which you consider to be realistic.
Unless you have a fairly firm idea of what sort of values sigma is
likely to have, then you are indeed stuck, because you have no reason
to prefer one positive value of sigma to a different positive value
of sigma. In that case you cannot really object if the logistic
regression tries to make it as small as possible!
This seems arguable.  Accepting that we are talking about point 
estimation (the desirability of which is of course open to question!!), 
then old-fashioned criteria like bias, variance and mean squared error 
can be used as a guide.  For example, we might desire to use an 
estimation method for which the MSE of the estimated logistic 
regression coefficients (suitably standardized) is as small as 
possible; or some other such thing.

The simplest case is estimation of log(pi/(1-pi)) given an observation 
r from binomial(n,pi).  Suppose we find that r=n -- what then can we 
say about pi?  Clearly not much if n is small, rather more if n is 
large.  Better in terms of MSE than the MLE (whose MSE is infinite) is 
to use log(p/(1-p)), with p = (r+0.5)/(n+1).  See for example Cox & 
Snell's book on binary data.  This corresponds to penalizing the 
likelihood by the Jeffreys prior, a penalty function which has good 
frequentist properties also in the more general logistic regression 
context.  References given in the brlr package give the theory and some 
empirical evidence.  The logistf package, also on CRAN, is another 
implementation.
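
Numerically, for the r = n case just described (a sketch):

   r <- 10; n <- 10
   p <- (r + 0.5)/(n + 1)
   log(p/(1 - p))   # log(21), about 3.04: finite, unlike the MLE (+Inf)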

I do not mean to imply that the Jeffreys-prior penalty will be the 
right thing for all applications -- it will not.  (eg if you really do 
have prior information, it would be better to use it.)

In general I agree wholeheartedly that it is best to get more/better 
data!

In the absence of such reasons,
(cut)

All good wishes,
David


Re: [R] warning associated with Logistic Regression

2004-01-25 Thread David Firth
On Sunday, Jan 25, 2004, at 13:59 Europe/London, Guillem Chust wrote:

Hi All,

When I tried to do logistic regression (with high maximum number of
iterations) I got the following warning message
Warning message:
fitted probabilities numerically 0 or 1 occurred in: (if
(is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
As I checked from the Archive R-Help mails, it seems that this happens 
when
the dataset exhibits complete separation.
Yes, correct.

However, p-values tend to 1
The reported p-values cannot be trusted: the asymptotic theory on which 
they are based is not valid in such circumstances.

, and
residual deviance tends to 0.
Yes, this happens under complete separation: the model fits the 
observed 0/1 data perfectly.

My questions then is:
-Is the converged model correct?
Well, "converged" is not really the right word to use -- the iterative 
algorithm has diverged.  At least one of the coefficients has its MLE 
at infinity (or minus infinity).  In that sense what you see reported 
(ie large values of estimated log odds-ratios, which approximate 
infinity) is correct.  Still more correct would be estimates reported 
as Inf or -Inf: but the algorithm is not programmed to detect such 
divergence.

or
-Can I limit the number of iterations in order to avoid this warning?
Yes, probably, but this is not a sensible course of action.  The 
iterations are iterations of an algorithm to compute the MLE.  The MLE 
is not finite-valued, and the warning is a clue to that.

If you *really* want finite parameter estimates, the answer is not to 
use maximum likelihood as the method of estimation.  Various 
alternatives exist, mostly based on penalizing the likelihood [one such 
is in the brlr package, but there are others].  As a general principle 
surely it's better to maximize a different criterion (eg a penalized 
likelihood, with a purposefully chosen penalty function) rather than 
stop the MLE algorithm prematurely and arbitrarily?

I hope this helps!

David

Professor David Firth
Dept of Statistics
University of Warwick
Coventry CV4 7AL
United Kingdom
Email: [EMAIL PROTECTED]
Voice: +44 (0)247 657 2581
Fax:   +44 (0)247 652 4532


Re: [R] lm function

2004-01-22 Thread David Firth
If the model is called mymodel, you could do

  summary(mymodel)$coefficients[,2]

or

  sqrt(diag(vcov(mymodel)))

This gives the estimated standard errors of the coefficients, which I 
think is what you wanted.

David

On Thursday, Jan 22, 2004, at 20:35 Europe/London, jenny smith wrote:

Hi,
 How can I extract the standard deviation of the coefficients when
 using the lm function?  I know $coefficients gives me the
 individual betas.  Thank you.




Re: [R] ? data.entry read-only ?

2004-01-12 Thread David Firth
The function showData() in the relimp package does something like this  
(and has a go at your bonus question), in a Tk widget.  It's far from a  
polished product, but gives some idea of what can be done using the  
tcltk package.  Usage is, for example,

data(trees)
showData(trees)
It will be very slow with very large datasets, but fine for smaller  
ones.

I just discovered, though, that showData() has at least one serious  
bug, arising from changes (quite a while ago now! I should have mended  
this earlier) in the tcltk package.  I'll put a new version of relimp  
on CRAN soon, and in the meantime I have put a fixed version of  
showData() at
http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic/firth/software/relimp/showdata.r
--  In case it's of use.

David

On Monday, Jan 12, 2004, at 20:19 Europe/London, (Ted Harding) wrote:

Hi Folks,

The spreadsheet-like layout displayed when 'data.entry' is
invoked is very useful simply for legible display of data,
quite apart from its intended use for the purpose of entering
or editing data.
If one wants to use it _simply_ as a display device, so that
one can look around inside a data-set while working on it,
then it is not a good idea to have its _editing_ capabilities
active: this is dangerous, since an inadvertent keystroke
could change the data.
So: is there a way to invoke 'data.entry' in read-only
mode? Or some other function with equivalent display which
can be run read-only?
[Bonus question: is there a way to change its cosmetics,
 e.g. background colour, font, ... ?]
With thanks,
Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 167 1972
Date: 12-Jan-04   Time: 20:19:58
-- XFMail --
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!  
http://www.R-project.org/posting-guide.html
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] matching of arguments in ...?

2003-11-05 Thread David Firth
I am puzzled by this (with R --vanilla):

   test <- function(formula, ...) lm(formula, ...)
   test(1:4 ~ 1, offset = rep(1, 4))
  Error in eval(expr, envir, enclos) : ..1 used in an incorrect
context, no ... to look in
   test(1:4 ~ 1, weights = rep(1, 4))
  Error in eval(expr, envir, enclos) : ..1 used in an incorrect
context, no ... to look in
   test(1:4 ~ 1, x = TRUE)

  Call:
  lm(formula = formula, x = TRUE)

  Coefficients:
  (Intercept)
          2.5
Some arguments (such as x) seem to pass willingly through ..., while 
others (such as offset and weights) do not.  Same thing happens with 
glm.  I haven't experimented more widely.
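
[Editorial aside, not from the original thread: a standard workaround
is to rebuild the call with match.call() and evaluate it in the parent
frame, so that lm() receives weights and offset as named arguments in
its own call.  A sketch:]

test2 <- function(formula, ...) {
    cl <- match.call()       # the call, with arguments matched by name
    cl[[1L]] <- quote(lm)    # swap test2 for lm
    eval(cl, parent.frame())
}
test2(1:4 ~ 1, weights = rep(1, 4))   # no error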

Can some kind soul offer an explanation?

Thanks,
David
 > version
 _
platform powerpc-apple-darwin6.7.5
arch powerpc
os   darwin6.7.5
system   powerpc, darwin6.7.5
status
major1
minor8.0
year 2003
month10
day  08
language R
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] logistic regression for a data set with perfect separation

2003-09-10 Thread David Firth
On Wednesday, Sep 10, 2003, at 18:50 Europe/London, Christoph Lehmann 
wrote:

Dear R experts

I have the follwoing data
  V1 V2
1 -5.800  0
2 -4.800  0
3 -2.867  0
4 -0.867  0
5 -0.733  0
6 -1.667  0
7 -0.133  1
8  1.200  1
9  1.333  1
and I want to know, whether V1 can predict V2: of course it can, since
there is a perfect separation between cases 1..6 and 7..9
How can I test whether this conclusion (being able to assign an
observation i to class j, knowing only its value on variable V1) holds
also for the population our data were drawn from?
For this you really need more data.  The only way you'll ever be able 
to reject that hypothesis is by finding an instance of 010 or 101 in 
the (ordered by V1) sample.  And if you find such then you can reject 
with certainty.

That is: which inference procedure is recommended? Logistic regression,
for obvious reasons, makes no sense.
Not so obvious to me!  Logistic regression still makes sense, but care 
is needed in the method of estimation/inference.  The maximum 
likelihood solution in the above case is a model which says V2 is 1 
with certainty at some values of V1, and is zero with certainty at 
other values; and that seems an unwarranted inference with so little 
data.  That's a criticism of maximum likelihood, rather than a 
criticism of logistic regression.  (Think about the more extreme 
situation of tossing a coin once: if a head is observed, the ML 
solution is that the coin lands heads with certainty, ie that there no 
chance of tails.)

There are alternative (Bayesian and pseudo-Bayesian) methods of 
inference which can yield more sensible answers in general.  [One such 
is implemented in package brlr (bias reduced logistic regression) on 
CRAN.]  To test the hypothesis described above, though, with the data 
you have, would seem to require a fully Bayesian analysis whose 
conclusions would depend strongly on the prior probability attached to 
the hypothesis.  ie you need more data...
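
[Editorial illustration: fitting the posted data by straight maximum
likelihood shows the symptoms described above -- the "fitted
probabilities numerically 0 or 1" warning, huge coefficients and
standard errors, and a residual deviance of essentially zero:]

V1 <- c(-5.800, -4.800, -2.867, -0.867, -0.733, -1.667, -0.133,
        1.200, 1.333)
V2 <- c(0, 0, 0, 0, 0, 0, 1, 1, 1)
fit <- glm(V2 ~ V1, family = binomial)   # warns about separation
summary(fit)   # expect very large estimates and standard errors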

I hope that helps in some way!

Regards,
David

Many thanks for your help

Christoph
--
Christoph Lehmann [EMAIL PROTECTED]
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] Tobit analysis

2003-07-16 Thread David Firth
Gaussian and normal are two different words for the same thing.

The example at
http://www.biostat.wustl.edu/archives/html/s-news/1999-06/msg00125.html
should give you plenty of clues.
David
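
[A minimal sketch along those lines, added editorially; the simulated
data and variable names are invented, and censoring is at a known upper
bound U, which in survival terms is right-censoring:]

library(survival)
set.seed(1)
x <- rnorm(100)
ystar <- 1 + 2 * x + rnorm(100)    # latent response
U <- 3                             # upper censoring bound
y <- pmin(ystar, U)                # observed response
event <- as.numeric(ystar < U)     # 1 = observed exactly, 0 = censored
fit <- survreg(Surv(y, event, type = "right") ~ x, dist = "gaussian")
summary(fit)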

On Wednesday, Jul 16, 2003, at 16:26 Europe/London, Andrew Barnes wrote:

Having read previous correspondence on this topic, am I right in using a
gaussian distribution for a tobit model? One article suggests a normal
distribution.

Also, I want to censor at the upper bound, so, using the survival5 
package I use:

survreg(Surv(y, yc, type = "right") ~ x) for a censored regression.

Could anybody who's had experience of this, confirm whether I'm in the 
ballpark?

Grateful thanks
Andrew Barnes
===
Dr Andrew Barnes
Agricultural Management Economist
Modelling and Strategy Group
Land Economy Research Department
Research Division
SAC
West Mains Road
Edinburgh
EH9 3JG
Tel: 0131 535 4042    E.Mail: [EMAIL PROTECTED]
Fax: 0131 667 2601    Mobile: 07811 935 022
Website: http://www.sac.ac.uk/management/external/default.htm

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] packaged datasets in .csv format (David Firth)

2003-07-10 Thread David Firth
Many thanks to those who replied to my question.

Dirk's suggestion, to use a .R file in the data directory of the  
package, specifying how the .csv should be read, works fine as an  
answer to the question about making comma-separated files available.
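
[Editorial sketch of that suggestion; the file names are hypothetical.
A file such as data/mydata.R is sourced by data() with the working
directory temporarily set to data/, so it can read a sibling .csv file
with whatever options are needed:]

## Hypothetical contents of data/mydata.R in the package source
mydata <- read.table("mydata.csv", header = TRUE, sep = ",",
                     row.names = 1)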

Uwe's answer to my other question (; vs ,), ie compatibility with  
existing R packages, is well taken!

Cheers,
David
On Thursday, Jul 10, 2003, at 12:25 Europe/London, Uwe Ligges wrote:

Andreas Christmann wrote:

Message: 1
Date: Wed, 9 Jul 2003 10:53:27 +0100
From: David Firth [EMAIL PROTECTED]
Subject: [R] packaged datasets in .csv format
To: [EMAIL PROTECTED]
Message-ID:
[EMAIL PROTECTED]
Content-Type: text/plain; charset=US-ASCII; format=flowed
A couple of questions in connection with using .csv format to  
include data in a package:

First, the background.  The data() function loads data from .csv  
(comma-separated values) files using

   read.table(..., header = TRUE, sep = ";")

But ?read.table says

  ## To write a CSV file for input to Excel one might use
  write.table(x, file = "foo.csv", sep = ",", col.names = NA)
  ## and to read this file back into R one needs
  read.table("file.csv", header = TRUE, sep = ",", row.names = 1)
As a result, .csv files created by write.table() as above are not  
read in by data() in the way that might be expected [that is,  
expected by someone who had not read help(data)!]

Two questions, then:
-- is there some compelling reason for the use of `sep = ";"' in  
place of `sep = ",", row.names = 1'?
Do you really want an answer?
Today, one reason is compatibility with all the other packages on CRAN.

I prefer ";" instead of ",", because in text variables there are  
often ",".
That's why text variables can be quoted.


-- if I want to maintain a dataset in .csv format, for use both in R  
and in other systems such as Excel, SPSS, etc, what is the best way  
to go about it?
When regularly using that many systems on the same data sets, it might  
be worth using a database system, e.g. MySQL.

BTW: R *and* Excel *and* (for sure, but I haven't tested) also SPSS  
can read a couple of different ASCII formatted files, so there are  
quite a lot of possible formats.

Uwe Ligges


Depends. Perhaps it is best to check it out for the software packages
and the versions of the software packages you are using.

Andreas Christmann
Any advice would be much appreciated.

Cheers,
David
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] packaged datasets in .csv format

2003-07-09 Thread David Firth
A couple of questions in connection with using .csv format to include 
data in a package:

First, the background.  The data() function loads data from .csv 
(comma-separated values) files using

  read.table(..., header = TRUE, sep = ";")

But ?read.table says

 ## To write a CSV file for input to Excel one might use
 write.table(x, file = "foo.csv", sep = ",", col.names = NA)
 ## and to read this file back into R one needs
 read.table("file.csv", header = TRUE, sep = ",", row.names = 1)
As a result, .csv files created by write.table() as above are not read 
in by data() in the way that might be expected [that is, expected by 
someone who had not read help(data)!]

Two questions, then:
-- is there some compelling reason for the use of `sep = ";"' in place 
of `sep = ",", row.names = 1'?
-- if I want to maintain a dataset in .csv format, for use both in R 
and in other systems such as Excel, SPSS, etc, what is the best way to 
go about it?

Any advice would be much appreciated.

Cheers,
David
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] CGIwithR for IIS

2003-04-02 Thread David Firth
CGIwithR works only on unix-like operating systems.  Not Windows.

David

On Wednesday, Apr 2, 2003, at 14:12 Europe/London, Gianluca Emireni 
wrote:

Hi, I have a (maybe stupid) question... Can the CGIwithR package be
installed in R for Windows to work with a Microsoft IIS web server? Or
do I need other libraries?
Thank you, Gianluca.

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] CGIwithR version 0.40

2003-01-29 Thread David Firth
A new version of the CGIwithR package is now at CRAN.  Version 0.40 has 
the following improvements:

-- more comprehensive decoding of form data
-- images are no longer cached by browsers
-- buffering of the output page to avoid server/browser sync problems

Still for unix-like systems only.


David Firth
Nuffield College
Oxford OX1 1NF
United Kingdom

Phone +44 1865 278544
Fax   +44 1865 278621
http://www.stats.ox.ac.uk/~firth

__
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] Overdispersed poisson - negative observation

2003-01-16 Thread David Firth
Peter

You are right to suppose that negative observations should not be a 
problem for a quasi-Poisson fit (though negative *fitted* values would 
be, in a log-linear model).

The problem is that the quasipoisson function is overly restrictive, 
in requiring non-negativity of the y variable.  (commenting out the 
error trap is not enough to solve the problem).  There are two places 
where quasipoisson has problems:

dev.resids <- function(y, mu, wt) 2 * wt * (y * log(ifelse(y ==
    0, 1, y/mu)) - (y - mu))

fails for negative values of y, while

initialize <- expression({
    if (any(y < 0)) stop(paste("Negative values not allowed for",
        "the quasiPoisson family"))
    n <- rep(1, nobs)
    mustart <- y + 0.1
})

would (without the error trap) return negative initial values for 
means, which won't work with a log link.

The first problem is the hard one: the deviance isn't defined when y is 
negative.  But an alternative is to use the Pearson X^2 statistic, 
which is defined for all values of y (and makes more sense anyway for 
quasi-likelihood models -- it's what you use to determine the dispersion 
constant).
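
[Editorially, in code: for a fitted glm object `fit', the Pearson-based
dispersion estimate is]

sum(residuals(fit, type = "pearson")^2) / df.residual(fit)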

The second problem is easy to fix: for example, just take as starting 
values the global mean (corresponding to zero-valued effects in the 
model).  I often find that this is a good starting point anyway -- 
better than the default choice, which typically is not in the model 
space (even when there are no negative values of y).

I give below an amended quasipoisson which seems to work, ie it fits 
the model by the quasi-likelihood method with variance proportional to 
mean.  It makes only the two changes just described.  A *slightly* 
unsatisfactory aspect is that quantities reported as deviance are in 
fact Pearson X^2 statistics.  But no other choice really makes sense 
there.

David

quasipoisson <- function (link = "log")
## Amended by David Firth, 2003.01.16, at points labelled ###
## to cope with negative y values
##
## Computes Pearson X^2 rather than Poisson deviance
##
## Starting values are all equal to the global mean
{
    linktemp <- substitute(link)
    if (!is.character(linktemp)) {
        linktemp <- deparse(linktemp)
        if (linktemp == "link")
            linktemp <- eval(link)
    }
    if (any(linktemp == c("log", "identity", "sqrt")))
        stats <- make.link(linktemp)
    else stop(paste(linktemp, "link not available for poisson",
        "family; available links are \"identity\", \"log\" and \"sqrt\""))
    variance <- function(mu) mu
    validmu <- function(mu) all(mu > 0)
    dev.resids <- function(y, mu, wt) wt * (y - mu)^2 / mu   ###
    aic <- function(y, n, mu, wt, dev) NA
    initialize <- expression({
        n <- rep(1, nobs)
        mustart <- rep(mean(y), length(y))   ###
    })
    structure(list(family = "quasipoisson", link = linktemp,
        linkfun = stats$linkfun, linkinv = stats$linkinv,
        variance = variance, dev.resids = dev.resids, aic = aic,
        mu.eta = stats$mu.eta, initialize = initialize,
        validmu = validmu, valideta = stats$valideta),
        class = "family")
}
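
[Editorial usage sketch, assuming the amended quasipoisson() above
masks the built-in family and that d.AD is the data frame constructed
in the quoted question below:]

fit <- glm(paym ~ alpha + beta, family = quasipoisson(link = "log"),
           data = d.AD)
summary(fit)   # dispersion is estimated from the Pearson X^2 statistic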


On Thursday, Jan 16, 2003, at 15:45 Europe/London, 
[EMAIL PROTECTED] wrote:

Dear R users

I have been looking for functions that can deal with overdispersed 
poisson
models. Some (one) of the observations are negative. According to 
actuarial
literature (England & Verrall, Stochastic Claims Reserving in General
Insurance, Institute of Actuaries 2002) this can be handled through 
the
use of quasi likelihoods instead of normal likelihoods. The presence of
negatives is not normal in a poisson model; however, we see them 
frequently
in this type of data, and we would like to be able to fit the model 
anyway.

At the moment R is complaining about negative values and the link 
function
= log.

My code looks like this. Do any of you know if this problem can be 
solved
in R? Any suggestions are welcomed.

Kind regards,

Peter Fledelius (new R user)

*** Code ***
paym <- c(5012, 3257, 2638,  898, 1734, 2642, 1828,  599,   54,  172,
 106, 4179, , 5270, 3116, 1817, -103,  673,  535,
3410, 5582, 4881, 2268, 2594, 3479,  649,  603,
5655, 5900, 4211, 5500, 2159, 2658,  984,
1092, 8473, 6271, 6333, 3786,  225,
1513, 4932, 5257, 1233, 2917,
 557, 3463, 6956, 1368,
1351, 5596, 6165,
3133, 2262,
2063)
alpha <- factor(c(1,1,1,1,1,1,1,1,1,1,
 2,2,2,2,2,2,2,2,2,
 3,3,3,3,3,3,3,3,
 4,4,4,4,4,4,4,
 5,5,5,5,5,5,
 6,6,6,6,6,
 7,7,7,7,
 8,8,8,
 9,9,
 10))
beta <- factor(c(1,2,3,4,5,6,7,8,9,10,
 1,2,3,4,5,6,7,8,9,
 1,2,3,4,5,6,7,8,
 1,2,3,4,5,6,7,
 1,2,3,4,5,6,
 1,2,3,4,5,
 1,2,3,4,
 1,2,3,
 1,2,
 1))
d.AD <- data.frame(paym, alpha, beta)
glm.qD93 <- glm