Re: [R] scope argument in step function

2005-07-02 Thread Prof Brian Ripley
  What I wondered is: if I pass 'upper=~.', it seems step() thinks the
 full model is the current one, and does not add any more terms. If this
 is the right answer, is there a better way than creating the fmla
 argument as above?

Yes, that is exactly what the help page for step says it means. (So why 
are you unsure and asking?)

upper = terms(Response ~ ., data=ds3)

would appear to be a simpler way to get your intention.
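For example, a sketch of that suggestion (assuming the poster's data frame
ds3, whose last column is Response): terms() expands the "." against ds3, so
step() sees the true full model instead of treating the current
intercept-only fit as the upper limit.

```r
## terms() expands "~ ." against the columns of ds3 up front,
## so the 'upper' scope really is the full model.
fit.0 <- lm(Response ~ 1, data = ds3)
step(fit.0,
     scope = list(upper = terms(Response ~ ., data = ds3), lower = ~1))
```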


On Fri, 1 Jul 2005, Young Cho wrote:

 Thanks a lot for help in advance. I am switching from matlab to R and I 
 guess I need some time to get rolling. I was wondering why this code :

 fit.0 <- lm( Response ~ 1, data = ds3)
 step(fit.0, scope=list(upper=~., lower=~1), data=ds3)
 Start:  AIC= -32.66
 Response ~ 1

 Call:
 lm(formula = Response ~ 1, data = ds3)
 Coefficients:
 (Intercept)
  1.301


 is not working differently from the following:


 cnames <- dimnames(ds3)[[2]]
 cnames <- cnames[-444]# last col is Response

 fmla <- as.formula(paste("~", paste(cnames, collapse="+")))
 step(fit.0, scope=list(upper=fmla, lower=~1), data=ds3)
 Start:  AIC= -32.66
 Response ~ 1
 fmla <- as.formula(paste("~", paste(cnames, collapse="+")))
 fit.s <- step(fit.0, scope=list(upper=fmla, lower=~1), data=ds3)

 Step:  AIC= -Inf
 Response ~ ENTP9324 + CH1W0281
             Df Sum of Sq     RSS  AIC
  <none>                        0 -Inf
  - CH1W0281  3   0.00381 0.00381 -115
  - ENTP9324  9         1       1  -34

 The dataframe ds3 is 17 by 444 and I understand it is not a smart thing
 to run stepwise regression. What I wondered is: if I pass 'upper=~.',
 it seems step() thinks the full model is the current one, and does not
 add any more terms. If this is the right answer, is there a better way
 than creating the fmla argument as above?


-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK    Fax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Generating correlated data from uniform distribution

2005-07-02 Thread Ken Knoblauch
While you are looking at weird distributions, here is one that 
 we have used in experiments on noise masking to explore the 
 bandwidth of visual mechanisms 

D'Zmura, M., & Knoblauch, K. (1998). Spectral bandwidths for the detection of
color. Vision Research, 20, 3117-28; and
Monaci, G., Menegaz, G., Susstrunk, S., & Knoblauch, K. (2004). Chromatic
contrast detection in spatial chromatic noise. Visual Neuroscience, 21(3),
291-294.

N <- 1e4
x <- runif(N, -.5, .5)
y <- runif(N, -abs(x), abs(x))
plot(x, y)

y is not uniform marginally, but it is uniform conditional on x.  The plot
reveals why we called this "sectored noise".

HTH

ken


Jim Brennan <jfbrennan at rogers.com> writes:

 Yes you are right I guess this works only for normal data. Free advice
 sometimes comes with too little consideration :-)

Worth every cent...

 Sorry about that and thanks to Spencer for the correct way.

Hmm, but is it? Or rather, what is the relation between the
correlation of the normals  and that of the transformed variables? 
Looks nontrivial to me.

Incidentally, here's a way that satisfies the criteria, but in a
rather weird way:

N <- 1e4
rho <- .6
x <- runif(N, -.5, .5)
y <- x * sample(c(1, -1), N, replace=TRUE, prob=c((1+rho)/2, (1-rho)/2))

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - (p.dalgaard at biostat.ku.dk)  FAX: (+45) 35327907


Ken Knoblauch
Inserm U371, Cerveau et Vision
Department of Cognitive Neurosciences
18 avenue du Doyen Lepine
69500 Bron
France
tel: +33 (0)4 72 91 34 77
fax: +33 (0)4 72 91 34 61
portable: 06 84 10 64 10
http://www.lyon.inserm.fr/371/



[R] how to call sas in R

2005-07-02 Thread Nongluck Klibbua
Hello all,
I would like to know how to call SAS code from R. I simulate data in R and
need to use SAS code (GARCH-t, EGARCH and GJR) to estimate it. I need to run
500 simulations with 2000 observations each. How can I call that code from
R? Also, how can I keep the parameters from each estimate?

j=1:500
i=1:2000
sas code
keep parameters.
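One common approach (a sketch, not a definitive answer: there is no direct
R-to-SAS bridge, so write each simulated series to a file, run SAS in batch
mode via system(), and read the estimates back; the file names, the SAS
program garch.sas, its output est.txt, and the exact command line are all
assumptions depending on your SAS installation):

```r
## Hypothetical loop: hand each simulated series to SAS in batch mode
## and collect the estimated parameters afterwards.
nsim <- 500
nobs <- 2000
est <- vector("list", nsim)
for (j in 1:nsim) {
  x <- rnorm(nobs)                 # replace with your simulated series
  write.table(x, "simdata.txt", row.names = FALSE, col.names = FALSE)
  system("sas garch.sas")          # garch.sas reads simdata.txt, writes est.txt
  est[[j]] <- scan("est.txt")      # keep the parameters from this run
}
pars <- do.call(rbind, est)        # one row of estimates per simulation
```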

Best Appreciate,
Luck



Re: [R] Generating correlated data from uniform distribution

2005-07-02 Thread Peter Dalgaard
Jim Brennan [EMAIL PROTECTED] writes:

 OK now I am skeptical especially when you say in a weird way:-)
 This may be OK but look at plot(x,y) and I am suspicious. Is it still
 alright with this kind of relationship?
...
 N <- 1e4
 rho <- .6
 x <- runif(N, -.5, .5)
 y <- x * sample(c(1, -1), N, replace=TRUE, prob=c((1+rho)/2, (1-rho)/2))

Well, the covariance is (everything has mean zero, of course)

E(XY) = (1+rho)/2*E(X^2) + (1-rho)/2*E(X*(-X)) = rho*E(X^2)

The marginal distribution of Y is a mixture of two identical uniforms
(X and -X) so is uniform and in particular has the same variance as X.

In summary,  E(XY)/sqrt(E(X^2)*E(Y^2)) == rho

So as I said, it satisfies the formal requirements. X and Y are
uniformly distributed and their correlation is rho. 

If for nothing else, I suppose that this example is good for
demonstrating that independence and uncorrelatedness are not the same
thing. 
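A quick empirical check of the construction (my addition, not from the
original post): with a large sample the observed correlation lands near rho,
and the marginal of y is visibly uniform.

```r
## Verify cor(x, y) ~ rho and the uniform marginal of y.
set.seed(1)
N   <- 1e5
rho <- .6
x <- runif(N, -.5, .5)
y <- x * sample(c(1, -1), N, replace = TRUE,
                prob = c((1 + rho)/2, (1 - rho)/2))
cor(x, y)    # close to 0.6
hist(y)      # marginally uniform on (-.5, .5)
```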

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [R] Is it possible to use glm() with 30 observations?

2005-07-02 Thread David Firth
On 2 Jul 2005, at 06:01, Spencer Graves wrote:

 The issue is not 30 observations but whether it is possible to
 perfectly separate the two possible outcomes.  Consider the following:

 tst.glm <- data.frame(x=1:3, y=c(0, 1, 0))
 glm(y~x, family=binomial, data=tst.glm)

 tst2.glm <- data.frame(x=1:1000,
   y=rep(0:1, each=500))
 glm(y~x, family=binomial, data=tst2.glm)

 The algorithm fits y~x to tst.glm without complaining, but issues
 warnings for tst2.glm.  This is called the Hauck-Donner
 effect, and RSiteSearch("Hauck-Donner") just now produced 8 hits.  For
 more information, look for Hauck-Donner in the index of Venables, W.
 N. and Ripley, B. D. (2002) _Modern Applied Statistics with S._ New
 York: Springer.

Not exactly.  The phenomenon that causes the warning for tst2.glm above 
is more commonly known as complete separation.  For some comments on 
its implications you might look at another work by B D Ripley, the 1996 
book Pattern Recognition and Neural Networks.  There are some further 
references in the help files of the brlr package on CRAN.

The problem noted by Hauck and Donner (1977, JASA) is slightly related, 
but not the same.  See the aforementioned book by Venables and Ripley, 
for example.  The glm function does not routinely warn us about the 
Hauck-Donner effect, afaik.

The original poster did not say what the purpose of the logistic 
regression was, so it is hard to advise.  Depending on the purpose, the 
separation that was detected may or may not be a problem.

Regards,
David

 (If you don't already have this book, I recommend you
 give serious consideration to purchasing a copy.  It is excellent on
 many issues relating to statistical analysis and R.)

 Spencer Graves

 Kerry Bush wrote:

 I have a very simple problem. When using glm to fit
 binary logistic regression model, sometimes I receive
 the following warning:

 Warning messages:
 1: fitted probabilities numerically 0 or 1 occurred
 in: glm.fit(x = X, y = Y, weights = weights, start =
 start, etastart = etastart,
 2: fitted probabilities numerically 0 or 1 occurred
 in: glm.fit(x = X, y = Y, weights = weights, start =
 start, etastart = etastart,

 What does this output tell me? Since I only have 30
 observations, i assume this is a small sample problem.
 Is it possible to fit this model in R with only 30
 observations? Could any expert provide suggestions to
 avoid the warning?


 -- 
 Spencer Graves, PhD
 Senior Development Engineer
 PDF Solutions, Inc.
 333 West San Carlos Street Suite 700
 San Jose, CA 95110, USA

 [EMAIL PROTECTED]
 www.pdf.com http://www.pdf.com
 Tel:  408-938-4420
 Fax: 408-280-7915



Re: [R] Is it possible to use glm() with 30 observations?

2005-07-02 Thread Ted Harding
On 02-Jul-05 Kerry Bush wrote:
 I have a very simple problem. When using glm to fit
 binary logistic regression model, sometimes I receive
 the following warning:
 
 Warning messages:
 1: fitted probabilities numerically 0 or 1 occurred
 in: glm.fit(x = X, y = Y, weights = weights, start =
 start, etastart = etastart,  
 2: fitted probabilities numerically 0 or 1 occurred
 in: glm.fit(x = X, y = Y, weights = weights, start =
 start, etastart = etastart,  
 
 What does this output tell me? Since I only have 30
 observations, i assume this is a small sample problem.

It isn't. Spencer Graves has shown clearly with two examples
that you can get a fit with no warnings with 3 observations,
and a fit with warnings for 1000 observations. As he says,
it arises when you get perfect separation with respect to
the linear model.

However, it may be worth expanding Spencer's explanation.
With a single explanatory variable x (as in Spencer's examples),
perfect separation occurs when y = 0 for all x <= some x0,
and y = 1 for all x > x0.

One of the parameters in the linear model is the scale parameter
(i.e. the reciprocal of the slope). If you express the model
in the form

  logit(P(Y=1;x)) = (x - mu)/sigma

then sigma is the scale parameter in question.

As sigma -> 0, P(Y=1;x) -> 0 for x < mu, and -> 1 for x > mu.

Therefore, for any value of mu between x0 (at and below which
all y=0 in your data) and x1 (the next larger value of x, at
and above which all y=1), letting sigma -> 0 gives a fit
which perfectly predicts your y-values: it predicts P(Y=1) = 0,
i.e. P(Y=0) = 1, for x < mu, and predicts P(Y=1) = 1 for x > mu;
and this is exactly what you have observed in the data.

So it's not a disaster -- in model-fitting terms, it is a
resounding success!

However, in real life one does not expect to be dealing with
a situation where the outcomes are so perfectly predictable,
and therefore one views such a result with due mistrust.
One attributes the perfect separation not to perfect
predictability, but to the possibility that, by chance,
all the y=0 occur at lower values of x, and all the y=1
at higher values of x.
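The scenario described above is easy to reproduce (a small sketch of my own,
not from the thread): with perfectly separated data the fitted slope 1/sigma
diverges, and glm() emits exactly the warning quoted earlier.

```r
## x-values 1..5 all have y = 0 and 6..10 all have y = 1:
## perfect separation, so the ML slope is effectively infinite.
sep <- data.frame(x = 1:10, y = rep(0:1, each = 5))
fit <- glm(y ~ x, family = binomial, data = sep)
## glm() warns: "fitted probabilities numerically 0 or 1 occurred"
coef(fit)            # enormous slope, i.e. sigma-hat essentially 0
range(fitted(fit))   # fitted probabilities essentially 0 and 1
```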

 Is it possible to fit this model in R with only 30
 observations? Could any expert provide suggestions to
 avoid the warning?

Yes! As Spencer showed, it is possible with 3 -- but of
course it depends on the outcomes y.

As to a suggestion to avoid the warning -- what you really
need to avoid is data where the x-values are so sparse in
the neighbourhood of the P(Y=1;x) = 0.5 area that it becomes
likely that you will get y-values showing perfect separation.

What that means in practice is that, over the range of x-values
such that P(Y=1;x) rises from (say) 0.2 to 0.8 (chosen for
illustration), you should have several x values in your data.
Then the phenomenon of perfect separation becomes unlikely.

But what that means in real life is that you need enough data,
over the relevant range of x-values, to enable you to obtain
(with high probability) a non-zero estimate of sigma (i.e. an
estimate of slope 1/sigma which is not infinite) -- i.e. that
you have enough data, and in the right places, to estimate the
rate at which the probability increases from low to high values.

(Theoretically, it is possible that you get perfect separation
even with well-distributed x-values and sigma > 0; but with
well-distributed x-values the chance that this would occur is so
small that it can be neglected).

So, to come back to your particular case, the really meaningful
suggestion for avoiding the warning is that you need better
data. If your study is such that you have to take the x-values
as they come (as opposed to a designed experiment where you
can decide what they are to be), then this suggestion boils
down to "get more data".

What that would mean depends on having information about the
smallest value of sigma (largest value of slope) that is
*plausible* in your context. Your data are not particularly
useful, since they positively encourage adopting sigma=0.
So objective information about this could only come from
independent knowledge.

However, as a rule of thumb, in such a situation I would try
to get more data until I had, say, 10 x-values roughly evenly
distributed between the largest for which y=0 and the smallest
for which y=1. If that didn't work first time, then repeat
using the extended data as starting-point.

Or simply sample more data until the phenomenon of perfect
separation was well avoided, and the S.D. of the x-coefficient
was distinctly smaller than the value of the x-coefficient.

Hoping this helps,
Ted.



E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 02-Jul-05   Time: 10:45:04
-- XFMail --


Re: [R] Generating correlated data from uniform distribution

2005-07-02 Thread Ted Harding
On 02-Jul-05 Peter Dalgaard wrote:
 Jim Brennan [EMAIL PROTECTED] writes:
 
 OK now I am skeptical especially when you say in a weird way:-)
 This may be OK but look at plot(x,y) and I am suspicious. Is it still
 alright with this kind of relationship?
 ...
 N <- 1e4
 rho <- .6
 x <- runif(N, -.5, .5)
 y <- x * sample(c(1, -1), N, replace=TRUE, prob=c((1+rho)/2, (1-rho)/2))
 
 Well, the covariance is (everything has mean zero, of course)
 
 E(XY) = (1+rho)/2*E(X^2) + (1-rho)/2*E(X*(-X)) = rho*E(X^2)
 
 The marginal distribution of Y is a mixture of two identical uniforms
 (X and -X) so is uniform and in particular has the same variance as X.
 
 In summary,  E(XY)/sqrt(E(X^2)*E(Y^2)) == rho
 
 So as I said, it satisfies the formal requirements. X and Y are
 uniformly distributed and their correlation is rho. 
 
 If for nothing else, I suppose that this example is good for
 demonstrating that independence and uncorrelatedness are not the same
 thing.

That was a nice sneaky solution! I was toying with something similar,
but less sneaky, until I saw Peter's, on the lines of

  x <- runif(2*N, -0.5, 0.5); ix <- (N-k):(N+k); y <- x; y[ix] <- -y[ix]

(which makes the same point about independence and correlation).
The larger k as a fraction of N, the more you swing from rho = 1
to rho = -1, but you cannot achieve, as Peter did, an arbitrary
correlation coefficient rho since the value depends on k which
can only take discrete values.

Another approach which leads to a less special joint distribution
is

  x <- sort(runif(N, -0.5, 0.5)); y <- sort(runif(N, -0.5, 0.5))

followed by a rho-dependent permutation of y. I'm still pondering
a way of choosing the permutation so as to get a desired rho.

The extremes are the identity, which for a given sample will
give as close as you can get to rho = +1, and reversal, which
gives as close as you can get to rho = -1.

However, the maximum theoretical rho which you can get (as opposed
to what is possible for particular samples, which may get arbitrarily
close to +1) depends on N. For instance, with N=3, it looks as
though the theoretical rho is about 0.9 with the identity
permutation (for N=1000, however, just about all samples give
rho > 0.99).
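A sketch of that sort-and-permute idea (my illustration, not from the
thread): sorting both samples gives a correlation near +1, and fully
reversing one of them flips it to near -1.

```r
## Identity permutation vs full reversal of the sorted y sample.
set.seed(1)
N <- 1000
x <- sort(runif(N, -0.5, 0.5))
y <- sort(runif(N, -0.5, 0.5))
cor(x, y)          # identity permutation: close to +1
cor(x, rev(y))     # full reversal: close to -1
```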

I smell a source of interesting exam questions ...

Over to you!

Best wishes,
Ted.



E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 02-Jul-05   Time: 12:22:09
-- XFMail --



[R] interrupted Y axis

2005-07-02 Thread Nick Drew
I did not find an answer to my question after a quick
search using the R
search engine so thought I'd ask away:

Does anyone know if there is a function to create
an interrupted Y axis?
What I mean by interrupted Y axis is that part of the
Y axis has been
removed or excised to permit one to see parts of the
data in more detail.

Perhaps an example will make this clear. Please go to
http://www.jbc.org/cgi/reprint/274/41/28950 and open
the PDF document
located there. Go to page 4, figure 2c provides a
crude example of what I
mean by interrupted Y axis. Part of the Y axis between
800 and 4500 has been
removed to permit easy inspection of the upper end of
the range of data.
(This is not my work but simply an example of what I'm
trying to describe.)
One finds these interrupted Y axis graphs in
newspapers or other
periodicals, more often than not as a bar chart.

Does a function in R exist to permit one to do this easily
in a graph? If not,
would such a function be useful? If yes, would the
grid package be the right
tool for me to try and implement this?
~Nick



[R] Cluster of iris data set from Mahalanobis distances

2005-07-02 Thread Jose Claudio Faria
Dear R list,

My question:

I'm trying to calculate Mahalanobis distances for 'Species' from the iris data 
set as below:

cat('\n')
cat('Cluster analysis of iris data\n')
cat('from Mahalanobis distances obtained from SAS\n')
cat('\n')

n   = 3
dat = c(0,
   89.86419,   0,
  179.38471,17.20107,   0)

D = matrix(0, n, n)

nam = c('Set', 'Ver', 'Vir')
rownames(D) = nam
colnames(D) = nam

k = 0
for (i in 1:n) {
   for (j in 1:i) {
  k  = k+1
  D[i,j] = dat[k]
  D[j,i] = dat[k]
   }
}

D = sqrt(D)   # D2 -> D

dendroS = hclust(as.dist(D), method='single')
dendroC = hclust(as.dist(D), method='complete')

win.graph(w = 3.5, h = 6)
split.screen(c(2, 1))
screen(1)
plot(dendroS, main='Single', sub='', xlab='', ylab='', col='blue')

screen(2)
plot(dendroC, main='Complete', sub='', xlab='', col='red')


I'm not finding how to do this in the CRAN documentation (archives;
packages: mclust, cluster, fpc and mva).

Could you help me?

Regards,
-- 
Jose Claudio Faria
Brasil/Bahia/UESC/DCET
Estatistica Experimental/Prof. Adjunto
mails:
  [EMAIL PROTECTED]



Re: [R] interrupted Y axis

2005-07-02 Thread Gabor Grothendieck
On 7/2/05, Nick Drew [EMAIL PROTECTED] wrote:
 I did not find an answer to my question after a quick
 search using the R
 search engine so thought I'd ask away:
 
 Does anyone know if there is a function to create
 an interrupted Y axis?
 What I mean by interrupted Y axis is that part of the
 Y axis has been
 removed or excised to permit one to see parts of the
 data in more detail.
 
 Perhaps an example will make this clear. Please go to
 http://www.jbc.org/cgi/reprint/274/41/28950 and open
 the PDF document
 located there. Go to page 4, figure 2c provides a
 crude example of what I
 mean by interrupted Y axis. Part of the Y axis between
 800 and 4500 has been
 removed to permit easy inspection of the upper end of
 the range of data.
 (This is not my work but simply an example of what I'm
 trying to describe.)
 One finds these interrupted Y axis graphs in
 newspapers or other
 periodicals, more often than not as a bar chart.
 
 Does a function in R exist to permit one to do this easily
 in a graph? If not,
 would such a function be useful? If yes, would the
 grid package be the right
 tool for me to try and implement this?


See axis.break in the plotrix package.



Re: [R] interrupted Y axis

2005-07-02 Thread Marc Schwartz
On Sat, 2005-07-02 at 06:09 -0700, Nick Drew wrote:
 I did not find an answer to my question after a quick
 search using the R
 search engine so thought I'd ask away:
 
 Does anyone know if there is a function to create
 an interrupted Y axis?
 What I mean by interrupted Y axis is that part of the
 Y axis has been
 removed or excised to permit one to see parts of the
 data in more detail.
 
 Perhaps an example will make this clear. Please go to
 http://www.jbc.org/cgi/reprint/274/41/28950 and open
 the PDF document
 located there. Go to page 4, figure 2c provides a
 crude example of what I
 mean by interrupted Y axis. Part of the Y axis between
 800 and 4500 has been
 removed to permit easy inspection of the upper end of
 the range of data.
 (This is not my work but simply an example of what I'm
 trying to describe.)
 One finds these interrupted Y axis graphs in
 newspapers or other
 periodicals, more often than not as a bar chart.
 
 Does a function in R exist to permit one to do this easily
 in a graph? If not,
 would such a function be useful? If yes, would the
 grid package be the right
 tool for me to try and implement this?
 ~Nick

Using:

RSiteSearch("axis break")

comes up with 78 hits. You might want to consider some of the approaches
taken, including the axis.break() function in the 'plotrix' package on
CRAN.

Note however that the examples (i.e. newspapers) you cite are called "Pop
Charts" by Bill Cleveland in his book The Elements of Graphing Data and
are highly criticized.

If you have such disparate ranges you might want to consider separate
plots and/or log scaling. Be careful however, that expanding the
vertical axis to isolate specific data can result in the visual
perception of significant difference where none exists. This notion is
also explored by Cleveland and Tufte (The Visual Display of
Quantitative Information).

HTH,

Marc Schwartz



Re: [R] Is it possible to use glm() with 30 observations?

2005-07-02 Thread Spencer Graves
  I agree with Ted: "in model-fitting terms, it is a
resounding success!"  With any data set having at least one point with a 
binomial yield of 0 or 100%, you can get this phenomenon by adding 
series of random numbers sequentially to a model.  Eventually, you will 
add enough variables that some linear combination of the predictors 
will provide perfect prediction for that case.  In such cases, the usual 
asymptotic normality of the parameter estimates breaks down, but you can 
still test using 2*log(likelihood ratio) being approximately chi-square 
with anova(fit1, fit2), as explained in Venables and Ripley and many 
other sources, e.g., ?anova.glm:

> tst2.glm <- data.frame(x=1:1000,
+                        y=rep(0:1, each=500))
> fit0 <- glm(y~1, family=binomial, data=tst2.glm)
> fit1 <- glm(y~x, family=binomial, data=tst2.glm)
Warning messages:
1: algorithm did not converge in: glm.fit(x = X, y = Y, weights = 
weights, start = start, etastart = etastart,
2: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y 
= Y, weights = weights, start = start, etastart = etastart,
> anova(fit0, fit1, test="Chisq")
Analysis of Deviance Table

Model 1: y ~ 1
Model 2: y ~ x
   Resid. Df Resid. Dev  Df Deviance  P(>|Chi|)
1   999 1386.3
2   998  6.764e-05   1   1386.3 1.999e-303
 
  This effect exposes a limit to the traditional advice for 
experimental design to test only at two levels for each experimental 
factor, and put them as far apart as possible without jeopardizing 
linearity.  Here, you want to estimate a priori the range over which the 
probability of success goes from, say, 20% to 80%, then double that 
range and make all sets of test conditions unique and spaced roughly 
evenly over that range.  Designing experiments with binary response is 
an extremely difficult problem.  This crude procedure should get you 
something moderately close to an optimal design.

  Best Wishes,
  spencer graves

(Ted Harding) wrote:
 On 02-Jul-05 Kerry Bush wrote:
 
I have a very simple problem. When using glm to fit
binary logistic regression model, sometimes I receive
the following warning:

Warning messages:
1: fitted probabilities numerically 0 or 1 occurred
in: glm.fit(x = X, y = Y, weights = weights, start =
start, etastart = etastart,  
2: fitted probabilities numerically 0 or 1 occurred
in: glm.fit(x = X, y = Y, weights = weights, start =
start, etastart = etastart,  

What does this output tell me? Since I only have 30
observations, i assume this is a small sample problem.
 
 
 It isn't. Spencer Graves has shown clearly with two examples
 that you can get a fit with no warnings with 3 observations,
 and a fit with warnings for 1000 observations. As he says,
 it arises when you get perfect separation with respect to
 the linear model.
 
 However, it may be worth expanding Spencer's explanation.
 With a single explanatory variable x (as in Spencer's examples),
 perfect separation occurs when y = 0 for all x <= some x0,
 and y = 1 for all x > x0.
 
 One of the parameters in the linear model is the scale parameter
 (i.e. the reciprocal of the slope). If you express the model
 in the form
 
   logit(P(Y=1;x)) = (x - mu)/sigma
 
 then sigma is the scale parameter in question.
 
 As sigma -> 0, P(Y=1;x) -> 0 for x < mu, and -> 1 for x > mu.
 
 Therefore, for any value of mu between x0 (at and below which
 all y=0 in your data) and x1 (the next larger value of x, at
 and above which all y=1), letting sigma -> 0 gives a fit
 which perfectly predicts your y-values: it predicts P(Y=1) = 0,
 i.e. P(Y=0) = 1, for x < mu, and predicts P(Y=1) = 1 for x > mu;
 and this is exactly what you have observed in the data.
 
 So it's not a disaster -- in model-fitting terms, it is a
 resounding success!
 
 However, in real life one does not expect to be dealing with
 a situation where the outcomes are so perfectly predictable,
 and therefore one views such a result with due mistrust.
 One attributes the perfect separation not to perfect
 predictability, but to the possibility that, by chance,
 all the y=0 occur at lower values of x, and all the y=1
 at higher values of x.
 
 
Is it possible to fit this model in R with only 30
observations? Could any expert provide suggestions to
avoid the warning?
 
 
 Yes! As Spencer showed, it is possible with 3 -- but of
 course it depends on the outcomes y.
 
 As to a suggestion to avoid the warning -- what you really
 need to avoid is data where the x-values are so sparse in
 the neighbourhood of the P(Y=1;x) = 0.5 area that it becomes
 likely that you will get y-values showing perfect separation.
 
 What that means in practice is that, over the range of x-values
 such that P(Y=1;x) rises from (say) 0.2 to 0.8 (chosen for
 illustration), you should have several x values in your data.
 Then the phenomenon of perfect separation becomes unlikely.
 
 But what that means in real life is that you need enough data,
 over the relevant range of x-values, to enable 

[R] probability-probability plot

2005-07-02 Thread Yulei He
Hi, there.

Is there any function in R to plot the probability-probability plot (PP 
plot)? Suppose I am testing some data against normal.

Thanks.

Yulei


$$$
Yulei He
1586 Murfin Ave. Apt 37
Ann Arbor, MI 48105-3135
[EMAIL PROTECTED]
734-647-0305(H)
734-763-0421(O)
734-763-0427(O)
734-764-8263(fax)
$$



[R] plot question

2005-07-02 Thread alex diaz
dear list:

in the following plot:

plot(rnorm(10), rnorm(10), xlab="year",
     ylab=expression(paste('M x'*10^{3})), font.lab=2)

font.lab=2 is set, but xlab and ylab are rendered differently. I want
both labels in the same style. Help?

a.d.

-
Email Enviado utilizando o serviço MegaMail


Re: [R] probability-probability plot

2005-07-02 Thread Jonathan Baron
On 07/02/05 15:41, Yulei He wrote:
 Hi, there.
 
 Is there any function in R to plot the probability-probability plot (PP
 plot)? Suppose I am testing some data against normal.

qqnorm might be what you want, or lead to it.
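For a true P-P plot (as opposed to qqnorm's Q-Q plot) there is no dedicated
base function, but one is easy to construct: plot empirical cumulative
probabilities against the theoretical ones. A sketch (my construction,
assuming a normal reference fitted by the sample mean and SD):

```r
## Normal P-P plot: empirical vs fitted-normal cumulative probabilities.
set.seed(1)
z <- rnorm(100, mean = 5, sd = 2)          # example data
p.emp <- ppoints(length(z))                # empirical probabilities
p.the <- pnorm(sort(z), mean(z), sd(z))    # fitted-normal probabilities
plot(p.the, p.emp, xlab = "Theoretical probability",
     ylab = "Empirical probability", main = "Normal P-P plot")
abline(0, 1)                               # 45-degree reference line
```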

-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
R search page: http://finzi.psych.upenn.edu/



Re: [R] Specifying the minimum and maximum of x and y-axis in plot

2005-07-02 Thread Alexandre Brito
see ?plot.default

for example:
plot(1:10,1:10,ylim=c(0,10),xlim=c(0,20))


- Original Message - 
From: Tolga Uzuner [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Sent: Saturday, July 02, 2005 9:39 PM
Subject: [R] Specifying the minimum and maximum of x and y-axis in plot


 Hi,

 How do I specify the minimum and maximum of the x and y-axis when using
 plot ?

 Thanks,
 Tolga



[R] how to set the position and size of the select.list window

2005-07-02 Thread wu sz
Hello,

I use select.list to obtain a window of select items, but how can I
set the position and size of this window?

Thank you!
shengzhe



[R] jagged array

2005-07-02 Thread Tolga Uzuner
Hi,

I have a jagged array which looks like this:

  res
...
[[996]]
[1] 1.375 3.375 4.125 4.625 4.875 4.875 6.625 7.125 8.875

[[997]]
[1] 1.875 5.125 6.875 7.875 9.875

[[998]]
[1] 1.875 5.375 6.625 6.875 8.125 9.375 9.625

[[999]]
[1] 1.875 6.875 9.875

[[1000]]
[1] 1.875 6.125 6.875 9.375 9.625


Now, I want to use the first row, so I say res[1]

  res[1]
[[1]]
 [1] 3.125 4.375 4.625 5.125 5.375 6.375 6.875 7.875 9.125 9.125 9.375 9.375

How do I access an element within this? I try

  res[1][1]
[[1]]
 [1] 3.125 4.375 4.625 5.125 5.375 6.375 6.875 7.875 9.125 9.125 9.375 9.375

which returns the same vector.. ??? How do I get to 3.125 ?

Thanks



Re: [R] jagged array

2005-07-02 Thread Berton Gunter
Please read "An Introduction to R" and the help file for "[", where this is
explained. If you want to use R, you need to expend effort to learn it.
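For the record, the distinction in question (a minimal sketch on a small
jagged list of my own): single brackets on a list return a sub-list, while
double brackets extract the element itself.

```r
## "[" keeps the list wrapper; "[[" unwraps the element.
res <- list(c(3.125, 4.375, 4.625), c(1.875, 5.125))
res[1]        # a list of length 1 (still a list)
res[[1]]      # the numeric vector itself
res[[1]][1]   # the first element of that vector: 3.125
```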

-- Bert Gunter 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Tolga Uzuner
Sent: Saturday, July 02, 2005 4:41 PM
To: r-help@stat.math.ethz.ch
Subject: [R] jagged array

[original question snipped]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] jagged array

2005-07-02 Thread Tolga Uzuner
Berton Gunter wrote:

Please read "An Introduction to R" and the help file for "[", where this is
explained. If you want to use R, you need to expend effort to learn it.

-- Bert Gunter

[rest of quoted message snipped]

tx

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] plot question

2005-07-02 Thread Gabor Grothendieck
Trying characters and expressions variously, it seems that font.lab applies
to character-string labels but not to expressions, so if you want to use an
expression just apply bold (or whatever) explicitly within the expression.  One
gotcha is that bold will not work as one might have expected on numbers, so they
must be represented as character strings -- which is why we have used "3" rather
than 3 below.

plot(rnorm(10), rnorm(10), xlab=quote(bold("year")), ylab=quote(bold("M x" * 10^"3")))

On 7/2/05, alex diaz [EMAIL PROTECTED] wrote:
 dear list:
 
 in the following plot:
 
 plot(rnorm(10),rnorm(10),xlab="year",ylab=expression
 (paste('M x'*10^{3},"")),font.lab=2)
 
 font.lab=2, but xlab and ylab are different. I want
 both labels in the same way. help?
 
 a.d.
 
 -
 Email Enviado utilizando o serviço MegaMail
 
 
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] jagged array

2005-07-02 Thread Spencer Graves
  Access list items using "[[".  res[1] is a list with only one 
element.  res[1][1] is the same list with only one element. 
res[[1]] is the first element of that list, so res[[1]][1] in your 
example should be 3.125.
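
The "[" vs "[[" distinction can be checked on a tiny jagged list like the one in the question:

```r
# A small jagged list (vectors of unequal length), as in the original post
res <- list(c(3.125, 4.375, 4.625), c(1.875, 5.125))

one   <- res[1]    # single bracket: a sub-list of length 1
first <- res[[1]]  # double bracket: the numeric vector itself
first[1]           # 3.125 -- the first element of that vector
```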

  spencer graves

Tolga Uzuner wrote:

 [earlier messages in this thread snipped]

-- 
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

[EMAIL PROTECTED]
www.pdf.com http://www.pdf.com
Tel:  408-938-4420
Fax: 408-280-7915

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] probability-probability plot

2005-07-02 Thread Spencer Graves
  There are PP plots and QQ plots.  I've tried PP plots and didn't get 
much from them.  The normal QQ plot [qqnorm, in R], however, I use for 
all kinds of things.  In a data-mining situation I'll get, e.g., 500 
p-values, all uniformly distributed under the null hypothesis.  I'll 
transform that null distribution to N(0, 1) with qnorm(p.values) and make

  qqnorm(qnorm(p.values), datax=TRUE)

  I may be chastised severely by Tukeyites for datax=TRUE, but the 
human eye can decode a 45-degree angle more easily than any angle other 
than horizontal or vertical.  If you have a couple of mild outliers with 
a typical aspect ratio of the plot, the datax=TRUE option will make 
the line moderately close to 45 degrees.
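
A sketch of the workflow described above, with simulated uniform draws standing in for null-hypothesis p-values:

```r
set.seed(1)
p.values <- runif(500)        # 500 p-values, uniform under the null
z <- qnorm(p.values)          # map the null distribution onto N(0, 1)

pdf(NULL)                     # null device, so the plot runs anywhere
qqnorm(z, datax = TRUE)       # data on the horizontal axis
qqline(z, datax = TRUE)       # reference line, near 45 degrees here
invisible(dev.off())
```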

  spencer graves

Jonathan Baron wrote:

 On 07/02/05 15:41, Yulei He wrote:
  Hi, there.
  
  Is there any function in R to plot the probability-probability plot (PP
  plot)? Suppose I am testing some data against normal.
 
 qqnorm might be what you want, or lead to it.
 

-- 
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

[EMAIL PROTECTED]
www.pdf.com http://www.pdf.com
Tel:  408-938-4420
Fax: 408-280-7915

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html