from:"Murray Jorgensen"

[R] A printing macro

2006-11-13 Thread Murray Jorgensen

I am exploring the result of clustering a large multivariate data set 
into a number of groups, represented, say, by a factor G.

I wrote a function to see how categorical variables vary between groups:

   ddisp - function(dvar) {
+  csqt - chisq.test(G,dvar)
+  print(csqt$statistic)
+  print(csqt$observed)
+  print(round(csqt$expected))
+  round(csqt$residuals)
+  }
 
   x - ceiling(4*runif(100))
   G - gl(4,1,100)
   ddisp(x)
X-squared
  6.235645
dvar
G1  2  3  4
   1 10  5  5  5
   2  6  9  5  5
   3  8  6  5  6
   4  7  4  4 10
dvar
G   1 2 3 4
   1 8 6 5 6
   2 8 6 5 6
   3 8 6 5 6
   4 8 6 5 6
dvar
G1  2  3  4
   1  1  0  0 -1
   2 -1  1  0 -1
   3  0  0  0  0
   4  0 -1  0  1
Warning message:
Chi-squared approximation may be incorrect in: chisq.test(G, dvar)

As I need to apply this function to a large number of variables x it 
would be helpful if the function printed x rather than the formal 
argument dvar. I have a vague idea that things like deparse() and 
substitute() will come into the solution but I have not yet come up with 
the right incantation. Any help appreciated!

Murray Jorgensen

-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] stepAIC for overdispersed Poisson

2006-11-13 Thread Murray Jorgensen

I am wondering if stepAIC in the MASS library may be used for model 
selection in an overdispersed Poisson situation. What I thought of doing 
was to get an estimate of the overdispersion parameter phi from fitting 
a model with all or most of the available predictors (we have a large 
number of observations so this should not be problematical) and then use 
stepAIC with scale = phi. Should this be OK?

Murray Jorgensen
-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] A printing macro

2006-11-13 Thread Murray Jorgensen


Thanks for these suggestions, Professor Ripley. It's interesting that
the function parameters in R are not truly dummy as they can effect
the result of a function.

Murray

Prof Brian Ripley wrote:
 ddisp - function(dvar) {
 yn - substitute(dvar)
 csqt - eval.parent(substitute(chisq.test(G,dvar), list(dvar=yn)))
 
 }
 
 There are other ways, such as forming the cross-classification table, 
 setting its dimnames and passing that to chisq.test.
 
 On Mon, 13 Nov 2006, Murray Jorgensen wrote:
 
 I am exploring the result of clustering a large multivariate data set
 into a number of groups, represented, say, by a factor G.

 I wrote a function to see how categorical variables vary between groups:

   ddisp - function(dvar) {
 +  csqt - chisq.test(G,dvar)
 +  print(csqt$statistic)
 +  print(csqt$observed)
 +  print(round(csqt$expected))
 +  round(csqt$residuals)
 +  }
 
   x - ceiling(4*runif(100))
   G - gl(4,1,100)
   ddisp(x)
 X-squared
  6.235645
dvar
 G1  2  3  4
   1 10  5  5  5
   2  6  9  5  5
   3  8  6  5  6
   4  7  4  4 10
dvar
 G   1 2 3 4
   1 8 6 5 6
   2 8 6 5 6
   3 8 6 5 6
   4 8 6 5 6
dvar
 G1  2  3  4
   1  1  0  0 -1
   2 -1  1  0 -1
   3  0  0  0  0
   4  0 -1  0  1
 Warning message:
 Chi-squared approximation may be incorrect in: chisq.test(G, dvar)

 As I need to apply this function to a large number of variables x it
 would be helpful if the function printed x rather than the formal
 argument dvar. I have a vague idea that things like deparse() and
 substitute() will come into the solution but I have not yet come up with
 the right incantation. Any help appreciated!

 Murray Jorgensen


 

-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862


-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Recoding categorical variables

2006-10-25 Thread Murray Jorgensen

I want to recode an integer-valued variable y so that its values become 
1:length(y). I can do this using a loop but maybe someone can suggest 
code without a loop. My code is this:

y - round(20*runif(25))
table(y)
suy - sort(unique(y))
m - length(suy)
z - y + max(suy)
for(i in 1:m) z[y==suy[i]] - i
rbind(y,z)

(the recoded y is stored in z)

Murray Jorgensen
-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Recoding categorical variables

2006-10-25 Thread Murray Jorgensen

Thanks John! Deepayan Sarkar also suggested this. I don't really expect 
to see any better solution.  Murray

John Fox wrote:
 Dear Murray,
 
 How about as.numeric(factor(y)) ?
 
 I hope this helps,
  John
 
 
 John Fox
 Department of Sociology
 McMaster University
 Hamilton, Ontario
 Canada L8S 4M4
 905-525-9140x23604
 http://socserv.mcmaster.ca/jfox 
  
 
 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Murray 
 Jorgensen
 Sent: Wednesday, October 25, 2006 7:13 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Recoding categorical variables

 I want to recode an integer-valued variable y so that its 
 values become 1:length(y). I can do this using a loop but 
 maybe someone can suggest code without a loop. My code is this:

 y - round(20*runif(25))
 table(y)
 suy - sort(unique(y))
 m - length(suy)
 z - y + max(suy)
 for(i in 1:m) z[y==suy[i]] - i
 rbind(y,z)

 (the recoded y is stored in z)

 Murray Jorgensen
 -- 
 Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
 Department of Statistics, University of Waikato, Hamilton, New Zealand
 Email: [EMAIL PROTECTED]Fax 7 838 4155
 Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Pooled Covariance Matrix

2006-09-20 Thread Murray Jorgensen

Thank you, Professor Ripley.  Murray Jorgensen

Prof Brian Ripley wrote:
 On Wed, 20 Sep 2006, Murray Jorgensen wrote:
 
 I am in a discriminant analysis situation with a frame containing
 several variables and a grouping factor, if you like:

 set.seed(200906)
 exampledf - as.data.frame(matrix(rnorm(50,5,2),nrow=10,ncol=5))
 exampledf$Group - factor(rep(c(1,2,3),c(3,3,4)))
 exampledf

 I'm sure there must be a simple way to get the within group pooled
 covariance matrix but I haven't found it yet.
 
 There are two versions of this, weighted and unweighted, and the 
 difference caused confusion in the early discriminant analysis 
 literature. (See MASS4 p.333.)  The weighted version is conventional.
 
 Suppose you have a matrix X and a grouping factor g.  Then either of
 
group.means - rowsum(X, g)/as.vector(table(g))
group.means - tapply(X, list(rep(g, ncol(X)), col(X)), mean)
 
 gives the group means, and var(X - group.means[g,]) seems to be what you 
 want.
 
 I started thinking that one might begin by forming a frame with the same
  dimensions but containing the group means. But then I found a thread
 from two years back called Getting the groupmean for each person which
 seemed to imply that doing this was a bit subtle even for ncol=1. Hence
 I will risk a question to the list.
 
 That thread seems to be about efficiency for very large matrices on R of 
 two years' ago.
 

-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Pooled Covariance Matrix

2006-09-19 Thread Murray Jorgensen

I am in a discriminant analysis situation with a frame containing 
several variables and a grouping factor, if you like:

set.seed(200906)
exampledf - as.data.frame(matrix(rnorm(50,5,2),nrow=10,ncol=5))
exampledf$Group - factor(rep(c(1,2,3),c(3,3,4)))
exampledf

I'm sure there must be a simple way to get the within group pooled 
covariance matrix but I haven't found it yet.

I started thinking that one might begin by forming a frame with the same 
  dimensions but containing the group means. But then I found a thread 
from two years back called Getting the groupmean for each person which 
seemed to imply that doing this was a bit subtle even for ncol=1. Hence 
I will risk a question to the list.

Thanks for any help,  Murray Jorgensen

-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] unquoting

2006-08-20 Thread Murray Jorgensen

I would like a function to strip quotes off character strings. I should 
work like this:

  A - matrix(1:6, nrow = 2, ncol=3)
  AF - as.data.frame(A)
  names(AF) - c(First,Second,Third)
  AF
   First Second Third
1 1  3 5
2 2  4 6
  names(AF)[2]
[1] Second
  attach(AF)
  unquote(names(AF)[2])
[1] 3 4

Of course what I actually get is

Error: couldn't find function unquote

The reason that I want to do this is that I have a frame with a rather 
large number of variables and I would like to loop over the names and 
print out various descriptive summaries of the variable's distribution.

OK, OK, I could just go

  AF[,2]
[1] 3 4

but once I thought of unquoting I have some sort of inner need to know 
how to do it!

Cheers,

Murray
-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] unquoting

2006-08-20 Thread Murray Jorgensen

Thank you, Brian, Peter and Gabor

Brian has what want. My heading was a bit misleading. I was looking for 
a function that would, in logicians' terms, convert 'mention' into 
'use'. (This usually goes along with the story about the importance of 
knowing the difference between a lion and lion.) get() is just such a 
function.

Murray Jorgensen

Prof Brian Ripley wrote:
 ?get
 
 I really think this has nothing to do with `quoting', rather to do with 
 evaluating variables from their names. At first I though you were looking 
 for noquote(), which does unquote in the conventional sense.
 
 noquote(names(AF)[2])
 [1] Second
 get(names(AF)[2])
 [1] 3 4
 
 On Sun, 20 Aug 2006, Murray Jorgensen wrote:
 
 I would like a function to strip quotes off character strings. I should 
 work like this:

   A - matrix(1:6, nrow = 2, ncol=3)
   AF - as.data.frame(A)
   names(AF) - c(First,Second,Third)
   AF
First Second Third
 1 1  3 5
 2 2  4 6
   names(AF)[2]
 [1] Second
   attach(AF)
   unquote(names(AF)[2])
 [1] 3 4

 Of course what I actually get is

 Error: couldn't find function unquote

 The reason that I want to do this is that I have a frame with a rather 
 large number of variables and I would like to loop over the names and 
 print out various descriptive summaries of the variable's distribution.

 OK, OK, I could just go

   AF[,2]
 [1] 3 4

 but once I thought of unquoting I have some sort of inner need to know 
 how to do it!

 Cheers,

 Murray

 

-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Fitting models in a loop

2006-08-02 Thread Murray Jorgensen

Thanks to all for their help. I am busy today but tomorrow I will have 
time to digest all the feedback and follow up if necessary

Cheers,  Murray

-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Fitting models in a loop

2006-08-02 Thread Murray Jorgensen

Thanks to all who helped me with this problem, especially Bill Venables 
and Gabor Grothendieck. I hope one day to learn more about the advanced 
features of the language used by Bill.

 From a practical standpoint I think I will just avoid doing things like 
this in my teaching. It is hard enough just getting across the 
elementary ideas.

Murray Jorgensen
-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Fitting models in a loop

2006-07-31 Thread Murray Jorgensen

If I want to display a few polynomial regression fits I can do something 
like

for (i in 1:6) {
mod - lm(y ~ poly(x,i))
print(summary(mod))
}

Suppose that I don't want to over-write the fitted model objects, 
though. How do I create a list of blank fitted model objects for later 
use in a loop?

Murray Jorgensen
-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] [Fwd: Re: Parameterization puzzle]

2006-07-21 Thread Murray Jorgensen

Bother! This cold has made me accident-prone. I meant to hit Reply-all.
Clarification below.

 Original Message 
Subject: Re: [R] Parameterization puzzle
Date: Fri, 21 Jul 2006 19:10:03 +1200
From: Murray Jorgensen [EMAIL PROTECTED]
To: Prof Brian Ripley [EMAIL PROTECTED]
References: [EMAIL PROTECTED] 
[EMAIL PROTECTED]

Apologies for a non-selfcontained example. Here is what I should have sent:

pyears - scan()
18793 52407 10673 43248 5710 28612 2585 12663 1462 5317

deaths - scan()
2 32 12 104 28 206 28 186 31 102

l - log(pyears)
Smoke - gl(2,1,10,labels=c(No,Yes))
Age - gl(5,2,10,labels=c(35-44,45-54,55-64,65-74,75-84),
ordered=TRUE)
mod1.glm - glm(deaths ~ Age * Smoke + offset(l),family=poisson)
summary(mod1.glm)
age - as.numeric(Age)
mod2.glm - aso1.glm - glm(deaths ~ poly(age,2) + Smoke +
   poly(age,1):Smoke + offset(l),family=poisson)
summary(mod2.glm)


Cheers, Murray Jorgensen

Prof Brian Ripley wrote:
 R does not know that poly(age,2) and poly(age,1) are linearly dependent.
 (And indeed they only are for some functions 'poly'.)
 
 I cannot reproduce your example ('l' is missing), but perhaps
 
 glm(deaths ~ poly(age,2) + poly(age,1)*Smoke + offset(l), poisson)
 
 was your intention?
 
 On Fri, 21 Jul 2006, Murray Jorgensen wrote:
 
 Consider the following example (based on an example in Pat Altham's GLM 
 notes)

 pyears - scan()
 18793 52407 10673 43248 5710 28612 2585 12663 1462 5317

 deaths - scan()
 2 32 12 104 28 206 28 186 31 102

 Smoke - gl(2,1,10,labels=c(No,Yes))
 Age - gl(5,2,10,labels=c(35-44,45-54,55-64,65-74,75-84),
 ordered=TRUE)
 mod1.glm - glm(deaths ~ Age * Smoke + offset(l),family=poisson)
 summary(mod1.glm)
 age - as.numeric(Age)
 mod2.glm - aso1.glm - glm(deaths ~ poly(age,2) + Smoke +
poly(age,1):Smoke + offset(l),family=poisson)
 summary(mod2.glm)



 The business part of the summary for the first model

 Estimate Std. Error z value Pr(|z|)
 (Intercept)-5.927060.16577 -35.754   2e-16 ***
 Age.L   4.064900.47414   8.573   2e-16 ***
 Age.Q  -1.082930.41326  -2.620 0.008781 **
 Age.C   0.241580.31756   0.761 0.446816
 Age^4   0.042440.23061   0.184 0.853986
 SmokeYes0.619160.17296   3.580 0.000344 ***
 Age.L:SmokeYes -1.312340.49267  -2.664 0.007729 **
 Age.Q:SmokeYes  0.390430.43008   0.908 0.363976
 Age.C:SmokeYes -0.295930.33309  -0.888 0.374298
 Age^4:SmokeYes -0.036820.24432  -0.151 0.880218

 inspires me to fit the second model that omits the nonsignificant terms, 
 however this produces the summary

Estimate Std. Error z value Pr(|z|)
 (Intercept)-5.8368 0.1213 -48.103   2e-16 ***
 poly(age, 2)1   3.9483 0.1755  22.497   2e-16 ***
 poly(age, 2)2  -1.0460 0.1448  -7.223 5.08e-13 ***
 SmokeYes0.5183 0.1262   4.106 4.02e-05 ***
 SmokeNo:poly(age, 1)1.3755 0.4340   3.169  0.00153 **
 SmokeYes:poly(age, 1)   NA NA  NA   NA

 Why do we get a SmokeNo:poly(age, 1) term? Can I re-express mod2.glm so 
 that this term does not appear?

 Cheers,  Murray Jorgensen


 

-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Parameterization puzzle

2006-07-21 Thread Murray Jorgensen

Thanks to Brian and Berwin with there help. I faced a double problem in 
that I not only wanted to fit the model but I also wanted to do so in 
such a way that it would seem natural for a classroom example.

The moral seems to be that I should do the orthogonal polynomial stuff 
outside the model formula. Here then is my solution:

pyears - scan()
18793 52407 10673 43248 5710 28612 2585 12663 1462 5317

deaths - scan()
2 32 12 104 28 206 28 186 31 102

l - log(pyears)
Smoke - gl(2,1,10,labels=c(No,Yes))
Age - gl(5,2,10,labels=c(35-44,45-54,55-64,65-74,75-84),
ordered=TRUE)
mod1.glm - glm(deaths ~ Age * Smoke + offset(l),family=poisson)
summary(mod1.glm)
age - as.numeric(Age)
age1 - poly(age,2)[,1]
age2 - poly(age,2)[,2]
mod2.glm - aso1.glm - glm(deaths ~ age1 + age2 + Smoke +
   age1:Smoke + offset(l),family=poisson)
summary(mod2.glm)

The final summary then comes out looking like this:

   Estimate Std. Error z value Pr(|z|)
(Intercept)-5.8368 0.1213 -48.103   2e-16 ***
age15.3238 0.4129  12.893   2e-16 ***
age2   -1.0460 0.1448  -7.223 5.08e-13 ***
SmokeYes0.5183 0.1262   4.106 4.02e-05 ***
age1:SmokeYes  -1.3755 0.4340  -3.169  0.00153 **


which is just what I wanted.

Cheers,  Murray Jorgensen

Prof Brian D Ripley wrote:
 On Fri, 21 Jul 2006, Berwin A Turlach wrote:
 
 G'day all,

 BDR == Prof Brian Ripley [EMAIL PROTECTED] writes:

BDR R does not know that poly(age,2) and poly(age,1) are linearly
BDR dependent.
 Indeed, I also thought that this is the reason of the problem.

BDR (And indeed they only are for some functions 'poly'.)
 I am surprised about this.  Should probably read the help page of
 'poly' once more and more carefully.
 
 My point was perhaps simpler: if you or I or Murray had a function 
 poly() in our workspace, that one would be found, I think.  (I have not 
 checked the ramifications of namespaces here, but I believe that would 
 be the intention, that formulae are evaluated in their defining 
 environment.)  So omly when the model matrix is set up could the linear 
 dependence be known (and there is nothing in the system poly() to tell 
 model.matrix).
 
 What is sometimes called extrinsic aliasing is left to the fitting 
 function, which seems to be to do a sensible job provided the main 
 effect is in the model.  Indeed, including interactions without main 
 effects (as Murray did) often makes the results hard to interpret.
 
BDR I cannot reproduce your example ('l' is missing), [...]
 My guess is that 'l' is 'pyears'.  At least, I worked under that
 assumption.
 
 Apparently l = log(pyears), which was my uncertain guess.
 
 Interestingly, on my machine (using R 2.3.1, 2.2.1 and 2.1.1) I cannot
 fit any of the Poisson GLM that Murray tried.  I always get the error
 message:

 Error: no valid set of coefficients has been found: please supply 
 starting values
 
 Related to the offset, I believe.
 

 But I have to investigate this further.  I can fit binomial models
 that give me similar answers.

BDR [...] but perhaps
BDR glm(deaths ~ poly(age,2) + poly(age,1)*Smoke + offset(l),
BDR poisson)
BDR was your intention?
 In this parameterisation a 'poly(age,1)' term will appear among the
 coefficients with an estimated value of NA since it is aliased with
 'poly(age, 2)1'.  So I don't believe that this was Murray's intention.

 The only suggestion I can come up with is:

 summary(glm(cbind(deaths, l-deaths) ~ age*Smoke+I(age^2), 
 family=binomial))

 [...]

 Coefficients:
  Estimate Std. Error z value Pr(|z|)
 (Intercept)  -10.798950.45149 -23.918   2e-16 ***
 age2.378920.20877  11.395   2e-16 ***
 SmokeYes   1.445730.37347   3.871 0.000108 ***
 I(age^2)  -0.197060.02749  -7.168  7.6e-13 ***
 age:SmokeYes  -0.308500.09756  -3.162 0.001566 **

 [...]

 Which doesn't use orthogonal polynomials anymore.  But I don't see how
 you can fit the model that Murray want to fit using orthogonal
 polynomials given the way R's model language operates.

 So I guess the Poisson GLM that Murray wants to fit is:

glm(deaths~ age*Smoke+I(age^2)+offset(l), family=poisson)

 Cheers,

Berwin

 == Full address 
 Berwin A Turlach  Tel.: +61 (8) 6488 3338 (secr)
 School of Mathematics and Statistics+61 (8) 6488 3383 (self)
 The University of Western Australia   FAX : +61 (8) 6488 1028
 35 Stirling Highway
 Crawley WA 6009e-mail: [EMAIL PROTECTED]
 Australiahttp://www.maths.uwa.edu.au/~berwin


 

-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

[R] Parameterization puzzle

2006-07-20 Thread Murray Jorgensen

Consider the following example (based on an example in Pat Altham's GLM 
notes)

pyears - scan()
18793 52407 10673 43248 5710 28612 2585 12663 1462 5317

deaths - scan()
2 32 12 104 28 206 28 186 31 102

Smoke - gl(2,1,10,labels=c(No,Yes))
Age - gl(5,2,10,labels=c(35-44,45-54,55-64,65-74,75-84),
ordered=TRUE)
mod1.glm - glm(deaths ~ Age * Smoke + offset(l),family=poisson)
summary(mod1.glm)
age - as.numeric(Age)
mod2.glm - aso1.glm - glm(deaths ~ poly(age,2) + Smoke +
   poly(age,1):Smoke + offset(l),family=poisson)
summary(mod2.glm)



The business part of the summary for the first model

Estimate Std. Error z value Pr(|z|)
(Intercept)-5.927060.16577 -35.754   2e-16 ***
Age.L   4.064900.47414   8.573   2e-16 ***
Age.Q  -1.082930.41326  -2.620 0.008781 **
Age.C   0.241580.31756   0.761 0.446816
Age^4   0.042440.23061   0.184 0.853986
SmokeYes0.619160.17296   3.580 0.000344 ***
Age.L:SmokeYes -1.312340.49267  -2.664 0.007729 **
Age.Q:SmokeYes  0.390430.43008   0.908 0.363976
Age.C:SmokeYes -0.295930.33309  -0.888 0.374298
Age^4:SmokeYes -0.036820.24432  -0.151 0.880218

inspires me to fit the second model that omits the nonsignificant terms, 
however this produces the summary

   Estimate Std. Error z value Pr(|z|)
(Intercept)-5.8368 0.1213 -48.103   2e-16 ***
poly(age, 2)1   3.9483 0.1755  22.497   2e-16 ***
poly(age, 2)2  -1.0460 0.1448  -7.223 5.08e-13 ***
SmokeYes0.5183 0.1262   4.106 4.02e-05 ***
SmokeNo:poly(age, 1)1.3755 0.4340   3.169  0.00153 **
SmokeYes:poly(age, 1)   NA NA  NA   NA

Why do we get a SmokeNo:poly(age, 1) term? Can I re-express mod2.glm so 
that this term does not appear?

Cheers,  Murray Jorgensen

-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] princomp and eigen

2006-07-15 Thread Murray Jorgensen

Consider the following output [R2.2.0; Windows XP]

  set.seed(160706)
  X - matrix(rnorm(40),nrow=10,ncol=4)
  Xpc - princomp(X,cor=FALSE)
  summary(Xpc,loadings=TRUE, cutoff=0)
Importance of components:
   Comp.1Comp.2Comp.3 Comp.4
Standard deviation 1.2268300 0.9690865 0.7918504 0.55295970
Proportion of Variance 0.4456907 0.2780929 0.1856740 0.09054235
Cumulative Proportion  0.4456907 0.7237836 0.9094576 1.

Loadings:
  Comp.1 Comp.2 Comp.3 Comp.4
[1,] -0.405 -0.624  0.466  0.479
[2,] -0.199 -0.636 -0.346 -0.660
[3,]  0.884 -0.443  0.023  0.148
[4,]  0.122  0.099  0.814 -0.559
  eigen(var(X))
$values
[1] 1.6723465 1.0434763 0.6966967 0.3397382

$vectors
[,1]   [,2][,3]   [,4]
[1,] -0.4048158  0.6240510  0.46563382  0.4794473
[2,] -0.1994853  0.6361009 -0.34634256 -0.6600213
[3,]  0.8839775  0.4429553  0.02261302  0.1478618
[4,]  0.1221215 -0.0986234  0.81407655 -0.5591414


I would have expected the princomp component standard deviations to be 
the square roots of the eigen() $values and they clearly are not.

Murray Jorgensen

-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Extracting Variance components

2006-06-04 Thread Murray Jorgensen

I can ask my question using and example from Chapter 1 of Pinheiro  Bates.

  # 1.4 An Analysis of Covariance Model
 
  OrthoFem - Orthodont[ Orthodont$Sex == Female, ]
  fm1OrthF -
+   lme( distance ~ age, data = OrthoFem, random = ~ 1 | Subject )
  summary( fm1OrthF )
Linear mixed-effects model fit by REML
  Data: OrthoFem
AIC BIClogLik
   149.2183 156.169 -70.60916

Random effects:
  Formula: ~1 | Subject
 (Intercept)  Residual
StdDev: 2.06847 0.7800331
[...etc...]

I can extract the estimate of the variance component \sigma (0.7800331) via

sigma - fm1OrthF$sigma

How do I extract the other component \sigma_b (2.06847) ?

Cheers,  Murray Jorgensen
-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Frequencies into Poisson responses

2006-05-18 Thread Murray Jorgensen

Actually I had just meant that because I had some factors I had a 
conceptual table. Using ftable() was just my way of getting the factors 
into data frame form. But thank you for showing me that as.data.frame() 
does exactly what I want.

Murray Jorgensen

Prof Brian Ripley wrote:
 In fact you have an ftable, not a multi-way table, but as.data.frame 
 works for multi-way tables, and here as.data.frame(as.table(ftabc)) works:
 
 as.data.frame(as.table(ftabc))
A B CC Freq
 1  1 1  1   20
 2  2 1  1   38
 3  3 1  1   20
 4  1 2  1   22
 5  2 2  1   25
 6  3 2  1   23
 ...
 
 It would be more usual to start with a table and use ftable to display it.
 It might just be worth adding as.data.frame.ftable.
 
 
 On Thu, 18 May 2006, Murray Jorgensen wrote:
 
 Suppose that one has several factors, all of the same length. These
 define a multi-way contingency table. Now suppose one wants to fit a
 Poisson GLM a.k.a. log-linear model to the frequencies in this table.
 How may we make the table into a data frame suitable for glm() ?
 I have an answer to my own question below, but surely more elegant
 solutions exist?

 set.seed(060518)
 na - nb - 3
 nc - 4
 n - na*nb*nc
 a - round(runif(1000,0.5,na+0.5))
 b - round(runif(1000,0.5,nb+0.5))
 cc - round(runif(1000,0.5,nc+0.5))
 A - factor(a)
 B - factor(b)
 CC - factor(cc)
 ftabc - ftable(A,B,CC)
 freqs - as.vector(ftabc)
 A1 - gl(na,nb,n)
 B1 - gl(nb,1,n)
 C1 - gl(nc,na*nb,n)
 required - data.frame(A1,B1,C1,freqs)
 required

 Cheers,  Murray Jorgensen


 

-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Frequencies into Poisson responses

2006-05-17 Thread Murray Jorgensen

Suppose that one has several factors, all of the same length. These 
define a multi-way contingency table. Now suppose one wants to fit a 
Poisson GLM a.k.a. log-linear model to the frequencies in this table. 
How may we make the table into a data frame suitable for glm() ?
I have an answer to my own question below, but surely more elegant 
solutions exist?

set.seed(060518)
na - nb - 3
nc - 4
n - na*nb*nc
a - round(runif(1000,0.5,na+0.5))
b - round(runif(1000,0.5,nb+0.5))
cc - round(runif(1000,0.5,nc+0.5))
A - factor(a)
B - factor(b)
CC - factor(cc)
ftabc - ftable(A,B,CC)
freqs - as.vector(ftabc)
A1 - gl(na,nb,n)
B1 - gl(nb,1,n)
C1 - gl(nc,na*nb,n)
required - data.frame(A1,B1,C1,freqs)
required

Cheers,  Murray Jorgensen

-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Pasting data into scan()

2006-05-02 Thread Murray Jorgensen

I think that Mozilla Thunderbird replaces tabs with blanks routinely.
I was actually pasting from Crimson Editor but results are identical 
with Notepad and MS Word. However there is no problem when pasting from 
WinEdt:

  strength - scan()
1: 0.023   0.032   0.054   0.069   0.081   0.094
7: 0.105   0.127   0.148   0.169   0.188   0.216
13: 0.255   0.277   0.311   0.361   0.376   0.395
19: 0.432   0.463   0.481   0.519   0.529   0.567
25: 0.642   0.674   0.752   0.823   0.887   0.926
31:
Read 30 items

and scanning the file directly works with or without ' sep=\t ':
  strength - scan(C:\\Files\\Teaching\\STAT321\\tensile.dat)
Read 30 items
  strength - scan(C:\\Files\\Teaching\\STAT321\\tensile.dat,sep=\t)
Read 30 items

I will send the file to BR (but not to the list!)

Murray Jorgensen

Prof Brian Ripley wrote:
 No tabs are being echoed, so your cut/paste is removing the tabs.
 What are you pasting from? (Not an R problem, of course.)
 
 On Tue, 2 May 2006, Murray Jorgensen wrote:
 
 The file TENSILE.DAT from the Hand et al Handbook of Small Data Sets
 looks like this:

 0.0230.0320.0540.0690.0810.094
 0.1050.1270.1480.1690.1880.216
 0.2550.2770.3110.3610.3760.395
 0.4320.4630.4810.5190.5290.567
 0.6420.6740.7520.8230.8870.926

 except that my mail client has replaced the tab separators by blanks. If
 I paste this data into R 2.2.1 what I get is

  strength - scan()
 1: 0.0230.0320.0540.0690.0810.094
 1: 0.1050.1270.1480.1690.1880.216
 Error in scan() : scan() expected 'a real', got
 '0.0230.0320.0540.0690.0810.094'
  0.2550.2770.3110.3610.3760.395
 Error: syntax error in 0.2550.2770
  0.4320.4630.4810.5190.5290.567
 Error: syntax error in 0.4320.4630
  0.6420.6740.7520.8230.8870.926
 Error: syntax error in 0.6420.6740

 Aha! I thought, what I need is scan(sep = \t)
 but this generates the same error messages.

 Help!   Murray


 

-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Pasting data into scan()

2006-05-01 Thread Murray Jorgensen

The file TENSILE.DAT from the Hand et al Handbook of Small Data Sets 
looks like this:

0.023   0.032   0.054   0.069   0.081   0.094
0.105   0.127   0.148   0.169   0.188   0.216
0.255   0.277   0.311   0.361   0.376   0.395
0.432   0.463   0.481   0.519   0.529   0.567
0.642   0.674   0.752   0.823   0.887   0.926

except that my mail client has replaced the tab separators by blanks. If 
I paste this data into R 2.2.1 what I get is

  strength - scan()
1: 0.0230.0320.0540.0690.0810.094
1: 0.1050.1270.1480.1690.1880.216
Error in scan() : scan() expected 'a real', got 
'0.0230.0320.0540.0690.0810.094'
  0.2550.2770.3110.3610.3760.395
Error: syntax error in 0.2550.2770
  0.4320.4630.4810.5190.5290.567
Error: syntax error in 0.4320.4630
  0.6420.6740.7520.8230.8870.926
Error: syntax error in 0.6420.6740

Aha! I thought, what I need is scan(sep = \t)
but this generates the same error messages.

Help!   Murray

-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Pasting data into scan() - oops!

2006-05-01 Thread Murray Jorgensen

I forgot to mention that I am using Windows XP.


 Original Message 
Subject: Pasting data into scan()
Date: Tue, 02 May 2006 11:55:03 +1200
From: Murray Jorgensen [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch

The file TENSILE.DAT from the Hand et al Handbook of Small Data Sets
looks like this:
[...]
-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] vector-factor operation

2006-04-15 Thread Murray Jorgensen

Thanks to Berwin Turlach and Petr Pikal for

tapply(vec, Fac, mean)[Fac]

and Gabor Grothendieck Thomas Lumley for

ave(vec,Fac)  .

Looking at the code for tapply and ave I guess that the latter is to be 
preferred.

Murray Jorgensen

Gabor Grothendieck wrote:
 Look at ?ave
 
 ave(vec, Fac)
 ave(vec, Fac, FUN = mean) # same
 ave(vec, Fac, FUN = sd)
 
 On 4/14/06, Murray Jorgensen [EMAIL PROTECTED] wrote:
 I found myself wanting to average a vector [vec] within each level of a
 factor [Fac], returning a vector of the same length as vec. After a
 while I realised that

 lm1 - lm(vec ~ Fac)
 fitted(lm1)

 did what I want.

 But there must be another way to do this, and it would be good to be
 able to apply other functions than mean() in this way.

 Cheers, Murray

-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] vector-factor operation

2006-04-13 Thread Murray Jorgensen

I found myself wanting to average a vector [vec] within each level of a 
factor [Fac], returning a vector of the same length as vec. After a 
while I realised that

lm1 - lm(vec ~ Fac)
fitted(lm1)

did what I want.

But there must be another way to do this, and it would be good to be 
able to apply other functions than mean() in this way.

Cheers, Murray
-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Nested error structure in nonlinear model

2006-03-31 Thread Murray Jorgensen

I am trying to fit a nonlinear regression model to data. There are 
several predictor variables and 8 parameters. I will write the model as

Y ~ Yhat(theta1,...,theta8)

OK, I can do this using nls() - but only just as there are not as many 
observations as might be desired.

Now the problem is that we have a factor Site and I want to include a 
corresponding error component. I tried something like (excuse the 
doctored R output):

  mod0.nlme -  nlme(Y ~ Yhat(theta1,...,theta8) + site.err ,
+start = c(mod0.st,0),
+groups = ~ Site,
+   fixed = theta1 + ... + theta8 ~ 1,
+   random = site.err ~ 1)
Error in nlme.formula(Y ~ Yhat(theta1,...,theta8  :
 starting values for the fixed component are not the correct length

and then

  mod0.nlme -  nlme(Y ~ Yhat(theta1,...,theta8) + site.err ,
+start = mod0.st,
+groups = ~ Site,
+   fixed = theta1 + ... + theta8 ~ 1,
+   random = site.err ~ 1)
Error: Singularity in backsolve at level 0, block 1

The straightforward way would be to go

  mod0.nlme -  nlme(Y ~ Yhat(theta1,...,theta8) ,
+start = mod0.st,
+groups = ~ Site,
+   fixed = theta1 + ... + theta8 ~ 1)

making all 8 parameters random, but there is little hope of getting that 
to work. (I did try it and it does not crash straight away but settles 
down to run forever.)

Any comments welcome. I regret that I cannot go into much detail about 
the actual problem.

-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] nonlinear model: pseudo-design matrix

2006-02-13 Thread Murray Jorgensen

Given a nonlinear model formula and a set of values for all the
parameters defining a point in parameter space, is there a neat way to
extract the pseudodesign matrix of the model at the point? That is the
matrix of partial derivatives of the fitted values w.r.t. the parameters
evaluated at the point.

(I have figured out how to extract the gradient information from an nls 
fitted model using the nlsModel part, but I wish to implement a score 
test, so I need to be able to extract the information at points other 
than the mle.)

Thanks, Murray Jorgensen
-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Huber location estimate

2005-12-22 Thread Murray Jorgensen



Prof Brian Ripley wrote:
 On Thu, 22 Dec 2005, Murray Jorgensen wrote:
 
 We have a choice when calculating the Huber location estimate:
  set.seed(221205)
  y - 7 + 3*rt(30,1)
 
 
 That's Cauchy, BTW, a very extreme case.

Sure, the sort of situation where one might want a robust estimator.

[...]

 Note that the huber() scale estimate is the initial MAD, whereas rlm 
 iterates.  Because during iteration the 'center' for MAD is known to be 
 zero, the results differ.  For symmetric distributions there is little 
 difference, but your sample is not close to symmetric.

[Blush] yes I knew that and somehow I forgot. But leave rlm() alone for 
a while and do IRLS with fixed scale:

th - median(y)
s - mad(y)
# paste this in a few times:
w - ifelse((y-th 1.345*s  y-th-1.345*s), 1, 1.345*s/abs(y-th))
th - weighted.mean(y,w)
th

We converge to
  th
[1] 5.9203
close to the answer given by rlm() different from
  huber(y)$mu
[1] 5.9117

So the variable scale does not account for the difference.

Murray Jorgensen

-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Huber location estimate

2005-12-22 Thread Murray Jorgensen

D'oh!  Apologies for wasting everybody's time!

Murray

Martin Maechler wrote:
Murray == Murray Jorgensen [EMAIL PROTECTED]
on Thu, 22 Dec 2005 22:13:45 +1300 writes:
 
 
 Murray Prof Brian Ripley wrote:
  On Thu, 22 Dec 2005, Murray Jorgensen wrote:
  
  We have a choice when calculating the Huber location estimate:
   set.seed(221205)
   y - 7 + 3*rt(30,1)
  
  
  That's Cauchy, BTW, a very extreme case.
 
 Murray Sure, the sort of situation where one might want a robust 
 estimator.
 
 Murray [...]
 
  Note that the huber() scale estimate is the initial MAD, whereas rlm 
  iterates.  Because during iteration the 'center' for MAD is known to 
 be 
  zero, the results differ.  For symmetric distributions there is little 
  difference, but your sample is not close to symmetric.
 
 Murray [Blush] yes I knew that and somehow I forgot. But leave rlm() 
 alone for 
 Murray a while and do IRLS with fixed scale:
 
 Murray th - median(y)
 Murray s - mad(y)
 Murray # paste this in a few times:
 Murray w - ifelse((y-th 1.345*s  y-th-1.345*s), 1, 1.345*s/abs(y-th))
 Murray th - weighted.mean(y,w)
 Murray th
 
 Murray We converge to
  th
 Murray [1] 5.9203
 Murray close to the answer given by rlm() different from
  huber(y)$mu
 Murray [1] 5.9117
 
 Murray So the variable scale does not account for the difference.
 
 No, the main difference is the different default:  
 huber() has  k=1.5
 and rlm()   has  k=1.345 :
 
 Try this
 
 set.seed(221205)
 y - 7 + 3*rt(30,1)
 
 str(huber(y, k = 1.345), digits = 5)
 ## List of 2
 ##  $ mu: num 5.9203
 ##  $ s : num 4.0967
 
 str(rlm(y ~ 1)[c(coefficients, s)], digits = 5) #
 ## (edited to)
 ##  $ coefficients: num 5.9204
 ##  $ s   : num 3.7463
 
 which gives 'mu' very close, even for the iterated
 vs. non-iterated scales.
 
 Martin Maechler, ETH Zurich

-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Huber location estimate

2005-12-21 Thread Murray Jorgensen

We have a choice when calculating the Huber location estimate:
  set.seed(221205)
  y - 7 + 3*rt(30,1)
  library(MASS)
  huber(y)$mu
[1] 5.9117
  coefficients(rlm(y~1))
(Intercept)
  5.9204

I was surprised to get two different results. The function huber() works 
  directly with the definition whereas rlm() uses iteratively reweighted 
least squares.

My surprise is because I vaguely remember

@ARTICLE{hw77,
   author  = {Holland, P. W. and Welsch, R. E.},
   title   = {Robust Regression using Iteratively Reweighted Least-Squares},
   journal = {Communications in Statistics: Theory and Methods},
   volume  = {A6(9)},
   number  = {},
   pages   = {813-827},
   year= {1977}
}
as saying that the two methods were equivalent. Obviously they aren't 
quite. Comments welcome.

Murray Jorgensen
-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Simple introduction to Bayesian Statistics

2005-11-23 Thread Murray Jorgensen

Joshua N Pritikin wrote:

 I made it through the first chapter (Background) of Bayesian Data
 Analysis by Gelman, et al.  Conceptually, the Bayesian approach
 appeals to me and my curiosity is piqued.  However, the next chapter
 was much too terse. The math is daunting.  Where can I find a gentle
 introduction?  Or which math book(s) do I need to read first?
 
 On page 27, there is mention of introductory books including Bayesian
 Methods by Jeff Gill (2002).  Just for fun, I took a look at this book
 to see whether I could get through it.  Chapter 1 was inviting, but
 again, the math got too sophisticated starting from chapter 2.

Try
Introduction to Bayesian Statistics
William M. Bolstad
ISBN: 0-471-27020-2

by my colleague Bill Bolstad. This is written for a course targeting 
bright first or second year students and assumes very little background.

Cheers,  Murray Jorgensen
-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Extracting variance components in lme

2005-11-02 Thread Murray Jorgensen

Consider the output for the inroductory Rail example in Mixed Effects 
Models in S and S-PLUS by Pinheiro and Bates:

  summary(fm1Rail.lme)
Linear mixed-effects model fit by REML
  Data: Rail
   AIC  BIC   logLik
   128.177 130.6766 -61.0885

Random effects:
  Formula: ~1 | Rail
 (Intercept) Residual
StdDev:24.80547 4.020779

Fixed effects: travel ~ 1
 Value Std.Error DF  t-value p-value
(Intercept)  66.5  10.17104 12 6.538173   0

Standardized Within-Group Residuals:
 Min  Q1 Med  Q3 Max
-1.61882658 -0.28217671  0.03569328  0.21955784  1.61437744

Number of Observations: 18
Number of Groups: 6


I want to extract the variance components  sigma = 4.020779 (the within 
component) and sigma_b = 24.80547 (the between component).

I can get sigma easily:
sigma -  fm1Rail.lme$sigma

but how can I get sigma_b ?

Cheers,  Murray Jorgensen

-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Extracting Variance Components fro lme

2005-11-02 Thread Murray Jorgensen

It seems that what I need to get the within group component is

  as.numeric(VarCorr(fm1Rail.lme)[2,2])

thanks to Bert Gunter and Peter Alspach.

Murray Jorgensen
-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Extracting variance components in lme

2005-11-02 Thread Murray Jorgensen

Woops! I should have written:


as.numeric(VarCorr(fm1Rail.lme)[1,2])

for the within component.

-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Downloading zip files

2005-10-30 Thread Murray Jorgensen

I have not had a great amount of success installing/updating packages 
from the Packages menu of Rgui under Windows XL. (Except for 
installing from loacal zip files.)

But I am not asking for help in using these facilities because I prefer 
to keep a folder of package zip files. On the other hand I do find it 
tedious having to right-click Save link as on every individual file 
from CRAN.

I'm sure someone knows a faster way to do it.

Cheers,  Murray Jorgensen
-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Error in integrate

2005-10-06 Thread Murray Jorgensen

I'm using R 2.0.1 under Windows XP. I get the following message when I 
run the code listed below. I don't seem to have any problems using the 
function slice outside integrate.

Error in integrate(slice, 0, Inf, , , , , , , a, b) * delta :
 non-numeric argument to binary operator

[ By the way, I'm trying to evaluate a two-dimensional integral by 
slicing it up into one-dimensional bits which I will loop over to evaluate.]

Here's the code:

mu - 0.3
sig2 - 0.01
alpha - 20
beta - 15.75
rho - 0.1
k - 1/(rho^2.5*gamma(rho)^2*sqrt(2*pi*sig2))

slice - function(w,a,b)
{
  g - w^(1/rho)
  g1 - w1^(1/rho)
  h - g1^a*g^b
  E - -(y-rho*mu -g1/alpha + g/beta)^2/(2*sig2*rho)
  k*h*exp(E- g1 - g)
}

hi - 10
delta - 0.05
grid - seq(delta/2,hi,delta)
y - -0.3
a - 0
b - 0
m - length(grid)
A - rep(0,m)
j - 0

for (w1 in grid)
{
  j - j+1
  A[j] - integrate(slice,0,Inf,,,a,b)*delta
  cat(A[j],\n)
}


-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] R for teaching multivariate statistics (Summary)

2005-10-05 Thread Murray Jorgensen

Greetings all

I promised a summary of the responses that I got to my question:

Next year I will be teaching a third year course in applied statistics 
about 1/3 of which is multivariate statistics. I would be interested in 
hearing experiences from those who have taught multivariate statistics 
using R. Especially I am interested in the textbook that you used or 
recommended.

There were not many replies, so my task is easy!

Peter Dunn mentioned the new book by Brian Everitt, An R and S-Plus 
Companion to Multivariate Analysis which he had yet to see. Someone 
else has it out on loan here so I have not seen it either.

Peter has been using Bryan Manly's book but finds that it is expensive 
and describes different algorithms to those used in R.

Brian Ripley drew attention to Chapters 11 and 12 of MASS.

Pierre Bady pointed out the material on the website Enseignements de 
Statistique en Biologie http://pbil.univ-lyon1.fr/R/enseignement.html
by A.B. Dufour, D. Chessel  J.R. Lobry. I hope to explore this some 
more when I get back to higher bandwidth.

Pat Altham has a wealth of material on her web site, especially
http://www.statslab.cam.ac.uk/~pat/misc.ps
and
http://www.statslab.cam.ac.uk/~pat/AppMultNotes.ps.gz

Many thanks to these respondants for their help.

Murray Jorgensen
-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] R for teaching multivariate statistics

2005-10-03 Thread Murray Jorgensen

Greetings all -

Next year I will be teaching a third year course in applied statistics 
about 1/3 of which is multivariate statistics. I would be interested in 
hearing experiences from those who have taught multivariate statistics 
using R. Especially I am interested in the textbook that you used or 
recommended.

If you reply directly to me I will summarize to reduce list traffic.

Regards,  Murray Jorgensen

-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Coarsening Factors

2005-09-08 Thread Murray Jorgensen

It is not uncommon to want to coarsen a factor by grouping levels 
together. I have found one way to do this in R:

  sites
  [1] F A A D A A B F C F A D E E D C F A E D F C E D E F F D B C
Levels: A B C D E F
  regions - list(I = c(A,B,C), II = D, III = c(E,F))
  library(Epi)
  region - Relevel(sites,regions)
  region
  [1] III I   I   II  I   I   I   III I   III I   II  III III II  I 
III I   III
[20] II  III I   III II  III III III II  I   I
Levels: I II III

However this seems like using a sledgehammer to crack a nut. Can someone 
suggest a simpler way to do this task?

Murray Jorgensen
-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk Home +64 7 825 0441   Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Sorting Text Frames

2005-09-07 Thread Murray Jorgensen

Uwe Ligges wrote:
 I guess there is just too much space or some special characters in your 
 variables that cause problems when printing ...
 Hence you have to debug your data yourself.
 
 Uwe Ligges

However the problem persists when I don't try to print the fram thanks 
to a file.

thanks[tord[1:74],]
gives me row numbers, V1 and V2 to the console, but

thanks[tord[1:75],]
gives only the row numbes and V1.

It is not a special problem with row tord[75]:

  thanks[tord[75],]
   V1 
V2
1192 Votic (Russia) [may God give you health]Antagoo Jumal tervüt teilee

Murray Jorgensen
-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk Home +64 7 825 0441   Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Sorting Text Frames

2005-09-06 Thread Murray Jorgensen

[Using 2.0.1 under Windows XP]
There are a few pages on the internet that list equivalents of
thank you in many languages. I downloaded one from a Google search
and I thought that it would be interesting and a good R exercise to
sort the file into the order of the expressions, rather than the languages.

I tidied up the web page and got it into the format that it was nearly
in: Language Name in columns 1-43, the expression in the remaining
columns.

Then I read it in:

  thanks - read.fwf(C:\\Files\\Reading\\thankyou.txt, c(43,37))
  thanks[1:4,]
V1V2
1 Abenaki (Maine USA, Montreal Canada)Wliwni ni
2 Abenaki (Maine USA, Montreal Canada)   Wliwni
3 Abenaki (Maine USA, Montreal Canada)   Oliwni
4 Achí (Baja Verapaz Guatemala)   Mantiox chawe

  dim(thanks)
[1] 12542

Now I tried sorting the frame into the order of the second column:

tord - order(thanks$V2)
sink(C:\\Files\\Reading\\thanks.txt)
thanks[tord[1:74],]
sink()

This gives more or less the expected output, the file thanks.txt beginning

   V1 
V2
145  Cahuila (United States)'\301cha-ma
862  Paipai (Mexico, USA)'Ara'ya:ikm
863  Paipai (Mexico, USA)'Ara'yai:km
864  Paipai (Mexico, USA) 'Ara'ye:km
311  Eyak (Alaska)'Awa'ahdah

[you may get a bit of wrapping there!]

However I don't really want just 74 lines, I would like the whole file. But
if I get rid of the [1:74] or replace 74 with any larger number I get 
output
like this, with no second column:

   V1
145  Cahuila (United States)
862  Paipai (Mexico, USA)
863  Paipai (Mexico, USA)
864  Paipai (Mexico, USA)
311  Eyak (Alaska)

Does anyone know what is going on?
Tusen tak in advance, in fact 1254 tak in advance!

Murray Jorgensen
-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk Home +64 7 825 0441   Mobile 021 1395 862

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] nlme SASmixed in 2.0.1

2005-04-05 Thread Murray Jorgensen

I assigned a class the first problem in Pinheiro  Bates, which uses the 
data set PBIB from the SASmixed package. I have recently downloaded 
2.0.1 and its associated packages. On trying

library(SASmixed)
data(PBIB)
library(nlme)
plot(PBIB)
I get a warning message
Warning message:
replacing previous import: coef in: namespaceImportFrom(self, 
asNamespace(ns))

after library(nlme) and a pairs type plot of PBIB.
Pressing on I get:
 lme1 - lme(response ~ Treatment, data=PBIB, random =~1| Block)
 summary(lme1)
Error: No slot of name rep for this object of class lme
Error in deviance([EMAIL PROTECTED], REML = REML) :
Unable to find the argument object in selecting a method for 
function deviance


Everything works fine under 1.8.1 and plot(PBIB) is of trellis style, 
which is what I think the authors intend.

Cheers,   Murray Jorgensen
--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Order of messgaes/ missing messages

2004-08-04 Thread Murray Jorgensen

The TCP protocol sends a message as a stream of packets. When packets 
are lost over a particular link in the path from the sending machine at 
ETHZ to a recipient parts of the message will be re-transmitted. A 
message sent after another message can arrive before it if it 
experiences less packet loss on its path through the network. It's 
statistical, you see!

Murray Jorgensen
hadley wickham wrote:
Note that if this bothers you then you could try a web-based
email systems (e.g. yahoo, hotmail, etc.) to receive your r-help
messages since the path to any of them would be independent of your
particular location on the net.

I use gmail and still often recieve replies before the original
message.  Why should using webmail make a difference?  Surely it's the
r mailing list server that does all the sending (apart from specific
individual replies)?  And the listserve must recieve the reply after
the original, so it seems to me that it must be the time it takes to
get between the server and you that is varying.  Or is the listserve
sitting on some emails before sending them?  One test would be to see
if we recieve emails in the same order.
Hadley
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Association between discrete and continuous variable

2004-07-12 Thread Murray Jorgensen

I'm wondering if mutual information al la Cover  Thomas (1991, Ch 2) is 
not the killer association measure for all types of random variables?

Murray Jorgensen
PS  Yes, this is probably OT!
Richard A. O'Keefe wrote:
What's the reommended way, in R, to determine the strength of
association between a discrete variable and a continuous variable?
Yes, I have read the manuals, trawled the archives, c.
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Nested source()s

2004-07-12 Thread Murray Jorgensen

I had an error message while running a macro from Yudi Pawitan's web site:
 source(ex2-13.r)
Error in parse(file, n, text, prompt) : syntax error on line 2
Inspecting ex2-13.r I found that the error was generated by another 
source() command.

Clearly R does not like nested source()s, which is fair enough when you 
think about it. Still it's something that you might want to do. Does 
anyone know how to get achieve the substance of what nested source() 
commands would give you?

Murray Jorgensen
--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] text editor for R

2004-07-07 Thread Murray Jorgensen

I tried R-WinEdt a few years ago, but as I remember it interfered with 
my usual use of WinEdt which is as a front end to MiKTeX. Is there a way 
to use WinEdt both ways?

Murray Jorgensen
Marc Schwartz wrote:
On Wed, 2004-07-07 at 17:47, Yi-Xiong Sean Zhou wrote:
Hi, 

What is the best text editor for programming in R? I am using JEdit as the
text editor, however, it does not have anything specific for R. It will be
nice to have a developing environment where the keywords are highlighted,
plus some other debugging functions. 

Yi-Xiong

More information is available at:
http://www.sciviews.org/_rgui/
Your e-mail headers suggest that you are using Windows. Thus, perhaps
the two best choices (subject to challenge by others) would be:
1. R-WinEdt (Under IDE/Script Editors)
2. ESS for Windows
The above two tools provide for a wide variety of functionality beyond
syntax highlighting.
There is a syntax highlighting file listed at the above site for jEdit.
HTH,
Marc Schwartz
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] text editor for R

2004-07-07 Thread Murray Jorgensen

Oh, yes. I think I did do something like this, but for some reason the 
two incarnations bothered me.

Murray Jorgensen
David Scott wrote:
On Thu, 8 Jul 2004, Murray Jorgensen wrote:

I tried R-WinEdt a few years ago, but as I remember it interfered with 
my usual use of WinEdt which is as a front end to MiKTeX. Is there a way 
to use WinEdt both ways?

Murray Jorgensen
This problem annoyed me for a while too. My solution (which is not perhaps 
ideal) is this. You want two different incarnations of WinEdt, one for 
TeX, the other for R. On the desktop I have a shortcut to WinEdt which is 
the one for TeX stuff. I open the other one with R syntax highlighting etc 
by starting R and using library(RWinEdt). To do this you have to install 
the RWinEdt package and SWinRegistry. This is all well explained in the 
ReadMe.txt for RWinEdt.

I think with the right additions to the Target field in a shortcut to 
WinEdt you can call up the incarnation of WinEdt that is suitable for R. I 
haven't done that. You would then have two shortcuts to WinEdt, one for 
your TeX stuff, one for R.

Uwe Ligges is the guru for this though.
David Scott

David Scott	Department of Statistics, Tamaki Campus
		The University of Auckland, PB 92019
		Auckland	NEW ZEALAND
Phone: +64 9 373 7599 ext 86830		Fax: +64 9 373 7000
Email:	[EMAIL PROTECTED] 

Graduate Officer, Department of Statistics
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: Summary [R] R 1.9.0, special characters in variable names.

2004-06-24 Thread Murray Jorgensen

This is not what I would call a summary. A summary should:
1.  State the original question.
2.  Give a pre'cis of the responses.
Murray Jorgensen
Sixten Borg wrote:
Summary: 
The locale setting in the operating system seems to be involved in what confused me a little bit.
Thank you all for your help, especially the suggested work-around  data.frame(..., check.names=F) which works very well.

A mystery still to be solved is why two versions of R, running on the same machine on 
the same time, behaves differently.
Please do not respond to this on the list. I very much welcome you not to respond at 
all.
Sixten.

Prof Brian Ripley [EMAIL PROTECTED] 2004-06-24 10:20:55 
Can we stop blaming R for things which are not its fault, especially as 
that has already been pointed out twice this morning?

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] MCLUST Covariance Parameterization.

2004-06-07 Thread Murray Jorgensen

I'm not going to directly answer your question but it seems to me that 
you want to fit a completely unconstrained Gaussian mixture model. This 
may not be the best thing to do as without constraints on the Sigma_k 
the model may have more parameters than it is reasonable to try to 
estimate with the available data.

A preliminary fit of a mixture model, even if it does not have quite the 
 covariance structure that you want gives a clustering of the data that 
you can use to explore the correlation structure of the components 
empirically: then you can revise the covariance structure and re-fit.

I think that this sort of approach is likely to be more effective than 
fitting the fully unstructured model directly.

Murray Jorgensen
[EMAIL PROTECTED] wrote:
Hello all (especially MCLUS users). 

I'm trying to make use of the MCLUST package by C. Fraley and A. Raftery. My problem is trying to figure out how the (model) identifier (e.g, EII, VII, VVI, etc.) relates to the covariance matrix. The parameterization of the covariance matrix makes use of the method of decomposition in Banfield and Rraftery (1993) and Fraley and Raftery (2002) where 

Sigma_k = lambda_k*D_k*A_k*D_k^' 

where Sigma_k is the covariance matrix for the kth (k=1,...,G), lambda_k is the kth groups constant of proportionality, D_k is the orthogonal matrix of eigenvectors for the kth group, and A_k is a diagonal matrix whose elements are proportional to the eigenvalues. The parameterization of the covariance matrix Sigma_k depends on the distribution (whether spherical, diagonal, or ellipsoidal), volume (equal or variable), shape (equal or variable), and orientation (coordinate axes, equal, or variable). The distribution, volume, shape and orientation are a function of  lambda_k, D_k, and A_k. Thus, depending on whether or not these values are constant across class defines Sigma_k. 

What I'm trying to figure out is how the distribution, volume, shape, and orientation relate to Sigma_k. As far as the 
parameterization of Sigma_k, what do distribution, volume, shape, and orientation 
even mean. Does a table exist of how these values relate to the Sigma_k? I know a table exists in the MCLUST software manual on the 
MCLUST website, but this table doesn't relate the values of distribution, volume, shape, and orientation to Sigma_k directly, only to 
how Sigma_k would be parameterized (this isnt helpful unless you know what distribution, volume, shape and orientation  mean 
in terms of the within class covariance matrix) So, just what do the distribution, volume, shape, and orientation mean in the context 
of Sigma_k?
 
What do the distribution, volume, shape, and orientation mean for a Sigma_k=sigma^2*I where I is a p by p covariance matrix, sigma^2 is the constant variance and Sigma_1=Sigma_2==Sigma_G. What about when a Sigma_k=sigma^2_k*I, or when Sigma_1=Sigma_2==Sigma_G in situations where each element of the (constant across class) covariance matrix is different?

I would say I have a pretty good understanding of finite mixture modeling, but nothing I've read (expect the works cited in the 2002 JASA paper) talks about parameterizing the Sigma_k matrix in such a way. It would be nice to specify a structure directly for Sigma_k (as most books talk about). Any help on this issue would be greatly appreciated. 

Thanks, 
Ken.

__

[[alternative HTML version deleted]]
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Problem with logtrans, from library MASS

2004-05-14 Thread Murray Jorgensen

Thanks for the help.

The variables do all come from a frame but with various transformations 
and manipulations. I prefer not to stick them all into a new frame just 
to call one function.

Thanks again,

Murray Jorgensen

Prof Brian Ripley wrote:
On 14 May 2004, Peter Dalgaard wrote:


Murray Jorgensen [EMAIL PROTECTED] writes:


Greetings all!

This problem occurs using R 1.8.1 on Windows XP. I downloaded the
binaries for R and all packages, including the VR bundle, in December
2003.
The data consists of NZ$ prices and attributes for 643 cars.

 summary(price)
   Min. 1st Qu.  MedianMean 3rd Qu.Max.NA's
  14290   35800   48990   65400   79000  285000   2
 library(MASS)
 boxcox(price ~ doors+CC+KW+KG+LENGTH, lambda=seq(-0.1,0.5,length=30))
 boxcox(price ~ doors+CC+KW+KG+LENGTH, lambda=seq(-0.5,0.1,length=30))
This all works fine, I get a fairly sharp peak near lambda=-0.25, but then:

 logtrans(price ~ doors+CC+KW+KG+LENGTH, alpha=seq(-1,0,length=30))
Error in eval(expr, envir, enclos) : Object price not found
The issue is that logtrans has a 'data' argument that defaults to NULL
(the author(s) may have a reason for having this inconsistent with
boxplot?). Adding data=.GlobalEnv seems to work.


It's an R/S difference.  data=NULL or list() in S does give you the normal
search path.  I guess this got noticed for boxplot and not logtrans.
Using a data argument would in any case be a very good idea.

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Problem with logtrans, from library MASS

2004-05-13 Thread Murray Jorgensen

Greetings all!

This problem occurs using R 1.8.1 on Windows XP. I downloaded the 
binaries for R and all packages, including the VR bundle, in December 2003.

The data consists of NZ$ prices and attributes for 643 cars.

 summary(price)
   Min. 1st Qu.  MedianMean 3rd Qu.Max.NA's
  14290   35800   48990   65400   79000  285000   2
 library(MASS)
 boxcox(price ~ doors+CC+KW+KG+LENGTH, lambda=seq(-0.1,0.5,length=30))
 boxcox(price ~ doors+CC+KW+KG+LENGTH, lambda=seq(-0.5,0.1,length=30))
This all works fine, I get a fairly sharp peak near lambda=-0.25, but then:

 logtrans(price ~ doors+CC+KW+KG+LENGTH, alpha=seq(-1,0,length=30))
Error in eval(expr, envir, enclos) : Object price not found
Comments welcome,

Murray
--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Re: Mozilla Misbehavior

2004-05-10 Thread Murray Jorgensen

It was a proxy settings problem. All fixed now. Sorry to disturb the list.

Murray Jorgensen

[EMAIL PROTECTED] wrote:

Hi Murray:
Perhaps the errors are thrown by Mozilla because it encounters an 
encrypted/secure page and does not know what to do (..:)..)? Would it 
help if you went to Edit  Preferences...  Privacy and Security  
select SSL or Certificates drop down list
and then once in SSL, or once in Certificates tab, make the necessary 
changes so that Mozilla can at least show you the certificates which you 
can then accept or reject before proceeding.
And indeed, why are mail archives on secure pages?
HTH,
Arin


Message: 9
Date: Mon, 10 May 2004 09:32:48 +1200
From: Murray Jorgensen [EMAIL PROTECTED]
Subject: Re: [R] Modalwert
To: [EMAIL PROTECTED]
Message-ID: [EMAIL PROTECTED]
Content-Type: text/plain; charset=us-ascii; format=flowed
Off-topic, I know, but when I try to follow secure links like this in 
Mozilla, I always get a message The connection was refused when 
trying to contact www.stat.math.ethz.ch and I am forced to use 
Internet Explorer instead. This is under Windows XP.
Is this because www.stat.math.ethz.ch is using some non-standard 
Microsoft extensions of html? Or is it because my Mozilla 1.6 browser 
is wrongly configured?
Why are the mail archives on secure pages anyway?
Murray Jorgensen


__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html


--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Modalwert

2004-05-09 Thread Murray Jorgensen

Off-topic, I know, but when I try to follow secure links like this in 
Mozilla, I always get a message The connection was refused when trying 
to contact www.stat.math.ethz.ch and I am forced to use Internet 
Explorer instead. This is under Windows XP.

Is this because www.stat.math.ethz.ch is using some non-standard 
Microsoft extensions of html? Or is it because my Mozilla 1.6 browser is 
wrongly configured?

Why are the mail archives on secure pages anyway?

Murray Jorgensen

Thomas Petzoldt wrote:
Sonja Dornieden wrote:

Hai -
kann mir jemand sagen, wie ich den Modalwert in R berechne?! IRgendwie 
finde
ich den Befehl nicht
greetz und herzlichen Dank
Sonja


Hi,

there was already a thread in this list about this question with subject 
Computing the mode on 24.02.2004. You will find several answers in the 
R-Help archive:

https://www.stat.math.ethz.ch/pipermail/r-help/2004-February/

Thomas P.

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html


--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Changing Gui preferences

2004-04-23 Thread Murray Jorgensen

I want to modify my Gui preferences to get rid of the MDI toolbar and 
status bar. [ Under Windows XP and R 1.8.1 and 1.9.0.]

When I uncheck the boxes and click apply I am told to save the 
preferences and the changes will take effect when restarting R. I click 
save and a box comes up to ask me to save Rconsole (I'm not sure where 
this should be saved but I navigate my way to where the old Rconsole was 
and save on top of it. Strangely I am not asked if I want to replace the 
old Rconsole.

Anyway when I re-start there are no changes and I am still cluttered up 
by the MDI bars that I want to get rid of!

Murray

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Rgui front-end has encountered a problem and needs to close

2004-04-21 Thread Murray Jorgensen

Well I don't know if anyone can help with this but it will be 
interesting to know if others have had the same problem.

I can't start R at home on my laptop [ I'm using 1.8.1 under Windows 
XP]. When I click on the shortcut I get the usual Windows box for when 
an application needs to close. A couple of clicks down it displays the 
following:

Error signature
AppName: rgui.exeAppVer: 1.81.31121.0ModName: msvcrt.dll
ModVer: 7.0.2600.1106Offset: 0003213b
as well as another uncopyable window containing a large amount of binary.

I was going to re-install R at work today but the thing worked OK. But 
back home tonight I get the same problem! The only difference between 
home and work is that at work I'm connected to a LAN.

Cheers,

Murray

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Scanning tab-separated numbers

2004-03-01 Thread Murray Jorgensen

I want to paste in the following numbers into a scan:

0.023   0.032   0.054   0.069   0.081   0.094
0.105   0.127   0.148   0.169   0.188   0.216
they are separated by tabs alone, unless my mailer has done something to 
the tabs.

Now have a look at this:

 scan()
1: 0.0230.0320.0540.0690.0810.094
1: 0.1050.1270.1480.1690.1880.216
Error in scan() : scan expected a real, got 
0.0230.0320.0540.0690.0810.094
 scan(sep=\t)
1: 0.0230.0320.0540.0690.0810.094
1: 0.1050.1270.1480.1690.1880.216
Error in scan(sep =) : scan expected a real, got 
0.0230.0320.0540.0690.0810.094


Platform: Windows XP   Release 1.8.1

I can't seem to scan in tab-separated numbers even when I try to tell R 
to expect that. (this may be related to the Sweave problem I mentioned a 
few days ago.)

Murray

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Sweave and sep = \t

2004-02-23 Thread Murray Jorgensen

In my .Snw file:

=
fyle - choose.files()
fyle
f - count.fields(fyle, sep = \t)
f
@
and in the .tex file:

\begin{Schunk}
\begin{Sinput}
 fyle - choose.files()
 fyle
\end{Sinput}
\begin{Soutput}
[1] C:\\Files\\Data\\Cars03\\Rover.txt
\end{Soutput}
\begin{Sinput}
 f - count.fields(fyle, sep =)
 f
\end{Sinput}
\begin{Soutput}
[1] 8 8 8 8 8
\end{Soutput}
\end{Schunk}
But I want the sep = \t to be left as is.

Murray
--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] references on cluster analysis

2004-02-22 Thread Murray Jorgensen

I don't really believe that there is any satisfactory definition of the 
true number of clusters let along a procedure that would reliably find it.

Murray Jorgensen

Martin Maechler wrote:

Back from my vacation, I haven't seen an R-help answer on this
  (Christian, where have you been ? ;-)

GiampS == Giampiero Salvi [EMAIL PROTECTED]
   on Sat, 7 Feb 2004 23:40:36 +0100 (CET) writes:


GiampS Hi all, I'm doing a study on predicting the true
GiampS number of clusters in a hierarchical clustering
GiampS scheme. My main reference is at the moment
GiampS Milligan GW and Cooper MC (1985) An examination of
GiampS procedures for determining the number of clusters in
GiampS a data set Psychometrika vol 50 no 2 pp 159-179
GiampS and all the references included in that paper.

(not available to me)

GiampS I'm planning to perform a similar comparison on a
GiampS number of indexes, but on a much larger data set (in
GiampS the order of 3000 points), and with a much higher
GiampS true number of clusters (in the order of some
GiampS hundreds), to see if the properties of the indexes
GiampS scale accordingly.
GiampS I was wondering if the set of indexes described in
GiampS the reference are still state of the art (most of
GiampS them were introduced in the '60s and '70s), or if
GiampS there are new indexes and methods I could include in
GiampS my study. I would really appreciate if you could
GiampS point me to some newer references addressing this problem.
Gordon's 2nd edition,

  author =   {A. D. Gordon},
  title ={Classification, 2nd Edition},
  publisher ={Chappman \ Hall/CRC},
  year = 1999,
  series =   {Monographs on Statistics and Applied Probability 82},
  edition =  {2nd edition}
has a whole chapter (one of the last ones in the book) on this.

R's cluster package has a generic silhouette() function (with 2 methods),
and plot.silhouette() method --- all are improvements from
Kaufman  Rousseeuw's original code.
A recent research paper using CLEST (Fridyland  Dudoit),
mentioning GAP (Tibshirani) etc etc  still find silhouette
among the best indices for determining the number of clusters.
A student's (master) thesis here seems to point in the same
direction.
GiampS I also read Milligan's chapter in the book
GiampS Clustering and Classification from 1995, 
(which book? author?)

GiampS but didn't find information on this subject that wasn't
GiampS included in the previous paper.
Regards,
Martin Maechler [EMAIL PROTECTED]   http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16Leonhardstr. 27
ETH (Federal Inst. Technology)  8092 Zurich SWITZERLAND
phone: x-41-1-632-3408  fax: ...-1228   
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] In praise of -

2004-02-17 Thread Murray Jorgensen

This is not a problem, but I thought that I might say one or two things 
in favour of right pointing assignment before anybody influential gets 
the idea of scrapping it!

Situation (a)

You have just typed a long complcated expression into the console and 
you suddenly realise that you will need it later in the session. Just 
append
- something
to the end of the expression. You can do this with any recently 
evaluated expression with the help of the up-arrow key (in Windows at 
least).

Situation (b)

You have written a chunk of code and you want to see how it behaves for 
various values of the scalar fred. Just put
- fred
at the start of the code block, type any number, then paste the code 
into the console.

Cheers,  Murray

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] In praise of -

2004-02-17 Thread Murray Jorgensen



Liaw, Andy wrote:

 This is also easily accomodated w/o the use of -:  Instead of

 MyValue - fred
 [code that follows]

 use:

 fred -
 MyValue
 [code that follows]

 Andy
The point is that

- fred
[code that follows]
is sitting as one piece in the clipboard. So instead of

3 paste
8 paste
etc
you need to do

fred - 3 paste
fred - 8 paste
etc
Not a big saving, but some of us are lazy!

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Sweave problem

2004-02-05 Thread Murray Jorgensen

Sorry, in an effort to be minimal I cut away some of my surrounding text:
The first code block is meant to be typed in by the user, except for the
numbers which are to be thought of as pasted in. I certainly do not wish to
supply a file name to scan() for this particular example.

Murray

At 07:44 5/02/2004 +, Prof Brian Ripley wrote:
Note that ?scan says

file: the name of a file to read data values from.  If the
  specified file is '', then input is taken from the keyboard

and you want it to come from the file.  So I would not have expected it to 
work.

However, the help file is not totally accurate, as this will work in a
batch file in R (although it has not worked in some versions of S-PLUS).  
The problem is that Sweave does expect the input to all be S code, and
does not get as far as running it.


Far from being `rather simple', this is a rather sophisticated usage, 
intended for use only at a keyboard.


On Thu, 5 Feb 2004, Murray Jorgensen wrote:

 Here is the file minimal.Snw:
 
 \documentclass[a4paper]{article}
 \title{R tips and tricks}
 \author{Murray Jorgensen}
 \usepackage{Sweave}
 \begin{document}
 \maketitle
 \section*{Entering data from a single variable}
 The following data are transformed tensile strength measurements on 
 polyester
 fibres. They may be found on the file \texttt{TENSILE.DAT}. We
 may enter this data into R using the \texttt{scan} command.
 =
 strength - scan()
 0.023   0.032   0.054   0.069   0.081   0.094
 0.105   0.127   0.148   0.169   0.188   0.216
 
 @
 \end{document}
 
 and now I attempt to use SWeave to convert it to minimal.tex:
 
   Sweave(minimal.Snw)
 Writing to file minimal.tex
 Processing code chunks ...
   1 : echo term verbatim
 
 Error:  chunk 1
 Error in parse(file, n, text, prompt) : parse error
 
 It seems a rather simple piece of code to generate such an error message!
 
 Murray
 
 [R 1.8.1 on Windows XP]
 

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595
 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Sweave problem

2004-02-04 Thread Murray Jorgensen

Here is the file minimal.Snw:

\documentclass[a4paper]{article}
\title{R tips and tricks}
\author{Murray Jorgensen}
\usepackage{Sweave}
\begin{document}
\maketitle
\section*{Entering data from a single variable}
The following data are transformed tensile strength measurements on 
polyester
fibres. They may be found on the file \texttt{TENSILE.DAT}. We
may enter this data into R using the \texttt{scan} command.
=
strength - scan()
0.023   0.032   0.054   0.069   0.081   0.094
0.105   0.127   0.148   0.169   0.188   0.216

@
\end{document}
and now I attempt to use SWeave to convert it to minimal.tex:

 Sweave(minimal.Snw)
Writing to file minimal.tex
Processing code chunks ...
 1 : echo term verbatim
Error:  chunk 1
Error in parse(file, n, text, prompt) : parse error
It seems a rather simple piece of code to generate such an error message!

Murray

[R 1.8.1 on Windows XP]
--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Finding Sweave.sty and other problems

2004-01-29 Thread Murray Jorgensen

Hmmm, I can certainly remove the path from the \usepackage command. (Now
that I come to think of it, I have never seen that in LaTeX before.) I
wonder why Sweave put it there in the first place? I thought that I was
just running it straight out of the box. Don't tell me though: I will
read the manual some more.

Murray

At 07:59 29/01/2004 +, Prof Brian Ripley wrote:
~ is an active character in TeX, so it assumes it is not in a filename.
You will need to escape it.

It would be better to have
\usepackage{Sweave}
there and the path in your TEXINPUTS.  TeX is not really designed to work 
with file paths.

On Thu, 29 Jan 2004, Murray Jorgensen wrote:

 Hi,
 
 I've just tried to run example-3 from Friedrich Leish. I'm using R 1.8.1 
 and MiKTeX 2.2 on Windows XP.
 
 I go
 ===
   library(tools)
   Sweave(example-3.Snw)
 Writing to file example-3.tex
 Processing code chunks ...
   1 : term hide
   2 : echo term verbatim
   3 : term tex
   4 : term verbatim eps pdf
 
 You can now run LaTeX on example-3.tex
 ===
 The file example-3.tex looks OK, it starts off
 ===
 \documentclass[a4paper]{article}
 
 \usepackage{C:/PROGRA~1/R/rw1081/share/texmf/Sweave}
 \begin{document}
 
 
 \section*{The Cats Data}
 ..
 ===
 but my LaTeX log file tells a sad story:
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Finding Sweave.sty and other problems

2004-01-28 Thread Murray Jorgensen

Hi,

I've just tried to run example-3 from Friedrich Leish. I'm using R 1.8.1 
and MiKTeX 2.2 on Windows XP.

I go
===
 library(tools)
 Sweave(example-3.Snw)
Writing to file example-3.tex
Processing code chunks ...
 1 : term hide
 2 : echo term verbatim
 3 : term tex
 4 : term verbatim eps pdf
You can now run LaTeX on example-3.tex
===
The file example-3.tex looks OK, it starts off
===
\documentclass[a4paper]{article}
\usepackage{C:/PROGRA~1/R/rw1081/share/texmf/Sweave}
\begin{document}
\section*{The Cats Data}
..
===
but my LaTeX log file tells a sad story:
===
This is TeX, Version 3.141592 (MiKTeX 2.2) (preloaded format=latex 
2000.11.28)  29 JAN 2004 16:36
**example-3.tex
(example-3.tex
LaTeX2e 2001/06/01
Babel v3.7h and hyphenation patterns for english, french, german, 
ngerman, du
mylang, nohyphenation, loaded.
(C:\texmf\tex\latex\base\article.cls
Document Class: article 2001/04/21 v1.4e Standard LaTeX document class
(C:\texmf\tex\latex\base\size10.clo
File: size10.clo 2001/04/21 v1.4e Standard LaTeX file (size option)
)
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
\abovecaptionskip=\skip41
\belowcaptionskip=\skip42
\bibindent=\dimen102
)
! Missing \endcsname inserted.
to be read again
   \protect
l.4 \begin
  {document}
? s
OK, entering \scrollmode...

! LaTeX Error: Missing \begin{document}.

See the LaTeX manual or LaTeX Companion for explanation.
Type  H return  for immediate help.
 ...
l.4 \begin
  {document}
You're in trouble here.  Try typing  return  to proceed.
If that doesn't work, type  X return  to quit.
! Extra \endcsname.
[EMAIL PROTECTED] [EMAIL PROTECTED] -h@@k\endcsname
  [EMAIL PROTECTED] \let 
\CurrentOptio...
l.4 \begin
  {document}
I'm ignoring this, since I wasn't doing a \csname.

! Missing \endcsname inserted.
to be read again
   \protect
l.4 \begin
  {document}
The control sequence marked to be read again should
not appear between \csname and \endcsname.
! Extra \endcsname.
[EMAIL PROTECTED]@aded ...er \ifx \csname [EMAIL PROTECTED]
  \relax \expandafter 
[EMAIL PROTECTED]
l.4 \begin
  {document}
I'm ignoring this, since I wasn't doing a \csname.

! Missing \endcsname inserted.
to be read again
   \protect
l.4 \begin
  {document}
The control sequence marked to be read again should
not appear between \csname and \endcsname.
! Extra \endcsname.
[EMAIL PROTECTED]@ptions ...xdef \csname [EMAIL PROTECTED]
  [EMAIL PROTECTED] 
[EMAIL PROTECTED]
l.4 \begin
  {document}
I'm ignoring this, since I wasn't doing a \csname.

! Missing \endcsname inserted.
to be read again
   \protect
l.4 \begin
  {document}
The control sequence marked to be read again should
not appear between \csname and \endcsname.
! Missing \endcsname inserted.
to be read again
   \protect
l.4 \begin
  {document}
The control sequence marked to be read again should
not appear between \csname and \endcsname.
! Extra \endcsname.
argument ...e/texmf/[EMAIL PROTECTED] \endcsname
  ,
l.4 \begin
  {document}
I'm ignoring this, since I wasn't doing a \csname.
! Missing \endcsname inserted.
to be read again
   \protect
l.4 \begin
  {document}
The control sequence marked to be read again should
not appear between \csname and \endcsname.
! Extra \endcsname.
argument [EMAIL PROTECTED]@currname [EMAIL PROTECTED] \endcsname
  [EMAIL PROTECTED] 
\InputIfFileExists...
l.4 \begin
  {document}
I'm ignoring this, since I wasn't doing a \csname.

! LaTeX Error: File `C:/[EMAIL PROTECTED] \penalty [EMAIL PROTECTED] \ 
{}1/R/rw1081/share
/texmf/Sweave.sty' not found.

Type X to quit or RETURN to proceed,
or enter new name. (Default extension: sty)
Enter file name: x

Overfull \hbox (689.89102pt too wide) in paragraph at lines 4--4
[][] \OT1/cmr/m/n/10 1/R/rw1081/share/texmf/Sweave.sty-h@@k 
1/R/rw1081/share/te
xmf/Sweave.sty1/R/rw1081/share/texmf/Sweave.sty1/R/rw1081/share/texmf/Sweave.st
y, 1/R/rw1081/share/texmf/Sweave.sty 1/R/rw1081/share/texmf/Sweave.sty
 []

 )
(\end occurred when \ifx on line 4 was incomplete)
(\end occurred when \ifx on line 4 was incomplete)
Here is how much of TeX's memory you used:
 205 strings out of 96052
 1946 string characters out of 1197190
 46593 words of memory out of 1050795
 3221 multiletter control sequences out of 35000
 3640 words of font info for 14 fonts, out of 50 for 1000
 14 hyphenation exceptions out of 607
 23i,1n,17p,117b,40s stack positions out of 1500i,500n,5000p,20b,32768s
No pages of output.
===
Any comments welcome!

Murray

--
Dr Murray Jorgensen

[R] Assignments in loops

2003-12-29 Thread Murray Jorgensen

Greetings all. Any help with the following would be appreciated.

I want to create a data frame for each file in a directory. The following
code does not work but it may show what I am trying to do:

carmakes - c('BMW','Chrysler','Citroen','Fiat','Ford','Holden','Honda',
'Mercedes','MG','Mitsubishi','Nissan','Peugeot','Renault','Subaru','Toyota',
'VW')
for (brand in carmakes) {
   fyle - paste(c:/data/cars03/,brand,.txt,sep=)
   brand - read.table(fyle, header = TRUE, sep = \t,na.strings = 
c(-,POA), colClasses=c(character,rep(numeric,7)),
 comment.char = #)
}

I need something like an unquote() function that will allow the brand on
the LHS of the assignment to be treated as an object name instead of a
character string.

Murray






Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Re: [R] Assignments in loops

2003-12-29 Thread Murray Jorgensen

Thanks to Andy, Peter and Roger for drawing my attention to assign(), which
is just what I needed and works fine.

Murray

At 14:11 30/12/2003 +1300, Murray Jorgensen wrote:
Greetings all. Any help with the following would be appreciated.

I want to create a data frame for each file in a directory. The following
code does not work but it may show what I am trying to do:

carmakes - c('BMW','Chrysler','Citroen','Fiat','Ford','Holden','Honda',
'Mercedes','MG','Mitsubishi','Nissan','Peugeot','Renault','Subaru','Toyota',
'VW')
for (brand in carmakes) {
   fyle - paste(c:/data/cars03/,brand,.txt,sep=)
   brand - read.table(fyle, header = TRUE, sep = \t,na.strings = 
c(-,POA), colClasses=c(character,rep(numeric,7)),
 comment.char = #)
}

I need something like an unquote() function that will allow the brand on
the LHS of the assignment to be treated as an object name instead of a
character string.

Murray






Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Re: [R] missing data and completed missing data

2003-12-22 Thread Murray Jorgensen

You might take a set of classified data, say Fisher's irises, and treat 
the classification as initially unknown.

Murray

Raphael Schoenle wrote:

Hi,
 
This is not exactly an R request, but does anyone know of a good dataset
that contains missing and missing data that have been completed later
(like from persistent in-person interview attempts)? (want it for some
Bayesian regression analysis)
 
Thanks!!
 
-Raphael

	[[alternative HTML version deleted]]

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Re: [R] mclust - clustering by spatial patterns

2003-12-18 Thread Murray Jorgensen



Thomas W Blackwell wrote:

[...]
Why not simply use  dist()  and  hclust() ?   Starting with
presence/absence data, what could  mclust()  possibly do that
is different from  hclust() ?
Um, fit a statistical model.

-  tom blackwell  -  u michigan medical school  -  ann arbor  -

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

[R] reverse lexicographic order

2003-12-14 Thread Murray Jorgensen

Hi all,

I have some email addresses that I would like to sort in reverse 
lexicographic order so that addresses from the same domain will be 
grouped together. How might that be done?

Murray

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Re: [R] reverse lexicographic order

2003-12-14 Thread Murray Jorgensen

Hi Duncan, Hi Peter,

thanks for those ideas!  I'm sure I will learn a lot be fooling around 
with them.

Cheers,

Murray

Duncan Murdoch wrote:

On Mon, 15 Dec 2003 10:08:31 +1300, you wrote:


Hi all,

I have some email addresses that I would like to sort in reverse 
lexicographic order so that addresses from the same domain will be 
grouped together. How might that be done?


I'm not sure this is what you want, but this function sorts a
character vector by last letters, then 2nd last, 3rd last, and so on:
revsort - function(x,...) {
x[order(unlist(lapply(strsplit(x,''),
function(x) paste(rev(x),collapse=''))),...)]
}

revsort(as.character(1:20))
 [1] 10 20 1  11 2  12 3  13 4  14 5  15 6
16 7 
[16] 17 8  18 9  19

The ... args are given to order(), so na.last=FALSE and
decreasing=TRUE are possibilities.
Duncan Murdoch




--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Re: [R] mode

2003-12-12 Thread Murray Jorgensen

Apologies for pursuing this increasingly off-topic thread, but I've just
remembered that 'my' mode is not so hard to compute. Suppose a
one-dimensional data set is in general position (all gaps unequal), find
the smallest gap, then choose whichever endpoint has the closest neighbour.
That's the mode.

At 16:39 12/12/2003 +1300, Murray Jorgensen wrote:
Opps! This is what I should have written:

The mode of a data vector x might be defined as the limit of m_p as p 
tends to zero from above and where m_p is the m minimizing
sum(abs(x - m)^p).  I would not expect the mode so defined to be of much 
use in data analysis, nor would it be easy to compute.

Murray

[Thanks to Duncan Murdoch for noticing the missing ^p .]

Douglas Bates wrote:

 Christian Mora [EMAIL PROTECTED] writes:
 
 
How can I get the mode (most frequent value) from a dataset (continuos
variables)? I can obtain it when the data is discrete (by making a table
and looking at the higher frequency) but I don't know how obtain it
from, for example, a density plot of the data. Does anyone know how to
do it? Thanks
 
 
 I don't think the mode of a sample from a continuous random variable is
well
 defined.
 
 __
 [EMAIL PROTECTED] mailing list
 https://www.stat.math.ethz.ch/mailman/listinfo/r-help
 
 

-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Re: [R] mode

2003-12-11 Thread Murray Jorgensen

The mode of a data vector x might be defined as the limit of m_p as p 
tends to zero from above and where m_p is the m minimizing
sum(abs(x - m)).  I would not expect the mode so defined to be of much 
use in data analysis, nor would it be easy to compute.

Murray

Douglas Bates wrote:

Christian Mora [EMAIL PROTECTED] writes:


How can I get the mode (most frequent value) from a dataset (continuos
variables)? I can obtain it when the data is discrete (by making a table
and looking at the higher frequency) but I don't know how obtain it
from, for example, a density plot of the data. Does anyone know how to
do it? Thanks


I don't think the mode of a sample from a continuous random variable is well
defined.
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Re: [R] mode

2003-12-11 Thread Murray Jorgensen

Opps! This is what I should have written:

The mode of a data vector x might be defined as the limit of m_p as p 
tends to zero from above and where m_p is the m minimizing
sum(abs(x - m)^p).  I would not expect the mode so defined to be of much 
use in data analysis, nor would it be easy to compute.

Murray

[Thanks to Duncan Murdoch for noticing the missing ^p .]

Douglas Bates wrote:

Christian Mora [EMAIL PROTECTED] writes:


How can I get the mode (most frequent value) from a dataset (continuos
variables)? I can obtain it when the data is discrete (by making a table
and looking at the higher frequency) but I don't know how obtain it
from, for example, a density plot of the data. Does anyone know how to
do it? Thanks


I don't think the mode of a sample from a continuous random variable is well
defined.
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

[R] Character graphics

2003-12-07 Thread Murray Jorgensen

Does anyone else miss email-friendly character graphics such as the 
following example, produced using Minitab?

Histogram of C6   N = 478   N* = 21
Each * represents 2 observation(s)
MidpointCount
 -12   16  
 -11   53  ***
 -10   63  
  -9   83  **
  -8   93  ***
  -7   74  *
  -6   45  ***
  -5   13  ***
  -46  ***
  -32  *
  -2   10  *
  -1   18  *
   01  *
BTW, Minitab 13 protests when you want to do this:

MTB  gstd
* NOTE  * Character graphs are obsolete.
* NOTE  * Standard Graphics are enabled.
  Professional Graphics are disabled.
  Use the GPRO command to enable Professional Graphics.
Has anyone tried to get similar unprofessional displays out of R?

Cheers,  Murray

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Re: [R] add a point to regression line and cook's distance

2003-12-03 Thread Murray Jorgensen

Not a good idea, unless the regression function is *known* to be linear. 
More likely it is only approximately linear over small ranges.

Murray Jorgensen

Wiener, Matthew wrote:

If you know that the line should pass through (0,0), would it make sense to
do a regression without an intercept?  You can do that by putting -1 in
the formula, like:  lm(y ~ x - 1).
Hope this helps,

Matt

Matthew Wiener
RY84-202
Applied Computer Science  Mathematics Dept.
Merck Research Labs
126 E. Lincoln Ave.
Rahway, NJ 07065
732-594-5303 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Spencer Graves
Sent: Wednesday, December 03, 2003 5:51 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: [R] add a point to regression line and cook's distance
  What is the context?  What do the outliers represent?  If you 
think carefully about the context, you may find the answer. 

  hope this helps.  spencer graves
p.s.  I know statisticians who worked for HP before the split and who 
still work for either HP or Agilent, I'm not certain which.  If you want 
to contact me off-line, I can give you a couple of names if that might 
help. 

[EMAIL PROTECTED] wrote:


Hi, 

This is more a statistics question rather than R question. But I thought
people on this list may have some pointers.

MY question is like the following:
I would like to have a robust regression line. The data I have are mostly
clustered around a small range. So

the regression line tend to be influenced strongly by outlier points (with
large cook's distance). From the application

's background, I know that the line should pass (0,0), which is far away
from the data cloud. I would like to add this

point to have a more robust line. The question is: does it make sense to do
this? what are the negative impacts if any?

thanks,
jonathan
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help



__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Re: [R] sampling without repetition

2003-11-18 Thread Murray Jorgensen

Really you just want a random permutation of r:

 rperm - sample(r)

Then you can obtain a random partition of r into cells of any desired size
simply by taking appropriately-sized consecutive blocks of rperm.

Murray

At 22:42 17/11/2003 -0500, Rajarshi Guha wrote:

Sorry for not providing all the details.

The 3 sets can be of any size (which will be specified by the user of
the function) and cover all of r (ie, set1 + set2 + set3 == r)

Thanks to everybody for all the solutions.

---
Rajarshi Guha [EMAIL PROTECTED] http://jijo.cjb.net
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
---
I'd love to go out with you, but there are important world issues that
need worrying about.

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

[R] Plotting lm() attributes

2003-11-12 Thread Murray Jorgensen

Suppose you fit a linear model

 model.1 ~ lm(v1 ~ ..., data=myframe)

and v2 is some other column of myframe typically not in the model. You 
will often want to try

 plot(v2, model.1$residuals)

but this will fail if there are NAs in the response v1 as 
model.1$residuals has length equal to the number of nonmissing values in 
 v1. I suppose

 plot(v2[!is.na(v1)], model.1$residuals)

does the job, but it seems irritating that model.1$residuals, does not 
have length agreeing with the number of rows in the data frame. It would 
be even more irritating for model.1$fitted.values, where the removed 
elements would often have nonmissing values.

Murray

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

[R] ETA for 1.8.1 ?

2003-11-05 Thread Murray Jorgensen

When is version 1.8.1 likely to be released?

Murray

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

[R] detecting singular matrices

2003-10-09 Thread Murray Jorgensen

My colleague runs R 1.7.1 under Windows XP. He remarks:

A
[,1] [,2] [,3]
[1,]123
[2,]456
[3,]789
b
[1] 1 2 3
solve(A,b)
[1] -0.333  0.667  0.000
solve(A)
   [,1] [,2][,3]
[1,] -4.5036e+15  9.00720e+15 -4.5036e+15
[2,]  9.0072e+15 -1.80144e+16  9.0072e+15
[3,] -4.5036e+15  9.00720e+15 -4.5036e+15
eigen(A)
$values
[1]  1.611684e+01 -1.116844e+00 -1.303678e-15
with a condition number of  1.236260e+16 I think I'd like it to tell me it's singular like it used to :) and I know it is singular 
Under 1.6.0 also on XP I get

 solve(A,1:3)
Error in solve.default(A, 1:3) : singular matrix `a' in solve
 solve(A)
Error in solve.default(A) : singular matrix `a' in solve
 eigen(A)
$values
[1]  1.611684e+01 -1.116844e+00 -1.120190e-16
$vectors
   [,1][,2]   [,3]
[1,] -0.2719964 -0.76169140  0.3734378
[2,] -0.6159646 -0.08408654 -0.7468756
[3,] -0.9599328  0.59351831  0.3734378
Which seems to be preferable output.

Murray

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

[R] R-project [.com?] [.net?]

2003-09-22 Thread Murray Jorgensen

I got a shock a few days ago when I accidentally visited 
www.r-project.com . I thought that the r-project site had been hacked 
until I realised my mistake. There is also a site www.r-project.net. 
Both of these sites appear to be Japanese. Does anyone know anything 
about them? I suppose that it is not unusual for names close to those of 
popular sites to be used. It is good that they use a different language 
or there might well be confusion.

Murray

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

[R] Suming logical vectors

2003-09-17 Thread Murray Jorgensen

Can anyone explain the following?  [R 1.6.0 Windows XP, yes I will 
upgrade soon.]

Murray

 sort(IATmedian)[0:50]==0
 [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 
FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE
 sum(sort(IATmedian)[0:50]==0)
[1] 2
 sum(sort(IATmedian)==0)
[1] 2
 sum(IATmedian==0)
[1] NA

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Re: [R] Suming logical vectors

2003-09-17 Thread Murray Jorgensen

Blush!  Not too hard to explain at all!

Deepayan Sarkar wrote:

Your IATmedian has some NAs (which are removed by sort) ?

On Thu, 18 Sep 2003, Murray Jorgensen wrote:


Can anyone explain the following?  [R 1.6.0 Windows XP, yes I will
upgrade soon.]
Murray

 sort(IATmedian)[0:50]==0
 [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE
 sum(sort(IATmedian)[0:50]==0)
[1] 2
 sum(sort(IATmedian)==0)
[1] 2
 sum(IATmedian==0)
[1] NA
--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help



--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

[R] Quit asking me if I want to save the workspace!

2003-09-16 Thread Murray Jorgensen

How do you stop R from putting up a dialog box when you quit Rgui?
(I use Windows and I never save workspaces that way)
Murray
--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Re: [R] Quit asking me if I want to save the workspace!

2003-09-16 Thread Murray Jorgensen

Rafael A. Irizarry wrote:
you can type this:

q(no)

see the help file for q
Still more work than two mouse clicks.

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Re: [R] Quit asking me if I want to save the workspace!

2003-09-16 Thread Murray Jorgensen

Ah! now that tells me what I want to know. I was trying to type

C:\Program Files\R\rw1071\bin\Rgui.exe --no-save

instead of

C:\Program Files\R\rw1071\bin\Rgui.exe --no-save

into the Target box.  Silly me!

Jason Turner wrote:

On Wed, 2003-09-17 at 14:26, Murray Jorgensen wrote:

Rafael A. Irizarry wrote:

you can type this:

q(no)

see the help file for q
Still more work than two mouse clicks.


Two clicks!  How awful!   ;)

Actually, it bugs me too, so my desktop shortcut (under Win XP) has this
for Target.
###
C:\Program Files\R\rw1071\bin\Rgui.exe --no-save
###
 
(my mail client might've line-wrapped that by the time you see it. 
Everything between the ### marks is one line.  *Include* the quotes. 
There is a space between Rgui.exe and --no-save)

See Appendix B of An Introduction to R if you need more info.

Hope that helps.

Jason
--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Re: [R] R tools for large files

2003-08-26 Thread Murray Jorgensen

I would like to thank those who have responded and especially Brian 
Ripley for making his unix tools for Windows available. A colleague has 
also mentioned to me the set of unix tools called Cygwin.

Two things that can be done with R alone are to read the first n lines 
of a file into n strings with readLines() and to scan in a block of the 
file after skipping a number of lines.

I will probably use Fortran to extract subsets of the file as I need to 
use it for other things that I am planning to do with the file.

I'll maybe also play a bit with readLines() and writeLines() inside 
loops to see if I can build up my random subsets of files this way.

BTW, I now estimate the file at about 100,000 lines so indeed, it is not 
all that large!

Murray Jorgensen

Prof Brian Ripley wrote:

On Mon, 25 Aug 2003, Murray Jorgensen wrote:


At 08:12 25/08/2003 +0100, Prof Brian Ripley wrote:

I think that is only a medium-sized file.
Large for my purposes means more than I really want to read into memory
which in turn means takes more than 30s. I'm at home now and the file
isn't so I'm not sure if the file is large or not.
More responses interspesed below. BTW, I forgot to mention that I'm using
Windows and so do not have nice unix tools readily available.


But you do, thanks to me, as you need them to installed R packages.


On Mon, 25 Aug 2003, Murray Jorgensen wrote:


I'm wondering if anyone has written some functions or code for handling 
very large files in R. I am working with a data file that is 41 
variables times who knows how many observations making up 27MB altogether.

The sort of thing that I am thinking of having R do is

- count the number of lines in a file
You can do that without reading the file into memory: use
system(paste(wc -l, filename)) 
Don't think that I can do that in Windows XL.


I presume you mean Windows XP?  Of course you can, and wc.exe is in 
Rtools.zip!


or read in blocks of lines via a 

connection
But that does sound promising!


- form a data frame by selecting all cases whose line numbers are in a 
supplied vector (which could be used to extract random subfiles of 
particular sizes)
R should handle that easily in today's memory sizes.  Buy some more RAM if 
you don't already have 1/2Gb.  As others have said, for a real large file,
use a RDBMS to do the selection for you.
It's just that R is so good in reading in initial segments of a file that I
can't believe that it can't be effective in reading more general
(pre-specified) subsets.


--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Re: [R] R tools for large files

2003-08-26 Thread Murray Jorgensen

Hi Martin,

I don't know much about the concept of connection but I had supposed 
it to at least include the concept of file and perhaps also input 
device and output device'. I guess the important point that you are 
making is that it is sequential in the sense that you describe. I 
suppose at the time that I wrote my emails I didn't *know* that this was 
 the case but rather assumed that this must be so, since it would be 
tedious in the extreme to have to work with the access functions if they 
kept going back to the beginning of the connection.

It may help to explain the application. The large files that I am 
working with are themselves statistical summaries of internet traffic 
flows (you will appreciate why they can be almost arbitrarily large!) I 
am interested in clustering these flows into different classes of 
traffic. I am using a model-based approach, so that the end-point will 
be statistical models for each cluster. Once these have been estimated 
they may be used in the classification of future traffic [including a 
residual class of traffic that does not fit any cluster well].

Based on experience with my clustering software (Multimix) I believe 
that it should work well on data sets of, say, 3000 observations. I plan 
to select a small number of random subsets of this size. The replication 
of these subsets should help me with model selection questions (How many 
Clusters? How complex should each cluster model be?)

Tom Mulholland makes a good point when he notes that many R users (and 
other users) have very little control over their computing environment 
owing to somewhat arbitrary IT management decisions. For this reason it 
will be advantageous to have several solutions to large file problems.

I'm pleased that you think that efficient R functions for manipulating 
numbered lines from files may be written. I'm going to have a go at it 
just as soon as I finish a big item of paperwork!

BTW, I will be out of town and with much reduced email access over the 
next week or so, so if I don't reply to the list or individuals this 
should not be put down to laziness or rudeness!

Cheers,

Murray Jorgensen

PS  Give my regards to Chris Hennig.

Martin Maechler wrote:
Hi Murray,

from reading your summarizing reply, I wonder if you missed the
most important point about connections  connection := generalization of file):
Once you open() one, you can read it **sequentially**, e.g., in
bunches of a few lines  i.e., you don't re-start from the
beginning each time.
I think this will allow to devise a pretty efficient R function
for reading (and returning as a vector of strings) line numbers
(n1, n2,..., nm).   

Did you know this?  If not, maybe you forward this answer (and
your reaction to it) to R-help as well.
Regards,
Martin Maechler [EMAIL PROTECTED]   http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16Leonhardstr. 27
ETH (Federal Inst. Technology)  8092 Zurich SWITZERLAND
phone: x-41-1-632-3408  fax: ...-1228   

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

[R] R tools for large files

2003-08-25 Thread Murray Jorgensen

I'm wondering if anyone has written some functions or code for handling 
very large files in R. I am working with a data file that is 41 
variables times who knows how many observations making up 27MB altogether.

The sort of thing that I am thinking of having R do is

- count the number of lines in a file

- form a data frame by selecting all cases whose line numbers are in a 
supplied vector (which could be used to extract random subfiles of 
particular sizes)

Does anyone know of a package that might be useful for this?

Murray

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Re: [R] R tools for large files

2003-08-25 Thread Murray Jorgensen

Could you be more specific? Do you mean the chapter on connections?

Ko-Kang Kevin Wang wrote:

Hi,

Have you looked at R Data Import/Export?

On Mon, 25 Aug 2003, Murray Jorgensen wrote:


__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Re: [R] R tools for large files

2003-08-25 Thread Murray Jorgensen

Andrew,

This is no doubt true, but some things in R work very well with big 
files without the need for any extra software:

readLines(c:/data/perry/data.csv,n=12)
# prints out the first 12 lines as strings
flows - read.csv(c:/data/perry/data.csv,na.strings=?, 
header=F,nrows=1000)
# makes a data frame from the first 1000 records

I would like to get some solution where I don't find myself generating 
large numbers of derived files from the original data file.

Murray

Andrew C. Ward wrote:
Dear Murray,

One way that works very well for many people (including me)
is to store the data in an external database, such as MySQL,
and read in just the bits you want using the excellent
package RODBC. Getting a database to do all the selecting
is very fast and efficient, leaving R to concentrate on the
analysis and visualisation. This is all described in the
R Import/Export Manual.
Regards,

Andrew C. Ward

CAPE Centre
Department of Chemical Engineering
The University of Queensland
Brisbane Qld 4072 Australia
[EMAIL PROTECTED]
Quoting Murray Jorgensen [EMAIL PROTECTED]:


I'm wondering if anyone has written some functions or
code for handling 
very large files in R. I am working with a data file that
is 41 
variables times who knows how many observations making up
27MB altogether.

The sort of thing that I am thinking of having R do is

- count the number of lines in a file

- form a data frame by selecting all cases whose line
numbers are in a 
supplied vector (which could be used to extract random
subfiles of 
particular sizes)

Does anyone know of a package that might be useful for
this?
Murray

--
Dr Murray Jorgensen 
http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato,
Hamilton, New Zealand
Email: [EMAIL PROTECTED]   
Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile
021 1395 862

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help



--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Re: [R] R tools for large files

2003-08-25 Thread Murray Jorgensen

At 08:12 25/08/2003 +0100, Prof Brian Ripley wrote:
I think that is only a medium-sized file.

Large for my purposes means more than I really want to read into memory
which in turn means takes more than 30s. I'm at home now and the file
isn't so I'm not sure if the file is large or not.

More responses interspesed below. BTW, I forgot to mention that I'm using
Windows and so do not have nice unix tools readily available.

On Mon, 25 Aug 2003, Murray Jorgensen wrote:

 I'm wondering if anyone has written some functions or code for handling 
 very large files in R. I am working with a data file that is 41 
 variables times who knows how many observations making up 27MB altogether.
 
 The sort of thing that I am thinking of having R do is
 
 - count the number of lines in a file

You can do that without reading the file into memory: use
system(paste(wc -l, filename)) 

Don't think that I can do that in Windows XL.

or read in blocks of lines via a 
connection

But that does sound promising!


 - form a data frame by selecting all cases whose line numbers are in a 
 supplied vector (which could be used to extract random subfiles of 
 particular sizes)

R should handle that easily in today's memory sizes.  Buy some more RAM if 
you don't already have 1/2Gb.  As others have said, for a real large file,
use a RDBMS to do the selection for you.

It's just that R is so good in reading in initial segments of a file that I
can't believe that it can't be effective in reading more general
(pre-specified) subsets.

Murray


-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595
 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

[R] Interlacing two vectors

2003-08-20 Thread Murray Jorgensen

I want to interlace two vectors. This I can do:

 x - 1:4
 z - x+0.5
 as.vector(t(cbind(x,z)))
[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5
but this seems rather inelegant. Any suggestions?

Murray

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Re: [R] r-help using random generating

2003-03-18 Thread Murray Jorgensen

Student Exercise?!

Cheryl H. wrote:
To whom it may concern:
Given that my sample size is n, my mean is 100, and my sd is 10, I need 
to use a random number generator (which I believe is the function 
rnorm(5,100,10)), but I need to repeat it a large number of times, and 
then plot the sampling distributions of the sample means, sd's, and 
variances of those generated sets. I'm having a real hard time trying to 
figure out how to do this easily. Please help if possible! Thanks,
Cheryl

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

[R] help with likelihood contour plot

2003-03-17 Thread Murray Jorgensen

Can some kind person point out my error here? I'm trying to set up a 
grid for a countour plot of a likelihood function.

 u - rnorm(20,9.5,2.5)
 # sample of size 20 from N(9.5,2.5^2)
 loglik - function(th1,th2) {
+ n - length(u)
+ -(n/2)*log(2*pi*th2^2)-0.5*sum((u-th1)^2/th2^2)
+ }
 x - seq(4.5,14.5,len=50)
 y - seq(0.5,6,len=50)
 f - outer(x, y, loglik(x,y))
Error in match.fun(FUN) : not function, character, or symbol: loglik(x, y)
In addition: Warning message:
longer object length
is not a multiple of shorter object length in: u - th1
 loglik(9,2)
[1] -44.56294
 is.function(loglik)
[1] TRUE
Thanks,

Murray
--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

[R] lda on curves

2003-02-16 Thread Murray Jorgensen

I'm working on a rather interesting consulting problem with a client. A 
number of physical variables are measured on a number of cricket bowlers 
in the performance of a delivery. An example variable might be a 
directional component of angular momentum for a particular joint 
measured at a large number (101) of equally spaced timepoints.

Each bowler generates a (fairly smooth) curve for each variable 
measured. I decided to represent each curve by a few orthogonal 
polynomial constrasts.

There are 4 groups of bowlers corresponding to various speeds of 
delivery. I want to use canonical variant analysis to find linear 
combinations of my transformed variables discriminating well between the 
groups of bowlers.

I used lda() from the MASS library to do this, but examining the output 
I notice that the higher-order orthogonal polynomials are getting larger 
coefficients than the more important lower-order ones. This is clearly 
because some scaling of the variables is being done by lda(), and 
because the higher-order polynomial vaiable values are smaller, they are 
scaled up.

I would like to turn off this scaling as it is not what is needed in 
this problem and will cause the tail to wag the dog. There is no 
obvious parameter to do this in

lda(x,   grouping, prior = proportions, tol = 1.0e-4,
   subset, na.action = na.fail,
   method, CV = FALSE, nu)

so I thought that I might try a hack. However:

 lda
function (x, ...)
{
if (is.null(class(x)))
class(x) - data.class(x)
UseMethod(lda, x, ...)
}

which isn't very helpful.

Any ideas about how to perform an unscaled canonical variates analysis?

Cheers,

Murray
--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862

__
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help

Re: [R] google

2003-01-27 Thread Murray Jorgensen

That's good. But it will probably not take you to other pages where 
people mention that they are using R for something, which is a bit of a 
pity.

Rafael A. Irizarry wrote:
fyi, I typed R in google and hit the I'm feeling lucky botton... it
took me to http://www.r-project.org


-rafael
ps - toysrus.com is second.

__
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help



--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862

__
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help

Re: [R] order() on vector with Inf's

2003-01-23 Thread Murray Jorgensen


[EMAIL PROTECTED] wrote:
On Thu, 23 Jan 2003, Murray Jorgensen wrote:
[useful suggestions omitted]

llbet[llbet==Inf] - -Inf

should correct the problem, but why not correct the calculation?

Not a bad idea, but the numerical difficulties always occur well away 
from the likelihood maximum, and I'm only looking for reasonable 
starting values.

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862

__
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help

[R] order() on vector with Inf's

2003-01-22 Thread Murray Jorgensen

I have a parameter vector, ibeta, and a corresponding vector of 
loglikelihoods, llbet. If llbet contains no NAs or Inf's then I can 
extract the best parameter by

index - order(-llbet)[1]
beta - ibeta[index]

or similar. The argument na.last of order() allows me to fix this up 
even if llbet contains some NAs which I wish to ignore.

Unfortunately my llbet contains Inf's, which are not points of infinite 
likelihood, if anything they should be -Inf's. Anyway na.last does not 
seem to help me with these. What should I do?

Murray
--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862

__
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help

97 matches

Mail list logo