[R] How to do varimax rotation for principal component based factor analysis, any packages?

2006-04-16 Thread Yong Wang
Dear R users
the factanal pacakge is always MLE, which package can do varimax
rotation with the results from princomp ?

thank you

yong

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] string vector indices

2006-04-16 Thread Philipp Pagel
On Sun, Apr 16, 2006 at 12:26:29AM -0400, Luke wrote:
 
 x - a, aab
 y - a, aa, aab, aabc.
 
 Is there any R function to get the indices of y for the elements of x, that
 is, foo(x, y) will give me the index vector c(1, 3)?

match(x, y)

cu
Philipp

-- 
Dr. Philipp PagelTel.  +49-8161-71 2131
Dept. of Genome Oriented Bioinformatics  Fax.  +49-8161-71 2186
Technical University of Munich
Science Center Weihenstephan
85350 Freising, Germany

 and

Institute for Bioinformatics / MIPS  Tel.  +49-89-3187 3675
GSF - National Research Center   Fax.  +49-89-3187 3585
  for Environment and Health
Ingolstädter Landstrasse 1
85764 Neuherberg, Germany
http://mips.gsf.de/staff/pagel

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] How to do varimax rotation for principal component based factor analysis, any packages?

2006-04-16 Thread ronggui
I don't think the result from _princomp _ should be rotated.
The definition of principal components explicitly indicates that the
result is to have axes that are uncorrelated and line up in directions
of maximum variance.

If I am wrong,I hope some one else to point it out.

2006/4/16, Yong Wang [EMAIL PROTECTED]:
 Dear R users
 the factanal pacakge is always MLE, which package can do varimax
 rotation with the results from princomp ?

 thank you

 yong

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html



--
黄荣贵
Deparment of Sociology
Fudan University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] permutation of rows of a matrix

2006-04-16 Thread Manuel López-Ibáñez
Dear John,

I understand what you mean. However, when someone is learning R for the 
first time or have little experience, such examples help to understand 
the connection of different parts of the language.

Moreover, things that make sense once you know them, can be difficult to 
relate in the first place. For example, it would be interesting to know 
how many new R users don't know that there is a manual page for [.

I hope you can understand my point of view (you may disagree, though.)

Regards,
Manuel.


John Fox wrote:
 Dear Manuel,
 
 Although ?sample doesn't specifically describe permuting the rows of a
 matrix, it does say that sample(x) generates a random permutation of the
 elements of x (or 1:x). Indexing the rows of the matrix by a permutation of
 1:x (where x is the number of rows) doesn't seem to be much of a leap.
 
 Regards,
  John
 
 
 John Fox
 Department of Sociology
 McMaster University
 Hamilton, Ontario
 Canada L8S 4M4
 905-525-9140x23604
 http://socserv.mcmaster.ca/jfox 
  
 
 
-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of Manuel 
López-Ibáñez
Sent: Saturday, April 15, 2006 9:44 AM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] permutation of rows of a matrix

help(sample) does not say anything about randomly permuting 
the rows of a matrix M by using M[sample(m,m),]. Perhaps it 
could be added as an example of use.

John Fox wrote:

Dear Jose,

M[sample(m, m),] will randomly permute the rows of M. [You probably 
could have figured this out via help.search(permutation), which 
would have led you to sample().]

Regards,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox




-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of javargas
Sent: Saturday, April 15, 2006 7:53 AM
To: r-help@stat.math.ethz.ch
Subject: [R] permutation of rows of a matrix

How can I generate a random permutation between rows of a 

matrix M (of 

m rows and n columns)?

Thanks for your help,

Jose

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html


  
__
LLama Gratis a cualquier PC del Mundo. 
Llamadas a fijos y móviles desde 1 céntimo por minuto. 
http://es.voice.yahoo.com

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html
 
 
 




__ 
LLama Gratis a cualquier PC del Mundo. 
Llamadas a fijos y móviles desde 1 céntimo por minuto. 
http://es.voice.yahoo.com

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] solve matrix

2006-04-16 Thread Peter Dalgaard
Nongluck Klibbua [EMAIL PROTECTED] writes:

 Hi R-users,
 I try to use solve to find the answer of this linear equation
 a=x*b where a is 2*1 matrix and b is 2*2 matrix.I would like to get x so 
 I call y-solve(a,b) but this one happen a is 2*1 matrix so what I should 
 do .
 Thanks 
 Luck


First get your extents right. For the matrix product x*b to make
sense, you need x to be n*2 if b is 2*2, and the result is n*2, so how
can a be 2*1?? 

Next, notice that solve(A,b) solves A*x = b, where A is a square
matrix, so you need solve(b,a) or maybe solve(b, t(a)) or
t(solve(b,t(a)) depending on what your problem really was.
 
-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] How to do varimax rotation for principal component based factor analysis, any packages?

2006-04-16 Thread Prof Brian Ripley

On Sun, 16 Apr 2006, ronggui wrote:


I don't think the result from _princomp _ should be rotated.
The definition of principal components explicitly indicates that the
result is to have axes that are uncorrelated and line up in directions
of maximum variance.

If I am wrong,I hope some one else to point it out.


They can be, whether they should be is another matter.  See the 
Statistical Complements to MASS3 at 
http://www.stats.ox.ac.uk/pub/MASS3/Compl.shtml for some examples and 
references.


The original post is starting from false premises: there is no `factanal 
package', and the varimax function in stats is not confined to the results 
of factanal().


If for some reason you want to use the so-called `principal component 
factor analysis', you could make use of one of the several statistical 
systems with names starting with S that provide it, or write your own R 
code.




2006/4/16, Yong Wang [EMAIL PROTECTED]:

Dear R users
the factanal pacakge is always MLE, which package can do varimax
rotation with the results from princomp ?

thank you

yong

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html




--
»ÆÈÙ¹ó
Deparment of Sociology
Fudan University




--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] permutation of rows of a matrix

2006-04-16 Thread John Fox
Dear Manuel,

I do understand your point of view, and think that it is reasonable, but in
this case, I disagree. 

It wouldn't hurt to have an example in ?sample of using a permutation to
index a matrix, but this is one of many uses of permutations and it is not
possible to show or even to anticipate all of them. 

To use a programming environment like R effectively, it's necessary to
acquire some basic facility with the language (such as an understanding of
how indexing works), and it's much more efficient to acquire this facility
by reading a manual or book than through help pages. For example, the
Introduction to R manual that comes with R has a section on indexing arrays,
and most introductory books on R, including free ones, have more detailed
explanations of the subject.

Regards,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Manuel 
 López-Ibáñez
 Sent: Sunday, April 16, 2006 4:52 AM
 To: r-help@stat.math.ethz.ch
 Subject: Re: [R] permutation of rows of a matrix
 
 Dear John,
 
 I understand what you mean. However, when someone is learning 
 R for the first time or have little experience, such examples 
 help to understand the connection of different parts of the language.
 
 Moreover, things that make sense once you know them, can be 
 difficult to relate in the first place. For example, it would 
 be interesting to know how many new R users don't know that 
 there is a manual page for [.
 
 I hope you can understand my point of view (you may disagree, though.)
 
 Regards,
   Manuel.
 
 
 John Fox wrote:
  Dear Manuel,
  
  Although ?sample doesn't specifically describe permuting 
 the rows of a 
  matrix, it does say that sample(x) generates a random 
 permutation of 
  the elements of x (or 1:x). Indexing the rows of the matrix by a 
  permutation of 1:x (where x is the number of rows) doesn't 
 seem to be much of a leap.
  
  Regards,
   John
  
  
  John Fox
  Department of Sociology
  McMaster University
  Hamilton, Ontario
  Canada L8S 4M4
  905-525-9140x23604
  http://socserv.mcmaster.ca/jfox
  
  
  
 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Manuel 
 López-Ibáñez
 Sent: Saturday, April 15, 2006 9:44 AM
 To: r-help@stat.math.ethz.ch
 Subject: Re: [R] permutation of rows of a matrix
 
 help(sample) does not say anything about randomly permuting 
 the rows 
 of a matrix M by using M[sample(m,m),]. Perhaps it could be 
 added as 
 an example of use.
 
 John Fox wrote:
 
 Dear Jose,
 
 M[sample(m, m),] will randomly permute the rows of M. [You 
 probably 
 could have figured this out via help.search(permutation), which 
 would have led you to sample().]
 
 Regards,
  John
 
 
 John Fox
 Department of Sociology
 McMaster University
 Hamilton, Ontario
 Canada L8S 4M4
 905-525-9140x23604
 http://socserv.mcmaster.ca/jfox
 
 
 
 
 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of javargas
 Sent: Saturday, April 15, 2006 7:53 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] permutation of rows of a matrix
 
 How can I generate a random permutation between rows of a
 
 matrix M (of
 
 m rows and n columns)?
 
 Thanks for your help,
 
 Jose
 
 __
 R-help@stat.math.ethz.ch mailing list 
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html
 
 
 __
 R-help@stat.math.ethz.ch mailing list 
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html
 
 
 
 __
 LLama Gratis a cualquier PC del Mundo. 
 Llamadas a fijos y móviles desde 1 céntimo por minuto. 
 http://es.voice.yahoo.com
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html
  
  
  
 
   
   
   
 __
 LLama Gratis a cualquier PC del Mundo. 
 Llamadas a fijos y móviles desde 1 céntimo por minuto. 
 http://es.voice.yahoo.com
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE 

Re: [R] string vector indices

2006-04-16 Thread Marc Schwartz
On Sun, 2006-04-16 at 11:48 +0200, Philipp Pagel wrote:
 On Sun, Apr 16, 2006 at 12:26:29AM -0400, Luke wrote:
  
  x - a, aab
  y - a, aa, aab, aabc.
  
  Is there any R function to get the indices of y for the elements of x, that
  is, foo(x, y) will give me the index vector c(1, 3)?
 
 match(x, y)
 
 cu
   Philipp


Just a quick note here that there is one limitation in using match(),
which is that it will only return the index of the first match.

So, if there was an element in 'x' that appeared more than once in 'y',
you would still only get c(1, 3):

x - c(a, aab)
y - c(a, aa, aab, aabc, a)

 match(x, y)
[1] 1 3

Note that the second occurrence of a in 'y' is not picked up.

To solve this problem use %in% combined with which():

 which(y %in% x)
[1] 1 3 5

See ?which and ?%in% for more information. Note that the help for %in%
is on the same page as ?match. 

This approach will work in both situations.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] dcolumn

2006-04-16 Thread Frank E Harrell Jr
Brian Quinif wrote:
 Does anyone out there use dcolumn=TRUE in the latex() function in the
 Hmisc library?
 
 I would like to line up the data in a latex table I'm making using
 latex(), but I'm having some issues with this feature.  Since there is
 no description of it in the help, I thought that it might be
 incomplete or something like that.

?latex points you to ?format.df for dcolumn

 
 On a related note, if anyone knows how to get the end result I want
 (LaTeX table with data aligned by decimal point), I'd love to hear it.
  I have been having issues with this which I think might be related to
 the fact that I have std. errors enclosed in parentheses.

For such customized non-numeric formats you might want to just include 
latex markup inside the character strings.

FH

 
 Thanks,
 
 BQ

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] matching identical row names

2006-04-16 Thread John Kane
Dear Dr. Fox
Your reply to Sirinivas Iyyar was most helpful to me. I am trying to collapse 
some categories of a data.frame in a similar way.  
I have a data frame in the form below 

Prog Sub.Program   Job  V1  V2  V3
  1 Alpha   A 1   2   3
  2 Alpha   B 2   3   1
  2 GammaB 1   3   3
  2 Alpha   A 3   4   1
  2 GammaB 2   2   3
  1 Alpha   A 2  22
 
What I want is to sum the values of VI, V2 and V3 and end up with a new 
data.frame that would look like
 
   ProgSubprog Job   Sum(V1)  Sum(V2), Sum(V3)
1  AlphaA34  5
2  AlphaA34  1
2  Gamma B 35   6

I thought that I could use by() to create a vector for each of V1:V3 but I 
cannot see any way to capture the values.
temp1 - by(Data[,4] simply gives me the complete output.

An example of what I have done is
-

Prog  - 1, 2, 2, 2,2,1,
Sub.Program -  c(Alpha, Alpha, Gamma, Alpha, Gamma, Alpha )
Job - c(A, B, B, A, B,A)
V1 - c(1,2, 1,3,2,2)
V2 -  c(2, 3, 3, 4, 2, 2)   
V3 -  c(3, 1 , 3, 1, 3,2 
Mydata - data.frame(cbind( Prog, Sub.Program, Job, V1, V2, V3)   

by(MyData[,4],list(Sub.Program=Sub.Program, Job=Job), sum)

 
I also get the expected NA. for cells that do not exist. Is there any way to 
set them to 0 in the operation?  

 

Any help would be greatly appreciated. 
Thanks 
John

- Original Message 
From: John Fox [EMAIL PROTECTED]
To: Srinivas Iyyer [EMAIL PROTECTED]
Cc: r-help@stat.math.ethz.ch
Sent: Saturday, April 15, 2006 9:35:46 AM
Subject: Re: [R] matching identical row names

Dear Srinivas,

Your data are likely in a data frame rather than a matrix (since the columns
are heterogeneous), and name is a variable, not the row names of the data
frame.

There are several ways to do what you want; one simple way, assuming that
the data are in a data frame named Data, is

 by(Data[,2:5], Data$name, mean)

If you want the result in the form of a matrix, then you could do

 aggregate(Data[,2:5], list(Data$name), mean)

I hope this helps,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Srinivas Iyyer
 Sent: Friday, April 14, 2006 11:58 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] matching identical row names
 
 dear group, 
  
 i have a sample matrix
 name   v1   v2   v3   v4
 cat   1011   12   15
 dog   3 12   10   14
 cat   9 12   12   15
 cat   5 12   10   11
 dog   12113  123  31
 ...
 
 
 since cat is repeated 3 times, I want a mean value for it. 
 Like wise for every element of the name column. 
 cat v1 = mean(c(10,9,5))
 cat v3 = mean(c(11,12,13))
 ..etc.
 
 name v1   v2 v3   v4
 cat  8   11.6   11.3  13.6
 dog  7.5 62.5   66.5  22.5
 
 could any one help me in solving this mystery. thank you.
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] matching identical row names

2006-04-16 Thread John Fox
Dear John,

You can use aggregate(), also described in my suggestion to Sirinivas:

 aggregate(Data[, 4:6], Data[1:3], sum)
  Prog Sub.Program Job V1 V2 V3
11   Alpha   A  3  4  5
22   Alpha   A  3  4  1
32   Alpha   B  2  3  1
42   Gamma   B  3  5  6

I hope this helps,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of John Kane
 Sent: Sunday, April 16, 2006 10:29 AM
 To: John Fox; Srinivas Iyyer
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] matching identical row names
 
 Dear Dr. Fox
 Your reply to Sirinivas Iyyar was most helpful to me. I am 
 trying to collapse some categories of a data.frame in a similar way.  
 I have a data frame in the form below 
 
 Prog Sub.Program   Job  V1  V2  V3
   1 Alpha   A 1   2   3
   2 Alpha   B 2   3   1
   2 GammaB 1   3   3
   2 Alpha   A 3   4   1
   2 GammaB 2   2   3
   1 Alpha   A 2  22
  
 What I want is to sum the values of VI, V2 and V3 and end up 
 with a new data.frame that would look like
  
ProgSubprog Job   Sum(V1)  Sum(V2), Sum(V3)
 1  AlphaA34  5
 2  AlphaA34  1
 2  Gamma B 35   6
 
 I thought that I could use by() to create a vector for each 
 of V1:V3 but I cannot see any way to capture the values.
 temp1 - by(Data[,4] simply gives me the complete output.
 
 An example of what I have done is
 -
 
 Prog  - 1, 2, 2, 2,2,1,
 Sub.Program -  c(Alpha, Alpha, Gamma, Alpha, 
 Gamma, Alpha )
 Job - c(A, B, B, A, B,A)
 V1 - c(1,2, 1,3,2,2)
 V2 -  c(2, 3, 3, 4, 2, 2)   
 V3 -  c(3, 1 , 3, 1, 3,2 
 Mydata - data.frame(cbind( Prog, Sub.Program, Job, V1, V2, V3)   
 
 by(MyData[,4],list(Sub.Program=Sub.Program, Job=Job), sum)
 
  
 I also get the expected NA. for cells that do not exist. Is 
 there any way to set them to 0 in the operation?  
 
  
 
 Any help would be greatly appreciated. 
 Thanks
 John
 
 - Original Message 
 From: John Fox [EMAIL PROTECTED]
 To: Srinivas Iyyer [EMAIL PROTECTED]
 Cc: r-help@stat.math.ethz.ch
 Sent: Saturday, April 15, 2006 9:35:46 AM
 Subject: Re: [R] matching identical row names
 
 Dear Srinivas,
 
 Your data are likely in a data frame rather than a matrix 
 (since the columns are heterogeneous), and name is a 
 variable, not the row names of the data frame.
 
 There are several ways to do what you want; one simple way, 
 assuming that the data are in a data frame named Data, is
 
  by(Data[,2:5], Data$name, mean)
 
 If you want the result in the form of a matrix, then you could do
 
  aggregate(Data[,2:5], list(Data$name), mean)
 
 I hope this helps,
  John
 
 
 John Fox
 Department of Sociology
 McMaster University
 Hamilton, Ontario
 Canada L8S 4M4
 905-525-9140x23604
 http://socserv.mcmaster.ca/jfox
  
 
  -Original Message-
  From: [EMAIL PROTECTED] 
  [mailto:[EMAIL PROTECTED] On Behalf Of 
 Srinivas Iyyer
  Sent: Friday, April 14, 2006 11:58 PM
  To: r-help@stat.math.ethz.ch
  Subject: [R] matching identical row names
  
  dear group,
   
  i have a sample matrix
  name   v1   v2   v3   v4
  cat   1011   12   15
  dog   3 12   10   14
  cat   9 12   12   15
  cat   5 12   10   11
  dog   12113  123  31
  ...
  
  
  since cat is repeated 3 times, I want a mean value for it. 
  Like wise for every element of the name column. 
  cat v1 = mean(c(10,9,5))
  cat v3 = mean(c(11,12,13))
  ..etc.
  
  name v1   v2 v3   v4
  cat  8   11.6   11.3  13.6
  dog  7.5 62.5   66.5  22.5
  
  could any one help me in solving this mystery. thank you.
  
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide! 
  http://www.R-project.org/posting-guide.html
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list

Re: [R] matching identical row names

2006-04-16 Thread John Kane
This was immensely helpful ! 

I had  tried aggregate() , messed it up and  decided that I must have 
misunderstood its use vs by() and so posted my question to you. Instead I must 
have had a typo in there somewhere .  I went back, did it again and it is 
lovely. I was going to have to 'slice and dice' the dataset with  a myriad of 
subset calls otherwise.  

Thank you very much.  
I hope the weather is as nice in Hamilton as it is in Kingston. And now I can 
go get some sun!

- Original Message 
From: John Fox [EMAIL PROTECTED]
To: John Kane [EMAIL PROTECTED]
Cc: r-help@stat.math.ethz.ch; Srinivas Iyyer [EMAIL PROTECTED]
Sent: Sunday, April 16, 2006 11:51:51 AM
Subject: RE: [R] matching identical row names

Dear John,

You can use aggregate(), also described in my suggestion to Sirinivas:

 aggregate(Data[, 4:6], Data[1:3], sum)
  Prog Sub.Program Job V1 V2 V3
11   Alpha   A  3  4  5
22   Alpha   A  3  4  1
32   Alpha   B  2  3  1
42   Gamma   B  3  5  6

I hope this helps,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of John Kane
 Sent: Sunday, April 16, 2006 10:29 AM
 To: John Fox; Srinivas Iyyer
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] matching identical row names
 
 Dear Dr. Fox
 Your reply to Sirinivas Iyyar was most helpful to me. I am 
 trying to collapse some categories of a data.frame in a similar way.  
 I have a data frame in the form below 
 
 Prog Sub.Program   Job  V1  V2  V3
   1 Alpha   A 1   2   3
   2 Alpha   B 2   3   1
   2 GammaB 1   3   3
   2 Alpha   A 3   4   1
   2 GammaB 2   2   3
   1 Alpha   A 2  22
  
 What I want is to sum the values of VI, V2 and V3 and end up 
 with a new data.frame that would look like
  
ProgSubprog Job   Sum(V1)  Sum(V2), Sum(V3)
 1  AlphaA34  5
 2  AlphaA34  1
 2  Gamma B 35   6
 
 I thought that I could use by() to create a vector for each 
 of V1:V3 but I cannot see any way to capture the values.
 temp1 - by(Data[,4] simply gives me the complete output.
 
 An example of what I have done is
 -
 
 Prog  - 1, 2, 2, 2,2,1,
 Sub.Program -  c(Alpha, Alpha, Gamma, Alpha, 
 Gamma, Alpha )
 Job - c(A, B, B, A, B,A)
 V1 - c(1,2, 1,3,2,2)
 V2 -  c(2, 3, 3, 4, 2, 2)   
 V3 -  c(3, 1 , 3, 1, 3,2 
 Mydata - data.frame(cbind( Prog, Sub.Program, Job, V1, V2, V3)   
 
 by(MyData[,4],list(Sub.Program=Sub.Program, Job=Job), sum)
 
  
 I also get the expected NA. for cells that do not exist. Is 
 there any way to set them to 0 in the operation?  
 
  
 
 Any help would be greatly appreciated. 
 Thanks
 John
 
 - Original Message 
 From: John Fox [EMAIL PROTECTED]
 To: Srinivas Iyyer [EMAIL PROTECTED]
 Cc: r-help@stat.math.ethz.ch
 Sent: Saturday, April 15, 2006 9:35:46 AM
 Subject: Re: [R] matching identical row names
 
 Dear Srinivas,
 
 Your data are likely in a data frame rather than a matrix 
 (since the columns are heterogeneous), and name is a 
 variable, not the row names of the data frame.
 
 There are several ways to do what you want; one simple way, 
 assuming that the data are in a data frame named Data, is
 
  by(Data[,2:5], Data$name, mean)
 
 If you want the result in the form of a matrix, then you could do
 
  aggregate(Data[,2:5], list(Data$name), mean)
 
 I hope this helps,
  John
 
 
 John Fox
 Department of Sociology
 McMaster University
 Hamilton, Ontario
 Canada L8S 4M4
 905-525-9140x23604
 http://socserv.mcmaster.ca/jfox
  
 
  -Original Message-
  From: [EMAIL PROTECTED] 
  [mailto:[EMAIL PROTECTED] On Behalf Of 
 Srinivas Iyyer
  Sent: Friday, April 14, 2006 11:58 PM
  To: r-help@stat.math.ethz.ch
  Subject: [R] matching identical row names
  
  dear group,
   
  i have a sample matrix
  name   v1   v2   v3   v4
  cat   1011   12   15
  dog   3 12   10   14
  cat   9 12   12   15
  cat   5 12   10   11
  dog   12113  123  31
  ...
  
  
  since cat is repeated 3 times, I want a mean value for it. 
  Like wise for every element of the name column. 
  cat v1 = mean(c(10,9,5))
  cat v3 = mean(c(11,12,13))
  ..etc.
  
  name v1   v2 v3   v4
  cat  8   11.6   11.3  13.6
  dog  7.5 62.5   66.5  22.5
  
  could any one help me in solving this mystery. thank you.
  
  

[R] Predict nls new data with se.fit snf intervals

2006-04-16 Thread Jiawei Yan
Using R  to predict.nls() using new data,  `se.fit' and  `interval' are
ignored. . Is there any update for this? Anybody has those routine or may
advise me how to do that? Thanks

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] [S] Problems with lme and 2 levels of nesting:Summary

2006-04-16 Thread Douglas Bates
I have taken the liberty of including the R-help mailing list on this
reply as that is the appropriate place to discuss lmer results.

On 4/5/06, Andreas Svensson [EMAIL PROTECTED] wrote:
 Hello again
 I have now recieved some helpful hints in this matter and will summarize them 
 but first let me reiterate the problem:

 I had two treatments: 2 types of food fed to females. Each female were 
 present only in one treatment and laid one clutch (female ID = clutch ID). 
 From each clutch, 50 larvae were taken and divided on 5 cups; 10 larvae in 
 each cup (the cups were not meant to differ in any way, it's just impossible 
 to count 50 larvae in motion, but ok with 10). The day of death was recorded 
 for each larva. Since there's only one data point per larva, I see no need of 
 including larva ID in the model. So:

 Response: DeathDay
 Fixed factor: Treatment
 Random factors: Clucth, Cup
 Design: Cup nested in Clutch

 I'm primarlity interested in the effect of Treatment on DeathDay, but would 
 also like to know about the variation within Clutch and Cup. The unit of 
 replication is Clutch. I want to use a model where the variation within 
 Clutch and within Cup is taken into account, while avoiding pseudoreplication 
 of 5
 cups (and 50 larvae) coming from the same clutch. The solution is therefore a 
 mixed model with
 Cup nested in Clutch as random factors.
 A year ago, I was recommended this script from S-help at Insightful:

 model1-lme(DeathDay~Treatment,random=~Treatment|Clutch/Cup)

 I think this is wrong since Treatment shouldn't be among the random factors. 
 From the various tips I've recieved the correct syntax using lme (in libarary 
 nlme) should be:

 model2-lme(DeathDay~ Treatment,random=~1|Clutch/Cup)

 and using lmer (in library lme4):

 model3- lmer(DeathDay ~ Treatment + (1|Clutch:Cup) + (1|Clutch),  
 method=REML)

 I get the result that there is NO effect of Treatment. So far so good.

If there is no significant effect for Treatment then I would go on to
fit a model without the Treatment effect and use that model to examine
the significane of the random effects.

 But the above models will not give me any information of the importance of 
 the random factors.  If I do the same in SPSS:

 glm
 DeathDay by Cup Clutch Treatment
 /method=sstype(1)
 /random Cup Clutch
 /design Cup(Clutch).

 ...I get an ANOVA table that I (and referees) can understand. It tells me 
 about where the variation originates from; in one experiment only Clutch is 
 significant, in another there is also a (surprising but explainable) effect 
 of Cup.

An analysis of variance table may be appropriate for balanced data
with nested factors such as you have but it does not generalize well
to other situations in which these models are applied. The preferred
approach to model-building in R/S-PLUS is to build the model
sequentially.  I would use


fm1 - lmer(DeathDay ~ Treatment + (1|Clutch:Cup) + (1|Clutch))
fm2 - lmer(DeathDay ~ 1 + (1|Clutch:Cup) + (1|Clutch)) # equivalent
to your model7
fm3 - lmer(DeathDay ~ 1 + (1|Clutch))

and then compare fm2 and fm3, say with anova(fm2, fm3).  As described
below the p-value from the likelihood ratio test in this case  is not
exact because the hypothesis you are testing is on the boundary of the
parameter space.  However, the value returned will be a conservatve
p-value so you suffer a loss of power rather than compromising the
level of the test.

If you choose you could look at the distribution of the parameters
from a Bayesian perspective by using the mcmcsamp function to generate
a Markov Chain Monte Carlo sample from the parameters in fm2 and check
if you have a precise or an imprecise estimate of the variance of the
random effects for the Clutch:Cup grouping factor.  If that parameter
could reasonably be zero then you could repeat the sampling for fm3
and look at the distribution of the variance of the random effects
associated with the Clutch grouping factor.

In R/S-plus I can achieve something similar with model
simplification. REML will not allow me to remove the fixed factor, so
I have to use regular ML for this:

I think I would prefer to characterize the situation by saying that
the log-likelihood from the REML criterion cannot be used to compare
models when you change the fixed-effects specificatio.  However, this
still doesn't explain to me why you want to retain the Treatment
factor when you have judged it to be inert.

 model4- lme(DeathDay ~ Treatment, random=~ 1 | Clutch/Cup,method=ML)  
 #Full model
 model5- lme(DeathDay ~ Treatment, random=~ 1 | Clutch,method=ML)  # 
 model minus Cup
 model6- lme(DeathDay ~ Treatment, random=~ 1 | Cup,method=ML)# 
 model minus Clutch
 model7- lme(DeathDay ~ 1, random=~ 1 | Clutch/Cup,method=ML) # 
 model minus Treatment

 ...and make anova(model4, modelxxx)etc to test for effects of Cup, Clutch and 
 Treatment. But is this really correct? In model5 I remove Cup, doesn't this 
 wrongly boost the df for 

Re: [R] generalized hypergeometric function

2006-04-16 Thread Ben Bolker
Marc Schwartz MSchwartz at mn.rr.com writes:

 
 On Sat, 2006-04-15 at 20:59 -0300, Bernardo Rangel tura wrote:
  Hi R-masters
  
  I need compute generalized hypergeometric function.
  I look in R-project and R-help list and not find nothing about 
  generalized hypergeometric function
  
  Is possible calculate the generalized hypergeometric function?
  
  Somebody have a script for this?
  
  Note that he is looking for the h'geom FUNCTION, not DISTRIBUTION
(e.g. http://mathworld.wolfram.com/GeneralizedHypergeometricFunction.html);
Robin Hankin wrote some code (hypergeo in the Davies package on CRAN)
to compute a particular Gaussian h'geom function, and was asking
at one point on the mailing list whether anyone was interested
in other code; I don't know whether it will be generalized
enough for you.

   Ben Bolker

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Var.calc in Match()

2006-04-16 Thread Brian Quinif
Does anyone else find that using the Var.calc option (for
heteroscedasticity consistent std. errors) in Match() (from the
Matching library) slows down computation of the matching estimator by
a lot?

I don't really understand why when I use this option it slows down so
much, but for me it does significantly. I want to use the
heteroscedasticity consistent std. errors in my project, but as long
as it takes to compute them, I don't know if I will be able to.

Thanks,

BQ

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] logistic regression model with non-integer weights

2006-04-16 Thread Ramón Casero Cañas
Michael Dewey wrote:
 At 17:12 09/04/06, Ramón Casero Cañas wrote:
 
 I am not sure what the problem you really want to solve is but it seems
 that
 a) abnormality is rare
 b) the logistic regression predicts it to be rare.
 If you want a prediction system why not try different cut-offs (other
 than 0.5 on the probability scale) and perhaps plot sensitivity and
 specificity to help to choose a cut-off?

Thanks for your suggestions, Michael. It took me some time to figure out
how to do this in R (as trivial as it may be for others). Some comments
about what I've done follow, in case anyone is interested.

The problem is a) abnormality is rare (Prevalence=14%) and b) there is
not much difference in the independent variable between abnormal and
normal. So the logistic regression model predicts that P(abnormal) =
0.4. I got confused with this, as I expected a cut-off point of P=0.5 to
decide between normal/abnormal. But you are right, in that another
cut-off point can be chosen.

For a cut-off of e.g. P(abnormal)=0.15, Sensitivity=65% and
Specificity=52%. They are pretty bad, although for clinical purposes I
would say that Positive/Negative Predictive Values are more interesting.
But then PPV=19% and NPV=90%, which isn't great. As an overall test of
how good the model is for classification I have computed the area under
the ROC, from your suggestion of using Sensitivity and Specificity.

I couldn't find how to do this directly with R, so I implemented it
myself (it's not difficult but I'm new here). I tried with package ROCR,
but apparently it doesn't cover binary outcomes.

The area under the ROC is 0.64, so I would say that even though the
model seems to fit the data, it just doesn't allow acceptable
discrimination, not matter what the cut-off point.


I have also studied the effect of low prevalence. For this, I used
option ran.gen in the boot function (package boot) to define a function
that resamples the data so that it balances abnormal and normal cases.

A logistic regression model is fitted to each replicate, to a parametric
bootstrap, and thus compute the bias of the estimates of the model
coefficients, beta0 and beta1. This shows very small bias for beta1, but
a rather large bias for beta0.

So I would say that prevalence has an effect on beta0, but not beta1.
This is good, because a common measure like the odds ratio depends only
on beta1.

Cheers,

-- 
Ramón Casero Cañas

http://www.robots.ox.ac.uk/~rcasero/wiki
http://www.robots.ox.ac.uk/~rcasero/blog

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] generalized hypergeometric function

2006-04-16 Thread Marc Schwartz
On Sun, 2006-04-16 at 17:54 +, Ben Bolker wrote:
 Marc Schwartz MSchwartz at mn.rr.com writes:
 
  
  On Sat, 2006-04-15 at 20:59 -0300, Bernardo Rangel tura wrote:
   Hi R-masters
   
   I need compute generalized hypergeometric function.
   I look in R-project and R-help list and not find nothing about 
   generalized hypergeometric function
   
   Is possible calculate the generalized hypergeometric function?
   
   Somebody have a script for this?
   
   Note that he is looking for the h'geom FUNCTION, not DISTRIBUTION
 (e.g. http://mathworld.wolfram.com/GeneralizedHypergeometricFunction.html);
 Robin Hankin wrote some code (hypergeo in the Davies package on CRAN)
 to compute a particular Gaussian h'geom function, and was asking
 at one point on the mailing list whether anyone was interested
 in other code; I don't know whether it will be generalized
 enough for you.
 
Ben Bolker


Thanks Ben. I stand corrected on that point. Didn't click in my initial
reading.

Regards,

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] logistic regression model with non-integer weights

2006-04-16 Thread Rick Bilonick
On Sun, 2006-04-16 at 19:10 +0100, Ramón Casero Cañas wrote:

 Thanks for your suggestions, Michael. It took me some time to figure out
 how to do this in R (as trivial as it may be for others). Some comments
 about what I've done follow, in case anyone is interested.
 
 The problem is a) abnormality is rare (Prevalence=14%) and b) there is
 not much difference in the independent variable between abnormal and
 normal. So the logistic regression model predicts that P(abnormal) =
 0.4. I got confused with this, as I expected a cut-off point of P=0.5 to
 decide between normal/abnormal. But you are right, in that another
 cut-off point can be chosen.
 
 For a cut-off of e.g. P(abnormal)=0.15, Sensitivity=65% and
 Specificity=52%. They are pretty bad, although for clinical purposes I
 would say that Positive/Negative Predictive Values are more interesting.
 But then PPV=19% and NPV=90%, which isn't great. As an overall test of
 how good the model is for classification I have computed the area under
 the ROC, from your suggestion of using Sensitivity and Specificity.
 
 I couldn't find how to do this directly with R, so I implemented it
 myself (it's not difficult but I'm new here). I tried with package ROCR,
 but apparently it doesn't cover binary outcomes.
 
 The area under the ROC is 0.64, so I would say that even though the
 model seems to fit the data, it just doesn't allow acceptable
 discrimination, not matter what the cut-off point.
 
 
 I have also studied the effect of low prevalence. For this, I used
 option ran.gen in the boot function (package boot) to define a function
 that resamples the data so that it balances abnormal and normal cases.
 
 A logistic regression model is fitted to each replicate, to a parametric
 bootstrap, and thus compute the bias of the estimates of the model
 coefficients, beta0 and beta1. This shows very small bias for beta1, but
 a rather large bias for beta0.
 
 So I would say that prevalence has an effect on beta0, but not beta1.
 This is good, because a common measure like the odds ratio depends only
 on beta1.
 
 Cheers,
 
 -- 
 Ramón Casero Cañas
 
 http://www.robots.ox.ac.uk/~rcasero/wiki
 http://www.robots.ox.ac.uk/~rcasero/blog
 

The Epi package has function ROC that draws the ROC curve and computes
the AUC among other things.

Rick B.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] summary stats

2006-04-16 Thread Neil Hepburn
I have a data set that has student test scores along with several
categorical variables. I would like to generate a set of summary stats
(mean, variance, n) for the data grouped by school authority and by exam
topic.  I have tried the by() function but that seems to only be able to
handle one level of grouping.  In particular what I would like is
something like the following 

Board   Subject MeanVarianceN
board1  english 70  150 600
board2  english 66  210 510
board1  science 69  180 605
board2  science 71  220 520

and so on.


I have already generated the stats that I need using GROUP BY in a
select query in MySQL.  I'm just curious now about doing the same thing
in R

thanks in advance,
Neil

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] summary stats

2006-04-16 Thread Christos Hatzis
You could use aggregate:

agg.mean - aggregate(my.data, by=list(Board, Subject), FUN=mean)

The caveat is that you can only use one aggregator function at a time.  You
could rerun the same for FUN=var and FUN=length to get the additional
aggregate statustics that you need and then cbind the results:

cbind(agg.mean, agg.var, agg.n)

-Christos 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Neil Hepburn
Sent: Sunday, April 16, 2006 2:56 PM
To: r-help@stat.math.ethz.ch
Subject: [R] summary stats

I have a data set that has student test scores along with several
categorical variables. I would like to generate a set of summary stats
(mean, variance, n) for the data grouped by school authority and by exam
topic.  I have tried the by() function but that seems to only be able to
handle one level of grouping.  In particular what I would like is something
like the following 

Board   Subject MeanVarianceN
board1  english 70  150 600
board2  english 66  210 510
board1  science 69  180 605
board2  science 71  220 520

and so on.


I have already generated the stats that I need using GROUP BY in a select
query in MySQL.  I'm just curious now about doing the same thing in R

thanks in advance,
Neil

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] logistic regression model with non-integer weights

2006-04-16 Thread Frank E Harrell Jr
Ramón Casero Cañas wrote:
 Michael Dewey wrote:
 
At 17:12 09/04/06, Ramón Casero Cañas wrote:

I am not sure what the problem you really want to solve is but it seems
that
a) abnormality is rare
b) the logistic regression predicts it to be rare.
If you want a prediction system why not try different cut-offs (other
than 0.5 on the probability scale) and perhaps plot sensitivity and
specificity to help to choose a cut-off?
 
 
 Thanks for your suggestions, Michael. It took me some time to figure out
 how to do this in R (as trivial as it may be for others). Some comments
 about what I've done follow, in case anyone is interested.
 
 The problem is a) abnormality is rare (Prevalence=14%) and b) there is
 not much difference in the independent variable between abnormal and
 normal. So the logistic regression model predicts that P(abnormal) =
 0.4. I got confused with this, as I expected a cut-off point of P=0.5 to
 decide between normal/abnormal. But you are right, in that another
 cut-off point can be chosen.
 
 For a cut-off of e.g. P(abnormal)=0.15, Sensitivity=65% and
 Specificity=52%. They are pretty bad, although for clinical purposes I
 would say that Positive/Negative Predictive Values are more interesting.
 But then PPV=19% and NPV=90%, which isn't great. As an overall test of
 how good the model is for classification I have computed the area under
 the ROC, from your suggestion of using Sensitivity and Specificity.
 
 I couldn't find how to do this directly with R, so I implemented it
 myself (it's not difficult but I'm new here). I tried with package ROCR,
 but apparently it doesn't cover binary outcomes.
 
 The area under the ROC is 0.64, so I would say that even though the
 model seems to fit the data, it just doesn't allow acceptable
 discrimination, not matter what the cut-off point.
 
 
 I have also studied the effect of low prevalence. For this, I used
 option ran.gen in the boot function (package boot) to define a function
 that resamples the data so that it balances abnormal and normal cases.
 
 A logistic regression model is fitted to each replicate, to a parametric
 bootstrap, and thus compute the bias of the estimates of the model
 coefficients, beta0 and beta1. This shows very small bias for beta1, but
 a rather large bias for beta0.
 
 So I would say that prevalence has an effect on beta0, but not beta1.
 This is good, because a common measure like the odds ratio depends only
 on beta1.
 
 Cheers,
 

This makes me think you are trying to go against maximum likelihood to 
optimize an improper criterion.  Forcing a single cutpoint to be chosen 
seems to be at the heart of your problem.  There's nothing wrong with 
using probabilities and letting the utility possessor make the final 
decision.

-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] a question on df of linear model

2006-04-16 Thread Joe Moore
Dear R-users:

On page 155 of Mixed-effects Models in S and S-Plus, the degree of 
freedoms of the anova comparison of lme and lm are 8 and 5.

But when I use the following SAS code:
proc glm data=ortho2;
   class gender;
   model distance = age|gender / solution ;
run;

The df is 3.

Could you please explain this to me?

Thanks

Joe

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Reading SPSS .sav files

2006-04-16 Thread David Kaplan
Greetings,

How to I read in SPSS .sav files into R.  

Thank you.

David


=
David Kaplan, PhD  
Professor of Education   
School of Education   
University of Delaware  
Newark DE 19716

Voice: 302-831-8696
Fax:302-831-4110 
email:  mailto:[EMAIL PROTECTED] 
Homepage: http://www.udel.edu/dkaplan   
SOE page:  http://www.udel.edu/educ

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] summary stats

2006-04-16 Thread John Fox
Dear Neil,

Coincidentally, more or less the same question was asked on r-help yesterday
and today. You can use either the by() function or aggregate(), though
you'll have to do a bit of work on the result if you want it to look just
like your example.

I hope this helps,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Neil Hepburn
 Sent: Sunday, April 16, 2006 1:56 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] summary stats
 
 I have a data set that has student test scores along with 
 several categorical variables. I would like to generate a set 
 of summary stats (mean, variance, n) for the data grouped by 
 school authority and by exam topic.  I have tried the by() 
 function but that seems to only be able to handle one level 
 of grouping.  In particular what I would like is something 
 like the following 
 
 Board Subject MeanVarianceN
 board1english 70  150 600
 board2english 66  210 510
 board1science 69  180 605
 board2science 71  220 520
 
 and so on.
 
 
 I have already generated the stats that I need using GROUP 
 BY in a select query in MySQL.  I'm just curious now about 
 doing the same thing in R
 
 thanks in advance,
 Neil
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Reading SPSS .sav files

2006-04-16 Thread John Fox
Dear David,

There is the read.spss() function in the foreign package, and spss.get() in
Hmisc. The former is a part of the standard R distribution, so you could
have discovered it via, e.g., help.search(SPSS)

I hope this helps,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of David Kaplan
 Sent: Sunday, April 16, 2006 3:35 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Reading SPSS .sav files
 
 Greetings,
 
 How to I read in SPSS .sav files into R.  
 
 Thank you.
 
 David
 
 
 =
 David Kaplan, PhD  
 Professor of Education   
 School of Education   
 University of Delaware  
 Newark DE 19716
 
 Voice: 302-831-8696
 Fax:302-831-4110 
 email:  mailto:[EMAIL PROTECTED] 
 Homepage: http://www.udel.edu/dkaplan   
 SOE page:  http://www.udel.edu/educ
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Reading SPSS .sav files

2006-04-16 Thread Peter Dalgaard
David Kaplan [EMAIL PROTECTED] writes:

 Greetings,
 
 How to I read in SPSS .sav files into R.  

with read.spss() (package foreign)
 
 Thank you.
 
 David
...
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Did you? Following the pointers in there should be enough to answer a
question like this.

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Reading SPSS .sav files

2006-04-16 Thread Austin, Matt
I don't use SPSS, so I can't help with the details.  That being said, have
you looked at the foreign package?

--Matt

Matt Austin
Statistician
Amgen, Inc
800 9AMGEN9 x77431
805-447-7431




-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of David Kaplan
Sent: Sunday, April 16, 2006 1:35 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Reading SPSS .sav files


Greetings,

How to I read in SPSS .sav files into R.  

Thank you.

David


=
David Kaplan, PhD  
Professor of Education   
School of Education   
University of Delaware  
Newark DE 19716

Voice: 302-831-8696
Fax:302-831-4110 
email:  mailto:[EMAIL PROTECTED] 
Homepage: http://www.udel.edu/dkaplan   
SOE page:  http://www.udel.edu/educ

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] logistic regression model with non-integer weights

2006-04-16 Thread Ramón Casero Cañas
Frank E Harrell Jr wrote:
 
 This makes me think you are trying to go against maximum likelihood to
 optimize an improper criterion.  Forcing a single cutpoint to be chosen
 seems to be at the heart of your problem.  There's nothing wrong with
 using probabilities and letting the utility possessor make the final
 decision.

I agree, and in fact I was thinking along those lines, but I also needed
a way of evaluating how good is the model to discriminate between
abnormal and normal cases, as opposed to e.g. GOF. The only way I know
of is using area under ROC (thus setting cut-off points), which also
followed neatly from Michael Dewey comments. Any alternatives would be
welcome :)

-- 
Ramón Casero Cañas

http://www.robots.ox.ac.uk/~rcasero/wiki
http://www.robots.ox.ac.uk/~rcasero/blog

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] logistic regression model with non-integer weights

2006-04-16 Thread Frank E Harrell Jr
Ramón Casero Cañas wrote:
 Frank E Harrell Jr wrote:
 
This makes me think you are trying to go against maximum likelihood to
optimize an improper criterion.  Forcing a single cutpoint to be chosen
seems to be at the heart of your problem.  There's nothing wrong with
using probabilities and letting the utility possessor make the final
decision.
 
 
 I agree, and in fact I was thinking along those lines, but I also needed
 a way of evaluating how good is the model to discriminate between
 abnormal and normal cases, as opposed to e.g. GOF. The only way I know
 of is using area under ROC (thus setting cut-off points), which also
 followed neatly from Michael Dewey comments. Any alternatives would be
 welcome :)
 

To get the ROC area you don't need to do any of that, and as you 
indicated, it is a good discrimination measure.  The lrm function in the 
Design package gives it to you automatically (C index), and you can also 
get it with the Hmisc package's somers2 and rcorr.cens functions.  ROC 
area is highly related to the Wilcoxon 2-sample test statistic for 
comparing cases and non-cases.

-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Problem getting R's decision tree for Quinlan's golf example data

2006-04-16 Thread Alan Lapedes
Newbie question, but I've checked archives etc. Am trying to reproduce
in R Quinlan's trivial example of the golf decision tree. The data file
of 14 examples follows (read in via read.table()):

Outlook Temperature Humidity Windy PlayDontPlay
1 sunny 85 85 false DontPlay
2 sunny 80 90 true DontPlay
3 overcast 83 78 false Play
4 rain 70 96 false Play
5 rain 68 80 false Play
6 rain 65 70 true DontPlay
7 overcast 64 65 true Play
8 sunny 72 95 false DontPlay
9 sunny 69 70 false Play
10 rain 75 80 false Play
11 sunny 75 70 true Play
12 overcast 72 90 true Play
13 overcast 81 75 false Play
14 rain 71 80 true DontPlay

R reports no format or other trivial problems: 
 summary(golf)
 Outlook   Temperature  Humidity  Windy PlayDontPlay
 overcast:4   Min.   :64.0   Min.   :65.0   false:8   DontPlay:5
 rain:5   1st Qu.:69.2   1st Qu.:71.2   true :6   Play:9
 sunny   :5   Median :72.0   Median :80.0   
  Mean   :73.6   Mean   :80.3   
  3rd Qu.:78.8   3rd Qu.:88.8   
  Max.   :85.0   Max.   :96.0 

I then try to build a decision tree:
 golf.rpart - rpart(PlayDontPlay ~ Outlook + Temperature + Humidity + Windy, 
 method=class, data=golf)

which doesn't yield a tree: 
 golf.rpart
n= 14 

node), split, n, loss, yval, (yprob)
  * denotes terminal node

1) root 14 5 Play (0.35714 0.64286) *

 plot(golf.rpart)
Error in plot.rpart(golf.rpart) : fit is not a tree, just a root

Thanks,
Alan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] survfit with newdata and individual=TRUE

2006-04-16 Thread Chris Adolph

I'm using the survfit() function with the Cox proportional hazards model, 
and having trouble producing fitted survivor curves when I ask survfit to 
use newdata recorded over multiple periods for the same individual.  I'm 
using the counting process formulation, as I have time-varying covariates.

For example,

cph.m1 - coxph(Surv(time=sdatenrm, time2=edatenrm, event=censor)~age)

xhyp - list(age=c(50,60),
  sdatenrm=c(1,20),
  edatenrm=c(21,40),
  censor=c(0,1)
  )

survfit(cph.m1,newdata=xhyp,conf.int=.95,individual=FALSE)

works fine, but then running

survfit(cph.m1,newdata=xhyp,conf.int=.95,individual=TRUE)

gives this error message:

Error in temp[, 1] : incorrect number of dimensions

Is there something wrong with the way I'm giving survfit() newdata?

Thanks,

Chris Adolph

_
Christopher Adolph
Assistant Professor of Political Science
Center for Statistics and the Social Sciences
Box 354320
University of Washington
Seattle, WA 98195-4320
[EMAIL PROTECTED]
www.chrisadolph.com

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] R-vim suite

2006-04-16 Thread Jose Quesada
Hi All,

If you use vim to edit R code, you may be interested in this.
I have put together a personalized syntax file, some code templates,
and a way to send code from Vim to R using autoHotKeys (windows).

http://www.andrew.cmu.edu/user/jquesada/RvimSuite/instructions.html

Actually, the little autoHotKeys can be useful even if you don't use
vim just to send the example R code from the help pages to the
console.


Best wishes,
-Jose
--
Jose Quesada, PhD.

[EMAIL PROTECTED]   Dept. of Psychology
http://www.andrew.cmu.edu/~jquesada Sussex University
Brighton, UK

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] suse 10.0

2006-04-16 Thread Ro
Ups...the links are:

http://fawn.hsu-hh.de/~steuer/SL-10.0-OSS/

http://mirrors.kernel.org/opensuse/distribution/SL-10.0-OSS/inst-source/

On 15/04/06, Detlef Steuer [EMAIL PROTECTED] wrote:

 Hi,

 you can add a repository for R found on my server:

 http://fawn.hsu-hh.de
 ~steuer/SL-10.0-OSS

 There I provide rpms for R and ESS.

 You will need e.g.
 http://mirrors.kernel.org
 opensuse/distribution/SL-10.0-OSS/inst-source

 to get a fortran compiler etc.

 Hope that helps.

 Detlef


 On Fri, 14 Apr 2006 06:21:26 -0700
 louis homer [EMAIL PROTECTED] wrote:

  I downloaded the free CD's and installed Suse 10.0. When I tried to
 install
  the latest RPM for R, the Yast installer complained that two files were
  missing. I was told that I can install from the internet, and started
 the
  process, but found I must supply the name of the server and the name of
 the
  directory. I tried several combinations unsuccessfully. Can anyone help
 me?
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide!
 http://www.R-project.org/posting-guide.html


 --
 Keinen Gedanken zweimal denken, außer er ist schön. Unbekannte Quelle

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide!
 http://www.R-project.org/posting-guide.html

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Problem getting R's decision tree for Quinlan's golf exam ple data [Broadcast]

2006-04-16 Thread Liaw, Andy
See ?rpart.control.  I get:
 
 golf.rp = rpart(Outlook ~ ., golf, control=rpart.control(minsplit=1))
 golf.rp
n= 14 
node), split, n, loss, yval, (yprob)
  * denotes terminal node
 1) root 14 9 rain (0.2857143 0.3571429 0.3571429)  
   2) Temperature 71.5 6 2 rain (0.167 0.667 0.167)  
 4) Temperature 64.5 1 0 overcast (1.000 0.000 0.000) *
 5) Temperature=64.5 5 1 rain (0.000 0.800 0.200)  
  10) Humidity=75 3 0 rain (0.000 1.000 0.000) *
  11) Humidity 75 2 1 rain (0.000 0.500 0.500)  
22) Temperature 67 1 0 rain (0.000 1.000 0.000) *
23) Temperature=67 1 0 sunny (0.000 0.000 1.000) *
   3) Temperature=71.5 8 4 sunny (0.375 0.125 0.500)  
 6) PlayDontPlay=Play 5 2 overcast (0.600 0.200 0.200)  
  12) Humidity=72.5 4 1 overcast (0.750 0.250 0.000)  
24) Temperature=78 2 0 overcast (1.000 0.000 0.000) *
25) Temperature 78 2 1 overcast (0.500 0.500 0.000)  
  50) Temperature 73.5 1 0 overcast (1.000 0.000 0.000)
*
  51) Temperature=73.5 1 0 rain (0.000 1.000 0.000) *
  13) Humidity 72.5 1 0 sunny (0.000 0.000 1.000) *
 7) PlayDontPlay=DontPlay 3 0 sunny (0.000 0.000 1.000) *

Andy

  _  

From: [EMAIL PROTECTED] on behalf of Alan Lapedes
Sent: Sun 4/16/2006 5:14 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Problem getting R's decision tree for Quinlan's golf example
data [Broadcast]



Newbie question, but I've checked archives etc. Am trying to reproduce 
in R Quinlan's trivial example of the golf decision tree. The data file 
of 14 examples follows (read in via read.table()): 

Outlook Temperature Humidity Windy PlayDontPlay 
1 sunny 85 85 false DontPlay 
2 sunny 80 90 true DontPlay 
3 overcast 83 78 false Play 
4 rain 70 96 false Play 
5 rain 68 80 false Play 
6 rain 65 70 true DontPlay 
7 overcast 64 65 true Play 
8 sunny 72 95 false DontPlay 
9 sunny 69 70 false Play 
10 rain 75 80 false Play 
11 sunny 75 70 true Play 
12 overcast 72 90 true Play 
13 overcast 81 75 false Play 
14 rain 71 80 true DontPlay 

R reports no format or other trivial problems: 
 summary(golf) 
 Outlook   Temperature  Humidity  Windy PlayDontPlay 
 overcast:4   Min.   :64.0   Min.   :65.0   false:8   DontPlay:5
 rain:5   1st Qu.:69.2   1st Qu.:71.2   true :6   Play:9
 sunny   :5   Median :72.0   Median :80.0   
  Mean   :73.6   Mean   :80.3   
  3rd Qu.:78.8   3rd Qu.:88.8   
  Max.   :85.0   Max.   :96.0 

I then try to build a decision tree: 
 golf.rpart - rpart(PlayDontPlay ~ Outlook + Temperature + Humidity +
Windy, method=class, data=golf) 

which doesn't yield a tree: 
 golf.rpart 
n= 14 

node), split, n, loss, yval, (yprob) 
  * denotes terminal node 

1) root 14 5 Play (0.35714 0.64286) * 

 plot(golf.rpart) 
Error in plot.rpart(golf.rpart) : fit is not a tree, just a root 

Thanks, 
Alan 

__ 
R-help@stat.math.ethz.ch mailing list 
https://stat.ethz.ch/mailman/listinfo/r-help
https://stat.ethz.ch/mailman/listinfo/r-help  
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] autoscall the y-axis

2006-04-16 Thread Fred J.
Dear R users
 
 I need to  auto scale the left y axis in the code below, so that when
 I scroll left or right the left y-axis scale changes to accumulate the
 range of the displayed data with in the max hight of the y-axis. 
 
 also how can I make the crosshair horizontal since it is only
 vertical in this code. this code with a kind help from 
 Gregory (Greg) L. Snow Ph.D.
 
 just stated with tcl/tk and waiting for the book to arrive. 
 
 thank you
 
 # code starts 
 library(tkrplot)
 
 L - length(myData); 
 
 tt - tktoplevel()#cursor='crosshair')
 left - tclVar(1)
 oldleft - tclVar(1)
 right - tclVar(L)
 cury - tclVar(' ')
 curx - NA
 tmpusr - numeric(4)
 tmpplt - numeric(4)
 
 f1 - function(){
   lleft - as.numeric(tclvalue(left))
   rright - as.numeric(tclvalue(right))
   x - seq(lleft,rright,by=1)
   par(bg='black', fg='green', col='white', col.axis='white',
   col.lab='magenta', col.main='blue', col.sub='cyan')
   plot(x,myData[x], type='s', ylim=range(myData))
 
 ##   par(new=TRUE)
 ##   plot(x,myData2[x], type='s', col='yellow',axes=F)
   
   par(new=TRUE)
   plot(x,myData3[x], type='s', col='cyan',axes=F)
 
   axis(4)
   tmpusr - par('usr')
   tmpplt - par('plt')
   if(!is.na(curx)){
 abline(v=curx, col='red', lty=2)
 points(curx,myData[curx],pch=16,col='red')
   }
 }
 
 img - tkrplot(tt, f1,hscale=2,vscale=1.2)
 tkconfigure(img, cursor='crosshair')
 
 f2 - function(...){
 ol - as.numeric(tclvalue(oldleft))
 tclvalue(oldleft) - tclvalue(left)
 r - as.numeric(tclvalue(right))
 tclvalue(right) - as.character(r + as.numeric(...) - ol)
 tkrreplot(img)
 }
 
 f3 - function(...){
 tkrreplot(img)
 }
 
 f4 - function(...){
 ol - as.numeric(tclvalue(oldleft))
 tclvalue(left) - as.character(ol+10)
 tclvalue(oldleft) - as.character(ol+10)
 r - as.numeric(tclvalue(right))
 tclvalue(right) - as.character(r+10)
 tkrreplot(img)
 }
 
 iw - as.numeric(tcl('image','width', tkcget(img,'-image')))
 ih - as.numeric(tcl('image','height',tkcget(img,'-image')))
 
 mm - function(x,Y){
   tx - (as.numeric(x)-1)/iw
   ty - 1-(as.numeric(Y)-1)/ih
   
   if( tx  tmpplt[1]  tx  tmpplt[2] 
  ty  tmpplt[3]  ty  tmpplt[4] ){
 
 newx - 
(tx-tmpplt[1])/(tmpplt[2]-tmpplt[1])*(tmpusr[2]-tmpusr[1])+tmpusr[1]
 curx - round(newx)
 tkrreplot(img)
 
 newy - myData[curx]
 newy - floor(newy) + (newy-floor(newy))*32/100
 
 #newy2 - myData2[curx]
 newy3 - myData3[curx]
 
 tclvalue(cury) - paste('x =',curx,'  myData=',round(newy,2)
 ,'  myData3=',round(newy3,4))
   }
 } 
 
 tkbind(img, 'Motion', mm)
 
 l1 - tklabel(tt, textvariable=cury)
 
 s1 - tkscale(tt, command=f2, from=1, to=length(myData),
 variable=left, orient=horiz,label='left',length=700)
 s2  - tkscale(tt, command=f3, from=1, to=length(myData),
 variable=right, orient=horiz,label='right',length=700)
 b1 - tkbutton(tt, text='-', command=f4)
 
 tkpack(l1,img,s1,s2,b1)
 # code ends 

-

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] interaction terms in formula of lm or glm

2006-04-16 Thread Luke
I would like to include pairwise interaction terms for lm(). For example, I
want to include the quadratic term of variable V3.

 my.formula
y ~ 1 + V1 + V3 + V3:V3

 my.data
   y V1 V2  V3
1 31  1 42 140
2 32  0 43 120
3 33  0 57 150
4 34  0 55 132

 foo - lm(my.formula, data = my.data)

 foo$coefficients
(Intercept)  V1  V3
29.47368421 -2.15789474  0.02631579

Why do the foo coefficients not include V3:V3 ?

I thought that the variable V3:V3 has the values of squares of V3
elements, that is,
V3:V3
140*140
120*120

Am I wrong?

If I specify a fouth varaible V4 with square values of V3, and the formula
is y ~ 1 + V1 + V3 + V4, it seems that the foo will give me different in and
out-sample predictions.

-Luke

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] interaction terms in formula of lm or glm

2006-04-16 Thread Christos Hatzis
If you want the quadratic term, you need to pass it as an argument in
function I():

y ~ 1 + V1 + V3 + I(V3^2)

This is documented in ?formula

-Christos  

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Luke
Sent: Monday, April 17, 2006 12:08 AM
To: R-help@stat.math.ethz.ch
Subject: [R] interaction terms in formula of lm or glm

I would like to include pairwise interaction terms for lm(). For example, I
want to include the quadratic term of variable V3.

 my.formula
y ~ 1 + V1 + V3 + V3:V3

 my.data
   y V1 V2  V3
1 31  1 42 140
2 32  0 43 120
3 33  0 57 150
4 34  0 55 132

 foo - lm(my.formula, data = my.data)

 foo$coefficients
(Intercept)  V1  V3
29.47368421 -2.15789474  0.02631579

Why do the foo coefficients not include V3:V3 ?

I thought that the variable V3:V3 has the values of squares of V3
elements, that is,
V3:V3
140*140
120*120

Am I wrong?

If I specify a fouth varaible V4 with square values of V3, and the formula
is y ~ 1 + V1 + V3 + V4, it seems that the foo will give me different in and
out-sample predictions.

-Luke

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html