[R] How to do varimax rotation for principal component based factor analysis, any packages?
Dear R users the factanal pacakge is always MLE, which package can do varimax rotation with the results from princomp ? thank you yong __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] string vector indices
On Sun, Apr 16, 2006 at 12:26:29AM -0400, Luke wrote: x - a, aab y - a, aa, aab, aabc. Is there any R function to get the indices of y for the elements of x, that is, foo(x, y) will give me the index vector c(1, 3)? match(x, y) cu Philipp -- Dr. Philipp PagelTel. +49-8161-71 2131 Dept. of Genome Oriented Bioinformatics Fax. +49-8161-71 2186 Technical University of Munich Science Center Weihenstephan 85350 Freising, Germany and Institute for Bioinformatics / MIPS Tel. +49-89-3187 3675 GSF - National Research Center Fax. +49-89-3187 3585 for Environment and Health Ingolstädter Landstrasse 1 85764 Neuherberg, Germany http://mips.gsf.de/staff/pagel __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] How to do varimax rotation for principal component based factor analysis, any packages?
I don't think the result from _princomp _ should be rotated. The definition of principal components explicitly indicates that the result is to have axes that are uncorrelated and line up in directions of maximum variance. If I am wrong,I hope some one else to point it out. 2006/4/16, Yong Wang [EMAIL PROTECTED]: Dear R users the factanal pacakge is always MLE, which package can do varimax rotation with the results from princomp ? thank you yong __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- 黄荣贵 Deparment of Sociology Fudan University __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] permutation of rows of a matrix
Dear John, I understand what you mean. However, when someone is learning R for the first time or have little experience, such examples help to understand the connection of different parts of the language. Moreover, things that make sense once you know them, can be difficult to relate in the first place. For example, it would be interesting to know how many new R users don't know that there is a manual page for [. I hope you can understand my point of view (you may disagree, though.) Regards, Manuel. John Fox wrote: Dear Manuel, Although ?sample doesn't specifically describe permuting the rows of a matrix, it does say that sample(x) generates a random permutation of the elements of x (or 1:x). Indexing the rows of the matrix by a permutation of 1:x (where x is the number of rows) doesn't seem to be much of a leap. Regards, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Manuel López-Ibáñez Sent: Saturday, April 15, 2006 9:44 AM To: r-help@stat.math.ethz.ch Subject: Re: [R] permutation of rows of a matrix help(sample) does not say anything about randomly permuting the rows of a matrix M by using M[sample(m,m),]. Perhaps it could be added as an example of use. John Fox wrote: Dear Jose, M[sample(m, m),] will randomly permute the rows of M. [You probably could have figured this out via help.search(permutation), which would have led you to sample().] Regards, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of javargas Sent: Saturday, April 15, 2006 7:53 AM To: r-help@stat.math.ethz.ch Subject: [R] permutation of rows of a matrix How can I generate a random permutation between rows of a matrix M (of m rows and n columns)? Thanks for your help, Jose __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ LLama Gratis a cualquier PC del Mundo. Llamadas a fijos y móviles desde 1 céntimo por minuto. http://es.voice.yahoo.com __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ LLama Gratis a cualquier PC del Mundo. Llamadas a fijos y móviles desde 1 céntimo por minuto. http://es.voice.yahoo.com __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] solve matrix
Nongluck Klibbua [EMAIL PROTECTED] writes: Hi R-users, I try to use solve to find the answer of this linear equation a=x*b where a is 2*1 matrix and b is 2*2 matrix.I would like to get x so I call y-solve(a,b) but this one happen a is 2*1 matrix so what I should do . Thanks Luck First get your extents right. For the matrix product x*b to make sense, you need x to be n*2 if b is 2*2, and the result is n*2, so how can a be 2*1?? Next, notice that solve(A,b) solves A*x = b, where A is a square matrix, so you need solve(b,a) or maybe solve(b, t(a)) or t(solve(b,t(a)) depending on what your problem really was. -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] How to do varimax rotation for principal component based factor analysis, any packages?
On Sun, 16 Apr 2006, ronggui wrote: I don't think the result from _princomp _ should be rotated. The definition of principal components explicitly indicates that the result is to have axes that are uncorrelated and line up in directions of maximum variance. If I am wrong,I hope some one else to point it out. They can be, whether they should be is another matter. See the Statistical Complements to MASS3 at http://www.stats.ox.ac.uk/pub/MASS3/Compl.shtml for some examples and references. The original post is starting from false premises: there is no `factanal package', and the varimax function in stats is not confined to the results of factanal(). If for some reason you want to use the so-called `principal component factor analysis', you could make use of one of the several statistical systems with names starting with S that provide it, or write your own R code. 2006/4/16, Yong Wang [EMAIL PROTECTED]: Dear R users the factanal pacakge is always MLE, which package can do varimax rotation with the results from princomp ? thank you yong __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- »ÆÈÙ¹ó Deparment of Sociology Fudan University -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] permutation of rows of a matrix
Dear Manuel, I do understand your point of view, and think that it is reasonable, but in this case, I disagree. It wouldn't hurt to have an example in ?sample of using a permutation to index a matrix, but this is one of many uses of permutations and it is not possible to show or even to anticipate all of them. To use a programming environment like R effectively, it's necessary to acquire some basic facility with the language (such as an understanding of how indexing works), and it's much more efficient to acquire this facility by reading a manual or book than through help pages. For example, the Introduction to R manual that comes with R has a section on indexing arrays, and most introductory books on R, including free ones, have more detailed explanations of the subject. Regards, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Manuel López-Ibáñez Sent: Sunday, April 16, 2006 4:52 AM To: r-help@stat.math.ethz.ch Subject: Re: [R] permutation of rows of a matrix Dear John, I understand what you mean. However, when someone is learning R for the first time or have little experience, such examples help to understand the connection of different parts of the language. Moreover, things that make sense once you know them, can be difficult to relate in the first place. For example, it would be interesting to know how many new R users don't know that there is a manual page for [. I hope you can understand my point of view (you may disagree, though.) Regards, Manuel. John Fox wrote: Dear Manuel, Although ?sample doesn't specifically describe permuting the rows of a matrix, it does say that sample(x) generates a random permutation of the elements of x (or 1:x). Indexing the rows of the matrix by a permutation of 1:x (where x is the number of rows) doesn't seem to be much of a leap. Regards, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Manuel López-Ibáñez Sent: Saturday, April 15, 2006 9:44 AM To: r-help@stat.math.ethz.ch Subject: Re: [R] permutation of rows of a matrix help(sample) does not say anything about randomly permuting the rows of a matrix M by using M[sample(m,m),]. Perhaps it could be added as an example of use. John Fox wrote: Dear Jose, M[sample(m, m),] will randomly permute the rows of M. [You probably could have figured this out via help.search(permutation), which would have led you to sample().] Regards, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of javargas Sent: Saturday, April 15, 2006 7:53 AM To: r-help@stat.math.ethz.ch Subject: [R] permutation of rows of a matrix How can I generate a random permutation between rows of a matrix M (of m rows and n columns)? Thanks for your help, Jose __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ LLama Gratis a cualquier PC del Mundo. Llamadas a fijos y móviles desde 1 céntimo por minuto. http://es.voice.yahoo.com __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ LLama Gratis a cualquier PC del Mundo. Llamadas a fijos y móviles desde 1 céntimo por minuto. http://es.voice.yahoo.com __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE
Re: [R] string vector indices
On Sun, 2006-04-16 at 11:48 +0200, Philipp Pagel wrote: On Sun, Apr 16, 2006 at 12:26:29AM -0400, Luke wrote: x - a, aab y - a, aa, aab, aabc. Is there any R function to get the indices of y for the elements of x, that is, foo(x, y) will give me the index vector c(1, 3)? match(x, y) cu Philipp Just a quick note here that there is one limitation in using match(), which is that it will only return the index of the first match. So, if there was an element in 'x' that appeared more than once in 'y', you would still only get c(1, 3): x - c(a, aab) y - c(a, aa, aab, aabc, a) match(x, y) [1] 1 3 Note that the second occurrence of a in 'y' is not picked up. To solve this problem use %in% combined with which(): which(y %in% x) [1] 1 3 5 See ?which and ?%in% for more information. Note that the help for %in% is on the same page as ?match. This approach will work in both situations. HTH, Marc Schwartz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] dcolumn
Brian Quinif wrote: Does anyone out there use dcolumn=TRUE in the latex() function in the Hmisc library? I would like to line up the data in a latex table I'm making using latex(), but I'm having some issues with this feature. Since there is no description of it in the help, I thought that it might be incomplete or something like that. ?latex points you to ?format.df for dcolumn On a related note, if anyone knows how to get the end result I want (LaTeX table with data aligned by decimal point), I'd love to hear it. I have been having issues with this which I think might be related to the fact that I have std. errors enclosed in parentheses. For such customized non-numeric formats you might want to just include latex markup inside the character strings. FH Thanks, BQ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] matching identical row names
Dear Dr. Fox Your reply to Sirinivas Iyyar was most helpful to me. I am trying to collapse some categories of a data.frame in a similar way. I have a data frame in the form below Prog Sub.Program Job V1 V2 V3 1 Alpha A 1 2 3 2 Alpha B 2 3 1 2 GammaB 1 3 3 2 Alpha A 3 4 1 2 GammaB 2 2 3 1 Alpha A 2 22 What I want is to sum the values of VI, V2 and V3 and end up with a new data.frame that would look like ProgSubprog Job Sum(V1) Sum(V2), Sum(V3) 1 AlphaA34 5 2 AlphaA34 1 2 Gamma B 35 6 I thought that I could use by() to create a vector for each of V1:V3 but I cannot see any way to capture the values. temp1 - by(Data[,4] simply gives me the complete output. An example of what I have done is - Prog - 1, 2, 2, 2,2,1, Sub.Program - c(Alpha, Alpha, Gamma, Alpha, Gamma, Alpha ) Job - c(A, B, B, A, B,A) V1 - c(1,2, 1,3,2,2) V2 - c(2, 3, 3, 4, 2, 2) V3 - c(3, 1 , 3, 1, 3,2 Mydata - data.frame(cbind( Prog, Sub.Program, Job, V1, V2, V3) by(MyData[,4],list(Sub.Program=Sub.Program, Job=Job), sum) I also get the expected NA. for cells that do not exist. Is there any way to set them to 0 in the operation? Any help would be greatly appreciated. Thanks John - Original Message From: John Fox [EMAIL PROTECTED] To: Srinivas Iyyer [EMAIL PROTECTED] Cc: r-help@stat.math.ethz.ch Sent: Saturday, April 15, 2006 9:35:46 AM Subject: Re: [R] matching identical row names Dear Srinivas, Your data are likely in a data frame rather than a matrix (since the columns are heterogeneous), and name is a variable, not the row names of the data frame. There are several ways to do what you want; one simple way, assuming that the data are in a data frame named Data, is by(Data[,2:5], Data$name, mean) If you want the result in the form of a matrix, then you could do aggregate(Data[,2:5], list(Data$name), mean) I hope this helps, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Srinivas Iyyer Sent: Friday, April 14, 2006 11:58 PM To: r-help@stat.math.ethz.ch Subject: [R] matching identical row names dear group, i have a sample matrix name v1 v2 v3 v4 cat 1011 12 15 dog 3 12 10 14 cat 9 12 12 15 cat 5 12 10 11 dog 12113 123 31 ... since cat is repeated 3 times, I want a mean value for it. Like wise for every element of the name column. cat v1 = mean(c(10,9,5)) cat v3 = mean(c(11,12,13)) ..etc. name v1 v2 v3 v4 cat 8 11.6 11.3 13.6 dog 7.5 62.5 66.5 22.5 could any one help me in solving this mystery. thank you. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] matching identical row names
Dear John, You can use aggregate(), also described in my suggestion to Sirinivas: aggregate(Data[, 4:6], Data[1:3], sum) Prog Sub.Program Job V1 V2 V3 11 Alpha A 3 4 5 22 Alpha A 3 4 1 32 Alpha B 2 3 1 42 Gamma B 3 5 6 I hope this helps, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John Kane Sent: Sunday, April 16, 2006 10:29 AM To: John Fox; Srinivas Iyyer Cc: r-help@stat.math.ethz.ch Subject: Re: [R] matching identical row names Dear Dr. Fox Your reply to Sirinivas Iyyar was most helpful to me. I am trying to collapse some categories of a data.frame in a similar way. I have a data frame in the form below Prog Sub.Program Job V1 V2 V3 1 Alpha A 1 2 3 2 Alpha B 2 3 1 2 GammaB 1 3 3 2 Alpha A 3 4 1 2 GammaB 2 2 3 1 Alpha A 2 22 What I want is to sum the values of VI, V2 and V3 and end up with a new data.frame that would look like ProgSubprog Job Sum(V1) Sum(V2), Sum(V3) 1 AlphaA34 5 2 AlphaA34 1 2 Gamma B 35 6 I thought that I could use by() to create a vector for each of V1:V3 but I cannot see any way to capture the values. temp1 - by(Data[,4] simply gives me the complete output. An example of what I have done is - Prog - 1, 2, 2, 2,2,1, Sub.Program - c(Alpha, Alpha, Gamma, Alpha, Gamma, Alpha ) Job - c(A, B, B, A, B,A) V1 - c(1,2, 1,3,2,2) V2 - c(2, 3, 3, 4, 2, 2) V3 - c(3, 1 , 3, 1, 3,2 Mydata - data.frame(cbind( Prog, Sub.Program, Job, V1, V2, V3) by(MyData[,4],list(Sub.Program=Sub.Program, Job=Job), sum) I also get the expected NA. for cells that do not exist. Is there any way to set them to 0 in the operation? Any help would be greatly appreciated. Thanks John - Original Message From: John Fox [EMAIL PROTECTED] To: Srinivas Iyyer [EMAIL PROTECTED] Cc: r-help@stat.math.ethz.ch Sent: Saturday, April 15, 2006 9:35:46 AM Subject: Re: [R] matching identical row names Dear Srinivas, Your data are likely in a data frame rather than a matrix (since the columns are heterogeneous), and name is a variable, not the row names of the data frame. There are several ways to do what you want; one simple way, assuming that the data are in a data frame named Data, is by(Data[,2:5], Data$name, mean) If you want the result in the form of a matrix, then you could do aggregate(Data[,2:5], list(Data$name), mean) I hope this helps, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Srinivas Iyyer Sent: Friday, April 14, 2006 11:58 PM To: r-help@stat.math.ethz.ch Subject: [R] matching identical row names dear group, i have a sample matrix name v1 v2 v3 v4 cat 1011 12 15 dog 3 12 10 14 cat 9 12 12 15 cat 5 12 10 11 dog 12113 123 31 ... since cat is repeated 3 times, I want a mean value for it. Like wise for every element of the name column. cat v1 = mean(c(10,9,5)) cat v3 = mean(c(11,12,13)) ..etc. name v1 v2 v3 v4 cat 8 11.6 11.3 13.6 dog 7.5 62.5 66.5 22.5 could any one help me in solving this mystery. thank you. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list
Re: [R] matching identical row names
This was immensely helpful ! I had tried aggregate() , messed it up and decided that I must have misunderstood its use vs by() and so posted my question to you. Instead I must have had a typo in there somewhere . I went back, did it again and it is lovely. I was going to have to 'slice and dice' the dataset with a myriad of subset calls otherwise. Thank you very much. I hope the weather is as nice in Hamilton as it is in Kingston. And now I can go get some sun! - Original Message From: John Fox [EMAIL PROTECTED] To: John Kane [EMAIL PROTECTED] Cc: r-help@stat.math.ethz.ch; Srinivas Iyyer [EMAIL PROTECTED] Sent: Sunday, April 16, 2006 11:51:51 AM Subject: RE: [R] matching identical row names Dear John, You can use aggregate(), also described in my suggestion to Sirinivas: aggregate(Data[, 4:6], Data[1:3], sum) Prog Sub.Program Job V1 V2 V3 11 Alpha A 3 4 5 22 Alpha A 3 4 1 32 Alpha B 2 3 1 42 Gamma B 3 5 6 I hope this helps, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John Kane Sent: Sunday, April 16, 2006 10:29 AM To: John Fox; Srinivas Iyyer Cc: r-help@stat.math.ethz.ch Subject: Re: [R] matching identical row names Dear Dr. Fox Your reply to Sirinivas Iyyar was most helpful to me. I am trying to collapse some categories of a data.frame in a similar way. I have a data frame in the form below Prog Sub.Program Job V1 V2 V3 1 Alpha A 1 2 3 2 Alpha B 2 3 1 2 GammaB 1 3 3 2 Alpha A 3 4 1 2 GammaB 2 2 3 1 Alpha A 2 22 What I want is to sum the values of VI, V2 and V3 and end up with a new data.frame that would look like ProgSubprog Job Sum(V1) Sum(V2), Sum(V3) 1 AlphaA34 5 2 AlphaA34 1 2 Gamma B 35 6 I thought that I could use by() to create a vector for each of V1:V3 but I cannot see any way to capture the values. temp1 - by(Data[,4] simply gives me the complete output. An example of what I have done is - Prog - 1, 2, 2, 2,2,1, Sub.Program - c(Alpha, Alpha, Gamma, Alpha, Gamma, Alpha ) Job - c(A, B, B, A, B,A) V1 - c(1,2, 1,3,2,2) V2 - c(2, 3, 3, 4, 2, 2) V3 - c(3, 1 , 3, 1, 3,2 Mydata - data.frame(cbind( Prog, Sub.Program, Job, V1, V2, V3) by(MyData[,4],list(Sub.Program=Sub.Program, Job=Job), sum) I also get the expected NA. for cells that do not exist. Is there any way to set them to 0 in the operation? Any help would be greatly appreciated. Thanks John - Original Message From: John Fox [EMAIL PROTECTED] To: Srinivas Iyyer [EMAIL PROTECTED] Cc: r-help@stat.math.ethz.ch Sent: Saturday, April 15, 2006 9:35:46 AM Subject: Re: [R] matching identical row names Dear Srinivas, Your data are likely in a data frame rather than a matrix (since the columns are heterogeneous), and name is a variable, not the row names of the data frame. There are several ways to do what you want; one simple way, assuming that the data are in a data frame named Data, is by(Data[,2:5], Data$name, mean) If you want the result in the form of a matrix, then you could do aggregate(Data[,2:5], list(Data$name), mean) I hope this helps, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Srinivas Iyyer Sent: Friday, April 14, 2006 11:58 PM To: r-help@stat.math.ethz.ch Subject: [R] matching identical row names dear group, i have a sample matrix name v1 v2 v3 v4 cat 1011 12 15 dog 3 12 10 14 cat 9 12 12 15 cat 5 12 10 11 dog 12113 123 31 ... since cat is repeated 3 times, I want a mean value for it. Like wise for every element of the name column. cat v1 = mean(c(10,9,5)) cat v3 = mean(c(11,12,13)) ..etc. name v1 v2 v3 v4 cat 8 11.6 11.3 13.6 dog 7.5 62.5 66.5 22.5 could any one help me in solving this mystery. thank you.
[R] Predict nls new data with se.fit snf intervals
Using R to predict.nls() using new data, `se.fit' and `interval' are ignored. . Is there any update for this? Anybody has those routine or may advise me how to do that? Thanks [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] [S] Problems with lme and 2 levels of nesting:Summary
I have taken the liberty of including the R-help mailing list on this reply as that is the appropriate place to discuss lmer results. On 4/5/06, Andreas Svensson [EMAIL PROTECTED] wrote: Hello again I have now recieved some helpful hints in this matter and will summarize them but first let me reiterate the problem: I had two treatments: 2 types of food fed to females. Each female were present only in one treatment and laid one clutch (female ID = clutch ID). From each clutch, 50 larvae were taken and divided on 5 cups; 10 larvae in each cup (the cups were not meant to differ in any way, it's just impossible to count 50 larvae in motion, but ok with 10). The day of death was recorded for each larva. Since there's only one data point per larva, I see no need of including larva ID in the model. So: Response: DeathDay Fixed factor: Treatment Random factors: Clucth, Cup Design: Cup nested in Clutch I'm primarlity interested in the effect of Treatment on DeathDay, but would also like to know about the variation within Clutch and Cup. The unit of replication is Clutch. I want to use a model where the variation within Clutch and within Cup is taken into account, while avoiding pseudoreplication of 5 cups (and 50 larvae) coming from the same clutch. The solution is therefore a mixed model with Cup nested in Clutch as random factors. A year ago, I was recommended this script from S-help at Insightful: model1-lme(DeathDay~Treatment,random=~Treatment|Clutch/Cup) I think this is wrong since Treatment shouldn't be among the random factors. From the various tips I've recieved the correct syntax using lme (in libarary nlme) should be: model2-lme(DeathDay~ Treatment,random=~1|Clutch/Cup) and using lmer (in library lme4): model3- lmer(DeathDay ~ Treatment + (1|Clutch:Cup) + (1|Clutch), method=REML) I get the result that there is NO effect of Treatment. So far so good. If there is no significant effect for Treatment then I would go on to fit a model without the Treatment effect and use that model to examine the significane of the random effects. But the above models will not give me any information of the importance of the random factors. If I do the same in SPSS: glm DeathDay by Cup Clutch Treatment /method=sstype(1) /random Cup Clutch /design Cup(Clutch). ...I get an ANOVA table that I (and referees) can understand. It tells me about where the variation originates from; in one experiment only Clutch is significant, in another there is also a (surprising but explainable) effect of Cup. An analysis of variance table may be appropriate for balanced data with nested factors such as you have but it does not generalize well to other situations in which these models are applied. The preferred approach to model-building in R/S-PLUS is to build the model sequentially. I would use fm1 - lmer(DeathDay ~ Treatment + (1|Clutch:Cup) + (1|Clutch)) fm2 - lmer(DeathDay ~ 1 + (1|Clutch:Cup) + (1|Clutch)) # equivalent to your model7 fm3 - lmer(DeathDay ~ 1 + (1|Clutch)) and then compare fm2 and fm3, say with anova(fm2, fm3). As described below the p-value from the likelihood ratio test in this case is not exact because the hypothesis you are testing is on the boundary of the parameter space. However, the value returned will be a conservatve p-value so you suffer a loss of power rather than compromising the level of the test. If you choose you could look at the distribution of the parameters from a Bayesian perspective by using the mcmcsamp function to generate a Markov Chain Monte Carlo sample from the parameters in fm2 and check if you have a precise or an imprecise estimate of the variance of the random effects for the Clutch:Cup grouping factor. If that parameter could reasonably be zero then you could repeat the sampling for fm3 and look at the distribution of the variance of the random effects associated with the Clutch grouping factor. In R/S-plus I can achieve something similar with model simplification. REML will not allow me to remove the fixed factor, so I have to use regular ML for this: I think I would prefer to characterize the situation by saying that the log-likelihood from the REML criterion cannot be used to compare models when you change the fixed-effects specificatio. However, this still doesn't explain to me why you want to retain the Treatment factor when you have judged it to be inert. model4- lme(DeathDay ~ Treatment, random=~ 1 | Clutch/Cup,method=ML) #Full model model5- lme(DeathDay ~ Treatment, random=~ 1 | Clutch,method=ML) # model minus Cup model6- lme(DeathDay ~ Treatment, random=~ 1 | Cup,method=ML)# model minus Clutch model7- lme(DeathDay ~ 1, random=~ 1 | Clutch/Cup,method=ML) # model minus Treatment ...and make anova(model4, modelxxx)etc to test for effects of Cup, Clutch and Treatment. But is this really correct? In model5 I remove Cup, doesn't this wrongly boost the df for
Re: [R] generalized hypergeometric function
Marc Schwartz MSchwartz at mn.rr.com writes: On Sat, 2006-04-15 at 20:59 -0300, Bernardo Rangel tura wrote: Hi R-masters I need compute generalized hypergeometric function. I look in R-project and R-help list and not find nothing about generalized hypergeometric function Is possible calculate the generalized hypergeometric function? Somebody have a script for this? Note that he is looking for the h'geom FUNCTION, not DISTRIBUTION (e.g. http://mathworld.wolfram.com/GeneralizedHypergeometricFunction.html); Robin Hankin wrote some code (hypergeo in the Davies package on CRAN) to compute a particular Gaussian h'geom function, and was asking at one point on the mailing list whether anyone was interested in other code; I don't know whether it will be generalized enough for you. Ben Bolker __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Var.calc in Match()
Does anyone else find that using the Var.calc option (for heteroscedasticity consistent std. errors) in Match() (from the Matching library) slows down computation of the matching estimator by a lot? I don't really understand why when I use this option it slows down so much, but for me it does significantly. I want to use the heteroscedasticity consistent std. errors in my project, but as long as it takes to compute them, I don't know if I will be able to. Thanks, BQ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] logistic regression model with non-integer weights
Michael Dewey wrote: At 17:12 09/04/06, Ramón Casero Cañas wrote: I am not sure what the problem you really want to solve is but it seems that a) abnormality is rare b) the logistic regression predicts it to be rare. If you want a prediction system why not try different cut-offs (other than 0.5 on the probability scale) and perhaps plot sensitivity and specificity to help to choose a cut-off? Thanks for your suggestions, Michael. It took me some time to figure out how to do this in R (as trivial as it may be for others). Some comments about what I've done follow, in case anyone is interested. The problem is a) abnormality is rare (Prevalence=14%) and b) there is not much difference in the independent variable between abnormal and normal. So the logistic regression model predicts that P(abnormal) = 0.4. I got confused with this, as I expected a cut-off point of P=0.5 to decide between normal/abnormal. But you are right, in that another cut-off point can be chosen. For a cut-off of e.g. P(abnormal)=0.15, Sensitivity=65% and Specificity=52%. They are pretty bad, although for clinical purposes I would say that Positive/Negative Predictive Values are more interesting. But then PPV=19% and NPV=90%, which isn't great. As an overall test of how good the model is for classification I have computed the area under the ROC, from your suggestion of using Sensitivity and Specificity. I couldn't find how to do this directly with R, so I implemented it myself (it's not difficult but I'm new here). I tried with package ROCR, but apparently it doesn't cover binary outcomes. The area under the ROC is 0.64, so I would say that even though the model seems to fit the data, it just doesn't allow acceptable discrimination, not matter what the cut-off point. I have also studied the effect of low prevalence. For this, I used option ran.gen in the boot function (package boot) to define a function that resamples the data so that it balances abnormal and normal cases. A logistic regression model is fitted to each replicate, to a parametric bootstrap, and thus compute the bias of the estimates of the model coefficients, beta0 and beta1. This shows very small bias for beta1, but a rather large bias for beta0. So I would say that prevalence has an effect on beta0, but not beta1. This is good, because a common measure like the odds ratio depends only on beta1. Cheers, -- Ramón Casero Cañas http://www.robots.ox.ac.uk/~rcasero/wiki http://www.robots.ox.ac.uk/~rcasero/blog __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] generalized hypergeometric function
On Sun, 2006-04-16 at 17:54 +, Ben Bolker wrote: Marc Schwartz MSchwartz at mn.rr.com writes: On Sat, 2006-04-15 at 20:59 -0300, Bernardo Rangel tura wrote: Hi R-masters I need compute generalized hypergeometric function. I look in R-project and R-help list and not find nothing about generalized hypergeometric function Is possible calculate the generalized hypergeometric function? Somebody have a script for this? Note that he is looking for the h'geom FUNCTION, not DISTRIBUTION (e.g. http://mathworld.wolfram.com/GeneralizedHypergeometricFunction.html); Robin Hankin wrote some code (hypergeo in the Davies package on CRAN) to compute a particular Gaussian h'geom function, and was asking at one point on the mailing list whether anyone was interested in other code; I don't know whether it will be generalized enough for you. Ben Bolker Thanks Ben. I stand corrected on that point. Didn't click in my initial reading. Regards, Marc __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] logistic regression model with non-integer weights
On Sun, 2006-04-16 at 19:10 +0100, Ramón Casero Cañas wrote: Thanks for your suggestions, Michael. It took me some time to figure out how to do this in R (as trivial as it may be for others). Some comments about what I've done follow, in case anyone is interested. The problem is a) abnormality is rare (Prevalence=14%) and b) there is not much difference in the independent variable between abnormal and normal. So the logistic regression model predicts that P(abnormal) = 0.4. I got confused with this, as I expected a cut-off point of P=0.5 to decide between normal/abnormal. But you are right, in that another cut-off point can be chosen. For a cut-off of e.g. P(abnormal)=0.15, Sensitivity=65% and Specificity=52%. They are pretty bad, although for clinical purposes I would say that Positive/Negative Predictive Values are more interesting. But then PPV=19% and NPV=90%, which isn't great. As an overall test of how good the model is for classification I have computed the area under the ROC, from your suggestion of using Sensitivity and Specificity. I couldn't find how to do this directly with R, so I implemented it myself (it's not difficult but I'm new here). I tried with package ROCR, but apparently it doesn't cover binary outcomes. The area under the ROC is 0.64, so I would say that even though the model seems to fit the data, it just doesn't allow acceptable discrimination, not matter what the cut-off point. I have also studied the effect of low prevalence. For this, I used option ran.gen in the boot function (package boot) to define a function that resamples the data so that it balances abnormal and normal cases. A logistic regression model is fitted to each replicate, to a parametric bootstrap, and thus compute the bias of the estimates of the model coefficients, beta0 and beta1. This shows very small bias for beta1, but a rather large bias for beta0. So I would say that prevalence has an effect on beta0, but not beta1. This is good, because a common measure like the odds ratio depends only on beta1. Cheers, -- Ramón Casero Cañas http://www.robots.ox.ac.uk/~rcasero/wiki http://www.robots.ox.ac.uk/~rcasero/blog The Epi package has function ROC that draws the ROC curve and computes the AUC among other things. Rick B. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] summary stats
I have a data set that has student test scores along with several categorical variables. I would like to generate a set of summary stats (mean, variance, n) for the data grouped by school authority and by exam topic. I have tried the by() function but that seems to only be able to handle one level of grouping. In particular what I would like is something like the following Board Subject MeanVarianceN board1 english 70 150 600 board2 english 66 210 510 board1 science 69 180 605 board2 science 71 220 520 and so on. I have already generated the stats that I need using GROUP BY in a select query in MySQL. I'm just curious now about doing the same thing in R thanks in advance, Neil __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] summary stats
You could use aggregate: agg.mean - aggregate(my.data, by=list(Board, Subject), FUN=mean) The caveat is that you can only use one aggregator function at a time. You could rerun the same for FUN=var and FUN=length to get the additional aggregate statustics that you need and then cbind the results: cbind(agg.mean, agg.var, agg.n) -Christos -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Neil Hepburn Sent: Sunday, April 16, 2006 2:56 PM To: r-help@stat.math.ethz.ch Subject: [R] summary stats I have a data set that has student test scores along with several categorical variables. I would like to generate a set of summary stats (mean, variance, n) for the data grouped by school authority and by exam topic. I have tried the by() function but that seems to only be able to handle one level of grouping. In particular what I would like is something like the following Board Subject MeanVarianceN board1 english 70 150 600 board2 english 66 210 510 board1 science 69 180 605 board2 science 71 220 520 and so on. I have already generated the stats that I need using GROUP BY in a select query in MySQL. I'm just curious now about doing the same thing in R thanks in advance, Neil __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] logistic regression model with non-integer weights
Ramón Casero Cañas wrote: Michael Dewey wrote: At 17:12 09/04/06, Ramón Casero Cañas wrote: I am not sure what the problem you really want to solve is but it seems that a) abnormality is rare b) the logistic regression predicts it to be rare. If you want a prediction system why not try different cut-offs (other than 0.5 on the probability scale) and perhaps plot sensitivity and specificity to help to choose a cut-off? Thanks for your suggestions, Michael. It took me some time to figure out how to do this in R (as trivial as it may be for others). Some comments about what I've done follow, in case anyone is interested. The problem is a) abnormality is rare (Prevalence=14%) and b) there is not much difference in the independent variable between abnormal and normal. So the logistic regression model predicts that P(abnormal) = 0.4. I got confused with this, as I expected a cut-off point of P=0.5 to decide between normal/abnormal. But you are right, in that another cut-off point can be chosen. For a cut-off of e.g. P(abnormal)=0.15, Sensitivity=65% and Specificity=52%. They are pretty bad, although for clinical purposes I would say that Positive/Negative Predictive Values are more interesting. But then PPV=19% and NPV=90%, which isn't great. As an overall test of how good the model is for classification I have computed the area under the ROC, from your suggestion of using Sensitivity and Specificity. I couldn't find how to do this directly with R, so I implemented it myself (it's not difficult but I'm new here). I tried with package ROCR, but apparently it doesn't cover binary outcomes. The area under the ROC is 0.64, so I would say that even though the model seems to fit the data, it just doesn't allow acceptable discrimination, not matter what the cut-off point. I have also studied the effect of low prevalence. For this, I used option ran.gen in the boot function (package boot) to define a function that resamples the data so that it balances abnormal and normal cases. A logistic regression model is fitted to each replicate, to a parametric bootstrap, and thus compute the bias of the estimates of the model coefficients, beta0 and beta1. This shows very small bias for beta1, but a rather large bias for beta0. So I would say that prevalence has an effect on beta0, but not beta1. This is good, because a common measure like the odds ratio depends only on beta1. Cheers, This makes me think you are trying to go against maximum likelihood to optimize an improper criterion. Forcing a single cutpoint to be chosen seems to be at the heart of your problem. There's nothing wrong with using probabilities and letting the utility possessor make the final decision. -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] a question on df of linear model
Dear R-users: On page 155 of Mixed-effects Models in S and S-Plus, the degree of freedoms of the anova comparison of lme and lm are 8 and 5. But when I use the following SAS code: proc glm data=ortho2; class gender; model distance = age|gender / solution ; run; The df is 3. Could you please explain this to me? Thanks Joe __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Reading SPSS .sav files
Greetings, How to I read in SPSS .sav files into R. Thank you. David = David Kaplan, PhD Professor of Education School of Education University of Delaware Newark DE 19716 Voice: 302-831-8696 Fax:302-831-4110 email: mailto:[EMAIL PROTECTED] Homepage: http://www.udel.edu/dkaplan SOE page: http://www.udel.edu/educ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] summary stats
Dear Neil, Coincidentally, more or less the same question was asked on r-help yesterday and today. You can use either the by() function or aggregate(), though you'll have to do a bit of work on the result if you want it to look just like your example. I hope this helps, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Neil Hepburn Sent: Sunday, April 16, 2006 1:56 PM To: r-help@stat.math.ethz.ch Subject: [R] summary stats I have a data set that has student test scores along with several categorical variables. I would like to generate a set of summary stats (mean, variance, n) for the data grouped by school authority and by exam topic. I have tried the by() function but that seems to only be able to handle one level of grouping. In particular what I would like is something like the following Board Subject MeanVarianceN board1english 70 150 600 board2english 66 210 510 board1science 69 180 605 board2science 71 220 520 and so on. I have already generated the stats that I need using GROUP BY in a select query in MySQL. I'm just curious now about doing the same thing in R thanks in advance, Neil __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Reading SPSS .sav files
Dear David, There is the read.spss() function in the foreign package, and spss.get() in Hmisc. The former is a part of the standard R distribution, so you could have discovered it via, e.g., help.search(SPSS) I hope this helps, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of David Kaplan Sent: Sunday, April 16, 2006 3:35 PM To: r-help@stat.math.ethz.ch Subject: [R] Reading SPSS .sav files Greetings, How to I read in SPSS .sav files into R. Thank you. David = David Kaplan, PhD Professor of Education School of Education University of Delaware Newark DE 19716 Voice: 302-831-8696 Fax:302-831-4110 email: mailto:[EMAIL PROTECTED] Homepage: http://www.udel.edu/dkaplan SOE page: http://www.udel.edu/educ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Reading SPSS .sav files
David Kaplan [EMAIL PROTECTED] writes: Greetings, How to I read in SPSS .sav files into R. with read.spss() (package foreign) Thank you. David ... PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Did you? Following the pointers in there should be enough to answer a question like this. -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Reading SPSS .sav files
I don't use SPSS, so I can't help with the details. That being said, have you looked at the foreign package? --Matt Matt Austin Statistician Amgen, Inc 800 9AMGEN9 x77431 805-447-7431 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of David Kaplan Sent: Sunday, April 16, 2006 1:35 PM To: r-help@stat.math.ethz.ch Subject: [R] Reading SPSS .sav files Greetings, How to I read in SPSS .sav files into R. Thank you. David = David Kaplan, PhD Professor of Education School of Education University of Delaware Newark DE 19716 Voice: 302-831-8696 Fax:302-831-4110 email: mailto:[EMAIL PROTECTED] Homepage: http://www.udel.edu/dkaplan SOE page: http://www.udel.edu/educ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] logistic regression model with non-integer weights
Frank E Harrell Jr wrote: This makes me think you are trying to go against maximum likelihood to optimize an improper criterion. Forcing a single cutpoint to be chosen seems to be at the heart of your problem. There's nothing wrong with using probabilities and letting the utility possessor make the final decision. I agree, and in fact I was thinking along those lines, but I also needed a way of evaluating how good is the model to discriminate between abnormal and normal cases, as opposed to e.g. GOF. The only way I know of is using area under ROC (thus setting cut-off points), which also followed neatly from Michael Dewey comments. Any alternatives would be welcome :) -- Ramón Casero Cañas http://www.robots.ox.ac.uk/~rcasero/wiki http://www.robots.ox.ac.uk/~rcasero/blog __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] logistic regression model with non-integer weights
Ramón Casero Cañas wrote: Frank E Harrell Jr wrote: This makes me think you are trying to go against maximum likelihood to optimize an improper criterion. Forcing a single cutpoint to be chosen seems to be at the heart of your problem. There's nothing wrong with using probabilities and letting the utility possessor make the final decision. I agree, and in fact I was thinking along those lines, but I also needed a way of evaluating how good is the model to discriminate between abnormal and normal cases, as opposed to e.g. GOF. The only way I know of is using area under ROC (thus setting cut-off points), which also followed neatly from Michael Dewey comments. Any alternatives would be welcome :) To get the ROC area you don't need to do any of that, and as you indicated, it is a good discrimination measure. The lrm function in the Design package gives it to you automatically (C index), and you can also get it with the Hmisc package's somers2 and rcorr.cens functions. ROC area is highly related to the Wilcoxon 2-sample test statistic for comparing cases and non-cases. -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Problem getting R's decision tree for Quinlan's golf example data
Newbie question, but I've checked archives etc. Am trying to reproduce in R Quinlan's trivial example of the golf decision tree. The data file of 14 examples follows (read in via read.table()): Outlook Temperature Humidity Windy PlayDontPlay 1 sunny 85 85 false DontPlay 2 sunny 80 90 true DontPlay 3 overcast 83 78 false Play 4 rain 70 96 false Play 5 rain 68 80 false Play 6 rain 65 70 true DontPlay 7 overcast 64 65 true Play 8 sunny 72 95 false DontPlay 9 sunny 69 70 false Play 10 rain 75 80 false Play 11 sunny 75 70 true Play 12 overcast 72 90 true Play 13 overcast 81 75 false Play 14 rain 71 80 true DontPlay R reports no format or other trivial problems: summary(golf) Outlook Temperature Humidity Windy PlayDontPlay overcast:4 Min. :64.0 Min. :65.0 false:8 DontPlay:5 rain:5 1st Qu.:69.2 1st Qu.:71.2 true :6 Play:9 sunny :5 Median :72.0 Median :80.0 Mean :73.6 Mean :80.3 3rd Qu.:78.8 3rd Qu.:88.8 Max. :85.0 Max. :96.0 I then try to build a decision tree: golf.rpart - rpart(PlayDontPlay ~ Outlook + Temperature + Humidity + Windy, method=class, data=golf) which doesn't yield a tree: golf.rpart n= 14 node), split, n, loss, yval, (yprob) * denotes terminal node 1) root 14 5 Play (0.35714 0.64286) * plot(golf.rpart) Error in plot.rpart(golf.rpart) : fit is not a tree, just a root Thanks, Alan __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] survfit with newdata and individual=TRUE
I'm using the survfit() function with the Cox proportional hazards model, and having trouble producing fitted survivor curves when I ask survfit to use newdata recorded over multiple periods for the same individual. I'm using the counting process formulation, as I have time-varying covariates. For example, cph.m1 - coxph(Surv(time=sdatenrm, time2=edatenrm, event=censor)~age) xhyp - list(age=c(50,60), sdatenrm=c(1,20), edatenrm=c(21,40), censor=c(0,1) ) survfit(cph.m1,newdata=xhyp,conf.int=.95,individual=FALSE) works fine, but then running survfit(cph.m1,newdata=xhyp,conf.int=.95,individual=TRUE) gives this error message: Error in temp[, 1] : incorrect number of dimensions Is there something wrong with the way I'm giving survfit() newdata? Thanks, Chris Adolph _ Christopher Adolph Assistant Professor of Political Science Center for Statistics and the Social Sciences Box 354320 University of Washington Seattle, WA 98195-4320 [EMAIL PROTECTED] www.chrisadolph.com __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] R-vim suite
Hi All, If you use vim to edit R code, you may be interested in this. I have put together a personalized syntax file, some code templates, and a way to send code from Vim to R using autoHotKeys (windows). http://www.andrew.cmu.edu/user/jquesada/RvimSuite/instructions.html Actually, the little autoHotKeys can be useful even if you don't use vim just to send the example R code from the help pages to the console. Best wishes, -Jose -- Jose Quesada, PhD. [EMAIL PROTECTED] Dept. of Psychology http://www.andrew.cmu.edu/~jquesada Sussex University Brighton, UK __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] suse 10.0
Ups...the links are: http://fawn.hsu-hh.de/~steuer/SL-10.0-OSS/ http://mirrors.kernel.org/opensuse/distribution/SL-10.0-OSS/inst-source/ On 15/04/06, Detlef Steuer [EMAIL PROTECTED] wrote: Hi, you can add a repository for R found on my server: http://fawn.hsu-hh.de ~steuer/SL-10.0-OSS There I provide rpms for R and ESS. You will need e.g. http://mirrors.kernel.org opensuse/distribution/SL-10.0-OSS/inst-source to get a fortran compiler etc. Hope that helps. Detlef On Fri, 14 Apr 2006 06:21:26 -0700 louis homer [EMAIL PROTECTED] wrote: I downloaded the free CD's and installed Suse 10.0. When I tried to install the latest RPM for R, the Yast installer complained that two files were missing. I was told that I can install from the internet, and started the process, but found I must supply the name of the server and the name of the directory. I tried several combinations unsuccessfully. Can anyone help me? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Keinen Gedanken zweimal denken, auÃer er ist schön. Unbekannte Quelle __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Problem getting R's decision tree for Quinlan's golf exam ple data [Broadcast]
See ?rpart.control. I get: golf.rp = rpart(Outlook ~ ., golf, control=rpart.control(minsplit=1)) golf.rp n= 14 node), split, n, loss, yval, (yprob) * denotes terminal node 1) root 14 9 rain (0.2857143 0.3571429 0.3571429) 2) Temperature 71.5 6 2 rain (0.167 0.667 0.167) 4) Temperature 64.5 1 0 overcast (1.000 0.000 0.000) * 5) Temperature=64.5 5 1 rain (0.000 0.800 0.200) 10) Humidity=75 3 0 rain (0.000 1.000 0.000) * 11) Humidity 75 2 1 rain (0.000 0.500 0.500) 22) Temperature 67 1 0 rain (0.000 1.000 0.000) * 23) Temperature=67 1 0 sunny (0.000 0.000 1.000) * 3) Temperature=71.5 8 4 sunny (0.375 0.125 0.500) 6) PlayDontPlay=Play 5 2 overcast (0.600 0.200 0.200) 12) Humidity=72.5 4 1 overcast (0.750 0.250 0.000) 24) Temperature=78 2 0 overcast (1.000 0.000 0.000) * 25) Temperature 78 2 1 overcast (0.500 0.500 0.000) 50) Temperature 73.5 1 0 overcast (1.000 0.000 0.000) * 51) Temperature=73.5 1 0 rain (0.000 1.000 0.000) * 13) Humidity 72.5 1 0 sunny (0.000 0.000 1.000) * 7) PlayDontPlay=DontPlay 3 0 sunny (0.000 0.000 1.000) * Andy _ From: [EMAIL PROTECTED] on behalf of Alan Lapedes Sent: Sun 4/16/2006 5:14 PM To: r-help@stat.math.ethz.ch Subject: [R] Problem getting R's decision tree for Quinlan's golf example data [Broadcast] Newbie question, but I've checked archives etc. Am trying to reproduce in R Quinlan's trivial example of the golf decision tree. The data file of 14 examples follows (read in via read.table()): Outlook Temperature Humidity Windy PlayDontPlay 1 sunny 85 85 false DontPlay 2 sunny 80 90 true DontPlay 3 overcast 83 78 false Play 4 rain 70 96 false Play 5 rain 68 80 false Play 6 rain 65 70 true DontPlay 7 overcast 64 65 true Play 8 sunny 72 95 false DontPlay 9 sunny 69 70 false Play 10 rain 75 80 false Play 11 sunny 75 70 true Play 12 overcast 72 90 true Play 13 overcast 81 75 false Play 14 rain 71 80 true DontPlay R reports no format or other trivial problems: summary(golf) Outlook Temperature Humidity Windy PlayDontPlay overcast:4 Min. :64.0 Min. :65.0 false:8 DontPlay:5 rain:5 1st Qu.:69.2 1st Qu.:71.2 true :6 Play:9 sunny :5 Median :72.0 Median :80.0 Mean :73.6 Mean :80.3 3rd Qu.:78.8 3rd Qu.:88.8 Max. :85.0 Max. :96.0 I then try to build a decision tree: golf.rpart - rpart(PlayDontPlay ~ Outlook + Temperature + Humidity + Windy, method=class, data=golf) which doesn't yield a tree: golf.rpart n= 14 node), split, n, loss, yval, (yprob) * denotes terminal node 1) root 14 5 Play (0.35714 0.64286) * plot(golf.rpart) Error in plot.rpart(golf.rpart) : fit is not a tree, just a root Thanks, Alan __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] autoscall the y-axis
Dear R users I need to auto scale the left y axis in the code below, so that when I scroll left or right the left y-axis scale changes to accumulate the range of the displayed data with in the max hight of the y-axis. also how can I make the crosshair horizontal since it is only vertical in this code. this code with a kind help from Gregory (Greg) L. Snow Ph.D. just stated with tcl/tk and waiting for the book to arrive. thank you # code starts library(tkrplot) L - length(myData); tt - tktoplevel()#cursor='crosshair') left - tclVar(1) oldleft - tclVar(1) right - tclVar(L) cury - tclVar(' ') curx - NA tmpusr - numeric(4) tmpplt - numeric(4) f1 - function(){ lleft - as.numeric(tclvalue(left)) rright - as.numeric(tclvalue(right)) x - seq(lleft,rright,by=1) par(bg='black', fg='green', col='white', col.axis='white', col.lab='magenta', col.main='blue', col.sub='cyan') plot(x,myData[x], type='s', ylim=range(myData)) ## par(new=TRUE) ## plot(x,myData2[x], type='s', col='yellow',axes=F) par(new=TRUE) plot(x,myData3[x], type='s', col='cyan',axes=F) axis(4) tmpusr - par('usr') tmpplt - par('plt') if(!is.na(curx)){ abline(v=curx, col='red', lty=2) points(curx,myData[curx],pch=16,col='red') } } img - tkrplot(tt, f1,hscale=2,vscale=1.2) tkconfigure(img, cursor='crosshair') f2 - function(...){ ol - as.numeric(tclvalue(oldleft)) tclvalue(oldleft) - tclvalue(left) r - as.numeric(tclvalue(right)) tclvalue(right) - as.character(r + as.numeric(...) - ol) tkrreplot(img) } f3 - function(...){ tkrreplot(img) } f4 - function(...){ ol - as.numeric(tclvalue(oldleft)) tclvalue(left) - as.character(ol+10) tclvalue(oldleft) - as.character(ol+10) r - as.numeric(tclvalue(right)) tclvalue(right) - as.character(r+10) tkrreplot(img) } iw - as.numeric(tcl('image','width', tkcget(img,'-image'))) ih - as.numeric(tcl('image','height',tkcget(img,'-image'))) mm - function(x,Y){ tx - (as.numeric(x)-1)/iw ty - 1-(as.numeric(Y)-1)/ih if( tx tmpplt[1] tx tmpplt[2] ty tmpplt[3] ty tmpplt[4] ){ newx - (tx-tmpplt[1])/(tmpplt[2]-tmpplt[1])*(tmpusr[2]-tmpusr[1])+tmpusr[1] curx - round(newx) tkrreplot(img) newy - myData[curx] newy - floor(newy) + (newy-floor(newy))*32/100 #newy2 - myData2[curx] newy3 - myData3[curx] tclvalue(cury) - paste('x =',curx,' myData=',round(newy,2) ,' myData3=',round(newy3,4)) } } tkbind(img, 'Motion', mm) l1 - tklabel(tt, textvariable=cury) s1 - tkscale(tt, command=f2, from=1, to=length(myData), variable=left, orient=horiz,label='left',length=700) s2 - tkscale(tt, command=f3, from=1, to=length(myData), variable=right, orient=horiz,label='right',length=700) b1 - tkbutton(tt, text='-', command=f4) tkpack(l1,img,s1,s2,b1) # code ends - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] interaction terms in formula of lm or glm
I would like to include pairwise interaction terms for lm(). For example, I want to include the quadratic term of variable V3. my.formula y ~ 1 + V1 + V3 + V3:V3 my.data y V1 V2 V3 1 31 1 42 140 2 32 0 43 120 3 33 0 57 150 4 34 0 55 132 foo - lm(my.formula, data = my.data) foo$coefficients (Intercept) V1 V3 29.47368421 -2.15789474 0.02631579 Why do the foo coefficients not include V3:V3 ? I thought that the variable V3:V3 has the values of squares of V3 elements, that is, V3:V3 140*140 120*120 Am I wrong? If I specify a fouth varaible V4 with square values of V3, and the formula is y ~ 1 + V1 + V3 + V4, it seems that the foo will give me different in and out-sample predictions. -Luke [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] interaction terms in formula of lm or glm
If you want the quadratic term, you need to pass it as an argument in function I(): y ~ 1 + V1 + V3 + I(V3^2) This is documented in ?formula -Christos -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Luke Sent: Monday, April 17, 2006 12:08 AM To: R-help@stat.math.ethz.ch Subject: [R] interaction terms in formula of lm or glm I would like to include pairwise interaction terms for lm(). For example, I want to include the quadratic term of variable V3. my.formula y ~ 1 + V1 + V3 + V3:V3 my.data y V1 V2 V3 1 31 1 42 140 2 32 0 43 120 3 33 0 57 150 4 34 0 55 132 foo - lm(my.formula, data = my.data) foo$coefficients (Intercept) V1 V3 29.47368421 -2.15789474 0.02631579 Why do the foo coefficients not include V3:V3 ? I thought that the variable V3:V3 has the values of squares of V3 elements, that is, V3:V3 140*140 120*120 Am I wrong? If I specify a fouth varaible V4 with square values of V3, and the formula is y ~ 1 + V1 + V3 + V4, it seems that the foo will give me different in and out-sample predictions. -Luke [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html