Re: [R] stacking data frames with different variables
Perfect. Thanks Hadley!

-----Original Message-----
From: hadley wickham [mailto:[EMAIL PROTECTED]
Sent: Sunday, September 09, 2007 10:11 AM
To: Muenchen, Robert A (Bob)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] stacking data frames with different variables

Have a look at rbind.fill in the reshape package.

Hadley

On 9/9/07, Muenchen, Robert A (Bob) [EMAIL PROTECTED] wrote:

> Hi All,
>
> If I need to stack two data frames, I can use rbind, but it requires
> that all variables exist in both sets. I can make that happen, but
> other stat packages would figure out where the differences were, add
> the missing variables to each, set their values to missing and stack
> them. Is there a more automatic way to do that in R?
>
> Below is an example program.
>
> Thanks,
> Bob
>
> # Top data frame has two variables.
> x <- c(1,2)
> y <- c(1,2)
> top <- data.frame(x,y)
> top
>
> # Bottom data frame has only one of them.
> x <- c(3,4)
> bottom <- data.frame(x)
> bottom
>
> # So rbind won't work.
> rbind(top, bottom)
>
> # After figuring out where the mismatches are I can
> # make the two DFs the same manually.
> bottom <- data.frame( bottom, y=NA)
> bottom
>
> # Now I get the desired result.
> both <- rbind(top,bottom)
> both
>
> =========================================================
> Bob Muenchen (pronounced Min'-chen), Manager
> Statistical Consulting Center
> U of TN Office of Information Technology
> 200 Stokely Management Center, Knoxville, TN 37996-0520
> Voice: (865) 974-5230
> FAX: (865) 974-4810
> Email: [EMAIL PROTECTED]
> Web: http://oit.utk.edu/scc,
> News: http://listserv.utk.edu/archives/statnews.html

--
http://had.co.nz/

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
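For readers who do not have the reshape package at hand, the behaviour asked for in this thread can be sketched in base R. This is only an illustration of the idea (add the missing variables, set them to NA, then stack); it is not the code of rbind.fill, and the helper name stack_fill is made up for this example.

```r
# Hypothetical helper (not rbind.fill itself): stack two data frames,
# padding the columns missing from either one with NA.
stack_fill <- function(a, b) {
  all_vars <- union(names(a), names(b))
  for (v in setdiff(all_vars, names(a))) a[[v]] <- NA
  for (v in setdiff(all_vars, names(b))) b[[v]] <- NA
  # Put both pieces in the same column order before stacking.
  rbind(a[all_vars], b[all_vars])
}

top    <- data.frame(x = c(1, 2), y = c(1, 2))
bottom <- data.frame(x = c(3, 4))
both   <- stack_fill(top, bottom)
both
```

The result has four rows, with y missing for the two rows that came from bottom.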
Re: [R] NAs in indices
Thanks to both Charles and Jim for such helpful info. The help file ?"[.data.frame" is just great. Too bad it is so hard to find!

I had used na.strings on read.table but had gotten it in my head that it was for numeric missing value codes. But of course, strings is strings! That took care of periods everywhere, and I was able to use my original approach to get rid of some 99's and 999's that applied only to certain columns (na.strings would zap them for all columns).

Jim's suggestion to add which() makes perfect sense. I really don't like the idea of referencing x[NA] even though x[c(T,T,F,F,NA,F)] might make it obvious which were wanted. I'm surprised I didn't get caught by that long ago.

Cheers,
Bob

-----Original Message-----
From: Charles C. Berry [mailto:[EMAIL PROTECTED]
Sent: Sunday, September 02, 2007 2:33 PM
To: Muenchen, Robert A (Bob)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] NAs in indices

On Sun, 2 Sep 2007, Muenchen, Robert A (Bob) wrote:

> Hi All,
>
> I'm fiddling with a program to read a text file containing periods
> that SAS uses for missing values. I know that if I had the original
> SAS data set instead of a text file, R would handle this conversion
> for me.
>
> Data frames do not allow missing values in their indices but vectors
> do. Why is that? A search of the error message points out the problem
> and solution but not why they differ. A simplified program that
> demonstrates the issue is below.
>
> Thanks,
> Bob
>
> # Here's a data frame that has both periods and NAs.
> # I want sex to remain character for now.
> sex=c("m","f",".",NA)
> x=c(1,2,3,NA)
> myDF <- data.frame(sex,x,stringsAsFactors=F)
> rm(sex,x)
> myDF
>
> # Substituting NA into data frame does not work
> # due to NAs in the indices. The error message is:
> # missing values are not allowed in subscripted assignments of data frames
> myDF[ myDF$sex==".", "sex" ] <- NA
> myDF
>
> # This works because myDF$sex is a vector and vectors allow NAs in indexes.
> # Why don't data frames allow this?
> myDF$sex[ myDF$sex=="." ] <- NA
> myDF

R version 2.5.1 'allows' it.

    df <- as.data.frame(diag(3)[,-1])
    df[ df[,1]==1 ] <- NA
    df

but the result may not be what you were expecting. See ?"[.data.frame" (esp. Details) for more info on why it does not 'work' as you expected.

Also, since you mention a 'text file' I suggest you look at ?read.table or ?scan where you will see that

    dots.are.NA <- read.table(my.file, na.strings = '.' )

may help you.

Chuck

Charles C. Berry                        (858) 534-2098
                                        Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]              UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
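The two suggestions in this message (wrapping the logical index in which(), and declaring the period as a missing-value string when reading) can be sketched as follows. The data are the toy values from the question; the inline text passed to read.table stands in for a real file, and its column names are assumptions made for the example.

```r
# Same toy data as in the question.
myDF <- data.frame(sex = c("m", "f", ".", NA),
                   x   = c(1, 2, 3, NA),
                   stringsAsFactors = FALSE)

# which() keeps only the positions where the comparison is TRUE,
# so the row index handed to the data frame contains no NA.
myDF[which(myDF$sex == "."), "sex"] <- NA
myDF

# Treating the period as missing at import time. The text= argument
# replaces a file name here purely to keep the sketch self-contained.
fromText <- read.table(text = "sex x\nm 1\nf 2\n. 3", header = TRUE,
                       na.strings = ".", stringsAsFactors = FALSE)
fromText
```

Because na.strings applies to every column, column-specific codes such as 99 or 999 still need the indexed assignment shown first.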
Re: [R] Comparing transform to with
Gabor,

That's very nice! I like your my.transform much better. Too bad about the incompatibility. Swapping that out would no doubt break some existing programs. I love that old joke, "God was able to create the universe in just 6 days only because he didn't have an installed base to worry about!"

Cheers,
Bob

-----Original Message-----
From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]
Sent: Sunday, September 02, 2007 10:47 AM
To: Muenchen, Robert A (Bob)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Comparing transform to with

Try this version of transform. In the first test we show it works on your example but we have used the head of the built in anscombe data set. The second and third show that it necessarily is incompatible with transform because transform always looks up variables in DF first whereas my.transform looks up the computed ones first.

    my.transform <- function(DF, ...) {
        f <- function(){}
        formals(f) <- eval(substitute(as.pairlist(c(alist(...), DF))))
        body(f) <- substitute(modifyList(DF, data.frame(...)))
        f()
    }

    # test
    a <- head(anscombe)

    # 1
    my.transform(a, sum1 = x1+x2+x3+x4, sum2 = y1+y2+y3+y4, total = sum1+sum2)

    # 2
    my.transform(a, y2 = y1, y3 = y2)

    # 3
    transform(a, y2 = y1, y3 = y2) # different
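Two points from this thread can be illustrated with base R alone. First, transform() evaluates every argument against the original data frame, so a derived variable such as total needs a second pass. Second, with() evaluates its block in a temporary environment built from the data frame, so assignments made inside it are discarded; within(), available in current versions of R, evaluates the block the same way but returns a modified copy. The sketch below uses the data from the original question; the result names twoPass and onePass are made up.

```r
myDF <- data.frame(x1 = c(1, 3, 3), x2 = c(3, 2, 1),
                   x3 = c(2, 5, 2), x4 = c(5, 6, 9))

# Two passes with transform(): the second pass can see sum1 and sum2.
twoPass <- transform(transform(myDF, sum1 = x1 + x2, sum2 = x3 + x4),
                     total = sum1 + sum2)

# within() evaluates the block like with(), but returns a modified
# copy of the data frame, which must be assigned to be kept.
onePass <- within(myDF, {
  sum1  <- x1 + x2
  sum2  <- x3 + x4
  total <- sum1 + sum2
})

twoPass$total
onePass$total
names(myDF)   # the original data frame is left untouched
```

Both approaches give the same totals; neither changes myDF unless the result is assigned back to it.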
[R] NAs in indices
Hi All,

I'm fiddling with a program to read a text file containing periods that SAS uses for missing values. I know that if I had the original SAS data set instead of a text file, R would handle this conversion for me.

Data frames do not allow missing values in their indices but vectors do. Why is that? A search of the error message points out the problem and solution but not why they differ. A simplified program that demonstrates the issue is below.

Thanks,
Bob

    # Here's a data frame that has both periods and NAs.
    # I want sex to remain character for now.
    sex=c("m","f",".",NA)
    x=c(1,2,3,NA)
    myDF <- data.frame(sex,x,stringsAsFactors=F)
    rm(sex,x)
    myDF

    # Substituting NA into data frame does not work
    # due to NAs in the indices. The error message is:
    # missing values are not allowed in subscripted assignments of data frames
    myDF[ myDF$sex==".", "sex" ] <- NA
    myDF

    # This works because myDF$sex is a vector and vectors allow NAs in indexes.
    # Why don't data frames allow this?
    myDF$sex[ myDF$sex=="." ] <- NA
    myDF
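The NA in the index comes from the comparison itself: == returns NA wherever the value being compared is missing. A small sketch of where it arises and two ways to build an index without it, using the same sex values as above:

```r
sex <- c("m", "f", ".", NA)

# Comparing with == propagates the missing value into the index.
sex == "."                 # FALSE FALSE  TRUE    NA

# %in% never returns NA, so its result is safe to use as a subscript.
sex %in% "."               # FALSE FALSE  TRUE FALSE

# An explicit guard on is.na() gives the same index.
!is.na(sex) & sex == "."   # FALSE FALSE  TRUE FALSE
```

Either of the last two expressions can be used as the row index of a data frame assignment without triggering the error quoted in the post.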
[R] Comparing transform to with
Hi All,

I've been successfully using the with function for analyses and the transform function for multiple transformations. Then I thought, why not use with for both? I ran into problems and couldn't figure them out from help files or books. So I created a simplified version of what I'm doing:

    rm( list=ls() )
    x1<-c(1,3,3)
    x2<-c(3,2,1)
    x3<-c(2,5,2)
    x4<-c(5,6,9)
    myDF<-data.frame(x1,x2,x3,x4)
    rm(x1,x2,x3,x4)
    ls()
    myDF

This creates two new variables just fine:

    transform(myDF,
      sum1=x1+x2,
      sum2=x3+x4
    )

This next code does not see sum1, so it appears that transform cannot see the variables that it creates. Would I need to transform new variables in a second pass?

    transform(myDF,
      sum1=x1+x2,
      sum2=x3+x4,
      total=sum1+sum2
    )

Next I'm trying the same thing using with. It does not work but also does not generate error messages, giving me the impression that I'm doing something truly idiotic:

    with(myDF, {
      sum1<-x1+x2
      sum2<-x3+x4
      total <- sum1+sum2
    } )
    myDF
    ls()

Then I thought, perhaps one of the advantages of transform is that it works on the left side of the equation without using a longer name like myDF$sum1. with probably doesn't do that, so I use the longer form below. It also does not work and generates no error messages.

    # Try it again, writing vars to myDF explicitly.
    # It generates no errors, and no results.
    with(myDF, {
      myDF$sum1<-x1+x2
      myDF$sum2<-x3+x4
      myDF$total <- myDF$sum1+myDF$sum2
    } )
    myDF
    ls()

I would appreciate some advice about the relative roles of these two functions and why my attempts with with have failed.

Thanks!
Bob
Re: [R] subset using noncontiguous variables by name (not index)
Gabor,

That works great! I think this would be a very helpful addition to the main R distribution. Perhaps with a single colon representing numerical order (exactly as you have written it) and two colons representing the order of the variables as they appear in the data frame (your first example). That's analogous to SAS' x1-xN, which you know gets those N variables, and a--z, which selects an unknown number of variables a through z. How many that is depends upon their order in the data frame.

That would not only be very useful in general, but it would also make transitioning to R from SAS or SPSS less confusing.

Is R still being extended in such basic ways, or does that muck up existing programs too much?

Thanks,
Bob

-----Original Message-----
From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]
Sent: Sunday, August 26, 2007 8:52 PM
To: Muenchen, Robert A (Bob)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] subset using noncontiguous variables by name (not index)

Try this:

    "%:%" <- function(x, y) {
       prex <- gsub("[0-9]", "", x); postx <- gsub("[^0-9]", "", x)
       prey <- gsub("[0-9]", "", y); posty <- gsub("[^0-9]", "", y)
       stopifnot(prex == prey)
       paste(prex, seq(from = as.numeric(postx), to = as.numeric(posty)), sep = "")
    }

    "x2" %:% "x4"
    # [1] "x2" "x3" "x4"
[R] FW: subset using noncontiguous variables by name (not index)
Thomas, that's a good point. I was thinking of anscombe[x1::y1] making it clear which one, but you would then want just x1::y1 to have unambiguous meaning on its own, which is impossible.

As for x1:xN, it's unambiguous on its own. I thought one of the great advantages of R was that it could use different methods so that a new operator would not be needed. The colon operator would just have a new method for when stringN appeared. One that would be very useful and have obvious meaning.

Thanks,
Bob

-----Original Message-----
From: Thomas Lumley [mailto:[EMAIL PROTECTED]
Sent: Monday, August 27, 2007 10:25 AM
To: Muenchen, Robert A (Bob)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] subset using noncontiguous variables by name (not index)

On Mon, 27 Aug 2007, Muenchen, Robert A (Bob) wrote:

> Gabor,
>
> That works great! I think this would be a very helpful addition to the
> main R distribution. Perhaps with a single colon representing numerical
> order (exactly as you have written it) and two colons representing the
> order of the variables as they appear in the data frame (your first
> example). That's analogous to SAS' x1-xN, which you know gets those N
> variables, and a--z, which selects an unknown number of variables a
> through z. How many that is depends upon their order in the data frame.
> That would not only be very useful in general, but it would also make
> transitioning to R from SAS or SPSS less confusing.
>
> Is R still being extended in such basic ways, or does that muck up
> existing programs too much?

In principle base R can be extended like that, but a strong case is needed for non-standard evaluation rules and for depleting the restricted supply of short binary operator names.

The reason for subset() and its behaviour is that 'variables as they appear in the data frame' is typically ambiguous -- which data frame? In SPSS you have only one and in SAS there is a default one, so there is no ambiguity in X1--Y2, but in R it needs another argument specifying the data frame, so it can't really be a binary operator.

The double colon :: and triple colon ::: are already used for namespaces, and a search of r-help reveals two previous, different, suggestions for %:%.

-thomas

Thomas Lumley                   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]               University of Washington, Seattle
Re: [R] subset using noncontiguous variables by name (not index)
Thanks for helping me see why R doesn't have the obvious!

-Bob

-----Original Message-----
From: Thomas Lumley [mailto:[EMAIL PROTECTED]
Sent: Monday, August 27, 2007 2:12 PM
To: Muenchen, Robert A (Bob)
Subject: RE: [R] subset using noncontiguous variables by name (not index)

On Mon, 27 Aug 2007, Muenchen, Robert A (Bob) wrote:

> Thomas, that's a good point. I was thinking of anscombe[x1::y1] making
> it clear which one, but you would then want just x1::y1 to have
> unambiguous meaning on its own, which is impossible.
>
> As for x1:xN, it's unambiguous on its own.

It actually isn't. We already have a meaning. Consider

    x1<-4
    xN<-6
    x1:xN

It also breaks R's argument passing rules by treating x1 as string rather than a name.

What would be unambiguous at the moment is "x1":"x4", provided there was a sufficiently precise set of rules on what was allowed. Consider

    "x1":"x-1"          (negative?)
    "x1":"x3.14"        (non-integer?)
    "x3.12":"x3.14"     (is the prefix x or x3.?)
    "x1":"X4"           (the prefix changes)
    "01":"14"           (is the prefix empty or 0?)
    "x09":"xA2"         (is this illegal decimal or legal hexadecimal?)
    "IL23R1":"IL23R4"   (what is the prefix?)
    "x1a":"x4a"         (infix numbering?)

-thomas

Thomas Lumley                   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]               University of Washington, Seattle
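The point about x1:xN already having a meaning is easy to check at the console. The short sketch below runs the example from the message and, for contrast, builds a run of variable names explicitly with paste(); the names rng and nms are made up for the illustration.

```r
x1 <- 4
xN <- 6

# The colon operator works on the VALUES of x1 and xN, not their names.
rng <- x1:xN
rng                                  # 4 5 6

# Building the names explicitly leaves nothing to interpretation.
nms <- paste("x", 1:4, sep = "")
nms                                  # "x1" "x2" "x3" "x4"
```

The explicit form is longer, but it sidesteps every question in the list above about prefixes and number formats.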
[R] subset using noncontiguous variables by name (not index)
Hi All,

I'm using the subset function to select a list of variables, some of which are contiguous in the data frame, and others of which are not. It works fine when I use the form:

    subset(mydata,select=c(x1,x3:x5,x7) )

In reality, my list is far more complex. So I would like to store it in a variable to substitute in for c(x1,x3:x5,x7) but cannot get it to work. That use of the c function seems to violate R rules, so I'm not sure how it works at all. A small simulation of the problem is below.

If the variable names and orders were really this simple, I could use indices like

    summary( mydata[ ,c(1,3:5,7) ] )

but alas, they are not.

How does the c function work this way in the first place, and how can I make this substitution?

Thanks,
Bob

    mydata <- data.frame(
      x1=c(1,2,3,4,5),
      x2=c(1,2,3,4,5),
      x3=c(1,2,3,4,5),
      x4=c(1,2,3,4,5),
      x5=c(1,2,3,4,5),
      x6=c(1,2,3,4,5),
      x7=c(1,2,3,4,5)
    )
    mydata

    # This does what I want.
    summary(
      subset(mydata,select=c(x1,x3:x5,x7) )
    )

    # Can I substitute myVars?
    attach(mydata)
    myVars1 <- c(x1,x3:x5,x7)

    # Not looking good!
    myVars1

    # This doesn't do the right thing.
    summary(
      subset(mydata,select=myVars1 )
    )

    # Total desperation on this attempt:
    myVars2 <- "x1,x3:x5,x7"
    myVars2

    # This doesn't work either.
    summary(
      subset(mydata,select=myVars2 )
    )
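One way to store such a selection is to keep it as a character vector of column names and index the data frame with it. This is a sketch rather than a drop-in replacement for the select= syntax: subset() evaluates select in a list that maps each column name to its position, which is why x3:x5 works there and nowhere else. The helper name_range below is hypothetical; it returns the names between two columns in data-frame order.

```r
mydata <- data.frame(x1 = 1:5, x2 = 1:5, x3 = 1:5, x4 = 1:5,
                     x5 = 1:5, x6 = 1:5, x7 = 1:5)

# Hypothetical helper: names of all columns from `from` to `to`,
# in the order they appear in the data frame.
name_range <- function(df, from, to) {
  pos <- match(c(from, to), names(df))
  stopifnot(!any(is.na(pos)))
  names(df)[pos[1]:pos[2]]
}

# The selection is now an ordinary character vector that can be stored
# and reused.
myVars <- c("x1", name_range(mydata, "x3", "x5"), "x7")
myVars
summary(mydata[myVars])
```

Because myVars holds names rather than values, it does not depend on attach() and keeps working if columns are later reordered.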
Re: [R] subset using noncontiguous variables by name (not index)
Thanks Bert and Gabor for two very interesting solutions!

It would be very handy in R if string1:stringN generated string1,string2...stringN. It would make selections like this much more obvious. I know it's easy to do with the colon operator and paste function but that's quite a step up in complexity compared to SAS' x1 x3-x4 y2 or SPSS' x1,x3 to x4, y2. And it's complexity that beginners face early in learning R.

While on the subject of the colon operator, why doesn't anscombe[[1:4]] select the x variables in list form as anscombe[,1:4] or anscombe[1:4] do in data frame form?

Thanks,
Bob

-----Original Message-----
From: Bert Gunter [mailto:[EMAIL PROTECTED]
Sent: Sunday, August 26, 2007 6:50 PM
To: 'Gabor Grothendieck'; Muenchen, Robert A (Bob)
Cc: r-help@stat.math.ethz.ch
Subject: RE: [R] subset using noncontiguous variables by name (not index)

The problem is that x3:x5 does not mean what you think it means. The only reason it does the right thing in subset() is because a clever trick is used there (read the code -- it's not hard to understand) to ensure that it does. Gabor has essentially mimicked that trick in his solution.

However, it is not necessary to do this. You can construct the call directly as you tried to do. Using the anscombe example, here's how:

    chooz <- "c(x1,x3:x4,y2)"  ## enclose the desired expression in quotes
    do.call (subset, list( x = anscombe, select = parse(text = chooz)))

--
Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

"The business of the statistician is to catalyze the scientific learning process." - George E. P. Box

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gabor Grothendieck
Sent: Sunday, August 26, 2007 2:10 PM
To: Muenchen, Robert A (Bob)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] subset using noncontiguous variables by name (not index)

Using builtin data frame anscombe try this. First we set up a data frame anscombe.seq which has one row containing 1, 2, 3, ... . Then select out from that data frame and unlist it to get the desired index vector.

    anscombe.seq <- replace(anscombe[1,], TRUE, seq_along(anscombe))
    idx <- unlist(subset(anscombe.seq, select = c(x1, x3:x4, y2)))
    anscombe[idx]

       x1 x3 x4   y2
    1  10 10  8 9.14
    2   8  8  8 8.14
    3  13 13  8 8.74
    4   9  9  8 8.77
    5  11 11  8 9.26
    6  14 14  8 8.10
    7   6  6  8 6.13
    8   4  4 19 3.10
    9  12 12  8 9.13
    10  7  7  8 7.26
    11  5  5  8 4.74
[R] Saving results from Linux command line
Hi All,

I'm used to running R on Windows and am learning Linux. I know ESS is the way to go in the long run, but I'm trying now to just understand the command line. I can interactively enter commands, see the results on the screen and save input and output to myresults.txt with this approach:

    $ script myresults.txt
    $ R
    ...r commands...
    q()
    $ exit

I can also use the Linux tee command to do essentially the same thing.

Both of those approaches do what I want, but I assume there is a way to do it within R. I've been through AITR Appendix B and the FAQ looking for either a startup option or an R function to do this but I don't see either. What am I missing?

Thanks,
Bob
Re: [R] Saving results from Linux command line
I certainly appreciate those advantages, but I feel I'm missing something very basic. I would have expected a function like save.transcript or save.console to be able to write out the console's contents.

I see a similar situation in the Windows GUI. There is the menu choice "Save Workspace" and the matching function save.image. In the console window, there is the menu choice "File > Save to file" but I don't see an equivalent function.

Are there functions for all menu choices in R?

Thanks,
Bob

-----Original Message-----
From: Richard M. Heiberger [mailto:[EMAIL PROTECTED]
Sent: Friday, August 24, 2007 10:01 AM
To: Muenchen, Robert A (Bob); r-help@stat.math.ethz.ch
Subject: Re: [R] Saving results from Linux command line

Go for the best and do it with ESS.

ESS understands the file extension myfile.rt (not myfile.txt, which is generic) as an R transcript and therefore font-locks it for the R syntax and is able to resend multiple-line statements with a single ENTER.

Within emacs, you can save the *R* buffer to a myfile.rt file (you can also save the R transcript as myfile.rt running inside a *shell* buffer, but that is silly at this point).

Plus you get syntax highlighting and the other features on your myfile.R file.
Re: [R] Saving results from Linux command line
I looked long and hard for that information. Thank you VERY much!

-Bob

-----Original Message-----
From: Richard M. Heiberger [mailto:[EMAIL PROTECTED]
Sent: Friday, August 24, 2007 1:52 PM
To: Muenchen, Robert A (Bob); r-help@stat.math.ethz.ch
Subject: Re: [R] Saving results from Linux command line

There can't be functions in the R language to save the transcript of a session. In this respect R is a filter. It takes an input stream of text and returns an output stream of text. R doesn't remember the streams.

The Windows RGui remembers them. The ESS *R* buffer remembers them. Any terminal emulator could in principle remember them. R itself can't.
Re: [R] Saving results from Linux command line
As the help file says, "...like the Unix program tee." I thought sink only diverted to a file.

Thanks!
-Bob

-----Original Message-----
From: Thomas Lumley [mailto:[EMAIL PROTECTED]
Sent: Friday, August 24, 2007 2:17 PM
To: Muenchen, Robert A (Bob)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Saving results from Linux command line

There could still be functions that divert a copy of all the output to a file, for example. And indeed there are.

    sink("transcript.txt", split=TRUE)

-thomas

On Fri, 24 Aug 2007, Muenchen, Robert A (Bob) wrote:

> I looked long and hard for that information. Thank you VERY much! -Bob
>
> -----Original Message-----
> From: Richard M. Heiberger [mailto:[EMAIL PROTECTED]
> Sent: Friday, August 24, 2007 1:52 PM
> To: Muenchen, Robert A (Bob); r-help@stat.math.ethz.ch
> Subject: Re: [R] Saving results from Linux command line
>
> There can't be functions in the R language to save the transcript
> of a session. In this respect R is a filter. It takes an input
> stream of text and returns an output stream of text. R doesn't
> remember the streams.
>
> The Windows RGui remembers them. The ESS *R* buffer remembers them.
> Any terminal emulator could in principle remember them. R itself
> can't.

Thomas Lumley                   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]               University of Washington, Seattle
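A short sketch of the sink() suggestion, written against a temporary file so that it leaves nothing behind; the variable names logFile and saved are made up. With split = TRUE the output still appears on the screen while a copy goes to the file. Note that this diverts R's output only; unlike the script command, it does not record the commands typed at the prompt.

```r
# Write to a temporary file so the sketch cleans up after itself.
logFile <- tempfile(fileext = ".txt")

sink(logFile, split = TRUE)      # start copying output to the file
print(summary(c(1, 2, 3, 10)))
cat("done\n")
sink()                           # stop diverting

# The file now holds a copy of everything that was printed.
saved <- readLines(logFile)
saved
unlink(logFile)
```

Calling sink() with no arguments ends the diversion; forgetting it is the usual reason output seems to "disappear" later in a session.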
[R] length, mean, na.rm, na.omit...
Hi All, Can anyone tell me why the length function does not use na.rm? I know how to work around it, I'm just curious to know why such a useful option was left out. I'm also interested in the logic of setting na.rm=FALSE as the default on mean, sd, etc. This is the opposite of the many other stat packages I have used, so I assume it provides some programming benefit that is not obvious to me. Thanks, Bob

Bob Muenchen (pronounced Min'-chen), Manager Statistical Consulting Center U of TN Office of Information Technology 200 Stokely Management Center, Knoxville, TN 37996-0520 Voice: (865) 974-5230 FAX: (865) 974-4810 Email: [EMAIL PROTECTED] Web: http://oit.utk.edu/scc, News: http://listserv.utk.edu/archives/statnews.html
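For readers following along, a small sketch of the behaviour being asked about, using a made-up vector x. length() has no na.rm argument, so the usual workaround is to count the non-missing values directly; mean() and sd() accept na.rm, which defaults to FALSE.

```r
x <- c(10, 20, NA, 40)   # illustrative data, not from the thread

n_all    <- length(x)              # counts every element, including the NA
n_valid  <- sum(!is.na(x))         # number of non-missing values
n_valid2 <- length(na.omit(x))     # same count via na.omit()

m_default <- mean(x)               # NA, because na.rm defaults to FALSE
m_clean   <- mean(x, na.rm = TRUE) # mean of the non-missing values only
```

One way to read the default: a missing value propagates unless the analyst explicitly asks for it to be dropped.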
[R] do.call vs. lapply for lists
Hi All, I'm trying to understand the difference between do.call and lapply for applying a function to a list. Below is one of the variations of programs (by Marc Schwartz) discussed here recently to select the first and last n observations per group. I've looked in several books, the R FAQ and searched the archives, but I can't find enough to figure out why lapply doesn't do what do.call does in this case. The help files and newsletter descriptions of do.call sound like it would do the same thing, but I'm sure that's due to my lack of understanding about their specific terminology. I would appreciate it if you could take a moment to enlighten me. Thanks, Bob

  mydata <- data.frame(
    id      = c('001','001','001','002','003','003'),
    math    = c(80,75,70,65,65,70),
    reading = c(65,70,88,NA,90,NA)
  )
  mydata

  mylast <- lapply( split(mydata,mydata$id), tail, n=1)
  mylast
  class(mylast) #It's a list, so lapply will do *something* with it.

  #This gets the desired result:
  do.call(rbind, mylast)

  #This doesn't do the same thing, which confuses me:
  lapply(mylast,rbind)

  #...and data.frame won't fix it as I've seen it do in other circumstances:
  data.frame( lapply(mylast,rbind) )

Bob Muenchen (pronounced Min'-chen), Manager Statistical Consulting Center U of TN Office of Information Technology 200 Stokely Management Center, Knoxville, TN 37996-0520 Voice: (865) 974-5230 FAX: (865) 974-4810 Email: [EMAIL PROTECTED] Web: http://oit.utk.edu/scc, News: http://listserv.utk.edu/archives/statnews.html
Re: [R] do.call vs. lapply for lists
Marc, That makes the difference between do.call and lapply crystal clear. Your explanation would make a nice FAQ entry. Thanks! Bob

Bob Muenchen (pronounced Min'-chen), Manager Statistical Consulting Center U of TN Office of Information Technology 200 Stokely Management Center, Knoxville, TN 37996-0520 Voice: (865) 974-5230 FAX: (865) 974-4810 Email: [EMAIL PROTECTED] Web: http://oit.utk.edu/scc, News: http://listserv.utk.edu/archives/statnews.html

-----Original Message-----
From: Marc Schwartz [mailto:[EMAIL PROTECTED]
Sent: Monday, April 09, 2007 1:06 PM
To: Muenchen, Robert A (Bob)
Cc: R-help@stat.math.ethz.ch
Subject: Re: do.call vs. lapply for lists

On Mon, 2007-04-09 at 12:45 -0400, Muenchen, Robert A (Bob) wrote:

Hi All, I'm trying to understand the difference between do.call and lapply for applying a function to a list. Below is one of the variations of programs (by Marc Schwartz) discussed here recently to select the first and last n observations per group. I've looked in several books, the R FAQ and searched the archives, but I can't find enough to figure out why lapply doesn't do what do.call does in this case. The help files and newsletter descriptions of do.call sound like it would do the same thing, but I'm sure that's due to my lack of understanding about their specific terminology. I would appreciate it if you could take a moment to enlighten me. Thanks, Bob

  mydata <- data.frame(
    id      = c('001','001','001','002','003','003'),
    math    = c(80,75,70,65,65,70),
    reading = c(65,70,88,NA,90,NA)
  )
  mydata

  mylast <- lapply( split(mydata,mydata$id), tail, n=1)
  mylast
  class(mylast) #It's a list, so lapply will do *something* with it.

  #This gets the desired result:
  do.call(rbind, mylast)

  #This doesn't do the same thing, which confuses me:
  lapply(mylast,rbind)

  #...and data.frame won't fix it as I've seen it do in other circumstances:
  data.frame( lapply(mylast,rbind) )

Bob,

A key difference is that do.call() operates (in the above example) as if the actual call was:

  rbind(mylast[[1]], mylast[[2]], mylast[[3]])
     id math reading
  3 001   70      88
  4 002   65      NA
  6 003   70      NA

In other words, do.call() takes the quoted function and passes the list object as if it was a list of individual arguments. So rbind() is only called once. In this case, rbind() internally handles all of the factor level issues, etc. to enable a single common data frame to be created from the three independent data frames contained in 'mylast':

  str(mylast)
  List of 3
   $ 001:'data.frame': 1 obs. of 3 variables:
    ..$ id     : Factor w/ 3 levels "001","002","003": 1
    ..$ math   : num 70
    ..$ reading: num 88
   $ 002:'data.frame': 1 obs. of 3 variables:
    ..$ id     : Factor w/ 3 levels "001","002","003": 2
    ..$ math   : num 65
    ..$ reading: num NA
   $ 003:'data.frame': 1 obs. of 3 variables:
    ..$ id     : Factor w/ 3 levels "001","002","003": 3
    ..$ math   : num 70
    ..$ reading: num NA

On the other hand, lapply() (as above) calls rbind() _separately_ for each component of mylast. It therefore acts as if the following series of three separate calls were made:

  rbind(mylast[[1]])
     id math reading
  3 001   70      88

  rbind(mylast[[2]])
     id math reading
  4 002   65      NA

  rbind(mylast[[3]])
     id math reading
  6 003   70      NA

Of course, the result of lapply() is that the above are combined into a single R list object and returned:

  lapply(mylast, rbind)
  $`001`
     id math reading
  3 001   70      88

  $`002`
     id math reading
  4 002   65      NA

  $`003`
     id math reading
  6 003   70      NA

It is a subtle, but of course critical, difference in how the internal function is called and how the arguments are passed.

Does that help?

Regards,

Marc Schwartz
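The distinction Marc draws can be checked directly. The sketch below reuses the mydata example from the thread; the names combined and separate are introduced here only for illustration.

```r
mydata <- data.frame(
  id      = c("001", "001", "001", "002", "003", "003"),
  math    = c(80, 75, 70, 65, 65, 70),
  reading = c(65, 70, 88, NA, 90, NA)
)
mylast <- lapply(split(mydata, mydata$id), tail, n = 1)

# One call, equivalent to rbind(mylast[[1]], mylast[[2]], mylast[[3]])
combined <- do.call(rbind, mylast)

# Three calls: rbind(mylast[[1]]), rbind(mylast[[2]]), rbind(mylast[[3]])
separate <- lapply(mylast, rbind)

class(combined)   # a single data frame with one row per id
class(separate)   # still a list of one-row data frames
```

Row names in the combined result may differ between R versions, so the sketch relies only on the shape and contents of the objects.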
Re: [R] how to get lsmeans?
The Exegesis paper gave me a great look at the history of all this. I had not been aware that S-PLUS had gone that route. There is much to be said for knowing you might be more successful but sticking to your perspective instead. And in the long run, that may be the more successful route anyway. Thanks, Bob = Bob Muenchen (pronounced Min'-chen), Manager Statistical Consulting Center U of TN Office of Information Technology 200 Stokely Management Center, Knoxville, TN 37996-0520 Voice: (865) 974-5230 FAX: (865) 974-4810 Email: [EMAIL PROTECTED] Web: http://oit.utk.edu/scc, News: http://listserv.utk.edu/archives/statnews.html = -Original Message- From: Liaw, Andy [mailto:[EMAIL PROTECTED] Sent: Thursday, March 22, 2007 5:27 PM To: Douglas Bates; Muenchen, Robert A (Bob) Cc: R-help@stat.math.ethz.ch Subject: RE: [R] how to get lsmeans? From: Douglas Bates On 3/22/07, Muenchen, Robert A (Bob) [EMAIL PROTECTED] wrote: Perhaps I'm stating the obvious, but to increase the use of R in places where SAS SPSS dominate, it's important to make getting the same answers as easy as possible. That includes things like lsmeans and type III sums of squares. I've read lots of discussions here on sums of squares I'm not advocating type III use, just looking at it from a marketing perspective. Too many people look for excuses to not change. The fewer excuses, the better. You may get strong reactions to such a suggestion. I recommend reading Bill Venables' famous unpublished paper Exegeses on linear models (google for the title - very few people use Exegeses and linear models in the same sentence - in fact I would not be surprised if Bill was the only one who has ever done so). It's on the MASS page: http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf I believe it's based on a talk Bill gave at a S-PLUS User's Conference. I think it deserves to be required reading for all graduate level linear models course. 
You must realize that R is written by experts in statistics and statistical computing who, despite popular opinion, do not believe that everything in SAS and SPSS is worth copying. Some things done in such packages, which trace their roots back to the days of punched cards and magnetic tape when fitting a single linear model may take several days because your first 5 attempts failed due to syntax errors in the JCL or the SAS code, still reflect the approach of give me every possible statistic that could be calculated from this model, whether or not it makes sense. The approach taken in R is different. The underlying assumption is that the useR is thinking about the analysis while doing it. The fact that it is so difficult to explain what lsmeans are and why they would be of interest is an indication of why they aren't implemented in any of the required packages. Perhaps I should have made it clear in my original post: I gave the example and code more to show what the mysterious least squares means are (which John explained lucidly), than how to replicate what SAS (or JMP) outputs. I do not understand how people can feel comfortable reporting things like lsmeans and p-values from type insert your favorite Roman numeral here tests when they do not know how such things arise or, at the very least, what they _really_ mean. (Given how simple lsmeans are computed, not knowing how to compute them is pretty much the same as not knowing what they are.) One of the dangers of wholesale output as SAS or SPSS gives is for the user to simply pick an answer and run with it, without understanding what that answer is, or if it corresponds to the question of interest. As to whether to weight the levels of the factors being held constant, my suggestion to John would be to offer both choices (unweighted and weighted by observed frequencies). 
I can see why one would want to weight by observed frequencies (if the data are sampled from a population), but there are certainly situations (perhaps more often than not in the cases I've encountered) that the observed frequencies do not come close to approximating what they are in the population. In such cases the unweighted average would make more sense to me. Cheers, Andy -Original Message- From: [EMAIL PROTECTED] [mailto:r-help- [EMAIL PROTECTED] On Behalf Of John Fox Sent: Wednesday, March 21, 2007 8:59 PM To: 'Prof Brian Ripley' Cc: 'r-help'; 'Chuck Cleland' Subject: Re: [R] how to get lsmeans? Dear Brian et al., My apologies for chiming in late: It's been a busy day. First some general comments on least-squares means and effect displays. The general idea behind the two is similar -- to examine fitted values corresponding to a term in a model while holding
Re: [R] how to get lsmeans?
Hi All, Perhaps I'm stating the obvious, but to increase the use of R in places where SAS SPSS dominate, it's important to make getting the same answers as easy as possible. That includes things like lsmeans and type III sums of squares. I've read lots of discussions here on sums of squares I'm not advocating type III use, just looking at it from a marketing perspective. Too many people look for excuses to not change. The fewer excuses, the better. Of course this is easy for me to say, as I'm not the one who does the work! Much thanks to those who do. Cheers, Bob = Bob Muenchen (pronounced Min'-chen), Manager Statistical Consulting Center U of TN Office of Information Technology 200 Stokely Management Center, Knoxville, TN 37996-0520 Voice: (865) 974-5230 FAX: (865) 974-4810 Email: [EMAIL PROTECTED] Web: http://oit.utk.edu/scc, News: http://listserv.utk.edu/archives/statnews.html = -Original Message- From: [EMAIL PROTECTED] [mailto:r-help- [EMAIL PROTECTED] On Behalf Of John Fox Sent: Wednesday, March 21, 2007 8:59 PM To: 'Prof Brian Ripley' Cc: 'r-help'; 'Chuck Cleland' Subject: Re: [R] how to get lsmeans? Dear Brian et al., My apologies for chiming in late: It's been a busy day. First some general comments on least-squares means and effect displays. The general idea behind the two is similar -- to examine fitted values corresponding to a term in a model while holding other terms to typical values -- but the implementation is not identical. There are also other similar ideas floating around as well. My formulation is more general in the sense that it applies to a wider variety of models, both linear and otherwise. 
Least-squares means (a horrible term, by the way: in a 1980 paper in the American Statistician, Searle, Speed, and Milliken suggested the more descriptive term population marginal means) apply to factors and combinations of factors; covariates are set to mean values and the levels of other factors are averaged over, in effect applying equal weight to each level. (This is from memory, so it's possible that I'm not getting it quite right, but I believe that I am.) In my effect displays, each level of a factor is weighted by its proportion in the data. In models in which least-squares means can be computed, they should differ from the corresponding effect display by a constant (if there are different numbers of observations in the different levels of the factors that are held constant). The obstacle to computing either least-squares means or effect displays in R via predict() is that predict() wants factors in the new data to be set to particular levels. The effect() function in the effects package bypasses predict() and works directly with the model matrix, averaging over the columns that pertain to a factor (and reconstructing interactions as necessary). As mentioned, this has the effect of setting the factor to its proportional distribution in the data. This approach also has the advantage of being invariant with respect to the choice of contrasts for a factor. The only convenient way that I can think of to implement least-squares means in R would be to use deviation-coded regressors for a factor (that is, contr.sum) and then to set the columns of the model matrix for the factor(s) to be averaged over to 0. It may just be that I'm having a failure of imagination and that there's a better way to proceed. I've not implemented this solution because it is dependent upon the choice of contrasts and because I don't see a general advantage to it, but since the issue has come up several times now, maybe I should take a crack at it. 
Remember that I want this to work more generally, not just for levels of factors, and not just for linear models. Brian is quite right in mentioning that he suggested some time ago that I use critical values of t rather than of the standard normal distribution for producing confidence intervals, and I agree that it makes sense to do so in models in which the dispersion is estimated. My only excuse for not yet doing this is that I want to undertake a more general revision of the effects package, and haven't had time to do it. There are several changes that I'd like to make to the package. For example, I have results for multinomial and proportional odds logit models (described in a paper by me and Bob Andersen in the 2006 issue of Sociological Methodology) that I want to incorporate, and I'd like to improve the appearance of the default graphs. But Brian's suggestion is very straightforward, and I guess that I shouldn't wait to implement it; I'll do so very soon. Regards, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604
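A rough sketch of the contr.sum idea John describes, on simulated unbalanced data for an additive two-factor linear model. The data, the variable names (d, fit, grid, lsm_avg, lsm_mm, eff_w) and the use of lm() are assumptions made for illustration; this is not the effects package's implementation. It computes the equally weighted means two ways (averaging predictions over the levels of B, and zeroing the sum-coded B columns of the model matrix), then the proportionally weighted version for comparison.

```r
set.seed(1)
# Unbalanced two-factor layout (made-up data).
d <- data.frame(
  A = factor(rep(c("a1", "a2"), times = c(8, 4))),
  B = factor(c("b1", "b1", "b1", "b1", "b1", "b2", "b2", "b3",
               "b1", "b2", "b3", "b3"))
)
d$y <- rnorm(nrow(d), mean = as.integer(d$A) + 2 * as.integer(d$B))

fit <- lm(y ~ A + B, data = d,
          contrasts = list(A = "contr.sum", B = "contr.sum"))

# (1) Equal weights: average the fitted values over the levels of B.
grid <- expand.grid(A = levels(d$A), B = levels(d$B))
grid$fit <- predict(fit, newdata = grid)
lsm_avg <- tapply(grid$fit, grid$A, mean)

# (2) Same quantity from the model matrix: with sum-to-zero coding,
#     setting the B columns to 0 averages over B with equal weight.
nd <- data.frame(A = factor(levels(d$A), levels = levels(d$A)),
                 B = factor(levels(d$B)[1], levels = levels(d$B)))
X <- model.matrix(delete.response(terms(fit)), data = nd,
                  contrasts.arg = fit$contrasts)
b_term <- which(attr(terms(fit), "term.labels") == "B")
X[, attr(X, "assign") == b_term] <- 0
lsm_mm <- drop(X %*% coef(fit))

# (3) Weights proportional to the observed frequencies of B instead.
w <- setNames(as.numeric(table(d$B)) / nrow(d), levels(d$B))
eff_w <- tapply(grid$fit * w[as.character(grid$B)], grid$A, sum)
```

In this additive model the proportionally weighted values differ from the equally weighted ones by the same constant at every level of A, which matches the remark in the message above; with interactions or non-linear models the relationship is less simple.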
Re: [R] Select the last two rows by id group
Marc, thanks for so many great variations! I especially like: tail(sort(table(DF$County))) I often have frequency tables that are of interest only towards the end. Cheers, Bob

Bob Muenchen (pronounced Min'-chen), Manager Statistical Consulting Center U of TN Office of Information Technology 200 Stokely Management Center, Knoxville, TN 37996-0520 Voice: (865) 974-5230 FAX: (865) 974-4810 Email: [EMAIL PROTECTED] Web: http://oit.utk.edu/scc, News: http://listserv.utk.edu/archives/statnews.html

-----Original Message-----
From: Marc Schwartz [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 20, 2007 8:58 PM
To: Muenchen, Robert A (Bob)
Cc: R-help@stat.math.ethz.ch
Subject: Re: [R] Select the last two rows by id group

On Tue, 2007-03-20 at 11:53 -0400, Muenchen, Robert A (Bob) wrote:

Very nice! This almost duplicates the SAS first.var and last.var ability to choose the first and last observations by group(s). Substituting the head function in where Marc has the tail function below will adapt it to the first n. It is more flexible than the SAS approach because it can do the first/last n rather than just the single first or last. Let's say we want to choose the last observation in a county, and counties have duplicate names in different states. You could sort by state, then county, then use only county where Marc uses score$id in his last example below, and it would get the last record for *every* county regardless of duplicates. Does this sound correct? That's a handy bit of code! Cheers, Bob

Bob,

You can test it using data here:

  DF <- read.csv("http://www.nws.noaa.gov/nwr/SameCode.txt", header = FALSE)
  colnames(DF) <- c("Code", "County", "State")

  str(DF)
  'data.frame': 3288 obs. of 3 variables:
   $ Code  : int 1001 1003 1005 1007 1009 1011 1013 1015 1017 1019 ...
   $ County: Factor w/ 1996 levels "Abbeville","Acadia",..: 97 105 116 169 186 249 259 272 326 348 ...
   $ State : Factor w/ 60 levels "AK","AL","AR",..: 2 2 2 2 2 2 2 2 2 2 ...

The data is already sorted by State and then County.

  system.time(DF.tail <- do.call(rbind, lapply(split(DF, DF$County), tail, 1)))
  [1] 6.851 0.085 7.085 0.000 0.000

  str(DF.tail)
  'data.frame': 1996 obs. of 3 variables:
   $ Code  : int 45001 22001 16001 40001 55001 50001 72001 72003 72005 72007 ...
   $ County: Factor w/ 1996 levels "Abbeville","Acadia",..: 1 2 3 4 5 6 7 8 9 10 ...
   $ State : Factor w/ 60 levels "AK","AL","AR",..: 48 22 17 42 58 56 45 45 45 45 ...

  # How many unique county names in the source dataset?
  length(unique(DF$County))
  [1] 1996

  # Are they all the same unique counties?
  all(DF.tail$County == sort(unique(DF$County)))
  [1] TRUE

It is curious to see just how many duplicates there are. For example:

  tail(sort(table(DF$County)))

     Madison    Jackson    Lincoln   Franklin  Jefferson Washington
          20         24         24         25         26         31

  subset(DF, County == "Washington")
        Code     County State
  65    1129 Washington    AL
  181   5143 Washington    AR
  304   8121 Washington    CO
  385  12133 Washington    FL
  535  13303 Washington    GA
  593  16087 Washington    ID
  688  17189 Washington    IL
  783  18175 Washington    IN
  879  19183 Washington    IA
  987  20201 Washington    KS
  1106 21229 Washington    KY
  1167 22117 Washington    LA
  1189 23029 Washington    ME
  1211 24043 Washington    MD
  1393 27163 Washington    MN
  1474 28151 Washington    MS
  1590 29221 Washington    MO
  1740 31177 Washington    NE
  1883 36115 Washington    NY
  1981 37187 Washington    NC
  2124 39167 Washington    OH
  2202 40147 Washington    OK
  2239 41067 Washington    OR
  2304 42125 Washington    PA
  2313 44009 Washington    RI
  2515 47179 Washington    TN
  2759 48477 Washington    TX
  2800 49053 Washington    UT
  2814 50023 Washington    VT
  2904 51191 Washington    VA
  3108 55131 Washington    WI

  # The last state with Washington County (my neighbors, the Cheeseheads)
  # was in the result set
  subset(DF.tail, County == "Washington")
              Code     County State
  Washington 55131 Washington    WI

  subset(DF, County == "Allen")
        Code County State
  697  18003  Allen    IN
  887  20001  Allen    KS
  993  21003  Allen    KY
  1113 22003  Allen    LA
  2042 39003  Allen    OH

  # The last state with Allen County (OH) was in the result set
  subset(DF.tail, County == "Allen")
         Code County State
  Allen 39003  Allen    OH

Just noticed a Big Ten theme there...Go Gophers! ;-)

So, it would seem that your hypothesis is correct, at least in this limited testing. I would want to validate it more rigorously of course.

HTH,

Marc Schwartz
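The same check can be run offline on a handful of rows. The sketch below uses six of the Code/County/State rows quoted above as a small stand-in for the full file; the object names (last_by_county, first_by_county, last_by_pair) are illustrative. It also shows the variant that splits on both State and County, which keeps one row per pair rather than one per county name.

```r
# A small stand-in for the SameCode data (rows taken from the listing above).
DF <- data.frame(
  Code   = c(1129, 18003, 18175, 20001, 39003, 55131),
  County = c("Washington", "Allen", "Washington", "Allen", "Allen", "Washington"),
  State  = c("AL", "IN", "IN", "KS", "OH", "WI"),
  stringsAsFactors = FALSE
)
DF <- DF[order(DF$State, DF$County), ]   # sort by State, then County

# Last and first record for each county *name*, ignoring the state.
last_by_county  <- do.call(rbind, lapply(split(DF, DF$County), tail, 1))
first_by_county <- do.call(rbind, lapply(split(DF, DF$County), head, 1))

# Splitting on both keys keeps one row per State/County pair instead.
last_by_pair <- do.call(
  rbind,
  lapply(split(DF, list(DF$State, DF$County), drop = TRUE), tail, 1)
)
```

Which of the two groupings is wanted depends on whether same-named counties in different states should count as one group or several.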
Re: [R] Select the last two rows by id group
Very nice! This almost duplicates the SAS first.var and last.var ability to choose the first and last observations by group(s). Substituting the head function in where Marc has the tail function below will adapt it to the first n. It is more flexible than the SAS approach because it can do the first/last n rather than just the single first or last. Let's say we want to choose the last observation in a county, and counties have duplicate names in different states. You could sort by state, then county, then use only county where Marc uses score$id in his last example below, and it would get the last record for *every* county regardless of duplicates. Does this sound correct? That's a handy bit of code! Cheers, Bob

Bob Muenchen (pronounced Min'-chen), Manager Statistical Consulting Center U of TN Office of Information Technology 200 Stokely Management Center, Knoxville, TN 37996-0520 Voice: (865) 974-5230 FAX: (865) 974-4810 Email: [EMAIL PROTECTED] Web: http://oit.utk.edu/scc, News: http://listserv.utk.edu/archives/statnews.html

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:r-help- [EMAIL PROTECTED] On Behalf Of Marc Schwartz
Sent: Tuesday, March 20, 2007 10:59 AM
To: Lauri Nikkinen
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Select the last two rows by id group

On Tue, 2007-03-20 at 16:33 +0200, Lauri Nikkinen wrote:

Hi R-users, Following this post http://tolstoy.newcastle.edu.au/R/help/06/06/28965.html , how do I get last two rows (or six or ten) by id group out of the data frame? Here the example gives just the last row. Sincere thanks, Lauri

A slight modification to Gabor's solution:

  score
    id reading math
  1  1      65   80
  2  1      70   75
  3  1      88   70
  4  2      NA   65
  5  3      90   65
  6  3      NA   70

  # Return the last '2' rows
  # Note the addition of unlist()
  score[unlist(tapply(rownames(score), score$id, tail, 2)), ]
    id reading math
  2  1      70   75
  3  1      88   70
  4  2      NA   65
  5  3      90   65
  6  3      NA   70

Note that when tail() returns more than one value, tapply() will create a list rather than a vector:

  tapply(rownames(score), score$id, tail, 2)
  $`1`
  [1] "2" "3"

  $`2`
  [1] "4"

  $`3`
  [1] "5" "6"

Thus, we need to unlist() the indices to use them in the subsetting process that Gabor used in his solution.

Another alternative, if the rownames do not correspond to the sequential row indices as they do in this example:

  do.call(rbind, lapply(split(score, score$id), tail, 2))
      id reading math
  1.2  1      70   75
  1.3  1      88   70
  2    2      NA   65
  3.5  3      90   65
  3.6  3      NA   70

This uses split() to create a list of data frames from score, where each data frame is 'split' by the 'id' column values. tail() is then applied to each data frame using lapply(), the results of which are then rbind()ed back to a single data frame.

HTH,

Marc Schwartz
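Both variants from the message above can be run as follows. The score data frame is rebuilt from the printed table; first2 is an added illustration of swapping head() in for tail(), as suggested in the reply.

```r
score <- data.frame(
  id      = c(1, 1, 1, 2, 3, 3),
  reading = c(65, 70, 88, NA, 90, NA),
  math    = c(80, 75, 70, 65, 65, 70)
)

# Variant 1: collect row names per id, flatten with unlist(), then subset.
idx     <- unlist(tapply(rownames(score), score$id, tail, 2))
last2_a <- score[idx, ]

# Variant 2: split into per-id data frames, take the tail of each, re-stack.
last2_b <- do.call(rbind, lapply(split(score, score$id), tail, 2))

# Swapping head() in for tail() gives the first two rows per id instead.
first2 <- do.call(rbind, lapply(split(score, score$id), head, 2))
```

Variant 1 depends on the row names identifying rows uniquely; variant 2 does not, at the cost of composite row names in the result.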
Re: [R] SAS, SPSS Product Comparison Table
Hi All,

Thanks to lots of good ideas from R-helpers, I've polished up the table and posted it here: http://oit.utk.edu/scc/RforSASSPSSproducts.pdf

To be consistent with its product orientation, I dropped mixed models (it's not a separate product in either SAS or SPSS). I also added SAS/QC and links to similar pages such as CRAN's Task Views.

People (especially Patrick Burns) sent the following list of topics that are not SAS or SPSS products, but which might make good additions to Task Views:

  resampling techniques: boot, coin (and many others)
  report generation: R (the Sweave function)
  neural networks: nnet, AMORE, neural, grnnR
  finance: Rmetrics, portfolio (and several more)
  designed experiments: BHH2, blockrand, conf.design, spc
  Bayesian: BRugs, R2WinBUGS, bayesm (and many more)
  circular statistics: CircStats, circular
  robustness: R and many packages
  medical imaging: DICOM, AnalyzeFMRI, fmri
  functional data analysis: fda, MFDA
  Robust spatial statistics: spatial, spatstat, pastecs, fields, geoR (and more)
  Markov chain Monte Carlo: MCMCpack, mcmc
  meta-analysis: meta
  graphical models: mimR, ggm
  Mixed Models: lmer, nlme, lme4
  mixture models: mixreg, mixtools
  pharmacokinetics: PK, PKfit, PKtools
  musicology: tuneR
  sudoku: sudoku

Frank Harrell made an excellent suggestion that this be a page at the R-wiki. It's unlikely that any one person would know all these areas so it might work out if everyone could edit the sections they know. If anyone wants to put it up there, let me know and I'll be happy to send it to you in any form you like. I expect once a table format was established editing it would be easy.

I acknowledged everyone who wrote at the bottom of the table. If I forgot anyone, it was an oversight. Drop me a line and I'll put you on there.

Thanks again to everyone for all the help!
Cheers, Bob = Bob Muenchen (pronounced Min'-chen), Manager Statistical Consulting Center U of TN Office of Information Technology 200 Stokely Management Center, Knoxville, TN 37996-0520 Voice: (865) 974-5230 FAX: (865) 974-4810 Email: [EMAIL PROTECTED] Web: http://oit.utk.edu/scc, News: http://listserv.utk.edu/archives/statnews.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] JGR data editor question
Hi All, I'm learning JGR 1.4-15 with R 2.4.1 in Windows XP (all patches applied). JGR looks great but I'm having trouble getting the data editor to save my results. I don't see anything in R-help about it. Here are the steps I followed:

1. I chose Tools > Object Browser and double-clicked on a data frame, mydata.
2. A spreadsheet editor popped up and allowed me to make changes.
3. I clicked Update at the bottom right of the data editor screen.
4. It asked, "Export to R?" and had "Export as: mydata" filled in.
5. I clicked Yes and then closed the window by clicking the usual [X] in the top right corner.
6. Double-clicking the data file again opened it back up but the changes were gone.

Am I missing a step? Thanks, Bob

Bob Muenchen (pronounced Min'-chen), Manager Statistical Consulting Center U of TN Office of Information Technology 200 Stokely Management Center, Knoxville, TN 37996-0520 Voice: (865) 974-5230 FAX: (865) 974-4810 Email: [EMAIL PROTECTED] Web: http://oit.utk.edu/scc, News: http://listserv.utk.edu/archives/statnews.html
Re: [R] JGR data editor question
That's it! I tried an absurd number of variations, but never that! I had only changed one value and never left the cell. I assumed hitting Enter would do it. Thanks! -Bob

Bob Muenchen (pronounced Min'-chen), Manager Statistical Consulting Center U of TN Office of Information Technology 200 Stokely Management Center, Knoxville, TN 37996-0520 Voice: (865) 974-5230 FAX: (865) 974-4810 Email: [EMAIL PROTECTED] Web: http://oit.utk.edu/scc, News: http://listserv.utk.edu/archives/statnews.html

-----Original Message-----
From: Jim Porzak [mailto:[EMAIL PROTECTED]
Sent: Saturday, February 10, 2007 11:46 AM
To: Muenchen, Robert A (Bob)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] JGR data editor question

Hi Bob, I can not reproduce your problem, with possible exception in your step 2: In data editor, you need to click off of the last cell you edited for the changes to take.

On 2/10/07, Muenchen, Robert A (Bob) [EMAIL PROTECTED] wrote:

Hi All, I'm learning JGR 1.4-15 with R 2.4.1 in Windows XP (all patches applied). JGR looks great but I'm having trouble getting the data editor to save my results. I don't see anything in R-help about it. Here are the steps I followed:

1. I chose Tools > Object Browser and double-clicked on a data frame, mydata.
2. A spreadsheet editor popped up and allowed me to make changes.
3. I clicked Update at the bottom right of the data editor screen.
4. It asked, "Export to R?" and had "Export as: mydata" filled in.
5. I clicked Yes and then closed the window by clicking the usual [X] in the top right corner.
6. Double-clicking the data file again opened it back up but the changes were gone.

Am I missing a step? Thanks, Bob

--
HTH,
Jim Porzak
Loyalty Matrix Inc.
San Francisco, CA
http://www.linkedin.com/in/jimporzak
[R] SAS, SPSS Product Comparison Table
Hi All,

My paper "R for SAS and SPSS Users" received a bit more of a reaction than I expected. I posted the link (http://oit.utk.edu/scc/RforSASSPSSusers.pdf) about 12 days ago on R-help and the equivalent SAS and SPSS lists. Since then people have downloaded it 5,503 times and I've gotten lots of questions along the lines of, "Surely R can't do for free what [fill in a SAS or SPSS product here] does?"

To try to address those, I've compiled a table that is organized by the product categories SAS and SPSS offer. Keep in mind that I still know far more about SAS and SPSS than I do about R, so I could really use some help with this. The table is below, one topic per line, with columns separated by vertical bars. I would appreciate it if the many R gurus out there would look it over and send suggestions. I'll add it as an appendix when it's done (well, as done as a moving target like this ever is!)

Thanks, Bob

Topic | SAS Product | SPSS Product | R Package
Advanced Models | SAS/STAT | SPSS Advanced Models(tm) | R
Automated Data Preparation | None | SPSS Data Preparation(tm) | None?
Automated Forecasting | SAS Forecast Studio | DecisionTime/WhatIf(tm) | None?
Basics | SAS | SPSS Base(tm) | R
Conjoint Analysis | SAS/STAT: Transreg | SPSS Conjoint(tm) | Acepack?
Correspondence Analysis | SAS/STAT: Corresp | SPSS Categories(tm) | Homals, MASS, FactoMineR, ade4, PTAk, ccoresp, vegan, made4, PsychoR
Custom Tables | Base: Proc Tabulate | SPSS Custom Tables(tm) | reshape
Data Mining | Enterprise Miner | Clementine | Rattle
Exact Tests | SAS/STAT: various | SPSS Exact Tests(tm) | exactLoglinTest
Genetics | SAS/Genetics, SAS/Microarray Solution, JMP Genomics | None | Bioconductor
GIS/Mapping | SAS/GIS | SPSS Maps(tm) | maps
Graphical User Interface | Enterprise Guide | SPSS | JGR, R Commander, pmg, Sciviews
Graphics | SAS/GRAPH(r) | SPSS Base(tm) | R, ggplot
Guided Analysis | SAS/LAB | None | None
Matrix/Linear Algebra | SAS/IML(tm), SAS/IML Workshop | SPSS Matrix(tm) | R
Missing Values Imputation | SAS/STAT: Proc MI | SPSS Missing Values Analysis(tm) | aregImpute (Hmisc), fit.mult.impute (Design)
Mixed Models | Proc Mixed | SPSS Advanced Models | lmer
Operations Research | SAS/OR | None | TSP
Power Analysis | SAS/STAT: Power, GLM Power | SamplePower(tm) | asypow, powerpkg, pwr
Regression Models | SAS/BASE | SPSS Regression Models(tm) | R
Sampling, Nonrandom | SAS/STAT: surveymeans, etc. | SPSS Complex Samples(tm) | survey
Structural Equations | SAS/STAT: Calis | Amos(tm) | sem
Text Analysis | Text Miner | SPSS Text Analysis for Surveys(tm) | tm
Time Series | SAS/ETS(tm) | SPSS Trends(tm) | ArDec, brainwaver, dyn, fame, Systemfit, tsDyn, tseries, tseriesChaos, tsfa, urca, uroot
Trees, Decision or Regression | Enterprise Miner | SPSS Classification Trees(tm), AnswerTree(tm) | tree, rpart
Visualization | SAS/INSIGHT | None | rggobi, GGobi

Bob Muenchen (pronounced Min'-chen), Manager Statistical Consulting Center U of TN Office of Information Technology 200 Stokely Management Center, Knoxville, TN 37996-0520 Voice: (865) 974-5230 FAX: (865) 974-4810 Email: [EMAIL PROTECTED] Web: http://oit.utk.edu/scc, News: http://listserv.utk.edu/archives/statnews.html
Re: [R] R in Industry
That sounds like a good idea. The name R makes it especially hard to find job postings or resumes, or to do any other type of search. Googling "resume+sas" or "job opening+sas" is quick and fairly effective (less a few airline jobs). Doing that with R is of course futile. At the risk of getting flamed, it's too bad it's not called something more unique such as Rpackage, Rlanguage, etc.

Cheers,
Bob

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Doran, Harold
Sent: Tuesday, February 06, 2007 2:08 PM
To: R-help@stat.math.ethz.ch
Subject: [R] R in Industry

The other day, CNN had a story on working at Google. Out of curiosity, I went to the Google employment web site (I'm not looking, but just curious). In perusing their job posts for statisticians, preference is given to those who use R and Python. Other languages, S-Plus and something called SAS, were listed as lower priorities.

When I started using Python, I noted they have a portion of the web site with job postings. CRAN does not have something similar, but I think it might be useful. I think R is becoming more widely used in industry and I wonder if, to help it move along a bit, the maintainer of CRAN could create a section of the web site devoted to jobs where R is a requirement. Hence, we could have our own little monster.com kind of thing going on.

Of the multitude of ways the gospel can be spread, this is small. But, I think every small step forward is good. Anyone think this is useful?

Harold
Re: [R] R for SAS SPSS Users Document
Julien Barnier wrote:
> ... I think it will be very useful to me, even if I will use it the
> reverse way : learn how to use SAS from R...

I hadn't thought of using the document in reverse to learn SAS or SPSS if you already know R. I'll have to reread it from that perspective and see if there are any changes I can make to help in that direction without a total rewrite. If anyone has any suggestions along those lines, please send them my way.

Thanks for the PDF tip. Several people suggested that. I thought cutting & pasting examples would be important, which is not as easy from PDF. OpenOffice can open the .doc version on Linux if you use that. I have added a PDF version at the same link ending in PDF: http://oit.utk.edu/scc/RforSASSPSSusers.pdf

Cheers,
Bob

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Julien Barnier
Sent: Wednesday, January 31, 2007 3:20 AM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] R for SAS SPSS Users Document

Hi,

> I am pleased to announce the availability of the document, R for SAS
> and SPSS Users, at http://oit.utk.edu/scc/RforSASSPSSusers.doc

I've looked at the document and printed it. I think it will be very useful to me, even if I will use it the reverse way : learn how to use SAS from R...

As I am far from an R expert, I will not be able to give you good advice on R code. But maybe you would have had more comments on your tutorial if you had given the link to the PDF version instead of the MSWord one : http://oit.utk.edu/scc/RforSASSPSSusers.pdf

Thanks again for your document,

--
Julien
Re: [R] spss.get. Warning with SPSS 14 dataset
Here's a warning about that: http://tolstoy.newcastle.edu.au/R/help/04/12/8827.html

Bob

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of John Kane
Sent: Friday, January 26, 2007 11:49 AM
To: R R-help
Subject: [R] spss.get. Warning with SPSS 14 dataset

I am using spss.get to import an SPSS database Data.sav, created with SPSS 14:

df1 <- spss.get("C:/temp/Data.sav", lowernames=TRUE, datevars = c("dateinte"))

I am getting this warning. I get the same warning with read.spss.

Warning message:
C:/temp/Data.sav: Unrecognized record type 7, subtype 16 encountered in system file

This is a stupid question but should I be worried about it? So far the data looks clean but it is not my data base originally and I wondered if there is anything specific that I should be checking for.

Thanks.
[R] R for SAS SPSS Users Document
Greetings,

I am pleased to announce the availability of the document, R for SAS and SPSS Users, at http://oit.utk.edu/scc/RforSASSPSSusers.doc . It presents an introductory view of R for people who already know SAS and/or SPSS. Included are 27 programs written in all three languages (i.e. 81 total) so that people can see how R works compared to the other two, task by task.

I would appreciate it if folks with far more R expertise than I have could review it and provide advice on ways to improve programming examples or wording. The wording was challenging since the jargon used by the three packages differs so much. I'm sure there is much room for improvement.

Cheers,
Bob
[R] Aggregation using list with Hmisc summarize function
Hi All,

I'm using the Hmisc summarize function and used list instead of llist to provide the by variables. It generated an error message. Is this a bug, or do I misunderstand how Hmisc works with lists? The program below demonstrates the error message.

Thanks,
Bob

x <- 1:8
group <- c(1,1,1,1,2,2,2,2)
gender <- c(1,2,1,2,1,2,1,2)
mydata <- data.frame(x,group,gender)
attach(mydata)

# Creating a list using Hmisc llist works:
summarize(x, by=llist(group,gender), FUN=mean, na.rm=TRUE)

# Creating a list using built-in list function does not:
summarize(x, by= list(group,gender), FUN=mean, na.rm=TRUE)
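Archive note: the sketch below does not explain the Hmisc behaviour asked about. It only shows a base-R cross-check for the cell means that the summarize call is after, using aggregate with a formula. The object name cell_means is made up for illustration.

```r
# Same toy data as in the post above.
x      <- 1:8
group  <- c(1, 1, 1, 1, 2, 2, 2, 2)
gender <- c(1, 2, 1, 2, 1, 2, 1, 2)
mydata <- data.frame(x, group, gender)

# Base R: mean of x within each group-by-gender cell.
cell_means <- aggregate(x ~ group + gender, data = mydata, FUN = mean)
print(cell_means)
```

Because the grouping variables are named in the formula, the result carries the column names group and gender without any labelling step.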
Re: [R] Switching labels on a factor
Chris,

Argh!!! I was writing a reply just now insisting that the output below makes no sense when it finally hit me: the first line of output from the unclass function is just the data and bears no relationship whatsoever with the order of the m and f below it. I had gotten the idea that it picked up on the first value and so displayed the label to match. I don't even want to think about how much time I spent working on a problem that was nonexistent!

Thank you very much for your help!
Bob

unclass(mydata$gR)
[1] 2 2 2 2 1 1 1 1
attr(,"levels")
[1] "m" "f"

-----Original Message-----
From: Chris Andrews [mailto:[EMAIL PROTECTED]]
Sent: Monday, December 18, 2006 12:05 PM
To: Muenchen, Robert A (Bob)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Switching labels on a factor

Bob,

This is I think exactly what one wants to have happen. The first four observations are still women. Both the labels and the underlying integers should change. (If you want to give all the people sex changes, try Relevel in the Epi package.)

mydata$afterthechange <- Relevel(mydata$gender, list(m="f", f="m"))
mydata
  workshop gender q1 q2 q3 q4 gR afterthechange
1        1      f  1  1  5  1  f              m
2        2      f  2  1  4  1  f              m
3        1      f  2  2  4  3  f              m
4        2      f  3  1 NA  3  f              m
5        1      m  4  5  2  4  m              f
6        2      m  5  4  5  5  m              f
7        1      m  5  3  4  4  m              f
8        2      m  4  5  5 NA  m              f

unclass(mydata$afterthechange)
[1] 1 1 1 1 2 2 2 2
attr(,"levels")
[1] "m" "f"

Chris

Date: Fri, 15 Dec 2006 15:34:15 -0500
From: Muenchen, Robert A (Bob) [EMAIL PROTECTED]
Subject: [R] Switching labels on a factor
To: R-help@stat.math.ethz.ch
Message-ID: [EMAIL PROTECTED]
Content-Type: text/plain; charset=US-ASCII

Hi All,

I'm perplexed by the way the unclass function displays a factor whose labels have been swapped with the relevel function. I realize it won't affect any results and that the relevel did nothing useful in this particular case. I'm just doing it to learn ways to manipulate factors. The display of unclass leaves me feeling that the relevel had failed. I've checked three books & searched R-help, but found no mention of this particular issue. The program below demonstrates the problem. Is this a bug, or is there a reason for it to work this way?

Thanks,
Bob

mystring <-
("id,workshop,gender,q1,q2,q3,q4
1,1,f,1,1,5,1
2,2,f,2,1,4,1
3,1,f,2,2,4,3
4,2,f,3,1, ,3
5,1,m,4,5,2,4
6,2,m,5,4,5,5
7,1,m,5,3,4,4
8,2,m,4,5,5,9")
mydata <- read.table(textConnection(mystring), header=TRUE, sep=",", row.names="id", na.strings="9")
mydata

# Create a gender Releveled variable, gR.
# Now 1=m, 2=f
mydata$gR <- relevel(mydata$gender, "m")

# Print the data to show that the labels of gR match those of gender.
mydata

# Show that the underlying codes have indeed reversed.
as.numeric(mydata$gender)
as.numeric(mydata$gR)

# Unclass the two variables to see that print order
# implies that both the codes and labels have
# flipped, cancelling each other out. For gR,
# m appears to be associated with 2, and f with 1
unclass(mydata$gender)
unclass(mydata$gR)

--
Christopher Andrews, PhD
SUNY Buffalo, Department of Biostatistics
242 Farber Hall, [EMAIL PROTECTED], 716 829 2756
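Archive note: the point Bob arrives at in this thread is that unclass prints two unrelated things: one integer code per observation, then the level labels in level order. A minimal sketch, using a made-up gender vector rather than the thread's data frame (the names codes, labs and recovered are illustrative only):

```r
gender <- factor(c("f", "f", "f", "f", "m", "m", "m", "m"))
gR <- relevel(gender, ref = "m")   # levels become "m", "f"

codes <- as.integer(gR)            # one code per observation
labs  <- levels(gR)                # one label per level, in level order

# Indexing the level labels by the codes recovers each observation's label,
# so the first four observations are still "f" after releveling.
recovered <- labs[codes]
print(data.frame(code = codes, label = recovered))
```

The position of a label in the levels attribute tells you which code it belongs to; it says nothing about which observation comes first in the data.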
[R] Applying variable labels across a data frame
Hi All,

I'm working on a class example that demonstrates one way to deal with factors and their labels. I create a function called myLabeler and apply it with lapply. It works on the whole data frame when I subscript it as in lapply( myQFvars[ ,myQFnames ], myLabeler ) but does not work if I leave the [] subscripts off. I would appreciate it if anyone could tell me why. The program below works up until the final two statements.

Thanks,
Bob

# Assigning factor labels to potentially lots of vars.
mystring <-
("id,workshop,gender,q1,q2,q3,q4
1,1,f,1,1,5,1
2,2,f,2,1,4,1
3,1,f,2,2,4,3
4,2,f,3,1, ,3
5,1,m,4,5,2,4
6,2,m,5,4,5,5
7,1,m,5,3,4,4
8,2,m,4,5,5,9")
mydata <- read.table(textConnection(mystring), header=TRUE, sep=",", row.names="id", na.strings="9")
print(mydata)

# Create copies of q variables to use as factors
# so we can count them.
myQlevels <- c(1,2,3,4,5)
myQlabels <- c("Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree")
print(myQlevels)
print(myQlabels)

# Generate two sets of var names to use.
myQnames <- paste( "q", 1:4, sep="")
myQFnames <- paste( "qf", 1:4, sep="")
print(myQnames)   #The original names.
print(myQFnames)  #The names for new factor variables.

# Extract the q variables to a separate data frame.
myQFvars <- mydata[ ,myQnames]
print(myQFvars)

# Rename all the variables with F for Factor.
colnames(myQFvars) <- myQFnames
print(myQFvars)

# Create a function to apply the labels to lots of variables.
myLabeler <- function(x) { factor(x, myQlevels, myQlabels) }

# Here's how to use the function on one variable.
summary( myLabeler(myQFvars["qf1"]) )

#Apply it to all the variables. This method works.
myQFvars[ ,myQFnames] <- lapply( myQFvars[ ,myQFnames ], myLabeler )
summary(myQFvars)  #Here are the results I wanted.

# This is the same as above but using the unsubscripted
# data frame name. It does not work.
myTest <- lapply( myQFvars, myLabeler )
summary(myTest)  #I'm not sure what these results are.
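Archive note: one likely reading of the difference is that lapply always returns a plain list. In the version that works, the list is assigned back into the data frame's columns, so the result stays a data frame. In the second version the bare list is stored in myTest, and summary of a list reports Length, Class and Mode per element instead of counts. A minimal sketch with a small made-up data frame (qf and as_list are illustrative names, not from the post):

```r
myQlevels <- 1:5
myQlabels <- c("Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree")
myLabeler <- function(x) factor(x, myQlevels, myQlabels)

# Hypothetical two-column stand-in for myQFvars.
qf <- data.frame(qf1 = c(1, 2, 5), qf2 = c(3, 3, 4))

as_list <- lapply(qf, myLabeler)   # a plain list of factors
class(as_list)                     # "list"

qf[] <- lapply(qf, myLabeler)      # assign back into the existing columns
class(qf)                          # "data.frame"
summary(qf)                        # counts per label, column by column
```

The empty-bracket form qf[] on the left keeps the data frame structure while replacing every column, which avoids having to list the column names.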
[R] Switching labels on a factor
Hi All,

I'm perplexed by the way the unclass function displays a factor whose labels have been swapped with the relevel function. I realize it won't affect any results and that the relevel did nothing useful in this particular case. I'm just doing it to learn ways to manipulate factors. The display of unclass leaves me feeling that the relevel had failed. I've checked three books & searched R-help, but found no mention of this particular issue. The program below demonstrates the problem. Is this a bug, or is there a reason for it to work this way?

Thanks,
Bob

mystring <-
("id,workshop,gender,q1,q2,q3,q4
1,1,f,1,1,5,1
2,2,f,2,1,4,1
3,1,f,2,2,4,3
4,2,f,3,1, ,3
5,1,m,4,5,2,4
6,2,m,5,4,5,5
7,1,m,5,3,4,4
8,2,m,4,5,5,9")
mydata <- read.table(textConnection(mystring), header=TRUE, sep=",", row.names="id", na.strings="9")
mydata

# Create a gender Releveled variable, gR.
# Now 1=m, 2=f
mydata$gR <- relevel(mydata$gender, "m")

# Print the data to show that the labels of gR match those of gender.
mydata

# Show that the underlying codes have indeed reversed.
as.numeric(mydata$gender)
as.numeric(mydata$gR)

# Unclass the two variables to see that print order
# implies that both the codes and labels have
# flipped, cancelling each other out. For gR,
# m appears to be associated with 2, and f with 1
unclass(mydata$gender)
unclass(mydata$gR)
Re: [R] Multiple Conditional Tranformations
Gabor,

Those are handy variations! Perhaps my brain is still in SAS mode on this. I'm expecting something like the code below that checks for male only once, checks for female only when not male (skipping NAs) and does all formulas under the appropriate conditions. The formulas I made up to keep the code short may not be as easily modified to let the logical 0/1 values fix them.

if gender=="m" then do;
   Score1=...
   Score2=...
end;
else if gender=="f" then do;
   Score1=...
   Score2=...
end;

R may not have anything quite like that. R certainly has many other features that SAS lacks.

Thanks,
Bob

-----Original Message-----
From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]]
Sent: Saturday, November 25, 2006 12:39 AM
To: Muenchen, Robert A (Bob)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Multiple Conditional Tranformations

And here is a variation:

transform(mydata,
   score1 = (2 + (gender == "m")) * q1 + q2,
   score2 = score1 + 0.5 * q1
)

or

transform(
   transform(mydata, score1 = (2 + (gender == "m")) * q1 + q2),
   score2 = score1 + 0.5 * q1
)

On 11/25/06, Gabor Grothendieck [EMAIL PROTECTED] wrote:

Try this:

transform(mydata,
   score1 = (2 + (gender == "m")) * q1 + q2,
   score2 = (2.5 + (gender == "m")) * q1 + q2
)

On 11/24/06, Muenchen, Robert A (Bob) [EMAIL PROTECTED] wrote:

Mark,

I finally got that approach to work by spreading the logical condition everywhere. That gets the lengths to match. Still, I can't help but think there must be a way to specify the logic once per condition.

Thanks,
Bob

mydata$score1 <- numeric(mydata$q1)  #just initializing.
mydata$score2 <- numeric(mydata$q1)
mydata$score1 <- NA
mydata$score2 <- NA
mydata

mydata$score1[mydata$gender == "f"] <- 2*mydata$q1[mydata$gender=="f"] + mydata$q2[mydata$gender=="f"]
mydata$score2[mydata$gender == "f"] <- 2.5*mydata$q1[mydata$gender=="f"] + mydata$q2[mydata$gender=="f"]
mydata$score1[mydata$gender == "m"] <- 3*mydata$q1[mydata$gender=="m"] + mydata$q2[mydata$gender=="m"]
mydata$score2[mydata$gender == "m"] <- 3.5*mydata$q1[mydata$gender=="m"] + mydata$q2[mydata$gender=="m"]
mydata
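Archive note: Bob's concern here is that the 0/1 trick only works when the male and female formulas differ by exactly one. A minimal sketch of one generalisation that was not proposed in the thread: keep the per-gender multipliers in named vectors and look gender up once. The names mult1, mult2 and g are hypothetical, and the data frame is a cut-down stand-in with one missing gender added.

```r
mydata <- data.frame(
  gender = c("f", "f", "m", "m", NA),
  q1 = c(2, 1, 4, 5, 3),
  q2 = c(2, 1, 5, 4, 3)
)

# Hypothetical per-gender multipliers, taken from the formulas in the thread.
mult1 <- c(f = 2,   m = 3)
mult2 <- c(f = 2.5, m = 3.5)

g <- as.character(mydata$gender)   # look the gender up once

# Indexing a named vector by a missing name gives NA, so rows with
# missing gender end up with NA scores.
mydata$score1 <- unname(mult1[g]) * mydata$q1 + mydata$q2
mydata$score2 <- unname(mult2[g]) * mydata$q1 + mydata$q2
print(mydata)
```

This only covers formulas that share one shape and differ in their constants. Formulas that differ in structure need one of the row-subsetting approaches discussed elsewhere in the thread.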
Re: [R] Multiple Conditional Tranformations
That's exactly what I'm looking for. Thanks so much for taking the time to do it that way.

On the redundancy issue, I think SAS checks the else if condition only if the original if is false. The check for f when not m I put in only to exclude missing values for gender.

Thanks!!
Bob

-----Original Message-----
From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]]
Sent: Saturday, November 25, 2006 7:37 AM
To: Muenchen, Robert A (Bob)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Multiple Conditional Tranformations

Firstly your outline does not check once, it checks twice. First it checks for m and then it redundantly checks for f. On the other hand the two variations in my post do check once.

Although substantially longer than the solutions in my prior posts, if you want the style shown in your post try this:

mydata2 <- cbind(mydata, score1 = 0, score2 = 0)
is.m <- mydata$gender == "m"

mydata2[is.m, ] <- transform(mydata[is.m, ],
   score1 = 3 * q1 + q2,
   score2 = 3.5 * q1 + q2
)

mydata2[!is.m, ] <- transform(mydata2[!is.m, ],
   score1 = 2 * q1 + q2,
   score2 = 2.5 * q1 + q2
)
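Archive note: the logical-index style in this thread assumes gender has no missing values. As far as I know, a logical subscript containing NA is rejected when assigning into rows of a data frame. A minimal sketch of one way around that, using which() to turn each condition into row numbers; m_rows and f_rows are made-up names and the data is a small stand-in with one missing gender.

```r
mydata <- data.frame(
  gender = c("f", "m", NA, "m"),
  q1 = c(2, 4, 3, 5),
  q2 = c(2, 5, 3, 4)
)
mydata$score1 <- NA_real_
mydata$score2 <- NA_real_

# which() drops NAs, so rows with missing gender are left untouched.
m_rows <- which(mydata$gender == "m")
f_rows <- which(mydata$gender == "f")

# Each condition is evaluated once and reused for both scores.
mydata[m_rows, ] <- transform(mydata[m_rows, ],
                              score1 = 3 * q1 + q2, score2 = 3.5 * q1 + q2)
mydata[f_rows, ] <- transform(mydata[f_rows, ],
                              score1 = 2 * q1 + q2, score2 = 2.5 * q1 + q2)
print(mydata)
```

Initialising the score columns to NA rather than 0 means a row that matches neither condition is visibly missing instead of looking like a real score of zero.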
Re: [R] Multiple Conditional Tranformations
I have a program that is similar to your longer version, but I could never get the syntax quite right. This will be a big help in understanding how by works with functions.

Thanks,
Bob

-----Original Message-----
From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]]
Sent: Saturday, November 25, 2006 11:11 AM
To: Muenchen, Robert A (Bob)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Multiple Conditional Tranformations

Here is a correction:

do.call(rbind, by(mydata, 1:nrow(mydata), function(x)
   switch(as.character(x$gender),
      m = transform(x, score1 = 3*q1+q2, score2 = 3.5*q1+q2),
      f = transform(x, score1 = 2*q1+q2, score2 = 2.5*q1+q2),
      transform(x, score1 = NA, score2 = NA))
))

On 11/25/06, Gabor Grothendieck [EMAIL PROTECTED] wrote:

Here are some additional solutions. It appears that the SAS code is performing the transformation row by row, and for each row the code in your post is specifying the transformation. If you want to do it that way we could use 'by' like this (where this time we have also added NA processing for the gender):

do.call(rbind, by(mydata, 1:nrow(mydata), function(x)
   switch(as.character(x$gender),
      m = transform(x, score1 = 3*q1+q2, score2 = 3.5*q1+q2),
      f = transform(x, score1 = 2*q1+q2, score2 = 2.5*q1+q2),
      NA)
))

# or this somewhat longer version:

do.call(rbind, by(mydata, 1:nrow(mydata), function(x) with(x, {
   if (is.na(gender)) {
      score1 <- score2 <- NA
   } else if (gender == "m") {
      score1 = 3 * q1 + q2
      score2 = 3.5 * q1 + q2
   } else if (gender == "f") {
      score1 = 2 * q1 + q2
      score2 = 2.5 * q1 + q2
   }
   cbind(x, score1, score2)
})))
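Archive note: the by/switch solutions in this thread rely on how switch picks a branch for a character value: it evaluates the argument whose name matches, and falls back to a trailing unnamed argument when nothing matches. A minimal sketch of that dispatch on its own, outside of by. score_row is a hypothetical helper, and the "x" input is just an unmatched value chosen to show the default branch.

```r
# Hypothetical helper: both scores for one row, chosen by gender.
score_row <- function(gender, q1, q2) {
  switch(as.character(gender),
    m = c(score1 = 3 * q1 + q2, score2 = 3.5 * q1 + q2),
    f = c(score1 = 2 * q1 + q2, score2 = 2.5 * q1 + q2),
    c(score1 = NA_real_, score2 = NA_real_))  # unnamed last argument = default
}

# Apply it row by row; mapply returns one column per row, so transpose.
scores <- t(mapply(score_row, c("m", "f", "x"), c(4, 2, 3), c(5, 2, 3)))
print(scores)
```

Each branch computes both scores together, which is the "test once, do several things" shape of the SAS do-block. It runs row by row, so it will be slower than the vectorised versions on large data.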
[R] Multiple Conditional Tranformations
Greetings,

I'm learning R and I'm stuck on a basic concept: how to specify a logical condition once and then perform multiple transformations under that condition. The program below is simplified to demonstrate the goal. Its results are exactly what I want, but I would like to check the logical state of gender only once and create both (or any number of) scores at once.

mystring <-
("id,group,gender,q1,q2,q3,q4
01,1,f,2,2,5,4
02,2,f,2,1,4,5
03,1,f,2,2,4,4
04,2,f,1,1,5,5
05,1,m,4,5,4,
06,2,m,5,4,5,5
07,1,m,3,3,4,5
08,2,m,5,5,5,4")
mydata <- read.table(textConnection(mystring), header=TRUE, sep=",", row.names="id")
mydata

#Create score1 so that it differs for males and females:
mydata$score1 <- ifelse( mydata$gender=="f" ,
   (mydata$score1 <- (2*mydata$q1)+mydata$q2),
   ifelse( mydata$gender=="m",
      (mydata$score1 <- (3*mydata$q1)+mydata$q2), NA )
)
mydata

#Create score2 so that it too differs for males and females:
mydata$score2 <- ifelse( mydata$gender=="f" ,
   (mydata$score2 <- (2.5*mydata$q1)+mydata$q2),
   ifelse( mydata$gender=="m",
      (mydata$score2 <- (3.5*mydata$q1)+mydata$q2), NA )
)
mydata

Thanks!
Bob
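Archive note: a minimal sketch of a lighter variant of the nested ifelse approach in this post, assuming the goal is only to avoid re-testing gender. The two comparisons are stored once (is_f and is_m are made-up names) and reused for both scores, and the inner assignments inside ifelse are dropped since the outer assignment already stores the result. The data is a small stand-in with one missing gender.

```r
mydata <- data.frame(
  gender = c("f", "m", NA),
  q1 = c(2, 4, 3),
  q2 = c(2, 5, 3)
)

# Evaluate each condition once.
is_f <- mydata$gender == "f"
is_m <- mydata$gender == "m"

# ifelse returns NA wherever its test is NA, so missing gender gives NA scores.
mydata$score1 <- with(mydata, ifelse(is_f, 2 * q1 + q2,
                              ifelse(is_m, 3 * q1 + q2, NA)))
mydata$score2 <- with(mydata, ifelse(is_f, 2.5 * q1 + q2,
                              ifelse(is_m, 3.5 * q1 + q2, NA)))
print(mydata)
```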
Re: [R] Multiple Conditional Tranformations
Mark,

Here's what I get when I try that approach.

Thanks,
Bob

mydata$score1 <- numeric(mydata$q1) #just initializing.
mydata$score2 <- numeric(mydata$q1)
mydata$score1 <- NA
mydata$score2 <- NA
mydata
  group gender q1 q2 q3 q4 score1 score2
1     1      f  2  2  5  4     NA     NA
2     2      f  2  1  4  5     NA     NA
3     1      f  2  2  4  4     NA     NA
4     2      f  1  1  5  5     NA     NA
5     1      m  4  5  4 NA     NA     NA
6     2      m  5  4  5  5     NA     NA
7     1      m  3  3  4  5     NA     NA
8     2      m  5  5  5  4     NA     NA

mydata$score1[mydata$gender == "f"] <- 2*mydata$q1 + mydata$q2
Warning message:
number of items to replace is not a multiple of replacement length

mydata$score2[mydata$gender == "f"] <- 2.5*mydata$q1 + mydata$q2
Warning message:
number of items to replace is not a multiple of replacement length

mydata$score1[mydata$gender == "m"] <- 3*mydata$q1 + mydata$q2
Warning message:
number of items to replace is not a multiple of replacement length

mydata$score2[mydata$gender == "m"] <- 3.5*mydata$q1 + mydata$q2
Warning message:
number of items to replace is not a multiple of replacement length

-----Original Message-----
From: Leeds, Mark (IED) [mailto:[EMAIL PROTECTED]
Sent: Friday, November 24, 2006 8:45 PM
To: Muenchen, Robert A (Bob)
Subject: RE: [R] Multiple Conditional Tranformations

I'm not sure if I understand your question, but I don't think you need ifelse statements.

myscore <- numeric(q1)
(because I'm not sure how to initialize a list, so I initialize a vector with q1 elements)

myscore <- NA
(I think this should set all the values in myscore to NA)

myscore[mydata$gender == "f"] <- 2*mydata$q1 + mydata$q2
myscore[mydata$gender == "m"] <- 3*mydata$q1 + mydata$q2

The above should do what you do in the first part of your code, but I don't know if that was your question. Also, it makes myscore a vector because I didn't know how to initialize a list. Someone else may give a better solution. I'm no expert.
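The warning above comes from a length mismatch: the left-hand side selects only the rows for one gender (4 of 8), while the right-hand side is computed from all 8 rows. R then fills the selected rows with the first values of the right-hand side, so the male rows end up with values computed from the female rows. Below is a minimal sketch of the problem and of the fix that subsets both sides. It uses plain vectors holding only the columns needed, and the names is_m, bad and good are hypothetical.

```r
# Only the columns needed for the illustration, taken from the thread's data.
q1     <- c(2, 2, 2, 1, 4, 5, 3, 5)
q2     <- c(2, 1, 2, 1, 5, 4, 3, 5)
gender <- c("f", "f", "f", "f", "m", "m", "m", "m")

is_m <- gender == "m"   # logical index, TRUE for the four male rows

# Mismatched lengths: 4 slots on the left, 8 values on the right.
# R warns and uses only the first 4 values, which belong to the female rows.
bad <- rep(NA_real_, length(q1))
suppressWarnings(bad[is_m] <- 3 * q1 + q2)

# Matching lengths: subset the right-hand side with the same index.
good <- rep(NA_real_, length(q1))
good[is_m] <- 3 * q1[is_m] + q2[is_m]

print(rbind(bad = bad, good = good))
```

The suppressWarnings() call is there only to keep the sketch quiet; in real code the warning is a useful signal that the two sides do not line up.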
Re: [R] Multiple Conditional Tranformations
Mark,

I finally got that approach to work by spreading the logical condition everywhere. That gets the lengths to match. Still, I can't help but think there must be a way to specify the logic once per condition.

Thanks,
Bob

mydata$score1 <- numeric(mydata$q1) #just initializing.
mydata$score2 <- numeric(mydata$q1)
mydata$score1 <- NA
mydata$score2 <- NA
mydata

mydata$score1[mydata$gender == "f"] <- 2*mydata$q1[mydata$gender=="f"] + mydata$q2[mydata$gender=="f"]
mydata$score2[mydata$gender == "f"] <- 2.5*mydata$q1[mydata$gender=="f"] + mydata$q2[mydata$gender=="f"]
mydata$score1[mydata$gender == "m"] <- 3*mydata$q1[mydata$gender=="m"] + mydata$q2[mydata$gender=="m"]
mydata$score2[mydata$gender == "m"] <- 3.5*mydata$q1[mydata$gender=="m"] + mydata$q2[mydata$gender=="m"]
mydata

=
Bob Muenchen (pronounced Min'-chen), Manager
Statistical Consulting Center
U of TN Office of Information Technology
200 Stokely Management Center, Knoxville, TN 37996-0520
Voice: (865) 974-5230
FAX: (865) 974-4810
Email: [EMAIL PROTECTED]
Web: http://oit.utk.edu/scc,
News: http://listserv.utk.edu/archives/statnews.html
=
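A small sketch of one way to avoid repeating the comparison in every assignment: compute the row positions for each condition once, store them, and reuse them on both sides. The names f_rows and m_rows are hypothetical, the data frame holds only the columns needed, and the score columns are initialised directly with NA_real_ as a simplifying assumption.

```r
# Only the columns needed for the illustration, taken from the thread's data.
mydata <- data.frame(
  gender = c("f", "f", "f", "f", "m", "m", "m", "m"),
  q1     = c(2, 2, 2, 1, 4, 5, 3, 5),
  q2     = c(2, 1, 2, 1, 5, 4, 3, 5)
)

# Initialise both scores as numeric NA columns.
mydata$score1 <- NA_real_
mydata$score2 <- NA_real_

# Each condition is evaluated exactly once; which() also drops NA genders.
f_rows <- which(mydata$gender == "f")
m_rows <- which(mydata$gender == "m")

# Reuse the stored positions on both sides so the lengths always match.
mydata$score1[f_rows] <- 2   * mydata$q1[f_rows] + mydata$q2[f_rows]
mydata$score2[f_rows] <- 2.5 * mydata$q1[f_rows] + mydata$q2[f_rows]
mydata$score1[m_rows] <- 3   * mydata$q1[m_rows] + mydata$q2[m_rows]
mydata$score2[m_rows] <- 3.5 * mydata$q1[m_rows] + mydata$q2[m_rows]

print(mydata)
```

Storing integer positions from which() rather than the raw logical vector is a design choice: a missing gender would otherwise put an NA into the index, which R does not allow in this kind of subscripted assignment.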
Re: [R] Multiple Conditional Tranformations
Good idea. I'm still getting used to how flexible R is on substitutions like that!

-Bob

-----Original Message-----
From: Leeds, Mark (IED) [mailto:[EMAIL PROTECTED]
Sent: Friday, November 24, 2006 10:20 PM
To: Muenchen, Robert A (Bob)
Subject: RE: [R] Multiple Conditional Tranformations

You could set temp <- which(mydata$gender == "f"), and then temp will have the female indices. Then you could just put temp everywhere instead of the statement, but I think that's the best you can do. Definitely, someone will reply, and there may be a shorter way that I am unaware of.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R