[R] creating dummy variables based on conditions
Hello everyone,

I have a dataset which includes the first three variables from the demo data below (year, id and var). I need to create the new variable ans as follows: if var = 1, then for each year (where var = 1), I need to create a new dummy ans which takes the value 1 for all rows of the corresponding id, i.e. every id for which a 1 was recorded in that year. Sample data with the output is shown below.

     year id var ans
[1,] 2010  1   1   1
[2,] 2010  2   0   0
[3,] 2010  1   0   1
[4,] 2010  1   0   1
[5,] 2011  2   1   1
[6,] 2011  2   0   1
[7,] 2011  1   0   0
[8,] 2011  1   0   0

Any help on how to achieve this is much appreciated.

Thanks
Anup

[[alternative HTML version deleted]]

______
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] creating dummy variables based on conditions
Hello,

Your data seems to be of class 'matrix'. The following code needs it to be a data.frame.

dat <- as.data.frame(mat)  # mat stands for your input matrix

res <- do.call(rbind, lapply(split(dat, list(dat$id, dat$year)), function(x){
    x$ans <- if(any(x$var == 1)) 1 else 0
    x}))
rownames(res) <- NULL
res

Hope this helps,

Rui Barradas

On 14-07-2013 12:30, Anup Nandialath wrote:
> Hello everyone, I have a dataset which includes the first three variables from the demo data below (year, id and var). [...]
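For anyone who wants to run the split/lapply idea end to end, here is a minimal sketch on the sample data from the original post. The helper column row and the use of drop = TRUE are my own additions (assumptions, not part of the posted answer): the first restores the original row order after rbind, the second skips id/year combinations that never occur.

```r
# Sample data from the original post (ans is what we want to reproduce).
dat <- data.frame(
  year = c(2010, 2010, 2010, 2010, 2011, 2011, 2011, 2011),
  id   = c(1, 2, 1, 1, 2, 2, 1, 1),
  var  = c(1, 0, 0, 0, 1, 0, 0, 0)
)
dat$row <- seq_len(nrow(dat))  # hypothetical helper: remembers the input order

# One data frame per id/year combination that is actually present.
groups <- split(dat, list(dat$id, dat$year), drop = TRUE)

res <- do.call(rbind, lapply(groups, function(x) {
  x$ans <- as.integer(any(x$var == 1))  # one flag per group, recycled over its rows
  x
}))

res <- res[order(res$row), ]  # back to the original row order
rownames(res) <- NULL
res
```

Without the reordering step the rows come back grouped by id and year rather than in input order, which matters if the result is compared against another column.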
Re: [R] creating dummy variables based on conditions
Hi,

You could try this (if I understand it correctly):

dat1 <- read.table(text="
year id var ans
2010 1 1 1
2010 2 0 0
2010 1 0 1
2010 1 0 1
2011 2 1 1
2011 2 0 1
2011 1 0 0
2011 1 0 0
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat1$newres <- with(dat1,ave(var,id,year,FUN=function(x) any(x==1)*1))
dat1
#  year id var ans newres
#1 2010  1   1   1      1
#2 2010  2   0   0      0
#3 2010  1   0   1      1
#4 2010  1   0   1      1
#5 2011  2   1   1      1
#6 2011  2   0   1      1
#7 2011  1   0   0      0
#8 2011  1   0   0      0

A.K.

----- Original Message -----
From: Anup Nandialath
To: r-help@r-project.org
Sent: Sunday, July 14, 2013 7:30 AM
Subject: [R] creating dummy variables based on conditions

> Hello everyone, I have a dataset which includes the first three variables from the demo data below (year, id and var). [...]
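The ave() approach can be checked against the expected column directly. A small sketch, assuming the same eight sample rows; the column name flag is made up for illustration:

```r
# Sample rows from the original post, including the expected ans column.
dat1 <- read.table(text = "year id var ans
2010 1 1 1
2010 2 0 0
2010 1 0 1
2010 1 0 1
2011 2 1 1
2011 2 0 1
2011 1 0 0
2011 1 0 0", header = TRUE)

# ave() applies FUN within each id/year group and writes the result back
# in the original row order, so no re-sorting is needed afterwards.
dat1$flag <- with(dat1, ave(var, id, year,
                            FUN = function(x) as.integer(any(x == 1))))

all(dat1$flag == dat1$ans)  # compares the computed flag with the expected column
```

Because ave() returns a vector of the same length and order as its first argument, it is usually the shorter route when only one new column is needed.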
Re: [R] creating dummy variables based on conditions
Hi Arun,

Thanks for this. This solution works great.

Kind regards,
Anup

On Sun, Jul 14, 2013 at 8:07 PM, arun wrote:
> You could try this (if I understand it correctly):
> dat1$newres <- with(dat1,ave(var,id,year,FUN=function(x) any(x==1)*1))
> [...]
Re: [R] creating dummy variables
R is for dummies (like me, but I don't use dummy variables) or for the non-dummies, like all the experts who help us all the time... so dummy variables are not needed! :) QED.

On Sat, Apr 20, 2013 at 6:16 PM, Rolf Turner wrote:
> [...]
> Summary: Learn about and understand *factors*; forget about dummy variables.
[R] creating dummy variables
Hello R-users,

Below is a snippet of my data:

fid   crop year value
5_1_1 SWHE 1995   171
5_1_1 SWHE 1997   696
5_1_1 BARL 1996   114
5_1_1 BARL 1997   344
5_2_2 SWHE 1995   120
5_2_2 SWHE 1996   511
5_2_2 BARL 1996   239
5_2_2 BARL 1997   349

Here, I want to create dummy variables named after the contents of the column 'crop', so that the new variable 'SWHE' receives the value 1 if the column 'crop' contains 'SWHE' and 0 otherwise. So, I would have two new variables, SWHE and BARL, as below:

fid   crop year value SWHE BARL
5_1_1 SWHE 1995   171    1    0
5_1_1 SWHE 1997   696    1    0
5_1_1 BARL 1996   114    0    1
5_1_1 BARL 1997   344    0    1
5_2_2 SWHE 1995   120    1    0
5_2_2 SWHE 1996   511    1    0
5_2_2 BARL 1996   239    0    1
5_2_2 BARL 1997   349    0    1

Cheers,
Shyam
Nepal
Re: [R] creating dummy variables
Dummy variables are not needed in R.

Bert

Sent from my iPhone -- please excuse typos.

On Apr 20, 2013, at 11:23 AM, shyam basnet wrote:
> Here, I want to create dummy variables with the names of the content of a column 'crop' [...]
Re: [R] creating dummy variables
Hello Shyam,

if your data is stored in a variable called dataset, for example, the following code will create the desired dummy-coded variables and attach them to the dataset:

##
# init vars
SWHE=BARL <- vector(length=nrow(dataset))
SWHE[]=BARL[] <- 0  # initialize dummy-coded vars with all 0s
# fill in variables
SWHE[grep("SWHE", dataset$crop)] <- 1  # grep returns the indices where a match is found, see ?grep
BARL[grep("BARL", dataset$crop)] <- 1
# attach new dummy codes to dataset
dataset$SWHE <- SWHE
dataset$BARL <- BARL
##

Hope this helps,
Patrick

2013/4/20 shyam basnet:
> Here, I want to create dummy variables with the names of the content of a column 'crop' [...]
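If the goal really is explicit 0/1 columns, the same result can be had without naming each crop by hand. A minimal sketch, assuming the eight rows from the original post are in a data frame called dataset; it swaps grep() for an exact comparison with ==, which is my substitution and not part of the posted answer:

```r
# The snippet from the original post, rebuilt as a data frame.
dataset <- data.frame(
  fid   = rep(c("5_1_1", "5_2_2"), each = 4),
  crop  = rep(c("SWHE", "SWHE", "BARL", "BARL"), times = 2),
  year  = c(1995, 1997, 1996, 1997, 1995, 1996, 1996, 1997),
  value = c(171, 696, 114, 344, 120, 511, 239, 349),
  stringsAsFactors = FALSE
)

# One 0/1 column per distinct crop code. == matches the code exactly,
# whereas grep() would also hit codes that merely contain the pattern.
for (lev in unique(dataset$crop)) {
  dataset[[lev]] <- as.integer(dataset$crop == lev)
}
dataset
```

The loop picks up new crop codes automatically, so nothing has to change when the data grows beyond SWHE and BARL.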
Re: [R] creating dummy variables
On Apr 20, 2013, at 2:03 PM, Bert Gunter wrote:

> Dummy variables are not needed in R.
>
> Bert

Bert is correct on this point, but if you want to know how the regression functions in R do this behind the scenes then you could always look at:

?model.matrix   # where _some_ of the automagical stuff happens

model.matrix( ~ crop, data=dat[,"crop", drop=FALSE])
  (Intercept) cropSWHE
1           1        1
2           1        1
3           1        0
4           1        0
5           1        1
6           1        1
7           1        0
8           1        0
attr(,"assign")
[1] 0 1
attr(,"contrasts")
attr(,"contrasts")$crop
[1] "contr.treatment"

David Winsemius
Alameda, CA, USA
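To see both codings side by side, here is a small sketch using only the crop column from the original post. Dropping the intercept with - 1 is my addition; it makes model.matrix() return one indicator column per level, which is the layout the original poster asked for:

```r
# Only the crop column matters here; levels sort alphabetically (BARL, SWHE).
dat <- data.frame(crop = factor(rep(c("SWHE", "SWHE", "BARL", "BARL"), 2)))

# Default treatment coding: intercept plus one column, BARL is the baseline.
mm_treat <- model.matrix(~ crop, data = dat)

# Without the intercept every level gets its own indicator column.
mm_full <- model.matrix(~ crop - 1, data = dat)

colnames(mm_treat)
colnames(mm_full)
```

The full set of indicators is collinear with an intercept, which is why the default coding drops one level.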
Re: [R] creating dummy variables
Hi,

Why do you write that dummy variables are not needed in R? I would like you to explain it.

Thanks,
Eva

--- On Sun, 21/4/13, David Winsemius wrote:

From: David Winsemius
Subject: Re: [R] creating dummy variables
To: Bert Gunter
CC: r-help@R-project.org, shyam basnet
Date: Sunday, April 21, 2013 00:38

> On Apr 20, 2013, at 2:03 PM, Bert Gunter wrote:
>> Dummy variables are not needed in R.
> Bert is correct on this point, but if you what to know how the regression functions in R do this behind the scenes then you could always look at: ?model.matrix [...]
Re: [R] creating dummy variables
On Apr 20, 2013, at 3:56 PM, Eva Prieto Castro wrote:

> Hi,
> Why do you write that dummy variables are not needed in R?. I would like you explain it.

I suppose you might want individual instruction, but Rhelp was established with certain principles (expressed in the Posting Guide), one of which is that persons posting to Rhelp should have made a demonstrated effort on their own to study the offered documentation. You are not demonstrating that you have yet understood this principle.

-- 
David.

David Winsemius
Alameda, CA, USA
Re: [R] creating dummy variables
To all who ask about dummy variables in R:

Please, please read the "Statistical Models in R" section of An Introduction to R, or another tutorial (there are many on the web) on regression modeling in R. (For that matter, please read all of this or another basic R tutorial before posting here!) An excellent but somewhat terse and technical discussion can be found in the latest edition of Venables and Ripley's MASS, in the chapter on linear models.

I do not intend to (poorly) try to recapitulate what others have already (well) explained. Do Your Homework!

Documentation within R can be found in

?lm
?formula
?contrasts
?model.matrix

These may not make much sense unless you have done your homework first, or already understand the statistical issues.

-- Bert

On Sat, Apr 20, 2013 at 3:56 PM, Eva Prieto Castro wrote:
> Why do you write that dummy variables are not needed in R?. I would like you explain it.

-- 
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info: Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
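As a small companion to the ?contrasts pointer, the sketch below shows what the coding of a factor looks like and how the baseline level can be changed. The third level OATS is hypothetical and only there so that the contrast matrix has more than one column:

```r
# OATS is a made-up third level, added for illustration only.
crop <- factor(c("SWHE", "BARL", "OATS", "SWHE", "BARL", "OATS"))

# Default (treatment) coding: the first level, BARL, is the baseline and
# each remaining level gets its own 0/1 column.
contrasts(crop)

# relevel() changes which level acts as the baseline.
crop2 <- relevel(crop, ref = "SWHE")
contrasts(crop2)
```

This is the matrix R consults when it builds the design matrix for a model formula, so changing it changes how the coefficients are to be read, not the fit itself.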
Re: [R] creating dummy variables
On 21/04/13 10:56, Eva Prieto Castro wrote:

> Hi,
> Why do you write that dummy variables are not needed in R?. I would like you explain it.

As others have said --- do some self-study. But a brief answer is that in any reasonable modelling problem in which dummy variables might arise, R creates the dummy variables that it uses automagically, behind the scenes, from the *factors* whose levels correspond to the dummy variables.

Summary: Learn about and understand *factors*; forget about dummy variables.

    cheers,

        Rolf Turner
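A quick way to watch this happen is to fit a model with a factor on the right-hand side. This is only an illustrative sketch on the eight crop/value rows posted earlier in the thread; the choice of lm() and of this particular formula is mine:

```r
# crop and value from the data snippet posted earlier in the thread.
dat <- data.frame(
  crop  = factor(rep(c("SWHE", "SWHE", "BARL", "BARL"), 2)),
  value = c(171, 696, 114, 344, 120, 511, 239, 349)
)

# No dummy columns are created by hand; lm() expands the factor itself.
fit <- lm(value ~ crop, data = dat)

names(coef(fit))  # the indicator for SWHE shows up as a coefficient name
coef(fit)         # intercept = mean of BARL, cropSWHE = difference to it
```

The coefficient named cropSWHE is the effect of the automatically generated indicator, with BARL as the reference level.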
Re: [R] Creating dummy variables in r
On Jan 30, 2013, at 04:58 , Bert Gunter wrote:

> You almost never need dummy variables in R. R creates them automatically from factors given model and possibly contrasts specification.
> [...]
> I repeat: Do not create dummy variables by hand in R unless you have understood the above and have good reason to do so.

In this case it's a cutpoint-type situation, and the user might be excused for not wanting to deal with the mysteries of cut() (yet).

More importantly, the main issue here seems to be a lack of understanding of where new variables are located. I.e., if the data set is called dd, you need

dd$prev1 <- (etc)

and if you use attach(), do it _after_ modifying the data (or detach() and reattach). Otherwise, new variables end up in the global environment. (This is logical enough once you realize that the result of a computation does not necessarily fit into the dataset.)

By the way: You don't need ifelse(): as.numeric(ret1 = .5) or even just (ret1 = .5) works.

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
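The point about where the new variable lives can be shown in a few lines. In this sketch the data frame dd and its ret1 values are made up, and the comparison >= is an assumption, since the operator in the original post did not survive archiving:

```r
# Hypothetical returns; only the column name ret1 comes from the thread.
dd <- data.frame(ret1 = c(0.7, 0.2, 0.5, -0.1))

# Assumption: >= is used as the cutpoint comparison.
# Assigning through dd$ stores the result inside the data frame ...
dd$prev1 <- as.numeric(dd$ret1 >= 0.5)

# ... so it is listed by names() and written out on export.
names(dd)
out <- tempfile(fileext = ".csv")
write.csv(dd, out, row.names = FALSE)
names(read.csv(out))
```

A bare prev1 <- ... would instead create a free-standing vector in the workspace, which is why it was visible to the model but missing from names() and from the exported file.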
[R] Creating dummy variables in r
Hello,

Semi-new R user here and still learning the ropes. I am creating dummy variables for a dataset on stock prices in R. One dummy variable is called prev1 and is:

prev1 <- ifelse(ret1 = .5, 1, 0)

where ret1 is the previous day's return. The variable prev1 is created fine and works in my regression model and for running conditional statistics. However, when I call the names() function on the dataset, the freshly created variable (prev1) doesn't show up; also, when I export the dataset, the prev1 variable doesn't show up in the exported file.

Is there a way to make the variable show up both in the names() call and, more importantly, in the exported file? Or am I forced to create dummy variables elsewhere (much tougher)?

Thanks,
Joe
Re: [R] Creating dummy variables in r
You almost never need dummy variables in R. R creates them automatically from factors, given the model and possibly a contrasts specification.

?contrasts  ## for some technical details.

If you have not read An Introduction to R, do so now. Pay particular attention to the chapter on modeling and categorical variables.

You can also google around to find appropriate tutorials. Here is one:
http://www.ats.ucla.edu/stat/r/modules/dummy_vars.htm

I repeat: Do not create dummy variables by hand in R unless you have understood the above and have good reason to do so.

-- Bert

On Tue, Jan 29, 2013 at 7:21 PM, Joseph Norman Thomson wrote:
> However, when I call the names() function on the dataset the freshly created variable (prev1) doesn't show up; also, when I export the dataset the prev1 variable doesn't show up in the exported file. [...]
[R] Creating dummy variables
Hello R project,

I am an R beginner trying to create dummy variables to classify soil types. I have a column in my database called codtipo (type code in English) where the soil type is coded as

    1.1 to 1.4  arenosols (I have 4 types)
    2.1 to 2.3  calcisols
    4.1 to 4.4  fluvisols

and so on. To make dummy variables, I understand that I have to create a separate column for each soil type. For gipsisols:

    datos$gipsi=datos$codsuelo
    for (i in 1:length(datos$gipsi)){if(datos$codsuelo[i]>=5.1 & (datos$codsuelo[i]<=5.4){datos$gipsi[i]=1}else{0} }

For cambisols it should be:

    datos$cambi=datos$codsuelo
    for (i in 1:length(datos$cambi)){if(datos$codsuelo[i]>=3.1 & datos$codsuelo[i]>=3.3){datos$cambi[i]=1}else{0} }

and so on, but R answers that a necessary TRUE/FALSE value is missing. What can I do?

Thanks a lot!
Arantzazu Blanco Bernardeau
Dpto de Química Agrícola, Geología y Edafología
Universidad de Murcia-Campus de Espinardo

Date: Thu, 3 Jun 2010 06:51:42 -0700
From: lampria...@yahoo.com
To: jorism...@gmail.com
CC: r-help@r-project.org
Subject: Re: [R] ordinal variables

Thank you Joris, I'll have a look into the commands you sent me. They look convincing. I hope my students will also see them in a positive way (although I can force them to pretend that they have a positive attitude)!

Dr. Iasonas Lamprianou
Assistant Professor (Educational Research and Evaluation), Department of Education Sciences, European University-Cyprus, P.O. Box 22006, 1516 Nicosia, Cyprus. Tel.: +357-22-713178, Fax: +357-22-590539
Honorary Research Fellow, Department of Education, The University of Manchester, Oxford Road, Manchester M13 9PL, UK. Tel. 0044 161 275 3485
iasonas.lampria...@manchester.ac.uk

--- On Thu, 3/6/10, Joris Meys jorism...@gmail.com wrote:

See ?factor and ?as.factor. On ordered factors you can technically do a Spearman correlation without problem, apart from the fact that a Spearman test by definition cannot give exact p-values when ties are present.

    x <- sample(c("a","b","c","d","e"), 100, replace=TRUE)
    y <- sample(c("a","b","c","d","e"), 100, replace=TRUE)
    x.ordered <- factor(x, levels=c("e","b","a","d","c"), ordered=TRUE)
    x.ordered
    y.ordered <- factor(y, levels=c("e","b","a","d","c"), ordered=TRUE)
    y.ordered
    # as.numeric() extracts the integer codes of the ordered levels
    cor.test(as.numeric(x.ordered), as.numeric(y.ordered), method="spearman")
    require(pspearman)
    spearman.test(as.numeric(x.ordered), as.numeric(y.ordered))

R Commander has some menu options to deal with factors. R Commander also provides a scripting window. Please do your students a favor, and show them how to use those commands.

Cheers
Joris

On Thu, Jun 3, 2010 at 2:25 PM, Iasonas Lamprianou lampria...@yahoo.com wrote:

Dear colleagues, I teach statistics using SPSS. I want to use R instead. I hit on one problem and I need some quick advice. When I want to work with ordinal variables, in SPSS I can compute the median or create a bar chart or compute a Spearman correlation with no problems. In R, if I read the ordinal variable as numeric, then I cannot do a barplot because I miss the category names. If I read the variables as characters, then I cannot run a Spearman correlation. How can I read a variable as numeric, still have the chance to assign value labels, and be able to get a table of frequencies etc.? I want to be able to do all these things in R Commander. My students will probably be scared away if I try anything other than R Commander (just writing commands will not make them happy). I hope I am not asking for too much. Hopefully there is a way.

--
Joris Meys
Statistical Consultant
Ghent University, Faculty of Bioscience Engineering, Department of Applied mathematics, biometrics and process control
Coupure Links 653, B-9000 Gent
tel : +32 9 264 59 87
joris.m...@ugent.be
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
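The question quoted above (numeric codes that still carry value labels, a frequency table, a median and a Spearman correlation) can be sketched with an ordered factor. This is only a minimal illustration: the vectors `raw` and `other` and the four labels are invented for the example, not data from the thread.

```r
# Hypothetical survey item coded 1-4 in the raw data (values invented)
raw   <- c(2, 4, 1, 3, 3, 2, 4, 4, 1, 3)
other <- c(1, 4, 2, 3, 2, 2, 3, 4, 1, 4)

# Attach value labels while keeping the ordering of the categories
agree <- factor(raw, levels = 1:4,
                labels = c("never", "sometimes", "often", "always"),
                ordered = TRUE)

# Frequency table that carries the category names
freq <- table(agree)
# barplot(freq)  # would draw the bar chart with the labels on the axis

# The integer codes give back the numeric view when a function needs numbers
codes <- as.integer(agree)

# type = 1 returns an observed value, so the median maps back to a label
med_label <- levels(agree)[quantile(codes, 0.5, type = 1)]

# Spearman correlation computed on the codes
rho <- cor(codes, other, method = "spearman")
```

Keeping the factor as the stored variable and deriving the codes only where needed means the labels are never lost.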
Re: [R] Creating dummy variables
Was that the original code that you ran? There appear to be several mistakes in it:

1. In the gipsisol code, there is one parenthesis too many.
2. In the cambisol code both comparison signs point in the same direction; you probably want one '>' and one '<'.

My general suggestion would be to skip the loops altogether and vectorize your code:

    datos$cambi=datos$codsuelo
    datos$cambi[datos$codsuelo >= 3.1 & datos$codsuelo <= 3.3] <- 1

Another source of your error could be that datos$codsuelo is not numeric. What does class(datos$codsuelo) say?

HTH
Jannis
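A minimal sketch of the vectorized approach suggested in this reply, producing a strict 0/1 column. The data frame below is a made-up stand-in for the poster's `datos`; the code values, including the NA, are assumptions for illustration. The NA is there because a missing code makes `if()` fail inside a loop, whereas the vectorized comparison simply carries it through.

```r
# Hypothetical stand-in for the poster's data frame; the codes are invented
datos <- data.frame(codsuelo = c(1.1, 3.1, 3.2, 5.4, NA, 3.3))

# The comparison yields TRUE/FALSE (or NA) per row; as.integer() maps it to 1/0
datos$cambi <- as.integer(datos$codsuelo >= 3.1 & datos$codsuelo <= 3.3)
datos$gipsi <- as.integer(datos$codsuelo >= 5.1 & datos$codsuelo <= 5.4)

# Rows with a missing code stay NA instead of stopping with an error
n_missing <- sum(is.na(datos$cambi))
```

Starting from the logical comparison rather than from a copy of `codsuelo` guarantees that rows outside the range get 0 and not their original code.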
Re: [R] Creating dummy variables
Hey, thanks. I solved it already; it had more mistakes, as you see :S

Bye
Arantzazu Blanco Bernardeau
Dpto de Química Agrícola, Geología y Edafología
Universidad de Murcia-Campus de Espinardo

Date: Thu, 3 Jun 2010 14:40:30 +
From: bt_jan...@yahoo.de
Subject: AW: [R] Creating dummy variables
To: r-help@r-project.org; aramu...@hotmail.com
Re: [R] Creating dummy variables
Do **NOT** use dummy variables in R. R's modeling functions take care of this themselves using factors.

You say you are a beginner. OK, so begin **properly** -- by reading "An Introduction to R". Chapter 11 on Statistical Models in R was written precisely to help people like you learn what to do and avoid asking inappropriate questions like this on this list.

Bert Gunter
Genentech Nonclinical Biostatistics

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Arantzazu Blanco Bernardeau
Sent: Thursday, June 03, 2010 10:04 AM
To: bt_jan...@yahoo.de; r-help@r-project.org
Subject: Re: [R] Creating dummy variables
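As a rough sketch of the factor-based route recommended in this reply, the numeric soil codes can be turned into a single factor and handed to the modeling functions. It assumes that the integer part of the code identifies the soil group (1 arenosol, 2 calcisol, 3 cambisol, 4 fluvisol, 5 gipsisol, as described in the question); the column name `tipo` and the sample codes are hypothetical.

```r
# Hypothetical soil codes; the integer part identifies the soil group
datos <- data.frame(codsuelo = c(1.1, 1.4, 2.2, 3.1, 3.3, 4.4, 5.1, 5.4))

# One factor replaces the whole set of hand-made dummy columns
datos$tipo <- factor(floor(datos$codsuelo), levels = 1:5,
                     labels = c("arenosol", "calcisol", "cambisol",
                                "fluvisol", "gipsisol"))

# model.matrix() shows the indicator columns R builds internally for a model
X <- model.matrix(~ tipo, data = datos)
```

A formula such as `y ~ tipo` in `lm()` would then use the same indicator columns without any manual coding.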
[R] Creating Dummy Variables in R
Hi,

I am trying to create a set of dummy variables to use within a multiple linear regression and am unable to find the code in the manuals. For example, I have:

                      Clarity
    Price   Weight    IF   VVS1   VVS2
      500      8       1     0      0
     1000      5.2     0     0      1
      864      3       0     1      0
      340      2.6     0     0      1
       90      0.5     1     0      0
      450      2.3     0     1      0

where price depends on weight (a single value in each observation) and clarity (split into three levels: IF, VVS1, VVS2). I am having trouble telling the program that clarity is a set of 3 dummy variables and keep getting error messages. What is the correct way?

Any help is greatly appreciated.
Matthew
Re: [R] Creating Dummy Variables in R
On 12/16/2009 03:58 PM, whitaker m. (mw1006) wrote:

[...] I am having trouble telling the program that clarity is a set of 3 dummy variables and keep getting error messages, what is the correct way?

Without an example of your code, it's a bit difficult. But it might be easier to use one variable Clarity with three possible values ("IF", "VVS1", "VVS2"), defined as a factor.

    lm(Price ~ Weight + Clarity)

should then do the trick (unless you explicitly want to use a different dummy coding than the default).

Stephan
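If the data already sit in three 0/1 columns, one possible way to collapse them into the single factor suggested here is sketched below. The data frame `d` reproduces the six observations from the question; using `max.col()` for the conversion is my own choice for the illustration, not something proposed in the thread.

```r
# The six observations from the question, with Clarity stored as 0/1 columns
d <- data.frame(Price  = c(500, 1000, 864, 340, 90, 450),
                Weight = c(8, 5.2, 3, 2.6, 0.5, 2.3),
                IF     = c(1, 0, 0, 0, 1, 0),
                VVS1   = c(0, 0, 1, 0, 0, 1),
                VVS2   = c(0, 1, 0, 1, 0, 0))

lev <- c("IF", "VVS1", "VVS2")

# max.col() returns, for each row, the index of the column holding the 1
d$Clarity <- factor(lev[max.col(as.matrix(d[, lev]), ties.method = "first")],
                    levels = lev)

# The factor then enters the regression as one term
fit <- lm(Price ~ Weight + Clarity, data = d)
```

This relies on every row having exactly one 1 among the three columns; rows that violate this would need to be checked first.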
Re: [R] Creating Dummy Variables in R
On Wed, 16 Dec 2009, whitaker m. (mw1006) wrote:

[...] I am having trouble telling the program that clarity is a set of 3 dummy variables and keep getting error messages, what is the correct way?

You should code the categorical variable Clarity as a factor so that R knows that it is categorical and can deal with it appropriately in subsequent computations such as summary() or lm(). Thus, I would recommend storing your data as

    dat <- data.frame(
      Price = c(500, 1000, 864, 340, 90, 450),
      Weight = c(8, 5.2, 3, 2.6, 0.5, 2.3),
      Clarity = factor(c("IF", "VVS1", "VVS2")[c(1, 3, 2, 3, 1, 2)]))

which yields, e.g.,

    R> summary(dat)
         Price            Weight       Clarity
     Min.   :  90.0   Min.   :0.500   IF  :2
     1st Qu.: 367.5   1st Qu.:2.375   VVS1:2
     Median : 475.0   Median :2.800   VVS2:2
     Mean   : 540.7   Mean   :3.600
     3rd Qu.: 773.0   3rd Qu.:4.650
     Max.   :1000.0   Max.   :8.000

and then you can also do

    R> lm(Price ~ Weight + Clarity, data = dat)

    Call:
    lm(formula = Price ~ Weight + Clarity, data = dat)

    Coefficients:
    (Intercept)       Weight  ClarityVVS1  ClarityVVS2
         -45.05        80.01       490.02       403.00

or, if you wish to choose a different coding,

    R> lm(Price ~ 0 + Weight + Clarity, data = dat)

    Call:
    lm(formula = Price ~ 0 + Weight + Clarity, data = dat)

    Coefficients:
         Weight    ClarityIF  ClarityVVS1  ClarityVVS2
          80.01       -45.05       444.97       357.95

Some further reading of introductory material on linear regression in R would be useful. Also look at ?lm, ?factor, ?model.matrix, ?contrasts etc.

hth,
Z
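To see which dummy columns stand behind the two codings in this reply, `model.matrix()` can be called with the same formulas. This is a small sketch that rebuilds the six-row data set from the thread; the object names `X1` and `X0` are only illustrative.

```r
dat <- data.frame(
  Price   = c(500, 1000, 864, 340, 90, 450),
  Weight  = c(8, 5.2, 3, 2.6, 0.5, 2.3),
  Clarity = factor(c("IF", "VVS1", "VVS2")[c(1, 3, 2, 3, 1, 2)]))

# With an intercept: the first level (IF) is the reference, two dummies remain
X1 <- model.matrix(~ Weight + Clarity, data = dat)

# Without an intercept: one indicator column per level of Clarity
X0 <- model.matrix(~ 0 + Weight + Clarity, data = dat)

colnames(X1)
colnames(X0)
```

The column names correspond to the coefficient names printed by `lm()` for the respective formula.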
Re: [R] Creating Dummy Variables in R
Is your variable Clarity a categorical variable with 4 levels, and hence the need for k-1 (3) dummies? Your error may be the result of creating k instead of k-1 dummies, but I can't be sure from the example.

In R, you don't have to (unless you really want to) explicitly create separate variables. You can use the internal contrast functions. See ?contr.treatment, which is dummy coding by default. You can specify which group is the reference group. Alternatively, if you prefer effects coding, see ?contr.sum. There are others as well.

Tom Fletcher

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of whitaker m. (mw1006)
Sent: Wednesday, December 16, 2009 8:59 AM
To: r-help@r-project.org
Subject: [R] Creating Dummy Variables in R
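A short sketch of the contrast functions mentioned in this reply, for a three-level factor. The factor `clarity` mirrors the six observations from the question; choosing VVS2 as the alternative reference group is an arbitrary example.

```r
clarity <- factor(c("IF", "VVS2", "VVS1", "VVS2", "IF", "VVS1"))

# Dummy (treatment) coding: k - 1 columns, the first level is the reference
ct_default <- contr.treatment(3)

# The same coding with the third level as the reference group
ct_base3 <- contr.treatment(3, base = 3)

# Effects (sum-to-zero) coding
cs <- contr.sum(3)

# Two ways to apply this to a factor: move the reference level ...
clarity_ref <- relevel(clarity, ref = "VVS2")
# ... or attach a contrast matrix to the factor itself
contrasts(clarity) <- contr.sum(3)
```

Modeling functions such as `lm()` pick up the contrasts attached to the factor, so no separate dummy columns are needed.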