Re: [R] Having trouble converting a dataframe of character vectors to factors
Hi Bert,

Thanks for drawing my attention to the simplify argument and for the examples. I understand now. Thanks.

Dan

-----Original Message-----
From: Bert Gunter [mailto:gunter.ber...@gene.com]
Sent: Wednesday, February 20, 2013 4:25 PM
To: Lopez, Dan
Cc: R help (r-help@r-project.org)
Subject: Re: [R] Having trouble converting a dataframe of character vectors to factors

Please re-read ?sapply and pay particular attention to the simplify argument. The following should help explain the issues:

> z <- data.frame(a=letters[1:3],b=letters[4:6],stringsAsFactors=FALSE)
> sapply(z,class)
          a           b
"character" "character"
> z1 <- sapply(z,as.factor)
> sapply(z1,class)
          a           b           c           d           e           f
"character" "character" "character" "character" "character" "character"
> z2 <- sapply(z,factor, simplify = FALSE)
> sapply(z2,class)
       a        b
"factor" "factor"
> z3 <- lapply(z,factor)
> sapply(z3,class)
       a        b
"factor" "factor"
> z3
$a
[1] a b c
Levels: a b c

$b
[1] d e f
Levels: d e f

## Note that both z2 and z3 are lists, and would have to be converted back to data frames.

-- Bert

On Wed, Feb 20, 2013 at 4:09 PM, Lopez, Dan lopez...@llnl.gov wrote:

R Experts,

I have a dataframe made up of character vectors--these are results from survey questions. I need to convert them to factors.

I tried the following which did not work:
scs2<-sapply(scs2,as.factor)
also this didn't work:
scs2<-sapply(scs2,function(x) as.factor(x))

After doing either of above I end up with
> str(scs2)
 chr [1:10, 1:10] "very important" "very important" "very important" "very important" ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:10] "Q1_1" "Q1_2" "Q1_3" "Q1_4" ...
> class(scs2)
"matrix"

But when I do it one at a time it works:
scs2$Q1_1<-as.factor(scs2$Q1_1)
scs2$Q1_2<- as.factor(scs2$Q1_2)

What am I doing wrong? How do I accomplish this with sapply or similar function?
Data for reproducibility:

scs2<-structure(list(
Q1_1 = c("very important", "very important", "very important", "very important", "very important", "very important", "very important", "somewhat important", "important", "very important"),
Q1_2 = c("important", "somewhat important", "very important", "important", "important", "very important", "somewhat important", "somewhat important", "very important", "very important"),
Q1_3 = c("very important", "important", "very important", "very important", "important", "very important", "very important", "somewhat important", "not important", "important"),
Q1_4 = c("very important", "important", "very important", "very important", "important", "important", "important", "very important", "somewhat important", "important"),
Q1_5 = c("very important", "not important", "important", "very important", "not important", "important", "somewhat important", "important", "somewhat important", "not important"),
Q1_6 = c("very important", "not important", "important", "very important", "somewhat important", "very important", "very important", "very important", "important", "important"),
Q1_7 = c("very important", "somewhat important", "important", "somewhat important", "important", "important", "very important", "very important", "somewhat important", "not important"),
Q2 = c("Somewhat", "Very Much", "Somewhat", "Very Much", "Very Much", "Very Much", "Very Much", "Very Much", "Very Much", "Very Much"),
Q3 = c("yes", "yes", "yes", "yes", "yes", "yes", "yes", "yes", "yes", "yes"),
Q4 = c("None", "None", "None", "None", "Confirmed Field of Study", "Confirmed Field of Study", "Confirmed Field of Study", "None", "None", "None")),
.Names = c("Q1_1", "Q1_2", "Q1_3", "Q1_4", "Q1_5", "Q1_6", "Q1_7", "Q2", "Q3", "Q4"),
row.names = c(78L, 46L, 80L, 196L, 188L, 197L, 39L, 195L, 172L, 110L),
class = "data.frame")

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info: Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
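The simplify behaviour described in this message can be checked with a short sketch. It assumes a small made-up data frame; the names `z`, `m`, and `z_fixed` are illustrative only and are not the survey data from the thread. sapply() collapses the list of factors into a character matrix, while assigning the lapply() result into `z_fixed[]` keeps the data frame and gives factor columns.

```r
# Toy data frame of character columns (illustrative, not the survey data)
z <- data.frame(a = letters[1:3], b = letters[4:6], stringsAsFactors = FALSE)

# sapply() simplifies the list of factors to a matrix, and the factor
# class is lost along the way: the result is a character matrix
m <- sapply(z, as.factor)
is.matrix(m)      # TRUE
is.character(m)   # TRUE

# lapply() always returns a list; assigning into z_fixed[] replaces the
# columns in place, so the object stays a data frame
z_fixed <- z
z_fixed[] <- lapply(z_fixed, factor)
sapply(z_fixed, class)
```

The empty brackets in `z_fixed[] <-` are what keep the result a data frame: the list elements are written into the existing columns instead of replacing the whole object.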
Re: [R] Having trouble converting a dataframe of character vectors to factors
> scs2<-data.frame(lapply(scs2, factor))

Calling data.frame() on the output of lapply() can result in changing column names and will drop attributes that the input data.frame may have had. I prefer to modify the original data.frame instead of making a new one from scratch to avoid these problems.

Also, calling factor() on a factor will drop any unused levels, which you may not want to do. Calling as.factor will not.

Compare the following three methods

  f1 <- function (dataFrame) {
      dataFrame[] <- lapply(dataFrame, factor)
      dataFrame
  }
  f2 <- function (dataFrame) {
      dataFrame[] <- lapply(dataFrame, as.factor)
      dataFrame
  }
  f3 <- function (dataFrame) {
      data.frame(lapply(dataFrame, factor))
  }

on the following data.frame

  x <- data.frame(stringsAsFactors=FALSE, check.names=FALSE,
         "No/Yes" = factor(c("Yes","Yes","Yes"), levels=c("No","Yes")),
         Size = ordered(c("Small","Large","Medium"), levels=c("Small","Medium","Large")),
         Name = c("Adam","Bill","Chuck"))
  attr(x, "Date") <- as.POSIXlt("2013-02-21")

  > str(x)
  'data.frame':   3 obs. of  3 variables:
   $ No/Yes: Factor w/ 2 levels "No","Yes": 2 2 2
   $ Size  : Ord.factor w/ 3 levels "Small"<"Medium"<..: 1 3 2
   $ Name  : chr  "Adam" "Bill" "Chuck"
   - attr(*, "Date")= POSIXlt, format: "2013-02-21"
  > str(f1(x)) # drops unused levels
  'data.frame':   3 obs. of  3 variables:
   $ No/Yes: Factor w/ 1 level "Yes": 1 1 1
   $ Size  : Ord.factor w/ 3 levels "Small"<"Medium"<..: 1 3 2
   $ Name  : Factor w/ 3 levels "Adam","Bill",..: 1 2 3
   - attr(*, "Date")= POSIXlt, format: "2013-02-21"
  > str(f2(x))
  'data.frame':   3 obs. of  3 variables:
   $ No/Yes: Factor w/ 2 levels "No","Yes": 2 2 2
   $ Size  : Ord.factor w/ 3 levels "Small"<"Medium"<..: 1 3 2
   $ Name  : Factor w/ 3 levels "Adam","Bill",..: 1 2 3
   - attr(*, "Date")= POSIXlt, format: "2013-02-21"
  > str(f3(x)) # mangles column names, drops unused levels, drops Date attribute
  'data.frame':   3 obs. of  3 variables:
   $ No.Yes: Factor w/ 1 level "Yes": 1 1 1
   $ Size  : Ord.factor w/ 3 levels "Small"<"Medium"<..: 1 3 2
   $ Name  : Factor w/ 3 levels "Adam","Bill",..: 1 2 3

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Mark Lamias
Sent: Wednesday, February 20, 2013 6:51 PM
To: Daniel Lopez; R help (r-help@r-project.org)
Subject: Re: [R] Having trouble converting a dataframe of character vectors to factors

How about this?

scs2<-data.frame(lapply(scs2, factor))

From: Lopez, Dan lopez...@llnl.gov
To: R help (r-help@r-project.org) r-help@r-project.org
Sent: Wednesday, February 20, 2013 7:09 PM
Subject: [R] Having trouble converting a dataframe of character vectors to factors

R Experts,

I have a dataframe made up of character vectors--these are results from survey questions. I need to convert them to factors.

I tried the following which did not work:
scs2<-sapply(scs2,as.factor)
also this didn't work:
scs2<-sapply(scs2,function(x) as.factor(x))

After doing either of above I end up with
> str(scs2)
 chr [1:10, 1:10] "very important" "very important" "very important" "very important" ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:10] "Q1_1" "Q1_2" "Q1_3" "Q1_4" ...
> class(scs2)
"matrix"

But when I do it one at a time it works:
scs2$Q1_1<-as.factor(scs2$Q1_1)
scs2$Q1_2<- as.factor(scs2$Q1_2)

What am I doing wrong? How do I accomplish this with sapply or similar function?
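A variant of the in-place approach in this message is to convert only the character columns, so that numeric columns and existing factors (including their unused levels) are left alone. This is a sketch under stated assumptions, not something proposed in the thread: the data frame `df` and the helper name `is_chr` are made up for illustration.

```r
# Made-up data frame mixing an integer, a character and a factor column
df <- data.frame(
  id     = 1:3,
  answer = c("yes", "no", "yes"),
  size   = factor(c("S", "S", "S"), levels = c("S", "L")),  # "L" is unused
  stringsAsFactors = FALSE
)

# Logical index of the character columns only
is_chr <- vapply(df, is.character, logical(1))

# Replace just those columns in place; names and attributes of df are kept
df[is_chr] <- lapply(df[is_chr], factor)

str(df)
```

Because the existing factor column is never passed through factor(), its unused level "L" is retained without having to choose between factor() and as.factor().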
Re: [R] Having trouble converting a dataframe of character vectors to factors
Great point, William. I agree your approach is the one to take to preserve attributes. Thanks for following up.

From: William Dunlap wdun...@tibco.com
Sent: Thursday, February 21, 2013 11:32 AM
Subject: RE: [R] Having trouble converting a dataframe of character vectors to factors

> scs2<-data.frame(lapply(scs2, factor))

Calling data.frame() on the output of lapply() can result in changing column names and will drop attributes that the input data.frame may have had. I prefer to modify the original data.frame instead of making a new one from scratch to avoid these problems.

Also, calling factor() on a factor will drop any unused levels, which you may not want to do. Calling as.factor will not.
Re: [R] Having trouble converting a dataframe of character vectors to factors
Hi Bill,

Great info. The problem is that what was originally given to me looks like DPUT1 below (random sample of 25). This is the only format they can give it to me in, and the data already looks molten. So I applied reshape2::dcast, which resulted in a dataframe made of character vectors, except for the first column, which is an integer vector.

After dropping columns full of blanks and reordering columns, I figured I needed factors to accomplish my goal (refer below) and converted everything to factors with:

x2[,-1]<-as.data.frame(lapply(x[,-1],as.factor))

and ended up with DPUT2 below (random sample of 25).

Now, after reading your last email, I figure I've done well, since no attributes got dropped, no levels got dropped (I just need to add some in because they couldn't be derived from the original dataframe), and the column names seem fine.

Now I have a new problem, which is how to reorder levels in a dataframe and possibly add some unused ones. After seeing the contents using Hmisc::contents, I figured the next logical step is to handle like vectors a chunk at a time. For example, subsetting to grepl("Q1_",names(scs.c2)) gives these vectors, which all have identical levels except for one:

$Q1_1 thru $Q1_7 except $Q1_3
[1] "" "important" "not important" "somewhat important" "very important"

$Q1_3
[1] "important" "not important" "somewhat important" "very important"

#So I tried this, which had no effect
keepcols<- grepl("Q1_",names(scs.c2))
levels(scs.c2[,keepcols])<-list(NoResp="",NotImportant="not important",SomewhatImpt="somewhat important",Important="important",VeryImpt="very important")

#then this, which also failed. It coerced a bunch of NA's and turned the vectors back to character vectors
scs.c2[,keepcols]<-sapply(scs.c2[,keepcols],function(x) factor(x,levels(x)[c(NoResp="",NotImportant="not important",SomewhatImpt="somewhat important",Important="important",VeryImpt="very important")]))

Mind you, I can easily do this in MS Excel, and that is probably what I am going to break down and do fairly soon.
But I wanted to give this a good solid shot in R because I want to learn to handle these situations in R. I've been using R for almost a year.

__
ADDITIONAL BACKGROUND

MY GOAL
I ultimately want to get started with some basic correlation analysis for some of the columns. Taking your example (slightly modified), I hope to be able to do this:

xx <- data.frame(stringsAsFactors=FALSE, check.names=FALSE,
        "No/Yes" = factor(c("Yes","No","No","No"), levels=c("No","Yes")),
        Size = ordered(c("Small","Large","Medium","Medium"), levels=c("Small","Medium","Large")),
        Name = c("Adam","Bill","Chuck","Larry"))
> cor(sapply(xx[,1:2],as.numeric))
            No/Yes       Size
No/Yes       1.000 -0.8164966
Size    -0.8164966      1.000

DPUT1
structure(list(
svaID = c(771L, 771L, 775L, 775L, 774L, 776L, 774L, 771L, 771L, 771L, 771L, 774L, 774L, 775L, 765L, 775L, 765L, 775L, 771L, 777L, 775L, 771L, 774L, 776L, 776L),
question = structure(c(19L, 12L, 23L, 3L, 10L, 36L, 25L, 1L, 30L, 7L, 21L, 13L, 16L, 32L, 6L, 5L, 18L, 19L, 14L, 2L, 2L, 9L, 37L, 28L, 24L),
.Label = c("Q1", "Q1_1", "Q1_2", "Q1_3", "Q1_4", "Q1_5", "Q1_6", "Q1_7", "Q10", "Q11", "Q12", "Q13", "Q14", "Q15", "Q16", "Q17", "Q17_1", "Q17_2", "Q17_3", "Q17_4", "Q17_5", "Q18", "Q19", "Q2", "Q20", "Q3", "Q4", "Q5", "Q6", "Q6_A_1", "Q6_A_2", "Q6_A_3", "Q6_A_4", "Q6_A_5", "Q7", "Q8", "Q9"), class = "factor"),
answer = structure(c(11L, 29L, 29L, 26L, 29L, 29L, 1L, 1L, 1L, 13L, 11L, 1L, 1L, 1L, 26L, 26L, 11L, 11L, 29L, 13L, 13L, 29L, 29L, 29L, 27L),
.Label = c("", "1", "2", "3", "4", "5", "Change of College/University", "Change of Field of Study", "Confirmed Field of Study", "did not meet expectations", "exceeded expectations", "Family/Friend", "important", "Live Locally", "LLNL Contact", "LLNL Housing page", "Local Newspaper", "met expectations", "no", "None", "Not at All", "not important", "Pursue an Advanced Degree", "Somewhat", "somewhat important", "very important", "Very Much", "Web", "yes"), class = "factor")),
.Names = c("svaID", "question", "answer"),
row.names = c(68L, 62L, 147L, 113L, 97L, 168L, 111L, 45L, 51L, 43L, 70L, 100L, 108L, 127L, 5L, 115L, 30L, 142L, 64L, 186L, 112L, 59L, 95L, 160L, 157L),
class = "data.frame")

DPUT2
structure(list(
svaID = c(765L, 771L, 774L, 775L, 776L, 777L, 778L, 779L, 782L, 783L, 786L, 788L, 789L, 790L, 791L, 793L, 794L, 795L, 797L, 801L, 803L, 804L, 805L, 807L, 808L),
Q1_1 = structure(c(5L, 5L, 5L, 2L, 5L, 2L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 2L, 5L, 5L, 5L, 5L, 5L, 5L, 2L, 2L, 2L), .Label = c("", "important", "not important", "somewhat important", "very important"), class = "factor"),
Q1_2 = structure(c(2L, 5L, 2L, 5L, 2L, 4L, 3L, 5L, 4L, 2L, 2L, 5L, 2L, 3L, 5L, 2L, 2L, 5L, 5L, 5L, 5L, 2L, 1L, 2L, 3L), .Label = c("", "important", "not important", "somewhat important", "very important"), class = "factor"),
Q1_3 = structure(c(4L, 4L, 4L, 4L, 4L, 1L, 1L, 4L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 4L, 4L, 1L, 4L, 4L, 4L, 4L, 1L, 4L), .Label = c("important", "not important", "somewhat important", "very important"), class = "factor"),
Q1_4 = structure(c(5L,
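The correlation goal in this message depends on the integer codes behind each factor, and those codes follow the level order. A small sketch of why the level order has to be fixed first, using the size values from the example above; the variable names `sizes`, `alphabetical`, and `ordered_sizes` are illustrative assumptions.

```r
sizes <- c("Small", "Large", "Medium", "Medium")

# Default levels are sorted alphabetically: Large, Medium, Small
alphabetical <- factor(sizes)
as.integer(alphabetical)    # 3 1 2 2

# Explicit levels give codes that follow the intended order
ordered_sizes <- factor(sizes, levels = c("Small", "Medium", "Large"),
                        ordered = TRUE)
as.integer(ordered_sizes)   # 1 3 2 2

# The two codings are not interchangeable, so a correlation computed
# from as.numeric() depends on which level order was in effect
cor(as.integer(alphabetical), as.integer(ordered_sizes))
```

For survey scales such as "not important" through "very important", this is the reason to set the levels explicitly before calling as.numeric() on the columns.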
Re: [R] Having trouble converting a dataframe of character vectors to factors
> #So I tried this which had no effect
> keepcols<- grepl("Q1_",names(scs.c2))
> levels(scs.c2[,keepcols])<-list(NoResp="",NotImportant="not important",SomewhatImpt="somewhat important",Important="important",VeryImpt="very important")
> #then this which also failed. It coerced a bunch of NA's and turned the vectors back to character vectors
> scs.c2[,keepcols]<-sapply(scs.c2[,keepcols],function(x) factor(x,levels(x)[c(NoResp="",NotImportant="not important",SomewhatImpt="somewhat important",Important="important",VeryImpt="very important")]))

First, to make a factor variable with a given set of levels, use
  factor(x, levels=yourLevels)
(and not levels(x) <- yourLevels).

Also, to change a character vector to a factor with levels that are different than the values in the character vector, I would use the levels and labels arguments to factor. E.g.,
  > x <- c("i", "iii")
  > factor(x, levels=c("i","ii","iii"), labels=c("One","Two","Three"))
  [1] One   Three
  Levels: One Two Three

I haven't tried your complete example, but I would not use sapply() when producing something you will want to convert to the columns of a data.frame. Use lapply() instead.

I generally use 'for' loops to process the columns of a data.frame one at a time. It is easy to understand, is quick enough, and may even reduce memory usage. E.g., instead of
  keepcols <- grepl("Q1_", names(scs.c2))
  scs.c2[, keepcols] <- lapply(scs.c2[, keepcols], function(x)factor(x,levels=c(...)))
try
  for(iCol in grep("Q1_", names(scs.c2))) {
      scs.c2[, iCol] <- factor(scs.c2[, iCol], levels=c(...))
  }

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: Lopez, Dan [mailto:lopez...@llnl.gov]
Sent: Thursday, February 21, 2013 2:51 PM
To: William Dunlap; Mark Lamias; R help (r-help@r-project.org)
Subject: RE: [R] Having trouble converting a dataframe of character vectors to factors
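The advice in this reply (set levels and labels through factor(), one column at a time in a 'for' loop) can be put together as a small runnable sketch. The level and label names come from the thread; the three-row data frame `scs` and the names `old_levels` and `new_labels` are made up for illustration and are not the real survey data.

```r
# Desired order of the existing values, and the labels to give them
old_levels <- c("", "not important", "somewhat important",
                "important", "very important")
new_labels <- c("NoResp", "NotImportant", "SomewhatImpt",
                "Important", "VeryImpt")

# Made-up stand-in for the survey data; columns start out as factors
scs <- data.frame(
  Q1_1 = c("very important", "important", ""),
  Q1_3 = c("important", "not important", "very important"),
  Q2   = c("Somewhat", "Very Much", "Somewhat"),
  stringsAsFactors = TRUE
)

# Rebuild each Q1_ column with the full, ordered set of levels.
# Values absent from a column (e.g. "" in Q1_3) become unused levels.
for (col in grep("^Q1_", names(scs), value = TRUE)) {
  scs[[col]] <- factor(scs[[col]], levels = old_levels, labels = new_labels)
}

levels(scs$Q1_3)
table(scs$Q1_3)
```

Every Q1_ column ends up with the same five levels in the same order, including levels that a given column never uses, while Q2 is left untouched.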
[R] Having trouble converting a dataframe of character vectors to factors
R Experts,

I have a dataframe made up of character vectors--these are results from survey questions. I need to convert them to factors.

I tried the following which did not work:
scs2<-sapply(scs2,as.factor)
also this didn't work:
scs2<-sapply(scs2,function(x) as.factor(x))

After doing either of above I end up with
> str(scs2)
 chr [1:10, 1:10] "very important" "very important" "very important" "very important" ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:10] "Q1_1" "Q1_2" "Q1_3" "Q1_4" ...
> class(scs2)
"matrix"

But when I do it one at a time it works:
scs2$Q1_1<-as.factor(scs2$Q1_1)
scs2$Q1_2<- as.factor(scs2$Q1_2)

What am I doing wrong? How do I accomplish this with sapply or similar function?

Data for reproducibility:

scs2<-structure(list(
Q1_1 = c("very important", "very important", "very important", "very important", "very important", "very important", "very important", "somewhat important", "important", "very important"),
Q1_2 = c("important", "somewhat important", "very important", "important", "important", "very important", "somewhat important", "somewhat important", "very important", "very important"),
Q1_3 = c("very important", "important", "very important", "very important", "important", "very important", "very important", "somewhat important", "not important", "important"),
Q1_4 = c("very important", "important", "very important", "very important", "important", "important", "important", "very important", "somewhat important", "important"),
Q1_5 = c("very important", "not important", "important", "very important", "not important", "important", "somewhat important", "important", "somewhat important", "not important"),
Q1_6 = c("very important", "not important", "important", "very important", "somewhat important", "very important", "very important", "very important", "important", "important"),
Q1_7 = c("very important", "somewhat important", "important", "somewhat important", "important", "important", "very important", "very important", "somewhat important", "not important"),
Q2 = c("Somewhat", "Very Much", "Somewhat", "Very Much", "Very Much", "Very Much", "Very Much", "Very Much", "Very Much", "Very Much"),
Q3 = c("yes", "yes", "yes", "yes", "yes", "yes", "yes", "yes", "yes", "yes"),
Q4 = c("None", "None", "None", "None", "Confirmed Field of Study", "Confirmed Field of Study", "Confirmed Field of Study", "None", "None", "None")),
.Names = c("Q1_1", "Q1_2", "Q1_3", "Q1_4", "Q1_5", "Q1_6", "Q1_7", "Q2", "Q3", "Q4"),
row.names = c(78L, 46L, 80L, 196L, 188L, 197L, 39L, 195L, 172L, 110L),
class = "data.frame")
Re: [R] Having trouble converting a dataframe of character vectors to factors
Please re-read ?sapply and pay particular attention to the simplify argument. The following should help explain the issues:

> z <- data.frame(a=letters[1:3],b=letters[4:6],stringsAsFactors=FALSE)
> sapply(z,class)
          a           b
"character" "character"
> z1 <- sapply(z,as.factor)
> sapply(z1,class)
          a           b           c           d           e           f
"character" "character" "character" "character" "character" "character"
> z2 <- sapply(z,factor, simplify = FALSE)
> sapply(z2,class)
       a        b
"factor" "factor"
> z3 <- lapply(z,factor)
> sapply(z3,class)
       a        b
"factor" "factor"
> z3
$a
[1] a b c
Levels: a b c

$b
[1] d e f
Levels: d e f

## Note that both z2 and z3 are lists, and would have to be converted back to data frames.

-- Bert

On Wed, Feb 20, 2013 at 4:09 PM, Lopez, Dan lopez...@llnl.gov wrote:

R Experts,

I have a dataframe made up of character vectors--these are results from survey questions. I need to convert them to factors.

I tried the following which did not work:
scs2<-sapply(scs2,as.factor)
also this didn't work:
scs2<-sapply(scs2,function(x) as.factor(x))

After doing either of above I end up with
> str(scs2)
 chr [1:10, 1:10] "very important" "very important" "very important" "very important" ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:10] "Q1_1" "Q1_2" "Q1_3" "Q1_4" ...
> class(scs2)
"matrix"

But when I do it one at a time it works:
scs2$Q1_1<-as.factor(scs2$Q1_1)
scs2$Q1_2<- as.factor(scs2$Q1_2)

What am I doing wrong? How do I accomplish this with sapply or similar function?
--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info: Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Re: [R] Having trouble converting a dataframe of character vectors to factors
How about this?

scs2<-data.frame(lapply(scs2, factor))

From: Lopez, Dan lopez...@llnl.gov
To: R help (r-help@r-project.org) r-help@r-project.org
Sent: Wednesday, February 20, 2013 7:09 PM
Subject: [R] Having trouble converting a dataframe of character vectors to factors

R Experts,

I have a dataframe made up of character vectors--these are results from survey questions. I need to convert them to factors.

I tried the following which did not work:
scs2<-sapply(scs2,as.factor)
also this didn't work:
scs2<-sapply(scs2,function(x) as.factor(x))

After doing either of above I end up with
> str(scs2)
 chr [1:10, 1:10] "very important" "very important" "very important" "very important" ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:10] "Q1_1" "Q1_2" "Q1_3" "Q1_4" ...
> class(scs2)
"matrix"

But when I do it one at a time it works:
scs2$Q1_1<-as.factor(scs2$Q1_1)
scs2$Q1_2<- as.factor(scs2$Q1_2)

What am I doing wrong? How do I accomplish this with sapply or similar function?