Re: [R] Having trouble converting a dataframe of character vectors to factors

2013-02-21 Thread Lopez, Dan
Hi Bert,

Thanks for drawing my attention to the simplify argument and for the examples. I 
understand now.

Thanks.
Dan


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Having trouble converting a dataframe of character vectors to factors

2013-02-21 Thread William Dunlap
> scs2 <- data.frame(lapply(scs2, factor))

Calling data.frame() on the output of lapply() can result in changing column
names and will drop attributes that the input data.frame may have had.  I
prefer to modify the original data.frame instead of making a new one from
scratch to avoid these problems.

Also, calling factor() on a factor will drop any unused levels, which you may
not want to do.  Calling as.factor will not.

Compare the following three methods

  f1 <- function (dataFrame) {
      dataFrame[] <- lapply(dataFrame, factor)
      dataFrame
  }
  f2 <- function (dataFrame) {
      dataFrame[] <- lapply(dataFrame, as.factor)
      dataFrame
  }
  f3 <- function (dataFrame) {
      data.frame(lapply(dataFrame, factor))
  }

on the following data.frame
  x <- data.frame(stringsAsFactors=FALSE, check.names=FALSE,
               "No/Yes" = factor(c("Yes","Yes","Yes"), levels=c("No","Yes")),
               Size = ordered(c("Small","Large","Medium"),
                              levels=c("Small","Medium","Large")),
               Name = c("Adam","Bill","Chuck"))
  attr(x, "Date") <- as.POSIXlt("2013-02-21")


  > str(x)
  'data.frame':   3 obs. of  3 variables:
   $ No/Yes: Factor w/ 2 levels "No","Yes": 2 2 2
   $ Size  : Ord.factor w/ 3 levels "Small"<"Medium"<..: 1 3 2
   $ Name  : chr  "Adam" "Bill" "Chuck"
   - attr(*, "Date")= POSIXlt, format: "2013-02-21"

  > str(f1(x)) # drops unused levels
  'data.frame':   3 obs. of  3 variables:
   $ No/Yes: Factor w/ 1 level "Yes": 1 1 1
   $ Size  : Ord.factor w/ 3 levels "Small"<"Medium"<..: 1 3 2
   $ Name  : Factor w/ 3 levels "Adam","Bill",..: 1 2 3
   - attr(*, "Date")= POSIXlt, format: "2013-02-21"
  > str(f2(x))
  'data.frame':   3 obs. of  3 variables:
   $ No/Yes: Factor w/ 2 levels "No","Yes": 2 2 2
   $ Size  : Ord.factor w/ 3 levels "Small"<"Medium"<..: 1 3 2
   $ Name  : Factor w/ 3 levels "Adam","Bill",..: 1 2 3
   - attr(*, "Date")= POSIXlt, format: "2013-02-21"
  > str(f3(x)) # mangles column names, drops unused levels, drops Date attribute
  'data.frame':   3 obs. of  3 variables:
   $ No.Yes: Factor w/ 1 level "Yes": 1 1 1
   $ Size  : Ord.factor w/ 3 levels "Small"<"Medium"<..: 1 3 2
   $ Name  : Factor w/ 3 levels "Adam","Bill",..: 1 2 3
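
A variation on the in-place approach, as a minimal sketch: if only the character columns should become factors, the same subassignment idea can be restricted to them. The helper name charToFactor is made up for illustration and is not from this thread; it assumes the data frame has at least one character column.

```r
# Hypothetical helper (not from the thread): convert only the character
# columns to factors, modifying the data.frame in place so that column
# names and extra attributes are kept.
charToFactor <- function(dataFrame) {
    isChar <- vapply(dataFrame, is.character, logical(1))
    dataFrame[isChar] <- lapply(dataFrame[isChar], as.factor)
    dataFrame
}

# Same example data as in the comparison above
x <- data.frame(stringsAsFactors = FALSE, check.names = FALSE,
                "No/Yes" = factor(c("Yes", "Yes", "Yes"), levels = c("No", "Yes")),
                Size = ordered(c("Small", "Large", "Medium"),
                               levels = c("Small", "Medium", "Large")),
                Name = c("Adam", "Bill", "Chuck"))
attr(x, "Date") <- as.POSIXlt("2013-02-21")

y <- charToFactor(x)
str(y)
```

Existing factors are not touched at all here, so their unused levels and ordering stay as they were.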

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



Re: [R] Having trouble converting a dataframe of character vectors to factors

2013-02-21 Thread Mark Lamias
Great point, William.  I agree your approach is the one to take to preserve 
attributes.

Thanks for following up.






Re: [R] Having trouble converting a dataframe of character vectors to factors

2013-02-21 Thread Lopez, Dan
Hi Bill,

Great info.

The problem is that what was originally given to me looks like DPUT1 below
(random sample of 25).

This is the only format they can give me this in, and the data already looks
molten. So I applied reshape2::dcast, which resulted in a dataframe made of
character vectors, except for the first column, which is an integer vector.

So after dropping columns full of "" (blanks) and reordering columns, I figured
I needed factors to accomplish my goal (see below) and converted everything
to factors with:
 x2[,-1] <- as.data.frame(lapply(x[,-1], as.factor))

and ended up with DPUT2 below (random sample of 25).

Now, after reading your last email, I figure I've done well, since no attributes
got dropped, no levels got dropped (I just need to add some in because they
couldn't be derived from the original dataframe), and the column names seem fine.

Now I have a new problem, which is how to reorder levels in a dataframe and
possibly add some unused ones. After seeing the contents using Hmisc::contents,
I figured the next logical step is to handle like vectors a chunk at a time.
For example, subsetting to grepl("Q1_", names(scs.c2)) gives these vectors,
which all have identical levels except for one:
$Q1_1 thru $Q1_7 except $Q1_3
[1] ""                   "important"          "not important"      "somewhat important" "very important"
$Q1_3
[1] "important"          "not important"      "somewhat important" "very important"

# So I tried this, which had no effect
keepcols <- grepl("Q1_", names(scs.c2))
levels(scs.c2[,keepcols]) <- list(NoResp="", NotImportant="not important",
    SomewhatImpt="somewhat important", Important="important",
    VeryImpt="very important")
# then this, which also failed. It coerced a bunch of NA's and turned the
# vectors back to character vectors
scs.c2[,keepcols] <- sapply(scs.c2[,keepcols], function(x)
    factor(x, levels(x)[c(NoResp="", NotImportant="not important",
    SomewhatImpt="somewhat important", Important="important",
    VeryImpt="very important")]))

Mind you, I can easily do this in MS Excel, and that is probably what I am
going to break down and do fairly soon. But I wanted to give this a good solid
shot in R because I want to learn to handle these situations in R. I've been
using R for almost a year.
__
ADDITIONAL BACKGROUND

MY GOAL
I ultimately want to get started with some basic correlation analysis for some
of the columns. Taking your example (slightly modified), I hope to be able to
do this:
xx <- data.frame(stringsAsFactors=FALSE, check.names=FALSE,
    "No/Yes" = factor(c("Yes","No","No","No"), levels=c("No","Yes")),
    Size = ordered(c("Small","Large","Medium","Medium"),
                   levels=c("Small","Medium","Large")),
    Name = c("Adam","Bill","Chuck","Larry"))
> cor(sapply(xx[,1:2],as.numeric))
           No/Yes       Size
No/Yes  1.0000000 -0.8164966
Size   -0.8164966  1.0000000

DPUT1
structure(list(svaID = c(771L, 771L, 775L, 775L, 774L, 776L,
774L, 771L, 771L, 771L, 771L, 774L, 774L, 775L, 765L, 775L, 765L,
775L, 771L, 777L, 775L, 771L, 774L, 776L, 776L), question = structure(c(19L,
12L, 23L, 3L, 10L, 36L, 25L, 1L, 30L, 7L, 21L, 13L, 16L, 32L,
6L, 5L, 18L, 19L, 14L, 2L, 2L, 9L, 37L, 28L, 24L), .Label = c("Q1",
"Q1_1", "Q1_2", "Q1_3", "Q1_4", "Q1_5", "Q1_6", "Q1_7", "Q10",
"Q11", "Q12", "Q13", "Q14", "Q15", "Q16", "Q17", "Q17_1", "Q17_2",
"Q17_3", "Q17_4", "Q17_5", "Q18", "Q19", "Q2", "Q20", "Q3", "Q4",
"Q5", "Q6", "Q6_A_1", "Q6_A_2", "Q6_A_3", "Q6_A_4", "Q6_A_5",
"Q7", "Q8", "Q9"), class = "factor"), answer = structure(c(11L,
29L, 29L, 26L, 29L, 29L, 1L, 1L, 1L, 13L, 11L, 1L, 1L, 1L, 26L,
26L, 11L, 11L, 29L, 13L, 13L, 29L, 29L, 29L, 27L), .Label = c("",
"1", "2", "3", "4", "5", "Change of College/University",
"Change of Field of Study", "Confirmed Field of Study",
"did not meet expectations", "exceeded expectations",
"Family/Friend", "important", "Live Locally", "LLNL Contact",
"LLNL Housing page", "Local Newspaper", "met expectations", "no",
"None", "Not at All", "not important", "Pursue an Advanced Degree",
"Somewhat", "somewhat important", "very important", "Very Much",
"Web", "yes"), class = "factor")), .Names = c("svaID", "question",
"answer"), row.names = c(68L, 62L, 147L, 113L, 97L, 168L, 111L,
45L, 51L, 43L, 70L, 100L, 108L, 127L, 5L, 115L, 30L, 142L, 64L,
186L, 112L, 59L, 95L, 160L, 157L), class = "data.frame")

DPUT2
structure(list(svaID = c(765L, 771L, 774L, 775L, 776L, 777L,
778L, 779L, 782L, 783L, 786L, 788L, 789L, 790L, 791L, 793L, 794L,
795L, 797L, 801L, 803L, 804L, 805L, 807L, 808L), Q1_1 = structure(c(5L,
5L, 5L, 2L, 5L, 2L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 2L, 5L,
5L, 5L, 5L, 5L, 5L, 2L, 2L, 2L), .Label = c("", "important",
"not important", "somewhat important", "very important"), class = "factor"),
Q1_2 = structure(c(2L, 5L, 2L, 5L, 2L, 4L, 3L, 5L, 4L, 2L,
2L, 5L, 2L, 3L, 5L, 2L, 2L, 5L, 5L, 5L, 5L, 2L, 1L, 2L, 3L
), .Label = c("", "important", "not important", "somewhat important",
"very important"), class = "factor"), Q1_3 = structure(c(4L,
4L, 4L, 4L, 4L, 1L, 1L, 4L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 1L,
4L, 4L, 1L, 4L, 4L, 4L, 4L, 1L, 4L), .Label = c("important",
"not important", "somewhat important", "very important"), class = "factor"),
Q1_4 = structure(c(5L,

Re: [R] Having trouble converting a dataframe of character vectors to factors

2013-02-21 Thread William Dunlap
> #So I tried this which had no effect
> keepcols <- grepl("Q1_", names(scs.c2))
> levels(scs.c2[,keepcols]) <- list(NoResp="", NotImportant="not important",
>     SomewhatImpt="somewhat important", Important="important",
>     VeryImpt="very important")
> #then this which also failed. It coerced a bunch of NA's and turned the
> #vectors back to character vectors
> scs.c2[,keepcols] <- sapply(scs.c2[,keepcols], function(x)
>     factor(x, levels(x)[c(NoResp="", NotImportant="not important",
>     SomewhatImpt="somewhat important", Important="important",
>     VeryImpt="very important")]))

First, to make a factor variable with a given set of levels, use
    factor(x, levels=yourLevels)
(and not levels(x) <- yourLevels).

Also, to change a character vector to a factor with levels that are different
than the values in the character vector, I would use the levels and labels
arguments to factor.  E.g.,
    > x <- c("i", "iii")
    > factor(x, levels=c("i","ii","iii"), labels=c("One","Two","Three"))
    [1] One   Three
    Levels: One Two Three
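
To make the distinction concrete, here is a small sketch on made-up values (the object names ordered_x and relabelled are illustrative only). It contrasts factor(x, levels=) with assigning a plain character vector via levels(x) <-, which renames the existing levels by position rather than reordering them.

```r
x <- factor(c("important", "very important", "not important"))
levels(x)   # default levels are sorted alphabetically

# factor(x, levels = ...) reorders the levels and can add unused ones;
# the underlying values stay the same
ordered_x <- factor(x, levels = c("", "not important", "somewhat important",
                                  "important", "very important"))

# levels(x) <- <character vector> relabels the existing levels by position,
# so the data themselves change
relabelled <- x
levels(relabelled) <- c("not important", "important", "very important")

as.character(ordered_x)
as.character(relabelled)
```

Comparing the two as.character() results shows that only the first form leaves the responses intact.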

I haven't tried your complete example, but I would not use sapply() when
producing something you will want to convert to the columns of a data.frame.
Use lapply() instead.

I generally use 'for' loops to process the columns of a data.frame one at a
time.  It is easy to understand, is quick enough, and may even reduce memory
usage.  E.g., instead of
    keepCols <- grepl("Q1_", names(scs.c2))
    scs.c2[, keepCols] <- lapply(scs.c2[, keepCols],
        function(x) factor(x, levels=c(...)))
try
    for(iCol in grep("Q1_", names(scs.c2))) {
        scs.c2[, iCol] <- factor(scs.c2[, iCol], levels=c(...))
    }
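
For illustration, the loop can be run end to end on a toy stand-in for scs.c2. The three-row data frame below is invented, and the level order and short labels are assumptions taken from the names Dan used in his attempt; q1Levels and q1Labels are hypothetical names.

```r
# Toy stand-in for scs.c2 (invented values, not the real survey data)
scs.c2 <- data.frame(svaID = c(765L, 771L, 774L),
                     Q1_1 = c("very important", "important", ""),
                     Q1_3 = c("somewhat important", "very important", "important"),
                     Q2 = c("Somewhat", "Very Much", "Very Much"),
                     stringsAsFactors = FALSE)

# Assumed ordering and short labels for the Q1_* answers
q1Levels <- c("", "not important", "somewhat important", "important",
              "very important")
q1Labels <- c("NoResp", "NotImportant", "SomewhatImpt", "Important", "VeryImpt")

# Process the Q1_* columns one at a time; [[ ]] is used here in place of
# [, iCol] so each column is extracted and replaced as a plain vector
for (iCol in grep("Q1_", names(scs.c2))) {
    scs.c2[[iCol]] <- factor(scs.c2[[iCol]], levels = q1Levels, labels = q1Labels)
}

str(scs.c2)
```

Every Q1_* column ends up with the same five levels in the same order, including levels that do not occur in a given column, while the other columns are left alone.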

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



[R] Having trouble converting a dataframe of character vectors to factors

2013-02-20 Thread Lopez, Dan
R Experts,

I have a dataframe made up of character vectors--these are results from survey 
questions. I need to convert them to factors.

I tried the following, which did not work:
scs2 <- sapply(scs2, as.factor)
This also didn't work:
scs2 <- sapply(scs2, function(x) as.factor(x))

After doing either of the above I end up with
str(scs2)
 chr [1:10, 1:10] "very important" "very important" "very important" "very important" ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:10] "Q1_1" "Q1_2" "Q1_3" "Q1_4" ...
class(scs2)
"matrix"

But when I do it one at a time it works:
scs2$Q1_1 <- as.factor(scs2$Q1_1)
scs2$Q1_2 <- as.factor(scs2$Q1_2)

What am I doing wrong?  How do I accomplish this with sapply or a similar
function?

Data for reproducibility:


scs2 <- structure(list(Q1_1 = c("very important", "very important", "very important",
"very important", "very important", "very important", "very important",
"somewhat important", "important", "very important"), Q1_2 = c("important",
"somewhat important", "very important", "important", "important",
"very important", "somewhat important", "somewhat important",
"very important", "very important"), Q1_3 = c("very important",
"important", "very important", "very important", "important",
"very important", "very important", "somewhat important", "not important",
"important"), Q1_4 = c("very important", "important", "very important",
"very important", "important", "important", "important", "very important",
"somewhat important", "important"), Q1_5 = c("very important",
"not important", "important", "very important", "not important",
"important", "somewhat important", "important", "somewhat important",
"not important"), Q1_6 = c("very important", "not important",
"important", "very important", "somewhat important", "very important",
"very important", "very important", "important", "important"),
Q1_7 = c("very important", "somewhat important", "important",
"somewhat important", "important", "important", "very important",
"very important", "somewhat important", "not important"),
Q2 = c("Somewhat", "Very Much", "Somewhat", "Very Much",
"Very Much", "Very Much", "Very Much", "Very Much", "Very Much",
"Very Much"), Q3 = c("yes", "yes", "yes", "yes", "yes", "yes",
"yes", "yes", "yes", "yes"), Q4 = c("None", "None", "None",
"None", "Confirmed Field of Study", "Confirmed Field of Study",
"Confirmed Field of Study", "None", "None", "None")), .Names = c("Q1_1",
"Q1_2", "Q1_3", "Q1_4", "Q1_5", "Q1_6", "Q1_7", "Q2", "Q3", "Q4"
), row.names = c(78L, 46L, 80L, 196L, 188L, 197L, 39L, 195L,
172L, 110L), class = "data.frame")




Re: [R] Having trouble converting a dataframe of character vectors to factors

2013-02-20 Thread Bert Gunter
Please re-read ?sapply and pay particular attention to the simplify argument.

The following should help explain the issues:

> z <- data.frame(a=letters[1:3],b=letters[4:6],stringsAsFactors=FALSE)
> sapply(z,class)
          a           b
"character" "character"
> z1 <- sapply(z,as.factor)
> sapply(z1,class)
          a           b           c           d           e           f
"character" "character" "character" "character" "character" "character"
> z2 <- sapply(z,factor, simplify = FALSE)
> sapply(z2,class)
       a        b
"factor" "factor"
> z3 <- lapply(z,factor)
> sapply(z3,class)
       a        b
"factor" "factor"
> z3
$a
[1] a b c
Levels: a b c

$b
[1] d e f
Levels: d e f

## Note that both z2 and z3 are lists, and would have to be converted
## back to data frames.
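
As a minimal sketch of that last conversion step, assuming the same toy z as above (the names z3_df and z_inplace are made up for illustration):

```r
z <- data.frame(a = letters[1:3], b = letters[4:6], stringsAsFactors = FALSE)

# sapply() simplifies equal-length results into a matrix, and a matrix
# cannot hold factors, so the result is a character matrix
z1 <- sapply(z, as.factor)
is.matrix(z1)

# lapply() returns a list of factors; as.data.frame() turns that list
# back into a data frame
z3_df <- as.data.frame(lapply(z, factor))

# alternatively, assign the list into z[] so the object stays a data frame
# and only its columns are replaced
z_inplace <- z
z_inplace[] <- lapply(z_inplace, factor)

sapply(z3_df, class)
sapply(z_inplace, class)
```

Either route should give a data frame whose columns are all factors; the second one does not rebuild the data frame from scratch.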

-- Bert


-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



Re: [R] Having trouble converting a dataframe of character vectors to factors

2013-02-20 Thread Mark Lamias
How about this?

scs2 <- data.frame(lapply(scs2, factor))




