Re: [R] Convert factor to numeric vector of labels
My reason for setting stringsAsFactors = FALSE is more that I really dislike having R convert what I "think" are character variables to factors when I import data. I suspect that it takes quite a few new users by surprise that what they had intended to be a character variable has become a factor. And it can take a long time to track down the problem if you're a newbie.

Here is a quick (overly simple) example where I had intended the data in the second column to be character. Original data found at http://ca.geocities.com/jrkrideau/R/facts.txt

1, b
1, b
3, b
3, b
4, a
4, a
3, a

options(stringsAsFactors = TRUE)
df <- read.csv("http://ca.geocities.com/jrkrideau/R/facts.txt"); df[,2]
[1] b b b a a a
Levels: a b

options(stringsAsFactors = FALSE)
df <- read.csv("http://ca.geocities.com/jrkrideau/R/facts.txt"); df[,2]
[1] " b" " b" " b" " a" " a" " a"

There are probably good reasons for setting the default either way, and while I am currently strongly of the FALSE persuasion, I can see some serious problems with changing the default, particularly since most existing code will assume TRUE.

It might be that a "Why are my character variables turning into factors?" entry in the FAQ, as a complement to "How do I convert factors to numeric?", would be sufficient. As it is, the reader knows what seems to have happened, but there is no clue as to why or how it is happening. If numeric data being imported as factors causes enough problems, a note about the default might be worthwhile in both FAQ entries, since the number of such questions suggests that this is not a rare problem.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
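Because the facts.txt URL above may no longer be reachable, here is a self-contained sketch of the same experiment that reads the seven lines from a string through read.csv()'s text argument. It passes stringsAsFactors explicitly, since its default changed from TRUE to FALSE in R 4.0.0, and it also shows that strip.white = TRUE removes the blank after each comma, which is why the values above print as " b" rather than "b".

```r
# The seven lines of facts.txt as quoted above; read.csv() treats the
# first line as the header, leaving six data rows.
txt <- "1, b\n1, b\n3, b\n3, b\n4, a\n4, a\n3, a"

# stringsAsFactors is passed explicitly because its default changed
# from TRUE to FALSE in R 4.0.0.
as_factor <- read.csv(text = txt, stringsAsFactors = TRUE)
as_char   <- read.csv(text = txt, stringsAsFactors = FALSE)

str(as_factor[[2]])  # factor with levels " a" and " b" (leading blank kept)
str(as_char[[2]])    # character vector, leading blanks kept

# strip.white = TRUE drops the blank that follows each comma
trimmed <- read.csv(text = txt, stringsAsFactors = FALSE, strip.white = TRUE)
print(trimmed[[2]])
```

Setting the argument per call, rather than through options(), keeps the script's behaviour independent of both the R version and any site-wide profile.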
Re: [R] Convert factor to numeric vector of labels
I think that you grossly underestimate the frequency of use of factors in R, not to mention that factors are stored more efficiently than character vectors. All modeling functions depend upon them. Most testing, grouping and plotting functions (base R and Lattice) either use them directly as arguments or coerce character vectors to factors internally. So, no... I would not advocate modifying such fundamental behavior.

UseRs should read the documentation before "jumping in with both feet" so that they understand the underlying design philosophy behind R and the actual documented functional behaviors. This would be superior to moving forward with functional expectations that are predicated on false assumptions and, importantly, would save you time.

In Falk's case, it seems reasonable to assume, without having seen any actual data, that the presumptively numeric column that was converted to a factor had non-numeric characters in it. Thus, the fact that a numeric column was coerced to a factor on import should have raised a red flag pointing to a data quality problem. Had the default behavior been otherwise, Falk would likely have proceeded with subsequent analyses without being aware of this issue, perhaps resulting in a bad outcome.

The function that handles this in the read.table() family of functions is called type.convert(). An example may be helpful:

> Vec <- as.character(1:10)
> Vec
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"

# Default behavior converts the character vector
# to numeric
> str(type.convert(Vec))
int [1:10] 1 2 3 4 5 6 7 8 9 10

# Now add in a non-numeric character (i.e. bad data)
> Vec1 <- c(Vec, "a")
> Vec1
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "a"
> str(type.convert(Vec1))
Factor w/ 11 levels "1","10","2","3",..: 1 3 4 5 6 7 8 9 10 2 ...
Voilà

HTH,

Marc Schwartz

On Tue, 2007-08-14 at 13:47 -0600, Matthew Keller wrote:
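Marc's type.convert() example shows that a single bad entry changes the type of the whole column. The sketch below shows one way to then locate the offending entries; non_numeric() is a hypothetical helper, not a base R function. It passes as.is explicitly so the result does not depend on the default of the R version in use: as.is = FALSE gives a factor as in Marc's output, as.is = TRUE keeps the column as character.

```r
# Hypothetical helper (not in base R): return the non-missing entries of x
# that cannot be parsed as numbers. Works for character vectors and factors.
non_numeric <- function(x) {
  x <- as.character(x)
  parsed <- suppressWarnings(as.numeric(x))  # unparsable entries become NA
  x[is.na(parsed) & !is.na(x)]
}

Vec1 <- c(as.character(1:10), "a")   # Marc's example with one bad entry

str(type.convert(Vec1, as.is = FALSE))  # factor, as in Marc's output
str(type.convert(Vec1, as.is = TRUE))   # stays character

non_numeric(Vec1)                       # returns "a"
non_numeric(factor(Vec1))               # same result for a factor
```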
Re: [R] Convert factor to numeric vector of labels
Matt:

I believe you have confused the issues. Setting stringsAsFactors = FALSE would dramatically **increase** the amount of memory used for storing character vectors, and storing such data compactly is exactly what factors are for. So your proposed solution does exactly the opposite of what you want.

The issue you are worried about is when numeric fields are somehow interpreted as non-numeric. This can happen for a variety of reasons (stray characters in numeric fields, quotes around numbers, ...). The solution is not to set a global default that does the opposite of what you want in its intended use, but to read the documentation and either set the appropriate arguments (perhaps the colClasses argument of read.table()) or fix the original data before R reads it (e.g. remove quotes and stray characters). Failing that, the "one-off" solutions given are the correct way to handle what is a data problem, not an R problem.

However, I should add that there are arguments for making stringsAsFactors = FALSE; search the archives for discussions of why. The memory penalty will have to be paid, of course.

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Matthew Keller
Sent: Tuesday, August 14, 2007 12:48 PM
To: John Kane
Cc: Falk Lieder; r-help@stat.math.ethz.ch
Subject: Re: [R] Convert factor to numeric vector of labels
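Bert's suggestion of declaring column classes can be sketched with a small made-up data set in which one value of a numeric column is the stray character "x". Declaring the column as numeric through colClasses makes read.csv() stop with an error instead of silently returning a non-numeric column; if the stray value is really a missing-value code, na.strings is one way to say so. The data and the choice of "x" as a missing-value code are assumptions for illustration.

```r
# Made-up data: the third value of the numeric column is a stray "x".
txt <- "id,value\n1,0.71\n2,1.34\n3,x\n4,2.61"

# With colClasses declared, the bad value triggers an error from scan()
# rather than turning the whole column into a factor or character vector.
res <- tryCatch(
  read.csv(text = txt, colClasses = c("integer", "numeric")),
  error = function(e) conditionMessage(e)
)
print(res)

# If "x" is known to mean "missing", say so and keep the column numeric.
ok <- read.csv(text = txt, colClasses = c("integer", "numeric"),
               na.strings = "x")
str(ok)
```

Failing loudly at import time is the design choice here: the data problem surfaces where it can be fixed, rather than several steps later in an analysis.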
Re: [R] Convert factor to numeric vector of labels
Hi all,

If we, the R community, are endeavoring to make R user friendly (gasp!), I think that one of the first places to start would be in setting stringsAsFactors = FALSE. Several times I've run into instances of folks decrying R's "ridiculous usage of memory" in reading data, only to find out that these folks were unknowingly importing certain columns as factors. The fix is easy once you know it, but it isn't obvious to new users, and I'd bet that it turns some percentage of people off the program. In my opinion, factors are not used often enough to justify this default behavior. When factors are used, the user knows to treat the variable as a factor, and so it can be done on a case-by-case (or should I say variable-by-variable?) basis.

Is this a default that should be changed?

Matt

On 8/13/07, John Kane <[EMAIL PROTECTED]> wrote:

--
Matthew C Keller
Postdoctoral Fellow
Virginia Institute for Psychiatric and Behavioral Genetics
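Matt and Bert disagree about memory use, and the answer depends on what a column holds. The sketch below uses made-up data and object.size() to compare storage for (a) 100,000 distinct numbers kept as numeric, character and factor, roughly the situation when a numeric column is imported as a factor, and (b) a two-valued grouping variable kept as character and as factor. Exact byte counts vary with R version and platform, so only the comparisons matter.

```r
# (a) Distinct numeric values, e.g. a measurement column.
x_num <- seq_len(1e5) / 100
x_chr <- as.character(x_num)
x_fac <- factor(x_chr)          # one level per distinct value

sizes_a <- sapply(list(numeric = x_num, character = x_chr, factor = x_fac),
                  function(v) as.numeric(object.size(v)))
print(sizes_a)

# (b) A grouping variable with only two distinct values.
g_chr <- rep(c("control", "treatment"), 5e4)
g_fac <- factor(g_chr)

sizes_b <- c(character = as.numeric(object.size(g_chr)),
             factor    = as.numeric(object.size(g_fac)))
print(sizes_b)
```

In case (a) the numeric vector is the smallest, while in case (b) the factor is smaller than the character vector, so both positions in the thread describe a real effect for a different kind of column.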
Re: [R] Convert factor to numeric vector of labels
This is one of R's rather _endearing_ little idiosyncrasies. I ran into it a while ago.
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/98090.html

For some reason, possibly historical, the option "stringsAsFactors" is set to TRUE.

As Prof Ripley says, FAQ 7.10 will tell you:

as.numeric(as.character(f)) # for a one-off conversion

From Gabor Grothendieck, a one-off solution for a complete data.frame:

DF <- data.frame(let = letters[1:3], num = 1:3, stringsAsFactors = FALSE)
str(DF) # to see what has happened.

You can reset the option globally, see below. However, you might want to read Gabor Grothendieck's comment about this in the thread referenced above, since it could cause problems if you transfer files a lot.

Personally I went with the global option since I don't tend to transfer programs to other people, and I was getting tired of tracking down errors in my programs caused by numeric and character variables suddenly deciding to become factors.

From Steven Tucker:

You can also set this option globally with
options(stringsAsFactors = FALSE) # in \library\base\R\Rprofile
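The one-off conversion quoted from FAQ 7.10 can be checked on a small made-up factor. The FAQ also gives as.numeric(levels(f))[f] and describes it as slightly more efficient: it converts each distinct level once and then indexes by the factor's internal codes. Both forms give the same numbers, while plain as.numeric(f) returns the codes.

```r
# A factor whose labels are numbers, as produced when a numeric column
# is imported as a factor (made-up values).
f <- factor(c("0.71", "1.34", "2.61", "1.34"))

codes <- as.numeric(f)                 # internal codes: 1 2 3 2
vals  <- as.numeric(as.character(f))   # labels: 0.71 1.34 2.61 1.34
vals2 <- as.numeric(levels(f))[f]      # same values, one conversion per level

print(codes)
print(vals)
identical(vals, vals2)                 # TRUE
```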
Re: [R] Convert factor to numeric vector of labels
See the FAQ Q7.10 (and please study the posting guide).

On Sun, 12 Aug 2007, Falk Lieder wrote:

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
[R] Convert factor to numeric vector of labels
Hi,

I have imported a data file into R. Unfortunately, R has interpreted some numeric variables as factors. Therefore I want to reconvert these to numeric vectors whose values are the factor levels' labels. I tried as.numeric(), but it returns a vector of the levels' integer codes (i.e. 1, 2, 3, ...) instead of their labels (i.e. 0.71, 1.34, 2.61, …). What can I do instead?

Best wishes,
Falk
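Since the question concerns several columns of an imported file, the conversion can be applied column by column. factors_to_numeric() below is a hypothetical helper, not a base R function: it uses as.numeric(levels(f))[f], the form recommended in FAQ 7.10, but only on factor columns whose labels all parse as numbers, leaving genuine categorical factors untouched. anyNA() assumes R 3.1.0 or later.

```r
# Hypothetical helper: convert every factor column whose labels all parse
# as numbers back to numeric; all other columns are returned unchanged.
factors_to_numeric <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.factor(col)) {
      lev <- suppressWarnings(as.numeric(levels(col)))
      # index the converted levels by the factor's codes (FAQ 7.10 idiom)
      if (!anyNA(lev)) df[[nm]] <- lev[col]
    }
  }
  df
}

# Made-up example: one numeric-looking factor, one real categorical factor.
dat <- data.frame(
  dose  = factor(c("0.71", "1.34", "2.61")),
  group = factor(c("a", "b", "a"))
)
out <- factors_to_numeric(dat)
str(out)
```

Skipping columns with any non-numeric label is deliberate: as noted elsewhere in the thread, such a column usually signals a data problem that deserves a look before conversion.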