Re: [R] Convert factor to numeric vector of labels

2007-08-15 Thread John Kane
My reason for setting stringsAsFactors = FALSE is more
that I really dislike having R convert what I "think"
are character variables to factors when I import data.


I suspect that it takes quite a few new users by
surprise that what they had intended to be a character
variable has become a factor. And it can take a long
time to track down the problem if you're a newbie.
-

A quick (overly simple) example where I had intended
the data in the second column to be character. 

Original data found at
http://ca.geocities.com/jrkrideau/R/facts.txt

1, b
1, b
3, b
3, b
4, a
4, a
3, a

options(stringsAsFactors = TRUE)

df <- read.csv("http://ca.geocities.com/jrkrideau/R/facts.txt"); df[,2]
[1]  b  b  b  a  a  a
Levels:  a  b


options(stringsAsFactors = FALSE)

df <- read.csv("http://ca.geocities.com/jrkrideau/R/facts.txt"); df[,2]
[1] " b" " b" " b" " a" " a" " a"
-
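
The leading blank in each value above comes from the space after the comma in the data file. A minimal sketch of a per-call alternative to the global option, assuming the sample rows are pasted inline through the `text` argument instead of being fetched from the URL, and read with `header = FALSE` (which the original call did not use):

```r
# Inline copy of the seven sample rows shown above (an assumption made so
# that the sketch does not depend on the URL being reachable)
txt <- "1, b\n1, b\n3, b\n3, b\n4, a\n4, a\n3, a\n"

# header = FALSE keeps all seven rows; with the default header = TRUE the
# first row is consumed as column names, which is why only six values
# were printed in the example above
df_chr <- read.csv(text = txt, header = FALSE,
                   stringsAsFactors = FALSE,  # per-call, no global option needed
                   strip.white = TRUE)        # drop the blank after each comma
str(df_chr)

# The same call asking for factors, to compare
df_fac <- read.csv(text = txt, header = FALSE,
                   stringsAsFactors = TRUE,
                   strip.white = TRUE)
levels(df_fac$V2)
```

Passing the argument in the call keeps the behaviour visible in the script itself, so the code gives the same result on a machine with a different global setting.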

There are probably good reasons for setting the
default either way. While I am currently strongly
of the FALSE persuasion, I can see some serious
problems with changing the default, particularly when
most existing code will assume TRUE.

It might be that a "Why are my character variables
turning into factors?" entry, as a complement to "How do I
convert factors to numeric?" in the FAQ, would be
sufficient. As it is, the reader knows what seems to
have happened, but there is no clue as to why or how
it is happening.

If enough people run into numeric data being imported
as factors, which does not seem to be a rare problem,
a note about the default might be worthwhile in both
FAQ entries.



--- Matthew Keller <[EMAIL PROTECTED]> wrote:

> Hi all,
> 
> If we, the R community, are endeavoring to make R user friendly
> (gasp!), I think that one of the first places to start would be in
> setting stringsAsFactors = FALSE. Several times I've run into
> instances of folks decrying R's "ridiculous usage of memory" in
> reading data, only to come to find out that these folks were
> unknowingly importing certain columns as factors. The fix is easy once
> you know it, but it isn't obvious to new users, and I'd bet that it
> turns some % of people off of the program. Factors are not used often
> enough to justify this default behavior in my opinion. When factors
> are used, the user knows to treat the variable as a factor, and so it
> can be done on a case-by-case (or should I say variable-by-variable?)
> basis.
> 
> Is this a default that should be changed?
> 
> Matt

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Convert factor to numeric vector of labels

2007-08-14 Thread Marc Schwartz
I think that you grossly underestimate the frequency of use of factors
in R, not to mention that factors are stored more efficiently than
character vectors.

All modeling functions depend upon them.  Most testing, grouping and
plotting functions (base R and Lattice) either use them directly as
arguments or coerce character vectors to factors internally.

So, no...I would not advocate modifying such fundamental behavior.

UseRs should read the documentation before "jumping in with both feet"
so that they understand the underlying design philosophy behind R and
the actual documented functional behaviors. This would be superior to
moving forward with functional expectations that are predicated on false
assumptions and importantly, save you time.

In Falk's case, it seems reasonable, without having seen any actual
data, that the presumptive numeric column that was converted to a
factor, had non-numeric characters in it. 

Thus, that a numeric column was coerced to a factor on import should
have raised a red flag pointing to a data quality problem.  

Had the default behavior been otherwise, it is likely that Falk would
have proceeded with subsequent analyses without being aware of this
issue, perhaps resulting in a bad outcome.

The function that handles this in the read.table() family of functions
is called type.convert(). An example may be helpful:

Vec <- as.character(1:10)

> Vec
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

# Default behavior converts the character vector
# to numeric
> str(type.convert(Vec))
 int [1:10] 1 2 3 4 5 6 7 8 9 10


# Now add in a non-numeric character (ie. bad data)
Vec1 <- c(Vec, "a")

> Vec1
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "a" 


> str(type.convert(Vec1))
 Factor w/ 11 levels "1","10","2","3",..: 1 3 4 5 6 7 8 9 10 2 ...

Voilà
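
To follow up on such a red flag, one way to locate the offending entries is to attempt the numeric conversion and see which values fail. This is a small sketch and not part of the example above; `as.is = FALSE` is passed explicitly because recent versions of R warn when `as.is` is left unspecified.

```r
Vec1 <- c(as.character(1:10), "a")

# Same conversion as above, with as.is stated explicitly
f <- type.convert(Vec1, as.is = FALSE)
is.factor(f)

# Entries that are present but cannot be parsed as numbers
num <- suppressWarnings(as.numeric(Vec1))
bad <- Vec1[is.na(num) & !is.na(Vec1)]
bad

# Positions of the offending entries in the original vector
which(Vec1 %in% bad)
```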

HTH,

Marc Schwartz


On Tue, 2007-08-14 at 13:47 -0600, Matthew Keller wrote:
> Hi all,
> 
> If we, the R community, are endeavoring to make R user friendly
> (gasp!), I think that one of the first places to start would be in
> setting stringsAsFactors = FALSE. Several times I've run into
> instances of folks decrying R's "ridiculous usage of memory" in
> reading data, only to come to find out that these folks were
> unknowingly importing certain columns as factors. The fix is easy once
> you know it, but it isn't obvious to new users, and I'd bet that it
> turns some % of people off of the program. Factors are not used often
> enough to justify this default behavior in my opinion. When factors
> are used, the user knows to treat the variable as a factor, and so it
> can be done on a case-by-case (or should I say variable-by-variable?)
> basis.
> 
> Is this a default that should be changed?
> 
> Matt
> 


Re: [R] Convert factor to numeric vector of labels

2007-08-14 Thread Bert Gunter
Matt:

I believe you have confused issues.

Setting stringsAsFactors = FALSE would dramatically **increase** the amount
of memory used for storing character vectors, which is what factors are for.
So your proposed solution does exactly the opposite of what you want.

The issue you are worried about is when numeric fields are somehow
interpreted as non-numeric. This can happen for a variety of reasons (stray
characters in numeric fields, quotes around numbers, ...). The solution is not
to set a global default that does the opposite of what you want in its
intended use, but to read the documentation and either set the appropriate
arguments (perhaps colClasses of read.table) or fix the original data before
R reads it (e.g. remove quotes and stray characters). Failing that, the
"one-off" solutions given are the correct way to handle what is a data
problem, not an R problem.
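
As a sketch of the colClasses approach (the column names and the stray character below are made up for illustration), declaring the expected types makes the import stop at the bad value instead of quietly producing a non-numeric column:

```r
# Hypothetical data: the last score carries a stray character
txt <- "id,score\n1,0.71\n2,1.34\n3,2.61x\n"

# Without colClasses the whole column silently becomes non-numeric
loose <- read.csv(text = txt, stringsAsFactors = FALSE)
class(loose$score)

# With colClasses the read fails; tryCatch captures the message so the
# script can report the problem and carry on
strict <- tryCatch(
  read.csv(text = txt, colClasses = c("integer", "numeric")),
  error = function(e) conditionMessage(e)
)
strict
```

Whether to fail on import or to clean the file first is a judgment call; the point of the sketch is only that the data problem surfaces at read time.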

However, I should add that there are arguments for making stringsAsFactors =
FALSE; search the archives for discussions why. The memory penalty will have
to be paid, of course.


Bert Gunter
Genentech Nonclinical Statistics


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Matthew Keller
Sent: Tuesday, August 14, 2007 12:48 PM
To: John Kane
Cc: Falk Lieder; r-help@stat.math.ethz.ch
Subject: Re: [R] Convert factor to numeric vector of labels

Hi all,

If we, the R community, are endeavoring to make R user friendly
(gasp!), I think that one of the first places to start would be in
setting stringsAsFactors = FALSE. Several times I've run into
instances of folks decrying R's "ridiculous usage of memory" in
reading data, only to come to find out that these folks were
unknowingly importing certain columns as factors. The fix is easy once
you know it, but it isn't obvious to new users, and I'd bet that it
turns some % of people off of the program. Factors are not used often
enough to justify this default behavior in my opinion. When factors
are used, the user knows to treat the variable as a factor, and so it
can be done on a case-by-case (or should I say variable-by-variable?)
basis.

Is this a default that should be changed?

Matt




Re: [R] Convert factor to numeric vector of labels

2007-08-14 Thread Matthew Keller
Hi all,

If we, the R community, are endeavoring to make R user friendly
(gasp!), I think that one of the first places to start would be in
setting stringsAsFactors = FALSE. Several times I've run into
instances of folks decrying R's "ridiculous usage of memory" in
reading data, only to come to find out that these folks were
unknowingly importing certain columns as factors. The fix is easy once
you know it, but it isn't obvious to new users, and I'd bet that it
turns some % of people off of the program. Factors are not used often
enough to justify this default behavior in my opinion. When factors
are used, the user knows to treat the variable as a factor, and so it
can be done on a case-by-case (or should I say variable-by-variable?)
basis.

Is this a default that should be changed?

Matt


On 8/13/07, John Kane <[EMAIL PROTECTED]> wrote:
> This is one of R's rather _endearing_  little
> idiosyncrasies. I ran into it a while ago.
> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/98090.html
>
>
> For some reason, possibly historical, the option
> "stringsAsFactors" is set to TRUE.
>
> As Prof Ripley says FAQ 7.10 will tell you
> as.numeric(as.character(f)) # for a one-off conversion
>
> From Gabor Grothendieck: a one-off solution for a
> complete data.frame
>
> DF <- data.frame(let = letters[1:3], num = 1:3,
>  stringsAsFactors = FALSE)
>
> str(DF)  # to see what has happened.
>
> You can reset the option globally, see below.  However
> you might want to read Gabor Grothendieck's comment
> about this in the thread referenced above since it
> could cause problems if you transfer files a lot.
>
> Personally I went with the global option since I don't
> tend to transfer programs to other people and I was
> getting tired of tracking down errors in my programs
> caused by numeric and character variables suddenly
> deciding to become factors.
>
> From Steven Tucker:
>
> You can also set this option globally with
>  options(stringsAsFactors = FALSE)  # in \library\base\R\Rprofile


-- 
Matthew C Keller
Postdoctoral Fellow
Virginia Institute for Psychiatric and Behavioral Genetics



Re: [R] Convert factor to numeric vector of labels

2007-08-13 Thread John Kane
This is one of R's rather _endearing_  little 
idiosyncrasies. I ran into it a while ago.
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/98090.html


For some reason, possibly historical, the option
"stringsAsFactors" is set to TRUE.

As Prof Ripley says FAQ 7.10 will tell you
as.numeric(as.character(f)) # for a one-off conversion
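
To make the difference concrete, here is a small sketch with made-up values modelled on Falk's. The last form, `as.numeric(levels(f))[f]`, is the variant that the `?factor` help page describes as slightly more efficient.

```r
# A factor whose labels look like numbers (values are illustrative)
f <- factor(c("0.71", "2.61", "1.34", "0.71"))

# as.numeric() on a factor returns the internal integer codes
codes <- as.numeric(f)
codes

# Going through the labels recovers the intended values
vals1 <- as.numeric(as.character(f))
vals1

# Equivalent: convert the levels once, then index by the factor
vals2 <- as.numeric(levels(f))[f]
vals2
```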

From Gabor Grothendieck: a one-off solution for a
complete data.frame

DF <- data.frame(let = letters[1:3], num = 1:3,
 stringsAsFactors = FALSE)

str(DF)  # to see what has happened.

You can reset the option globally, see below.  However
you might want to read Gabor Grothendieck's comment
about this in the thread referenced above since it
could cause problems if you transfer files a lot.

Personally I went with the global option since I don't
tend to transfer programs to other people and I was
getting tired of tracking down errors in my programs
caused by numeric and character variables suddenly
deciding to become factors.

From Steven Tucker:

You can also set this option globally with
 options(stringsAsFactors = FALSE)  # in \library\base\R\Rprofile
 
--- Falk Lieder <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> I have imported a data file to R. Unfortunately R has interpreted some
> numeric variables as factors. Therefore I want to reconvert these to numeric
> vectors whose values are the factor levels' labels. I tried
> as.numeric(),
> but it returns a vector of factor levels (i.e. 1,2,3,...) instead of labels
> (i.e. 0.71, 1.34, 2.61,…).
> What can I do instead?
> 
> Best wishes, Falk



Re: [R] Convert factor to numeric vector of labels

2007-08-12 Thread Prof Brian Ripley

See the FAQ Q7.10 (and please study the posting guide)

On Sun, 12 Aug 2007, Falk Lieder wrote:


> Hi,
>
> I have imported a data file to R. Unfortunately R has interpreted some
> numeric variables as factors. Therefore I want to reconvert these to numeric
> vectors whose values are the factor levels' labels. I tried
> as.numeric(),
> but it returns a vector of factor levels (i.e. 1,2,3,...) instead of labels
> (i.e. 0.71, 1.34, 2.61,…).
> What can I do instead?
>
> Best wishes, Falk
>
> [[alternative HTML version deleted]]





--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK  Fax:  +44 1865 272595


[R] Convert factor to numeric vector of labels

2007-08-12 Thread Falk Lieder
Hi,

I have imported a data file to R. Unfortunately R has interpreted some
numeric variables as factors. Therefore I want to reconvert these to numeric
vectors whose values are the factor levels' labels. I tried
as.numeric(),
but it returns a vector of factor levels (i.e. 1,2,3,...) instead of labels
(i.e. 0.71, 1.34, 2.61,…).
What can I do instead?

Best wishes, Falk

[[alternative HTML version deleted]]
