Re: [R] Condition to factor (easy to remember)

2009-10-01 Thread Dieter Menne



Douglas Bates-2 wrote:
>
> On Wed, Sep 30, 2009 at 2:42 PM, Douglas Bates ba...@stat.wisc.edu
> wrote:
> And besides, Frank Harrell will soon be weighing in to tell you why
> you shouldn't dichotomize in the first place.
>

Subjects in this study received a 20 ml infusion of Kirsch (40%, Swiss
Brand) at t=10 minutes, therefore the second interval should read Prost
instead of Post. Even Frank would admit this is a valid dichotomization.

Dieter



-- 
View this message in context: 
http://www.nabble.com/Condition-to-factor-%28easy-to-remember%29-tp25676411p25696647.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Condition to factor (easy to remember)

2009-09-30 Thread Dieter Menne

Dear List,

creating factors in a given non-default order is notoriously difficult to
explain in a course. Students love the ifelse construct given below most,
but I remember some comment from Martin Mächler (?) that ifelse should be
banned from courses.

Any better idea? Not necessarily short, easy to remember is important.

Dieter


data = c(1,7,10,50,70)
levs = c("Pre","Post")

# Typical C-Programmer style
factor(levs[as.integer(data >10)+1], levels=levs)

# Easiest to understand
factor(ifelse(data <=10, levs[1], levs[2]), levels=levs)




Re: [R] Condition to factor (easy to remember)

2009-09-30 Thread David Winsemius


On Sep 30, 2009, at 3:43 AM, Dieter Menne wrote:



> Dear List,
>
> creating factors in a given non-default order is notoriously difficult to
> explain in a course. Students love the ifelse construct given below most,
> but I remember some comment from Martin Mächler (?) that ifelse should be
> banned from courses.
>
> Any better idea? Not necessarily short, easy to remember is important.
>
> Dieter
>
>
> data = c(1,7,10,50,70)
> levs = c("Pre","Post")
>
> # Typical C-Programmer style
> factor(levs[as.integer(data >10)+1], levels=levs)


I agree with your observation that many people express a preference
for the ifelse version. I had the same sort of comment on some of my
Excel code (not in a statistical application) a couple of days ago.
In your code the as.integer function is superfluous, and you could
argue that it might even be easier to understand for the
Boolean-challenged masses if you substituted as.logical(). It would
also be superfluous, but it might convey a message that the programmer
_knew_ that the + operation is capable of doing the necessary coercion.
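
The coercion described above can be checked directly. This is only a minimal sketch, reusing `data` and `levs` from the original post; the names `idx_explicit` and `idx_implicit` are made up for illustration.

```r
data <- c(1, 7, 10, 50, 70)
levs <- c("Pre", "Post")

# Arithmetic coerces a logical vector to 0/1, so adding 1 yields a valid
# subscript (1 or 2) with or without the explicit as.integer() call.
idx_explicit <- as.integer(data > 10) + 1
idx_implicit <- (data > 10) + 1

# Both subscripts select the same labels.
same_index <- identical(idx_explicit, idx_implicit)
f <- factor(levs[idx_implicit], levels = levs)
print(same_index)
print(f)
```

Whether the shorter form is easier to teach is a matter of taste; the sketch only shows that the two subscripts are the same.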




> # Easiest to understand
> factor(ifelse(data <=10, levs[1], levs[2]), levels=levs)

--

-- Boole Rules

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] Condition to factor (easy to remember)

2009-09-30 Thread Dieter Menne



David Winsemius wrote:
>
>
> # Typical C-Programmer style
> factor(levs[as.integer(data >10)+1], levels=levs)
>
> In your code the as.integer function is superfluous
>

Oops... done too much c# lately, getting invalid cast challenged.

Dieter





Re: [R] Condition to factor (easy to remember)

2009-09-30 Thread Gabor Grothendieck
1. A common way of doing this is cut:

   cut(data, c(-Inf, 10, Inf), lab = levs, right = TRUE)
  [1] Pre  Pre  Pre  Post Post
  Levels: Pre Post

We don't actually need right=TRUE as it's the default, but if you omit
it then it can be hard to remember whether the right end of each interval
is included or excluded in the subdivision, so I would recommend
including it as a matter of course.  Slightly less safe, but if you
knew the values were integral then another approach that would allow
dropping the right= argument would be to use 10.5 as the breakpoint, in
which case the setting of right= does not matter anyway.

2. Similar to cut is findInterval so the subscripting of your first
solution could be done via findInterval:

levs[ findInterval(data, c(-Inf, 10), right = TRUE) ]
   [1] "Pre"  "Pre"  "Pre"  "Post" "Post"

The same comment regarding 10.5 applies.  I've omitted the factor(...)
part to focus on the difference and in the remaining examples have
done that too.

3. Either of these could replace the ifelse.  Both work by vectorizing
an ordinary if but sapply is a more common way to do it so is likely
preferable from the viewpoint of clarity.

# 3a
sapply(data, function(x) if (x <= 10) levs[1] else levs[2])
   [1] "Pre"  "Pre"  "Pre"  "Post" "Post"

# 3b
Vectorize(function(x) if (x <= 10) levs[1] else levs[2])(data)
   [1] "Pre"  "Pre"  "Pre"  "Post" "Post"

4. The subscripting in your first solution could be done like this
which is a bit longer but is arguably easier to understand:

levs[ 1 * (data <=10) + 2 * (data > 10) ]
   [1] "Pre"  "Pre"  "Pre"  "Post" "Post"
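
The boundary behaviour discussed in items 1 and 2 can be made visible by running the same data through both settings of right=. This is a sketch under the thread's own `data` and `levs`; the result names are invented for illustration. As far as I can tell, findInterval() has no argument literally called `right`, so the call in item 2 appears to rely on partial matching to `rightmost.closed`, which is spelled out here.

```r
data <- c(1, 7, 10, 50, 70)
levs <- c("Pre", "Post")

# right = TRUE: intervals are closed on the right, so the value 10
# falls into the first interval and is labelled "Pre".
closed_right <- cut(data, c(-Inf, 10, Inf), labels = levs, right = TRUE)

# right = FALSE: intervals are closed on the left, so the value 10
# falls into the second interval and is labelled "Post".
closed_left <- cut(data, c(-Inf, 10, Inf), labels = levs, right = FALSE)

# With a breakpoint of 10.5 and integral data, both settings agree.
half_right <- cut(data, c(-Inf, 10.5, Inf), labels = levs, right = TRUE)
half_left  <- cut(data, c(-Inf, 10.5, Inf), labels = levs, right = FALSE)

# findInterval() with the full argument name: the last interval is
# treated as closed, so 10 maps to index 1.
idx <- findInterval(data, c(-Inf, 10), rightmost.closed = TRUE)
by_interval <- factor(levs[idx], levels = levs)

print(closed_right)
print(closed_left)
print(by_interval)
```

Only the element equal to the breakpoint changes between the first two results, which is the case that is easy to get wrong from memory.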






Re: [R] Condition to factor (easy to remember)

2009-09-30 Thread Ista Zahn
An extremely verbose, but (in my view) easy to understand approach is:

 data.f <- data; data.f[which(data <= 10)] <- levs[1]
 data.f[which(data > 10)] <- levs[2]; data.f <- factor(data.f)

-Ista





-- 
Ista Zahn
Graduate student
University of Rochester
http://yourpsyche.org



Re: [R] Condition to factor (easy to remember)

2009-09-30 Thread Douglas Bates
On Wed, Sep 30, 2009 at 2:42 PM, Douglas Bates ba...@stat.wisc.edu wrote:
> On Wed, Sep 30, 2009 at 2:43 AM, Dieter Menne
> dieter.me...@menne-biomed.de wrote:
>>
>> Dear List,
>>
>> creating factors in a given non-default order is notoriously difficult to
>> explain in a course. Students love the ifelse construct given below most,
>> but I remember some comment from Martin Mächler (?) that ifelse should be
>> banned from courses.
>>
>> Any better idea? Not necessarily short, easy to remember is important.
>>
>> Dieter
>>
>> data = c(1,7,10,50,70)
>> levs = c("Pre","Post")
>>
>> # Typical C-Programmer style
>> factor(levs[as.integer(data >10)+1], levels=levs)
>>
>> # Easiest to understand
>> factor(ifelse(data <=10, levs[1], levs[2]), levels=levs)
>
> Why not
>
> factor(data > 10, labels = c("Pre", "Post"))
> [1] Pre  Pre  Pre  Post Post
> Levels: Pre Post
>
> All you have to remember is that FALSE comes before TRUE.

And besides, Frank Harrell will soon be weighing in to tell you why
you shouldn't dichotomize in the first place.



Re: [R] Condition to factor (easy to remember)

2009-09-30 Thread hadley wickham
On Wed, Sep 30, 2009 at 2:32 PM, Ista Zahn istaz...@gmail.com wrote:
> An extremely verbose, but (in my view) easy to understand approach is:
>
> data.f <- data; data.f[which(data <= 10)] <- levs[1]
> data.f[which(data > 10)] <- levs[2]; data.f <- factor(data.f)


All those which()s are unnecessary.  And if you're going to use this
approach I'd recommend initialising data.f with NA's so you can tell
if you missed any cases.
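
A minimal sketch of that suggestion, assuming the same `data` and `levs` as in the original post. The stopifnot() check and the levels = levs argument are additions for illustration and were not part of the quoted code.

```r
data <- c(1, 7, 10, 50, 70)
levs <- c("Pre", "Post")

# Start from all-NA so any value not covered by a condition stays NA.
data.f <- rep(NA_character_, length(data))

# Logical subscripts work directly; which() is not needed.
data.f[data <= 10] <- levs[1]
data.f[data > 10]  <- levs[2]

# Fail loudly if a case was missed.
stopifnot(!anyNA(data.f))

# Passing levels = levs keeps the order Pre, Post; plain factor(data.f)
# would sort the levels alphabetically (Post, Pre).
data.f <- factor(data.f, levels = levs)
print(data.f)
```

If the two conditions ever stop covering all values (for example after the cutoff in one of them is edited), the leftover NA entries make the gap obvious.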

Hadley


-- 
http://had.co.nz/



Re: [R] Condition to factor (easy to remember)

2009-09-30 Thread Peter Dalgaard

Douglas Bates wrote:

> On Wed, Sep 30, 2009 at 2:42 PM, Douglas Bates ba...@stat.wisc.edu wrote:
>> On Wed, Sep 30, 2009 at 2:43 AM, Dieter Menne
>> dieter.me...@menne-biomed.de wrote:
>>>
>>> Dear List,
>>> creating factors in a given non-default order is notoriously difficult to
>>> explain in a course. Students love the ifelse construct given below most,
>>> but I remember some comment from Martin Mächler (?) that ifelse should be
>>> banned from courses.
>>> Any better idea? Not necessarily short, easy to remember is important.
>>> Dieter
>>> data = c(1,7,10,50,70)
>>> levs = c("Pre","Post")
>>>
>>> # Typical C-Programmer style
>>> factor(levs[as.integer(data >10)+1], levels=levs)
>>>
>>> # Easiest to understand
>>> factor(ifelse(data <=10, levs[1], levs[2]), levels=levs)
>>
>> Why not
>>
>> factor(data > 10, labels = c("Pre", "Post"))
>> [1] Pre  Pre  Pre  Post Post
>> Levels: Pre Post
>>
>> All you have to remember is that FALSE comes before TRUE.
>
> And besides, Frank Harrell will soon be weighing in to tell you why
> you shouldn't dichotomize in the first place.


And someone might also remind you that it is safest to include
levels=c(FALSE,TRUE), just in case the condition is always TRUE. (Terry
Therneau has the scars from the implementation of Surv()...)
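
The always-TRUE case can be reproduced with a small made-up vector. This is only a sketch: `all_late` is hypothetical data in which every value exceeds 10, and the tryCatch() wrapper is there just so the failing call does not stop the script.

```r
levs <- c("Pre", "Post")
all_late <- c(50, 70, 90)   # hypothetical: the condition is TRUE everywhere

# Without levels=, factor() finds a single level (TRUE), so supplying two
# labels fails; the error message is captured instead of stopping the run.
unsafe <- tryCatch(
  factor(all_late > 10, labels = levs),
  error = function(e) conditionMessage(e)
)

# With explicit levels both categories exist and the labels line up,
# even though "Pre" never occurs in the data.
safe <- factor(all_late > 10, levels = c(FALSE, TRUE), labels = levs)

print(unsafe)
print(safe)
```

The empty "Pre" level is kept in `safe`, which is usually what later tabulation or modelling code expects.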


--
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907



Re: [R] Condition to factor (easy to remember)

2009-09-30 Thread William Dunlap
> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Douglas Bates
> Sent: Wednesday, September 30, 2009 12:42 PM
> To: Dieter Menne
> Cc: r-help@r-project.org
> Subject: Re: [R] Condition to factor (easy to remember)
>
> On Wed, Sep 30, 2009 at 2:43 AM, Dieter Menne
> dieter.me...@menne-biomed.de wrote:
> >
> > Dear List,
> >
> > creating factors in a given non-default order is notoriously difficult to
> > explain in a course. Students love the ifelse construct given below most,
> > but I remember some comment from Martin Mächler (?) that ifelse should be
> > banned from courses.
> >
> > Any better idea? Not necessarily short, easy to remember is important.
> >
> > Dieter
> >
> > data = c(1,7,10,50,70)
> > levs = c("Pre","Post")
> >
> > # Typical C-Programmer style
> > factor(levs[as.integer(data >10)+1], levels=levs)
> >
> > # Easiest to understand
> > factor(ifelse(data <=10, levs[1], levs[2]), levels=levs)
>
> Why not
>
> factor(data > 10, labels = c("Pre", "Post"))
> [1] Pre  Pre  Pre  Post Post
> Levels: Pre Post
>
> All you have to remember is that FALSE comes before TRUE.

And if you don't want to remember that order, or if you want TRUE to come
before FALSE, use the levels argument to factor.  E.g.,
  factor(data>10, levels=c(TRUE,FALSE), labels=c("Post","Pre"))
[1] Pre  Pre  Pre  Post Post
Levels: Post Pre
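
One way to convince yourself that levels= and labels= are paired as intended is to cross-tabulate the condition against the resulting factor. This is a sketch using the thread's `data`; the names `f` and `check` are made up for illustration.

```r
data <- c(1, 7, 10, 50, 70)

# levels= and labels= are matched by position: TRUE -> "Post", FALSE -> "Pre".
f <- factor(data > 10, levels = c(TRUE, FALSE), labels = c("Post", "Pre"))

# Every count should sit in a cell where the label agrees with the condition.
check <- table(condition = data > 10, label = f)
print(check)
```

Any non-zero count in the FALSE/"Post" or TRUE/"Pre" cell would mean the labels were supplied in the wrong order.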

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com 

 