Re: [R] Creating New Variable Using Ifelse

2017-08-10 Thread Courtney Benjamin
Thanks very much; with your tips, I was able to get the nested ifelse statement 
to work properly!

Courtney Benjamin



Re: [R] Creating New Variable Using Ifelse

2017-08-10 Thread PIKAL Petr
Hi

See my comments inline.

> -----Original Message-----
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Courtney
> Benjamin
> Sent: Thursday, August 10, 2017 5:55 AM
> To: r-help@r-project.org
> Subject: [R] Creating New Variable Using Ifelse
>
> Hello R Help List,
>
> I am an R novice and trying to use the ifelse function to create a new binary
> variable based off of the responses of two other binary variables; NAs are
> involved.  I pulled it off almost successfully, but when I checked the counts of
> my new variable for accuracy, I found that a small portion of the NA cases were
> not being passed through as NAs, but as "0" counts in my new variable.  My
> many attempts at creating a nested ifelse statement that would pass the NAs
> through properly have not been successful.  Any help is greatly appreciated.
>
> Here is an MRE:
>
> library(RCurl)
> data <-
> getURL("https://raw.githubusercontent.com/cbenjamin1821/careertech-ed/master/elsq2wbl.csv")

This did not work for me, probably because of some restriction on data access.

> elsq2wbl <- read.csv(text = data)
>
> ##Recoding Negative Responses to NA
> elsq2wbl [elsq2wbl[, "EVERRELJOB"] < -3, "EVERRELJOB"] <- NA
> elsq2wbl [elsq2wbl[, "PSWBL"] < -2, "PSWBL"] <- NA

If you want to recode values below zero to NA, you can do this easily without
any ifelse by assigning through a logical index:

> summary(test)
       mp             tepl           kryst
 Min.   : 7.11   Min.   :100.0   Min.   : 21.70
 1st Qu.:32.25   1st Qu.:400.0   1st Qu.: 24.15
 Median :37.50   Median :500.0   Median : 26.55
 Mean   :38.44   Mean   :485.7   Mean   : 42.64
 3rd Qu.:44.75   3rd Qu.:600.0   3rd Qu.: 33.62
 Max.   :76.02   Max.   :900.0   Max.   :150.00
 NA's   :3                       NA's   :6
> test[is.na(test)] <- 999
> summary(test)
       mp              tepl           kryst
 Min.   :  7.11   Min.   :100.0   Min.   : 21.70
 1st Qu.: 34.27   1st Qu.:400.0   1st Qu.: 25.93
 Median : 41.75   Median :500.0   Median : 92.90
 Mean   :244.28   Mean   :485.7   Mean   :452.51
 3rd Qu.: 70.05   3rd Qu.:600.0   3rd Qu.:999.00
 Max.   :999.00   Max.   :900.0   Max.   :999.00
>
> test[test>900] <- NA
> summary(test)
       mp             tepl           kryst
 Min.   : 7.11   Min.   :100.0   Min.   : 21.70
 1st Qu.:32.25   1st Qu.:400.0   1st Qu.: 24.15
 Median :37.50   Median :500.0   Median : 26.55
 Mean   :38.44   Mean   :485.7   Mean   : 42.64
 3rd Qu.:44.75   3rd Qu.:600.0   3rd Qu.: 33.62
 Max.   :76.02   Max.   :900.0   Max.   :150.00
 NA's   :3                       NA's   :6
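
The same logical-index assignment can be applied to the recoding step from the question. Below is a minimal sketch on a made-up vector; the values are illustrative assumptions and are not taken from elsq2wbl.csv.

```r
# Hypothetical survey column: negative codes mean "missing", 0/1 are real answers.
x <- c(1, 0, -3, -9, 1, NA, 0, -4)

# Logical-index assignment: every element where the condition is TRUE becomes NA.
# An element that is already NA gives an NA index, which R accepts here because
# the replacement value has length one.
x[x < 0] <- NA

print(x)
# [1]  1  0 NA NA  1 NA  0 NA
```

No ifelse is involved: the comparison builds the index and the assignment does the recoding in one step.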


>
> #Labeling categorical variable levels
> elsq2wbl$EVERRELJOB <- factor(elsq2wbl$EVERRELJOB, levels = c(0,1), labels =
> c("No","Yes"))
> elsq2wbl$PSWBL <- factor(elsq2wbl$PSWBL, levels = c(0,1), labels =
> c("No","Yes"))
>
> ##Trying to create a new variable to indicate if the student had a job
> #related to the college studies that was NOT a WBL experience
> elsq2wbl$NONWBLRELJOB <- ifelse(elsq2wbl$PSWBL=="No" &
> elsq2wbl$EVERRELJOB=="Yes",1,0)

You can use simple logical operations to get the desired result.

First, some sample data:
> a <- sample(c("Y", "N"), 30, replace=TRUE)
> b <- sample(c("Y", "N"), 30, replace=TRUE)
> (a=="N")*(b=="Y")
 [1] 1 1 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1

The result is 1 only where both conditions are met.

It works with NA values too:

> a[c(3,5,8)] <- NA
> b[c(3,6,7,8)] <- NA
> (a=="N")*(b=="Y")
 [1]  1  1 NA  0 NA NA NA NA  0  0  1  0  0  1  0  0  0  0  1  0  0  0  0  0  0
[26]  0  0  0  0  1
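
One detail worth spelling out: multiplying the two comparisons does not behave exactly like combining them with `&` when NAs are present. The sketch below uses made-up vectors (an assumption, not the data from the thread) to show the difference.

```r
# Toy inputs: NA marks a missing response.
a <- c("N", "N", "Y", NA,  NA,  "N")
b <- c("Y", "N", "Y", "Y", "N", NA)

# Multiplication treats TRUE/FALSE as 1/0, and NA times anything is NA,
# so a missing value on either side always gives NA.
by_product <- (a == "N") * (b == "Y")

# Logical AND returns FALSE as soon as one side is FALSE,
# even when the other side is NA.
by_and <- as.numeric((a == "N") & (b == "Y"))

print(by_product)
# [1]  1  0  0 NA NA NA
print(by_and)
# [1]  1  0  0 NA  0 NA
```

The fifth element is the interesting one: `NA & FALSE` is FALSE, so the `&` form reports 0 there, while the product keeps NA.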

Cheers
Petr


>
> #Cross tab to check counts of two variables that new variable is based upon
> xtabs(~PSWBL+EVERRELJOB,subset(elsq2wbl,BYSCTRL==1==1),addNA=TRUE)
>
> #Checking count of newly created variable
> Q2sub <- subset(elsq2wbl,BYSCTRL==1==1)
> library(plyr)
> count(Q2sub,'NONWBLRELJOB')
>
> #The new variable has the correct count of "1", but 88 cases too many for "0"
> #The cross tab shows 20 and 68 NA cases that are being incorrectly counted as
> #"0" in the new variable
>
> #My other approach at trying to handle the NAs properly-returns an error
> elsq2wbl$NONWBLRELJOB <- ifelse(elsq2wbl$PSWBL=="No" &
> elsq2wbl$EVERRELJOB=="Yes",1,ifelse(is.na(elsq2wbl$PSWBL)(elsq2wbl$EVERRELJOB),NA,
> ifelse(elsq2wbl$PSWBL!="No" & elsq2wbl$EVERRELJOB!="Yes",0)))

Re: [R] Creating New Variable Using Ifelse

2017-08-09 Thread Ismail SEZEN


I could not follow the question completely, but one thing that caught my eye is
that elsq2wbl$EVERRELJOB contains the following values:

summary(factor(elsq2wbl$EVERRELJOB))
  -9   -8   -7   -4   -3    0    1 
 139  459  946 2488 1948 4619 5598 

and in fact, you want to set negative values to NA.

> ##Recoding Negative Responses to NA
> elsq2wbl [elsq2wbl[, "EVERRELJOB"] < -3, "EVERRELJOB"] <- NA

But after that command, you still have 1948 values of -3 in the variable:

summary(factor(elsq2wbl$EVERRELJOB))
  -3    0    1 NA's 
1948 4619 5598 4032 

So I think you need to change the comparison to <= as follows:

> ##Recoding Negative Responses to NA
> elsq2wbl [elsq2wbl[, "EVERRELJOB"] <= -3, "EVERRELJOB"] <- NA
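
To see the difference between the two comparisons, here is a small sketch on a made-up vector of codes (illustrative values only, not the real column):

```r
codes <- c(-9, -4, -3, 0, 1)

strict <- codes
strict[strict < -3] <- NA         # -3 is not below -3, so it is kept

inclusive <- codes
inclusive[inclusive <= -3] <- NA  # -3 is now recoded as well

print(strict)
# [1] NA NA -3  0  1
print(inclusive)
# [1] NA NA NA  0  1
```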


Instead of using '-2' and '-3' as thresholds to set NA for different variables, 
why don’t you use a “less than zero” condition as follows?

elsq2wbl [elsq2wbl[, "EVERRELJOB"] < 0, "EVERRELJOB"] <- NA
elsq2wbl [elsq2wbl[, "PSWBL"] < 0, "PSWBL"] <- NA

That way, values below zero become NA in both columns (variables), and each 
variable contains only 0, 1 and NA, which matches what you call “binary”.

_ifelse_ part:

You have NAs in both variables. In this situation, consider the following 
ifelse examples (the two sides of '&' can be swapped):

ifelse(TRUE & TRUE, 1, 0) # 1
ifelse(TRUE & FALSE, 1, 0) # 0
ifelse(FALSE & FALSE, 1, 0) # 0
ifelse(TRUE & NA, 1, 0) # NA
ifelse(FALSE & NA, 1, 0) # 0

Based on the above, try to build new logic that achieves what you need.
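
As a rough illustration of how these rules play out on the original one-level ifelse, the sketch below enumerates every combination of "No", "Yes" and NA for two stand-in columns. The data frame `grid` is hypothetical and only borrows the column names from the question.

```r
# All nine combinations of No / Yes / NA for the two inputs.
grid <- expand.grid(PSWBL = c("No", "Yes", NA),
                    EVERRELJOB = c("No", "Yes", NA),
                    stringsAsFactors = FALSE)

# The original one-level ifelse from the question.
grid$result <- ifelse(grid$PSWBL == "No" & grid$EVERRELJOB == "Yes", 1, 0)

print(grid$result)
# [1]  0  0  0  1  0 NA NA  0 NA
```

Rows 3 (PSWBL is NA, EVERRELJOB is "No") and 8 (PSWBL is "Yes", EVERRELJOB is NA) have a missing input but still come out as 0, because the non-missing side of `&` is already FALSE. Those two cells would be consistent with the two groups of NA cases that the question reports as counted under "0".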

In your last nested ifelse, you forgot to define a value for the case where 
the innermost ifelse condition is FALSE.

elsq2wbl$NONWBLRELJOB <-
  ifelse(elsq2wbl$PSWBL == "No" & elsq2wbl$EVERRELJOB == "Yes", 1,
    ifelse(is.na(elsq2wbl$PSWBL) & is.na(elsq2wbl$EVERRELJOB), NA,
      ifelse(elsq2wbl$PSWBL != "No" & elsq2wbl$EVERRELJOB != "Yes", 0,
        "Forgotten value")))
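
If the goal is for the new variable to be NA whenever either input is missing (an assumption about the intent, based on the description in the question), one possible sketch is to test for missingness first and only then compare. The vectors `pswbl` and `everreljob` below are hypothetical stand-ins for the two columns.

```r
# Hypothetical stand-ins for elsq2wbl$PSWBL and elsq2wbl$EVERRELJOB.
pswbl      <- factor(c("No",  "Yes", NA,   "No", "Yes", NA),
                     levels = c("No", "Yes"))
everreljob <- factor(c("Yes", "Yes", "No", NA,   NA,    NA),
                     levels = c("No", "Yes"))

# Check for missing values first, so that a FALSE on one side of '&'
# can no longer turn a missing case into 0.
nonwblreljob <- ifelse(is.na(pswbl) | is.na(everreljob), NA,
                       ifelse(pswbl == "No" & everreljob == "Yes", 1, 0))

print(nonwblreljob)
# [1]  1  0 NA NA NA NA
```

Using `|` rather than `&` in the missingness test is a design choice: it covers cases where only one of the two inputs is NA, not just cases where both are.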

Also, please try to create a _minimal reproducible example_ instead of making 
us download a big csv file (219 columns x 16197 rows) and work out what you 
are trying to do. :)

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Creating New Variable Using Ifelse

2017-08-09 Thread Courtney Benjamin
Hello R Help List,

I am an R novice and trying to use the ifelse function to create a new binary 
variable based off of the responses of two other binary variables; NAs are 
involved.  I pulled it off almost successfully, but when I checked the counts 
of my new variable for accuracy, I found that a small portion of the NA cases 
were not being passed through as NAs, but as "0" counts in my new variable.  My 
many attempts at creating a nested ifelse statement that would pass the NAs 
through properly have not been successful.  Any help is greatly appreciated.

Here is an MRE:

library(RCurl)
data <- 
getURL("https://raw.githubusercontent.com/cbenjamin1821/careertech-ed/master/elsq2wbl.csv")
elsq2wbl <- read.csv(text = data)

##Recoding Negative Responses to NA
elsq2wbl [elsq2wbl[, "EVERRELJOB"] < -3, "EVERRELJOB"] <- NA
elsq2wbl [elsq2wbl[, "PSWBL"] < -2, "PSWBL"] <- NA

#Labeling categorical variable levels
elsq2wbl$EVERRELJOB <- factor(elsq2wbl$EVERRELJOB, levels = c(0,1), labels = 
c("No","Yes"))
elsq2wbl$PSWBL <- factor(elsq2wbl$PSWBL, levels = c(0,1), labels = 
c("No","Yes"))

##Trying to create a new variable to indicate if the student had a job
#related to the college studies that was NOT a WBL experience
elsq2wbl$NONWBLRELJOB <- ifelse(elsq2wbl$PSWBL=="No" & 
elsq2wbl$EVERRELJOB=="Yes",1,0)

#Cross tab to check counts of two variables that new variable is based upon
xtabs(~PSWBL+EVERRELJOB,subset(elsq2wbl,BYSCTRL==1==1),addNA=TRUE)

#Checking count of newly created variable
Q2sub <- subset(elsq2wbl,BYSCTRL==1==1)
library(plyr)
count(Q2sub,'NONWBLRELJOB')

#The new variable has the correct count of "1", but 88 cases too many for "0"
#The cross tab shows 20 and 68 NA cases that are being incorrectly counted as 
#"0" in the new variable

#My other approach at trying to handle the NAs properly-returns an error
elsq2wbl$NONWBLRELJOB <- ifelse(elsq2wbl$PSWBL=="No" & 
elsq2wbl$EVERRELJOB=="Yes",1,ifelse(is.na(elsq2wbl$PSWBL)(elsq2wbl$EVERRELJOB),NA,

   ifelse(elsq2wbl$PSWBL!="No" & elsq2wbl$EVERRELJOB!="Yes",0)))



Courtney Benjamin

Broome-Tioga BOCES

Automotive Technology II Teacher

Located at Gault Toyota

Doctoral Candidate-Educational Theory & Practice

State University of New York at Binghamton

cbenj...@btboces.org

607-763-8633
