Re: [R] Question concerning side effects of treating invalid factor levels
Hi Tibor,

I'll try again. Your problem has nothing to do with factors and everything to do with trying to bind a vector to a data frame. A vector must be of one class, and a column in a data frame is a vector, so it must also be of one class. If you want to add a new row of data to your existing data frame, use a new data frame with one row.

    df <- data.frame(
      P = factor(c("mittels", "mit", "mittels", "ueber", "mit", "mit")),
      ANSWER = factor(c(rep("PP>OBJ", 4), rep("OBJ>PP", 2))),
      RT = round(runif(6, 7000, 16000), 0)
    )
    dq <- data.frame(
      P = factor("in"),
      ANSWER = factor("V>N"),
      RT = round(runif(1, 7000, 16000), 0)
    )
    df2 <- rbind(df, dq)
    df2

In this approach R kindly adds the new levels to the factor variables and the numeric value to the numeric variable. Keeping in mind that a vector can only be of one class will save you many debugging hours later on.

Tim

-----Original Message-----
From: Sarah Goslee
Sent: Tuesday, September 20, 2022 9:02 AM
To: tibor.k...@rub.de
Cc: Ebert,Timothy Aaron; r-help@r-project.org
Subject: Re: [R] Question concerning side effects of treating invalid factor levels

Hi Tibor,

No, you are misunderstanding the source of the problem. It has nothing to do with factors. Instead, it has to do with the inability of a vector to hold more than one class. You are using rbind() to add a new row to your data frame, but that vector is being coerced to character. That's what is forcing your numeric column to become character: you're adding a character to it.

    > c("in", "V>N", round(runif(1, 7000, 16000), 0))
    [1] "in"    "V>N"   "15709"

It has nothing whatsoever to do with factors or factor levels, and would occur if you were adding it to a data frame with character values.

If you want to mix types, you cannot use a vector.

    c2 <- data.frame(P = "in", ANSWER = "V>N", RT = round(runif(1, 7000, 16000), 0))
    > str(rbind(df, c2))
    'data.frame': 7 obs. of 3 variables:
     $ P     : Factor w/ 4 levels "mit","mittels",..: 2 1 2 3 1 1 4
     $ ANSWER: Factor w/ 3 levels "OBJ>PP","PP>OBJ",..: 2 2 2 2 1 1 3
     $ RT    : num 10867 14808 11600 15881 8984 ...

Sarah
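A minimal sketch for checking the result of the one-row data frame approach in Tim's message. It reuses the `df` and `dq` definitions from that message; the `set.seed()` call and the `col_classes` name are assumptions added here, only so that the random RT values are reproducible and the check is easy to read.

```r
set.seed(1)  # assumption: fixed seed, not part of the original code

df <- data.frame(
  P      = factor(c("mittels", "mit", "mittels", "ueber", "mit", "mit")),
  ANSWER = factor(c(rep("PP>OBJ", 4), rep("OBJ>PP", 2))),
  RT     = round(runif(6, 7000, 16000), 0)
)
dq <- data.frame(
  P      = factor("in"),
  ANSWER = factor("V>N"),
  RT     = round(runif(1, 7000, 16000), 0)
)
df2 <- rbind(df, dq)

# Column classes after rbind(): the factors stay factors, RT stays numeric
col_classes <- sapply(df2, class)
print(col_classes)

# The new values were added as factor levels instead of becoming NA
print(levels(df2$P))
print(levels(df2$ANSWER))
print(any(is.na(df2)))
```

Comparing `sapply(df, class)` with `sapply(df2, class)` is a cheap way to confirm that appending a row did not silently change a column type.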
Re: [R] Question concerning side effects of treating invalid factor levels
Hi Tibor,

No, you are misunderstanding the source of the problem. It has nothing to do with factors. Instead, it has to do with the inability of a vector to hold more than one class. You are using rbind() to add a new row to your data frame, but that vector is being coerced to character. That's what is forcing your numeric column to become character: you're adding a character to it.

    > c("in", "V>N", round(runif(1, 7000, 16000), 0))
    [1] "in"    "V>N"   "15709"

It has nothing whatsoever to do with factors or factor levels, and would occur in the same way if you were adding it to a data frame with character columns.

If you want to mix types, you cannot use a vector.

    c2 <- data.frame(P = "in", ANSWER = "V>N", RT = round(runif(1, 7000, 16000), 0))
    > str(rbind(df, c2))
    'data.frame': 7 obs. of 3 variables:
     $ P     : Factor w/ 4 levels "mit","mittels",..: 2 1 2 3 1 1 4
     $ ANSWER: Factor w/ 3 levels "OBJ>PP","PP>OBJ",..: 2 2 2 2 1 1 3
     $ RT    : num 10867 14808 11600 15881 8984 ...

Sarah

On Tue, Sep 20, 2022 at 8:45 AM Tibor Kiss via R-help wrote:
>
> Hi,
>
> this is a misunderstanding of my question. I wasn't worried about invalid
> factor levels that produce NA. My question was why a column changes its
> class, which I thought was a side effect. If you add a vector containing one
> character string, the class of the whole vector becomes _chr_. And after this
> element has been added to a column, we have two NAs for the column which are
> factors, and a character string, which is responsible for the change of a
> numerical vector into a character string vector (see ?c, where you find: "The
> output type is determined from the highest type of the components in the
> hierarchy NULL < raw < logical < integer < double < complex < character <
> list < expression.").
>
> Best
>
> Tibor
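Sarah's point that a vector cannot mix types, while a data frame row can, is sketched below. The names `row_vec`, `row_list` and `row_df` are illustrative and do not appear in the thread, and the value 15709 is simply the number from her console output.

```r
# An atomic vector has a single type: the number is converted to a string
row_vec <- c("in", "V>N", 15709)
print(class(row_vec))
print(row_vec[3])

# A list keeps each element's own type
row_list <- list(P = "in", ANSWER = "V>N", RT = 15709)
print(sapply(row_list, class))

# A data frame is a list of equal-length columns, so a one-row
# data frame can carry a string and a number side by side
row_df <- data.frame(P = "in", ANSWER = "V>N", RT = 15709,
                     stringsAsFactors = FALSE)
print(is.list(row_df))
print(sapply(row_df, class))
```

This is why passing a one-row data frame to rbind() leaves the numeric column alone: the number never has to share a vector with the strings.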
Re: [R] Question concerning side effects of treating invalid factor levels
Hi,

this is a misunderstanding of my question. I wasn't worried about invalid factor levels that produce NA. My question was why a column changes its class, which I thought was a side effect. If you add a vector containing one character string, the class of the whole vector becomes _chr_. And after this element has been added to a column, we have two NAs for the columns which are factors, and a character string, which is responsible for the change of a numerical vector into a character string vector (see ?c, where you find: "The output type is determined from the highest type of the components in the hierarchy NULL < raw < logical < integer < double < complex < character < list < expression.").

Best

Tibor

> On 19.09.2022 at 13:59, Ebert,Timothy Aaron wrote:
>
> In your example code, the variable remains a class factor, and all entries
> are valid. The variables will behave as expected given the factor levels in
> the original dataframe.
>
> (At least on my system R 4.2, in RStudio, in Windows) R returns a couple of
> error messages warning me that I was bad.
> What you get is NA for "not available", or "not appropriate" or a missing
> value. You gave the system an invalid factor level so it was entered as
> missing. If you get data that has a new factor level, you need to tell R to
> expect a new factor level first.
>
> levels(f1) <- c(levels(f1),"New Level")
> levels(f1) <- c(levels(f1),c("NL1","NL2"))
>
> Tim

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
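The hierarchy quoted from ?c can be observed directly with typeof(). This is a small sketch, not part of the thread; the variable names are made up for illustration, and each call mixes two types so that c() has to pick the higher one.

```r
# Each call mixes two types; c() returns the higher one in the hierarchy
t_lgl_int <- typeof(c(TRUE, 1L))       # logical   < integer
t_int_dbl <- typeof(c(1L, 2.5))        # integer   < double
t_dbl_chr <- typeof(c(2.5, "a"))       # double    < character
t_chr_lst <- typeof(c("a", list(1)))   # character < list
print(c(t_lgl_int, t_int_dbl, t_dbl_chr, t_chr_lst))

# Applied to the row from the thread: two strings and one number
new_row <- c("in", "V>N", 15709)
print(typeof(new_row))
```

Because character sits above double in the hierarchy, a single string in the row is enough to turn the number into a string before rbind() ever sees it.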
Re: [R] Question concerning side effects of treating invalid factor levels
Dear Eric,

thank you very much. It would not have occurred to me to look up the help page for _c()_, which of course explains the coercion to the highest type.

Best

T.

> On 19.09.2022 at 13:31, Eric Berger wrote:
>
> You are misinterpreting what is going on.
> The rbind command includes c(char, char, int) which produces a
> character vector of length 3.
> This is what you are rbind-ing which changes the type of the RT column.
>
> If you do rbind(df, data.frame(P="in", ANSWER="V>N",
> RT=round(runif(1,7000,16000),0)))
> you will see that everything is fine. (New factor values are created.)
>
> HTH,
> Eric
Re: [R] Question concerning side effects of treating invalid factor levels
Sorry, my bad.

A vector must be of a single class. When you declare

    c("in", "V>N", round(runif(1, 7000, 16000), 0))

R will calculate the random number, but then convert it to character to conform with the other two elements in that vector. R then binds this to your original df and finds that it must add a character to a numeric vector. To keep the whole column of the same class, it converts everything to character.

Better?

Tim

From: tibor.k...@rub.de
Sent: Monday, September 19, 2022 8:07 AM
To: Ebert,Timothy Aaron
Cc: r-help@r-project.org
Subject: Re: [R] Question concerning side effects of treating invalid factor levels

Hi,

this is a misunderstanding of my question. I wasn't worried about invalid factor levels that produce NA. My question was why a column changes its class, which I thought was a side effect. If you add a vector containing one character string, the class of the whole vector becomes _chr_. And after this element has been added to a column, we have two NAs for the column which are factors, and a character string, which is responsible for the change of a numerical vector into a character string vector (see ?c, where you find: "The output type is determined from the highest type of the components in the hierarchy NULL < raw < logical < integer < double < complex < character < list < expression.").

Best

Tibor
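The two coercion steps Tim describes can be followed one at a time. This is a sketch under a few assumptions: `set.seed()` is added for reproducibility, the names `new_row`, `df_bad` and `rt_class_before` are illustrative, and the final as.numeric() repair is one possible way to undo the damage rather than something proposed in the thread.

```r
set.seed(1)  # assumption: fixed seed, not part of the original code

df <- data.frame(
  P      = factor(c("mittels", "mit", "mittels", "ueber", "mit", "mit")),
  ANSWER = factor(c(rep("PP>OBJ", 4), rep("OBJ>PP", 2))),
  RT     = round(runif(6, 7000, 16000), 0)
)

# Step 1: c() builds a character vector; the number is already a string here
new_row <- c("in", "V>N", round(runif(1, 7000, 16000), 0))
print(is.character(new_row))

# Step 2: rbind() appends that string to RT, so the whole column becomes
# character. suppressWarnings() hides the invalid-factor-level warnings
# raised for P and ANSWER.
df_bad <- suppressWarnings(rbind(df, new_row))
rt_class_before <- class(df_bad$RT)
print(rt_class_before)

# One possible repair after the fact: convert the column back to numeric.
# The NA entries in the factor columns are not recovered by this.
df_bad$RT <- as.numeric(df_bad$RT)
print(class(df_bad$RT))
```

Repairing afterwards restores the numeric column but not the factor values, so building the new row as a one-row data frame remains the cleaner route.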
Re: [R] Question concerning side effects of treating invalid factor levels
In your example code, the variable remains of class factor, and all entries are valid. The variables will behave as expected given the factor levels in the original data frame.

(At least on my system: R 4.2, in RStudio, on Windows) R returns a couple of warning messages telling me that I was bad.
What you get is NA ("not available"), that is, a missing value. You gave the system an invalid factor level, so it was entered as missing. If you get data that has a new factor level, you need to tell R to expect a new factor level first.

    levels(f1) <- c(levels(f1), "New Level")
    levels(f1) <- c(levels(f1), c("NL1", "NL2"))

Tim

-----Original Message-----
From: R-help On Behalf Of Tibor Kiss via R-help
Sent: Monday, September 19, 2022 6:11 AM
To: r-help@r-project.org
Subject: [R] Question concerning side effects of treating invalid factor levels

Dear List members,

I have tried now for several times to find out about a side effect of treating invalid factor levels, but did not find an answer. Various answers on stackexchange etc. produce the stuff that irritates me without even mentioning it.
So I am asking the list (apologies if this has been treated in the past).

If you add an invalid factor level to a column in a data frame, this has the side effect of turning a numerical column into a column with character strings. Here is a simple example:

    > df <- data.frame(
        P = factor(c("mittels", "mit", "mittels", "ueber", "mit", "mit")),
        ANSWER = factor(c(rep("PP>OBJ", 4), rep("OBJ>PP", 2))),
        RT = round(runif(6, 7000, 16000), 0))

    > str(df)
    'data.frame': 6 obs. of 3 variables:
     $ P     : Factor w/ 3 levels "mit","mittels",..: 2 1 2 3 1 1
     $ ANSWER: Factor w/ 2 levels "OBJ>PP","PP>OBJ": 2 2 2 2 1 1
     $ RT    : num 11157 13719 14388 14527 14686 ...

    > df <- rbind(df, c("in", "V>N", round(runif(1, 7000, 16000), 0)))

    > str(df)
    'data.frame': 7 obs. of 3 variables:
     $ P     : Factor w/ 3 levels "mit","mittels",..: 2 1 2 3 1 1 NA
     $ ANSWER: Factor w/ 2 levels "OBJ>PP","PP>OBJ": 2 2 2 2 1 1 NA
     $ RT    : chr "11478" "15819" "8305" "8852" ...

You see that RT has changed from _num_ to _chr_ as a side effect of adding the invalid factor level as NA. I would appreciate understanding what the purpose of the type coercion is.

Thanks in advance

Tibor
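The advice to extend the levels before adding a new value can be tried on a small factor. In this sketch `f1` is a hypothetical factor (the snippet in Tim's message does not define it), and `f_bad` is an illustrative copy used to show what happens without the extra level.

```r
# f1 is hypothetical; the thread's snippet does not define it
f1 <- factor(c("mit", "mittels", "ueber"))

# Assigning a value that is not a level yields NA (R also emits a warning,
# which is suppressed here to keep the output short)
f_bad <- f1
suppressWarnings(f_bad[4] <- "in")
print(is.na(f_bad[4]))

# Extend the levels first; then the same assignment succeeds
levels(f1) <- c(levels(f1), "in")
f1[4] <- "in"
print(as.character(f1[4]))
print(levels(f1))
```

Note that this addresses only the NA entries in the factor columns; it does not by itself prevent the numeric column from being coerced when the new row is passed as a character vector.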
Re: [R] Question concerning side effects of treating invalid factor levels
You are misinterpreting what is going on.
The rbind command includes c(char, char, num), which produces a character vector of length 3.
This is what you are rbind-ing, and it is what changes the type of the RT column.

If you do

    rbind(df, data.frame(P="in", ANSWER="V>N", RT=round(runif(1,7000,16000),0)))

you will see that everything is fine. (New factor levels are created.)

HTH,
Eric

On Mon, Sep 19, 2022 at 2:14 PM Tibor Kiss via R-help wrote:
>
> Dear List members,
>
> I have tried now for several times to find out about a side effect of
> treating invalid factor levels, but did not find an answer. Various answers
> on stackexchange etc. produce the stuff that irritates me without even
> mentioning it.
> So I am asking the list (apologies if this has been treated in the past).
>
> If you add an invalid factor level to a column in a data frame, this has the
> side effect of turning a numerical column into a column with character
> strings. Here is a simple example:
>
> > df <- data.frame(
>     P = factor(c("mittels", "mit", "mittels", "ueber", "mit", "mit")),
>     ANSWER = factor(c(rep("PP>OBJ", 4), rep("OBJ>PP", 2))),
>     RT = round(runif(6, 7000, 16000), 0))
>
> > str(df)
> 'data.frame': 6 obs. of 3 variables:
>  $ P     : Factor w/ 3 levels "mit","mittels",..: 2 1 2 3 1 1
>  $ ANSWER: Factor w/ 2 levels "OBJ>PP","PP>OBJ": 2 2 2 2 1 1
>  $ RT    : num 11157 13719 14388 14527 14686 ...
>
> > df <- rbind(df, c("in", "V>N", round(runif(1, 7000, 16000), 0)))
>
> > str(df)
> 'data.frame': 7 obs. of 3 variables:
>  $ P     : Factor w/ 3 levels "mit","mittels",..: 2 1 2 3 1 1 NA
>  $ ANSWER: Factor w/ 2 levels "OBJ>PP","PP>OBJ": 2 2 2 2 1 1 NA
>  $ RT    : chr "11478" "15819" "8305" "8852" ...
>
> You see that RT has changed from _num_ to _chr_ as a side effect of adding
> the invalid factor level as NA. I would appreciate understanding what the
> purpose of the type coercion is.
>
> Thanks in advance
>
> Tibor
[R] Question concerning side effects of treating invalid factor levels
Dear List members,

I have tried several times now to find out about a side effect of treating invalid factor levels, but did not find an answer. Various answers on stackexchange etc. reproduce the behavior that irritates me without even mentioning it.
So I am asking the list (apologies if this has been treated in the past).

If you add an invalid factor level to a column in a data frame, this has the side effect of turning a numerical column into a column with character strings. Here is a simple example:

    > df <- data.frame(
        P = factor(c("mittels", "mit", "mittels", "ueber", "mit", "mit")),
        ANSWER = factor(c(rep("PP>OBJ", 4), rep("OBJ>PP", 2))),
        RT = round(runif(6, 7000, 16000), 0))

    > str(df)
    'data.frame': 6 obs. of 3 variables:
     $ P     : Factor w/ 3 levels "mit","mittels",..: 2 1 2 3 1 1
     $ ANSWER: Factor w/ 2 levels "OBJ>PP","PP>OBJ": 2 2 2 2 1 1
     $ RT    : num 11157 13719 14388 14527 14686 ...

    > df <- rbind(df, c("in", "V>N", round(runif(1, 7000, 16000), 0)))

    > str(df)
    'data.frame': 7 obs. of 3 variables:
     $ P     : Factor w/ 3 levels "mit","mittels",..: 2 1 2 3 1 1 NA
     $ ANSWER: Factor w/ 2 levels "OBJ>PP","PP>OBJ": 2 2 2 2 1 1 NA
     $ RT    : chr "11478" "15819" "8305" "8852" ...

You see that RT has changed from _num_ to _chr_ as a side effect of adding the invalid factor level as NA. I would appreciate understanding what the purpose of the type coercion is.

Thanks in advance

Tibor
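A change like the one described in this message can be caught at the moment it happens. The sketch below is an illustration only: `set.seed()`, the `new_row` name and the helper `same_classes()` are assumptions introduced here and are not part of the thread or of base R.

```r
set.seed(1)  # assumption: fixed seed, not part of the original code

df <- data.frame(
  P      = factor(c("mittels", "mit", "mittels", "ueber", "mit", "mit")),
  ANSWER = factor(c(rep("PP>OBJ", 4), rep("OBJ>PP", 2))),
  RT     = round(runif(6, 7000, 16000), 0)
)
new_row <- c("in", "V>N", round(runif(1, 7000, 16000), 0))

# Turn the first warning into a value instead of letting rbind() carry on
res <- tryCatch(rbind(df, new_row), warning = function(w) w)
print(inherits(res, "warning"))

# Hypothetical helper: TRUE only if every column kept its class
same_classes <- function(old, new) {
  identical(sapply(old, class), sapply(new, class))
}

df_new <- suppressWarnings(rbind(df, new_row))
classes_kept <- same_classes(df, df_new)
print(classes_kept)
```

The warning reports only the invalid factor levels, so a class comparison such as the helper above is what actually reveals that RT has become character.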