Re: [R] Question concerning side effects of treating invalid factor levels

2022-09-20 Thread Ebert,Timothy Aaron
Hi Tibor,
   I'll try again. Your problem has nothing to do with factors. It comes from 
binding a vector to a data frame. A vector must be of one class, and a column 
in a data frame is a vector, so it must also be of one class. If you want to 
add a new row of data to your existing data frame, use a new data frame with 
one row.

df <- data.frame(
  P = factor(c("mittels", "mit", "mittels", "ueber", "mit", "mit")),
  ANSWER = factor(c(rep("PP>OBJ", 4), rep("OBJ>PP", 2))),
  RT = round(runif(6, 7000, 16000), 0)
)

dq <- data.frame(
  P = factor("in"),
  ANSWER = factor("V>N"),
  RT = round(runif(1, 7000, 16000), 0)
)

df2 <- rbind(df, dq)
df2

In this approach R kindly adds the new level to each factor variable and the 
numeric value to the numeric variable.
Keeping in mind that a vector can only be of one class will save you many 
debugging hours later on.

Tim


Re: [R] Question concerning side effects of treating invalid factor levels

2022-09-20 Thread Sarah Goslee
Hi Tibor,

No, you are misunderstanding the source of the problem. It has nothing
to do with factors.

Instead, it has to do with the inability of a vector to hold more than
one class.

You are using rbind() to add a new row to your data frame, but that row
is a vector built with c(), and it is being coerced to character. That's
what is forcing your numeric column to become character: you're adding
a character value to it.

> c("in", "V>N", round(runif(1, 7000, 16000), 0))
[1] "in"    "V>N"   "15709"

It has nothing whatsoever to do with factors or factor levels, and
would occur if you were adding it to a data frame with character
values.

If you want to mix types, you cannot use a vector.

> c2 <- data.frame(P = "in", ANSWER = "V>N", RT = round(runif(1, 7000, 16000), 0))
> str(rbind(df, c2))
'data.frame':   7 obs. of  3 variables:
 $ P     : Factor w/ 4 levels "mit","mittels",..: 2 1 2 3 1 1 4
 $ ANSWER: Factor w/ 3 levels "OBJ>PP","PP>OBJ",..: 2 2 2 2 1 1 3
 $ RT    : num  10867 14808 11600 15881 8984 ...
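
The point that factors are not involved can be checked with a small sketch. It assumes a hypothetical data frame `df_chr` whose first two columns are plain character vectors, and it uses the fixed value 12345 in place of the random RT; neither name nor value comes from the original post.

```r
# Hypothetical data frame: same shape as df, but with character columns
# instead of factors.
df_chr <- data.frame(
  P = c("mittels", "mit", "ueber"),
  ANSWER = c("PP>OBJ", "PP>OBJ", "OBJ>PP"),
  RT = c(11157, 13719, 14388),
  stringsAsFactors = FALSE
)

# c() coerces the number to character before rbind() ever sees it.
new_row <- c("in", "V>N", 12345)

out <- rbind(df_chr, new_row)

# No factors and no NAs are involved, yet RT is now character.
sapply(out, class)
anyNA(out)
```

If this runs as expected, all three columns report "character" and there are no missing values, so the change of RT cannot be blamed on invalid factor levels.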


Sarah


Re: [R] Question concerning side effects of treating invalid factor levels

2022-09-20 Thread Tibor Kiss via R-help
Hi, 

this is a misunderstanding of my question. I wasn't worried about invalid 
factor levels that produce NA. My question was why a column changes its class, 
which I thought was a side effect. If a vector contains even one character 
string, the class of the whole vector becomes _chr_. Once the elements of that 
vector have been added to the data frame, we have two NAs in the columns that 
are factors, and a character string in the numeric column, which is 
responsible for turning that numeric vector into a character vector (see ?c, 
where you find: "The output type is determined from the highest type of the 
components in the hierarchy NULL < raw < logical < integer < double < complex 
< character < list < expression.").
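
The hierarchy quoted from ?c can be observed directly with typeof(). This is only a sketch covering a few steps of the hierarchy; the variable name `types` is made up for illustration.

```r
# typeof() reports the storage type that c() chose for each mix.
types <- c(
  lgl_int  = typeof(c(TRUE, 1L)),         # logical + integer
  int_dbl  = typeof(c(1L, 2.5)),          # integer + double
  dbl_chr  = typeof(c(2.5, "V>N")),       # double + character
  chr_list = typeof(c("V>N", list(1)))    # character + list
)
types
# lgl_int: "integer", int_dbl: "double", dbl_chr: "character", chr_list: "list"
```

In each pair the result takes the type that sits further to the right in the hierarchy, which is why a single string is enough to turn the number in the new row into a string.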

 
Best


Tibor






__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question concerning side effects of treating invalid factor levels

2022-09-20 Thread Tibor Kiss via R-help
Dear Eric,

thank you very much. It would not have occurred to me to look up the help page 
for _c()_, which of course explains the coercion to the highest type. 

Best

T.





Re: [R] Question concerning side effects of treating invalid factor levels

2022-09-19 Thread Ebert,Timothy Aaron


Sorry, my bad.
A vector must be of a single class. When you declare c("in", "V>N", 
round(runif(1, 7000, 16000), 0)), R calculates the random number but then 
converts it to character to conform with the other two elements in that 
vector. R then binds this vector to your original df and finds that it must 
add a character value to a numeric column. To keep the column all of one 
class, it converts everything in it to character.
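
The two steps can be separated in a short sketch. The names `new_row` and `rt` are made up for illustration, and the plain numeric vector `rt` only stands in for the RT column; it is not how rbind() is implemented internally.

```r
# Step 1: c() builds the new row; the number is converted to character.
new_row <- c("in", "V>N", round(runif(1, 7000, 16000), 0))
class(new_row)

# Step 2: storing a character value in a numeric vector converts the
# whole vector, which mirrors what happens to the RT column.
rt <- c(11157, 13719, 14388)
rt[4] <- new_row[3]
class(rt)
```

Both calls to class() should report "character": the first because of c(), the second because a numeric vector cannot hold one character element without being converted as a whole.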

Better?

Tim


Re: [R] Question concerning side effects of treating invalid factor levels

2022-09-19 Thread Ebert,Timothy Aaron
In your example code, the variable remains of class factor, and all entries are 
valid. The variables will behave as expected given the factor levels in the 
original data frame.

(At least on my system: R 4.2, in RStudio, on Windows) R returns a couple of 
warning messages telling me that I was bad.
What you get is NA, for "not available" or "not appropriate": a missing 
value. You gave the system an invalid factor level, so it was entered as 
missing. If you get data that has a new factor level, you need to tell R to 
expect a new factor level first.

levels(f1) <- c(levels(f1),"New Level")
levels(f1) <- c(levels(f1),c("NL1","NL2"))
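
Since `f1` is not defined in the message, here is a minimal sketch with a made-up factor showing the difference between assigning an undeclared level and declaring it first. The values and the names `f_bad` and `f_ok` are assumptions for illustration only.

```r
f1 <- factor(c("mit", "mittels", "ueber"))  # hypothetical example factor

# Assigning a value that is not among the levels yields NA; R signals
# this with a warning, which is suppressed here to keep the output clean.
f_bad <- f1
suppressWarnings(f_bad[3] <- "in")
is.na(f_bad[3])

# Declaring the level first makes the same assignment valid.
f_ok <- f1
levels(f_ok) <- c(levels(f_ok), "in")
f_ok[3] <- "in"
as.character(f_ok)
levels(f_ok)
```

Note that the old level "ueber" stays in levels(f_ok) even though no element uses it any more; droplevels() would remove it if that matters.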


Tim



Re: [R] Question concerning side effects of treating invalid factor levels

2022-09-19 Thread Eric Berger
You are misinterpreting what is going on.
The rbind command includes c(char, char, num), which produces a
character vector of length 3.
This is what you are rbind-ing, and it changes the type of the RT column.

If you do rbind(df, data.frame(P="in", ANSWER="V>N",
RT=round(runif(1,7000,16000),0)))
you will see that everything is fine. (New factor levels are created.)
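
A side-by-side sketch of the two calls, for anyone who wants to reproduce the difference. The seed and the names `rt_new`, `via_vector` and `via_frame` are arbitrary choices made here, not part of the original example.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
df <- data.frame(
  P = factor(c("mittels", "mit", "mittels", "ueber", "mit", "mit")),
  ANSWER = factor(c(rep("PP>OBJ", 4), rep("OBJ>PP", 2))),
  RT = round(runif(6, 7000, 16000), 0)
)
rt_new <- round(runif(1, 7000, 16000), 0)

# Binding a character vector: the factors get NA and RT becomes character.
# The invalid-level warnings are suppressed to keep the output short.
via_vector <- suppressWarnings(rbind(df, c("in", "V>N", rt_new)))

# Binding a one-row data frame: levels are extended and RT stays numeric.
via_frame <- rbind(df, data.frame(P = "in", ANSWER = "V>N", RT = rt_new))

sapply(via_vector, class)
sapply(via_frame, class)
levels(via_frame$P)
```

The only difference between the two calls is the container used for the new row, which is what decides whether the number survives as a number.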

HTH,
Eric




[R] Question concerning side effects of treating invalid factor levels

2022-09-19 Thread Tibor Kiss via R-help
Dear List members,

I have tried several times now to find out about a side effect of treating 
invalid factor levels, but did not find an answer. Various answers on 
stackexchange etc. reproduce the behavior that irritates me without even 
mentioning it. 
So I am asking the list (apologies if this has been treated in the past).

If you add an invalid factor level to a column in a data frame, this has the 
side effect of turning a numerical column into a column with character strings. 
Here is a simple example:

> df <- data.frame(
    P = factor(c("mittels", "mit", "mittels", "ueber", "mit", "mit")),
    ANSWER = factor(c(rep("PP>OBJ", 4), rep("OBJ>PP", 2))),
    RT = round(runif(6, 7000, 16000), 0))

> str(df)
'data.frame':   6 obs. of  3 variables:
 $ P     : Factor w/ 3 levels "mit","mittels",..: 2 1 2 3 1 1
 $ ANSWER: Factor w/ 2 levels "OBJ>PP","PP>OBJ": 2 2 2 2 1 1
 $ RT    : num  11157 13719 14388 14527 14686 ...

> df <- rbind(df, c("in", "V>N", round(runif(1, 7000, 16000), 0)))

> str(df)
'data.frame':   7 obs. of  3 variables:
 $ P     : Factor w/ 3 levels "mit","mittels",..: 2 1 2 3 1 1 NA
 $ ANSWER: Factor w/ 2 levels "OBJ>PP","PP>OBJ": 2 2 2 2 1 1 NA
 $ RT    : chr  "11478" "15819" "8305" "8852" ...

You see that RT has changed from _num_ to _chr_ as a side effect of adding the 
invalid factor level as NA. I would appreciate understanding what the purpose 
of the type coercion is.

Thanks in advance


Tibor