Re: [R] Unexpected behavior when giving a value to a new variable based on the value of another variable
Thank you for the explanation, Peter.

Angel

-----Original message-----
From: peter dalgaard [mailto:pda...@gmail.com]
Sent: Mon 01/09/2014 20:10
To: Angel Rodriguez
CC: r-help
Subject: Re: [R] Unexpected behavior when giving a value to a new variable based on the value of another variable

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Unexpected behavior when giving a value to a new variable based on the value of another variable
On 01 Sep 2014, at 13:08 , Angel Rodriguez wrote:

> Thank you John, Jim, Jeff and both Davids for your answers.
>
> After trying different combinations of values for the variable samplem, it
> looks like if age is greater than 65, R applies the correct code 1 whatever
> the value of samplem, but if age is less than 65, it just copies the values
> of samplem to sample. I do not understand why it does so.
>

It's because indexed assignment is really (white lie alert: it's actually worse)

    N$sample <- `[<-`(`$`(N, `sample`), index, value)

and since N$sample isn't there from the outset, partial matching kicks in for the `$` bit and makes the right hand side equivalent to the same thing with `samplem`. The result still gets assigned to N$sample, but the value is the same that N$samplem would get from

    N$samplem[N$age >= 65] <- 1

Notice the difference if you do

    > N$sample <- NA
    > N$sample[N$age >= 65] <- 1
    > N
      age samplem sample
    1  67      NA      1
    2  62       1     NA
    3  74       1      1
    4  61       1     NA
    5  60       1     NA
    6  55       1     NA
    7  60       1     NA
    8  59       1     NA
    9  58      NA     NA

-pd

> In any case, Jim's syntax works very well, although I do not understand why
> either.
>
> To answer Jim, I just wanted a variable that could identify individuals
> with some characteristics (not only age, as in this example, which has been
> oversimplified).
>
> Best regards,
>
> Angel Rodriguez-Laso

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
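The expansion Peter describes can be walked through by hand. The sketch below only approximates what R does (as he notes, the real mechanism is more involved), and the names `idx`, `rhs` and `N2` are introduced here for illustration; they do not appear in the thread.

```r
# Same data as the N data frame in the original post.
N <- data.frame(age = c(67, 62, 74, 61, 60, 55, 60, 59, 58),
                samplem = c(NA, 1, 1, 1, 1, 1, 1, 1, NA))
idx <- N$age >= 65

# Step 1: `$` extraction. There is no column named exactly "sample",
# so the name is partially matched and the samplem column comes back.
rhs <- `$`(N, "sample")
same_as_samplem <- identical(rhs, N$samplem)

# Step 2: the indexed replacement is applied to that copy of samplem.
rhs <- `[<-`(rhs, idx, 1)

# Step 3: the result is stored under the exact name "sample".
N2 <- N
N2$sample <- rhs
print(N2)

# For comparison, the one-line form from the thread.
N$sample[idx] <- 1
print(identical(N2, N))
```

Rows with age below 65 keep whatever `samplem` held, which is the "copying" Angel observed.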
Re: [R] Unexpected behavior when giving a value to a new variable based on the value of another variable
Thank you John, Jim, Jeff and both Davids for your answers.

After trying different combinations of values for the variable samplem, it looks like if age is greater than 65, R applies the correct code 1 whatever the value of samplem, but if age is less than 65, it just copies the values of samplem to sample. I do not understand why it does so.

In any case, Jim's syntax works very well, although I do not understand why either.

To answer Jim, I just wanted a variable that could identify individuals with some characteristics (not only age, as in this example, which has been oversimplified).

Best regards,

Angel Rodriguez-Laso

-----Original message-----
From: John McKown [mailto:john.archie.mck...@gmail.com]
Sent: Fri 29/08/2014 14:46
To: Angel Rodriguez
CC: r-help
Subject: Re: [R] Unexpected behavior when giving a value to a new variable based on the value of another variable
Re: [R] Unexpected behavior when giving a value to a new variable based on the value of another variable
One clue is the help file for "$"...

?"$"

In particular, see there the discussion of character indices and the "exact" argument. You can also find this discussed in the Introduction to R document that comes with the software.

---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On August 29, 2014 1:53:47 AM PDT, Angel Rodriguez wrote:
>
> Dear subscribers,
>
> I've found that if there is a variable in the dataframe with a name
> very similar to a new variable, R does not give the correct values to
> this latter variable based on the values of a third value:
>
> > M <- structure(list(V1 = c(67, 62, 74, 61, 60, 55, 60, 59, 58)),
> +                .Names = c("age"), row.names = c(NA, -9L),
> +                class = "data.frame")
> > M$sample[M$age >= 65] <- 1
> > M
>   age sample
> 1  67      1
> 2  62     NA
> 3  74      1
> 4  61     NA
> 5  60     NA
> 6  55     NA
> 7  60     NA
> 8  59     NA
> 9  58     NA
> > N <- structure(list(V1 = c(67, 62, 74, 61, 60, 55, 60, 59, 58),
> +                     V2 = c(NA, 1, 1, 1, 1, 1, 1, 1, NA)),
> +                .Names = c("age", "samplem"), row.names = c(NA, -9L),
> +                class = "data.frame")
> > N$sample[N$age >= 65] <- 1
> > N
>   age samplem sample
> 1  67      NA      1
> 2  62       1      1
> 3  74       1      1
> 4  61       1      1
> 5  60       1      1
> 6  55       1      1
> 7  60       1      1
> 8  59       1      1
> 9  58      NA     NA
>
> Any clue for this behavior?
>
> My specifications:
>
> R version 3.1.1 (2014-07-10)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252    LC_MONETARY=Spanish_Spain.1252
> [4] LC_NUMERIC=C                   LC_TIME=Spanish_Spain.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] foreign_0.8-61
>
> loaded via a namespace (and not attached):
> [1] tools_3.1.1
>
> Thank you very much.
>
> Angel Rodriguez-Laso
> Research project manager
> Matia Instituto Gerontologico
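To make the "exact" argument Jeff mentions concrete, here is a small sketch on a cut-down version of the data frame from the original post (three rows only, chosen here for brevity). The variable names `by_dollar`, `by_exact` and `by_partial` are illustrative, not from the thread.

```r
N <- data.frame(age = c(67, 62, 74),
                samplem = c(NA, 1, 1))

# `$` falls back to partial matching: there is no "sample" column,
# so the unique partial match "samplem" is returned.
by_dollar <- N$sample

# `[[` uses exact = TRUE by default, so a missing name gives NULL.
by_exact <- N[["sample"]]

# Asking `[[` for exact = FALSE makes it behave like `$` here.
by_partial <- N[["sample", exact = FALSE]]

print(identical(by_dollar, N$samplem))
print(is.null(by_exact))
print(identical(by_partial, N$samplem))
```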
Re: [R] Unexpected behavior when giving a value to a new variable based on the value of another variable
On Fri, Aug 29, 2014 at 3:53 AM, Angel Rodriguez wrote:
>
> Dear subscribers,
>
> I've found that if there is a variable in the dataframe with a name very
> similar to a new variable, R does not give the correct values to this latter
> variable based on the values of a third value:
>
> Any clue for this behavior?
>
> Thank you very much.
>
> Angel Rodriguez-Laso
> Research project manager
> Matia Instituto Gerontologico

That is unusual, but appears to be documented in a section from

?`[`

Character indices

Character indices can in some circumstances be partially matched (see pmatch) to the names or dimnames of the object being subsetted (but never for subassignment). Unlike S (Becker et al p. 358), R never uses partial matching when extracting by [, and partial matching is not by default used by [[ (see argument exact).

Thus the default behaviour is to use partial matching only when extracting from recursive objects (except environments) by $. Even in that case, warnings can be switched on by options(warnPartialMatchDollar = TRUE).

Neither empty ("") nor NA indices match any names, not even empty nor missing names. If any object has no names or appropriate dimnames, they are taken as all "" and so match nothing.

Note the comment about "partial matching" in the middle paragraph in the quote above.

-- 
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! <><
John McKown
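The warning option named in the help text can be tried out as follows. This is a minimal sketch on a three-row stand-in for the data frame in the thread; `warned_partial` and `warned_exact` are names made up for this example, and the exact wording of the warning is not shown because it may differ between R versions.

```r
N <- data.frame(age = c(67, 62, 74),
                samplem = c(NA, 1, 1))

# Switch the warning on, remembering the previous setting.
old <- options(warnPartialMatchDollar = TRUE)

# Record whether `$` raised a warning instead of printing it.
warned_partial <- tryCatch({
  N$sample          # partial match of "sample" to "samplem"
  FALSE
}, warning = function(w) TRUE)

warned_exact <- tryCatch({
  N$samplem         # exact name, nothing to warn about
  FALSE
}, warning = function(w) TRUE)

# Restore the previous setting.
options(old)

print(warned_partial)
print(warned_exact)
```

Setting the option at the top of a script is one way to catch this kind of accidental match early.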
Re: [R] Unexpected behavior when giving a value to a new variable based on the value of another variable
You are being bitten by the "partial matching" of the "$" operator (see ?"$" for a better explanation). Here is a solution that works:

**original**

> N <- structure(list(V1 = c(67, 62, 74, 61, 60, 55, 60, 59, 58),
+                     V2 = c(NA, 1, 1, 1, 1, 1, 1, 1, NA)),
+                .Names = c("age", "samplem"), row.names = c(NA, -9L),
+                class = "data.frame")
> N$sample[N$age >= 65] <- 1
> N
  age samplem sample
1  67      NA      1
2  62       1      1
3  74       1      1
4  61       1      1
5  60       1      1
6  55       1      1
7  60       1      1
8  59       1      1
9  58      NA     NA

> N <- structure(list(V1 = c(67, 62, 74, 61, 60, 55, 60, 59, 58),
+                     V2 = c(NA, 1, 1, 1, 1, 1, 1, 1, NA)),
+                .Names = c("age", "samplem"), row.names = c(NA, -9L),
+                class = "data.frame")
> N[["sample"]][N$age >= 65] <- 1  # use the '[[' operation for complete matching
> N
  age samplem sample
1  67      NA      1
2  62       1     NA
3  74       1      1
4  61       1     NA
5  60       1     NA
6  55       1     NA
7  60       1     NA
8  59       1     NA
9  58      NA     NA

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
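As for why the `[[` form behaves differently: with exact matching the extraction finds no "sample" column and yields NULL, so the indexed assignment builds a fresh vector padded with NA. The sketch below shows that next to one possible alternative, which creates the whole column in a single step with `ifelse()`. The alternative and the names `A` and `B` are suggestions added here, not something proposed in the thread.

```r
N <- data.frame(age = c(67, 62, 74, 61, 60, 55, 60, 59, 58),
                samplem = c(NA, 1, 1, 1, 1, 1, 1, 1, NA))

# Jim's form: `[[` matches exactly, so nothing is borrowed from samplem.
A <- N
A[["sample"]][A$age >= 65] <- 1

# Alternative sketch: assign the complete column at once, so no
# extraction (and hence no partial matching) takes place.
B <- N
B$sample <- ifelse(B$age >= 65, 1, NA)

print(A)
print(identical(A$sample, B$sample))
```

Either form leaves `samplem` untouched and codes only the rows with age of 65 or more.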