Re: [R] Removing variables from data frame with a wile card
Steven,

The default is drop=TRUE. Use drop=FALSE if you want to retain a data.frame and not have it reduced to a vector under some circumstances.

https://win-vector.com/2018/02/27/r-tip-use-drop-false-with-data-frames/

-----Original Message-----
From: R-help On Behalf Of Steven T. Yen
Sent: Sunday, February 12, 2023 5:19 PM
To: Andrew Simmons
Cc: R-help Mailing List
Subject: Re: [R] Removing variables from data frame with a wile card

In the line suggested by Andrew Simmons,

mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]

what does drop=FALSE do? Thanks.

On 1/14/2023 8:48 PM, Steven Yen wrote:
> Thanks to all. Very helpful.
>
> Steven from iPhone
>
>> On Jan 14, 2023, at 3:08 PM, Andrew Simmons wrote:
>>
>> You'll want to use grep() or grepl(). By default, grep() uses
>> extended regular expressions to find matches, but you can also use
>> perl regular expressions and globbing (after converting to a regular
>> expression).
>> For example:
>>
>> grepl("^yr", colnames(mydata))
>>
>> will tell you which 'colnames' start with "yr". If you'd rather
>> use globbing:
>>
>> grepl(glob2rx("yr*"), colnames(mydata))
>>
>> Then you might write something like this to remove the columns
>> starting with yr:
>>
>> mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
>>
>> On Sat, Jan 14, 2023 at 1:56 AM Steven T. Yen wrote:
>>>
>>> I have a data frame containing variables "yr3",...,"yr28".
>>>
>>> How do I remove them with a wild card, something similar to "del yr*"
>>> in Windows/DOS? Thank you.
>>>
>>> > colnames(mydata)
>>>  [1] "year"   "weight" "confeduc" "confothr" "college"
>>>  [6] ...
>>> [41] "yr3"  "yr4"  "yr5"  "yr6"  "yr7"
>>> [46] "yr8"  "yr9"  "yr10" "yr11" "yr12"
>>> [51] "yr13" "yr14" "yr15" "yr16" "yr17"
>>> [56] "yr18" "yr19" "yr20" "yr21" "yr22"
>>> [61] "yr23" "yr24" "yr25" "yr26" "yr27"
>>> [66] "yr28" ...

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
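For readers who want to try the pattern-based selection quoted in this message, here is a minimal sketch on a made-up data frame. The column names and values are invented for illustration and are not the poster's data; only grepl(), glob2rx() and the indexing expression come from the thread.

```r
# Toy data frame; names and values are assumptions for illustration only.
mydata <- data.frame(year = 2001:2003, weight = c(1.2, 0.8, 1.0),
                     yr3 = 0L, yr4 = 1L, yr10 = 0L)

# Logical vector marking the columns whose names start with "yr".
by_regex <- grepl("^yr", colnames(mydata))

# glob2rx() (utils package) converts the wildcard "yr*" into a regular expression.
by_glob <- grepl(glob2rx("yr*"), colnames(mydata))

# On this toy data both approaches flag the same columns.
stopifnot(identical(by_regex, by_glob))

# Keep every column that is not flagged; drop = FALSE keeps a data frame.
trimmed <- mydata[, !by_regex, drop = FALSE]
print(colnames(trimmed))
```

Note that "year" is not removed, because the pattern requires "yr" at the very start of the name.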
Re: [R] Removing variables from data frame with a wile card
Great, Thanks. Now I have many options.

Steven from iPhone
Re: [R] Removing variables from data frame with a wile card
What I meant is that

mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]

and

mydata[!grepl("^yr", colnames(mydata))]

should be identical. Some people would prefer the first because the indexing looks the same as matrix indexing, whereas some people would prefer the second because it is more efficient. However, I would argue it is exactly as efficient. You can see from the first few lines of `[.data.frame` that when the first index is missing and the second is provided, it does almost the same thing as if only the first index were provided.
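The claim that the two forms give the same result can be checked directly. This is only a sketch on an invented data frame (the column names are assumptions), and it compares results, not speed.

```r
# Toy data frame; names and values are assumptions for illustration only.
mydata <- data.frame(year = 2001:2003, weight = c(1.2, 0.8, 1.0),
                     yr3 = 0L, yr4 = 1L)

keep <- !grepl("^yr", colnames(mydata))

matrix_style <- mydata[, keep, drop = FALSE]  # looks like matrix indexing
list_style   <- mydata[keep]                  # list-style (single index)

# Compare the two results object for object.
print(identical(matrix_style, list_style))
```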
Re: [R] Removing variables from data frame with a wile card
Complain, complain...

x[ names( x ) != "V2" ]

or

x[ ! names( x ) %in% c( "V2", "V3" ) ]

or any other character or logical or integer expression that selects columns you want...
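As a small sketch of the three kinds of index mentioned here (logical, character, integer), the following uses a made-up data frame x with columns V1 to V3. The setdiff() variants are one possible way to build the character and integer indices, not something prescribed in the thread.

```r
# Toy data frame; names and values are assumptions for illustration only.
x <- data.frame(V1 = 1:5, V2 = letters[1:5], V3 = 5:1)

# Logical index: TRUE for every column to keep.
drop_one <- x[names(x) != "V2"]
drop_two <- x[!names(x) %in% c("V2", "V3")]

# Character index: the names that remain after removing "V2".
by_name <- x[setdiff(names(x), "V2")]

# Integer index: the positions that remain after removing position 2.
by_position <- x[setdiff(seq_along(x), 2L)]

print(names(drop_one))
print(names(drop_two))
```

All four results are still data frames, including drop_two, which has a single column.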
Re: [R] Removing variables from data frame with a wile card
x["V2"] would retain columns of x headed by V2. What I need is the opposite: I need a data frame with those columns excluded.

Steven from iPhone
Re: [R] Removing variables from data frame with a wile card
On Sun, 12 Feb 2023 14:57:36 -0800
Jeff Newmiller wrote:

> x["V2"]
>
> is more efficient than using drop=FALSE, and perfectly normal syntax
> (data frames are lists of columns).

I never cease to be amazed by the sagacity and perspicacity of the designers of R. I would have worried that x["V2"] would turn out to be a *list* (of length 1), but no, it retains the data.frame class, which is clearly the Right Thing To Do.

cheers,

Rolf

--
Honorary Research Fellow
Department of Statistics
University of Auckland
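The behaviour described here can be seen by comparing single-bracket and double-bracket indexing. A minimal sketch, reusing the two-column x from elsewhere in the thread; the unclass() line is added only to show what the same index does on a plain list.

```r
x <- data.frame(V1 = 1:5, V2 = letters[1:5])

one_col_df <- x["V2"]            # single bracket: still a data frame
column     <- x[["V2"]]          # double bracket: the column itself
plain_list <- unclass(x)["V2"]   # same index on a plain list: a list of length 1

print(class(one_col_df))
print(class(column))
print(class(plain_list))
```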
Re: [R] Removing variables from data frame with a wile card
Thanks Jeff and Andrew. My initial file, mydata, is a data frame with 92 columns (variables). After the operation (trimming), it remains a data frame with 72 variables. So yes indeed, I do not need the drop=FALSE.

> is.data.frame(mydata)
[1] TRUE
> ncol(mydata)
[1] 92
> mydata<-mydata[,!grepl("^yr",colnames(mydata)),drop=FALSE]
> is.data.frame(mydata)
[1] TRUE
> ncol(mydata)
[1] 72
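Whether drop=FALSE matters depends on how many columns survive the selection. The sketch below uses an invented data frame in which only one column remains after removing the "yr" columns; that is the case where the default drop=TRUE changes the result.

```r
# Toy data frame; names and values are assumptions for illustration only.
d <- data.frame(year = 2001:2003, yr3 = 0L, yr4 = 1L)
keep <- !grepl("^yr", colnames(d))   # only "year" is kept

collapsed  <- d[, keep]                # default drop = TRUE: a plain vector
kept_df    <- d[, keep, drop = FALSE]  # stays a one-column data frame
list_style <- d[keep]                  # list-style indexing also stays a data frame

print(is.data.frame(collapsed))
print(is.data.frame(kept_df))
print(is.data.frame(list_style))
```

With 72 columns remaining, as in the session above, all three forms return a data frame, so the difference only shows up when exactly one column is left.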
Re: [R] Removing variables from data frame with a wile card
x["V2"]

is more efficient than using drop=FALSE, and perfectly normal syntax (data frames are lists of columns). I would ignore the naysayers, or put a comment in if you want to accelerate their uptake.

As I understand it, one of the main reasons tibbles exist is because of drop=TRUE. List-slice (single-dimension) indexing works equally well with both standard and tibble types of data frames.
Re: [R] Removing variables from data frame with a wile card
drop = FALSE means that should the indexing select exactly one column, then return a data frame with one column, instead of the object in the column. It's usually not necessary, but I've messed up some data before by assuming the indexing always returns a data frame when it doesn't, so drop = FALSE lets me be sure that I will always get a data frame.

```
x <- data.frame(V1 = 1:5, V2 = letters[1:5])
x[, "V2"]                # returns the column itself (a vector)
x[, "V2", drop = FALSE]  # returns a one-column data frame
```

You'll notice that the first returns a character vector, a through e, whereas the second returns a data frame with one column where the object in the column is the same character vector.

You could alternatively use

x["V2"]

which should be identical to x[, "V2", drop = FALSE], but some people don't like that because it doesn't look like matrix indexing anymore.
Re: [R] Removing variables from data frame with a wile card
In the line suggested by Andrew Simmons,

mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]

what does drop=FALSE do? Thanks.
Re: [R] Removing variables from data frame with a wile card
At 16:54 on 15/01/2023, Sorkin, John wrote:

I am new to this thread. At the risk of presenting something that has been shown before, below I demonstrate how a column in a data frame can be dropped using a wild card, i.e. a column whose name starts with "th", using nothing more than base R functions and base R syntax. While additions to R such as tidyverse can be very helpful, many things that they do can be accomplished simply using base R.

# Create data frame with three columns
one <- rep(1,10)
one
two <- rep(2,10)
two
three <- rep(3,10)
three
mydata <- data.frame(one=one, two=two, three=three)
cat("Data frame with three columns\n")
mydata

# Drop the column whose name starts with th, i.e. column three
# Find the location of the column
ColumnToDelete <- grep("^th",colnames(mydata))
cat("The column to be dropped is the column called three, which is column",ColumnToDelete,"\n")
ColumnToDelete

# Drop the column whose name starts with "th"
newdata2 <- mydata[,-ColumnToDelete]
cat("Data frame after dropping column whose name is three\n")
newdata2

I hope this helps.
John

From: R-help on behalf of Valentin Petzel
Sent: Saturday, January 14, 2023 1:21 PM
To: avi.e.gr...@gmail.com
Cc: 'R-help Mailing List'
Subject: Re: [R] Removing variables from data frame with a wile card

Hello Avi,

while something like d$something <- ... may seem like you're directly modifying the data it does not actually do so. Most R objects try to be immutable, that is, the object may not change after creation. This guarantees that if you have a binding for the same object the object won't change sneakily.

There is one data structure that is in fact mutable: environments. For example compare

L <- list()
local({L$a <- 3})
L$a

with

E <- new.env()
local({E$a <- 3})
E$a

The latter will in fact work, as the same environment is modified, while in the first one a modified copy of the list is made.

Under the hood we have a parser trick: if R sees something like

f(a) <- ...

it will look for a function `f<-` and call

a <- `f<-`(a, ...)

(this also happens for example when you do names(x) <- ...)

So in fact in our case this is equivalent to creating a copy with removed columns and rebinding the symbol in the current environment to the result.

The data.table package breaks with this convention and uses C based routines that allow changing of data without copying the object. Doing

d[, (cols_to_remove) := NULL]

will actually change the data.

Regards,
Valentin

14.01.2023 18:28:33 avi.e.gr...@gmail.com:

> Steven,
>
> Just want to add a few things to what people wrote.
>
> In base R, the methods mentioned will let you make a copy of your original DF
> that is missing the items you are selecting that match your pattern.
>
> That is fine.
>
> For some purposes, you want to keep the original data.frame and remove a
> column within it. You can do that in several ways but the simplest is
> something where you set the column to NULL as in:
>
> mydata$NAME <- NULL
>
> Using the mydata["NAME"] notation can do that for you by using a loop or
> functional programming method that does that with all components of your grep.
>
> R does have optimizations that make this less useful as a partial copy of a
> data.frame retains common parts till things change.
>
> For those who like to use the tidyverse, it comes with lots of tools that let
> you select columns that start with or end with or contain some pattern and I
> find that way easier.
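The copy-versus-in-place distinction and the replacement-function rewrite described in this message can be tried out with base R alone. The following is a minimal sketch; the replacement function `second<-` is a hypothetical name made up for illustration, not an existing R function.

```r
# A list modified inside local() is a modified copy; the outer list is unchanged.
L <- list()
local({ L$a <- 3 })
print(is.null(L$a))

# An environment is modified in place, so the change is visible outside local().
E <- new.env()
local({ E$a <- 3 })
print(E$a)

# A user-defined replacement function (hypothetical name, for illustration).
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

v <- c(10, 20, 30)
second(v) <- 99   # evaluated roughly as: v <- `second<-`(v, value = 99)
print(v)
```

The last assignment rebinds v to the value returned by the function, which is the same mechanism behind names(x) <- ... and d$something <- ... on ordinary data frames.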
Re: [R] Removing variables from data frame with a wile card
John,

As you said, you are new to the discussion so let me catch you up.

The original question was about removing many columns that shared a similar feature in the naming convention while leaving other columns in-place. Quite a few replies were given on how to do that including how to use a regular expression to gather the column names you want to remove.

It was only afterwards that the topic changed a bit to mention that some people used additional ways both in base R and also using packages like dplyr in the tidyverse.

As a general rule, most packages out there provide functionality that can be done in base R if you wish, and some are written purely in R while some augment that with parts re-done in C or something. If a package is well built and frequently used, it may well make your life as a programmer easier as the code need not be re-invented and debugged. Of course some packages are of poorer quality.

So we fully agree that unless asked for, the base R answers should be the focus HERE. Then again, languages are not static and sometimes we see things like pipes moved in a modified version into the main language.

Avi
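One of the additional base R ways raised in this thread is assigning NULL to the unwanted columns instead of selecting the ones to keep. A minimal sketch on an invented data frame follows; the column names are assumptions, and both the loop and the one-line form are shown only as possibilities.

```r
# Toy data frame; names and values are assumptions for illustration only.
mydata <- data.frame(year = 2001:2003, weight = c(1.2, 0.8, 1.0),
                     yr3 = 0L, yr4 = 1L)

# Names of the columns to remove, gathered with a regular expression.
to_remove <- grep("^yr", names(mydata), value = TRUE)

# Variant 1: set each matching column to NULL in a loop.
looped <- mydata
for (nm in to_remove) {
  looped[[nm]] <- NULL
}

# Variant 2: assign NULL to all matching columns at once.
at_once <- mydata
at_once[to_remove] <- NULL

print(names(looped))
print(names(at_once))
```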
Re: [R] Removing variables from data frame with a wile card
I am new to this thread. At the risk of presenting something that has been shown before, below I demonstrate how a column in a data frame can be dropped using a wild card, i.e. a column whose name starts with "th", using nothing more than base R functions and base R syntax. While additions to R such as the tidyverse can be very helpful, many things that they do can be accomplished simply using base R.

# Create data frame with three columns
one <- rep(1, 10)
one
two <- rep(2, 10)
two
three <- rep(3, 10)
three
mydata <- data.frame(one = one, two = two, three = three)
cat("Data frame with three columns\n")
mydata

# Drop the column whose name starts with "th", i.e. column three
# Find the location of the column
ColumnToDelete <- grep("^th", colnames(mydata))
cat("The column to be dropped is the column called three, which is column", ColumnToDelete, "\n")
ColumnToDelete

# Drop the column whose name starts with "th"
newdata2 <- mydata[, -ColumnToDelete]
cat("Data frame after dropping the column whose name is three\n")
newdata2

I hope this helps.
John
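A small sketch of why the anchor matters when the goal is "starts with": without "^", grep() matches the pattern anywhere in the name. The column names below are made up for illustration and are not from the original data.

```r
# Hypothetical column names, chosen so that "th" appears in several positions.
nms <- c("one", "two", "three", "month", "width")

# Unanchored: matches "th" anywhere in the name.
anywhere <- grep("th", nms, value = TRUE)

# Anchored with "^": matches only names that start with "th".
at_start <- grep("^th", nms, value = TRUE)

print(anywhere)
print(at_start)
```

With only the three columns one, two and three, both patterns pick the same column, so the difference only shows up on wider data frames.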
Re: [R] Removing variables from data frame with a wile card
Hello Avi,

while something like d$something <- ... may seem like you're directly modifying the data, it does not actually do so. Most R objects try to be immutable, that is, the object may not change after creation. This guarantees that if you have a binding for the same object, the object won't change sneakily.

There is one data structure that is in fact mutable: environments. For example compare

  L <- list()
  local({L$a <- 3})
  L$a

with

  E <- new.env()
  local({E$a <- 3})
  E$a

The latter will in fact work, as the same environment is modified, while in the first one a modified copy of the list is made.

Under the hood we have a parser trick: if R sees something like

  f(a) <- ...

it will look for a function `f<-` and call

  a <- `f<-`(a, ...)

(this also happens for example when you do names(x) <- ...)

So in fact in our case this is equivalent to creating a copy with the columns removed and rebinding the symbol in the current environment to the result.

The data.table package breaks with this convention and uses C-based routines that allow changing data without copying the object. Doing

  d[, (cols_to_remove) := NULL]

will actually change the data.

Regards,
Valentin
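The replacement-function mechanism described above can be tried out directly. This is only a sketch: `second<-` is a hypothetical function defined here for illustration, not something from base R or from the thread.

```r
# Hypothetical replacement function, defined only for this illustration.
`second<-` <- function(x, value) {
  x[2] <- value
  x  # a replacement function returns the modified object
}

v <- c(10, 20, 30)
second(v) <- 99                  # what one normally writes

w <- c(10, 20, 30)
w <- `second<-`(w, value = 99)   # roughly what R evaluates instead

# A list changed inside local() is a modified copy; the outer L is untouched.
L <- list()
local({L$a <- 3})

# An environment is modified in place, so the change is visible outside.
E <- new.env()
local({E$a <- 3})

print(identical(v, w))
print(L$a)
print(E$a)
```

The same reasoning applies to mydata$NAME <- NULL: the symbol mydata is rebound to the result of a `$<-` call rather than the old object being edited in place.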
Re: [R] Removing variables from data frame with a wile card
John,

I am very familiar with the evolving tidyverse, and some messages a while back included people who wanted this forum to mainly stick to base R, so I left out examples.

Indeed, the tidyverse is designed to make it easy to select columns with all kinds of conditions, including regular expressions that allow more precision (as does grep), so you can match "yr" followed by exactly one or two digits. Some of the answers suggested that starting with "yr" was enough. The tidyverse tools also allow selecting on arbitrary considerations, like whether the column contains numeric data.

You can do most things in base R, albeit I find the tidyverse method easier most of the time and also able to do some extremely complicated things with some care, such as creating multiple new columns from a set of columns, each implementing a different function like mean, mode and standard deviation, and giving the new columns the same names as the ones they are derived from but with a different suffix reflecting what transformation was done.

One nice feature is the idea of data streaming through multiple steps with one or a few transformations in each step, with the intermediate parts you do not want simply melting away. The part about selecting or deselecting columns can often be used in many of the verbs.
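For readers who want to stay in base R, here is a rough sketch of the two kinds of selection mentioned above: a stricter pattern ("yr" followed by exactly one or two digits) and selection by column content. The data frame and the columns yr100 and yrs_school are invented for the example.

```r
# Made-up data: yr100 and yrs_school start with "yr" but are not yr3..yr28 style names.
mydata <- data.frame(year = 2001:2003, yr3 = 1:3, yr28 = 4:6,
                     yr100 = 7:9, yrs_school = c("a", "b", "c"))

# "^yr" alone would also hit yr100 and yrs_school; this pattern requires
# "yr", then one or two digits, then the end of the name.
is_yr <- grepl("^yr[0-9]{1,2}$", names(mydata))
trimmed <- mydata[, !is_yr, drop = FALSE]

# Selecting on content rather than name: keep only the numeric columns.
is_num <- vapply(mydata, is.numeric, logical(1))
numeric_only <- mydata[, is_num, drop = FALSE]

print(names(trimmed))
print(names(numeric_only))
```

Whether the looser "^yr" is good enough depends on what other column names exist; in the data shown in the original question it would not touch "year".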
Re: [R] Removing variables from data frame with a wile card
Valentin,

You are correct that R does many things largely behind the scenes that make some operations fairly efficient.

From a programming point of view, though, many people might make a data.frame and not think of it as a list of vectors of the same length that are kept that way. So if they made a copy of the original data with fewer columns, they might be tempted to think the original item was completely copied and the original is either around or, if the identifier was re-used, will be garbage collected. As you note, the only things collected are the columns you chose not to include.

For some it seems cleaner to set a list item to NULL, which seems to remove it immediately.

The real point I hoped to make is that using base R, you can indeed approach removing (multiple) columns in two logical ways. One is to seemingly remove them in the original object, even if your point is valid. The other is to make a copy of just what you want and ignore the rest, and it may be kept around or not.

If someone really wanted to get down to the basics, they could get a reference to all the columns they want to keep, as in

  col1 <- mydata[["col1"]]

and use those to make a new data.frame, or many other variants on these methods.

Many programming languages have some qualms (I mean designers and programmers, and just plain purists) about when "pointers" of sorts are used and whether things should be mutable and so on, so I prefer to avoid religious wars.
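The "get down to the basics" variant, pulling out the wanted columns one by one and assembling a new data.frame, might look roughly like this. The data frame is invented for the example, and in practice mydata[keep] does the same thing in one step.

```r
# Made-up data frame with two columns to keep and two "yr" columns to leave behind.
mydata <- data.frame(year = 1:3, weight = c(60, 70, 80), yr3 = 0, yr4 = 1)

# Names of the columns to keep.
keep <- names(mydata)[!grepl("^yr", names(mydata))]

# Extract each wanted column with [[ ]] and build a new data frame from them.
cols <- lapply(keep, function(nm) mydata[[nm]])
names(cols) <- keep
newdata <- as.data.frame(cols)

print(names(newdata))
print(ncol(mydata))  # the original still has all four columns
```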
Re: [R] Removing variables from data frame with a wile card
You rang sir?

  library(tidyverse)
  xx = 1:10
  yr1 = yr2 = yr3 = rnorm(10)
  dat1 <- data.frame(xx, yr1, yr2, yr3)
  dat1 %>% select(!starts_with("yr"))

or for something a bit more exotic, as I have been trying to learn a bit about the data.table package

  library(data.table)
  xx = 1:10
  yr1 = yr2 = yr3 = rnorm(10)
  dat2 <- data.table(xx, yr1, yr2, yr3)
  dat2[, !names(dat2) %like% "yr", with = FALSE]

-- 
John Kane
Kingston ON Canada
Re: [R] Removing variables from data frame with a wile card
Steven,

Just want to add a few things to what people wrote.

In base R, the methods mentioned will let you make a copy of your original DF that is missing the items you are selecting that match your pattern.

That is fine.

For some purposes, you want to keep the original data.frame and remove a column within it. You can do that in several ways, but the simplest is something where you set the column to NULL, as in:

  mydata$NAME <- NULL

Using the mydata["NAME"] notation can do that for you by using a loop or functional programming method that does that with all components of your grep.

R does have optimizations that make this less useful, as a partial copy of a data.frame retains common parts till things change.

For those who like to use the tidyverse, it comes with lots of tools that let you select columns that start with or end with or contain some pattern, and I find that way easier.
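A loop of the kind mentioned above could be sketched as follows. The data frame is invented for the example; grep(..., value = TRUE) returns the matching names themselves, so the loop simply does nothing when no column matches.

```r
# Made-up data frame with two "yr" columns to remove.
mydata <- data.frame(year = 1:3, weight = c(60, 70, 80), yr3 = 0, yr4 = 1)

# Set every column whose name starts with "yr" to NULL, one at a time.
for (nm in grep("^yr", names(mydata), value = TRUE)) {
  mydata[[nm]] <- NULL
}

print(names(mydata))
```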
Re: [R] Removing variables from data frame with a wile card
The -grep(pattern, colnames) as a subscript is a bit dangerous. If no colname matches the pattern then all columns will be omitted (because grep() then returns integer(0), and negating an empty index is still an empty index, which means no column). !grepl(pattern, colnames) avoids this problem.

> mydata <- data.frame(A=1:3,B=11:13)
> mydata[, -grep("^yr", colnames(mydata))]
data frame with 0 columns and 3 rows
> mydata[, !grepl("^yr", colnames(mydata))]
  A  B
1 1 11
2 2 12
3 3 13

-Bill
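One way to make the safe form hard to get wrong is to wrap it in a small helper. This is only a sketch; drop_matching is a hypothetical name, not an existing base R function.

```r
# Hypothetical helper: drop all columns whose names match `pattern`.
# Uses a logical index, so "no match" keeps every column.
drop_matching <- function(df, pattern) {
  df[, !grepl(pattern, colnames(df)), drop = FALSE]
}

mydata <- data.frame(A = 1:3, B = 11:13)

# No column starts with "yr": the helper returns the data unchanged.
no_match <- drop_matching(mydata, "^yr")

# The negative-index form silently drops everything in the same situation.
risky <- mydata[, -grep("^yr", colnames(mydata)), drop = FALSE]

print(ncol(no_match))
print(ncol(risky))
```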
Re: [R] Removing variables from data frame with a wile card
Thanks to all. Very helpful.

Steven from iPhone
Re: [R] Removing variables from data frame with a wile card
You'll want to use grep() or grepl(). By default, grep() uses extended regular expressions to find matches, but you can also use perl regular expressions and globbing (after converting to a regular expression). For example:

  grepl("^yr", colnames(mydata))

will tell you which 'colnames' start with "yr". If you'd rather use globbing:

  grepl(glob2rx("yr*"), colnames(mydata))

Then you might write something like this to remove the columns starting with yr:

  mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
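A short sketch of what drop = FALSE changes in that last line. The data frame is invented so that only one column survives, which is the case where the default drop = TRUE matters.

```r
# Made-up data: after removing the "yr" columns only "year" is left.
mydata <- data.frame(year = 2001:2003, yr3 = 1:3, yr4 = 4:6)

# Globbing version of the pattern; "year" does not match "yr*".
keep <- !grepl(glob2rx("yr*"), colnames(mydata))

# Default drop = TRUE: a single remaining column is reduced to a plain vector.
as_vector <- mydata[, keep]

# drop = FALSE: the result stays a data frame, whatever the number of columns.
as_frame <- mydata[, keep, drop = FALSE]

print(class(as_vector))
print(class(as_frame))
```

Code that later calls colnames() or nrow() on the result will only behave consistently with the drop = FALSE form.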
Re: [R] Removing variables from data frame with a wile card
mydata[, -grep("^yr",colnames(mydata))]
[R] Removing variables from data frame with a wile card
I have a data frame containing variables "yr3",...,"yr28".

How do I remove them with a wild card, something similar to "del yr*" in Windows/doc? Thank you.

> colnames(mydata)
 [1] "year"     "weight"   "confeduc" "confothr" "college"
 [6] ...
[41] "yr3"      "yr4"      "yr5"      "yr6"      "yr7"
[46] "yr8"      "yr9"      "yr10"     "yr11"     "yr12"
[51] "yr13"     "yr14"     "yr15"     "yr16"     "yr17"
[56] "yr18"     "yr19"     "yr20"     "yr21"     "yr22"
[61] "yr23"     "yr24"     "yr25"     "yr26"     "yr27"
[66] "yr28"...