Re: [R] Removing variables from data frame with a wile card

2023-02-12 Thread avi.e.gross
Steven,

The default is drop=TRUE.

Use drop=FALSE if you want to retain a data.frame and not have it reduced to a 
vector under some circumstances.

https://win-vector.com/2018/02/27/r-tip-use-drop-false-with-data-frames/

-Original Message-
From: R-help  On Behalf Of Steven T. Yen
Sent: Sunday, February 12, 2023 5:19 PM
To: Andrew Simmons 
Cc: R-help Mailing List 
Subject: Re: [R] Removing variables from data frame with a wile card

In the line suggested by Andrew Simmons,

mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]

what does drop=FALSE do? Thanks.

On 1/14/2023 8:48 PM, Steven Yen wrote:
> Thanks to all. Very helpful.
>
> Steven from iPhone
>
>> On Jan 14, 2023, at 3:08 PM, Andrew Simmons  wrote:
>>
>> You'll want to use grep() or grepl(). By default, grep() uses 
>> extended regular expressions to find matches, but you can also use 
>> perl regular expressions and globbing (after converting to a regular 
>> expression).
>> For example:
>>
>> grepl("^yr", colnames(mydata))
>>
>> will tell you which 'colnames' start with "yr". If you'd rather you 
>> use globbing:
>>
>> grepl(glob2rx("yr*"), colnames(mydata))
>>
>> Then you might write something like this to remove the columns 
>> starting with yr:
>>
>> mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
>>
>> On Sat, Jan 14, 2023 at 1:56 AM Steven T. Yen  wrote:
>>>
>>> I have a data frame containing variables "yr3",...,"yr28".
>>>
>>> How do I remove them with a wild card, something similar to "del yr*"
>>> in Windows/DOS? Thank you.
>>>
>>> > colnames(mydata)
>>>   [1] "year"   "weight" "confeduc"   "confothr" "college"
>>>   [6] ...
>>>  [41] "yr3"    "yr4"    "yr5"    "yr6"    "yr7"
>>>  [46] "yr8"    "yr9"    "yr10"   "yr11"   "yr12"
>>>  [51] "yr13"   "yr14"   "yr15"   "yr16"   "yr17"
>>>  [56] "yr18"   "yr19"   "yr20"   "yr21"   "yr22"
>>>  [61] "yr23"   "yr24"   "yr25"   "yr26"   "yr27"
>>>  [66] "yr28"...
>>>
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
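
As a quick check of the pattern matching described in the quoted reply above, here is a minimal sketch on a toy data frame. The column names are borrowed from the listing in the original question; the values are invented for illustration, and the object names is_yr, is_yr_glob and trimmed are hypothetical.

```r
# Toy data frame; the values are made up, only the column names matter here.
mydata <- data.frame(year = 2001:2003, weight = c(1.5, 2.0, 2.5),
                     yr3 = 0:2, yr10 = 3:5)

# Regular expression: "^" anchors the match at the start of the name,
# so "year" is not matched (it starts with "ye", not "yr").
is_yr <- grepl("^yr", colnames(mydata))

# Glob (wildcard) pattern, converted to a regular expression first.
is_yr_glob <- grepl(utils::glob2rx("yr*"), colnames(mydata))

# Both approaches flag the same columns.
stopifnot(identical(is_yr, is_yr_glob))

# Keep every column that does not match the pattern.
trimmed <- mydata[, !is_yr, drop = FALSE]
print(colnames(trimmed))
```

Note that the unanchored pattern "yr" would also match a name such as "myyr", which is why the anchored form is the closer analogue of "del yr*".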


Re: [R] Removing variables from data frame with a wile card

2023-02-12 Thread Steven Yen
Great, Thanks. Now I have many options.

Steven from iPhone

> On Feb 13, 2023, at 10:52 AM, Andrew Simmons  wrote:
> 
> What I meant is that
> 
> mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
> 
> and
> 
> mydata[!grepl("^yr", colnames(mydata))]
> 
> should be identical. Some people would prefer the first because the
> indexing looks the same as matrix indexing, whereas some people would
> prefer the second because it is more efficient. However, I would argue
> it is exactly as efficient. You can see from the first few lines of
> `[.data.frame` that when the first index is missing and the second is
> provided, it does almost the same thing as if only the first index
> were provided.
> 


Re: [R] Removing variables from data frame with a wile card

2023-02-12 Thread Andrew Simmons
What I meant is that

mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]

and

mydata[!grepl("^yr", colnames(mydata))]

should be identical. Some people would prefer the first because the
indexing looks the same as matrix indexing, whereas some people would
prefer the second because it is more efficient. However, I would argue
it is exactly as efficient. You can see from the first few lines of
`[.data.frame` that when the first index is missing and the second is
provided, it does almost the same thing as if only the first index
were provided.
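
A minimal sketch for comparing the two forms side by side. The toy data frame and the names keep, matrix_style and list_style are invented for illustration; all.equal() is used here only as a convenient way to compare the results.

```r
# Toy data frame with two "yr" columns to remove.
mydata <- data.frame(year = 2001:2003, weight = c(1.5, 2.0, 2.5),
                     yr3 = 0:2, yr4 = 1:3)
keep <- !grepl("^yr", colnames(mydata))

matrix_style <- mydata[, keep, drop = FALSE]  # two-index (matrix-like) form
list_style   <- mydata[keep]                  # one-index (list-like) form

# Compare the two results and confirm the class of the list-style one.
print(all.equal(matrix_style, list_style))
print(class(list_style))
```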

On Sun, Feb 12, 2023 at 9:38 PM Steven Yen  wrote:
>
> x["V2"] would retain columns of x headed by V2. What I need is the 
> opposite: I need a data frame with those columns excluded.
>
> Steven from iPhone


Re: [R] Removing variables from data frame with a wile card

2023-02-12 Thread Jeff Newmiller
Complain, complain...

x[ names( x ) != "V2" ]

or

x[ ! names( x ) %in% c( "V2", "V3" ) ]

or any other character or logical or integer expression that selects columns 
you want...
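
To illustrate the point about character, logical, or integer expressions, here is a small sketch with a made-up data frame. The object names are hypothetical, and the integer variant assumes that "V2" really is present among the column names.

```r
x <- data.frame(V1 = 1:3, V2 = letters[1:3], V3 = c(TRUE, FALSE, TRUE))

by_logical <- x[names(x) != "V2"]               # logical index: drops V2
by_set     <- x[!names(x) %in% c("V2", "V3")]   # logical index: drops V2 and V3
by_name    <- x[setdiff(names(x), "V2")]        # character index: keeps the rest
by_integer <- x[-match("V2", names(x))]         # negative integer index; V2 must exist

print(names(by_logical))
print(names(by_set))
```

All four results are still data frames, because single-index (list-style) subsetting never drops to a vector.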

On February 12, 2023 6:38:00 PM PST, Steven Yen  wrote:
>x["V2"] would retain columns of x headed by V2. What I need is the opposite: I 
>need a data frame with those columns excluded.
>
>Steven from iPhone

-- 
Sent from my phone. Please excuse my brevity.



Re: [R] Removing variables from data frame with a wile card

2023-02-12 Thread Steven Yen
x["V2"] would retain columns of x headed by V2. What I need is the opposite: I 
need a data frame with those columns excluded.

Steven from iPhone

> On Feb 13, 2023, at 9:33 AM, Rolf Turner  wrote:
> 
> 
>> On Sun, 12 Feb 2023 14:57:36 -0800
>> Jeff Newmiller  wrote:
>> 
>> x["V2"]
>> 
>> is more efficient than using drop=FALSE, and perfectly normal syntax
>> (data frames are lists of columns).
> 
> 
> 
> I never cease to be amazed by the sagacity and perspicacity of the
> designers of R.  I  would have worried that x["V2"] would turn out to be
> a *list* (of length 1), but no, it retains the data.frame class, which
> is clearly the Right Thing To Do.
> 
> cheers,
> 
> Rolf
> 
> -- 
> Honorary Research Fellow
> Department of Statistics
> University of Auckland
> Stats. Dep't. phone: +64-9-373-7599 ext. 89622
> Home phone: +64-9-480-4619
> 



Re: [R] Removing variables from data frame with a wile card

2023-02-12 Thread Rolf Turner


On Sun, 12 Feb 2023 14:57:36 -0800
Jeff Newmiller  wrote:

> x["V2"]
> 
> is more efficient than using drop=FALSE, and perfectly normal syntax
> (data frames are lists of columns).



I never cease to be amazed by the sagacity and perspicacity of the
designers of R.  I  would have worried that x["V2"] would turn out to be
a *list* (of length 1), but no, it retains the data.frame class, which
is clearly the Right Thing To Do.
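
The behaviour described above can be checked with a short sketch. The data frame mirrors the one used elsewhere in the thread; the names one_col and column are made up for illustration.

```r
x <- data.frame(V1 = 1:5, V2 = letters[1:5])

one_col <- x["V2"]     # single-bracket, list-style indexing
column  <- x[["V2"]]   # double-bracket extraction of the column itself

print(class(one_col))    # the data.frame class is retained
print(is.list(one_col))  # a data frame is still a list underneath
print(class(column))     # character in R >= 4.0.0; a factor in older versions,
                         # where stringsAsFactors defaulted to TRUE
```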

cheers,

Rolf

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Stats. Dep't. phone: +64-9-373-7599 ext. 89622
Home phone: +64-9-480-4619



Re: [R] Removing variables from data frame with a wile card

2023-02-12 Thread Steven T. Yen
Thanks Jeff and Andrew. My initial file, mydata, is a data frame with 92 
columns (variables). After the operation (trimming), it remains a data 
frame with 72 variables. So yes indeed, I do not need the drop=FALSE.

> is.data.frame(mydata)
[1] TRUE
> ncol(mydata)
[1] 92
> mydata<-mydata[,!grepl("^yr",colnames(mydata)),drop=FALSE]
> is.data.frame(mydata)
[1] TRUE
> ncol(mydata)
[1] 72

On 2/13/2023 6:57 AM, Jeff Newmiller wrote:
> x["V2"]
>
> is more efficient than using drop=FALSE, and perfectly normal syntax (data 
> frames are lists of columns).  I would ignore the naysayers, or put a comment 
> in if you want to accelerate their uptake.
>
> As I understand it, one of the main reasons tibbles exist is because of 
> drop=TRUE. List-slice (single-dimension) indexing works equally well with 
> both standard and tibble types of data frames.
>
> On February 12, 2023 2:30:15 PM PST, Andrew Simmons  
> wrote:
>> drop = FALSE means that should the indexing select exactly one column, then
>> return a data frame with one column, instead of the object in the column.
>> It's usually not necessary, but I've messed up some data before by assuming
>> the indexing always returns a data frame when it doesn't, so drop = FALSE
>> lets me be sure that I will always get a data frame.
>>
>> ```
>> x <- data.frame(V1 = 1:5, V2 = letters[1:5])
>> x[, "V2"]
>> x[, "V2", drop = FALSE]
>> ```
>>
>> You'll notice that the first returns a character vector, a through e, where
>> the second returns a data frame with one column where the object in the
>> column is the same character vector.
>>
>> You could alternatively use
>>
>> x["V2"]
>>
>> which should be identical to x[, "V2", drop = FALSE], but some people don't
>> like that because it doesn't look like matrix indexing anymore.
>>
>>


Re: [R] Removing variables from data frame with a wile card

2023-02-12 Thread Jeff Newmiller
x["V2"]

is more efficient than using drop=FALSE, and perfectly normal syntax (data 
frames are lists of columns).  I would ignore the naysayers, or put a comment 
in if you want to accelerate their uptake.

As I understand it, one of the main reasons tibbles exist is because of 
drop=TRUE. List-slice (single-dimension) indexing works equally well with both 
standard and tibble types of data frames.

On February 12, 2023 2:30:15 PM PST, Andrew Simmons  wrote:
>drop = FALSE means that should the indexing select exactly one column, then
>return a data frame with one column, instead of the object in the column.
>It's usually not necessary, but I've messed up some data before by assuming
>the indexing always returns a data frame when it doesn't, so drop = FALSE
>lets me be sure that I will always get a data frame.
>
>```
>x <- data.frame(V1 = 1:5, V2 = letters[1:5])
>x[, "V2"]
>x[, "V2", drop = FALSE]
>```
>
>You'll notice that the first returns a character vector, a through e, where
>the second returns a data frame with one column where the object in the
>column is the same character vector.
>
>You could alternatively use
>
>x["V2"]
>
>which should be identical to x[, "V2", drop = FALSE], but some people don't
>like that because it doesn't look like matrix indexing anymore.
>
>

-- 
Sent from my phone. Please excuse my brevity.



Re: [R] Removing variables from data frame with a wile card

2023-02-12 Thread Andrew Simmons
drop = FALSE means that should the indexing select exactly one column, then
return a data frame with one column, instead of the object in the column.
It's usually not necessary, but I've messed up some data before by assuming
the indexing always returns a data frame when it doesn't, so drop = FALSE
lets me be sure that I will always get a data frame.

```
x <- data.frame(V1 = 1:5, V2 = letters[1:5])
x[, "V2"]
x[, "V2", drop = FALSE]
```

You'll notice that the first returns a character vector, a through e, where
the second returns a data frame with one column where the object in the
column is the same character vector.

You could alternatively use

x["V2"]

which should be identical to x[, "V2", drop = FALSE], but some people don't
like that because it doesn't look like matrix indexing anymore.
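
The case where drop = FALSE matters for the wild-card removal is when only one column survives the filter. A minimal sketch, assuming a made-up data frame in which every column except "year" starts with "yr" (the names keep, without_drop and with_drop are hypothetical):

```r
mydata <- data.frame(year = 2001:2003, yr3 = 0:2, yr4 = 1:3)
keep <- !grepl("^yr", colnames(mydata))   # only "year" survives

without_drop <- mydata[, keep]                 # collapses to a plain vector
with_drop    <- mydata[, keep, drop = FALSE]   # stays a one-column data frame

print(is.data.frame(without_drop))
print(is.data.frame(with_drop))
```

With two or more surviving columns both forms return a data frame, which is why the difference is easy to overlook.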


On Sun, Feb 12, 2023, 17:18 Steven T. Yen  wrote:

> In the line suggested by Andrew Simmons,
>
> mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
>
> what does drop=FALSE do? Thanks.


Re: [R] Removing variables from data frame with a wile card

2023-02-12 Thread Steven T. Yen
In the line suggested by Andrew Simmons,

mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]

what does drop=FALSE do? Thanks.



Re: [R] Removing variables from data frame with a wile card

2023-01-15 Thread Rui Barradas

At 16:54 on 15/01/2023, Sorkin, John wrote:

I am new to this thread. At the risk of presenting something that has been shown before, 
below I demonstrate how a column in a data frame can be dropped using a wild card, i.e. a 
column whose name starts with "th", using nothing more than base R functions and 
base R syntax. While additions to R such as tidyverse can be very helpful, many things 
that they do can be accomplished simply using base R.

# Create data frame with three columns
one <- rep(1,10)
one
two <- rep(2,10)
two
three <- rep(3,10)
three
mydata <- data.frame(one=one, two=two, three=three)
cat("Data frame with three columns\n")
mydata

# Drop the column whose name starts with "th", i.e. column three
# Find the location of the column ("^" anchors the match at the start of the name)
ColumnToDelete <- grep("^th", colnames(mydata))
cat("The column to be dropped is the column called three, which is column",
    ColumnToDelete, "\n")
ColumnToDelete

# Drop the column whose name starts with "th"
# (drop = FALSE keeps a data frame even if only one column remains)
newdata2 <- mydata[, -ColumnToDelete, drop = FALSE]
cat("Data frame after dropping column whose name is three\n")
newdata2

I hope this helps.
John



From: R-help  on behalf of Valentin Petzel 

Sent: Saturday, January 14, 2023 1:21 PM
To: avi.e.gr...@gmail.com
Cc: 'R-help Mailing List'
Subject: Re: [R] Removing variables from data frame with a wile card

Hello Avi,

While something like d$something <- ... may seem like you're directly modifying 
the data, it does not actually do so. Most R objects try to be immutable, that is, 
the object may not change after creation. This guarantees that if you have a 
binding for the same object, the object won't change sneakily.

There is one data structure that is in fact mutable: the environment. For 
example, compare

L <- list()
local({L$a <- 3})
L$a

with

E <- new.env()
local({E$a <- 3})
E$a

The latter will in fact work, as the same Environment is modified, while in the 
first one a modified copy of the list is made.

Under the hood we have a parser trick: If R sees something like

f(a) <- ...

it will look for a function `f<-` and call

a <- `f<-`(a, value = ...)

(this also happens for example when you do names(x) <- ...)

So in fact in our case this is equivalent to creating a copy with removed 
columns and rebinding the symbol in the current environment to the result.
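
The rewriting described above can be seen with a user-defined replacement function. This is only a sketch: the function `second<-` and the vectors v and w are invented for illustration and are not part of base R.

```r
# A replacement function must be named "<name><-" and take a 'value' argument.
`second<-` <- function(x, value) {
  x[2] <- value   # modifies the local copy of x
  x               # the modified copy is returned and rebound by the caller
}

v <- c(1, 2, 3)
w <- v            # second binding to the same values

second(v) <- 10   # evaluated as: v <- `second<-`(v, value = 10)

print(v)          # 1 10 3
print(w)          # 1 2 3; the other binding is unaffected
```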

The data.table package breaks with this convention and uses C based routines 
that allow changing of data without copying the object. Doing

d[, (cols_to_remove) := NULL]

will actually change the data.

Regards,
Valentin

14.01.2023 18:28:33 avi.e.gr...@gmail.com:


Steven,

Just want to add a few things to what people wrote.

In base R, the methods mentioned will let you make a copy of your original DF 
that is missing the items you are selecting that match your pattern.

That is fine.

For some purposes, you want to keep the original data.frame and remove a column 
within it. You can do that in several ways, but the simplest is to set the 
column to NULL, as in:

mydata$NAME <- NULL

Using the mydata["NAME"] notation, a loop or a functional programming method 
can do that for all the columns matched by your grep.
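
A minimal sketch of that idea, assuming a toy data frame with two "yr" columns (the data and the names yr_cols, by_loop and by_vector are invented for illustration):

```r
mydata <- data.frame(year = 2001:2003, weight = c(1.5, 2.0, 2.5),
                     yr3 = 0:2, yr4 = 1:3)

# Names of the columns matched by the pattern.
yr_cols <- grep("^yr", names(mydata), value = TRUE)

# Variant 1: loop over the matched names, setting each column to NULL.
by_loop <- mydata
for (nm in yr_cols) {
  by_loop[[nm]] <- NULL
}

# Variant 2: assign NULL to all matched names at once.
by_vector <- mydata
by_vector[yr_cols] <- NULL

print(names(by_loop))
print(names(by_vector))
```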

R does have optimizations that make this less useful as a partial copy of a 
data.frame retains common parts till things change.

For those who like to use the tidyverse, it comes with lots of tools that let 
you select columns that start with or end with or contain some pattern and I 
find that way easier.




Re: [R] Removing variables from data frame with a wile card

2023-01-15 Thread avi.e.gross
John,

As you said, you are new to the discussion so let me catch you up.

The original question was about removing many columns that shared a similar 
feature in the naming convention while leaving other columns in-place. Quite a 
few replies were given on how to do that including how to use a regular 
expression to gather the column names you want to remove.

It was only afterwards that the topic changed a bit to mention that some people 
used additional ways both in base R and also using packages like dplyr in the 
tidyverse.

As a general rule, most packages out there provide functionality that can be 
done in base R if you wish, and some are written purely in R while some augment 
that with parts re-done in C or something. If a package is well built and 
frequently used, it may well make your life as a programmer easier as the code 
need not be re-invented and debugged. Of course some packages are of poorer 
quality.

So we fully agree that unless asked for, the base R answers should be the focus 
HERE. Then again, languages are not static and sometimes we see things like 
pipes moved in a modified version into the main language.

Avi

-Original Message-
From: Sorkin, John  
Sent: Sunday, January 15, 2023 11:55 AM
To: Valentin Petzel ; avi.e.gr...@gmail.com
Cc: 'R-help Mailing List' 
Subject: Re: [R] Removing variables from data frame with a wile card

I am new to this thread. At the risk of presenting something that has been 
shown before, below I demonstrate how a column in a data frame can be dropped 
using a wild card, i.e. a column whose name starts with "th" using nothing more 
than base r functions and base R syntax. While additions to R such as tidyverse 
can be very helpful, many things that they do can be accomplished simply using 
base R.  

# Create data frame with three columns
one <- rep(1,10)
one
two <- rep(2,10)
two
three <- rep(3,10)
three
mydata <- data.frame(one=one, two=two, three=three) cat("Data frame with three 
columns\n") mydata

# Drop the column whose name starts with th, i.e. column three # Find the 
location of the column ColumToDelete <- grep("th",colnames((mydata))) cat("The 
colomumn to be dropped is the column called three, which is 
column",ColumToDelete,"\n") ColumToDelete

# Drop the column whose name starts with "th"
newdata2 <- mydata[,-ColumnToDelete]
cat("Data frame after droping column whose name is three\n")
newdata2

I hope this helps.
John



From: R-help  on behalf of Valentin Petzel 

Sent: Saturday, January 14, 2023 1:21 PM
To: avi.e.gr...@gmail.com
Cc: 'R-help Mailing List'
Subject: Re: [R] Removing variables from data frame with a wile card

Hello Avi,

while something like d$something <- ... may seem like you're directly modifying 
the data it does not actually do so. Most R objects try to be immutable, that 
is, the object may not change after creation. This guarantees that if you have 
a binding for same object the object won't change sneakily.

There is a data structure that is in fact mutable which are environments. For 
example compare

L <- list()
local({L$a <- 3})
L$a

with

E <- new.env()
local({E$a <- 3})
E$a

The latter will in fact work, as the same Environment is modified, while in the 
first one a modified copy of the list is made.

Under the hood we have a parser trick: If R sees something like

f(a) <- ...

it will look for a function f<- and call

a <- f<-(a, ...)

(this also happens for example when you do names(x) <- ...)

So in fact in our case this is equivalent to creating a copy with removed 
columns and rebind the symbol in the current environment to the result.

The data.table package breaks with this convention and uses C based routines 
that allow changing of data without copying the object. Doing

d[, (cols_to_remove) := NULL]

will actually change the data.

Regards,
Valentin

14.01.2023 18:28:33 avi.e.gr...@gmail.com:

> Steven,
>
> Just want to add a few things to what people wrote.
>
> In base R, the methods mentioned will let you make a copy of your original DF 
> that is missing the items you are selecting that match your pattern.
>
> That is fine.
>
> For some purposes, you want to keep the original data.frame and remove a 
> column within it. You can do that in several ways but the simplest is 
> something where you set the column to NULL as in:
>
> mydata$NAME <- NULL
>
> Using the mydata["NAME"] notation, you can do that with a loop or a 
> functional programming method that handles all the matches from your grep.
>
> R does have optimizations that make this less useful as a partial copy of a 
> data.frame retains common parts till things change.
>
> For those who like to use the tidyverse, it comes with lots of tools that let 
> you select columns that start wi

Re: [R] Removing variables from data frame with a wile card

2023-01-15 Thread Sorkin, John
I am new to this thread. At the risk of presenting something that has been 
shown before, below I demonstrate how a column in a data frame can be dropped 
using a wild card, i.e. a column whose name starts with "th", using nothing more 
than base R functions and base R syntax. While additions to R such as tidyverse 
can be very helpful, many things that they do can be accomplished simply using 
base R.

# Create data frame with three columns
one <- rep(1,10)
one
two <- rep(2,10)
two
three <- rep(3,10)
three
mydata <- data.frame(one=one, two=two, three=three)
cat("Data frame with three columns\n")
mydata

# Drop the column whose name starts with th, i.e. column three
# Find the location of the column
ColumnToDelete <- grep("^th", colnames(mydata))
cat("The column to be dropped is the column called three, which is column",
    ColumnToDelete, "\n")
ColumnToDelete

# Drop the column whose name starts with "th"
newdata2 <- mydata[, -ColumnToDelete]
cat("Data frame after dropping column whose name is three\n")
newdata2

I hope this helps.
John



From: R-help  on behalf of Valentin Petzel 

Sent: Saturday, January 14, 2023 1:21 PM
To: avi.e.gr...@gmail.com
Cc: 'R-help Mailing List'
Subject: Re: [R] Removing variables from data frame with a wile card

Hello Avi,

while something like d$something <- ... may seem like you're directly modifying 
the data, it does not actually do so. Most R objects try to be immutable, that 
is, the object may not change after creation. This guarantees that if you have 
a binding for the same object, the object won't change sneakily.

There is one data structure that is in fact mutable: environments. For 
example, compare

L <- list()
local({L$a <- 3})
L$a

with

E <- new.env()
local({E$a <- 3})
E$a

The latter will in fact work, as the same Environment is modified, while in the 
first one a modified copy of the list is made.

Under the hood we have a parser trick: If R sees something like

f(a) <- ...

it will look for a function f<- and call

a <- `f<-`(a, value = ...)

(this also happens for example when you do names(x) <- ...)

So in our case this is in fact equivalent to creating a copy with the columns 
removed and rebinding the symbol in the current environment to the result.

The data.table package breaks with this convention and uses C based routines 
that allow changing of data without copying the object. Doing

d[, (cols_to_remove) := NULL]

will actually change the data.

Regards,
Valentin

14.01.2023 18:28:33 avi.e.gr...@gmail.com:

> Steven,
>
> Just want to add a few things to what people wrote.
>
> In base R, the methods mentioned will let you make a copy of your original DF 
> that is missing the items you are selecting that match your pattern.
>
> That is fine.
>
> For some purposes, you want to keep the original data.frame and remove a 
> column within it. You can do that in several ways but the simplest is 
> something where you set the column to NULL as in:
>
> mydata$NAME <- NULL
>
> Using the mydata["NAME"] notation, you can do that with a loop or a 
> functional programming method that handles all the matches from your grep.
>
> R does have optimizations that make this less useful as a partial copy of a 
> data.frame retains common parts till things change.
>
> For those who like to use the tidyverse, it comes with lots of tools that let 
> you select columns that start with or end with or contain some pattern and I 
> find that way easier.
>
>
>
> -Original Message-
> From: R-help  On Behalf Of Steven Yen
> Sent: Saturday, January 14, 2023 7:49 AM
> To: Andrew Simmons 
> Cc: R-help Mailing List 
> Subject: Re: [R] Removing variables from data frame with a wile card
>
> Thanks to all. Very helpful.
>
> Steven from iPhone
>
>> On Jan 14, 2023, at 3:08 PM, Andrew Simmons  wrote:
>>
>> You'll want to use grep() or grepl(). By default, grep() uses
>> extended regular expressions to find matches, but you can also use
>> perl regular expressions and globbing (after converting to a regular 
>> expression).
>> For example:
>>
>> grepl("^yr", colnames(mydata))
>>
>> will tell you which 'colnames' start with "yr". If you'd rather
>> use globbing:
>>
>> grepl(glob2rx("yr*"), colnames(mydata))
>>
>> Then you might write something like this to remove the columns starting with 
>> yr:
>>
>> mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
>>
>>> On Sat, Jan 14, 2023 at 1:56 AM Steven T. Yen  wrote:
>>>
>>> I have a data frame containing variables "yr3",...,"yr28".
>

Re: [R] Removing variables from data frame with a wile card

2023-01-15 Thread Valentin Petzel
Hello Avi,

while something like d$something <- ... may seem like you're directly modifying 
the data, it does not actually do so. Most R objects try to be immutable, that 
is, the object may not change after creation. This guarantees that if you have 
a binding for the same object, the object won't change sneakily.

There is one data structure that is in fact mutable: environments. For 
example, compare

L <- list()
local({L$a <- 3})
L$a

with

E <- new.env()
local({E$a <- 3})
E$a

The latter will in fact work, as the same Environment is modified, while in the 
first one a modified copy of the list is made.

Under the hood we have a parser trick: If R sees something like

f(a) <- ...

it will look for a function f<- and call

a <- `f<-`(a, value = ...)

(this also happens for example when you do names(x) <- ...)

So in our case this is in fact equivalent to creating a copy with the columns 
removed and rebinding the symbol in the current environment to the result.
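
To make the replacement-function mechanism concrete, here is a minimal sketch in base R. The function name `second<-` is hypothetical and invented purely for illustration; it is not part of R or of any package mentioned in this thread.

```r
# Hypothetical replacement function: sets the second element of a vector.
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

a <- c(10, 20, 30)
b <- a                  # a second binding to the same values

second(a) <- 99         # sugar for: a <- `second<-`(a, value = 99)

a                       # 10 99 30
b                       # 10 20 30, the other binding is unaffected

# Writing the call out by hand gives the same result
a2 <- c(10, 20, 30)
a2 <- `second<-`(a2, value = 99)
identical(a, a2)        # TRUE
```

Since b keeps its old values, the assignment evidently produced a modified object and rebound the symbol a, rather than changing the values that b still sees.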

The data.table package breaks with this convention and uses C based routines 
that allow changing of data without copying the object. Doing

d[, (cols_to_remove) := NULL]

will actually change the data.

Regards,
Valentin

14.01.2023 18:28:33 avi.e.gr...@gmail.com:

> Steven,
> 
> Just want to add a few things to what people wrote.
> 
> In base R, the methods mentioned will let you make a copy of your original DF 
> that is missing the items you are selecting that match your pattern.
> 
> That is fine.
> 
> For some purposes, you want to keep the original data.frame and remove a 
> column within it. You can do that in several ways but the simplest is 
> something where you set the column to NULL as in:
> 
> mydata$NAME <- NULL
> 
> Using the mydata["NAME"] notation, you can do that with a loop or a 
> functional programming method that handles all the matches from your grep.
> 
> R does have optimizations that make this less useful as a partial copy of a 
> data.frame retains common parts till things change.
> 
> For those who like to use the tidyverse, it comes with lots of tools that let 
> you select columns that start with or end with or contain some pattern and I 
> find that way easier.
> 
> 
> 
> -Original Message-
> From: R-help  On Behalf Of Steven Yen
> Sent: Saturday, January 14, 2023 7:49 AM
> To: Andrew Simmons 
> Cc: R-help Mailing List 
> Subject: Re: [R] Removing variables from data frame with a wile card
> 
> Thanks to all. Very helpful.
> 
> Steven from iPhone
> 
>> On Jan 14, 2023, at 3:08 PM, Andrew Simmons  wrote:
>> 
>> You'll want to use grep() or grepl(). By default, grep() uses
>> extended regular expressions to find matches, but you can also use
>> perl regular expressions and globbing (after converting to a regular 
>> expression).
>> For example:
>> 
>> grepl("^yr", colnames(mydata))
>> 
>> will tell you which 'colnames' start with "yr". If you'd rather
>> use globbing:
>> 
>> grepl(glob2rx("yr*"), colnames(mydata))
>> 
>> Then you might write something like this to remove the columns starting with 
>> yr:
>> 
>> mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
>> 
>>> On Sat, Jan 14, 2023 at 1:56 AM Steven T. Yen  wrote:
>>> 
>>> I have a data frame containing variables "yr3",...,"yr28".
>>> 
>>> How do I remove them with a wild card, something similar to "del yr*"
>>> in Windows/DOS? Thank you.
>>> 
>>>> colnames(mydata)
>>>   [1] "year"   "weight" "confeduc"   "confothr" "college"
>>>   [6] ...
>>> [41] "yr3"    "yr4"    "yr5"    "yr6" "yr7"
>>> [46] "yr8"    "yr9"    "yr10"   "yr11" "yr12"
>>> [51] "yr13"   "yr14"   "yr15"   "yr16" "yr17"
>>> [56] "yr18"   "yr19"   "yr20"   "yr21" "yr22"
>>> [61] "yr23"   "yr24"   "yr25"   "yr26" "yr27"
>>> [66] "yr28"...
>>> 
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained,

Re: [R] Removing variables from data frame with a wile card

2023-01-14 Thread avi.e.gross
John,

 

I am very familiar with the evolving tidyverse and some messages a while back 
included people who wanted this forum to mainly stick to base R, so I leave out 
examples.

 

Indeed, the tidyverse is designed to make it easy to select columns with all 
kinds of conditions, including regular expressions that allow more precision 
(as does grep), so you can match “yr” followed by exactly one or two digits. 
Some of the answers suggest that starting with “yr” was enough. The tidyverse 
tools also allow selecting on arbitrary considerations, like whether the column 
contains numeric data. You can do most things in base R, although I find the 
tidyverse method easier most of the time. With some care it can also do some 
extremely complicated things, such as creating multiple new columns from a set 
of columns, each applying a different function like mean, mode or standard 
deviation, and giving the new columns the same names as the ones they are 
derived from but with a different suffix reflecting what transformation was done.
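
For anyone staying in base R, the stricter match mentioned above can be sketched with grepl(). The column names below are hypothetical, loosely modeled on the ones in the original question, and are only there to show where a prefix match and a "yr plus one or two digits" match differ.

```r
# Hypothetical column names (not the poster's actual data)
cn <- c("year", "weight", "yr3", "yr28", "yrs_school", "yr2023")

starts_with_yr <- grepl("^yr", cn)              # prefix match only
yr_plus_digits <- grepl("^yr[0-9]{1,2}$", cn)   # "yr" then exactly 1 or 2 digits

cn[starts_with_yr]   # "yr3" "yr28" "yrs_school" "yr2023"
cn[yr_plus_digits]   # "yr3" "yr28"
```

The anchored pattern leaves a column such as yrs_school alone, which a plain prefix match would remove.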

 

One nice feature is the idea of data streaming through multiple steps, with 
one or a few transformations in each step, while the intermediate parts you 
do not want simply melt away. The part about selecting or deselecting columns 
can often be used in many of the verbs.

 

From: John Kane  
Sent: Saturday, January 14, 2023 4:07 PM
To: avi.e.gr...@gmail.com
Cc: R-help Mailing List 
Subject: Re: [R] Removing variables from data frame with a wile card

 

You rang sir?

 

library(tidyverse)
xx = 1:10 
yr1 = yr2 = yr3 = rnorm(10)
dat1 <- data.frame(xx, yr1, yr2, yr3)

 

dat1  %>%  select(!starts_with("yr"))

 

or for something a bit more exotic, as I have been trying to learn a bit about 
the data.table package

 

library(data.table)

xx = 1:10 
yr1 = yr2 = yr3 = rnorm(10)

dat2 <- data.table(xx , yr1, yr2, yr3)

dat2[, !names(dat2) %like% "yr", with=FALSE ]
 

 

 

On Sat, 14 Jan 2023 at 12:28, <avi.e.gr...@gmail.com> wrote:

Steven,

Just want to add a few things to what people wrote.

In base R, the methods mentioned will let you make a copy of your original DF 
that is missing the items you are selecting that match your pattern.

That is fine.

For some purposes, you want to keep the original data.frame and remove a column 
within it. You can do that in several ways but the simplest is something where 
you set the column to NULL as in:

mydata$NAME <- NULL

Using the mydata["NAME"] notation, you can do that with a loop or a 
functional programming method that handles all the matches from your grep.

R does have optimizations that make this less useful as a partial copy of a 
data.frame retains common parts till things change.

For those who like to use the tidyverse, it comes with lots of tools that let 
you select columns that start with or end with or contain some pattern and I 
find that way easier.



-Original Message-
From: R-help <r-help-boun...@r-project.org> On Behalf Of Steven Yen
Sent: Saturday, January 14, 2023 7:49 AM
To: Andrew Simmons <akwsi...@gmail.com>
Cc: R-help Mailing List <r-help@r-project.org>
Subject: Re: [R] Removing variables from data frame with a wile card

Thanks to all. Very helpful.

Steven from iPhone

> On Jan 14, 2023, at 3:08 PM, Andrew Simmons <akwsi...@gmail.com> wrote:
> 
> You'll want to use grep() or grepl(). By default, grep() uses 
> extended regular expressions to find matches, but you can also use 
> perl regular expressions and globbing (after converting to a regular 
> expression).
> For example:
> 
> grepl("^yr", colnames(mydata))
> 
> will tell you which 'colnames' start with "yr". If you'd rather 
> use globbing:
> 
> grepl(glob2rx("yr*"), colnames(mydata))
> 
> Then you might write something like this to remove the columns starting with 
> yr:
> 
> mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
> 
>> On Sat, Jan 14, 2023 at 1:56 AM Steven T. Yen <st...@ntu.edu.tw> wrote:
>> 
>> I have a data frame containing variables "yr3",...,"yr28".
>> 
>> How do I remove them with a wild card, something similar to "del yr*"
>> in Windows/DOS? Thank you.
>> 
>>> colnames(mydata)
>>   [1] "year"     "weight"   "confeduc" "confothr" "college"
>>   [6] ...
>>  [41] "yr3"      "yr4"      "yr5"      "yr6"      "yr7"
>>  [46] "yr8"      "yr9"      "yr10"     "yr11"     "yr12"
>>  [51] "yr13"   "yr14"   "yr15"   "yr16" "yr17&qu

Re: [R] Removing variables from data frame with a wile card

2023-01-14 Thread avi.e.gross
Valentin,

You are correct that R does many things largely behind the scenes that make 
some operations fairly efficient.

From a programming point of view, though, many people might make a data.frame 
and not think of it as a list of vectors of the same length that are kept that 
way.

So if they made a copy of the original data with fewer columns, they might be 
tempted to think the original item was completely copied and the original is 
either around or if the identifier was re-used, will be garbage collected. As 
you note, the only things collected are the columns you chose not to include.

For some it seems cleaner to set a list item to NULL, which seems to remove it 
immediately. 

The real point I hoped to make is that using base R, you can indeed approach 
removing (multiple) columns in two logical ways. One is to seemingly remove 
them in the original object, even if your point is valid. The other is to make 
a copy of just what you want and ignore the rest and it may be kept around or 
not.

If someone really wanted to get down to the basics, they could get a reference 
to all the columns they want to keep, as in col1 <- mydata[["col1"]], and use 
those to make a new data.frame, or many other variants on these methods.
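
A rough base R sketch of that "pull out the columns you want and rebuild" variant follows. The data and the names keep, cols and newdata are hypothetical, made up for illustration rather than taken from anyone's real data.

```r
# Hypothetical data: one column to keep, two "yr" columns to leave behind
mydata <- data.frame(year = 2001:2003, yr3 = c(0, 1, 0), yr4 = c(1, 0, 0))

# Names of the columns to keep (everything not starting with "yr")
keep <- names(mydata)[!grepl("^yr", names(mydata))]

# Extract each kept column as a plain vector, then rebuild a data.frame
cols <- lapply(keep, function(nm) mydata[[nm]])
names(cols) <- keep
newdata <- as.data.frame(cols)

names(newdata)   # "year"
nrow(newdata)    # 3
```

This is more typing than a single logical subscript and is shown only to make the idea concrete.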

Many programming languages have some qualms (I mean designers and programmers, 
and just plain purists) about when "pointers" of sorts are used and whether 
things should be mutable and so on so I prefer to avoid religious wars.

-Original Message-
From: Valentin Petzel  
Sent: Saturday, January 14, 2023 1:21 PM
To: avi.e.gr...@gmail.com
Cc: 'R-help Mailing List' 
Subject: Re: [R] Removing variables from data frame with a wile card

Hello Avi,

while something like d$something <- ... may seem like you're directly modifying 
the data, it does not actually do so. Most R objects try to be immutable, that 
is, the object may not change after creation. This guarantees that if you have 
a binding for the same object, the object won't change sneakily.

There is one data structure that is in fact mutable: environments. For 
example, compare

L <- list()
local({L$a <- 3})
L$a

with

E <- new.env()
local({E$a <- 3})
E$a

The latter will in fact work, as the same Environment is modified, while in the 
first one a modified copy of the list is made.

Under the hood we have a parser trick: If R sees something like

f(a) <- ...

it will look for a function f<- and call

a <- `f<-`(a, value = ...)

(this also happens for example when you do names(x) <- ...)

So in our case this is in fact equivalent to creating a copy with the columns 
removed and rebinding the symbol in the current environment to the result.

The data.table package breaks with this convention and uses C based routines 
that allow changing of data without copying the object. Doing

d[, (cols_to_remove) := NULL]

will actually change the data.

Regards,
Valentin

14.01.2023 18:28:33 avi.e.gr...@gmail.com:

> Steven,
> 
> Just want to add a few things to what people wrote.
> 
> In base R, the methods mentioned will let you make a copy of your original DF 
> that is missing the items you are selecting that match your pattern.
> 
> That is fine.
> 
> For some purposes, you want to keep the original data.frame and remove a 
> column within it. You can do that in several ways but the simplest is 
> something where you set the column to NULL as in:
> 
> mydata$NAME <- NULL
> 
> Using the mydata["NAME"] notation, you can do that with a loop or a 
> functional programming method that handles all the matches from your grep.
> 
> R does have optimizations that make this less useful as a partial copy of a 
> data.frame retains common parts till things change.
> 
> For those who like to use the tidyverse, it comes with lots of tools that let 
> you select columns that start with or end with or contain some pattern and I 
> find that way easier.
> 
> 
> 
> -Original Message-
> From: R-help  On Behalf Of Steven Yen
> Sent: Saturday, January 14, 2023 7:49 AM
> To: Andrew Simmons 
> Cc: R-help Mailing List 
> Subject: Re: [R] Removing variables from data frame with a wile card
> 
> Thanks to all. Very helpful.
> 
> Steven from iPhone
> 
>> On Jan 14, 2023, at 3:08 PM, Andrew Simmons  wrote:
>> 
>> You'll want to use grep() or grepl(). By default, grep() uses 
>> extended regular expressions to find matches, but you can also use 
>> perl regular expressions and globbing (after converting to a regular 
>> expression).
>> For example:
>> 
>> grepl("^yr", colnames(mydata))
>> 
>> will tell you which 'colnames' start with "yr". If you'd rather 
>> use globbing:
>> 
>> grepl(glob2rx("yr*"), colnames(mydata))
>> 
>> Th

Re: [R] Removing variables from data frame with a wile card

2023-01-14 Thread John Kane
You rang sir?

library(tidyverse)
xx = 1:10
yr1 = yr2 = yr3 = rnorm(10)
dat1 <- data.frame(xx, yr1, yr2, yr3)

dat1  %>%  select(!starts_with("yr"))

or for something a bit more exotic, as I have been trying to learn a bit
about the data.table package

library(data.table)

xx = 1:10
yr1 = yr2 = yr3 = rnorm(10)

dat2 <- data.table(xx , yr1, yr2, yr3)

dat2[, !names(dat2) %like% "yr", with=FALSE ]



On Sat, 14 Jan 2023 at 12:28,  wrote:

> Steven,
>
> Just want to add a few things to what people wrote.
>
> In base R, the methods mentioned will let you make a copy of your original
> DF that is missing the items you are selecting that match your pattern.
>
> That is fine.
>
> For some purposes, you want to keep the original data.frame and remove a
> column within it. You can do that in several ways but the simplest is
> something where you set the column to NULL as in:
>
> mydata$NAME <- NULL
>
> Using the mydata["NAME"] notation, you can do that with a loop or a
> functional programming method that handles all the matches from your
> grep.
>
> R does have optimizations that make this less useful as a partial copy of
> a data.frame retains common parts till things change.
>
> For those who like to use the tidyverse, it comes with lots of tools that
> let you select columns that start with or end with or contain some pattern
> and I find that way easier.
>
>
>
> -Original Message-
> From: R-help  On Behalf Of Steven Yen
> Sent: Saturday, January 14, 2023 7:49 AM
> To: Andrew Simmons 
> Cc: R-help Mailing List 
> Subject: Re: [R] Removing variables from data frame with a wile card
>
> Thanks to all. Very helpful.
>
> Steven from iPhone
>
> > On Jan 14, 2023, at 3:08 PM, Andrew Simmons  wrote:
> >
> > You'll want to use grep() or grepl(). By default, grep() uses
> > extended regular expressions to find matches, but you can also use
> > perl regular expressions and globbing (after converting to a regular
> expression).
> > For example:
> >
> > grepl("^yr", colnames(mydata))
> >
> > will tell you which 'colnames' start with "yr". If you'd rather
> > use globbing:
> >
> > grepl(glob2rx("yr*"), colnames(mydata))
> >
> > Then you might write something like this to remove the columns starting
> with yr:
> >
> > mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
> >
> >> On Sat, Jan 14, 2023 at 1:56 AM Steven T. Yen  wrote:
> >>
> >> I have a data frame containing variables "yr3",...,"yr28".
> >>
> >> How do I remove them with a wild card, something similar to "del yr*"
> >> in Windows/DOS? Thank you.
> >>
> >>> colnames(mydata)
> >>   [1] "year"     "weight"   "confeduc" "confothr" "college"
> >>   [6] ...
> >>  [41] "yr3"      "yr4"      "yr5"      "yr6"      "yr7"
> >>  [46] "yr8"      "yr9"      "yr10"     "yr11"     "yr12"
> >>  [51] "yr13"     "yr14"     "yr15"     "yr16"     "yr17"
> >>  [56] "yr18"     "yr19"     "yr20"     "yr21"     "yr22"
> >>  [61] "yr23"     "yr24"     "yr25"     "yr26"     "yr27"
> >>  [66] "yr28"     ...
> >>
> >> __
> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.


-- 
John Kane
Kingston ON Canada

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Removing variables from data frame with a wile card

2023-01-14 Thread avi.e.gross
Steven,

Just want to add a few things to what people wrote.

In base R, the methods mentioned will let you make a copy of your original DF 
that is missing the items you are selecting that match your pattern.

That is fine.

For some purposes, you want to keep the original data.frame and remove a column 
within it. You can do that in several ways but the simplest is something where 
you set the column to NULL as in:

mydata$NAME <- NULL

Using the mydata["NAME"] notation, you can do that with a loop or a 
functional programming method that handles all the matches from your grep.
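
As a minimal sketch of the loop version, assuming a small made-up data frame (the column names and the variable to_drop are hypothetical, chosen to resemble the ones in the question):

```r
# Hypothetical data frame with two columns to keep and two to remove
mydata <- data.frame(year = 2001:2003, weight = c(1.5, 2, 2.5),
                     yr3 = c(0, 1, 0), yr4 = c(1, 0, 0))

# Names of the columns matching the pattern
to_drop <- grep("^yr", names(mydata), value = TRUE)

# Remove the matching columns one at a time by assigning NULL
for (nm in to_drop) {
  mydata[[nm]] <- NULL
}

names(mydata)   # "year" "weight"
```

If nothing matches, to_drop is empty, the loop body never runs, and mydata is left as it was.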

R does have optimizations that make this less useful as a partial copy of a 
data.frame retains common parts till things change.

For those who like to use the tidyverse, it comes with lots of tools that let 
you select columns that start with or end with or contain some pattern and I 
find that way easier.



-Original Message-
From: R-help  On Behalf Of Steven Yen
Sent: Saturday, January 14, 2023 7:49 AM
To: Andrew Simmons 
Cc: R-help Mailing List 
Subject: Re: [R] Removing variables from data frame with a wile card

Thanks to all. Very helpful.

Steven from iPhone

> On Jan 14, 2023, at 3:08 PM, Andrew Simmons  wrote:
> 
> You'll want to use grep() or grepl(). By default, grep() uses 
> extended regular expressions to find matches, but you can also use 
> perl regular expressions and globbing (after converting to a regular 
> expression).
> For example:
> 
> grepl("^yr", colnames(mydata))
> 
> will tell you which 'colnames' start with "yr". If you'd rather 
> use globbing:
> 
> grepl(glob2rx("yr*"), colnames(mydata))
> 
> Then you might write something like this to remove the columns starting with 
> yr:
> 
> mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
> 
>> On Sat, Jan 14, 2023 at 1:56 AM Steven T. Yen  wrote:
>> 
>> I have a data frame containing variables "yr3",...,"yr28".
>> 
>> How do I remove them with a wild card, something similar to "del yr*"
>> in Windows/DOS? Thank you.
>> 
>>> colnames(mydata)
>>   [1] "year"     "weight"   "confeduc" "confothr" "college"
>>   [6] ...
>>  [41] "yr3"      "yr4"      "yr5"      "yr6"      "yr7"
>>  [46] "yr8"      "yr9"      "yr10"     "yr11"     "yr12"
>>  [51] "yr13"     "yr14"     "yr15"     "yr16"     "yr17"
>>  [56] "yr18"     "yr19"     "yr20"     "yr21"     "yr22"
>>  [61] "yr23"     "yr24"     "yr25"     "yr26"     "yr27"
>>  [66] "yr28"     ...
>> 
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Removing variables from data frame with a wile card

2023-01-14 Thread Bill Dunlap
The -grep(pattern,colnames) as a subscript is a bit dangerous.  If no
colname matches the pattern then all columns will be omitted (because grep()
then returns integer(0), and negating an empty index vector still selects no
columns). !grepl(pattern,colnames) avoids this problem.

> mydata <- data.frame(A=1:3,B=11:13)
> mydata[, -grep("^yr", colnames(mydata))]
data frame with 0 columns and 3 rows
> mydata[, !grepl("^yr", colnames(mydata))]
  A  B
1 1 11
2 2 12
3 3 13
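
If you do want to keep the index-based form, one way to guard it is sketched below. The helper name drop_matching is hypothetical, not something from base R or from this thread; it only illustrates checking for the empty-match case before negating.

```r
mydata <- data.frame(A = 1:3, B = 11:13)

idx <- grep("^yr", colnames(mydata))
length(idx)              # 0: nothing matched, so -idx is also empty

# Hypothetical helper: drop by index only when something matched,
# otherwise return the data frame unchanged.
drop_matching <- function(df, pattern) {
  idx <- grep(pattern, colnames(df))
  if (length(idx) == 0L) return(df)
  df[, -idx, drop = FALSE]
}

ncol(drop_matching(mydata, "^yr"))   # 2
ncol(drop_matching(mydata, "^B"))    # 1
```

The !grepl() form needs no such guard, which is why it is usually the simpler choice.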

-Bill

On Fri, Jan 13, 2023 at 11:07 PM Eric Berger  wrote:

> mydata[, -grep("^yr",colnames(mydata))]
>
> On Sat, Jan 14, 2023 at 8:57 AM Steven T. Yen  wrote:
>
> > I have a data frame containing variables "yr3",...,"yr28".
> >
> > How do I remove them with a wild card, something similar to "del yr*"
> > in Windows/DOS? Thank you.
> >
> >  > colnames(mydata)
> >   [1] "year"     "weight"   "confeduc" "confothr" "college"
> >   [6] ...
> >  [41] "yr3"      "yr4"      "yr5"      "yr6"      "yr7"
> >  [46] "yr8"      "yr9"      "yr10"     "yr11"     "yr12"
> >  [51] "yr13"     "yr14"     "yr15"     "yr16"     "yr17"
> >  [56] "yr18"     "yr19"     "yr20"     "yr21"     "yr22"
> >  [61] "yr23"     "yr24"     "yr25"     "yr26"     "yr27"
> >  [66] "yr28"     ...
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Removing variables from data frame with a wile card

2023-01-14 Thread Steven Yen
Thanks to all. Very helpful.

Steven from iPhone

> On Jan 14, 2023, at 3:08 PM, Andrew Simmons  wrote:
> 
> You'll want to use grep() or grepl(). By default, grep() uses extended
> regular expressions to find matches, but you can also use perl regular
> expressions and globbing (after converting to a regular expression).
> For example:
> 
> grepl("^yr", colnames(mydata))
> 
> will tell you which 'colnames' start with "yr". If you'd rather
> use globbing:
> 
> grepl(glob2rx("yr*"), colnames(mydata))
> 
> Then you might write something like this to remove the columns starting with 
> yr:
> 
> mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
> 
>> On Sat, Jan 14, 2023 at 1:56 AM Steven T. Yen  wrote:
>> 
>> I have a data frame containing variables "yr3",...,"yr28".
>> 
>> How do I remove them with a wild card, something similar to "del yr*"
>> in Windows/DOS? Thank you.
>> 
>>> colnames(mydata)
>>   [1] "year"     "weight"   "confeduc" "confothr" "college"
>>   [6] ...
>>  [41] "yr3"      "yr4"      "yr5"      "yr6"      "yr7"
>>  [46] "yr8"      "yr9"      "yr10"     "yr11"     "yr12"
>>  [51] "yr13"     "yr14"     "yr15"     "yr16"     "yr17"
>>  [56] "yr18"     "yr19"     "yr20"     "yr21"     "yr22"
>>  [61] "yr23"     "yr24"     "yr25"     "yr26"     "yr27"
>>  [66] "yr28"     ...
>> 
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Removing variables from data frame with a wile card

2023-01-13 Thread Andrew Simmons
You'll want to use grep() or grepl(). By default, grep() uses extended
regular expressions to find matches, but you can also use perl regular
expressions and globbing (after converting to a regular expression).
For example:

grepl("^yr", colnames(mydata))

will tell you which 'colnames' start with "yr". If you'd rather
use globbing:

grepl(glob2rx("yr*"), colnames(mydata))
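
To see what the conversion does, you can print the regular expression that glob2rx() produces. This is a small sketch; the vector cn holds hypothetical column names made up for the comparison.

```r
# glob2rx() lives in the utils package, which is attached by default
rx <- glob2rx("yr*")
rx                       # "^yr"

cn <- c("year", "yr3", "yr28")   # hypothetical column names

# The glob and the hand-written regular expression select the same names
identical(grepl(rx, cn), grepl("^yr", cn))   # TRUE
cn[grepl(rx, cn)]                            # "yr3" "yr28"
```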

Then you might write something like this to remove the columns starting with yr:

mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
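
The drop = FALSE part matters when only one column survives the selection. A small sketch with made-up data (the data frame and the names keep, a and b are hypothetical):

```r
# Hypothetical data: only "year" remains after removing the "yr" columns
mydata <- data.frame(year = 2001:2003, yr3 = c(0, 1, 0), yr4 = c(1, 0, 0))
keep <- !grepl("^yr", colnames(mydata))

a <- mydata[, keep]                 # default drop = TRUE
b <- mydata[, keep, drop = FALSE]

class(a)   # "integer": the single column collapsed to a plain vector
class(b)   # "data.frame"
```

With two or more remaining columns both forms return a data frame, so the difference only shows up in the single-column case.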

On Sat, Jan 14, 2023 at 1:56 AM Steven T. Yen  wrote:
>
> I have a data frame containing variables "yr3",...,"yr28".
>
> How do I remove them with a wild card, something similar to "del yr*"
> in Windows/DOS? Thank you.
>
>  > colnames(mydata)
>   [1] "year"     "weight"   "confeduc" "confothr" "college"
>   [6] ...
>  [41] "yr3"      "yr4"      "yr5"      "yr6"      "yr7"
>  [46] "yr8"      "yr9"      "yr10"     "yr11"     "yr12"
>  [51] "yr13"     "yr14"     "yr15"     "yr16"     "yr17"
>  [56] "yr18"     "yr19"     "yr20"     "yr21"     "yr22"
>  [61] "yr23"     "yr24"     "yr25"     "yr26"     "yr27"
>  [66] "yr28"     ...


Re: [R] Removing variables from data frame with a wile card

2023-01-13 Thread Eric Berger
mydata[, -grep("^yr",colnames(mydata))]
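One caveat with negative indexing, sketched on a hypothetical data frame that has no "yr" columns: when nothing matches, grep() returns integer(0), and indexing with -integer(0) selects zero columns instead of keeping everything. The logical form with !grepl() does not have this problem:

```r
# Toy data frame with no columns starting with "yr" (made up for illustration).
mydata <- data.frame(year = 2001:2003, weight = c(1.5, 2.0, 2.5))

idx <- grep("^yr", colnames(mydata))   # integer(0): nothing matches

neg <- mydata[, -idx]                  # -integer(0) selects no columns at all
lgl <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]

print(ncol(neg))  # 0
print(ncol(lgl))  # 2
```

If you prefer the grep() form, checking length(idx) > 0 before subsetting avoids the surprise.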

On Sat, Jan 14, 2023 at 8:57 AM Steven T. Yen wrote:

> I have a data frame containing variables "yr3",...,"yr28".
>
> How do I remove them with a wild card, something similar to "del yr*"
> in Windows/DOS? Thank you.
>
> > colnames(mydata)
>   [1] "year"     "weight"   "confeduc" "confothr" "college"
>   [6] ...
>  [41] "yr3"      "yr4"      "yr5"      "yr6"      "yr7"
>  [46] "yr8"      "yr9"      "yr10"     "yr11"     "yr12"
>  [51] "yr13"     "yr14"     "yr15"     "yr16"     "yr17"
>  [56] "yr18"     "yr19"     "yr20"     "yr21"     "yr22"
>  [61] "yr23"     "yr24"     "yr25"     "yr26"     "yr27"
>  [66] "yr28"...


[R] Removing variables from data frame with a wile card

2023-01-13 Thread Steven T. Yen

I have a data frame containing variables "yr3",...,"yr28".

How do I remove them with a wild card, something similar to "del yr*"
in Windows/DOS? Thank you.


> colnames(mydata)
  [1] "year"     "weight"   "confeduc" "confothr" "college"
  [6] ...
 [41] "yr3"      "yr4"      "yr5"      "yr6"      "yr7"
 [46] "yr8"      "yr9"      "yr10"     "yr11"     "yr12"
 [51] "yr13"     "yr14"     "yr15"     "yr16"     "yr17"
 [56] "yr18"     "yr19"     "yr20"     "yr21"     "yr22"
 [61] "yr23"     "yr24"     "yr25"     "yr26"     "yr27"
 [66] "yr28"...
