Re: [R] Removing rows if certain elements are found in character string

Claudia Penaloza Tue, 03 Jul 2012 09:10:56 -0700

Thank you Rui and Jim, both 'i1' and 'i1new' worked perfectly because there
are no instances of 'Dd' or 'dD' in the data set (that I would/not want to
include/exclude)... but I understand that 'i1new' targets precisely what I
want.


Why isn't a leading run of zeros required for either 'i1' or 'i1new', like so?

i1newer <- grepl("^0{0,}[D]*$|^0{0,}[d]*$", dd$ch)
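
A quick check may help with this question. The sketch below uses short, made-up strings in place of dd$ch (an assumption, not the data from the thread) and compares 'i1new' with 'i1newer'. Since the character class [0D] already contains '0', [0D]* accepts zeros in any position, leading ones included, so a separate 0{0,} prefix should not be needed. The 'i1newer' form, by contrast, seems to allow zeros only before the D's, so it rejects strings with trailing zeros.

```r
# Hypothetical, shortened stand-ins for the strings in dd$ch
ch <- c("000D000", "000d000", "000D", "00T0000")

# Jim's pattern: only 0/D or only 0/d, zeros allowed anywhere
i1new <- grepl("^([0D]*$|[0d]*$)", ch)

# Proposed pattern: zeros first, then only D's (or only d's) up to the end
i1newer <- grepl("^0{0,}[D]*$|^0{0,}[d]*$", ch)

print(data.frame(ch, i1new, i1newer))
# i1new   is TRUE, TRUE, TRUE, FALSE
# i1newer is FALSE, FALSE, TRUE, FALSE (zeros after the D/d break the match)
```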

Thank you again,
Claudia
On Tue, Jul 3, 2012 at 2:06 AM, Rui Barradas <[email protected]> wrote:

> Hello,
>
> Inline.
>
> On 03-07-2012 01:15, jim holtman wrote:
>
>> You will have to change the 'i1' expression as follows:
>>
>> i1 <- grepl("^([0D]|[0d])*$", dd$ch)
>> i1  # matches strings with d & D in them
>>   [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>>
>> # second string had 'd' & 'D' in it so it was TRUE above and FALSE below
>> i1new <- grepl("^([0D]*$|[0d]*$)", dd$ch)
>> i1new
>>   [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>>
> Right, apparently, I forgot that grep is greedy, and the test cases were
> not complete.
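
The difference between the two patterns may have less to do with greediness than with where the alternation sits. In 'i1' the choice between [0D] and [0d] is made again for every character, so a string can mix 'd' and 'D'. In 'i1new' the choice is made once for the whole string. A minimal sketch, using a made-up string (an assumption, not a row from the thread):

```r
# Hypothetical string containing both 'd' and 'D'
mixed <- "00d0D00"

# Alternation inside the repeated group: each character may pick either class
r_i1 <- grepl("^([0D]|[0d])*$", mixed)

# The single character class that Jim describes as equivalent
r_class <- grepl("^[0dD]*$", mixed)

# Alternation outside the repeat: the whole string must use one class
r_i1new <- grepl("^([0D]*$|[0d]*$)", mixed)

print(c(i1 = r_i1, class = r_class, i1new = r_i1new))
# i1 and class are TRUE, i1new is FALSE
```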
>
>
>>>
>> I put a 'd' and 'D' in the second string and the original regular
>> expression is equivalent to
>>
>> grepl("^[0dD]*$", dd$ch)
>>
>>
> This is only for the first request, and does not solve cases where there
> are characters other than '0', 'd' or 'D', but 'd' or 'D' are the first
> non-zero. This is the case of my 4th row, changed from the OP's data
> example.
>
> My regexpr for 'i2' is equivalent to this one, that I believe is more
> readable:
>
>
> i2b <- grepl("^0{0,}[Dd]", dd$ch)
>
>
> First a zero, which may occur zero or more times, then a 'd' or 'D';
> whatever follows, up to the end of the string, is irrelevant.
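
As a small check of that reading, the sketch below compares "^0*[Dd]" with "^0{0,}[Dd]" on a few made-up strings (hypothetical values, not the thread's data). Both quantifiers mean "zero or more", so the two patterns should flag the same strings: those whose first non-zero character is 'd' or 'D'.

```r
# Hypothetical, shortened stand-ins for the strings in dd$ch
ch <- c("000D000", "00DT000", "00TD000", "0000000", "d000000")

i2  <- grepl("^0*[Dd]", ch)     # Rui's original 'i2'
i2b <- grepl("^0{0,}[Dd]", ch)  # the more explicit 'i2b'

print(data.frame(ch, i2, i2b))
# both are TRUE, TRUE, FALSE, FALSE, TRUE
print(identical(i2, i2b))
```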
>
>
>> which will match strings containing d, D and 0.  If you only want 'd'
>> or 'D' (and not both), then you will have to use the one in 'i1new'.
>>
>>
> To the OP: bottom line, use Jim's 'i1new' and my 'i2' or 'i2b'.
>
> Rui Barradas
>
>
>> On Mon, Jul 2, 2012 at 7:24 PM, Rui Barradas <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> Try regular expressions instead.
>>> In this data.frame, I've changed row nr.4 to have a row with 'D' as first
>>> non-zero character.
>>>
>>> dd <- read.table(text="
>>> ch     count
>>> 1  000000000000D0000000000000000000000000000000000000 0.007368
>>> 2  000000000000d0000000000000000000000000000000000000 0.002456
>>> 3  000000000T00000000000000000000000000000000000000 0.007368
>>> 4  000000000DT0000000000000000000000000000000000000 0.007368
>>> 5  000000000T00000000000000000000000000000000000000 0.002456
>>> 6  000000000Td0000000000000000000000000000000000000 0.002456
>>> 7  00000000T000000000000000000000000000000000000000 0.007368
>>> 8  00000000T0D0000000000000000000000000000000000000 0.007368
>>> 9  00000000T000000000000000000000000000000000000000 0.002456
>>> 10 00000000T0d0000000000000000000000000000000000000 0.002456
>>> ", header=TRUE)
>>> dd
>>>
>>> i1 <- grepl("^([0D]|[0d])*$", dd$ch)
>>> i2 <- grepl("^0*[Dd]", dd$ch)
>>>
>>> dd[!i1, ]
>>> dd[!i2, ]
>>> dd[!(i1 | i2), ]
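
To see which rows survive the combined filter, here is a self-contained sketch. It is not the thread's exact data: the strings are shortened stand-ins with the same layout (row 4 has 'D' as its first non-zero character), the counts are copied from the example, and stringsAsFactors = FALSE is added here as a precaution. It uses 'i1new' rather than 'i1', following the later advice in the thread.

```r
# Shortened, hypothetical stand-ins for the 48-character strings in the thread
dd <- data.frame(
  ch = c("00D000", "00d000", "0T0000", "0DT000", "0T0000",
         "0Td000", "T00000", "T0D000", "T00000", "T0d000"),
  count = c(0.007368, 0.002456, 0.007368, 0.007368, 0.002456,
            0.002456, 0.007368, 0.007368, 0.002456, 0.002456),
  stringsAsFactors = FALSE
)

i1new <- grepl("^([0D]*$|[0d]*$)", dd$ch)  # only 0/D or only 0/d
i2    <- grepl("^0*[Dd]", dd$ch)           # first non-zero is D or d

# Drop a row if either condition holds
kept <- dd[!(i1new | i2), ]
print(kept)
# rows 3, 5, 6, 7, 8, 9 and 10 remain
```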
>>>
>>>
>>> Hope this helps,
>>>
>>> Rui Barradas
>>>
>>> On 02-07-2012 23:48, Claudia Penaloza wrote:
>>>
>>>> I would like to remove rows from the following data frame (df) if there
>>>> are only two specific elements found in the df$ch character string (I
>>>> want to remove rows with only "0" & "D" or "0" & "d"). Alternatively, I
>>>> would like to remove rows if the first non-zero element is "D" or "d".
>>>>
>>>>
>>>>                                                  ch     count
>>>> 1  000000000000D0000000000000000000000000000000000000 0.007368;
>>>> 2  000000000000d0000000000000000000000000000000000000 0.002456;
>>>> 3  000000000T00000000000000000000000000000000000000 0.007368;
>>>> 4  000000000TD0000000000000000000000000000000000000 0.007368;
>>>> 5  000000000T00000000000000000000000000000000000000 0.002456;
>>>> 6  000000000Td0000000000000000000000000000000000000 0.002456;
>>>> 7  00000000T000000000000000000000000000000000000000 0.007368;
>>>> 8  00000000T0D0000000000000000000000000000000000000 0.007368;
>>>> 9  00000000T000000000000000000000000000000000000000 0.002456;
>>>> 10 00000000T0d0000000000000000000000000000000000000 0.002456;
>>>>
>>>>
>>>> I tried the following but it doesn't work if there is more than one
>>>> character per string:
>>>>
>>>> df <- df[!df$ch %in% c("0","D"),]
>>>> df <- df[!df$ch %in% c("0","d"),]
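
The reason this attempt falls short is that %in% compares whole strings, not the characters inside them, so only a string that is exactly "0" or exactly "D" is matched. A minimal sketch with made-up strings (hypothetical values), set next to a regular expression that tests the characters instead:

```r
# Hypothetical strings: two single characters and two longer ones
ch <- c("0", "D", "00D00", "00T00")

# %in% tests whether each whole string equals "0" or "D"
whole <- ch %in% c("0", "D")

# A character class tests whether every character is '0' or 'D'
chars <- grepl("^[0D]*$", ch)

print(data.frame(ch, whole, chars))
# whole is TRUE, TRUE, FALSE, FALSE
# chars is TRUE, TRUE, TRUE, FALSE
```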
>>>>>
>>>>
>>>>
>>>> Any help greatly appreciated,
>>>> Claudia
>>>>
>>>>
>>>> ______________________________________________
>>>> [email protected] mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>

