Re: [R] Regular Expressions + Matrices

Fred G Fri, 10 Aug 2012 11:41:17 -0700

Thanks so much, and thanks for the clarification. "New York" ---> "New"
should not match "Other New" because "New" is not the first.


Thanks so much, testing it on my data now.
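
For reference, the rule as clarified above (first string up to the whitespace must be equal in two adjacent rows, and the IDs must differ) can be sketched in vectorized form. This is only a minimal sketch under stated assumptions, not the code posted in the thread: the helper names first_token() and adjacent_matches(), the extra "Other New Team" row, and the output file name in the final comment are all hypothetical and are used purely for illustration.

```r
# Hypothetical helper (not from the thread): return the first
# whitespace-delimited string of each name, ignoring leading blanks.
first_token <- function(x) sub("[[:space:]].*$", "", trimws(x))

# Hypothetical helper: keep both rows of every adjacent pair whose first
# tokens are equal while their IDs differ.
adjacent_matches <- function(df) {
  n <- nrow(df)
  if (n < 2) return(df[0, , drop = FALSE])
  tok <- first_token(df$NAME)
  top <- seq_len(n - 1)                      # index of the upper row of each pair
  hit <- tok[top] == tok[top + 1] & df$ID[top] != df$ID[top + 1]
  hit[is.na(hit)] <- FALSE                   # treat missing values as "no match"
  keep <- sort(unique(c(top[hit], top[hit] + 1)))
  df[keep, , drop = FALSE]
}

# Example data from the thread plus one made-up row ("Other New Team")
# to show that "New" in a non-leading position does not count.
dat <- data.frame(
  ID = 1:6,
  NAME = c("New York Mets", "New York Yankees", "Other New Team",
           "Boston Redsox", "Washington Nationals", "Detroit Tigers"),
  stringsAsFactors = FALSE
)

res <- adjacent_matches(dat)
print(res)

# Writing the result to a new file (file name is an assumption):
# write.csv(res, "matches.csv", row.names = FALSE)
```

Comparing the extracted tokens with == rather than passing them to grepl() avoids treating the first token as a regular expression, so a name such as "Newark" would not be counted as a match for "New".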

On Fri, Aug 10, 2012 at 2:35 PM, Rui Barradas <ruipbarra...@sapo.pt> wrote:

> Hello,
>
> My code didn't anticipate a point you've made clear in this post. See inline.
> On 10-08-2012 19:05, Fred G wrote:
>
>> Thanks Arun. The only issue is that I need the code to be very
>> generalizable, such that the grep() really has to be if the first string
>> up to the whitespace in a row (ie "New", "Boston", "Washington", "Detroit"
>> below) is the same as the first string up to the whitespace in the row
>> directly below it
>>
>
> Does this mean that "New York" ---> "New" in one row shouldn't match
> "Other New" in the next row because "New" is not the first string up to the
> whitespace? If this is the case, modify my earlier code to
>
>
>
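> fun <- function(i, x){
>     # only compare names when adjacent rows carry different IDs
>     if(x[i, "ID"] != x[i + 1, "ID"]){
>         # keep the first string up to the whitespace in each NAME
>         s1 <- unlist(strsplit(x[i, "NAME"], "[[:space:]]"))[1]
>         s2 <- unlist(strsplit(x[i + 1, "NAME"], "[[:space:]]"))[1]
>         # exact comparison; grepl(s1, s2) would also accept partial matches
>         if(isTRUE(s1 == s2)) return(TRUE)
>     }
>     FALSE
> }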
>
> If it isn't the case, do nothing.
>
> Rui Barradas
>
>
>> , AND the IDs are different, then copy.  The actual file
>> has thousands of different IDs and names...
>>
>> On Fri, Aug 10, 2012 at 2:01 PM, arun <smartpink...@yahoo.com> wrote:
>>
>>
>>> Hi,
>>>
>>> Try this:
>>> dat1<-read.table(text="
>>> ID,    NAME,    YEAR,    SOURCE
>>> 1,    New York Mets,    1900,    ESPN
>>> 2,    New York Yankees,    1920,    Cooperstown
>>> 3,    Boston Redsox,    1918,    ESPN
>>> 4,    Washington Nationals,    2010,    ESPN
>>> 5,    Detroit Tigers,    1990,    ESPN
>>> ",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>>
>>>   index<-grep("New York.*",dat1$NAME)
>>> dat1[index,]
>>> #  ID             NAME YEAR      SOURCE
>>> #1  1    New York Mets 1900        ESPN
>>> #2  2 New York Yankees 1920 Cooperstown
>>>
>>> A.K.
>>>
>>>
>>>
>>> ----- Original Message -----
>>> From: Fred G <bayespoker...@gmail.com>
>>> To: r-help@r-project.org
>>> Cc:
>>> Sent: Friday, August 10, 2012 1:41 PM
>>> Subject: [R] Regular Expressions + Matrices
>>>
>>> Hi all,
>>>
>>> My code looks like the following:
>>> inname = read.csv("ID_error_checker.csv", as.is=TRUE)
>>> outname = read.csv("output.csv", as.is=TRUE)
>>>
>>> #My algorithm is the following:
>>> #for line in inname
>>> #if first string up to whitespace in row in inname$name = first string up to whitespace in row + 1 in inname$name
>>> #AND ID in inname$ID for the top row NOT EQUAL ID in inname$ID for the row below it
>>> #copy these two lines to a new file
>>>
>>> In other words, if the name (up to the first whitespace) in the first row
>>> equals the name in the second row (etc. for the whole file) and the ID in the
>>> first row does not equal the ID in the second row, copy both of these rows
>>> in full to a new file.  The only caveat is that I want a regular expression
>>> not to take the full names, but just the first string up to the first
>>> whitespace in the inname$name column (i.e. if row1 has a name of New York
>>> Mets and row2 has a name of New York Yankees, I would want both of these
>>> rows to be copied in full since "New" is the same in both...)
>>>
>>> Here is some example data:
>>> ID NAME                          YEAR     SOURCE     NOTES
>>> 1  New York Mets               1900      ESPN
>>> 2  New York Yankees          1920     Cooperstown
>>> 3  Boston Redsox               1918      ESPN
>>> 4  Washington Nationals      2010     ESPN
>>> 5  Detroit Tigers                  1990      ESPN
>>>
>>> The desired output would be:
>>> ID   NAME                    YEAR SOURCE
>>> 1    New York Mets        1900   ESPN
>>> 2    New York Yankees   1920   Cooperstown
>>>
>>> Thanks so much!
>>>
>>>      [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
>
