Thanks so much, and thanks for the clarification. "New York" ---> "New" should not match "Other New" because "New" is not the first string up to the whitespace in "Other New".
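For anyone reading this later, here is a rough vectorised sketch of the rule as I understand it from this thread: keep rows i and i + 1 when their first whitespace-delimited strings are identical and their IDs differ. The helper names first_word() and adjacent_matches() are made up for illustration, and the data frame just retypes the example data from the original post, so treat it as a starting point rather than a tested solution for the real file.

```r
# Hypothetical helper: first whitespace-delimited string of each name.
first_word <- function(x) {
  sub("^[[:space:]]*([^[:space:]]+).*$", "\\1", x)
}

# Example data retyped from the original post.
dat <- data.frame(
  ID     = 1:5,
  NAME   = c("New York Mets", "New York Yankees", "Boston Redsox",
             "Washington Nationals", "Detroit Tigers"),
  YEAR   = c(1900, 1920, 1918, 2010, 1990),
  SOURCE = c("ESPN", "Cooperstown", "ESPN", "ESPN", "ESPN"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return every pair of adjacent rows whose first
# strings are identical and whose IDs differ.
adjacent_matches <- function(x) {
  n <- nrow(x)
  if (n < 2) return(x[0, , drop = FALSE])
  fw  <- first_word(x$NAME)
  # Element i compares row i with row i + 1.
  hit <- fw[-n] == fw[-1] & x$ID[-n] != x$ID[-1]
  keep <- sort(unique(c(which(hit), which(hit) + 1)))
  x[keep, , drop = FALSE]
}

res <- adjacent_matches(dat)
print(res)
# write.csv(res, "output.csv", row.names = FALSE)  # file name taken from the post
```

Because the comparison is an exact `==` on the first strings, "New York" and "Other New" do not match, and neither would "New York" and "Newark".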
Thanks so much, testing it on my data now.

On Fri, Aug 10, 2012 at 2:35 PM, Rui Barradas <ruipbarra...@sapo.pt> wrote:

> Hello,
>
> My code doesn't account for a point you've made clear in this post. Inline.
> On 10-08-2012 19:05, Fred G wrote:
>
>> Thanks Arun. The only issue is that I need the code to be very
>> generalizable, such that the grep() really has to be if the first string
>> up to the whitespace in a row (ie "New", "Boston", "Washington", "Detroit"
>> below) is the same as the first string up to the whitespace in the row
>> directly below it
>
> Does this mean that "New York" ---> "New" in one row shouldn't match
> "Other New" in the next row because "New" is not the first string up to the
> whitespace? If this is the case, modify my earlier code to
>
> fun <- function(i, x){
>     if(x[i, "ID"] != x[i + 1, "ID"]){
>         s1 <- unlist(strsplit(x[i, "NAME"], "[[:space:]]"))[1]      # keep first string
>         s2 <- unlist(strsplit(x[i + 1, "NAME"], "[[:space:]]"))[1]  # keep first string
>         # exact comparison; grepl(s1, s2) would also accept "New" inside "Newark"
>         if(s1 == s2) return(TRUE)
>     }
>     FALSE
> }
>
> If it isn't the case, do nothing.
>
> Rui Barradas
>
>> , AND the ID's are different, then copy. The actual file
>> has thousands of different IDs and names...
>>
>> On Fri, Aug 10, 2012 at 2:01 PM, arun <smartpink...@yahoo.com> wrote:
>>
>>> Hi,
>>>
>>> Try this:
>>> dat1<-read.table(text="
>>> ID, NAME, YEAR, SOURCE
>>> 1, New York Mets, 1900, ESPN
>>> 2, New York Yankees, 1920, Cooperstown
>>> 3, Boston Redsox, 1918, ESPN
>>> 4, Washington Nationals, 2010, ESPN
>>> 5, Detroit Tigers, 1990, ESPN
>>> ",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>>
>>> index<-grep("New York.*",dat1$NAME)
>>> dat1[index,]
>>> #  ID             NAME YEAR      SOURCE
>>> #1  1    New York Mets 1900        ESPN
>>> #2  2 New York Yankees 1920 Cooperstown
>>>
>>> A.K.
>>>
>>> ----- Original Message -----
>>> From: Fred G <bayespoker...@gmail.com>
>>> To: r-help@r-project.org
>>> Cc:
>>> Sent: Friday, August 10, 2012 1:41 PM
>>> Subject: [R] Regular Expressions + Matrices
>>>
>>> Hi all,
>>>
>>> My code looks like the following:
>>> inname = read.csv("ID_error_checker.csv", as.is=TRUE)
>>> outname = read.csv("output.csv", as.is=TRUE)
>>>
>>> #My algorithm is the following:
>>> #for line in inname
>>> #if first string up to whitespace in row in inname$name = first string up
>>> #to whitespace in row + 1 in inname$name
>>> #AND ID in inname$ID for the top row NOT EQUAL ID in inname$ID for the row
>>> #below it
>>> #copy these two lines to a new file
>>>
>>> In other words, if the name (up to the first whitespace) in the first row
>>> equals the name in the second row (etc for whole file) and the ID in the
>>> first row does not equal the ID in the second row, copy both of these rows
>>> in full to a new file. Only caveat is that I want a regular expression not
>>> to take the full names, but just the first string up to the first
>>> whitespace in the inname$name column (ie if row1 has a name of: New York
>>> Mets and row2 has a name of New York Yankees, I would want both of these
>>> rows to be copied in full since "New" is the same in both...)
>>>
>>> Here is some example data:
>>> ID  NAME                  YEAR  SOURCE       NOTES
>>> 1   New York Mets         1900  ESPN
>>> 2   New York Yankees      1920  Cooperstown
>>> 3   Boston Redsox         1918  ESPN
>>> 4   Washington Nationals  2010  ESPN
>>> 5   Detroit Tigers        1990  ESPN
>>>
>>> The desired output would be:
>>> ID  NAME                  YEAR  SOURCE
>>> 1   New York Mets         1900  ESPN
>>> 2   New York Yankees      1920  Cooperstown
>>>
>>> Thanks so much!

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.