Ah, I think I'm beginning to see the light. Just to complete the final thought... the "\" is superfluous with the "_" character, so "\\_+" gets passed to regex as "\_+" and the "\" is ignored in the search; it also would be ignored in a replacement. However, as you remarked, "." and "\." act differently in a search but the same in a replacement. I hope I have that straight now. Thanks much!
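A minimal R sketch of the summary above, assuming R's default regex engine (no perl = TRUE or fixed = TRUE). The input string "a.b_c" and the variable names are made up for illustration. The single-backslash form "\." is left out because recent versions of R reject it when the string is parsed.

```r
# Layer 1: the R parser turns "\\" into a single backslash.
pattern <- "\\."
nchar(pattern)        # 2 characters: a backslash and a dot
cat(pattern, "\n")    # prints \.

# Layer 2: the regex engine sees \. and matches only a literal dot,
# while a bare . matches any single character.
x <- "a.b_c"                          # made-up input for illustration
literal_dot <- gsub("\\.", "-", x)    # only the dot is replaced
any_char    <- gsub(".", "-", x)      # every character is replaced

# In the replacement, "." and "\\." both produce a plain dot.
r1 <- gsub("_+", ".", "AAA_I")
r2 <- gsub("_+", "\\.", "AAA_I")

# The backslash before "_" is superfluous in the pattern.
r3 <- gsub("\\_+", ".", "AAA_I")

print(c(literal_dot, any_char, r1, r2, r3))
```

If the goal is only to swap one literal string for another, gsub("_", ".", x, fixed = TRUE) sidesteps regex escaping altogether, though it replaces each underscore separately instead of collapsing a run of them.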
-- TMK --
212-460-5430 home
917-656-5351 cell

>From: "Greg Snow" <[EMAIL PROTECTED]>
>To: "Talbot Katz" <[EMAIL PROTECTED]>,[EMAIL PROTECTED]
>CC: r-help@stat.math.ethz.ch
>Subject: RE: [R] gsub warning message
>Date: Fri, 31 Aug 2007 12:41:37 -0600
>
>What is happening is that before the regex engine can look at your
>pattern, the R string parsing routines first process your input as a
>string. In the string processing there are certain things represented
>using a backslash. Try this code in R:
>
> > cat('here\tthere\n')
>
>The \t is made into a tab and the \n is made into a newline. If you
>want the actual backslash you need \\:
>
> > cat('here\\tthere\n')
>
>So if you want the regex engine to see \. (which means a literal dot)
>then you need to say \\. so that the string processing sees \\ and
>converts it to \ to pass to the regex engine. If you say \. then it
>looks in its table where it knows what to do with \t, \n, and others,
>but \. is not there (it is meaningful to regexes but not to string
>processing), so it gives you the warning. For your example you are using
>it in the replacement portion where the \ in front of . does not do
>anything, which is why either works. If you are using it in the pattern
>to match, then \\. (which gets reduced to \.) matches a . (dot
>character) while . (without \) matches any single character (with some
>possible exceptions), so in some cases it may give different results.
>
>Hope this helps,
>
>
>
>--
>Gregory (Greg) L. Snow Ph.D.
>Statistical Data Center
>Intermountain Healthcare
>[EMAIL PROTECTED]
>(801) 408-8111
>
>
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Talbot Katz
> > Sent: Friday, August 31, 2007 12:30 PM
> > To: [EMAIL PROTECTED]
> > Cc: r-help@stat.math.ethz.ch
> > Subject: Re: [R] gsub warning message
> >
> > Thank you for the swift response.
> > It looks like the code
> > works the same way with or without the "\\" in either the
> > search string: { "\\_+" or "_+" } or the replacement string:
> > { "\\." or "." }. I tested this in Windows and Linux
> > (although we're still on R 2.4.1 in Linux). It's not clear
> > to me why I can use either two slashes or no slash safely,
> > but not one slash, and it makes me vaguely uneasy.
> > Obviously, I need to review regular expressions, but my usual
> > sources, such as http://perldoc.perl.org/perlre.html, don't
> > seem to address this issue. I wonder whether there's a good
> > document explaining this.
> >
> > -- TMK --
> > 212-460-5430 home
> > 917-656-5351 cell
> >
> >
> > >From: Uwe Ligges <[EMAIL PROTECTED]>
> > >To: Talbot Katz <[EMAIL PROTECTED]>
> > >CC: r-help@stat.math.ethz.ch
> > >Subject: Re: [R] gsub warning message
> > >Date: Fri, 31 Aug 2007 18:04:39 +0200
> > >
> > >
> > >
> > >Talbot Katz wrote:
> > >>Hi.
> > >>
> > >>I am using R 2.5.1 on a Windows XP machine. Here is an example of a
> > >>piece of code I was running in older versions of R on the same
> > >>machine. I am looking for underscores and replacing them with
> > >>periods. This result is from R 2.4.1:
> > >>
> > >>>gsub ( "\\_+","\.","AAA_I")
> > >>[1] "AAA.I"
> > >>
> > >>Here is what I get in R 2.5.1:
> > >>
> > >>>gsub ( "\\_+","\.","AAA_I")
> > >>[1] "AAA.I"
> > >>Warning messages:
> > >>1: '\.' is an unrecognized escape in a character string
> > >>2: unrecognized escape removed from "\."
> > >>
> > >>I still get the same result, which is what I want, but now I get a
> > >>warning message. Am I actually doing something wrong that the
> > >>previous versions of R didn't warn me about? Or is this warning
> > >>message unwarranted? Is there a fully approved method for
> > >>getting the same functionality? Thanks!
> > >
> > >Yes, correct usage is either
> > > gsub ( "\\_+", ".", "AAA_I")
> > >or
> > > gsub ( "\\_+", "\\.", "AAA_I")
> > >
> > >Uwe Ligges
> > >
> > >
> > >
> > >>-- TMK --
> > >>212-460-5430 home
> > >>917-656-5351 cell
> > >>
> > >>______________________________________________
> > >>R-help@stat.math.ethz.ch mailing list
> > >>https://stat.ethz.ch/mailman/listinfo/r-help
> > >>PLEASE do read the posting guide
> > >>http://www.R-project.org/posting-guide.html
> > >>and provide commented, minimal, self-contained, reproducible code.