[R] gsub warning message
Hi. I am using R 2.5.1 on a Windows XP machine. Here is an example of a piece of code I was running in older versions of R on the same machine. I am looking for underscores and replacing them with periods.

This result is from R 2.4.1:

    > gsub("\\_+", "\.", "AAA_I")
    [1] "AAA.I"

Here is what I get in R 2.5.1:

    > gsub("\\_+", "\.", "AAA_I")
    [1] "AAA.I"
    Warning messages:
    1: '\.' is an unrecognized escape in a character string
    2: unrecognized escape removed from "\."

I still get the same result, which is what I want, but now I get a warning message. Am I actually doing something wrong that the previous versions of R didn't warn me about? Or is this warning message unwarranted? Is there a fully approved method for getting the same functionality?

Thanks!

-- TMK --
212-460-5430 home
917-656-5351 cell

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] gsub warning message
Talbot Katz wrote: [...] Am I actually doing something wrong that the previous versions of R didn't warn me about? Or is this warning message unwarranted? Is there a fully approved method for getting the same functionality?

Yes, correct usage is either

    gsub("\\_+", ".", "AAA_I")

or

    gsub("\\_+", "\\.", "AAA_I")

Uwe Ligges
Re: [R] gsub warning message
Thank you for the swift response. It looks like the code works the same way with or without the \\ in either the search string ("\\_+" or "_+") or the replacement string ("\\." or "."). I tested this in Windows and Linux (although we're still on R 2.4.1 in Linux). It's not clear to me why I can use either two backslashes or no backslash safely, but not one backslash, and it makes me vaguely uneasy. Obviously, I need to review regular expressions, but my usual sources, such as http://perldoc.perl.org/perlre.html, don't seem to address this issue. I wonder whether there's a good document explaining this.

-- TMK
Re: [R] gsub warning message
What is happening is that before the regex engine can look at your pattern, the R string parsing routines first process your input as a string. In the string processing there are certain things represented using a backslash. Try this code in R:

    cat('here\tthere\n')

The \t is made into a tab and the \n is made into a newline. If you want the actual backslash you need \\:

    cat('here\\tthere\n')

So if you want the regex engine to see \. (which means a literal dot), then you need to write \\. so that the string processing sees \\ and converts it to \ before passing it to the regex engine. If you write \. then the string processing looks in its table, where it knows what to do with \t, \n, and others, but \. is not there (it is meaningful to regexes but not to string processing), so it gives you the warning.

In your example you are using it in the replacement portion, where the \ in front of . does not do anything, which is why either works. If you are using it in the pattern to match, then \\. (which gets reduced to \.) matches a . (dot character), while . (without \) matches any single character (with some possible exceptions), so in some cases it may give different results.

Hope this helps,

-- Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
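The two layers Greg describes (string parsing first, regex engine second) can be checked directly. The following is only a small sketch with made-up variable names; it avoids the single-backslash form "\." altogether because, as far as I know, current R releases reject that literal with an error instead of the warning seen in R 2.5.1.

```r
# Layer 1: the string literal. "\\." is stored as two characters: \ and .
two_chars <- nchar("\\.")

# Layer 2: the regex engine, pattern side.
# An unescaped dot matches any character; an escaped dot matches only a dot.
any_char <- gsub(".", "-", "a.b")
dot_only <- gsub("\\.", "-", "a.b")

# fixed = TRUE skips the regex layer, so no regex escaping is needed at all.
dot_fixed <- gsub(".", "-", "a.b", fixed = TRUE)

print(c(any_char, dot_only, dot_fixed))
```

If the pattern is meant literally, fixed = TRUE is often the least error-prone choice, since only the string-literal layer is left to think about.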
Re: [R] gsub warning message
Ah, I think I'm beginning to see the light. Just to complete the final thought: the \ is superfluous with the _ character, so "\\_+" gets passed to the regex engine as \_+ and the \ is ignored in the search; it also would be ignored in a replacement. However, as you remarked, . and \. act differently in a search but the same in a replacement. I hope I have that straight now. Thanks much!

-- TMK
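The replacement side of this summary can be sketched the same way. The variable names below are illustrative only; the point is that in a gsub replacement a backslash matters only in front of a digit (a back-reference), and that a literal backslash in the result needs four backslashes in the source code.

```r
x <- "AAA_I"

# In the replacement, "." and "\\." produce the same text.
plain_dot   <- gsub("_+", ".", x)
escaped_dot <- gsub("_+", "\\.", x)

# A backslash followed by a digit is a back-reference to a captured group.
swapped <- gsub("(A+)_(I)", "\\2_\\1", x)

# To get one literal backslash in the result, write four in the source:
# the parser halves them to two, and gsub treats that pair as one backslash.
with_backslash <- gsub("_+", "\\\\", x)

print(c(plain_dot, escaped_dot, swapped))
cat(with_backslash, "\n")
```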
[R] gsub: replacing a.*a if no occurence of b in .*
I am trying to read a number of XML files using xmlTreeParse(). Unfortunately, some of them are malformed in a way that makes R crash. The problem is that closing tags are sometimes repeated like this:

    <tag>value1</tag><tag>value2</tag>some garbage</tag></tag><tag>value3</tag>

I want to preprocess the contents of the XML file using gsub() before feeding them to xmlTreeParse() to clean them up, but I can't figure out how to do it. What I need is something that transforms the example above into:

    <tag>value1</tag><tag>value2</tag><tag>value3</tag>

Some kind of "</tag>.*</tag>" that only matches if there is no "<tag>" in ".*".

Thanks in advance for your ideas,
Uli
Re: [R] gsub: replacing a.*a if no occurence of b in .*
Ulrich Keller writes: [...] Some kind of "</tag>.*</tag>" that only matches if there is no "<tag>" in ".*".

Hmm, there are things you just cannot do with REs, and I suspect that this is one of them. Something involving explicit splitting of the strings might work, though. How's this for size?

    > trim <- function(x) paste(sub("</tag>.*", "</tag>", x), collapse = "<tag>")
    > sapply(strsplit(x, "<tag>"), trim)
    [1] "<tag>value1</tag><tag>value2</tag><tag>value3</tag>"

-- Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
Re: [R] gsub: replacing a.*a if no occurence of b in .*
On Sat, 2007-02-24 at 15:03 +0100, Peter Dalgaard wrote: [...] Something involving explicit splitting of the strings might work, though. [...]

Does this work?

    > XML
    [1] "<tag>value1</tag><tag>value2</tag>some garbage</tag></tag><tag>value3</tag>"
    > gsub("[^>]*(</tag>){2}", "", XML)
    [1] "<tag>value1</tag><tag>value2</tag><tag>value3</tag>"

This looks for any run of characters other than '>' that precedes a </tag></tag> sequence, and replaces that run together with the two closing tags with "".

Marc Schwartz
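For comparison, here is a sketch that runs both suggestions from this thread on the example input, assuming the example reads <tag>value1</tag> and so on, with angle brackets around each tag. The variable names are mine, and vapply is used in place of sapply only to pin down the return type.

```r
xml <- "<tag>value1</tag><tag>value2</tag>some garbage</tag></tag><tag>value3</tag>"

# Splitting approach: split on the opening tag, cut each piece off after its
# first closing tag, then glue the pieces back together with the opening tag.
trim <- function(x) paste(sub("</tag>.*", "</tag>", x), collapse = "<tag>")
by_split <- vapply(strsplit(xml, "<tag>", fixed = TRUE), trim, character(1))

# Single-regex approach: drop any run of non-'>' characters that is
# immediately followed by exactly two closing tags.
by_regex <- gsub("[^>]*(</tag>){2}", "", xml)

print(by_split)
print(by_regex)
```

The two agree on this input. They are not equivalent in general: the regex version only fires when the garbage is followed by two consecutive closing tags, while the splitting version discards whatever follows the first closing tag of each element.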
Re: [R] gsub: replacing a.*a if no occurence of b in .*
All these methods do assume that you don't have nested <tag>s, like so:

    <tag><tag>foo</tag>useful stuff</tag>some garbage</tag>

For that you would really need a true parser. So I would double-check to make sure this doesn't happen. Do you have any control over where those XML files are generated, though? It sounds to me as if it might be easier to fix the utility generating those XML files, since it clearly is doing something wrong.

On Feb 24, 2007, at 11:07 AM, Gabor Grothendieck wrote: I assume tag is known. This removes any occurrence of </tag>.*</tag> where .* does not contain <tag> or </tag>. The regular expression, re, matches </tag>, then does a non-greedy match (?U) for anything followed by </tag>, but uses a zero-width lookahead subexpression (?=...) for the second </tag> so that it can be rematched again. gsubfn in package gsubfn is like the usual gsub except that instead of replacing the match with a string it passes the match to function f and then replaces the match with the output of f. See the gsubfn home page, http://code.google.com/p/gsubfn/, and vignette.

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College
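The code that went with Gabor's description is not part of this message. As a rough substitute, the sketch below applies the same zero-width lookahead idea with base gsub and perl = TRUE. It is not the gsubfn code he posted, and the pattern and variable names are my own assumptions. It also illustrates Haris's caveat about nested tags.

```r
xml <- "<tag>value1</tag><tag>value2</tag>some garbage</tag></tag><tag>value3</tag>"

# Match a closing tag, then any run of characters that does not begin an
# opening or closing tag, but only if another closing tag follows.
# The lookahead (?=</tag>) does not consume that second closing tag,
# so it can start the next match.
re <- "</tag>(?:(?!</?tag>).)*(?=</tag>)"
cleaned <- gsub(re, "", xml, perl = TRUE)
print(cleaned)

# With nested tags the same pattern silently throws away real content.
nested <- "<tag><tag>foo</tag>useful stuff</tag>some garbage</tag>"
nested_out <- gsub(re, "", nested, perl = TRUE)
print(nested_out)
```

In the nested case "useful stuff" is removed along with the garbage, which is why a real parser (or fixing the tool that writes the files) is the safer route when nesting can occur.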
Re: [R] gsub: replacing a.*a if no occurence of b in .*
The _question_ assumed that, which is why the answers did too.

On 2/24/07, Charilaos Skiadas wrote: All these methods do assume that you don't have nested tags [...]
Re: [R] gsub: replacing a.*a if no occurence of b in .*
On Feb 24, 2007, at 11:37 AM, Gabor Grothendieck wrote: The _question_ assumed that, which is why the answers did too.

Oh yes, I totally agree, the file snippet the OP provided did indeed assume that, though nothing in the text of his question did, so I wasn't entirely clear whether the actual file that is going to be processed has this form or not. So I just wanted to make sure the OP is aware of this limitation, in case the actual file is more problematic. But most importantly, I wanted to suggest a reevaluation, if possible, of the process that generates these XML files, and perhaps fixing that, instead of patching the problem after it has been created.

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College
Re: [R] gsub: replacing a.*a if no occurence of b in .*
Charilaos Skiadas wrote: [...] But most importantly, I wanted to suggest a reevaluation, if possible, of the process that generates these XML files, and perhaps fixing that, instead of patching the problem after it has been created.

Also, I wouldn't tolerate R *crashing* in package code on malformed XML input.

Jeff
-- http://biostat.mc.vanderbilt.edu/JeffreyHorner
[R] gsub regexp question
Dear R Users,

I am trying to use gsub to remove multiple cases of square brackets and their different contents in a character string. A sample of such a string is shown below. However, I am having great difficulty understanding regexp syntax. Any help is greatly appreciated.

Ally

tree STATE_286000 [lnP=-12708.453945423369] = [R] ((15[rate=0.009761226401396686]:7.040851727747465,17[rate=0.011500289631135564]:7.040851727747465)[rate=0.010986570567484494]:2.257049446900292,(18[rate=0.009123432243563103]:2.461289418776003,19[rate=0.00981822432115329]:2.461289418776003)
Re: [R] gsub regexp question
On Jan 27, 2007, at 3:41 PM, Phillimore, Albert wrote: I am trying to use gsub to remove multiple cases of square brackets and their different contents in a character string. [...]

Is this what you want? I tend to prefer perl regular expressions:

    > str <- "tree STATE_286000 [lnP=-12708.453945423369] = [R] ((15[rate=0.009761226401396686]:7.040851727747465,17[rate=0.011500289631135564]:7.040851727747465)[rate=0.010986570567484494]:2.257049446900292,(18[rate=0.009123432243563103]:2.461289418776003,19[rate=0.00981822432115329]:2.461289418776003)"
    > gsub("\\[[^\\]]+\\]", "", str, perl=TRUE)
    [1] "tree STATE_286000  =  ((15:7.040851727747465,17:7.040851727747465):2.257049446900292,(18:2.461289418776003,19:2.461289418776003)"

As an explanation, \\[ and \\] match the two square brackets you want. We need to escape the brackets with the backslashes because they have a special meaning in perl regular expressions. In perl regexps, [] stands for "match a single character that is like what we have in the brackets". For instance, [ab] will match an a or a b, and [a-z] will match any lowercase letter. A ^ as the first character in there means "match anything but what follows"; for instance, [^a-z] means match anything but lowercase letters. So [^\\]] means match any character but a closing bracket. Finally, the plus sign afterwards means "match at least one". So [^\\]]+ means match any non-empty sequence of characters that does not contain a closing bracket. So the whole thing now matches an opening bracket, followed by all characters up to the corresponding closing bracket.

This will not work if you have nested pairs of brackets, [like [so]]. That is a tad more delicate, and we can discuss it if you really need to deal with it.

Haris
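A shortened version of the same idea, as a sketch: the input string below is a made-up, abbreviated stand-in for the tree string, and the second pattern is an alternative of mine for R's default (non-perl) engine, where a ] placed first inside a character class is taken literally.

```r
# Hypothetical, shortened input in the same shape as the tree string.
s <- "tree [lnP=-1.5] = ((15[rate=0.01]:7.04,17[rate=0.02]:7.04)[rate=0.03]:2.26)"

# Perl-style pattern from the message above.
perl_out <- gsub("\\[[^\\]]+\\]", "", s, perl = TRUE)

# Default-engine alternative: [^]] is "any character except ]".
posix_out <- gsub("\\[[^]]*\\]", "", s)

print(perl_out)

# The nested-bracket limitation: the match stops at the first closing bracket,
# leaving a stray "]" behind.
nested_out <- gsub("\\[[^\\]]+\\]", "", "a [like [so]] b", perl = TRUE)
print(nested_out)
```

Note that removing the bracketed parts leaves the surrounding spaces in place, so "tree [lnP=-1.5] =" becomes "tree", two spaces, "=".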
[R] gsub
R-help,

I want to remove the following strings "cpue" and "nogd":

    string <- c("upsanogd", "toskanogd", "hysunogd", "konganogd", "gullaksnogd", "longunogd", "blalongunogd", "brosmunogd")

I could use first:

    first <- gsub("cpue", "", string)

and then:

    second <- gsub("nogd", "", first)

Can it be done at once?

Thanks in advance

    > version
                   _
    platform       i386-pc-mingw32
    arch           i386
    os             mingw32
    system         i386, mingw32
    status
    major          2
    minor          4.0
    year           2006
    month          10
    day            03
    svn rev        39566
    language       R
    version.string R version 2.4.0 (2006-10-03)
Re: [R] gsub
On 11/15/2006 8:29 AM, Luis Ridao Cruz wrote: [...] Can it be done at once?

    gsub("cpue|nogd", "", string)

See ?regex for a description of the kinds of patterns R can use, in particular:

"Two regular expressions may be joined by the infix operator |; the resulting regular expression matches any string matching either subexpression. For example, abba|cde matches either the string abba or the string cde. Note that alternation does not work inside character classes, where | has its literal meaning."

Duncan Murdoch
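A small sketch of the alternation, using made-up input values (the "cpue" prefixes are my assumption, added so that both alternatives have something to remove). The last line shows the character-class pitfall mentioned in the ?regex excerpt.

```r
# Hypothetical input; only the "nogd" suffix appears in the original vector.
string <- c("cpueupsanogd", "cpuetoskanogd", "hysunogd")

# Two passes, as in the question.
two_steps <- gsub("nogd", "", gsub("cpue", "", string))

# One pass with alternation.
one_step <- gsub("cpue|nogd", "", string)
print(one_step)

# Inside [ ] the | is just another character: this removes every single
# c, p, u, e, |, n, o, g or d, not the two substrings.
class_out <- gsub("[cpue|nogd]", "", "cpue|nogd_x")
print(class_out)
```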
Re: [R] gsub
Is this what you want?

    gsub("cpue|nogd", "", string)

John
[R] gsub in data frame
Hello,

I have this data frame:

    ### begin
    d <- data.frame(matrix(c(1,"--","bla",2),2,2))
    d
    # I want to replace the -- by \N and still get a data frame.
    # I tried:
    out <- gsub("--","\\\\N",as.matrix(d)) # using as.matrix to get rid of factors
    out
    cat(out)
    # But I lost my data frame
    ### end

Any idea?

Regards,
Pierre Lapointe
Re: [R] gsub in data frame
Hi

On 5 Apr 2006 at 7:48, Lapointe, Pierre wrote: [...] # But I lost my data frame [...] Any idea?

Reformat it back?

    > data.frame(matrix(out,2,2))
       X1  X2
    1   1 bla
    2 \\N   2

HTH
Petr

Petr Pikal
Re: [R] gsub in data frame
On Wed, 5 Apr 2006, Lapointe, Pierre wrote: Hello, I have this data frame:

    d <- data.frame(matrix(c(1,"--","bla",2),2,2))

So d is a two-column data frame with factor columns.

Lapointe, Pierre wrote: # I want to replace the -- by \N and still get a data frame.

    levels(d$X1) <- gsub("--", "\\\\N", levels(d$X1))

-- Brian D. Ripley
Professor of Applied Statistics, University of Oxford
http://www.stats.ox.ac.uk/~ripley/
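The levels() approach can be extended to every column without leaving the data frame. This is only a sketch: replace_in_column is a hypothetical helper of mine, and stringsAsFactors = TRUE is passed explicitly because, as far as I know, R 4.0.0 and later no longer create factor columns by default.

```r
d <- data.frame(matrix(c(1, "--", "bla", 2), 2, 2), stringsAsFactors = TRUE)

# Hypothetical helper: apply gsub to factor levels or to character values,
# and leave any other column type untouched.
replace_in_column <- function(col, pattern, replacement) {
  if (is.factor(col)) {
    levels(col) <- gsub(pattern, replacement, levels(col))
    col
  } else if (is.character(col)) {
    gsub(pattern, replacement, col)
  } else {
    col
  }
}

# Assigning into d[] keeps d a data frame with the same column names.
# "\\\\N" in the source becomes a literal \N in the data.
d[] <- lapply(d, replace_in_column, pattern = "--", replacement = "\\\\N")
print(d)
```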
[R] gsub syntax
Hello

I know that R's string functions are not as extensive as those of Unix, but I need to do some text handling totally within an R environment because the target is a Windows system which will not have the corresponding shell utilities (sed, awk etc.).

Can anyone explain the following gsub phenomenon to me:

    > dates <- c("73","74","02","1973","1974","2002")

I want to take just the last two digits where it is a 4-digit year and both digits when it is a 2-digit year. I should be able to use substr, but measurement from the string end (with a negative counter or something) is not implemented:

    > substr(dates,3,4)
    [1] ""   ""   ""   "73" "74" "02"
    > substr(dates,-2,4)
    [1] "73"   "74"   "02"   "1973" "1974" "2002"
    > substr(dates,4,-2)
    [1] "" "" "" "" "" ""

So I tried gsub:

    > gsub("[19|20]([0-9][0-9])","\\1",dates)
    [1] "73"  "74"  "02"  "973" "974" "002"

As I understand it (and comparing with sed), the \\1 should take the first bracketed string, but clearly this doesn't work. If I try what should also work:

    > gsub("[19|20]([0-9])([0-9])","\\1\\2",dates)
    [1] "73"  "74"  "02"  "973" "974" "002"

On the other hand the following does work:

    > gsub("[19|20]([0-9])([0-9])","\\2",dates)
    [1] "73" "74" "02" "73" "74" "02"

So it appears that the substitution takes one character extra to the left, but the following indicates that the lower limit of the selected range is also at fault:

    > s <- c("1","12","123","1234","12345","123456")
    > gsub("[12]([4-6]*)","",s)
    [1] ""     ""     "3"    "34"   "345"  "3456"

Probably more elegant examples could be constructed that could home in on the issue. The version is R 2.0.1 on Linux, so perhaps it is a little old now.

Questions:

1) Am I misunderstanding the gsub use?
2) Was it a bug that has since been corrected?
3) Is it still a bug in the latest version?

TIA

John

John Logsdon
Quantex Research Ltd, Manchester UK
www.quantex-research.com
"Try to make things as simple as possible but not simpler"
Re: [R] gsub syntax
You could use something like:

    dates <- c("73", "74", "02", "1973", "1974", "2002")
    nd <- nchar(dates)
    substr(dates, ifelse(nd == 2, 1, 3), nd)

I hope it helps.

Best,
Dimitris

Dimitris Rizopoulos
Ph.D. Student, Biostatistical Centre, School of Public Health
Catholic University of Leuven, Belgium
Web: http://www.med.kuleuven.be/biostat/
Re: [R] gsub syntax
John Logsdon wrote:
> I want to take just the last two digits where it is a 4-digit year and both digits when it is a 2-digit year.
> [...]
> Questions: 1) Am I misunderstanding the gsub use? 2) Was it a bug that has since been corrected? 3) Is it still a bug in the latest version?

Hi, John,

I cannot comment on your questions since I'm no regexp guru. However, it seems to me you can do the following instead:

    gsub(".*([0-9][0-9])", "\\1", dates)

This works fine on Linux and Windows, R-2.2.0.

HTH,
--sundar
Re: [R] gsub syntax
On 11/27/05, John Logsdon wrote:
> I know that R's string functions are not as extensive as those of Unix but

I don't think this statement is true, although I have seen it repeated.

> I need to do some text handling totally within an R environment because the target is a Windows system which will not have the corresponding shell utilities, sed, awk etc.

Free versions of these utilities are available for Windows, although they don't come with Windows; e.g. Google for gawk.

> gsub("[19|20]([0-9][0-9])", "\\1", dates)
> [1] "73"  "74"  "02"  "973" "974" "002"
> [...]
> Questions: 1) Am I misunderstanding the gsub use? 2) Was it a bug that has since been corrected? 3) Is it still a bug in the latest version?

It works the same on my system, which is 2.2.0 Windows patched (2005-10-24). At first I too thought it was a bug, but I noticed it works the same in perl, so now I am not sure. The following perl program, using perl 5.8.6 on Windows, gives 002 as the answer too:

    $_ = "2002";
    s/[19|20]([0-9])([0-9])/\1\2/g;
    print;

In any case, it could be done like this:

    sub(".*(..)$", "\\1", dates)

or

    substring(dates, nchar(dates) - 1)

or the following, which appends -01-01 to the year, converts it to Date class, implicitly converts it back to character and then extracts the 3rd to 4th character of the result:

    substring(as.Date(sprintf("%s-01-01", dates)), 3, 4)

or
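
A minimal sketch of the "count from the end" behaviour the original poster wanted from substr, built on the substring/nchar idea above. The helper name str_right is hypothetical (it is not part of base R), and the sketch assumes a plain character vector as input.

```r
# Hypothetical helper (not in base R): the last n characters of each string.
str_right <- function(x, n) {
  len <- nchar(x)
  # pmax() keeps the start index at 1 for strings shorter than n
  substring(x, pmax(len - n + 1, 1), len)
}

dates <- c("73", "74", "02", "1973", "1974", "2002")
two_digit <- str_right(dates, 2)
print(two_digit)
```

Because no regular expression is involved, this variant sidesteps the character-class question entirely; it simply trusts that the last two characters are the ones wanted.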
Re: [R] gsub syntax
R is blameless here: it works as documented and in the same way as POSIX tools. It agrees with 'sed' using the same syntax (modulo the shell-specific quoting rules), e.g. in csh:

    % echo 1973 | sed 's/[19|20]\([0-9][0-9]\)/\1/g'
    973
    % echo 1973 | sed 's/\([19|20]\)\([0-9][0-9]\)/-\1-\2-/g'
    -1-97-3
    % echo 73 74 02 1973 1974 2002 | sed 's/[19|20]\([0-9][0-9]\)/\1/g'
    73 74 02 973 974 002

so what happened when you were 'comparing with sed'?

[19|20] is a character class (containing five characters) matching one character, not a match for two characters as you seem to imagine. It does not mean the same as 19|20, which is what you seem to have intended (and you seem only to want to do the substitution once on each string, so why use gsub?):

    sub("19|20([0-9][0-9])", "\\1", dates)
    [1] "73" "74" "02" "73" "74" "02"

A more direct way, which would work e.g. for 1837, would be

    sub(".*([0-9]{2}$)", "\\1", dates)

or even better (locale-independent)

    sub(".*([[:digit:]]{2}$)", "\\1", dates)

Current versions of R have a help page ?regexp explaining what regexps are. Even 2.0.1 did, although you were asked to update *before* posting (see the posting guide). It was unambiguous:

    A _character class_ is a list of characters enclosed by '[' and ']'
    matches any single character in that list ...
                ^^^^^^
    ...
    Note that alternation does not work inside character classes,
    where \code{|} has its literal meaning.

On Sun, 27 Nov 2005, John Logsdon wrote:
> I should be able to use substr but measurement from the string end (with a negative counter or something) is not implemented:

Why 'should' it work in a different way to that documented?

> So I tried gsub:
> gsub("[19|20]([0-9][0-9])", "\\1", dates)
> [1] "73"  "74"  "02"  "973" "974" "002"
> [...]
> Questions: 1) Am I misunderstanding the gsub use?

Yes.

> 2) Was it a bug that has since been corrected?

Unfortunately the bug reported two years ago in

    library(fortunes); fortune("WTFM")

still seems extant. See the posting guide for advice on how to correct it.

--
Brian D. Ripley
Professor of Applied Statistics, University of Oxford
http://www.stats.ox.ac.uk/~ripley/
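
The difference between a character class and an alternation can be sketched side by side. This is only an illustration using the dates vector from the thread; the anchored pattern with an explicit (19|20) group is one possible way to write the intended match, not the form used in the reply above.

```r
dates <- c("73", "74", "02", "1973", "1974", "2002")

# [19|20] is a character class: it matches ONE character out of 1 9 | 2 0,
# so only a single leading character is consumed before the captured digits.
class_result <- sub("[19|20]([0-9][0-9])", "\\1", dates)

# (19|20) is a grouped alternation: it matches the two-character prefix
# "19" or "20". Group 2 then captures the final two digits, and the anchors
# restrict the match to complete 4-digit years.
group_result <- sub("^(19|20)([0-9][0-9])$", "\\2", dates)

print(class_result)
print(group_result)
```

The explicit parentheses matter because alternation has the lowest precedence in a regular expression: without them, everything to the right of the | belongs to the second alternative only.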
[R] gsub pattern?
Hi,

Searching the web for regular expressions, I found information suggesting that the line below replaces all strings starting with AUTO, like AUTOBAHN or AUTORENNEN, with 1, but nothing happened. Using [] in the pattern it works as I expected, but I didn't want single-character replacement. Where is my mistake?

    bcode <- gsub("/^AUTO.*/", "1", MyStringVector, ignore.case=T, extended=T)

Many thanks and regards,
christian
Re: [R] gsub pattern?
Christian Schulz wrote:
> Where is my mistake?
> bcode <- gsub("/^AUTO.*/", "1", MyStringVector, ignore.case=T, extended=T)

You don't need the slashes! You've been looking at documentation for Perl regular expression replacements, I guess. help(gsub) may have shown you the way. Here's how to do it:

    MyStringVector = c("AUTOBAHN", "NAUTON", "FOO", "AUTOGRAPH")

    # wrong way:
    gsub("/^AUTO.*/", "1", MyStringVector, ignore.case=T, extended=T)
    [1] "AUTOBAHN"  "NAUTON"    "FOO"       "AUTOGRAPH"

    # don't slash all over the regexp:
    gsub("^AUTO.*", "1", MyStringVector, ignore.case=T, extended=T)
    [1] "1"      "NAUTON" "FOO"    "1"

Is that what you're after?

Baz
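
For readers on a current R, a hedged restatement of the same fix: as far as I know the extended argument has since been removed from sub/gsub, so this sketch leaves it out and spells out TRUE. The extra lower-case element is added here only to show ignore.case at work.

```r
x <- c("AUTOBAHN", "NAUTON", "FOO", "AUTOGRAPH", "autorennen")

# The pattern is an ordinary R string; no /.../ delimiters are needed.
recoded <- sub("^AUTO.*", "1", x, ignore.case = TRUE)

# With the slashes, the pattern demands a literal "/" before the
# start-of-string anchor, which can never match, so nothing changes.
unchanged <- sub("/^AUTO.*/", "1", x, ignore.case = TRUE)

print(recoded)
print(unchanged)
```

sub is enough here because the pattern already consumes the whole string from the anchor onwards, so there is at most one match per element.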
Re: [R] gsub pattern?
Christian Schulz writes:
> Where is my mistake?
> bcode <- gsub("/^AUTO.*/", "1", MyStringVector, ignore.case=T, extended=T)

What are the /-es for?

--
Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen, Denmark
Re: [R] gsub pattern?
Many thanks. The different styles between Linux tools and R are a little confusing for me :-(

christian

Peter Dalgaard wrote:
> What are the /-es for?
Re: [R] gsub() on Matrix
Many more recent regular expression implementations have ways of indicating a match on a word boundary. It's usually "\b".

Here's what you did:

    gsub("x1", "i1", "x1 + x2 + x10 + xx1")
    [1] "i1 + x2 + i10 + xi1"

The following worked for me to just change "x1" to "i1", while leaving alone any larger word that contains "x1":

    gsub("\\bx1\\b", "i1", "x1 + x2 + x10 + xx1")
    [1] "i1 + x2 + x10 + xx1"

Note that the backslash must itself be escaped to get past the R lexical analyser, which is independent of the regexp processor. What the regexp processor sees is just a single backslash.

For more on this, look for perl documentation of regular expressions. Be aware that to use full perl regexps, you must supply the perl=T argument to gsub(). Also note that "\b" seems to be part of the most basic regular expression language in R; it even works with extended=F:

    gsub("\\bx1\\b", "i1", "x1 + x2 + x10 + xx1", perl=T)
    [1] "i1 + x2 + x10 + xx1"
    gsub("\\bx1\\b", "i1", "x1 + x2 + x10 + xx1", perl=F)
    [1] "i1 + x2 + x10 + xx1"
    gsub("\\bx1\\b", "i1", "x1 + x2 + x10 + xx1", perl=F, ext=F)
    [1] "i1 + x2 + x10 + xx1"

(I assumed the fact that you have a matrix of strings is not relevant.)

Hope this helps,
Tony Plate

At Wednesday 09:07 PM 10/27/2004, Kevin Wang wrote:
> Hi,
>
> Suppose I've got a matrix, and the first few elements look like
>   "x1 + x3 + x4 + x5 + x1:x3 + x1:x4"
>   "x1 + x2 + x3 + x5 + x1:x2 + x1:x5"
>   "x1 + x3 + x4 + x5 + x1:x3 + x1:x5"
> and so on (I have terms from x1 to x14).
>
> I want to replace all the x1 with i7, all x2 with i14, all x3 with i13, for example. Is there an easy way?
>
> I tried to put what I want to replace in a vector, like:
>   repl = c("i7", "i14", "i13", "d2", "i8", "i5", "i6", "i3", "A", "i9", "i2", "i4", "i15", "i21")
> and have another vector, say:
>   orig
>    [1] "x1"  "x2"  "x3"  "x4"  "x5"  "x6"  "x7"  "x8"  "x9"  "x10"
>   [11] "x11" "x12" "x13" "x14"
> Then I tried something like
>   gsub(orig, repl, mat)  ## mat is the name of my matrix
> but it didn't work *_*. It would replace terms like x10 with i70.
>
> (I know it may be an easy question, but I haven't done much with regular expressions.)
>
> Cheers,
> Kevin
>
> Ko-Kang Kevin Wang
> PhD Student, Centre for Mathematics and its Applications
> Mathematical Sciences Institute (MSI), Australian National University
> Homepage: http://wwwmaths.anu.edu.au/~wangk/
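
Since gsub takes a single pattern rather than a vector of them, the word-boundary idea above can be applied once per name in a loop. The following is only a sketch: it uses the orig and repl vectors from the question, a two-element stand-in for the matrix mat, and assumes that no replacement name looks like one of the original names (otherwise later passes would rewrite earlier results).

```r
orig <- paste0("x", 1:14)
repl <- c("i7", "i14", "i13", "d2", "i8", "i5", "i6",
          "i3", "A", "i9", "i2", "i4", "i15", "i21")

# Small stand-in for the poster's matrix of formula strings
mat <- matrix(c("x1 + x3 + x4 + x5 + x1:x3 + x1:x4",
                "x1 + x2 + x3 + x5 + x1:x2 + x1:x5"), nrow = 2)

renamed <- mat
for (k in seq_along(orig)) {
  # \\b on both sides keeps "x1" from matching inside "x10" or "xx1"
  pattern <- paste0("\\b", orig[k], "\\b")
  renamed[] <- gsub(pattern, repl[k], renamed)  # [] preserves the dimensions
}
print(renamed)
```

The loop makes one pass per name, which is cheap for fourteen names and keeps each substitution easy to check in isolation.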
Re: [R] gsub() on Matrix
Tony Plate writes:
> Many more recent regular expression implementations have ways of indicating a match on a word boundary. It's usually "\b".

Another idea is that if what you need is something that is parseable as a model formula RHS, then you might want to parse first and substitute later. Something along these lines:

    e <- parse(text="x1 + x3 + x4 + x5 + x1:x3 + x1:x4")[[1]]
    repl = lapply(c("i7", "i14", "i13", "d2", "i8", "i5"), as.name)
    names(repl) <- paste("x", 1:6, sep="")
    eval(substitute(substitute(e, repl), list(e=e)))
    i7 + i13 + d2 + i8 + i7:i13 + i7:d2

> At Wednesday 09:07 PM 10/27/2004, Kevin Wang wrote:
> > Suppose I've got a matrix, and the first few elements look like
> >   "x1 + x3 + x4 + x5 + x1:x3 + x1:x4"
> >   "x1 + x2 + x3 + x5 + x1:x2 + x1:x5"
> >   "x1 + x3 + x4 + x5 + x1:x3 + x1:x5"
> > and so on (I have terms from x1 to x14). I want to replace all the x1 with i7, all x2 with i14, all x3 with i13, for example. Is there an easy way?
> > [...]
> > Then I tried something like
> >   gsub(orig, repl, mat)  ## mat is the name of my matrix
> > but it didn't work *_*. It would replace terms like x10 with i70.

--
Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen, Denmark
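
The parse-then-substitute idea can be wrapped so that it maps a string to a string. This is a sketch under assumptions: rename_terms and map are hypothetical names introduced here, do.call("substitute", ...) is used in place of the nested substitute/eval above, and the input is assumed to be short enough that deparse returns a single string.

```r
# Hypothetical wrapper: rename symbols in a formula-like string by
# working on the parsed expression instead of on the text.
rename_terms <- function(txt, map) {
  e <- parse(text = txt)[[1]]     # string -> unevaluated call
  env <- lapply(map, as.name)     # named list of replacement symbols
  # substitute() swaps every symbol in e that has an entry in env
  deparse(do.call("substitute", list(e, env)))
}

map <- c(x1 = "i7", x2 = "i14", x3 = "i13", x4 = "d2", x5 = "i8", x6 = "i5")
out <- rename_terms("x1 + x3 + x4 + x5 + x1:x3 + x1:x4", map)
print(out)
```

Working on symbols rather than characters means x1 can never be confused with x10, so no word-boundary pattern is needed.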
[R] gsub
Hi,

A while back I used gsub to do the following:

    temp <- "000US00231"
    gsub("something here", "", temp)
    "00231"

I think it involved the `meta characters' somehow. I do not know how to do this anymore. I know strsplit will also work, but I remember gsub was much faster. In essence the question is how to delete all characters before a particular pattern.

If anyone has some help file for this, it will be greatly appreciated.

Jean Eid
Re: [R] gsub
You might want to look at ?regex.

Sean
Re: [R] gsub
Jean Eid <jeaneid at chass.utoronto.ca> writes:

: A while back I used gsub to do the following
:
: temp <- "000US00231"
: gsub("something here", "", temp)
: "00231"
:
: I think it involved the `meta characters' somehow.
:
: I do not know how to do this anymore. I know strsplit will also work but I
: remember gsub was much faster. In essence the question is how to delete
: all characters before a particular pattern.
:
: If anyone has some help file for this, it will be greatly appreciated.

I think you want sub in this case, not gsub. There are many possibilities here depending on what the general case is. The following all give the desired result for the example, but their general cases differ. These are just some of the numerous variations possible.

    temp <- "000US00231"
    sub(".*US", "", temp)
    sub(".*S", "", temp)
    sub("[[:digit:]]*[[:alpha:]]*", "", temp)
    sub(".*[[:alpha:]]", "", temp)
    sub(".*[[:alpha:]][[:alpha:]]", "", temp)
    sub(".*[[:upper:]]", "", temp)
    sub(".*[[:upper:]][[:upper:]]", "", temp)
    sub(".....", "", temp)
    substring(temp, 6)
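
One way the "general cases differ" is in what happens when the marker occurs more than once. The sketch below contrasts the greedy pattern from the list above with a lazy variant; the second and third test strings are made up for illustration, and the lazy quantifier is written with perl = TRUE as an assumption about which regex flavour to rely on.

```r
x <- c("000US00231", "000US00US231", "no marker")

# Greedy: .* grabs as much as possible, so text is removed up to the LAST "US"
greedy <- sub(".*US", "", x)

# Lazy: .*? grabs as little as possible, so text is removed up to the FIRST "US"
lazy <- sub("^.*?US", "", x, perl = TRUE)

print(greedy)
print(lazy)
```

Strings without the marker pass through unchanged in both cases, which may or may not be what a real data-cleaning step wants.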
Re: [R] gsub
Thank you all for the help, especially Gabor; that is exactly what I needed. A few examples that do the same thing are very helpful in understanding the structure of the call.

Thank you again,
Jean
[R] gsub, backslash and xtable
R Version 1.9.1 (2004-06-21)
Mac OS X.3.5, Dual 2GHz PowerPC G5
GUI = "AQUA"

I have a data.frame comprising percentiles, with the column headings containing % characters, e.g.

    (pp <- colnames(temp2))
    [1] "5%"  "10%" "25%" "50%" "75%" "90%" "95%"

I use xtable to convert the data.frame to LaTeX, but I want to protect these % signs from LaTeX using a backslash in the normal way before calling xtable. I have tried using gsub as follows:

    gsub("\%", "\\%", pp)
    [1] "5%"  "10%" "25%" "50%" "75%" "90%" "95%"

also

    gsub("%", "\134%", pp)  # octal for backslash
    [1] "5%"  "10%" "25%" "50%" "75%" "90%" "95%"

Both of which fail to provide what I need. I verified my 'regexps' using awk under Darwin thus:

    $ cat fred
    5% 10% 25% 50% 75% 90% 95%
    $ awk '{gsub(/%/,"\\%"); print}' fred
    5\% 10\% 25\% 50\% 75\% 90\% 95\%

and

    $ awk '{gsub(/%/,"\134%"); print}' fred
    5\% 10\% 25\% 50\% 75\% 90\% 95\%

As a possible 'work around', I noticed that:

    chartr("z", "\", gsub("%", "z%", pp))
    Error: syntax error
    chartr("z", "\\", gsub("%", "z%", pp))
    [1] "5\\%"  "10\\%" "25\\%" "50\\%" "75\\%" "90\\%" "95\\%"
    chartr("z", "\134", gsub("%", "z%", pp))
    [1] "5\\%"  "10\\%" "25\\%" "50\\%" "75\\%" "90\\%" "95\\%"

As the xtable is then 'catted' to a file and read back (vide infra), I actually end up with what I want using the latter example. However, I am very much left with the feeling that R is in control of me rather than vice versa.

Secondly, as I am building up a character vector of sentences, tables and figures, I wanted to convert my xtable output to a character vector with newline separators. I have only been able to accomplish this by printing to a temporary file thus:

    theTx <- "\\documentclass[A4paper,10pt]{article}"
    .
    .
    theTx <- paste(theTx, paste_xtable(temp2, "Percentiles for scores"), sep = "")
    .
    theTx <- paste(theTx, "\n", "\\end{document}", "\n", sep = "")
    .
    cat(theTx)  # into a file for LaTeX

with my paste_xtable function being:

    paste_xtable <- function(a_table, cap) {
      sink(file = "levzz", append = FALSE, type = "output")
      print(xtable(a_table, caption = cap))
      sink()
      # read it back
      temp <- readLines("levzz", n = -1)  # note \ gets doubled automatically
      unlink("levzz")  # delete file
      a <- "\n"
      for (i in 1:length(temp)) {
        a <- paste(a, temp[i], "\n", sep = "")
      }
      return(a)
    }

I should be most grateful for more elegant solutions to both these issues, or a pointer to the documentation.

Paul
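
On the second question, one possible way to avoid the temporary file is capture.output from the utils package, which returns printed output as a character vector of lines. This is only a sketch: as_one_string is a hypothetical name, and a plain data frame stands in for the xtable object (the assumption being that print(xtable(...)) could be passed in the same way).

```r
# Hypothetical replacement for the sink()/readLines() round trip:
# capture whatever print(obj) writes and join the lines with newlines.
as_one_string <- function(obj) {
  lines <- utils::capture.output(print(obj))
  # Leading and trailing "\n" mirror the string built by the original loop
  paste0("\n", paste(lines, collapse = "\n"), "\n")
}

df <- data.frame(a = 1:2, b = c("x", "y"))  # stand-in for the xtable object
txt <- as_one_string(df)
cat(txt)
```

paste with collapse = "\n" also replaces the explicit for loop, so the whole helper reduces to two statements.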
Re: [R] gsub, backslash and xtable
On Fri, 27 Aug 2004, P. B. Pynsent wrote:
> I use xtable to convert the data.frame to LaTeX but I want to protect these % signs from LaTeX using a backslash in the normal way before calling xtable. I have tried using gsub as follows:
>
>   gsub("\%", "\\%", pp)
>   [1] "5%"  "10%" "25%" "50%" "75%" "90%" "95%"
> also
>   gsub("%", "\134%", pp)  # octal for backslash
>   [1] "5%"  "10%" "25%" "50%" "75%" "90%" "95%"
>
> Both of which fail to provide what I need.

Remember you need to double \ in R character strings (in the FAQ, for example, and in ?regex):

    gsub("%", "\\\\%", pp)
    [1] "5\\%"  "10\\%" "25\\%" "50\\%" "75\\%" "90\\%" "95\\%"
    cat(gsub("%", "\\\\%", pp), "\n")
    5\% 10\% 25\% 50\% 75\% 90\% 95\%

--
Brian D. Ripley
Professor of Applied Statistics, University of Oxford
http://www.stats.ox.ac.uk/~ripley/
Re: [R] gsub, backslash and xtable
P. B. Pynsent writes:
> I have a data.frame comprising percentiles with the column headings containing % characters, e.g.
>   (pp <- colnames(temp2))
>   [1] "5%"  "10%" "25%" "50%" "75%" "90%" "95%"
> I use xtable to convert the data.frame to LaTeX but I want to protect these % signs from LaTeX using a backslash in the normal way before calling xtable. I have tried using gsub as follows:
>   gsub("\%", "\\%", pp)
>   [1] "5%"  "10%" "25%" "50%" "75%" "90%" "95%"
> ...
> However I am very much left with the feeling that R is in control of me rather than vice versa.

The generic rule for backslashes is that you need twice as many as you thought:

    p
    [1] "25%"
    gsub("%", "\\\\%", p)
    [1] "25\\%"
    cat(gsub("%", "\\\\%", p), "\n")
    25\%

The thing that people usually forget is that you have two levels of escaping, one in R's string parser and another one in the regexp machinery.

--
Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen, Denmark
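
The two levels of escaping can be made visible by counting characters. The sketch below also shows fixed = TRUE as one way to skip the regexp level; that the replacement is taken literally in that mode is my understanding of the gsub documentation rather than something stated in this thread.

```r
pp <- c("5%", "10%", "25%")

# Level 1, the R parser: "\\" in source code is ONE backslash in the string
one <- nchar("\\")

# Level 2, the regexp replacement: it needs two backslashes to emit one,
# so the source code needs four.
a <- gsub("%", "\\\\%", pp)

# With fixed = TRUE the replacement is used literally, so only the
# parser-level doubling is required.
b <- gsub("%", "\\%", pp, fixed = TRUE)

print(a)       # print() shows each backslash escaped
writeLines(a)  # writeLines() shows the actual characters
```

Comparing print with writeLines (or cat) is the quickest check on how many backslashes a string really contains.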
Re: [R] gsub, backslash and xtable
Peter Dalgaard wrote:
> ``The generic rule for backslashes is that you need twice as many as you thought''

And you have to apply that rule recursively! :-)

cheers,
Rolf Turner