On 02-03-2012, at 14:13, Roey Angel wrote:

> Hi Bernard, thanks for the quick reply.
> Of course, I understand that an escape is needed because parentheses are
> reserved symbols in regular expressions.
> My problem is that if I just use \( I get the error:
>
> Error: '\(' is an unrecognized escape in character string starting "\("
>
> so in order to get a literal ( I need to use \\(
> which is odd cause I've never encountered that in any other language and also
> all the R manuals don't mention that.
>
It is not odd, as the previous poster has already mentioned. I have
encountered this elsewhere (e.g. awk).

You need the \\ because the expression between the quotes is interpreted
twice: first as a character string (in which \( is illegal but \\ is legal),
and then as a regular expression, in which you want to match a literal ( and ),
which must be escaped because they are metacharacters.

If you don't like doing that (the \\), use this instead:

    as.data.frame(apply(tax.data, 2, function(x) gsub('[(].*[)]','',x)))

i.e. put the ( and ) in a character class.

Berend

>> On 02-03-2012, at 09:36, Roey Angel wrote:
>>
>>> Hi,
>>> I was recently misfortunate enough to have to use regular expressions to
>>> sort out some data in R.
>>> I'm working on a data file which contains taxonomical data of bacteria in
>>> hierarchical order.
>>> A sample of this file can be generated using:
>>>
>>> tax.data<- read.table(header=F, con<- textConnection('
>>> G9SS7BA01D15EC Bacteria(100) Cyanobacteria(84) unclassified
>>> G9SS7BA01C9UIR Bacteria(100) Proteobacteria(94) Alphaproteobacteria(89)
>>> G9SS7BA01CM00D Bacteria(100) Proteobacteria(99) Alphaproteobacteria(99)
>>> '))
>>> close(con)
>>>
>>> What I try to do is to remove the parentheses and the number inside (which
>>> could contain a decimal point).
>>> I assumed that the following command would solve it, but instead I got an
>>> error.
>>>
>>> tax.data<- as.data.frame(apply(tax.data, 2, function(x)
>>> gsub('\(.*\)','',x)))
>>> Error: '\(' is an unrecognized escape in character string starting "\("
>>>
>>> And it doesn't matter if I use perl = TRUE or not.
>>> To solve it I need to use a double escape sign '\\' before opening and
>>> closing the parenthesis:
>>>
>>> tax.data<- as.data.frame(apply(tax.data, 2, function(x)
>>> gsub('\\(.*\\)','',x)))
>>>
>>> This yields the desired result but I wonder why it does that?
>>> No other regular expression system I'm used to (e.g. Perl, Shell) works
>>> like that.
>>>
>>> I'm using R 2.14 (but also R 2.10) and I get the same results on Ubuntu and
>>> win XP.
>>>
>>> I'd appreciate any explanation.
>>
>> Section "Character vectors" in the R Intro manual.
>>
>> ?Quotes
>>
>> The regular expression is provided as a string to gsub. In strings there are
>> escape sequences.
>> To get the \ as a single \ to the regular expression parser it has to be
>> \-ed in the string stage: \\
>>
>> Berend

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
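
The two interpretation stages described in the reply in this thread can be made visible with a few lines of R. This is only a minimal sketch: the vector `fields` and the other variable names are made up for the illustration and do not come from the original posts.

```r
# Hypothetical sample values in the style of the taxonomy file.
fields <- c("Bacteria(100)", "Proteobacteria(94.5)", "unclassified")

# Stage 1: the R parser reads the string literal. '\\' is a legal escape
# that yields ONE backslash, so the stored string has 6 characters.
pat <- '\\(.*\\)'
n_chars <- nchar(pat)
cat(pat, "\n")          # prints \(.*\)  (what the regex engine receives)

# Writing '\(' directly is rejected at this first stage, before gsub()
# is ever called. parse() is used here so the failure can be caught.
parse_fails <- inherits(try(parse(text = "'\\('"), silent = TRUE), "try-error")

# Stage 2: the regex engine sees \( and \) and matches literal parentheses.
cleaned <- gsub(pat, "", fields)

# Alternative without backslashes: a character class makes ( and ) literal.
cleaned_class <- gsub("[(].*[)]", "", fields)

print(cleaned)
print(cleaned_class)
```

Both gsub() calls should give the same result; the character-class form just avoids needing any escape at the string stage.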