On 02-03-2012, at 14:13, Roey Angel wrote:

> Hi Bernard, thanks for the quick reply.
> Of course, I understand that an escape is needed because parentheses are 
> reserved symbols in regular expressions.
> My problem is that if I just use \( I get the error:
> 
> Error: '\(' is an unrecognized escape in character string starting "\("
> 
> so in order to get a literal ( I need to use \\(
> which is odd because I've never encountered that in any other language, and 
> the R manuals don't mention it.
> 

It is not odd, as the previous poster has already mentioned.

I have encountered this in other languages (e.g. awk).

You need the \\ because the expression between the quotes is interpreted twice:
first as a character string (in which \( is illegal but \\ is legal and yields
a single \), and then as a regular expression, in which a literal ( or ) must
be escaped because parentheses are metacharacters.
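
The two stages can be seen directly in a short session. This is only a minimal
sketch using a few values from the sample data; the names pat, x and cleaned
are made up for the illustration.

```r
# The pattern as typed in source code: two backslashes before each parenthesis.
pat <- "\\(.*\\)"

# Stage 1: the R parser turns each "\\" into one backslash,
# so the string holds six characters: \ ( . * \ )
n_chars <- nchar(pat)
print(n_chars)
writeLines(pat)   # prints the text that the regex engine actually receives

# Stage 2: the regex engine reads \( and \) as literal parentheses.
x <- c("Bacteria(100)", "Cyanobacteria(84)", "unclassified")
cleaned <- gsub(pat, "", x)
print(cleaned)
```

writeLines() is handy here because print() would show the string with its
escapes again, which hides what the first stage did.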

If you don't like doing that (the \\), use this instead:

as.data.frame(apply(tax.data, 2, function(x) gsub('[(].*[)]','',x)))

i.e. put the ( and the ) each in a character class, where they are taken literally.
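
As a rough check that the two spellings are equivalent, the sketch below runs
both patterns over a small stand-in for tax.data. The data frame is typed in by
hand from the sample rows, and the helper strip() is a hypothetical name used
only here.

```r
# Small stand-in for tax.data, built by hand from the sample rows in the thread.
tax.data <- data.frame(
  V1 = c("G9SS7BA01D15EC", "G9SS7BA01C9UIR"),
  V2 = c("Bacteria(100)", "Bacteria(100)"),
  V3 = c("Cyanobacteria(84)", "Proteobacteria(94)"),
  V4 = c("unclassified", "Alphaproteobacteria(89)"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: apply one pattern to every column, as in the thread.
strip <- function(pattern) {
  as.data.frame(apply(tax.data, 2, function(x) gsub(pattern, "", x)),
                stringsAsFactors = FALSE)
}

by_class  <- strip("[(].*[)]")   # parentheses inside character classes
by_escape <- strip("\\(.*\\)")   # parentheses escaped with a doubled backslash

same <- identical(by_class, by_escape)
print(same)
print(by_class)
```

Both forms hand the regex engine a pattern that matches a literal ( and ); the
character-class form simply avoids backslashes, so there is nothing for the
string stage to interpret.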

Berend



>> On 02-03-2012, at 09:36, Roey Angel wrote:
>> 
>>> Hi,
>>> I was recently unfortunate enough to have to use regular expressions to 
>>> sort out some data in R.
>>> I'm working on a data file which contains taxonomical data of bacteria in 
>>> hierarchical order.
>>> A sample of this file can be generated using:
>>> 
>>> tax.data<- read.table(header=F, con<- textConnection('
>>> G9SS7BA01D15EC  Bacteria(100)    Cyanobacteria(84)    unclassified
>>> G9SS7BA01C9UIR    Bacteria(100)    Proteobacteria(94)    Alphaproteobacteria(89)
>>> G9SS7BA01CM00D    Bacteria(100)    Proteobacteria(99)    Alphaproteobacteria(99)
>>> '))
>>> close(con)
>>> 
>>> What I am trying to do is remove the parentheses and the number inside 
>>> (which could contain a decimal point).
>>> I assumed that the following command would solve it, but instead I got an 
>>> error.
>>> 
>>> tax.data<- as.data.frame(apply(tax.data, 2, function(x) 
>>> gsub('\(.*\)','',x)))
>>> Error: '\(' is an unrecognized escape in character string starting "\("
>>> 
>>> And it doesn't matter if I use perl = TRUE or not.
>>> To solve it I need to use a double escape sign '\\' before opening and 
>>> closing the parenthesis:
>>> 
>>> tax.data<- as.data.frame(apply(tax.data, 2, function(x) 
>>> gsub('\\(.*\\)','',x)))
>>> 
>>> This yields the desired result but I wonder why it does that?
>>> No other regular expression system I'm used to (e.g. Perl, Shell) works 
>>> like that.
>>> 
>>> I'm using R 2.14 (but also R 2.10) and I get the same results on Ubuntu and 
>>> win XP.
>>> 
>>> I'd appreciate any explanation.
>> Section "Character vectors" in the R Intro manual.
>> 
>> ?Quotes
>> 
>> The regular expression is provided as a string to gsub. In strings there are 
>> escape sequences.
>> To get a single \ through to the regular expression parser, it has to be 
>> escaped at the string stage: \\
>> 
>> Berend
>> 
>> 

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
