[R] gsub regexp question

2007-01-27 Thread Phillimore, Albert
Dear R Users,
 
I am trying to use gsub to remove multiple cases of square brackets and their 
different contents in a character string. A sample of such a string is shown 
below. However, I am having great difficulty understanding regexp syntax. Any 
help is greatly appreciated.
 
Ally
 
tree STATE_286000 [lnP=-12708.453945423369] = [R] 
((15[rate=0.009761226401396686]:7.040851727747465,17[rate=0.011500289631135564]:7.040851727747465)[rate=0.010986570567484494]:2.257049446900292,(18[rate=0.009123432243563103]:2.461289418776003,19[rate=0.00981822432115329]:2.461289418776003)


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] gsub regexp question

2007-01-27 Thread Charilaos Skiadas
On Jan 27, 2007, at 3:41 PM, Phillimore, Albert wrote:

> Dear R Users,
>
> I am trying to use gsub to remove multiple cases of square
> brackets and their different contents in a character string. A
> sample of such a string is shown below. However, I am having great
> difficulty understanding regexp syntax. Any help is greatly
> appreciated.
>
> Ally
>
> tree STATE_286000 [lnP=-12708.453945423369] = [R] ((15
> [rate=0.009761226401396686]:7.040851727747465,17
> [rate=0.011500289631135564]:7.040851727747465)
> [rate=0.010986570567484494]:2.257049446900292,(18
> [rate=0.009123432243563103]:2.461289418776003,19
> [rate=0.00981822432115329]:2.461289418776003)

Is this what you want? I tend to prefer Perl regular expressions:

  str <- "tree STATE_286000 [lnP=-12708.453945423369] = [R] ((15[rate=0.009761226401396686]:7.040851727747465,17[rate=0.011500289631135564]:7.040851727747465)[rate=0.010986570567484494]:2.257049446900292,(18[rate=0.009123432243563103]:2.461289418776003,19[rate=0.00981822432115329]:2.461289418776003)"
  gsub("\\[[^\\]]+\\]", "", str, perl=TRUE)
[1] "tree STATE_286000  =  ((15:7.040851727747465,17:7.040851727747465):2.257049446900292,(18:2.461289418776003,19:2.461289418776003)"


As an explanation, \\[ and \\] match the two literal square brackets
you want. The brackets must be escaped with a backslash because they
have a special meaning in Perl regular expressions, and each backslash
is doubled because the pattern is written inside an R string.
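The difference between what is typed in R and what the regexp engine receives can be checked directly. This is only a small sketch; the variable name pat is made up for the illustration.

```r
# The pattern as typed in R source: every backslash is doubled.
pat <- "\\[[^\\]]+\\]"

# cat() shows the string as the regexp engine receives it,
# with single backslashes.
cat(pat, "\n")

# The string really holds 10 characters, not 13.
n_chars <- nchar(pat)

# An escaped bracket matches a literal "[" in the input.
has_bracket <- grepl("\\[", "15[rate=0.5]", perl = TRUE)
```

Here n_chars should be 10 and has_bracket should be TRUE.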

In Perl regexps, [...] stands for "match a single character that is
one of the characters listed between the brackets". For instance, [ab]
will match an a or a b, and [a-z] will match any lowercase letter. A ^
as the first character in there means "match anything but what
follows": for instance, [^a-z] matches any character that is not a
lowercase letter. So [^\\]] means "match any character but a closing
bracket".
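A quick way to get a feel for these character classes is to delete whatever each one matches from a short test string. The string x below is invented for the illustration.

```r
x <- "abc-CD-12]"

# [ab]: delete every a or b
only_ab_removed <- gsub("[ab]", "", x, perl = TRUE)

# [a-z]: delete every lowercase letter
lower_removed <- gsub("[a-z]", "", x, perl = TRUE)

# [^a-z]: delete everything that is NOT a lowercase letter
lower_kept <- gsub("[^a-z]", "", x, perl = TRUE)

# [^\]]: delete everything that is NOT a closing bracket
bracket_kept <- gsub("[^\\]]", "", x, perl = TRUE)
```

The four results should be "c-CD-12]", "-CD-12]", "abc" and "]" respectively.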

Finally, the plus sign afterwards means "match at least one of the
preceding item". So [^\\]]+ matches any non-empty sequence of
characters that does not contain a closing bracket. The whole thing
now matches an opening bracket, followed by all characters up to the
next closing bracket, followed by that closing bracket. This will not
work if you have nested pairs of brackets, [like [so]]. That is a tad
more delicate, and we can discuss it if you really need to deal with
it.
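For what it is worth, here is a sketch of how the nested case goes wrong, together with one possible workaround. The workaround is only an assumption about what might suit such data, not something worked out in this thread: it repeatedly deletes the innermost bracket pairs until nothing changes. The function name strip_brackets and the test string are made up.

```r
pat <- "\\[[^\\]]+\\]"
nested <- "x[like [so]]y"

# The simple pattern stops at the first "]" and leaves a stray one behind.
flat_result <- gsub(pat, "", nested, perl = TRUE)

# Hypothetical helper: remove bracket pairs that contain no other
# brackets, and repeat until the string stops changing.
strip_brackets <- function(x) {
  repeat {
    y <- gsub("\\[[^\\[\\]]*\\]", "", x, perl = TRUE)
    if (identical(y, x)) return(y)
    x <- y
  }
}

nested_result <- strip_brackets(nested)
```

Here flat_result should be "x]y" and nested_result should be "xy". Note that the helper uses * rather than +, so an empty pair [] is removed as well.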

Haris
