[R] gsub warning message

2007-08-31 Thread Talbot Katz
Hi.

I am using R 2.5.1 on a Windows XP machine.  Here is an example of a piece 
of code I was running in older versions of R on the same machine.  I am 
looking for underscores and replacing them with periods.  This result is 
from R 2.4.1:

> gsub("\\_+", "\.", "AAA_I")
[1] "AAA.I"


Here is what I get in R 2.5.1:

> gsub("\\_+", "\.", "AAA_I")
[1] "AAA.I"
Warning messages:
1: '\.' is an unrecognized escape in a character string
2: unrecognized escape removed from "\."


I still get the same result, which is what I want, but now I get a warning 
message.  Am I actually doing something wrong that the previous versions of 
R didn't warn me about?  Or is this warning message unwarranted?  Is there a 
fully approved method for getting the same functionality?  Thanks!

--  TMK  --
212-460-5430home
917-656-5351cell

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] gsub warning message

2007-08-31 Thread Uwe Ligges


Yes, correct usage is either
   gsub("\\_+", ".", "AAA_I")
or
   gsub("\\_+", "\\.", "AAA_I")

Uwe Ligges





Re: [R] gsub warning message

2007-08-31 Thread Talbot Katz
Thank you for the swift response.  It looks like the code works the same way 
with or without the \\ in either the search string ("\\_+" or "_+") or 
the replacement string ("\\." or ".").  I tested this in Windows and 
Linux (although we're still on R 2.4.1 in Linux).  It's not clear to me why 
I can use either two backslashes or no backslash safely, but not one 
backslash, and it makes me vaguely uneasy.  Obviously, I need to review 
regular expressions, but my usual sources, such as 
http://perldoc.perl.org/perlre.html, don't seem to address this issue.  I 
wonder whether there's a good document explaining this.

--  TMK  --
212-460-5430home
917-656-5351cell




Re: [R] gsub warning message

2007-08-31 Thread Greg Snow
What is happening is that before the regex engine can look at your
pattern, the R string parsing routines first process your input as a
string.  In the string processing there are certain things represented
using a backslash.  Try this code in R:

 cat('here\tthere\n')

The \t is made into a tab and the \n is made into a newline.  If you
want the actual backslash you need \\:

 cat('here\\tthere\n')

So if you want the regex engine to see \. (which means a literal dot)
then you need to say \\. so that the string processing sees \\ and
converts it to \ to pass to the regex engine.  If you say \. then it
looks in its table where it knows what to do with \t, \n, and others,
but \. is not there (it is meaningful to regexes but not to string
processing), so it gives you the warning.  For your example you are using
it in the replacement portion, where the \ in front of . does not do
anything, which is why either works.  If you are using it in the pattern
to match, then \\. (which gets reduced to \.) matches a . (dot
character) while . (without \) matches any single character (with some
possible exceptions), so in some cases it may give different results.
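
The two layers described above can be checked directly in R. The following is only a small sketch, not part of the original post; the input strings "a.b_c" and "AAA_I" and the variable names are chosen purely for illustration:

```r
# Layer 1: the R parser. "\\." in source code is a two-character string: \ and .
two_chars <- "\\."
nchar(two_chars)        # number of characters the regex engine will receive
writeLines(two_chars)   # shows the string as the regex engine sees it

# Layer 2: the regex engine. In a pattern, \. is a literal dot, . is a wildcard.
literal_dot <- gsub("\\.", "-", "a.b_c")
any_char    <- gsub(".",   "-", "a.b_c")

# In a replacement, the backslash in front of the dot changes nothing.
plain_repl   <- gsub("_+", ".",   "AAA_I")
escaped_repl <- gsub("_+", "\\.", "AAA_I")
identical(plain_repl, escaped_repl)
```

Only the pattern argument distinguishes between the escaped and the unescaped dot; in the replacement both forms give the same string.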

Hope this helps,



-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 



Re: [R] gsub warning message

2007-08-31 Thread Talbot Katz
Ah, I think I'm beginning to see the light.  Just to complete the final 
thought... the \ is superfluous with the _ character, so \\_+ gets 
passed to regex as \_+ and the \ is ignored in the search; it also would 
be ignored in a replacement.  However, as you remarked, . and \. act 
differently in a search but the same in a replacement.  I hope I have that 
straight now.  Thanks much!

--  TMK  --
212-460-5430home
917-656-5351cell





[R] gsub: replacing a.*a if no occurence of b in .*

2007-02-24 Thread Ulrich Keller
I am trying to read a number of XML files using xmlTreeParse(). Unfortunately,
some of them are malformed in a way that makes R crash. The problem is that
closing tags are sometimes repeated like this:

<tag>value1</tag><tag>value2</tag>some garbage</tag></tag><tag>value3</tag>

I want to preprocess the contents of the XML file using gsub() before feeding
them to xmlTreeParse() to clean them up, but I can't figure out how to do it.
What I need is something that transforms the example above into:

<tag>value1</tag><tag>value2</tag><tag>value3</tag>

Some kind of "</tag>.*</tag>" that only matches if there is no "<tag>" in ".*".

Thanks in advance for your ideas,

Uli



Re: [R] gsub: replacing a.*a if no occurence of b in .*

2007-02-24 Thread Peter Dalgaard

Hmm, there are things you just cannot do with RE's, and I suspect that
this is one of them. Something involving explicit splitting of the
strings might work, though. How's this for size?

> trim <-
function(x) paste(sub("</tag>.*", "</tag>", x), collapse = "<tag>")
> sapply(strsplit(x, "<tag>"), trim)
[1] "<tag>value1</tag><tag>value2</tag><tag>value3</tag>"


-- 
   O__   Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [R] gsub: replacing a.*a if no occurence of b in .*

2007-02-24 Thread Marc Schwartz
Does this work?

> XML
[1] "<tag>value1</tag><tag>value2</tag>some garbage</tag></tag><tag>value3</tag>"


> gsub("[^>]*(</tag>){2}", "", XML)
[1] "<tag>value1</tag><tag>value2</tag><tag>value3</tag>"


This looks for any characters != '>' that precede a "</tag></tag>"
sequence. It replaces that with "".

?

Marc Schwartz



Re: [R] gsub: replacing a.*a if no occurence of b in .*

2007-02-24 Thread Charilaos Skiadas
All these methods do assume that you don't have nested <tag>s, like so:

<tag><tag>foo</tag>useful stuff</tag>some garbage</tag>

For that you would really need a true parser. So I would double-check  
to make sure this doesn't happen.

Do you have any control on where those XML files are generated  
though? It sounds to me it might be easier to fix the utility  
generating those XML files, since it clearly is doing something wrong.

On Feb 24, 2007, at 11:07 AM, Gabor Grothendieck wrote:

> I assume <tag> is known.
>
> This removes any occurrence of </tag>.*</tag> where .* does not
> contain <tag> or </tag>.
>
> The regular expression, re, matches </tag>, then does a non-greedy
> match (?U) for anything followed by </tag> but uses a zero
> width lookahead subexpression (?=...) for the second </tag>
> so that it can be rematched again.  gsubfn in package
> gsubfn is like the usual gsub except that instead of
> replacing the match with a string it passes the match
> to function f and then replaces the match with the output
> of f.  See the gsubfn home page:
>   http://code.google.com/p/gsubfn/
> and vignette.
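
The code that accompanied this description is not reproduced in the quote. As a rough base-R sketch of a similar lookahead idea (this is not the gsubfn solution itself; the pattern and the variable names are assumptions made for illustration), perl = TRUE allows a negative lookahead to keep the match from running across another <tag> or </tag>:

```r
xml <- "<tag>value1</tag><tag>value2</tag>some garbage</tag></tag><tag>value3</tag>"

# </tag>, then characters that do not start a <tag> or </tag>,
# then a zero-width lookahead for the next </tag> so it can be matched again.
pattern <- "</tag>((?!</?tag>).)*(?=</tag>)"

cleaned <- gsub(pattern, "", xml, perl = TRUE)
cleaned
```

Because the second </tag> is only looked at and not consumed, each surplus closing tag becomes the start of the next match; the last one in a run survives as the real closing tag. The nesting caveat above applies to this sketch as well.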

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College



Re: [R] gsub: replacing a.*a if no occurence of b in .*

2007-02-24 Thread Gabor Grothendieck
The _question_ assumed that, which is why the answers did too.



Re: [R] gsub: replacing a.*a if no occurence of b in .*

2007-02-24 Thread Charilaos Skiadas
On Feb 24, 2007, at 11:37 AM, Gabor Grothendieck wrote:

> The _question_ assumed that, which is why the answers did too.

Oh yes, I totally agree, the file snippet the OP provided did indeed  
assume that, though nothing in the text of his question did, so I  
wasn't entirely clear whether the actual file that is going to be  
processed has this form or not. So I just wanted to make sure the OP  
is aware of this limitation, in case the actual file is more  
problematic.

But most importantly, I wanted to suggest a reevaluation, if  
possible, of the process that generates these XML's, and perhaps  
fixing that, instead of patching the problem after it has been created.

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College



Re: [R] gsub: replacing a.*a if no occurence of b in .*

2007-02-24 Thread Jeffrey Horner
Also, I wouldn't tolerate R *crashing* in package code on malformed xml 
input.

Jeff
-- 
http://biostat.mc.vanderbilt.edu/JeffreyHorner



[R] gsub regexp question

2007-01-27 Thread Phillimore, Albert
Dear R Users,
 
I am trying to use gsub to remove multiple cases of square brackets and their 
different contents from a character string. A sample of such a string is shown 
below. However, I am having great difficulty understanding regexp syntax. Any 
help is greatly appreciated.
 
Ally
 
tree STATE_286000 [lnP=-12708.453945423369] = [R] 
((15[rate=0.009761226401396686]:7.040851727747465,17[rate=0.011500289631135564]:7.040851727747465)[rate=0.010986570567484494]:2.257049446900292,(18[rate=0.009123432243563103]:2.461289418776003,19[rate=0.00981822432115329]:2.461289418776003)



Re: [R] gsub regexp question

2007-01-27 Thread Charilaos Skiadas
Is this what you want? I tend to prefer perl regular expressions:

> str <- "tree STATE_286000 [lnP=-12708.453945423369] = [R] ((15[rate=0.009761226401396686]:7.040851727747465,17[rate=0.011500289631135564]:7.040851727747465)[rate=0.010986570567484494]:2.257049446900292,(18[rate=0.009123432243563103]:2.461289418776003,19[rate=0.00981822432115329]:2.461289418776003)"
> gsub("\\[[^\\]]+\\]", "", str, perl = TRUE)
[1] "tree STATE_286000  =  ((15:7.040851727747465,17:7.040851727747465):2.257049446900292,(18:2.461289418776003,19:2.461289418776003)"


As an explanation, \\[ and \\] match the two square brackets you
want. We need to escape the brackets with the backslashes because
they have a special meaning in perl regular expressions.

In perl regexps, [...] stands for "match a single character that
is like what we have in the ...". For instance, [ab] will match an a or
a b, and [a-z] will match any lowercase letter. A ^ as the first
character in there means "match anything but what follows"; for instance,
[^a-z] means match anything but lowercase letters. So [^\\]] means
match any character but a closing bracket.

Finally, the plus sign afterwards means: match at least one. So [^\\]]+
means match any non-empty sequence of characters that does not contain a
closing bracket. So the whole thing now matches an opening bracket,
followed by all characters until the corresponding closing bracket.
This will not work if you have nested pairs of brackets, [like [so]].
That is a tad more delicate, and we can discuss it if you really need
to deal with it.
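
To make the nesting limitation concrete, here is a small sketch with made-up input strings. The recursive pattern at the end is not from this thread; it is one possible approach that assumes the PCRE recursion construct (?R), available with perl = TRUE:

```r
flat   <- "tree [lnP=-1.5] = ((15[rate=0.1]:7.0,17[rate=0.2]:7.0)"
nested <- "keep [like [so]] this"

# The non-nested pattern from above: an opening bracket, no ] inside, a closing bracket.
simple <- "\\[[^\\]]+\\]"
flat_clean   <- gsub(simple, "", flat, perl = TRUE)
nested_wrong <- gsub(simple, "", nested, perl = TRUE)  # stops at the first ]

# Assumption: (?R) re-applies the whole pattern inside itself, so an inner
# [ ... ] pair is consumed as a unit before the outer closing bracket.
recursive <- "\\[(?:[^\\[\\]]++|(?R))*\\]"
nested_clean <- gsub(recursive, "", nested, perl = TRUE)

flat_clean
nested_wrong
nested_clean
```

With the simple pattern the nested input keeps a stray closing bracket; the recursive variant removes the whole bracketed group, provided the brackets are balanced.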

Haris



[R] gsub

2006-11-15 Thread Luis Ridao Cruz
R-help,

I want to remove the following strings
"cpue" and "nogd"

string <- c("upsanogd", "toskanogd", "hysunogd", "konganogd",
            "gullaksnogd", "longunogd", "blalongunogd", "brosmunogd")

I could use first : first <- gsub("cpue", "", string)
and then : second <- gsub("nogd", "", first)

Can it be done at once?

Thanks in advance


> version
               _
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          4.0
year           2006
month          10
day            03
svn rev        39566
language       R
version.string R version 2.4.0 (2006-10-03)




Re: [R] gsub

2006-11-15 Thread Duncan Murdoch
gsub("cpue|nogd", "", string)

See ?regex for a description of the kinds of patterns R can use, in 
particular

"Two regular expressions may be joined by the infix operator |; the 
resulting regular expression matches any string matching either 
subexpression. For example, abba|cde matches either the string abba or 
the string cde. Note that alternation does not work inside character 
classes, where | has its literal meaning."
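
As a quick check of both points in that passage, the following sketch uses the vector from the question; the second input string "a|b-c" is made up for illustration:

```r
string <- c("upsanogd", "toskanogd", "hysunogd", "konganogd",
            "gullaksnogd", "longunogd", "blalongunogd", "brosmunogd")

# One pass: remove either "cpue" or "nogd" wherever it occurs.
stripped <- gsub("cpue|nogd", "", string)
stripped

# Inside a character class, | is an ordinary character, not alternation.
class_result <- gsub("[a|b]", "", "a|b-c")
class_result
```

The single call gives the same result as the two chained gsub() calls in the question.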

Duncan Murdoch



Re: [R] gsub

2006-11-15 Thread john seers \(IFR\)


Is this what you want?  :

gsub("cpue|nogd", "", string)


John
 
---

Web sites:

www.ifr.ac.uk   
www.foodandhealthnetwork.com




[R] gsub in data frame

2006-04-05 Thread Lapointe, Pierre
Hello,

I have this data frame:

### begin

d <- data.frame(matrix(c(1,"--","bla",2),2,2))
d

# I want to replace the -- by \N and still get a data frame.

# I tried: 

out <- gsub("--","\\\\N",as.matrix(d)) #using as.matrix to get rid of factors
out
cat(out)

# But I lost my data frame

### end

Any idea?

Regards,

Pierre Lapointe



Re: [R] gsub in data frame

2006-04-05 Thread Petr Pikal
Hi

Reformat it back?

> data.frame(matrix(out,2,2))
   X1  X2
1   1 bla
2 \\N   2

HTH
Petr



 

Petr Pikal
[EMAIL PROTECTED]



Re: [R] gsub in data frame

2006-04-05 Thread Prof Brian Ripley
On Wed, 5 Apr 2006, Lapointe, Pierre wrote:

> Hello,
>
> I have this data frame:
>
> ### begin
>
> d <- data.frame(matrix(c(1,"--","bla",2),2,2))
> d

So d is a two-column data frame with factor columns.

> # I want to replace the -- by \N and still get a data frame.

levels(d$X1) <- gsub("--","\\\\N", levels(d$X1))
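
[Editorial sketch, not part of the original reply.] If the columns are held as character rather than factor, the same replacement can be applied to every column while keeping the data frame. The data below are a hypothetical reconstruction of the poster's example, and the column names X1/X2 are assumed:

```r
# Hypothetical reconstruction of the poster's data, columns kept as character.
d <- data.frame(X1 = c("1", "--"), X2 = c("bla", "2"),
                stringsAsFactors = FALSE)

# Assigning into d[] replaces the columns in place, so d stays a data frame.
# "\\\\N" reaches the regexp engine as \\N, which it reads as a literal \N.
d[] <- lapply(d, function(col) gsub("--", "\\\\N", col))

str(d)
```

The `levels<-` approach above is still the natural choice when the columns really are factors.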




-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



[R] gsub syntax

2005-11-27 Thread John Logsdon
Hello

I know that R's string functions are not as extensive as those of Unix but
I need to do some text handling totally within an R environment because
the target is a Windows system which will not have the corresponding shell
utilities, sed, awk etc.

Can anyone explain the following gsub phenomenon to me:

> dates<-c("73","74","02","1973","1974","2002")

I want to take just the last two digits where it is a 4-digit year and
both digits when it is a 2-digit year.  I should be able to use substr but
measurement from the string end (with a negative counter or something) is
not implemented:

> substr(dates,3,4)
[1] ""   ""   ""   "73" "74" "02"
> substr(dates,-2,4)
[1] "73"   "74"   "02"   "1973" "1974" "2002"
> substr(dates,4,-2)
[1] "" "" "" "" "" ""

So I tried gsub:

> gsub("[19|20]([0-9][0-9])","\\1",dates)
[1] "73"  "74"  "02"  "973" "974" "002"

As I understand it (and comparing with sed), the \\1 should take the first
bracketed string but clearly this doesn't work.  If I try what should also
work:

> gsub("[19|20]([0-9])([0-9])","\\1\\2",dates)
[1] "73"  "74"  "02"  "973" "974" "002"

On the other hand the following does work:

> gsub("[19|20]([0-9])([0-9])","\\2",dates)
[1] "73" "74" "02" "73" "74" "02"

So it appears that the substitution takes one character extra to the left
but the following indicates that the lower limit of the selected range is
also at fault:

> s<-c("1","12","123","1234","12345","123456")
> gsub("[12]([4-6]*)","",s)
[1] ""     ""     "3"    "34"   "345"  "3456"

Probably more elegant examples could be constructed that could home in on
the issue.

The version is R 2.0.1 on Linux so perhaps it is a little old now.

Questions:

1) Am I misunderstanding the gsub use?

2) Was it a bug that has since been corrected?

3) Is it still a bug in the latest version?

TIA

John

John Logsdon   Try to make things as simple
Quantex Research Ltd, Manchester UK as possible but not simpler
[EMAIL PROTECTED]  [EMAIL PROTECTED]
+44(0)161 445 4951/G:+44(0)7717758675   www.quantex-research.com



Re: [R] gsub syntax

2005-11-27 Thread Dimitris Rizopoulos
You could use something like:

dates <- c("73", "74", "02", "1973", "1974", "2002")
###
nd <- nchar(dates)
substr(dates, ifelse(nd == 2, 1, 3), nd)
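
[Editorial sketch, not part of the original reply.] The same idea can be generalised to "the last n characters of each string" by computing the start position from nchar(). The helper name last_chars is hypothetical:

```r
# Hypothetical helper: last n characters of each element of x.
last_chars <- function(x, n = 2) {
  len <- nchar(x)
  # pmax() keeps the start position at 1 for strings shorter than n
  substr(x, pmax(len - n + 1, 1), len)
}

dates <- c("73", "74", "02", "1973", "1974", "2002")
last_chars(dates)
```

This avoids regular expressions entirely, which sidesteps the character-class confusion discussed in this thread.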


I hope it helps.

Best,
Dimitris


Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://www.med.kuleuven.be/biostat/
 http://www.student.kuleuven.be/~m0390867/dimitris.htm




Re: [R] gsub syntax

2005-11-27 Thread Sundar Dorai-Raj




Hi, John,

I cannot comment on your questions since I'm no regexpr guru. However, 
it seems to me you can do the following instead:

gsub(".*([0-9][0-9])", "\\1", dates)

This works fine on Linux & Windows, R-2.2.0.

HTH,

--sundar



Re: [R] gsub syntax

2005-11-27 Thread Gabor Grothendieck
On 11/27/05, John Logsdon [EMAIL PROTECTED] wrote:
> Hello
>
> I know that R's string functions are not as extensive as those of Unix but

I don't think this statement is true although I have seen it repeated.

> I need to do some text handling totally within an R environment because
> the target is a Windows system which will not have the corresponding shell
> utilities, sed, awk etc.

Free versions of these utilities are available for Windows although they
don't come with Windows, e.g. Google for gawk.




It works the same on my system, which is R 2.2.0 patched (2005-10-24) on
Windows. At first I too thought it was a bug, but I noticed it
works the same in perl, so now I am not sure. The following perl
program, run with perl 5.8.6 on Windows,
gives 002 as the answer too:

   $_ = "2002";
   s/[19|20]([0-9])([0-9])/\1\2/g;
   print;

In any case, it could be done like this:

   sub(".*(..)$", "\\1", dates)

or

   substring(dates, nchar(dates)-1)

or the following, which appends -01-01 to the year, converts it to Date
class, implicitly converts it back to character and then extracts
the 3rd to 4th characters of the result:

   substring(as.Date(sprintf("%s-01-01", dates)), 3, 4)

or



Re: [R] gsub syntax

2005-11-27 Thread Prof Brian Ripley
R is blameless here: it works as documented and in the same way as 
POSIX tools.  It agrees with 'sed' using the same syntax (modulo the 
shell-specific quoting rules) e.g. in csh

% echo 1973 | sed 's/[19|20]\([0-9][0-9]\)/\1/g'
973
% echo 1973 | sed 's/\([19|20]\)\([0-9][0-9]\)/-\1-\2-/g'
-1-97-3
% echo 73 74 02 1973 1974 2002 | sed 's/[19|20]\([0-9][0-9]\)/\1/g'
73 74 02 973 974 002

so what happened when you were 'comparing with sed'?

[19|20] is a character class (containing five characters) matching one 
character, not a match for two characters as you seem to imagine.  It does 
not mean the same as 19|20, which is what you seem to have intended (and 
you seem only to want to do the substitution once on each string, so why 
use gsub?):

> sub("19|20([0-9][0-9])", "\\1", dates)
[1] "73" "74" "02" "73" "74" "02"

A more direct way which would work e.g. for 1837 would be

sub(".*([0-9]{2}$)", "\\1", dates)

or even better (locale-independent)

sub(".*([[:digit:]]{2}$)", "\\1", dates)

Current versions of R have a help page ?regexp explaining what regexps 
are.  Even 2.0.1 did, although you were asked to update *before* posting 
(see the posting guide).  It was unambiguous:

A _character class_ is a list of characters enclosed by '[' and
']' matches any single character in that list ...
^^
...  Note that alternation does not work inside character classes,
where \code{|} has its literal meaning.
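
[Editorial sketch, not part of the original reply.] The difference between a character class and a grouped alternation can be seen side by side. The anchored pattern in the second call is an illustration chosen here, not a pattern taken from the thread:

```r
dates <- c("73", "74", "02", "1973", "1974", "2002")

# [19|20] is a character class: it matches ONE of the characters 1 9 | 2 0,
# so only one leading character is consumed before the captured digits.
as_class <- sub("[19|20]([0-9][0-9])", "\\1", dates)

# (19|20) is an alternation inside a group: it matches the two-character
# string "19" or "20". The digits wanted are then in the second group.
as_group <- sub("^(19|20)([0-9][0-9])$", "\\2", dates)

# Inside a class, | is just a literal character.
bar_is_literal <- grepl("[19|20]", "|")

print(as_class)
print(as_group)
```

Anchoring with ^ and $ means two-digit inputs are left untouched rather than partially matched.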


On Sun, 27 Nov 2005, John Logsdon wrote:

> Hello
>
> I know that R's string functions are not as extensive as those of Unix but
> I need to do some text handling totally within an R environment because
> the target is a Windows system which will not have the corresponding shell
> utilities, sed, awk etc.
> Can anyone explain the following gsub phenomenon to me:
>
> > dates<-c("73","74","02","1973","1974","2002")
>
> I want to take just the last two digits where it is a 4-digit year and
> both digits when it is a 2-digit year.  I should be able to use substr but
> measurement from the string end (with a negative counter or something) is
> not implemented:

Why 'should' it work in a different way to that documented?

> > substr(dates,3,4)
> [1] ""   ""   ""   "73" "74" "02"
> > substr(dates,-2,4)
> [1] "73"   "74"   "02"   "1973" "1974" "2002"
> > substr(dates,4,-2)
> [1] "" "" "" "" "" ""
>
> So I tried gsub:
>
> > gsub("[19|20]([0-9][0-9])","\\1",dates)
> [1] "73"  "74"  "02"  "973" "974" "002"
>
> As I understand it (and comparing with sed), the \\1 should take the first
> bracketed string but clearly this doesn't work.
> If I try what should also work:
>
> > gsub("[19|20]([0-9])([0-9])","\\1\\2",dates)
> [1] "73"  "74"  "02"  "973" "974" "002"
>
> On the other hand the following does work:
>
> > gsub("[19|20]([0-9])([0-9])","\\2",dates)
> [1] "73" "74" "02" "73" "74" "02"
>
> So it appears that the substitution takes one character extra to the left
> but the following indicates that the lower limit of the selected range is
> also at fault:
> > s<-c("1","12","123","1234","12345","123456")
> > gsub("[12]([4-6]*)","",s)
> [1] ""     ""     "3"    "34"   "345"  "3456"
>
> Probably more elegant examples could be constructed that could home in on
> the issue.
> The version is R 2.0.1 on Linux so perhaps it is a little old now.
>
> Questions:
>
> 1) Am I misunderstanding the gsub use?

Yes.

> 2) Was it a bug that has since been corrected?

Unfortunately the bug reported two years ago in

> library(fortunes); fortune("WTFM")

still seems extant.  See the posting guide for advice on how to correct 
it.


-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



[R] gsub pattern?

2005-01-21 Thread Christian Schulz
Hi,
Searching the web for regular expressions, I got the information that
the line below replaces all AUTO strings like AUTOBAHN,
AUTORENNEN with 1, but nothing happened.
Using [] in the pattern it works as I expected, but I don't
want single-character replacement. Where is my mistake?
bcode <- gsub("/^AUTO.*/","1",MyStringVector,ignore.case=T,extended=T)
Many thanks & regards,
christian


Re: [R] gsub pattern?

2005-01-21 Thread Barry Rowlingson
Christian Schulz wrote:
> Where is my mistake?
> bcode <- gsub("/^AUTO.*/","1",MyStringVector,ignore.case=T,extended=T)

 You don't need the slashes! You've been looking at documentation for 
Perl regular expression replacements, I guess.

 help(gsub) may have shown you the way. Here's how to do it:
 > MyStringVector=c("AUTOBAHN","NAUTON","FOO","AUTOGRAPH")
# wrong way:
 > gsub("/^AUTO.*/","1",MyStringVector,ignore.case=T,extended=T)
 [1] "AUTOBAHN"  "NAUTON"    "FOO"       "AUTOGRAPH"
# don't slash all over the regexp:
 > gsub("^AUTO.*","1",MyStringVector,ignore.case=T,extended=T)
 [1] "1"      "NAUTON" "FOO"    "1"
 Is that what you're after?
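
[Editorial sketch, not part of the original reply.] The `extended` argument used in this thread no longer exists in current versions of R, so the sketch below drops it and spells out TRUE. The lower-case element in x is added here only to show the effect of ignore.case:

```r
x <- c("AUTOBAHN", "NAUTON", "FOO", "autograph")

# The pattern is an ordinary string; no /.../ delimiters as in Perl or sed.
ok <- gsub("^AUTO.*", "1", x, ignore.case = TRUE)

# With the slashes, R looks for a literal "/" in the data, finds none,
# and returns the input unchanged.
unchanged <- gsub("/^AUTO.*/", "1", x, ignore.case = TRUE)

print(ok)
print(unchanged)
```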
Baz


Re: [R] gsub pattern?

2005-01-21 Thread Peter Dalgaard
Christian Schulz [EMAIL PROTECTED] writes:

> Hi,
>
> Searching the web for regular expressions, I got the information that
> the line below replaces all AUTO strings like AUTOBAHN,
> AUTORENNEN with 1, but nothing happened.
> Using [] in the pattern it works as I expected, but I don't
> want single-character replacement. Where is my mistake?
>
> bcode <- gsub("/^AUTO.*/","1",MyStringVector,ignore.case=T,extended=T)

What are the /-es for?

-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907



Re: [R] gsub pattern?

2005-01-21 Thread Christian Schulz
many thanks
..the different styles from linux to r-project a little confusing for me :-(
christian


Re: [R] gsub() on Matrix

2004-10-28 Thread Tony Plate
Many more recent regular expression implementations have ways of indicating 
a match on a word boundary.  It's usually \b.

Here's what you did:
> gsub("x1", "i1", "x1 + x2 + x10 + xx1")
[1] "i1 + x2 + i10 + xi1"

The following worked for me to just change x1 to i1, while leaving 
alone any larger word that contains x1:

> gsub("\\bx1\\b", "i1", "x1 + x2 + x10 + xx1")
[1] "i1 + x2 + x10 + xx1"

Note that the backslash must be escaped itself to get past the R lexical 
analyser, which is independent of the regexp processor.  What the regexp 
processor sees is just a single backslash.

For more on this, look for perl documentation of regular expressions.  Be 
aware that to use full perl regexps, you must supply the perl=T argument to 
gsub().  Also note that \b seems to be part of the most basic regular 
expression language in R; it even works with extended=F:

> gsub("\\bx1\\b", "i1", "x1 + x2 + x10 + xx1", perl=T)
[1] "i1 + x2 + x10 + xx1"
> gsub("\\bx1\\b", "i1", "x1 + x2 + x10 + xx1", perl=F)
[1] "i1 + x2 + x10 + xx1"
> gsub("\\bx1\\b", "i1", "x1 + x2 + x10 + xx1", perl=F, ext=F)
[1] "i1 + x2 + x10 + xx1"
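
[Editorial sketch, not part of the original reply.] gsub() takes a single pattern, so a whole vector of replacements has to be applied one pair at a time. A minimal loop, assuming a small hypothetical matrix `mat` and only the first three of the poster's name pairs, might look like this:

```r
orig <- paste("x", 1:3, sep = "")          # "x1" "x2" "x3"
repl <- c("i7", "i14", "i13")
mat  <- matrix(c("x1 + x3 + x10", "x2 + x1:x3"), nrow = 2)

out <- mat
for (k in seq_along(orig)) {
  # \\b on both sides stops "x1" from matching inside "x10"
  pat <- paste("\\b", orig[k], "\\b", sep = "")
  # out[] <- keeps the matrix dimensions, which gsub() alone would drop
  out[] <- gsub(pat, repl[k], out)
}
print(out)
```

Because the replacements run in sequence, this is only safe when no replacement name is itself one of the names still to be replaced, as is the case here.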

(I assumed the fact that you have a matrix of strings is not relevant.)
Hope this helps,
Tony Plate
At Wednesday 09:07 PM 10/27/2004, Kevin Wang wrote:
> Hi,
>
> Suppose I've got a matrix, and the first few elements look like
>   "x1 + x3 + x4 + x5 + x1:x3 + x1:x4"
>   "x1 + x2 + x3 + x5 + x1:x2 + x1:x5"
>   "x1 + x3 + x4 + x5 + x1:x3 + x1:x5"
> and so on (have got terms from x1 ~ x14).
>
> If I want to replace all the x1 with i7, all x2 with i14, all x3 with i13,
> for example.  Is there an easy way?
>
> I tried to put what I want to replace in a vector, like:
>  > repl = c("i7", "i14", "i13", "d2", "i8", "i5",
>             "i6", "i3", "A", "i9", "i2",
>             "i4", "i15", "i21")
> and have another vector, say:
>  > orig
>   [1] "x1"  "x2"  "x3"  "x4"  "x5"  "x6"  "x7"  "x8"  "x9"  "x10"
>  [11] "x11" "x12" "x13" "x14"
>
> Then I tried something like
>  > gsub(orig, repl, mat)
> ## mat is the name of my matrix
>
> but it didn't work *_*. It would replace terms like x10 with i70.
>
> (I know it may be an easy question...but I haven't done much regular
> expression)
>
> Cheers,
>
> Kevin
>
> Ko-Kang Kevin Wang
> PhD Student
> Centre for Mathematics and its Applications
> Building 27, Room 1004
> Mathematical Sciences Institute (MSI)
> Australian National University
> Canberra, ACT 0200
> Australia
> Homepage: http://wwwmaths.anu.edu.au/~wangk/
> Ph (W): +61-2-6125-2431
> Ph (H): +61-2-6125-7407
> Ph (M): +61-40-451-8301


Re: [R] gsub() on Matrix

2004-10-28 Thread Peter Dalgaard
Tony Plate [EMAIL PROTECTED] writes:

> Many more recent regular expression implementations have ways of
> indicating a match on a word boundary.  It's usually \b.


Another idea is that if what you need is something that is parseable
as a model formula RHS, then you might want to parse first and
substitute later. Something along these lines:

> e <- parse(text="x1 + x3 + x4 + x5 + x1:x3 + x1:x4")[[1]]
> repl = lapply(c("i7", "i14", "i13", "d2", "i8", "i5"),as.name)
> names(repl)<-paste("x",1:6,sep="")
> eval(substitute(substitute(e,repl),list(e=e)))
i7 + i13 + d2 + i8 + i7:i13 + i7:d2



-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907



[R] gsub

2004-09-23 Thread Jean Eid
Hi

A while back I used gsub to do the following

temp<-"000US00231"
gsub("something here", "", temp)
"00231"

I think it involved the `meta characters' somehow.

I do not know how to do this anymore. I know strsplit will also work but I
remember gsub was much faster.  In essence the question is how to delete
all characters before a particular pattern.

If anyone has some help file for this, it will be greatly appreciated.


Jean Eid



Re: [R] gsub

2004-09-23 Thread Sean Davis
You might want to look at ?regex.
Sean


Re: [R] gsub

2004-09-23 Thread Gabor Grothendieck
Jean Eid jeaneid at chass.utoronto.ca writes:

: 
: Hi
: 
: A while back I used gsub to do the following
: 
: temp<-"000US00231"
: gsub("something here", "", temp)
: "00231"
: 
: I think it involved the `meta characters' somehow.
: 
: I do not know how to do this anymore. I know strsplit will also work but I
: remember gsub was much faster.  In essence the question is how to delete
: all characters before a particular pattern.
: 
: If anyone has some help file for this, it will be greatly appreciated.
: 

I think you want sub in this case, not gsub.

There are many possibilities here depending on what the
general case is.  The following all give the desired
result for the example but their general cases differ.
These are just some of the numerous variations possible.

temp <- "000US00231"
sub(".*US", "", temp)
sub(".*S", "", temp)
sub("[[:digit:]]*[[:alpha:]]*", "", temp)
sub(".*[[:alpha:]]", "", temp)
sub(".*[[:alpha:]][[:alpha:]]", "", temp)
sub(".*[[:upper:]]", "", temp)
sub(".*[[:upper:]][[:upper:]]", "", temp)
sub(".....", "", temp)
substring(temp, 6)
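
[Editorial sketch, not part of the original reply.] To see how the "general cases differ", the variants can be run on a second input. The string "12CAN4S5" is a made-up example chosen here so that the patterns disagree:

```r
temp <- c("000US00231", "12CAN4S5")

by_literal  <- sub(".*US", "", temp)          # drop everything up to "US"
by_alpha    <- sub(".*[[:alpha:]]", "", temp) # drop up to the LAST letter
by_position <- substring(temp, 6)             # drop the first five characters

print(rbind(by_literal, by_alpha, by_position))
```

All three agree on the poster's example and give different answers on the second string, so the right choice depends on what the real data look like.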



Re: [R] gsub

2004-09-23 Thread Jean Eid
Thank you all for the help, especially Gabor; that is exactly what I needed.
A few examples that do the same thing are very helpful in understanding the
structure of the call.


Thank you again,


Jean



[R] gsub, backslash and xtable

2004-08-27 Thread P. B. Pynsent
R Version 1.9.1  (2004-06-21)
Mac OS X.3.5 Dual 2GHz PowerPC G5
GUI = AQUA
I have a data.frame comprising percentiles with the column headings 
containing % characters, e.g.
> (pp <- colnames(temp2))
[1] "5%"  "10%" "25%" "50%" "75%" "90%" "95%"
I use xtable to convert the data.frame to LaTeX but I want to protect 
these % signs from LaTeX using a backslash in the normal way before 
calling xtable.
I have tried using gsub as follows:
> gsub("\%","\\%",pp)
[1] "5%"  "10%" "25%" "50%" "75%" "90%" "95%"
also
> gsub("%","\134%",pp) #octal for backslash
[1] "5%"  "10%" "25%" "50%" "75%" "90%" "95%"
Both of which fail to provide what I need.

I verified my 'regexps' using awk under Darwin thus:
$ cat fred
5%  10% 25% 50% 75% 90% 95%
$ awk '{gsub(/%/,"\\%"); print}' fred
5\%  10\% 25\% 50\% 75\% 90\% 95\%
and
$ awk '{gsub(/%/,"\134%"); print}' fred
5\%  10\% 25\% 50\% 75\% 90\% 95\%
As a possible 'work around', I noticed that,
> chartr("z","\",gsub("%","z%",pp))
Error: syntax error
> chartr("z","\\",gsub("%","z%",pp))
[1] "5\\%"  "10\\%" "25\\%" "50\\%" "75\\%" "90\\%" "95\\%"
> chartr("z","\134",gsub("%","z%",pp))
[1] "5\\%"  "10\\%" "25\\%" "50\\%" "75\\%" "90\\%" "95\\%"
As the xtable is then 'catted' to a file and read back (vide infra) I 
actually end up with what I want using the latter example.
However I am very much left with the feeling that R is in control of me 
rather than vice versa.


Secondly, as I am building up a character vector of sentences, tables 
and figures, I wanted to convert my xtable output to a character vector 
with newline
separators. I have only been able to accomplish this by printing to a 
temporary file thus,

theTx <- "\\documentclass[A4paper,10pt]{article}"
.
.
theTx <- paste(theTx, paste_xtable(temp2,"Percentiles for scores"), sep 
= "")
.
theTx <- paste(theTx,"\n ","\\end{document}","\n", sep = "")
.
cat(theTx) #into a file for LaTeX

## with my paste_xtable function being ##
paste_xtable <- function(a_table, cap) {
sink(file = "levzz", append = FALSE, type = "output")
print(xtable(a_table,caption=cap))
sink()
#read it back
temp <- readLines("levzz", n=-1) #note backslashes get doubled automatically
unlink("levzz") #delete file
a <- "\n"
for (i in 1:length(temp)) {
a <- paste(a,temp[i],"\n",sep = "")
}
return(a)
}
I should be most grateful for more elegant solutions to both these 
issues or a pointer to the documentation.
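
[Editorial sketch, not part of the original post.] For the second issue, base R's capture.output() collects printed output as a character vector without a temporary file, and paste(..., collapse = "\n") joins it into one string. The example prints a plain data frame; the assumption is that print(xtable(...)) could be captured the same way, which is not verified here:

```r
# Hypothetical stand-in for the table; with xtable one would presumably
# capture print(xtable(a_table, caption = cap)) instead.
tab <- data.frame(a = 1:2, b = c("x", "y"))

lines <- capture.output(print(tab))   # one element per printed line
txt   <- paste(lines, collapse = "\n")  # single string, newline separated

cat(txt, "\n")
```

The loop over temp[i] in paste_xtable() can be replaced by the same collapse = "\n" call.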

Paul


Re: [R] gsub, backslash and xtable

2004-08-27 Thread Prof Brian Ripley
On Fri, 27 Aug 2004, P. B. Pynsent wrote:

> R Version 1.9.1  (2004-06-21)
> Mac OS X.3.5 Dual 2GHz PowerPC G5
> GUI = AQUA
>
> I have a data.frame comprising percentiles with the column headings 
> containing % characters, e.g.
> > (pp <- colnames(temp2))
> [1] "5%"  "10%" "25%" "50%" "75%" "90%" "95%"
> I use xtable to convert the data.frame to Latex but I want to protect 
> these % signs from Latex using a backslash in the normal way before 
> calling xtable.
> I have tried using gsub as follows;
> > gsub("\%","\\%",pp)
> [1] "5%"  "10%" "25%" "50%" "75%" "90%" "95%"
> also
> > gsub("%","\134%",pp) #octal for backslash
> [1] "5%"  "10%" "25%" "50%" "75%" "90%" "95%"
> Both of which fail to provide what I need.

Remember you need to double \ in R character strings (in the FAQ, for 
example, and in ?regex):

> gsub("%","\\\\%",pp)
[1] "5\\%"  "10\\%" "25\\%" "50\\%" "75\\%" "90\\%" "95\\%"
> cat(gsub("%","\\\\%",pp), "\n")
5\% 10\% 25\% 50\% 75\% 90\% 95\% 



-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] gsub, backslash and xtable

2004-08-27 Thread Peter Dalgaard
P. B. Pynsent [EMAIL PROTECTED] writes:

> I have a data.frame comprising percentiles with the column headings
> containing % characters, e.g.
> > (pp <- colnames(temp2))
> [1] "5%"  "10%" "25%" "50%" "75%" "90%" "95%"
> I use xtable to convert the data.frame to Latex but I want to protect
> these % signs from Latex using a backslash in the normal way before
> calling xtable.
> I have tried using gsub as follows;
> > gsub("\%","\\%",pp)
> [1] "5%"  "10%" "25%" "50%" "75%" "90%" "95%"
> also
> ...
> However I am very much left with the feeling that R is in control of
> me rather than vise versa.


The generic rule for backslashes is that you need twice as many as you
thought:

> p
[1] "25%"
> gsub("%","\\\\%",p)
[1] "25\\%"
> cat(gsub("%","\\\\%",p),"\n")
25\%

The thing that people usually forget is that you have two levels of
escaping, one in R's string parser and another one in the regexp
machinery. 
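
[Editorial sketch, not part of the original reply.] The two levels can be checked by counting characters: nchar() and cat() show what the string really contains, whereas print() re-escapes it. The fixed = TRUE variant is an alternative added here; it switches off the regexp layer so that only R's string parser is involved:

```r
p <- "25%"

# Level 1 (R parser):   "\\\\%" is the three characters  \ \ %
# Level 2 (regexp):     \\ in a replacement means one literal backslash
one <- gsub("%", "\\\\%", p)

# With only two backslashes the regexp layer sees \% and emits a plain %
none <- gsub("%", "\\%", p)

# fixed = TRUE: the replacement is used literally, so "\\%" is enough
two <- gsub("%", "\\%", p, fixed = TRUE)

nchar(one)      # counts the single backslash as one character
cat(one, "\n")  # shows the string as LaTeX will see it
```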

-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907



Re: [R] gsub, backslash and xtable

2004-08-27 Thread Rolf Turner

Peter Dalgaard wrote:

 ``The generic rule for backslashes is that you need twice as many
   as you thought''

And you have to apply that rule recursively! :-)

cheers,

Rolf Turner
[EMAIL PROTECTED]
