Re: [R] Regexp pattern but fixed replacement?

2024-04-11 Thread Duncan Murdoch

On 11/04/2024 12:58 p.m., Iris Simmons wrote:

Hi Duncan,


I only know about sub() and gsub().

There is no way to have pattern be a regular expression and replacement 
be a fixed string.


Backslash is the only special character in replacement. If you need a 
reference, see this file:

https://github.com/wch/r-source/blob/04650eddd6d844963b6d7aac02bd8d13cbf440d4/src/main/grep.c
 

particularly functions R_pcre_string_adj and wstring_adj. So just double 
the backslashes in replacement and you'll be good to go.


Thanks, that's what I've done.

Duncan Murdoch
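
A minimal sketch of the workaround agreed on above, assuming a hypothetical helper named escape_replacement (it is not part of base R or stringr). It doubles every backslash in a literal string so that gsub() with a regular-expression pattern inserts that string unchanged. The inner call uses fixed = TRUE so that neither its pattern nor its replacement needs regex-level escaping.

```r
# Hypothetical helper (not a base R or stringr function): escape a literal
# string so it can be used as the replacement in sub()/gsub() when the
# pattern is a regular expression.
escape_replacement <- function(replacement) {
  # With fixed = TRUE both arguments are literal:
  # each single backslash becomes two backslashes.
  gsub("\\", "\\\\", replacement, fixed = TRUE)
}

# Regex pattern, literal replacement (a single backslash character).
result <- gsub("a|b", escape_replacement("\\"), "abcdef")
writeLines(result)   # prints \\cdef (two backslash characters, then cdef)
```

Since backslash is the only special character in the replacement, no other escaping should be needed for this approach.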



On Thu, Apr 11, 2024, 12:36 Duncan Murdoch wrote:


I noticed this issue in stringr::str_replace, but it also affects sub()
in base R.

If the pattern in a call to one of these needs to be a regular
expression, then backslashes in the replacement text are treated
specially.

For example,

    gsub("a|b", "\\", "abcdef")

gives "cdef", not "\\\\cdef" as I wanted.  To get the latter, I need to
escape the replacement backslashes, e.g.

    gsub("a|b", "\\\\", "abcdef")

which gives "\\\\cdef".

I have two questions:

1.  Is there a variant on sub or str_replace which allows the pattern to
be declared as a regular expression, but the replacement to be declared
as fixed?

2.  To get what I want, I can double the backslashes in the replacement
text.  This would do that:

     replacement <- gsub("\\\\", "\\\\\\\\", replacement)

Are there any other special characters to worry about besides
backslashes?

Duncan Murdoch

__
R-help@r-project.org  mailing list --
To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help

PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.





Re: [R] Regexp pattern but fixed replacement?

2024-04-11 Thread Duncan Murdoch

On 11/04/2024 12:57 p.m., Dave Dixon wrote:

Backslashes in regex expressions in R are maddening, but they make sense.

R string handling interprets your replacement string "\\" as just one
backslash. Your string is received by gsub as "\" - that is, just the
control backslash, NOT the character backslash. gsub is expecting to see
\0, \1, \2, or some other control starting with backslash.

If you want gsub to replace with a backslash character, you have to send
it as "\\". In order to get two backslash characters in an R string, you
have to double them ALL: "\\\\".


You can use "\\" if the pattern is declared as "fixed", via

  sub("a", "\\", "abcdef", fixed = TRUE)

or

  stringr::str_replace("abcdef", fixed("a"), "\\")

My first question was whether there is a sub-like function with a way to 
declare the pattern as a regexp, but the replacement as fixed.  Thanks 
for your answer to my second question.


Duncan Murdoch
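
A short sketch of why fixed = TRUE does not answer the first question: it makes the replacement literal, but it makes the pattern literal as well. The variable names literal and unchanged are only illustrative.

```r
# fixed = TRUE: the replacement is literal, so one backslash is inserted.
literal <- sub("a", "\\", "abcdef", fixed = TRUE)
writeLines(literal)    # \bcdef

# The pattern is literal too: "a|b" is searched for as the three
# characters a, |, b, which do not occur in "abcdef".
unchanged <- sub("a|b", "\\", "abcdef", fixed = TRUE)
writeLines(unchanged)  # abcdef
```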



The string that is output is an R string: the backslashes are escaped
with a backslash, so "\\\\" really means two backslashes.

There are lots of special characters in the search string, but only one
in the replacement string: backslash.

My favorite resource on this topic is
https://www.regular-expressions.info/replacecharacters.html
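
The two layers of escaping described above can be checked directly. This is an illustrative sketch; the variable names one, two, dropped and kept are assumptions made for the example.

```r
# Layer 1: the R parser turns string literals into characters.
one <- "\\"      # R literal for ONE backslash character
two <- "\\\\"    # R literal for TWO backslash characters

nchar(one)       # 1
nchar(two)       # 2

# Layer 2: gsub() interprets backslashes in the replacement.
# A lone trailing backslash escapes nothing and is dropped.
dropped <- gsub("a|b", one, "abcdef")
# An escaped backslash inserts one backslash character per match.
kept <- gsub("a|b", two, "abcdef")

print(dropped)     # [1] "cdef"
print(kept)        # [1] "\\\\cdef"  (print() re-escapes each backslash)
writeLines(kept)   # \\cdef          (the actual characters)
```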





Re: [R] Regexp pattern but fixed replacement?

2024-04-11 Thread Iris Simmons
Hi Duncan,


I only know about sub() and gsub().

There is no way to have pattern be a regular expression and replacement be
a fixed string.

Backslash is the only special character in replacement. If you need a
reference, see this file:
https://github.com/wch/r-source/blob/04650eddd6d844963b6d7aac02bd8d13cbf440d4/src/main/grep.c
particularly functions R_pcre_string_adj and wstring_adj. So just double
the backslashes in replacement and you'll be good to go.




Re: [R] Regexp pattern but fixed replacement?

2024-04-11 Thread Dave Dixon

Backslashes in regex expressions in R are maddening, but they make sense.

R string handling interprets your replacement string "\\" as just one 
backslash. Your string is received by gsub as "\" - that is, just the 
control backslash, NOT the character backslash. gsub is expecting to see 
\0, \1, \2, or some other control starting with backslash.


If you want gsub to replace with a backslash character, you have to send 
it as "\\". In order to get two backslash characters in an R string, you 
have to double them ALL: "\\\\".


The string that is output is an R string: the backslashes are escaped 
with a backslash, so "\\\\" really means two backslashes.


There are lots of special characters in the search string, but only one 
in the replacement string: backslash.


My favorite resource on this topic is
https://www.regular-expressions.info/replacecharacters.html



