Sometimes it is better NOT to use a regular expression and to do things more 
simply. You have a fairly simple example that not only does not need great 
power but may be a pain to do using a very powerful technique, especially if 
you want to play with look-ahead and look-behind.

Assuming you have a line with repeated runs of non-plus text, each followed by 
a run of one to three contiguous plus signs, you can write a short function 
that scans along until it finds a plus sign, then looks ahead until it sees a 
space or the end of the line. It then wraps up all the text through the last 
plus sign and adds a copy to a growing list or other structure. It then skips 
the space(s) and repeats from there.

When done, you have what you want in the format you want.
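
As a rough sketch of that forward scan in R: the function name split_keep_plus 
is made up for illustration, it only handles plus signs (not the minus sign in 
the original question), and it looks ahead to the end of the run of plus signs 
rather than to the next space, which comes to the same thing for data shaped 
like the examples.

```r
# Hypothetical helper: split a line into pieces that each end in a run of
# plus signs, using a plain character-by-character scan (no regex).
split_keep_plus <- function(line) {
  chars <- strsplit(line, "")[[1]]
  n <- length(chars)
  pieces <- character(0)
  start <- 1L   # first character of the piece being collected
  i <- 1L
  while (i <= n) {
    if (chars[i] == "+") {
      # Look ahead to the last plus sign of this run.
      j <- i
      while (j < n && chars[j + 1L] == "+") j <- j + 1L
      pieces <- c(pieces, paste(chars[start:j], collapse = ""))
      # Skip the space(s) after the run and start the next piece there.
      i <- j + 1L
      while (i <= n && chars[i] == " ") i <- i + 1L
      start <- i
    } else {
      i <- i + 1L
    }
  }
  # Keep trailing text that is not followed by any plus sign.
  if (start <= n) pieces <- c(pieces, paste(chars[start:n], collapse = ""))
  pieces
}

split_keep_plus("leucocyten + gramnegatieve staven +++ grampositieve staven ++")
# "leucocyten +"  "gramnegatieve staven +++"  "grampositieve staven ++"
```

Growing a vector with c() is fine for short lines like these; for long input 
you would probably preallocate or collect the pieces in a list.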

A variant on this is to start from the end and scan backwards, stopping each 
time you reach the plus sign that ends the previous piece. Keep what follows 
it, stripped of any leading whitespace. The result is the list in backwards 
order unless you used a stack to hold it.
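
One way the backward scan might look, again only as a sketch: split_from_end is 
a hypothetical name, and rev() at the end stands in for the stack.

```r
# Hypothetical helper: same result as a forward scan, but walking the line
# from its last character to its first.
split_from_end <- function(line) {
  chars <- strsplit(trimws(line), "")[[1]]
  pieces <- character(0)
  end <- length(chars)   # last character of the piece being collected
  i <- end
  while (i >= 1L) {
    # A boundary is a non-plus character sitting right after a plus sign.
    if (i > 1L && chars[i] != "+" && chars[i - 1L] == "+") {
      piece <- paste(chars[i:end], collapse = "")
      pieces <- c(pieces, trimws(piece, which = "left"))
      end <- i - 1L
    }
    i <- i - 1L
  }
  # Whatever is left at the front of the line is the first piece.
  if (end >= 1L) pieces <- c(pieces, paste(chars[1:end], collapse = ""))
  rev(pieces)   # pieces were collected back to front
}

split_from_end("leucocyten + gramnegatieve staven +++ grampositieve staven ++")
# "leucocyten +"  "gramnegatieve staven +++"  "grampositieve staven ++"
```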

There are quite a few variants that might apply, perhaps using functions from 
packages. A dumb example might be to preprocess the string and replace every 
instance of one to three plus signs and an optional space with the same plus 
signs and an added marker character like a ":". A second pass that splits on 
the marker then becomes trivial, and the colons disappear in the split.
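
A minimal sketch of the two-pass idea, assuming the data never contains a ":" 
of its own (the function name split_with_marker is invented here). The second 
pass does not even need a regular expression, since a fixed split on the 
marker is enough.

```r
# Hypothetical helper: pass 1 tags the end of every plus run with ":",
# pass 2 splits on that marker, which removes it again.
split_with_marker <- function(x) {
  # "+ " -> "+:", "+++ " -> "+++:", and a final "++" -> "++:"
  marked <- gsub("(\\+{1,3}) ?", "\\1:", x)
  # strsplit() drops the empty piece after a trailing marker.
  strsplit(marked, ":", fixed = TRUE)[[1]]
}

split_with_marker("leucocyten + gramnegatieve staven +++ grampositieve staven ++")
# "leucocyten +"  "gramnegatieve staven +++"  "grampositieve staven ++"
```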

-----Original Message-----
From: R-help <r-help-boun...@r-project.org> On Behalf Of David Winsemius
Sent: Wednesday, April 12, 2023 6:03 PM
To: Emily Bakker <emilybak...@outlook.com>
Cc: r-help@r-project.org
Subject: Re: [R] Split String in regex while Keeping Delimiter

I thought replacing the spaces following instances of +++, ++, +, - with "\n" 
and then reading with scan should succeed. Like Ivan Krylov, I was fairly sure 
that you meant the minus sign to be "-" rather than "–", but perhaps you were 
using MS Word as an editor, which is inconsistent with effective use of R. If 
so, learn to use a proper programming editor, and in any case learn to post to 
R-help in plain text.

-- 
David

scan(text=gsub("([-+]){1}\\s", "\\1\n", dat), what="", sep="\n")
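
For anyone trying the line above: dat is not defined in the message, so the 
sketch below assumes it is a character vector holding the two example strings, 
with a plain "-" for the minus sign. Note that scan() returns one flat vector 
across all input strings; the strsplit() variant is an added alternative (not 
from the original message) that keeps one vector per string.

```r
# Assumed input: the two example strings, with an ASCII minus sign.
dat <- c("leucocyten + gramnegatieve staven +++ grampositieve staven ++",
         "leucocyten - grampositieve coccen +")

# Put a newline after each plus/minus that is followed by whitespace,
# then read the result back one line at a time.
flat <- scan(text = gsub("([-+])\\s", "\\1\n", dat),
             what = "", sep = "\n", quiet = TRUE)

# Alternative that keeps the pieces grouped by input string.
per_string <- strsplit(gsub("([-+])\\s", "\\1\n", dat), "\n", fixed = TRUE)
```

Here flat has five elements, and per_string is a list of two character 
vectors with three and two elements.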



> On Apr 12, 2023, at 2:29 AM, Emily Bakker <emilybak...@outlook.com> wrote:
> 
> Hello List,
>  
> I have a dataset consisting of strings that I want to split while saving the 
> delimiter.
>  
> Some example data:
> “leucocyten + gramnegatieve staven +++ grampositieve staven ++”
> “leucocyten – grampositieve coccen +”
>  
> I want to split the strings such that I get the following result:
> c(“leucocyten +”,  “gramnegatieve staven +++”,  “grampositieve staven ++”)
> c(“leucocyten –“, “grampositieve coccen +”)
>  
> I have tried strsplit with a regular expression with a positive lookahead, 
> but I am not able to achieve the results that I want.
>  
> I have tried:
> as.list(strsplit(x, split = “(?=[\\+-]{1,3}\\s)+, perl=TRUE)
>  
> Which results in:
> c(“leucocyten “, “+”,  “gramnegatieve staven “, “+”, “+”, “+”,  
> “grampositieve staven ++”)
> c(“leucocyten “, “–“, “grampositieve coccen +”)
>  
>  
> Is there a function or regular expression that will make this possible?
>  
> Kind regards,
> Emily 
>  
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
