Re: [R] regex for "[2440810] / www.tinyurl.com/hgaco4fha3"

2018-02-20 Thread Bert Gunter
These are always kind of fun, not least because of the variety of different
replies that "work" at least somewhat. Here's mine:

> stringa <- "[2440810] / www.tinyurl.com/hgaco4fha3"

> sub("^(.+)www\\.(.+)\\.com.+","\\1\\2",stringa)
[1] "[2440810] / tinyurl"

Note the use of doubled backslashes to escape the regex metacharacters. See
?regexp for details.
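
As a rough illustration of why the escaping matters (a sketch in base R; the test strings and variable names below are made up for this example, not taken from the original question): an unescaped dot matches any character, while "\\." in R source reaches the regex engine as \. and matches only a literal dot.

```r
# "\\." in R source is a two-character string: a backslash and a dot
escaped_len <- nchar("\\.")                               # 2

# An unescaped dot matches any character, so "wwwXtinyurl" matches too
loose_match   <- grepl("www.tinyurl", "wwwXtinyurl")      # TRUE
strict_miss   <- grepl("www\\.tinyurl", "wwwXtinyurl")    # FALSE
strict_match  <- grepl("www\\.tinyurl", "www.tinyurl")    # TRUE

# A bracket expression is another way to write a literal dot
bracket_match <- grepl("www[.]tinyurl", "www.tinyurl")    # TRUE
```

One caveat about the one-liner above: the trailing `.+` needs at least one character after ".com", so a string that ends exactly at ".com" would come back unchanged.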

Cheers,
Bert





On Tue, Feb 20, 2018 at 9:19 PM, Omar André Gonzáles Díaz <
oma.gonza...@gmail.com> wrote:

> Hi, I need help for cleaning this:
>
> "[2440810] / www.tinyurl.com/hgaco4fha3"
>
> My desired output is:
>
> "[2440810] / tinyurl".
>
> My attempts:
>
> stringa <- "[2440810] / www.tinyurl.com/hgaco4fha3"
>
> b <- sub('^www.', '', stringa) # wanted to get rid of the "www." part, up to the first dot.
>
> b <- sub('[.].*', '', b) # clean from ".com" until the end.
>
> b # returns "[2440810] / www"
>
> Thank you.
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] regex for "[2440810] / www.tinyurl.com/hgaco4fha3"

2018-02-20 Thread Ulrik Stervbo
Hi Omar,

You are almost there! Your first substitution looks for 'www' at the
start of the string, followed by any single character. The string does not
start with 'www', so that call does nothing. Your second substitution then
removes everything from the first '.' it finds, which is the one after 'www'.
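
To see this concretely, the two original calls can be run one at a time (a small sketch; the names step1 and step2 are only for illustration):

```r
x <- "[2440810] / www.tinyurl.com/hgaco4fha3"

# '^' anchors the pattern at the start of the string, which begins with "[",
# so nothing matches and the input comes back unchanged
step1 <- sub('^www.', '', x)
identical(step1, x)       # TRUE

# The first '.' in the string is the one right after "www",
# so everything from that dot onward is removed
step2 <- sub('[.].*', '', step1)
step2                     # "[2440810] / www"
```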

What you want to do is
x <- "[2440810] / www.tinyurl.com/hgaco4fha3"

y <- sub('www\\.', '', x) # Note the escape of '.'
y <- sub('\\..*', '', y)
y

Alternatively, all in one (if all addresses are .com):
gsub("(www\\.|\\.com.*)", "", x)

And the same using stringr
library(stringr)
x %>% str_replace_all("(www\\.|\\.com.*)", "")
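
If not every address ends in .com, one possible variation is to capture the first label after "www." and drop the rest. This is only a sketch: the vector `urls` and its second and third entries are made-up examples, and the pattern assumes the part to keep contains no '.' or '/'.

```r
# Hypothetical inputs: only the first string comes from the original question
urls <- c("[2440810] / www.tinyurl.com/hgaco4fha3",
          "[123] / www.example.org/abc",
          "[9] / no-url-here")

# Match "www.", capture everything up to the next '.' or '/',
# then match the following dot and the rest of the string.
# The whole match is replaced by the captured group.
cleaned <- sub("www\\.([^./]+)\\..*", "\\1", urls)
cleaned
# "[2440810] / tinyurl" "[123] / example" "[9] / no-url-here"
```

sub() is vectorised over its input, and strings that do not match the pattern are returned unchanged.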

HTH
Ulrik




[R] regex for "[2440810] / www.tinyurl.com/hgaco4fha3"

2018-02-20 Thread Omar André Gonzáles Díaz
Hi, I need help for cleaning this:

"[2440810] / www.tinyurl.com/hgaco4fha3"

My desired output is:

"[2440810] / tinyurl".

My attempts:

stringa <- "[2440810] / www.tinyurl.com/hgaco4fha3"

b <- sub('^www.', '', stringa) # wanted to get rid of the "www." part, up to the first dot.

b <- sub('[.].*', '', b) # clean from ".com" until the end.

b # returns "[2440810] / www"

Thank you.
