subject:"\[R\] regular expression help"

Re: [R] Regular expression help

2017-10-10 Thread David Winsemius


> On Oct 9, 2017, at 6:08 PM, Georges Monette  wrote:
> 
> How about this (I'm showing it as a pipe because it's easier to read that 
> way):
> 
> library(magrittr)
> "f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587" %>%
>   strsplit(' ') %>%
>   unlist %>%
>   sub('^[^/]*/*','',.) %>%
>   sub('^[^/]*/*','',.) %>%
>   paste(collapse = ' ')

I'm old-school R, so I don't find that particularly readable. I read the later 
specification as saying each line begins with an f, so the fourth item after a 
strsplit() call becomes the target.

This seemed more readable to me:

Lines <- 
readLines(url("http://sci.esa.int/science-e/www/object/doc.cfm?fobjectid=54726"))
lines <- Lines[ grepl("^f", Lines) ]

str(lines)
# chr [1:62908] "f 14327 6959 18747" "f 8258 15598 18980" "f 27662 21871 21939" 
...

l2 <- strsplit(lines, " ")  # in that file the separators were spaces
l3 <- sapply(l2[1:3], function(x) { if (length(x) == 4) x[4] else ""
  })
l3
#[1] "18747" "18980" "21939"

# Remove the `[1:3]` to get the entire result.
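The same steps can be tried offline on a few in-memory lines. This is only a minimal sketch: the sample vector stands in for the downloaded file, the "v ..." line is a made-up non-face record, and vapply() is swapped in for sapply() just to pin the return type.

```r
# Hypothetical stand-in for the downloaded file; only the "f" lines matter.
Lines <- c("v 0.1 0.2 0.3",
           "f 14327 6959 18747",
           "f 8258 15598 18980",
           "f 27662 21871 21939")

lines <- Lines[grepl("^f", Lines)]          # keep the face records
l2 <- strsplit(lines, " ", fixed = TRUE)    # separators assumed to be single spaces

# Take the fourth item of each line, or "" if the line has another shape.
l3 <- vapply(l2, function(x) if (length(x) == 4) x[4] else "", character(1))
l3
```

On these three face lines the result should be the same three values shown in the transcript above.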


Best;
David.

> 
> Georges Monette
> 
> -- 
> Georges Monette, PhD P.Stat.(SSC) | Associate Professor. Faculty of Science, 
> Department of Mathematics & Statistics | North 626 Ross Building | York 
> University | 4700 Keele Street, Toronto, ON M3J 1P3 | Telephone: 416-736-5250 
> | Fax: 416-736-5757 | E-Mail: geor...@yorku.ca
> 
> 
> On 2017-10-09 11:02 AM, Duncan Murdoch wrote:
>> I have a file containing "words" like
>> 
>> 
>> a
>> 
>> a/b
>> 
>> a/b/c
>> 
>> where there may be multiple words on a line (separated by spaces).  The a, 
>> b, and c strings can contain non-space, non-slash characters. I'd like to 
>> use gsub() to extract the c strings (which should be empty if there are 
>> none).
>> 
>> A real example is
>> 
>> "f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587"
>> 
>> which I'd like to transform to
>> 
>> " 587 587 587 587"
>> 
>> Another real example is
>> 
>> "f 1067 28680 24462"
>> 
>> which should transform to "   ".
>> 
>> I've tried a few different regexprs, but am unable to find a way to say 
>> "transform words by deleting everything up to and including the 2nd slash" 
>> when there might be zero, one or two slashes.  Any suggestions?
>> 
>> Duncan Murdoch
>> 
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
> 

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   
-Gehm's Corollary to Clarke's Third Law


Re: [R] Regular expression help

2017-10-09 Thread Georges Monette

How about this (I'm showing it as a pipe because it's easier to read 
that way):


library(magrittr)
"f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587" %>%
  strsplit(' ') %>%
  unlist %>%
  sub('^[^/]*/*','',.) %>%
  sub('^[^/]*/*','',.) %>%
  paste(collapse = ' ')
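For readers without magrittr, the same two-step idea can be sketched in base R. The regular expression is unchanged from the pipe above; only the intermediate variable names (words, result) are introduced here for illustration.

```r
x <- "f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587"

words <- unlist(strsplit(x, " ", fixed = TRUE))  # one element per word
words <- sub("^[^/]*/*", "", words)              # drop the "a" field and its slash
words <- sub("^[^/]*/*", "", words)              # drop the "b" field and its slash
result <- paste(words, collapse = " ")
result
```

Words with fewer than two slashes, such as the leading "f", end up empty, which is what leaves the leading space in the pasted result.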

Georges Monette

--
Georges Monette, PhD P.Stat.(SSC) | Associate Professor. Faculty of Science, 
Department of Mathematics & Statistics | North 626 Ross Building | York 
University | 4700 Keele Street, Toronto, ON M3J 1P3 | Telephone: 416-736-5250 | 
Fax: 416-736-5757 | E-Mail: geor...@yorku.ca


On 2017-10-09 11:02 AM, Duncan Murdoch wrote:

I have a file containing "words" like


a

a/b

a/b/c

where there may be multiple words on a line (separated by spaces).  
The a, b, and c strings can contain non-space, non-slash characters. 
I'd like to use gsub() to extract the c strings (which should be empty 
if there are none).


A real example is

"f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587"

which I'd like to transform to

" 587 587 587 587"

Another real example is

"f 1067 28680 24462"

which should transform to "   ".

I've tried a few different regexprs, but am unable to find a way to 
say "transform words by deleting everything up to and including the 
2nd slash" when there might be zero, one or two slashes.  Any 
suggestions?


Duncan Murdoch





Re: [R] Regular expression help

2017-10-09 Thread Duncan Murdoch

On 09/10/2017 12:06 PM, William Dunlap wrote:
"(^| +)([^/ ]*/?){0,2}", with the first "*" replaced by "+" would be a 
bit better.

Thanks!  I think I actually need the *, because theoretically the b part 
of the word could be empty, i.e. "a//c" would be legal and should become 
"c".
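The empty-b case can be checked word by word. The sketch below is not the single gsub() call discussed above; it uses a hypothetical helper, strip_field(), applied twice to words that have already been split on spaces, and contrasts "/?" (at most one slash) with "/*" (any number of slashes).

```r
# Hypothetical helper (not from the thread): strip one field plus at most ONE slash.
strip_field <- function(w) sub("^[^/]*/?", "", w)

words <- c("a", "a/b", "a/b/c", "a//c")
kept  <- strip_field(strip_field(words))   # apply twice: drop "a/", then "b/"
kept

# With "/*" the first pass swallows both slashes of "a//c", so "c" is lost
# on the second pass.
greedy <- sub("^[^/]*/*", "", sub("^[^/]*/*", "", words))
greedy
```

Under these assumptions "a//c" keeps its "c" only in the "/?" variant, which is the behaviour asked for here.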

Duncan Murdoch

Bill Dunlap
TIBCO Software
wdunlap tibco.com 

On Mon, Oct 9, 2017 at 8:50 AM, William Dunlap wrote:

 > x <- "f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587"
 > gsub("(^| *)([^/ ]*/?){0,2}", "\\1", x)
[1] " 587 587 587 587"
 > y <- "aa aa/ aa/bb aa/bb/ aa/bb/cc aa/bb/cc/ aa/bb/cc/dd aa/bb/cc/dd/"
 > gsub("(^| *)([^/ ]*/?){0,2}", "\\1", y)
[1] "    cc cc/ cc/dd cc/dd/"

Bill Dunlap
TIBCO Software
wdunlap tibco.com 

On Mon, Oct 9, 2017 at 8:02 AM, Duncan Murdoch wrote:

I have a file containing "words" like

a

a/b

a/b/c

where there may be multiple words on a line (separated by
spaces).  The a, b, and c strings can contain non-space,
non-slash characters. I'd like to use gsub() to extract the c
strings (which should be empty if there are none).

A real example is

"f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587"

which I'd like to transform to

" 587 587 587 587"

Another real example is

"f 1067 28680 24462"

which should transform to "   ".

I've tried a few different regexprs, but am unable to find a way
to say "transform words by deleting everything up to and
including the 2nd slash" when there might be zero, one or two
slashes.  Any suggestions?

Duncan Murdoch



Re: [R] Regular expression help

2017-10-09 Thread Duncan Murdoch


On 09/10/2017 11:23 AM, Ulrik Stervbo wrote:

Hi Duncan,

why not split on / and take the correct elements? It is not as elegant 
as regex but could do the trick.


Thanks for the suggestion.  There are likely many thousands of lines of 
data like the two real examples (which had about 5000 and 6 lines 
respectively), so I was thinking that would be too slow, as it would 
involve nested strsplit() calls.  But in fact, it's not so bad, so I 
might go with it.  Here's a stab at it:


lines <- readLines("http://sci.esa.int/science-e/www/object/doc.cfm?fobjectid=54726")


l2 <- strsplit(lines, " ")
l3 <- lapply(l2, function(x) {
y <- strsplit(x, "/")
sapply(y, function(z) if (length(z) == 3) z[3] else "")
  })
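A self-contained variant of the nested strsplit() idea, run on the two sample lines from the original question instead of the downloaded file. The helper name third_fields() and the final paste step are additions for illustration, and vapply() is used in place of sapply() only to fix the return type.

```r
lines <- c("f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587",
           "f 1067 28680 24462")

# For the words of one line, return the third slash-separated field,
# or "" when a word does not have exactly three fields.
third_fields <- function(line_words) {
  parts <- strsplit(line_words, "/", fixed = TRUE)
  vapply(parts, function(z) if (length(z) == 3) z[3] else "", character(1))
}

l3  <- lapply(strsplit(lines, " ", fixed = TRUE), third_fields)
out <- vapply(l3, paste, character(1), collapse = " ")  # back to one string per line
out
```

Because strsplit() keeps an empty middle field, a word such as "a//c" still splits into three pieces, so its "c" part is retained.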

Duncan



Best,
Ulrik

On Mon, 9 Oct 2017 at 17:03 Duncan Murdoch wrote:


I have a file containing "words" like


a

a/b

a/b/c

where there may be multiple words on a line (separated by spaces).  The
a, b, and c strings can contain non-space, non-slash characters. I'd
like to use gsub() to extract the c strings (which should be empty if
there are none).

A real example is

"f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587"

which I'd like to transform to

" 587 587 587 587"

Another real example is

"f 1067 28680 24462"

which should transform to "   ".

I've tried a few different regexprs, but am unable to find a way to say
"transform words by deleting everything up to and including the 2nd
slash" when there might be zero, one or two slashes.  Any suggestions?

Duncan Murdoch





Re: [R] Regular expression help

2017-10-09 Thread William Dunlap via R-help

"(^| +)([^/ ]*/?){0,2}", with the first "*" replaced by "+" would be a bit
better.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Oct 9, 2017 at 8:50 AM, William Dunlap  wrote:

> > x <- "f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587"
> > gsub("(^| *)([^/ ]*/?){0,2}", "\\1", x)
> [1] " 587 587 587 587"
> > y <- "aa aa/ aa/bb aa/bb/ aa/bb/cc aa/bb/cc/ aa/bb/cc/dd aa/bb/cc/dd/"
> > gsub("(^| *)([^/ ]*/?){0,2}", "\\1", y)
> [1] "    cc cc/ cc/dd cc/dd/"
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Mon, Oct 9, 2017 at 8:02 AM, Duncan Murdoch 
> wrote:
>
>> I have a file containing "words" like
>>
>>
>> a
>>
>> a/b
>>
>> a/b/c
>>
>> where there may be multiple words on a line (separated by spaces).  The
>> a, b, and c strings can contain non-space, non-slash characters. I'd like
>> to use gsub() to extract the c strings (which should be empty if there are
>> none).
>>
>> A real example is
>>
>> "f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587"
>>
>> which I'd like to transform to
>>
>> " 587 587 587 587"
>>
>> Another real example is
>>
>> "f 1067 28680 24462"
>>
>> which should transform to "   ".
>>
>> I've tried a few different regexprs, but am unable to find a way to say
>> "transform words by deleting everything up to and including the 2nd slash"
>> when there might be zero, one or two slashes.  Any suggestions?
>>
>> Duncan Murdoch
>>
>
>
>


Re: [R] Regular expression help

2017-10-09 Thread William Dunlap via R-help

> x <- "f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587"
> gsub("(^| *)([^/ ]*/?){0,2}", "\\1", x)
[1] " 587 587 587 587"
> y <- "aa aa/ aa/bb aa/bb/ aa/bb/cc aa/bb/cc/ aa/bb/cc/dd aa/bb/cc/dd/"
> gsub("(^| *)([^/ ]*/?){0,2}", "\\1", y)
[1] "    cc cc/ cc/dd cc/dd/"


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Oct 9, 2017 at 8:02 AM, Duncan Murdoch 
wrote:

> I have a file containing "words" like
>
>
> a
>
> a/b
>
> a/b/c
>
> where there may be multiple words on a line (separated by spaces).  The a,
> b, and c strings can contain non-space, non-slash characters. I'd like to
> use gsub() to extract the c strings (which should be empty if there are
> none).
>
> A real example is
>
> "f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587"
>
> which I'd like to transform to
>
> " 587 587 587 587"
>
> Another real example is
>
> "f 1067 28680 24462"
>
> which should transform to "   ".
>
> I've tried a few different regexprs, but am unable to find a way to say
> "transform words by deleting everything up to and including the 2nd slash"
> when there might be zero, one or two slashes.  Any suggestions?
>
> Duncan Murdoch
>


Re: [R] Regular expression help

2017-10-09 Thread peter dalgaard


> On 9 Oct 2017, at 17:02 , Duncan Murdoch  wrote:
> 
> I have a file containing "words" like
> 
> 
> a
> 
> a/b
> 
> a/b/c
> 
> where there may be multiple words on a line (separated by spaces).  The a, b, 
> and c strings can contain non-space, non-slash characters. I'd like to use 
> gsub() to extract the c strings (which should be empty if there are none).
> 
> A real example is
> 
> "f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587"
> 
> which I'd like to transform to
> 
> " 587 587 587 587"
> 
> Another real example is
> 
> "f 1067 28680 24462"
> 
> which should transform to "   ".
> 
> I've tried a few different regexprs, but am unable to find a way to say 
> "transform words by deleting everything up to and including the 2nd slash" 
> when there might be zero, one or two slashes.  Any suggestions?
> 

I think you might need something like this:

s <- "f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587"
l <- strsplit(s, " ")[[1]]
pat <- "[[:alnum:]]*/[[:alnum:]]*/([[:alnum:]]*)"
paste(ifelse(grepl(pat,l),gsub(pat, "\\1", l), ""), collapse=" ")

-pd

> Duncan Murdoch
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com


Re: [R] Regular expression help

2017-10-09 Thread Eric Berger

Hi Duncan,
You can try this:

library(readr)
f <- function(s) {
  t <- unlist(readr::tokenize(paste0(gsub(" ",",",s),"\n",collapse="")))
  i <- grep("[a-zA-Z0-9]*/[a-zA-Z0-9]*/",t)
  u <- sub("[a-zA-Z0-9]*/[a-zA-Z0-9]*/","",t[i])
  paste0(u,collapse=" ")
}

f("f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587")
# "587 587 587 587"

f("f 1067 28680 24462")
# ""

HTH,
Eric


On Mon, Oct 9, 2017 at 6:23 PM, Ulrik Stervbo 
wrote:

> Hi Duncan,
>
> why not split on / and take the correct elements? It is not as elegant as
> regex but could do the trick.
>
> Best,
> Ulrik
>
> On Mon, 9 Oct 2017 at 17:03 Duncan Murdoch 
> wrote:
>
> > I have a file containing "words" like
> >
> >
> > a
> >
> > a/b
> >
> > a/b/c
> >
> > where there may be multiple words on a line (separated by spaces).  The
> > a, b, and c strings can contain non-space, non-slash characters. I'd
> > like to use gsub() to extract the c strings (which should be empty if
> > there are none).
> >
> > A real example is
> >
> > "f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587"
> >
> > which I'd like to transform to
> >
> > " 587 587 587 587"
> >
> > Another real example is
> >
> > "f 1067 28680 24462"
> >
> > which should transform to "   ".
> >
> > I've tried a few different regexprs, but am unable to find a way to say
> > "transform words by deleting everything up to and including the 2nd
> > slash" when there might be zero, one or two slashes.  Any suggestions?
> >
> > Duncan Murdoch
> >
>
>


Re: [R] Regular expression help

2017-10-09 Thread Ulrik Stervbo

Hi Duncan,

why not split on / and take the correct elements? It is not as elegant as
regex but could do the trick.

Best,
Ulrik

On Mon, 9 Oct 2017 at 17:03 Duncan Murdoch  wrote:

> I have a file containing "words" like
>
>
> a
>
> a/b
>
> a/b/c
>
> where there may be multiple words on a line (separated by spaces).  The
> a, b, and c strings can contain non-space, non-slash characters. I'd
> like to use gsub() to extract the c strings (which should be empty if
> there are none).
>
> A real example is
>
> "f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587"
>
> which I'd like to transform to
>
> " 587 587 587 587"
>
> Another real example is
>
> "f 1067 28680 24462"
>
> which should transform to "   ".
>
> I've tried a few different regexprs, but am unable to find a way to say
> "transform words by deleting everything up to and including the 2nd
> slash" when there might be zero, one or two slashes.  Any suggestions?
>
> Duncan Murdoch
>


[R] Regular expression help

2017-10-09 Thread Duncan Murdoch


I have a file containing "words" like


a

a/b

a/b/c

where there may be multiple words on a line (separated by spaces).  The 
a, b, and c strings can contain non-space, non-slash characters. I'd 
like to use gsub() to extract the c strings (which should be empty if 
there are none).


A real example is

"f 147/1315/587 2820/1320/587 3624/1321/587 1852/1322/587"

which I'd like to transform to

" 587 587 587 587"

Another real example is

"f 1067 28680 24462"

which should transform to "   ".

I've tried a few different regexprs, but am unable to find a way to say 
"transform words by deleting everything up to and including the 2nd 
slash" when there might be zero, one or two slashes.  Any suggestions?


Duncan Murdoch


Re: [R] regular expression help

2017-06-08 Thread Ashim Kapoor

Dear Enrico,

Many thanks and Best Regards,

Ashim.

On Thu, Jun 8, 2017 at 5:11 PM, Enrico Schumann 
wrote:

>
> Quoting Ashim Kapoor:
>
>
> Dear All,
>>
>> My query is:
>>
>> Do we always need to use perl = TRUE option when doing ignore.case=TRUE?
>>
>> A small example :
>>
>> my_text =
>> "RECOVERY OFFICER-II\nDEBTS RECOVERY TRIBUNAL-III\n  RC No. 162/2015\nSBI
>> VS RAMESH GUPTA.\nDated: 01.03.2016   Item no.01\n
>> Present:   Ms. Sonakshi, the proxy counsel for Ms. Usha Singh, the counsel
>> for ARCIL.\nNone for the CDs.\n  The counsel for the CHFI
>> submitted that the matter has been assigned to ARCIL and deed of
>> assignment, application for substituting the name and vakalatnama has been
>> filed vide diary no. 1454 dated 08.02.2016\nIn the application it has been
>> prayed that ARCIL may be substituted in place of SBI for the purpose of
>> further proceedings in the matter. Request allowed.\nThe proxy counsel for
>> CHFI further requested to issue demand notice thereby mentioning the name
>> of ARCIL. Request allowed.\nRegistry is directed to issue fresh demand
>> notice mentioning the name of ARCIL.\nCHFI is directed to file status of
>> the mortgaged property as well as other assets of the CDs.\nList the case
>> on 28.03.2016.\n  (SUJEET KUMAR)\nRECOVERY OFFICER-II."
>>
>> My regular expression is:
>>
>> parties_present_start_1=
>> regexpr("\n.*Present.*\n.*\n",my_text,ignore.case=TRUE,perl=T)
>>
>> parties_present_start_2=
>> regexpr("\n.*Present.*\n.*\n",my_text,ignore.case=TRUE)
>>
>> > parties_present_start_1
>> [1] 138
>> attr(,"match.length")
>> [1] 123
>> attr(,"useBytes")
>> [1] TRUE
>> > parties_present_start_2
>> [1] 20
>> attr(,"match.length")
>> [1] 949
>> attr(,"useBytes")
>> [1] TRUE
>>
>> Why do I see the correct result only in the first case?
>>
>> Best Regards,
>> Ashim
>>
>>
> In Perl, '.' matches anything but a newline.
>
> In R, '.' matches any character.
>
>   test <- "hello\n1"
>   regexpr(".*[0-9]", test)
>   ## [1] 1
>   ## attr(,"match.length")
>   ## [1] 7
>   ## attr(,"useBytes")
>   ## [1] TRUE
>
>   regexpr(".*[0-9]", test, perl = TRUE)
>   ## [1] 7
>   ## attr(,"match.length")
>   ## [1] 1
>   ## attr(,"useBytes")
>   ## [1] TRUE
>
>
> --
> Enrico Schumann
> Lucerne, Switzerland
> http://enricoschumann.net
>
>


Re: [R] regular expression help

2017-06-08 Thread Enrico Schumann



Quoting Ashim Kapoor:


Dear All,

My query is:

Do we always need to use perl = TRUE option when doing ignore.case=TRUE?

A small example :

my_text =
"RECOVERY OFFICER-II\nDEBTS RECOVERY TRIBUNAL-III\n  RC No. 162/2015\nSBI
VS RAMESH GUPTA.\nDated: 01.03.2016   Item no.01\n
Present:   Ms. Sonakshi, the proxy counsel for Ms. Usha Singh, the counsel
for ARCIL.\nNone for the CDs.\n  The counsel for the CHFI
submitted that the matter has been assigned to ARCIL and deed of
assignment, application for substituting the name and vakalatnama has been
filed vide diary no. 1454 dated 08.02.2016\nIn the application it has been
prayed that ARCIL may be substituted in place of SBI for the purpose of
further proceedings in the matter. Request allowed.\nThe proxy counsel for
CHFI further requested to issue demand notice thereby mentioning the name
of ARCIL. Request allowed.\nRegistry is directed to issue fresh demand
notice mentioning the name of ARCIL.\nCHFI is directed to file status of
the mortgaged property as well as other assets of the CDs.\nList the case
on 28.03.2016.\n  (SUJEET KUMAR)\nRECOVERY OFFICER-II."

My regular expression is:

parties_present_start_1=
regexpr("\n.*Present.*\n.*\n",my_text,ignore.case=TRUE,perl=T)

parties_present_start_2=
regexpr("\n.*Present.*\n.*\n",my_text,ignore.case=TRUE)


parties_present_start_1

[1] 138
attr(,"match.length")
[1] 123
attr(,"useBytes")
[1] TRUE

parties_present_start_2

[1] 20
attr(,"match.length")
[1] 949
attr(,"useBytes")
[1] TRUE




Why do I see the correct result only in the first case?

Best Regards,
Ashim



In Perl (perl = TRUE), '.' matches anything but a newline.

With R's default regex engine, '.' matches any character, including a newline.

  test <- "hello\n1"
  regexpr(".*[0-9]", test)
  ## [1] 1
  ## attr(,"match.length")
  ## [1] 7
  ## attr(,"useBytes")
  ## [1] TRUE

  regexpr(".*[0-9]", test, perl = TRUE)
  ## [1] 7
  ## attr(,"match.length")
  ## [1] 1
  ## attr(,"useBytes")
  ## [1] TRUE
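If line-wise matching is wanted without switching engines, one option is to exclude newlines explicitly. The sketch below reuses the test string from above; the variable names are illustrative, and (?s) is the PCRE inline flag that makes '.' match newlines, shown only for contrast.

```r
test <- "hello\n1"

# Default engine: "[^\n]" never crosses a line break, unlike "."
m_default <- regexpr("[^\n]*[0-9]", test)

# PCRE: "." already stops at newlines; (?s) switches that off
m_perl   <- regexpr(".*[0-9]", test, perl = TRUE)
m_dotall <- regexpr("(?s).*[0-9]", test, perl = TRUE)

c(default = as.integer(m_default),
  perl    = as.integer(m_perl),
  dotall  = as.integer(m_dotall))
```

With this input the first two calls should agree with each other, while the (?s) variant should start at the beginning of the string, like the default-engine ".*" call above.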


--
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net


[R] regular expression help

2017-06-08 Thread Ashim Kapoor

Dear All,

My query is:

Do we always need to use perl = TRUE option when doing ignore.case=TRUE?

A small example :

my_text =
"RECOVERY OFFICER-II\nDEBTS RECOVERY TRIBUNAL-III\n  RC No. 162/2015\nSBI
VS RAMESH GUPTA.\nDated: 01.03.2016   Item no.01\n
Present:   Ms. Sonakshi, the proxy counsel for Ms. Usha Singh, the counsel
for ARCIL.\nNone for the CDs.\n  The counsel for the CHFI
submitted that the matter has been assigned to ARCIL and deed of
assignment, application for substituting the name and vakalatnama has been
filed vide diary no. 1454 dated 08.02.2016\nIn the application it has been
prayed that ARCIL may be substituted in place of SBI for the purpose of
further proceedings in the matter. Request allowed.\nThe proxy counsel for
CHFI further requested to issue demand notice thereby mentioning the name
of ARCIL. Request allowed.\nRegistry is directed to issue fresh demand
notice mentioning the name of ARCIL.\nCHFI is directed to file status of
the mortgaged property as well as other assets of the CDs.\nList the case
on 28.03.2016.\n  (SUJEET KUMAR)\nRECOVERY OFFICER-II."

My regular expression is:

parties_present_start_1=
regexpr("\n.*Present.*\n.*\n",my_text,ignore.case=TRUE,perl=T)

parties_present_start_2=
regexpr("\n.*Present.*\n.*\n",my_text,ignore.case=TRUE)

> parties_present_start_1
[1] 138
attr(,"match.length")
[1] 123
attr(,"useBytes")
[1] TRUE
> parties_present_start_2
[1] 20
attr(,"match.length")
[1] 949
attr(,"useBytes")
[1] TRUE
>

Why do I see the correct result only in the first case?

Best Regards,
Ashim


Re: [R] regular expression help

2014-06-30 Thread C Lin

Hi, Bill

Thank you so much for your kind explanation. It's very clear, even for someone 
like me.
I should have remembered this, but somehow I forgot that [] has a special meaning in 
regular expressions.

Lin


 From: wdun...@tibco.com
 Date: Sun, 29 Jun 2014 13:16:26 -0700
 Subject: Re: [R] regular expression help
 To: bac...@hotmail.com
 CC: dwinsem...@comcast.net; r-help@r-project.org

 What's the difference between [:space:]+ and [[:space:]]+ ?

 The pattern '[:space:]' matches any of ':', 's', 'p', 'a', 'c', and
 'e' (the second colon is superfluous). I.e., it has no magic meaning.
 Inside of [] it does have a special meaning.

 The pattern '[[:space:]]' matches a space, a newline, and other
 whitespace characters. The pattern '[a-c[:space:]z[:digit:]]' matches
 'a', 'b', 'c', any decimal digit, and any whitespace character.
 Bill Dunlap
 TIBCO Software
 wdunlap tibco.com


 On Fri, Jun 27, 2014 at 6:27 AM, C Lin bac...@hotmail.com wrote:
 Thank you all for your help.

 Bill, thanks for making it compact and I did mean any amount of whitespace.

 To break it down, so I know why this pattern works:
 The first parenthesis means that before AARSD1 there can be
 ^: the start of the string (nothing before it)
 |: or
 //: a double slash, or
 [[:space:]]+: one or more whitespace characters

 For the second parenthesis:
 $: the end of the string (nothing after it)

 Does this sound correct?

 I missed the fact that I need the ^ and $, and I always wrote [:space:]+ instead 
 of [[:space:]]+.
 What's the difference between [:space:]+ and [[:space:]]+ ?

 Thanks so much!
 Lin

 
 From: wdun...@tibco.com
 Date: Fri, 27 Jun 2014 02:35:54 -0700
 Subject: Re: [R] regular expression help
 To: dwinsem...@comcast.net
 CC: bac...@hotmail.com; r-help@r-project.org

 You can use parentheses to factor out the common string in David's
 pattern, as in
 grep(value=TRUE, "(^|//|[[:space:]]+)AARSD1($|//|[[:space:]]+)", test)

 (By 'whitespace' I could not tell if you meant any amount of
 whitespace or a single
 whitespace character. I use '+' to match one or more whitespace characters.)

 Bill Dunlap
 TIBCO Software
 wdunlap tibco.com


 On Thu, Jun 26, 2014 at 10:12 PM, David Winsemius
 dwinsem...@comcast.net wrote:

 On Jun 26, 2014, at 6:11 PM, C Lin wrote:

 Hi Duncan,

 Thanks for trying to help. Sorry for not being clear.
 The string I'd like to get is 'AARSD1'
 It can be followed or preceded by white space or // or nothing

 so, from test <- c('AARSD11','AARSD1-','AARSD1//','AARSD1 //','//AARSD1','AARSD1');

 I want to match only 'AARSD1//','AARSD1 //','//AARSD1','AARSD1'

 Perhaps you want just

 grepl('^AARSD1//$|^AARSD1 //$|^//AARSD1$|^AARSD1$', test)
 [1] FALSE FALSE TRUE TRUE TRUE TRUE

 --
 David.



 Thanks,
 Lin

 
 From: dulca...@bigpond.com
 To: bac...@hotmail.com; r-help@r-project.org
 Subject: RE: [R] regular expression help
 Date: Fri, 27 Jun 2014 10:59:29 +1000

 Hi

 You only have a vector of length 5 and I am not quite sure of the string 
 you
 are testing
 so try this

 grep('[/]*\\AARSD1\\[/]*',test)

 Duncan

 Duncan Mackay
 Department of Agronomy and Soil Science
 University of New England
 Armidale NSW 2351
 Email: home: mac...@northnet.com.au

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] 
 On
 Behalf Of C Lin
 Sent: Friday, 27 June 2014 10:05
 To: r-help@r-project.org
 Subject: [R] regular expression help

 Dear R users,

 I need to match a string. It can be followed or preceded by whitespace 
 or //
 or nothing.
 How do I code it in R?

 For example:
 test <- c('AARSD11','AARSD1-','AARSD1//','AARSD1 //','//AARSD1');
 grep('AARSD1(\\s*//*)',test);

 should return 3,4,5 and 6.




 David Winsemius
 Alameda, CA, USA


  

Re: [R] regular expression help

2014-06-29 Thread William Dunlap

 What's the difference between [:space:]+ and [[:space:]]+ ?

The pattern '[:space:]' matches any of ':', 's', 'p', 'a', 'c', and
'e' (the second colon is superfluous).  I.e., it has no magic meaning.
Inside of [] it does have a special meaning.

The pattern '[[:space:]]' matches a space, a newline, and other
whitespace characters.  The pattern '[a-c[:space:]z[:digit:]]' matches
'a', 'b', 'c', any decimal digit, and any whitespace character.
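The difference is easy to see with gsub(). This is a small sketch using R's default regex engine and a made-up input string; perl = TRUE is deliberately not used, since PCRE is expected to reject a bare '[:space:]'.

```r
s <- "a space: ok"   # made-up example string

# Single brackets: a plain character set containing ':', 's', 'p', 'a', 'c', 'e',
# so those characters are deleted and the spaces are kept.
no_class <- gsub("[:space:]", "", s)

# Double brackets: the POSIX whitespace class inside a bracket expression,
# so only the spaces are deleted.
with_class <- gsub("[[:space:]]", "", s)

no_class
with_class
```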
Bill Dunlap
TIBCO Software
wdunlap tibco.com



Re: [R] regular expression help

2014-06-27 Thread William Dunlap

You can use parentheses to factor out the common string in David's
pattern, as in
   grep(value=TRUE, "(^|//|[[:space:]]+)AARSD1($|//|[[:space:]]+)", test)

(By 'whitespace' I could not tell if you meant any amount of whitespace
or a single whitespace character.  I use '+' to match one or more
whitespace characters.)
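
As a quick check of the factored pattern, here is a small sketch run against the six-element `test` vector posted earlier in the thread. The names `pat`, `matched` and `idx` are introduced here only for readability:

```r
# The six-element vector from C Lin's follow-up message.
test <- c('AARSD11', 'AARSD1-', 'AARSD1//', 'AARSD1 //', '//AARSD1', 'AARSD1')

# Left group: start of string, '//', or whitespace.
# Right group: end of string, '//', or whitespace.
pat <- "(^|//|[[:space:]]+)AARSD1($|//|[[:space:]]+)"

matched <- grep(pat, test, value = TRUE)  # the matching strings
idx <- grep(pat, test)                    # their positions in 'test'
```

With this input, `idx` should pick out elements 3 to 6, which are the four strings the original poster asked for.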

Bill Dunlap
TIBCO Software
wdunlap tibco.com



Re: [R] regular expression help

2014-06-27 Thread C Lin

Thank you all for your help.

Bill, thanks for making it compact and I did mean any amount of whitespace.

To break it down, so I know why this pattern works:
The first parenthesis means that before AARSD1 there can be
^: the start of the string (nothing before it)
|: or
//: a double slash, or
[[:space:]]+: one or more whitespace characters

For the second parenthesis:
$: the end of the string (nothing after it)

Does this sound correct?
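
One way to see why the ^ and $ alternatives are needed is to compare an unanchored pattern with the grouped one. This is only a sketch: `test` is repeated from the thread, and `loose` and `strict` are hypothetical names.

```r
test <- c('AARSD11', 'AARSD1-', 'AARSD1//', 'AARSD1 //', '//AARSD1', 'AARSD1')

# Unanchored: 'AARSD1' also occurs inside 'AARSD11' and 'AARSD1-',
# so every element matches.
loose <- grepl("AARSD1", test)

# Grouped alternatives constrain what may sit on either side of AARSD1.
strict <- grepl("(^|//|[[:space:]]+)AARSD1($|//|[[:space:]]+)", test)
```

Here `loose` is TRUE everywhere, while `strict` rejects the first two elements.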

I missed the fact that I need the ^ and $, and I always use [:space:]+ instead of
[[:space:]]+.
What's the difference between [:space:]+ and [[:space:]]+ ?

Thanks so much!
Lin



Re: [R] regular expression help

2014-06-27 Thread arun

Hi,
You may try:
test[!grepl("(?<=AARSD1)[-\\d]", test, perl=TRUE)]

A.K.
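
A sketch of this exclusion approach, assuming the intended pattern is a lookbehind, (?<=AARSD1). That is an assumption, since the archive may have dropped characters from the posted code. `test` is repeated from the thread; `bad` and `kept` are illustrative names.

```r
test <- c('AARSD11', 'AARSD1-', 'AARSD1//', 'AARSD1 //', '//AARSD1', 'AARSD1')

# Flag strings where AARSD1 is immediately followed by '-' or a digit.
# Lookbehind requires the PCRE engine, hence perl = TRUE.
bad <- grepl("(?<=AARSD1)[-\\d]", test, perl = TRUE)

# Keep everything that was not flagged.
kept <- test[!bad]
```

Note that this only rules out a hyphen or a digit after AARSD1. A string such as 'AARSD1x' would still be kept, so it is narrower than the whitelist patterns elsewhere in the thread.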





[R] regular expression help

2014-06-26 Thread C Lin

Dear R users,

I need to match a string. It can be followed or preceded by whitespace or // or 
nothing.
How do I code it in R?

For example:
test <- c('AARSD11','AARSD1-','AARSD1//','AARSD1 //','//AARSD1');
grep('AARSD1(\\s*//*)',test);

should return 3,4,5 and 6.

Thanks in advance for your help.

Lin   

Re: [R] regular expression help

2014-06-26 Thread Duncan Mackay

Hi

You only have a vector of length 5, and I am not quite sure of the string you
are testing, so try this:

grep('[/]*\\AARSD1\\[/]*',test)

Duncan

Duncan Mackay
Department of Agronomy and Soil Science
University of New England
Armidale NSW 2351
Email: home: mac...@northnet.com.au


Re: [R] regular expression help

2014-06-26 Thread C Lin

Hi Duncan,

Thanks for trying to help. Sorry for not being clear.
The string I'd like to get is 'AARSD1'
It can be followed or preceded by white space or // or nothing

so, from test <- c('AARSD11','AARSD1-','AARSD1//','AARSD1 //','//AARSD1','AARSD1');

I want to match only 'AARSD1//','AARSD1 //','//AARSD1','AARSD1'

Thanks,
Lin
 


Re: [R] regular expression help

2014-06-26 Thread David Winsemius


On Jun 26, 2014, at 6:11 PM, C Lin wrote:

> Hi Duncan,
>
> Thanks for trying to help. Sorry for not being clear.
> The string I'd like to get is 'AARSD1'
> It can be followed or preceded by white space or // or nothing
>
> so, from test <- c('AARSD11','AARSD1-','AARSD1//','AARSD1 //','//AARSD1','AARSD1');
>
> I want to match only 'AARSD1//','AARSD1 //','//AARSD1','AARSD1'

Perhaps you want just

grepl('^AARSD1//$|^AARSD1 //$|^//AARSD1$|^AARSD1$', test)
[1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE
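
The trailing $ on the last alternative matters. A sketch comparing the pattern with and without it, where `test` is the six-element vector from the thread and `without_dollar` and `with_dollar` are illustrative names:

```r
test <- c('AARSD11', 'AARSD1-', 'AARSD1//', 'AARSD1 //', '//AARSD1', 'AARSD1')

# Last alternative is '^AARSD1' with no '$': it matches any string that
# merely starts with AARSD1, including 'AARSD11' and 'AARSD1-'.
without_dollar <- grepl('^AARSD1//$|^AARSD1 //$|^//AARSD1$|^AARSD1', test)

# Last alternative is '^AARSD1$': only the exact string 'AARSD1' matches it.
with_dollar <- grepl('^AARSD1//$|^AARSD1 //$|^//AARSD1$|^AARSD1$', test)
```

Without the final $, every element is reported as a match; with it, only the four wanted strings are.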

-- 
David.

 



David Winsemius
Alameda, CA, USA


Re: [R] regular expression help to extract specific strings from text

2010-04-01 Thread Tony B

Thank you guys, both solutions work great! Seems I have two new
packages to investigate :)

Regards,
Tony Breyal


[R] regular expression help to extract specific strings from text

2010-03-31 Thread Tony B

Dear all,

Let's say I have the following:

x <- c("Eve: Going to try something new today...",
       "Adam: Hey @Eve, how are you finding R? #rstats",
       "Eve: @Adam, It's awesome, so much better at statistics that\n#Excel ever was! @Cain  @Able disagree though :(",
       "Adam: @Eve I'm sure they'll sort it out :)",
       "blahblah")
x
[1] "Eve: Going to try something new today..."
[2] "Adam: Hey @Eve, how are you finding R? #rstats"
[3] "Eve: @Adam, It's awesome, so much better at statistics that\n#Excel ever was! @Cain  @Able disagree though :("
[4] "Adam: @Eve I'm sure they'll sort it out :)"
[5] "blahblah"

I would like to come up with a data frame which looks like this
(pulling out the usernames and #tags):

data.frame(Msg = x, Source = c("Eve", "Adam", "Eve", "Adam", NA),
           Mentions = c(NA, "Eve", "Adam, Cain, Able", "Eve", NA),
           HashTags = c(NA, "rstats", "Excel", NA, NA))

The best I can do so far is:

source <- lapply(x, function (x) {
   tmp <- strsplit(x, ":", fixed = TRUE)
   if(length(tmp[[1]]) < 2) {
     tmp <- c(NA, tmp)
   }
   return(tmp[[1]][1])
 } )
source <- unlist(source)

[1] "Eve"  "Adam" "Eve"  "Adam" NA

I can't work out how to extract the usernames starting with '@' or the
#tags. I can identify them using gsub and replace them, but I don't
know how to just extract those terms only, e.g. sort of the opposite
of the following

gsub("@([A-Za-z0-9_]+)", "@[...]", x)
[1] "Eve: Going to try something new today..."
[2] "Adam: Hey @[...], how are you finding R? #rstats"
[3] "Eve: @[...], It's awesome, so much better at statistics that\n#Excel ever was! @[...]  @[...] disagree though :("
[4] "Adam: @[...] I'm sure they'll sort it out :)"
[5] "blahblah"

and

gsub("#([A-Za-z0-9_]+)", "#[...]", x)
[1] "Eve: Going to try something new today..."
[2] "Adam: Hey @Eve, how are you finding R? #[...]"
[3] "Eve: @Adam, It's awesome, so much better at statistics that\n#[...] ever was! @Cain  @Able disagree though :("
[4] "Adam: @Eve I'm sure they'll sort it out :)"
[5] "blahblah"

I hope that makes sense, and thank you kindly in advance for your
time.
Tony Breyal


Re: [R] regular expression help to extract specific strings from text

2010-03-31 Thread Gabor Grothendieck

strapply in gsubfn can extract matches based on content which seems to
be what you want:

library(gsubfn)

f <- function(...) sapply(list(...), paste, collapse = ", ")

DF <- data.frame(x,
  Source = strapply(x, "^(\\w+):", c, simplify = f),
  Mentions = strapply(x, "@(\\w+)", c, simplify = f),
  HashTags = strapply(x, "#(\\w+)", c, simplify = f))

DF[DF == ""] <- NA
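
For comparison, a similar table can probably be built with base R only, using gregexpr() and regmatches(). This is a sketch rather than part of the original answer: the helper `extract_tags` and the name `src` are hypothetical, and the copy of `x` below uses a space where the posted string had a line break.

```r
x <- c("Eve: Going to try something new today...",
       "Adam: Hey @Eve, how are you finding R? #rstats",
       "Eve: @Adam, It's awesome, so much better at statistics that #Excel ever was! @Cain  @Able disagree though :(",
       "Adam: @Eve I'm sure they'll sort it out :)",
       "blahblah")

# Hypothetical helper: find every token that starts with 'prefix',
# drop the prefix character, and join the hits with ", " (NA if none).
extract_tags <- function(text, prefix) {
  pattern <- paste0(prefix, "[A-Za-z0-9_]+")
  hits <- regmatches(text, gregexpr(pattern, text))
  vapply(hits, function(h) {
    if (length(h) == 0) {
      NA_character_
    } else {
      paste(substring(h, 2), collapse = ", ")
    }
  }, character(1), USE.NAMES = FALSE)
}

# Source is the word before the first colon, or NA when there is none.
src <- ifelse(grepl("^\\w+:", x), sub("^(\\w+):.*$", "\\1", x), NA)

DF <- data.frame(Msg = x,
                 Source = src,
                 Mentions = extract_tags(x, "@"),
                 HashTags = extract_tags(x, "#"),
                 stringsAsFactors = FALSE)
```

The design choice here is to return NA directly from the helper, so no separate step is needed to turn empty strings into NA.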




Re: [R] regular expression help to extract specific strings from text

2010-03-31 Thread hadley wickham


You can do this pretty easily with the stringr package:

library(stringr)
str_extract_all(x, "@[a-zA-Z]+")
sapply(str_extract_all(x, "@[a-zA-Z]+"), str_c, collapse = ", ")

Hadley



-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


[R] Regular expression help

2009-12-07 Thread Ramya


Hi there,

I have strings like the ones below, and I want to extract 9831019 from the
first one. I used the regular expression \d+, but it only finds the 7 and
returns. A number of this type (9831019) can appear in any part of the string
and is always more than 5 digits long, and I want to use that as a condition.

UV7C11-F9-E1 MCS#9831019
MCS Lot #9512516


How do I go about it?

Ramya
-- 
View this message in context: 
http://n4.nabble.com/Regular-expression-help-tp954834p954834.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Regular expression help

2009-12-07 Thread Marc Schwartz


On Dec 7, 2009, at 5:04 PM, Ramya wrote:

> Hi there
>
> I have a string like this i want to extract 9831019 from this string i used
> a regular expresion \d+ by which i can only make it to see 7 and returns.
> This type of number(9831019) appears in any part of the string and is
> definitely more than 5 digits all the time and i want to give that as a
> condition
>
> UV7C11-F9-E1 MCS#9831019
> MCS Lot #9512516
>
> how do i go abt it
>
> Ramya



Is the double quote actually part of your data or just a typo?

I am not sure that it might matter in the end, but here is one approach:

x
[1] "UV7C11-F9-E1 MCS#9831019" "MCS Lot #9512516\""

Note that I have the double quote included in the second value, which
is escaped when printed here.


gsub("^.*#([0-9]*).*$", "\\1", x)
[1] "9831019" "9512516"


This uses gsub() to extract the value within the parens in the regex  
using a back reference.


Any characters from the beginning of the line to the '#' are dropped,  
as are any characters after the numeric sequence to the end of the line.
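
The same back-reference idea can be checked in a few lines. This is a minimal sketch: `x` is rebuilt here from the two example strings, the trailing double quote on the second value is an assumption based on the printed output, and `ids` and `no_hash` are illustrative names.

```r
x <- c("UV7C11-F9-E1 MCS#9831019", "MCS Lot #9512516\"")

# "^.*#" consumes everything up to the '#', "([0-9]*)" captures the digits,
# and ".*$" consumes the rest; "\\1" keeps only the captured group.
ids <- sub("^.*#([0-9]*).*$", "\\1", x)

# A string without '#' does not match at all, so it is returned unchanged.
no_hash <- sub("^.*#([0-9]*).*$", "\\1", "no number here")
```

Because unmatched strings come back unchanged, it may be worth checking with grepl() first when the input is not guaranteed to contain a '#'.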


See ?gsub for more information.

HTH,

Marc Schwartz


Re: [R] Regular expression help

2009-12-07 Thread Phil Spector


Ramya -
   Try


strings = c('UV7C11-F9-E1 MCS#9831019','MCS Lot #9512516')
sub('^.*?(\\d{5,}).*?$','\\1',strings,perl=TRUE)

[1] "9831019" "9512516"

The regular expression finds the first string of five or 
more numbers in the strings.  Since you said the numbers could
occur anywhere in the string, you could have provided some 
examples where the numbers weren't the last part of the string.
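
To cover cases where the number is not at the end, or is missing altogether, one option is regexpr() with regmatches(). This is a sketch under assumptions: the third and fourth strings are invented for illustration, and the pattern uses six or more digits to match the "more than 5 digits" wording, whereas the pattern above uses five or more.

```r
strings <- c("UV7C11-F9-E1 MCS#9831019", "MCS Lot #9512516",
             "Lot 1234567 shipped",      # made-up: number in the middle
             "only 123 here")            # made-up: no long number

# regexpr() gives the position of the first run of 6+ digits, or -1 if none.
m <- regexpr("[0-9]{6,}", strings)

# Start with NA everywhere, then fill in the strings that had a match.
ids <- rep(NA_character_, length(strings))
ids[m != -1] <- regmatches(strings, m)
```

Unlike sub(), this leaves NA for strings with no qualifying number instead of returning the whole string.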


- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu



Re: [R] Regular expression help

2009-12-07 Thread Gabor Grothendieck

If I understand correctly you wish to extract strings of digits more
than 5 characters long:

s <- c("UV7C11-F9-E1 MCS#9831019", "MCS Lot #9512516")
library(gsubfn)
strapply(s, "\\d{6,}", c)

Depending on what you want to get back you might wish to add the
simplify=TRUE argument to strapply, as well.  See
http://gsubfn.googlecode.com


