On Tue, 4 Apr 2006, Gabor Grothendieck wrote:

> gsubfn in package gsubfn can do this.  See the examples
> in ?gsubfn

Thanks.  gsubfn looks useful, but may be overkill for this,
and it isn't vectorized.  To do what strsplit(keep=T)
would do I think you need to do something like:

   > findMatches<-function(strings, pattern){
        lapply(strings, function(string){
           v <- character()
           # collect each match in v as gsubfn visits it
           gsubfn(pattern, function(x,...)v<<-c(v,x), string)
           v})
     }
   > number.pattern <- "[-+]?(([0-9]+(\\.[0-9]*)?)|(\\.[0-9]+))([eE][+-]?[0-9]+)?"
   > findMatches(c("12;34:56,89,,12", "1.2, .4, 1., 1e3"), number.pattern)
   [[1]]
   [1] "12" "34" "56" "89" "12"

   [[2]]
   [1] "1.2" ".4"  "1."  "1e3"

Is this worth encapsulating in a standard R function?
If so, is doing it via an extra argument to strsplit()
a reasonable way to do it?

   > strsplit(c("12;34:56,89,,12", "1.2, .4, 1., 1e3"), number.pattern, keep=T)
   [[1]]:
   [1] "12" "34" "56" "89" "12"

   [[2]]:
   [1] "1.2" ".4"  "1."  "1e3"

> On 4/4/06, Bill Dunlap <[EMAIL PROTECTED]> wrote:
> > strsplit() is a convenient way to get a
> > list of items from a string when you
> > have a regular expression for what is not
> > an item.  E.g.,
> >
> >   > strsplit("1.2, 34, 1.7e-2", split="[ ,] *")
> >   [[1]]:
> >   [1] "1.2"    "34"     "1.7e-2"
> >
> > However, sometimes it is more convenient to
> > give a pattern for the items you do want.
> > E.g., suppose you want to pull all the numbers
> > out of a string which contains a mix of numbers
> > and words.  Making a pattern for what a
> > number is is simpler than making a pattern
> > for what may come between the numbers.
> >   > number.pattern <-
> >   "[-+]?(([0-9]+(\\.[0-9]*)?)|(\\.[0-9]+))([eE][+-]?[0-9]+)?"
> >
> > I propose adding a keep=FALSE argument to
> > strsplit() to do this.  If keep is FALSE,
> > then the split argument matches the stuff to
> > omit from the output; if keep is TRUE then
> > split matches the stuff to put into the
> > output.
> > Then we could do the following to
> > get a list of all the numbers in a string
> > (done in a version of strsplit() I'm working on
> > for S-PLUS):
> >
> >   > strsplit("1.2, 34, 1.7e-2", split=number.pattern,keep=TRUE)
> >   [[1]]:
> >   [1] "1.2"    "34"     "1.7e-2"
> >
> >   > strsplit("Ibuprofin 200mg", split=number.pattern,keep=TRUE)
> >   [[1]]:
> >   [1] "200"
> >
> > Is this a reasonable thing to want strsplit to do?
> > Is this a reasonable parameterization of it?

----------------------------------------------------------------------------
Bill Dunlap
Insightful Corporation
bill at insightful dot com
360-428-8146

 "All statements in this message represent the opinions of the author and do
 not necessarily reflect Insightful Corporation policy or position."

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
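
For comparison, the keep=TRUE behaviour discussed in this thread can probably
be approximated in a vectorized way with base R alone. The following is only a
sketch: it assumes an R version that provides regmatches() (2.14.0 or later),
and the helper name strsplitKeep is hypothetical, not an existing function or
a proposed signature.

```r
# Hypothetical helper (not part of base R): for each input string, return
# the substrings that match `pattern`, i.e. roughly what the proposed
# strsplit(..., keep=TRUE) would return.
# Assumes regmatches() is available (R >= 2.14.0).
strsplitKeep <- function(strings, pattern) {
    # gregexpr() finds every match in each string (vectorized over strings);
    # regmatches() extracts the matched text as a list of character vectors.
    regmatches(strings, gregexpr(pattern, strings))
}

number.pattern <- "[-+]?(([0-9]+(\\.[0-9]*)?)|(\\.[0-9]+))([eE][+-]?[0-9]+)?"

strsplitKeep(c("12;34:56,89,,12", "1.2, .4, 1., 1e3"), number.pattern)
strsplitKeep("Ibuprofin 200mg", number.pattern)
```

Because gregexpr() is already vectorized over its input, no lapply() or
accumulator variable is needed here. A string with no match yields a
zero-length character vector in the result list.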