strsplit() is a convenient way to get a list of items from a string when you have a regular expression for what is not an item. E.g.,
> strsplit("1.2, 34, 1.7e-2", split="[ ,] *")
[[1]]:
[1] "1.2" "34" "1.7e-2"

However, sometimes it is more convenient to give a pattern for the items you do want. E.g., suppose you want to pull all the numbers out of a string that contains a mix of numbers and words. Writing a pattern for what a number looks like is simpler than writing one for whatever may come between the numbers:

> number.pattern <- "[-+]?(([0-9]+(\\.[0-9]*)?)|(\\.[0-9]+))([eE][+-]?[0-9]+)?"

I propose adding a keep=FALSE argument to strsplit() to do this. If keep is FALSE, the split argument matches the stuff to omit from the output; if keep is TRUE, split matches the stuff to put into the output. Then we could do the following to get a list of all the numbers in a string (done in a version of strsplit() I'm working on for S-PLUS):

> strsplit("1.2, 34, 1.7e-2", split=number.pattern, keep=TRUE)
[[1]]:
[1] "1.2" "34" "1.7e-2"
> strsplit("Ibuprofen 200mg", split=number.pattern, keep=TRUE)
[[1]]:
[1] "200"

Is this a reasonable thing to want strsplit() to do? Is this a reasonable parameterization of it?

----------------------------------------------------------------------------
Bill Dunlap
Insightful Corporation
bill at insightful dot com
360-428-8146

"All statements in this message represent the opinions of the author and do not necessarily reflect Insightful Corporation policy or position."
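The keep=TRUE behavior proposed in this post can be approximated in R versions that provide regmatches() (added in R 2.14.0) by combining it with gregexpr(), which locates every match of a pattern. The wrapper strsplit2() below is hypothetical: it is not part of base R or S-PLUS, and is only a sketch of the proposed keep= parameterization, falling back to the ordinary strsplit() when keep is FALSE.

```r
# Hypothetical wrapper (not in base R or S-PLUS) sketching the proposed
# strsplit(x, split, keep = FALSE) parameterization.
strsplit2 <- function(x, split, keep = FALSE) {
  if (keep) {
    # keep = TRUE: 'split' describes the items we WANT.
    # gregexpr() finds the start and length of every match in each string;
    # regmatches() pulls those matched substrings out, one list element
    # per input string (character(0) when a string has no match).
    regmatches(x, gregexpr(split, x))
  } else {
    # keep = FALSE: 'split' describes the separators to DROP,
    # which is exactly what strsplit() already does.
    strsplit(x, split)
  }
}

number.pattern <- "[-+]?(([0-9]+(\\.[0-9]*)?)|(\\.[0-9]+))([eE][+-]?[0-9]+)?"

# Usage: the same calls as in the post, against the hypothetical wrapper
strsplit2("1.2, 34, 1.7e-2", split = "[ ,] *")
strsplit2("1.2, 34, 1.7e-2", split = number.pattern, keep = TRUE)
strsplit2("Ibuprofen 200mg", split = number.pattern, keep = TRUE)
```

As with strsplit(), the result has one list element per input string, so a string containing no numbers yields character(0) in its slot rather than being dropped.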