Thanks for your contributions. Jonne's solution (after sorting) works fine for my purposes, but it would be useful to have a function that works for any numeric prefix. Another case to include would be a signed number:

x <- c("+12.3.abc", "-0.12xyz")
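In a current version of R, such a function could be sketched as follows. It assumes that a "numeric prefix" means an optional sign, then either digits with an optional fractional part (12, 12., 12.3) or a leading-dot fraction (.12), then an optional exponent (e-7). The name numeric_prefix() is hypothetical, and regmatches() needs R 2.14.0 or later. Strings with no numeric prefix give NA.

```r
# numeric_prefix() is a hypothetical helper, not a base R function.
# Assumed grammar: optional sign, then "12", "12.", "12.3" or ".12",
# then an optional exponent such as "e-7" or "E+03".
numeric_prefix <- function(x) {
  pattern <- "^[+-]?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)([eE][+-]?[0-9]+)?"
  m <- regexpr(pattern, x)                  # -1 for no match, NA for NA input
  hit <- !is.na(m) & m > 0
  out <- rep(NA_real_, length(x))
  out[hit] <- as.numeric(regmatches(x, m))  # regmatches() skips non-matches
  out
}

numeric_prefix(c("+12.3.abc", "-0.12xyz"))                      # 12.3, -0.12
numeric_prefix(c("2.2abc", "2.def", ".2ghi", ".jkl", "2e-7xyz")) # 2.2, 2, 0.2, NA, 2e-07
```

Because the pattern is anchored with ^, only a true prefix counts, so "abc12" gives NA.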
Mike

----- Original Message -----
From: "Peter Dalgaard" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: "R user" <[EMAIL PROTECTED]>; <R-help@stat.math.ethz.ch>; "Mike White" <[EMAIL PROTECTED]>
Sent: Monday, January 31, 2005 11:05 PM
Subject: Re: [R] Extracting a numeric prefix from a string

> (Ted Harding) <[EMAIL PROTECTED]> writes:
>
> > On 31-Jan-05 R user wrote:
> > > You could use something like
> > >
> > > y <- gsub('([0-9]+(.[0-9]+)?)?.*','\\1',x)
> > > as.numeric(y)
> > >
> > > But maybe there's a much nicer way.
> > >
> > > Jonne.
> >
> > I doubt it -- full marks for neat regexp footwork!
>
> Hmm, I'd have to deduct a few points for forgetting to escape the dot...
>
> > x <- "2a4"
> > y <- gsub('([0-9]+(.[0-9]+)?)?.*','\\1',x)
> > y
> [1] "2a4"
> > as.numeric(y)
> [1] NA
> Warning message:
> NAs introduced by coercion
>
> and maybe a few more for using gsub() where sub() suffices.
>
> There are a few more nits to pick, since "2.", ".2", "2e-7" are also
> numbers, but ".", ".e-2" are not. In fact it seems quite hard even to
> handle all cases in, e.g.,
>
> x <- c("2.2abc","2.def",".2ghi",".jkl")
>
> with a single regular expression. The first one that worked for me was
>
> > r <- regexpr('^(([0-9]+\\.?)|(\\.[0-9]+)|([0-9]+\\.[0-9]+))',x)
> > substr(x,r,r+attr(r,"match.length")-1)
> [1] "2.2" "2."  ".2"  ""
>
> but several "obvious" attempts had failed.
>
> The problem is that regular expressions try to find the
> longest match, but not necessarily of subexpressions, so
>
> > sub('(([0-9]+\\.?)|(\\.[0-9]+)|([0-9]+\\.[0-9]+))?.*','\\1',x)
> [1] "2." "2." ".2" ""
>
> even though
>
> > sub('(([0-9]+\\.?)|(\\.[0-9]+)|([0-9]+\\.[0-9]+))','XXX',x)
> [1] "XXXabc" "XXXdef" "XXXghi" ".jkl"
>
> Actually, this one comes pretty close:
>
> > sub('([0-9]*(\\.[0-9]+)?)?.*','\\1',x)
> [1] "2.2" "2"   ".2"  ""
>
> It only loses a trailing dot which is immaterial in the present
> context. However, next try extending the RE to handle an exponent
> part...
>
> --
>    O__  ---- Peter Dalgaard             Blegdamsvej 3
>   c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
>  (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
> ~~~~~~~~~~ - ([EMAIL PROTECTED])                  FAX: (+45) 35327907
>

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
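Peter's remark that regular expressions look for the longest overall match describes R's default POSIX-style engine (TRE in current versions of R). With perl = TRUE, the alternatives are tried left to right and the first branch that succeeds is kept, so branch order matters. The short sketch below contrasts the two engines on his test vector. The helper prefix_via() is hypothetical and simply wraps his regexpr()/substr() idiom.

```r
# Peter's test vector and alternation pattern from the thread.
x   <- c("2.2abc", "2.def", ".2ghi", ".jkl")
pat <- "^(([0-9]+\\.?)|(\\.[0-9]+)|([0-9]+\\.[0-9]+))"

# prefix_via() is hypothetical; it returns the matched prefix, or ""
# when regexpr() reports no match (start and length are then -1).
prefix_via <- function(x, perl) {
  r <- regexpr(pat, x, perl = perl)
  substr(x, r, r + attr(r, "match.length") - 1)
}

prefix_via(x, perl = FALSE)  # POSIX, longest overall: "2.2" "2." ".2" ""
prefix_via(x, perl = TRUE)   # PCRE, first branch wins: "2."  "2." ".2" ""
```

Moving the longest branch ([0-9]+\\.[0-9]+) to the front would make the perl = TRUE version return "2.2" for the first element as well.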