(Ted Harding) <[EMAIL PROTECTED]> writes:

> On 31-Jan-05 R user wrote:
> > You could use something like
> >
> >   y <- gsub('([0-9]+(.[0-9]+)?)?.*','\\1',x)
> >   as.numeric(y)
> >
> > But maybe there's a much nicer way.
> >
> > Jonne.
>
> I doubt it -- full marks for neat regexp footwork!

Hmm, I'd have to deduct a few points for forgetting to escape the dot...

  > x <- "2a4"
  > y <- gsub('([0-9]+(.[0-9]+)?)?.*','\\1',x)
  > y
  [1] "2a4"
  > as.numeric(y)
  [1] NA
  Warning message:
  NAs introduced by coercion

and maybe a few more for using gsub() where sub() suffices.

There are a few more nits to pick, since "2.", ".2", and "2e-7" are also numbers, but "." and ".e-2" are not. In fact, it seems quite hard to handle even all the cases in, e.g.,

  x <- c("2.2abc","2.def",".2ghi",".jkl")

with a single regular expression. The first one that worked for me was

  > r <- regexpr('^(([0-9]+\\.?)|(\\.[0-9]+)|([0-9]+\\.[0-9]+))',x)
  > substr(x,r,r+attr(r,"match.length")-1)
  [1] "2.2" "2." ".2" ""

but several "obvious" attempts had failed. The problem is that the regular expression matcher looks for the longest overall match, but not necessarily the longest match for each subexpression. Here the trailing .* absorbs the rest of the string whatever the group matches, so every choice for the group gives the same overall match and the matcher is free to settle for "2.":

  > sub('(([0-9]+\\.?)|(\\.[0-9]+)|([0-9]+\\.[0-9]+))?.*','\\1',x)
  [1] "2." "2." ".2" ""

even though

  > sub('(([0-9]+\\.?)|(\\.[0-9]+)|([0-9]+\\.[0-9]+))','XXX',x)
  [1] "XXXabc" "XXXdef" "XXXghi" ".jkl"

Actually, this one comes pretty close:

  > sub('([0-9]*(\\.[0-9]+)?)?.*','\\1',x)
  [1] "2.2" "2" ".2" ""

It only loses a trailing dot, which is immaterial in the present context. However, next try extending the RE to handle an exponent part...

--
   Peter Dalgaard             Blegdamsvej 3
   Dept. of Biostatistics     2200 Cph. N
   University of Copenhagen   Denmark
   Ph: (+45) 35327918
   ([EMAIL PROTECTED])        FAX: (+45) 35327907

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
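
Taking up the closing challenge of handling an exponent part, here is a minimal sketch that is not from the original thread. The helper name leading_number() and the test strings are hypothetical, and leading signs are assumed not to occur. It keeps the regexpr()/substr() approach so that only the overall match matters, and the pattern does not depend on which alternative the matcher prefers: one mantissa alternative must start with a digit and the other with a dot, so they can never both match.

```r
# Hypothetical helper: extract the leading numeric literal (optionally with an
# exponent part) from each string; returns "" where no number is found.
leading_number <- function(x) {
  # Mantissa: digits with an optional dot and optional fraction ("2", "2.", "2.2"),
  # or a dot followed by at least one digit (".2").
  # Exponent: e/E, optional sign, at least one digit; only taken if complete.
  pattern <- "^([0-9]+\\.?[0-9]*|\\.[0-9]+)([eE][-+]?[0-9]+)?"
  r <- regexpr(pattern, x)
  # regexpr() returns -1 (with match.length -1) where nothing matched;
  # regmatches() would drop those, so use substr() and map misses to "".
  out <- substr(x, r, r + attr(r, "match.length") - 1)
  out[r == -1] <- ""
  out
}

x <- c("2.2abc", "2.def", ".2ghi", ".jkl", "2e-7xyz", "2e", ".e-2", "12.5E+3abc")
y <- leading_number(x)
y
as.numeric(y)
```

Because the exponent group requires at least one digit, a dangling "e" as in "2e" is left out of the match, and ".e-2" yields "". Since the alternatives are disjoint and the quantifiers are greedy, perl = TRUE should give the same result with this particular pattern.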