Perhaps an easier way would be to throw away the offending text at the end of each string, rather than matching all possible numeric formulations at the beginning, that is:
sub("\\.*[[:alpha:]]+$", "", x)

Easier to read, if nothing else, and it allows for 2e-7 as a valid number. This does, however, assume (I think correctly) that there aren't numbers in the middle of the string, e.g. 2a3b.

Robert

-----Original Message-----
From: Peter Dalgaard [mailto:[EMAIL PROTECTED]
Sent: Monday, January 31, 2005 6:05 PM
To: [EMAIL PROTECTED]
Cc: R user; R-help@stat.math.ethz.ch; Mike White
Subject: Re: [R] Extracting a numeric prefix from a string

(Ted Harding) <[EMAIL PROTECTED]> writes:

> On 31-Jan-05 R user wrote:
> > You could use something like
> >
> > y <- gsub('([0-9]+(.[0-9]+)?)?.*','\\1',x)
> > as.numeric(y)
> >
> > But maybe there's a much nicer way.
> >
> > Jonne.
>
> I doubt it -- full marks for neat regexp footwork!

Hmm, I'd have to deduct a few points for forgetting to escape the dot...

> x <- "2a4"
> y <- gsub('([0-9]+(.[0-9]+)?)?.*','\\1',x)
> y
[1] "2a4"
> as.numeric(y)
[1] NA
Warning message:
NAs introduced by coercion

and maybe a few more for using gsub() where sub() suffices.

There are a few more nits to pick, since "2.", ".2", "2e-7" are also numbers, but ".", ".e-2" are not. In fact it seems quite hard even to handle all cases in, e.g.,

x <- c("2.2abc","2.def",".2ghi",".jkl")

with a single regular expression. The first one that worked for me was

> r <- regexpr('^(([0-9]+\\.?)|(\\.[0-9]+)|([0-9]+\\.[0-9]+))',x)
> substr(x,r,r+attr(r,"match.length")-1)
[1] "2.2" "2."  ".2"  ""

but several "obvious" attempts had failed. The problem is that regular expressions try to find the longest match, but not necessarily of subexpressions, so

> sub('(([0-9]+\\.?)|(\\.[0-9]+)|([0-9]+\\.[0-9]+))?.*','\\1',x)
[1] "2." "2." ".2" ""

even though

> sub('(([0-9]+\\.?)|(\\.[0-9]+)|([0-9]+\\.[0-9]+))','XXX',x)
[1] "XXXabc" "XXXdef" "XXXghi" ".jkl"

Actually, this one comes pretty close:

> sub('([0-9]*(\\.[0-9]+)?)?.*','\\1',x)
[1] "2.2" "2"   ".2"  ""

It only loses a trailing dot which is immaterial in the present context.
However, next try extending the RE to handle an exponent part...

--
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])             FAX: (+45) 35327907

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
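
For what it's worth, the exponent case left open in the quoted message might be handled along the following lines. This is only a sketch under assumptions that are not from the thread: the helper name num_prefix is made up for illustration, perl = TRUE is chosen so that the alternatives are tried in the order written, and signs, "Inf", hex notation and the like are deliberately ignored. The exponent is taken only when it is complete (e, optional sign, at least one digit), so something like "2efg" falls back to "2".

```r
# Hypothetical helper (not from the thread): return the leading numeric
# prefix of each string, or "" when there is none.
num_prefix <- function(x) {
  # Mantissa: digits with an optional dot and fraction, or a bare fraction.
  # Exponent: optional, and only matched when it is complete.
  pat <- "^([0-9]+\\.?[0-9]*|\\.[0-9]+)([eE][-+]?[0-9]+)?"
  m <- regexpr(pat, x, perl = TRUE)
  out <- rep("", length(x))
  # regmatches() returns only the matched pieces, so fill them in by position
  out[m > 0] <- regmatches(x, m)
  out
}

x <- c("2.2abc", "2.def", ".2ghi", ".jkl", "2e-7xyz", "2a4")
num_prefix(x)
# [1] "2.2"  "2."   ".2"   ""     "2e-7" "2"
```

as.numeric() can then be applied to the result; the empty string for ".jkl" comes out as NA.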