On Jun 12, 2008, at 5:06 PM, Marc Schwartz wrote:

on 06/12/2008 03:46 PM Hua Li wrote:
Hi,
I'm looking for a way to pick out the numbers that are buried in a long character string. For example,

outtree.new="(((B:1204.25,E:1204.25):7581.11,F:8785.36):8353.85,C:17139.21);"
num.char = unlist(strsplit(unlist(strsplit(unlist(strsplit(unlist(strsplit(unlist(strsplit(outtree.new,")",fixed=TRUE)),"(",fixed=TRUE)),":",fixed=TRUE)),",",fixed=TRUE)),";",fixed=TRUE))
num.vec=as.numeric(num.char[1:(length(num.char)-1)])
num.char
# "B" "1204.25" "E" "1204.25" "7581.11" "F" "8785.36" "8353.85" "C" "17139.21" ""
num.vec
# NA 1204.25 NA 1204.25 7581.11 NA 8785.36 8353.85 NA 17139.21

This gets me the numbers, such as 1204.25, 7581.11, etc., but with a warning message which reads:

Warning message:
NAs introduced by coercion
Is there a way to get around this? Thanks!
Hua

Your code above is overly and needlessly complicated, which makes it difficult to debug.
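To illustrate that point, the five nested strsplit() calls can be collapsed into a single split on a character class, followed by a filter that keeps only number-like tokens, so as.numeric() never sees "B", "E", etc. and no coercion warning is raised. This is a minimal sketch in base R, assuming every number in the string is an unsigned decimal:

```r
outtree.new <- "(((B:1204.25,E:1204.25):7581.11,F:8785.36):8353.85,C:17139.21);"

# One split on any of the delimiters ( ) : , ; instead of five nested strsplit() calls
tokens <- strsplit(outtree.new, "[():,;]")[[1]]

# Keep only tokens that look like unsigned decimals; labels and empty strings are dropped
is_num <- grepl("^[0-9]+(\\.[0-9]+)?$", tokens)
num.vec <- as.numeric(tokens[is_num])
num.vec
# [1]  1204.25  1204.25  7581.11  8785.36  8353.85 17139.21
```

The filter step is what removes the warning: the labels are discarded before conversion instead of being coerced to NA.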

I would take an approach whereby you use gsub() to strip non-numeric characters from the input character vector and then use scan() to read the remaining numbers:

> Vec <- scan(textConnection(gsub("[^0-9\\.]+", " ", outtree.new)))
Read 6 items

> Vec
[1]  1204.25  1204.25  7581.11  8785.36  8353.85 17139.21

> str(Vec)
 num [1:6] 1204 1204 7581 8785 8354 ...


The result of using gsub() above is:

> gsub("[^0-9\\.]+", " ", outtree.new)
[1] " 1204.25 1204.25 7581.11 8785.36 8353.85 17139.21 "


That gives you a character vector which can then be passed to scan() as a textConnection().
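The same gsub()/scan() idea can be wrapped in a small function. This is a sketch only; the name extract_numbers() is hypothetical, and it swaps the explicit textConnection() for scan()'s text argument, which reads from a string directly so there is no connection to manage:

```r
outtree.new <- "(((B:1204.25,E:1204.25):7581.11,F:8785.36):8353.85,C:17139.21);"

# Hypothetical helper name; wraps the gsub() + scan() approach
extract_numbers <- function(s) {
  # Turn every run of characters other than digits and '.' into a single space
  cleaned <- gsub("[^0-9.]+", " ", s)
  # scan() reads whitespace-separated doubles; quiet = TRUE suppresses "Read 6 items"
  scan(text = cleaned, quiet = TRUE)
}

extract_numbers(outtree.new)
# [1]  1204.25  1204.25  7581.11  8785.36  8353.85 17139.21
```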

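Another approach would be to split on runs of characters that are neither digits nor dots. Because the string starts with such a run ("(((B:"), the first piece is an empty string, which has to be dropped before conversion:

x <- strsplit(outtree.new, "[^\\d.]+", perl=TRUE)[[1]]
as.numeric(x[x != ""])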


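Use "[^-+\\d.]+" if your numbers might be signed (the hyphen goes first so that it is read as a literal character rather than as part of a range). This does assume that dots occur only as decimal points and that + and - occur only as signs.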
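Both split patterns can be wrapped in one function. This is a minimal sketch; the name split_numbers() and its signed argument are hypothetical and not part of base R. It drops the empty leading piece explicitly, so it works whether or not the string starts with a separator:

```r
# Hypothetical helper; signed = TRUE also keeps leading + and - signs
split_numbers <- function(s, signed = FALSE) {
  # Hyphen placed right after ^ so PCRE treats it as a literal character
  pattern <- if (signed) "[^-+\\d.]+" else "[^\\d.]+"
  pieces <- strsplit(s, pattern, perl = TRUE)[[1]]
  # A separator at the very start of s yields an empty first piece; drop it
  as.numeric(pieces[pieces != ""])
}

outtree.new <- "(((B:1204.25,E:1204.25):7581.11,F:8785.36):8353.85,C:17139.21);"
split_numbers(outtree.new)
split_numbers("x=-3.5;y=+2", signed = TRUE)  # c(-3.5, 2)
split_numbers("x=-3.5;y=+2")                 # c(3.5, 2): signs are stripped
```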

Hua, did you want to keep track of which number belongs to B, which to C, etc.?
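If the labels matter, one option (a different technique from the ones above, using regmatches() with gregexpr()) is to extract each "label:length" pair and build a named vector. This is a sketch only, assuming tip labels consist of letters alone. The unlabeled internal branch lengths (7581.11 and 8353.85, which follow a closing parenthesis rather than a label) are deliberately skipped:

```r
outtree.new <- "(((B:1204.25,E:1204.25):7581.11,F:8785.36):8353.85,C:17139.21);"

# Find every "letters:number" pair, e.g. "B:1204.25"
pairs <- regmatches(outtree.new, gregexpr("[A-Za-z]+:[0-9.]+", outtree.new))[[1]]

# Split each pair at the colon into its label and its value
parts <- strsplit(pairs, ":", fixed = TRUE)
labels <- vapply(parts, `[`, character(1), 1)
values <- as.numeric(vapply(parts, `[`, character(1), 2))

# Named numeric vector: names are the tip labels
branch.len <- setNames(values, labels)
branch.len
```

This yields B = 1204.25, E = 1204.25, F = 8785.36 and C = 17139.21, so a value can be looked up by name, e.g. branch.len["C"].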

See ?gsub, ?regex, ?textConnection and ?scan for more information.

HTH,

Marc Schwartz


Haris Skiadas
Department of Mathematics and Computer Science
Hanover College

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
