[R] regular expressions : extracting numbers
Hello all,

I have a vector of character strings containing letters, numbers, and symbols. What I wish to do is obtain a vector of the same length with just the numbers.

A quick example, an extract of the original vector:

"lema, rb 2%" "rb 2%" "rb 3%" "rb 4%" "rb 3%" "rb 2%,mineuse" "rb" "rb" "rb 12" "rb" "rj 30%" "rb" "rb" "rb 25%" "rb" "rb" "rb" "rj, rb"

and the type of thing I wish to end up with:

"2" "2" "3" "4" "3" "2" "" "" "12" "" "30" "" "" "25" "" "" "" ""

or, instead of "", NA would be acceptable (actually it would almost be better for me).

Anyway, I've been battling with gsub() and the like, but I'm drowning in regular expressions, despite a few hours of looking at Perl tutorials. So if anyone can help me out, it would be greatly appreciated!

Thanks very much in advance.

David Gouache
Arvalis - Institut du Végétal
Station de La Minière
78280 Guyancourt
Tel: 01.30.12.96.22 / Port: 06.86.08.94.32

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] regular expressions : extracting numbers
Bonjour David,

What about one of these:

R> gsub("[^[:digit:]]", "", x)

or, using Perl regular expressions:

R> gsub("\\D", "", x, perl = TRUE)

Cheers,

Romain

--
Mango Solutions
data analysis that delivers
Tel: +44(0) 1249 467 467
Fax: +44(0) 1249 467 468
Mob: +44(0) 7813 526 123
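As a quick check on the two suggestions above, here is a minimal sketch showing that, at least for plain ASCII input like the example, both patterns return the same result. The vector x is a subset retyped from the original post, and the names posix, perl and same are only illustrative.

```r
# Subset of the example vector, retyped from the original post
x <- c("lema, rb 2%", "rb 2%", "rb 12", "rb", "rj, rb")

# POSIX character class: delete every character that is not a digit
posix <- gsub("[^[:digit:]]", "", x)

# Perl shorthand: \D matches any non-digit character
perl <- gsub("\\D", "", x, perl = TRUE)

# Compare the two results element by element
same <- identical(posix, perl)

print(posix)
print(same)
```

Strings without any digit come back as "", which is where the NA replacement asked for in the question would come in.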
Re: [R] regular expressions : extracting numbers
Is this what you want:

> x
 [1] "lema, rb 2%"   "rb 2%"         "rb 3%"         "rb 4%"         "rb 3%"         "rb 2%,mineuse"
 [7] "rb"            "rb"            "rb 12"         "rb"            "rj 30%"        "rb"
[13] "rb"            "rb 25%"        "rb"            "rb"            "rb"            "rj, rb"
> gsub("[^0-9]*([0-9]*)[^0-9]*", "\\1", x)
[1] "2"  "2"  "3"  "4"  "3"  "2"  ""   ""   "12" ""   "30" ""   ""   "25" ""   ""   ""   ""

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
Re: [R] regular expressions : extracting numbers
# read one element per line (the separator value was lost in the archive; a newline is assumed)
> chv <- scan(what = "character", sep = "\n")
# then copy the text from your message to the clipboard and paste it into the R console
> chv
 [1] "lema, rb 2%"   "rb 2%"         "rb 3%"         "rb 4%"
 [5] "rb 3%"         "rb 2%,mineuse" "rb"            "rb"
 [9] "rb 12"         "rb"            "rj 30%"        "rb"
[13] "rb"            "rb 25%"        "rb"            "rb"
[17] "rb"            "rj, rb"

# actual replacements:
# replace non-digits with nothing
> chv.digits <- gsub("[^0-9]", "", chv)
> chv.digits
 [1] "2"  "2"  "3"  "4"  "3"  "2"  ""   ""   "12" ""   "30" ""   ""   "25" ""
[16] ""   ""   ""

# replace empty strings with NA
> chv.digits[chv.digits == ""] <- NA
> chv.digits
 [1] "2"  "2"  "3"  "4"  "3"  "2"  NA   NA   "12" NA   "30" NA   NA   "25" NA
[16] NA   NA   NA
Re: [R] regular expressions : extracting numbers
Dear David,

does the following work for you?

sVec <- c("lema, rb 2%", "rb 2%", "rb 3%", "rb 4%", "rb 3%", "rb 2%,mineuse", "rb", "rb", "rb 12", "rb", "rj 30%", "rb", "rb", "rb 25%", "rb", "rb", "rb", "rj, rb")
reVec <- regexpr("[[:digit:]]+", sVec)  # see ?regex for details on '[:digit:]' and '+'
substr(sVec, start = reVec, stop = reVec + attr(reVec, "match.length") - 1)  # see ?substr for details

Christian
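One difference between the regexpr() approach and the gsub() answers may matter if a string can hold more than one number. None of the posted strings do, so the input below is made up purely for illustration, as are the variable names. A rough sketch:

```r
# Hypothetical input: the second element holds two separate numbers
s <- c("rb 2%", "rb 2% rj 30%", "rb")

# Deleting every non-digit glues the two numbers together ("230")
glued <- gsub("[^0-9]", "", s)

# regexpr() locates only the first run of digits in each string;
# where nothing matches, start and length are -1 and substr() returns ""
m <- regexpr("[0-9]+", s)
first <- substr(s, m, m + attr(m, "match.length") - 1)

# gregexpr() with regmatches() returns every run of digits,
# as a list with one character vector per input string
all_runs <- regmatches(s, gregexpr("[0-9]+", s))

print(glued)
print(first)
print(all_runs)
```

Which behaviour is wanted depends on the data; for strings with at most one number, as in the original example, the approaches agree.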
Re: [R] regular expressions : extracting numbers
On Mon, 2007-07-30 at 13:58 +0200, GOUACHE David wrote:
> What I wish to do is obtain a vector of the same length with just the numbers.

Try this:

> Vec
 [1] "lema, rb 2%"   "rb 2%"         "rb 3%"         "rb 4%"
 [5] "rb 3%"         "rb 2%,mineuse" "rb"            "rb"
 [9] "rb 12"         "rb"            "rj 30%"        "rb"
[13] "rb"            "rb 25%"        "rb"            "rb"
[17] "rb"            "rj, rb"

> gsub("[^0-9]", "", Vec)
 [1] "2"  "2"  "3"  "4"  "3"  "2"  ""   ""   "12" ""   "30" ""   ""
[14] "25" ""   ""   ""   ""

The search pattern here is "[^0-9]", which matches any single character that is not (^) in the range 0 through 9. gsub() replaces every such character with the empty string, so only the digits remain.

See ?regex and/or http://www.regular-expressions.info/

HTH,

Marc Schwartz
Re: [R] regular expressions : extracting numbers
I assume that if you want the components to be NA, then you really intend the result to be a numeric vector. The following replaces all non-digits with "" (thereby removing them) and then uses as.numeric to convert the result to numeric. Just omit the conversion if you want a character vector result:

s <- c("lema, rb 2%", "rb 2%", "rb 3%", "rb 4%", "rb 3%", "rb 2%,mineuse", "rb", "rb", "rb 12", "rb", "rj 30%", "rb", "rb", "rb 25%", "rb", "rb", "rb", "rj, rb")
as.numeric(gsub("[^[:digit:]]+", "", s))
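A short sketch of why the numeric result can be convenient: as.numeric("") yields NA, so the strings without digits become missing values that the usual na.rm arguments can skip. The vector is retyped from the post; v, n_missing and total are illustrative names only.

```r
# Example vector, retyped from the original post
s <- c("lema, rb 2%", "rb 2%", "rb 3%", "rb 4%", "rb 3%", "rb 2%,mineuse",
       "rb", "rb", "rb 12", "rb", "rj 30%", "rb", "rb", "rb 25%",
       "rb", "rb", "rb", "rj, rb")

# Strip non-digits, then convert; empty strings become NA
v <- as.numeric(gsub("[^[:digit:]]+", "", s))

# Count the entries that had no number at all
n_missing <- sum(is.na(v))

# Summaries can ignore the missing entries
total <- sum(v, na.rm = TRUE)

print(v)
print(n_missing)
print(total)
```

The result keeps the same length as the input, which is what the original question asked for.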
Re: [R] regular expressions : extracting numbers
> gsub(" ", "", gsub("%", "", gsub("[a-z]", "", c("tr3", "jh40%qs dqd"))))
[1] "3"  "40"

Jacques VESLOT
INRA - Biostatistique Processus Spatiaux
Site Agroparc
84914 Avignon Cedex 9, France
Tel: +33 (0) 4 32 72 21 58
Fax: +33 (0) 4 32 72 21 84
Re: [R] regular expressions : extracting numbers
This might work:

> numOnly <- function(x) gsub("[^0-9]", "", x)
> numOnly("lema, rb 2%")
[1] "2"
> numOnly("rb")
[1] ""

Max