[R] String manipulation
I want to do the following: if a string does not contain a colon (:), no change is needed; if it contains one or more colons, break the string into multiple strings using the colon as a separator. For example,

"happy:" becomes "happy" ":"
":sad" turns into ":" "sad"
"happy:sad" changes to "happy" ":" "sad"

How can I do this?

Thanks,
Gang

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] String manipulation
strsplit(split=":") does almost what you want, but it omits the colons from the output. You can use Perl zero-length look-ahead and look-behind operators in the split argument to get the colons as well:

strsplit(c(":sad", "happy:", "happy:sad"), split="(?<=:)|(?=:)", perl=TRUE)
[[1]]
[1] ":"   "sad"

[[2]]
[1] "happy" ":"

[[3]]
[1] "happy" ":"     "sad"

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Dec 8, 2014 at 9:08 AM, Gang Chen gangch...@gmail.com wrote:
> I want to do the following: if a string does not contain a colon (:), no change is needed; if it contains one or more colons, break the string into multiple strings using the colon as a separator. [...]
Re: [R] String manipulation
Actually, the zero-length look-ahead expression is enough to get the job done:

strsplit(c(":sad", "happy:", "happy:sad", ":happy:sad:subdued:"), split="(?=:)", perl=TRUE)
[[1]]
[1] ":"   "sad"

[[2]]
[1] "happy" ":"

[[3]]
[1] "happy" ":"     "sad"

[[4]]
[1] ":"       "happy"   ":"       "sad"     ":"       "subdued" ":"

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Dec 8, 2014 at 1:13 PM, William Dunlap wdun...@tibco.com wrote:
> strsplit(split=":") does almost what you want, but it omits the colons from the output. You can use Perl zero-length look-ahead and look-behind operators in the split argument to get the colons as well. [...]
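The look-ahead split from this thread can be wrapped in a small helper. This is only a sketch: the function name split_keep_colons is hypothetical (it does not appear in the thread), and it relies on the way base R's strsplit() handles zero-length Perl matches, as seen in the output quoted above. A string without a colon comes back unchanged.

```r
# Hypothetical helper (name not from the thread): split on a zero-length
# look-ahead so that every colon is kept as its own piece.
split_keep_colons <- function(x) {
  strsplit(x, split = "(?=:)", perl = TRUE)
}

pieces <- split_keep_colons(c("happy", ":sad", "happy:", "happy:sad"))
print(pieces)
```

The result is a list with one character vector per input string, so sapply() or lapply() can be used for any further processing.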
[R] String Manipulation in R
Hi,

Are there any built-in functions to check whether a substring is present in a string and return the result as a boolean?

Thanks

-- View this message in context: http://r.789695.n4.nabble.com/String-Manipulation-in-R-tp4633104.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] String Manipulation in R
grepl

Michael

On Tue, Jun 12, 2012 at 8:51 AM, anjali jeevi...@gmail.com wrote:
> Hi, Is there any inbuilt functions to check whether a substring is present in a string and give the result as boolean
Re: [R] String Manipulation in R
?grepl

Note that this function uses regular expressions, in which certain characters have special meanings, so depending on what string you are looking for you may have to know something about regex patterns to get it to work.

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

anjali jeevi...@gmail.com wrote:
> Hi, Is there any inbuilt functions to check whether a substring is present in a string and give the result as boolean
Re: [R] String Manipulation in R
Hello,

Yes, there is. See ?grepl or help('grepl').

Hope this helps,
Rui Barradas

On 12-06-2012 14:51, anjali wrote:
> Hi, Is there any inbuilt functions to check whether a substring is present in a string and give the result as boolean
Re: [R] String Manipulation in R
Or use 'fixed=TRUE' as an argument to grepl to avoid regular expression matching (but learning regular expressions will pay off in the long run).

On Tue, Jun 12, 2012 at 9:15 AM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:
> ?grepl
> Note that this function uses regular expressions, in which certain characters have special meanings, so depending on what string you are looking for you may have to know something about regex patterns to get it to work.

--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com
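To make the difference between regex and literal matching concrete, here is a minimal sketch using only base R's grepl(). The sample vector x and the result names are made up for illustration.

```r
x <- c("a.b", "ab", "a+b")

# Regular-expression matching: "." matches any single character,
# so every non-empty string is a hit.
regex_hit <- grepl(".", x)

# Literal matching: only strings that contain an actual dot are a hit.
fixed_hit <- grepl(".", x, fixed = TRUE)

# A literal "+" would need escaping in a regex, but not with fixed = TRUE.
plus_hit <- grepl("+", x, fixed = TRUE)

print(data.frame(x, regex_hit, fixed_hit, plus_hit))
```

grepl() returns one logical value per element of x, so it can be used directly for subsetting, for example x[fixed_hit].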
Re: [R] String manipulation with regexpr, got to be a better way
Hi Chris,

why not use the routines for dates:

dates <- c("09/10/2003", "10/22/2005")
format(strptime(dates, format="%m/%d/%Y"), "%Y")

or just take the last 4 characters from dates:

gsub(".*([0-9]{4})$", "\\1", dates)

cheers

On 29.09.2011 16:23, Chris Conner wrote:
> My question is: is there a more efficient way to do this. Specifically is there a way to use regexpr or some other string function to return not the first instance, but the 2nd (or for that matter 3rd, 4th or 5th instance) of a certain string? [...]

--
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf
Martinistr. 52
20246 Hamburg
T ++49/40/7410-58243
F ++49/40/7410-57790
[R] String manipulation with regexpr, got to be a better way
Help-Rs,

I'm doing some string manipulation in a file where I converted a string date in mm/dd/yyyy format and returned the year (yyyy). I've used regexpr (hat tip to Gabor G for a very nice earlier post on this function) in steps (I've un-nested the code and provided it, with an example of what I did, below). My question is: is there a more efficient way to do this? Specifically, is there a way to use regexpr or some other string function to return not the first instance, but the 2nd (or for that matter 3rd, 4th or 5th instance) of a certain string?

# first find the first occurrence of "/" and create a variable for it
firstslash <- unlist(regexpr("/", dates, fixed = TRUE))
# then use firstslash to cut the string field into an intermediate variable, e.g., from 1/1/2008 to 1/2008
step1 <- substr(dates, (firstslash + 1), nchar(dates))
# then repeat steps 1 and 2... there's got to be a better way
step2 <- unlist(regexpr("/", step1, fixed = TRUE))
# then use step2 to cut the string into the final product, e.g., from 1/2008 to 2008
final <- substring(step1, step2 + 1, nchar(step1))

Thx!
C
Re: [R] String manipulation with regexpr, got to be a better way
Chris Conner wrote on 09/29/2011 09:23:02 AM:
> My question is: is there a more efficient way to do this. Specifically is there a way to use regexpr or some other string function to return not the first instance, but the 2nd (or for that matter 3rd, 4th or 5th instance) of a certain string? [...]

# a couple of example dates
dates <- c("09/10/2003", "10/22/2005")
# split the dates
dates.split <- strsplit(dates, "/")
# extract the years
sapply(dates.split, "[", 3)

Jean
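On the specific question of finding the 2nd (or n-th) instance of a string rather than the first: base R's gregexpr() returns the positions of all matches, so the n-th one can be picked directly. The sketch below is one possible approach, not something posted in the thread; the helper name nth_pos is hypothetical.

```r
dates <- c("09/10/2003", "10/22/2005")

# gregexpr() returns the positions of every match, one vector per string.
slash_pos <- gregexpr("/", dates, fixed = TRUE)

# Hypothetical helper: position of the n-th match, NA if there are fewer.
nth_pos <- function(pos, n) {
  if (pos[1] == -1 || length(pos) < n) NA_integer_ else pos[n]
}
second_slash <- vapply(slash_pos, nth_pos, integer(1), n = 2)

# Everything after the second slash is the year.
years <- substring(dates, second_slash + 1)
print(years)
```

Changing n = 2 to another value selects a different occurrence, which avoids repeating the regexpr()/substr() steps once per slash.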
Re: [R] string manipulation
You might want to take a look at 'regexpr' and/or 'gregexpr':

mytext <- "I want the number 2000, not the number two thousand"
idx <- regexpr("\\d{4}", mytext)
idx <- c(idx, (idx + attributes(idx)$match.length) - 1)
substr(start=idx[1], stop=idx[2], mytext)

HTH,
Janko

On 26.08.2011 03:51, Lorenzo Cattarino wrote:
> Apologies for confusion. What I meant was the following:
> mytext <- "I want the number 2000, not the number two thousand"
> and the problem is to select 2000 as the first four digits after the word number. The position of 2000 in the string might change.
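As a shorter variant of the regexpr()/substr() approach, base R's regmatches() can extract the matched text directly from the regexpr() result. A minimal sketch, using the example string from the thread (the name first_four is made up here):

```r
mytext <- "I want the number 2000, not the number two thousand"

# regexpr() locates the first run of four digits;
# regmatches() pulls out the matched text itself.
first_four <- regmatches(mytext, regexpr("[0-9]{4}", mytext))
print(first_four)
```

Note that this takes the first four-digit run anywhere in the string; it does not check that the run follows the word "number".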
Re: [R] string manipulation
On Thu, Aug 25, 2011 at 9:51 PM, Lorenzo Cattarino l.cattar...@uq.edu.au wrote:
> Apologies for confusion. What I meant was the following:
> mytext <- "I want the number 2000, not the number two thousand"
> and the problem is to select 2000 as the first four digits after the word number. The position of 2000 in the string might change.

strapply in gsubfn searches mytext for the indicated regular expression and passes the back-referenced portion (i.e. the portion of mytext matching the parenthesized portion of the regular expression) to the as.numeric function, whose output is returned.

library(gsubfn)
strapply(mytext, "number.*([0-9]{4})", as.numeric, simplify = TRUE)
# 2000

See http://gsubfn.googlecode.com for more info.

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] string manipulation
.* is greedy... you might want the regex "number[^0-9]*([0-9]{4})" to avoid getting 1999 from "I want the number 2000, not the number 1999".

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

Gabor Grothendieck ggrothendi...@gmail.com wrote:
> strapply in gsubfn searches mytext for the indicated regular expression and passes the back referenced portion to the as.numeric function whose output is returned.
> library(gsubfn)
> strapply(mytext, "number.*([0-9]{4})", as.numeric, simplify = TRUE)
Re: [R] string manipulation
On Fri, Aug 26, 2011 at 7:27 AM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:
> .* is greedy... might want regex "number[^0-9]*([0-9]{4})" to avoid getting 1999 from "I want the number 2000, not the number 1999".

If such inputs are possible we could also do the following. Here a ? has been added after the * to make the repetition non-greedy. We also use simplify = unlist and end with [1] to get only the first match, since the pattern would otherwise match and return all occurrences:

strapply(mytext, "number.*?([0-9]{4})", as.numeric, simplify = unlist)[1]
# 2000

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
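The greedy-match problem discussed here can be reproduced in base R without gsubfn. The sketch below uses regexec() and regmatches() to pull out the first parenthesized group; the helper name first_group is hypothetical and not part of the thread.

```r
mytext <- "I want the number 2000, not the number 1999"

# Hypothetical helper: return the first parenthesized group of a pattern.
first_group <- function(pattern, text) {
  regmatches(text, regexec(pattern, text))[[1]][2]
}

# ".*" runs as far as it can, so the group lands on the last four digits.
greedy <- first_group("number.*([0-9]{4})", mytext)

# "[^0-9]*" cannot skip over digits, so the group is the first number.
negated <- first_group("number[^0-9]*([0-9]{4})", mytext)

print(c(greedy = greedy, negated = negated))
```

The negated character class has the extra property that it fails to match at all if something other than non-digits separates "number" from the digits, which may or may not be what is wanted.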
[R] string manipulation
Hi R-users,

I am trying to find a way to manipulate a character string to select a 4-digit number after some specific word/s. Example:

mytext <- "I do not want the first number 1234, but the second number 5678"

Is there any function that allows you to select a certain number of digits (in this case 5678) after a particular word/s (e.g., "second number")?

Thank you for your help
Lorenzo
Re: [R] string manipulation
Try this:

gsub(".*second number ", "", mytext)

On Thu, Aug 25, 2011 at 8:00 PM, Lorenzo Cattarino l.cattar...@uq.edu.au wrote:
> I am trying to find the way to manipulate a character string to select a 4 digit number after some specific word/s. [...]

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Re: [R] string manipulation
You can split your string, and then only take the first 4 digits after that (this is only an improvement if your numbers might not be at the end of mytext):

mytext <- "I do not want the first number 1234, but the second number 5678"
sstr <- strsplit(mytext, split="second number ")[[1]][2]
nynumbers <- substr(sstr, 1, 4)

On Fri, Aug 26, 2011 at 11:18 AM, Henrique Dallazuanna www...@gmail.com wrote:
> Try this:
> gsub(".*second number ", "", mytext)
Re: [R] string manipulation
To be on the safe side in case there are other characters at the end of the string, use:

mytext <- "I do not want the first number 1234, but the second number 5678sadfsadffdsa"
# make sure you get 4 digits
sub("^.*second number[^0-9]*([0-9]{4}).*", "\\1", mytext)
[1] "5678"

On Thu, Aug 25, 2011 at 7:00 PM, Lorenzo Cattarino l.cattar...@uq.edu.au wrote:
> I am trying to find the way to manipulate a character string to select a 4 digit number after some specific word/s. [...]

--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Re: [R] string manipulation
Apologies for confusion. What I meant was the following:

mytext <- "I want the number 2000, not the number two thousand"

and the problem is to select 2000 as the first four digits after the word "number". The position of 2000 in the string might change.

thanks
Lorenzo

-----Original Message-----
From: Steven Kennedy [mailto:stevenkennedy2...@gmail.com]
Sent: Friday, 26 August 2011 11:31 AM
To: Henrique Dallazuanna
Cc: Lorenzo Cattarino; r-help@r-project.org
Subject: Re: [R] string manipulation

> You can split your string, and then only take the first 4 digits after that (this is only an improvement if your numbers might not be at the end of mytext). [...]
[R] String manipulation
Dear all,

I have the following kind of character vector:

Vec <- c("344426", "dwjjsgcj", "123sgdc", "aagha123", "sdh343asgh", "123jhd51")

Now I want to split each element of this vector into its numeric and string parts. For example, in the 1st element of that vector there is no string part, so I should get a vector of length 2 like c("", "344426"), and so on.

Can somebody point me to how to achieve that in R? Is there any specific function for doing that?

Thanks,
Re: [R] String manipulation
On Sun, Jun 26, 2011 at 10:54 AM, Megh Dal megh700...@yahoo.com wrote:
> Now I want to split each element of this vector according to numeric and string element. For example in the 1st element of that vector, there is no string element. Therefore I should get a vector of length 2 like c("", "344426") and so on. [...]

Try this, and see the gsubfn home page at http://gsubfn.googlecode.com for more info:

library(gsubfn)
strapply(Vec, "\\d+|\\D+", c)

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] String manipulation
On Jun 26, 2011, at 10:54 AM, Megh Dal wrote:
> Now I want to split each element of this vector according to numeric and string element. [...] Can somebody point me how to achieve that in R? Is there any specific function for doing that?

?regex
?strsplit

You don't do a very good job of describing your desired output, so here are two versions of what I am guessing it to be:

cbind(lapply(strsplit(Vec, "[^0-9]+"), paste, collapse=""),
      lapply(strsplit(Vec, "[0-9]+"), paste, collapse=""))
     [,1]     [,2]
[1,] "344426" ""
[2,] ""       "dwjjsgcj"
[3,] "123"    "sgdc"
[4,] "123"    "aagha"
[5,] "343"    "sdhasgh"
[6,] "12351"  "jhd"

data.frame(numbits=unlist(lapply(strsplit(Vec, "[^0-9]+"), paste, collapse="")),
           alphabits=unlist(lapply(strsplit(Vec, "[0-9]+"), paste, collapse="")))
  numbits alphabits
1  344426
2          dwjjsgcj
3     123      sgdc
4     123     aagha
5     343   sdhasgh
6   12351       jhd

--
David Winsemius, MD
West Hartford, CT
Re: [R] String manipulation
On Sun, Jun 26, 2011 at 11:00 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote:
> Try this and see the gsubfn home page at http://gsubfn.googlecode.com for more info:
> library(gsubfn)
> strapply(Vec, "\\d+|\\D+", c)

Also, if what you want is a leading string at the beginning of Vec[[i]] followed by a number (with everything else ignored), try this:

strapply(Vec, "^(\\D*)(\\d*)", c)

If the first component must be a string and you don't want to limit the result to two components, try this (ignoring the warnings):

L <- strapply(Vec, "\\d+|\\D+", c)
lapply(L, function(x) if (length(x) == 0) x else if (is.na(as.numeric(x[1]))) x else c("", x))

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
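For readers who prefer to stay in base R, the same split into alternating digit and non-digit runs can be sketched with gregexpr() and regmatches(). This is an alternative to the gsubfn call above, not a replacement suggested in the thread; the name runs is made up for illustration.

```r
Vec <- c("344426", "dwjjsgcj", "123sgdc", "aagha123", "sdh343asgh", "123jhd51")

# gregexpr() finds every run of digits or of non-digits;
# regmatches() returns the matched runs, one character vector per element.
runs <- regmatches(Vec, gregexpr("[0-9]+|[^0-9]+", Vec))
print(runs)
```

As with the strapply() version, the pieces come back in their original order, so an element that starts with digits has the number first; padding with a leading "" still needs the extra step shown above.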
[R] String manipulation
Dear [R] people,

Could you please help with the following: how do I convert a vector

    'ac','ac','c','ac','ac','c'

into a single string

    'ac2_c_ac2_c'

Thank you in advance
Re: [R] String manipulation
Try this:

    > x <- c('ac','ac','c','ac','ac','c')
    > rle(x)
    Run Length Encoding
      lengths: int [1:4] 2 1 2 1
      values : chr [1:4] "ac" "c" "ac" "c"
    > z <- rle(x)
    > paste(z$values, ifelse(z$lengths == 1, '', z$lengths), collapse='_', sep = '')
    [1] "ac2_c_ac2_c"

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
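The rle() approach above can be wrapped in a small reusable function. The following is a minimal sketch; the name compress_runs and its sep argument are invented for illustration and do not come from the thread.

```r
# Hypothetical helper: collapse consecutive repeats into "value<count>",
# omitting the count when a value occurs only once in a row.
compress_runs <- function(x, sep = "_") {
  r <- rle(x)
  counts <- ifelse(r$lengths == 1, "", as.character(r$lengths))
  paste0(r$values, counts, collapse = sep)
}

compress_runs(c("ac", "ac", "c", "ac", "ac", "c"))
```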
Re: [R] String manipulation
Denis,

If I understand you correctly (your example does not point unambiguously to one unique solution...) you could try:

    dummy <- c('ac','ac','c','ac','ac','c')
    dummy.rle <- rle(dummy)
    result <- paste(dummy.rle$values, dummy.rle$lengths, collapse='_', sep='')

This gives 'ac2_c1_ac2_c1', so you may need to remove the '1' entries in dummy.rle$lengths to get exactly what you wanted.

HTH
Jannis
Re: [R] String manipulation
A quick way to do this is to replace \d and \D with the character classes [0-9.] and [^0-9.]. This assumes that there is no scientific notation and that there is nothing like 123.45.678 in the string. You did not account for a leading minus sign.

The book Mastering Regular Expressions is probably worth the expense if you are going to be doing a lot of this, even though similar content can be gleaned online.
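As an illustration of the character-class suggestion, here is a minimal base R sketch. It uses regmatches() and gregexpr() rather than gsubfn, the function name split_alpha_num is hypothetical, and it carries the same assumptions as stated above: no scientific notation, no leading minus sign, and no strings such as 123.45.678.

```r
# Hypothetical helper: split one string into alternating runs of
# "number-like" characters (digits and dots) and everything else.
split_alpha_num <- function(s) {
  regmatches(s, gregexpr("[0-9.]+|[^0-9.]+", s))[[1]]
}

split_alpha_num("ABCFR34564.354IJVEOJC3434.36453")
```

Because the dot is treated as part of a number, a dot that is really punctuation (for example at the end of a sentence) would be attached to the neighbouring digits.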
[R] String manipulation
Please consider the following string:

    MyString <- "ABCFR34564IJVEOJC3434"

Here you see that there are 4 groups in the above string. The 1st and 3rd groups are English letters and the 2nd and 4th are numeric. Given a string, how can I separate out those 4 groups?

Thanks for your time
Re: [R] String manipulation
On Sun, Feb 13, 2011 at 10:27 AM, Megh Dal megh700...@gmail.com wrote:

Please consider the following string:

    MyString <- "ABCFR34564IJVEOJC3434"

Here you see that there are 4 groups in the above string. The 1st and 3rd groups are English letters and the 2nd and 4th are numeric. Given a string, how can I separate out those 4 groups?

Try this. "\\D+" and "\\d+" match non-digits and digits respectively. The portions within parentheses are captures and are passed to the c function. Like R's split, strapply returns a list with one component per element of MyString, but MyString has only one element, so we get its contents using [[1]].

    > library(gsubfn)
    > strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", c)[[1]]
    [1] "ABCFR"   "34564"   "IJVEOJC" "3434"

Alternately, we could convert the relevant portions to numbers at the same time. ~ list(...) is interpreted as a function whose body is the right hand side of the ~ and whose arguments are the free variables, i.e. s1, s2, s3 and s4.

    strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", ~ list(s1, as.numeric(s2), s3, as.numeric(s4)))[[1]]

See http://gsubfn.googlecode.com for more.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
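For comparison, the capture-group idea can also be sketched in base R with regexec() and regmatches(). This is an assumption-laden illustration rather than the gsubfn API: the variable names groups and parsed are made up, and the pattern spells the classes out as [0-9] and [^0-9].

```r
MyString <- "ABCFR34564IJVEOJC3434"

# regexec() returns the full match followed by one entry per capture group.
m <- regmatches(MyString,
                regexec("([^0-9]+)([0-9]+)([^0-9]+)([0-9]+)", MyString))[[1]]
groups <- m[-1]  # drop the full match, keep the four captures

# Convert the 2nd and 4th groups to numbers, as in the ~ list(...) variant.
parsed <- list(groups[1], as.numeric(groups[2]), groups[3], as.numeric(groups[4]))
groups
```

Unlike strapply, regexec() reports only the first match in each string, so this sketch suits a fixed number of groups rather than an open-ended sequence.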
Re: [R] String manipulation
If you have an indeterminate number of the patterns in the string, try the following:

    > MyString <- "ABCFR34564IJVEOJC3434"
    > # translate to the pattern sequences
    > x <- chartr('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
    +           , '000000000000000000000000001111111111'
    +           , MyString
    +           )
    > x.rle <- rle(strsplit(x, '')[[1]])  # determine the runs
    > # create extraction matrix
    > x.ext <- cbind(cumsum(c(1, head(x.rle$lengths, -1)))
    +              , cumsum(x.rle$lengths)
    +              )
    > substring(MyString, x.ext[,1], x.ext[,2])
    [1] "ABCFR"   "34564"   "IJVEOJC" "3434"

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
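The translate-then-run-length steps above can be sketched as a function. This is only an illustration: the name split_by_class is hypothetical, the translation string is built with strrep() (available in R 3.3.0 and later) instead of being typed out, and the input is assumed to contain only uppercase letters and digits.

```r
# Hypothetical helper: map every letter to "0" and every digit to "1",
# then cut the original string wherever the class changes.
split_by_class <- function(s) {
  old <- paste0(paste(LETTERS, collapse = ""), "0123456789")
  new <- paste0(strrep("0", 26), strrep("1", 10))
  classes <- strsplit(chartr(old, new, s), "")[[1]]
  runs <- rle(classes)
  ends <- cumsum(runs$lengths)       # last position of each run
  starts <- ends - runs$lengths + 1  # first position of each run
  substring(s, starts, ends)
}

split_by_class("ABCFR34564IJVEOJC3434")
```

Characters outside the translation table (lowercase letters, punctuation) are left unchanged by chartr() and would each form their own class, so the table needs extending for such input.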
Re: [R] String manipulation
On Sun, Feb 13, 2011 at 4:42 PM, Megh Dal megh700...@gmail.com wrote:

Hi Gabor, thanks (and Jim as well) for your suggestion. However, this is not working properly for the following string:

    > MyString <- "ABCFR34564IJVEOJC3434.36453"
    > strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", c)[[1]]
    [1] "ABCFR"   "34564"   "IJVEOJC" "3434"

The 4th group is a decimal number, and its fractional part is not picked up. The same kind of unintended result occurs here as well:

    > MyString <- "ABCFR34564.354IJVEOJC3434.36453"
    > strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", c)[[1]]
    [1] "ABCFR"   "34564"   "."       "354"     "IJVEOJC" "3434"    "."       "36453"

Can you please tell me how I can modify that?

In that case we need to tell it that a number can include a dot. Additionally, the following simplifies the regular expression by assuming any number of non-numeric fields, each followed by a numeric field:

    strapply(MyString, "(\\D+)([.0-9]+)", c)[[1]]
    strapply(MyString, "(\\D+)([.0-9]+)", ~ list(s1, as.numeric(s2)))[[1]]

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] String manipulation
Just add '.' to the pattern specifier:

    > MyString <- "ABCFR34564IJVEOJC3434.16ABC123.456KJHLKJH23452345AAA"
    > # translate to the pattern sequences
    > x <- chartr('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.'
    +           , '0000000000000000000000000011111111111'
    +           , MyString
    +           )
    > x.rle <- rle(strsplit(x, '')[[1]])  # determine the runs
    > # create extraction matrix
    > x.ext <- cbind(cumsum(c(1, head(x.rle$lengths, -1)))
    +              , cumsum(x.rle$lengths)
    +              )
    > substring(MyString, x.ext[,1], x.ext[,2])
    [1] "ABCFR"    "34564"    "IJVEOJC"  "3434.16"  "ABC"
    [6] "123.456"  "KJHLKJH"  "23452345" "AAA"

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Re: [R] String manipulation
Hi Gabor, thanks (and Jim as well) for your suggestion. However, this is not working properly for the following string:

    > MyString <- "ABCFR34564IJVEOJC3434.36453"
    > strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", c)[[1]]
    [1] "ABCFR"   "34564"   "IJVEOJC" "3434"

The 4th group is a decimal number, and its fractional part is not picked up. The same kind of unintended result occurs here as well:

    > MyString <- "ABCFR34564.354IJVEOJC3434.36453"
    > strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", c)[[1]]
    [1] "ABCFR"   "34564"   "."       "354"     "IJVEOJC" "3434"    "."       "36453"

Can you please tell me how I can modify that?

Thanks,
[R] String manipulation
Dear community,

I have a problem with a string conversion:

    > text
     [1] ""                 "and"              "\xc1d\xe1m"
     [4] "graphical"        "interface"        "MLP"
     [7] "Nagy"             "networks"         "Networks"
    [10] "neural"           "Neural"           "RBF"
    [13] "sod...@yahoo.com" "user"             "with"
    [16] "and"              "\xc1d\xe1m"       "graphical"
    [19] "interface"        "MLP"

I need to get rid of text[3] and text[17]! This kind of control sequence appears a few times in my text, and I cannot get rid of it with strsplit or sub.

    > grep("\xc1d\xe1m", text)
    Error in grep("\xc1d\xe1m", text) :
      regular expression is invalid in this locale
    > grep("\\xc1d\\xe1m", text)
    integer(0)
    Warning messages:
    1: In grep("\\xc1d\\xe1m", text) :
      input string 3 is invalid in this locale
    2: In grep("\\xc1d\\xe1m", text) :
      input string 17 is invalid in this locale

Thanks in advance,
Georg
Re: [R] String manipulation
See ?Encoding and ?iconv:

    iconv("\xc1d\xe1m", from = '', to = 'latin1')

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
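A minimal sketch of how these tools fit together, under the assumption (not confirmed in the thread) that the offending bytes are Latin-1 encoded: \xc1 and \xe1 would then be the letters Á and á. The three-element vector below is a shortened stand-in for the poster's data, and validUTF8() requires R 3.3.0 or later.

```r
# Shortened stand-in for the poster's vector; element 2 holds raw Latin-1 bytes.
text <- c("and", "\xc1d\xe1m", "graphical")

# Flag the elements that are not valid UTF-8.
bad <- !validUTF8(text)

# Option 1: re-encode, assuming the input really is Latin-1.
converted <- iconv(text, from = "latin1", to = "UTF-8")

# Option 2: simply drop the offending elements.
cleaned <- text[!bad]

which(bad)
```

If the true source encoding is something else (for example Windows-1250), the from argument would need to change accordingly; iconvlist() shows the names available on a given platform.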
Re: [R] String manipulation
On May 8, 2010, at 10:05 AM, Webby wrote:

Dear community,

I have a problem with a string conversion:

    > text
     [1] ""                 "and"              "\xc1d\xe1m"
     [4] "graphical"        "interface"        "MLP"
     [7] "Nagy"             "networks"         "Networks"
    [10] "neural"           "Neural"           "RBF"
    [13] "sod...@yahoo.com" "user"             "with"
    [16] "and"              "\xc1d\xe1m"       "graphical"
    [19] "interface"        "MLP"

I need to get rid of text[3] and text[17]!

Does this work?

    text[ grep("[[:alnum:]]|", text) ]

It still gives the warnings but seems to properly leave out the control sequences.

David Winsemius, MD
West Hartford, CT
[R] String Manipulation- Extract numerical and alphanumerical segment
I am currently attempting to split a long list of strings (let's call it string.list) that is of the format:

    "1234567.z3.abcdef-gh.12"

I have gotten it to:

    "1234567" "z3" "abcdef-gh" "12"

by use of the strsplit function. This leaves me with each element of string.list holding a split string of the above format.

What I'd like to do now is extract the first two strings of each element in string.list (the "1234567" and the "z3") and place them into two separate lists, say, firstsplit.numeric.list and secondsplit.alphanumeric.list.

I'm having some trouble figuring out how to do this. Any help would be greatly appreciated!

-- 
View this message in context: http://n4.nabble.com/String-Manipulation-Extract-numerical-and-alphanumerical-segment-tp1470301p1470301.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] String Manipulation- Extract numerical and alphanumerical segment
Does this help:

    > x <- c("1234567.z3.abcdef-gh.12","1234567.z3.abcdef-gh.12","1234567.z3.abcdef-gh.12")
    > y <- strsplit(x, '[.]')
    > y
    [[1]]
    [1] "1234567"   "z3"        "abcdef-gh" "12"

    [[2]]
    [1] "1234567"   "z3"        "abcdef-gh" "12"

    [[3]]
    [1] "1234567"   "z3"        "abcdef-gh" "12"

    > y.1 <- sapply(y, '[[', 1)
    > y.1
    [1] "1234567" "1234567" "1234567"
    > y.2 <- sapply(y, '[[', 2)
    > y.2
    [1] "z3" "z3" "z3"

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
Re: [R] String Manipulation- Extract numerical and alphanumerical segment
On Fri, Feb 5, 2010 at 9:29 AM, jim holtman jholt...@gmail.com wrote:

Does this help:

    > x <- c("1234567.z3.abcdef-gh.12","1234567.z3.abcdef-gh.12","1234567.z3.abcdef-gh.12")
    > y <- strsplit(x, '[.]')

Here's another way with the stringr package:

    library(stringr)
    x <- c("1234567.z3.abcdef-gh.12","1234567.z3.abcdef-gh.12","1234567.z3.abcdef-gh.12")
    y <- str_split_fixed(x, '[.]', 4)
    y[, 1]
    y[, 2]

Hadley

-- 
http://had.co.nz/
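A similar fixed-width result can be sketched in base R by row-binding the strsplit() output into a character matrix. This is an illustrative alternative, not part of stringr; the second input string is made up for the example, and the approach assumes every element has the same number of dot-separated fields.

```r
x <- c("1234567.z3.abcdef-gh.12", "7654321.a1.xyz.07")  # 2nd value is hypothetical

# fixed = TRUE treats "." literally, so no regex escaping is needed.
parts <- do.call(rbind, strsplit(x, ".", fixed = TRUE))

first  <- parts[, 1]  # numeric-looking field, still character
second <- parts[, 2]  # alphanumeric field
parts
```

If the number of fields varies between elements, rbind() would recycle values with a warning, so the field counts are worth checking first with lengths(strsplit(x, ".", fixed = TRUE)).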
Re: [R] String Manipulation- Extract numerical and alphanumerical segment
Yes, that was perfect! Thank you so much!

Just to clarify, since I'm kind of new to string manipulation: is that '[[' in the sapply function what designates the splits/elements within the string? So that's the part that says "I want this particular element", and the 1 or 2 (or other number) designates its location?

And if, while looking at the second column, I want to check whether the alphabetical character is, say, a 'z' or an 'a' or a 'b', what would be an elegant way to do that, besides splitting the second column into alphabetical and numerical values and then testing against z, a, b using a for loop and a boolean statement? I want to assign a 1 for z's, a 2 for a's, and a 3 for b's.

-- 
Su H. Chu
Carnegie Mellon University
Economics and Statistics '09
Re: [R] String Manipulation- Extract numerical and alphanumerical segment
The '[[' is just the index access to an object. Type ?'[[' to see the help page. Actually, I should have used '[' in this case:

    > sapply(y, '[', 1)
    [1] "1234567" "1234567" "1234567"

is equivalent to:

    > sapply(y, function(a) a[1])
    [1] "1234567" "1234567" "1234567"

To set a value based on the first character, just extract the first character (e.g., with substring) and then index into a vector with the key values:

    > key <- c(z=1, a=2, b=3)  # mapping values
    > data <- c('a','c','b','d','z','a','b')  # data to be mapped
    > key[data]
       a <NA>    b <NA>    z    a    b
       2   NA    3   NA    1    2    3

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
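Putting the two steps together, the lookup might be sketched as follows. The input strings other than the first are invented for illustration, and the variable names are not from the thread.

```r
# Example inputs; only the first follows the poster's sample, the rest are made up.
x <- c("1234567.z3.abcdef-gh.12", "7654321.a1.abcdef-gh.07",
       "1111111.b2.xyz.03", "2222222.q9.xyz.01")

# Second dot-separated field of each string, e.g. "z3".
second <- sapply(strsplit(x, ".", fixed = TRUE), `[`, 2)

# First character of that field, used as the lookup name.
first_char <- substring(second, 1, 1)

key <- c(z = 1, a = 2, b = 3)    # mapping values
codes <- unname(key[first_char]) # NA where the letter has no mapping
codes
```

Indexing a named vector by a character vector is vectorized, so no for loop or chain of comparisons is needed; letters without an entry in key simply come back as NA.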