[R] String manipulation

2014-12-08 Thread Gang Chen
I want to do the following: if a string does not contain a colon (:),
no change is needed; if it contains one or more colons, break the
string into multiple strings using the colon as a separator, keeping
each colon as its own piece. For example, "happy:" becomes

"happy" ":"

":sad" turns into

":" "sad"

and "happy:sad" changes to

"happy" ":" "sad"

How can I do this?

Thanks,
Gang

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] String manipulation

2014-12-08 Thread William Dunlap
strsplit(split=":") does almost what you want, but it omits the colons from
the output.  You can use perl zero-length look-ahead and look-behind
operators in the split argument to get the colons as well:

> strsplit(c(":sad", "happy:", "happy:sad"), split="(?<=:)|(?=:)",
+          perl=TRUE)
[[1]]
[1] ":"   "sad"

[[2]]
[1] "happy" ":"

[[3]]
[1] "happy" ":"     "sad"



Bill Dunlap
TIBCO Software
wdunlap tibco.com



Re: [R] String manipulation

2014-12-08 Thread William Dunlap
Actually, the zero-length look-ahead expression is enough to get the job
done:

> strsplit(c(":sad", "happy:", "happy:sad", ":happy:sad:subdued:"),
+          split="(?=:)", perl=TRUE)
[[1]]
[1] ":"   "sad"

[[2]]
[1] "happy" ":"

[[3]]
[1] "happy" ":"     "sad"

[[4]]
[1] ":"       "happy"   ":"       "sad"     ":"       "subdued" ":"
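
An alternative that does not depend on how strsplit treats zero-length matches is to extract the pieces rather than split on them. The sketch below is an assumption-laden illustration, not code from the thread: the helper name split_keep_colons is hypothetical, and it only uses base R's gregexpr and regmatches.

```r
# Hypothetical helper (not from the thread): break strings at colons
# while keeping every colon as a separate element.
split_keep_colons <- function(x) {
  # Each token is either a run of non-colon characters or a single colon.
  regmatches(x, gregexpr("[^:]+|:", x))
}

# Strings without a colon come back unchanged as a single piece.
split_keep_colons(c("happy", ":sad", "happy:", "happy:sad"))
```

The result is a list with one character vector per input string, the same shape that strsplit returns.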



Bill Dunlap
TIBCO Software
wdunlap tibco.com



[R] String Manipulation in R

2012-06-12 Thread anjali
Hi,
Is there a built-in function to check whether a substring is present in a
string and return the result as a boolean?
Thanks




Re: [R] String Manipulation in R

2012-06-12 Thread R. Michael Weylandt
grepl

Michael



Re: [R] String Manipulation in R

2012-06-12 Thread Jeff Newmiller
?grepl

Note that this function uses regular expressions, in which certain characters 
have special meanings, so depending on what string you are looking for you may 
have to know something about regex patterns to get it to work.


Re: [R] String Manipulation in R

2012-06-12 Thread Rui Barradas

Hello,

Yes, there is. See ?grepl or help('grepl').

Hope this helps,

Rui Barradas



Re: [R] String Manipulation in R

2012-06-12 Thread Greg Snow
Or use 'fixed=TRUE' as an argument to grepl to avoid the regular
expression matching (but learning regular expressions will be a useful
tool in the long run).
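
A small sketch of the difference between the two modes, using made-up example strings (the vector x is an assumption for illustration only):

```r
# Example strings invented for this illustration.
x <- c("a.b", "axb", "hello world")

# fixed = TRUE: the pattern is a literal substring, so "." means a dot.
literal <- grepl(".", x, fixed = TRUE)

# Default: the pattern is a regular expression, so "." matches any character.
as_regex <- grepl("a.b", x)

literal   # TRUE FALSE FALSE
as_regex  # TRUE TRUE FALSE
```

Either call returns a logical vector with one element per input string.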




-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com



Re: [R] String manipulation with regexpr, got to be a better way

2011-09-30 Thread Eik Vettorazzi
Hi Chris,
why not use the routines for dates:
dates <- c("09/10/2003", "10/22/2005")
format(strptime(dates, format="%m/%d/%Y"), "%Y")

or just take the last 4 characters from dates:
gsub(".*([0-9]{4})$", "\\1", dates)

cheers



-- 
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/7410-58243
F ++49/40/7410-57790



[R] String manipulation with regexpr, got to be a better way

2011-09-29 Thread Chris Conner
Help-Rs,
 
I'm doing some string manipulation in a file where I take a string date in 
mm/dd/yyyy format and return just the year (yyyy).
 
I've used regexpr (hat tip to Gabor G for a very nice earlier post on this 
function) in steps (I've un-nested the code and provided it, with an example of 
what I did, below).  My question is: is there a more efficient way to do this?  
Specifically, is there a way to use regexpr or some other string function to 
return not the first instance, but the 2nd (or for that matter 3rd, 4th or 5th 
instance) of a certain string?
 
# first find the first occurrence of "/" and create a variable for it
firstslash <- unlist(regexpr("/", dates, fixed = TRUE))
# then use firstslash to cut the string field into an intermediate variable,
# e.g., from "1/1/2008" to "1/2008"
step1 <- substr(dates, firstslash + 1, nchar(dates))
# then repeat steps 1 and 2... there's got to be a better way
step2 <- unlist(regexpr("/", step1, fixed = TRUE))
# then use step2 to cut the string into the final product,
# e.g., from "1/2008" to "2008"
final <- substring(step1, step2 + 1, nchar(step1))
 
Thx!
C


Re: [R] String manipulation with regexpr, got to be a better way

2011-09-29 Thread Jean V Adams


# a couple example dates
dates <- c("09/10/2003", "10/22/2005")

# split the dates
dates.split <- strsplit(dates, "/")

# extract the years
sapply(dates.split, "[", 3)
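
On the more general question of locating the 2nd (or nth) occurrence of a string: gregexpr returns the positions of all matches, so the nth one can be picked out directly. The following is only a sketch; the helper name nth_pos is hypothetical and not part of base R.

```r
dates <- c("09/10/2003", "10/22/2005")

# Hypothetical helper: position of the n-th occurrence of a fixed string,
# or NA when a string has fewer than n occurrences.
nth_pos <- function(x, pattern, n) {
  vapply(gregexpr(pattern, x, fixed = TRUE), function(m) {
    if (m[1] == -1 || length(m) < n) NA_integer_ else m[n]
  }, integer(1))
}

second_slash <- nth_pos(dates, "/", 2)       # position of the 2nd "/"
years <- substring(dates, second_slash + 1)  # everything after it
years
```

For the two example dates the second slash is at position 6 in both strings, so years holds "2003" and "2005".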

Jean


Re: [R] string manipulation

2011-08-26 Thread Janko Thyson

You might want to take a look at 'regexpr' and/or 'gregexpr':

mytext <- "I want the number 2000, not the number two thousand"
idx <- regexpr("\\d{4}", mytext)
idx <- c(idx, (idx + attributes(idx)$match.length) - 1)
substr(start=idx[1], stop=idx[2], mytext)
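
The start/stop bookkeeping can also be left to regmatches, which extracts the matched text from a regexpr result. A minimal sketch on the same example string:

```r
mytext <- "I want the number 2000, not the number two thousand"

# regexpr finds the first run of four digits; regmatches pulls out the text.
first_four <- regmatches(mytext, regexpr("[0-9]{4}", mytext))
first_four
```

Note that regmatches returns a zero-length vector when there is no match, so the result should be checked before use on arbitrary input.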

HTH,
Janko



Re: [R] string manipulation

2011-08-26 Thread Gabor Grothendieck


strapply in gsubfn searches mytext for the indicated regular
expression and passes the back referenced portion (i.e. the portion of
mytext matching the parenthesized portion of the regular expression)
to the as.numeric function whose output is returned.

library(gsubfn)
strapply(mytext, "number.*([0-9]{4})", as.numeric, simplify = TRUE) # 2000

See http://gsubfn.googlecode.com for more info.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] string manipulation

2011-08-26 Thread Jeff Newmiller
.* is greedy... might want the regex "number[^0-9]*([0-9]{4})" to avoid getting 
1999 from "I want the number 2000, not the number 1999".
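
The effect can be checked in base R without gsubfn. In this sketch the helper name capture is hypothetical; it uses regexec and regmatches to return the text matched by the first parenthesized group.

```r
mytext <- "I want the number 2000, not the number 1999"

# Hypothetical helper: text captured by the first group of `pattern`.
capture <- function(pattern, text) {
  regmatches(text, regexec(pattern, text))[[1]][2]
}

# Greedy ".*" runs as far as it can, so the group lands on the last 4 digits.
greedy <- capture("number.*([0-9]{4})", mytext)

# "[^0-9]*" cannot skip over digits, so the group is the first 4 digits.
bounded <- capture("number[^0-9]*([0-9]{4})", mytext)

c(greedy = greedy, bounded = bounded)
```

Here greedy is "1999" and bounded is "2000".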


Re: [R] string manipulation

2011-08-26 Thread Gabor Grothendieck

If such inputs are possible we could also do this where we have added
a ? after the * to make the repetition non-greedy and also have used
simplify=unlist and ended it with [1] to get only the first match
since it will otherwise match and return all occurrences:

strapply(mytext, "number.*?([0-9]{4})", as.numeric, simplify = unlist)[1] # 2000
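
For readers without gsubfn, a rough base R equivalent is sketched below. It assumes the example string with two numbers from the previous message and uses perl = TRUE so that the non-greedy ".*?" is available.

```r
mytext <- "I want the number 2000, not the number 1999"

# All non-greedy matches of "number ... four digits".
hits <- regmatches(mytext,
                   gregexpr("number.*?[0-9]{4}", mytext, perl = TRUE))[[1]]

# Keep only the trailing four digits of each match and convert to numeric.
values <- as.numeric(sub("^.*([0-9]{4})$", "\\1", hits))

first_value <- values[1]  # only the first occurrence
first_value
```

As in the strapply version, every occurrence is matched ("number 2000" and "number 1999"), and indexing with [1] keeps the first.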

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



[R] string manipulation

2011-08-25 Thread Lorenzo Cattarino
Hi R-users,

I am trying to find a way to manipulate a character string to select a 4-digit 
number after some specific word(s). Example:

mytext <- "I do not want the first number 1234, but the second number 5678"

Is there any function that allows you to select a certain number of digits (in 
this case 5678) after a particular word or words (e.g., "second number")?

Thank you for your help

Lorenzo




Re: [R] string manipulation

2011-08-25 Thread Henrique Dallazuanna
Try this:

gsub(".*second number ", "", mytext)




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O



Re: [R] string manipulation

2011-08-25 Thread Steven Kennedy
You can split your string, and then only take the first 4 digits after
that (this is only an improvement if your numbers might not be at the
end of mytext):

mytext <- "I do not want the first number 1234, but the second number 5678"
sstr <- strsplit(mytext, split="second number ")[[1]][2]
nynumbers <- substr(sstr, 1, 4)




Re: [R] string manipulation

2011-08-25 Thread jim holtman
To be on the safe side in case there are other characters at the end
of the string, use:

> mytext <- "I do not want the first number 1234, but the second number 5678sadfsadffdsa"
> # make sure you get 4 digits
> sub("^.*second number[^0-9]*([0-9]{4}).*", "\\1", mytext)
[1] "5678"
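
The same idea can be wrapped in a small reusable function. This is a sketch only: the name digits_after and its arguments are hypothetical, and the word argument is pasted into the pattern as-is, so it is treated as a regular expression rather than a literal string.

```r
# Hypothetical helper: first `n` consecutive digits found after `word`,
# skipping any non-digit characters in between; NA when nothing matches.
digits_after <- function(text, word, n = 4) {
  pattern <- paste0(word, "[^0-9]*([0-9]{", n, "})")
  m <- regmatches(text, regexec(pattern, text))
  vapply(m, function(g) if (length(g) >= 2) g[2] else NA_character_,
         character(1))
}

digits_after("I do not want the first number 1234, but the second number 5678",
             "second number")
digits_after("I want the number 2000, not the number two thousand", "number")
```

The first call gives "5678" and the second gives "2000"; wrap the result in as.numeric if a number is needed.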







-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



Re: [R] string manipulation

2011-08-25 Thread Lorenzo Cattarino
Apologies for the confusion. What I meant was the following:

mytext <- "I want the number 2000, not the number two thousand"

and the problem is to select 2000 as the first four digits after the word 
"number". The position of 2000 in the string might change.

thanks 
Lorenzo



[R] String manipulation

2011-06-26 Thread Megh Dal
Dear all, I have the following kind of character vector:

Vec <- c("344426", "dwjjsgcj", "123sgdc", "aagha123", "sdh343asgh", "123jhd51")


Now I want to split each element of this vector into its numeric and string 
parts. For example, the 1st element of that vector has no string 
part, so I should get a vector of length 2 like c("", "344426"), and 
so on.

Can somebody point me to how to achieve that in R? Is there a specific function 
for doing that?

Thanks,




Re: [R] String manipulation

2011-06-26 Thread Gabor Grothendieck


Try this and see the gsubfn home page at http://gsubfn.googlecode.com
for more info:

library(gsubfn)
strapply(Vec, "\\d+|\\D+", c)
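
A base R sketch of the same tokenization, for anyone who prefers not to load gsubfn. It assumes the Vec from the question and uses gregexpr with regmatches to return each run of digits or non-digits in order.

```r
Vec <- c("344426", "dwjjsgcj", "123sgdc", "aagha123", "sdh343asgh", "123jhd51")

# Each element becomes a vector of alternating digit / non-digit runs.
runs <- regmatches(Vec, gregexpr("[0-9]+|[^0-9]+", Vec))

runs[[5]]  # "sdh" "343" "asgh"
runs[[6]]  # "123" "jhd" "51"
```

The result is a list because the number of runs differs between elements.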

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] String manipulation

2011-06-26 Thread David Winsemius




?regex
?strsplit

You don't do a very good job of describing your desired output, so  
this is two versions of what I am guessing that to be:


> cbind(lapply(strsplit(Vec, "[^0-9]+"), paste, collapse=""),
+       lapply(strsplit(Vec, "[0-9]+"), paste, collapse="") )
     [,1]     [,2]
[1,] "344426" ""
[2,] ""       "dwjjsgcj"
[3,] "123"    "sgdc"
[4,] "123"    "aagha"
[5,] "343"    "sdhasgh"
[6,] "12351"  "jhd"

> data.frame(numbits=unlist(lapply(strsplit(Vec, "[^0-9]+"), paste, collapse="")),
+          alphabits=unlist(lapply(strsplit(Vec, "[0-9]+"), paste, collapse="")) )

  numbits alphabits
1  344426
2          dwjjsgcj
3     123      sgdc
4     123     aagha
5     343   sdhasgh
6   12351       jhd

--
David Winsemius, MD
West Hartford, CT



Re: [R] String manipulation

2011-06-26 Thread Gabor Grothendieck
On Sun, Jun 26, 2011 at 11:00 AM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 On Sun, Jun 26, 2011 at 10:54 AM, Megh Dal megh700...@yahoo.com wrote:
 Dear all, I have following kind of character vector:

 Vec <- c("344426", "dwjjsgcj", "123sgdc", "aagha123", "sdh343asgh",
          "123jhd51")


 Now I want to split each element of this vector according to numeric and 
 string element. For example in the 1st element of that vector, there is no 
 string element. Therefore I should get a vector of length 2 like
 c("", "344426") and so on.

 Can somebody point me how to achieve that in R? Is there any specific 
 function for doing that?


 Try this and see the gsubfn home page at http://gsubfn.googlecode.com
 for more info:

 library(gsubfn)
 strapply(Vec, "\\d+|\\D+", c)


Also, if what you want is a leading string which begins Vec[[i]]
followed by a numeric (and everything else is to be ignored) try this:

strapply(Vec, "^(\\D*)(\\d*)", c)

If the first component must be string and you don't want to limit it
to two try this (ignoring the warnings):

L <- strapply(Vec, "\\d+|\\D+", c)
lapply(L, function(x) if (length(x) == 0) x else if
(is.na(as.numeric(x[1]))) x else c("", x))
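
The padding step can also be sketched in base R without triggering the coercion warnings. This is an assumed alternative, not part of the original reply: it tests the first piece with grepl() instead of as.numeric(), and the helper name starts_numeric is made up for illustration.

```r
Vec <- c("344426", "dwjjsgcj", "123sgdc", "aagha123", "sdh343asgh", "123jhd51")

# Split each string into alternating runs of digits and non-digits.
pieces <- regmatches(Vec, gregexpr("[0-9]+|[^0-9]+", Vec))

# Hypothetical helper: TRUE when the first piece begins with a digit.
starts_numeric <- function(p) length(p) > 0 && grepl("^[0-9]", p[1])

# Prepend "" so that the first component is always the (possibly empty)
# string part, as the original poster asked for.
padded <- lapply(pieces, function(p) if (starts_numeric(p)) c("", p) else p)

padded[[1]]
# [1] ""       "344426"
```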

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



[R] String manipulation

2011-03-08 Thread Denis Kazakiewicz
Dear [R] people
Could you please help with following


How to convert a vector

'ac','ac','c','ac','ac','c'

into a single string
'ac2_c_ac2_c'


Thank you in advance



Re: [R] String manipulation

2011-03-08 Thread jim holtman
Try this:

> x <- c('ac','ac','c','ac','ac','c')
> rle(x)
Run Length Encoding
  lengths: int [1:4] 2 1 2 1
  values : chr [1:4] "ac" "c" "ac" "c"
> z <- rle(x)
> paste(z$values, ifelse(z$lengths == 1, '', z$lengths), collapse='_', sep = '')
[1] "ac2_c_ac2_c"
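
The same steps can be wrapped in a small reusable function. The sketch below is only a repackaging of the rle() approach above; the function name collapse_runs and its sep argument are made up for illustration.

```r
# Collapse consecutive repeats into "value<count>", omitting a count of 1,
# and join the runs with `sep`.
collapse_runs <- function(x, sep = "_") {
  runs <- rle(x)
  counts <- ifelse(runs$lengths == 1, "", runs$lengths)
  paste0(runs$values, counts, collapse = sep)
}

collapse_runs(c("ac", "ac", "c", "ac", "ac", "c"))
# [1] "ac2_c_ac2_c"
```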




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



Re: [R] String manipulation

2011-03-08 Thread Jannis

Dennis,

If I understand you correctly (your example does not point unambiguously 
to one unique solution...)

you could try:

dummy <- c('ac','ac','c','ac','ac','c')
dummy.rle <- rle(dummy)

result <- paste(dummy.rle$values, dummy.rle$lengths, collapse='_', sep='')

You may need to remove the '1' in dummy.rle$lengths to get exactly what 
you wanted.


HTH
Jannis



Re: [R] String manipulation

2011-02-16 Thread rex.dwyer
A quick way to do this is to replace \d and \D with the character classes
[0-9.] and [^0-9.].  This assumes that there is no scientific notation and
that there is nothing like 123.45.678 in the string.  You did not account for
a leading minus sign.
The book Mastering Regular Expressions is probably worth the expense if you are 
going to be doing a lot of this, even though similar content can be gleaned 
from online sources.
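
As a rough illustration of those caveats, the base R sketch below extracts only the numbers, allowing an optional leading minus sign and an optional fractional part. The input string and the pattern are assumptions made for this example; scientific notation is deliberately not handled, and any hyphen directly before a digit is read as a minus sign.

```r
s <- "ABCFR-34564.354IJVEOJC3434.36453"

# Optional minus, one or more digits, then optionally a dot and more digits.
num_pattern <- "-?[0-9]+(\\.[0-9]+)?"

# gregexpr() finds every match; regmatches() returns them as character.
nums <- as.numeric(regmatches(s, gregexpr(num_pattern, s))[[1]])
nums
```

Here nums holds the two values -34564.354 and 3434.36453.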



[R] String manipulation

2011-02-13 Thread Megh Dal
Please consider following string:

MyString <- "ABCFR34564IJVEOJC3434"

Here you see that, there are 4 groups in above string. 1st and 3rd groups
are for english letters and 2nd and 4th for numeric. Given a string, how can
I separate out those 4 groups?

Thanks for your time



Re: [R] String manipulation

2011-02-13 Thread Gabor Grothendieck
On Sun, Feb 13, 2011 at 10:27 AM, Megh Dal megh700...@gmail.com wrote:
 Please consider following string:

 MyString <- "ABCFR34564IJVEOJC3434"

 Here you see that, there are 4 groups in above string. 1st and 3rd groups
 are for english letters and 2nd and 4th for numeric. Given a string, how can
 I separate out those 4 groups?


Try this.  \\D+ and \\d+ match non-digits and digits respectively.
 The portions within parentheses are captures and passed to the c
function.  It returns a list with a component for each element of
MyString.  Like R's split it returns a list with a component per
element of MyString but MyString only has one element so we get its
contents using  [[1]].

> library(gsubfn)
> strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", c)[[1]]
[1] "ABCFR"   "34564"   "IJVEOJC" "3434"

Alternately we could convert the relevant portions to numbers at the
same time.  ~ list(...) is interpreted as a  function whose body is
the right hand side of the ~ and whose arguments are the free
variables, i.e. s1, s2, s3 and s4.

strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", ~ list(s1,
as.numeric(s2), s3, as.numeric(s4)))[[1]]
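
The capture-group idea can also be sketched in base R with regexec(), which reports the full match followed by each parenthesised group. This is an assumed equivalent rather than the gsubfn code above; [0-9] and [^0-9] replace \\d and \\D.

```r
MyString <- "ABCFR34564IJVEOJC3434"

# regexec() returns match positions for the whole pattern and for each group;
# regmatches() turns them into substrings (element 1 is the full match).
m <- regmatches(MyString,
                regexec("([^0-9]+)([0-9]+)([^0-9]+)([0-9]+)", MyString))
groups <- m[[1]][-1]   # drop the full match, keep the four captures
groups
# [1] "ABCFR"   "34564"   "IJVEOJC" "3434"

# Convert the numeric groups while keeping the letter groups as strings.
parsed <- list(groups[1], as.numeric(groups[2]),
               groups[3], as.numeric(groups[4]))
```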

See http://gsubfn.googlecode.com for more.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] String manipulation

2011-02-13 Thread jim holtman
If you have an indeterminate number of the patterns in the string, try
the following:

> MyString <- "ABCFR34564IJVEOJC3434"
> # translate to the pattern sequences: letters become 0, digits become 1
> x <- chartr('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
+           , '000000000000000000000000001111111111'
+           , MyString
+           )
> x.rle <- rle(strsplit(x, '')[[1]])  # determine the runs
> # create extraction matrix
> x.ext <- cbind(cumsum(c(1, head(x.rle$lengths, -1)))
+              , cumsum(x.rle$lengths)
+              )
> substring(MyString, x.ext[,1], x.ext[,2])
[1] "ABCFR"   "34564"   "IJVEOJC" "3434"




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



Re: [R] String manipulation

2011-02-13 Thread Gabor Grothendieck
On Sun, Feb 13, 2011 at 4:42 PM, Megh Dal megh700...@gmail.com wrote:
 Hi Gabor, thanks (and Jim as well) for your suggestion. However this is not
 working properly for following string:

 MyString <- "ABCFR34564IJVEOJC3434.36453"
 strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", c)[[1]]
 [1] "ABCFR"   "34564"   "IJVEOJC" "3434"

 Therefore there is decimal number in the 4th group, which is numeric then
 that is not taken care off...

 Similarly same kind of unintended result here as well:

 MyString <- "ABCFR34564.354IJVEOJC3434.36453"
 strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", c)[[1]]
 [1] "ABCFR"   "34564"   "."       "354"     "IJVEOJC" "3434"    "."
 [8] "36453"
 Can you please tell me how can I modify that?


In that case we need to tell it that a number can include a dot.
Additionally, the following simplifies the regular expression by
assuming any number of non-numeric fields, each followed by a numeric field:

strapply(MyString, "(\\D+)([.0-9]+)", c)[[1]]

strapply(MyString, "(\\D+)([.0-9]+)", ~ list(s1, as.numeric(s2)))[[1]]
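
A base R sketch of the same idea, for comparison. It is an assumed alternative to the strapply() calls: the dot is added to both bracket classes so that a decimal number stays in one piece, and the numeric tokens are converted afterwards.

```r
MyString <- "ABCFR34564.354IJVEOJC3434.36453"

# Runs of digits-or-dots alternate with runs of everything else.
tokens <- regmatches(MyString, gregexpr("[.0-9]+|[^.0-9]+", MyString))[[1]]
tokens
# [1] "ABCFR"      "34564.354"  "IJVEOJC"    "3434.36453"

# Flag the numeric tokens and convert only those.
is_num <- grepl("^[.0-9]+$", tokens)
values <- as.numeric(tokens[is_num])
```

As noted elsewhere in the thread, this does not handle a leading minus sign or input such as 123.45.678.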


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] String manipulation

2011-02-13 Thread jim holtman
Just add '.' to the pattern specifier:

> MyString <- "ABCFR34564IJVEOJC3434.16ABC123.456KJHLKJH23452345AAA"
> # translate to the pattern sequences: letters become 0, digits and '.' become 1
> x <- chartr('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.'
+           , '0000000000000000000000000011111111111'
+           , MyString
+           )
> x.rle <- rle(strsplit(x, '')[[1]])  # determine the runs
> # create extraction matrix
> x.ext <- cbind(cumsum(c(1, head(x.rle$lengths, -1)))
+              , cumsum(x.rle$lengths)
+              )
> substring(MyString, x.ext[,1], x.ext[,2])
[1] "ABCFR"    "34564"    "IJVEOJC"  "3434.16"  "ABC"      "123.456"
[7] "KJHLKJH"  "23452345" "AAA"
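
The run-length technique can be packaged as a function that classifies each character directly instead of translating it with chartr(). This is a sketch under that assumption; the name split_runs is made up, and lowercase letters or other symbols are simply treated as non-numeric.

```r
# Split a single string into alternating runs of "numeric" characters
# (digits and '.') and everything else.
split_runs <- function(s) {
  chars <- strsplit(s, "")[[1]]
  is_num <- chars %in% c(0:9, ".")       # TRUE for digits and the dot
  runs <- rle(is_num)                    # lengths of consecutive runs
  ends <- cumsum(runs$lengths)           # last position of each run
  starts <- c(1, head(ends, -1) + 1)     # first position of each run
  substring(s, starts, ends)
}

split_runs("ABCFR34564IJVEOJC3434.16ABC")
# [1] "ABCFR"   "34564"   "IJVEOJC" "3434.16" "ABC"
```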



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



Re: [R] String manipulation

2011-02-13 Thread Megh Dal
Hi Gabor, thanks (and Jim as well) for your suggestion. However this is not
working properly for following string:

> MyString <- "ABCFR34564IJVEOJC3434.36453"
> strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", c)[[1]]
[1] "ABCFR"   "34564"   "IJVEOJC" "3434"

So there is a decimal number in the 4th group, which is numeric, but the
decimal part is not taken care of...

Similarly, the same kind of unintended result occurs here as well:

> MyString <- "ABCFR34564.354IJVEOJC3434.36453"
> strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", c)[[1]]
[1] "ABCFR"   "34564"   "."       "354"     "IJVEOJC" "3434"    "."
[8] "36453"
Can you please tell me how can I modify that?

Thanks,




[R] String manipulation

2010-05-08 Thread Webby

Dear community,

I have a problem with a string conversion:

> text
 [1] ""                 "and"              "\xc1d\xe1m"
 [4] "graphical"        "interface"        "MLP"
 [7] "Nagy"             "networks"         "Networks"
[10] "neural"           "Neural"           "RBF"
[13] "sod...@yahoo.com" "user"             "with"
[16] "and"              "\xc1d\xe1m"       "graphical"
[19] "interface"        "MLP"


I need to get rid of text[c(3, 17)]!

I have this kind of control sequence a few times in my text and I cannot get 
rid of it with strsplit or sub.

> grep("\xc1d\xe1m", text)
Error in grep("\xc1d\xe1m", text) :
  regular expression is invalid in this locale
> grep("\\xc1d\\xe1m", text)
integer(0)
Warning messages:
1: In grep("\\xc1d\\xe1m", text) :
  input string 3 is invalid in this locale
2: In grep("\\xc1d\\xe1m", text) :
  input string 17 is invalid in this locale

Thanks in advance,
Georg



Re: [R] String manipulation

2010-05-08 Thread Henrique Dallazuanna
See

?Encoding and ?iconv:

iconv("\xc1d\xe1m", from = '', to = 'latin1')
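
A small sketch of two ways this might be handled, assuming the offending bytes are Latin-1 (an assumption: \xc1 and \xe1 are the Latin-1 codes for the accented letters A-acute and a-acute) and that the session otherwise works in UTF-8. The three-element vector is a shortened stand-in for the poster's data.

```r
text <- c("and", "\xc1d\xe1m", "graphical")

# Option 1: declare the source encoding explicitly and convert to UTF-8,
# which keeps the element and makes it usable in regular expressions.
converted <- iconv(text, from = "latin1", to = "UTF-8")

# Option 2: drop every element that is not valid UTF-8.
# validUTF8() is available in R 3.3.0 and later.
kept <- text[validUTF8(text)]
kept
# [1] "and"       "graphical"
```

Option 1 keeps the name readable, while option 2 matches the original request to get rid of the affected elements.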


-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



Re: [R] String manipulation

2010-05-08 Thread David Winsemius


On May 8, 2010, at 10:05 AM, Webby wrote:



Dear community,

I have a problem with a string conversion:


text

[1]and\xc1d\xe1m
[4] graphical  interface  MLP
[7] Nagy   networks   Networks
[10] neural Neural RBF
[13] sod...@yahoo.com user   with
[16] and\xc1d\xe1m graphical
[19] interface  MLP




I need to get rid off text[3,17] !


Does this work?

text[ grep([[:alnum:]]|, text) ]


It still gives the warnings but seems to properly leave out the control
sequences.


I have this kind of control-sequence a few times in my text and I do  
not get rid of it, by strsplit or sub.


> grep("\xc1d\xe1m", text)
Error in grep("\xc1d\xe1m", text) :
  regular expression is invalid in this locale
> grep("\\xc1d\\xe1m", text)
integer(0)
Warning messages:
1: In grep("\\xc1d\\xe1m", text) :
  input string 3 is invalid in this locale
2: In grep("\\xc1d\\xe1m", text) :
  input string 17 is invalid in this locale

Thanks in advance,
Georg


David Winsemius, MD
West Hartford, CT



[R] String Manipulation- Extract numerical and alphanumerical segment

2010-02-05 Thread Su C.

I am currently attempting to split a long list of strings (let's call it
string.list) that is of the format:

1234567.z3.abcdef-gh.12

I have gotten it to:
1234567  z3  abcdef-gh  12
by use of the strsplit function.

This leaves me with each element of string.list having a split string of
the above format. What I'd like to do now is extract the first two strings
of each element in string.list -- the 1234567 and the z3 -- and place
them into two separate lists, say, firstsplit.numeric.list and
secondsplit.alphanumeric.list

I'm having some trouble figuring out how to do this. Any help would be
greatly appreciated!


Re: [R] String Manipulation- Extract numerical and alphanumerical segment

2010-02-05 Thread jim holtman
Does this help:

> x <- c("1234567.z3.abcdef-gh.12", "1234567.z3.abcdef-gh.12",
+        "1234567.z3.abcdef-gh.12")
> y <- strsplit(x, '[.]')
>
> y
[[1]]
[1] "1234567"   "z3"        "abcdef-gh" "12"

[[2]]
[1] "1234567"   "z3"        "abcdef-gh" "12"

[[3]]
[1] "1234567"   "z3"        "abcdef-gh" "12"

> y.1 <- sapply(y, '[[', 1)
> y.1
[1] "1234567" "1234567" "1234567"
> y.2 <- sapply(y, '[[', 2)
> y.2
[1] "z3" "z3" "z3"




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] String Manipulation- Extract numerical and alphanumerical segment

2010-02-05 Thread hadley wickham
On Fri, Feb 5, 2010 at 9:29 AM, jim holtman jholt...@gmail.com wrote:
 Does this help:

 x <- c("1234567.z3.abcdef-gh.12", "1234567.z3.abcdef-gh.12",
        "1234567.z3.abcdef-gh.12")
 y <- strsplit(x, '[.]')

Here's another way with the stringr package:

library(stringr)
x <- c("1234567.z3.abcdef-gh.12", "1234567.z3.abcdef-gh.12",
       "1234567.z3.abcdef-gh.12")
y <- str_split_fixed(x, '[.]', 4)
y[, 1]
y[, 2]
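
For comparison, a similar character matrix can be built in base R by row-binding the strsplit() result. This sketch assumes every string has the same number of dot-separated fields; the second input value is made up so that the columns differ.

```r
x <- c("1234567.z3.abcdef-gh.12", "7654321.a1.xyz.3")  # second value is hypothetical

# fixed = TRUE treats "." as a literal dot rather than a regex wildcard.
parts <- do.call(rbind, strsplit(x, ".", fixed = TRUE))

parts[, 1]
# [1] "1234567" "7654321"
parts[, 2]
# [1] "z3" "a1"
```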

Hadley



-- 
http://had.co.nz/



Re: [R] String Manipulation- Extract numerical and alphanumerical segment

2010-02-05 Thread Su C.

Yes, that was perfect! Thank you so much!

Just to clarify, since I'm kind of new to string manipulation-- is that '[['
in the sapply function what is designating splits/elements within the
string? So that's the part that says I want this particular element and
the 1 or 2 or number is what designates location?

And, if while looking at the second column, I want to verify if the
alphabetical character is say, a 'z' or an 'a' or a 'b', what would be an
elegant way to do that besides splitting the second column into alphabetical
and numerical values, and then testing against z,a,b, using a for loop and a
boolean statement? I want to assign a 1 for z's, a 2 for a's, and a 3 for
b's.



-- 
Su H. Chu
Carnegie Mellon University
Economics and Statistics '09



Re: [R] String Manipulation- Extract numerical and alphanumerical segment

2010-02-05 Thread jim holtman
The '[[' is just the index access to an object.  type:

?'[['

to see the help page.

Actually I should have used '[' in this case:



> sapply(y, '[', 1)
[1] "1234567" "1234567" "1234567"

is equivalent to:

> sapply(y, function(a) a[1])
[1] "1234567" "1234567" "1234567"



To set a value based on the first character, just extract the first
character (e.g., with substring) and then index into a vector with the key
values:

> key <- c(z=1, a=2, b=3)  # mapping values
> data <- c('a','c','b','d','z','a','b')  # data to be mapped
> key[data]
   a <NA>    b <NA>    z    a    b 
   2   NA    3   NA    1    2    3 
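
Putting the pieces together, the whole lookup might be sketched as follows. The input strings other than the first are made up for illustration, and vapply() is used here in place of sapply() only to make the expected return type explicit.

```r
x <- c("1234567.z3.abcdef-gh.12", "7654321.a1.xyz.3",
       "1111111.b7.q.9", "2222222.c2.r.1")   # all but the first are hypothetical

fields <- strsplit(x, ".", fixed = TRUE)
second <- vapply(fields, `[`, character(1), 2)   # "z3" "a1" "b7" "c2"

key <- c(z = 1, a = 2, b = 3)                    # mapping values
codes <- unname(key[substr(second, 1, 1)])       # look up by first letter
codes
# [1]  1  2  3 NA
```

Letters that are not in key come back as NA, so unexpected codes are easy to spot without a loop.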



On Fri, Feb 5, 2010 at 10:41 AM, Su C. sushi...@gmail.com wrote:

 Yes, that was perfect! Thank you so much!

 Just to clarify, since I'm kind of new to string manipulation-- is that '[['
 in the sapply function what is designating splits/elements within the
 string? So that's the part that says I want this particular element and
 the 1 or 2 or number is what designates location?

 And, if while looking at the second column, I want to verify if the
 alphabetical character is say, a 'z' or an 'a' or a 'b', what would be an
 elegant way to do that besides splitting the second column into alphabetical
 and numerical values, and then testing against z,a,b, using a for loop and a
 boolean statement? I want to assign a 1 for z's, a 2 for a's, and a 3 for
 b's.


 On Fri, Feb 5, 2010 at 10:30 AM, jholtman [via R] 
 ml-node+1470341-841877...@n4.nabble.comml-node%2b1470341-841877...@n4.nabble.com
 wrote:

 Does this help:

  x -
 c(1234567.z3.abcdef-gh.12,1234567.z3.abcdef-gh.12,1234567.z3.abcdef-gh.12)

  y - strsplit(x, '[.]')
 
  y
 [[1]]
 [1] 1234567   z3        abcdef-gh 12

 [[2]]
 [1] 1234567   z3        abcdef-gh 12

 [[3]]
 [1] 1234567   z3        abcdef-gh 12

  y.1 - sapply(y, '[[', 1)
  y.1
 [1] 1234567 1234567 1234567
  y.2 - sapply(y, '[[', 2)
  y.2
 [1] z3 z3 z3
 


 On Fri, Feb 5, 2010 at 10:11 AM, Su C. [hidden email] wrote:

 
  I am currently attempting to split a long list of strings (let's call it
  string.list) that is of the format:
 
  1234567.z3.abcdef-gh.12
 
  I have gotten it to:
  1234567  z3  abcdef-gh  12
  by use of the strsplit function.
 
  This leaves me with each element of string.list having a split string of
  the above format. What I'd like to do now is extract the first two strings
  of each element in string.list -- the 1234567 and the z3 -- and place
  them into two separate lists, say, firstsplit.numeric.list and
  secondsplit.alphanumeric.list
 
  I'm having some trouble figuring out how to do this. Any help would be
  greatly appreciated!
 



 --
 Jim Holtman
 Cincinnati, OH
 +1 513 646 9390

 What is the problem that you are trying to solve?






 --
 Su H. Chu
 Carnegie Mellon University
 Economics and Statistics '09





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
