[R] Omitting repeated occurrence in a string

2013-02-06 Christofer Bogaso
Hello again,

I was looking for some way to delete repeated occurrences of a substring
in a string. Let's say I have the following string:

Text <- "ahsgdvasgAbcabcsdahj"

Here you see that Abc appears twice, but I want to keep only one
occurrence. Therefore I need:

Text_result <- "ahsgdvasgAbcsdahj"  # i.e. keep the first one

Can somebody tell me whether this is possible with some R function?

Thanks and regards,

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Omitting repeated occurrence in a string

2013-02-06 David Winsemius

On Feb 6, 2013, at 8:46 AM, Christofer Bogaso wrote:

> Hello again,
>
> I was looking for some way to delete repeated occurrences of a substring
> in a string. Let's say I have the following string:
>
> Text <- "ahsgdvasgAbcabcsdahj"
>
> Here you see that Abc appears twice, but I want to keep only one
> occurrence. Therefore I need:
>
> Text_result <- "ahsgdvasgAbcsdahj"  # i.e. keep the first one
>
> Can somebody tell me whether this is possible with some R function?

This is not going to solve all possible variations of this problem, but then
your proposed testing suite was rather limited, ... don't you agree?

> Text <- "ahsgdvasgAbcabcsdahabcj"
> gsub("(abc).*(abc)", "\\1", Text, ignore.case=TRUE)
[1] "ahsgdvasgAbcj"
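The greedy `.*` in the `gsub()` call above also deletes whatever lies between the first and last match ("sdah" here). A minimal sketch, not taken from the thread, that keeps the intervening text: locate every case-insensitive match with `gregexpr()` and blank out all but the first through the `regmatches<-` replacement function. The helper name `keep_first` is hypothetical, and `pattern` is treated as a regular expression, so any metacharacters in it would need escaping.

```r
# Hypothetical helper (not from the thread): keep the first match of
# `pattern` in each string and delete the later ones, leaving the
# text between matches untouched.
keep_first <- function(x, pattern, ignore.case = TRUE) {
  m <- gregexpr(pattern, x, ignore.case = ignore.case)
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    if (length(hits) > 1) hits[-1] <- ""  # blank out every match after the first
    hits
  })
  x
}

keep_first("ahsgdvasgAbcabcsdahj", "abc")     # [1] "ahsgdvasgAbcsdahj"
keep_first("ahsgdvasgAbcabcsdahabcj", "abc")  # [1] "ahsgdvasgAbcsdahj"
```

Unlike the single `gsub()` call, "sdah" survives in the second example because only the matched copies of "abc" are replaced.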


-- 
David Winsemius
Alameda, CA, USA



Re: [R] Omitting repeated occurrence in a string

2013-02-06 Eik Vettorazzi
Hi Christofer,
what is the rule for omitting "ah", which is also repeated in Text?
The following might be a start:

Text <- "ahsgdvasgAbcabcsdahj"
# removes a later copy of the first repeated substring of length 2 or more, here "ah"
gsub("(?i)([a-z]{2,})(.*)\\1", "\\1\\2", Text, perl=TRUE)
# removes repetitions of substrings of length 3 or more, here "Abc"
gsub("(?i)([a-z]{3,})(.*)\\1", "\\1\\2", Text, perl=TRUE)
# removes only immediately adjacent repetitions of substrings of length 2 or more
gsub("(?i)([a-z]{2,})\\1", "\\1", Text, perl=TRUE)
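Because `(.*)` is greedy, each match of the length-3 pattern removes only the last copy of the repeated substring, so a string containing three instances of "abc" still has two after one `gsub()` pass. A minimal sketch, not from the thread, that reapplies the same pattern until the string stops changing. The helper name `drop_repeats` and its `min_len` parameter are hypothetical, and like the patterns above it assumes the text consists of letters only.

```r
# Hypothetical helper (not from the thread): reapply the backreference
# pattern until no repeated substring of length >= min_len is left.
drop_repeats <- function(x, min_len = 3L) {
  pattern <- sprintf("(?i)([a-z]{%d,})(.*)\\1", min_len)
  repeat {
    y <- gsub(pattern, "\\1\\2", x, perl = TRUE)
    if (identical(y, x)) return(x)  # fixed point: nothing left to remove
    x <- y                           # each pass shortens x, so the loop ends
  }
}

drop_repeats("ahsgdvasgAbcabcsdahabcj")  # [1] "ahsgdvasgAbcsdahj"
```

With `min_len = 2L` the same loop would also start removing "ah", "sg" and so on, which is exactly the ambiguity raised in the question above.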

hth.

On 06.02.2013 17:46, Christofer Bogaso wrote:
> Hello again,
>
> I was looking for some way to delete repeated occurrences of a substring
> in a string. Let's say I have the following string:
>
> Text <- "ahsgdvasgAbcabcsdahj"
>
> Here you see that Abc appears twice, but I want to keep only one
> occurrence. Therefore I need:
>
> Text_result <- "ahsgdvasgAbcsdahj"  # i.e. keep the first one
>
> Can somebody tell me whether this is possible with some R function?
>
> Thanks and regards,
 
 


-- 
Eik Vettorazzi

Department of Medical Biometry and Epidemiology
University Medical Center Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/7410-58243
F ++49/40/7410-57790



Re: [R] Omitting repeated occurrence in a string

2013-02-06 David Winsemius

On Feb 6, 2013, at 11:24 AM, David Winsemius wrote:

 
> On Feb 6, 2013, at 8:46 AM, Christofer Bogaso wrote:
>
>> Hello again,
>>
>> I was looking for some way to delete repeated occurrences of a substring
>> in a string. Let's say I have the following string:
>>
>> Text <- "ahsgdvasgAbcabcsdahj"
>>
>> Here you see that Abc appears twice, but I want to keep only one
>> occurrence. Therefore I need:
>>
>> Text_result <- "ahsgdvasgAbcsdahj"  # i.e. keep the first one
>>
>> Can somebody tell me whether this is possible with some R function?
>
> This is not going to solve all possible variations of this problem, but then
> your proposed testing suite was rather limited, ... don't you agree?
>
> > Text <- "ahsgdvasgAbcabcsdahabcj"
> > gsub("(abc).*(abc)", "\\1", Text, ignore.case=TRUE)
> [1] "ahsgdvasgAbcj"
 

This gives some further variations:

> Text <- "ahsgdvasgAbcabcsdahabcj"  # adding a third instance
> gsub("(abc).*(abc)", "\\1", Text, ignore.case=TRUE)
[1] "ahsgdvasgAbcj"
# The first strategy deletes everything after the first 'abc' up to and through the last 'abc'


> gsub("(abc)((.*)(abc))", "\\1\\2", Text, ignore.case=TRUE)
[1] "ahsgdvasgAbcabcsdahabcj"
# no change: the nested group \\2 captures both (.*) and the final (abc), so the replacement restores the whole match

> gsub("(abc)(abc)", "\\1", Text, ignore.case=TRUE)
[1] "ahsgdvasgAbcsdahabcj"
# Only removes a repeat that immediately follows another instance.


> Text
[1] "ahsgdvasgAbcabcsdahabcj"
> gsub("(abc)(.?)(abc)", "\\1\\2", Text, ignore.case=TRUE)
[1] "ahsgdvasgAbcsdahabcj"
# Only gets rid of the first repeat, since at most one character may separate the two instances
 
# This gets rid of all sequential repeats, but not separated ones
> Text <- "ahsgdvasgAbcabcabcabcabcsdahabcj"
> gsub("(abc)(abc)*", "\\1", Text, ignore.case=TRUE)
[1] "ahsgdvasgAbcsdahabcj"
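The `(abc)(abc)*` idea can be combined with a backreference so that the repeated unit does not have to be spelled out. A minimal sketch, not from the thread: the lazy `{2,}?` makes PCRE try the shortest candidate unit first, and `\\1+` then consumes every immediately following copy. With a greedy `{2,}` the unit could instead be a multiple such as "Abcabc", leaving some copies behind. The helper name `collapse_runs` is hypothetical, and letters-only input is assumed as elsewhere in the thread.

```r
# Hypothetical helper (not from the thread): collapse runs of an immediately
# repeated unit of >= 2 letters, whatever the unit is, keeping the first copy.
collapse_runs <- function(x) {
  # ([a-z]{2,}?) lazily picks the shortest unit; \\1+ eats the adjacent copies
  gsub("(?i)([a-z]{2,}?)\\1+", "\\1", x, perl = TRUE)
}

collapse_runs("ahsgdvasgAbcabcabcabcabcsdahabcj")  # [1] "ahsgdvasgAbcsdahabcj"
```

As with `(abc)(abc)*`, the separated "abc" near the end is left alone, because only adjacent copies form a run.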


 
 

David Winsemius
Alameda, CA, USA
