[R] Omitting repeated occurrence in a string
Hello again,

I am looking for a way to delete repeated occurrences of a substring in a string. Let's say I have the following string:

Text <- "ahsgdvasgAbcabcsdahj"

Here you see that "Abc" appears twice (ignoring case), but I want to keep only one occurrence, namely the first one. So I need:

Text_result <- "ahsgdvasgAbcsdahj"

Can somebody tell me whether this is possible with some R function?

Thanks and regards,

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Omitting repeated occurrence in a string
On Feb 6, 2013, at 8:46 AM, Christofer Bogaso wrote:

> Hello again,
>
> I am looking for a way to delete repeated occurrences of a substring in a string. Let's say I have the following string:
>
> Text <- "ahsgdvasgAbcabcsdahj"
>
> Here you see that "Abc" appears twice, but I want to keep only one occurrence, namely the first one. So I need:
>
> Text_result <- "ahsgdvasgAbcsdahj"
>
> Can somebody tell me whether this is possible with some R function?

This is not going to solve all possible variations of this problem, but then your proposed test suite was rather limited, ... don't you agree?

Text <- "ahsgdvasgAbcabcsdahabcj"
gsub("(abc).*(abc)", "\\1", Text, ignore.case=TRUE)
[1] "ahsgdvasgAbcj"

--
David Winsemius
Alameda, CA, USA
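Because `.*` is greedy, the pattern above also swallows the text between the first and the last "abc". When the repeated substring is known in advance, one possible alternative is to split the string right after the first match and delete the pattern only from the remainder. The following is a minimal sketch, not a general solution; the helper name `keep_first` is hypothetical and it assumes a single (length-one) input string.

```r
# Hypothetical helper: keep the first match of `pattern`, drop all later ones.
# Assumes `x` is a single string and `pattern` is a regular expression.
keep_first <- function(x, pattern, ignore.case = TRUE) {
  m <- regexpr(pattern, x, ignore.case = ignore.case)
  if (m[1] == -1L) return(x)                       # no match: nothing to do
  end  <- m[1] + attr(m, "match.length")[1] - 1L   # last character of first match
  head <- substr(x, 1L, end)                       # everything through the first match
  tail <- substr(x, end + 1L, nchar(x))            # everything after it
  paste0(head, gsub(pattern, "", tail, ignore.case = ignore.case))
}

keep_first("ahsgdvasgAbcabcsdahj", "abc")
# [1] "ahsgdvasgAbcsdahj"
keep_first("ahsgdvasgAbcabcsdahabcj", "abc")
# [1] "ahsgdvasgAbcsdahj"
```

Unlike the greedy pattern, this keeps the characters between the occurrences ("sdah" in the second call).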
Re: [R] Omitting repeated occurrence in a string
Hi Christofer,

what is the rule for omitting "ah", which is also repeated in Text? The following might be a start:

Text <- "ahsgdvasgAbcabcsdahj"
# finds the first repetition of substrings of length 2 or more, here "ah"
gsub("(?i)([a-z]{2,})(.*)\\1", "\\1\\2", Text, perl=TRUE)
# finds all repetitions of substrings of length 3 or more, here "Abc"
gsub("(?i)([a-z]{3,})(.*)\\1", "\\1\\2", Text, perl=TRUE)
# finds only immediately subsequent repetitions of substrings of length 2 or more
gsub("(?i)([a-z]{2,})\\1", "\\1", Text, perl=TRUE)

hth.

On 06.02.2013 17:46, Christofer Bogaso wrote:

> Hello again,
>
> I am looking for a way to delete repeated occurrences of a substring in a string. Let's say I have the following string:
>
> Text <- "ahsgdvasgAbcabcsdahj"
>
> Here you see that "Abc" appears twice, but I want to keep only one occurrence, namely the first one. So I need:
>
> Text_result <- "ahsgdvasgAbcsdahj"
>
> Can somebody tell me whether this is possible with some R function?
>
> Thanks and regards,

--
Eik Vettorazzi
Department of Medical Biometry and Epidemiology
University Medical Center Hamburg-Eppendorf
Martinistr. 52
20246 Hamburg
T ++49/40/7410-58243
F ++49/40/7410-57790
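A single pass of the back-reference pattern removes only one repeat per match, so a string with three copies of "abc" still contains two afterwards. One way to handle that, sketched here under assumptions, is to reapply the substitution until the string stops changing. The function name `drop_repeats` and the default minimum length of 3 are illustrative choices, not part of the original suggestion.

```r
# Hypothetical helper: reapply the back-reference substitution until the
# string no longer changes. `min_len` is the shortest substring (letters
# only) that counts as a repeat; 3 is an assumed default.
drop_repeats <- function(x, min_len = 3L) {
  pattern <- sprintf("(?i)([a-z]{%d,})(.*)\\1", min_len)
  repeat {
    y <- gsub(pattern, "\\1\\2", x, perl = TRUE)
    if (identical(y, x)) return(x)   # fixed point reached
    x <- y                           # each change shortens x, so this terminates
  }
}

drop_repeats("ahsgdvasgAbcabcsdahj")
# [1] "ahsgdvasgAbcsdahj"
drop_repeats("ahsgdvasgAbcabcsdahabcj")
# [1] "ahsgdvasgAbcsdahj"
```

With `min_len = 2` the repeated "ah" would be removed as well, which is why the rule for what counts as a repeat has to be decided first.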
Re: [R] Omitting repeated occurrence in a string
On Feb 6, 2013, at 11:24 AM, David Winsemius wrote:

> This is not going to solve all possible variations of this problem, but then your proposed test suite was rather limited, ... don't you agree?
>
> Text <- "ahsgdvasgAbcabcsdahabcj"
> gsub("(abc).*(abc)", "\\1", Text, ignore.case=TRUE)
> [1] "ahsgdvasgAbcj"

This gives some further variations:

Text <- "ahsgdvasgAbcabcsdahabcj"  # adding a third instance
gsub("(abc).*(abc)", "\\1", Text, ignore.case=TRUE)
[1] "ahsgdvasgAbcj"
# The first strategy deletes everything up to and through the last 'abc'

gsub("(abc)((.*)(abc))", "\\1\\2", Text, ignore.case=TRUE)
[1] "ahsgdvasgAbcabcsdahabcj"
# embedded parentheses don't seem to work

gsub("(abc)(abc)", "\\1", Text, ignore.case=TRUE)
[1] "ahsgdvasgAbcsdahabcj"
# Gets rid of the first of the sequential repeats only

Text
[1] "ahsgdvasgAbcabcsdahabcj"
gsub("(abc)(.?)(abc)", "\\1\\2", Text, ignore.case=TRUE)
[1] "ahsgdvasgAbcsdahabcj"
# Only gets rid of the first repeat

# This gets rid of all of the sequential repeats but not the separated ones
Text <- "ahsgdvasgAbcabcabcabcabcsdahabcj"
gsub("(abc)(abc)*", "\\1", Text, ignore.case=TRUE)
[1] "ahsgdvasgAbcsdahabcj"

--
David Winsemius
Alameda, CA, USA
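A possible explanation for the unchanged result with the nested parentheses: capture groups are numbered by the position of their opening parenthesis, so in `(abc)((.*)(abc))` group 2 is the outer group and already contains the trailing "abc". Replacing with `\\1\\2` therefore reproduces the whole match. The sketch below inspects the groups with `regexec()` and `regmatches()`; the variable names are illustrative only.

```r
Text <- "ahsgdvasgAbcabcsdahabcj"
pattern <- "(abc)((.*)(abc))"

# Element 1 is the whole match; elements 2 to 5 are groups \\1 to \\4.
groups <- regmatches(Text, regexec(pattern, Text, ignore.case = TRUE))[[1]]
print(groups)
# [1] "Abcabcsdahabc" "Abc"           "abcsdahabc"    "abcsdah"       "abc"

# \\2 is the outer group, so the string comes back unchanged.
res_outer <- gsub(pattern, "\\1\\2", Text, ignore.case = TRUE)

# \\3 is the inner (.*) group, so the last "abc" is dropped.
res_inner <- gsub(pattern, "\\1\\3", Text, ignore.case = TRUE)
print(res_inner)
# [1] "ahsgdvasgAbcabcsdahj"
```

Note that this removes only the last "abc"; the occurrence directly after "Abc" is still there and would need a second pass.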