God, this listserv is awesome. Thanks to everyone for their ideas. I'll run speed and memory tests tomorrow and change the code. Thanks again!
Matt

On Sun, May 29, 2011 at 6:44 PM, Ian Gow <[email protected]> wrote:
> Not a new approach, but some benchmark data (the perl=TRUE speeds up Jim's
> suggestion):
>
>> x <- c('18x.6','12x.9','302x.3')
>> y <- rep(x,100000)
>> system.time(temp <- unlist(lapply(strsplit(y,".",fixed=TRUE),function(x) x[1])))
>    user  system elapsed
>   1.203   0.018   1.222
>> system.time(temp2 <- gsub("^(.*?)\\..*$","\\1",y, perl=TRUE))
>    user  system elapsed
>   0.176   0.001   0.176
>> identical(temp2, temp)
> [1] TRUE
>> system.time(temp3 <- gsub("^(.*)\\..*", '\\1', y))
>    user  system elapsed
>   0.292   0.001   0.291
>> identical(temp3, temp)
> [1] TRUE
>> system.time(temp3 <- gsub("^(.*)\\..*", '\\1', y, perl=TRUE))
>    user  system elapsed
>   0.160   0.001   0.161
>
> On 5/29/11 7:40 PM, "jim holtman" <[email protected]> wrote:
>
>>Try this approach:
>>
>>> x <- c('18x.6','12x.9','302x.3')
>>> gsub("^(.*)\\..*", '\\1', x)
>>[1] "18x" "12x" "302x"
>>
>>On Sun, May 29, 2011 at 8:10 PM, Matthew Keller <[email protected]>
>>wrote:
>>> hi all,
>>>
>>> I'm full of questions today :). Thanks in advance for your help!
>>>
>>> Here's the problem:
>>> x <- c('18x.6','12x.9','302x.3')
>>>
>>> I want to get a vector that is c('18x','12x','302x')
>>>
>>> This is easily done using this code:
>>>
>>> unlist(lapply(strsplit(x,".",fixed=TRUE),function(x) x[1]))
>>>
>>> So far so good. The problem is that x is a vector of length 132e6.
>>> When I run the above code, it runs for > 30 minutes, and it takes > 23
>>> Gb RAM (no kidding!).
>>>
>>> Does anyone have ideas about how to speed up the code above and (more
>>> importantly) reduce the RAM footprint? I'd prefer not to change the
>>> file on disk using, e.g., awk, but I will do that as a last resort.
>>>
>>> Best
>>>
>>> Matt
>>>
>>> --
>>> Matthew C Keller
>>> Asst. Professor of Psychology
>>> University of Colorado at Boulder
>>> www.matthewckeller.com
>>>
>>> ______________________________________________
>>> [email protected] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>--
>>Jim Holtman
>>Data Munger Guru
>>
>>What is the problem that you are trying to solve?
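
One thing that may be worth checking before switching approaches: the greedy pattern "^(.*)\\..*" cuts at the last dot, while the strsplit() version keeps everything before the first dot. On the sample data (exactly one dot per element) the two agree, which is why identical() returned TRUE above. The sketch below is only an illustration under that caveat; strip_after_first_dot is a hypothetical helper name, not something posted in this thread, and the two-dot string z is made-up test input.

```r
# Hypothetical helper (not from the thread): drop everything from the
# first dot onward. sub() finds the leftmost match, so "\\..*$" starts
# at the first dot and runs to the end of the string.
strip_after_first_dot <- function(v) {
  sub("\\..*$", "", v, perl = TRUE)
}

x <- c("18x.6", "12x.9", "302x.3")

# Reference result: the strsplit() approach from the original question
ref <- unlist(lapply(strsplit(x, ".", fixed = TRUE), function(p) p[1]))

# Helper result on the same input
res <- strip_after_first_dot(x)
same_on_sample <- identical(res, ref)

# Made-up input with two dots, to show where the patterns differ
z <- "18x.6.2"
greedy <- gsub("^(.*)\\..*", "\\1", z)                  # cuts at the last dot
lazy   <- gsub("^(.*?)\\..*$", "\\1", z, perl = TRUE)   # cuts at the first dot
first  <- strip_after_first_dot(z)                      # cuts at the first dot
split1 <- strsplit(z, ".", fixed = TRUE)[[1]][1]        # first field
```

If the real 132e6-element vector can contain more than one dot per element, the lazy pattern or the sub() form would match the strsplit() behaviour, whereas the greedy pattern would not. Any of the regex forms also avoids building the intermediate list that strsplit() returns, which is plausibly where much of the RAM goes, though that is an assumption to confirm with the memory test.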

