[R] Avoiding loop
Hi everyone,

I'm using a matrix product like this:

    # Generate some data
    NCols = 5
    NRows = 5
    A = matrix(runif(NCols*NRows), ncol=NCols)
    B = matrix(runif(NCols*NRows), ncol=NCols)

    # First calculation
    R = A %*% B
    for (i in 1:100) {
      R = R %*% B
    }

I would like to know whether it is possible to avoid the loop by using something like mapply or anything else.

Thanks in advance,
Phil

--
View this message in context: http://r.789695.n4.nabble.com/Avoiding-loop-tp3457963p3457963.html
Sent from the R help mailing list archive at Nabble.com.
__
R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Avoiding loop
Hi:

Try the expm package. Using your example,

    R = A %*% B
    for (i in 1:100) {
      R = R %*% B
    }
    R
                 [,1]         [,2]         [,3]         [,4]         [,5]
    [1,] 9.934879e+47 1.098761e+48 8.868476e+47 7.071831e+47 6.071370e+47
    [2,] 1.492692e+48 1.650862e+48 1.332468e+48 1.062526e+48 9.122090e+47
    [3,] 6.693145e+47 7.402373e+47 5.974708e+47 4.764305e+47 4.090293e+47
    [4,] 5.895689e+47 6.520416e+47 5.262850e+47 4.196661e+47 3.602954e+47
    [5,] 8.347321e+47 9.231830e+47 7.451326e+47 5.941778e+47 5.101187e+47

    library(expm)
    # The matrix power function is the operator %^%
    A %*% (B %^% 101)
                 [,1]         [,2]         [,3]         [,4]         [,5]
    [1,] 9.934879e+47 1.098761e+48 8.868476e+47 7.071831e+47 6.071370e+47
    [2,] 1.492692e+48 1.650862e+48 1.332468e+48 1.062526e+48 9.122090e+47
    [3,] 6.693145e+47 7.402373e+47 5.974708e+47 4.764305e+47 4.090293e+47
    [4,] 5.895689e+47 6.520416e+47 5.262850e+47 4.196661e+47 3.602954e+47
    [5,] 8.347321e+47 9.231830e+47 7.451326e+47 5.941778e+47 5.101187e+47

    system.time(replicate(1000, A %*% (B %^% 101)))
       user  system elapsed
       0.02    0.00    0.01

    system.time(replicate(1000, {
      R = A %*% B
      for (i in 1:100) {
        R = R %*% B
      }
    }))
       user  system elapsed
       0.15    0.00    0.15

HTH,
Dennis

On Mon, Apr 18, 2011 at 9:06 AM, Filoche pmassico...@hotmail.com wrote:
> I would like to know if it was possible to avoid the loop by using something like mapply or anything else.
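For readers without expm installed, the same result can be checked in base R. The sketch below is only an illustration of matrix powering by repeated squaring; `mat_pow` is a hypothetical helper written for this example, not a function from expm and not necessarily how `%^%` is implemented. It relies on the fact that the loop in the question performs one initial product plus 100 more, i.e. A %*% B^101.

```r
# Hypothetical helper (not part of expm): integer matrix power by repeated squaring.
mat_pow <- function(M, k) {
  stopifnot(is.matrix(M), nrow(M) == ncol(M), k >= 0, k == round(k))
  result <- diag(nrow(M))                     # identity matrix, i.e. M^0
  while (k > 0) {
    if (k %% 2 == 1) result <- result %*% M   # current binary digit is 1: multiply it in
    M <- M %*% M                              # square for the next binary digit
    k <- k %/% 2
  }
  result
}

set.seed(1)                                   # arbitrary seed, for reproducibility only
A <- matrix(runif(25), ncol = 5)
B <- matrix(runif(25), ncol = 5)

# Loop version from the question: A %*% B followed by 100 further products with B
R <- A %*% B
for (i in 1:100) R <- R %*% B

# Compare with a tolerance, since the two orders of multiplication round differently
all.equal(R, A %*% mat_pow(B, 101))
```

Repeated squaring needs on the order of log2(k) matrix products instead of k, which is the kind of saving the timings above reflect.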
Re: [R] Avoiding loop
Hi sir,

This is exactly what I was looking for, thank you.

With regards,
Phil

--
View this message in context: http://r.789695.n4.nabble.com/Avoiding-loop-tp3457963p3458152.html
Re: [R] avoiding loop
Thanks for your help.

> Date: Mon, 2 Nov 2009 18:50:42 -0500
> Subject: Re: [R] avoiding loop
> From: jholt...@gmail.com
> To: bbom...@hotmail.com
> CC: mtmor...@fhcrc.org; r-help@r-project.org
>
> The first thing I would suggest is convert your dataframes to matrices so that you are not having to continually convert them in the calls to the functions.
Re: [R] avoiding loop
This is the Rprof() report by self time. Is it also possible that these routines, which take long self.time, are causing the optim() to be slow?

    $by.self
                          self.time self.pct total.time total.pct
    FUN                       94.16     16.5      94.16      16.5
    unlist                    80.46     14.1     120.54      21.1
    lapply                    76.94     13.5     255.48      44.7
    match                     60.76     10.6      60.88      10.7
    as.matrix.data.frame      31.00      5.4      51.12       8.9
    as.character              29.28      5.1      29.28       5.1
    unique.default            24.36      4.3      24.40       4.3
    data.frame                21.06      3.7      55.78       9.8
    split.default             20.42      3.6      84.38      14.8
    tapply                    13.84      2.4     414.28      72.5
    structure                 11.32      2.0      22.36       3.9
    factor                    11.08      1.9     127.68      22.3
    attributes<-              11.00      1.9      11.00       1.9
    ==                        10.56      1.8      10.56       1.8
    %*%                       10.30      1.8      10.30       1.8
    as.vector                 10.22      1.8      10.22       1.8
    as.integer                 9.86      1.7       9.86       1.7
    list                       9.64      1.7       9.64       1.7
    exp                        7.12      1.2       7.12       1.2
    as.data.frame.integer      5.98      1.0       8.10       1.4

> To: bbom...@hotmail.com
> CC: jholt...@gmail.com; r-help@r-project.org
> Subject: Re: [R] avoiding loop
> From: mtmor...@fhcrc.org
> Date: Sun, 1 Nov 2009 22:14:09 -0800
>
> There is material to work with here, as apparently a fairly large amount of self.time is being spent in each of these functions.
Re: [R] avoiding loop
The first thing I would suggest is to convert your data frames to matrices so that you do not have to keep converting them in the calls to the functions. Also, I am not sure what this code is doing:

    realized_prob = with(DF, {
      ind <- (CHOSEN == 1)
      n <- tapply(theta_multiple[ind], CS[ind], sum)
      d <- tapply(theta_multiple, CS, sum)
      n / d
    })

It looks like 'n' and 'd' might have different lengths, since they are created from two different grouping sequences (CS and CS[ind]). I have no idea why you are converting to the DF data frame; there is no need for that. You could just leave the vectors (e.g., theta_multiple, CS and ind) as they are and work with them. This is probably where most of your time is being spent. So if you start with matrices and leave the data frames out of the main loop, you will probably see an increase in performance.

2009/11/2 parkbomee bbom...@hotmail.com:
> This is the Rprof() report by self time. Is it also possible that these routines, which take long self.time, are causing the optim() to be slow?
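The advice above (work with plain vectors and keep data-frame handling out of the function that optim() calls) might look roughly like the sketch below. The names `theta_multiple`, `CS` and `CHOSEN` come from the quoted code; the data, the choice of base R's `rowsum()` for the grouped sums, and the wrapper function are assumptions made for illustration, not the poster's actual code.

```r
# Hypothetical data: 1000 choice sets (CS) of 5 alternatives, the first one chosen
set.seed(42)
CS     <- rep(1:1000, each = 5)
CHOSEN <- as.integer(sequence(rep(5, 1000)) == 1)
x      <- runif(length(CS))           # stands in for theta_multiple at one iteration

# Computed once, outside the function that the optimizer calls repeatedly
ind <- CHOSEN == 1

realized_prob <- function(theta_multiple) {
  # Multiplying by 'ind' zeroes the unchosen rows instead of dropping them,
  # so numerator and denominator always have one entry per choice set.
  n <- rowsum(theta_multiple * ind, CS)
  d <- rowsum(theta_multiple, CS)
  as.vector(n / d)
}

p <- realized_prob(x)
length(p)                             # one value per choice set

# Same numbers as the tapply() formulation, without building a data frame
all.equal(p, as.vector(tapply(x * ind, CS, sum) / tapply(x, CS, sum)))
```

Whether this is actually faster for a given problem is something to confirm with Rprof() or system.time() rather than assume.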
Re: [R] avoiding loop
On Sat, 31 Oct 2009, parkbomee wrote:

> Thank you both. However, using tapply() instead of a loop does not seem to improve my code much. I am using this inside of an optimization function, and it still takes more than it needs...

Well, you haven't given us much to work with. The optimization choices depend on the particulars of your problem, which you've not detailed.

It does not take long to run the tapply() code once, so you must be running it many times. Right?

If so, you need to say how the structure (rendered as DF in David's reply) varies from iteration to iteration in the optimization.

If it turns out that only 'value' changes and that the number of different times is not too large, then precomputing suitable indicator matrices may help:

    mat1 <- model.matrix(~ 0 + factor(time):as.numeric(choice == 1), DF)
    mat2 <- model.matrix(~ 0 + factor(time), DF)

Inside the optimization use something like

    with(DF, (value %*% mat1) / (value %*% mat2))

If the structure can change or the number of unique times is large, then with so simple a calculation you should probably just inline some C code.

http://cran.r-project.org/web/packages/inline/index.html

HTH,
Chuck

> CC: bbom...@hotmail.com; r-help@r-project.org
> From: dwinsem...@comcast.net
> To: d.rizopou...@erasmusmc.nl
> Subject: Re: [R] avoiding loop
> Date: Sat, 31 Oct 2009 22:26:17 -0400
>
> And both will probably fail if the number of groups with choice==1 is different than the number overall.

Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
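To make the indicator-matrix idea concrete, here is a small sketch using the sample data from the original question. It builds the 0/1 matrices explicitly with `outer()` rather than `model.matrix()`; that substitution is an assumption made to keep the example easy to verify by hand, and it is only practical when the number of distinct times is small.

```r
# Sample data from the original question (time, choice, value columns)
DF <- data.frame(
  time   = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3),
  choice = c(1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0),
  value  = c(3, 2, 3, 3, 4, 4, 2, 4, 2, 2, 2, 3, 5, 2)
)

# Precompute once: one column per distinct time, 1 where the row belongs to it
times <- sort(unique(DF$time))
mat2  <- outer(DF$time, times, "==") * 1   # all rows of each time
mat1  <- mat2 * (DF$choice == 1)           # only the chosen rows of each time

# Inside the optimization only 'value' changes, so each evaluation
# reduces to two matrix products and a division.
ratio <- as.vector((DF$value %*% mat1) / (DF$value %*% mat2))
ratio        # 3/11, 4/18, 2/12
sum(ratio)
```

The three ratios can be checked directly against the table in the question: 3 of 11 at time 1, 4 of 18 at time 2, and 2 of 12 at time 3.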
Re: [R] avoiding loop
What you need to do is understand how to use Rprof so that you can determine where the time is being spent. It will probably indicate that this is not the source of slowness in your optimization function. How much time are we talking about? You may spend more time trying to optimize the function than just running the current version, even if it is slow (slow is a relative term and does not hold much meaning without some context around it).

On Sat, Oct 31, 2009 at 11:36 PM, parkbomee bbom...@hotmail.com wrote:
> Thank you both. However, using tapply() instead of a loop does not seem to improve my code much. I am using this inside of an optimization function, and it still takes more than it needs...

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
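As a rough illustration of the Rprof workflow recommended above, the sketch below profiles a made-up workload and prints the two summary tables discussed later in the thread. The `work()` function, the repetition counts and the sampling interval are assumptions chosen only so that the profiler collects some samples; real timings will differ by machine, and Rprof requires an R build with profiling support.

```r
# Hypothetical workload: many grouped sums via tapply(), loosely mimicking
# an objective function that is evaluated repeatedly by an optimizer.
work <- function(reps = 500, n = 1e4) {
  g <- sample(100, n, replace = TRUE)
  for (i in seq_len(reps)) tapply(runif(n), g, sum)
  invisible(NULL)
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)   # start sampling the call stack
work()
Rprof(NULL)                         # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.self)    # time spent inside each function itself
head(prof$by.total)   # time spent in each function plus everything it calls
```

Functions with high total time but little self time are mostly wrappers; the ones with high self time are where the work is actually done.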
Re: [R] avoiding loop
Thank you all.

What Chuck has suggested might not be applicable since the number of different times is around 40,000.

The object of optimization in my function is the varying value, which is basically data * parameter, of which parameter is the object of optimization.

And from the R profiling with a subset of data, I got this report. Any idea what <Anonymous> is?

    $by.total
                 total.time total.pct self.time self.pct
    <Anonymous>      571.56     100.0      0.02      0.0
    optim            571.56     100.0      0.00      0.0
    fn               571.54     100.0      0.98      0.2
    eval             423.74      74.1      0.00      0.0
    with.default     423.74      74.1      0.00      0.0
    with             423.74      74.1      0.00      0.0
    tapply           414.28      72.5     13.84      2.4
    lapply           255.48      44.7     76.94     13.5
    factor           127.68      22.3     11.08      1.9
    unlist           120.54      21.1     80.46     14.1
    FUN               94.16      16.5     94.16     16.5
    . . . . .

> Date: Sun, 1 Nov 2009 15:35:41 -0400
> Subject: Re: [R] avoiding loop
> From: jholt...@gmail.com
> To: bbom...@hotmail.com
> CC: dwinsem...@comcast.net; d.rizopou...@erasmusmc.nl; r-help@r-project.org
>
> What you need to do is to understand how to use Rprof so that you can determine where the time is being spent.
Re: [R] avoiding loop
parkbomee bbom...@hotmail.com writes:

> Thank you all. What Chuck has suggested might not be applicable since the number of different times is around 40,000. The object of optimization in my function is the varying value, which is basically data * parameter, of which parameter is the object of optimization.. And from the r profiling with a subset of data, I got this report..any idea what Anonymous is?
>
> $by.total
>              total.time total.pct self.time self.pct
> <Anonymous>      571.56     100.0      0.02      0.0
> optim            571.56     100.0      0.00      0.0
> fn               571.54     100.0      0.98      0.2

You're giving us 'by.total', so these are saying that all the time was spent in these functions or the functions they called. Probably all are in 'optim' and its arguments; since little self.time is spent here, there isn't much to work with.

> eval             423.74      74.1      0.00      0.0
> with.default     423.74      74.1      0.00      0.0
> with             423.74      74.1      0.00      0.0

These are probably in the internals of optim, where the function you're trying to optimize is being set up for evaluation. Again there's little self.time, and all these say is that a big piece of the time is being spent in code called by this code.

> tapply           414.28      72.5     13.84      2.4
> lapply           255.48      44.7     76.94     13.5
> factor           127.68      22.3     11.08      1.9
> unlist           120.54      21.1     80.46     14.1
> FUN               94.16      16.5     94.16     16.5

These look like they are tapply-related calls (looking at the code for tapply, it calls lapply, factor, and unlist, and FUN is the function argument to tapply), perhaps from the function you're optimizing (did you implement this as suggested below? It would really help to have a possibly simplified version of the code you're calling).

There is material to work with here, as apparently a fairly large amount of self.time is being spent in each of these functions. So here's a sample data set

    n <- 10
    set.seed(123)
    df <- data.frame(time=sort(as.integer(ceiling(runif(n)*n/5))),
                     value=ceiling(runif(n)*5))

It would have been helpful for you to provide reproducible code like that above, so that the characteristics of your data were easily reproducible. Let's time tapply

    replicate(5, {
        system.time(x0 <- tapply(df$value, df$time, sum), gcFirst=TRUE)[[1]]
    })
    [1] 0.316 0.316 0.308 0.320 0.304

tapply is quite general, but in your case I think you'd be happy with

    tapply1 <- function(X, INDEX, FUN)
        unlist(lapply(split(X, INDEX), FUN), use.names=FALSE)

    replicate(5, {
        system.time(x1 <- tapply1(df$value, df$time, sum), gcFirst=TRUE)[[1]]
    })
    [1] 0.156 0.148 0.152 0.144 0.152

so about twice the speed (timing depends quite a bit on what 'time' is: integer, numeric, character or factor). The vector values of the two calculations are identical, though tapply presents the data as an array with column names

    identical(as.vector(x0), x1)
    [1] TRUE

tapply allows FUN to be anything, but if the interest is in the sum of each time interval, and the time intervals can be assumed to be sorted (sorting is not expensive, so could be done on the fly), then

    tapply2 <- function(X, INDEX)
    {
        csum <- cumsum(c(0, X))
        idx <- diff(INDEX) != 0
        csum[c(FALSE, idx, TRUE)] - csum[c(TRUE, idx, FALSE)]
    }

calculates the cumulative sum and the points in INDEX where the time intervals change. It then takes the difference over the appropriate interval.

    replicate(5, {
        system.time(x2 <- tapply2(df$value, df$time), gcFirst=TRUE)[[1]]
    })
    [1] 0.024 0.024 0.024 0.024 0.024

    identical(as.vector(x0), x2)
    [1] TRUE

This approach could be subject to rounding error (if csum gets very large and the intervals remain small).

To calculate values where choice == 1 I think you'd want to

    tapply2(df$value * (df$choice==1), df$time)

rather than sub-setting, so that the result of tapply2 is always a vector of the same length even when some time intervals never have choice==1.

Because tapply in these examples seems so fast compared to your calculation, I wonder whether optim is evaluating your function many times, and whether reformulating the optimization might lead to a very substantial speed-up?

Martin
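The cumulative-sum approach can be checked on the small data set from the original question. The sketch below restates `tapply2` as described in the message (with assignment arrows restored) so that it runs on its own; the `DF` object is typed in from the posted table and assumes the rows are already sorted by time.

```r
# Grouped sums via differences of a cumulative sum; INDEX must be sorted
tapply2 <- function(X, INDEX) {
  csum <- cumsum(c(0, X))
  idx  <- diff(INDEX) != 0                 # TRUE where the group changes
  csum[c(FALSE, idx, TRUE)] - csum[c(TRUE, idx, FALSE)]
}

# Sample data from the original question
DF <- data.frame(
  time   = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3),
  choice = c(1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0),
  value  = c(3, 2, 3, 3, 4, 4, 2, 4, 2, 2, 2, 3, 5, 2)
)

num <- tapply2(DF$value * (DF$choice == 1), DF$time)  # chosen value per time
den <- tapply2(DF$value, DF$time)                     # total value per time

num / den        # 3/11, 4/18, 2/12
sum(num / den)

# Agrees with the general-purpose tapply() on the same data
all.equal(den, as.vector(tapply(DF$value, DF$time, sum)))
```

Because unchosen rows are zeroed rather than dropped, `num` and `den` always have the same length, which is the point made at the end of the message.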
[R] avoiding loop
Hi all,

I am trying to figure out a way to improve my code's efficiency by avoiding the use of a loop. I want to calculate a conditional mean(?) given time. For example, from the data below, I want to calculate sum((value|choice==1)/sum(value)) across time. Is there a way to do it without using a loop?

    time cum_time choice value
       1        4      1     3
       1        4      0     2
       1        4      0     3
       1        4      0     3
       2        6      1     4
       2        6      0     4
       2        6      0     2
       2        6      0     4
       2        6      0     2
       2        6      0     2
       3        4      1     2
       3        4      0     3
       3        4      0     5
       3        4      0     2

My code looks like

    objective[1] = value[1] / sum(value[1:cum_time[1]])
    for (i in 2:max(time)) {
      objective[i] = value[cum_time[i-1]+1] / sum(value[(cum_time[i-1]+1):cum_time[i]])
    }
    sum(objective)

Does anyone have an idea how I can do this without using a loop?

Thanks.
Re: [R] avoiding loop
One approach is the following:

    # say 'DF' is your data frame, then
    with(DF, {
      ind <- choice == 1
      n <- tapply(value[ind], time[ind], sum)
      d <- tapply(value, time, sum)
      n / d
    })

I hope it helps.

Best,
Dimitris

parkbomee wrote:
> I want to calculate a conditional mean(?) given time. [...] Is there a way to do it without using a loop?

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Re: [R] avoiding loop
This is pretty much equivalent:

  tapply(DF$value[DF$choice==1], DF$time[DF$choice==1], sum) /
    tapply(DF$value, DF$time, sum)

And both will probably fail if the number of groups with choice==1 is different from the number of groups overall.

-- David.

On Oct 31, 2009, at 5:14 PM, Dimitris Rizopoulos wrote:
> one approach is the following: [...]
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
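One way to guard against the mismatch David mentions is to keep every row in the numerator and zero out the unchosen ones, so both tapply() calls see the same set of time groups. This is only a sketch: the modified data (no chosen row at time 2) is hypothetical, and treating such a group as contributing 0 is an assumption that may not suit every model.

```r
# Hypothetical variant of the posted data: time block 2 has no chosen row
DF <- data.frame(
  time   = rep(1:3, times = c(4, 6, 4)),
  choice = c(1, 0, 0, 0,  0, 0, 0, 0, 0, 0,  1, 0, 0, 0),
  value  = c(3, 2, 3, 3,  4, 4, 2, 4, 2, 2,  2, 3, 5, 2)
)

# Multiply by the 0/1 indicator instead of subsetting, so no group is dropped
num <- tapply(DF$value * (DF$choice == 1), DF$time, sum)
den <- tapply(DF$value, DF$time, sum)
ratio <- num / den

ratio   # block 2 contributes 0 instead of breaking the division
```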
Re: [R] avoiding loop
Thank you both. However, using tapply() instead of a loop does not seem to improve my code much. I am using this inside an optimization function, and it still takes longer than it should...

From: dwinsem...@comcast.net
To: d.rizopou...@erasmusmc.nl
CC: bbom...@hotmail.com; r-help@r-project.org
Subject: Re: [R] avoiding loop
Date: Sat, 31 Oct 2009 22:26:17 -0400

> This is pretty much equivalent: [...]
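If the grouping and the chosen rows stay fixed while only value changes between optimizer iterations (an assumption, since the optimization code was not posted), one option is to precompute the indices once and use rowsum() for the per-block totals. rowsum() is often faster than tapply() for plain sums, though that is worth timing on the real data with system.time(). The names chosen, grp_of_chosen and objective_fn are hypothetical.

```r
# Example data from the original post
DF <- data.frame(
  time   = rep(1:3, times = c(4, 6, 4)),
  choice = c(1, 0, 0, 0,  1, 0, 0, 0, 0, 0,  1, 0, 0, 0),
  value  = c(3, 2, 3, 3,  4, 4, 2, 4, 2, 2,  2, 3, 5, 2)
)

# Done once, outside the optimizer: positions and groups of the chosen rows
chosen        <- which(DF$choice == 1)
grp_of_chosen <- as.character(DF$time[chosen])

# Called on every iteration with the current vector of values
objective_fn <- function(value) {
  denom <- rowsum(value, DF$time)   # one-column matrix, rows named by time
  sum(value[chosen] / denom[grp_of_chosen, 1])
}

objective_fn(DF$value)   # same result as the loop, about 0.6616
```

Looking up the denominators by row name keeps the chosen values aligned with their own blocks even if the rows are not sorted by time.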