[R] Avoiding loop

2011-04-18 Thread Filoche
Hi everyone.

I'm using a matrix product such as:


#Generate some data
NCols = 5
NRows = 5
A = matrix(runif(NCols*NRows), ncol=NCols) 
B = matrix(runif(NCols*NRows), ncol=NCols) 

#First calculation
R = A%*%B


for(i in 1:100)
{
R = R%*%B
}


I would like to know whether it is possible to avoid the loop by using something
like mapply or anything else.

Thanks in advance,
Phil

--
View this message in context: 
http://r.789695.n4.nabble.com/Avoiding-loop-tp3457963p3457963.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Avoiding loop

2011-04-18 Thread Dennis Murphy
Hi:

Try the expm package. Using your example,

> R = A%*%B
> for(i in 1:100)
+ {
+     R = R%*%B
+ }
> R
             [,1]         [,2]         [,3]         [,4]         [,5]
[1,] 9.934879e+47 1.098761e+48 8.868476e+47 7.071831e+47 6.071370e+47
[2,] 1.492692e+48 1.650862e+48 1.332468e+48 1.062526e+48 9.122090e+47
[3,] 6.693145e+47 7.402373e+47 5.974708e+47 4.764305e+47 4.090293e+47
[4,] 5.895689e+47 6.520416e+47 5.262850e+47 4.196661e+47 3.602954e+47
[5,] 8.347321e+47 9.231830e+47 7.451326e+47 5.941778e+47 5.101187e+47

> library(expm)
> # The matrix power function is the operator %^%
> A %*% (B %^% 101)
             [,1]         [,2]         [,3]         [,4]         [,5]
[1,] 9.934879e+47 1.098761e+48 8.868476e+47 7.071831e+47 6.071370e+47
[2,] 1.492692e+48 1.650862e+48 1.332468e+48 1.062526e+48 9.122090e+47
[3,] 6.693145e+47 7.402373e+47 5.974708e+47 4.764305e+47 4.090293e+47
[4,] 5.895689e+47 6.520416e+47 5.262850e+47 4.196661e+47 3.602954e+47
[5,] 8.347321e+47 9.231830e+47 7.451326e+47 5.941778e+47 5.101187e+47

> system.time(replicate(1000, A %*% (B %^% 101)))
   user  system elapsed
   0.02    0.00    0.01
> system.time(replicate(1000, {R = A%*%B
+ for(i in 1:100)
+ {
+     R = R%*%B
+ }  }))
   user  system elapsed
   0.15    0.00    0.15

HTH,
Dennis
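
For readers who cannot install expm, the same idea can be sketched in base R with repeated squaring. This is only an illustration: the helper name mat_pow is made up here and is not part of expm or of the original reply, and it assumes a square matrix and a non-negative integer power.

```r
# Hypothetical helper (not from expm): raise a square matrix to a
# non-negative integer power by repeated squaring.
mat_pow <- function(M, k) {
  stopifnot(is.matrix(M), nrow(M) == ncol(M), k >= 0, k == round(k))
  result <- diag(nrow(M))                      # identity matrix, i.e. M^0
  while (k > 0) {
    if (k %% 2 == 1) result <- result %*% M    # odd bit: multiply it in
    M <- M %*% M                               # square for the next bit
    k <- k %/% 2
  }
  result
}

set.seed(1)
A <- matrix(runif(25), ncol = 5)
B <- matrix(runif(25), ncol = 5)

# Reference result from the loop in the original post: A %*% B^101
R_loop <- A %*% B
for (i in 1:100) R_loop <- R_loop %*% B

R_pow <- A %*% mat_pow(B, 101)
all.equal(R_loop, R_pow)
```

Repeated squaring needs roughly log2(101) squarings instead of 100 products, which is the kind of saving the timings above suggest.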




Re: [R] Avoiding loop

2011-04-18 Thread Filoche
Hi sire.

This is exactly what I was looking for, thank you.

With regards,
Phil

--
View this message in context: 
http://r.789695.n4.nabble.com/Avoiding-loop-tp3457963p3458152.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] avoiding loop

2009-11-03 Thread parkbomee


Thanks for your help.



Re: [R] avoiding loop

2009-11-02 Thread parkbomee

This is the Rprof() report by self time.
Is it also possible that these routines, which take long self.time, are causing 
the optim() to be slow?


$by.self
                      self.time self.pct total.time total.pct
FUN                       94.16     16.5      94.16      16.5
unlist                    80.46     14.1     120.54      21.1
lapply                    76.94     13.5     255.48      44.7
match                     60.76     10.6      60.88      10.7
as.matrix.data.frame      31.00      5.4      51.12       8.9
as.character              29.28      5.1      29.28       5.1
unique.default            24.36      4.3      24.40       4.3
data.frame                21.06      3.7      55.78       9.8
split.default             20.42      3.6      84.38      14.8
tapply                    13.84      2.4     414.28      72.5
structure                 11.32      2.0      22.36       3.9
factor                    11.08      1.9     127.68      22.3
attributes<-              11.00      1.9      11.00       1.9
==                        10.56      1.8      10.56       1.8
%*%                       10.30      1.8      10.30       1.8
as.vector                 10.22      1.8      10.22       1.8
as.integer                 9.86      1.7       9.86       1.7
list                       9.64      1.7       9.64       1.7
exp                        7.12      1.2       7.12       1.2
as.data.frame.integer      5.98      1.0       8.10       1.4


Re: [R] avoiding loop

2009-11-02 Thread jim holtman
The first thing I would suggest is to convert your data frames to matrices
so that you are not having to continually convert them in the calls to
the functions.  Also, I am not sure what the code:

realized_prob = with(DF, {
    ind <- (CHOSEN == 1)
    n <- tapply(theta_multiple[ind], CS[ind], sum)
    d <- tapply(theta_multiple, CS, sum)
    n / d
})

is doing.  It looks like 'n' and 'd' might have different lengths,
since they are being created by two different (CS and CS[ind])
sequences.  I have no idea why you are converting to the DF
data frame.  There is no need for that.  You could just leave the
vectors (e.g., theta_multiple, CS and ind) as they are and work with
them.  This is probably where most of your time is being spent.  So if
you start with matrices and leave the data frames out of the main loop,
you will probably see an increase in performance.
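
To make the length concern concrete, here is a small sketch. The column names follow the snippet above, but the values are made up for illustration, and zeroing the unchosen rows is just one possible way to keep the two vectors aligned.

```r
# Toy data: column names as in the snippet above, values invented.
DF <- data.frame(
  CS             = c(1, 1, 2, 2, 3, 3),
  CHOSEN         = c(1, 0, 0, 0, 0, 1),   # group 2 has no chosen row
  theta_multiple = c(2, 3, 4, 1, 5, 5)
)

ind <- DF$CHOSEN == 1

# Subsetting first drops group 2, so n_sub and d differ in length
n_sub <- tapply(DF$theta_multiple[ind], DF$CS[ind], sum)
d     <- tapply(DF$theta_multiple, DF$CS, sum)
length(n_sub)
length(d)

# Zeroing the unchosen rows instead keeps one entry per group
n_all <- tapply(DF$theta_multiple * ind, DF$CS, sum)
n_all / d
```

With the subsetted version, n_sub / d would stop with a non-conformable arrays error, because tapply() returns one-dimensional arrays of different extents; the zeroed version keeps one entry per group, so the ratio lines up.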


Re: [R] avoiding loop

2009-11-01 Thread Charles C. Berry

On Sat, 31 Oct 2009, parkbomee wrote:



Thank you both.

However, using tapply() instead of a loop does not seem to improve my code much.
I am using this inside of an optimization function,
and it still takes more than it needs...




Well, you haven't given us much to work with.

The optimization choices depend on the particulars of your problem, which 
you've not detailed.


It does not take long to run the tapply() code once, so you need to do it 
many times. Right?


If so, you need to say how the structure varies (rendered as DF in David's 
reply) from iteration to iteration in the optimization.


If it turns out that only 'value' changes and that the number of different 
times is not too large, then precomputing suitable indicator matrices may 
help:


mat1 <- model.matrix( ~ 0 + factor(time):as.numeric(choice==1),DF)
mat2 <- model.matrix( ~ 0 + factor(time), DF )

Inside the optimization use something like

with(DF,(value%*%mat1)/(value%*%mat2))
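
As a rough check of the indicator-matrix idea, the sketch below applies it to the small data set from the original post. The helper name ratio_by_time is hypothetical; it only marks the part that would be re-evaluated on each optimizer iteration.

```r
# Data from the original post (time, choice, value columns)
DF <- data.frame(
  time   = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3),
  choice = c(1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0),
  value  = c(3, 2, 3, 3, 4, 4, 2, 4, 2, 2, 2, 3, 5, 2)
)

# Built once, outside the optimizer: one indicator column per time period
mat1 <- model.matrix(~ 0 + factor(time):as.numeric(choice == 1), DF)
mat2 <- model.matrix(~ 0 + factor(time), DF)

# Hypothetical helper: the only work repeated on every iteration
ratio_by_time <- function(value) {
  as.vector((value %*% mat1) / (value %*% mat2))
}

ratio <- ratio_by_time(DF$value)

# Cross-check against the tapply() formulation
check <- with(DF, tapply(value * (choice == 1), time, sum) /
                  tapply(value, time, sum))
all.equal(ratio, as.vector(check))
```

Each matrix has one column per distinct time, so with very many distinct times they become large, which is the caveat raised in the next paragraph.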

If the structure can change or the number of unique times is large, then 
with so simple a calculation you should probably just inline some C code.


http://cran.r-project.org/web/packages/inline/index.html

HTH,

Chuck





CC: bbom...@hotmail.com; r-help@r-project.org
From: dwinsem...@comcast.net
To: d.rizopou...@erasmusmc.nl
Subject: Re: [R] avoiding loop
Date: Sat, 31 Oct 2009 22:26:17 -0400

This is pretty much equivalent:

tapply(DF$value[DF$choice==1], DF$time[DF$choice==1], sum) /
 tapply(DF$value, DF$time, sum)

And both will probably fail if the number of groups with choice==1 is
different than the number overall.

--
David.

On Oct 31, 2009, at 5:14 PM, Dimitris Rizopoulos wrote:


one approach is the following:

# say 'DF' is your data frame, then
with(DF, {
   ind <- choice == 1
   n <- tapply(value[ind], time[ind], sum)
   d <- tapply(value, time, sum)
   n / d
})
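
A worked run of this approach on the data from the original post might look as follows. The rowsum() variant is an assumption added here for comparison; nobody in the thread proposed it.

```r
# Data from the original post (time, choice, value columns)
DF <- data.frame(
  time   = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3),
  choice = c(1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0),
  value  = c(3, 2, 3, 3, 4, 4, 2, 4, 2, 2, 2, 3, 5, 2)
)

# The tapply() approach from the message above
ratio_tapply <- with(DF, {
  ind <- choice == 1
  n <- tapply(value[ind], time[ind], sum)
  d <- tapply(value, time, sum)
  n / d
})

# Assumed alternative: rowsum() returns a one-column matrix of group sums
num <- rowsum(DF$value * (DF$choice == 1), DF$time)[, 1]
den <- rowsum(DF$value, DF$time)[, 1]
ratio_rowsum <- num / den

sum(ratio_tapply)   # total across time periods
```

Both versions give one ratio per time period; the rowsum() form multiplies by the indicator rather than subsetting, so a period with no chosen row would contribute 0 instead of being dropped.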


I hope it helps.

Best,
Dimitris


parkbomee wrote:

Hi all,
I am trying to figure out a way to improve my code's efficiency by
avoiding the use of a loop.
I want to calculate a conditional mean(?) given time.
For example, from the data below, I want to calculate sum((value|
choice==1)/sum(value)) across time.
Is there a way to do it without using a loop?
time  cum_time  choice  value
1     4         1       3
1     4         0       2
1     4         0       3
1     4         0       3
2     6         1       4
2     6         0       4
2     6         0       2
2     6         0       4
2     6         0       2
2     6         0       2
3     4         1       2
3     4         0       3
3     4         0       5
3     4         0       2
My code looks like
objective[1] = value[1] / sum(value[1:cum_time[1]])
for (i in 2:max(time)){
    objective[i] = value[cum_time[i-1]+1] /
        sum(value[(cum_time[i-1]+1) : cum_time[i]])
}
sum(objective)
Does anyone have an idea how I can do this without using a loop?
Thanks.



--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014



David Winsemius, MD
Heritage Laboratories
West Hartford, CT






Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



Re: [R] avoiding loop

2009-11-01 Thread jim holtman
What you need to do is to understand how to use Rprof so that you can
determine where the time is being spent.  It probably indicates that
this is not the source of slowness in your optimization function.  How
much time are we talking about?  You may spend more time trying to
optimize the function than just running the current version even if it
is slow (slow is a relative term and does not hold much meaning
without some context around it).
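
For anyone unfamiliar with Rprof, a minimal profiling session might look like the sketch below. The slow_task function is only a stand-in workload invented for this example; in practice the profiled call would be the optim() run. It assumes R was built with profiling support.

```r
# Stand-in workload (hypothetical); replace with the real optim() call.
slow_task <- function(seconds = 0.5) {
  x <- runif(1e4)
  g <- sample(50, 1e4, replace = TRUE)
  t0 <- proc.time()[["elapsed"]]
  while (proc.time()[["elapsed"]] - t0 < seconds) tapply(x, g, sum)
  invisible(NULL)
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)   # start sampling the call stack
slow_task()
Rprof(NULL)                         # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.self)    # time spent inside each function itself
head(prof$by.total)   # time including everything each function called
```

The by.self table is usually the more useful one for finding hot spots, since by.total also charges every caller for the time spent in the functions it calls.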





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] avoiding loop

2009-11-01 Thread parkbomee

Thank you all.

What Chuck has suggested might not be applicable since the number of different 
times is around 40,000.
The object of optimization in my function is the varying value, which is 
basically data * parameter, of which parameter is the object of optimization..
 
And from the r profiling with a subset of data,
I got this report..any idea what Anonymous is?


$by.total
             total.time total.pct self.time self.pct
Anonymous        571.56     100.0      0.02      0.0
optim            571.56     100.0      0.00      0.0
fn               571.54     100.0      0.98      0.2
eval             423.74      74.1      0.00      0.0
with.default     423.74      74.1      0.00      0.0
with             423.74      74.1      0.00      0.0
tapply           414.28      72.5     13.84      2.4
lapply           255.48      44.7     76.94     13.5
factor           127.68      22.3     11.08      1.9
unlist           120.54      21.1     80.46     14.1
FUN               94.16      16.5     94.16     16.5
.
.
.
.
.



Re: [R] avoiding loop

2009-11-01 Thread Martin Morgan
parkbomee bbom...@hotmail.com writes:

 Thank you all.

 What Chuck has suggested might not be applicable since the number of
 different times is around 40,000.

 The object of optimization in my function is the varying value,
 which is basically data * parameter, of which parameter is the
 object of optimization..
  
 And from the r profiling with a subset of data,
 I got this report..any idea what Anonymous is?


 $by.total
              total.time total.pct self.time self.pct
 Anonymous        571.56     100.0      0.02      0.0
 optim            571.56     100.0      0.00      0.0
 fn               571.54     100.0      0.98      0.2

You're giving us 'by.total', so these are saying that all the time was
spent in these functions or the functions they called. Probably all
are in 'optim' and its arguments; since little self.time is spent
here, there isn't much to work with

 eval             423.74      74.1      0.00      0.0
 with.default     423.74      74.1      0.00      0.0
 with             423.74      74.1      0.00      0.0

These are probably in the internals of optim, where the function
you're trying to optimize is being set up for evaluation. Again
there's little self.time, and all these say is that a big piece of the
time is being spent in code called by this code.

 tapply           414.28      72.5     13.84      2.4
 lapply           255.48      44.7     76.94     13.5
 factor           127.68      22.3     11.08      1.9
 unlist           120.54      21.1     80.46     14.1
 FUN               94.16      16.5     94.16     16.5

these look like they are tapply-related calls (looking at the code for
tapply, it calls lapply, factor, and unlist, and FUN is the function
argument to tapply), perhaps from the function you're optimizing (did
you implement this as suggested below?  it would really help to have a
possibly simplified version of the code you're calling).

There is material to work with here, as apparently a fairly large
amount of self.time is being spent in each of these functions. So
here's a sample data set

  n <- 10
  set.seed(123)
  df <- data.frame(time=sort(as.integer(ceiling(runif(n)*n/5))),
                   value=ceiling(runif(n)*5))

It would have been helpful for you to provide reproducible code like
that above, so that the characteristics of your data were easily
reproducible. Let's time tapply

> replicate(5, {
+ system.time(x0 <- tapply(df$value, df$time, sum), gcFirst=TRUE)[[1]]
+ })
[1] 0.316 0.316 0.308 0.320 0.304

tapply is quite general, but in your case I think you'd be happy with

  tapply1 <- function(X, INDEX, FUN)
      unlist(lapply(split(X, INDEX), FUN), use.names=FALSE)

> replicate(5, {
+ system.time(x1 <- tapply1(df$value, df$time, sum), gcFirst=TRUE)[[1]]
+ })
[1] 0.156 0.148 0.152 0.144 0.152

so about twice the speed (timing depends quite a bit on what 'time' is,
integer or numeric or character or factor). The vector values of the
two calculations are identical, though tapply presents the data as an
array with column names

> identical(as.vector(x0), x1)
[1] TRUE

tapply allows FUN to be anything, but if the interest is in the sum of
each time interval, and the time intervals can be assumed to be sorted
(sorting is not expensive, so could be done on the fly), then

  tapply2 <- function(X, INDEX)
  {
      csum <- cumsum(c(0, X))
      idx <- diff(INDEX) != 0
      csum[c(FALSE, idx, TRUE)] - csum[c(TRUE, idx, FALSE)]
  }

calculates the cumulative sum and the points in INDEX where the time
intervals change. It then takes the difference over the appropriate
interval.

> replicate(5, {
+ system.time(x2 <- tapply2(df$value, df$time), gcFirst=TRUE)[[1]]
+ })
[1] 0.024 0.024 0.024 0.024 0.024
> identical(as.vector(x0), x2)
[1] TRUE
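
To see why the cumulative-sum trick works, it can help to trace it on a
tiny sorted input. The sketch below repeats the tapply2 definition so
that it runs on its own; the six-element X and INDEX vectors are made-up
illustration data, not the poster's.

```r
tapply2 <- function(X, INDEX)
{
    csum <- cumsum(c(0, X))
    idx <- diff(INDEX) != 0
    csum[c(FALSE, idx, TRUE)] - csum[c(TRUE, idx, FALSE)]
}

X     <- c(3, 2, 3, 4, 4, 2)   # values
INDEX <- c(1, 1, 1, 2, 2, 3)   # sorted group labels

csum <- cumsum(c(0, X))        # 0 3 5 8 12 16 18
idx  <- diff(INDEX) != 0       # FALSE FALSE TRUE FALSE TRUE

csum[c(FALSE, idx, TRUE)]      # running total at the end of each group: 8 16 18
csum[c(TRUE, idx, FALSE)]      # running total just before each group:   0  8 16

tapply2(X, INDEX)              # 8 8 2, the per-group sums
```

Each group sum is the running total at the group's last element minus
the running total just before its first element, which is why INDEX has
to be sorted.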

This approach could be subject to rounding error (if csum gets very
large and the intervals remain small). To sum only the values where
choice == 1, I think you'd want

  tapply2(df$value * (df$choice==1), df$time)

rather than sub-setting, so that the result of tapply2 is always a
vector of the same length even when some time intervals never have
choice==1.
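
A small illustration of the length issue follows. It is a sketch on an
invented data frame in which time interval 2 has no row with
choice == 1; tapply2 is repeated so that the snippet stands alone.

```r
tapply2 <- function(X, INDEX)
{
    csum <- cumsum(c(0, X))
    idx <- diff(INDEX) != 0
    csum[c(FALSE, idx, TRUE)] - csum[c(TRUE, idx, FALSE)]
}

# Made-up data: no choice == 1 row in time interval 2.
df <- data.frame(time   = c(1, 1, 2, 2, 3, 3),
                 choice = c(1, 0, 0, 0, 1, 0),
                 value  = c(3, 2, 4, 4, 2, 5))

# Masking keeps one entry per interval (interval 2 contributes 0).
masked <- tapply2(df$value * (df$choice == 1), df$time)   # 3 0 2
total  <- tapply2(df$value, df$time)                      # 5 8 7
masked / total                                            # lengths match

# Sub-setting silently drops interval 2, so only two sums come back.
keep <- df$choice == 1
subsetted <- tapply2(df$value[keep], df$time[keep])       # 3 2
```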

Because tapply in these examples seems so fast compared to your
calculation, I wonder whether optim is evaluating your function many
times, and that reformulating the optimization might lead to a very
substantial speed-up?
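
One way to check how often optim calls the objective is to look at the
counts element of its return value. The sketch below uses a made-up
quadratic objective (toy_objective), not the poster's function; only the
optim call and its counts field are the point.

```r
# Hypothetical objective with its minimum at c(1, 2).
toy_objective <- function(par) sum((par - c(1, 2))^2)

fit <- optim(par = c(0, 0), fn = toy_objective)   # default method: Nelder-Mead

fit$counts["function"]   # how many times toy_objective was evaluated
fit$par                  # close to c(1, 2)
```

Multiplying the cost of a single evaluation by this count gives a rough
idea of where the total run time comes from.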

Martin

 .
 .
 .
 .
 .


 Date: Sun, 1 Nov 2009 15:35:41 -0400
 Subject: Re: [R] avoiding loop
 From: jholt...@gmail.com
 To: bbom...@hotmail.com
 CC: dwinsem...@comcast.net; d.rizopou...@erasmusmc.nl; r-help@r-project.org
 
 What you need to do is to understand how to use Rprof so that you can
 determine where the time is being spent.  It probably indicates that
 this is not the source of slowness in your optimization function.  How
 much time are we talking about?  You may spend more time trying to
 optimize the function than just running

[R] avoiding loop

2009-10-31 Thread parkbomee

Hi all,

I am trying to figure out a way to improve my code's efficiency by avoiding the 
use of a loop.
I want to calculate a conditional mean(?) given time.
For example, from the data below, I want to calculate 
sum((value|choice==1)/sum(value)) across time.
Is there a way to do it without using a loop?

time  cum_time  choice  value
1     4         1       3
1     4         0       2
1     4         0       3
1     4         0       3
2     6         1       4
2     6         0       4
2     6         0       2
2     6         0       4
2     6         0       2
2     6         0       2
3     4         1       2
3     4         0       3
3     4         0       5
3     4         0       2



My code looks like

objective[1] = value[1] / sum(value[1:cum_time[1]])
for (i in 2:max(time)){
  objective[i] = value[cum_time[i-1]+1] /
    sum(value[(cum_time[i-1]+1):cum_time[i]])
}
sum(objective)


Does anyone have an idea how I can do this without using a loop?
Thanks.

  


Re: [R] avoiding loop

2009-10-31 Thread Dimitris Rizopoulos

one approach is the following:

# say 'DF' is your data frame, then
with(DF, {
    ind <- choice == 1
    n <- tapply(value[ind], time[ind], sum)
    d <- tapply(value, time, sum)
    n / d
})
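
Applied to the data in the original post, this approach can be sketched
as follows. The data frame name DF comes from the code above; the
cum_time column is left out because it is not needed here, and the
variable name ratio is only for illustration.

```r
DF <- data.frame(
    time   = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3),
    choice = c(1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0),
    value  = c(3, 2, 3, 3, 4, 4, 2, 4, 2, 2, 2, 3, 5, 2)
)

ratio <- with(DF, {
    ind <- choice == 1
    n <- tapply(value[ind], time[ind], sum)   # chosen value per time
    d <- tapply(value, time, sum)             # total value per time
    n / d
})

ratio        # 3/11, 4/18 and 2/12 for times 1, 2 and 3
sum(ratio)   # the quantity the original loop accumulates
```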


I hope it helps.

Best,
Dimitris





--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014



Re: [R] avoiding loop

2009-10-31 Thread David Winsemius

This is pretty much equivalent:

tapply(DF$value[DF$choice==1], DF$time[DF$choice==1], sum) /
tapply(DF$value, DF$time, sum)

And both will probably fail if the number of groups with choice==1 is  
different than the number overall.
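
The mismatch is easy to reproduce. The sketch below uses an invented
data frame in which time 2 has no row with choice == 1, and then shows
one possible workaround (grouping by a factor so that every time level
is kept); it is an illustration, not a fix proposed in the thread.

```r
# Made-up data: time 2 has no row with choice == 1.
DF <- data.frame(time   = c(1, 1, 2, 2, 3, 3),
                 choice = c(1, 0, 0, 0, 1, 0),
                 value  = c(3, 2, 4, 4, 2, 5))

ind <- DF$choice == 1
n <- tapply(DF$value[ind], DF$time[ind], sum)
d <- tapply(DF$value, DF$time, sum)
length(n)   # 2: time 2 is missing
length(d)   # 3

# Possible workaround: subsetting a factor keeps all of its levels,
# so tapply returns one cell per time (NA where no row was chosen).
tf <- factor(DF$time)
n_all <- tapply(DF$value[ind], tf[ind], sum)
n_all[is.na(n_all)] <- 0
n_all / d   # 3/5, 0 and 2/7
```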


--
David.

On Oct 31, 2009, at 5:14 PM, Dimitris Rizopoulos wrote:


one approach is the following:

# say 'DF' is your data frame, then
with(DF, {
    ind <- choice == 1
    n <- tapply(value[ind], time[ind], sum)
    d <- tapply(value, time, sum)
    n / d
})


I hope it helps.

Best,
Dimitris




David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] avoiding loop

2009-10-31 Thread parkbomee

Thank you both.

However, using tapply() instead of a loop does not seem to improve my code much.
I am using this inside an optimization function,
and it still takes longer than it should...



 CC: bbom...@hotmail.com; r-help@r-project.org
 From: dwinsem...@comcast.net
 To: d.rizopou...@erasmusmc.nl
 Subject: Re: [R] avoiding loop
 Date: Sat, 31 Oct 2009 22:26:17 -0400
 
 This is pretty much equivalent:
 
 tapply(DF$value[DF$choice==1], DF$time[DF$choice==1], sum) /
  tapply(DF$value, DF$time, sum)
 
 And both will probably fail if the number of groups with choice==1 is  
 different than the number overall.
 