subject:"\[R\] Vectorization"

Re: [R] vectorization of loops in R

2021-11-18 Thread PIKAL Petr

Hi

Besides tapply and aggregate, split plus *apply could be used:

sapply(with(df, split(z, y)), mean)

Cheers
Petr

> -Original Message-
> From: R-help  On Behalf Of Luigi Marongiu
> Sent: Wednesday, November 17, 2021 2:21 PM
> To: r-help 
> Subject: [R] vectorization of loops in R
> 
> Hello,
> I have a dataframe with 3 variables. I want to loop through it to get
> the mean value of the variable `z`, as follows:
> ```
> df = data.frame(x = c(rep(1,5), rep(2,5), rep(3,5)),
> y = rep(letters[1:5],3),
> z = rnorm(15),
> stringsAsFactors = FALSE)
> m = vector()
> for (i in unique(df$y)) {
> s = df[df$y == i,]
> m = append(m, mean(s$z))
> }
> names(m) = unique(df$y)
> > (m)
> a  b  c  d  e
> -0.6355382 -0.4218053 -0.7256680 -0.8320783 -0.2587004
> ```
> The problem is that I have one million `y` values, so the work takes
> almost a day. I understand that vectorization will speed up the
> procedure. But how shall I write the procedure in vectorial terms?
> Thank you
> 
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] vectorization of loops in R

2021-11-17 Thread Jan van der Laan


Have a look at the base functions tapply and aggregate.

For example see:
- 
https://cran.r-project.org/doc/manuals/r-release/R-intro.html#The-function-tapply_0028_0029-and-ragged-arrays 
,

- https://online.stat.psu.edu/stat484/lesson/9/9.2,
- or ?tapply and ?aggregate.

Also your current code seems to contain an error: `s = df[df$y == i,]` 
should be `s = df$z[df$y == i]` I think.
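A minimal sketch of the aggregate() route, checked against a loop like the one in the question. The seed and the preallocated result vector are illustrative assumptions, not part of the original post:

```r
set.seed(42)  # arbitrary seed, only so the run is repeatable
df <- data.frame(x = rep(1:3, each = 5),
                 y = rep(letters[1:5], 3),
                 z = rnorm(15),
                 stringsAsFactors = FALSE)

# aggregate() with the formula interface: mean of z within each value of y
res <- aggregate(z ~ y, data = df, FUN = mean)

# The loop from the question, filling a preallocated named vector
# instead of growing one with append()
groups <- unique(df$y)
m <- numeric(length(groups))
names(m) <- groups
for (g in groups) {
  m[g] <- mean(df$z[df$y == g])
}

# Both approaches should agree group by group
all.equal(res$z, unname(m[as.character(res$y)]))
```

aggregate() returns a data frame with one row per group, which is convenient when the result is merged back into other tables.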


HTH,
Jan






On 17-11-2021 14:20, Luigi Marongiu wrote:

Hello,
I have a dataframe with 3 variables. I want to loop through it to get
the mean value of the variable `z`, as follows:
```
df = data.frame(x = c(rep(1,5), rep(2,5), rep(3,5)),
y = rep(letters[1:5],3),
z = rnorm(15),
stringsAsFactors = FALSE)
m = vector()
for (i in unique(df$y)) {
s = df[df$y == i,]
m = append(m, mean(s$z))
}
names(m) = unique(df$y)

(m)

a  b  c  d  e
-0.6355382 -0.4218053 -0.7256680 -0.8320783 -0.2587004
```
The problem is that I have one million `y` values, so the work takes
almost a day. I understand that vectorization will speed up the
procedure. But how shall I write the procedure in vectorial terms?
Thank you


Re: [R] vectorization of loops in R

2021-11-17 Thread Kevin Thorpe

If I follow what you are trying to do, you want the mean of z for each value of 
y.

tapply(df$z, df$y, mean)
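As a rough check that the spellings suggested in this thread give the same group means, here is a small sketch on toy data. The rowsum() variant is an extra that nobody in the thread proposed; it is included only as an assumption about what may scale well to very many groups:

```r
set.seed(1)  # arbitrary seed for repeatability
df <- data.frame(y = rep(letters[1:5], 3),
                 z = rnorm(15),
                 stringsAsFactors = FALSE)

# tapply: apply mean() to z within each value of y
by_tapply <- tapply(df$z, df$y, mean)

# split + sapply: build a list of z values per group, then average each
by_split <- sapply(split(df$z, df$y), mean)

# rowsum: grouped sums computed in compiled code, divided by group sizes
# (both rowsum() and table() order the groups alphabetically)
by_rowsum <- rowsum(df$z, df$y)[, 1] / as.vector(table(df$y))

all.equal(as.vector(by_tapply), as.vector(by_split))
all.equal(as.vector(by_tapply), as.vector(by_rowsum))
```

All three return the groups in sorted order, whereas the original loop follows the order of unique(df$y).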


> On Nov 17, 2021, at 8:20 AM, Luigi Marongiu  wrote:
> 
> Hello,
> I have a dataframe with 3 variables. I want to loop through it to get
> the mean value of the variable `z`, as follows:
> ```
> df = data.frame(x = c(rep(1,5), rep(2,5), rep(3,5)),
> y = rep(letters[1:5],3),
> z = rnorm(15),
> stringsAsFactors = FALSE)
> m = vector()
> for (i in unique(df$y)) {
> s = df[df$y == i,]
> m = append(m, mean(s$z))
> }
> names(m) = unique(df$y)
>> (m)
> a  b  c  d  e
> -0.6355382 -0.4218053 -0.7256680 -0.8320783 -0.2587004
> ```
> The problem is that I have one million `y` values, so the work takes
> almost a day. I understand that vectorization will speed up the
> procedure. But how shall I write the procedure in vectorial terms?
> Thank you
> 

-- 
Kevin E. Thorpe
Head of Biostatistics,  Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael’s Hospital
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016


[R] vectorization of loops in R

2021-11-17 Thread Luigi Marongiu

Hello,
I have a dataframe with 3 variables. I want to loop through it to get
the mean value of the variable `z`, as follows:
```
df = data.frame(x = c(rep(1,5), rep(2,5), rep(3,5)),
y = rep(letters[1:5],3),
z = rnorm(15),
stringsAsFactors = FALSE)
m = vector()
for (i in unique(df$y)) {
s = df[df$y == i,]
m = append(m, mean(s$z))
}
names(m) = unique(df$y)
> (m)
a  b  c  d  e
-0.6355382 -0.4218053 -0.7256680 -0.8320783 -0.2587004
```
The problem is that I have one million `y` values, so the work takes
almost a day. I understand that vectorization will speed up the
procedure. But how shall I write the procedure in vectorial terms?
Thank you


Re: [R] Vectorization in a random order

2016-11-10 Thread Jeff Newmiller

I think you answered your own question. For loops are not a boogeyman... poor 
memory management is.

Algorithms that are sensitive to evaluation sequence are often not very 
re-usable, and certainly not parallelizable. If you have a specific algorithm 
in mind, there may be some advice we can give you about optimization, but as it 
stands I think you know how to get a working implementation. 
-- 
Sent from my phone. Please excuse my brevity.
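The memory-management point can be sketched as follows. The data, the squaring step, and the variable names are placeholders chosen for illustration; the only claim is that both loops visit the elements in the same random order and produce the same values, while the second avoids regrowing the result:

```r
n <- 5000
set.seed(123)        # arbitrary seed
vals <- rnorm(n)
ord <- sample(n)     # random evaluation order, as in the question

# Growing the result: every append() builds a new, longer vector
grow <- c()
for (i in ord) {
  grow <- append(grow, vals[i]^2)
}

# Preallocating: the result vector is created once and filled in place
pre <- numeric(n)
for (k in seq_along(ord)) {
  pre[k] <- vals[ord[k]]^2
}

identical(grow, pre)
```

How large the timing difference is depends on n and on the R version, so no figures are given here; system.time() around each loop would show it on a given machine.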

On November 10, 2016 5:06:07 AM PST, Thomas Chesney 
 wrote:
>Is there a way to use vectorization where the elements are evaluated in
>a random order?
>
>For instance, if the code is to be run on each row in a matrix of
>length nBuy the following will do the job
>
>for (b in sample(1:nBuy,nBuy, replace=FALSE)){
>
>}
>
>but
>
>apply(nBuyMat, 1, function(x))
>
>will be run I believe, in the same order each time (Row1, then Row2,
>then Row3 etc.)
>
>This is important for building agent based models (the classic
>explanation of this is probably Huberman & Glance's response to Nowak &
>May's 1992 Nature article - Evolutionary games and computer
>simulations, http://www.pnas.org/content/90/16/7716.abstract)
>
>Thank you,
>
>Thomas
>http://www.nottingham.ac.uk/~liztc/Personal/index.html

Re: [R] Vectorization in a random order

2016-11-10 Thread Bert Gunter

You are mistaken. apply() is *not* vectorized. It is a disguised loop.

For true vectorization at the C level, the answer must be no, as the
whole point is to treat the argument as a whole object and hide the
iterative details.

However, as you indicated, you can always manually randomize the
indexing that is being iterated over and even write a function to do
it if you like; e.g. (warning: essentially untested and probably clumsy
as well as buggy)

randapply <- function(X, MARGIN, FUN,...)
{
   d <- dim(X)
   ix <- as.list(rep(TRUE,length(d)))
   for(i in MARGIN) ix[[i]] <- sample(seq_len(d[i]),d[i])
   X <- do.call("[", c(list(X), ix))
   apply(X,MARGIN,FUN,...)
}

> a <- array(1:24,dim = 2:4)

> randapply(a, 3,mean)
[1]  9.5 21.5 15.5  3.5

> randapply(a,3,mean)
[1] 21.5  3.5 15.5  9.5
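To illustrate the distinction between a disguised loop and an operation vectorized in compiled code, here is a small sketch using the same array. colMeans() with dims = 2 is my choice of compiled counterpart; it was not suggested in the message:

```r
a <- array(1:24, dim = 2:4)

# apply() calls mean() once per slice of the third dimension, from R code
via_apply <- apply(a, 3, mean)

# colMeans(dims = 2) averages over the first two dimensions in compiled code
via_colmeans <- colMeans(a, dims = 2)

via_apply      # 3.5  9.5 15.5 21.5
all.equal(via_apply, via_colmeans)
```

With colMeans() there is no R-level iteration whose order could be randomized, which is the point made above about treating the argument as a whole object.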


Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Thu, Nov 10, 2016 at 5:06 AM, Thomas Chesney
 wrote:
> Is there a way to use vectorization where the elements are evaluated in a 
> random order?
>
> For instance, if the code is to be run on each row in a matrix of length nBuy 
> the following will do the job
>
> for (b in sample(1:nBuy,nBuy, replace=FALSE)){
>
> }
>
> but
>
> apply(nBuyMat, 1, function(x))
>
> will be run I believe, in the same order each time (Row1, then Row2, then 
> Row3 etc.)
>
> This is important for building agent based models (the classic explanation of 
> this is probably Huberman & Glance's response to Nowak & May's 1992 Nature 
> article - Evolutionary games and computer simulations, 
> http://www.pnas.org/content/90/16/7716.abstract)
>
> Thank you,
>
> Thomas
> http://www.nottingham.ac.uk/~liztc/Personal/index.html

Re: [R] Vectorization in a random order

2016-11-10 Thread Richard M. Heiberger

nBuyMat <- data.frame(matrix(rnorm(28), 7, 4))
nBuyMat
nBuy <- nrow(nBuyMat)
sample(1:nBuy, nBuy, replace=FALSE)
sample(1:nBuy)
sample(nBuy)
?sample
apply(nBuyMat[sample(1:nBuy,nBuy, replace=FALSE),], 1, function(x) sum(x))

apply(nBuyMat[sample(nBuy),], 1, function(x) sum(x))

The defaults for sample do what you have requested.

If the original row identification matters, then be sure to use
either a matrix with rownames or a data.frame.
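A short sketch of why the row names matter, assuming a matrix with made-up row labels ("buyer1" and so on are hypothetical). The results come back in the shuffled order, and the names allow them to be put back in the original order:

```r
set.seed(2016)  # arbitrary seed
nBuyMat <- matrix(rnorm(28), nrow = 7, ncol = 4,
                  dimnames = list(paste0("buyer", 1:7), NULL))
nBuy <- nrow(nBuyMat)

# Visit the rows in random order
ord <- sample(nBuy)
res <- apply(nBuyMat[ord, , drop = FALSE], 1, sum)

# The names record which original row each result belongs to
names(res)

# Re-index by name to recover the original row order
res_orig <- res[rownames(nBuyMat)]
all.equal(res_orig, apply(nBuyMat, 1, sum))
```

Without row names the shuffled result would carry no record of which row produced which value, unless ord is kept alongside it.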

Rich

Sent from my iPhone

> On Nov 10, 2016, at 08:06, Thomas Chesney  
> wrote:
>
> for (b in sample(1:nBuy,nBuy, replace=FALSE)){
>
> }
>
> but
>
> apply(nBuyMat, 1, function(x))


[R] Vectorization in a random order

2016-11-10 Thread Thomas Chesney

Is there a way to use vectorization where the elements are evaluated in a 
random order?

For instance, if the code is to be run on each row in a matrix of length nBuy 
the following will do the job

for (b in sample(1:nBuy,nBuy, replace=FALSE)){

}

but

apply(nBuyMat, 1, function(x))

will be run I believe, in the same order each time (Row1, then Row2, then Row3 
etc.)

This is important for building agent based models (the classic explanation of 
this is probably Huberman & Glance's response to Nowak & May's 1992 Nature 
article - Evolutionary games and computer simulations, 
http://www.pnas.org/content/90/16/7716.abstract)

Thank you,

Thomas
http://www.nottingham.ac.uk/~liztc/Personal/index.html




Re: [R] vectorization of rolling function

2014-12-08 Thread Arnaud Duranel


Great, many thanks for your help Jeff.
Apologies for the HTML format, I'll be more careful next time.
Arnaud


Re: [R] vectorization of rolling function

2014-12-08 Thread Jeff Newmiller

Please don't post in HTML... you may not recognize it, but the receiving 
end does not necessarily (and in this case did not) look like the sending 
end, and the cleanup can impede answers you are hoping to get.


In many cases, loops can be vectorized.  However, near as I can tell this 
is an example of an algorithm that simply needs a loop [1].


One bit of advice: the coredata function is horribly slow. Just converting 
your time series objects to numeric vectors for the purpose of this 
computation sped up the algorithm by 500x on 1 point series. 
Converting it to inline C++ as below sped it up by yet another factor of 
40x. A combined 20000x (500 x 40) is nothing to sneeze at.


###
## optional temporary setup for windows
## assumes you have installed Rtools
gcc <- "C:\\Rtools\\bin"
rtools <- "C:\\Rtools\\gcc-4.6.3\\bin"
path <- strsplit(Sys.getenv("PATH"), ";")[[1]]
new_path <- c(rtools, gcc, path)
new_path <- new_path[!duplicated(tolower(new_path))]
Sys.setenv(PATH = paste(new_path, collapse = ";"))
## end of optional

library(Rcpp)

cppFunction("
DataFrame EvapSimRcpp( NumericVector RR
                     , NumericVector ETmax
                     , const double Smax
                     , const double initialStorage ) {
  int n = RR.size();

  // create empty time-series to fill
  // effective rainfall (i.e. rainfall minus intercepted rainfall)
  NumericVector RReff( n );
  // intercepted rainfall
  NumericVector Rint( n );
  // residual potential evapotranspiration (i.e. ETmax minus
  // evaporation from interception)
  NumericVector ETres( n, NA_REAL );
  double evap;

  // volume of water in interception storage at start of
  // computation
  double storage = initialStorage;

  for ( int i=0; i<n; i++ ) {
    // compute interception capacity for time step i (maximum
    // interception capacity minus any water intercepted but not
    // evaporated during previous time-step).
    Rint[ i ] = Smax - storage;
    // compute intercepted rainfall: equal to rainfall if smaller
    // than interception capacity, and to interception capacity if
    // larger.
    if ( RR[ i ] < Rint[ i ] ) Rint[ i ] = RR[ i ];
    // compute effective rainfall (rainfall minus intercepted
    // rainfall).
    RReff[ i ] = RR[ i ] - Rint[ i ];
    // update interception storage: initial interception storage +
    // intercepted rainfall.
    storage = storage + Rint[ i ];
    // compute evaporation from interception storage: equal to
    // potential evapotranspiration if the latter is smaller than
    // interception storage, and to interception storage if larger.
    if ( storage > ETmax[ i ] )
      evap = ETmax[ i ];
    else
      evap = storage;
    // compute residual potential evapotranspiration: potential
    // evapotranspiration minus evaporation from interception
    // storage.
    ETres[ i ] = ETmax[ i ] - evap;
    // update interception storage, to be carried over to next
    // time-step: interception storage minus evaporation from
    // interception storage.
    storage = storage - evap;
  }
  DataFrame DF = DataFrame::create( Named( \"int\" ) = Rint
                                  , Named( \"RReff\" ) = RReff
                                  , Named( \"ETres\" ) = ETres
                                  );
  return DF;
}
")

# Assumes your initial variables are already defined
EvapSimRcpp( RR, ETmax, Smax, 0 )

###

[1] 
http://stackoverflow.com/questions/7153586/can-i-vectorize-a-calculation-which-depends-on-previous-elements
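The intermediate step described above, keeping the loop but running it on plain numeric vectors, might look like the following sketch. The function name evap_sim and the use of min() in place of the if/else branches are my own choices, not taken from either message; with zoo series, the inputs would be extracted once up front rather than calling coredata inside the loop:

```r
# Pure-R version of the interception loop on plain numeric vectors.
# evap_sim is a hypothetical name used only for this sketch.
evap_sim <- function(RR, ETmax, Smax, initial_storage = 0) {
  n <- length(RR)
  Rint  <- numeric(n)  # intercepted rainfall
  RReff <- numeric(n)  # effective rainfall
  ETres <- numeric(n)  # residual potential evapotranspiration
  storage <- initial_storage
  for (i in seq_len(n)) {
    # intercepted rainfall: rainfall, capped by the remaining capacity
    Rint[i] <- min(RR[i], Smax - storage)
    RReff[i] <- RR[i] - Rint[i]
    storage <- storage + Rint[i]
    # evaporation from storage: potential ET, capped by what is stored
    evap <- min(ETmax[i], storage)
    ETres[i] <- ETmax[i] - evap
    storage <- storage - evap
  }
  data.frame(int = Rint, RReff = RReff, ETres = ETres)
}

# Same artificial inputs as in the question, but as bare numeric vectors
set.seed(1)
ETmax <- runif(10, min = 1, max = 6)
RR <- runif(10, min = 0, max = 6)
out <- evap_sim(RR, ETmax, Smax = 3)
head(out)
```

Each step still depends on the storage left over from the previous one, so the loop itself remains; only the per-element overhead of the time-series objects is removed.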


[R] vectorization of rolling function

2014-12-06 Thread A Duranel

Hello
I use R to run a simple model of rainfall interception by vegetation:
rainfall falls on vegetation, some is retained by the vegetation (part of
which can evaporate), the rest falls on the ground (quite crude but very
similar to those used in SWAT or MikeSHE, for the hydrologists among you).
It uses a loop on zoo time-series of rainfall and potential
evapotranspiration. Unfortunately I did not find a way to vectorize it and
it takes ages to run on long datasets. Could anybody help me to make it run
faster?

library(zoo)
set.seed(1)
# artificial potential evapotranspiration time-series
ETmax <- zoo(runif(10, min=1, max=6), c(1:10))
# artificial rainfall time-series
RR <- zoo(runif(10, min=0, max=6), c(1:10))

## create empty time-series to fill
# effective rainfall (i.e. rainfall minus intercepted rainfall)
RReff <- zoo(NA, c(1:10))
# intercepted rainfall
int <- zoo(NA, c(1:10))
# residual potential evapotranspiration (i.e. ETmax minus evaporation from
# interception)
ETres <- zoo(NA, c(1:10))

# define maximum interception storage capacity (maximum volume of rainfall
# that can be intercepted per time step, provided the interception store is
# empty at start of time-step)
Smax <- 3
# volume of water in interception storage at start of computation
storage <- 0

for (i in 1:length(ETmax)) {
  # compute interception capacity for time step i (maximum interception
  # capacity minus any water intercepted but not evaporated during previous
  # time-step).
  int[i] <- Smax - storage
  # compute intercepted rainfall: equal to rainfall if smaller than
  # interception capacity, and to interception capacity if larger.
  if (RR[i] < int[i]) int[i] <- RR[i]
  # compute effective rainfall (rainfall minus intercepted rainfall).
  RReff[i] <- RR[i] - int[i]
  # update interception storage: initial interception storage + intercepted
  # rainfall.
  storage <- storage + coredata(int[i])
  # compute evaporation from interception storage: equal to potential
  # evapotranspiration if the latter is smaller than interception storage, and
  # to interception storage if larger.
  if (storage > coredata(ETmax[i])) evap <- coredata(ETmax[i]) else evap <- storage
  # compute residual potential evapotranspiration: potential
  # evapotranspiration minus evaporation from interception storage.
  ETres[i] <- ETmax[i] - evap
  # update interception storage, to be carried over to next time-step:
  # interception storage minus evaporation from interception storage.
  storage <- storage - evap
}

Many thanks for your help!

Arnaud
UCL Department of Geography, UK




[R] vectorization

2014-01-29 Thread Bill

Hi. I saw this example and I cannot begin to figure out how it works. Can
anyone give me an idea on this?

n = 9e6
df = data.frame(values = rnorm(n),
ID = rep(LETTERS[1:3], each = n/3),
stringsAsFactors = FALSE)
 head(df)
  values ID
1 -0.7355823  A
2 -0.4729925  A
3 -0.7417259  A
4  1.7633367  A
5 -0.3006790  A
6  0.6785947  A


The idea is to replace all occurrences of A by 'Text for A'.

He does this:

translator_vector = c(A = 'Text for A',
  B = 'Text for B',
  C = 'Text for C')

and subset this vector using df$ID:

dum_vectorized = translator_vector[df$ID]

It works but I have no idea why.

Thank you.


Re: [R] vectorization

2014-01-29 Thread PIKAL Petr

Hi

everything is written in the docs. However, this example is a little tricky.

Each df$ID matches the name of an item in translator_vector, and [] selects 
this matched item.

It is similar to
x <- sample(1:3, 10, replace = TRUE)
translator_vector[x]

Regards
Petr


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Bill
 Sent: Wednesday, January 29, 2014 12:41 PM
 To: r-help@r-project.org
 Subject: [R] vectorization

 Hi. I saw this example and I cannot begin to figure out how it works.
 Can anyone give me an idea on this?

 n = 9e6
 df = data.frame(values = rnorm(n),
 ID = rep(LETTERS[1:3], each = n/3),
 stringsAsFactors = FALSE)
  head(df)
   values ID
 1 -0.7355823  A
 2 -0.4729925  A
 3 -0.7417259  A
 4  1.7633367  A
 5 -0.3006790  A
 6  0.6785947  A


 The idea is to replace all occurrences of A by'Text for A'.

 He does this:

 translator_vector = c(A = 'Text for A',
   B = 'Text for B',
   C = 'Text for C')

 and subset this vector using df$ID:

 dum_vectorized = translator_vector[df$ID]

 It works but I have no idea why.

 Thank you.

   [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.


Tento e-mail a jakékoliv k němu připojené dokumenty jsou důvěrné a jsou určeny 
pouze jeho adresátům.
Jestliže jste obdržel(a) tento e-mail omylem, informujte laskavě neprodleně 
jeho odesílatele. Obsah tohoto emailu i s přílohami a jeho kopie vymažte ze 
svého systému.
Nejste-li zamýšleným adresátem tohoto emailu, nejste oprávněni tento email 
jakkoliv užívat, rozšiřovat, kopírovat či zveřejňovat.
Odesílatel e-mailu neodpovídá za eventuální škodu způsobenou modifikacemi či 
zpožděním přenosu e-mailu.

V případě, že je tento e-mail součástí obchodního jednání:
- vyhrazuje si odesílatel právo ukončit kdykoliv jednání o uzavření smlouvy, a 
to z jakéhokoliv důvodu i bez uvedení důvodu.
- a obsahuje-li nabídku, je adresát oprávněn nabídku bezodkladně přijmout; 
Odesílatel tohoto e-mailu (nabídky) vylučuje přijetí nabídky ze strany příjemce 
s dodatkem či odchylkou.
- trvá odesílatel na tom, že příslušná smlouva je uzavřena teprve výslovným 
dosažením shody na všech jejích náležitostech.
- odesílatel tohoto emailu informuje, že není oprávněn uzavírat za společnost 
žádné smlouvy s výjimkou případů, kdy k tomu byl písemně zmocněn nebo písemně 
pověřen a takové pověření nebo plná moc byly adresátovi tohoto emailu případně 
osobě, kterou adresát zastupuje, předloženy nebo jejich existence je adresátovi 
či osobě jím zastoupené známá.

This e-mail and any documents attached to it may be confidential and are 
intended only for its intended recipients.
If you received this e-mail by mistake, please immediately inform its sender. 
Delete the contents of this e-mail with all attachments and its copies from 
your system.
If you are not the intended recipient of this e-mail, you are not authorized to 
use, disseminate, copy or disclose this e-mail in any manner.
The sender of this e-mail shall not be liable for any possible damage caused by 
modifications of the e-mail or by delay with transfer of the email.

In case that this e-mail forms part of business dealings:
- the sender reserves the right to end negotiations about entering into a 
contract in any time, for any reason, and without stating any reasoning.
- if the e-mail contains an offer, the recipient is entitled to immediately 
accept such offer; The sender of this e-mail (offer) excludes any acceptance of 
the offer on the part of the recipient containing any amendment or variation.
- the sender insists on that the respective contract is concluded only upon an 
express mutual agreement on all its aspects.
- the sender of this e-mail informs that he/she is not authorized to enter into 
any contracts on behalf of the company except for cases in which he/she is 
expressly authorized to do so in writing, and such authorization or power of 
attorney is submitted to the recipient or the person represented by the 
recipient, or the existence of such authorization is known to the recipient of 
the person represented by the recipient.

Re: [R] vectorization

2014-01-29 Thread Duncan Murdoch


On 14-01-29 6:41 AM, Bill wrote:

> Hi. I saw this example and I cannot begin to figure out how it works. Can
> anyone give me an idea on this?
>
> n = 9e6
> df = data.frame(values = rnorm(n),
>                 ID = rep(LETTERS[1:3], each = n/3),
>                 stringsAsFactors = FALSE)
>
> head(df)
>
>       values ID
> 1 -0.7355823  A
> 2 -0.4729925  A
> 3 -0.7417259  A
> 4  1.7633367  A
> 5 -0.3006790  A
> 6  0.6785947  A
>
> The idea is to replace all occurrences of A by 'Text for A'.
>
> He does this:
>
> translator_vector = c(A = 'Text for A',
>                       B = 'Text for B',
>                       C = 'Text for C')
>
> and subset this vector using df$ID:
>
> dum_vectorized = translator_vector[df$ID]
>
> It works but I have no idea why.


He is indexing by name.  The translator_vector looks like this:

           A            B            C
"Text for A" "Text for B" "Text for C"

The first element is named A, the second B, the third C.

So translator_vector["A"] is the same as translator_vector[1].  The ID
column in your dataframe is a vector of strings to be used as names, so
each one pulls out one element from the translator_vector.
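
The lookup by name can be tried on a small scale. The following is only a
sketch: the four-row data frame is a made-up stand-in for the 9e6-row one in
the question, and by_name, by_position and the text column are names chosen
here for illustration.

```r
# Small stand-in for the 9e6-row data frame in the question.
df <- data.frame(values = c(0.1, 0.2, 0.3, 0.4),
                 ID = c("A", "C", "B", "A"),
                 stringsAsFactors = FALSE)

translator_vector <- c(A = "Text for A",
                       B = "Text for B",
                       C = "Text for C")

# A character index is matched against names(translator_vector),
# so one subsetting call performs every lookup at once.
by_name <- translator_vector[df$ID]

# The same lookup written out with explicit positions.
by_position <- translator_vector[match(df$ID, names(translator_vector))]

# unname() drops the A/B/C labels carried along by the subsetting.
df$text <- unname(by_name)
```

An ID that is not among the names would come back as NA rather than raise an
error, which is worth checking for on real data.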


Duncan Murdoch



> Thank you.


Re: [R] vectorization

2014-01-29 Thread Bill

Oh wow, I guess I get it!
Thank you. It is pretty tricky but I saw that it works very fast.



Re: [R] vectorization modifying globals in functions

2012-12-28 Thread Greg Snow

In current versions of R the apply functions do not gain much (if any) in
speed over a well written for loop (the for loops are much more efficient
than they used to be).

Using global variables could actually slow things down a little for what
you are doing: if you use `<<-`, then it has to search through multiple
environments to find which variable to replace.

In general you should avoid using global variables.  It is best to pass all
needed variables into a function as arguments, do any modifications
internally inside the function on local copies, then return the modified
local copy from the function (you can use a list if you want to return
multiple variables).
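
A minimal sketch of that pattern follows. The helper move_one and its
arguments from, to and delta are hypothetical names made up for this
illustration; they are not from the original code.

```r
# Hypothetical helper: takes the state vector as an argument, works on a
# local copy, and returns everything the caller needs in one list.
move_one <- function(d, from, to, delta = 1) {
  moved <- d[from] >= delta
  if (moved) {
    d[from] <- d[from] - 1
    d[to]   <- d[to] + 1
  }
  list(d = d, moved = moved)
}

d <- rep(10, 10)
res <- move_one(d, from = 1, to = 2)
d_new <- res$d   # the caller decides whether to keep the modified copy
```

The original d is left untouched, so the caller stays in control of when the
state actually changes.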

Since each iteration of your code depends on the previous iteration,
vectorizing is not going to help (or even be reasonable).

If you want to speed up the code then you might consider a compiled option,
see the inline or Rcpp packages (or others).






-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com


Re: [R] vectorization modifying globals in functions

2012-12-28 Thread David Winsemius



On Dec 27, 2012, at 12:38 PM, Sam Steingold wrote:

> I have the following code:
>
> --8<---------------cut here---------------start------------->8---
> d <- rep(10,10)
> for (i in 1:100) {
>   a <- sample.int(length(d), size = 2)
>   if (d[a[1]] >= 1) {
>     d[a[1]] <- d[a[1]] - 1
>     d[a[2]] <- d[a[2]] + 1
>   }
> }
> --8<---------------cut here---------------end--------------->8---
>
> it does what I want, i.e., modified vector d 100 times.
>
> Now, if I want to repeat this 1e6 times instead of 1e2 times, I want to
> vectorize it for speed, so I do this:

You could get some modest improvement by vectorizing the two
lookups, additions, and assignments into one:

 d[a] <- d[a] - c(1, -1)

In a test with 10 iterations, it yields about a 1.693/1.394 - 1 =
21 percent improvement.
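
To see that the single assignment d[a] <- d[a] - c(1, -1) does the same work
as the two separate updates, here is a small check. It is a sketch only: the
fixed index pair c(3, 7) stands in for the random draw from sample.int.

```r
d1 <- rep(10, 10)
d2 <- rep(10, 10)
a <- c(3, 7)             # stand-in for sample.int(length(d), size = 2)

# Element-by-element version from the original loop body.
d1[a[1]] <- d1[a[1]] - 1
d1[a[2]] <- d1[a[2]] + 1

# Single vectorized assignment: subtracting c(1, -1) takes one unit
# from d2[a[1]] and adds one unit to d2[a[2]].
d2[a] <- d2[a] - c(1, -1)
```

This relies on the two indices being distinct, which sample.int guarantees
when it samples without replacement (its default).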


> --8<---------------cut here---------------start------------->8---
> update <- function (i) {
>   a <- sample.int(n.agents, size = 2)
>   if (d[a[1]] >= delta) {
>     d[a[1]] <- d[a[1]] - 1
>     d[a[2]] <- d[a[2]] + 1
>   }
>   entropy(d, unit="log2")

The `unit` seems likely to throw an error since there is no argument
for it to match.

> }
> system.time(entropy.history <- sapply(1:1e6,update))
> --8<---------------cut here---------------end--------------->8---
>
> however, the global d is not modified, apparently update modifies the
> local copy.

You could have returned 'd' and the entropy result as a list. But what
would be the point of saving 1e6 copies?

> so,
> 1. is there a way for a function to modify a global variable?

So if you replaced it in the global environment, you would only be
seeing the result of the last iteration of the loop. What's the use of
that?

> 2. how would you vectorize this loop?
>
> thanks!


--
David Winsemius, MD
Alameda, CA, USA


[R] vectorization modifying globals in functions

2012-12-27 Thread Sam Steingold

I have the following code:

--8<---------------cut here---------------start------------->8---
d <- rep(10,10)
for (i in 1:100) {
  a <- sample.int(length(d), size = 2)
  if (d[a[1]] >= 1) {
    d[a[1]] <- d[a[1]] - 1
    d[a[2]] <- d[a[2]] + 1
  }
}
--8<---------------cut here---------------end--------------->8---

it does what I want, i.e., modified vector d 100 times.

Now, if I want to repeat this 1e6 times instead of 1e2 times, I want to
vectorize it for speed, so I do this:

--8<---------------cut here---------------start------------->8---
update <- function (i) {
  a <- sample.int(n.agents, size = 2)
  if (d[a[1]] >= delta) {
    d[a[1]] <- d[a[1]] - 1
    d[a[2]] <- d[a[2]] + 1
  }
  entropy(d, unit="log2")
}
system.time(entropy.history <- sapply(1:1e6,update))
--8<---------------cut here---------------end--------------->8---

however, the global d is not modified, apparently update modifies the
local copy.

so,
1. is there a way for a function to modify a global variable?
2. how would you vectorize this loop?

thanks!

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://honestreporting.com
http://pmw.org.il http://www.PetitionOnline.com/tap12009/
A number problem solved with floats turns into 1.9998 problems.


Re: [R] vectorization modifying globals in functions

2012-12-27 Thread Neal H. Walfield

At Thu, 27 Dec 2012 15:38:08 -0500,
Sam Steingold wrote:
> so,
> 1. is there a way for a function to modify a global variable?

Use <<- instead of <-.
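
The difference between the two assignment operators can be seen in a short
sketch. The function names update_local and update_global are made up for
this illustration.

```r
d <- rep(10, 10)

update_local <- function() {
  d[1] <- d[1] - 1     # `<-` creates a modified copy local to the function
  invisible(NULL)
}

update_global <- function() {
  d[1] <<- d[1] - 1    # `<<-` assigns to the `d` found in an enclosing environment
  invisible(NULL)
}

update_local()
after_local <- d[1]    # unchanged: the outer `d` was not touched

update_global()
after_global <- d[1]   # one less than before
```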

> 2. how would you vectorize this loop?

This is hard.  Your function has a feedback loop: an iteration depends
on the previous iteration's result.  A for loop is about as good as
you can do in this case.  sapply might help a bit, but it is really
just a for loop in disguise.
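
A plain for loop for this kind of feedback problem might look like the sketch
below. The iteration count, the seed and the use of max(d) as the
per-iteration statistic are placeholders chosen here; the original code
computes an entropy instead.

```r
set.seed(1)
n_iter <- 1000
d <- rep(10, 10)
history <- numeric(n_iter)   # preallocate instead of growing the result

for (i in seq_len(n_iter)) {
  a <- sample.int(length(d), size = 2)
  if (d[a[1]] >= 1) {
    d[a] <- d[a] - c(1, -1)  # move one unit from d[a[1]] to d[a[2]]
  }
  history[i] <- max(d)       # placeholder for the per-iteration statistic
}
```

Because d lives in the same scope as the loop, no global assignment is needed
and each iteration sees the previous iteration's result.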

Since sample.int is used to generate indexes, you might try to
generate a bunch of indexes, take as many as don't overlap (i.e.,
collect all orthogonal updates) and do all of those updates at once.
If you really need the entropy after every iteration, however, then
this won't work for you either.

Neal


Re: [R] vectorization modifying globals in functions

2012-12-27 Thread Suzen, Mehmet

You can use environments. Have a look at this discussion.

http://stackoverflow.com/questions/7439110/what-is-the-difference-between-parent-frame-and-parent-env-in-r-how-do-they
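
One way this could look, as a sketch only: the names state and update_env are
invented for this example and do not come from the linked discussion.

```r
# An environment is passed by reference, so changes made inside the
# function are visible to the caller without any global assignment.
state <- new.env()
state$d <- rep(10, 10)

update_env <- function(env, from, to) {
  if (env$d[from] >= 1) {
    env$d[from] <- env$d[from] - 1
    env$d[to]   <- env$d[to] + 1
  }
  invisible(env)
}

update_env(state, from = 1, to = 2)
```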


[R] vectorization condition counting

2012-08-10 Thread Guillaume2883

Hi all,

I am working on a really big dataset and I would like to vectorize a
condition in an if statement inside a loop to improve speed.

The original loop with the condition is currently written as follows:

if(sum(as.integer(tags$tag_id==tags$tag_id[i]))==1 && tags$lgth[i]<300){

 tags$stage[i]<-"J"

   }

Do you have any ideas? I was unable to do it correctly.
Thanking you in advance for your help

Guillaume



--
View this message in context: 
http://r.789695.n4.nabble.com/vectorization-condition-counting-tp4639992.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] vectorization condition counting

2012-08-10 Thread William Dunlap

Your sum(tag_id==tag_id[i])==1, meaning tag_id[i] is the only entry with its
value, may be vectorized by the sneaky idiom
   !(duplicated(tag_id,fromLast=FALSE) | duplicated(tag_id,fromLast=TRUE))
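
The idiom can be checked on a plain vector. This is a sketch using the tag_id
values from the example data in this thread; the intermediate names
dup_forward, dup_backward, is_unique and is_unique_loop are chosen here for
readability.

```r
tag_id <- c(1, 2, 2, 3, 4, 5, 6, 6)

# TRUE for every element after the first occurrence of a value ...
dup_forward  <- duplicated(tag_id, fromLast = FALSE)
# ... and TRUE for every element before the last occurrence.
dup_backward <- duplicated(tag_id, fromLast = TRUE)

# A value that appears exactly once is flagged by neither scan.
is_unique <- !(dup_forward | dup_backward)

# The loop-style definition, evaluated for each element, for comparison.
is_unique_loop <- sapply(tag_id, function(id) sum(tag_id == id) == 1)
```

The loop-style version compares every element with every other element, while
the two duplicated() scans each pass over the vector once, which is where the
speed-up on a big dataset comes from.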

Hence f0() (with your code in a loop) and f1() are equivalent:
f0 <- function (tags) {
    for (i in seq_len(nrow(tags))) {
        if (sum(tags$tag_id == tags$tag_id[i]) == 1 && tags$lgth[i] < 300) {
            tags$stage[i] <- "J"
        }
    }
    tags
}
f1 <- function (tags) {
    needsChanging <- with(tags, !(duplicated(tag_id, fromLast = FALSE) |
        duplicated(tag_id, fromLast = TRUE)) & lgth < 300)
    tags$stage[needsChanging] <- "J"
    tags
}

E.g.,
> someTags <- data.frame(tag_id = c(1, 2, 2, 3, 4, 5, 6, 6), lgth = 50*(1:8),
+     stage=factor(rep(".",8), levels=c(".","J")))
> all.equal(f0(someTags), f1(someTags))
[1] TRUE
> f1(someTags)
  tag_id lgth stage
1      1   50     J
2      2  100     .
3      2  150     .
4      3  200     J
5      4  250     J
6      5  300     .
7      6  350     .
8      6  400     .

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



Re: [R] vectorization condition counting

2012-08-10 Thread arun



HI,

This may also help:
someTags <- data.frame(tag_id = c(1, 2, 2, 3, 4, 5, 6, 6), lgth = 50*(1:8),
    stage=factor(rep(".",8), levels=c(".","J")))
f2 <- function(x){
 # a tag_id that matches none of the duplicated values occurs only once;
 # use the argument x here rather than the global someTags
 needsChanging <- with(x, is.na(match(tag_id, tag_id[duplicated(tag_id)])) & lgth < 300)
 x$stage[needsChanging] <- "J"
 x
 }
 f2(someTags)
#  tag_id lgth stage
#1      1   50     J
#2      2  100     .
#3      2  150     .
#4      3  200     J
#5      4  250     J
#6      5  300     .
#7      6  350     .
#8      6  400     .
A.K.



[R] vectorization with subset?

2012-07-02 Thread dlv04c

Hello,

I have a data frame (68,000 rows) of scores (V4) for a series of [genomic]
coordinate ranges (V2 to V3).



I also have a data frame (1.2 million rows) of single [genomic] coordinates.  



For each genomic coordinate (in coord), I would like to determine the
average of all scores whose genomic ranges (in scores) encompass the
coordinate (in coord). To accomplish this, I tried:



The function works, but is extremely slow.

It would take about 4 days for this to finish for a single data set, and I
have 64 data sets.

Why does the rate at which coordinate averages are calculated increase when
coord is smaller, but not when scores is smaller?

How can I accomplish the same thing more efficiently?

Thanks,

Dan

--
View this message in context: 
http://r.789695.n4.nabble.com/vectorization-with-subset-tp4635156.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] vectorization with subset?

2012-07-02 Thread David Winsemius




You probably need to start by reading the vignettes for the IRanges  
package. It's difficult to be sure since you did not show the code for  
what you were doing currently.
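
Without the original code, the following is only a guess at the task as
described: for each coordinate, average the scores of all ranges that contain
it. It is a base R sketch, not the IRanges approach, and it assumes that V2 is
the range start, V3 the range end (both inclusive) and V4 the score; the toy
values and all intermediate names are made up here.

```r
# Toy data using the column names from the question (V2 = start, V3 = end,
# V4 = score); ranges are assumed to include both end points.
scores <- data.frame(V2 = c(1, 5, 8),
                     V3 = c(10, 6, 12),
                     V4 = c(2, 4, 6))
coord <- c(0, 3, 5, 9, 11, 20)

by_start <- order(scores$V2)
by_end   <- order(scores$V3)
starts   <- scores$V2[by_start]
ends     <- scores$V3[by_end]

# Running totals of the scores, ordered by start and by end.
cum_by_start <- c(0, cumsum(scores$V4[by_start]))
cum_by_end   <- c(0, cumsum(scores$V4[by_end]))

# Ranges that have started by each coordinate, and ranges already finished.
n_started  <- findInterval(coord, starts)
n_finished <- findInterval(coord, ends, left.open = TRUE)

# Ranges covering a coordinate = started minus finished.
n_cover   <- n_started - n_finished
sum_cover <- cum_by_start[n_started + 1] - cum_by_end[n_finished + 1]
avg_score <- ifelse(n_cover > 0, sum_cover / n_cover, NA_real_)
```

The work is two sorts plus binary searches instead of one subset per
coordinate. Differences of large running sums can lose a little floating-point
precision, so results should be spot-checked against the slow version.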


--

David Winsemius, MD
West Hartford, CT


Re: [R] vectorization with subset?

2012-07-02 Thread dlv04c

The code is in the original post, but here it is again:



Thanks,

Dan


--
View this message in context: 
http://r.789695.n4.nabble.com/vectorization-with-subset-tp4635156p4635208.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] vectorization with subset?

2012-07-02 Thread David Winsemius



On Jul 2, 2012, at 5:16 PM, dlv04c wrote:

> The code is in the original post, but here it is again:

No code here or in original posting to rhelp. You are under the
delusion that Nabble is R-help. It is not.

> --
> View this message in context:
> http://r.789695.n4.nabble.com/vectorization-with-subset-tp4635156p4635208.html
> Sent from the R help mailing list archive at Nabble.com.

This is the rhelp mailing list. Not a website.

--

David Winsemius, MD
West Hartford, CT


[R] Vectorization instead of loops problem

2011-12-04 Thread Costas Vorlow

Hello,

I am having problems vectorizing the following (instead of using a
for/next/while loop):

I have 2 sequences such as:

x, y
1, 30
2, -40
0, 50
0, 25
1, -5
2, -10
1, 5
0, 40

etc etc

The first sequence (x) takes integer numbers only: 0, 1, 2
The sequence y can be anything...

I want to be able to retrieve (in a list if possible) the 3 last values of
the y sequence before a value of 1 is encountered on the x sequence, i.e:

On line 5 in the above dataset, x is 1 so I need to capture values: 25, 50
and -40 of the y sequence.

So the outcome (if a list) should look something like:

[1],[25,50,-40]
[2],[-10,-5,25] # as member #7 of x sequence is 1...

etc. etc.

Can I do the above avoiding for/next or while loops?
I am not sure I can explain it better. Any help/pointer extremely welcome.

Best regards,
Costas


-- 

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|c|o|s|t|a|s|@|v|o|r|l|o|w|.|o|r|g|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


Re: [R] Vectorization instead of loops problem

2011-12-04 Thread Uwe Ligges







One way is (assuming your data is in a data.frame called dat):

 wx <- which(dat$x==1)
 result <- lapply(wx[wx > 3], function(x) dat$y[x - (1:3)])

(where lapply is a loop, implicitly).
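
Applied to the eight rows from the question, the approach can be run as
follows. This is a sketch; the data frame is typed in by hand from the
original post.

```r
dat <- data.frame(x = c(1, 2, 0, 0, 1, 2, 1, 0),
                  y = c(30, -40, 50, 25, -5, -10, 5, 40))

wx <- which(dat$x == 1)          # rows where x is 1
wx <- wx[wx > 3]                 # keep only rows with three predecessors
# For each remaining row i, take y at rows i-1, i-2, i-3 (nearest first).
result <- lapply(wx, function(i) dat$y[i - (1:3)])
```

The 1 in the first row is dropped because it has no preceding values.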


Uwe Ligges


Re: [R] Vectorization instead of loops problem

2011-12-04 Thread Costas Vorlow

Thanks Uwe.

What happens if these are zoo (or time series) sequences/dataframes?
I think your solution would apply as well, no?

Thanks again & best wishes,
Costas





-- 

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|c|o|s|t|a|s|@|v|o|r|l|o|w|.|o|r|g|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


Re: [R] Vectorization instead of loops problem

2011-12-04 Thread Gabor Grothendieck


Try this. embed(z, 4) places values 4,3,2,1 of vector z in the first
row, values 5,4,3,2 in the second row and so on, so the first column
holds the current position and the remaining columns hold the three
values before it. We therefore want the rows of embed(y, 4) for which
embed(x, 4)[, 1] == 1, with the first column dropped (-1).

> embed(y, 4)[embed(x, 4)[, 1] == 1, -1]
     [,1] [,2] [,3]
[1,]   25   50  -40
[2,]  -10   -5   25
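
For anyone who wants to run it, here is the same call with the data typed in
from the original post. The names ey, ex and prev3 are only for this sketch,
and drop = FALSE is an addition so that a single match still comes back as a
one-row matrix.

```r
x <- c(1, 2, 0, 0, 1, 2, 1, 0)
y <- c(30, -40, 50, 25, -5, -10, 5, 40)

ey <- embed(y, 4)   # row i holds y[i+3], y[i+2], y[i+1], y[i]
ex <- embed(x, 4)

# Column 1 is the "current" position, so keep rows where x is 1 there
# and drop that column to leave the three preceding y values.
prev3 <- ey[ex[, 1] == 1, -1, drop = FALSE]
```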

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] Vectorization instead of loops problem

2011-12-04 Thread Bert Gunter

Costas: (and thanks for giving us your name)

which(x == 1)

gives you the indices where x is 1 (up to floating point equality --
you did not specify whether your x values are integers or calculated
as floating point, and that certainly makes a difference). You can
then use simple indexing to get the y values. No loops needed.
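
One possible reading of "simple indexing" is sketched below. It assumes that
all 1s are wanted, that overlapping sets are allowed, and that 1s with fewer
than three predecessors are skipped; none of that was settled in the original
question. The use of outer() and the names idx, lagged and prev3 are choices
made for this sketch.

```r
x <- c(1, 2, 0, 0, 1, 2, 1, 0)
y <- c(30, -40, 50, 25, -5, -10, 5, 40)

idx <- which(x == 1)
idx <- idx[idx > 3]                 # assumption: skip 1s with fewer than 3 predecessors

# outer() builds every "position minus lag" index in one call.
lagged <- outer(idx, 1:3, "-")      # row k: idx[k]-1, idx[k]-2, idx[k]-3
prev3  <- matrix(y[lagged], nrow = length(idx))
```

Each row of prev3 holds the three y values before one of the 1s, nearest
first.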

However, let's explore why your question may have been too poorly
formed to get the answer you seek:

1. What if the index of the first 1 is 3 or less? -- Do you want to
ignore the (less than 3) preceding values or just choose as many as
you can?

2. What if, as in your example, several 1's occur in x. Do you want
the 3 preceding values for all of them or just the first?

3. If the answer to 2 is all of them, what if several 1's are less
than 3 indices apart -- do you want to include the overlapping sets of
3 y's -- or what?

My point is that "etc. etc." is simply inadequate as a coherent or
useful problem description in your post. You _must_ be explicit,
complete, and concise. This can be hard. Indeed, it may require
considerable thought and effort. I have found -- and others have often
noted here -- that going through such an exercise itself often reveals
a solution. But be that as it may, the Posting Guide is actually an
excellent, comprehensive discussion of how to ask good questions in
forums like this. Read it. Follow it.

... and to be fair, your post below is, imho, probably above average
as posts go, allowing me to focus on specific points that I thought
required clarification. Quite a few posts here of late have been so
muddled and incoherent that I had no clue what the OP wanted. And it's
not English as a second language. I am a language ignoramus and speak
only English, so I am happy to tolerate poor grammar and vocabulary
from someone for whom English is only one of several languages in
which they can communicate. The problem is poor thinking, not poor
English.

Best,
Bert

On Sun, Dec 4, 2011 at 7:18 AM, Costas Vorlow costas.vor...@gmail.com wrote:
Hello,

I am having problems vectorizing the following (i/o using a for/next/while
loop):

I have 2 sequences such as:

x, y
1, 30
2, -40
0, 50
0, 25
1, -5
2, -10
1, 5
0, 40

etc etc

The first sequence (x) takes integer numbers only: 0, 1, 2
The sequence y can be anything...

I want to be able to retrieve (in a list if possible) the 3 last values of
the y sequence before a value of 1 is encountered on the x sequence, i.e:

On line 5 in the above dataset, x is 1 so I need to capture values: 25, 50
and -40 of the y sequence.

So the outcome (if a list) should look something like:

[1],[25,50,-40]
[2],[-10,-5,25] # as member #7 of x sequence is 1...

etc. etc.

Can I do the above avoiding for/next or while loops?
I am not sure I can explain it better. Any help/pointer extremely welcome.

Best regards,
Costas


Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

Re: [R] Vectorization instead of loops problem

2011-12-04 Thread Costas Vorlow

Dear Bert,

You are right (obviously).

Apologies for any inconvenience caused.  I thought my problem was
simplistic with a very obvious answer which eluded me.

As per your justified questions :

2: Answer is all,

hence:

3. would be to include the overlapping sets (I guess), but this does not matter for
the time being. I didn't give it too much thought, admittedly... If I got 1 &
2 right I could have modified the code for point 3 (if answer in 2 !=
'all'), so I did not consider it when I was formulating my query. However,
I can see now why this is confusing.
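
With these answers (every 1 in x counts, and overlapping windows are allowed), the retrieval can be sketched without an explicit loop. This is only a sketch under one extra assumption that the thread leaves open: a 1 with fewer than three preceding values is skipped. The names idx, pos and last3 are made up for illustration.

```r
x <- c(1, 2, 0, 0, 1, 2, 1, 0)
y <- c(30, -40, 50, 25, -5, -10, 5, 40)

idx <- which(x == 1)           # positions of the 1's in x
idx <- idx[idx > 3]            # assumption: skip 1's with fewer than 3 predecessors

# Row r of pos holds idx[r]-1, idx[r]-2, idx[r]-3 (nearest value first)
pos <- outer(idx, 1:3, "-")
last3 <- matrix(y[as.vector(pos)], nrow = length(idx))
rownames(last3) <- idx

last3                                    # one row per qualifying 1
last3.list <- split(last3, row(last3))   # the same values as a list
```

For the example data the rows correspond to positions 5 and 7 of x, which matches the two windows given in the original question.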

Anyways, thanks again for the pointers.

BTW, is there a good & quick read/guide on vectorization in R that one
could recommend? That would minimize my queries at least in the list. :-)

Apologies again and best regards,
Costas


Re: [R] Vectorization instead of loops problem

2011-12-04 Thread Bert Gunter

Inline below

On Sun, Dec 4, 2011 at 10:29 AM, Costas Vorlow costas.vor...@gmail.com wrote:
> Dear Bert,
>
> You are right (obviously).
>
> Apologies for any inconvenience caused.  I thought my problem was simplistic
> with a very obvious answer which eluded me.
>
> As per your justified questions :
>
> 2: Answer is all,
>
> hence:
>
> 3. would be to include the overlapping sets (I guess), but this does not matter for
> the time being. I didn't give it too much thought, admittedly... If I got 1 &
> 2 right I could have modified the code for point 3 (if answer in 2 !=
> 'all'), so I did not consider it when I was formulating my query. However, I
> can see now why this is confusing.
>
> Anyways, thanks again for the pointers.
>
> BTW, is there a good & quick read/guide on vectorization in R that one could
> recommend? That would minimize my queries at least in the list. :-)

Vectorization is a central paradigm in R, so practically all books on
the S language discuss this. The R language definition manual that
ships with R is pretty comprehensive, but V&R's MASS or S Programming
books, Patrick Burns's website tutorials (he has several well suited
for beginners), John Chambers's "Programming with R", etc. are just
a few among many. It is impossible for me to be more specific than
that.
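
As a small illustration of what vectorizing means in practice, the sketch below computes the same result once with an explicit loop and once with whole-vector operations. The data and the names out.loop and out.vec are invented for the example; they do not come from the thread.

```r
y <- c(30, -40, 50, 25, -5, -10, 5, 40)

# Element by element: double the positive values, set the rest to 0
out.loop <- numeric(length(y))
for (i in seq_along(y)) {
  out.loop[i] <- if (y[i] > 0) 2 * y[i] else 0
}

# Vectorized: the comparison and the arithmetic act on the whole vector at once
out.vec <- ifelse(y > 0, 2 * y, 0)

all.equal(out.loop, out.vec)
```

The vectorized form pushes the iteration into compiled code, which is usually where the speed-up comes from.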

-- Bert

Re: [R] Vectorization

2011-01-24 Thread Petr Savicky

Hello.

The loop computes a cumulative product. The initial value may
be applied as a common multiplier, so z in the following should be equal
to x up to machine rounding error.

  y <- c(
    0.003990746, -0.037664639,  0.005397999,  0.010415496,  0.003500676,
    0.001691775,  0.008170774,  0.011961998, -0.016879531,  0.007284486,
   -0.015083581, -0.006645958, -0.013153103,  0.028148639, -0.005724317,
   -0.027408025,  0.014767422, -0.001619691,  0.018334730, -0.009747171)

  # original loop: start with 10000 and compound each period's return
  x <- numeric(length(y))
  for (i in 1:length(y)) {
    x[i] <- ifelse(i==1, 10000*(1+y[i]), (1+y[i])*x[i-1])
  }

  # vectorized equivalent
  z <- 10000*cumprod(1 + y)

  max(abs(x - z))
  # [1] 1.818989e-12

Petr Savicky.


[R] Vectorization

2011-01-23 Thread eric


Is there a way to vectorize this loop or a smarter way to do it ?

y
 [1]  0.003990746 -0.037664639  0.005397999  0.010415496  0.003500676
 [6]  0.001691775  0.008170774  0.011961998 -0.016879531  0.007284486
[11] -0.015083581 -0.006645958 -0.013153103  0.028148639 -0.005724317
[16] -0.027408025  0.014767422 -0.001619691  0.018334730 -0.009747171

x <- numeric(length(y))
for (i in 1:length(y)) {
x[i] <- ifelse(i==1, 10000*(1+y[i]), (1+y[i])*x[i-1])
}

x
 [1] 10039.907  9661.758  9713.912  9815.087  9849.447  9866.110  9946.724
 [8] 10065.706  9895.802  9967.888  9817.536  9752.289  9624.016  9894.919
[15]  9838.278  9568.630  9709.934  9694.207  9871.948  9775.724

Basically trying to see how the equity of an investment changes after each
return period. Start with $10,000 and a series of returns over time. Figure
out the equity after each time period (return).




Re: [R] R - Vectorization and Functional Programming Constructs

2011-01-22 Thread Gabor Grothendieck

If x is supposed to represent a time series that you are trying to
align you would likely be better off to represent it as an object of
one of the time series classes (ts, zoo, xts, timeSeries) and then use
lag.  That way you will not only have a convenient lag function but
all the other functionality that you might need to conveniently handle
such objects.  lag is written in C in both zoo and xts and might be in
timeSeries as well.  If your series is regularly spaced and so
applicable to ts then, internally, lagging only involves manipulating
its tsp attribute so it would be extremely fast.
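
A minimal sketch of the ts route, assuming the series is regularly spaced with frequency 1. Note that lag() on a ts object leaves the values untouched and only moves the time stamps, so it aligns series rather than padding with zeros. stats::lag is spelled out because some add-on packages mask lag. The names tx, shifted and aligned are illustrative only.

```r
x <- c(0, 3, 2, 1, 0, 0, 0)
tx <- ts(x)                          # times 1..7, frequency 1

shifted <- stats::lag(tx, k = -1)    # negative k moves the series later in time

tsp(tx)        # start, end, frequency of the original series
tsp(shifted)   # start and end are each one step later; the data are unchanged

# Binding the two series lines them up on a common time axis (NA where one is missing)
aligned <- cbind(tx, shifted)
aligned
```

Because only the tsp attribute changes, the cost of the shift does not depend on the length of the series.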

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


[R] R - Vectorization and Functional Programming Constructs

2011-01-21 Thread Mingo

Hello, I am new to R (coming from Perl) and have what is, at least at this
point, a philosophical question and a request for comment on some basic
code. As I understand it, R emphasizes, or at least supports, the
functional programming model. I've come across some code that was markedly
lacking in for loops, and have been seeing some constructs that relate to
functional programming and vectorized code (not that this is at all unique to R,
of course). But I'm also new to the concept of vectorizing code.

However, since I anticipate dealing with vectors of large sizes, I think that
this approach is probably going to serve well in terms of performance. As an
example, I anticipate having vector operations calling for shifting. I'll be
shifting vectors to the right (or left) like below while maintaining the
length and filling with zeros. Keep in mind I'll ultimately be dealing with
vectors of very large length.

x <- c(0,3,2,1,0,0,0)
vlen <- length(x)
vlen
[1] 7

One solution to accomplish the right shift is to do something like:

x <- c(0, x[1:(vlen-1)])
x
[1] 0 0 3 2 1 0 0

This does the trick, though I'm wondering if this is in the spirit of
vectorization. I could make a recursive function that would cycle through
the whole vector, eventually leaving it full of 0s and thus ending the recursion.
But does this capture the spirit of R programming and vectorizing? Are
there more primitive operators closer to the underlying C code that would
serve performance interests better?
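
The zero-filled shift described above can be wrapped in a small helper. This is a sketch, not an established R function: the name shift and its argument k are hypothetical. It uses seq_len(n - k) rather than 1:n-k, because R parses the latter as (1:n)-k.

```r
# Shift x by k positions, padding with zeros and keeping the length.
# k > 0 shifts to the right, k < 0 shifts to the left.
shift <- function(x, k = 1L) {
  n <- length(x)
  if (abs(k) >= n) return(rep(0, n))      # everything is shifted out
  if (k >= 0) {
    c(rep(0, k), x[seq_len(n - k)])       # drop the last k values
  } else {
    c(x[seq.int(1 - k, n)], rep(0, -k))   # drop the first -k values
  }
}

x <- c(0, 3, 2, 1, 0, 0, 0)
shift(x, 1)    # 0 0 3 2 1 0 0
shift(x, -1)   # 3 2 1 0 0 0 0
```

Subsetting and c() already run in compiled code, so a construction like this is in the vectorized spirit; a recursive function would not be needed.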


[R] Vectorization of three embedded loops

2009-01-14 Thread Thomas Terhoeven-Urselmans

Dear R-programmer,

I wrote an adapted implementation of the Kennard-Stone algorithm for
sample selection of multivariate data (R 2.7.1 on a MacBook Pro,
2.2 GHz Intel Core 2 Duo processor, 2 GB 667 MHz DDR2 SDRAM).
The heart of the script consists of three nested loops, which makes it
very slow, especially for huge datasets. For a data matrix of 1853*1853
and the selection of 556 samples, the computation took more than
24 hours.
I did some research on vectorization, but I could not figure out how
to do it better/faster. What ways are there to replace the
time-consuming loops?

Here is some information:

# val.n <- 24;
# start.b <- matrix(nrow=1812, ncol=20);
# val is a vector of the row names of 22 extreme samples chosen in an
# earlier step;
# euc <- matrix(nrow=1853, ncol=1853); [contains the Euclidean
# distance calculations]

The following calculation of the system.time was for the selection of  
two samples:
system.time(KEN.STO(val.n,start.b,val.start,euc))
user  system elapsed
  25.294  13.262  38.927

The function:

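KEN.STO <- function(val.n, start.b, val, euc){

for(k in 1:val.n){
  sum.dist <- c();
  for(i in 1:length(start.b[,1])){
    sum <- c();
    for(j in 1:length(val)){
      # distance from candidate i to the j-th already selected sample
      sum[j] <- euc[rownames(start.b)[i], val[j]]
    }
    sum.dist[i] <- min(sum);
  }
  # pick the candidate farthest from its nearest selected sample
  bla <- rownames(start.b)[which(sum.dist==max(sum.dist))]
  val <- c(val, bla[1]);
  # drop the newly selected sample from the candidates
  start.b <- start.b[-(which(!is.na(match(rownames(start.b), val[length(val)])))),];
  if(length(val)>=val.n)break;
}
return(val);
}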

Regards,

Thomas

Dr. Thomas Terhoeven-Urselmans
Post-Doc Fellow
Soil infrared spectroscopy
World Agroforestry Center (ICRAF) 

Re: [R] Vectorization of three embedded loops

2009-01-14 Thread Carlos J. Gil Bellosta

Hello,

I believe that your bottleneck lies at this piece of code:

sum <- c();
for(j in 1:length(val)){
sum[j] <- euc[rownames(start.b)[i],val[j]]
}

In order to speed up your code, there are two alternatives:

1) Try to reorder the euc matrix so that the sum vector corresponds to
(part of) a row or column of euc.

2) For each i value, create a matrix with the coordinates corresponding
to ( rownames(start.b)[i], val[j] ) and index the matrix by this matrix
in order to create sum. This will be easiest if you can reorder euc in a
way that accessing its elements will be easy (and then you would be back
into (1)).

Creating a variable sum as c() and increasing its size in a loop is one
of the easiest ways to uselessly burn your CPU.
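
The two alternatives can be sketched on a toy distance matrix. Everything below is illustrative: the five one-dimensional points, the names s1 to s5, and the variables cand, row.part, idx and by.index are assumptions, not part of the original script.

```r
# Toy data: five points on a line, with a named distance matrix
pts <- matrix(c(0, 1, 2, 10, 4), ncol = 1,
              dimnames = list(paste0("s", 1:5), NULL))
euc <- as.matrix(dist(pts))

cand <- c("s2", "s3", "s5")   # candidate samples (stand-in for rownames(start.b))
val  <- c("s1", "s4")         # samples already selected
i <- 1

# (1) The wanted distances are simply part of one row of euc
row.part <- euc[cand[i], val]

# (2) The same values via a two-column index matrix (row index, column index)
idx <- cbind(match(rep(cand[i], length(val)), rownames(euc)),
             match(val, colnames(euc)))
by.index <- euc[idx]

# If a loop is unavoidable, allocate the result once instead of growing it
out <- numeric(length(val))
for (j in seq_along(val)) out[j] <- euc[cand[i], val[j]]
```

All three variants return the distances from the first candidate to the selected samples; only the loop touches one element at a time.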

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com



Re: [R] Vectorization of three embedded loops

2009-01-14 Thread Patrick Burns


You are definitely in Circle 2 of the R Inferno.
Growing objects is suboptimal, although your
objects are small so this probably isn't taking
too much time.

There is no need for the inner-most loop:

 sum.dist[i] <- min(euc[rownames(start.b)[i], val])

Maybe I'm blind, but I don't see where 'k' comes
in from the outer-most loop.
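
Going one step further than the line above, the loop over i can be dropped as well by taking the minimum of each row of a submatrix. This is a sketch on invented toy data (five points named s1 to s5); sum.dist.vec is a hypothetical name.

```r
pts <- matrix(c(0, 1, 2, 10, 4), ncol = 1,
              dimnames = list(paste0("s", 1:5), NULL))
euc <- as.matrix(dist(pts))
start.b <- pts[c("s2", "s3", "s5"), , drop = FALSE]   # remaining candidates
val <- c("s1", "s4")                                  # already selected

# One loop, inner-most loop replaced by min() over part of a row
sum.dist <- numeric(nrow(start.b))
for (i in seq_len(nrow(start.b))) {
  sum.dist[i] <- min(euc[rownames(start.b)[i], val])
}

# No explicit loop: minimum of every row of the candidate-by-selected submatrix
sum.dist.vec <- apply(euc[rownames(start.b), val, drop = FALSE], 1, min)
```

drop = FALSE keeps the submatrix two-dimensional even when only one candidate or one selected sample is left.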


Patrick Burns
patr...@burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of The R Inferno and A Guide for the Unwilling S User)



Re: [R] Vectorization of three embedded loops

2009-01-14 Thread Thomas Terhoeven-Urselmans

Dear Patrick,

thanks for the very helpful response. My calculation is now 25 times
faster.

I use the 'k' from the outer-most loop only indirectly. It sets a
maximum number of repetitions of the whole loop body; the loop stops as
soon as the following condition applies:

'if(length(val.x.c)>=val.x.c.n)break'.

The reason why I use this 'break' instead of relying on the
'for(k in 1:val.x.c.n){' command alone is that in some other applications
of this algorithm more than one sample can be chosen in one round.

Is there another/faster way to avoid this usage of 'k'?
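
One possible way to avoid the unused 'k' is a while loop whose condition is the stopping rule itself. The sketch below also keeps a running vector of minimum distances and updates it with pmin() after each pick, so the candidate-by-selected submatrix is scanned only once. It is an illustration under assumptions, not the poster's code: ken.sto and d.min are hypothetical names, euc is assumed to carry row and column names, val is assumed to be non-empty, and one sample is chosen per round.

```r
ken.sto <- function(val.n, start.b, val, euc) {
  cand <- setdiff(rownames(start.b), val)
  # distance from each candidate to its nearest already selected sample
  d.min <- apply(euc[cand, val, drop = FALSE], 1, min)
  while (length(val) < val.n && length(cand) > 0) {
    pick <- cand[which.max(d.min)]   # farthest candidate; first one on ties
    val  <- c(val, pick)
    keep <- cand != pick
    cand <- cand[keep]
    # only the distances to the new sample can lower the running minimum
    d.min <- pmin(d.min[keep], euc[cand, pick])
  }
  val
}

# Toy check: five points on a line, s1 already selected, three samples wanted
pts <- matrix(c(0, 1, 2, 10, 4), ncol = 1,
              dimnames = list(paste0("s", 1:5), NULL))
euc <- as.matrix(dist(pts))
ken.sto(3, pts, "s1", euc)
```

If several samples were to be chosen per round, the same while condition would still end the loop once enough samples have been collected.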

Regards,

Thomas

Dr. Thomas Terhoeven-Urselmans
Post-Doc Fellow
Soil infrared spectroscopy
World Agroforestry Center (ICRAF)
United Nations Avenue, Gigiri
PO Box 30677-00100 Nairobi, Kenya
Ph: 254 20 722 4113 or via USA 1 650 833 6654 ext. 4113
Fax 254 20 722 4001 or via USA 1 650 833 6646
Email: t.urselm...@cgiar.org
Internet: http://worldagroforestrycentre.org








Re: [R] Vectorization of three embedded loops

2009-01-14 Thread Thomas Terhoeven-Urselmans

Dear Carlos,

thanks for your support. Patrick Burns gave me a hint which is, in the
end, very similar to your proposal. Now the script is roughly 25 times
faster.

Here is the code (I also implemented a vector that does not grow in
size, 'summ.dist <- rep(0,val.x.c.n)'):

KEN.STO <- function(val.n, start.b, val, euc){

for(k in 1:val.n){
  # one slot per remaining candidate; a length of val.n would still grow in the loop
  summ.dist <- rep(0, nrow(start.b));
  for(i in 1:length(start.b[,1])){
    summ.dist[i] <- min(euc[rownames(start.b)[i], val]);
  }
  bla <- rownames(start.b)[which(summ.dist==max(summ.dist))]
  val <- c(val, bla[1]);
  # drop the newly selected sample from the candidates
  start.b <- start.b[-(which(!is.na(match(rownames(start.b), val[length(val)])))),];
  if(length(val)>=val.n)break;
}
# val.x.c is not defined inside this function, so return val
return(val);
}

Regards,

Thomas

On 14 Jan 2009, at 12:58, Carlos J. Gil Bellosta wrote:

 Hello,

 I believe that your bottleneck lies at this piece of code:

 sum-c();
 for(j in 1:length(val)){
   sum[j]-euc[rownames(start.b)[i],val[j]]
 }

 In order to speed up your code, there are two alternatives:

 1) Try to reorder the euc matrix so that the sum vector corresponds to
 (part of) a row or column of euc.

 2) For each i value, create a matrix with the coordinates  
 corresponding
 to ( rownames(start.b)[i], val[j] ) and index the matrix by this  
 matrix
 in order to create sum. This will be easiest if you can reorder euc  
 in a
 way that accessing its elements will be easy (and then you would be  
 back
 into (1)).

 Creating a variable sum as c() and increasing its size in a loop is  
 one
 of the easiest ways to uselessly burn your CPU.

 Best regards,

 Carlos J. Gil Bellosta
 http://www.datanalytics.com


 On Wed, 2009-01-14 at 10:32 +0300, Thomas Terhoeven-Urselmans wrote:
 Dear R-programmer,

 I wrote an adapted implementation of the Kennard-Stone algorithm for
 sample selection of multivariate data (R 2.7.1 under MacBook Pro,
 Processor 2.2 GHz Intel Core 2 Duo, Memory 2 GB 667 MHZ DDR2 SDRAM).
 The heart of the script uses three nested loops, which makes it very slow,
 especially for huge datasets. For a data matrix of 1853*1853 and the
 selection of 556 samples, the computation took more than 24 hours.
 I did some research on vectorization, but I could not figure out how
 to do it better/faster. What ways are there to replace the time-consuming
 loops?

 Here is some information:

 # val.n <- 24;
 # start.b <- matrix(nrow=1812, ncol=20);
 # val is a vector of the rownames of 22 extreme samples chosen in an
 earlier step;
 # euc <- matrix(nrow=1853, ncol=1853); [contains the Euclidean
 distance calculations]

 The following calculation of the system.time was for the selection of
 two samples:
 system.time(KEN.STO(val.n,start.b,val.start,euc))
user  system elapsed
  25.294  13.262  38.927

 The function:

 KEN.STO <- function(val.n,start.b,val,euc){

 for(k in 1:val.n){
 sum.dist <- c();
 for(i in 1:length(start.b[,1])){
  sum <- c();
  for(j in 1:length(val)){
  sum[j] <- euc[rownames(start.b)[i],val[j]]
  }
  sum.dist[i] <- min(sum);
  }
 bla <- rownames(start.b)[which(sum.dist==max(sum.dist))]
 val <- c(val,bla[1]);
 start.b <- start.b[-(which(!is.na(match(rownames(start.b),val[length(val)])))),];
 if(length(val)>=val.n)break;
 }
 return(val);
 }

 Regards,

 Thomas

 Dr. Thomas Terhoeven-Urselmans
 Post-Doc Fellow
 Soil infrared spectroscopy
 World Agroforestry Center (ICRAF)




Regards,

Thomas

Dr. Thomas Terhoeven-Urselmans
Post-Doc Fellow
Soil infrared spectroscopy
World Agroforestry Center (ICRAF)
United Nations Avenue, Gigiri
PO Box 30677-00100 Nairobi, Kenya
Ph: 254 20 722 4113 or via USA 1 650 833 6654 ext. 4113
Fax 254 20 722 4001 or via USA 1 650 833 6646
Email: t.urselm...@cgiar.org
Internet: http://worldagroforestrycentre.org








[R] vectorization instead of using loop

2008-10-09 Thread Frank Hedler

Dear all,
I sent this question 2 days ago and got a response from Sarah. Thanks for
that. But unfortunately, it did not really solve our problem. The main issue
is that we want to use our own (manipulated) covariance matrix in the
calculation of the Mahalanobis distance. Does anyone know how to vectorize
the code below instead of using a loop (which slows it down)?
I'd really appreciate any help on this, thank you all in advance!
Cheers,
Frank

This is what I posted 2 days ago:
We have a data frame x with n people as rows and k variables as columns.
Now, for each person (i.e., each row) we want to calculate a distance
between him/her and EACH other person in x. In other words, we want to
create an n x n matrix with distances (with zeros in the diagonal).
However, we do not want to calculate Euclidean distances. We want to
calculate Mahalanobis distances, which take into account the covariance
among variables.
Below is the piece of code we wrote (covmat in the function below is the
variance-covariance matrix among variables in Data that has to be fed into
the mahalanobis function we are using).
 mahadist = function(x, covmat) {
 dismat = matrix(0,ncol=nrow(x),nrow=nrow(x))
 for (i in 1:nrow(x)) {
   dismat[i,] = mahalanobis(as.matrix(x), as.matrix(x[i,]), covmat)^.5
 }
 return(dismat)
}

This piece of code works, but it is very slow. We were wondering if it's at
all possible to somehow vectorize this function. Any help would be greatly
appreciated.
Thanks,
Frank


Re: [R] vectorization instead of using loop

2008-10-09 Thread Patrick Burns


One thing that would speed it up is if you
inverted 'covmat' once and then used
'inverted=TRUE' in the call to 'mahalanobis'.
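
A small check of this tip, as an illustration only (toy data; the variable names are made up): passing a pre-inverted matrix with inverted = TRUE gives the same squared distances as passing the covariance matrix itself, but the inversion then happens once instead of on every call.

```r
set.seed(1)
x <- matrix(runif(30), ncol = 3)      # 10 observations, 3 variables
covmat <- cov(x)

icovmat <- solve(covmat)              # invert once, outside any loop

# mahalanobis() inverts 'covmat' internally on every call ...
d1 <- mahalanobis(x, x[1, ], covmat)
# ... unless it is told that the matrix is already inverted
d2 <- mahalanobis(x, x[1, ], icovmat, inverted = TRUE)

same <- isTRUE(all.equal(d1, d2))
```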

Patrick Burns
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and A Guide for the Unwilling S User)


Re: [R] vectorization instead of using loop

2008-10-09 Thread Richard . Cotton


You can save substantial time by calling as.matrix before the loop, e.g.

x <- data.frame(runif(1000), runif(1000), runif(1000))
covmat <- cov(x)

mahadist = function(x, covmat) #yours
{
   dismat = matrix(0,ncol=nrow(x),nrow=nrow(x))
   for (i in 1:nrow(x)) 
   {
      dismat[i,] = mahalanobis(as.matrix(x), as.matrix(x[i,]), covmat)^.5
   }
   return(dismat)
}

mahadist2 <- function(x, covmat) #my modification
{
   n <- nrow(x)
   dismat <- matrix(0,ncol=n,nrow=n)
   matx <- as.matrix(x)
   for (i in 1:n) 
   {
      dismat[i,] <- mahalanobis(matx, matx[i,], covmat)^.5
   }
   dismat
}
system.time(mahadist(x, covmat))
#   user  system elapsed 
#   2.82    0.06    2.95 
system.time(mahadist2(x, covmat))
#   user  system elapsed 
#   1.39    0.04    1.45

Regards,
Richie.

Mathematical Sciences Unit
HSL




Re: [R] vectorization instead of using loop

2008-10-09 Thread Richard . Cotton

Frank said:
 This piece of code works, but it is very slow. We were wondering if it's at
 all possible to somehow vectorize this function. Any help would be greatly
 appreciated.

Richie said:
 You can save a substantial time by calling as.matrix before the loop

Patrick said:
 One thing that would speed it up is if you
 inverted 'covmat' once and then used
 'inverted=TRUE' in the call to 'mahalanobis'.

The timings before:
 system.time(mahadist(x, covmat))
 #   user  system elapsed 
 #   2.82    0.06    2.95 
 system.time(mahadist2(x, covmat))
 #   user  system elapsed 
 #   1.39    0.04    1.45

With Patrick's modification, and moving the square root out of the loop:
mahadist3 <- function(x, covmat) #patrick's modification
{
   n <- nrow(x)
   dismat <- matrix(0,ncol=n,nrow=n)
   matx <- as.matrix(x)
   icovmat <- chol2inv(chol(covmat))
   for (i in 1:n) 
   {
      dismat[i,] <- mahalanobis(matx, matx[i,], icovmat, inverted=TRUE)
   }
   dismat^.5
}
system.time(mahadist3(x, covmat))
#   user  system elapsed 
#   0.80    0.00    0.85

Not bad - a better than threefold speed up, without worrying about 
vectorization.
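
For completeness, the loop can also be removed entirely. The following is a sketch, not code from this thread: if covmat = t(R) %*% R is the Cholesky factorization, the Mahalanobis distance between two rows of x equals the ordinary Euclidean distance between the corresponding rows of x %*% solve(R), so dist() can do all the pairwise work. The function names mahadist_loop and mahadist_vec are made up, and covmat is assumed to be symmetric positive definite.

```r
set.seed(42)
x <- data.frame(a = runif(50), b = runif(50), c = runif(50))
covmat <- cov(x)   # any user-supplied positive definite matrix would do

# Reference: row-by-row loop, as in the thread
mahadist_loop <- function(x, covmat) {
  matx <- as.matrix(x)
  n <- nrow(matx)
  dismat <- matrix(0, n, n)
  for (i in seq_len(n)) dismat[i, ] <- mahalanobis(matx, matx[i, ], covmat)
  sqrt(dismat)
}

# Loop-free sketch: transform the data, then take Euclidean distances
mahadist_vec <- function(x, covmat) {
  R <- chol(covmat)                   # upper triangular, t(R) %*% R == covmat
  z <- as.matrix(x) %*% solve(R)      # rows of z have identity covariance
  unname(as.matrix(dist(z)))          # full n x n matrix, zero diagonal
}

d.loop <- mahadist_loop(x, covmat)
d.vec  <- mahadist_vec(x, covmat)
```

Both functions accept an arbitrary covariance matrix, which is the requirement stated earlier in the thread.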

Regards,
Richie.

Mathematical Sciences Unit
HSL




[R] vectorization of a loop for mahalanobis distance calculation

2008-10-07 Thread Frank Hedler

Dear all,
We have a data frame x with n people as rows and k variables as columns.
Now, for each person (i.e., each row) we want to calculate a distance
between him/her and EACH other person in x. In other words, we want to
create an n x n matrix with distances (with zeros in the diagonal).

However, we do not want to calculate Euclidean distances. We want to calculate
Mahalanobis distances, which take into account the covariance among
variables.

Below is the piece of code we wrote (covmat in the function below is the
variance-covariance matrix among variables in Data that has to be fed into
the mahalanobis function we are using).
mahadist = function(x, covmat) {
  dismat = matrix(0,ncol=nrow(x),nrow=nrow(x))
  for (i in 1:nrow(x)) {
    dismat[i,] = mahalanobis(as.matrix(x), as.matrix(x[i,]), covmat)^.5
  }
  return(dismat)
}


This piece of code works, but it is very slow. We were wondering if it's at
all possible to somehow vectorize this function. Any help would be greatly
appreciated.
Thanks,
Frank


Re: [R] vectorization of a loop for mahalanobis distance calculation

2008-10-07 Thread Sarah Goslee

distance() from the ecodist package will calculate Mahalanobis distances.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org


Re: [R] vectorization of a loop for mahalanobis distance calculation

2008-10-07 Thread Sarah Goslee

Hi Frank,

If the way distance() calculates the Mahalanobis distance meets your
needs other than the covariance specification, you can tweak that
_very_ easily. If you use fix(distance) at the command line, you can
edit the source.
Change the first line to:
function (x, method = "euclidean", icov)
and under method 4, change the icov calculation to:
if(missing(icov)) {
   icov <- solve(cov(x))
}

Alternatively, here's a simplified distanceM function with everything but the
relevant bits deleted. You'll still need to have ecodist loaded.

distanceM <- function (x, method = "mahalanobis", icov)
{
    # all pairwise differences between rows, computed in ecodist's C code
    paireddiff <- function(x) {
        N <- nrow(x)
        P <- ncol(x)
        A <- numeric(N * N * P)
        A <- .C("pdiff", as.double(as.vector(t(x))), as.integer(N),
            as.integer(P), A = as.double(A), PACKAGE = "ecodist")$A
        A <- array(A, dim = c(N, N, P))
        A
    }
    x <- as.matrix(x)
    N <- nrow(x)
    P <- ncol(x)

    # use the supplied inverse covariance matrix, or fall back to the observed one
    if(missing(icov)) {
       icov <- solve(cov(x))
    }
    A <- paireddiff(x)
    A1 <- apply(A, 1, function(z) (z %*% icov %*% t(z)))
    D <- A1[seq(1, N * N, by = (N + 1)), ]

    # keep the lower triangle and return a "dist" object
    D <- D[col(D) < row(D)]
    attr(D, "Size") <- N
    attr(D, "Labels") <- rownames(x)
    attr(D, "Diag") <- FALSE
    attr(D, "Upper") <- FALSE
    attr(D, "method") <- method
    class(D) <- "dist"
    D
}

Sarah



-- 
Sarah Goslee
http://www.functionaldiversity.org


Re: [R] vectorization of a loop for mahalanobis distance calculation

2008-10-07 Thread Frank Hedler

Dear all,
we just realized something. Sarah's distance function does indeed
calculate the Mahalanobis distance very well. However, it uses the
observed variance-covariance matrix by default.
What we actually need (sorry for not stating it clearly in our original
request) is to be able to specify which variance-covariance matrix goes into
that calculation.

On Tue, Oct 7, 2008 at 12:44 PM, Sarah Goslee [EMAIL PROTECTED] wrote:

 distance() from the ecodist package will calculate Mahalanobis distances.

 Sarah

 --
 Sarah Goslee
 http://www.functionaldiversity.org



Re: [R] Vectorization of duration of the game in the gambler ruin's problem

2008-08-15 Thread Moshe Olshansky

Hi Jose,

If you are only interested in the expected duration, the problem can be solved
analytically; no simulation is needed.
Let P be the probability of reaching total.capital (so 1-P is the probability
of losing all the money) when starting with initial.capital. This probability P
is well known (I do not remember it now, but I can derive the formula if you
need it; let me know). Let X(i) be the gain in game i and let D be the duration.
Let S(n) = X(1)+...+X(n).
Since E(X(i)) = p - (1-p) = 2p-1, S(n) - n*(2p-1) is a martingale, and since D is
a stopping time we get E(S(D) - (2p-1)*D) = 0, so that (2p-1)*E(D) =
E(S(D)) = P*(total.capital-initial.capital) + (1-P)*(-initial.capital), and so
E(D) can be computed provided that p != 1/2.
If p = 1/2 then S(n) itself is a martingale, and by Wald's Lemma E(S(D)^2) =
E(D)*E(X^2) = E(D). Since E(S(D)^2) = P*(total.capital-initial.capital)^2 +
(1-P)*(-initial.capital)^2, we can compute E(D).
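
The derivation above can be turned into a short function. This is a sketch under one assumption: the message leaves P unstated, so the code uses the standard gambler's ruin result P = (1 - r^i)/(1 - r^T) with r = (1-p)/p for p != 1/2, and P = i/T for p = 1/2, where i is initial.capital and T is total.capital. The function name expected_duration is made up.

```r
# Expected duration of the game, following the martingale argument above.
# ASSUMPTION: P is the textbook probability of reaching total.capital.
expected_duration <- function(initial.capital, total.capital, p) {
  i  <- initial.capital
  tc <- total.capital
  if (isTRUE(all.equal(p, 0.5))) {
    P <- i / tc
    # E(D) = E(S(D)^2) when p = 1/2; this simplifies to i * (tc - i)
    return(P * (tc - i)^2 + (1 - P) * i^2)
  }
  r <- (1 - p) / p
  P <- (1 - r^i) / (1 - r^tc)
  # (2p - 1) * E(D) = E(S(D))
  (P * (tc - i) - (1 - P) * i) / (2 * p - 1)
}

ed.fair   <- expected_duration(5, 10, 0.5)
ed.biased <- expected_duration(1, 2, 0.6)
```

With initial.capital = 1 and total.capital = 2 every game ends after exactly one turn, which gives a simple sanity check of the p != 1/2 branch.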

Regards,

Moshe.


[R] Vectorization of duration of the game in the gambler ruin's problem

2008-08-14 Thread jose romero

Hey fellas:

In the context of the gambler's ruin problem, the following R code obtains the
mean duration of the game, in turns:

# total.capital is a constant, an arbitrary positive integer
# initial.capital is a constant, an arbitrary positive integer strictly between
# 0 and total.capital
# p is the probability of winning $1 on each turn
# 1-p is the probability of losing $1
# N is a large integer representing the number of times to simulate
# dur is a vector containing the simulated game durations


T <- total.capital
dur <- NULL
for (n in 1:N) {
    x <- initial.capital
    d <- 0
    while ((x!=0)&&(x!=T)) {
        x <- x+sample(c(-1,1),1,replace=TRUE,c(1-p,p))
        d <- d+1
    }
    dur <- c(dur,d)
}
mean(dur) #returns the mean duration of the game

The problem with this code is that, using the traditional control structures
(while, for, etc.), it is rather slow. Does anyone know of a way I could
vectorize the 'while' and the 'for' to produce faster code?
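
One possible approach, sketched here as an illustration rather than a definitive answer, is to play all N games side by side: the 'for' over games disappears, and the 'while' runs once per turn for every game that is still active. The function name simulate_durations is made up; the arguments mirror the constants described above.

```r
# Play N games in parallel; each pass of the loop is one turn for all
# games that have not yet hit 0 or total.capital.
simulate_durations <- function(N, initial.capital, total.capital, p) {
  x      <- rep(initial.capital, N)   # current capital in each game
  dur    <- integer(N)                # turns played so far in each game
  active <- rep(TRUE, N)              # games still running
  while (any(active)) {
    step <- sample(c(-1, 1), sum(active), replace = TRUE, prob = c(1 - p, p))
    x[active]   <- x[active] + step
    dur[active] <- dur[active] + 1L
    active <- x != 0 & x != total.capital
  }
  dur
}

set.seed(123)
dur <- simulate_durations(N = 2000, initial.capital = 5,
                          total.capital = 10, p = 0.5)
mean(dur)   # estimated mean duration of the game
```

The number of loop iterations then equals the length of the longest game rather than the total number of turns across all games.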

And while I'm at it, does anyone know of a discrete-event simulation package in 
R such as the SimPy for Python?


Thanks in advance



Re: [R] Vectorization Problem

2008-03-22 Thread David Winsemius

Sergey Goriatchev [EMAIL PROTECTED] wrote in
news:[EMAIL PROTECTED]: 

 I have the code for the bivariate Gaussian copula. It is written
 with for-loops, it works, but I wonder if there is a way to
 vectorize the function.
 I don't see how outer() can be used in this case, but maybe one can
 use mapply() or Vectorize() in some way? Could anyone help me,
 please? 
 
 ## Density of Gauss Copula
<snipped your code that you didn't like>

When Yan built his copula package, he called the dmvnorm function from 
Leisch's mvtnorm package:

dnormalCopula <- function(copula, u) {
  dim <- copula@dimension
  sigma <- getSigma(copula)
  if (is.vector(u)) u <- matrix(u, ncol = dim)
  x <- qnorm(u)
  val <- dmvnorm(x, sigma = sigma) / apply(x, 1, function(v) prod(dnorm(v)))
  val[apply(u, 1, function(v) any(v <= 0))] <- 0
  val[apply(u, 1, function(v) any(v >= 1))] <- 0
  val
}

If the mvtnorm package is installed, one looks at the dmvnorm function 
simply by typing:

dmvnorm

I did not see any for-loops. After error checking, Leisch's code is:

distval <- mahalanobis(x, center = mean, cov = sigma)
logdet <- sum(log(eigen(sigma, symmetric = TRUE, 
                        only.values = TRUE)$values))
logretval <- -(ncol(x) * log(2 * pi) + logdet + distval)/2
if (log) 
    return(logretval)
exp(logretval)
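
For the bivariate case asked about, a closed form needs no packages and no loops. The sketch below is an illustration, not code from the copula or mvtnorm packages: it uses the standard Gaussian copula density, that is, the bivariate normal density divided by the product of the two standard normal marginals, with x = qnorm(u) and y = qnorm(v). The function name dgauss_copula and the parameter name rho are made up.

```r
# Bivariate Gaussian copula density, vectorized over u and v.
# rho is the correlation parameter, assumed to satisfy |rho| < 1.
dgauss_copula <- function(u, v, rho) {
  x <- qnorm(u)
  y <- qnorm(v)
  dens <- exp(-(rho^2 * (x^2 + y^2) - 2 * rho * x * y) /
                (2 * (1 - rho^2))) / sqrt(1 - rho^2)
  # same convention as dnormalCopula above: zero on or outside the boundary
  dens[u <= 0 | u >= 1 | v <= 0 | v >= 1] <- 0
  dens
}

# Because the function is vectorized, outer() evaluates it on a whole grid
u <- seq(0.1, 0.9, by = 0.2)
grid <- outer(u, u, dgauss_copula, rho = 0.5)
```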

-- 
David Winsemius


[R] Vectorization/Speed Problem

2007-11-20 Thread Tom Johnson

Hi,

I cannot find a 'vectorized' solution to this 'for loop' kind of problem.
Do you see a vectorized, fast-running solution?

Objective:
Take the value of X at each timepoint and calculate the corresponding value
of Y.  Leading 0's and all 1's for X are assigned to Y; otherwise Y is
incremented by the number of 0's adjacent to the last 1.  The frequency and
distribution of X vary widely and may have ~100 repeated 0's or 1's in a
vector of 10k timepoints.

Example:
time 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
X    0   1   0   1   0   1   0   0   1   1   1   0   0   0   . .
Y    0   1   2   1   2   1   2   3   1   1   1   2   3   4   . .

What I have done:
My for() and apply()-related standard solutions are too slow.  They are 6
times slower than my prototype, vectorized code which uses cumsum().
However(!)... my results are inaccurate and I can't correct them without
introducing a for()!  Here is my shot at a vectorized solution, as far as I
can take it.

Preliminary Vectorized Code:
X <- matrix(sample(c(1,0,0,0,0), 500, replace = TRUE), 25, 20, byrow=TRUE)
colnames(X) <- c(paste("a", 1:20, sep=""))
noX <- X; noX[X!=0] <- 0; cumX <- noX; cumNoX <- noX; Y1 <- noX; Y2 <- X; Y3 <- X

for (e in 1:ncol(X)) {
    cumX[,e] <- cumsum(X[,e])
    noX[X[,e] < 1 & cumsum(X[,e]) > 0, e] <- 1
    cumNoX[,e] <- cumsum(noX[,e])
}
Y1[cumNoX > 0] <- cumNoX[cumNoX > 0] + 1
Y2[X == 0 & noX > 0] <- Y1[X == 0 & noX > 0]
Y3 <- Y2
Y3[cumX > 1 & noX > 0] <- Y2[cumX > 1 & noX > 0] - cumX[cumX > 1 & noX > 0]
X; Y3

Your help would be greatly appreciated!  I'm stuck.
Thank you,

Tom
Johnson


Re: [R] Vectorization/Speed Problem

2007-11-20 Thread Gabor Grothendieck

Let x be the input vector and cx be the cumulative running sum of it.
Then seq_along(cx) - match(cx, cx) gives increasing sequences
starting at 0 and for those after the leading zeros we start them
at 1 by adding cummax(x).

x <- c(0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0) # input

cx <- cumsum(x)
seq_along(cx) - match(cx, cx) + cummax(x)
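
As an illustration of how this could be applied to the matrix setup in the question, the expression can be wrapped in a function and run once per column. The function name run_counter is made up; the matrix mirrors the one built in the original post.

```r
# Gabor's expression wrapped as a function of one 0/1 vector
run_counter <- function(x) {
  cx <- cumsum(x)
  seq_along(cx) - match(cx, cx) + cummax(x)
}

# The worked example from the question
x <- c(0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0)
y <- run_counter(x)

# One call per column of a 25 x 20 matrix
set.seed(1)
X <- matrix(sample(c(1, 0, 0, 0, 0), 500, replace = TRUE), 25, 20, byrow = TRUE)
Y <- apply(X, 2, run_counter)
```

The apply() call still loops over the 20 columns, but all the work within each column is vectorized.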

