Hi,
Sorry to get back to this, but I can't figure out how to solve it. Do
not hesitate to direct me to another forum if this one is not
appropriate. I don't think there's anything wrong with Rcpp, but I might
not proceed with the loop in Rcpp the right way.
So I tried various regressions in the Rcpp for loop (fastglm,
Rfast::glm_poisson, even fastlm; see code at end of email) and they all
output the same saved matrix, but a different one than without including
a regression step. Compared to a pure R code, the R code with or without
a regression output the same matrix as the Rcpp loop without a
regression (and therefore different from the ones with a regression),
i.e:
- R loop without regression, R loop with regression (glm or fastglm),
Rcpp loop without regression = all.equal TRUE.
- Rcpp loop with regression (fastglm, Rfast, or fastlm) = all.equal
TRUE but different from above.
I'm totally puzzled on why I've got these results. How can I get the
same output with Rcpp + regression as with the R loop +/- regression?
Denis
NumericMatrix define_e(int iter, NumericMatrix obs_mat, NumericVector
draws) {
Rcpp::NumericVector d = obs_mat(_, 1);
Rcpp::NumericVector pr = no_init(d.length());
Rcpp::NumericMatrix e(d.length(), iter);
for (int i = 0; i < iter; i++) {
pr = obs_mat(_, 0) * draws(i);
e(_, i) = cpprbinom(d.length(), 1, pr);
}
return e;
}
NumericMatrix define_ereg1(int iter, NumericMatrix obs_mat,
NumericVector draws) {
Environment pkg = Environment::namespace_env("fastglm");
Function f = pkg["fastglm"];
Rcpp::NumericVector d = obs_mat(_, 1);
Rcpp::NumericVector pr = no_init(d.length());
Rcpp::NumericMatrix e(d.length(), iter);
Rcpp::NumericMatrix ematrix(d.length(), 2);
Rcpp::NumericVector v(d.length(), 1);
ematrix(_, 0) = v;
Rcpp::List mod_pois;
for (int i = 0; i < iter; i++) {
pr = obs_mat(_, 0) * draws(i);
e(_, i) = cpprbinom(d.length(), 1, pr);
ematrix(_, 1) = e;
mod_pois = f(Named("x", ematrix), Named("y", d), Named("family",
"poisson"));
}
return e;
}
NumericMatrix define_ereg2(int iter, NumericMatrix obs_mat,
NumericVector draws) {
Environment pkg = Environment::namespace_env("Rfast");
Function f = pkg["glm_poisson"];
Rcpp::NumericVector d = obs_mat(_, 1);
Rcpp::NumericVector pr = no_init(d.length());
Rcpp::NumericMatrix e(d.length(), iter);
Rcpp::NumericMatrix ematrix(d.length(), 2);
Rcpp::NumericVector v(d.length(), 1);
ematrix(_, 0) = v;
Rcpp::List mod_pois;
for (int i = 0; i < iter; i++) {
pr = obs_mat(_, 0) * draws(i);
e(_, i) = cpprbinom(d.length(), 1, pr);
ematrix(_, 1) = e;
mod_pois = f(Named("x", ematrix(_, 1)), Named("y", d));
}
return e;
}
NumericMatrix define_ereg3(int iter, NumericMatrix obs_mat,
NumericVector draws) {
Environment pkg = Environment::namespace_env("RcppArmadillo");
Function f = pkg["fastLm"];
Rcpp::NumericVector d = obs_mat(_, 1);
Rcpp::NumericVector pr = no_init(d.length());
Rcpp::NumericMatrix e(d.length(), iter);
Rcpp::NumericMatrix ematrix(d.length(), 2);
Rcpp::NumericVector v(d.length(), 1);
ematrix(_, 0) = v;
Rcpp::List mod_pois;
for (int i = 0; i < iter; i++) {
pr = obs_mat(_, 0) * draws(i);
e(_, i) = cpprbinom(d.length(), 1, pr);
ematrix(_, 1) = e;
mod_pois = f(Named("X", ematrix), Named("y", d));
}
return e;
}
draws <- rbeta(200, 100, 60)
obs_mat <- cbind(c(rep(1, 40), rep(0, 60)), c(rep(1, 20), rep(0, 80)))
iter <- length(draws)
set.seed(1234)
t0 <- define_e(iter, obs_mat, draws)
e <- matrix(NA, nrow = nrow(obs_mat), ncol = iter)
e_obs <- obs_mat[, 1]
p <- rep(NA, nrow(obs_mat))
set.seed(1234)
for (i in 1:iter) {
p <- e_obs * draws[i]
e[, i] <- rbinom(nrow(obs_mat), 1, p)
}
all.equal(t0, e) # TRUE
e1 <- matrix(NA, nrow = nrow(obs_mat), ncol = iter)
e_obs <- obs_mat[, 1]
p <- rep(NA, nrow(obs_mat))
set.seed(1234)
for (i in 1:iter) {
p <- e_obs * draws[i]
e1[, i] <- rbinom(nrow(obs_mat), 1, p)
X <- cbind(rep(1, nrow(obs_mat)), e1[, i])
d <- obs_mat[, 2]
mod_pois <- glm(d ~ X, family = poisson(link = "log"))
}
all.equal(e, e1) # TRUE
all.equal(e1, t0) # TRUE
e2 <- matrix(NA, nrow = nrow(obs_mat), ncol = iter)
e_obs <- obs_mat[, 1]
p <- rep(NA, nrow(obs_mat))
set.seed(1234)
for (i in 1:iter) {
p <- e_obs * draws[i]
e2[, i] <- rbinom(nrow(obs_mat), 1, p)
X <- cbind(rep(1, nrow(obs_mat)), e2[, i])
d <- obs_mat[, 2]
mod_pois <- fastglm::fastglm(X, d, family = poisson(link = "log"))
}
all.equal(e, e2) # TRUE
all.equal(e2, t0) # TRUE
set.seed(1234)
t1 <- define_ereg1(iter, obs_mat, draws)
all.equal(t1, e) # [1] "Mean relative difference: 1.788299"
set.seed(1234)
t2 <- define_ereg2(iter, obs_mat, draws)
all.equal(t2, e) # [1] "Mean relative difference: 1.788299"
all.equal(t2, t1) # TRUE
set.seed(1234)
t3 <- define_ereg3(iter, obs_mat, draws)
all.equal(t3, e) # "Mean relative difference: 1.788299"
all.equal(t3, t1) # TRUE
all.equal(t3, t2) # TRUE
Le 2024-10-01 13:19, Dirk Eddelbuettel a écrit :
On 1 October 2024 at 11:57, Denis Haine wrote:
| But adding the regression to define_e1() (as define_e2() in previous
email)
| does not give the same. The first iteration in the loop is identical
in both,
| but not the subsequent ones. That's why I was thinking it could be
linked to
| the seed, but (sorry for my limited knowledge of C++) it would be
strange to
| manually "re-seed" at each iteration. Moreover, I believe the seed
set in R is
| transferred to Rcpp and so does not need to be taken specifically
care of.
Nice. So it seems to me that you identified fastglm as the source of
your
problem. When we call it, your results differ and there is a
side-effect.
But when we don't it is still true. I added
Environment pkg2 = Environment::namespace_env("base");
Function f2 = pkg2["dim"];
// ...
RObject ro = f2(ematrix);
while keeping the lines relevant to fastglm calls and result storage
commented out. Now we get TRUE from all.equal(t0,t1).
That seems to rule out Rcpp::Function, and may indicate you need to
look into
what fastglm may be doing.
Cheers, Dirk
_______________________________________________
Rcpp-devel mailing list
Rcpp-devel@lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel