Re: [Rcpp-devel] Forcing a shallow versus deep copy

2013-07-12 Thread Gabor Grothendieck
On Fri, Jul 12, 2013 at 1:42 AM, Dirk Eddelbuettel e...@debian.org wrote:

 On 11 July 2013 at 19:21, Gabor Grothendieck wrote:
 | 1. Just to be clear, what we have been discussing here is not just how to
 | avoid copying, but how to avoid copying while using as and wrap
 | or approaches that automatically generate as and wrap.  I was already
 | aware of how to avoid copying using Armadillo and how to use Armadillo types
 | as arguments and return values to auto-generate as and wrap.  The problem is
 | not that, but that these two things cannot be done at once: it's either/or.

 I must still be misunderstanding, as this still reads to me as if you
 suspect that we somehow keep layers that make extra copies.

 We're not. And I've known you long enough to know that you are not likely to
 suspect this either.  So what is it then?

 As Romain said, some of the choices have to do with the representation on both
 the R and C++ side -- for Rcpp itself we can be lightweight and efficient via
 proxy classes, but this does not mean we can do this for _any arbitrary C++
 class_ coming from another project, such as Armadillo.  RcppArmadillo already
 does pretty well, and code review may make it better.  We do not know of any
 fat to cut, or we'd cut it ourselves.  We care about a few things, but
 performance is clearly among them.

I think Romain's proposal will clarify this.
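A minimal standalone C++ sketch of the shallow-versus-deep distinction being discussed, with no Rcpp or Armadillo involved. OwnedMatrix and MatrixView are hypothetical names: the first copies the caller's buffer (roughly what a copying conversion does), the second only borrows a pointer, which is the idea behind Armadillo's advanced constructor with copy_aux_mem set to false.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical deep-copying wrapper: owns a private copy of the caller's data.
struct OwnedMatrix {
    std::size_t rows, cols;
    std::vector<double> data;
    OwnedMatrix(const double* src, std::size_t r, std::size_t c)
        : rows(r), cols(c), data(src, src + r * c) {}
    // Column-major indexing, the same layout R uses for matrices.
    double& at(std::size_t i, std::size_t j) {
        assert(i < rows && j < cols);
        return data[i + j * rows];
    }
};

// Hypothetical shallow view: borrows memory it does not own, so no allocation
// or copy happens. The caller must keep the buffer alive while the view is used.
struct MatrixView {
    std::size_t rows, cols;
    double* data;
    MatrixView(double* src, std::size_t r, std::size_t c) : rows(r), cols(c), data(src) {}
    double& at(std::size_t i, std::size_t j) {
        assert(i < rows && j < cols);
        return data[i + j * rows];
    }
};
```

Writing through the view changes the caller's buffer and writing through the owned copy does not. That aliasing is one reason a no-copy conversion tends to be an explicit opt-in rather than the default.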


 | 2. Regarding the question of performance impact, there are two situations
 | which should be distinguished:
 |
 | i. We call C++ from R, it does some processing and returns, and
 | we don't call it again. In that case it's likely that copying or not won't
 | make a big difference, or at least it won't if the actual C++ computation
 | time is large compared to the time spent in copying.
 |
 | ii. We factor out the inner loop of the code, recode only that in C++,
 | and call it repeatedly, many times.  In that case the copying cost is multiplied
 | by the number of iterations and might very well have a significant impact.

 In case ii) I'd try to use a different design and make it more like i): You
 generally do not want to call down from R to object code a bazillion times as
 there is always some overhead, and multiplying even something rather
 efficient by a veryBigNumber can make small times large in the aggregate.

Sure, and sugar, RcppArmadillo and other facilities do make it easier to move
more functionality into C++; nevertheless, it can be the case that a relatively
small amount of repeatedly invoked R code is responsible for the performance
hit in a program, and from the viewpoint of reducing complexity and increasing
maintainability it can be desirable to move just that minimum portion to the
C++ side, minimizing the dual-language aspect of the code.  Making call
overhead as low as possible while retaining the automatic Rcpp features
facilitates this.  If that's not possible in general, it would still be nice
if it were possible just for Armadillo objects and selected other situations.
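To put a number on case ii, here is a hypothetical standalone C++ sketch (no Rcpp): Payload counts its own deep copies in g_copies, and run() plays the role of an outer loop that calls a small kernel many times. The only difference between the two kernels is whether the data is taken by value or by const reference.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Instrumentation for this sketch: counts deep copies of Payload.
std::size_t g_copies = 0;

struct Payload {
    std::vector<double> values;
    explicit Payload(std::size_t n) : values(n, 1.0) {}
    Payload(const Payload& other) : values(other.values) { ++g_copies; }
};

// Kernel that takes its argument by value: one deep copy per call.
double sum_by_value(Payload p) {
    double s = 0.0;
    for (double v : p.values) s += v;
    return s;
}

// Same kernel reading the caller's object through a const reference: no copy.
double sum_by_ref(const Payload& p) {
    double s = 0.0;
    for (double v : p.values) s += v;
    return s;
}

// Plays the role of an outer loop (e.g. an optimizer) calling the kernel repeatedly.
template <typename Kernel>
double run(const Payload& data, std::size_t iterations, Kernel kernel) {
    assert(iterations > 0);
    double total = 0.0;
    for (std::size_t k = 0; k < iterations; ++k) total += kernel(data);
    return total;
}
```

With 500 calls the by-value kernel makes 500 deep copies of a 1,000-element vector, while the by-reference kernel makes none; each individual copy is cheap, but the total scales linearly with the iteration count.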


 Dirk

 |
 | On Thu, Jul 11, 2013 at 6:55 PM, Dirk Eddelbuettel e...@debian.org wrote:
 | 
 |  Everybody has this existing example in their copy of Armadillo.
 | 
 |  I am running it here from SVN rather than the installed directory, but this
 |  should not make a difference. Machine is my not-overly-powerful thinkpad used
 |  for traveling:
 | 
 |  edd@don:~/svn/rcpp/pkg/RcppArmadillo/inst/examples$ r fastLm.r
 |  Loading required package: methods
 | 
 |  Attaching package: ‘Rcpp’
 | 
 |  The following object is masked from ‘package:inline’:
 | 
 |  registerPlugin
 | 
 |                         test replications relative elapsed user.self sys.self
 |  2         fLmTwoCasts(X, y)         5000    1.000   0.184     0.204    0.164
 |  1          fLmOneCast(X, y)         5000    1.011   0.186     0.200    0.172
 |  4   fastLmPureDotCall(X, y)         5000    1.141   0.210     0.236    0.184
 |  3          fastLmPure(X, y)         5000    2.027   0.373     0.412    0.332
 |  6              lm.fit(X, y)         5000    2.685   0.494     0.528    0.456
 |  5 fastLm(frm, data = trees)         5000   36.380   6.694     7.332    6.028
 |  7     lm(frm, data = trees)         5000   42.734   7.863     8.628    7.068
 |  edd@don:~/svn/rcpp/pkg/RcppArmadillo/inst/examples$
 | 
 |  What we are talking about here is the difference between 'fLmTwoCasts' and
 |  'fLmOneCast'.  If you use larger objects, the difference will be larger.  But
 |  the relative differences are tiny.
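As a back-of-the-envelope check on "tiny", assuming the elapsed column is total seconds over all 5000 replications (the usual rbenchmark convention), the gap between fLmOneCast and fLmTwoCasts in this run works out to about 0.4 microseconds per call:

```latex
\frac{0.186\,\mathrm{s} - 0.184\,\mathrm{s}}{5000} = 4 \times 10^{-7}\,\mathrm{s} = 0.4\,\mu\mathrm{s}\ \text{per call},
\qquad \frac{0.186}{0.184} \approx 1.011
```

The second ratio is the 1.011 reported in the relative column.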
 | 
 |  It would be nice to make this more elegant, and I look forward to Romain's
 |  proposals, but methinks that we may well have bigger fish to fry.
 | 
 |  Dirk, still in Sydney
 | 
 |  --
 |  Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com
 |  ___
 |  Rcpp-devel mailing list
 |  Rcpp-devel@lists.r-forge.r-project.org
 |  https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
 |
 |
 

Re: [Rcpp-devel] Forcing a shallow versus deep copy

2013-07-12 Thread Changi Han
I apologize if my emails were badly phrased or disrespectful. I had no intention
of saying anything was broken, suspicious or wrong.

I second Gabor. His described use case matches mine. The outer loop is an
optimization routine coming from other libraries. Rcpp is used to speed up
the objective, gradient and Hessian computations, and hence the data is
constantly passed along to all of these functions. Another use case to
consider is recursion with data passed along. A toy example is gib(0) =
values(0); gib(1) = values(1); gib(x) = gib(x-1) + gib(x-2) + values(x),
where values is a vector of non-negative integers. A naive implementation
with auxiliary memory allocation may cause the number of copies in memory
to grow exponentially in x.
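Changi's toy recursion can be sketched in standalone C++ as follows. Values is a hypothetical wrapper that counts deep copies of the input vector; gib_copying takes it by value on every call, gib_shared passes a const reference, and gib_iterative is an added linear-time alternative that is not from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Instrumentation for this sketch: counts deep copies of the input data.
std::size_t g_value_copies = 0;

// Hypothetical wrapper around the input vector "values".
struct Values {
    std::vector<long> v;
    explicit Values(std::vector<long> init) : v(std::move(init)) {}
    Values(const Values& other) : v(other.v) { ++g_value_copies; }
};

// Naive recursion taking the data by value: every call copies the whole vector.
long gib_copying(Values values, std::size_t x) {
    if (x < 2) return values.v[x];
    return gib_copying(values, x - 1) + gib_copying(values, x - 2) + values.v[x];
}

// Same recursion with the data shared by const reference: zero copies,
// although the number of calls is still exponential in x.
long gib_shared(const Values& values, std::size_t x) {
    if (x < 2) return values.v[x];
    return gib_shared(values, x - 1) + gib_shared(values, x - 2) + values.v[x];
}

// Bottom-up version: x - 1 loop iterations, constant extra memory, no copies.
long gib_iterative(const Values& values, std::size_t x) {
    assert(x < values.v.size());
    if (x < 2) return values.v[x];
    long prev2 = values.v[0];  // gib(k - 2)
    long prev1 = values.v[1];  // gib(k - 1)
    for (std::size_t k = 2; k <= x; ++k) {
        long cur = prev1 + prev2 + values.v[k];
        prev2 = prev1;
        prev1 = cur;
    }
    return prev1;
}
```

For x = 10 with an all-ones input, the naive version makes 177 full copies of the vector, one per call, and that count grows by roughly a factor of 1.6 (the golden ratio) for each increment of x. Passing by reference removes the copies; the iterative version also removes the exponential number of calls.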



[Rcpp-devel] Difference between runif and unif_rand?

2013-07-12 Thread Neal Fultz
I've been updating the C code in an R package to use Rcpp and started seeing
some odd results from a function that samples from a discrete
distribution.

I think I've narrowed it down to the sugar runif. The two programs below
are identical except that f_works uses unif_rand and f_broke goes through runif.

---

library(Rcpp)

cppFunction('
int f_works(NumericVector p) {

  int i = 0;

  double u = unif_rand();

  do { u -= p[i++]; } while(u > 0);

  return i;
}
')


cppFunction('
int f_broke(NumericVector p) {

  int i = 0;
  
  double u = Rcpp::runif(1)[0];
  
  do { u -= p[i++]; } while(u > 0);
  
  return i;
}
')

---
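Both functions use the same inverse-CDF scan: subtract the probabilities from a single uniform draw until it drops to zero or below, and return the 1-based index reached. A standalone C++ sketch of that scan follows. It swaps R's generator for std::mt19937, so its draws will not match R's; tabulate is a hypothetical helper; and the i < p.size() guard is an addition not present in the code above, which stops the scan at the last category if rounding leaves u slightly positive after the final subtraction.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Inverse-CDF draw from a discrete distribution with probabilities p.
// u is a uniform draw on [0, 1); the result is 1-based, as in f_works / f_broke.
int draw_index(double u, const std::vector<double>& p) {
    assert(!p.empty());
    std::size_t i = 0;
    do {
        u -= p[i++];  // walk the cumulative distribution one category at a time
    } while (u > 0 && i < p.size());  // guard: never read past the last category
    return static_cast<int>(i);
}

// Hypothetical driver: n draws tabulated by category, using std::mt19937
// in place of R's RNG (an assumption of this sketch).
std::vector<int> tabulate(const std::vector<double>& p, int n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::vector<int> counts(p.size(), 0);
    for (int k = 0; k < n; ++k) {
        int idx = draw_index(unif(gen), p);
        ++counts[static_cast<std::size_t>(idx - 1)];
    }
    return counts;
}
```

With p = {0.1, 0.2, 0.3, 0.4} and 10,000 draws, the counts should come out roughly proportional to p, and every result is an integer index between 1 and 4.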

When I run f_broke, sometimes there is a number that doesn't belong:

R> p <- 1:4 / 10;
R> table(replicate(1e4, f_works(p)))

   1    2    3    4 
 961 2045 2984 4010 

R> table(replicate(1e4, f_broke(p)))

0.427225098479539 0.458629200235009 0.687304416438565 0.735608365153894 0.978602966060862
                1                 1                 1                 1                 1

   1    2    3    4
1053 2010 3021 3911


I'm pretty stumped about why I consistently see odd numbers every couple of
thousand draws, because runif just wraps unif_rand anyway. My
functions are declared as int, so it should be impossible for them to
return a double.

What is happening here that I don't get?

Thanks much,

Neal






Re: [Rcpp-devel] Difference between runif and unif_rand?

2013-07-12 Thread Dirk Eddelbuettel

On 12 July 2013 at 12:28, Neal Fultz wrote:
| I've been updating the C code in an R package to use Rcpp and started seeing
| some odd results from a function that samples from a discrete
| distribution.
| 
| I think I've narrowed it down to the sugar runif. The two programs below
| are identical except f_works uses unif_rand and f_broke goes through runif. 
| 
| ---
| 
| library(Rcpp)
| 
| cppFunction('
| int f_works(NumericVector p) {
| 
|   int i = 0;
| 
|   double u = unif_rand();

There was a reason we never ever used this one; I think it is meant for
standalone use via Rmathlib.
 
|   do { u -= p[i++]; } while(u > 0);
| 
|   return i;
| }
| ')
| 
| 
| cppFunction('
| int f_broke(NumericVector p) {
| 
|   int i = 0;
|   
|   double u = Rcpp::runif(1)[0];

If you really want just one draw, use R::runif(...)

If you set / reset the RNG seed you very clearly get the exact same draws
from R and Rcpp.

So maybe it is just your assumption that unif_rand() and runif() should give
identical results?
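The property being relied on is that both routes draw from the same underlying stream, so resetting the seed replays the same numbers. A standalone C++ analogue, with std::mt19937 standing in for R's generator and the hypothetical unif_draw, unif_vector and set_seed playing the roles of unif_rand, runif and set.seed (none of these are R or Rcpp functions):

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// One shared engine, standing in for R's single RNG stream (assumption of this sketch).
std::mt19937 g_engine;

// Scalar draw on [0, 1), in the role of unif_rand().
double unif_draw() {
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    return unif(g_engine);
}

// Vector of n draws built from the scalar one, in the role of runif(n).
std::vector<double> unif_vector(int n) {
    assert(n >= 0);
    std::vector<double> out;
    out.reserve(static_cast<std::size_t>(n));
    for (int k = 0; k < n; ++k) out.push_back(unif_draw());
    return out;
}

// Reset the shared stream, in the role of set.seed().
void set_seed(unsigned seed) { g_engine.seed(seed); }
```

In R itself the equivalent check is to call set.seed() with the same value before each of the two functions and compare what they return.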

Also, it never hurts to add an explicit RNGScope object, even though
cppFunction() will add it for you too...
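Roughly speaking, an RNGScope object gives an RAII hand-off of R's RNG state: read it in when the scope starts, write it back when the scope ends. A hypothetical standalone C++ sketch of that pattern follows; mock_get_state and mock_put_state only count calls and stand in for R's GetRNGstate() and PutRNGstate(), and the sketch does not model nested scopes.

```cpp
#include <cassert>

// Call counters standing in for R's RNG state hand-off (hypothetical mocks).
int g_loaded = 0;  // times the state was read in
int g_saved = 0;   // times the state was written back

void mock_get_state() { ++g_loaded; }  // stands in for GetRNGstate()
void mock_put_state() { ++g_saved; }   // stands in for PutRNGstate()

// RAII guard in the spirit of Rcpp::RNGScope (not its actual implementation).
class RngScopeSketch {
public:
    RngScopeSketch() { mock_get_state(); }
    ~RngScopeSketch() {
        assert(g_saved < g_loaded);  // every save must follow a matching load
        mock_put_state();
    }
    RngScopeSketch(const RngScopeSketch&) = delete;
    RngScopeSketch& operator=(const RngScopeSketch&) = delete;
};

// A function that would draw random numbers while the scope is active.
int uses_rng(bool bail_early) {
    RngScopeSketch scope;       // state loaded here
    if (bail_early) return -1;  // destructor still saves the state
    return 1;                   // ... and on the normal path too
}
```

Because the write-back lives in the destructor, it also happens on early returns and when an exception unwinds the stack.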

|   do { u -= p[i++]; } while(u > 0);
|   
|   return i;
| }
| ')
| 
| ---
| 
| When I run f_broke, sometimes there is a number that doesn't belong:
| 
| R> p <- 1:4 / 10;
| R> table(replicate(1e4, f_works(p)))
| 
|    1    2    3    4 
|  961 2045 2984 4010 
| 
| R> table(replicate(1e4, f_broke(p)))
| 
| 0.427225098479539 0.458629200235009 0.687304416438565 0.735608365153894 0.978602966060862
|                 1                 1                 1                 1                 1
| 
|    1    2    3    4
| 1053 2010 3021 3911
| 
| 
| I'm pretty stumped about why I consistently see odd numbers every couple of
| thousand draws, because runif just wraps unif_rand anyway. My
| functions are declared as int, so it should be impossible for them to
| return a double.
| 
| What is happening here that I don't get?

I am not entirely sure.  My head is a little foggy from traveling and I am
about to get onto another plane.  Maybe I'll take a look...

Hope this helps so far,  Dirk



Re: [Rcpp-devel] Difference between runif and unif_rand?

2013-07-12 Thread Krzysztof Sakrejda
On Fri, Jul 12, 2013 at 8:50 PM, Dirk Eddelbuettel e...@debian.org wrote:

 On 12 July 2013 at 12:28, Neal Fultz wrote:
 | I've been updating the C in an r package to use Rcpp and started seeing
 | so odd results from a function that samples from a discrete
 | distribution.

For what it's worth, I can't reproduce this so it may be specific to
your versions of R/Rcpp. Krzysztof


Re: [Rcpp-devel] Difference between runif and unif_rand?

2013-07-12 Thread Neal Fultz
I use the Debian packages. Upgrading r-cran-rcpp from testing to unstable
seems to have fixed this for me, so good call.


On Fri, Jul 12, 2013 at 10:25:17PM -0400, Krzysztof Sakrejda wrote:
 On Fri, Jul 12, 2013 at 8:50 PM, Dirk Eddelbuettel e...@debian.org wrote:
 
  On 12 July 2013 at 12:28, Neal Fultz wrote:
  | I've been updating the C in an r package to use Rcpp and started seeing
  | so odd results from a function that samples from a discrete
  | distribution.
 
 For what it's worth, I can't reproduce this so it may be specific to
 your versions of R/Rcpp. Krzysztof