On Tue, Jul 17, 2012 at 10:44 AM, French, Joshua <joshua.fre...@ucdenver.edu> wrote:
> Thank you all for the responses.
>
> Christian, I didn't know about the copy_aux_mem option. I will have to
> take a look at that.
>
> Dirk, thanks for looking into the 64-bit matrix indices.
>
> Doug, the place in my code where I get the error is when I multiply
> matrices. I might have matrices X and Y, where X is 300000x500 and Y is
> 500x300000 and I want Z = X * Y.
You will need to come up with another algorithm. The size of Z is
3,000,000 by 3,000,000, and storing this as a dense matrix will require
about 67 terabytes of memory:

> (3000000 * 3000000 * 8) / 2^30
[1] 67055.23

I don't think even Google has that much memory available.

You are going to have to work with a decomposition. For example, a QR
decomposition can represent the matrix X as a (virtual) orthogonal
matrix Q of size 3,000,000 by 3,000,000 and R, an upper triangular 500
by 500 matrix, but stored in essentially the same amount of space as
the original matrix X.

> I could break up the matrices into
> smaller chunks and do the multiplication, but Z is later used in several
> other multi-step calculations (with addition and multiplication mostly) so
> I think that would be a last resort. If I can get the 64-bit matrix
> indices working in RcppArmadillo, I think that will solve much of the
> problem, because I will only need to return very long vectors and not big
> matrices.
>
> Joshua
> --
> Joshua French, Ph.D.
> Assistant Professor
> Department of Mathematical and Statistical Sciences
> University of Colorado Denver
> joshua.fre...@ucdenver.edu
> http://math.ucdenver.edu/~jfrench/
> Ph: 303-556-6265  Fax: 303-556-8550
>
> On 7/17/12 8:56 AM, "Douglas Bates" <ba...@stat.wisc.edu> wrote:
>
>> On Tue, Jul 17, 2012 at 8:14 AM, Dirk Eddelbuettel <e...@debian.org> wrote:
>>>
>>> On 16 July 2012 at 23:30, French, Joshua wrote:
>>> | I am doing some linear algebra on large matrices in R and receiving the
>>> | following error: "allocMatrix: too many elements specified". From what I
>>> | understand, the error is caused by the fact that R uses 32-bit ints and not
>>> | 64-bit ints for matrix indices, so R doesn't have a way to represent all the
>>> | elements in the very large matrix.
>>> |
>>> | My two questions:
>>> |
>>> | 1. Armadillo (and presumably RcppArmadillo) will not have this issue since
>>> | Armadillo provided support for 64-bit indices as of version 2.4.0. Is there a
>>> | way to easily utilize this functionality from within RcppArmadillo?
>>>
>>> I need to double check, but this may have been a compile-time option you need
>>> to enable. In any event ... R indices are still limited, so you may not be
>>> able to pass these back and forth.
>>>
>>> | 2. I have found in the past that some of the speed gains from RcppArmadillo
>>> | in comparison to pure R are lost when passing large matrices as arguments.
>>> | There will always be overhead when passing arguments (especially large matrix
>>> | arguments) to pretty much any function. Are there any tricks to minimize the
>>> | overhead when passing a non-sparse matrix argument of, say, 1,000,000 by 500
>>> | from R to Armadillo?
>>>
>>> I defer all questions concerning sparse matrices to Doug and other users of
>>> sparse matrix code. I live mostly in a small-to-medium size dense matrix world.
>>
>> Actually, the question was about non-sparse matrices. It looks as if
>> it is the number of rows in the matrix that will be problematic. An
>> upper bound on the number of rows is the maximum integer value divided
>> by the number of columns:
>>
>> > .Machine$integer.max / 500
>> [1] 4294967
>>
>> I would try not to exceed about 1/2 to 1/4 of that bound.
>>
>> A simple way of handling data sets that are much larger than that is
>> to work with a sample of the rows.
>> If that is not feasible then I
>> would create a list of matrices each of size 1,000,000 by 500 or so
>> and vertically concatenate them in the C++ code. Of course, this
>> means a copying operation. Also, when you are finished, if you need to
>> pass results back to R then you face a similar problem getting a large
>> matrix in C++ back into R storage.
>>
>> You can create a read-only matrix in C++ using the storage from R, as
>> described by Christian for RcppArmadillo, or using the Eigen::Map class
>> in RcppEigen.
>>
>> What are you (Joshua) doing with these large matrices? If the main
>> calculations involve X'X-type calculations you can carry out the
>> calculations on horizontal chunks and then assemble the results.
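To make Christian's copy_aux_mem suggestion and the read-only-matrix remark above concrete, here is a minimal sketch using Armadillo's advanced mat constructor with copy_aux_mem = false; the function name and the column-sum body are placeholders for illustration, not anything from the thread:

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// Sketch: borrow R's storage (copy_aux_mem = false, strict = true) instead of
// copying the large matrix on every call.  X must be treated as read-only
// because the memory still belongs to R.
// [[Rcpp::export]]
double first_col_sum(Rcpp::NumericMatrix Xr) {          // placeholder example
    arma::mat X(Xr.begin(), Xr.nrow(), Xr.ncol(),
                false /* copy_aux_mem */, true /* strict */);
    return arma::sum(X.col(0));
}

From R this would be called as first_col_sum(X), with no second copy of X made on the C++ side.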
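In the same spirit, a hedged sketch of the X'X-on-horizontal-chunks idea in the last paragraph, assuming the blocks of rows arrive from R as a list of smaller matrices; chunked_crossprod is a hypothetical name, not an existing API:

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// Sketch: accumulate X'X from horizontal chunks of X (each chunk holding a
// block of rows), so the full X never has to sit in a single allocation.
// [[Rcpp::export]]
arma::mat chunked_crossprod(Rcpp::List chunks) {        // hypothetical helper
    arma::mat XtX;
    for (int i = 0; i < chunks.size(); ++i) {
        Rcpp::NumericMatrix Ci = chunks[i];
        arma::mat C(Ci.begin(), Ci.nrow(), Ci.ncol(), false, true);
        if (XtX.is_empty())
            XtX.zeros(C.n_cols, C.n_cols);              // 500 x 500 for these data
        XtX += C.t() * C;                               // X'X = sum over chunks of Ci'Ci
    }
    return XtX;
}

Called from R as chunked_crossprod(list_of_chunks); the same accumulation pattern extends to X'y and similar quantities.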
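The decomposition route suggested at the top of the thread can be sketched with the economy-size QR, assuming a recent enough Armadillo for qr_econ. This thin variant stores Q explicitly as m x 500 rather than as the implicit m x m factor Doug describes, but the storage point is the same: the factors take roughly the space of X itself, never the m x m space of Z.

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// Sketch: economy-size QR of a tall, thin X (m x 500).  Q is m x 500 and R is
// 500 x 500 upper triangular, so the pair needs about the storage of X.
// [[Rcpp::export]]
Rcpp::List thin_qr(const arma::mat& X) {                // placeholder wrapper
    arma::mat Q, R;
    arma::qr_econ(Q, R, X);                             // X = Q * R
    return Rcpp::List::create(Rcpp::Named("Q") = Q,
                              Rcpp::Named("R") = R);
}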
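Finally, on the 64-bit index question: the compile-time option Dirk says he needs to double check is presumably Armadillo's ARMA_64BIT_WORD define. A sketch under that assumption, keeping Dirk's caveat that R-side indices are still limited:

// Assumption: the compile-time switch Dirk refers to is Armadillo's
// ARMA_64BIT_WORD; it must be defined before RcppArmadillo.h is included.
#define ARMA_64BIT_WORD 1

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// With 64-bit words, arma::uword can index more than 2^31 - 1 elements on the
// C++ side; since R's own indices remain limited, the count is returned as a
// double rather than as an R integer.
// [[Rcpp::export]]
double count_elements(const arma::mat& X) {             // placeholder example
    return static_cast<double>(X.n_elem);
}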