On Fri, Mar 18, 2016 at 8:28 PM, Roy Mendelssohn - NOAA Federal <roy.mendelss...@noaa.gov> wrote: > Hi Henrik: > > I want to do want in oceanography is called an EOF, which is just a PCA > analysis. Unless I am missing something, in R I need to flatten my 3-D matrix > into a 2-D data matrix. I can fit the entire 30GB matrix into memory, and I > believe I have enough memory to do the PCA by constraining the number of > components returned . What I don’t think I have enough memory for is an > operation that makes a copy of the matrix. > > As I said, in theory I know how to do the flattening, it a simple command, > but in practice I don’t have enough memory. So I spent the afternoon > rewriting my code to read in parts of the data at a time and then putting > those in the appropriate places of a matrix already flattened of appropriate > size. In case someone is wondering, on the 3D grid the matrix is > [1001,1001,3650]. So I create an empty matrix size [1001*1001,3650], and > read in a slice of the lat-lon grid, and map those into the appropriate > places in the flattened matrix. By reading in appropriately sized chunks my > memory usage is not pushed too far.
Sounds good. There's another small caveat. Make sure to specify the 'data' argument for matrix() we allocating an "empty" matrix, e.g. X <- matrix(NA_real_, nrow=1001*1001, ncol=3650) This will give you a double matrix with all missing value. If you use the default X <- matrix(nrow=1001*1001, ncol=3650) you'll get a logical matrix, which will introduce a copy as soon as you assign a double value (e.g. X[1,1] <- 3.14). The latter is a complete waste of memory/time. See http://www.jottr.org/2014/06/matrixNA-wrong-way.html for details. /Henrik > > -Roy > > >> On Mar 18, 2016, at 7:37 PM, Henrik Bengtsson <henrik.bengts...@gmail.com> >> wrote: >> >> On Fri, Mar 18, 2016 at 3:15 PM, Roy Mendelssohn - NOAA Federal >> <roy.mendelss...@noaa.gov> wrote: >>> Thanks. That is what I needed to know. I don’t want to play around with >>> some of the other suggestions, as I don’t totally understand what they do, >>> and don’t want to risk messing up something and not be aware of it. >>> >>> There is a way to read in the data chunks at a time and reshape it and put, >>> it into the (reshaped) larger array, harder to program but probably worth >>> the pain to be certain of what I am doing. >> >> I recommend this approach; whenever I work with reasonable large data >> (that may become even larger in the future), I try to implement a >> constant-memory version of the algorithm, which often comes down to >> processing data in chunks. The simplest version of this is to read >> all data into memory and the subset, but if you can read data in in >> chunks that is even better. >> >> Though, I'm curious to what matrix operations you wish to perform. >> Because if you wish to do regular summation, then base::.rowSums() and >> base::.colSums() allow you to override the default dimensions on the >> fly without inducing extra copies, e.g. >> >>> X <- array(1:24, dim=c(2,3,4)) >>> .rowSums(X, m=6, n=4) >> [1] 40 44 48 52 56 60 >>> rowSums(matrix(X, nrow=6, ncol=4)) >> [1] 40 44 48 52 56 60 >> >> For other types of calculations, you might want to look at >> matrixStats. It has partial(*) support for overriding the default >> dimensions in a similar fashion. For instance, >> >>> library("matrixStats") >>> rowVars(X, dim.=c(6,4)) >> >> The above effectively calculates rowVars(matrix(X, nrow=6, ncol=4)) >> without making copies. >> >> (*) By partial I mean that this is a feature that hasn't been pushed >> through to all of matrixStats functions, cf. >> https://github.com/HenrikBengtsson/matrixStats/issues/83. >> >> Cheers, >> >> Henrik >> (author of matrixStats) >> >>> >>> I had a feeling a copy was made, just wanted to make certain of it. >>> >>> Thanks again, >>> >>> -Roy >>> >>>> On Mar 18, 2016, at 2:56 PM, Dénes Tóth <toth.de...@ttk.mta.hu> wrote: >>>> >>>> Hi Roy, >>>> >>>> R (usually) makes a copy if the dimensionality of an array is modified, >>>> even if you use this syntax: >>>> x <- array(1:24, c(2, 3, 4)) >>>> dim(x) <- c(6, 4) >>>> >>>> See also ?tracemem, ?data.table::address, ?pryr::address and other tools >>>> to trace if an internal copy is done. >>>> >>>> Workaround: use data.table::setattr or bit::setattr to modify the >>>> dimensions in place (i.e., without making a copy). Risk: if you modify an >>>> object by reference, all other objects which point to the same memory >>>> address will be modified silently, too. >>>> >>>> HTH, >>>> Denes >>>> >>>> >>>> >>>> On 03/18/2016 10:28 PM, Roy Mendelssohn - NOAA Federal wrote: >>>>> Hi All: >>>>> >>>>> I am working with a very large array. if noLat is the number of >>>>> latitudes, noLon the number of longitudes and noTime the number of time >>>>> periods, the array is of the form: >>>>> >>>>> myData[noLat, no Lon, noTime]. >>>>> >>>>> It is read in this way because that is how it is stored in a (series) of >>>>> netcdf files. For the analysis I need to do, I need instead the array: >>>>> >>>>> myData[noLat*noLon, noTime]. Normally this would be easy: >>>>> >>>>> myData<- array(myData,dim=c(noLat*noLon,noTime)) >>>>> >>>>> My question is how does this command work in R - does it make a copy of >>>>> the existing array, with different indices for the dimensions, or does it >>>>> just redo the indices and leave the given array as is? The reason for >>>>> this question is my array is 30GB in memory, and I don’t have enough >>>>> space to have a copy of the array in memory. If the latter I will have >>>>> to figure out a work around to bring in only part of the data at a time >>>>> and put it into the proper locations. >>>>> >>>>> Thanks, >>>>> >>>>> -Roy >>>>> >>>>> >>>>> >>>>> ********************** >>>>> "The contents of this message do not reflect any position of the U.S. >>>>> Government or NOAA." >>>>> ********************** >>>>> Roy Mendelssohn >>>>> Supervisory Operations Research Analyst >>>>> NOAA/NMFS >>>>> Environmental Research Division >>>>> Southwest Fisheries Science Center >>>>> ***Note new address and phone*** >>>>> 110 Shaffer Road >>>>> Santa Cruz, CA 95060 >>>>> Phone: (831)-420-3666 >>>>> Fax: (831) 420-3980 >>>>> e-mail: roy.mendelss...@noaa.gov www: http://www.pfeg.noaa.gov/ >>>>> >>>>> "Old age and treachery will overcome youth and skill." >>>>> "From those who have been given much, much will be expected" >>>>> "the arc of the moral universe is long, but it bends toward justice" -MLK >>>>> Jr. >>>>> >>>>> ______________________________________________ >>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>> PLEASE do read the posting guide >>>>> http://www.R-project.org/posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code. >>>>> >>> >>> ********************** >>> "The contents of this message do not reflect any position of the U.S. >>> Government or NOAA." >>> ********************** >>> Roy Mendelssohn >>> Supervisory Operations Research Analyst >>> NOAA/NMFS >>> Environmental Research Division >>> Southwest Fisheries Science Center >>> ***Note new address and phone*** >>> 110 Shaffer Road >>> Santa Cruz, CA 95060 >>> Phone: (831)-420-3666 >>> Fax: (831) 420-3980 >>> e-mail: roy.mendelss...@noaa.gov www: http://www.pfeg.noaa.gov/ >>> >>> "Old age and treachery will overcome youth and skill." >>> "From those who have been given much, much will be expected" >>> "the arc of the moral universe is long, but it bends toward justice" -MLK >>> Jr. >>> >>> ______________________________________________ >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. > > ********************** > "The contents of this message do not reflect any position of the U.S. > Government or NOAA." > ********************** > Roy Mendelssohn > Supervisory Operations Research Analyst > NOAA/NMFS > Environmental Research Division > Southwest Fisheries Science Center > ***Note new address and phone*** > 110 Shaffer Road > Santa Cruz, CA 95060 > Phone: (831)-420-3666 > Fax: (831) 420-3980 > e-mail: roy.mendelss...@noaa.gov www: http://www.pfeg.noaa.gov/ > > "Old age and treachery will overcome youth and skill." > "From those who have been given much, much will be expected" > "the arc of the moral universe is long, but it bends toward justice" -MLK Jr. > ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.