Can you please supply the output of sessionInfo() and the full bam 
call that causes this?
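
Something along these lines would collect both pieces in one go. It is only a sketch: `fit_call`, `out_file`, and the formula, family and data names inside the call are hypothetical placeholders, so substitute the exact call that fails.

```r
# Sketch only: gather the requested details into one text file.
# Run it in the R session that produced the error (after library(mgcv)),
# so that mgcv and its version show up in the sessionInfo() output.

# Hypothetical placeholder: replace with the exact bam() call that fails.
# quote() stores the call without evaluating it, so no data is needed here.
fit_call <- quote(
  bam(y ~ s(x1) + s(x2), family = poisson(), data = dat,
      discrete = TRUE, chunk.size = 10000, gc.level = 1)
)

report <- c(
  "sessionInfo():",
  capture.output(print(sessionInfo())),
  "",
  "bam call:",
  deparse(fit_call)
)

# Hypothetical output location; any writable path will do.
out_file <- file.path(tempdir(), "bam_report.txt")
writeLines(report, out_file)
```

The contents of the resulting file can then be pasted into a reply as plain text.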

best,

Simon (mgcv maintainer)

On 15/03/2019 09:09, Frank van Berkum wrote:
> Dear Community,
>
> In our current research we are trying to fit Generalized Additive Models to a 
> large dataset. We are using the package mgcv in R.
>
> Our dataset contains about 22 million records with fewer than 20 risk factors 
> for each observation, so in our case n>>p. The dataset covers the period 2006 
> until 2011, and we analyse both the complete dataset and datasets in which we 
> leave out a single year. The latter part is done to analyse robustness of the 
> results. We understand k-fold cross validation may seem more appropriate, but 
> our approach is closer to what is done in practice (how will one additional 
> year of information affect your estimates?).
>
> We use the function bam as advocated in Wood et al. (2017), and we apply the 
> following options: bam(..., discrete=TRUE, chunk.size=10000, gc.level=1). We 
> run these analyses on a computer cluster (see 
> https://userinfo.surfsara.nl/systems/lisa/description for details), and the 
> job is allocated to a node within the computer cluster. A node has at least 
> 16 cores and 64Gb memory.
>
> We had expected 64Gb of memory to be sufficient for these analyses, 
> especially since the bam function is built specifically for large datasets. 
> However, when applying this function to the different datasets described 
> above with different regression specifications (different risk factors 
> included in the linear predictor), we sometimes obtain errors of the 
> following form.
>
> Error in XWyd(G$Xd, w, z, G$kd, G$ks, G$ts, G$dt, G$v, G$qc, G$drop, ar.stop,  :
>    'Calloc' could not allocate memory (22624897 of 8 bytes)
> Calls: fnEstimateModel_bam -> bam -> bgam.fitd -> XWyd
> Execution halted
> Warning message:
> system call failed: Cannot allocate memory
>
> Error in Xbd(G$Xd, coef, G$kd, G$ks, G$ts, G$dt, G$v, G$qc, G$drop) :
>    'Calloc' could not allocate memory (18590685 of 8 bytes)
> Calls: fnEstimateModel_bam -> bam -> bgam.fitd -> Xbd
> Execution halted
> Warning message:
> system call failed: Cannot allocate memory
>
> Error: cannot allocate vector of size 1.7 Gb
> Timing stopped at: 2 0.556 4.831
>
> Error in system.time(oo <- .C(C_XWXd0, XWX = as.double(rep(0, (pt + nt)^2)),  :
>    'Calloc' could not allocate memory (55315650 of 24 bytes)
> Calls: fnEstimateModel_bam -> bam -> bgam.fitd -> XWXd -> system.time -> .C
> Timing stopped at: 1.056 1.396 2.459
> Execution halted
> Warning message:
> system call failed: Cannot allocate memory
>
> The errors seem to arise at different stages in the optimization process. We 
> have analysed whether these errors disappear if different settings are used 
> (different chunk.size, different gc.level), but this does not resolve our 
> problem. Also, the errors occur on different datasets when using different 
> settings, and even with the same settings an error that occurred on 
> dataset X in one run does not necessarily occur on dataset X in a 
> different run. When using the discrete=TRUE option, 
> optimization can be parallelized, but we have chosen to not employ this 
> feature to ensure memory does not have to be shared between parallel 
> processes.
>
> Naturally, I cannot share our dataset with you, which makes the problem 
> difficult to analyse. However, based on your collective knowledge, could you 
> point us to where the problem may lie? Is it something within the C-code 
> used within the package (as the last error seems to indicate), or is it 
> related to the computer cluster?
>
> Any help or insight is much appreciated.
>
> Kind regards,
>
> Frank
>
>       [[alternative HTML version deleted]]
>
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Simon Wood, School of Mathematics, University of Bristol, BS8 1TW UK
https://people.maths.bris.ac.uk/~sw15190/

