Please confirm that when you do the manual load and check whether f(v*) matches the result from qsub(), it succeeds for cases #1 and #2 but fails only for #3.
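One way to make that check airtight is to verify that the manual run is reading the same bytes the job wrote, and that f reproduces after a reload. A minimal sketch in R, using a toy f and a small A as stand-ins (the names f, A, v_star are placeholders, not your actual code):

```r
## Toy example: save a matrix, checksum it, reload it, and confirm that
## f(v*) is bit-identical before and after the round trip.
set.seed(1)
A <- matrix(rnorm(9), 3, 3)
f <- function(v, A) sum((A %*% v)^2)   # stand-in for the real log-likelihood

saveRDS(A, "A.rds")
md5_at_save <- tools::md5sum("A.rds")  # record right after g(X) would have written it

A2 <- readRDS("A.rds")
md5_at_load <- tools::md5sum("A.rds")  # record again just before the manual f(v*)

v_star <- c(0, 1, 2)
stopifnot(identical(md5_at_save, md5_at_load))     # same bytes on disk
stopifnot(identical(f(v_star, A), f(v_star, A2)))  # same value after reload
```

If the checksums for your real A, B, C, D differ between the job's run and the manual run, the two computations are not actually using the same inputs, which would explain the discrepancy without any numerical instability.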
On Fri, Mar 4, 2022 at 10:06 AM Arthur Fendrich <arth...@gmail.com> wrote:
> Dear all,
>
> I am currently having a weird problem with a large-scale optimization
> routine. It would be nice to know whether any of you have run into
> something similar, and how you solved it.
>
> I apologize in advance for not providing an example, but I think the
> non-reproducibility of the error may be a key point of this problem.
>
> Simplest possible description of the problem: I have two functions,
> g(X) and f(v).
>
> g(X) does:
> i) inputs a large matrix X;
> ii) derives four other matrices from X (I'll call them A, B, C and D),
> then saves them to disk for debugging purposes.
>
> Then, f(v) does:
> iii) loads A, B, C and D from disk;
> iv) calculates the log-likelihood, which varies according to a vector
> of parameters, v.
>
> My goal application is quite big (X is a 40000x40000 matrix), so I
> created the following versions to test and run the
> code/math/parallelization:
> #1) a simulated example with X being 100x100;
> #2) a degraded version of the goal application, with X being 4000x4000;
> #3) the goal application, with X being 40000x40000.
>
> When I use qsub to submit the job, using the exact same code and
> processing cluster, #1 and #2 run flawlessly, so no problem there.
> These results tell me that the code/math/parallelization are fine.
>
> For application #3, it converges to a vector v*. However, when I
> manually load A, B, C and D from disk and calculate f(v*), the value I
> get is completely different. For example:
> - the qsub job says v* = c(0, 1, 2, 3) is a minimum with f(v*) = 1;
> - when I manually load A, B, C and D from disk and calculate f(v*) on
> the exact same machine with the same libraries and environment
> variables, I get f(v*) = 1000.
>
> This is very confusing behavior. In theory the size of X should not
> affect my problem, but it seems that things get unstable as the
> dimension grows.
> The main issue for debugging is that g(X) for simulation #3 takes two
> hours to run, and I am completely lost on how I could find the causes
> of the problem. Would you have any general advice?
>
> Thank you very much in advance for literally any suggestions you might
> have!
>
> Best regards,
> Arthur

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.