Re: [julia-users] Re: Double free or corruption (out)
So it looks like I'm having the same issue: I have been running the code without parallelization (defining my SharedArrays as regular Arrays), and it has now been going for about 3 days without any segfaults. Is this a known issue? If so, do we know whether there's a Julia version one can revert to in which SharedArrays work?
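For anyone wanting to try the same comparison, here is a minimal sketch of switching between SharedArrays and regular Arrays with a single flag. It assumes current Julia 1.x syntax rather than 0.4, and the names `USE_SHARED` and `allocate` are made up for illustration; they are not from the LearningModels code.

```julia
using SharedArrays  # standard library in Julia 1.x

# Hypothetical switch: set to false to run the same code on plain Arrays.
const USE_SHARED = true

# Allocate a zero-filled SharedArray or regular Array of the given shape.
function allocate(::Type{T}, dims::Dims; shared::Bool = USE_SHARED) where {T}
    A = shared ? SharedArray{T}(dims) : Array{T}(undef, dims)
    fill!(A, zero(T))  # initialize explicitly in both cases
    return A
end

# Same call site either way; only the flag changes.
moments = allocate(Float64, (100, 4); shared = false)
```

Keeping the allocation behind one function means the rest of the model code does not need to change when testing whether the crashes follow the SharedArrays.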
Re: [julia-users] Re: Double free or corruption (out)
Hm, interesting observation... I suppose the issue in my case is that the code as it is takes about 3-4 days to complete, so running it on 1 instead of 15 cores means I'm unlikely to ever get my PhD! I will at least try to run a shorter version that might be solvable in a day or two without parallel.
Re: [julia-users] Re: Double free or corruption (out)
Have you tried running the code without using parallel? I have been getting similar errors in my economics code. It segfaults sometimes, though not always, after a seemingly random amount of time: sometimes an hour or so, sometimes less. However, I don't recall it ever having occurred when I've run it without parallel. I'm using SharedArrays like you. I've seen this occur on both 0.4.1 and 0.4.5.

The error isn't too serious for me because I periodically save the optimization state to disk, so I can just restart. I also can't remember this ever occurring on my own (Linux) computer. It's happened on a (Linux) cluster with many cores.

On Thursday, June 2, 2016 at 3:45:24 AM UTC-4, Nils Gudat wrote:
> Fair enough. Does anyone have any clues as to how I would go about
> investigating this? As has been said before, the stacktraces aren't very
> helpful for segfaults, so how do I figure out what's going wrong here?
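The "save the optimization state to disk and restart" workaround described above could look roughly like this. It is only a sketch under assumptions: it uses the Serialization standard library from Julia 1.x (not 0.4), and the file name, the state layout, and the helpers `save_state` and `load_state` are hypothetical.

```julia
using Serialization  # standard library in Julia 1.x

# Hypothetical checkpoint location; not taken from the thread.
const CHECKPOINT = joinpath(tempdir(), "optim_state.jls")

# Write to a temporary file first and then rename it, so that a crash in the
# middle of a write cannot leave a truncated checkpoint behind.
function save_state(path::AbstractString, state)
    tmp = path * ".tmp"
    serialize(tmp, state)
    mv(tmp, path; force = true)
    return path
end

# Resume from the checkpoint if one exists, otherwise start from `default`.
load_state(path::AbstractString, default) =
    isfile(path) ? deserialize(path) : default
```

A long run would call `load_state(CHECKPOINT, initial_state)` once at startup and `save_state(CHECKPOINT, state)` every few iterations, so a segfault only costs the work done since the last save.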
Re: [julia-users] Re: Double free or corruption (out)
Fair enough. Does anyone have any clues as to how I would go about investigating this? As has been said before, the stacktraces aren't very helpful for segfaults, so how do I figure out what's going wrong here?
Re: [julia-users] Re: Double free or corruption (out)
I've checked that the problem we were having doesn't happen with Julia 0.4.5 on Travis. In fact, it also doesn't happen on another one of our systems with Julia 0.4.5, so at this stage we have no idea what the problem is. It may be totally unrelated to the problem you are having.

Bill.

On 31 May 2016 at 13:25, Bill Hart wrote:
> We are also suddenly getting crashes with 0.4.5 when running our (Nemo)
> test suite. It says that some memory allocation is failing due to invalid
> next size. I suspect there is a bug that wasn't there until the last few
> days, since we were passing just fine on Travis. Though at this stage, I
> haven't checked whether we are still passing on Travis.
>
> Bill.
>
> On 31 May 2016 at 12:52, Nils Gudat wrote:
>> Resurrecting this very old thread - after having been able to solve the
>> model with no seg faults over the last couple of months, they have now
>> returned and occur much faster (usually within 2 hours of running the code).
>> I have refactored the code a little so that it hopefully will be possible
>> for others to run it. Cloning the entire repo at
>> http://github.com/nilshg/LearningModels, it should run when altering the
>> path in
>> https://github.com/nilshg/LearningModels/blob/master/NHL/NHL_maximize.jl
>> to whatever path it has been cloned to.
>>
>> I'm running this code on a 16-core Ubuntu 14.04 machine with Julia 0.4.5
>> installed and all packages on the latest tagged versions.
>>
>> On Tuesday, September 29, 2015 at 1:43:31 PM UTC+1, Nils Gudat wrote:
>>> The code usually segfaults after 2-5 hours, and is available at
>>> http://github.com/nilshg/LearningModels, however I haven't written it
>>> up in a way that is easy to run (right now it depends on some data not
>>> included in the repo), so I'll have to restructure a bit before you can run
>>> it. I'll try to do so today if I find the time.
Re: [julia-users] Re: Double free or corruption (out)
We are also suddenly getting crashes with 0.4.5 when running our (Nemo) test suite. It says that some memory allocation is failing due to invalid next size. I suspect there is a bug that wasn't there until the last few days, since we were passing just fine on Travis. Though at this stage, I haven't checked whether we are still passing on Travis.

Bill.

On 31 May 2016 at 12:52, Nils Gudat wrote:
> Resurrecting this very old thread - after having been able to solve the
> model with no seg faults over the last couple of months, they have now
> returned and occur much faster (usually within 2 hours of running the code).
> I have refactored the code a little so that it hopefully will be possible
> for others to run it. Cloning the entire repo at
> http://github.com/nilshg/LearningModels, it should run when altering the
> path in
> https://github.com/nilshg/LearningModels/blob/master/NHL/NHL_maximize.jl
> to whatever path it has been cloned to.
>
> I'm running this code on a 16-core Ubuntu 14.04 machine with Julia 0.4.5
> installed and all packages on the latest tagged versions.
>
> On Tuesday, September 29, 2015 at 1:43:31 PM UTC+1, Nils Gudat wrote:
>> The code usually segfaults after 2-5 hours, and is available at
>> http://github.com/nilshg/LearningModels, however I haven't written it up
>> in a way that is easy to run (right now it depends on some data not
>> included in the repo), so I'll have to restructure a bit before you can run
>> it. I'll try to do so today if I find the time.
Re: [julia-users] Re: Double free or corruption (out)
Resurrecting this very old thread - after having been able to solve the model with no seg faults over the last couple of months, they have now returned and occur much faster (usually within 2 hours of running the code). I have refactored the code a little so that it hopefully will be possible for others to run it. Cloning the entire repo at http://github.com/nilshg/LearningModels, it should run when altering the path in https://github.com/nilshg/LearningModels/blob/master/NHL/NHL_maximize.jl to whatever path it has been cloned to.

I'm running this code on a 16-core Ubuntu 14.04 machine with Julia 0.4.5 installed and all packages on the latest tagged versions.

On Tuesday, September 29, 2015 at 1:43:31 PM UTC+1, Nils Gudat wrote:
> The code usually segfaults after 2-5 hours, and is available at
> http://github.com/nilshg/LearningModels, however I haven't written it up
> in a way that is easy to run (right now it depends on some data not
> included in the repo), so I'll have to restructure a bit before you can run
> it. I'll try to do so today if I find the time.
Re: [julia-users] Re: Double free or corruption (out)
On Sat, Sep 26, 2015 at 1:07 PM, Nils Gudat wrote:
> That's the problem I alluded to in my question: This happened in the middle
> of a very lengthy minimization problem, which had been running for a couple
> of hours. On a previous run, a very similar version of the code finished
> successfully after about 10 hours. I was hoping that someone could at least
> tell me what this error message is about, it seems to be Linux-related and I
> have no clue what's going on.

The error message means that something corrupted the memory. The most likely cause I've seen is an incorrectly used ccall (or other unsafe memory stores). What packages are you using? Do you at least have a list of the ones that use ccall?
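To illustrate what a carefully used ccall looks like, here is a small sketch in current Julia 1.x syntax (not 0.4). It is a generic example, not code from any of the packages mentioned in this thread, and the function name `checked_memcpy!` is made up. The two usual ways to corrupt memory from a ccall are writing past the end of a buffer and letting the garbage collector free an object while C still holds a raw pointer to it; the sketch guards against both.

```julia
# Copy the first `n` Float64 values of `src` into `dst` using libc's memcpy.
function checked_memcpy!(dst::Vector{Float64}, src::Vector{Float64}, n::Integer)
    # Bounds check: C will happily write past the end of `dst`, and that kind
    # of store is what silently corrupts the heap.
    0 <= n <= min(length(dst), length(src)) ||
        throw(ArgumentError("n = $n exceeds the length of dst or src"))
    # Keep both arrays alive while C works on raw pointers into them.
    GC.@preserve dst src begin
        ccall(:memcpy, Ptr{Cvoid}, (Ptr{Cvoid}, Ptr{Cvoid}, Csize_t),
              pointer(dst), pointer(src), n * sizeof(Float64))
    end
    return dst
end
```

Dropping either the bounds check or the `GC.@preserve` would still appear to work most of the time, which matches the pattern of crashes that only show up after hours of running.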
Re: [julia-users] Re: Double free or corruption (out)
The minimization itself is NLopt, the problem is to solve an economic model (which takes around 2 minutes to solve on 16 cores) and compare its output (a 100x4 Float64 Array) to some data moments. The model results depend on two parameters. The model itself is mostly minimization (via Optim) and numerical integration (using FastGaussQuadrature), and is parallelized via SharedArrays. (Since you asked for a list of packages, I'm also using ApproXD for linear interpolation, and Distributions to draw from a bivariate Normal).
Re: [julia-users] Re: Double free or corruption (out)
On Sat, Sep 26, 2015 at 2:37 PM, Nils Gudat wrote:
> The minimization itself is NLopt, the problem is to solve an economic model
> (which takes around 2 minutes to solve on 16 cores) and compare its output
> (a 100x4 Float64 Array) to some data moments. The model results depend on
> two parameters. The model itself is mostly minimization (via Optim) and
> numerical integration (using FastGaussQuadrature), and is parallelized via
> SharedArrays.
>
> (Since you asked for a list of packages, I'm also using ApproXD for linear
> interpolation, and Distributions to draw from a bivariate Normal).

Looks like there's at least one segfault in NLopt (AppVeyor Nightly Win32) and I can reproduce locally with aggressive GC. Will investigate.
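One simple way to approximate "aggressive GC" from user code is to force a full collection after every call to the suspect function, so that an object freed too early gets noticed within seconds rather than hours. This is only a sketch in Julia 1.x syntax; the reproduction above may well have used a debug build or other GC settings instead, and `gc_stress` is a hypothetical helper name.

```julia
# Call `f` repeatedly, forcing a full garbage collection after each call.
# If `f` hands memory to C without keeping it rooted, frequent collections
# make the resulting crash much more likely to appear early.
function gc_stress(f, iterations::Integer)
    for _ in 1:iterations
        f()
        GC.gc()  # full collection
    end
    return iterations
end
```

For example, `gc_stress(() -> solve_model(params), 100)` would exercise a hypothetical `solve_model` under heavy collection pressure.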
Re: [julia-users] Re: Double free or corruption (out)
> Looks like there's at least one segfault in NLopt (AppVeyor Nightly
> Win32) and I can reproduce locally with aggressive GC. Will
> investigate.

Fixed in https://github.com/JuliaLang/julia/pull/13325

I have no idea whether it is the same segfault/corruption that you are seeing or the one on AppVeyor, though.