Re: [julia-users] Re: Double free or corruption (out)

2016-06-12 Thread Nils Gudat
So it looks like I'm having the same issue - have been running the code 
without parallelization (defining my SharedArrays as regular ones), and it 
has now been going for about 3 days without any segfaults. Is this a known 
issue? If so, do we know whether there's a Julia version one can revert to 
in which SharedArrays work?


Re: [julia-users] Re: Double free or corruption (out)

2016-06-02 Thread Nils Gudat
Hm, interesting observation... I suppose the issue in my case is that the 
code as it is takes about 3-4 days to complete, so running it on 1 instead 
of 15 cores means I'm unlikely to ever get my PhD!
I will at least try to run a shorter version that might be solvable in a 
day or two without parallel.



Re: [julia-users] Re: Double free or corruption (out)

2016-06-02 Thread Andrew
Have you tried running the code without using parallel? I have been getting 
similar errors in my economics code. It segfaults sometimes, though not 
always, after a seemingly random amount of time, sometimes an hour or so, 
sometimes less. However, I don't recall it having ever occurred in the 
times I've run it without parallel. I'm using SharedArrays like you. I've 
seen this occur on both 0.4.1 and 0.4.5.

The error isn't too serious for me because I periodically save the 
optimization state to disk, so I can just restart.
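
A minimal sketch of that save-and-restart pattern, for illustration only. It assumes a current Julia release where `Serialization` is a standard library (not the 0.4.x versions discussed in this thread). The helper names `save_checkpoint`, `load_checkpoint`, and `run_with_checkpoints` are hypothetical, and the halving loop merely stands in for a real optimization.

```julia
using Serialization  # assumption: Julia 1.x standard library, not available under this name on 0.4.x

# Hypothetical helper: write the state to a temporary file first, then rename,
# so a crash in the middle of a save cannot leave a truncated checkpoint behind.
function save_checkpoint(path::AbstractString, state)
    tmp = path * ".tmp"
    serialize(tmp, state)
    mv(tmp, path; force=true)
    return path
end

# Hypothetical helper: return the saved state, or `default` if none exists yet.
load_checkpoint(path::AbstractString, default) =
    isfile(path) ? deserialize(path) : default

# Toy stand-in for a long optimization: halve `x` once per iteration,
# saving every `every` iterations and resuming from the last save on restart.
function run_with_checkpoints(path::AbstractString; n_iter::Int=100, every::Int=10)
    state = load_checkpoint(path, (iter=0, x=1.0))
    x = state.x
    for iter in (state.iter + 1):n_iter
        x /= 2
        iter % every == 0 && save_checkpoint(path, (iter=iter, x=x))
    end
    return x
end
```

After a crash, calling `run_with_checkpoints` again with the same path resumes from the last saved iteration, so at most `every` iterations of work are lost.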

I also can't remember this ever occurring on my own (Linux) computer. It's 
happened on a (Linux) cluster with many cores.  


On Thursday, June 2, 2016 at 3:45:24 AM UTC-4, Nils Gudat wrote:
>
> Fair enough. Does anyone have any clues as to how I would go about 
> investigating this? As has been said before, the stacktraces aren't very 
> helpful for segfaults, so how do I figure out what's going wrong here?
>


Re: [julia-users] Re: Double free or corruption (out)

2016-06-02 Thread Nils Gudat
Fair enough. Does anyone have any clues as to how I would go about 
investigating this? As has been said before, the stacktraces aren't very 
helpful for segfaults, so how do I figure out what's going wrong here?


Re: [julia-users] Re: Double free or corruption (out)

2016-06-01 Thread 'Bill Hart' via julia-users
I've checked that the problem we were having doesn't happen with Julia
0.4.5 on Travis. In fact, it also doesn't happen on another one of our
systems with Julia 0.4.5, so at this stage we have no idea what the problem
is. It may be totally unrelated to the problem you are having.

Bill.

On 31 May 2016 at 13:25, Bill Hart wrote:

> We are also suddenly getting crashes with 0.4.5 when running our (Nemo)
> test suite. It says that some memory allocation is failing due to invalid
> next size. I suspect there is a bug that wasn't there until the last few
> days, since we were passing just fine on Travis. Though at this stage, I
> haven't checked whether we are still passing on Travis.
>
> Bill.
>
> On 31 May 2016 at 12:52, Nils Gudat wrote:
>
>> Resurrecting this very old thread - after having been able to solve the
>> model with no seg faults over the last couple of months, they have now
>> returned and occur much faster (usually within 2 hours of running the code).
>> I have refactored the code a little so that it hopefully will be possible
>> for others to run it. Cloning the entire repo at
>> http://github.com/nilshg/LearningModels, it should run when altering the
>> path in
>> https://github.com/nilshg/LearningModels/blob/master/NHL/NHL_maximize.jl
>> to whatever path it has been cloned to.
>>
>> I'm running this code on a 16-core Ubuntu 14.04 machine with Julia 0.4.5
>> installed and all packages on the latest tagged versions.
>>
>> On Tuesday, September 29, 2015 at 1:43:31 PM UTC+1, Nils Gudat wrote:
>>>
>>> The code usually segfaults after 2-5 hours, and is available at
>>> http://github.com/nilshg/LearningModels, however I haven't written it
>>> up in a way that is easy to run (right now it depends on some data not
>>> included in the repo), so I'll have to restructure a bit before you can run
>>> it. I'll try to do so today if I find the time.
>>>
>>
>


Re: [julia-users] Re: Double free or corruption (out)

2016-05-31 Thread 'Bill Hart' via julia-users
We are also suddenly getting crashes with 0.4.5 when running our (Nemo)
test suite. It says that some memory allocation is failing due to invalid
next size. I suspect there is a bug that wasn't there until the last few
days, since we were passing just fine on Travis. Though at this stage, I
haven't checked whether we are still passing on Travis.

Bill.

On 31 May 2016 at 12:52, Nils Gudat wrote:

> Resurrecting this very old thread - after having been able to solve the
> model with no seg faults over the last couple of months, they have now
> returned and occur much faster (usually within 2 hours of running the code).
> I have refactored the code a little so that it hopefully will be possible
> for others to run it. Cloning the entire repo at
> http://github.com/nilshg/LearningModels, it should run when altering the
> path in
> https://github.com/nilshg/LearningModels/blob/master/NHL/NHL_maximize.jl
> to whatever path it has been cloned to.
>
> I'm running this code on a 16-core Ubuntu 14.04 machine with Julia 0.4.5
> installed and all packages on the latest tagged versions.
>
> On Tuesday, September 29, 2015 at 1:43:31 PM UTC+1, Nils Gudat wrote:
>>
>> The code usually segfaults after 2-5 hours, and is available at
>> http://github.com/nilshg/LearningModels, however I haven't written it up
>> in a way that is easy to run (right now it depends on some data not
>> included in the repo), so I'll have to restructure a bit before you can run
>> it. I'll try to do so today if I find the time.
>>
>


Re: [julia-users] Re: Double free or corruption (out)

2016-05-31 Thread Nils Gudat
Resurrecting this very old thread - after having been able to solve the 
model with no seg faults over the last couple of months, they have now 
returned and occur much faster (usually within 2 hours of running the code).
I have refactored the code a little so that it hopefully will be possible 
for others to run it. Cloning the entire repo at 
http://github.com/nilshg/LearningModels, it should run when altering the 
path 
in https://github.com/nilshg/LearningModels/blob/master/NHL/NHL_maximize.jl 
to whatever path it has been cloned to.

I'm running this code on a 16-core Ubuntu 14.04 machine with Julia 0.4.5 
installed and all packages on the latest tagged versions.

On Tuesday, September 29, 2015 at 1:43:31 PM UTC+1, Nils Gudat wrote:
>
> The code usually segfaults after 2-5 hours, and is available at 
> http://github.com/nilshg/LearningModels, however I haven't written it up 
> in a way that is easy to run (right now it depends on some data not 
> included in the repo), so I'll have to restructure a bit before you can run 
> it. I'll try to do so today if I find the time.
>


Re: [julia-users] Re: Double free or corruption (out)

2015-09-26 Thread Yichao Yu
On Sat, Sep 26, 2015 at 1:07 PM, Nils Gudat wrote:
> That's the problem I alluded to in my question: This happened in the middle
> of a very lengthy minimization problem, which had been running for a couple
> of hours. On a previous run, a very similar version of the code finished
> successfully after about 10 hours. I was hoping that someone could at least
> tell me what this error message is about, it seems to be Linux-related and I
> have no clue what's going on.


The error message means that something corrupted the memory. The most
likely cause I've seen is an incorrectly used ccall (or other unsafe
memory stores).
What packages are you using? Do you at least have a list of the ones
that use ccall?
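
To illustrate how a misused ccall can lead to this kind of error: "double free or corruption" and "invalid next size" are abort messages from glibc's allocator, printed when it finds its own bookkeeping damaged. One common way to damage it is to let C code write past the end of a buffer. The sketch below shows a guarded call to libc's `memcpy`. It assumes a current Julia (`Ptr{Cvoid}` was spelled `Ptr{Void}` on 0.4.x), and `checked_memcpy!` is a hypothetical name, not part of any package mentioned here.

```julia
# Hypothetical helper: copy `src` into `dst` through libc's memcpy.
function checked_memcpy!(dst::Vector{Float64}, src::Vector{Float64})
    nbytes = sizeof(src)
    # Without this check, a too-small `dst` would let memcpy write past the
    # end of the buffer and overwrite whatever the allocator stored next to it.
    nbytes <= sizeof(dst) || throw(ArgumentError("destination too small"))
    # Arrays passed as ccall arguments are kept alive for the duration of the call.
    ccall(:memcpy, Ptr{Cvoid}, (Ptr{Cvoid}, Ptr{Cvoid}, Csize_t), dst, src, nbytes)
    return dst
end
```

The damage from an unchecked write of this sort is typically only detected later, on some unrelated allocation or free, which would be consistent with crashes that appear after a seemingly random amount of time.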


Re: [julia-users] Re: Double free or corruption (out)

2015-09-26 Thread Nils Gudat
The minimization itself is NLopt, the problem is to solve an economic model 
(which takes around 2 minutes to solve on 16 cores) and compare its output 
(a 100x4 Float64 Array) to some data moments. The model results depend on 
two parameters. The model itself is mostly minimization (via Optim) and 
numerical integration (using FastGaussQuadrature), and is parallelized via 
SharedArrays. 

(Since you asked for a list of packages, I'm also using ApproXD for linear 
interpolation, and Distributions to draw from a bivariate Normal).


Re: [julia-users] Re: Double free or corruption (out)

2015-09-26 Thread Yichao Yu
On Sat, Sep 26, 2015 at 2:37 PM, Nils Gudat wrote:
> The minimization itself is NLopt, the problem is to solve an economic model
> (which takes around 2 minutes to solve on 16 cores) and compare its output
> (a 100x4 Float64 Array) to some data moments. The model results depend on
> two parameters. The model itself is mostly minimization (via Optim) and
> numerical integration (using FastGaussQuadrature), and is parallelized via
> SharedArrays.
>
> (Since you asked for a list of packages, I'm also using ApproXD for linear
> interpolation, and Distributions to draw from a bivariate Normal).

Looks like there's at least one segfault in NLopt (AppVeyor Nightly
Win32) and I can reproduce locally with aggressive GC. Will
investigate.


Re: [julia-users] Re: Double free or corruption (out)

2015-09-26 Thread Yichao Yu
> Looks like there's at least one segfault in NLopt (AppVeyor Nightly
> Win32) and I can reproduce locally with aggressive GC. Will
> investigate.

Fixed in https://github.com/JuliaLang/julia/pull/13325
I have no idea whether it is the same segfault/corruption that you are
seeing, or the one on AppVeyor, though.