One thought: who ELSE is using your cluster right now? Maybe another task
is misbehaving with system-level memory access? I have seen some weird
things happen when multiple heavy-duty processes *should* have been playing
nice, and that also silenced the canaries in my scripts.
One useful thing (which your sysadmin may already be doing!) is to use
rrdtool to log/watch system level behavior and the behavior of your task as
it runs. Sometimes interesting anomalies can pinpoint a problem you didn't
see before.
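If rrdtool isn't set up, a rough stand-in for the same idea is to sample
memory yourself while the fit runs. This is only a sketch and not rrdtool
itself: it assumes a Linux box where /proc/meminfo exists, and the names
parse_meminfo and log_memory are made up for illustration.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict.

    Values are the leading integers on each line (kB for most fields).
    """
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def log_memory(path="/proc/meminfo", interval=5.0, samples=3, out=print):
    """Print available memory and free swap every `interval` seconds.

    Assumes a Linux-style meminfo file at `path` (hypothetical default).
    """
    for _ in range(samples):
        with open(path) as fh:
            info = parse_meminfo(fh.read())
        stamp = time.strftime("%H:%M:%S")
        out(f"{stamp} MemAvailable={info.get('MemAvailable')} kB "
            f"SwapFree={info.get('SwapFree')} kB")
        time.sleep(interval)
```

Running log_memory() in a second terminal while the parallel fit is going
should show whether available memory collapses or swap starts draining
right before the hang.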
Does this happen with other regressors (SGD specifically)? Or only Ridge?
Another idea, that totally avoids this diagnosis procedure, would be to do
PCA or another dimensionality reduction (if you haven't already done so) to
reduce the 800 dimensions down to ~100, which seemed to work well for you.
You do lose some information, but you could probably look at the explained
variance to make sure that the 100 reduced components describe a large
amount (ideally 90+%) of your variance.
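A minimal sketch of that check, assuming scikit-learn's PCA and a synthetic
800-column matrix standing in for your data (the shapes, noise level, and
variable names here are invented for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the real data: 800 features driven by 50 latent
# factors plus a little noise. Replace X with your own matrix.
rng = np.random.RandomState(0)
n_samples, n_features, n_latent = 500, 800, 50
latent = rng.normal(size=(n_samples, n_latent))
mixing = rng.normal(size=(n_latent, n_features))
X = latent @ mixing + 0.1 * rng.normal(size=(n_samples, n_features))

# Reduce 800 dimensions to 100 and check how much variance survives.
pca = PCA(n_components=100)
X_reduced = pca.fit_transform(X)
explained = pca.explained_variance_ratio_.sum()

print(X_reduced.shape)
print(f"variance explained by 100 components: {explained:.3f}")
```

If the explained fraction on your real data comes out well below 0.9, 100
components is probably too aggressive and it is worth trying a larger number.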
That said, at 94GB already, buying more RAM is probably not the choice :)
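To put rough numbers on the dtype suggestion from earlier in the thread,
here is a back-of-the-envelope sketch. The row count and n_jobs are made up,
and the "one full copy per worker" assumption is a worst case; joblib may
share large arrays between workers in some configurations, so treat this as
an upper bound rather than a measurement.

```python
import numpy as np


def estimate_total_gb(n_rows, n_cols, dtype, n_jobs):
    """Worst-case footprint in GB if each of n_jobs workers holds a full copy."""
    bytes_per_copy = n_rows * n_cols * np.dtype(dtype).itemsize
    return n_jobs * bytes_per_copy / 1e9


# Casting an existing array halves its size in memory.
X64 = np.zeros((1000, 800), dtype=np.float64)
X32 = X64.astype(np.float32)
print(X64.nbytes, X32.nbytes)

# Hypothetical job: 1,000,000 rows x 800 columns, 16 parallel workers.
for dtype in (np.float64, np.float32):
    total = estimate_total_gb(1_000_000, 800, dtype, n_jobs=16)
    print(np.dtype(dtype).name, total, "GB")
```

With those invented numbers, 16 float64 copies come to about 102 GB, which
is already past 94 GB, while float32 comes to about 51 GB.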
On Tue, May 27, 2014 at 5:16 PM, Chris Holdgraf <[email protected]> wrote:
> So, the strange thing about this is that I've definitely run regressions
> with larger matrices in the past, and haven't had issues before. This is on
> a cluster with ~94 gigs of ram, and in the past I've exceeded this limit
> and it has usually thrown an error (one of our sysadmin's scripts), not
> silently hung.
>
> Chris
>
> From: Kyle Kastner <[email protected]>
>> To: [email protected]
>> Cc:
>> Date: Tue, 27 May 2014 15:48:20 -0500
>> Subject: Re: [Scikit-learn-general] Anyone experience hanging when
>> parallelizing fits?
>> What is your overall memory usage like when this happens? Sounds like
>> classic memory swapping/thrashing to me - what are your system specs?
>> One quick thing to try might be to change the dtype of the matrices to
>> save some space. float32 vs float64 can make a large memory difference if
>> you don't need double precision. Also as far as I know, sklearn/joblib
>> doesn't do any kind of scheduling or optimization based on available
>> resources, though someone may correct me here. This means that if the
>> memory required to run n jobs is much greater than your system memory,
>> very bad things (TM) will happen.
>>
>>
>> --
>> _____________________________________
>> PhD Candidate in Neuroscience | UC Berkeley <http://hwni.org/>
>> Editor and Web Master | Berkeley Science Review
>> <http://sciencereview.berkeley.edu/>
>> _____________________________________
>>
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>