Dear all,

A long-standing feature request (or bug fix, depending on your
perspective ;) has been resolved in relax.  This is the problem of the
full_analysis.py script looping forever.  If you have encountered this
problem, I would be extremely grateful if you could test whether the
fix works.  Cheers!  More information is given in the commit
message:

------------------------------------------------------------------------
r11208 | bugman | 2010-05-20 09:24:42 +0200 (Thu, 20 May 2010) | 16 lines
Changed paths:
   M /1.3/auto_analyses/dauvergne_protocol.py

Bug fix for catching the looping around the universal solution in the
dauvergne_protocol module.

This was first identified by Seb in the post at
https://mail.gna.org/public/relax-users/2007-09/msg00010.html
(Message-id: <[email protected]>).

The problem was the automatic looping over optimisation cycles in the
full_analysis.py script.  This code is now in the dauvergne_protocol
auto_analyses module.  The issue was a never-ending looping around the
universal solution (the optimisation minimum combined with Occam's
razor or the model selection minimum).  This should now be caught and
the protocol terminated at the end of the first completed loop.  The
fix was to compare the chi2 value, the selected models, the diffusion
tensor, and the model-free parameters of the current iteration against
those of every previous iteration, starting from the first.
This will not be optimal if the protocol is interrupted in the middle
of one such loop, but will just mean that a few extra iterations will
be required to complete the loop.

------------------------------------------------------------------------
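
To make the cycle detection concrete, here is a minimal, self-contained
Python sketch of the idea.  The names and numbers are hypothetical, and
this is not the actual dauvergne_protocol code:

def optimisation_cycle(i):
    """Stand-in for one full optimisation iteration of the protocol.

    From iteration 5 onwards this alternates between two fake solutions,
    mimicking the model 5 <-> model 6 interchange reported below.
    """
    if i < 5:
        return ('unique_models_%i' % i, 9000.0 - 100*i)
    if i % 2:
        return ('models_A', 8761.0001889287)
    return ('models_B', 8744.6870170286)

# Store the state of every completed iteration, and terminate as soon as
# the current state matches any previous one (the universal solution).
history = []
i = 0
while True:
    state = optimisation_cycle(i)
    if state in history:
        break
    history.append(state)
    i += 1

print("Protocol terminated after %i iterations." % i)

The real check compares the chi2 value, the selected models, the
diffusion tensor, and all model-free parameter values, but the
cycle-detection principle is identical.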

Obtaining this new code requires the program Subversion to be
installed.  If installed, the new code can be checked out from the
repository by typing:

$ svn co svn://svn.gna.org/svn/relax/1.3 relax-1.3

or if this doesn't work:

$ svn co http://svn.gna.org/svn/relax/1.3 relax-1.3

If you already have a checked-out copy, try typing:

$ svn up

in the base directory to update to the most current version.  Running
the file 'relax' in the new 'relax-1.3' directory is sufficient.  The
new full_analysis.py sample script will have to be used instead of
your old scripts.  Your help in checking this would be very much
appreciated, as I have no test cases to check against.
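
For example, assuming the updated full_analysis.py sample script has
been copied into the current working directory and edited for your
system, something like the following should run the new protocol:

$ ./relax-1.3/relax full_analysis.py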

Cheers,

Edward



On 17 September 2007 18:09, Sebastien Morin <[email protected]> wrote:
> Hi Ed,
>
> First, there were some bad assignments in my data set. I used the
> automatic assignment procedure within NMRPipe (which takes an assigned
> peak list and propagates it to other peak lists) for the first time, and
> some peaks were badly assigned.
>
> Second, the PDB file is quite good as it is a representative conformation
> from a 60 ns MD simulation using CHARMM. That said, the protein moves in the
> simulation and, hence, the orientations also change. I could take another
> conformation, which is what I'll do to cross-validate my models, but
> nevertheless the orientations will change and subtle differences will
> appear. This shouldn't be an issue, since the vectors that move a lot in
> the simulation should have correlated relaxation properties, and that
> should be visible in the models chosen.
>
> Third, here are the stats for the ellipsoid optimization:
>
> round  t_total_(h)  t_opt_(h)  iter_opt  model_change  tm      a     b      g     chi2                comments
> =====  ===========  =========  ========  ============  ======  ====  =====  ====  ==================  ======================
>   1    146          144        207       ---           12.423  18.8  159.7  99.1  9282.2280010132217  ok
>   2     49           47         62       215           12.463  74.7  152.0  94.3  8793.0777454789404  ok
>   3     16           14         19        16           12.448  78.0  152.3  96.9  8767.5325004348124  ok
>   4     12           10         13         1           12.445  80.2  151.9  97.9  8765.5659442063006  ok
>   5     19           17         23         2           12.445  83.1  151.7  98.3  8761.0001889287214  ok
>   6     25           23         27         1           12.452  80.9  151.4  96.2  8744.6870170285692  ok
>   7     16           14         19         1           12.445  83.1  151.7  98.3  8761.0001889287269  almost_5
>   8     25           23         28         1           12.452  80.9  151.4  96.2  8744.6870170285729  almost_6
>   9     14           12         17         1           12.445  83.1  151.7  98.3  8761.0001889287269  almost_5_and_exactly_7
>  10     29           27         33         1           12.452  80.9  151.4  96.2  8744.6870170285656  almost_6_and_8
>  11    stopped...
>
> As you can see, there is a kind of interchange between two solutions at
> the end of the optimization. In fact, from iteration 5 on, there is only
> one residue for which the model changes, and it's always the same one. It
> flips from model 5 to 6 and back from 6 to 5... with model 6 it has a tf
> of ~17, a ts of ~25000 and an S2 of ~0.73 (chi2 ~40 in the aic file, but
> there with ts ~1200), while with model 5 it has a ts of ~650 and an S2 of
> ~0.78 (chi2 ~50 in the aic file). How come such a high ts (25000) isn't
> eliminated..?
>
> round   AIC_or_OPT  model   S2    S2f   S2s   tf      ts      chi2
> =====   ==========  =====   ===   ====  ====  ======  ======  =========
>  9      AIC         5       0.78  0.96  0.81  None      698   52
> 10      AIC         6       0.78  0.97  0.80  11.2     1173   39
>  9      OPT         5       0.78  0.96  0.81  None      630   ---
> 10      OPT         6       0.73  0.93  0.79  16.8    24904   ---
>
>
> Fourth, the previous runs were made on 4 different computers which give
> almost exactly the same calculation times, differing by maybe 10-15%...
> This shouldn't be what's causing those extremely long times...
>
> Fifth, I used the default algorithm within the full_analysis.py script.
> How can I change the optimization algorithm so it's a two-stage procedure
> like you proposed? Should I run several times with MIN_ALGOR = 'simplex'
> and, after a few runs (maybe when the chi2 and the number of iterations
> reach a plateau), switch to MIN_ALGOR = 'newton'?
>
> I think that's almost everything I can find now...
>
> Let me know if you know how to catch those problems before they appear...
>
> Cheers
>
>
> Séb  :)
>
>
>
>
>
> Edward d'Auvergne wrote:
>
> Hi,
>
> I've been trying to think of what could possibly be causing these
> really long times, but I'm really not sure what is happening.
> Unfortunately there just was not enough information in the post to
> decipher the key to this problem.  Is there something special about
> those 7 residues?  How accurate do you think their orientations are in
> the PDB file you are using?  And how accurate is the PDB file itself
> with respect to all parts of the system?
>
> Have you had a chance to investigate further as to what the issue
> might be?  For example, which part of the calculation is taking the
> time?  Is it the global optimisation of all parameters?  Are the final
> results of each round similar or completely different (selected
> model-wise and parameter value-wise)?  How do the iteration numbers
> compare at each stage?  Essentially, a fine analysis and comparison of
> the results files and the printout from relax will be necessary to track
> down this abnormal computation time.  Oh, are you running these on the
> same computer as the previous analysis?
>
> As for the optimisation algorithm being stuck, if you've used the
> default algorithm then this shouldn't happen.  Optimisation should
> terminate.  There are certain very rare situations where the algorithm
> known as the GMW Hessian modification, which is used by default as a
> subalgorithm by the Newton algorithm in relax, can take large amounts
> of time to complete.  You'll see this as an increase in the number of
> iterations by 4 to 5 orders of magnitude.  One way to test this is to
> use a lower quality optimisation algorithm first and then complete to
> high precision with the Newton algorithm.  In this case I would use
> simplex first followed by the default Newton algorithm and its default
> subalgorithms.  In all cases constraints should be used.  This will
> only solve the long computation times if the GMW algorithm is at
> fault.
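>
> Just as a rough sketch (treat the exact arguments as hypothetical and
> check them against the minimise user function documentation in the relax
> manual), the two stages in a relax script would look something like:
>
> # Stage 1: robust but low precision downhill simplex minimisation,
> # with constraints (hypothetical argument list).
> minimise('simplex', constraints=True)
>
> # Stage 2: refine to high precision with the default Newton algorithm
> # and its default subalgorithms.
> minimise('newton', constraints=True)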
>
> Regards,
>
> Edward
>
>
> On 9/4/07, Sebastien Morin <[email protected]> wrote:
>
>
> Hi all,
>
> I am using the full_analysis.py script with data at three magnetic fields.
>
> After a first complete cycle (going through the final optimization), I
> realized that a few residues had extremely high chi-squared values (>
> 1000) no matter the diffusion model or model-free model chosen...
>
> So I removed those residues (7 out of 222) and started the full_analysis
> protocol again.
>
> However, the optimization times are now extremely long and I should get
> the final results in weeks...
>
>
> Here are the available times and chi-squared values (for local_tm,
> sphere and ellipsoid):
>
> Diffusion_model  Round  Time_before_(N=222)  X2_before  Time_now_(N=215)  X2_now   Comments
> ===============  =====  ===================  =========  ================  =======  =========================
> local_tm         ---    12h30                45949      14h30             5802     OK, X2 much smaller
> sphere           init   ---                  1154338    ---               249255
>                  1      2h30                 65654      36h00             10303    Long, but X2 much smaller
>                  2      2h30                 65654      > 30h00
> ellipsoid        init   ---                  753535     ---               177764
>                  1      4h00                 64592      > 67h00           ??
>                  2      2h30                 64592      not_there_yet
>
> Is it possible that the algorithms get stuck somewhere during the
> optimization..?
>
> I thought that removing badly fitting residues would, on the contrary,
> speed up the calculations...
>
> Thanks for any ideas!
>
>
> Sébastien  :)
>
>
> --
>          ______________________________________
>      _______________________________________________
>     |                                               |
>    || Sebastien Morin                               ||
>   ||| Etudiant au PhD en biochimie                  |||
>  |||| Laboratoire de resonance magnetique nucleaire ||||
> ||||| Dr Stephane Gagne                             |||||
>  |||| CREFSIP (Universite Laval, Quebec, CANADA)    ||||
>   ||| 1-418-656-2131 #4530                          |||
>    ||                                               ||
>     |_______________________________________________|
>          ______________________________________
>

_______________________________________________
relax (http://nmr-relax.com)

This is the relax-users mailing list
[email protected]

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-users
