Re: Extremely long optimization times

Sebastien Morin Wed, 24 Oct 2007 10:48:16 -0700

Hi Ed

I didn't have time to try your tips, but they should help me out when I
try to run the full_analysis.py script again...


I'll let you know if it works well or if I still get long computation
times...

Cheers


Séb  :)



Edward d'Auvergne wrote:
> Hi,
>
> On 9/17/07, Sebastien Morin <[EMAIL PROTECTED]> wrote:
>   
>>  Hi Ed,
>>
>>  First, there were some bad assignments in my data set. I used the automatic
>> assignment procedure within NMRPipe (which takes an assigned peak list and
>> propagates it to other peak lists) for the first time, and some peaks were
>> badly assigned.
>>     
>
> Although bad assignments are a problem because of the bond vector
> orientation, their effect should be incorrect internal motions, not
> long computation times.
>
>
>   
>>  Second, the PDB file is quite good as it is a representative conformation
>> from a 60 ns MD simulation using CHARMM. That said, the protein moves in the
>> simulation and, hence, the orientations also change. I could take another
>> conformation, which is what I'll do to cross-validate my models, but
>> nevertheless the orientations will change and subtle changes will appear.
>> This shouldn't be an issue since the vectors that move a lot in the
>> simulations should have correlating relaxation properties and that should be
>> seen in the models chosen.
>>     
>
> The orientation changes should only affect the Euler angle values of
> the diffusion tensor.  Nothing else should be affected by this.  The
> internal motions of the simulation will affect the results of the
> analysis, but the overall orientation really doesn't matter unless you
> are comparing these Euler angles.
>
>
>   
>>  Third, here are the stats for the ellipsoid optimization :
>>
>>  round  t_total_(h)  t_opt_(h)  iter_opt  model_change  tm      a     b      g     chi2                comments
>>  =====  ===========  =========  ========  ============  ======  ====  =====  ====  ==================  ======================
>>   1     146          144        207       ---           12.423  18.8  159.7  99.1  9282.2280010132217  ok
>>   2      49           47         62       215           12.463  74.7  152.0  94.3  8793.0777454789404  ok
>>   3      16           14         19        16           12.448  78.0  152.3  96.9  8767.5325004348124  ok
>>   4      12           10         13         1           12.445  80.2  151.9  97.9  8765.5659442063006  ok
>>   5      19           17         23         2           12.445  83.1  151.7  98.3  8761.0001889287214  ok
>>   6      25           23         27         1           12.452  80.9  151.4  96.2  8744.6870170285692  ok
>>   7      16           14         19         1           12.445  83.1  151.7  98.3  8761.0001889287269  almost_5
>>   8      25           23         28         1           12.452  80.9  151.4  96.2  8744.6870170285729  almost_6
>>   9      14           12         17         1           12.445  83.1  151.7  98.3  8761.0001889287269  almost_5_and_exactly_7
>>  10      29           27         33         1           12.452  80.9  151.4  96.2  8744.6870170285656  almost_6_and_8
>>  11      stopped...................................
>>     
>
> Are these stats from the results in the 'opt' directories?  Can you
> possibly pinpoint where in the calculation the problem is?  One
> option is to increase the verbosity flag 'print_flag' in the
> minimise() user function.  This may help in seeing the problem.
>
>
>   
>>  As you can see, the optimization ends up alternating between two solutions.
>> In fact, from round 5 on, the model changes for only one residue, and it is
>> always the same one. It switches from model 5 to 6 and back again. With
>> model 6, it has a tf of ~17, a ts of ~25000 and an S2 of ~0.73 (chi2 ~40 in
>> the aic file, but there with ts ~1200). With model 5, it has a ts of ~650 and
>> an S2 of ~0.78 (chi2 ~50 in the aic file). How come such a high ts (25000)
>> isn't eliminated?
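The alternation described above can be spotted automatically by comparing the chi2 of each round with those of all earlier rounds. The following is only a minimal sketch: the function name `find_repeat` and the tolerance are hypothetical and not part of relax, and the chi2 values are copied from the table above.

```python
import math

# chi2 after each round of the ellipsoid optimisation (values from the table).
CHI2_BY_ROUND = {
    1: 9282.2280010132217,
    2: 8793.0777454789404,
    3: 8767.5325004348124,
    4: 8765.5659442063006,
    5: 8761.0001889287214,
    6: 8744.6870170285692,
    7: 8761.0001889287269,
    8: 8744.6870170285729,
    9: 8761.0001889287269,
    10: 8744.6870170285656,
}


def find_repeat(chi2_by_round, rel_tol=1e-12):
    """Return (earlier_round, later_round) for the first round whose chi2
    matches that of an earlier round, or None if no round repeats.

    rel_tol is an assumed tolerance, chosen only for this illustration.
    """
    seen = []
    for rnd in sorted(chi2_by_round):
        chi2 = chi2_by_round[rnd]
        for earlier, earlier_chi2 in seen:
            if math.isclose(chi2, earlier_chi2, rel_tol=rel_tol):
                return earlier, rnd
        seen.append((rnd, chi2))
    return None


repeat = find_repeat(CHI2_BY_ROUND)
if repeat is not None:
    earlier, later = repeat
    # A gap of one round means the protocol has converged; a larger gap
    # means it has come back to an older solution, i.e. it is cycling.
    state = "converged" if later - earlier == 1 else "cycling"
    print(f"round {later} repeats round {earlier}: {state}")
```

A check like this after each round would have flagged the problem at round 7 rather than at round 11.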
>>     
>
> In mathematical modelling, model elimination or model validation must
> occur prior to the model selection step.  This is when ts is at ~1.2
> ns, and hence the model is not eliminated.  The final optimisation is
> shifting ts up to 25 ns, and this is likely to be the thing causing
> the optimisation to take soooo long!  Is there something particular
> with this residue?
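The order of the two steps can be sketched in a few lines of Python. This is not relax code and rests on several assumptions: the function names are hypothetical, the elimination cutoff is taken to be ts > 1.5 * tm, tm is taken to be in ns and ts in ps, and models 5 and 6 are taken to have 3 and 4 parameters. The chi2 and ts values come from the AIC rows of rounds 9 and 10 in the table further down, mixed here purely to illustrate the sequence.

```python
def aic(chi2, n_params):
    """Akaike's Information Criterion for a chi-squared fit."""
    return chi2 + 2 * n_params


def is_eliminated(model, tm_ps, factor=1.5):
    """Assumed elimination rule: reject a model whose ts exceeds factor * tm."""
    ts = model.get("ts")
    return ts is not None and ts > factor * tm_ps


def select_model(models, tm_ps):
    """Eliminate failed models first, then pick the lowest AIC among the rest."""
    survivors = [m for m in models if not is_eliminated(m, tm_ps)]
    return min(survivors, key=lambda m: aic(m["chi2"], m["k"]))


TM_PS = 12452.0  # tm of 12.452, assumed to be in ns, converted to ps

# Values as they stand at the model selection step.
candidates = [
    {"name": "m5", "chi2": 52.0, "ts": 698.0, "k": 3},
    {"name": "m6", "chi2": 39.0, "ts": 1173.0, "k": 4},
]

chosen = select_model(candidates, TM_PS)
print("selected:", chosen["name"])

# Only the later, final optimisation pushes ts up, after the check has passed.
after_final_opt = {"name": "m6", "ts": 24904.0}
print("would be eliminated now:", is_eliminated(after_final_opt, TM_PS))
```

Under these assumptions model 6 passes the check at selection time with ts ~1.2 ns, and only exceeds the cutoff once the final optimisation has moved ts to ~25 ns.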
>
> The iteration numbers are low, but these may be the number of
> iterations of the method of multipliers algorithm.  For each iteration
> there could possibly be thousands of steps of the Newton subalgorithm.
>  I can't remember how the iteration number is generated, but the
> print_flag option may show if this is the case.
>
>
>   
>>  round   AIC_or_OPT  model   S2    S2f   S2s   tf      ts      chi2
>>  =====   ==========  =====   ===   ====  ====  ======  ======  =========
>>   9      AIC         5       0.78  0.96  0.81  None      698   52
>>  10      AIC         6       0.78  0.97  0.80  11.2     1173   39
>>   9      OPT         5       0.78  0.96  0.81  None      630   ---
>>  10      OPT         6       0.73  0.93  0.79  16.8    24904   ---
>>
>>
>>  Fourth, the previous runs were made on 4 different computers, which give
>> almost exactly the same calculation times, differing by maybe 10-15 %.
>> This shouldn't be what's causing such extremely long times...
>>     
>
> This is unlikely to be the problem, but I was just wondering in case
> there was an operating system or platform specific bug possibly in the
> Numeric code.
>
>
>   
>>  Fifth, I used the default algorithm within the full_analysis.py script.
>> How can I change the optimization algorithm so it's a two stage procedure
>> like you proposed ? Should I run several times with MIN_ALGOR = 'simplex'
>> and, after a few runs (maybe when the chi2 and number of iterations get to a
>> plateau) switch to MIN_ALGOR = 'newton' ?
>>     
>
> Simply have two lines, one after the other, in the code where the
> minimise() user function is located.  I.e. in the file
> 'full_analysis.py' of the current 1.2 repository line:
>
> # Minimise all parameters.
> minimise('simplex', run=name)
> minimise(MIN_ALGOR, run=name)
>
> # Write the results.
> ...
>
>
> That should be enough to solve the problem (hopefully).
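The idea behind the two minimise() calls, a cheap derivative-free pass followed by a high-precision Newton pass, can be illustrated on a toy problem. The sketch below is not relax's implementation: a simple compass search stands in for the simplex algorithm, the Newton stage is plain Newton without the GMW Hessian modification or constraints, and the test function and all names are hypothetical.

```python
def func(point):
    """Toy convex target with its minimum at (1, -2)."""
    x, y = point
    return (x - 1.0) ** 4 + (x - 1.0) ** 2 + 2.0 * (y + 2.0) ** 2


def compass_search(f, start, step=1.0, min_step=1e-2):
    """Stage 1: crude derivative-free search (stand-in for simplex)."""
    point = list(start)
    best = f(point)
    while step > min_step:
        improved = False
        for i in range(len(point)):
            for delta in (step, -step):
                trial = list(point)
                trial[i] += delta
                value = f(trial)
                if value < best:
                    point, best, improved = trial, value, True
        if not improved:
            step /= 2.0  # No better neighbour found: refine the step.
    return point


def newton_refine(start, tol=1e-12, max_iter=50):
    """Stage 2: Newton iterations using the analytic gradient and Hessian.

    The toy function is separable, so its Hessian is diagonal.
    """
    x, y = start
    for _ in range(max_iter):
        gx = 4.0 * (x - 1.0) ** 3 + 2.0 * (x - 1.0)
        gy = 4.0 * (y + 2.0)
        if abs(gx) < tol and abs(gy) < tol:
            break
        hx = 12.0 * (x - 1.0) ** 2 + 2.0
        hy = 4.0
        x -= gx / hx
        y -= gy / hy
    return x, y


rough = compass_search(func, (4.3, 3.7))
x_min, y_min = newton_refine(rough)
print("after stage 1:", rough)
print("after stage 2:", (x_min, y_min))
```

The first stage only has to land near the minimum, so the Newton stage starts in a well-behaved region and needs very few iterations.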
>
> Cheers,
>
> Edward
>
>
>
>   
>>  I think that's almost everything I can find now...
>>
>>  Let me know if you know how to catch those problems before they appear...
>>
>>  Cheers
>>
>>
>>  Séb  :)
>>
>>
>>
>>
>>
>>
>>  Edward d'Auvergne wrote:
>>  Hi,
>>
>> I've been trying to think of what could possibly be causing these
>> really long times, but I'm really not sure what is happening.
>> Unfortunately there just was not enough information in the post to
>> decipher the key to this problem. Is there something special about
>> those 7 residues? How accurate do you think their orientations are in
>> the PDB file you are using? And how accurate is the PDB file itself
>> with respect to all parts of the system?
>>
>> Have you had a chance to investigate further as to what the issue
>> might be? For example, which part of the calculation is taking the
>> time? Is it the global optimisation of all parameters? Are the final
>> results of each round similar or completely different (selected model
>> wise and parameter value wise)? How do the iteration numbers compare
>> at each stage? Essentially a fine analysis and comparison of the
>> results files and the printout from relax will be necessary to track
>> down this abnormal computation time. Oh, are you running these on the
>> same computer as the previous analysis?
>>
>> As for the optimisation algorithm being stuck, if you've used the
>> default algorithm then this shouldn't happen. Optimisation should
>> terminate. There are certain very rare situations where the algorithm
>> known as the GMW Hessian modification, which is used by default as a
>> subalgorithm by the Newton algorithm in relax, can take large amounts
>> of time to complete. You'll see this as an increase in the number of
>> iterations by 4 to 5 orders of magnitude. One way to test this is to
>> use a lower quality optimisation algorithm first and then complete to
>> high precision with the Newton algorithm. In this case I would use
>> simplex first followed by the default Newton algorithm and its default
>> subalgorithms. In all cases constraints should be used. This will
>> only solve the long computation times if the GMW algorithm is at
>> fault.
>>
>> Regards,
>>
>> Edward
>>
>>
>> On 9/4/07, Sebastien Morin <[EMAIL PROTECTED]> wrote:
>>
>>
>>  Hi all,
>>
>> I am using the full_analysis.py script with data at three magnetic fields.
>>
>> After a first complete cycle (going through the final optimization), I
>> realized that a few residues had extremely high chi-squared values (>
>> 1000) no matter the diffusion model or model-free model chosen...
>>
>> So I removed those residues (7 out of 222) and started the full_analysis
>> protocol again.
>>
>> However, the optimization times are now extremely long and I should get
>> the final results in weeks...
>>
>>
>> Here are the available times (for local_tm, sphere and ellipsoid) :
>>
>>
>> Diffusion_model  Round  Time-before_N=222  X2       Time-now_N=215  X2      Comments
>> ===============  =====  =================  =======  ==============  ======  =========================
>> local_tm         ---    12h30              45949    14h30           5802    OK, X2 much smaller
>>
>> sphere           init   ---                1154338  ---             249255
>>                  1      2h30               65654    36h00           10303   Long, but X2 much smaller
>>                  2      2h30               65654    > 30h00
>>
>> ellipsoid        init   ---                753535   ---             177764
>>                  1      4h00               64592    > 67h00         ??
>>                  2      2h30               64592    not_there_yet
>>
>>
>> Is it possible that the algorithms get stuck somewhere during the
>> optimization..?
>>
>> I thought that removing badly fit residues would, on the contrary, speed
>> up calculations...
>>
>> Thanks for ideas !
>>
>>
>> Sébastien :)
>>
>> --
>>          ______________________________________
>>      _______________________________________________
>>     |                                               |
>>    || Sebastien Morin                               ||
>>   ||| Etudiant au PhD en biochimie                  |||
>>  |||| Laboratoire de resonance magnetique nucleaire ||||
>> ||||| Dr Stephane Gagne                             |||||
>>  |||| CREFSIP (Universite Laval, Quebec, CANADA)    ||||
>>   ||| 1-418-656-2131 #4530                          |||
>>    ||                                               ||
>>     |_______________________________________________|
>>          ______________________________________
>>
>>
>>
>> _______________________________________________
>> relax (http://nmr-relax.com)
>>
>> This is the relax-users mailing list
>> [email protected]
>>
>> To unsubscribe from this list, get a password
>> reminder, or change your subscription options,
>> visit the list information page at
>> https://mail.gna.org/listinfo/relax-users
>>
>>
>>
>>
>>
>>     
>
>   

-- 
         ______________________________________    
     _______________________________________________
    |                                               |
   || Sebastien Morin                               ||
  ||| Etudiant au PhD en biochimie                  |||
 |||| Laboratoire de resonance magnetique nucleaire ||||
||||| Dr Stephane Gagne                             |||||
 |||| CREFSIP (Universite Laval, Quebec, CANADA)    ||||
  ||| 1-418-656-2131 #4530                          |||
   ||                                               ||
    |_______________________________________________|
         ______________________________________

