Hi Edward,

For the first run listed below (tensor, # iterations to convergence):

sphere, 4
prolate, 7
oblate, 9
ellipsoid, 9

For the second run listed below (tensor, # iterations to convergence):
sphere, 4
prolate, 5
oblate, 9
ellipsoid, 9

Doug


On Feb 7, 2008, at 4:03 AM, Edward d'Auvergne wrote:

> Hi,
>
> Something strange is happening in your analysis.  Unfortunately with
> the limited information in your post, I really cannot start to track
> it down.  The fact that the local_tm and sphere runs are the same in
> both is a good sign.  Do you have the number of iterations required
> for the convergence of each tensor?
>
> The full_analysis.py script should be insensitive to the input
> structure, but there is one point at which this might not be exactly
> true.  The grid search for the initial tensor parameter values prior
> to minimisation is dependent on the orientation of the input
> structure.  You can think of the grid search as a cage which stays
> fixed in space while the molecule spins around inside
> it.  But the subsequent Newton optimisation should easily recover from
> the small differences of the increments between grid points.  That
> might be a place to start looking though.
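
[Editor's note: a minimal, self-contained sketch of the idea described above - a fixed grid search feeding a Newton refinement.  This is not relax code; the function names and the 1-D target are invented for illustration.  A rotated but otherwise equivalent problem can start from a slightly different grid point, but Newton's method should still reach the same minimum.]

```python
# Illustrative sketch: a coarse grid search picks the starting point,
# then Newton's method refines it.  Because the grid (the "cage") is
# fixed in space, two equivalent problems can start from different grid
# points, yet the subsequent optimisation converges to the same minimum.

def grid_search(f, lo, hi, n):
    """Return the grid point with the lowest function value."""
    best_x, best_f = lo, f(lo)
    for i in range(1, n):
        x = lo + (hi - lo) * i / (n - 1)
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x

def newton(f, df, d2f, x, tol=1e-12, max_iter=50):
    """Refine x with Newton's method using the first two derivatives."""
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# A simple 1-D target with its minimum at x = 1.234.
f = lambda x: (x - 1.234) ** 2
df = lambda x: 2 * (x - 1.234)
d2f = lambda x: 2.0

x0 = grid_search(f, 0.0, 5.0, 11)   # coarse start from the fixed grid
x_min = newton(f, df, d2f, x0)      # Newton recovers the true minimum
```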
>
> The only point in relax where a random number generator is utilised is
> in the Monte Carlo simulations.  It might also be worth looking at the
> Dr value in the ellipsoid optimisations.  If this value is close to
> zero, then the results from the prolate and ellipsoid diffusion
> tensors may actually be almost the same, in which case the
> differences don't matter.  For understanding molecular motions, the
> model itself is of no interest - this is, of course, a simple exercise
> in the field of mathematical modelling (a mathematics library will
> show you the extent of this field).  What matters is what the model
> says about the dynamics, not the details of the model itself.  So two
> completely different models may actually say the same thing, perhaps
> with small, statistically insignificant differences.
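
[Editor's note: a hypothetical illustration of the Dr check suggested above.  The helper name and the tolerance are my own inventions, and the exact definition and range of Dr should be taken from the relax documentation.]

```python
def is_effectively_spheroid(dr, tol=1e-2):
    """If the rhombic component Dr of the fitted ellipsoid tensor is
    close to zero, the tensor is effectively axially symmetric, so the
    prolate and ellipsoid fits can describe almost the same diffusion.
    The tolerance is an arbitrary illustrative choice."""
    return abs(dr) < tol
```

For example, a fitted Dr of 0.001 would suggest the prolate and ellipsoid results are essentially indistinguishable, whereas a Dr of 0.2 would not.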
>
> That being said, this problem should not occur.  But as I cannot do
> anything with the limited information, you may need to hunt down where
> the problem lies yourself (whether in relax, the operation of relax,
> the use of the quadric_diffusion program, or elsewhere).
>
> Regards,
>
> Edward
>
>
>
> On Jan 30, 2008 3:24 PM, Douglas Kojetin  
> <[EMAIL PROTECTED]> wrote:
>> Hi Edward,
>>
>> As a followup to this, I performed two relax runs using six datasets
>> (r1, r2 and noe at two fields) with two identical structures, but one
>> had been rotated/translated using the quadric_diffusion program
>> provided by the Palmer lab.  For one structure, a prolate tensor is
>> chosen, whereas an ellipsoid tensor is chosen for the rotated/
>> translated structure:
>>
>> ## ORIGINAL PDB
>> Run                  Chi2                 Criterion
>> local_tm             102.67810            870.67810
>> sphere               177.96407            807.96407
>> prolate              152.70721            796.70721
>> oblate               178.61058            810.61058
>> ellipsoid            155.78475            801.78475
>>
>> The model from the run 'prolate' has been selected.
>>
>> ## ROTATED/TRANSLATED PDB
>> Run                  Chi2                 Criterion
>> local_tm             102.67810            870.67810
>> sphere               177.96407            807.96407
>> prolate              175.13432            803.13432
>> oblate               178.61979            810.61979
>> ellipsoid            155.82168            801.82168
>>
>> The model from the run 'ellipsoid' has been selected.
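
[Editor's note: in the tables above, Criterion - Chi2 is an even number in every row, consistent with AIC = chi2 + 2k model selection.  That identification is an assumption on my part - check which criterion the script actually sets.  A minimal sketch of the selection step, using the ORIGINAL PDB table:]

```python
# Selection by lowest criterion value.  The (chi2, criterion) pairs are
# copied from the ORIGINAL PDB table above; the implied parameter counts
# k assume Criterion = chi2 + 2k (AIC), which is an assumption here.
runs = {
    "local_tm":  (102.67810, 870.67810),
    "sphere":    (177.96407, 807.96407),
    "prolate":   (152.70721, 796.70721),
    "oblate":    (178.61058, 810.61058),
    "ellipsoid": (155.78475, 801.78475),
}

best = min(runs, key=lambda r: runs[r][1])  # lowest criterion wins
k = {r: round((c - x2) / 2) for r, (x2, c) in runs.items()}
```

With these numbers `best` is "prolate", matching the selection reported above, and the implied parameter counts differ between runs as expected (each diffusion model carries a different number of global and per-residue parameters).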
>>
>>
>> There are no differences in the models selected for two of the three
>> structure-dependent runs (oblate and ellipsoid tensor runs), but
>> there are a handful of differences in the models selected for the
>> prolate tensor runs.  Is the full_analysis protocol sensitive to the
>> orientation of the input structure, or could this be a result of
>> different runs using something equivalent to different random number
>> seeds?
>>
>> Doug
>>
>>
>>
>>
>> On Jan 10, 2008, at 2:36 PM, Edward d'Auvergne wrote:
>>
>>> Yes, with 4 data sets you could remove tm6 to tm8.  You would also
>>> need to remove m8.  But in this situation, you will be significantly
>>> biasing the initial position (the starting universe will be further
>>> away from that of the universal solution).  I don't know how well
>>> this new protocol will perform with 4 data sets, i.e. this is
>>> untested, but I
>>> would be highly reluctant to trust it.  The relaxation data type and
>>> field strength will be very important.  I would even be wary of
>>> using 5 data sets, especially if the missing data set is the
>>> higher-field NOE.
>>>  So I would never recommend using 4 data sets.
>>>
>>> Regards,
>>>
>>> Edward
>>>
>>>
>>> On Jan 10, 2008 8:12 PM, Douglas Kojetin
>>> <[EMAIL PROTECTED]> wrote:
>>>> Hi Edward,
>>>>
>>>> Thanks for the response.  So, with 5 relaxation data sets, only tm8
>>>> should be removed -- no need to remove m8 as well?  Also, if only 4
>>>> relaxation data sets were available, could {tm6-8 and m8} be  
>>>> removed
>>>> to use the full_analysis.py protocol?
>>>>
>>>> Thanks,
>>>> Doug
>>>>
>>>>
>>>>
>>>> On Jan 10, 2008, at 1:31 PM, Edward d'Auvergne wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> If you have 5 relaxation data sets, you can use the  
>>>>> full_analysis.py
>>>>> script but you will need to remove model tm8.  This is the only
>>>>> model with 6 parameters, and doing the analysis without it might
>>>>> just work (the remaining tm0 to tm7 and tm9 models may compensate
>>>>> adequately).
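
[Editor's note: the rule above can be expressed as a simple parameter-count filter.  tm8 having 6 parameters is stated above; the other counts below (tm5 = 4, tm6 = 5, tm7 = 5) are my reading of the standard model-free models and should be checked against the relax manual.]

```python
def filter_models(model_params, n_data):
    """Drop any model requiring more parameters than there are
    relaxation data points per spin, since such a model cannot be
    fitted to the data at all."""
    return {m: p for m, p in model_params.items() if p <= n_data}

# Assumed parameter counts for the local-tm models tm5 to tm8.
tm_params = {"tm5": 4, "tm6": 5, "tm7": 5, "tm8": 6}

five = filter_models(tm_params, 5)   # only tm8 is dropped
four = filter_models(tm_params, 4)   # tm6, tm7 and tm8 are dropped
```

Under these assumed counts, 5 data sets exclude only tm8, and 4 data sets exclude tm6 to tm8 - matching the advice in this thread.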
>>>>>
>>>>> I've looked at the script and it seems fine.  I think the issue is
>>>>> that the model-free problem is not simply an optimisation  
>>>>> issue.  It
>>>>> is the simultaneous combination of global optimisation  
>>>>> (mathematics)
>>>>> with model selection (statistics).  You are not searching for the
>>>>> global minimum in one space, as in a normal optimisation problem,
>>>>> but for the global minimum across an enormous number of spaces
>>>>> simultaneously.  I formulated the totality of this problem using
>>>>> set theory here
>>>>> http://www.rsc.org/Publishing/Journals/MB/article.asp?doi=b702202f
>>>>> or in my PhD thesis at
>>>>> http://eprints.infodiv.unimelb.edu.au/archive/00002799/.  In your
>>>>> script, the CONV_LOOP flag allows you to automatically loop over
>>>>> many
>>>>> global optimisations.  Each iteration of the loop is the
>>>>> mathematical
>>>>> optimisation part.  But the entire loop itself allows for the
>>>>> sliding
>>>>> between these different spaces.  Note that this is a very, very
>>>>> complex problem involving huge numbers of spaces, or universes,
>>>>> each of which consists of a large number of dimensions.  There was
>>>>> a
>>>>> mistake
>>>>> in my Molecular BioSystems paper in that the number of spaces is
>>>>> really equal to n*m^l where n is the number of diffusion models,
>>>>> m is
>>>>> the number of model-free models (10 if you use m0 to m9), and l
>>>>> is the
>>>>> number of spin systems.  So if you have 200 residues, the  
>>>>> number of
>>>>> spaces is on the order of 10 to the power of 200.  The number of
>>>>> dimensions for this system is on the order of 10^2 to 10^3.  So  
>>>>> the
>>>>> problem is to find the 'best' minimum in 10^200 spaces, each
>>>>> consisting of 10^2 to 10^3 dimensions (the universal solution  
>>>>> or the
>>>>> solution in the universal set).  The problem is just a little more
>>>>> complex than most people think!!!
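
[Editor's note: the n*m^l count above can be checked directly.  The value n = 4 assumes the four diffusion models discussed elsewhere in this thread; m and l are as given above.]

```python
import math

n = 4      # diffusion models: sphere, prolate, oblate, ellipsoid (assumed)
m = 10     # model-free models m0 to m9
l = 200    # spin systems (residues)

n_spaces = n * m ** l                       # exact integer in Python
order = math.log10(n) + l * math.log10(m)   # order of magnitude, ~200.6
```

So with 200 residues the number of spaces is indeed on the order of 10^200, dominated by the m^l factor.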
>>>>>
>>>>> So, my opinion of the problem is that the starting position of
>>>>> one of
>>>>> the 2 solutions is not good.  In one (or maybe both) you are
>>>>> stuck in
>>>>> the wrong universe (out of billions of billions of billions of
>>>>> billions....).  And you can't slide out of that universe using the
>>>>> looping procedure in your script.  That's why I designed the new
>>>>> model-free analysis protocol used by the full_analysis.py script
>>>>> (http://www.springerlink.com/content/u170k174t805r344/?p=23cf5337c42e457abe3e5a1aeb38c520&pi=3
>>>>> or the thesis again).  The aim of this new protocol is to start
>>>>> you in a universe much closer to the one containing the universal
>>>>> solution than you can ever get with the initial diffusion tensor
>>>>> estimate.
>>>>> Then you can easily slide, in less than 20 iterations, to the
>>>>> universal solution using the looping procedure.  For a published
>>>>> example of this type of failure, see the section titled  
>>>>> "Failure of
>>>>> the diffusion seeded paradigm" in the previous link to the
>>>>> "Optimisation of NMR dynamic models II" paper.
>>>>>
>>>>> Does this description make sense?  Does it answer all your
>>>>> questions?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Edward
>>>>>
>>>>>
>>>>>
>>>>> On Jan 10, 2008 5:49 PM, Douglas Kojetin
>>>>> <[EMAIL PROTECTED]> wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> I am working with five relaxation data sets (r1, r2 and noe at
>>>>>> 400 MHz; r1 and r2 at 600 MHz), and therefore cannot use the
>>>>>> full_analysis.py protocol.  I have obtained estimates for tm,
>>>>>> Dratio, theta and phi using Art Palmer's quadric_diffusion  
>>>>>> program.
>>>>>> I modified the full_analysis.py protocol to optimize a prolate
>>>>>> tensor
>>>>>> using these estimates (attached file: mod.py).  I have performed
>>>>>> the
>>>>>> optimization of the prolate tensor using either (1) my original
>>>>>> structure or (2) the same structure rotated and translated by the
>>>>>> quadric_diffusion program.  It seems that relax does not
>>>>>> converge to
>>>>>> a single global optimum, as different values of tm, Da, theta
>>>>>> and phi
>>>>>> are reported.
>>>>>>
>>>>>> Using my original structure:
>>>>>> #tm = 6.00721299718e-09
>>>>>> #Da = 14256303.3975
>>>>>> #theta = 11.127323614211441
>>>>>> #phi = 62.250251959733312
>>>>>>
>>>>>> Using the rotated/translated structure by the quadric_diffusion
>>>>>> program:
>>>>>> #tm = 5.84350638161e-09
>>>>>> #Da = 11626835.475
>>>>>> #theta = 8.4006873071400197
>>>>>> #phi = 113.6068898953142
>>>>>>
>>>>>> The only difference between the two calculations is the  
>>>>>> orientation
>>>>>> of the input PDB structure file.  For another set of five rates
>>>>>> (different protein), there is a >0.3 ns difference in the  
>>>>>> converged
>>>>>> tm values.
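
[Editor's note: for reference, the discrepancy in the tm values quoted above; the variable names are invented for illustration.]

```python
# The two converged global correlation times reported above, in seconds.
tm_orig = 6.00721299718e-09   # original structure
tm_rot  = 5.84350638161e-09   # rotated/translated structure

diff_ns = abs(tm_orig - tm_rot) * 1e9   # difference in nanoseconds
```

Here the difference is about 0.16 ns; the other protein mentioned shows a difference of more than 0.3 ns.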
>>>>>>
>>>>>> Is my modified protocol (in mod.py) set up properly?  Or is this a
>>>>>> more complex issue in the global optimization?  In previous
>>>>>> attempts,
>>>>>> I've also noticed that separate runs with differences in the
>>>>>> estimates for Dratio, theta and phi also converge to different
>>>>>> optimized diffusion tensor variables.
>>>>>>
>>>>>> Doug
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> relax (http://nmr-relax.com)
>>>>>>
>>>>>> This is the relax-users mailing list
>>>>>> [email protected]
>>>>>>
>>>>>> To unsubscribe from this list, get a password
>>>>>> reminder, or change your subscription options,
>>>>>> visit the list information page at
>>>>>> https://mail.gna.org/listinfo/relax-users
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>

