Re: bug? previous vs. current model test in full_analysis

Douglas Kojetin Mon, 17 Sep 2007 15:10:58 -0700

Hi Edward,

No problem.  I am working on this now, and will try to submit the bug  
report later tonight or tomorrow.


Doug


On Sep 17, 2007, at 4:12 PM, Edward d'Auvergne wrote:

> Hi,
>
> In your previous post
> (https://mail.gna.org/public/relax-users/2007-09/msg00011.html,
> Message-id: <[EMAIL PROTECTED]>) I think
> you were spot on with the diagnosis.  The reading of the results files
> with None in all model positions will create variables called 'model'
> with the value set to None.  Then the string comparison will fail
> unless these are skipped as well.  Well done on picking up the exact
> cause of the failure.  This important change will need to go into
> relax before a new release with the more advanced 'full_analysis.py'
> script.
>
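A minimal sketch of the skipping logic described above, written for Python 3. It is not the actual patch to 'full_analysis.py'; the function name `model_string` and the `SimpleNamespace` stand-ins for residue containers are assumptions made for illustration only.

```python
from types import SimpleNamespace


def model_string(residues):
    """Join the selected model names, skipping residues with no model.

    A residue is skipped both when it lacks a 'model' attribute and
    when the attribute exists but is None (as after reading a results
    file with None in the model position).
    """
    names = []
    for res in residues:
        model = getattr(res, 'model', None)
        if model is None:
            continue
        names.append(model)
    return ' '.join(names)


# Hypothetical residues: one with model=None, one with a model, one with no attribute.
previous = [SimpleNamespace(model=None), SimpleNamespace(model='m2'), SimpleNamespace()]
current = [SimpleNamespace(), SimpleNamespace(model='m2'), SimpleNamespace(model=None)]

converged = model_string(previous) == model_string(current)
```

Here a residue whose model is None is treated the same as one with no 'model' attribute at all, so the two strings compare equal.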
> If you would like to post the changes, I can add these to the
> repository.  I'll need to check the change carefully first though, and
> it may only require two lines in the script to be changed.  The best
> way would be to attach a patch to a bug report.  If you could create a
> bug report with a simple summary, that would be appreciated.  Then how
> you report the changes is up to you.  If you change a checked out copy
> of the SVN repository and type 'svn diff > patch', you'll get the
> changes in a patch file which I can then check and commit to the
> repository with your name attached.
>
> Thanks,
>
> Edward
>
>
> On 9/17/07, Douglas Kojetin <[EMAIL PROTECTED]> wrote:
>> As a followup, my changes to full_analysis.py solved my problem.  I
>> will clean up my code and post it within the next day or so.  Would
>> you prefer that I attach the script as an attachment, or inline in an
>> email, or provide a patch, or change the CVS code myself?
>>
>> Doug
>>
>>
>> On Sep 17, 2007, at 11:48 AM, Edward d'Auvergne wrote:
>>
>>> Hi,
>>>
>>> The problem is likely to be due to a circular looping around very
>>> similar solutions close to the universal solution, as I have defined
>>> in:
>>>
>>> d'Auvergne EJ, Gooley PR. Set theory formulation of the model-free
>>> problem and the diffusion seeded model-free paradigm. Mol Biosyst.
>>> 2007 Jul;3(7):483-94.
>>>
>>> If you can't get the paper, have a look at Chapter 5 of my PhD
>>> thesis at http://eprints.infodiv.unimelb.edu.au/archive/00002799/.
>>> The problem is the interlinked mathematical optimisation and
>>> statistical model selection, where we are trying to minimise
>>> different quantities.  For mathematical optimisation this is the
>>> chi-squared value.  For model selection this is the quantity known
>>> as the discrepancy.  These cannot be optimised together, as
>>> mathematical optimisation works in a single fixed-dimension space
>>> or universe whereas model selection operates across multiple spaces
>>> with different dimensions.  See the paper for a more comprehensive
>>> description of the issue.
>>>
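As a rough illustration of the two quantities, the sketch below assumes AIC is the selection criterion (the thread refers to AIC model selection) in its common least-squares form, chi-squared plus twice the number of parameters. The chi-squared values and parameter counts are made up, and the names `aic` and `candidates` are hypothetical.

```python
def aic(chi2, k):
    """Akaike's Information Criterion for a least-squares fit: chi2 + 2k."""
    return chi2 + 2 * k


# Made-up minimised chi-squared values for two models of one residue.
# Each chi2 was minimised within its own fixed-dimension parameter space.
candidates = {'m1': (12.5, 1), 'm2': (9.0, 2)}   # name -> (chi2, k)

# Model selection then compares across spaces of different dimension.
best = min(candidates, key=lambda name: aic(*candidates[name]))
```

Optimisation only ever lowers chi-squared within one model, while the selection step compares models with different numbers of parameters, which is why the two steps have to alternate rather than run as a single minimisation.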
>>> You should be able to see this if you look at the end of the
>>> iterations.  If you have 160 iterations, look after iteration 20 (or
>>> maybe even 30 or further).  Until then, you will not have reached
>>> the circular loop.  After that point you will be able to exactly
>>> quantify this circular loop.  You'll be able to determine its
>>> periodicity, which residues are involved (probably only 1), and
>>> whether the diffusion tensor changes as model selection changes.
>>>
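One way to quantify such a loop is sketched below in Python 3. It assumes the per-round model selections have already been extracted into a list of tuples (one tuple per round, one entry per residue); that layout and the function names are hypothetical, and the sketch looks only at model selections, not at the diffusion tensor.

```python
def find_cycle(rounds):
    """Return (start, period) of the first repeated state, or None.

    rounds: list of hashable per-round states, e.g. tuples holding the
    model selected for each residue.
    """
    seen = {}
    for index, state in enumerate(rounds):
        if state in seen:
            start = seen[state]
            return start, index - start
        seen[state] = index
    return None


def changing_residues(rounds, start, period):
    """Return the positions whose model varies within one loop period."""
    cycle = rounds[start:start + period]
    return [i for i in range(len(cycle[0]))
            if len({state[i] for state in cycle}) > 1]


# Made-up selections for three residues over six rounds.
rounds = [
    ("m1", "m2", "m4"),
    ("m2", "m2", "m4"),
    ("m2", "m2", "m5"),
    ("m2", "m2", "m4"),
    ("m2", "m2", "m5"),
    ("m2", "m2", "m4"),
]
cycle = find_cycle(rounds)
```

With this made-up data the loop starts at index 1 with a period of 2, and only the third residue (position 2) changes within it.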
>>> I mentioned all of this already in my post at
>>> https://mail.gna.org/public/relax-users/2007-07/msg00001.html
>>> (Message-id:
>>> <[EMAIL PROTECTED]>)
>>> in response to your original post
>>> (https://mail.gna.org/public/relax-users/2007-06/msg00004.html,
>>> Message-id: <[EMAIL PROTECTED]>).
>>>
>>> I have a few more points about the tests you have done, but to work
>>> out what is happening with the printouts, it would be very useful to
>>> have your modified 'full_analysis.py' script attached.
>>>
>>>
>>>
>>> On 9/17/07, Douglas Kojetin <[EMAIL PROTECTED]> wrote:
>>>> Hi,
>>>>
>>>> I'm unsure if this is a bug in full_analysis.py, in the internal
>>>> relax code, or user error.  The optimization of the 'sphere' model
>>>> will not converge, even after 160+ rounds.  The chi-squared test
>>>> has converged (long, long ago):
>>>
>>> See above.
>>>
>>>
>>>> "" from output
>>>>          Chi-squared test:
>>>>              chi2 (k-1): 100.77647517006251
>>>>              chi2 (k):   100.77647517006251
>>>>              The chi-squared value has converged.
>>>> ""
>>>>
>>>> However, the identical model-free models test has not converged:
>>>>
>>>> "" from output
>>>>          Identical model-free models test:
>>>>              The model-free models have not converged.
>>>>
>>>>          Identical parameter test:
>>>>              The model-free models haven't converged hence the parameters haven't converged.
>>>> ""
>>>>
>>>> Something that confuses me is that the output files in the
>>>> round_??/aic directory suggest that, for example, the round_160 and
>>>> round_161 AIC model selections are equivalent.  Here are the models
>>>> for the first few residues:
>>>
>>> Between these 2 rounds, are you sure that all models for all
>>> residues are identical?  From the data that you posted at
>>> https://mail.gna.org/public/relax-users/2007-06/msg00017.html
>>> (Message-id: <[EMAIL PROTECTED]>), I would guess that this is not
>>> the case and one or two residues actually do change in their model
>>> selections.
>>>
>>>
>>>> ""
>>>>          1 None None
>>>>          2 None None
>>>>          3 None None
>>>>          4 m2 m2
>>>>          5 m2 m2
>>>>          6 m2 m2
>>>>          7 m2 m2
>>>>          8 m2 m2
>>>>          9 m4 m4
>>>>          10 m1 m1
>>>>          11 None None
>>>>          12 m2 m2
>>>>          13 m2 m2
>>>>          14 m1 m1
>>>>          15 m2 m2
>>>>          16 m3 m3
>>>>          17 m3 m3
>>>>          18 None None
>>>> ""
>>>>
>>>> However, I modified the full_analysis.py protocol to print the
>>>> differences in the model selection, within the 'Identical model-free
>>>> model test' section of the 'convergence' definition.  Here is the
>>>> beginning of the output (which only contains differences between the
>>>> previous and current rounds):
>>>>
>>>> ""
>>>>          residue 1: prev=None curr=m2
>>>>          residue 2: prev=None curr=m2
>>>>          residue 3: prev=None curr=m2
>>>>          residue 6: prev=m2 curr=m4
>>>>          residue 7: prev=m2 curr=m1
>>>>          residue 9: prev=m4 curr=m2
>>>>          residue 11: prev=None curr=m2
>>>>          residue 12: prev=m2 curr=m3
>>>>          residue 13: prev=m2 curr=m3
>>>>          residue 15: prev=m2 curr=m1
>>>>          residue 16: prev=m3 curr=m2
>>>>          residue 17: prev=m3 curr=m1
>>>>          residue 18: prev=None curr=m3
>>>> ""
>>>
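A minimal sketch of a per-residue comparison of this kind, in Python 3. It is not the modified script from this thread; it assumes the selections are held in dicts keyed by residue number (a hypothetical layout, not relax's internal data structure), so the comparison does not depend on list position.

```python
def model_differences(prev, curr):
    """Return (residue, prev_model, curr_model) for residues whose model changed.

    prev, curr: dicts mapping residue number to a model name or None.
    """
    diffs = []
    for num in sorted(set(prev) | set(curr)):
        before, after = prev.get(num), curr.get(num)
        if before != after:
            diffs.append((num, before, after))
    return diffs


# Made-up selections for four residues in two consecutive rounds.
prev = {1: None, 4: 'm2', 9: 'm4', 10: 'm1'}
curr = {1: None, 4: 'm2', 9: 'm2', 10: 'm1'}

for num, before, after in model_differences(prev, curr):
    print("residue %d: prev=%s curr=%s" % (num, before, after))
```

With this made-up input, only residue 9 is reported, since residue 1 is None in both rounds and is therefore not counted as a difference.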
>>> This output is quite strange.  I would need to see the
>>> full_analysis.py script to do more with this.
>>>
>>>
>>>> There should be no data for residues 1-3, 11 and 18 (None), however
>>>> the 'Identical model-free model test' seems as if it ignores
>>>> residues for which 'None' was selected in the curr_model call in the
>>>> following code:
>>>>
>>>> ""
>>>>          # Create a string representation of the model-free models of the previous run.
>>>>          prev_models = ''
>>>>          for i in xrange(len(self.relax.data.res['previous'])):
>>>>              if hasattr(self.relax.data.res['previous'][i], 'model'):
>>>>                  #prev_models = prev_models + self.relax.data.res['previous'][i].model
>>>>                  prev_models = prev_models + ' ' + self.relax.data.res['previous'][i].model
>>>>
>>>>          # Create a string representation of the model-free models of the current run.
>>>>          curr_models = ''
>>>>          for i in xrange(len(self.relax.data.res[run])):
>>>>              if hasattr(self.relax.data.res[run][i], 'model'):
>>>>                  #curr_models = curr_models + self.relax.data.res[run][i].model
>>>>                  curr_models = curr_models + ' ' + self.relax.data.res[run][i].model
>>>> ""
>>>
>>> As residues 1-3, 11 and 18 are deselected, they will not have the
>>> attribute 'model' and hence will not be placed in the prev_models or
>>> curr_models strings (which are then compared).
>>>
>>>
>>>> For what it's worth, I have residues 1, 2, 3, 11 and 18 in the file
>>>> 'unresolved' which is read by the full_analysis.py protocol.  I
>>>> created a separate sequence file (variable = SEQUENCE) that contains
>>>> all residues (those with data and those without), instead of using a
>>>> data file (noe data, in the default full_analysis.py file).
>>>> However, these residues are not specified in the data (r1, r2 and
>>>> noe) files, as I did not have data for them.  Should I add them but
>>>> place 'None' in the data and error columns?  Could that be causing
>>>> the problems?  Or should I create a bug report for this?
>>>
>>> I'm not sure what you mean by the sequence file '(variable =
>>> SEQUENCE)' statement.  I would need to see the full_analysis.py
>>> script to understand this.  I would assume you mean a file whose
>>> name is the value of the variable 'SEQUENCE', in which case this
>>> should not be a problem.  As the data is missing from the files
>>> containing the relaxation data, these residues will not be used in
>>> the model-free analysis.  They will be automatically deselected.
>>> There is no need to add empty data for these spin systems into the
>>> relaxation data files.
>>>
>>> As I said before at the top of this message and at
>>> https://mail.gna.org/public/relax-users/2007-07/msg00001.html
>>> (Message-id: <[EMAIL PROTECTED]>), the problem is almost guaranteed
>>> to be a circular loop of equivalent solutions circling around the
>>> universal solution - the solution defined within the universal set
>>> (the union of all global models (diffusion tensor + all model-free
>>> models of all residues)).  If you have hit this circular problem,
>>> then I suggest you stop running relax.  What I would then do is
>>> identify the spin system (or systems) causing the loop, and what the
>>> differences are between all members of the loop.  This you can do by
>>> plotting the data and maybe using the program diff on uncompressed
>>> versions of the results files.  It's likely that the differences are
>>> small and inconsequential.  I hope this helps.
>>>
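For those who prefer to stay in Python rather than run diff by hand, the comparison of two results files could be sketched as follows (Python 3, standard library only). The helper names are hypothetical, and the assumption that compressed results files use bzip2 with a '.bz2' extension should be checked against your own files.

```python
import bz2
import difflib


def read_lines(path):
    """Read a text file, transparently decompressing *.bz2 files."""
    opener = bz2.open if path.endswith('.bz2') else open
    with opener(path, 'rt') as handle:
        return handle.readlines()


def diff_results(path_a, path_b):
    """Return the unified diff between two results files as a list of lines."""
    return list(difflib.unified_diff(read_lines(path_a), read_lines(path_b),
                                     fromfile=path_a, tofile=path_b))
```

Calling `diff_results()` on the results files of two consecutive rounds inside the loop (the exact paths depend on your setup) returns an empty list when the files are identical, and otherwise the changed lines prefixed with '-' and '+'.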
>>> Regards,
>>>
>>> Edward
>>
>>

