My tests are still running, but here is what I think might be
happening: there seems to be a problem with comparing the string
representations of the previous and current runs.

When the previous results are loaded in the 'load_tensor' definition  
(starting on line 496), information for all residues is read,  
including residues that have 'None' as a model in the previous  
results.bz2 (AIC) file.  For my 106 residue protein, this essentially  
loads 106 models (for the 106 residues), with the model for some  
residues set to 'None'.  In my example below, the model 'None' would  
be set for residues 1-3, 11 and 18.

The information from the current run does not appear to contain the  
'None' model for residues 1-3, 11 and 18, as you say they should not  
be included in the model free analysis.

Therefore, the literal string test between lines 373-378 fails when  
the string representation of the previous run (the 'prev_models'  
variable, created between lines 363-365) is directly compared to the  
string representation of the current run (the 'curr_models' variable,
created between lines 368-371) within the convergence definition  
(lines 373-378).
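
If that is what is happening, the two concatenated strings would look
something like this (a hypothetical illustration based on the first few
residues in my example below, not actual output):

""
         prev_models = ' None None None m2 m2 ...'
         curr_models = ' m2 m2 ...'
""

and a literal comparison of the two can never succeed.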

Perhaps this test (lines 373-378) should not be a literal string test
between the prev_models and curr_models variables, created by
concatenating the model information, but rather a direct
residue-by-residue comparison of models in dictionary or list form.
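
Something along these lines is what I am thinking of (an untested
sketch; I am assuming each residue data container has a 'num' attribute
in addition to the 'model' attribute used by the code quoted below):

""
         # Map residue number -> selected model for each run; residues
         # without a 'model' attribute (deselected) are stored as None.
         prev_models = {}
         for i in xrange(len(self.relax.data.res['previous'])):
             res = self.relax.data.res['previous'][i]
             prev_models[res.num] = getattr(res, 'model', None)

         curr_models = {}
         for i in xrange(len(self.relax.data.res[run])):
             res = self.relax.data.res[run][i]
             curr_models[res.num] = getattr(res, 'model', None)

         # Compare residue by residue instead of one long concatenated string.
         models_converged = (prev_models == curr_models)
""

Storing None explicitly for deselected residues would keep the two
mappings aligned even when the selection differs between the two rounds.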

I am working on this now ...

Any thoughts?

Doug

P.S.  If the line numbers in the above references do not match up or
make sense, please let me know.




On Sep 17, 2007, at 11:48 AM, Edward d'Auvergne wrote:

> Hi,
>
> The problem is likely to be due to a circular looping around very
> similar solutions close to the universal solution, as I have defined
> in:
>
> d'Auvergne EJ, Gooley PR. Set theory formulation of the model-free
> problem and the diffusion seeded model-free paradigm. Mol Biosyst.
> 2007 Jul;3(7):483-94.
>
> If you can't get the paper, have a look at Chapter 5 of my PhD  
> thesis at
> http://eprints.infodiv.unimelb.edu.au/archive/00002799/.  The problem
> is the interlinked mathematical optimisation and statistical model
> selection where we are trying to minimise different quantities.  For
> mathematical optimisation this is the chi-squared value.  For model
> selection this is the quantity known as the discrepancy.  These cannot
> be optimised together as mathematical optimisation works in a
> single fixed-dimension space or universe whereas model selection
> operates across multiple spaces with different dimensions.  See the
> paper for a more comprehensive description of the issue.
>
> You should be able to see this if you look at the end of iterations.
> If you have 160 iterations, look after iteration 20 (or maybe even 30
> or further).  Until then, you will not have reached the circular loop.
>  After that point you will be able to exactly quantify this circular
> loop.  You'll be able to determine its periodicity, which residues are
> involved (probably only 1), and whether the diffusion tensor changes
> as model selection changes.
>
> I mentioned all of this already in my post at
> https://mail.gna.org/public/relax-users/2007-07/msg00001.html
> (Message-id:  
> <[EMAIL PROTECTED]>)
> in response to your original post
> (https://mail.gna.org/public/relax-users/2007-06/msg00004.html,
> Message-id: <[EMAIL PROTECTED]>).
>
> I have a few more points about the tests you have done but to work out
> what is happening with the printouts, it would be very useful to have
> your modified 'full_analysis.py' script attached.
>
>
>
> On 9/17/07, Douglas Kojetin <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> I'm unsure if this is a bug in full_analysis.py, in the internal
>> relax code, or user error.  The optimization of the 'sphere' model
>> will not converge, even after 160+ rounds.  The chi-squared test has
>> converged (long, long ago):
>
> See above.
>
>
>> "" from output
>>          Chi-squared test:
>>              chi2 (k-1): 100.77647517006251
>>              chi2 (k):   100.77647517006251
>>              The chi-squared value has converged.
>> ""
>>
>> However, the identical model-free models test has not converged:
>>
>> "" from output
>>          Identical model-free models test:
>>              The model-free models have not converged.
>>
>>          Identical parameter test:
>>              The model-free models haven't converged hence the
>> parameters haven't converged.
>> ""
>>
>> Something that confuses me is that the output files in the round_??/
>> aic directory suggest that, for example, the round_160 and round_161
>> AIC model selections are equivalent.  Here are the models for the
>> first few residues:
>
> Between these 2 rounds, are you sure that all models for all residues
> are identical?  From your data that you posted at
> https://mail.gna.org/public/relax-users/2007-06/msg00017.html
> (Message-id: <[EMAIL PROTECTED]>), I
> would guess that this is not the case and one or two residues actually
> do change in their model selections.
>
>
>> ""
>>          1 None None
>>          2 None None
>>          3 None None
>>          4 m2 m2
>>          5 m2 m2
>>          6 m2 m2
>>          7 m2 m2
>>          8 m2 m2
>>          9 m4 m4
>>          10 m1 m1
>>          11 None None
>>          12 m2 m2
>>          13 m2 m2
>>          14 m1 m1
>>          15 m2 m2
>>          16 m3 m3
>>          17 m3 m3
>>          18 None None
>> ""
>>
>> However, I modified the full_analysis.py protocol to print the
>> differences in the model selection, within the 'Identical model-free
>> model test' section of the 'convergence' definition. Here is the
>> beginning of the output (which only contains differences between the
>> previous and current rounds):
>>
>> ""
>>          residue 1: prev=None curr=m2
>>          residue 2: prev=None curr=m2
>>          residue 3: prev=None curr=m2
>>          residue 6: prev=m2 curr=m4
>>          residue 7: prev=m2 curr=m1
>>          residue 9: prev=m4 curr=m2
>>          residue 11: prev=None curr=m2
>>          residue 12: prev=m2 curr=m3
>>          residue 13: prev=m2 curr=m3
>>          residue 15: prev=m2 curr=m1
>>          residue 16: prev=m3 curr=m2
>>          residue 17: prev=m3 curr=m1
>>          residue 18: prev=None curr=m3
>> ""
>
> This output is quite strange.  I would need to see the
> full_analysis.py script to do more with this.
>
>
>> There should be no data for residues 1-3, 11 and 18 (None); however,
>> the 'Identical model-free model test' seems to ignore residues
>> for which 'None' was selected when curr_models is built in the following
>> code:
>>
>> ""
>>          # Create a string representation of the model-free models of
>> the previous run.
>>          prev_models = ''
>>          for i in xrange(len(self.relax.data.res['previous'])):
>>              if hasattr(self.relax.data.res['previous'][i], 'model'):
>>                  #prev_models = prev_models + self.relax.data.res
>> ['previous'][i].model
>>                  prev_models = prev_models + ' ' +  
>> self.relax.data.res
>> ['previous'][i].model
>>
>>          # Create a string representation of the model-free models of
>> the current run.
>>          curr_models = ''
>>          for i in xrange(len(self.relax.data.res[run])):
>>              if hasattr(self.relax.data.res[run][i], 'model'):
>>                  #curr_models = curr_models + self.relax.data.res 
>> [run]
>> [i].model
>>                  curr_models = curr_models + ' ' +  
>> self.relax.data.res
>> [run][i].model
>> ""
>
> As residues 1-3, 11 and 18 are deselected, then they will not have the
> attribute 'model' and hence will not be placed in the prev_models or
> curr_models string (which are then compared).
>
>
>> For what it's worth, I have residues 1,2,3,11 and 18 in the file
>> 'unresolved' which is read by the full_analysis.py protocol.  I
>> created a separate sequence file (variable = SEQUENCE) that contains
>> all residues (those with data and those without), instead of using a
>> data file (noe data, in the default full_analysis.py file).  However,
>> these residues are not specified in the data (r1, r2 and noe) files,
>> as I did not have data for them.  Should I add them but place 'None'
>> in the data and error columns?  Could that be causing the problems?
>> Or should I create a bug report for this?
>
> I'm not sure what you mean by the file '(variable = SEQUENCE)'
> statement.  I would need to see the full_analysis.py script to
> understand this.  I would assume it is a file named by the value of the
> variable 'SEQUENCE'.  In that case, this should not be a problem.  As
> the data is missing from the files containing the relaxation data,
> these residues will not be used in the model-free analysis.  They will be
> automatically deselected.  There is no need to add empty data for
> these spin systems in the relaxation data files.  As I said before
> at the top of this message and at
> https://mail.gna.org/public/relax-users/2007-07/msg00001.html
> (Message-id:  
> <[EMAIL PROTECTED]>),
> the problem is almost guaranteed to be a circular loop of equivalent
> solutions circling around the universal solution - the solution
> defined within the universal set (the union of all global models
> (diffusion tensor + all model-free models of all residues)).  If you
> have hit this circular problem, then I suggest you stop running relax.
>  What I would then do is identify the spin system (or systems) causing
> the loop, and what the differences are between all members of the
> loop.  This you can do by plotting the data and maybe using the
> program diff on uncompressed versions of the results files.  It's
> likely that the differences are small and inconsequential.  I hope
> this helps.
>
> Regards,
>
> Edward

