Hi Edward,

I am running a test, but will post more information as soon as it  
finishes.

RE:

> Between these 2 rounds, are you sure that all models for all residues
> are identical?  From your data that you posted at
> https://mail.gna.org/public/relax-users/2007-06/msg00017.html
> (Message-id: <[EMAIL PROTECTED]>), I
> would guess that this is not the case and one or two residues actually
> do change in their model selections.

Yes, I am sure the models for all residues are identical.  This was  
verified using the unix 'diff' program:

bunzip2 -c sphere/round_160/aic/results.bz2 | awk '{print $1,$6}' > previous
bunzip2 -c sphere/round_161/aic/results.bz2 | awk '{print $1,$6}' > current
diff previous current

There were no differences between the previous and current files.
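
The same comparison can also be done directly in Python rather than via awk and diff.  Here is a minimal sketch, assuming (as the awk command above does) that column 1 of the results file holds the residue number and column 6 the selected model:

import bz2

def model_map(path):
    # Build {residue: model} from a bzip2-compressed results file, assuming
    # (as in the awk command above) that column 1 holds the residue number
    # and column 6 the selected model-free model.  Non-numeric first columns
    # (e.g. a header line) are skipped.
    mapping = {}
    with bz2.open(path, 'rt') as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 6 and fields[0].isdigit():
                mapping[int(fields[0])] = fields[5]
    return mapping

prev = model_map('sphere/round_160/aic/results.bz2')
curr = model_map('sphere/round_161/aic/results.bz2')

# Report only the residues whose model selection differs between the rounds.
for res in sorted(set(prev) | set(curr)):
    if prev.get(res) != curr.get(res):
        print(res, prev.get(res), curr.get(res))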

Also, I am not using the same data as in the links you provided; this
is data from a different protein.  In addition, the identical
model-free models test converged in the calculation reported in the
previous post, whereas full_analysis.py suggests it has not converged
in my current post.  I think my current problems might be a separate
issue, which I'm looking into now.

Thanks,
Doug




On Sep 17, 2007, at 11:48 AM, Edward d'Auvergne wrote:

> Hi,
>
> The problem is likely to be due to a circular looping around very
> similar solutions close to the universal solution, as I have defined
> in:
>
> d'Auvergne EJ, Gooley PR. Set theory formulation of the model-free
> problem and the diffusion seeded model-free paradigm. Mol Biosyst.
> 2007 Jul;3(7):483-94.
>
> If you can't get the paper, have a look at Chapter 5 of my PhD  
> thesis at
> http://eprints.infodiv.unimelb.edu.au/archive/00002799/.  The problem
> is the interlinked mathematical optimisation and statistical model
> selection where we are trying to minimise different quantities.  For
> mathematical optimisation this is the chi-squared value.  For model
> selection this is the quantity known as the discrepancy.  These
> cannot be optimised together as mathematical optimisation works in a
> single fixed-dimension space or universe whereas model selection
> operates across multiple spaces with different dimensions.  See the
> paper for a more comprehensive description of the issue.
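
As a concrete illustration of the two quantities, here is a minimal sketch with invented chi-squared values and parameter counts, using AIC (chi2 + 2k) as the estimate of the discrepancy:

# Mathematical optimisation: within ONE model (fixed number of parameters k),
# the optimiser finds the parameter values that minimise chi-squared.
chi2 = {'m1': 112.4, 'm2': 100.8, 'm4': 99.9}   # invented best-fit chi2 per model
k    = {'m1': 1,     'm2': 2,     'm4': 3}      # number of model-free parameters

# Model selection: compare models of DIFFERENT dimension using an estimate
# of the discrepancy, here AIC = chi2 + 2k.
aic = {name: chi2[name] + 2 * k[name] for name in chi2}
best = min(aic, key=aic.get)
print(aic, 'selected:', best)   # m4 has the lowest chi2, but m2 the lowest AIC

The lowest chi-squared and the lowest AIC need not pick the same model, which is why the two cannot be minimised as a single quantity.
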
>
> You should be able to see this if you look at the end of iterations.
> If you have 160 iterations, look after iteration 20 (or maybe even 30
> or further).  Until then, you will not have reached the circular loop.
>  After that point you will be able to exactly quantify this circular
> loop.  You'll be able to determine its periodicity, which residues are
> involved (probably only 1), and whether the diffusion tensor changes
> as model selection changes.
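
A minimal sketch of how such a loop could be quantified once the per-round model selections have been collected (the round data below are invented):

def find_cycle(rounds):
    # rounds: one entry per iteration, each a hashable summary of that
    # round's model selection (e.g. the space-separated list of selected
    # models per residue).  Returns (first_round, period) for the first
    # exact repeat of an earlier round, or None if no repeat is found.
    seen = {}
    for i, state in enumerate(rounds):
        if state in seen:
            return seen[state], i - seen[state]
        seen[state] = i
    return None

# Invented example: the selection of round 0 reappears at round 2, i.e. a
# circular loop with a period of 2 rounds.
print(find_cycle(['m2 m2 m4', 'm2 m1 m4', 'm2 m2 m4', 'm2 m1 m4']))   # (0, 2)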
>
> I mentioned all of this already in my post at
> https://mail.gna.org/public/relax-users/2007-07/msg00001.html
> (Message-id:  
> <[EMAIL PROTECTED]>)
> in response to your original post
> (https://mail.gna.org/public/relax-users/2007-06/msg00004.html,
> Message-id: <[EMAIL PROTECTED]>).
>
> I have a few more points about the tests you have done, but to work
> out what is happening with the printouts, it would be very useful to
> have your modified 'full_analysis.py' script attached.
>
>
>
> On 9/17/07, Douglas Kojetin <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> I'm unsure if this is a bug in full_analysis.py, in the internal
>> relax code, or user error.  The optimization of the 'sphere' model
>> will not converge, now after 160+ rounds.   The chi-squared test has
>> converged (long, long ago):
>
> See above.
>
>
>> "" from output
>>          Chi-squared test:
>>              chi2 (k-1): 100.77647517006251
>>              chi2 (k):   100.77647517006251
>>              The chi-squared value has converged.
>> ""
>>
>> However, the identical model-free models test has not converged:
>>
>> "" from output
>>          Identical model-free models test:
>>              The model-free models have not converged.
>>
>>          Identical parameter test:
>>              The model-free models haven't converged hence the
>> parameters haven't converged.
>> ""
>>
>> Something that confuses me is that the output files in the round_??/
>> aic directory suggest that, for example, the round_160 and round_161
>> AIC model selections are equivalent.  Here are the models for the
>> first few residues:
>
> Between these 2 rounds, are you sure that all models for all residues
> are identical?  From your data that you posted at
> https://mail.gna.org/public/relax-users/2007-06/msg00017.html
> (Message-id: <[EMAIL PROTECTED]>), I
> would guess that this is not the case and one or two residues actually
> do change in their model selections.
>
>
>> ""
>>          1 None None
>>          2 None None
>>          3 None None
>>          4 m2 m2
>>          5 m2 m2
>>          6 m2 m2
>>          7 m2 m2
>>          8 m2 m2
>>          9 m4 m4
>>          10 m1 m1
>>          11 None None
>>          12 m2 m2
>>          13 m2 m2
>>          14 m1 m1
>>          15 m2 m2
>>          16 m3 m3
>>          17 m3 m3
>>          18 None None
>> ""
>>
>> However, I modified the full_analysis.py protocol to print the
>> differences in the model selection, within the 'Identical model-free
>> model test' section of the 'convergence' definition. Here is the
>> beginning of the output (which only contains differences between the
>> previous and current rounds):
>>
>> ""
>>          residue 1: prev=None curr=m2
>>          residue 2: prev=None curr=m2
>>          residue 3: prev=None curr=m2
>>          residue 6: prev=m2 curr=m4
>>          residue 7: prev=m2 curr=m1
>>          residue 9: prev=m4 curr=m2
>>          residue 11: prev=None curr=m2
>>          residue 12: prev=m2 curr=m3
>>          residue 13: prev=m2 curr=m3
>>          residue 15: prev=m2 curr=m1
>>          residue 16: prev=m3 curr=m2
>>          residue 17: prev=m3 curr=m1
>>          residue 18: prev=None curr=m3
>> ""
>
> This output is quite strange.  I would need to see the
> full_analysis.py script to do more with this.
>
>
>> There should be no data for residues 1-3, 11 and 18 (None).  However,
>> the 'Identical model-free model test' seems to ignore the residues for
>> which 'None' was selected when building the curr_models string in the
>> following code:
>>
>> ""
>>          # Create a string representation of the model-free models of
>> the previous run.
>>          prev_models = ''
>>          for i in xrange(len(self.relax.data.res['previous'])):
>>              if hasattr(self.relax.data.res['previous'][i], 'model'):
>>                  #prev_models = prev_models + self.relax.data.res
>> ['previous'][i].model
>>                  prev_models = prev_models + ' ' +  
>> self.relax.data.res
>> ['previous'][i].model
>>
>>          # Create a string representation of the model-free models of
>> the current run.
>>          curr_models = ''
>>          for i in xrange(len(self.relax.data.res[run])):
>>              if hasattr(self.relax.data.res[run][i], 'model'):
>>                  #curr_models = curr_models + self.relax.data.res 
>> [run]
>> [i].model
>>                  curr_models = curr_models + ' ' +  
>> self.relax.data.res
>> [run][i].model
>> ""
>
> As residues 1-3, 11 and 18 are deselected, they will not have the
> attribute 'model' and hence will not be placed in the prev_models or
> curr_models strings (which are then compared).
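
One way to make the check explicit per residue, rather than via the concatenated strings, would be to treat a missing 'model' attribute as the value 'None' so that deselected residues cannot silently shift or hide a difference.  A minimal sketch, not the actual full_analysis.py code:

def selection_differences(prev_residues, curr_residues):
    # Compare two rounds residue by residue.  A residue without a 'model'
    # attribute (i.e. a deselected residue) is reported as 'None' rather
    # than being skipped.
    diffs = []
    for i, (prev, curr) in enumerate(zip(prev_residues, curr_residues)):
        prev_model = getattr(prev, 'model', 'None')
        curr_model = getattr(curr, 'model', 'None')
        if prev_model != curr_model:
            diffs.append((i + 1, prev_model, curr_model))
    return diffs

# Inside the convergence method of the script this would be called with the
# same data structures as in the quoted code above (hypothetical usage):
#   selection_differences(self.relax.data.res['previous'], self.relax.data.res[run])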
>
>
>> For what it's worth, I have residues 1,2,3,11 and 18 in the file
>> 'unresolved' which is read by the full_analysis.py protocol.  I
>> created a separate sequence file (variable = SEQUENCE) that contains
>> all residues (those with data and those without), instead of using a
>> data file (noe data, in the default full_analysis.py file).  However,
>> these residues are not specified in the data (r1, r2 and noe) files,
>> as I did not have data for them.  Should I add them but place 'None'
>> in the data and error columns?  Could that be causing the problems?
>> Or should I create a bug report for this?
>
> I'm not sure what you mean by the '(variable = SEQUENCE)' statement -
> I would need to see the full_analysis.py script to understand it, but
> I assume you mean a file whose name is given by the variable
> 'SEQUENCE'.  In that case, this should not be a problem.  As the data
> for these residues is missing from the relaxation data files, they
> will not be used in the model-free analysis and will be automatically
> deselected.  There is no need to add empty data for these spin
> systems to the relaxation data files.  As I said before
> at the top of this message and at
> https://mail.gna.org/public/relax-users/2007-07/msg00001.html
> (Message-id:  
> <[EMAIL PROTECTED]>),
> the problem is almost guaranteed to be a circular loop of equivalent
> solutions circling around the universal solution - the solution
> defined within the universal set (the union of all global models
> (diffusion tensor + all model-free models of all residues)).  If you
> have hit this circular problem, then I suggest you stop running relax.
>  What I would then do is identify the spin system (or systems) causing
> the loop, and what the differences are between all members of the
> loop.  This you can do by plotting the data and maybe using the
> program diff on uncompressed versions of the results files.  It's
> likely that the differences are small and inconsequential.  I hope
> this helps.
>
> Regards,
>
> Edward

