Re: bug? previous vs. current model test in full_analysis

Edward d'Auvergne Mon, 17 Sep 2007 15:46:45 -0700

Hi,

In your previous post
(https://mail.gna.org/public/relax-users/2007-09/msg00011.html,
Message-id: <[EMAIL PROTECTED]>) I think
you were spot on with the diagnosis.  The reading of the results files
with None in all model positions will create variables called 'model'
with the value set to None.  Then the string comparison will fail
unless these are skipped as well.  Well done on picking up the exact
cause of the failure.  This important change will need to go into
relax before a new release with the more advanced 'full_analysis.py'
script.


If you would like to post the changes, I can add these to the
repository.  I'll need to check the change carefully first though, and
it may only require two lines in the script to be changed.  The best
way would be to attach a patch to a bug report.  If you could create a
bug report with a simple summary, that would be appreciated.  Then how
you report the changes is up to you.  If you change a checked out copy
of the SVN repository and type 'svn diff > patch', you'll get the
changes in a patch file which I can then check and commit to the
repository with your name attached.

Thanks,

Edward


On 9/17/07, Douglas Kojetin <[EMAIL PROTECTED]> wrote:
> As a followup, my changes to full_analysis.py solved my problem.  I
> will clean up my code and post it within the next day or so.  Would
> you prefer that I attach the script as an attachment, or inline in an
> email, or provide a patch, or change the CVS code myself?
>
> Doug
>
>
> On Sep 17, 2007, at 11:48 AM, Edward d'Auvergne wrote:
>
> > Hi,
> >
> > The problem is likely to be due to a circular looping around very
> > similar solutions close to the universal solution, as I have defined
> > in:
> >
> > d'Auvergne EJ, Gooley PR. Set theory formulation of the model-free
> > problem and the diffusion seeded model-free paradigm. Mol Biosyst.
> > 2007 Jul;3(7):483-94.
> >
> > If you can't get the paper, have a look at Chapter 5 of my PhD
> > thesis at
> > http://eprints.infodiv.unimelb.edu.au/archive/00002799/.  The problem
> > is the interlinked mathematical optimisation and statistical model
> > selection where we are trying to minimise different quantities.  For
> > mathematical optimisation this is the chi-squared value.  For model
> > selection this is the quantity known as the descrepancy.  These can
> > not be optimised together as mathematical optimisation works in a
> > single fixed-dimension space or universe whereas model selection
> > operates across multiple spaces with different dimensions.  See the
> > paper for a more comprehensive description of the issue.
> >
> > You should be able to see this if you look at the end of iterations.
> > If you have 160 iterations, look after iteration 20 (or maybe even 30
> > or further).  Until then, you will not have reached the circular loop.
> >  After that point you will be able to exactly quatify this circular
> > loop.  You'll be able to determine its periodicity, which residues are
> > involved (probably only 1), and whether the diffusion tensor changes
> > as model selection changes.
> >
> > I mentioned all of this already in my post at
> > https://mail.gna.org/public/relax-users/2007-07/msg00001.html
> > (Message-id:
> > <[EMAIL PROTECTED]>)
> > in response to your original post
> > (https://mail.gna.org/public/relax-users/2007-06/msg00004.html,
> > Message-id: <[EMAIL PROTECTED]>).
> >
> > I have a few more points about the tests you have done but to work out
> > what is happening with the printouts, it would be very useful to have
> > your modified 'full_analysis.py' script attached.
> >
> >
> >
> > On 9/17/07, Douglas Kojetin <[EMAIL PROTECTED]> wrote:
> >> Hi,
> >>
> >> I'm unsure if this is a bug in full_analysis.py, in the internal
> >> relax code, or user error.  The optimization of the 'sphere' model
> >> will not converge, now after 160+ rounds.   The chi-squared test has
> >> converged (long, long ago):
> >
> > See above.
> >
> >
> >> "" from output
> >>          Chi-squared test:
> >>              chi2 (k-1): 100.77647517006251
> >>              chi2 (k):   100.77647517006251
> >>              The chi-squared value has converged.
> >> ""
> >>
> >> However, the identical model-free models test does has not converged:
> >>
> >> "" from output
> >>          Identical model-free models test:
> >>              The model-free models have not converged.
> >>
> >>          Identical parameter test:
> >>              The model-free models haven't converged hence the
> >> parameters haven't converged.
> >> ""
> >>
> >> Something that confuses me is that the output files in the round_??/
> >> aic directory suggest that, for example, the round_160 and round_161
> >> AIC model selections are equivalent.  Here are the models for the
> >> first few residues:
> >
> > Between these 2 rounds, are you sure that all models for all residues
> > are identical?  From your data that you posted at
> > https://mail.gna.org/public/relax-users/2007-06/msg00017.html
> > (Message-id: <[EMAIL PROTECTED]>), I
> > would guess that this is not the case and one or two residues actually
> > do change in their model selections.
> >
> >
> >> ""
> >>          1 None None
> >>          2 None None
> >>          3 None None
> >>          4 m2 m2
> >>          5 m2 m2
> >>          6 m2 m2
> >>          7 m2 m2
> >>          8 m2 m2
> >>          9 m4 m4
> >>          10 m1 m1
> >>          11 None None
> >>          12 m2 m2
> >>          13 m2 m2
> >>          14 m1 m1
> >>          15 m2 m2
> >>          16 m3 m3
> >>          17 m3 m3
> >>          18 None None
> >> ""
> >>
> >> However, I modified the full_analysis.py protocol to print the
> >> differences in the model selection, within the 'Identical model-free
> >> model test' section of the 'convergence' definition. Here is the
> >> beginning of the output (which only contains differences between the
> >> previous and current rounds):
> >>
> >> ""
> >>          residue 1: prev=None curr=m2
> >>          residue 2: prev=None curr=m2
> >>          residue 3: prev=None curr=m2
> >>          residue 6: prev=m2 curr=m4
> >>          residue 7: prev=m2 curr=m1
> >>          residue 9: prev=m4 curr=m2
> >>          residue 11: prev=None curr=m2
> >>          residue 12: prev=m2 curr=m3
> >>          residue 13: prev=m2 curr=m3
> >>          residue 15: prev=m2 curr=m1
> >>          residue 16: prev=m3 curr=m2
> >>          residue 17: prev=m3 curr=m1
> >>          residue 18: prev=None curr=m3
> >> ""
> >
> > This output is quite strange.  I would need to see the
> > full_analysis.py script to do more with this.
> >
> >
> >> There should be no data for residues 1-3, 11 and 18 (None), however
> >> the 'Identical model-free model test' seems as if it ignores residues
> >> for which 'None' was selected in the curr_model call in the following
> >> code:
> >>
> >> ""
> >>          # Create a string representation of the model-free models of
> >> the previous run.
> >>          prev_models = ''
> >>          for i in xrange(len(self.relax.data.res['previous'])):
> >>              if hasattr(self.relax.data.res['previous'][i], 'model'):
> >>                  #prev_models = prev_models + self.relax.data.res
> >> ['previous'][i].model
> >>                  prev_models = prev_models + ' ' +
> >> self.relax.data.res
> >> ['previous'][i].model
> >>
> >>          # Create a string representation of the model-free models of
> >> the current run.
> >>          curr_models = ''
> >>          for i in xrange(len(self.relax.data.res[run])):
> >>              if hasattr(self.relax.data.res[run][i], 'model'):
> >>                  #curr_models = curr_models + self.relax.data.res
> >> [run]
> >> [i].model
> >>                  curr_models = curr_models + ' ' +
> >> self.relax.data.res
> >> [run][i].model
> >> ""
> >
> > As residues 1-3, 11 and 18 are deselected, then they will not have the
> > attribute 'model' and hence will not be placed in the prev_models or
> > curr_models string (which are then compared).
> >
> >
> >> For what it's worth, I have residues 1,2,3,11 and 18 in the file
> >> 'unresolved' which is read by the full_analysis.py protocol.  I
> >> created a separate sequence file (variable = SEQUENCE) that contains
> >> all residues (those with data and those without), instead of using a
> >> data file (noe data, in the default full_analysis.py file).  However,
> >> these residues are not specified in the data (r1, r2 and noe) files,
> >> as I did not have data for them.  Should I add them but place 'None'
> >> in the data and error columns?  Could that be causing the problems?
> >> Or should I create a bug report for this?
> >
> > I'm not sure what you mean by the file '(variable = SEQUENCE)'
> > statement.  I would need to see the full_analysis.py script to
> > understand this.  I would assume a file called the value in the
> > variable 'SEQUENCE'.  In which case, this should not be a problem.  As
> > the data is missing from the files containing the relaxation data,
> > these residues will not be used in model-free analysis.  They will be
> > automatically deselected.  There is not need to add empty data for
> > these spin systems into the relaxation data files.  As I said before
> > at the top of this message and at
> > https://mail.gna.org/public/relax-users/2007-07/msg00001.html
> > (Message-id:
> > <[EMAIL PROTECTED]>),
> > the problem is almost guaranteed to be a circular loop of equivalent
> > solutions circling around the universal solution - the solution
> > defined within the universal set (the union of all global models
> > (diffusion tensor + all model-free models of all residues)).  If you
> > have hit this circular problem, then I suggest you stop running relax.
> >  What I would then do is identify the spin system (or systems) causing
> > the loop, and what the differences are between all members of the
> > loop.  This you can do by plotting the data and maybe using the
> > program diff on uncompressed versions of the results files.  It's
> > likely that the differences are small and inconsequential.  I hope
> > this helps.
> >
> > Regards,
> >
> > Edward
>
>

_______________________________________________
relax (http://nmr-relax.com)

This is the relax-users mailing list
[email protected]

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-users

Re: bug? previous vs. current model test in full_analysis

Reply via email to