Hi,

I've reviewed the patch attached to bug #10022 (https://gna.org/bugs/?10022) and have found an issue. The problem is the use of two dictionaries for the previous and current runs: the ordering of a dictionary is not guaranteed, and hence the comparison may not work. A simple test such as "if self.relax.data.res[run][i].model == None", or its negation, applied after checking for the presence of the 'model' attribute, together with removal of the dictionaries, should eliminate the problem. The code after the comment "# The test." could also be significantly simplified.
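[Editorial note: the per-residue test suggested above can be sketched as follows. The function name `models_converged` and the plain lists `prev_res`/`curr_res` are illustrative stand-ins for relax's `self.relax.data.res['previous']` and `self.relax.data.res[run]` structures, so this is a sketch of the logic, not a drop-in patch.]

```python
def models_converged(prev_res, curr_res):
    """Compare model-free model selections residue by residue.

    Each element may or may not carry a 'model' attribute, and that
    attribute may be None for deselected residues.  A missing attribute
    and a None value are treated identically, and every residue index is
    compared, so nothing is silently skipped.
    """
    if len(prev_res) != len(curr_res):
        return False
    for prev, curr in zip(prev_res, curr_res):
        # getattr with a default of None covers both the missing
        # attribute and the explicit None cases.
        if getattr(prev, 'model', None) != getattr(curr, 'model', None):
            return False
    return True
```

Comparing per residue index, rather than concatenating strings, means deselected residues can no longer shift the alignment of the two runs being compared.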
As for the code in the section "NOTE: the following code has not been extensively tested", this does not directly address the bug itself, and it will have problems with the dictionary ordering. I would prefer that this section not be present in the patch. If someone does have different residues in two different iterations of full_analysis.py, then that error is much more severe and belongs elsewhere, not in the convergence tests.

Cheers,

Edward


On 9/18/07, Douglas Kojetin <[EMAIL PROTECTED]> wrote:
> Hi Edward,
>
> I submitted this as a bug report.  I modified the full_analysis.py
> file after an SVN refresh.  Unless you have a quick way of doing so, I
> will test the cleaned up version (submitted as a patch to the bug
> report) tomorrow.
>
> Doug
>
>
> On Sep 17, 2007, at 4:12 PM, Edward d'Auvergne wrote:
>
> > Hi,
> >
> > In your previous post
> > (https://mail.gna.org/public/relax-users/2007-09/msg00011.html,
> > Message-id: <[EMAIL PROTECTED]>) I think you were spot on with the
> > diagnosis.  The reading of the results files with None in all model
> > positions will create variables called 'model' with the value set
> > to None.  Then the string comparison will fail unless these are
> > skipped as well.  Well done on picking up the exact cause of the
> > failure.  This important change will need to go into relax before a
> > new release with the more advanced 'full_analysis.py' script.
> >
> > If you would like to post the changes, I can add these to the
> > repository.  I'll need to check the change carefully first though,
> > and it may only require two lines in the script to be changed.  The
> > best way would be to attach a patch to a bug report.  If you could
> > create a bug report with a simple summary, that would be
> > appreciated.  Then how you report the changes is up to you.
> > If you change a checked-out copy of the SVN repository and type
> > 'svn diff > patch', you'll get the changes in a patch file which I
> > can then check and commit to the repository with your name attached.
> >
> > Thanks,
> >
> > Edward
> >
> >
> > On 9/17/07, Douglas Kojetin <[EMAIL PROTECTED]> wrote:
> >> As a followup, my changes to full_analysis.py solved my problem.
> >> I will clean up my code and post it within the next day or so.
> >> Would you prefer that I attach the script as an attachment, or
> >> inline in an email, or provide a patch, or change the CVS code
> >> myself?
> >>
> >> Doug
> >>
> >>
> >> On Sep 17, 2007, at 11:48 AM, Edward d'Auvergne wrote:
> >>
> >>> Hi,
> >>>
> >>> The problem is likely to be due to a circular looping around very
> >>> similar solutions close to the universal solution, as I have
> >>> defined in:
> >>>
> >>> d'Auvergne EJ, Gooley PR.  Set theory formulation of the
> >>> model-free problem and the diffusion seeded model-free paradigm.
> >>> Mol Biosyst. 2007 Jul;3(7):483-94.
> >>>
> >>> If you can't get the paper, have a look at Chapter 5 of my PhD
> >>> thesis at http://eprints.infodiv.unimelb.edu.au/archive/00002799/.
> >>> The problem is the interlinked mathematical optimisation and
> >>> statistical model selection, where we are trying to minimise
> >>> different quantities.  For mathematical optimisation this is the
> >>> chi-squared value.  For model selection this is the quantity known
> >>> as the discrepancy.  These cannot be optimised together, as
> >>> mathematical optimisation works in a single fixed-dimension space
> >>> or universe whereas model selection operates across multiple
> >>> spaces with different dimensions.  See the paper for a more
> >>> comprehensive description of the issue.
> >>>
> >>> You should be able to see this if you look at the end of the
> >>> iterations.  If you have 160 iterations, look after iteration 20
> >>> (or maybe even 30 or further).
> >>> Until then, you will not have reached the circular loop.  After
> >>> that point you will be able to exactly quantify this circular
> >>> loop.  You'll be able to determine its periodicity, which residues
> >>> are involved (probably only 1), and whether the diffusion tensor
> >>> changes as model selection changes.
> >>>
> >>> I mentioned all of this already in my post at
> >>> https://mail.gna.org/public/relax-users/2007-07/msg00001.html
> >>> (Message-id: <[EMAIL PROTECTED]>) in response to your original
> >>> post
> >>> (https://mail.gna.org/public/relax-users/2007-06/msg00004.html,
> >>> Message-id: <[EMAIL PROTECTED]>).
> >>>
> >>> I have a few more points about the tests you have done, but to
> >>> work out what is happening with the printouts it would be very
> >>> useful to have your modified 'full_analysis.py' script attached.
> >>>
> >>>
> >>> On 9/17/07, Douglas Kojetin <[EMAIL PROTECTED]> wrote:
> >>>> Hi,
> >>>>
> >>>> I'm unsure if this is a bug in full_analysis.py, in the internal
> >>>> relax code, or user error.  The optimization of the 'sphere'
> >>>> model will not converge, now after 160+ rounds.  The chi-squared
> >>>> test has converged (long, long ago):
> >>>
> >>> See above.
> >>>
> >>>
> >>>> "" from output
> >>>> Chi-squared test:
> >>>>     chi2 (k-1):  100.77647517006251
> >>>>     chi2 (k):    100.77647517006251
> >>>>     The chi-squared value has converged.
> >>>> ""
> >>>>
> >>>> However, the identical model-free models test has not converged:
> >>>>
> >>>> "" from output
> >>>> Identical model-free models test:
> >>>>     The model-free models have not converged.
> >>>>
> >>>> Identical parameter test:
> >>>>     The model-free models haven't converged hence the parameters
> >>>>     haven't converged.
> >>>> ""
> >>>>
> >>>> Something that confuses me is that the output files in the
> >>>> round_??/aic directory suggest that, for example, the round_160
> >>>> and round_161 AIC model selections are equivalent.  Here are the
> >>>> models for the first few residues:
> >>>
> >>> Between these 2 rounds, are you sure that all models for all
> >>> residues are identical?  From your data that you posted at
> >>> https://mail.gna.org/public/relax-users/2007-06/msg00017.html
> >>> (Message-id: <[EMAIL PROTECTED]>), I would guess that this is not
> >>> the case and that one or two residues actually do change in their
> >>> model selections.
> >>>
> >>>> ""
> >>>> 1    None    None
> >>>> 2    None    None
> >>>> 3    None    None
> >>>> 4    m2      m2
> >>>> 5    m2      m2
> >>>> 6    m2      m2
> >>>> 7    m2      m2
> >>>> 8    m2      m2
> >>>> 9    m4      m4
> >>>> 10   m1      m1
> >>>> 11   None    None
> >>>> 12   m2      m2
> >>>> 13   m2      m2
> >>>> 14   m1      m1
> >>>> 15   m2      m2
> >>>> 16   m3      m3
> >>>> 17   m3      m3
> >>>> 18   None    None
> >>>> ""
> >>>>
> >>>> However, I modified the full_analysis.py protocol to print the
> >>>> differences in the model selection, within the 'Identical
> >>>> model-free model test' section of the 'convergence' definition.
> >>>> Here is the beginning of the output (which only contains
> >>>> differences between the previous and current rounds):
> >>>>
> >>>> ""
> >>>> residue 1:   prev=None  curr=m2
> >>>> residue 2:   prev=None  curr=m2
> >>>> residue 3:   prev=None  curr=m2
> >>>> residue 6:   prev=m2    curr=m4
> >>>> residue 7:   prev=m2    curr=m1
> >>>> residue 9:   prev=m4    curr=m2
> >>>> residue 11:  prev=None  curr=m2
> >>>> residue 12:  prev=m2    curr=m3
> >>>> residue 13:  prev=m2    curr=m3
> >>>> residue 15:  prev=m2    curr=m1
> >>>> residue 16:  prev=m3    curr=m2
> >>>> residue 17:  prev=m3    curr=m1
> >>>> residue 18:  prev=None  curr=m3
> >>>> ""
> >>>
> >>> This output is quite strange.  I would need to see the
> >>> full_analysis.py script to do more with this.
> >>>
> >>>
> >>>> There should be no data for residues 1-3, 11 and 18 (None);
> >>>> however, the 'Identical model-free model test' seems as if it
> >>>> ignores residues for which 'None' was selected in the following
> >>>> code:
> >>>>
> >>>> ""
> >>>> # Create a string representation of the model-free models of
> >>>> # the previous run.
> >>>> prev_models = ''
> >>>> for i in xrange(len(self.relax.data.res['previous'])):
> >>>>     if hasattr(self.relax.data.res['previous'][i], 'model'):
> >>>>         #prev_models = prev_models + self.relax.data.res['previous'][i].model
> >>>>         prev_models = prev_models + ' ' + self.relax.data.res['previous'][i].model
> >>>>
> >>>> # Create a string representation of the model-free models of
> >>>> # the current run.
> >>>> curr_models = ''
> >>>> for i in xrange(len(self.relax.data.res[run])):
> >>>>     if hasattr(self.relax.data.res[run][i], 'model'):
> >>>>         #curr_models = curr_models + self.relax.data.res[run][i].model
> >>>>         curr_models = curr_models + ' ' + self.relax.data.res[run][i].model
> >>>> ""
> >>>
> >>> As residues 1-3, 11 and 18 are deselected, they will not have the
> >>> attribute 'model' and hence will not be placed in the prev_models
> >>> or curr_models strings (which are then compared).
> >>>
> >>>
> >>>> For what it's worth, I have residues 1, 2, 3, 11 and 18 in the
> >>>> file 'unresolved', which is read by the full_analysis.py
> >>>> protocol.  I created a separate sequence file (variable =
> >>>> SEQUENCE) that contains all residues (those with data and those
> >>>> without), instead of using a data file (noe data, in the default
> >>>> full_analysis.py file).  However, these residues are not
> >>>> specified in the data (r1, r2 and noe) files, as I did not have
> >>>> data for them.  Should I add them but place 'None' in the data
> >>>> and error columns?
> >>>> Could that be causing the problems?  Or should I create a bug
> >>>> report for this?
> >>>
> >>> I'm not sure what you mean by the '(variable = SEQUENCE)'
> >>> statement.  I would need to see the full_analysis.py script to
> >>> understand this.  I would assume a file named by the value of the
> >>> variable 'SEQUENCE'.  In which case, this should not be a problem.
> >>> As the data is missing from the files containing the relaxation
> >>> data, these residues will not be used in the model-free analysis.
> >>> They will be automatically deselected.  There is no need to add
> >>> empty data for these spin systems into the relaxation data files.
> >>> As I said before at the top of this message and at
> >>> https://mail.gna.org/public/relax-users/2007-07/msg00001.html
> >>> (Message-id: <[EMAIL PROTECTED]>), the problem is almost
> >>> guaranteed to be a circular loop of equivalent solutions circling
> >>> around the universal solution - the solution defined within the
> >>> universal set (the union of all global models (diffusion tensor +
> >>> all model-free models of all residues)).  If you have hit this
> >>> circular problem, then I suggest you stop running relax.  What I
> >>> would then do is identify the spin system (or systems) causing the
> >>> loop, and what the differences are between all members of the
> >>> loop.  This you can do by plotting the data and maybe using the
> >>> program diff on uncompressed versions of the results files.  It's
> >>> likely that the differences are small and inconsequential.  I hope
> >>> this helps.
> >>>
> >>> Regards,
> >>>
> >>> Edward

_______________________________________________
relax (http://nmr-relax.com)

This is the relax-users mailing list
[email protected]

To unsubscribe from this list, get a password reminder, or change your
subscription options, visit the list information page at
https://mail.gna.org/listinfo/relax-users
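[Editorial note: the string-concatenation test quoted in the thread can report convergence even when the per-residue selections differ, because residues lacking a 'model' attribute are skipped and the remaining models shift position in the string. A minimal sketch with hypothetical data, using plain objects in place of relax's data structures:]

```python
class Res(object):
    """Stand-in for a relax residue container; 'model' may be absent."""
    pass

def concat_models(res_list):
    # Mirrors the quoted full_analysis.py logic: skip residues lacking
    # a 'model' attribute and concatenate the rest into one string.
    models = ''
    for res in res_list:
        if hasattr(res, 'model'):
            models = models + ' ' + res.model
    return models

def make(models):
    # Build a residue list from a sequence of model names; None means
    # the residue is deselected (no 'model' attribute is set).
    out = []
    for name in models:
        res = Res()
        if name is not None:
            res.model = name
        out.append(res)
    return out

# Two rounds with different per-residue selections...
prev = make(['m1', 'm2', None])   # residue 3 deselected
curr = make(['m1', None, 'm2'])   # residue 2 deselected
# ...yet both concatenate to ' m1 m2', so a string comparison would
# wrongly report that the model selections have converged.
print(concat_models(prev) == concat_models(curr))  # True
```

This is why a per-residue comparison, which also checks that the same residues are deselected in both rounds, is the safer test.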

