Re: bug? previous vs. current model test in full_analysis

Edward d'Auvergne Sun, 21 Oct 2007 09:08:56 -0700

Hi,

Sorry for the delay, I've been flat out after coming back from
holidays.  Now I've finally had a chance to look at and apply your patch.
The patch labelled 'patch2' attached to bug #10022
(https://gna.org/bugs/?10022) has been applied to the 1.2 line (and
manually ported to the 1.3 line).  Thank you again for fixing the
problem.
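
For anyone reading the archive, the idea behind the fixed test can be
sketched roughly as follows.  This is only a minimal standalone
illustration in Python 3, not the patch itself: the names real_model and
model_differences are hypothetical, and SimpleNamespace objects stand in
for the entries of self.relax.data.res[run].  Residues are compared
position by position, and a missing 'model' attribute, None and the
string 'None' are all treated as "no model".

```python
from types import SimpleNamespace


def real_model(res):
    """Return the model name, or None if the residue has no usable model."""
    model = getattr(res, 'model', None)
    return None if model in (None, 'None') else model


def model_differences(previous, current):
    """List (index, previous model, current model) for residues that differ."""
    # Different residues in the two runs is a separate, more severe problem.
    if len(previous) != len(current):
        raise ValueError("the two runs contain different numbers of residues")

    diffs = []
    for i, (prev, curr) in enumerate(zip(previous, current)):
        p, c = real_model(prev), real_model(curr)
        if p != c:
            diffs.append((i, p, c))
    return diffs


# Hypothetical data: only the residue at index 2 changes its model.
previous = [SimpleNamespace(model=None), SimpleNamespace(model='m2'),
            SimpleNamespace(model='m4'), SimpleNamespace()]
current = [SimpleNamespace(model='None'), SimpleNamespace(model='m2'),
           SimpleNamespace(model='m2'), SimpleNamespace()]
print(model_differences(previous, current))
```

An empty list would then mean that the model-free models are identical
between the two rounds.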


Regards,

Edward


On 9/19/07, Douglas Kojetin <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I uploaded what I think could be the last version of the patch, where
> references to None were changed to 'None'.
>
> Doug
>
>
> On Sep 18, 2007, at 4:41 AM, Edward d'Auvergne wrote:
>
> > Hi,
> >
> > I've reviewed the patch attached to bug #10022
> > (https://gna.org/bugs/?10022), and have found an issue.  The problem
> > is with the use of two dictionaries for the previous and current run.
> > The issue is that order in a dictionary is not guaranteed, and hence
> > the comparison may not work.  Maybe a simple test such as "if
> > self.relax.data.res[run][i].model == None" or its negative after
> > testing for the presence of the 'model' attribute, and removal of the
> > dictionary would remove the problem.  Significant simplifications to
> > the code after the comment "# The test." could also be made.
> >
> > As for the code in the section "NOTE:  the following code has not been
> > extenstively tested", this does not directly address the bug itself
> > and it will have problems with the dictionary ordering.  I would
> > prefer that this section not be present in the patch.  If someone does
> > have different residues in two different iterations of
> > full_analysis.py then this error is much more severe and belongs
> > elsewhere other than in the convergence tests.
> >
> > Cheers,
> >
> > Edward
> >
> >
> >
> > On 9/18/07, Douglas Kojetin <[EMAIL PROTECTED]> wrote:
> >> Hi Edward,
> >>
> >> I submitted this as a bug report.  I modified the full_analysis.py
> >> file after a SVN refresh.  Unless you have a quick way of doing so, I
> >> will test the cleaned up version (submitted as a patch to the bug
> >> report) tomorrow.
> >>
> >> Doug
> >>
> >>
> >> On Sep 17, 2007, at 4:12 PM, Edward d'Auvergne wrote:
> >>
> >>> Hi,
> >>>
> >>> In your previous post
> >>> (https://mail.gna.org/public/relax-users/2007-09/msg00011.html,
> >>> Message-id: <[EMAIL PROTECTED]>) I think you were spot on with the
> >>> diagnosis.  The reading of the results files with None in all model
> >>> positions will create variables called 'model' with the value set to
> >>> None.  Then the string comparison will fail unless these are skipped
> >>> as well.  Well done on picking up the exact cause of the failure.
> >>> This important change will need to go into relax before a new release
> >>> with the more advanced 'full_analysis.py' script.
> >>>
> >>> If you would like to post the changes, I can add these to the
> >>> repository.  I'll need to check the change carefully first though,
> >>> and it may only require two lines in the script to be changed.  The
> >>> best way would be to attach a patch to a bug report.  If you could
> >>> create a bug report with a simple summary, that would be appreciated.
> >>> Then how you report the changes is up to you.  If you change a
> >>> checked out copy of the SVN repository and type 'svn diff > patch',
> >>> you'll get the changes in a patch file which I can then check and
> >>> commit to the repository with your name attached.
> >>>
> >>> Thanks,
> >>>
> >>> Edward
> >>>
> >>>
> >>> On 9/17/07, Douglas Kojetin <[EMAIL PROTECTED]> wrote:
> >>>> As a followup, my changes to full_analysis.py solved my problem.  I
> >>>> will clean up my code and post it within the next day or so.  Would
> >>>> you prefer that I attach the script as an attachment, or inline in
> >>>> an email, or provide a patch, or change the CVS code myself?
> >>>>
> >>>> Doug
> >>>>
> >>>>
> >>>> On Sep 17, 2007, at 11:48 AM, Edward d'Auvergne wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> The problem is likely to be due to a circular looping around very
> >>>>> similar solutions close to the universal solution, as I have
> >>>>> defined in:
> >>>>>
> >>>>> d'Auvergne EJ, Gooley PR. Set theory formulation of the model-free
> >>>>> problem and the diffusion seeded model-free paradigm. Mol Biosyst.
> >>>>> 2007 Jul;3(7):483-94.
> >>>>>
> >>>>> If you can't get the paper, have a look at Chapter 5 of my PhD
> >>>>> thesis at http://eprints.infodiv.unimelb.edu.au/archive/00002799/.
> >>>>> The problem is the interlinked mathematical optimisation and
> >>>>> statistical model selection where we are trying to minimise
> >>>>> different quantities.  For mathematical optimisation this is the
> >>>>> chi-squared value.  For model selection this is the quantity known
> >>>>> as the discrepancy.  These cannot be optimised together as
> >>>>> mathematical optimisation works in a single fixed-dimension space
> >>>>> or universe whereas model selection operates across multiple spaces
> >>>>> with different dimensions.  See the paper for a more comprehensive
> >>>>> description of the issue.
> >>>>>
> >>>>> You should be able to see this if you look at the end of
> >>>>> iterations.  If you have 160 iterations, look after iteration 20
> >>>>> (or maybe even 30 or further).  Until then, you will not have
> >>>>> reached the circular loop.  After that point you will be able to
> >>>>> exactly quantify this circular loop.  You'll be able to determine
> >>>>> its periodicity, which residues are involved (probably only 1), and
> >>>>> whether the diffusion tensor changes as model selection changes.
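
One possible way to quantify such a loop is sketched below.  It is a
rough Python 3 illustration under stated assumptions: 'history' is a
hypothetical list holding, for each iteration, a tuple of the selected
model per residue (how these are collected from the results files is not
shown), and the function names find_cycle and changing_residues are made
up for this example.  A repeated selection only indicates a true loop if
the diffusion tensor repeats as well, which this sketch does not check.

```python
def find_cycle(history):
    """Return (start, period) of the first repeated selection, else None."""
    seen = {}
    for k, state in enumerate(history):
        key = tuple(state)
        if key in seen:
            return seen[key], k - seen[key]
        seen[key] = k
    return None


def changing_residues(history, start, period):
    """Indices of residues whose model varies between members of the loop."""
    loop = history[start:start + period]
    return [i for i in range(len(loop[0]))
            if len({state[i] for state in loop}) > 1]


# Hypothetical model selections for two residues over five iterations.
history = [
    ('m2', 'm4'),
    ('m2', 'm2'),
    ('m1', 'm2'),
    ('m2', 'm2'),
    ('m1', 'm2'),
]
cycle = find_cycle(history)
print(cycle)
print(changing_residues(history, *cycle))
```

A period of 1 would correspond to two consecutive identical selections,
i.e. convergence of the model-free models rather than a loop.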
> >>>>>
> >>>>> I mentioned all of this already in my post at
> >>>>> https://mail.gna.org/public/relax-users/2007-07/msg00001.html
> >>>>> (Message-id: <[EMAIL PROTECTED]>) in response to your original post
> >>>>> (https://mail.gna.org/public/relax-users/2007-06/msg00004.html,
> >>>>> Message-id: <[EMAIL PROTECTED]>).
> >>>>>
> >>>>> I have a few more points about the tests you have done but to work
> >>>>> out what is happening with the printouts, it would be very useful
> >>>>> to have your modified 'full_analysis.py' script attached.
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 9/17/07, Douglas Kojetin <[EMAIL PROTECTED]> wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I'm unsure if this is a bug in full_analysis.py, in the internal
> >>>>>> relax code, or user error.  The optimization of the 'sphere' model
> >>>>>> will not converge, now after 160+ rounds.  The chi-squared test
> >>>>>> has converged (long, long ago):
> >>>>>
> >>>>> See above.
> >>>>>
> >>>>>
> >>>>>> "" from output
> >>>>>>          Chi-squared test:
> >>>>>>              chi2 (k-1): 100.77647517006251
> >>>>>>              chi2 (k):   100.77647517006251
> >>>>>>              The chi-squared value has converged.
> >>>>>> ""
> >>>>>>
> >>>>>> However, the identical model-free models test has not converged:
> >>>>>>
> >>>>>> "" from output
> >>>>>>          Identical model-free models test:
> >>>>>>              The model-free models have not converged.
> >>>>>>
> >>>>>>          Identical parameter test:
> >>>>>>              The model-free models haven't converged hence the parameters haven't converged.
> >>>>>> ""
> >>>>>>
> >>>>>> Something that confuses me is that the output files in the
> >>>>>> round_??/aic directory suggest that, for example, the round_160
> >>>>>> and round_161 AIC model selections are equivalent.  Here are the
> >>>>>> models for the first few residues:
> >>>>>
> >>>>> Between these 2 rounds, are you sure that all models for all
> >>>>> residues are identical?  From your data that you posted at
> >>>>> https://mail.gna.org/public/relax-users/2007-06/msg00017.html
> >>>>> (Message-id: <[EMAIL PROTECTED]>), I would guess that this is not
> >>>>> the case and one or two residues actually do change in their model
> >>>>> selections.
> >>>>>
> >>>>>
> >>>>>> ""
> >>>>>>          1 None None
> >>>>>>          2 None None
> >>>>>>          3 None None
> >>>>>>          4 m2 m2
> >>>>>>          5 m2 m2
> >>>>>>          6 m2 m2
> >>>>>>          7 m2 m2
> >>>>>>          8 m2 m2
> >>>>>>          9 m4 m4
> >>>>>>          10 m1 m1
> >>>>>>          11 None None
> >>>>>>          12 m2 m2
> >>>>>>          13 m2 m2
> >>>>>>          14 m1 m1
> >>>>>>          15 m2 m2
> >>>>>>          16 m3 m3
> >>>>>>          17 m3 m3
> >>>>>>          18 None None
> >>>>>> ""
> >>>>>>
> >>>>>> However, I modified the full_analysis.py protocol to print the
> >>>>>> differences in the model selection, within the 'Identical
> >>>>>> model-free model test' section of the 'convergence' definition.
> >>>>>> Here is the beginning of the output (which only contains
> >>>>>> differences between the previous and current rounds):
> >>>>>>
> >>>>>> ""
> >>>>>>          residue 1: prev=None curr=m2
> >>>>>>          residue 2: prev=None curr=m2
> >>>>>>          residue 3: prev=None curr=m2
> >>>>>>          residue 6: prev=m2 curr=m4
> >>>>>>          residue 7: prev=m2 curr=m1
> >>>>>>          residue 9: prev=m4 curr=m2
> >>>>>>          residue 11: prev=None curr=m2
> >>>>>>          residue 12: prev=m2 curr=m3
> >>>>>>          residue 13: prev=m2 curr=m3
> >>>>>>          residue 15: prev=m2 curr=m1
> >>>>>>          residue 16: prev=m3 curr=m2
> >>>>>>          residue 17: prev=m3 curr=m1
> >>>>>>          residue 18: prev=None curr=m3
> >>>>>> ""
> >>>>>
> >>>>> This output is quite strange.  I would need to see the
> >>>>> full_analysis.py script to do more with this.
> >>>>>
> >>>>>
> >>>>>> There should be no data for residues 1-3, 11 and 18 (None),
> >>>>>> however the 'Identical model-free model test' seems as if it
> >>>>>> ignores residues for which 'None' was selected in the curr_model
> >>>>>> call in the following code:
> >>>>>>
> >>>>>> ""
> >>>>>>          # Create a string representation of the model-free models of the previous run.
> >>>>>>          prev_models = ''
> >>>>>>          for i in xrange(len(self.relax.data.res['previous'])):
> >>>>>>              if hasattr(self.relax.data.res['previous'][i], 'model'):
> >>>>>>                  #prev_models = prev_models + self.relax.data.res['previous'][i].model
> >>>>>>                  prev_models = prev_models + ' ' + self.relax.data.res['previous'][i].model
> >>>>>>
> >>>>>>          # Create a string representation of the model-free models of the current run.
> >>>>>>          curr_models = ''
> >>>>>>          for i in xrange(len(self.relax.data.res[run])):
> >>>>>>              if hasattr(self.relax.data.res[run][i], 'model'):
> >>>>>>                  #curr_models = curr_models + self.relax.data.res[run][i].model
> >>>>>>                  curr_models = curr_models + ' ' + self.relax.data.res[run][i].model
> >>>>>> ""
> >>>>>
> >>>>> As residues 1-3, 11 and 18 are deselected, they will not have the
> >>>>> attribute 'model' and hence will not be placed in the prev_models
> >>>>> or curr_models strings (which are then compared).
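
The distinction that turned out to matter later in this thread can be
shown with a short Python 3 sketch.  The objects below are hypothetical
stand-ins for entries of self.relax.data.res[run], and has_real_model is
a made-up helper name.  The point is that hasattr() is true for a
residue whose 'model' attribute exists but is set to None, as happens
when results files with None in the model positions are read back in.

```python
from types import SimpleNamespace


def has_real_model(res):
    """True only if 'model' exists and is neither None nor the string 'None'."""
    model = getattr(res, 'model', None)
    return model is not None and model != 'None'


deselected = SimpleNamespace()              # no 'model' attribute at all
from_results = SimpleNamespace(model=None)  # attribute present, value None
selected = SimpleNamespace(model='m2')      # a normal selected residue

# hasattr() cannot tell the last two cases apart.
print(hasattr(deselected, 'model'),
      hasattr(from_results, 'model'),
      hasattr(selected, 'model'))    # False True True

# The stricter check only accepts the residue with a real model.
print(has_real_model(deselected),
      has_real_model(from_results),
      has_real_model(selected))      # False False True
```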
> >>>>>
> >>>>>
> >>>>>> For what it's worth, I have residues 1,2,3,11 and 18 in the file
> >>>>>> 'unresolved' which is read by the full_analysis.py protocol.  I
> >>>>>> created a separate sequence file (variable = SEQUENCE) that
> >>>>>> contains all residues (those with data and those without), instead
> >>>>>> of using a data file (noe data, in the default full_analysis.py
> >>>>>> file).  However, these residues are not specified in the data (r1,
> >>>>>> r2 and noe) files, as I did not have data for them.  Should I add
> >>>>>> them but place 'None' in the data and error columns?  Could that
> >>>>>> be causing the problems?  Or should I create a bug report for this?
> >>>>>
> >>>>> I'm not sure what you mean by the "file (variable = SEQUENCE)"
> >>>>> statement.  I would need to see the full_analysis.py script to
> >>>>> understand this.  I would assume you mean a file named by the value
> >>>>> of the variable 'SEQUENCE'.  In that case, this should not be a
> >>>>> problem.  As the data is missing from the files containing the
> >>>>> relaxation data, these residues will not be used in the model-free
> >>>>> analysis.  They will be automatically deselected.  There is no need
> >>>>> to add empty data for these spin systems into the relaxation data
> >>>>> files.  As I said before at the top of this message and at
> >>>>> https://mail.gna.org/public/relax-users/2007-07/msg00001.html
> >>>>> (Message-id: <[EMAIL PROTECTED]>), the problem is almost guaranteed
> >>>>> to be a circular loop of equivalent solutions circling around the
> >>>>> universal solution - the solution defined within the universal set
> >>>>> (the union of all global models (diffusion tensor + all model-free
> >>>>> models of all residues)).  If you have hit this circular problem,
> >>>>> then I suggest you stop running relax.  What I would then do is
> >>>>> identify the spin system (or systems) causing the loop, and what
> >>>>> the differences are between all members of the loop.  This you can
> >>>>> do by plotting the data and maybe using the program diff on
> >>>>> uncompressed versions of the results files.  It's likely that the
> >>>>> differences are small and inconsequential.  I hope this helps.
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>> Edward
> >>>>
> >>>>
> >>
> >>
>
>

