Hi Edward,

I uploaded another patch (file #2965) with the simpler solution you mentioned. I guess I must have been extremely lucky for the dictionary comparison to work on the first try, unless something else was happening that resulted in a sorted key list.
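For what it's worth, here is a minimal sketch of the difference, with plain dictionaries standing in for the real relax data structures (so the names and values below are only illustrative): a comparison that joins the values in iteration order only works if both dictionaries happen to iterate in the same order, whereas dictionary equality or a sorted key list does not depend on ordering at all.

""
# Hypothetical stand-ins for the per-residue model selections of two rounds;
# the real script keeps these on self.relax.data.res, not in plain dicts.
prev_models = {1: None, 4: 'm2', 5: 'm2', 9: 'm4', 10: 'm1'}
curr_models = {10: 'm1', 9: 'm4', 5: 'm2', 4: 'm2', 1: None}

# Fragile: joining the values in iteration order is only meaningful if both
# dictionaries iterate in the same order, which the language does not
# guarantee (it is an implementation detail).
fragile_equal = ''.join([str(prev_models[k]) for k in prev_models]) == \
                ''.join([str(curr_models[k]) for k in curr_models])

# Robust: dictionary equality compares key/value pairs regardless of order,
# and sorting the keys gives a deterministic order for any printout.
robust_equal = (prev_models == curr_models)
for key in sorted(prev_models.keys()):
    print('residue %s: prev=%s curr=%s' % (key, prev_models[key], curr_models.get(key)))
""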
Doug

On Sep 18, 2007, at 4:41 AM, Edward d'Auvergne wrote:

> Hi,
>
> I've reviewed the patch attached to bug #10022 (https://gna.org/bugs/?10022), and have found an issue. The problem is with the use of two dictionaries for the previous and current run. The issue is that the order in a dictionary is not guaranteed, and hence the comparison may not work. Maybe a simple test such as "if self.relax.data.res[run][i].model == None", or its negative, after testing for the presence of the 'model' attribute, together with removal of the dictionary, would remove the problem. Significant simplifications to the code after the comment "# The test." could also be made.
>
> As for the code in the section "NOTE: the following code has not been extensively tested", this does not directly address the bug itself and it will have problems with the dictionary ordering. I would prefer that this section not be present in the patch. If someone does have different residues in two different iterations of full_analysis.py then this error is much more severe and belongs elsewhere other than in the convergence tests.
>
> Cheers,
>
> Edward
>
> On 9/18/07, Douglas Kojetin <[EMAIL PROTECTED]> wrote:
>> Hi Edward,
>>
>> I submitted this as a bug report. I modified the full_analysis.py file after an SVN refresh. Unless you have a quick way of doing so, I will test the cleaned-up version (submitted as a patch to the bug report) tomorrow.
>>
>> Doug
>>
>> On Sep 17, 2007, at 4:12 PM, Edward d'Auvergne wrote:
>>
>>> Hi,
>>>
>>> In your previous post (https://mail.gna.org/public/relax-users/2007-09/msg00011.html, Message-id: <[EMAIL PROTECTED]>) I think you were spot on with the diagnosis. The reading of the results files with None in all model positions will create variables called 'model' with the value set to None. Then the string comparison will fail unless these are skipped as well. Well done on picking up the exact cause of the failure. This important change will need to go into relax before a new release with the more advanced 'full_analysis.py' script.
>>>
>>> If you would like to post the changes, I can add these to the repository. I'll need to check the change carefully first though, and it may only require two lines in the script to be changed. The best way would be to attach a patch to a bug report. If you could create a bug report with a simple summary, that would be appreciated. Then how you report the changes is up to you. If you change a checked-out copy of the SVN repository and type 'svn diff > patch', you'll get the changes in a patch file which I can then check and commit to the repository with your name attached.
>>>
>>> Thanks,
>>>
>>> Edward
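As a toy illustration of the failure mode Edward describes above (this is not the original relax code, just plain Python): concatenating a None 'model' onto a string raises a TypeError, so residues whose model is None have to be skipped in the same way as residues with no 'model' attribute at all.

""
# Toy reconstruction of the old string-building loop; the model list below is
# made up for illustration.
models = ''
for model in ['m2', 'm4', None, 'm1']:
    try:
        models = models + ' ' + model
    except TypeError:
        # This is what happens for every None entry unless it is skipped.
        print('skipping a residue with model == None')
print(models.strip())    # prints: m2 m4 m1
""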
>>> On 9/17/07, Douglas Kojetin <[EMAIL PROTECTED]> wrote:
>>>> As a followup, my changes to full_analysis.py solved my problem. I will clean up my code and post it within the next day or so. Would you prefer that I attach the script as an attachment, or inline in an email, or provide a patch, or change the CVS code myself?
>>>>
>>>> Doug
>>>>
>>>> On Sep 17, 2007, at 11:48 AM, Edward d'Auvergne wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> The problem is likely to be due to a circular looping around very similar solutions close to the universal solution, as I have defined in:
>>>>>
>>>>> d'Auvergne EJ, Gooley PR. Set theory formulation of the model-free problem and the diffusion seeded model-free paradigm. Mol Biosyst. 2007 Jul;3(7):483-94.
>>>>>
>>>>> If you can't get the paper, have a look at Chapter 5 of my PhD thesis at http://eprints.infodiv.unimelb.edu.au/archive/00002799/. The problem is the interlinked mathematical optimisation and statistical model selection, where we are trying to minimise different quantities. For mathematical optimisation this is the chi-squared value. For model selection this is the quantity known as the discrepancy. These cannot be optimised together, as mathematical optimisation works in a single fixed-dimension space or universe, whereas model selection operates across multiple spaces with different dimensions. See the paper for a more comprehensive description of the issue.
>>>>>
>>>>> You should be able to see this if you look at the end of the iterations. If you have 160 iterations, look after iteration 20 (or maybe even 30 or further). Until then, you will not have reached the circular loop. After that point you will be able to exactly quantify this circular loop. You'll be able to determine its periodicity, which residues are involved (probably only 1), and whether the diffusion tensor changes as model selection changes.
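As a rough way of quantifying the loop once it has been entered, something like the sketch below would give the periodicity and the residues involved. It is only a sketch: it assumes the per-round model selections have already been extracted from the results files into a simple round-number-to-tuple mapping (the extraction is not shown), and the function names are just illustrative.

""
# 'selections' maps round number -> tuple of selected models, one entry per
# residue, with None for deselected residues.
def find_periodicity(selections):
    rounds = sorted(selections.keys())
    last = rounds[-1]
    # Search backwards for the most recent earlier round with an identical
    # model selection; the gap between the two rounds is the period of the loop.
    for r in reversed(rounds[:-1]):
        if selections[r] == selections[last]:
            return last - r
    return None    # no loop detected yet

def changing_residues(selections):
    rounds = sorted(selections.keys())
    changed = set()
    for prev, curr in zip(rounds[:-1], rounds[1:]):
        for i, (a, b) in enumerate(zip(selections[prev], selections[curr])):
            if a != b:
                changed.add(i + 1)    # 1-based residue numbering
    return sorted(changed)
""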
>>>>> I mentioned all of this already in my post at https://mail.gna.org/public/relax-users/2007-07/msg00001.html (Message-id: <[EMAIL PROTECTED]>) in response to your original post (https://mail.gna.org/public/relax-users/2007-06/msg00004.html, Message-id: <[EMAIL PROTECTED]>).
>>>>>
>>>>> I have a few more points about the tests you have done, but to work out what is happening with the printouts it would be very useful to have your modified 'full_analysis.py' script attached.
>>>>>
>>>>> On 9/17/07, Douglas Kojetin <[EMAIL PROTECTED]> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm unsure if this is a bug in full_analysis.py, in the internal relax code, or user error. The optimization of the 'sphere' model will not converge, now after 160+ rounds. The chi-squared test has converged (long, long ago):
>>>>>
>>>>> See above.
>>>>>
>>>>>> "" from output
>>>>>> Chi-squared test:
>>>>>>     chi2 (k-1): 100.77647517006251
>>>>>>     chi2 (k):   100.77647517006251
>>>>>> The chi-squared value has converged.
>>>>>> ""
>>>>>>
>>>>>> However, the identical model-free models test has not converged:
>>>>>>
>>>>>> "" from output
>>>>>> Identical model-free models test:
>>>>>>     The model-free models have not converged.
>>>>>>
>>>>>> Identical parameter test:
>>>>>>     The model-free models haven't converged hence the parameters haven't converged.
>>>>>> ""
>>>>>>
>>>>>> Something that confuses me is that the output files in the round_??/aic directory suggest that, for example, the round_160 and round_161 AIC model selections are equivalent. Here are the models for the first few residues:
>>>>>
>>>>> Between these 2 rounds, are you sure that all models for all residues are identical? From your data that you posted at https://mail.gna.org/public/relax-users/2007-06/msg00017.html (Message-id: <[EMAIL PROTECTED]>), I would guess that this is not the case and one or two residues actually do change in their model selections.
>>>>>
>>>>>> ""
>>>>>> 1   None  None
>>>>>> 2   None  None
>>>>>> 3   None  None
>>>>>> 4   m2    m2
>>>>>> 5   m2    m2
>>>>>> 6   m2    m2
>>>>>> 7   m2    m2
>>>>>> 8   m2    m2
>>>>>> 9   m4    m4
>>>>>> 10  m1    m1
>>>>>> 11  None  None
>>>>>> 12  m2    m2
>>>>>> 13  m2    m2
>>>>>> 14  m1    m1
>>>>>> 15  m2    m2
>>>>>> 16  m3    m3
>>>>>> 17  m3    m3
>>>>>> 18  None  None
>>>>>> ""
>>>>>>
>>>>>> However, I modified the full_analysis.py protocol to print the differences in the model selection, within the 'Identical model-free model test' section of the 'convergence' definition. Here is the beginning of the output (which only contains differences between the previous and current rounds):
>>>>>>
>>>>>> ""
>>>>>> residue 1: prev=None curr=m2
>>>>>> residue 2: prev=None curr=m2
>>>>>> residue 3: prev=None curr=m2
>>>>>> residue 6: prev=m2 curr=m4
>>>>>> residue 7: prev=m2 curr=m1
>>>>>> residue 9: prev=m4 curr=m2
>>>>>> residue 11: prev=None curr=m2
>>>>>> residue 12: prev=m2 curr=m3
>>>>>> residue 13: prev=m2 curr=m3
>>>>>> residue 15: prev=m2 curr=m1
>>>>>> residue 16: prev=m3 curr=m2
>>>>>> residue 17: prev=m3 curr=m1
>>>>>> residue 18: prev=None curr=m3
>>>>>> ""
>>>>>
>>>>> This output is quite strange. I would need to see the full_analysis.py script to do more with this.
>>>>>
>>>>>> There should be no data for residues 1-3, 11 and 18 (None); however, the 'Identical model-free model test' seems as if it ignores residues for which 'None' was selected in the curr_model call in the following code:
>>>>>>
>>>>>> ""
>>>>>> # Create a string representation of the model-free models of the previous run.
>>>>>> prev_models = ''
>>>>>> for i in xrange(len(self.relax.data.res['previous'])):
>>>>>>     if hasattr(self.relax.data.res['previous'][i], 'model'):
>>>>>>         #prev_models = prev_models + self.relax.data.res['previous'][i].model
>>>>>>         prev_models = prev_models + ' ' + self.relax.data.res['previous'][i].model
>>>>>>
>>>>>> # Create a string representation of the model-free models of the current run.
>>>>>> curr_models = ''
>>>>>> for i in xrange(len(self.relax.data.res[run])):
>>>>>>     if hasattr(self.relax.data.res[run][i], 'model'):
>>>>>>         #curr_models = curr_models + self.relax.data.res[run][i].model
>>>>>>         curr_models = curr_models + ' ' + self.relax.data.res[run][i].model
>>>>>> ""
>>>>>
>>>>> As residues 1-3, 11 and 18 are deselected, they will not have the attribute 'model' and hence will not be placed in the prev_models or curr_models strings (which are then compared).
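Going back to Edward's review at the top of this thread, the simpler per-residue test he suggests could look roughly like the sketch below. It is only a sketch: the Residue class stands in for the entries of self.relax.data.res[run], and the real script would pull the two runs from the relax data store rather than take two lists.

""
class Residue(object):
    """Minimal stand-in for an entry of self.relax.data.res[run]."""
    def __init__(self, model=None):
        if model is not None:
            self.model = model    # deselected residues simply lack 'model'

def models_converged(previous, current):
    # A residue counts as deselected if it has no 'model' attribute or if the
    # attribute is None; such residues are treated as equal between rounds.
    if len(previous) != len(current):
        return False
    for prev, curr in zip(previous, current):
        if getattr(prev, 'model', None) != getattr(curr, 'model', None):
            return False
    return True

# Example: residue 2 switches from m2 to m4, so the rounds have not converged.
previous_run = [Residue(), Residue('m2'), Residue('m1')]
current_run = [Residue(), Residue('m4'), Residue('m1')]
print(models_converged(previous_run, current_run))    # prints: False
""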
>>>>>> For what it's worth, I have residues 1, 2, 3, 11 and 18 in the file 'unresolved', which is read by the full_analysis.py protocol. I created a separate sequence file (variable = SEQUENCE) that contains all residues (those with data and those without), instead of using a data file (noe data, in the default full_analysis.py file). However, these residues are not specified in the data (r1, r2 and noe) files, as I did not have data for them. Should I add them but place 'None' in the data and error columns? Could that be causing the problems? Or should I create a bug report for this?
>>>>>
>>>>> I'm not sure what you mean by the '(variable = SEQUENCE)' statement. I would need to see the full_analysis.py script to understand this. I would assume it is a file named by the value of the variable 'SEQUENCE'. In which case, this should not be a problem. As the data is missing from the files containing the relaxation data, these residues will not be used in the model-free analysis. They will be automatically deselected. There is no need to add empty data for these spin systems to the relaxation data files.
>>>>>
>>>>> As I said before, at the top of this message and at https://mail.gna.org/public/relax-users/2007-07/msg00001.html (Message-id: <[EMAIL PROTECTED]>), the problem is almost guaranteed to be a circular loop of equivalent solutions circling around the universal solution - the solution defined within the universal set (the union of all global models (diffusion tensor + all model-free models of all residues)). If you have hit this circular problem, then I suggest you stop running relax. What I would then do is identify the spin system (or systems) causing the loop, and what the differences are between all members of the loop. This you can do by plotting the data and maybe by using the program diff on uncompressed versions of the results files. It's likely that the differences are small and inconsequential. I hope this helps.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Edward
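Following up on the diff suggestion, a small sketch of that comparison in Python (the paths are only examples of the round directories mentioned above, and the results files would need to be uncompressed first):

""
import difflib

def diff_results(old_path, new_path):
    # Read both uncompressed results files and print only the lines that
    # differ, with a little context, in unified diff format.
    old_lines = open(old_path).readlines()
    new_lines = open(new_path).readlines()
    for line in difflib.unified_diff(old_lines, new_lines,
                                     fromfile=old_path, tofile=new_path):
        print(line.rstrip())

# Example paths only; point these at two successive rounds of the analysis.
diff_results('round_160/aic/results', 'round_161/aic/results')
""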

