Hi,

I've reviewed the patch attached to bug #10022 (https://gna.org/bugs/?10022) and have found an issue. The problem is the use of two dictionaries for the previous and current runs: the ordering of a dictionary is not guaranteed, and hence the comparison may not work. A simple test such as "if self.relax.data.res[run][i].model == None", or its negation, applied after checking for the presence of the 'model' attribute, together with removal of the dictionaries, should eliminate the problem. The code after the comment "# The test." could also be significantly simplified.
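[Editorial note: the per-residue test suggested above can be sketched as follows. The function name `models_converged` and the plain lists `prev_res`/`curr_res` are illustrative stand-ins for relax's `self.relax.data.res['previous']` and `self.relax.data.res[run]` structures, so this is a sketch of the logic, not a drop-in patch.]

```python
def models_converged(prev_res, curr_res):
    """Compare model-free model selections residue by residue.

    Each element may or may not carry a 'model' attribute, and that
    attribute may be None for deselected residues.  A missing attribute
    and a None value are treated identically, and every residue index is
    compared, so nothing is silently skipped.
    """
    if len(prev_res) != len(curr_res):
        return False
    for prev, curr in zip(prev_res, curr_res):
        # getattr with a default of None covers both the missing
        # attribute and the explicit None cases.
        if getattr(prev, 'model', None) != getattr(curr, 'model', None):
            return False
    return True
```

Comparing per residue index, rather than concatenating strings, means deselected residues can no longer shift the alignment of the two runs being compared.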
As for the code in the section "NOTE: the following code has not been extensively tested", this does not directly address the bug itself, and it will have problems with the dictionary ordering. I would prefer that this section not be present in the patch. If someone does have different residues in two different iterations of full_analysis.py, then that error is much more severe and belongs elsewhere, not in the convergence tests.

Cheers,

Edward


On 9/18/07, Douglas Kojetin <[EMAIL PROTECTED]> wrote:
> Hi Edward,
>
> I submitted this as a bug report.  I modified the full_analysis.py
> file after an SVN refresh.  Unless you have a quick way of doing so, I
> will test the cleaned up version (submitted as a patch to the bug
> report) tomorrow.
>
> Doug
>
>
> On Sep 17, 2007, at 4:12 PM, Edward d'Auvergne wrote:
>
> > Hi,
> >
> > In your previous post
> > (https://mail.gna.org/public/relax-users/2007-09/msg00011.html,
> > Message-id: <[EMAIL PROTECTED]>) I think you were spot on with the
> > diagnosis.  The reading of the results files with None in all model
> > positions will create variables called 'model' with the value set
> > to None.  Then the string comparison will fail unless these are
> > skipped as well.  Well done on picking up the exact cause of the
> > failure.  This important change will need to go into relax before a
> > new release with the more advanced 'full_analysis.py' script.
> >
> > If you would like to post the changes, I can add these to the
> > repository.  I'll need to check the change carefully first though,
> > and it may only require two lines in the script to be changed.  The
> > best way would be to attach a patch to a bug report.  If you could
> > create a bug report with a simple summary, that would be
> > appreciated.  Then how you report the changes is up to you.
> > If you change a checked-out copy of the SVN repository and type
> > 'svn diff > patch', you'll get the changes in a patch file which I
> > can then check and commit to the repository with your name attached.
> >
> > Thanks,
> >
> > Edward
> >
> >
> > On 9/17/07, Douglas Kojetin <[EMAIL PROTECTED]> wrote:
> >> As a followup, my changes to full_analysis.py solved my problem.
> >> I will clean up my code and post it within the next day or so.
> >> Would you prefer that I attach the script as an attachment, or
> >> inline in an email, or provide a patch, or change the CVS code
> >> myself?
> >>
> >> Doug
> >>
> >>
> >> On Sep 17, 2007, at 11:48 AM, Edward d'Auvergne wrote:
> >>
> >>> Hi,
> >>>
> >>> The problem is likely to be due to a circular looping around very
> >>> similar solutions close to the universal solution, as I have
> >>> defined in:
> >>>
> >>> d'Auvergne EJ, Gooley PR.  Set theory formulation of the
> >>> model-free problem and the diffusion seeded model-free paradigm.
> >>> Mol Biosyst. 2007 Jul;3(7):483-94.
> >>>
> >>> If you can't get the paper, have a look at Chapter 5 of my PhD
> >>> thesis at http://eprints.infodiv.unimelb.edu.au/archive/00002799/.
> >>> The problem is the interlinked mathematical optimisation and
> >>> statistical model selection, where we are trying to minimise
> >>> different quantities.  For mathematical optimisation this is the
> >>> chi-squared value.  For model selection this is the quantity known
> >>> as the discrepancy.  These cannot be optimised together, as
> >>> mathematical optimisation works in a single fixed-dimension space
> >>> or universe whereas model selection operates across multiple
> >>> spaces with different dimensions.  See the paper for a more
> >>> comprehensive description of the issue.
> >>>
> >>> You should be able to see this if you look at the end of the
> >>> iterations.  If you have 160 iterations, look after iteration 20
> >>> (or maybe even 30 or further).
> >>> Until then, you will not have reached the circular loop.  After
> >>> that point you will be able to exactly quantify this circular
> >>> loop.  You'll be able to determine its periodicity, which residues
> >>> are involved (probably only 1), and whether the diffusion tensor
> >>> changes as model selection changes.
> >>>
> >>> I mentioned all of this already in my post at
> >>> https://mail.gna.org/public/relax-users/2007-07/msg00001.html
> >>> (Message-id: <[EMAIL PROTECTED]>) in response to your original
> >>> post
> >>> (https://mail.gna.org/public/relax-users/2007-06/msg00004.html,
> >>> Message-id: <[EMAIL PROTECTED]>).
> >>>
> >>> I have a few more points about the tests you have done, but to
> >>> work out what is happening with the printouts it would be very
> >>> useful to have your modified 'full_analysis.py' script attached.
> >>>
> >>>
> >>> On 9/17/07, Douglas Kojetin <[EMAIL PROTECTED]> wrote:
> >>>> Hi,
> >>>>
> >>>> I'm unsure if this is a bug in full_analysis.py, in the internal
> >>>> relax code, or user error.  The optimization of the 'sphere'
> >>>> model will not converge, now after 160+ rounds.  The chi-squared
> >>>> test has converged (long, long ago):
> >>>
> >>> See above.
> >>>
> >>>
> >>>> "" from output
> >>>> Chi-squared test:
> >>>>     chi2 (k-1):  100.77647517006251
> >>>>     chi2 (k):    100.77647517006251
> >>>>     The chi-squared value has converged.
> >>>> ""
> >>>>
> >>>> However, the identical model-free models test has not converged:
> >>>>
> >>>> "" from output
> >>>> Identical model-free models test:
> >>>>     The model-free models have not converged.
> >>>>
> >>>> Identical parameter test:
> >>>>     The model-free models haven't converged hence the parameters
> >>>>     haven't converged.
> >>>> ""
> >>>>
> >>>> Something that confuses me is that the output files in the
> >>>> round_??/aic directory suggest that, for example, the round_160
> >>>> and round_161 AIC model selections are equivalent.  Here are the
> >>>> models for the first few residues:
> >>>
> >>> Between these 2 rounds, are you sure that all models for all
> >>> residues are identical?  From your data that you posted at
> >>> https://mail.gna.org/public/relax-users/2007-06/msg00017.html
> >>> (Message-id: <[EMAIL PROTECTED]>), I would guess that this is not
> >>> the case and that one or two residues actually do change in their
> >>> model selections.
> >>>
> >>>> ""
> >>>> 1    None    None
> >>>> 2    None    None
> >>>> 3    None    None
> >>>> 4    m2      m2
> >>>> 5    m2      m2
> >>>> 6    m2      m2
> >>>> 7    m2      m2
> >>>> 8    m2      m2
> >>>> 9    m4      m4
> >>>> 10   m1      m1
> >>>> 11   None    None
> >>>> 12   m2      m2
> >>>> 13   m2      m2
> >>>> 14   m1      m1
> >>>> 15   m2      m2
> >>>> 16   m3      m3
> >>>> 17   m3      m3
> >>>> 18   None    None
> >>>> ""
> >>>>
> >>>> However, I modified the full_analysis.py protocol to print the
> >>>> differences in the model selection, within the 'Identical
> >>>> model-free model test' section of the 'convergence' definition.
> >>>> Here is the beginning of the output (which only contains
> >>>> differences between the previous and current rounds):
> >>>>
> >>>> ""
> >>>> residue 1:   prev=None  curr=m2
> >>>> residue 2:   prev=None  curr=m2
> >>>> residue 3:   prev=None  curr=m2
> >>>> residue 6:   prev=m2    curr=m4
> >>>> residue 7:   prev=m2    curr=m1
> >>>> residue 9:   prev=m4    curr=m2
> >>>> residue 11:  prev=None  curr=m2
> >>>> residue 12:  prev=m2    curr=m3
> >>>> residue 13:  prev=m2    curr=m3
> >>>> residue 15:  prev=m2    curr=m1
> >>>> residue 16:  prev=m3    curr=m2
> >>>> residue 17:  prev=m3    curr=m1
> >>>> residue 18:  prev=None  curr=m3
> >>>> ""
> >>>
> >>> This output is quite strange.  I would need to see the
> >>> full_analysis.py script to do more with this.
> >>>
> >>>
> >>>> There should be no data for residues 1-3, 11 and 18 (None);
> >>>> however, the 'Identical model-free model test' seems as if it
> >>>> ignores residues for which 'None' was selected in the following
> >>>> code:
> >>>>
> >>>> ""
> >>>> # Create a string representation of the model-free models of
> >>>> # the previous run.
> >>>> prev_models = ''
> >>>> for i in xrange(len(self.relax.data.res['previous'])):
> >>>>     if hasattr(self.relax.data.res['previous'][i], 'model'):
> >>>>         #prev_models = prev_models + self.relax.data.res['previous'][i].model
> >>>>         prev_models = prev_models + ' ' + self.relax.data.res['previous'][i].model
> >>>>
> >>>> # Create a string representation of the model-free models of
> >>>> # the current run.
> >>>> curr_models = ''
> >>>> for i in xrange(len(self.relax.data.res[run])):
> >>>>     if hasattr(self.relax.data.res[run][i], 'model'):
> >>>>         #curr_models = curr_models + self.relax.data.res[run][i].model
> >>>>         curr_models = curr_models + ' ' + self.relax.data.res[run][i].model
> >>>> ""
> >>>
> >>> As residues 1-3, 11 and 18 are deselected, they will not have the
> >>> attribute 'model' and hence will not be placed in the prev_models
> >>> or curr_models strings (which are then compared).
> >>>
> >>>
> >>>> For what it's worth, I have residues 1, 2, 3, 11 and 18 in the
> >>>> file 'unresolved', which is read by the full_analysis.py
> >>>> protocol.  I created a separate sequence file (variable =
> >>>> SEQUENCE) that contains all residues (those with data and those
> >>>> without), instead of using a data file (noe data, in the default
> >>>> full_analysis.py file).  However, these residues are not
> >>>> specified in the data (r1, r2 and noe) files, as I did not have
> >>>> data for them.  Should I add them but place 'None' in the data
> >>>> and error columns?
> >>>> Could that be causing the problems?  Or should I create a bug
> >>>> report for this?
> >>>
> >>> I'm not sure what you mean by the '(variable = SEQUENCE)'
> >>> statement.  I would need to see the full_analysis.py script to
> >>> understand this.  I would assume a file named by the value of the
> >>> variable 'SEQUENCE'.  In which case, this should not be a problem.
> >>> As the data is missing from the files containing the relaxation
> >>> data, these residues will not be used in the model-free analysis.
> >>> They will be automatically deselected.  There is no need to add
> >>> empty data for these spin systems into the relaxation data files.
> >>> As I said before at the top of this message and at
> >>> https://mail.gna.org/public/relax-users/2007-07/msg00001.html
> >>> (Message-id: <[EMAIL PROTECTED]>), the problem is almost
> >>> guaranteed to be a circular loop of equivalent solutions circling
> >>> around the universal solution - the solution defined within the
> >>> universal set (the union of all global models (diffusion tensor +
> >>> all model-free models of all residues)).  If you have hit this
> >>> circular problem, then I suggest you stop running relax.  What I
> >>> would then do is identify the spin system (or systems) causing the
> >>> loop, and what the differences are between all members of the
> >>> loop.  This you can do by plotting the data and maybe using the
> >>> program diff on uncompressed versions of the results files.  It's
> >>> likely that the differences are small and inconsequential.  I hope
> >>> this helps.
> >>>
> >>> Regards,
> >>>
> >>> Edward

_______________________________________________
relax (http://nmr-relax.com)

This is the relax-users mailing list
[email protected]

To unsubscribe from this list, get a password reminder, or change your
subscription options, visit the list information page at
https://mail.gna.org/listinfo/relax-users
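[Editorial note: the string-concatenation test quoted in the thread can report convergence even when the per-residue selections differ, because residues lacking a 'model' attribute are skipped and the remaining models shift position in the string. A minimal sketch with hypothetical data, using plain objects in place of relax's data structures:]

```python
class Res(object):
    """Stand-in for a relax residue container; 'model' may be absent."""
    pass

def concat_models(res_list):
    # Mirrors the quoted full_analysis.py logic: skip residues lacking
    # a 'model' attribute and concatenate the rest into one string.
    models = ''
    for res in res_list:
        if hasattr(res, 'model'):
            models = models + ' ' + res.model
    return models

def make(models):
    # Build a residue list from a sequence of model names; None means
    # the residue is deselected (no 'model' attribute is set).
    out = []
    for name in models:
        res = Res()
        if name is not None:
            res.model = name
        out.append(res)
    return out

# Two rounds with different per-residue selections...
prev = make(['m1', 'm2', None])   # residue 3 deselected
curr = make(['m1', None, 'm2'])   # residue 2 deselected
# ...yet both concatenate to ' m1 m2', so a string comparison would
# wrongly report that the model selections have converged.
print(concat_models(prev) == concat_models(curr))  # True
```

This is why a per-residue comparison, which also checks that the same residues are deselected in both rounds, is the safer test.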

