Hi, Does this still fail in the 1.3 line? I should have fixed this one quite a while ago. I think it's about time I released relax-1.3.5!
Cheers, Edward On 16 March 2010 15:44, Sébastien Morin <[email protected]> wrote: > Hi Edward, > > I just tested the Gentoo machines again using relax-1.3.4 and minf-1.0.2. > > Three 32 bit machines I tested completed the test-suite without any > error. Two other machines I previously had in my possession are no > longer available... > > However, one 64 bit machine failed for one test, always with the same > values: > > ==================== > FAIL: Constrained Newton opt, GMW Hessian mod, More and Thuente line > search {S2=0.970, te=2048, Rex=0.149} > > ... > > relax> minimise(*args=('newton',), func_tol=1e-25, > max_iterations=10000000, constraints=True, scaling=True, verbosity=1) > Simulation 1 > Simulation 2 > Simulation 3 > > relax> monte_carlo.error_analysis(prune=0.0) > Traceback (most recent call last): > File "/home/semor/relax-1.3.4/test_suite/system_tests/model_free.py", > line 610, in test_opt_constr_newton_gmw_mt_S2_0_970_te_2048_Rex_0_149 > self.value_test(spin, select, s2, te, rex, chi2, iter, f_count, > g_count, h_count, warning) > File "/home/semor/relax-1.3.4/test_suite/system_tests/model_free.py", > line 1110, in value_test > self.assertEqual(spin.f_count, f_count, msg=mesg) > AssertionError: Optimisation failure. > > System: Linux > Release: 2.6.20-gentoo-r7 > Version: #1 SMP Sat Apr 28 23:31:52 Local time zone must be set--see zic > Win32 version: > Distribution: gentoo 1.12.13 > Architecture: 64bit ELF > Machine: x86_64 > Processor: Intel(R) Xeon(R) CPU 5160 @ 3.00GHz > Python version: 2.6.4 > numpy version: 1.3.0 > > > s2: 0.9699999999999994 > te: 2048.0000000000446 > rex: 0.14900000000001615 > chi2: 8.3312601381368332e-28 > iter: 22 > f_count: 91 > g_count: 91 > h_count: 22 > warning: None > ==================== > > Regards, > > > Séb :) > > > > On 10-02-21 9:00 AM, Edward d'Auvergne wrote: >> Is it different for the different machines, or is it different each >> time on the same machine? 
If you give a range of numbers for the >> optimisation results, these tests could be relaxed a little. >> >> Cheers, >> >> Edward >> >> >> On 21 February 2010 14:41, Sébastien Morin<[email protected]> >> wrote: >> >>> Hi Ed, >>> >>> I agree with you that this is not an important issue given the small >>> variations observed... >>> >>> I was just still a bit annoyed by this happening on our Gentoo systems... >>> >>> But maybe this is just because of Gentoo itself, as in Gentoo almost >>> everything is compiled locally, so every system is different because of all >>> the variables that can be changed that affect compilation... >>> >>> Ok, let's forget all this ! >>> >>> Regards, >>> >>> >>> Séb >>> >>> >>> On 10-02-21 8:32 AM, Edward d'Auvergne wrote: >>> >>>> Hi, >>>> >>>> The code is not parallelised as most optimisation algorithms are not >>>> amenable to parallelisation. There's a lot of research in that field, >>>> but the code here is not along these lines. Do you still see this >>>> problem? Maybe it is a bug in this specific version of the GCC >>>> compiler which created the python executable? Does it occur on >>>> machines with a different Gentoo versions installed? Can you >>>> reproduce the error in a virtual machine? This is a fixed code path >>>> and cannot in any way be different upon different runs of the test >>>> suite. It doesn't change on all the Mandriva installs I have, all the >>>> Macs it has been tested on, or even on the Windows virtual image I use >>>> to build and test relax on Windows. I've even tested it on Solaris >>>> without problems! In any case, this bug is definitely machine >>>> specific and not related to relax itself. Sorry, I don't know what >>>> else I can do to try to track this down. Maybe your CPUs are doing >>>> some strange frequency scaling depending on load, and that is causing >>>> this bizarre behaviour? 
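Edward's suggestion above of giving a range of numbers so the tests can be relaxed could be sketched as follows. This is a hypothetical helper, not relax's actual test-suite API; the observed f_count values are the ones quoted later in this thread for the constrained BFGS backtracking test.

```python
# Hypothetical range-based assertion, sketching the "range of numbers"
# idea from the thread. Not part of relax's real test suite.

def assert_in_range(value, low, high, name="value"):
    """Fail unless low <= value <= high, mimicking a relaxed assertEqual."""
    if not (low <= value <= high):
        raise AssertionError(
            "Optimisation failure: %s = %s not in [%s, %s]"
            % (name, value, low, high))

# f_count values observed on three different machines (from the thread).
observed = [743, 694, 761]

# Accept anything within the machine-to-machine spread, plus a margin.
low, high = min(observed) - 10, max(observed) + 10
assert_in_range(743, low, high, name="f_count")  # passes silently
```

A test relaxed this way would still catch gross optimisation failures (e.g. an f_count of 386) while tolerating the small machine-to-machine variation described below.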
In any case, this is not an issue for relax >>>> execution and only affects the precision of optimisation in a small >>>> way. >>>> >>>> Regards, >>>> >>>> Edward >>>> >>>> >>>> >>>> On 21 February 2010 05:34, Sébastien Morin<[email protected]> >>>> wrote: >>>> >>>> >>>>> Hi Ed, >>>>> >>>>> It has been a long time since we discussed this... >>>>> >>>>> However, talking with Olivier last week, we discussed one >>>>> possibility >>>>> to explain this issue. Is the code in question in some way parallelized, >>>>> i.e. are there multiple processes running at the same time with their >>>>> results being combined subsequently? If yes, there could be conditions >>>>> in >>>>> which the problem could arise either because of variations in allocated >>>>> memory or CPU that would change the timing between the different >>>>> processes, >>>>> hence affecting the final result... >>>>> >>>>> Does that make sense? >>>>> >>>>> Olivier, is this what you explained to me last week? >>>>> >>>>> >>>>> Sébastien >>>>> >>>>> >>>>> On 09-09-14 3:30 AM, Edward d'Auvergne wrote: >>>>> >>>>> >>>>>> Hi, >>>>>> >>>>>> I've been trying to work out what is happening, but it is a complete >>>>>> mystery to me. The algorithms are fixed in stone - I coded them >>>>>> myself and you can see it in the minfx code. They are standard >>>>>> optimisation algorithms that obey fixed rules. On the same machine it >>>>>> must, without question, give the same result every time! If it >>>>>> doesn't, something is wrong with the machine, either hardware or >>>>>> software. Would it be possible to install an earlier python and numpy >>>>>> version (maybe 2.5 and 1.2.1 respectively) to see if that makes a >>>>>> difference? Or maybe it is the Linux kernel doing some strange things >>>>>> with the CPU - maybe switching between power profiles causing the CPU >>>>>> floating point math precision to change? 
Are you 100% sure that all >>>>>> computers give variable results (between each run), and not that they >>>>>> just give a different fixed result each time? Maybe there is a >>>>>> non-fatal kernel bug not triggered by Olivier's hardware? >>>>>> >>>>>> Regards, >>>>>> >>>>>> Edward >>>>>> >>>>>> >>>>>> P.S. A note to others reading this - this problem is not serious for >>>>>> relax's optimisation! >>>>>> >>>>>> >>>>>> 2009/9/4 Sébastien Morin<[email protected]>: >>>>>> >>>>>> >>>>>> >>>>>>> Hi Ed, >>>>>>> >>>>>>> (I added Olivier Fisette in CC as he is quite computer knowledgeable >>>>>>> and >>>>>>> could help us rationalize this issue...) >>>>>>> >>>>>>> This strange behavior was observed for my laptop and the two other >>>>>>> computers in the lab with the failures in the system tests (i.e. for >>>>>>> the >>>>>>> three computers of the bug report). >>>>>>> >>>>>>> I performed some of the different tests proposed on the following page: >>>>>>> -> >>>>>>> http://www.gentoo.org/doc/en/articles/hardware-stability-p1.xml >>>>>>> (tested CPU with infinite rebuild of kernel using gcc for 4 >>>>>>> hours) >>>>>>> (tested CPU with cpuburn-1.4 for XXXX hours) >>>>>>> (tested RAM with memtester-4.0.7 for > 6 hours) >>>>>>> to check the CPU and RAM, but did not find anything... Of course, these >>>>>>> tests may not have uncovered potential problems in my CPU and RAM, but >>>>>>> chances are they are fine. Moreover, with the problem being observed on >>>>>>> three different computers, it would be surprising for hardware >>>>>>> failures >>>>>>> to occur in all three machines... >>>>>>> >>>>>>> The three systems run Gentoo Linux with kernel-2.6.30, numpy-1.3.0 and >>>>>>> python-2.6.2. However, the fourth computer to which I have access (for >>>>>>> Olivier: this computer is 'hibou'), and which passes the system tests >>>>>>> properly, also runs Gentoo Linux with kernel 2.6.30, numpy-1.3.0 and >>>>>>> python-2.6.2... 
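Edward's claim that a fixed code path must give identical results on every run on the same machine can be probed directly. The sketch below is an assumption of this thread's debugging approach, not an actual relax or minfx tool: it repeats a fixed chain of floating-point operations and compares the results bit for bit via `float.hex()`.

```python
# Minimal floating-point determinism probe: repeat a fixed computation and
# check that every run gives a bit-identical result. Any variation would
# point at the machine (hardware, kernel, libraries), not the algorithm.
import numpy as np

def probe(n=200, repeats=5):
    data = np.linspace(0.0, 1.0, n)          # fixed, deterministic input
    results = set()
    for _ in range(repeats):
        a = np.outer(data, data)             # same matrix every time
        value = float(np.sum(np.dot(a, a)))  # fixed chain of FP operations
        results.add(value.hex())             # compare bit patterns, not ==
    return results

print(len(probe()))  # 1 means all repeats were bit-identical
```

Running such a probe repeatedly on the suspect Gentoo machines would separate a machine-level problem from anything in relax or minfx.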
>>>>>>> >>>>>>> A potential option could be that some kernel configuration is causing >>>>>>> these problems... >>>>>>> >>>>>>> Another option would be that, although the algorithms are supposedly >>>>>>> fixed, they are not... >>>>>>> >>>>>>> I could check if the calculations diverge always at the same step and, >>>>>>> if so, try to see what function is problematic... >>>>>>> >>>>>>> Other ideas ? >>>>>>> >>>>>>> Do you know any other minimisation library with which I could test to >>>>>>> see if these computers indeed give rise to changing results or if this >>>>>>> is limited to relax (and minfx) ? >>>>>>> >>>>>>> Regards, >>>>>>> >>>>>>> >>>>>>> Séb :) >>>>>>> >>>>>>> >>>>>>> >>>>>>> Edward d'Auvergne wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> This is very strange, very strange indeed! I've never seen anything >>>>>>>> quite like this. Is it only your laptop that is giving this variable >>>>>>>> result? I'm pretty sure that it's not related to a random seed >>>>>>>> because the optimisation at no point uses random numbers - it is 100% >>>>>>>> fixed, pre-determined, etc. and should never, ever vary (well on >>>>>>>> different machines it will change, but never on the same machine). >>>>>>>> What is the operating system on the laptop? Can you run a RAM >>>>>>>> checking program or anything else to diagnose hardware failures? >>>>>>>> Maybe the CPU is overheating? Apart from hardware problems, since you >>>>>>>> never recompile Python or numpy between these tests I cannot think of >>>>>>>> anything else that could possibly cause this. >>>>>>>> >>>>>>>> Cheers, >>>>>>>> >>>>>>>> Edward >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> 2009/9/3 Sébastien Morin<[email protected]>: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> Hi Ed, >>>>>>>>> >>>>>>>>> I've just tried what you proposed and observed something quite >>>>>>>>> strange... 
>>>>>>>>> >>>>>>>>> Here are the results: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> ./relax scripts/optimisation_testing.py> /dev/null >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> (stats from my laptop, different trials, see below) >>>>>>>>> iter 161 147 151 >>>>>>>>> f_count 765 620 591 >>>>>>>>> g_count 168 152 158 >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> ./relax -s >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> (stats from my laptop, different trials, see below) >>>>>>>>> iter 146 159 160 159 >>>>>>>>> f_count 708 721 649 673 >>>>>>>>> g_count 152 166 167 166 >>>>>>>>> >>>>>>>>> >>>>>>>>> Problem 1: >>>>>>>>> The results should be the same in both situations, right ? >>>>>>>>> >>>>>>>>> Problem 2: >>>>>>>>> The results should not vary when the test is done multiple times, >>>>>>>>> right >>>>>>>>> ? >>>>>>>>> >>>>>>>>> >>>>>>>>> I have tested different things to find out why the tests give rise to >>>>>>>>> different results as a function of time... >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> ./relax scripts/optimisation_testing.py> /dev/null >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> If you modify the file "test_suite/system_tests/__init__.py", then >>>>>>>>> the result will be different. By modifying, I mean just comment a few >>>>>>>>> lines in the run() function. (I usually do that when I want to speed >>>>>>>>> up >>>>>>>>> the process of testing a specific issue.) Maybe this behavior is >>>>>>>>> related >>>>>>>>> to random seed based on the code files... >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> ./relax -s >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> This one varies as a function of time without any change. Just >>>>>>>>> doing >>>>>>>>> the test several times in a row will have it varying... Maybe this >>>>>>>>> behavior is related to random seed based on the date and time... >>>>>>>>> >>>>>>>>> >>>>>>>>> Any idea ? 
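One hypothetical way the behaviour Sébastien describes could arise, with results changing when lines are commented out of `__init__.py`, is mutable state shared between tests, so a counter or cache from one test leaks into the next. The toy below is purely illustrative of that general failure mode and does not claim this is what relax does (Edward states elsewhere in the thread that the optimisation code is fixed and uses no random numbers).

```python
# Toy illustration of shared mutable state making results depend on which
# "tests" ran earlier in the same process. Purely hypothetical: not taken
# from relax or minfx.
state = {"calls": 0}

def optimise():
    # Pretend the iteration count accidentally depends on prior calls.
    state["calls"] += 1
    return 100 + state["calls"]

print(optimise())  # 101 in a fresh interpreter
print(optimise())  # 102 once another "test" has already called optimise()
```

If this were the cause, the counts would shift whenever the set of tests run before the failing one changed, which matches the `__init__.py` observation but not the run-to-run variation of `./relax -s`.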
>>>>>>>>> >>>>>>>>> If you want, Ed, I could create you an account on one of these >>>>>>>>> strange-behaving computers... >>>>>>>>> >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> >>>>>>>>> >>>>>>>>> Séb >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Edward d'Auvergne wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I've now written a script so that you can fix this. Try running: >>>>>>>>>> >>>>>>>>>> ./relax scripts/optimisation_testing.py> /dev/null >>>>>>>>>> >>>>>>>>>> This will give you all the info you need, formatted ready for >>>>>>>>>> copying >>>>>>>>>> and pasting into the correct file. This is currently only >>>>>>>>>> 'test_suite/system_tests/model_free.py'. Just paste the >>>>>>>>>> pre-formatted >>>>>>>>>> python comment into the correct test, and add the different values >>>>>>>>>> to >>>>>>>>>> the list of values checked. >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> >>>>>>>>>> Edward >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> 2009/9/3 Sébastien Morin<[email protected]>: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Hi Ed, >>>>>>>>>>> >>>>>>>>>>> I just checked my original mail >>>>>>>>>>> (https://mail.gna.org/public/relax-devel/2009-05/msg00003.html). 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> For the failure "FAIL: Constrained BFGS opt, backtracking line >>>>>>>>>>> search >>>>>>>>>>> {S2=0.970, te=2048, Rex=0.149}", the counts were initially as >>>>>>>>>>> follows: >>>>>>>>>>> f_count 386 >>>>>>>>>>> g_count 386 >>>>>>>>>>> and are now: >>>>>>>>>>> f_count 743 694 761 >>>>>>>>>>> g_count 168 172 164 >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> For the failure "FAIL: Constrained BFGS opt, More and Thuente line >>>>>>>>>>> search {S2=0.970, te=2048, Rex=0.149}", the counts were initially >>>>>>>>>>> as >>>>>>>>>>> follows: >>>>>>>>>>> f_count 722 >>>>>>>>>>> g_count 164 >>>>>>>>>>> and are now: >>>>>>>>>>> f_count 375 322 385 >>>>>>>>>>> g_count 375 322 385 >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> The different values given for the "just-measured" parameters >>>>>>>>>>> account >>>>>>>>>>> for the 3 different computers I have access to that give rise to >>>>>>>>>>> these >>>>>>>>>>> two annoying failures... >>>>>>>>>>> >>>>>>>>>>> I wonder if the names of the tests in the original mail were not >>>>>>>>>>> mixed up, >>>>>>>>>>> as the numbers just measured in the second test seem closer to those >>>>>>>>>>> originally posted in the first test, and vice versa... >>>>>>>>>>> >>>>>>>>>>> Anyway, the problem is that there are variations between the >>>>>>>>>>> different >>>>>>>>>>> machines. Variations are also present for the other parameters (s2, >>>>>>>>>>> te, >>>>>>>>>>> rex, chi2, iter). >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Séb :) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Edward d'Auvergne wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> Could you check and see if the numbers are exactly the same as in >>>>>>>>>>>> your >>>>>>>>>>>> original email >>>>>>>>>>>> (https://mail.gna.org/public/relax-devel/2009-05/msg00003.html)? >>>>>>>>>>>> Specifically look at f_count and g_count. 
>>>>>>>>>>>> >>>>>>>>>>>> Cheers, >>>>>>>>>>>> >>>>>>>>>>>> Edward >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> 2009/9/2 Sébastien Morin<[email protected]>: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> Hi Ed, >>>>>>>>>>>>> >>>>>>>>>>>>> I updated my svn copies to r9432 and checked if the problem was >>>>>>>>>>>>> still >>>>>>>>>>>>> present. >>>>>>>>>>>>> >>>>>>>>>>>>> Unfortunately, it is still present... >>>>>>>>>>>>> >>>>>>>>>>>>> Regards, >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Séb >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Edward d'Auvergne wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Ah, yes, there is a reason. I went through and fixed a series >>>>>>>>>>>>>> of >>>>>>>>>>>>>> these optimisation difference issues - in my local svn copy. I >>>>>>>>>>>>>> collected these all together and committed them as one after I >>>>>>>>>>>>>> had >>>>>>>>>>>>>> shut the bugs. This was a few minutes ago at r9426. If you >>>>>>>>>>>>>> update >>>>>>>>>>>>>> and test now, it should work. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Edward >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2009/9/2 Sébastien Morin<[email protected]>: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Ed, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I just tested the for the presence of this bug (1.3 repository, >>>>>>>>>>>>>>> r9425) >>>>>>>>>>>>>>> and it seems it is still there... >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Is there a reason why it was closed ? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> From the data I have, I guess this bug report should be >>>>>>>>>>>>>>>> re-opened. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Maybe I could try to give more details to help debugging... 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Séb :) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Edward d Auvergne wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Update of bug #14182 (project relax): >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Status: Confirmed => Fixed >>>>>>>>>>>>>>>> Assigned to: None => bugman >>>>>>>>>>>>>>>> Open/Closed: Open => Closed >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> _______________________________________________________ >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Reply to this item at: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> <http://gna.org/bugs/?14182> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>> Message sent via/by Gna! >>>>>>>>>>>>>>>> http://gna.org/ >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> Sébastien Morin >>>>>>>>>>>>>>> PhD Student >>>>>>>>>>>>>>> S. Gagné NMR Laboratory >>>>>>>>>>>>>>> Université Laval& PROTEO >>>>>>>>>>>>>>> Québec, Canada >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Sébastien Morin >>>>>>>>>>>>> PhD Student >>>>>>>>>>>>> S. Gagné NMR Laboratory >>>>>>>>>>>>> Université Laval& PROTEO >>>>>>>>>>>>> Québec, Canada >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Sébastien Morin >>>>>>>>>>> PhD Student >>>>>>>>>>> S. Gagné NMR Laboratory >>>>>>>>>>> Université Laval& PROTEO >>>>>>>>>>> Québec, Canada >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> -- >>>>> Sébastien Morin >>>>> PhD Student >>>>> S. Gagné NMR Laboratory >>>>> Université Laval& PROTEO >>>>> Québec, Canada >>>>> >>>>> >>>>> >>>>> >>> -- >>> Sébastien Morin >>> PhD Student >>> S. 
Gagné NMR Laboratory >>> Université Laval & PROTEO >>> Québec, Canada >>> >>> >>> > > -- > Sébastien Morin > PhD Student > S. Gagné NMR Laboratory > Université Laval & PROTEO > Québec, Canada > > > _______________________________________________ > relax (http://nmr-relax.com) > > This is the relax-devel mailing list > [email protected] > > To unsubscribe from this list, get a password > reminder, or change your subscription options, > visit the list information page at > https://mail.gna.org/listinfo/relax-devel

