One other point is that, since I tagged and released the 1.3.13 version, I've been working on cleaning up, simplifying, and fixing a few IO stream bugs in the multi-processor package in the 1.3 line of the relax repository. So there is a slight chance that I may have accidentally fixed the problem already. But you'll need to check out the most up to date repository code with the subversion program to test this.
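
As a rough illustration of the two-bug scenario described in the quoted message below, here is a minimal Python sketch. The names ToyProcessor, failing_task, broken_handler and surfaced_error are hypothetical and are not part of relax; the sketch only assumes the general shape of a run() method that passes errors to a handler. It is written for Python 3, whereas the thread concerns Python 2.6.

```python
class ToyProcessor:
    """Hypothetical stand-in for a processor object; not relax's actual class."""

    def __init__(self, task, handler):
        self.task = task
        self.handler = handler

    def run_guarded(self):
        # Same shape as the original run(): any error is passed to a handler.
        try:
            self.task()
        except Exception as e:
            self.handler(e)

    def run_bare(self):
        # Same shape as the suggested replacement: no handler, errors surface directly.
        self.task()


def failing_task():
    # Stands in for the first bug, the real cause of the failure.
    raise ValueError("first bug: the real problem")


def broken_handler(exc):
    # Stands in for the second bug, a failure inside the error handling itself.
    raise RuntimeError("second bug: the handler itself failed")


def surfaced_error(func):
    """Return the type name of whatever exception escapes func(), or None."""
    try:
        func()
    except Exception as e:
        return type(e).__name__
    return None


proc = ToyProcessor(failing_task, broken_handler)
guarded = surfaced_error(proc.run_guarded)
bare = surfaced_error(proc.run_bare)
print(guarded, bare)
```

In this toy case the guarded version reports only the handler's RuntimeError, while the bare version lets the original ValueError through, which is the reasoning behind temporarily removing the try/except.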
Regards,

Edward



On 6 March 2012 12:58, Edward d'Auvergne <[email protected]> wrote:
> Actually, looking at the code, it appears as though the multi-processor
> error handling is failing. Which means that there are probably two
> bugs here. One is causing the program to fail, the second in the
> multi-processor error handling is causing the memory error, hiding the
> first problem. Could you replace the run() function in the
> multi/uni_processor.py code? The original code should be:
>
>     def run(self):
>         try:
>             self.pre_run()
>             self.callback.init_master(self)
>             self.post_run()
>         except Exception, e:
>             self.callback.handle_exception(self, e)
>
> Could you replace it with:
>
>     def run(self):
>         self.pre_run()
>         self.callback.init_master(self)
>         self.post_run()
>
> and see what the error message is? If what I said above is correct,
> then this should uncover the first bug (which then triggers the
> second). By the way, how long does it take to test this problem?
>
> Cheers,
>
> Edward
>
>
>
> On 6 March 2012 12:49, Edward d'Auvergne <[email protected]> wrote:
>> Hi,
>>
>> Thank you for all the details. That really helps in narrowing down
>> the bug! From all the info, the bug is without doubt within the
>> multi-processor package. Cheers. If you have a little time, we can
>> work together and fix this. The changes/fixes will go into the
>> repository version, so you'll need a copy of that for testing. Do you
>> have the subversion program installed? If so, you can obtain the most
>> up to date copy from the repository by typing:
>>
>> $ svn co svn://svn.gna.org/svn/relax/1.3 relax-1.3
>>
>> or if this doesn't work:
>>
>> $ svn co http://svn.gna.org/svn/relax/1.3 relax-1.3
>>
>> If you already have a checked out copy, you can update to the newest
>> copy by typing:
>>
>> $ svn up
>>
>> I'll look at the second bug you've identified later. It would be
>> appreciated if you created a second bug report for that problem too.
>> I would not recommend reverting to earlier relax versions due to the
>> number of bug fixes and other problems solved since then. This should
>> not affect the model-free results, but the bugs could bite elsewhere.
>> Hopefully I can fix this problem quickly.
>>
>> Cheers,
>>
>> Edward
>>
>>
>> P. S. For reference, the bug report is https://gna.org/bugs/?19528.
>>
>>
>>
>> On 6 March 2012 12:18, Hugh RW Dannatt <[email protected]> wrote:
>>> Hi Edward,
>>>
>>> Your description sounds very likely the cause of the problem: during
>>> the time where no output is being produced, the computer gets
>>> gradually more and more slow before finally giving up.
>>>
>>> The error is reproducible such that I have tried it on a couple of
>>> different machines and it has failed several times at the same stage.
>>> The error messages tend to vary a little, however. Here are another 2
>>> of the outputs given when the program has failed (I should clarify all
>>> of these messages came from runs done on the same machine, and the
>>> second was run with option "-d" but it hasn't helped very much):-
>>>
>>> Simulation 492
>>> Simulation 493
>>> Simulation 494
>>> Simulation 495
>>> Simulation 496
>>> Simulation 497
>>> Simulation 498
>>> Simulation 499
>>> Simulation 500
>>> Traceback (most recent call last):
>>>   File "/usr/local/relax-1.3.13/multi/uni_processor.py", line 136, in run
>>>     self.callback.init_master(self)
>>>   File "/usr/local/relax-1.3.13/multi/processor.py", line 263, in default_init_master
>>> Traceback (most recent call last):
>>>   File "/usr/local/bin/relax", line 7, in <module>
>>>     relax.start()
>>>   File "/usr/local/relax-1.3.13/relax.py", line 100, in start
>>>     processor.run()
>>>   File "/usr/local/relax-1.3.13/multi/uni_processor.py", line 139, in run
>>>     self.callback.handle_exception(self, e)
>>>   File "/usr/local/relax-1.3.13/multi/processor.py", line 250, in default_handle_exception
>>>     traceback.print_exc(file=sys.stderr)
>>>   File "/usr/lib/python2.6/traceback.py", line 227, in print_exc
>>>     print_exception(etype, value, tb, limit, file)
>>>   File "/usr/lib/python2.6/traceback.py", line 125, in print_exception
>>>     print_tb(tb, limit, file)
>>>   File "/usr/lib/python2.6/traceback.py", line 69, in print_tb
>>>     line = linecache.getline(filename, lineno, f.f_globals)
>>>   File "/usr/lib/python2.6/linecache.py", line 14, in getline
>>>     lines = getlines(filename, module_globals)
>>>   File "/usr/lib/python2.6/linecache.py", line 40, in getlines
>>>     return updatecache(filename, module_globals)
>>>   File "/usr/lib/python2.6/linecache.py", line 136, in updatecache
>>>     lines = fp.readlines()
>>> MemoryError
>>> 9203.219u 258.488s 8:05:09.46 32.5% 0+0k 90962440+0io 2215895pf+0w
>>>
>>> ------------------
>>>
>>> Simulation 489
>>> Simulation 490
>>> Simulation 491
>>> Simulation 492
>>> Simulation 493
>>> Simulation 494
>>> Simulation 495
>>> Simulation 496
>>> Simulation 497
>>> Simulation 498
>>> Simulation 499
>>> Simulation 500
>>> debug> Execution lock: Release by 'script UI' ('script' mode).
>>> debug> Execution lock: Release by 'script UI' ('script' mode).
>>> Traceback (most recent call last):
>>>   File "/progs/Linux/bin/relax13", line 7, in <module>
>>>     relax.start()
>>>   File "/progs/relax-1.3.13/relax.py", line 100, in start
>>>     processor.run()
>>>   File "/progs/relax-1.3.13/multi/uni_processor.py", line 139, in run
>>>     self.callback.handle_exception(self, e)
>>>   File "/progs/relax-1.3.13/multi/processor.py", line 250, in default_handle_exception
>>>     traceback.print_exc(file=sys.stderr)
>>>   File "/usr/lib/python2.6/traceback.py", line 227, in print_exc
>>>     print_exception(etype, value, tb, limit, file)
>>> MemoryError
>>>
>>> 8006.268u 542.873s 8:34:11.81 27.7% 0+0k 225824840+0io 6192344pf+0w
>>>
>>> ------------------
>>>
>>> If the number of MC simulations is dropped even as little as 100, the
>>> program finishes the fitting successfully, though I then get an error
>>> message to do with the grace files (I've not been using them so I'm
>>> not bothered about this though it will be of interest to you no
>>> doubt):-
>>>
>>> Data pipe 'final': The ts value of 2.6285e-08 is greater than 1.9714e-08, eliminating simulation 94 of spin system ':218@N'.
>>> Data pipe 'final': The ts value of 2.6285e-08 is greater than 1.9714e-08, eliminating simulation 95 of spin system ':218@N'.
>>>
>>> relax> monte_carlo.error_analysis(prune=0.0)
>>>
>>> relax> results.write(file='results', dir='/ld10c/home1/hugh/data/pgm298bq/relax/final', compress_type=1, force=True)
>>> Opening the file '/ld10c/home1/hugh/data/pgm298bq/relax/final/results.bz2' for writing.
>>>
>>> relax> grace.write(x_data_type='spin', y_data_type='s2', spin_id=None, plot_data='value', file='s2.agr', dir='/ld10c/home1/hugh/data/pgm298bq/relax/final/grace', force=True, norm=False)
>>> Opening the file '/ld10c/home1/hugh/data/pgm298bq/relax/final/grace/s2.agr' for writing.
>>>
>>> relax> grace.write(x_data_type='spin', y_data_type='s2f', spin_id=None, plot_data='value', file='s2f.agr', dir='/ld10c/home1/hugh/data/pgm298bq/relax/final/grace', force=True, norm=False)
>>> Opening the file '/ld10c/home1/hugh/data/pgm298bq/relax/final/grace/s2f.agr' for writing.
>>>
>>> relax> grace.write(x_data_type='spin', y_data_type='s2s', spin_id=None, plot_data='value', file='s2s.agr', dir='/ld10c/home1/hugh/data/pgm298bq/relax/final/grace', force=True, norm=False)
>>> Opening the file '/ld10c/home1/hugh/data/pgm298bq/relax/final/grace/s2s.agr' for writing.
>>>
>>> relax> grace.write(x_data_type='spin', y_data_type='te', spin_id=None, plot_data='value', file='te.agr', dir='/ld10c/home1/hugh/data/pgm298bq/relax/final/grace', force=True, norm=False)
>>> Opening the file '/ld10c/home1/hugh/data/pgm298bq/relax/final/grace/te.agr' for writing.
>>>
>>> relax> grace.write(x_data_type='spin', y_data_type='tf', spin_id=None, plot_data='value', file='tf.agr', dir='/ld10c/home1/hugh/data/pgm298bq/relax/final/grace', force=True, norm=False)
>>> Opening the file '/ld10c/home1/hugh/data/pgm298bq/relax/final/grace/tf.agr' for writing.
>>>
>>> relax> grace.write(x_data_type='spin', y_data_type='ts', spin_id=None, plot_data='value', file='ts.agr', dir='/ld10c/home1/hugh/data/pgm298bq/relax/final/grace', force=True, norm=False)
>>> Opening the file '/ld10c/home1/hugh/data/pgm298bq/relax/final/grace/ts.agr' for writing.
>>>
>>> relax> grace.write(x_data_type='spin', y_data_type='rex', spin_id=None, plot_data='value', file='rex.agr', dir='/ld10c/home1/hugh/data/pgm298bq/relax/final/grace', force=True, norm=False)
>>> Opening the file '/ld10c/home1/hugh/data/pgm298bq/relax/final/grace/rex.agr' for writing.
>>> debug> Execution lock: Release by 'script UI' ('script' mode).
>>> debug> Execution lock: Release by 'script UI' ('script' mode).
>>> Traceback (most recent call last):
>>>   File "/ld10c/progs/relax-1.3.13/prompt/interpreter.py", line 383, in exec_script
>>>     runpy.run_module(module, globals)
>>>   File "/usr/lib/python2.6/runpy.py", line 140, in run_module
>>>     fname, loader, pkg_name)
>>>   File "/usr/lib/python2.6/runpy.py", line 34, in _run_code
>>>     exec code in run_globals
>>>   File "/ld10c/home1/hugh/data/pgm298bq/relax/dauvergne_protocol_lessMC.py", line 216, in <module>
>>>     dAuvergne_protocol(pipe_name=name, diff_model=DIFF_MODEL, mf_models=MF_MODELS, local_tm_models=LOCAL_TM_MODELS, grid_inc=GRID_INC, min_algor=MIN_ALGOR, mc_sim_num=MC_NUM, conv_loop=CONV_LOOP)
>>>   File "/ld10c/progs/relax-1.3.13/auto_analyses/dauvergne_protocol.py", line 223, in __init__
>>>     self.execute()
>>>   File "/ld10c/progs/relax-1.3.13/auto_analyses/dauvergne_protocol.py", line 710, in execute
>>>     self.write_results()
>>>   File "/ld10c/progs/relax-1.3.13/auto_analyses/dauvergne_protocol.py", line 837, in write_results
>>>     self.interpreter.grace.write(x_data_type='spin', y_data_type='rex', file='rex.agr', dir=dir, force=True)
>>>   File "/ld10c/progs/relax-1.3.13/prompt/grace.py", line 103, in write
>>>     grace.write(x_data_type=x_data_type, y_data_type=y_data_type, spin_id=spin_id, plot_data=plot_data, file=file, dir=dir, force=force, norm=norm)
>>>   File "/ld10c/progs/relax-1.3.13/generic_fns/grace.py", line 366, in write
>>>     write_xy_header(sets=len(data[0]), file=file, data_type=[x_data_type, y_data_type], seq_type=seq_type, set_names=set_names, norm=norm)
>>>   File "/ld10c/progs/relax-1.3.13/generic_fns/grace.py", line 600, in write_xy_header
>>>     units = return_units(data_type[i])
>>>   File "/ld10c/progs/relax-1.3.13/specific_fns/model_free/main.py", line 2394, in return_units
>>>     raise RelaxNoSpinSpecError
>>> RelaxNoSpinSpecError: RelaxError: The spin system must be specified.
>>>
>>>
>>> 3510.479u 20.741s 59:07.76 99.5% 0+0k 0+3368io 0pf+0w
>>>
>>> ------------------
>>>
>>> Finally, this is the output from relax --info as requested:-
>>>
>>> relax 1.3.13
>>>
>>> Molecular dynamics by NMR data analysis
>>>
>>> Copyright (C) 2001-2006 Edward d'Auvergne
>>> Copyright (C) 2006-2011 the relax development team
>>>
>>> This is free software which you are welcome to modify and redistribute under the conditions of the
>>> GNU General Public License (GPL). This program, including all modules, is licensed under the GPL
>>> and comes with absolutely no warranty. For details type 'GPL' within the relax prompt.
>>>
>>> Assistance in using the relax prompt and scripting interface can be accessed by typing 'help' within
>>> the prompt.
>>>
>>> Processor fabric: Uni-processor.
>>>
>>> Hardware information:
>>>     Machine:                 i686
>>>     Processor:
>>>
>>> System information:
>>>     System:                  Linux
>>>     Release:                 2.6.32-37-generic
>>>     Version:                 #81-Ubuntu SMP Fri Dec 2 20:35:14 UTC 2011
>>>     GNU/Linux version:       Ubuntu 10.04 lucid
>>>     Distribution:            Ubuntu 10.04 lucid
>>>     Full platform string:    Linux-2.6.32-37-generic-i686-with-Ubuntu-10.04-lucid
>>>
>>> Software information:
>>>     Architecture:            32bit ELF
>>>     Python version:          2.6.5
>>>     Python branch:           tags/r265
>>>     Python build:            r265:79063, Apr 16 2010 13:09:56
>>>     Python compiler:         GCC 4.4.3
>>>     Python implementation:   CPython
>>>     Python revision:         79063
>>>     Numpy version:           1.3.0
>>>     Libc version:            glibc 2.4
>>>
>>> Python packages (most are optional):
>>>
>>> Package      Installed   Version   Path
>>> minfx        True        Unknown   /ld10c/progs/relax-1.3.13/minfx
>>> bmrblib      True        Unknown   /ld10c/progs/relax-1.3.13/bmrblib
>>> numpy        True        1.3.0     /usr/lib/python2.6/dist-packages/numpy
>>> scipy        True        0.7.0     /usr/lib/python2.6/dist-packages/scipy
>>> wxPython     False
>>> mpi4py       False
>>> epydoc       False
>>> optparse     True        1.5.3     /usr/lib/python2.6/optparse.pyc
>>> readline     True                  /usr/lib/python2.6/lib-dynload/readline.so
>>> profile      True                  /usr/lib/python2.6/profile.pyc
>>> bz2          True                  /usr/lib/python2.6/lib-dynload/bz2.so
>>> gzip         True                  /usr/lib/python2.6/gzip.pyc
>>> os.devnull   True                  /usr/lib/python2.6/os.pyc
>>>
>>> Compiled relax C modules:
>>>     Relaxation curve fitting: True
>>>
>>> ------------------
>>>
>>> Apologies for all the detail but I'm not really sure what to do here.
>>> If it is the multi-processor part of it that is failing, is installing
>>> relax 1.3.11 an option? I previously had 1.3.10 installed and the
>>> commands seem to have changed quite a lot since then. What is your
>>> opinion on the validity of error estimates based on 100 simulations?
>>>
>>> Thanks
>>>
>>> Hugh
>>>
>>>
>>>
>>> On 5 March 2012 08:33, Edward d'Auvergne <[email protected]> wrote:
>>>> Hi Hugh,
>>>>
>>>> I'm pretty sure this error has not been encountered before. It at
>>>> least hasn't been reported. I've never seen anything close to this
>>>> before, but I would guess that this is an infinitely recursive
>>>> exception (the error is being caught but, in the process, the error
>>>> occurs again, being caught a second time, then the 3rd error occurs,
>>>> is caught a 3rd time, with this continuing until your computer runs
>>>> out of RAM and swap space and relax is killed by the operating
>>>> system). The error seems to occur within the error handling portion of
>>>> Gary Thompson's multi-processor framework (you are using the
>>>> uni-processor fabric of the framework here), so maybe Gary might know
>>>> a solution?
>>>>
>>>> Is this error reproducible? For testing, can you drop the number of
>>>> Monte Carlo simulations down to say 5? Running relax with the debug
>>>> flag might also help:
>>>>
>>>> $ relax --debug
>>>>
>>>> or:
>>>>
>>>> $ relax -d
>>>>
>>>> Are you using the GUI or scripting user interface? The output of:
>>>>
>>>> $ relax --info
>>>>
>>>> might also be useful. As for your data set being too large, relax has
>>>> been used on much bigger systems before so this should not be an
>>>> issue. One last thing, would you be able to create a bug report for
>>>> this error (https://gna.org/bugs/?func=additem&group=relax)? All of
>>>> the info/log files can then be pasted/attached there, and it is a
>>>> useful future reference for anyone who encounters the same or a
>>>> similar bug.
>>>>
>>>> Cheers,
>>>>
>>>> Edward
>>>>
>>>>
>>>>
>>>> On 2 March 2012 12:33, Hugh RW Dannatt <[email protected]> wrote:
>>>>> Dear All,
>>>>>
>>>>> Having completed the fitting of 1 dataset without any problems, I am
>>>>> now moving onto another. Everything has worked fine until I change the
>>>>> DIFF_MODEL to "final" and try to run the program again to get error
>>>>> estimates on my fitted parameters.
>>>>>
>>>>> The program successfully re-opens all the results files and selects the
>>>>> diffusion model. Then all 500 simulations are done without issue, but
>>>>> as soon as the program has finished this, it stops outputting anything
>>>>> to the screen for a long time (>12 hrs). During this time, the CPU and
>>>>> Memory use is very high and the computer runs slowly. Eventually I get
>>>>> a "Memory Error" and a whole load of messages outputted to the screen,
>>>>> which I have pasted below. I should emphasize that all the stages of
>>>>> running this program with different diffusion models have run fine,
>>>>> and the computer I'm using is a relatively fast machine (dual core
>>>>> Pentium 4, 2 GB RAM).
>>>>>
>>>>> Has anyone had a similar problem? This dataset is larger than the
>>>>> previous one which fit without issue (current one has 6 measurements
>>>>> per 176 residues), but I can't imagine this being the cause of this
>>>>> problem.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Hugh
>>>>>
>>>>> ----
>>>>>
>>>>> Simulation 485
>>>>> Simulation 486
>>>>> Simulation 487
>>>>> Simulation 488
>>>>> Simulation 489
>>>>> Simulation 490
>>>>> Simulation 491
>>>>> Simulation 492
>>>>> Simulation 493
>>>>> Simulation 494
>>>>> Simulation 495
>>>>> Simulation 496
>>>>> Simulation 497
>>>>> Simulation 498
>>>>> Simulation 499
>>>>> Simulation 500
>>>>>
>>>>>
>>>>> Traceback (most recent call last):
>>>>>   File "/progs/relax-1.3.13/multi/uni_processor.py", line 136, in run
>>>>>     self.callback.init_master(self)
>>>>>   File "/progs/relax-1.3.13/multi/processor.py", line 263, in default_init_master
>>>>>     self.master.run()
>>>>>   File "/progs/relax-1.3.13/relax.py", line 171, in run
>>>>>     self.interpreter.run(self.script_file)
>>>>>   File "/progs/relax-1.3.13/prompt/interpreter.py", line 300, in run
>>>>>     return run_script(intro=self.__intro_string, local=locals(), script_file=script_file, quit=self.__quit_flag, show_script=self.__show_script, raise_relax_error=self.__raise_relax_error)
>>>>>   File "/progs/relax-1.3.13/prompt/interpreter.py", line 610, in run_script
>>>>>     return console.interact(intro, local, script_file, quit, show_script=show_script, raise_relax_error=raise_relax_error)
>>>>>   File "/progs/relax-1.3.13/prompt/interpreter.py", line 495, in interact_script
>>>>>     exec_script(script_file, local)
>>>>>   File "/progs/relax-1.3.13/prompt/interpreter.py", line 383, in exec_script
>>>>>     runpy.run_module(module, globals)
>>>>>   File "/usr/lib/python2.6/runpy.py", line 140, in run_module
>>>>>     fname, loader, pkg_name)
>>>>>   File "/usr/lib/python2.6/runpy.py", line 34, in _run_code
>>>>>     exec code in run_globals
>>>>>   File "/home1/hugh/data/pgm298bq/relax/dauvergne_protocol.py", line 216, in <module>
>>>>>     dAuvergne_protocol(pipe_name=name, diff_model=DIFF_MODEL, mf_models=MF_MODELS, local_tm_models=LOCAL_TM_MODELS, grid_inc=GRID_INC, min_algor=MIN_ALGOR, mc_sim_num=MC_NUM, conv_loop=CONV_LOOP)
>>>>>   File "/progs/relax-1.3.13/auto_analyses/dauvergne_protocol.py", line 223, in __init__
>>>>> Traceback (most recent call last):
>>>>>   File "/progs/Linux/bin/relax13", line 7, in <module>
>>>>>     relax.start()
>>>>>   File "/progs/relax-1.3.13/relax.py", line 100, in start
>>>>>     processor.run()
>>>>>   File "/progs/relax-1.3.13/multi/uni_processor.py", line 139, in run
>>>>>     self.callback.handle_exception(self, e)
>>>>>   File "/progs/relax-1.3.13/multi/processor.py", line 250, in default_handle_exception
>>>>>     traceback.print_exc(file=sys.stderr)
>>>>>   File "/usr/lib/python2.6/traceback.py", line 227, in print_exc
>>>>>     print_exception(etype, value, tb, limit, file)
>>>>>   File "/usr/lib/python2.6/traceback.py", line 125, in print_exception
>>>>>     print_tb(tb, limit, file)
>>>>>   File "/usr/lib/python2.6/traceback.py", line 69, in print_tb
>>>>>     line = linecache.getline(filename, lineno, f.f_globals)
>>>>>   File "/usr/lib/python2.6/linecache.py", line 14, in getline
>>>>>     lines = getlines(filename, module_globals)
>>>>>   File "/usr/lib/python2.6/linecache.py", line 40, in getlines
>>>>>     return updatecache(filename, module_globals)
>>>>>   File "/usr/lib/python2.6/linecache.py", line 136, in updatecache
>>>>>     lines = fp.readlines()
>>>>> MemoryError
>>>>> 9078.655u 666.933s 10:55:29.66 24.7% 0+0k 241482000+0io 6665721pf+0w
>>>>>
>>>>> _______________________________________________
>>>>> relax (http://nmr-relax.com)
>>>>>
>>>>> This is the relax-users mailing list
>>>>> [email protected]
>>>>>
>>>>> To unsubscribe from this list, get a password
>>>>> reminder, or change your subscription options,
>>>>> visit the list information page at
>>>>> https://mail.gna.org/listinfo/relax-users
>>>
>>>
>>>
>>> --
>>> Hugh Dannatt
>>> PhD Student Researcher
>>>
>>> Prof. Jon Waltho Lab
>>> Department of Molecular Biology & Biotechnology
>>> University of Sheffield
>>> Firth Court
>>> Western Bank
>>> Sheffield
>>> S10 2TN
>>>
>>> 0114 222 2729

