Re: Problem during "final" run of d'Auvergne Protocol

Edward d'Auvergne Tue, 06 Mar 2012 03:59:32 -0800

Actually, looking at the code, it appears as though the multi-processor
error handling is failing.  Which means that there are probably two
bugs here.  One is causing the program to fail; the second, in the
multi-processor error handling, is causing the memory error and hiding
the first problem.  Could you replace the run() function in
multi/uni_processor.py?  The original code should be:


    def run(self):
        try:
            self.pre_run()
            self.callback.init_master(self)
            self.post_run()
        except Exception, e:
            self.callback.handle_exception(self, e)

Could you replace it with:

    def run(self):
        self.pre_run()
        self.callback.init_master(self)
        self.post_run()

and see what the error message is?  If what I said above is correct,
then this should uncover the first bug (which then triggers the
second).  By the way, how long does it take to test this problem?
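
To illustrate the idea, here is a minimal stand-alone sketch (Python 3,
not relax code) of how a failing error handler can hide the original
error.  All of the names in it (work, broken_handler, run_with_handler,
run_without_handler, outcome) are hypothetical and only mimic the shape
of the two run() variants above:

```python
def work():
    # Stand-in for the real failure (the "first bug").
    raise ValueError("first bug")


def broken_handler(exc):
    # Stand-in for an error handler that itself fails (the "second bug").
    raise MemoryError("second bug")


def run_with_handler():
    # Mirrors the try/except form of run(): the handler's own failure
    # is the exception the caller ends up seeing.
    try:
        work()
    except Exception as e:
        broken_handler(e)


def run_without_handler():
    # Mirrors the stripped-down run(): the original error propagates.
    work()


def outcome(func):
    # Report the type name of whatever exception escapes func.
    try:
        func()
    except BaseException as e:
        return type(e).__name__
    return None


print(outcome(run_with_handler))     # MemoryError
print(outcome(run_without_handler))  # ValueError
```

With the handler in place only the handler's failure is reported;
without it, the underlying error surfaces directly.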

Cheers,

Edward



On 6 March 2012 12:49, Edward d'Auvergne <[email protected]> wrote:
> Hi,
>
> Thank you for all the details.  That really helps in narrowing down
> the bug!  From all the info, the bug is without doubt within the
> multi-processor package.  Cheers.  If you have a little time, we can
> work together and fix this.  The changes/fixes will go into the
> repository version, so you'll need a copy of that for testing.  Do you
> have the subversion program installed?  If so, you can obtain the most
> up to date copy from the repository by typing:
>
> $ svn co svn://svn.gna.org/svn/relax/1.3 relax-1.3
>
> or if this doesn't work:
>
> $ svn co http://svn.gna.org/svn/relax/1.3 relax-1.3
>
> If you already have a checked out copy, you can update to the newest
> copy by typing:
>
> $ svn up
>
> I'll look at the second bug you've identified later.  It would be
> appreciated if you created a second bug report for that problem too.
> I would not recommend reverting to earlier relax versions due to the
> number of bug fixes and other problems solved since then.  This should
> not affect the model-free results, but the bugs could bite elsewhere.
> Hopefully I can fix this problem quickly.
>
> Cheers,
>
> Edward
>
>
> P. S.  For reference, the bug report is https://gna.org/bugs/?19528.
>
>
>
> On 6 March 2012 12:18, Hugh RW Dannatt <[email protected]> wrote:
>> Hi Edward,
>>
>> Your description very likely explains the problem.  During the time
>> when no output is being produced, the computer gets gradually slower
>> and slower before finally giving up.
>>
>> The error is reproducible such that I have tried it on a couple of
>> different machines and it has failed several times at the same stage.
>> The error messages tend to vary a little, however. Here are another 2
>> of the outputs given when the program has failed (I should clarify all
>> of these messages came from runs done on the same machine, and the
>> second was run with option "-d" but it hasn't helped very much):-
>>
>> Simulation 492
>> Simulation 493
>> Simulation 494
>> Simulation 495
>> Simulation 496
>> Simulation 497
>> Simulation 498
>> Simulation 499
>> Simulation 500
>> Traceback (most recent call last):
>>  File "/usr/local/relax-1.3.13/multi/uni_processor.py", line 136, in run
>>    self.callback.init_master(self)
>>  File "/usr/local/relax-1.3.13/multi/processor.py", line 263, in default_init_master
>> Traceback (most recent call last):
>>  File "/usr/local/bin/relax", line 7, in <module>
>>    relax.start()
>>  File "/usr/local/relax-1.3.13/relax.py", line 100, in start
>>    processor.run()
>>  File "/usr/local/relax-1.3.13/multi/uni_processor.py", line 139, in run
>>    self.callback.handle_exception(self, e)
>>  File "/usr/local/relax-1.3.13/multi/processor.py", line 250, in default_handle_exception
>>    traceback.print_exc(file=sys.stderr)
>>  File "/usr/lib/python2.6/traceback.py", line 227, in print_exc
>>    print_exception(etype, value, tb, limit, file)
>>  File "/usr/lib/python2.6/traceback.py", line 125, in print_exception
>>    print_tb(tb, limit, file)
>>  File "/usr/lib/python2.6/traceback.py", line 69, in print_tb
>>    line = linecache.getline(filename, lineno, f.f_globals)
>>  File "/usr/lib/python2.6/linecache.py", line 14, in getline
>>    lines = getlines(filename, module_globals)
>>  File "/usr/lib/python2.6/linecache.py", line 40, in getlines
>>    return updatecache(filename, module_globals)
>>  File "/usr/lib/python2.6/linecache.py", line 136, in updatecache
>>    lines = fp.readlines()
>> MemoryError
>> 9203.219u 258.488s 8:05:09.46 32.5%     0+0k 90962440+0io 2215895pf+0w
>>
>> ------------------
>>
>> Simulation 489
>> Simulation 490
>> Simulation 491
>> Simulation 492
>> Simulation 493
>> Simulation 494
>> Simulation 495
>> Simulation 496
>> Simulation 497
>> Simulation 498
>> Simulation 499
>> Simulation 500
>> debug> Execution lock:  Release by 'script UI' ('script' mode).
>> debug> Execution lock:  Release by 'script UI' ('script' mode).
>> Traceback (most recent call last):
>>  File "/progs/Linux/bin/relax13", line 7, in <module>
>>    relax.start()
>>  File "/progs/relax-1.3.13/relax.py", line 100, in start
>>    processor.run()
>>  File "/progs/relax-1.3.13/multi/uni_processor.py", line 139, in run
>>    self.callback.handle_exception(self, e)
>>  File "/progs/relax-1.3.13/multi/processor.py", line 250, in default_handle_exception
>>    traceback.print_exc(file=sys.stderr)
>>  File "/usr/lib/python2.6/traceback.py", line 227, in print_exc
>>    print_exception(etype, value, tb, limit, file)
>> MemoryError
>>
>> 8006.268u 542.873s 8:34:11.81 27.7%     0+0k 225824840+0io 6192344pf+0w
>>
>> ------------------
>>
>> If the number of MC simulations is dropped even as little as 100, the
>> program finishes the fitting successfully, though I then get an error
>> message to do with the grace files (I've not been using them so I'm
>> not bothered about this, though it will no doubt be of interest to
>> you):-
>>
>> Data pipe 'final':  The ts value of 2.6285e-08 is greater than 1.9714e-08, eliminating simulation 94 of spin system ':218@N'.
>> Data pipe 'final':  The ts value of 2.6285e-08 is greater than 1.9714e-08, eliminating simulation 95 of spin system ':218@N'.
>>
>> relax> monte_carlo.error_analysis(prune=0.0)
>>
>> relax> results.write(file='results', dir='/ld10c/home1/hugh/data/pgm298bq/relax/final', compress_type=1, force=True)
>> Opening the file '/ld10c/home1/hugh/data/pgm298bq/relax/final/results.bz2' for writing.
>>
>> relax> grace.write(x_data_type='spin', y_data_type='s2', spin_id=None, plot_data='value', file='s2.agr', dir='/ld10c/home1/hugh/data/pgm298bq/relax/final/grace', force=True, norm=False)
>> Opening the file '/ld10c/home1/hugh/data/pgm298bq/relax/final/grace/s2.agr' for writing.
>>
>> relax> grace.write(x_data_type='spin', y_data_type='s2f', spin_id=None, plot_data='value', file='s2f.agr', dir='/ld10c/home1/hugh/data/pgm298bq/relax/final/grace', force=True, norm=False)
>> Opening the file '/ld10c/home1/hugh/data/pgm298bq/relax/final/grace/s2f.agr' for writing.
>>
>> relax> grace.write(x_data_type='spin', y_data_type='s2s', spin_id=None, plot_data='value', file='s2s.agr', dir='/ld10c/home1/hugh/data/pgm298bq/relax/final/grace', force=True, norm=False)
>> Opening the file '/ld10c/home1/hugh/data/pgm298bq/relax/final/grace/s2s.agr' for writing.
>>
>> relax> grace.write(x_data_type='spin', y_data_type='te', spin_id=None, plot_data='value', file='te.agr', dir='/ld10c/home1/hugh/data/pgm298bq/relax/final/grace', force=True, norm=False)
>> Opening the file '/ld10c/home1/hugh/data/pgm298bq/relax/final/grace/te.agr' for writing.
>>
>> relax> grace.write(x_data_type='spin', y_data_type='tf', spin_id=None, plot_data='value', file='tf.agr', dir='/ld10c/home1/hugh/data/pgm298bq/relax/final/grace', force=True, norm=False)
>> Opening the file '/ld10c/home1/hugh/data/pgm298bq/relax/final/grace/tf.agr' for writing.
>>
>> relax> grace.write(x_data_type='spin', y_data_type='ts', spin_id=None, plot_data='value', file='ts.agr', dir='/ld10c/home1/hugh/data/pgm298bq/relax/final/grace', force=True, norm=False)
>> Opening the file '/ld10c/home1/hugh/data/pgm298bq/relax/final/grace/ts.agr' for writing.
>>
>> relax> grace.write(x_data_type='spin', y_data_type='rex', spin_id=None, plot_data='value', file='rex.agr', dir='/ld10c/home1/hugh/data/pgm298bq/relax/final/grace', force=True, norm=False)
>> Opening the file '/ld10c/home1/hugh/data/pgm298bq/relax/final/grace/rex.agr' for writing.
>> debug> Execution lock:  Release by 'script UI' ('script' mode).
>> debug> Execution lock:  Release by 'script UI' ('script' mode).
>> Traceback (most recent call last):
>>  File "/ld10c/progs/relax-1.3.13/prompt/interpreter.py", line 383, in exec_script
>>    runpy.run_module(module, globals)
>>  File "/usr/lib/python2.6/runpy.py", line 140, in run_module
>>    fname, loader, pkg_name)
>>  File "/usr/lib/python2.6/runpy.py", line 34, in _run_code
>>    exec code in run_globals
>>  File "/ld10c/home1/hugh/data/pgm298bq/relax/dauvergne_protocol_lessMC.py", line 216, in <module>
>>    dAuvergne_protocol(pipe_name=name, diff_model=DIFF_MODEL, mf_models=MF_MODELS, local_tm_models=LOCAL_TM_MODELS, grid_inc=GRID_INC, min_algor=MIN_ALGOR, mc_sim_num=MC_NUM, conv_loop=CONV_LOOP)
>>  File "/ld10c/progs/relax-1.3.13/auto_analyses/dauvergne_protocol.py", line 223, in __init__
>>    self.execute()
>>  File "/ld10c/progs/relax-1.3.13/auto_analyses/dauvergne_protocol.py", line 710, in execute
>>    self.write_results()
>>  File "/ld10c/progs/relax-1.3.13/auto_analyses/dauvergne_protocol.py", line 837, in write_results
>>    self.interpreter.grace.write(x_data_type='spin', y_data_type='rex', file='rex.agr',       dir=dir, force=True)
>>  File "/ld10c/progs/relax-1.3.13/prompt/grace.py", line 103, in write
>>    grace.write(x_data_type=x_data_type, y_data_type=y_data_type, spin_id=spin_id, plot_data=plot_data, file=file, dir=dir, force=force, norm=norm)
>>  File "/ld10c/progs/relax-1.3.13/generic_fns/grace.py", line 366, in write
>>    write_xy_header(sets=len(data[0]), file=file, data_type=[x_data_type, y_data_type], seq_type=seq_type, set_names=set_names, norm=norm)
>>  File "/ld10c/progs/relax-1.3.13/generic_fns/grace.py", line 600, in write_xy_header
>>    units = return_units(data_type[i])
>>  File "/ld10c/progs/relax-1.3.13/specific_fns/model_free/main.py", line 2394, in return_units
>>    raise RelaxNoSpinSpecError
>> RelaxNoSpinSpecError: RelaxError: The spin system must be specified.
>>
>>
>> 3510.479u 20.741s 59:07.76 99.5%        0+0k 0+3368io 0pf+0w
>>
>> ------------------
>>
>> Finally, this is the output from relax --info as requested:-
>>
>>                                            relax 1.3.13
>>
>>                              Molecular dynamics by NMR data analysis
>>
>>                             Copyright (C) 2001-2006 Edward d'Auvergne
>>                         Copyright (C) 2006-2011 the relax development team
>>
>> This is free software which you are welcome to modify and redistribute
>> under the conditions of the
>> GNU General Public License (GPL).  This program, including all
>> modules, is licensed under the GPL
>> and comes with absolutely no warranty.  For details type 'GPL' within
>> the relax prompt.
>>
>> Assistance in using the relax prompt and scripting interface can be
>> accessed by typing 'help' within
>> the prompt.
>>
>> Processor fabric:  Uni-processor.
>>
>> Hardware information:
>>    Machine:                 i686
>>    Processor:
>>
>> System information:
>>    System:                  Linux
>>    Release:                 2.6.32-37-generic
>>    Version:                 #81-Ubuntu SMP Fri Dec 2 20:35:14 UTC 2011
>>    GNU/Linux version:       Ubuntu 10.04 lucid
>>    Distribution:            Ubuntu 10.04 lucid
>>    Full platform string:    Linux-2.6.32-37-generic-i686-with-Ubuntu-10.04-lucid
>>
>> Software information:
>>    Architecture:            32bit ELF
>>    Python version:          2.6.5
>>    Python branch:           tags/r265
>>    Python build:            r265:79063, Apr 16 2010 13:09:56
>>    Python compiler:         GCC 4.4.3
>>    Python implementation:   CPython
>>    Python revision:         79063
>>    Numpy version:           1.3.0
>>    Libc version:            glibc 2.4
>>
>> Python packages (most are optional):
>>
>> Package              Installed       Version         Path
>> minfx                True            Unknown         /ld10c/progs/relax-1.3.13/minfx
>> bmrblib              True            Unknown         /ld10c/progs/relax-1.3.13/bmrblib
>> numpy                True            1.3.0           /usr/lib/python2.6/dist-packages/numpy
>> scipy                True            0.7.0           /usr/lib/python2.6/dist-packages/scipy
>> wxPython             False
>> mpi4py               False
>> epydoc               False
>> optparse             True            1.5.3           /usr/lib/python2.6/optparse.pyc
>> readline             True                            /usr/lib/python2.6/lib-dynload/readline.so
>> profile              True                            /usr/lib/python2.6/profile.pyc
>> bz2                  True                            /usr/lib/python2.6/lib-dynload/bz2.so
>> gzip                 True                            /usr/lib/python2.6/gzip.pyc
>> os.devnull           True                            /usr/lib/python2.6/os.pyc
>>
>> Compiled relax C modules:
>>    Relaxation curve fitting: True
>>
>> ------------------
>>
>> Apologies for all the detail but I'm not really sure what to do here.
>> If it is the multi-processor part of it that is failing, is installing
>> relax 1.3.11 an option? I previously had 1.3.10 installed and the
>> commands seem to have changed quite a lot since then. What is your
>> opinion on the validity of error estimates based on 100 simulations?
>>
>> Thanks
>>
>> Hugh
>>
>>
>>
>> On 5 March 2012 08:33, Edward d'Auvergne <[email protected]> wrote:
>>> Hi Hugh,
>>>
>>> I'm pretty sure this error has not been encountered before.  It at
>>> least hasn't been reported.  I've never seen anything close to this
>>> before, but I would guess that this is an infinitely recursive
>>> exception (the error is being caught but, in the process, the error
>>> occurs again, being caught a second time, then the 3rd error occurs,
>>> is caught a 3rd time, with this continuing until your computer runs
>>> out of RAM and swap space and relax is killed by the operating
>>> system).  The error seems to occur within the error handling portion of
>>> Gary Thompson's multi-processor framework (you are using the
>>> uni-processor fabric of the framework here), so maybe Gary might know
>>> a solution?
>>>
>>> Is this error reproducible?  For testing, can you drop the number of
>>> Monte Carlo simulations down to say 5?  Running relax with the debug
>>> flag might also help:
>>>
>>> $ relax --debug
>>>
>>> or:
>>>
>>> $ relax -d
>>>
>>> Are you using the GUI or scripting user interface?  The output of:
>>>
>>> $ relax --info
>>>
>>> might also be useful.  As for your data set being too large, relax has
>>> been used on much bigger systems before so this should not be an
>>> issue.  One last thing, would you be able to create a bug report for
>>> this error (https://gna.org/bugs/?func=additem&group=relax)?  All of
>>> the info/log files can then be pasted/attached there, and it is a
>>> useful future reference for anyone who encounters the same or a
>>> similar bug.
>>>
>>> Cheers,
>>>
>>> Edward
>>>
>>>
>>>
>>> On 2 March 2012 12:33, Hugh RW Dannatt <[email protected]> wrote:
>>>> Dear All,
>>>>
>>>> Having completed the fitting of 1 dataset without any problems, I am
>>>> now moving onto another. Everything has worked fine until I change the
>>>> DIFF_MODEL to "final" and try to run the program again to get error
>>>> estimates on my fitted parameters.
>>>>
>>>> The program successfully re-opens all the results file and selects the
>>>> diffusion model. Then all 500 simulations are done without issue, but
>>>> as soon as the program has finished this, it stops outputting anything
>>>> to the screen for a long time (>12 hrs). During this time, the CPU and
>>>> Memory use is very high and the computer runs slowly. Eventually I get
>>>> a "Memory Error" and a whole load of messages outputted to the screen,
>>>> which I have pasted below. I should emphasize that all the stages of
>>>> running this program with different diffusion models have run fine,
>>>> and the computer I'm using is a relatively fast machine (dual core
>>>> Pentium 4, 2 GB RAM).
>>>>
>>>> Has anyone had a similar problem? This dataset is larger than the
>>>> previous one which fit without issue (current one has 6 measurements
>>>> per 176 residues), but I can't imagine this being the cause of this
>>>> problem.
>>>>
>>>> Thanks
>>>>
>>>> Hugh
>>>>
>>>> ----
>>>>
>>>> Simulation 485
>>>> Simulation 486
>>>> Simulation 487
>>>> Simulation 488
>>>> Simulation 489
>>>> Simulation 490
>>>> Simulation 491
>>>> Simulation 492
>>>> Simulation 493
>>>> Simulation 494
>>>> Simulation 495
>>>> Simulation 496
>>>> Simulation 497
>>>> Simulation 498
>>>> Simulation 499
>>>> Simulation 500
>>>>
>>>>
>>>> Traceback (most recent call last):
>>>>  File "/progs/relax-1.3.13/multi/uni_processor.py", line 136, in run
>>>>    self.callback.init_master(self)
>>>>  File "/progs/relax-1.3.13/multi/processor.py", line 263, in default_init_master
>>>>    self.master.run()
>>>>  File "/progs/relax-1.3.13/relax.py", line 171, in run
>>>>    self.interpreter.run(self.script_file)
>>>>  File "/progs/relax-1.3.13/prompt/interpreter.py", line 300, in run
>>>>    return run_script(intro=self.__intro_string, local=locals(), script_file=script_file, quit=self.__quit_flag, show_script=self.__show_script, raise_relax_error=self.__raise_relax_error)
>>>>  File "/progs/relax-1.3.13/prompt/interpreter.py", line 610, in run_script
>>>>    return console.interact(intro, local, script_file, quit, show_script=show_script, raise_relax_error=raise_relax_error)
>>>>  File "/progs/relax-1.3.13/prompt/interpreter.py", line 495, in interact_script
>>>>    exec_script(script_file, local)
>>>>  File "/progs/relax-1.3.13/prompt/interpreter.py", line 383, in exec_script
>>>>    runpy.run_module(module, globals)
>>>>  File "/usr/lib/python2.6/runpy.py", line 140, in run_module
>>>>    fname, loader, pkg_name)
>>>>  File "/usr/lib/python2.6/runpy.py", line 34, in _run_code
>>>>    exec code in run_globals
>>>>  File "/home1/hugh/data/pgm298bq/relax/dauvergne_protocol.py", line 216, in <module>
>>>>    dAuvergne_protocol(pipe_name=name, diff_model=DIFF_MODEL, mf_models=MF_MODELS, local_tm_models=LOCAL_TM_MODELS, grid_inc=GRID_INC, min_algor=MIN_ALGOR, mc_sim_num=MC_NUM, conv_loop=CONV_LOOP)
>>>>  File "/progs/relax-1.3.13/auto_analyses/dauvergne_protocol.py", line 223, in __init__
>>>> Traceback (most recent call last):
>>>>  File "/progs/Linux/bin/relax13", line 7, in <module>
>>>>    relax.start()
>>>>  File "/progs/relax-1.3.13/relax.py", line 100, in start
>>>>    processor.run()
>>>>  File "/progs/relax-1.3.13/multi/uni_processor.py", line 139, in run
>>>>    self.callback.handle_exception(self, e)
>>>>  File "/progs/relax-1.3.13/multi/processor.py", line 250, in default_handle_exception
>>>>    traceback.print_exc(file=sys.stderr)
>>>>  File "/usr/lib/python2.6/traceback.py", line 227, in print_exc
>>>>    print_exception(etype, value, tb, limit, file)
>>>>  File "/usr/lib/python2.6/traceback.py", line 125, in print_exception
>>>>    print_tb(tb, limit, file)
>>>>  File "/usr/lib/python2.6/traceback.py", line 69, in print_tb
>>>>    line = linecache.getline(filename, lineno, f.f_globals)
>>>>  File "/usr/lib/python2.6/linecache.py", line 14, in getline
>>>>    lines = getlines(filename, module_globals)
>>>>  File "/usr/lib/python2.6/linecache.py", line 40, in getlines
>>>>    return updatecache(filename, module_globals)
>>>>  File "/usr/lib/python2.6/linecache.py", line 136, in updatecache
>>>>    lines = fp.readlines()
>>>> MemoryError
>>>> 9078.655u 666.933s 10:55:29.66 24.7%    0+0k 241482000+0io 6665721pf+0w
>>>>
>>>> _______________________________________________
>>>> relax (http://nmr-relax.com)
>>>>
>>>> This is the relax-users mailing list
>>>> [email protected]
>>>>
>>>> To unsubscribe from this list, get a password
>>>> reminder, or change your subscription options,
>>>> visit the list information page at
>>>> https://mail.gna.org/listinfo/relax-users
>>
>>
>>
>> --
>> Hugh Dannatt
>> PhD Student Researcher
>>
>> Prof. Jon Waltho Lab
>> Department of Molecular Biology & Biotechnology
>> University of Sheffield
>> Firth Court
>> Western Bank
>> Sheffield
>> S10 2TN
>>
>> 0114 222 2729
