This might have something to do with the __APPLE__ weak imports in src/plugins/select/cons_res/select_cons_res.c.
Chaos master HEAD doesn't seem to get this on my OS X 10.6 install. Unfortunately I don't have anything running 10.5 available to debug this one. :\

-Jon

On May 16, 2011, at 2:57 PM, Tyler Strickland wrote:

> Here's the result of recompiling with --enable-debug:
>
> cgrc-xs11:~ root# /usr/local/sbin/slurmctld -Dvv
> Assertion failed: (l != NULL), function list_count, file list.c, line 351.
> Abort trap
>
> And here's the gdb output:
> (gdb) run -Dvv
> Starting program: /usr/local/sbin/slurmctld -Dvv
> Reading symbols for shared libraries ++. done
> Reading symbols for shared libraries . done
> Reading symbols for shared libraries .. done
> Reading symbols for shared libraries . done
> Reading symbols for shared libraries . done
> Reading symbols for shared libraries . done
> Reading symbols for shared libraries . done
> Reading symbols for shared libraries . done
> Reading symbols for shared libraries . done
> Reading symbols for shared libraries . done
> Reading symbols for shared libraries . done
> Assertion failed: (l != NULL), function list_count, file list.c, line 351.
>
> Program received signal SIGABRT, Aborted.
> 0x94630e42 in __kill ()
> (gdb) bt full
> #0 0x94630e42 in __kill ()
> No symbol table info available.
> #1 0x94630e34 in kill$UNIX2003 ()
> No symbol table info available.
> #2 0x946a323a in raise ()
> No symbol table info available.
> #3 0x946af679 in abort ()
> No symbol table info available.
> #4 0x946a43db in __assert_rtn ()
> No symbol table info available.
> #5 0x00087abd in list_count ()
> No symbol table info available.
> #6 0x003b5ade in _create_part_data ()
> No symbol table info available.
> #7 0x003b8dd9 in select_p_node_init ()
> No symbol table info available.
> #8 0x000a9796 in select_g_node_init ()
> No symbol table info available.
> #9 0x00059153 in read_slurm_conf ()
> No symbol table info available.
> #10 0x0000a3ec in main ()
> No symbol table info available.
>
> Tyler
>
> On 05/16/2011 11:43 AM, Auble, Danny wrote:
>> Could you configure with the --with-debug option and recompile? In any
>> case, this appears to be a wild goose chase. Could you also try to compile
>> against the latest trunk in the git repo on github? It has other places
>> fixed in headers to make sure we don't miss one in the future.
>>
>> Danny
>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Tyler
>>> Strickland
>>> Sent: Friday, May 13, 2011 12:03 PM
>>> To: [email protected]
>>> Subject: Re: [slurm-dev] slurmctld not starting on OSX 10.5
>>>
>>> Here's the full gdb output. What might cause slurm to not be able to
>>> access the memory?
>>>
>>> (gdb) run -Dvv
>>> Starting program: /usr/local/sbin/slurmctld -Dvv
>>> Reading symbols for shared libraries ++. done
>>> Reading symbols for shared libraries . done
>>> Reading symbols for shared libraries .. done
>>> Reading symbols for shared libraries . done
>>> Reading symbols for shared libraries . done
>>> Reading symbols for shared libraries . done
>>> Reading symbols for shared libraries . done
>>> Reading symbols for shared libraries . done
>>> Reading symbols for shared libraries . done
>>> Reading symbols for shared libraries . done
>>> Reading symbols for shared libraries . done
>>>
>>> Program received signal EXC_BAD_ACCESS, Could not access memory.
>>> Reason: KERN_PROTECTION_FAILURE at address: 0x00000014
>>> 0x945cab7e in pthread_mutex_lock ()
>>> (gdb) bt full
>>> #0 0x945cab7e in pthread_mutex_lock ()
>>> No symbol table info available.
>>> #1 0x00079eda in list_count ()
>>> No symbol table info available.
>>> #2 0x00337e0e in _create_part_data ()
>>> No symbol table info available.
>>> #3 0x0033b109 in select_p_node_init ()
>>> No symbol table info available.
>>> #4 0x00096ee9 in select_g_node_init ()
>>> No symbol table info available.
>>> #5 0x000504e3 in read_slurm_conf ()
>>> No symbol table info available.
>>> #6 0x0000a768 in main ()
>>> No symbol table info available.
>>> (gdb)
>>>
>>>
>>> On 05/13/2011 02:36 PM, Auble, Danny wrote:
>>>> Could you run it in gdb and get the backtrace?
>>>>
>>>> gdb slurmctld
>>>> (gdb) run -Dvv
>>>> ...crash...
>>>> (gdb) bt full
>>>>
>>>>
>>>> That might give us something.
>>>>
>>>> Danny
>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected]
>>>>> [mailto:[email protected]] On Behalf Of Tyler
>>>>> Strickland
>>>>> Sent: Friday, May 13, 2011 11:33 AM
>>>>> To: [email protected]
>>>>> Subject: Re: [slurm-dev] slurmctld not starting on OSX 10.5
>>>>>
>>>>> At the risk (OK, guarantee) of showing my ignorance, how might I go
>>>>> about doing that? One of the past list posts said to run
>>>>> 'ulimit -c unlimited' followed by slurmctld -D, after which the core dump
>>>>> would be placed in the current directory (/tmp). Unfortunately, nothing
>>>>> is to be found in the folder after the crash.
>>>>>
>>>>> Thanks,
>>>>> Tyler
>>>>>
>>>>>
>>>>>
>>>>> On 05/13/2011 02:14 PM, Jette, Moe wrote:
>>>>>> If you can get a core file on SIGBUS and generate a backtrace, that may
>>>>>> help.
>>>>>> ________________________________________
>>>>>> From: [email protected] [[email protected]] On
>>>>>> Behalf Of Tyler Strickland [[email protected]]
>>>>>> Sent: Friday, May 13, 2011 10:42 AM
>>>>>> To: [email protected]
>>>>>> Subject: [slurm-dev] slurmctld not starting on OSX 10.5
>>>>>>
>>>>>> All,
>>>>>>
>>>>>> After the fun with getting SLURM compiled last night, I've finally
>>>>>> succeeded at getting it installed. slurmd starts up fine but slurmctld
>>>>>> doesn't - and there are no errors indicating why. When I try to run it
>>>>>> with -D the words "Bus Error" are printed and the log looks much
>>>>>> like the one below.
>>>>>>
>>>>>> The logfile for "slurmd -cvvvvvvvvv"
>>>>>>
>>>>>> Thanks,
>>>>>> Tyler
>>>>>>
>>>>>> [2011-05-13T13:39:29] pidfile not locked, assuming no running daemon
>>>>>> [2011-05-13T13:39:29] debug: sched: slurmctld starting
>>>>>> [2011-05-13T13:39:29] debug3: Trying to load plugin /usr/local/lib/slurm/accounting_storage_none.so
>>>>>> [2011-05-13T13:39:29] Accounting storage NOT INVOKED plugin loaded
>>>>>> [2011-05-13T13:39:29] debug3: Success.
>>>>>> [2011-05-13T13:39:29] debug3: not enforcing associations and no list was given so we are giving a blank list
>>>>>> [2011-05-13T13:39:29] debug2: No Assoc usage file (/var/lib/slurm/slurmctld/assoc_usage) to recover
>>>>>> [2011-05-13T13:39:29] slurmctld version 2.2.5 started on cluster cluster
>>>>>> [2011-05-13T13:39:29] debug3: Trying to load plugin /usr/local/lib/slurm/crypto_munge.so
>>>>>> [2011-05-13T13:39:29] Munge cryptographic signature plugin loaded
>>>>>> [2011-05-13T13:39:29] debug3: Success.
>>>>>> [2011-05-13T13:39:29] debug3: Trying to load plugin /usr/local/lib/slurm/select_cons_res.so
>>>>>> [2011-05-13T13:39:29] debug3: Success.
>>>>>> [2011-05-13T13:39:29] debug3: Trying to load plugin /usr/local/lib/slurm/preempt_none.so
>>>>>> [2011-05-13T13:39:29] preempt/none loaded
>>>>>> [2011-05-13T13:39:29] debug3: Success.
>>>>>> [2011-05-13T13:39:29] debug3: Trying to load plugin /usr/local/lib/slurm/checkpoint_none.so
>>>>>> [2011-05-13T13:39:29] debug3: Success.
>>>>>> [2011-05-13T13:39:29] Checkpoint plugin loaded: checkpoint/none
>>>>>> [2011-05-13T13:39:29] debug3: Trying to load plugin /usr/local/lib/slurm/jobacct_gather_none.so
>>>>>> [2011-05-13T13:39:29] Job accounting gather NOT_INVOKED plugin loaded
>>>>>> [2011-05-13T13:39:29] debug3: Success.
>>>>>> [2011-05-13T13:39:29] debug: No backup controller to shutdown
>>>>>> [2011-05-13T13:39:29] debug3: Trying to load plugin /usr/local/lib/slurm/switch_none.so
>>>>>> [2011-05-13T13:39:29] switch NONE plugin loaded
>>>>>> [2011-05-13T13:39:29] debug3: Success.
>>>>>> [2011-05-13T13:39:29] debug3: Trying to load plugin /usr/local/lib/slurm/topology_none.so
>>>>>> [2011-05-13T13:39:29] topology NONE plugin loaded
>>>>>> [2011-05-13T13:39:29] debug3: Success.
>>>>>> [2011-05-13T13:39:29] debug: No DownNodes
>>>>>> [2011-05-13T13:39:29] debug3: Trying to load plugin /usr/local/lib/slurm/jobcomp_none.so
>>>>>> [2011-05-13T13:39:29] debug3: Success.
>>>>>> [2011-05-13T13:39:29] debug3: Trying to load plugin /usr/local/lib/slurm/sched_backfill.so
>>>>>> [2011-05-13T13:39:29] sched: Backfill scheduler plugin loaded
>>>>>> [2011-05-13T13:39:29] debug3: Success.
>>>>>> [2011-05-13T13:39:29] debug: No job state file (/var/lib/slurm/slurmctld/job_state) to recover
>>>>>> [2011-05-13T13:39:29] cons_res: select_p_node_init

· · · · — · · — — —
Jon O. Bringhurst
High Performance Computing Systems - http://lanl.gov
Email: [email protected] | Office: +1 505 667 9337 | Blog: http://bringhurst.org
Schedule: B
