This might have something to do with the __APPLE__ weak imports in 
src/plugins/select/cons_res/select_cons_res.c.

I can't reproduce this with Chaos master HEAD on my OS X 10.6 install.
Unfortunately I don't have anything running 10.5 available to debug this one. :\
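
For reference, here is a minimal sketch of how a NULL list pointer could produce both of the failures quoted below: the assertion in the debug build, and the fault inside pthread_mutex_lock() at a small address in the non-debug build. The struct layout and every demo_* name are hypothetical stand-ins, not the actual definitions in list.c. If the real layout is similar, a 32-bit build would place the mutex 20 bytes (0x14) into the struct, which would be consistent with the reported fault address, but I haven't confirmed that against the source.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a list header; NOT the real struct from list.c. */
struct demo_list {
    void *head;                 /* first node */
    void **tail;                /* address of the last node's next pointer */
    void *iters;                /* active iterators */
    void (*del_fn)(void *);     /* element destructor */
    int count;                  /* number of elements */
    pthread_mutex_t mutex;      /* guards the fields above */
};

/* Byte offset of the mutex. Locking through a NULL list pointer would
 * touch memory at this offset from address zero. */
size_t demo_mutex_offset(void)
{
    return offsetof(struct demo_list, mutex);
}

/* Debug-build style: the assertion fires before the mutex is touched,
 * so a NULL list aborts with "Assertion failed: (l != NULL)". */
int demo_list_count_checked(struct demo_list *l)
{
    int n;

    assert(l != NULL);
    pthread_mutex_lock(&l->mutex);
    n = l->count;
    pthread_mutex_unlock(&l->mutex);
    return n;
}

/* Defensive variant: report -1 for a NULL list instead of crashing. */
int demo_list_count(struct demo_list *l)
{
    int n;

    if (l == NULL)
        return -1;
    pthread_mutex_lock(&l->mutex);
    n = l->count;
    pthread_mutex_unlock(&l->mutex);
    return n;
}
```

If this is what's happening, the thing to chase is why the list handed to list_count() from _create_part_data() is NULL on 10.5 in the first place. A defensive check like the one in demo_list_count() would only hide that.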

-Jon

On May 16, 2011, at 2:57 PM, Tyler Strickland wrote:

> Here's the result of recompiling with --enable-debug:
> 
> cgrc-xs11:~ root# /usr/local/sbin/slurmctld -Dvv
> Assertion failed: (l != NULL), function list_count, file list.c, line 351.
> Abort trap
> 
> And here's the gdb output:
> (gdb) run -Dvv
> Starting program: /usr/local/sbin/slurmctld -Dvv
> Reading symbols for shared libraries ++. done
> Reading symbols for shared libraries . done
> Reading symbols for shared libraries .. done
> Reading symbols for shared libraries . done
> Reading symbols for shared libraries . done
> Reading symbols for shared libraries . done
> Reading symbols for shared libraries . done
> Reading symbols for shared libraries . done
> Reading symbols for shared libraries . done
> Reading symbols for shared libraries . done
> Reading symbols for shared libraries . done
> Assertion failed: (l != NULL), function list_count, file list.c, line 351.
> 
> Program received signal SIGABRT, Aborted.
> 0x94630e42 in __kill ()
> (gdb) bt full
> #0  0x94630e42 in __kill ()
> No symbol table info available.
> #1  0x94630e34 in kill$UNIX2003 ()
> No symbol table info available.
> #2  0x946a323a in raise ()
> No symbol table info available.
> #3  0x946af679 in abort ()
> No symbol table info available.
> #4  0x946a43db in __assert_rtn ()
> No symbol table info available.
> #5  0x00087abd in list_count ()
> No symbol table info available.
> #6  0x003b5ade in _create_part_data ()
> No symbol table info available.
> #7  0x003b8dd9 in select_p_node_init ()
> No symbol table info available.
> #8  0x000a9796 in select_g_node_init ()
> No symbol table info available.
> #9  0x00059153 in read_slurm_conf ()
> No symbol table info available.
> #10 0x0000a3ec in main ()
> No symbol table info available.
> 
> Tyler
> 
> On 05/16/2011 11:43 AM, Auble, Danny wrote:
>> Could you configure with the --with-debug option and recompile?  In any 
>> case, this appears to be a wild goose chase.  Could you also try to compile 
>> against the latest trunk in the git repo on github?  It has other places 
>> fixed in headers to make sure we don't miss one in the future.
>> 
>> Danny
>> 
>>> -----Original Message-----
>>> From: [email protected] 
>>> [mailto:[email protected]] On Behalf Of Tyler
>>> Strickland
>>> Sent: Friday, May 13, 2011 12:03 PM
>>> To: [email protected]
>>> Subject: Re: [slurm-dev] slurmctld not starting on OSX 10.5
>>> 
>>> Here's the full gdb output.  What might cause slurm to not be able to
>>> access the memory?
>>> 
>>> (gdb) run -Dvv
>>> Starting program: /usr/local/sbin/slurmctld -Dvv
>>> Reading symbols for shared libraries ++. done
>>> Reading symbols for shared libraries . done
>>> Reading symbols for shared libraries .. done
>>> Reading symbols for shared libraries . done
>>> Reading symbols for shared libraries . done
>>> Reading symbols for shared libraries . done
>>> Reading symbols for shared libraries . done
>>> Reading symbols for shared libraries . done
>>> Reading symbols for shared libraries . done
>>> Reading symbols for shared libraries . done
>>> Reading symbols for shared libraries . done
>>> 
>>> Program received signal EXC_BAD_ACCESS, Could not access memory.
>>> Reason: KERN_PROTECTION_FAILURE at address: 0x00000014
>>> 0x945cab7e in pthread_mutex_lock ()
>>> (gdb) bt full
>>> #0  0x945cab7e in pthread_mutex_lock ()
>>> No symbol table info available.
>>> #1  0x00079eda in list_count ()
>>> No symbol table info available.
>>> #2  0x00337e0e in _create_part_data ()
>>> No symbol table info available.
>>> #3  0x0033b109 in select_p_node_init ()
>>> No symbol table info available.
>>> #4  0x00096ee9 in select_g_node_init ()
>>> No symbol table info available.
>>> #5  0x000504e3 in read_slurm_conf ()
>>> No symbol table info available.
>>> #6  0x0000a768 in main ()
>>> No symbol table info available.
>>> (gdb)
>>> 
>>> 
>>> On 05/13/2011 02:36 PM, Auble, Danny wrote:
>>>> Could you run it in gdb and get the backtrace?
>>>> 
>>>> gdb slurmctld
>>>> (gdb) run -Dvv
>>>> ...crash...
>>>> (gdb) bt full
>>>> 
>>>> 
>>>> That might give us something.
>>>> 
>>>> Danny
>>>> 
>>>>> -----Original Message-----
>>>>> From: [email protected] 
>>>>> [mailto:[email protected]] On Behalf Of Tyler
>>>>> Strickland
>>>>> Sent: Friday, May 13, 2011 11:33 AM
>>>>> To: [email protected]
>>>>> Subject: Re: [slurm-dev] slurmctld not starting on OSX 10.5
>>>>> 
>>>>> At the risk (OK, guarantee) of showing my ignorance, how might I go
>>>>> about doing that?  One of the past list posts said to run 'ulimit -c
>>>>> unlimited' followed by slurmctld -D, after which the core dump would be
>>>>> placed in the current directory (/tmp).  Unfortunately, nothing is to be
>>>>> found in the folder after the crash.
>>>>> 
>>>>> Thanks,
>>>>> Tyler
>>>>> 
>>>>> 
>>>>> 
>>>>> On 05/13/2011 02:14 PM, Jette, Moe wrote:
>>>>>> If you can get a core file on SIGBUS and generate a backtrace, that may 
>>>>>> help.
>>>>>> ________________________________________
>>>>>> From: [email protected] [[email protected]] On 
>>>>>> Behalf Of Tyler Strickland [[email protected]]
>>>>>> Sent: Friday, May 13, 2011 10:42 AM
>>>>>> To: [email protected]
>>>>>> Subject: [slurm-dev] slurmctld not starting on OSX 10.5
>>>>>> 
>>>>>> All,
>>>>>> 
>>>>>> After the fun with getting SLURM compiled last night, I've finally
>>>>>> succeeded at getting it installed.  slurmd starts up fine but slurmctld
>>>>>> doesn't - and there are no errors indicating why. When I try to run it
>>>>>> with -D the words "Bus Error" are printed and the log appears much
>>>>>> like the one below.
>>>>>> 
>>>>>> The logfile for "slurmd -cvvvvvvvvv"
>>>>>> 
>>>>>> Thanks,
>>>>>> Tyler
>>>>>> 
>>>>>> [2011-05-13T13:39:29] pidfile not locked, assuming no running daemon
>>>>>> [2011-05-13T13:39:29] debug:  sched: slurmctld starting
>>>>>> [2011-05-13T13:39:29] debug3: Trying to load plugin
>>>>>> /usr/local/lib/slurm/accounting_storage_none.so
>>>>>> [2011-05-13T13:39:29] Accounting storage NOT INVOKED plugin loaded
>>>>>> [2011-05-13T13:39:29] debug3: Success.
>>>>>> [2011-05-13T13:39:29] debug3: not enforcing associations and no list was
>>>>>> given so we are giving a blank list
>>>>>> [2011-05-13T13:39:29] debug2: No Assoc usage file
>>>>>> (/var/lib/slurm/slurmctld/assoc_usage) to recover
>>>>>> [2011-05-13T13:39:29] slurmctld version 2.2.5 started on cluster cluster
>>>>>> [2011-05-13T13:39:29] debug3: Trying to load plugin
>>>>>> /usr/local/lib/slurm/crypto_munge.so
>>>>>> [2011-05-13T13:39:29] Munge cryptographic signature plugin loaded
>>>>>> [2011-05-13T13:39:29] debug3: Success.
>>>>>> [2011-05-13T13:39:29] debug3: Trying to load plugin
>>>>>> /usr/local/lib/slurm/select_cons_res.so
>>>>>> [2011-05-13T13:39:29] debug3: Success.
>>>>>> [2011-05-13T13:39:29] debug3: Trying to load plugin
>>>>>> /usr/local/lib/slurm/preempt_none.so
>>>>>> [2011-05-13T13:39:29] preempt/none loaded
>>>>>> [2011-05-13T13:39:29] debug3: Success.
>>>>>> [2011-05-13T13:39:29] debug3: Trying to load plugin
>>>>>> /usr/local/lib/slurm/checkpoint_none.so
>>>>>> [2011-05-13T13:39:29] debug3: Success.
>>>>>> [2011-05-13T13:39:29] Checkpoint plugin loaded: checkpoint/none
>>>>>> [2011-05-13T13:39:29] debug3: Trying to load plugin
>>>>>> /usr/local/lib/slurm/jobacct_gather_none.so
>>>>>> [2011-05-13T13:39:29] Job accounting gather NOT_INVOKED plugin loaded
>>>>>> [2011-05-13T13:39:29] debug3: Success.
>>>>>> [2011-05-13T13:39:29] debug:  No backup controller to shutdown
>>>>>> [2011-05-13T13:39:29] debug3: Trying to load plugin
>>>>>> /usr/local/lib/slurm/switch_none.so
>>>>>> [2011-05-13T13:39:29] switch NONE plugin loaded
>>>>>> [2011-05-13T13:39:29] debug3: Success.
>>>>>> [2011-05-13T13:39:29] debug3: Trying to load plugin
>>>>>> /usr/local/lib/slurm/topology_none.so
>>>>>> [2011-05-13T13:39:29] topology NONE plugin loaded
>>>>>> [2011-05-13T13:39:29] debug3: Success.
>>>>>> [2011-05-13T13:39:29] debug:  No DownNodes
>>>>>> [2011-05-13T13:39:29] debug3: Trying to load plugin
>>>>>> /usr/local/lib/slurm/jobcomp_none.so
>>>>>> [2011-05-13T13:39:29] debug3: Success.
>>>>>> [2011-05-13T13:39:29] debug3: Trying to load plugin
>>>>>> /usr/local/lib/slurm/sched_backfill.so
>>>>>> [2011-05-13T13:39:29] sched: Backfill scheduler plugin loaded
>>>>>> [2011-05-13T13:39:29] debug3: Success.
>>>>>> [2011-05-13T13:39:29] debug:  No job state file
>>>>>> (/var/lib/slurm/slurmctld/job_state) to recover
>>>>>> [2011-05-13T13:39:29] cons_res: select_p_node_init
>>>>>> 
>>>> 
>>>> 
>> 
>> 
> 


· · · · — · · — — —
Jon O. Bringhurst
High Performance Computing Systems - http://lanl.gov

Email: [email protected]  | Office: +1 505 667 9337 | Blog: http://bringhurst.org
Schedule: B

