I did a bit more digging to see if I could figure this out. My assumption is that I am missing a configuration parameter somewhere. So I was trying to find where the MySQL query is being incorrectly formed, and figured it must be in

src/plugins/accounting_storage/mysql/accounting_storage_mysql.c

but I can't figure out where this query would be getting put together. There are bits and pieces of the query being formed, but I can't pin down the exact line with the error.

Like I said, it is really strange, because the MySQL tables are being updated as jobs run and I can query all the past jobs, but the sshare information is not changing.

Tim

On Thu, Nov 8, 2012 at 2:57 PM, Tim Carlson wrote:
> Just upgraded SLURM on our cluster from 2.2.7 to 2.4.3 and now realized
> (a couple of weeks later) that my accounting for jobs is broken.
>
> I use
>
> AccountingStorageType=accounting_storage/slurmdbd
>
> And I see this in the logs when starting up slurmdbd
>
> [2012-11-08T14:43:09] DBD_JOB_COMPLETE: cluster not registered
> [2012-11-08T14:43:09] error: accounting_storage_mysql.c:2612 no cluster name
> [2012-11-08T14:43:10] error: mysql_query failed: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ')' at line 1
> select name, control_port from cluster_table where deleted=0 && ();
> [2012-11-08T14:43:10] error: no result given for where deleted=0 && ()
> [2012-11-08T14:43:10] error: Processing last message from connection 10(172.16.0.1) uid(500)
> [2012-11-08T14:43:10] error: We should have gotten a new id: Table 'slurm_acct_db.(null)_job_table' doesn't exist
> [2012-11-08T14:43:10] error: It looks like the storage has gone away trying to reconnect
> [2012-11-08T14:43:10] error: We should have gotten a new id: Table 'slurm_acct_db.(null)_job_table' doesn't exist
> [2012-11-08T14:43:10] DBD_JOB_START: cluster not registered
>
> MySQL is the backend database type. It seems like I missed a step when
> upgrading from 2.2.7 to 2.4.3, but I can't figure out what it would be.
>
> sacctmgr seems to think the cluster is registered
>
> # sacctmgr list cluster
>    Cluster     ControlHost  ControlPort RPC     Share GrpJobs GrpNodes GrpSubmit MaxJobs MaxNodes MaxSubmit     MaxWall                  QOS   Def QOS
> ---------- --------------- ------------ --- --------- ------- -------- --------- ------- -------- --------- ----------- -------------------- ---------
>    olympus      172.16.0.1         6817  10         1                                                                                 normal
>
> The queue runs just fine and sacct shows me all the jobs that have run, but
> I'm not getting any updates to sshare, which I use for accounting
> purposes with sbank. Any ideas?
>
> Thanks
>
> Tim
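
For what it's worth, both bad strings in the log look like what you would get if the cluster name reaching slurmdbd were empty. Below is a minimal Python sketch of that guess. It is not the code in accounting_storage_mysql.c: the function names (build_cluster_query, job_table_name), the " || " joiner, and the name='...' quoting are all made up for illustration. The only parts taken from the log are the two output strings, and the "(null)" text is what glibc's printf emits for a NULL %s argument.

```python
# Hypothetical sketch only: this is NOT SLURM's code. It just shows how an
# empty cluster name could yield the two malformed strings seen in the log.

def build_cluster_query(cluster_names):
    """Join one condition per cluster name inside a parenthesised group.

    The joiner and quoting are assumptions made for this illustration.
    """
    conditions = " || ".join(f"name='{name}'" for name in cluster_names)
    return (
        "select name, control_port from cluster_table "
        f"where deleted=0 && ({conditions});"
    )


def job_table_name(cluster_name):
    """Prefix the per-cluster job table with the cluster name.

    A missing name is rendered as "(null)", mimicking glibc's printf("%s", NULL).
    """
    prefix = cluster_name if cluster_name is not None else "(null)"
    return f"{prefix}_job_table"


if __name__ == "__main__":
    # With a cluster name, both strings are well formed.
    print(build_cluster_query(["olympus"]))
    print(job_table_name("olympus"))
    # Without one, they match the strings in the slurmdbd log.
    print(build_cluster_query([]))
    print(job_table_name(None))
```

If that guess is right, the thing to chase is why the messages arriving at slurmdbd carry no cluster name, rather than the query-building code itself.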
