I did a bit more digging to see if I could figure this out. My
assumption is that I am missing a configuration parameter somewhere.
So I tried to find where the MySQL query is being incorrectly
formed, and figured it must be in:

src/plugins/accounting_storage/mysql/accounting_storage_mysql.c

But I can't figure out where this query is put together. There are
bits and pieces of the query being formed, but I can't pin down the
exact line that produces the error.
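
To illustrate what I think is going on, here is a minimal sketch (in
Python for brevity, and my own guess, not code taken from
accounting_storage_mysql.c) of how a query assembled from pieces can end
up with an empty "&& ()" clause when the list of cluster names is empty.
The function name build_cluster_query and its argument are hypothetical.

```python
def build_cluster_query(cluster_names):
    """Hypothetical helper; not taken from accounting_storage_mysql.c."""
    # OR together one condition per cluster name.
    conditions = " || ".join(f"name='{name}'" for name in cluster_names)
    # The fixed prefix and the closing parenthesis are added unconditionally,
    # so an empty list of names still yields "&& ()".
    return ("select name, control_port from cluster_table "
            f"where deleted=0 && ({conditions});")


# With a cluster name the statement is well formed.
print(build_cluster_query(["olympus"]))
# With no cluster name the condition list is empty.
print(build_cluster_query([]))
```

The second call prints the same statement that appears in the slurmdbd
log, which would fit with the "no cluster name" error logged just
before it.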

Like I said, it is really strange: the MySQL tables are being
updated as jobs run and I can query all the past jobs, but the sshare
information is not changing.

Tim

On Thu, Nov 8, 2012 at 2:57 PM, Tim Carlson <[email protected]> wrote:
> Just upgraded SLURM on our cluster from 2.2.7 to 2.4.3 and have now realized
> (a couple of weeks later) that my accounting for jobs is broken.
>
> I use
>
> AccountingStorageType=accounting_storage/slurmdbd
>
> And I see this in the logs when starting up slurmdbd
>
> [2012-11-08T14:43:09] DBD_JOB_COMPLETE: cluster not registered
> [2012-11-08T14:43:09] error: accounting_storage_mysql.c:2612 no cluster name
> [2012-11-08T14:43:10] error: mysql_query failed: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ')' at line 1
> select name, control_port from cluster_table where deleted=0 && ();
> [2012-11-08T14:43:10] error: no result given for  where deleted=0 && ()
> [2012-11-08T14:43:10] error: Processing last message from connection 10(172.16.0.1) uid(500)
> [2012-11-08T14:43:10] error: We should have gotten a new id: Table 'slurm_acct_db.(null)_job_table' doesn't exist
> [2012-11-08T14:43:10] error: It looks like the storage has gone away trying to reconnect
> [2012-11-08T14:43:10] error: We should have gotten a new id: Table 'slurm_acct_db.(null)_job_table' doesn't exist
> [2012-11-08T14:43:10] DBD_JOB_START: cluster not registered
>
> MySQL is the backend database type. It seems like I missed a step when
> upgrading from 2.2.7 to 2.4.3, but I can't figure out what it would be.
>
> sacctmgr seems to think the cluster is registered
>
> # sacctmgr list cluster
>    Cluster     ControlHost  ControlPort RPC     Share GrpJobs GrpNodes GrpSubmit MaxJobs MaxNodes MaxSubmit     MaxWall                  QOS   Def QOS
> ---------- --------------- ------------ --- --------- ------- -------- --------- ------- -------- --------- ----------- -------------------- ---------
>    olympus      172.16.0.1         6817  10         1                                                                                 normal
>
>
> The queue runs just fine and sacct shows me all the jobs that have run, but
> I'm not getting any updates to sshare, which I use for accounting
> purposes with sbank. Any ideas?
>
> Thanks
>
> Tim
