I wouldn't expect job state to survive any pre-release transition. The job state format probably changed between pre4 and rc1, and the log message you sent points to that. The previous bug would have resulted in a different error.
[email protected] wrote:

Hi Danny,

Danny Auble <[email protected]> writes:

> Thanks for finding this Carles. As you can see from our last email we
> tagged a 2.4.1 and removed 2.4.0 from download so others wouldn't
> experience the same fate as you. In addition to your patch I added
> some logic to figure out we really were running 2.4, so that those
> already running 2.4.* don't lose job state. Sorry I misspelled your
> name on the earlier email as well, I know you aren't Charles :).

Either the fix is not working properly, or we managed to hit another
similar bug. All jobs were lost when upgrading from 2.4.0-pre4 to 2.4.1:

Relevant parts of slurmctld.log:

[2012-07-04T11:22:20] slurmctld version 2.4.1 started on cluster triolith
...
...
[2012-07-04T11:22:21] Recovered state of 240 nodes
[2012-07-04T11:22:21] error: Incomplete job record
[2012-07-04T11:22:21] error: Incomplete job data checkpoint file
[2012-07-04T11:22:21] Recovered information about 0 jobs
[2012-07-04T11:22:21] cons_res: select_p_node_init
[2012-07-04T11:22:21] cons_res: preparing for 1 partitions
[2012-07-04T11:22:21] Purging files for defunct batch job 1634
[2012-07-04T11:22:21] Purging files for defunct batch job 1956
[2012-07-04T11:22:21] Purging files for defunct batch job 1702
[2012-07-04T11:22:21] Purging files for defunct batch job 1759
[2012-07-04T11:22:21] Purging files for defunct batch job 1676
...

Regards,
/Pär
