We have "job" A, running on userid A, that writes (replaces) file CEA DATA in an SFS filecontrol directory using COPYFILE (REPLACE. Then "job" B, running on userid B, comes along, reads the file, and creates a DB2 table. This has been running successfully for several months.

On Friday, the DB2 table had only 15,000 rows in it, instead of the usual 204,000 rows. (And I had 300+ messages in my email!)

The console log for user B shows 204,000 records in CEA FILE -- from REXX:

stream('CEA FILE fm','c','query info')

Then the job processes the file, but reads (and writes) only 15,000 records.
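(For reference, here is a sketch of the kind of cross-check involved: compare the count the STREAM query reports against what EXECIO actually reads. The file name and mode follow the example above; the comparison logic is illustrative, not our production EXEC.)

/* Sketch: read the whole file into a stem and compare the count   */
/* EXECIO reports against the count from the earlier STREAM query. */
/* File name and mode are illustrative -- adjust to your setup.    */
reported = stream('CEA FILE B', 'c', 'query info')
'EXECIO * DISKR CEA FILE B (STEM REC. FINIS'
If rc <> 0 Then Exit rc
Say 'STREAM query reported:' reported
Say 'EXECIO actually read :' rec.0 'records'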

Normally, userid A is autologged at about 2:00 AM and userid B at 3:30 AM. (Both jobs take about 20 minutes.) On Friday morning, though, both were autologged within the same minute, at 3:30 AM. (We do not yet know why job A was delayed.)

I was under the impression that COPYFILE created a temporary file, and only renamed it to the true filename after the copy was completed. I was also under the impression that once the file is opened, SFS guarantees that the reader won't see a change to the file until the file is closed. So user B should have processed either all 204,000 records of the old file or all 204,000 records of the new file -- but that is not what happened.

The timestamp on the file now shows 3:50 AM, 20 minutes after the two userids were autologged.

What happened?

And how do we prevent it happening again -- some kind of explicit SFS lock? Do our own rename? (Why didn't implicit SFS locks handle this?)
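(On the "do our own rename" idea, here is the kind of thing I mean for job A -- all names are illustrative. Note that plain RENAME won't overwrite an existing file, so the ERASE step still leaves a small window unless the whole sequence runs inside one SFS work unit committed at the end.)

/* Sketch for job A: build the new copy under a temporary name,    */
/* then swap it in, so a reader never opens a half-written file.   */
/* All file names here are illustrative.                           */
'COPYFILE NEW DATA A CEA TEMPDATA B (REPLACE'
If rc <> 0 Then Exit rc
'ERASE CEA DATA B'                 /* RENAME cannot overwrite...   */
'RENAME CEA TEMPDATA B CEA DATA B' /* ...so erase first (window!)  */
Exit rc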

As far as we are concerned, if the new file is not available at 3:30 AM, we just want to abort and send an email. We're a LOT better off with all 204,000 rows from yesterday than with only 15,000 rows from today.
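(Something like the following check at the top of job B is what I have in mind -- a sketch only, assuming LISTFILE's (DATE output layout; the email step is whatever your shop uses, e.g. NOTE or a local EXEC.)

/* Sketch for job B: refuse to run against a stale CEA DATA.       */
/* Assumes LISTFILE (DATE stacks one line per file in the layout:  */
/* fn ft fm fmt lrecl recs blks date time -- verify on your system. */
'MAKEBUF'
'LISTFILE CEA DATA B (DATE STACK'
lrc = rc
fdate = ''
If lrc = 0 Then Parse Pull . . . . . recs . fdate ftime .
'DROPBUF'
/* LISTFILE dates look like mm/dd/yy but may drop a leading zero;  */
/* normalize both sides before comparing in real code.             */
If lrc <> 0 | fdate <> Date('U') Then Do
   Say 'CEA DATA B is stale or missing -- aborting'
   /* send the alert email here, e.g. via NOTE or a shop EXEC      */
   Exit 16
End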

Alan Ackerman
Alan (dot) Ackerman (at) Bank of America (dot) com
