On Tuesday, 11/22/2005 at 10:48 CST, Alan Ackerman 
<[EMAIL PROTECTED]> wrote:
> We have "job" A, running on userid A, that writes (replaces) file CEA
> DATA in a filecontrol directory using COPYFILE (REPLACE. Then "job" B,
> running on userid B, comes along and reads the file and creates a DB2
> table. This has been running successfully for several months.

Are both files in the same filepool?  Or are you copying from minidisk 
to SFS?

> On Friday, the DB2 table had only 15,000 rows in it, instead of the
> usual 204,000 rows. (And I had 300+ messages in my email!)

Ah.  Like reading this list after being gone for 48 hours.  :-)

> Console log for user B shows 204,000 records in CEA FILE -- using REXX
> 
> stream('CEA FILE fm','c','query info')
> 
> Then it processes the file, and only reads (and writes) 15,000 records.
> 
> Normally, userid A is autologged at about 2:00 AM and userid B at
> 3:30 AM. (Both jobs take about 20 minutes.) On Friday morning, though,
> both were autologged within the same minute, at 3:30 AM. (We do not
> yet know why job A was delayed.)
> 
> I was under the impression that COPYFILE created a temporary file, and
> only renamed it to the true filename after the copy was completed. I
> was also under the impression that once the file is opened, SFS
> guarantees the reader won't see a change to the file until the file is
> closed. So user B should have either processed all 204,000 records of
> the old file, or all 204,000 records of the new file -- but that is
> not what happened.
> 
> The timestamp on the file now shows 3:50 AM, 20 minutes after the two
> userids were autologged.
> 
> What happened?

You tell us.  Why did your program stop reading after 15,000 records?

By the way, it is very likely that the stream() call above creates a 
time-of-check-to-time-of-use window that can cause problems.  It could 
show 204,000 records, but by the time the program actually opens the 
file for reading, the file may have changed.  Just open the file and 
read to end-of-file, or open the file, then do the 'query info', then 
read.
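A minimal REXX sketch of that pattern (untested here; the file name is 
from the example above, and filemode A is an assumption):

```rexx
/* Open first, then read to end-of-file from the SAME open         */
/* instance, so the data cannot change underneath us mid-read.     */
file = 'CEA FILE A'                 /* filemode A is an assumption */
call stream file, 'C', 'OPEN READ'  /* pin the current committed copy */
count = 0
do while lines(file) > 0
   rec = linein(file)
   /* ... process rec here ... */
   count = count + 1
end
call stream file, 'C', 'CLOSE'
say count 'records processed'
```

Any 'query info' count taken before the OPEN only describes the file 
as it was at query time, not the copy you end up reading.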

> And how do we prevent it happening again -- some kind of explicit SFS
> lock? Do our own rename? (Why didn't implicit SFS locks handle this?)

For filecontrol directories, yes, SFS hides all changes made to an 
open file until the changes are committed or the file is closed and 
committed.  The implicit lock is for writing, not reading.  Without an 
explicit exclusive lock, Job B can open the file for read even if Job A 
has the file open for write.  But unless Job A commits changes along the 
way, Job B will not see them.
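If you do want serialization, the CMS CREATE LOCK command can take an 
explicit lock on an SFS file.  A hedged sketch for Job B (verify the 
exact operands against the CMS Commands reference; filemode A is an 
assumption):

```rexx
/* Job B: refuse to read while anyone holds the file for update.   */
/* EXCLUSIVE blocks other opens; SESSION releases the lock when    */
/* the work unit ends.                                             */
'CREATE LOCK CEA FILE A (EXCLUSIVE SESSION'
if rc <> 0 then do
   say 'CEA FILE is in use (rc =' rc'); aborting this run'
   exit rc
end
/* ... open, read, and process the file here ... */
'DELETE LOCK CEA FILE A'
```

With Job A holding a corresponding lock while it runs COPYFILE, the 
two jobs cannot overlap even if they are autologged in the same minute.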

Is CEA FILE modified in any way prior to the COPYFILE (REPLACE?  COPYFILE 
will commit the output file prior to starting the copy operation to ensure 
it doesn't accidentally append to an already-open file.

> As far as we are concerned, if the new file is not available at 3:30
> AM, we just want to abort and send an email. We're a LOT better off
> with all 204,000 rows from yesterday than with only 15,000 rows from
> today.

So how does Job B figure out if "the new file is not available"?  That 
is, what happens if Job B is, for example, autologged twice in a row 
without Job A having been run?
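One common answer is a handshake: Job A writes a small flag file only 
after its COPYFILE completes, and Job B aborts unless the flag is from 
today.  A sketch in REXX (CEADONE FLAG A is a hypothetical file name, 
not something from your setup):

```rexx
/* Job A, only after COPYFILE ... (REPLACE completes with rc 0:    */
'ERASE CEADONE FLAG A'              /* rc 28 if absent; harmless   */
call lineout 'CEADONE FLAG A', date('S')   /* yyyymmdd stamp       */
call lineout 'CEADONE FLAG A'              /* close the flag file  */

/* Job B, before reading CEA FILE:                                 */
parse value linein('CEADONE FLAG A') with stampdate .
call stream 'CEADONE FLAG A', 'C', 'CLOSE'
if stampdate \= date('S') then do
   say 'CEA FILE was not refreshed today; aborting, sending email'
   exit 16
end
```

That gives you exactly the behavior you describe: a stale file means 
an abort and an email rather than a half-read load.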

I feel like there is something important missing from your description. 
But if you can reproduce the problem, feel free to call the Support Center 
and they can help you get server traces for them to look at.

Alan Altmark
z/VM Development
IBM Endicott
