[EMAIL PROTECTED] (Robert Creager) writes:
> When grilled further on (Wed, 7 Jan 2004 18:06:08 -0500),
> Andrew Sullivan <[EMAIL PROTECTED]> confessed:
>> We have lately had a couple of cases where machines either locked
>> up, slowed down to the point of complete unusability, or died
>> completely while using jfs.  We are _not_ sure that jfs is in fact
>> the culprit.  In one case, a kernel panic appeared to be referring
>> to the jfs kernel module, but I can't be sure as I lost the output
>> immediately thereafter.  Yesterday, we had a problem of data
>> corruption on a failed jfs volume.
>> None of this is to say that jfs is in fact to blame, nor even that,
>> if it is, it does not have something to do with the age of our
>> installations, &c. (these are all RH 8).  In fact, I suspect
>> hardware in both cases.  But I thought I'd mention it just in case
>> other people are seeing strange behaviour, on the principle of
>> "better safe than sorry."
> Interestingly enough, I'm using JFS on a new scsi disk with Mandrake
> 9.1 and was having similar problems.  I was generating heavy disk
> usage through database and astronomical data reductions.  My machine
> (dual AMD) would suddenly hang.  No new jobs would run, just
> increase the load, until I reboot the machine.
> I solved my problems by creating a 128Mb ram disk (using EXT2) for
> the temp data produced by my reduction runs.
> I believe JFS was to blame, not hardware, but you never know...


The factors that came together when we saw this happen "consistently"
were:

 1.  Heavy DB updates taking place on JFS filesystems;

 2.  SMP (we suspected Xeon hyperthreading as a possible factor, but
     shut it off and still saw the same problem...)

 3.  The third factor, which appeared to be a catalyst, was copying,
     via scp, a file > 2GB in size onto the system.

The third piece was particularly interesting: the file would get
copied over successfully, and the scp process would then hang (to the
point of "kill -9" being unable to touch it) immediately thereafter.

At that point, processes on the system that were accessing files on
the hung-up filesystem were locked up too, and likewise unkillable by
"kill -9."
That's certainly consistent with JFS being at the root of the problem,
whether it was the cause or not...
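
For anyone trying to confirm they are seeing the same symptom: on
Linux, a process that ignores "kill -9" is normally sitting in
uninterruptible sleep, which shows up as state "D" in /proc/<pid>/stat
(and in the STAT column of ps).  Below is a minimal sketch for listing
such processes.  The function names are made up for illustration, it
assumes a Linux-style /proc, and it is not something we ran at the
time.  State D is often transient, so what matters is a process that
still shows up on repeated runs.

```python
import os


def parse_stat_line(line):
    """Split a /proc/<pid>/stat line into (pid, command, state)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so find it via the first "(" and last ")".
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def find_uninterruptible(proc_root="/proc"):
    """Return (pid, command) for every process currently in state D."""
    stuck = []
    if not os.path.isdir(proc_root):
        return stuck  # no /proc here (not Linux, or not mounted)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited, or the stat file was unreadable
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in find_uninterruptible():
        print(pid, comm)
```

If the scp process and the database backends touching the affected
filesystem all keep appearing in that list, that matches what we saw.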
let name="cbbrowne" and tld="libertyrms.info" in String.concat "@" [name;tld];;
Christopher Browne
(416) 646 3304 x124 (land)
