Re: [OpenAFS] Tracking down AFS Fileserver corruption

Stephan Wiesand Mon, 28 Nov 2011 11:35:06 -0800

Hi Jack,

no help, just a few dumb questions inline:


On Nov 28, 2011, at 19:13 , Jack Neely wrote:

> Folks,
> 
> I'm deploying new OpenAFS 1.6.0 DAFS file servers on fully updated RHEL
> 6.1 servers and I've stumbled across a data corruption problem.  My ext4
> filesystems on the vice mounts are not getting corrupted, just the AFS
> volume data.
> 
> Our /vicep[ab] mounts are provided by an EMC Clariion SAN array using
> the PowerPath driver.  Each of the two vice mounts has 4 paths and is
> not partitioned.  I've directly formatted the /dev/emcpower[ab] block
> device as ext4.  Of course, the /dev/emcpowerX device is mounted on
> /vicepX.

emcpower{a,b} map to sd{c,e}?

> Every hour our OCS Inventory agent runs; it eventually calls "fdisk -l"
> to get statistics for the storage on the server.  When I was moving test
> volumes to the new server and the agent ran fdisk -l, the kernel would
> print:
> 
>    Nov 28 13:01:39 xxx kernel: sdc: unknown partition table
>    Nov 28 13:01:39 xxx kernel: sde: unknown partition table
>    Nov 28 13:01:49 xxx kernel: sdc: unknown partition table
>    Nov 28 13:01:49 xxx kernel: sde: unknown partition table

If the devices aren't partitioned, why would it ever find a partition table?

This may have changed, but I believe Red Hat used not to support setups with
filesystems on unpartitioned block devices.
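To see what is actually on disk, here's a quick read-only sketch (Python 3, run as root; the function name is mine, and the 512-byte logical sector size is an assumption) that looks for a GPT header ("EFI PART" at the start of the second sector) or an MBR boot signature (0x55AA at bytes 510-511) on a device:

```python
SECTOR = 512  # assumption: 512-byte logical sectors


def partition_table_kind(dev_path: str) -> str:
    """Return 'gpt', 'mbr' or 'none' based on on-disk signatures (read-only)."""
    with open(dev_path, "rb") as f:
        head = f.read(2 * SECTOR)
    # GPT disks also carry a protective MBR, so check for GPT first.
    if len(head) >= 2 * SECTOR and head[SECTOR:SECTOR + 8] == b"EFI PART":
        return "gpt"
    # 0x55AA is only a heuristic: other boot sectors carry it as well.
    if len(head) >= SECTOR and head[510:512] == b"\x55\xaa":
        return "mbr"
    return "none"
```

E.g. partition_table_kind("/dev/sdc"). If that says "none", the kernel reporting "unknown partition table" for the path device is what you'd expect; the interesting question is what makes it rescan the device in the first place.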

> and the volume being moved at that exact time would be corrupt.  Usually
> the server would soon detect this and salvage the volume, but the level
> of corruption has varied.

I don't have experience with running 1.6 servers in production yet, but since
the AFS fileserver runs entirely in userland, it should not cause this kind of
corruption. That said, there's an open BZ regarding ext4 corruption caused by
Ceph userland processes...

> The above messages and corruption only seem to happen when volume moves
> are in progress.  Running fdisk -l on an idle server produces no
> messages.
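To line the kernel messages up against the moves, a small sketch that pulls timestamp and device out of syslog lines in the format quoted above. Syslog lines carry no year, so it is passed in; the function name and regex are my own, and the move start/end times would have to come from whatever drives your `vos move` runs:

```python
import re
from datetime import datetime

# Matches classic syslog lines such as:
#   Nov 28 13:01:39 xxx kernel: sdc: unknown partition table
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"(?P<dev>\S+): unknown partition table$"
)


def partition_rescans(lines, year):
    """Yield (datetime, device) for every 'unknown partition table' line."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
            yield ts, m["dev"]
```

Feeding it /var/log/messages and checking which hits fall inside a move window would at least confirm the correlation on more than a handful of events.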

Any messages if you run bonnie++ or iozone on the filesystem when the agent 
runs?
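If bonnie++ or iozone aren't at hand, a cruder substitute (not equivalent, it only checks data integrity, not performance) is to write known data to a scratch file on the same ext4 filesystem, read it back and compare checksums while the agent runs. A rough sketch; the function name and sizes are arbitrary:

```python
import hashlib
import os


def write_and_verify(path, size_mb=64, chunk=1 << 20):
    """Write random data to `path`, fsync, re-read it, compare SHA-256."""
    h_written = hashlib.sha256()
    with open(path, "wb") as f:
        for _ in range(size_mb):
            buf = os.urandom(chunk)
            h_written.update(buf)
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())
        # Ask the kernel to drop this file's cached pages so the re-read
        # has a chance of coming from the device rather than from RAM.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
    h_read = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h_read.update(block)
    return h_written.digest() == h_read.digest()
```

The fadvise call is only a hint; writing 3 to /proc/sys/vm/drop_caches as root between write and read is the blunter way to make sure the data really comes back from the array.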

> Other things cause the above messages to be re-printed, such as running
> fsck -yf /dev/emcpowera.

Is this safe to do on a mounted ext4 filesystem?
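The e2fsck man page says it is in general not safe to run on mounted filesystems. If whatever runs fsck is a script, a guard along these lines (a sketch; the helper name is mine) could refuse to touch a device that shows up as a mount source in /proc/mounts:

```python
import os


def is_mounted(device, mounts_text=None):
    """Return True if `device` appears as a mount source in /proc/mounts."""
    if mounts_text is None:
        with open("/proc/mounts") as f:
            mounts_text = f.read()
    target = os.path.realpath(device)  # resolves symlinks such as /dev/disk/by-*
    for line in mounts_text.splitlines():
        fields = line.split()
        if fields and os.path.realpath(fields[0]) == target:
            return True
    return False
```

E.g. `if is_mounted("/dev/emcpowera"): sys.exit("refusing to fsck a mounted fs")`.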

>  They occur during the early hours of the
> morning as well from something that appears to be related to a cron job
> I've not tracked down yet.  
> 
> I need some help in figuring out what is causing the corruption and,
> more importantly, how to fix things.

If the AFS fileserver could be run under an account other than root, one could
be completely confident it's not the culprit. As things are, I'm only 99%
confident...

Best regards,
        Stephan
> 
> Thanks,
> Jack Neely
> 
> -- 
> Jack Neely <[email protected]>
> Linux Czar, OIT Campus Linux Services
> Office of Information Technology, NC State University
> GPG Fingerprint: 1917 5AC1 E828 9337 7AA4  EA6B 213B 765F 3B6A 5B89
> _______________________________________________
> OpenAFS-info mailing list
> [email protected]
> https://lists.openafs.org/mailman/listinfo/openafs-info

-- 
Stephan Wiesand
DESY -DV-
Platanenallee 6
15738 Zeuthen, Germany
