Re: 3.8-rc5 xfs corruption

2013-01-31 Thread CAI Qian


- Original Message -
> From: "Dave Chinner" 
> To: "CAI Qian" 
> Cc: x...@oss.sgi.com, linux-...@vger.kernel.org, "linux-kernel" 
> 
> Sent: Thursday, January 31, 2013 12:07:48 PM
> Subject: Re: 3.8-rc5 xfs corruption
> 
> On Wed, Jan 30, 2013 at 10:16:47PM -0500, CAI Qian wrote:
> > Hello,
> > 
> > (Sorry to post to xfs mailing lists but unsure about which one is the
> > best for this.)
> 
> Trimmed to just x...@oss.sgi.com.
Thanks for the quick response, Dave.
> 
> > I have seen something like this once during testing on a system with a
> > EMC VNX FC/multipath back-end.
> 
> This is a trace from the verifier code that was added in 3.8-rc1 so
> I doubt it has anything to do with any problem you've seen in the
> past.
> 
> Can you tell us what workload you were running and what hardware you
> are using as per:
> 
> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
This was the system:
- AMD Opteron(tm) Processor 4130 (1 socket, 4 cores)
- PowerEdge R415
- 8G memory
- mptsas local disks

Software version:
- xfsprogs-3.1.10

The workload ran fs_mark, syscall tests, some NFS/CIFS connectathon
tests, memory and libhugetlbfs tests, and some dynamic debug
(Documentation/dynamic-debug-howto.txt) tests.
> 
> As it is, if you mounted the filesystem after this problem was
> detected, log recovery probably propagated it to disk. I'd suggest
> that you run xfs_repair -n on the device and post the output so we
> can see if any corruption has actually made it to disk. If no
> corruption made it to disk, it's possible that we've got the
> incorrect verifier attached to the buffer.
The system has been taken away from me, so I can only reserve it again
later if needed.

Regards,
CAI Qian
> 
> > [ 3025.063024] 8801a0d5: 2e 2e 2f 2e 2e 2f 75 73 72 2f 6c 69 62 2f 6d 6f  ../../usr/lib/mo
> 
> The start of a block contains a path and the only
> type of block that can contain this format of metadata is a remote
> symlink block. Remote symlink blocks don't have a verifier attached
> to them as there is nothing that can currently be used to verify
> them as correct.
> 
> I can't see exactly how this can occur as stale buffers have the
> verifier ops cleared before being returned to the new user, and
> newly allocated xfs_bufs are zeroed before being initialised. I
> really need to know what you are doing to be able to get to the
> bottom of it.
> 
> Cheers,
> 
> Dave.
> --
> Dave Chinner
> da...@fromorbit.com
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.8-rc5 xfs corruption

2013-01-30 Thread Dave Chinner
On Wed, Jan 30, 2013 at 10:16:47PM -0500, CAI Qian wrote:
> Hello,
> 
> (Sorry to post to xfs mailing lists but unsure about which one is the
> best for this.)

Trimmed to just x...@oss.sgi.com.

> I have seen something like this once during testing on a system with a
> EMC VNX FC/multipath back-end.

This is a trace from the verifier code that was added in 3.8-rc1 so
I doubt it has anything to do with any problem you've seen in the
past.

Can you tell us what workload you were running and what hardware you
are using as per:

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

As it is, if you mounted the filesystem after this problem was
detected, log recovery probably propagated it to disk. I'd suggest
that you run xfs_repair -n on the device and post the output so we
can see if any corruption has actually made it to disk. If no
corruption made it to disk, it's possible that we've got the
incorrect verifier attached to the buffer.

> [ 3025.063024] 8801a0d5: 2e 2e 2f 2e 2e 2f 75 73 72 2f 6c 69 62 2f 6d 6f  ../../usr/lib/mo

The start of a block contains a path and the only
type of block that can contain this format of metadata is a remote
symlink block. Remote symlink blocks don't have a verifier attached
to them as there is nothing that can currently be used to verify
them as correct.
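
For anyone following along, the hexdump line quoted above can be checked by
hand. The following is only an illustrative sketch in Python, not kernel or
xfsprogs code; the helper name decode_hexdump is made up for this example. It
decodes the sixteen bytes from the trace and shows that they spell the start
of a relative path, which is what points at symlink data:

```python
# Bytes copied from the hexdump line in the trace quoted above.
dump = "2e 2e 2f 2e 2e 2f 75 73 72 2f 6c 69 62 2f 6d 6f"


def decode_hexdump(hex_bytes: str) -> str:
    """Turn space-separated hex bytes into text.

    Hypothetical helper for illustration only. Non-printable bytes are
    shown as '.', the same convention the kernel hexdump output uses.
    """
    raw = bytes.fromhex(hex_bytes)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


print(decode_hexdump(dump))  # prints ../../usr/lib/mo
```

The decoded text matches the ASCII column at the end of the trace line.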

I can't see exactly how this can occur as stale buffers have the
verifier ops cleared before being returned to the new user, and
newly allocated xfs_bufs are zeroed before being initialised. I
really need to know what you are doing to be able to get to the
bottom of it.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com