Re: g_vfs_done error third part--PLEASE HELP!

2008-05-17 Thread Willy Offermans
Hello Roland and FreeBSD friends,

On Fri, May 16, 2008 at 09:07:18PM +0200, Roland Smith wrote:
 On Fri, May 16, 2008 at 02:14:14PM +0200, Willy Offermans wrote:
 
  Filesystem  1K-blocks Used Avail Capacity  Mounted on
  /dev/ar0s1a  20308398   230438  18453290 1%/
  devfs   11 0   100%/dev
  /dev/ar0s1d  21321454  3814482  1580125619%/usr
  /dev/ar0s1e  50777034  5331686  4138318611%/var
  /dev/ar0s1f 101554150 18813760  7461605820%/home
  /dev/ar0s1g 274977824 34564876 21841472414%/share
  
  pretty normal I would say.
 
 Yes.
 
   Did you notice any file corruption in the filesystem on ar0s1g?
  
  No, the two disks are brand new and I did not encounter any noticeable
  file corruption. However, I assume that nowadays bad sectors on a hard
  disk are handled by the hardware and do not need any user interaction
  to correct them. But maybe I'm totally wrong.
 
 Every ATA disk has spare sectors, and they usually don't report bad
 blocks until the spares are exhausted, in which case it is prudent to
 replace the disk.
 
   Unmount the filesystem and run fsck(8) on it. Does it report any errors?
  
  sun# fsck /dev/ar0s1g 
  ** /dev/ar0s1g
  ** Last Mounted on /share
  ** Phase 1 - Check Blocks and Sizes
  INCORRECT BLOCK COUNT I=34788357 (272 should be 264)
  CORRECT? [yn] y
  
  INCORRECT BLOCK COUNT I=34789217 (296 should be 288)
  CORRECT? [yn] y
  
  ** Phase 2 - Check Pathnames
  ** Phase 3 - Check Connectivity
  ** Phase 4 - Check Reference Counts
  ** Phase 5 - Check Cyl groups
  FREE BLK COUNT(S) WRONG IN SUPERBLK
  SALVAGE? [yn] y
  
  SUMMARY INFORMATION BAD
  SALVAGE? [yn] y
  
  BLK(S) MISSING IN BIT MAPS
  SALVAGE? [yn] y
  
  182863 files, 17282440 used, 120206472 free (12448 frags, 15024253
  blocks, 0.0% fragmentation)
  
  * FILE SYSTEM MARKED CLEAN *
  
  * FILE SYSTEM WAS MODIFIED *
  
  The usual stuff I would say.
 
 Disk corruption is never normal. It can be explained if the machine
 crashed or was power-cycled before the disks were unmounted, but it can
 also indicate hardware trouble.
 
Any hints are very much appreciated.
 
  So I have to conclude that the write error message does make sense and
  that something seems to be wrong with the disks. The next question is
  what can I do about it? Should I return the disks to the shop and ask
  for new ones?
 
 Install sysutils/smartmontools, and run 'smartctl -A /dev/adX|less', where X
 are the numbers of the drives in the RAID array.
 
 In the output, look at the values for Reallocated_Sector_Ct,
 Current_Pending_Sector, Offline_Uncorrectable, which is the last number
 that you see on each line.
 
 A small number for Reallocated_Sector_Ct is allowable. But non-zero
 counts for Current_Pending_Sector or Offline_Uncorrectable mean it's
 time to get a new disk.
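For readers scripting this check, the three attributes can be pulled out of `smartctl -A` output programmatically. A minimal sketch; the sample lines below are illustrative, not taken from a real drive:

```python
# Sample lines in the format produced by `smartctl -A` (illustrative values).
sample_output = """\
  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always  -  0
197 Current_Pending_Sector  0x0012   253   253   000    Old_age   Always  -  0
198 Offline_Uncorrectable   0x0010   253   253   000    Old_age   Offline -  0
"""

WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector",
           "Offline_Uncorrectable"}

def raw_values(text):
    """Map each watched attribute name to its RAW_VALUE (the last column)."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] in WATCHED:
            result[fields[1]] = int(fields[-1])
    return result

vals = raw_values(sample_output)
# Per the advice above: non-zero Current_Pending_Sector or
# Offline_Uncorrectable means the disk should be replaced.
disk_suspect = vals["Current_Pending_Sector"] > 0 or vals["Offline_Uncorrectable"] > 0
print(vals, disk_suspect)
```

In practice one would feed this the output of `smartctl -A /dev/ad4` and `/dev/ad6`.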

sun# atacontrol status ar0
ar0: ATA RAID1 status: READY
 subdisks:
   0 ad4  ONLINE
   1 ad6  ONLINE

So ad4 and ad6 are the HDs of the array.

sun# smartctl -A /dev/ad6 
smartctl version 5.38 [i386-portbld-freebsd7.0] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail Always  -           3
  3 Spin_Up_Time            0x0007   100   100   015    Pre-fail Always  -           7232
  4 Start_Stop_Count        0x0032   100   100   000    Old_age  Always  -           31
  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail Always  -           0
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail Always  -           0
  8 Seek_Time_Performance   0x0025   253   253   015    Pre-fail Offline -           0
  9 Power_On_Hours          0x0032   100   100   000    Old_age  Always  -           1478
 10 Spin_Retry_Count        0x0033   253   253   051    Pre-fail Always  -           0
 11 Calibration_Retry_Count 0x0012   253   253   000    Old_age  Always  -           0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age  Always  -           31
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age  Always  -           439070649
187 Reported_Uncorrect      0x0032   253   253   000    Old_age  Always  -           0
188 Unknown_Attribute       0x0032   253   253   000    Old_age  Always  -           0
190 Airflow_Temperature_Cel 0x0022   062   060   000    Old_age  Always  -           38
194 Temperature_Celsius     0x0022   124   115   000    Old_age  Always  -           38
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age  Always  -           439070649
196 Reallocated_Event_Count 0x0032   253   253   000    Old_age  Always  -           0
197 Current_Pending_Sector  0x0012   253   253   000

Re: g_vfs_done error third part--PLEASE HELP!

2008-05-17 Thread Jeremy Chadwick
On Sat, May 17, 2008 at 09:52:23AM +0200, Willy Offermans wrote:
 sun# atacontrol status ar0
 ar0: ATA RAID1 status: READY
  subdisks:
0 ad4  ONLINE
1 ad6  ONLINE

What ataraid(4) method are you using?  Promise FastTrak?  Adaptec
HostRAID?  Intel MatrixRAID?  Please let us know, as there are some
known long-standing bugs with ataraid(4) that could (no guarantee)
explain what's going on.

 So ad4 and ad6 are the HDs of the array.
 
 sun# smartctl -A /dev/ad6 

This output omits the brand/model of the hard disks you have.  Can you
please tell us what they are?  Different hard disk manufacturers do
different things with SMART statistics.

Your SMART statistics look okay, but depending upon what drive model and
manufacturer is being used, they could be indicative of a problem.

-- 
| Jeremy Chadwickjdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: g_vfs_done error third part--PLEASE HELP!

2008-05-17 Thread Willy Offermans
Hello Jeremy and FreeBSD friends,

On Sat, May 17, 2008 at 03:16:27AM -0700, Jeremy Chadwick wrote:
 On Sat, May 17, 2008 at 09:52:23AM +0200, Willy Offermans wrote:
  sun# atacontrol status ar0
  ar0: ATA RAID1 status: READY
   subdisks:
 0 ad4  ONLINE
 1 ad6  ONLINE
 
 What ataraid(4) method are you using?  Promise FastTrak?  Adaptec
 HostRAID?  Intel MatrixRAID?  Please let us know, as there are some
 known long-standing bugs with ataraid(4) that could (no guarantee)
 explain what's going on.
 
  So ad4 and ad6 are the HDs of the array.
  
  sun# smartctl -A /dev/ad6 
 
 This output omits the brand/model of the hard disks you have.  Can you
 please tell us what they are?  Different hard disk manufacturers do
 different things with SMART statistics.
 
 Your SMART statistics look okay, but depending upon what drive model and
 manufacturer is being used, they could be indicative of a problem.
 
 -- 

From /var/run/dmesg.boot:

ar0: 476837MB Promise Fasttrak RAID1 status: READY

ad4: 476940MB SAMSUNG HD501LJ CR100-12 at ata2-master SATA150
ad6: 476940MB SAMSUNG HD501LJ CR100-12 at ata3-master SATA150

I hope this is the information you are asking for.

-- 
Met vriendelijke groeten,
With kind regards,
Mit freundlichen Gruessen,
De jrus wah,

Willy

*
W.K. Offermans
Home:   +31 45 544 49 44
Mobile: +31 653 27 16 23
e-mail: [EMAIL PROTECTED]

   Powered by 

(__)
 \\\'',)
   \/  \ ^
   .\._/_)

   www.FreeBSD.org


Re: g_vfs_done error third part--PLEASE HELP!

2008-05-16 Thread Jeremy Chadwick
On Fri, May 16, 2008 at 02:14:14PM +0200, Willy Offermans wrote:
 On Mon, Apr 21, 2008 at 10:10:47PM +0200, Roland Smith wrote:
  Did you notice any file corruption in the filesystem on ar0s1g?
 
 No, the two disks are brand new and I did not encounter any noticeable
 file corruption. However, I assume that nowadays bad sectors on a hard
 disk are handled by the hardware and do not need any user interaction
 to correct them. But maybe I'm totally wrong.

You're right, but it depends on the type of disk.  SCSI disks will
report bad blocks to the OS regardless of whether the block is about to
be marked as a grown defect or not.  PATA and SATA disks, on the other
hand, will report bad blocks to the OS only if the internal bad-block
list (which the drive manages itself -- this is what you're thinking of)
is full.

There are still many conditions where PATA and SATA disks can induce
errors in the OS.  If the disk is attempting to work around a bad block
and there is a physical error (a servo problem, a head crash, repeated
re-reads of the block due to dust, whatever -- something that ties up
the disk for long periods of time), you may see ATA subsystem timeouts,
DMA errors, or other failures.  SMART stats will show this kind of
problem.

  Unmount the filesystem and run fsck(8) on it. Does it report any errors?
 
 sun# fsck /dev/ar0s1g 
 ** /dev/ar0s1g
 ** Last Mounted on /share
 ** Phase 1 - Check Blocks and Sizes
 INCORRECT BLOCK COUNT I=34788357 (272 should be 264)
 CORRECT? [yn] y
 
 INCORRECT BLOCK COUNT I=34789217 (296 should be 288)
 CORRECT? [yn] y
 
 ** Phase 2 - Check Pathnames
 ** Phase 3 - Check Connectivity
 ** Phase 4 - Check Reference Counts
 ** Phase 5 - Check Cyl groups
 FREE BLK COUNT(S) WRONG IN SUPERBLK
 SALVAGE? [yn] y
 
 SUMMARY INFORMATION BAD
 SALVAGE? [yn] y
 
 BLK(S) MISSING IN BIT MAPS
 SALVAGE? [yn] y
 
 182863 files, 17282440 used, 120206472 free (12448 frags, 15024253
 blocks, 0.0% fragmentation)
 
 * FILE SYSTEM MARKED CLEAN *
 
 * FILE SYSTEM WAS MODIFIED *
 
 The usual stuff I would say.

How is this usual?  It appears to me you did have some filesystem
corruption.

-- 
| Jeremy Chadwickjdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: g_vfs_done error third part--PLEASE HELP!

2008-05-16 Thread Kris Kennaway

Willy Offermans wrote:

Hello Roland and FreeBSD friends,

I'm sorry to have been so quiet for a while, but I went away on
vacation. Now that I'm back, I would like to solve this issue.


On Mon, Apr 21, 2008 at 10:10:47PM +0200, Roland Smith wrote:

On Mon, Apr 21, 2008 at 09:04:03PM +0200, Willy Offermans wrote:

Dear FreeBSD friends,

It is already the third time that I report this error. Can someone help
me in solving this issue?

Probably the reason that you hear so little is that you provide so
little information. Most of us are not clairvoyant.
 

Over and over again, and always after heavy disk I/O, I see the
following errors in the log files. If I force ar0s1g to unmount, the
machine spontaneously reboots. Nothing serious seems to be damaged by
this, but I cannot afford anything bad happening to this production
machine.

Why would you force an unmount?


Otherwise the device keeps reporting that it is unavailable and cannot
be unmounted:

sun# umount /share/
umount: unmount of /share failed: Resource temporarily unavailable


Apr 18 20:02:19 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725068800, 
length=4096)]error = 5

I have no clue what the errors mean, since offsets of 290725068800,
290725072896, and 290725074944 seem to be ridiculous. Does anybody 
have a clue what is going on?

For starters, how big is ar0s1g? If the offset is in bytes, it is around
270 GB, which is not that unusual in this day and age.


I have to admit that I was a bit confused by an offset value of
290725068800. There is no indication of a unit, so I assumed that it
was sectors, but probably it is simply bytes, and then the number does
indeed make sense.

I'm using FreeBSD 7.0, but found the error being reported before with
previous versions of FreeBSD. I can and will provide more details on
demand.

What does 'df' say?


Filesystem  1K-blocks Used Avail Capacity  Mounted on
/dev/ar0s1a  20308398   230438  18453290 1%/
devfs   11 0   100%/dev
/dev/ar0s1d  21321454  3814482  1580125619%/usr
/dev/ar0s1e  50777034  5331686  4138318611%/var
/dev/ar0s1f 101554150 18813760  7461605820%/home
/dev/ar0s1g 274977824 34564876 21841472414%/share

pretty normal I would say.


Did you notice any file corruption in the filesystem on ar0s1g?


No, the two disks are brand new and I did not encounter any noticeable
file corruption. However, I assume that nowadays bad sectors on a hard
disk are handled by the hardware and do not need any user interaction
to correct them. But maybe I'm totally wrong.


Unmount the filesystem and run fsck(8) on it. Does it report any errors?


sun# fsck /dev/ar0s1g 
** /dev/ar0s1g

** Last Mounted on /share
** Phase 1 - Check Blocks and Sizes
INCORRECT BLOCK COUNT I=34788357 (272 should be 264)
CORRECT? [yn] y

INCORRECT BLOCK COUNT I=34789217 (296 should be 288)
CORRECT? [yn] y

** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? [yn] y

SUMMARY INFORMATION BAD
SALVAGE? [yn] y

BLK(S) MISSING IN BIT MAPS
SALVAGE? [yn] y

182863 files, 17282440 used, 120206472 free (12448 frags, 15024253
blocks, 0.0% fragmentation)

* FILE SYSTEM MARKED CLEAN *

* FILE SYSTEM WAS MODIFIED *

The usual stuff I would say.


No, any form of filesystem corruption is not usual.




Any hints are very much appreciated.

Did you manage to create a partition larger than the disk is (using
newfs's -s switch)? In that case it could be that you're trying to write
past the end of the device.


No, look at the following output:

sun# bsdlabel -A /dev/ar0s1
# /dev/ar0s1:
type: unknown
disk: amnesiac
label: 
flags:

bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 60799
sectors/unit: 976751937
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0   # milliseconds
track-to-track seek: 0  # milliseconds
drivedata: 0 


8 partitions:
#        size    offset    fstype   [fsize bsize bps/cpg]
  a:  41943040         0    4.2BSD        0     0     0
  b:   8388608  41943040      swap
  c: 976751937         0    unused        0     0        # raw part, don't edit
  d:  44040192  50331648    4.2BSD     2048 16384 28552
  e: 104857600  94371840    4.2BSD     2048 16384 28552
  f: 209715200 199229440    4.2BSD     2048 16384 28552
  g: 567807297 408944640    4.2BSD     2048 16384 28552


/dev/ar0s1g starts after 408944640*512/1024/1024=199680MB
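As a quick sanity check on the units (a sketch; the figures are taken from the label and log lines above):

```python
# Partition g starts at sector 408944640; the label reports 512-byte
# sectors, so its byte offset in MB is:
start_mb = 408944640 * 512 // (1024 * 1024)
print(start_mb)  # matches the 199680 MB figure above

# The g_vfs_done() offset is in bytes; expressed in GiB it is roughly
# the "around 270 GB" mentioned earlier in the thread.
offset_gib = 290725068800 / 2**30
print(round(offset_gib, 2))
```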


So I have to conclude that the write error message does make sense and
that something seems to be wrong with the disks. The next question is
what can I do about it? Should I return the disks to the shop and ask
for new ones?


#define EIO 5   /* Input/output error */

At least one of your disks is toast.
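For reference, the `error = 5` in the g_vfs_done() lines is errno 5 from sys/errno.h; a quick way to confirm the mapping on a typical Unix system:

```python
import errno
import os

# errno 5 is EIO, "Input/output error", on FreeBSD and other Unix systems.
assert errno.EIO == 5
print(os.strerror(errno.EIO))
```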

Kris


Re: g_vfs_done error third part--PLEASE HELP!

2008-05-16 Thread Willy Offermans
Hello Jeremy and FreeBSD friends,

On Fri, May 16, 2008 at 05:27:59AM -0700, Jeremy Chadwick wrote:
 On Fri, May 16, 2008 at 02:14:14PM +0200, Willy Offermans wrote:
  On Mon, Apr 21, 2008 at 10:10:47PM +0200, Roland Smith wrote:
   Did you notice any file corruption in the filesystem on ar0s1g?
  
  No, the two disks are brand new and I did not encounter any noticeable
  file corruption. However, I assume that nowadays bad sectors on a hard
  disk are handled by the hardware and do not need any user interaction
  to correct them. But maybe I'm totally wrong.
 
 You're right, but it depends on the type of disk.  SCSI disks will
 report bad blocks to the OS regardless of whether the block is about to
 be marked as a grown defect or not.  PATA and SATA disks, on the other
 hand, will report bad blocks to the OS only if the internal bad-block
 list (which the drive manages itself -- this is what you're thinking
 of) is full.
 
 There are still many conditions where PATA and SATA disks can induce
 errors in the OS.  If the disk is attempting to work around a bad block
 and there is a physical error (a servo problem, a head crash, repeated
 re-reads of the block due to dust, whatever -- something that ties up
 the disk for long periods of time), you may see ATA subsystem timeouts,
 DMA errors, or other failures.  SMART stats will show this kind of
 problem.
 
   Unmount the filesystem and run fsck(8) on it. Does it report any errors?
  
  sun# fsck /dev/ar0s1g 
  ** /dev/ar0s1g
  ** Last Mounted on /share
  ** Phase 1 - Check Blocks and Sizes
  INCORRECT BLOCK COUNT I=34788357 (272 should be 264)
  CORRECT? [yn] y
  
  INCORRECT BLOCK COUNT I=34789217 (296 should be 288)
  CORRECT? [yn] y
  
  ** Phase 2 - Check Pathnames
  ** Phase 3 - Check Connectivity
  ** Phase 4 - Check Reference Counts
  ** Phase 5 - Check Cyl groups
  FREE BLK COUNT(S) WRONG IN SUPERBLK
  SALVAGE? [yn] y
  
  SUMMARY INFORMATION BAD
  SALVAGE? [yn] y
  
  BLK(S) MISSING IN BIT MAPS
  SALVAGE? [yn] y
  
  182863 files, 17282440 used, 120206472 free (12448 frags, 15024253
  blocks, 0.0% fragmentation)
  
  * FILE SYSTEM MARKED CLEAN *
  
  * FILE SYSTEM WAS MODIFIED *
  
  The usual stuff I would say.
 
 How is this usual?  It appears to me you did have some filesystem
 corruption.
 

What kind of filesystem corruption, and how do I solve it?

I see these messages frequently if a FreeBSD machine unexpectedly
reboots, not only on this system but also on others. I never worried
about it.


-- 
Met vriendelijke groeten,
With kind regards,
Mit freundlichen Gruessen,
De jrus wah,

Willy

*
W.K. Offermans
Home:   +31 45 544 49 44
Mobile: +31 653 27 16 23
e-mail: [EMAIL PROTECTED]

   Powered by 

(__)
 \\\'',)
   \/  \ ^
   .\._/_)

   www.FreeBSD.org


Re: g_vfs_done error third part--PLEASE HELP!

2008-05-16 Thread Willy Offermans
Hello Kris,

On Fri, May 16, 2008 at 02:43:24PM +0200, Kris Kennaway wrote:
 Willy Offermans wrote:
 Hello Roland and FreeBSD friends,
 
 I'm sorry to have been so quiet for a while, but I went away on
 vacation. Now that I'm back, I would like to solve this issue.
 
 
 On Mon, Apr 21, 2008 at 10:10:47PM +0200, Roland Smith wrote:
 On Mon, Apr 21, 2008 at 09:04:03PM +0200, Willy Offermans wrote:
 Dear FreeBSD friends,
 
 It is already the third time that I report this error. Can someone help
 me in solving this issue?
 Probably the reason that you hear so little is that you provide so
 little information. Most of us are not clairvoyant.
  
 Over and over again, and always after heavy disk I/O, I see the
 following errors in the log files. If I force ar0s1g to unmount, the
 machine spontaneously reboots. Nothing serious seems to be damaged by
 this, but I cannot afford anything bad happening to this production
 machine.
 Why would you force an unmount?
 
 Otherwise the device keeps reporting that it is unavailable and cannot
 be unmounted:
 
 sun# umount /share/
 umount: unmount of /share failed: Resource temporarily unavailable
 
 Apr 18 20:02:19 sun kernel: 
 g_vfs_done():ar0s1g[WRITE(offset=290725068800, length=4096)]error = 5
 
 I have no clue what the errors mean, since offsets of 290725068800,
 290725072896, and 290725074944 seem to be ridiculous. Does anybody 
 have a clue what is going on?
 For starters, how big is ar0s1g? If the offset is in bytes, it is around
 270 GB, which is not that unusual in this day and age.
 
 I have to admit that I was a bit confused by an offset value of
 290725068800. There is no indication of a unit, so I assumed that it
 was sectors, but probably it is simply bytes, and then the number does
 indeed make sense.
 I'm using FreeBSD 7.0, but found the error being reported before with
 previous versions of FreeBSD. I can and will provide more details on
 demand.
 What does 'df' say?
 
 Filesystem  1K-blocks Used Avail Capacity  Mounted on
 /dev/ar0s1a  20308398   230438  18453290 1%/
 devfs   11 0   100%/dev
 /dev/ar0s1d  21321454  3814482  1580125619%/usr
 /dev/ar0s1e  50777034  5331686  4138318611%/var
 /dev/ar0s1f 101554150 18813760  7461605820%/home
 /dev/ar0s1g 274977824 34564876 21841472414%/share
 
 pretty normal I would say.
 
 Did you notice any file corruption in the filesystem on ar0s1g?
 
 No, the two disks are brand new and I did not encounter any noticeable
 file corruption. However, I assume that nowadays bad sectors on a hard
 disk are handled by the hardware and do not need any user interaction
 to correct them. But maybe I'm totally wrong.
 
 Unmount the filesystem and run fsck(8) on it. Does it report any errors?
 
 sun# fsck /dev/ar0s1g 
 ** /dev/ar0s1g
 ** Last Mounted on /share
 ** Phase 1 - Check Blocks and Sizes
 INCORRECT BLOCK COUNT I=34788357 (272 should be 264)
 CORRECT? [yn] y
 
 INCORRECT BLOCK COUNT I=34789217 (296 should be 288)
 CORRECT? [yn] y
 
 ** Phase 2 - Check Pathnames
 ** Phase 3 - Check Connectivity
 ** Phase 4 - Check Reference Counts
 ** Phase 5 - Check Cyl groups
 FREE BLK COUNT(S) WRONG IN SUPERBLK
 SALVAGE? [yn] y
 
 SUMMARY INFORMATION BAD
 SALVAGE? [yn] y
 
 BLK(S) MISSING IN BIT MAPS
 SALVAGE? [yn] y
 
 182863 files, 17282440 used, 120206472 free (12448 frags, 15024253
 blocks, 0.0% fragmentation)
 
 * FILE SYSTEM MARKED CLEAN *
 
 * FILE SYSTEM WAS MODIFIED *
 
 The usual stuff I would say.
 
 No, any form of filesystem corruption is not usual.
 
 
 Any hints are very much appreciated.
 Did you manage to create a partition larger than the disk is (using
 newfs's -s switch)? In that case it could be that you're trying to write
 past the end of the device.
 
 No, look at the following output:
 
 sun# bsdlabel -A /dev/ar0s1
 # /dev/ar0s1:
 type: unknown
 disk: amnesiac
 label: 
 flags:
 bytes/sector: 512
 sectors/track: 63
 tracks/cylinder: 255
 sectors/cylinder: 16065
 cylinders: 60799
 sectors/unit: 976751937
 rpm: 3600
 interleave: 1
 trackskew: 0
 cylinderskew: 0
 headswitch: 0   # milliseconds
 track-to-track seek: 0  # milliseconds
 drivedata: 0 
 
 8 partitions:
 #        size    offset    fstype   [fsize bsize bps/cpg]
   a:  41943040         0    4.2BSD        0     0     0
   b:   8388608  41943040      swap
   c: 976751937         0    unused        0     0        # raw part, don't edit
   d:  44040192  50331648    4.2BSD     2048 16384 28552
   e: 104857600  94371840    4.2BSD     2048 16384 28552
   f: 209715200 199229440    4.2BSD     2048 16384 28552
   g: 567807297 408944640    4.2BSD     2048 16384 28552
 
 /dev/ar0s1g starts after 408944640*512/1024/1024=199680MB
 
 
 So I have to conclude that the write error message does make sense and
 that something seems to be wrong with the disks. The next question is
 what can I do about it? Should I return the disks to the shop and ask
 for new ones?
 
 #define 

Re: g_vfs_done error third part--PLEASE HELP!

2008-05-16 Thread Jeremy Chadwick
On Fri, May 16, 2008 at 05:37:56PM +0200, Willy Offermans wrote:
   sun# fsck /dev/ar0s1g 
   ** /dev/ar0s1g
   ** Last Mounted on /share
   ** Phase 1 - Check Blocks and Sizes
   INCORRECT BLOCK COUNT I=34788357 (272 should be 264)
   CORRECT? [yn] y
   
   INCORRECT BLOCK COUNT I=34789217 (296 should be 288)
   CORRECT? [yn] y
   
   ** Phase 2 - Check Pathnames
   ** Phase 3 - Check Connectivity
   ** Phase 4 - Check Reference Counts
   ** Phase 5 - Check Cyl groups
   FREE BLK COUNT(S) WRONG IN SUPERBLK
   SALVAGE? [yn] y
   
   SUMMARY INFORMATION BAD
   SALVAGE? [yn] y
   
   BLK(S) MISSING IN BIT MAPS
   SALVAGE? [yn] y
   
   182863 files, 17282440 used, 120206472 free (12448 frags, 15024253
   blocks, 0.0% fragmentation)
   
   * FILE SYSTEM MARKED CLEAN *
   
   * FILE SYSTEM WAS MODIFIED *
   
   The usual stuff I would say.
  
  How is this usual?  It appears to me you did have some filesystem
  corruption.
 
 What kind of filesystem corruption and how to solve that?

That's difficult to answer, for a lot of reasons.

Your original post stated that you were seeing g_vfs_done errors on the
console, and you were worried about what they implied.  Then someone
asked, "have you fsck'd the filesystem?", and you hadn't.  Then you did
fsck it, and as can be seen above, the filesystem had errors.

When combined with your comment below, it's very difficult to figure out
what's going on with your system over there, or what information you're
not disclosing.

Additionally, kris@ has stated that it looks like you may have a hard
disk that's gone bad, and that's a strong possibility as well.  SMART
statistics of the drives in your RAID array would be useful.

 I see these messages frequently if a FreeBSD machine unexpectedly
 reboots, not only on this system but also on others. I never worried
 about it.

Are you saying the above errors experienced were caused by an unexpected
crash or reboot?  If so, the filesystem should have been automatically
fsck'd shortly (60-120 seconds) after getting a login: prompt on the
console.

Is your filesystem UFS2 with softupdates enabled?  If so, and the
automatic fsck didn't happen, then that's something separate to look
into -- it should happen automatically with softupdates enabled.

More importantly, though, would be the explanation for why your system
is crashing/rebooting/power-cycling.  Data corruption can happen in
those situations, especially the latter, but any form of non-clean
shutdown should induce a fsck on UFS2+softupdate filesystems.

-- 
| Jeremy Chadwickjdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: g_vfs_done error third part--PLEASE HELP!

2008-05-16 Thread Roland Smith
On Fri, May 16, 2008 at 02:14:14PM +0200, Willy Offermans wrote:

 Filesystem  1K-blocks Used Avail Capacity  Mounted on
 /dev/ar0s1a  20308398   230438  18453290 1%/
 devfs   11 0   100%/dev
 /dev/ar0s1d  21321454  3814482  1580125619%/usr
 /dev/ar0s1e  50777034  5331686  4138318611%/var
 /dev/ar0s1f 101554150 18813760  7461605820%/home
 /dev/ar0s1g 274977824 34564876 21841472414%/share
 
 pretty normal I would say.

Yes.

  Did you notice any file corruption in the filesystem on ar0s1g?
 
 No, the two disks are brand new and I did not encounter any noticeable
 file corruption. However, I assume that nowadays bad sectors on a hard
 disk are handled by the hardware and do not need any user interaction
 to correct them. But maybe I'm totally wrong.

Every ATA disk has spare sectors, and they usually don't report bad
blocks until the spares are exhausted, in which case it is prudent to
replace the disk.

  Unmount the filesystem and run fsck(8) on it. Does it report any errors?
 
 sun# fsck /dev/ar0s1g 
 ** /dev/ar0s1g
 ** Last Mounted on /share
 ** Phase 1 - Check Blocks and Sizes
 INCORRECT BLOCK COUNT I=34788357 (272 should be 264)
 CORRECT? [yn] y
 
 INCORRECT BLOCK COUNT I=34789217 (296 should be 288)
 CORRECT? [yn] y
 
 ** Phase 2 - Check Pathnames
 ** Phase 3 - Check Connectivity
 ** Phase 4 - Check Reference Counts
 ** Phase 5 - Check Cyl groups
 FREE BLK COUNT(S) WRONG IN SUPERBLK
 SALVAGE? [yn] y
 
 SUMMARY INFORMATION BAD
 SALVAGE? [yn] y
 
 BLK(S) MISSING IN BIT MAPS
 SALVAGE? [yn] y
 
 182863 files, 17282440 used, 120206472 free (12448 frags, 15024253
 blocks, 0.0% fragmentation)
 
 * FILE SYSTEM MARKED CLEAN *
 
 * FILE SYSTEM WAS MODIFIED *
 
 The usual stuff I would say.

Disk corruption is never normal. It can be explained if the machine
crashed or was power-cycled before the disks were unmounted, but it can
also indicate hardware trouble.

   Any hints are very much appreciated.

 So I have to conclude that the write error message does make sense and
 that something seems to be wrong with the disks. The next question is
 what can I do about it? Should I return the disks to the shop and ask
 for new ones?

Install sysutils/smartmontools, and run 'smartctl -A /dev/adX|less', where X
are the numbers of the drives in the RAID array.

In the output, look at the values for Reallocated_Sector_Ct,
Current_Pending_Sector, Offline_Uncorrectable, which is the last number
that you see on each line.

A small number for Reallocated_Sector_Ct is allowable. But non-zero
counts for Current_Pending_Sector or Offline_Uncorrectable mean it's
time to get a new disk.

 However, other people that I have contacted and who had a similar
 problem before have solved it by using a software RAID setup instead of
 a hardware RAID setup. This seems to indicate that there is some bug in
 the FreeBSD code.

The RAID support that you find on most desktop motherboards _is_
software RAID. See ataraid(4).

Roland
-- 
R.F.Smith   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)


pgpNcSsEdIcRL.pgp
Description: PGP signature


Re: g_vfs_done error third part--PLEASE HELP!

2008-04-25 Thread Jeremy Chadwick
On Fri, Apr 25, 2008 at 07:59:36AM +0300, Toomas Aas wrote:
 Willy Offermans wrote:

 It is already the third time that I report this error. Can someone help
 me in solving this issue?

 Apr 21 19:44:36 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, 
 length=2048)]error = 5
 Apr 21 19:45:07 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, 
 length=2048)]error = 5
 Apr 21 19:45:38 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, 
 length=2048)]error = 5
 ...

 I can only tell you that I had similar problems with FreeBSD 6.3 and ICH7R 
 based RAID. Since I couldn't figure out how to solve them, I discarded the 
 BIOS-based RAID and instead set up gmirror. It's been running this way for 
 a year now and been rock solid.

Are you referring to Intel MatrixRAID?  If so, there are multiple PRs
open on problems with FreeBSD and MatrixRAID, some of which have been
open for over 2 years and which include patches.  You wouldn't be the
first person to ask why they haven't been committed to the tree.

-- 
| Jeremy Chadwickjdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: g_vfs_done error third part--PLEASE HELP!

2008-04-25 Thread Toomas Aas

Jeremy Chadwick wrote:


On Fri, Apr 25, 2008 at 07:59:36AM +0300, Toomas Aas wrote:

Willy Offermans wrote:


Apr 21 19:44:36 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, 
length=2048)]error = 5
Apr 21 19:45:07 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, 
length=2048)]error = 5
Apr 21 19:45:38 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, 
length=2048)]error = 5
...
I can only tell you that I had similar problems with FreeBSD 6.3 and ICH7R 
based RAID. Since I couldn't figure out how to solve them, I discarded the 
BIOS-based RAID and instead set up gmirror. It's been running this way for 
a year now and been rock solid.


Are you referring to Intel MatrixRAID?  


Yes.


If so, there are multiple PRs open on problems with FreeBSD and
MatrixRAID, some of which have been open for over 2 years and which
include patches.


Funny that I didn't find them when I was investigating the problem. Not 
that I'm doubting your word, just... funny.



You wouldn't be the first
person to ask why they haven't been committed to the tree.


Well, unfortunately I am not competent to comment on that, nor am I in
a position to *demand* that something be committed in a volunteer
project, since I couldn't even imagine what the consequences would be :)
At least I found a workaround.


--
Toomas Aas



Re: g_vfs_done error third part--PLEASE HELP!

2008-04-24 Thread Toomas Aas

Willy Offermans wrote:


It is already the third time that I report this error. Can someone help
me in solving this issue?

Apr 21 19:44:36 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, length=2048)]error = 5
Apr 21 19:45:07 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, length=2048)]error = 5
Apr 21 19:45:38 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, length=2048)]error = 5
...


I can only tell you that I had similar problems with FreeBSD 6.3 and ICH7R 
based RAID. Since I couldn't figure out how to solve them, I discarded the 
BIOS-based RAID and instead set up gmirror. It's been running this way for 
a year now and has been rock solid.
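For readers wanting to try the same workaround, a gmirror setup along those lines might look like the following. This is a sketch, not the poster's actual commands: the ad4/ad6 device names are hypothetical, the BIOS array must first be dissolved, and the data must be restored from backup afterwards.

```shell
# Load the GEOM mirror class now and on every boot.
kldload geom_mirror
echo 'geom_mirror_load="YES"' >> /boot/loader.conf

# Create a mirror gm0 on the first disk, then attach the second
# disk and let gmirror synchronize it in the background.
gmirror label -v -b round-robin gm0 /dev/ad4
gmirror insert gm0 /dev/ad6

# Watch the rebuild; filesystems are then mounted from /dev/mirror/gm0s1*.
gmirror status
```

The mirror metadata lives in the last sector of each provider, which is why gmirror works with plain ATA controllers and avoids the ataraid(4) BIOS-RAID code path entirely.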


--
Toomas Aas

... One way to be happy ever after is not to be after too much.


g_vfs_done error third part--PLEASE HELP!

2008-04-21 Thread Willy Offermans
Dear FreeBSD friends,

This is already the third time I have reported this error. Can someone help
me solve this issue?

Time and again, always after heavy disk I/O, I see the following errors in
the log files. If I force ar0s1g to unmount, the machine spontaneously
reboots. Nothing serious seems to be damaged by this, but I cannot afford
to have something bad happen to this production machine.

Currently the error is the following:

<snip>
...
Apr 21 19:44:36 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, length=2048)]error = 5
Apr 21 19:45:07 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, length=2048)]error = 5
Apr 21 19:45:38 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, length=2048)]error = 5
...
</snip>

before the error appeared like:

<snip>
...
Apr 18 20:00:15 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725072896, length=2048)]error = 5
Apr 18 20:00:46 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725068800, length=4096)]error = 5
Apr 18 20:00:46 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725072896, length=2048)]error = 5
Apr 18 20:01:17 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725068800, length=4096)]error = 5
Apr 18 20:01:17 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725072896, length=2048)]error = 5
Apr 18 20:01:48 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725068800, length=4096)]error = 5
Apr 18 20:01:48 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725072896, length=2048)]error = 5
Apr 18 20:02:19 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725068800, length=4096)]error = 5
...
</snip>

I have no clue what these errors mean, since offsets like 290725068800,
290725072896, and 290725074944 seem ridiculously large. Does anybody
have a clue what is going on?

I'm using FreeBSD 7.0, but the error has also been reported with previous
versions of FreeBSD. I can and will provide more details on request.

Any hints are very much appreciated.


-- 
Met vriendelijke groeten,
With kind regards,
Mit freundlichen Gruessen,
De jrus wah,

Willy

*
W.K. Offermans
Home:   +31 45 544 49 44
Mobile: +31 653 27 16 23
e-mail: [EMAIL PROTECTED]

   Powered by 

(__)
 \\\'',)
   \/  \ ^
   .\._/_)

   www.FreeBSD.org


Re: g_vfs_done error third part--PLEASE HELP!

2008-04-21 Thread Roland Smith
On Mon, Apr 21, 2008 at 09:04:03PM +0200, Willy Offermans wrote:
 Dear FreeBSD friends,
 
 It is already the third time that I report this error. Can someone help
 me in solving this issue?

Probably the reason that you hear so little is that you provide so
little information. Most of us are not clairvoyant.
 
 Over and over again and always after heavy disk I/O I see the following
 errors in the log files. If I force ar0s1g to unmount the machine
 spontaneously reboots. Nothing seriously seems to be damaged by this
 act, but anyway I cannot afford something bad happening to this
 production machine.

Why would you force an unmount?

 Apr 18 20:02:19 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725068800, length=4096)]error = 5
 
 I have no clue what the errors mean, since offsets of 290725068800,
 290725072896, and 290725074944 seem to be ridiculous. Does anybody 
 have a clue what is going on?

For starters, how big is ar0s1g? If the offset is in bytes, it is around
270 GiB (roughly 290 GB), which is not that unusual in this day and age.
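For what it's worth, the byte-to-GiB conversion is easy to verify with plain sh arithmetic (the offset is the one from the kernel messages above). Comparing the result against the size df reports for the filesystem elsewhere in the thread (274977824 1K-blocks for /share, i.e. about 262 GiB) is one rough way to judge whether the writes fall near, or even past, the end of the slice:

```shell
# Convert the kernel's byte offset to GiB (integer division).
# 1 GiB = 1073741824 bytes.
offset=290725074944
echo "$((offset / 1073741824)) GiB"
```

Note that the offset is relative to the ar0s1g slice itself, not to the start of the disk, so a value this large only makes sense if the slice (or the filesystem written onto it) really is that big.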

 I'm using FreeBSD 7.0, but found the error being reported before with
 previous versions of FreeBSD. I can and will provide more details on
 demand.

What does 'df' say?

Did you notice any file corruption in the filesystem on ar0s1g?

Unmount the filesystem and run fsck(8) on it. Does it report any errors?

 Any hints are very much appreciated.

Did you manage to create a filesystem larger than the underlying partition
(using newfs's -s switch)? In that case you could be trying to write past
the end of the device.
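One way to check that theory is to compare the kernel's idea of the partition size with the filesystem's own idea of its size. A sketch, using standard FreeBSD tools and the device names from this thread:

```shell
# Raw size of the partition in bytes, as the kernel sees it.
diskinfo -v /dev/ar0s1g

# Partition layout of the slice, sizes in 512-byte sectors.
bsdlabel /dev/ar0s1

# The filesystem's own size, from its superblock; compare with the above.
dumpfs /dev/ar0s1g | head -n 20
```

If dumpfs reports more blocks than the partition actually holds, writes near the tail of the filesystem will fail with exactly this kind of error 5 (EIO).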

Roland
-- 
R.F.Smith   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)


pgpfrJ8nxl21I.pgp
Description: PGP signature


RE: g_vfs_done error third part--PLEASE HELP!

2008-04-21 Thread Jan Mikkelsen
Hi Willy,

You seem to have emailed me directly as well as posting to the list.

The bad offsets are probably because you have filesystem corruption, and the
actual event that caused it was probably not reported (or is at least not
reported by these errors).

Basic question: Do you have a hardware problem?

- Do you have ECC memory? If not, have you run memtest?
- Are your disks reliable, or is one corrupting data?
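For the disk-reliability question, SMART data from each member disk is a good first check. A sketch, assuming smartmontools is installed from ports (sysutils/smartmontools); the ad4/ad6 device names are hypothetical:

```shell
# Overall health, attributes (reallocated/pending sectors) and error log.
smartctl -a /dev/ad4
smartctl -a /dev/ad6

# Optionally start a long surface self-test; read the results later
# with: smartctl -l selftest /dev/ad4
smartctl -t long /dev/ad4
```

A climbing Reallocated_Sector_Ct or any entries in the device error log would point at failing hardware rather than a driver problem.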

Less basic questions: What is the corruption, and what is its cause? That
might require a little more work and dropping into the debugger.

You could also try reconfiguring to use gmirror instead of ar to see if that
improves things (i.e. it could be an ar bug).

Regards,

Jan.



 -Original Message-
 From: Willy Offermans [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, 22 April 2008 5:04 AM
 To: freebsd-stable@FreeBSD.ORG
 Subject: g_vfs_done error third part--PLEASE HELP!
 
 
 Dear FreeBSD friends,
 
 It is already the third time that I report this error. Can 
 someone help
 me in solving this issue?
 
 Over and over again and always after heavy disk I/O I see the 
 following
 errors in the log files. If I force ar0s1g to unmount the machine
 spontaneously reboots. Nothing seriously seems to be damaged by this
 act, but anyway I cannot afford something bad happening to this
 production machine.
 
 Currently the error is the following:
 
 snip
 ...
 Apr 21 19:44:36 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, length=2048)]error = 5
 Apr 21 19:45:07 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, length=2048)]error = 5
 Apr 21 19:45:38 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, length=2048)]error = 5
 ...
 /snip
 
 before the error appeared like:
 
 snip
 ...
 Apr 18 20:00:15 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725072896, length=2048)]error = 5
 Apr 18 20:00:46 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725068800, length=4096)]error = 5
 Apr 18 20:00:46 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725072896, length=2048)]error = 5
 Apr 18 20:01:17 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725068800, length=4096)]error = 5
 Apr 18 20:01:17 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725072896, length=2048)]error = 5
 Apr 18 20:01:48 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725068800, length=4096)]error = 5
 Apr 18 20:01:48 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725072896, length=2048)]error = 5
 Apr 18 20:02:19 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725068800, length=4096)]error = 5
 ...
 /snip
 
 I have no clue what the errors mean, since offsets of 290725068800,
 290725072896, and 290725074944 seem to be ridiculous. Does anybody 
 have a clue what is going on?
 
 I'm using FreeBSD 7.0, but found the error being reported before with
 previous versions of FreeBSD. I can and will provide more details on
 demand.
 
 Any hints are very much appreciated.
 
 
