Re: Large discrepancy in reported disk usage on USR partition

2008-10-31 Thread Mel
On Friday 31 October 2008 02:20:39 Brendan Hart wrote:

  Is it possible that the NFS directory got written to /usr at some point in
  time?
  You would only notice this with du if the NFS directory is unmounted.
  Unmount it and ls -al /usr/mountpoint should only give you an empty dir.

 Bingo!! That is exactly the problem. An NFS mount was hiding a 17G local
 dir which had an old copy of the entire NFS-mounted dir. I guess it must
 have been written incorrectly to this standby server by rsync before the
 NFS mount was put in place. I will add an exclusion to rsync to make sure
 it does not happen again even if the NFS dir is not mounted.

I used to NFS-mount /usr/ports and run a cron job on the local machine. I
made a file on the local machine:

echo 'This is a mountpoint' > /usr/ports/KEEP_ME_EMPTY

The script would:

#!/bin/sh
if [ -e /usr/ports/KEEP_ME_EMPTY ]; then
    # the sentinel is visible, so the NFS share is not mounted yet
    do_nfs_mount
    if [ -e /usr/ports/KEEP_ME_EMPTY ]; then
        # still visible after mounting: the mount did not take
        give_up_or_wait
    fi
fi

Of course it's fragile, but it works for not so critical issues.
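
A sentinel-free variant is to compare device numbers instead -- a rough,
untested sketch, with do_nfs_mount and give_up_or_wait the same
placeholders as above:

#!/bin/sh
# A mounted /usr/ports sits on a different device than /usr.
if [ "$(stat -f %d /usr/ports)" = "$(stat -f %d /usr)" ]; then
    # same device: nothing is mounted there yet
    do_nfs_mount
    # still the same device afterwards: the mount did not take
    [ "$(stat -f %d /usr/ports)" = "$(stat -f %d /usr)" ] && give_up_or_wait
fi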


-- 
Mel

Problem with today's modular software: they start with the modules
and never get to the software part.


Re: Large discrepancy in reported disk usage on USR partition

2008-10-30 Thread Mel
On Thursday 30 October 2008 01:42:32 Brendan Hart wrote:
 Hi,

 I have inherited some servers running various releases of FreeBSD and I am
 having some trouble with the /usr partition on one of these boxen.

 The problem is that there appears to be far more space used on the USR
 partition than there are actual files on the partition. The utility df -h
 reports 25GB used (i.e. nearly the whole partition), but du -x /usr
 reports only 7.6GB of files.

 I have reviewed the FAQ, particularly item 9.24, "The du and df commands
 show different amounts of disk space available. What is going on?".
 However, the suggested cause of the discrepancy (large files already
 unlinked but still held open by active processes) does not appear to be
 true in this case, as the problem is present even after rebooting into
 single user mode.

 #: uname -a
 FreeBSD ibisweb4spare.strategicecommerce.com.au 6.1-RELEASE FreeBSD
 6.1-RELEASE #0: Sun May  7 04:42:56 UTC 2006
 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/SMP  i386

 #: df -h
 Filesystem     Size    Used   Avail Capacity  Mounted on
 /dev/aacd0s1a  496M    163M    293M    36%    /
 devfs          1.0K    1.0K      0B   100%    /dev
 /dev/aacd0s1e  496M     15M    441M     3%    /tmp
 /dev/aacd0s1f   28G     25G    1.2G    96%    /usr
 /dev/aacd0s1d  1.9G    429M    1.3G    24%    /var

Is this output untruncated? Is df really df or an alias to 'df -t nonfs'?

 #: du -x -h /usr
 2.0K    /usr/.snap
  24M    /usr/bin
 [...snip...]
 584M    /usr/ports
 140K    /usr/lost+found
 7.6G    /usr

Is it possible that the NFS directory got written to /usr at some point in
time? You would only notice this with du if the NFS directory is unmounted.

Unmount it and ls -al /usr/mountpoint should only give you an empty dir.
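
For example (a sketch; substitute the real mountpoint):

umount /usr/mountpoint   # get the NFS mount out of the way
ls -al /usr/mountpoint   # should list only . and ..
du -sxh /usr             # if this jumps, something local was hiding underneath
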
-- 
Mel

Problem with today's modular software: they start with the modules
and never get to the software part.


RE: Large discrepancy in reported disk usage on USR partition

2008-10-30 Thread Brendan Hart
 I took a look at using the smart tools as you suggested, but have now
 found that the disk in question is a RAID1 set on a DELL PERC 3/Di
 controller and smartctl does not appear to be the correct tool to
 access the SMART data for the individual disks.  After a little
 research, I have found the aaccli tool and used it to get the following
 information:

 Sadly, that controller does not show you SMART attributes.  This is one of
 the biggest problems with the majority (but not all) of hardware RAID
 controllers -- they give you no access to disk-level things like SMART.
 FreeBSD has support for such (using CAM's pass(4)), but the driver has
 to support/use it, *and* the card firmware has to support it.  At present,
 Areca, 3Ware, and Promise controllers support such; HighPoint might, but
 I haven't confirmed it.  Adaptec does not.

 What you showed tells me nothing about SMART, other than the remote
 possibility it's basing some of its decisions on the general SMART
 health status, which means jack squat.  I can explain why this is if
 need be, but it's not related to the problem you're having.

Thanks for this additional information. I hadn't understood that there was
far more information behind the simple SMART ok/not ok reported by the PERC
controller.

 Either way, this is just one of many reasons to avoid hardware RAID
 controllers if given the choice.

I have seen some mentions of using gvinum and/or gmirror to achieve the
goal of protection against the single point of failure that a single disk
represents, which I believe is the reason that most people, myself
included, have specified hardware RAID in their servers. Is this what you
mean by avoiding hardware RAID?


 I hope these are SCSI disks you're showing here, otherwise I'm not sure
 how the controller is able to get the primary defect count of a SATA or
 SAS disk.  So, assuming the numbers shown are accurate, then yes, I don't
 think there's any disk-level problem.

Yes, they are SCSI disks. Not particularly relevant to this topic, but
interesting: I would have thought that SAS would make the same information
available as SCSI does, as it is a serial bus evolution of SCSI. Is this
thinking incorrect?

 I understand at this point you're running around with your arms in the
 air, but you've already confirmed one thing: none of your other systems
 exhibit this problem.  If this is a production environment, step back a
 moment and ask yourself: just how much time is this worth?  It might be
 better to just newfs the filesystem and be done with it, especially if
 this is a one-time-never-seen-before thing.

 I will wait and see if any other list member has any suggestions for 
 me to try, but I am now leaning toward scrubbing the system. Oh well.

 When you say "scrubbing", are you referring to actually formatting/wiping
 the system, or are you referring to disk scrubbing?

I meant reformatting and reinstalling, as a way to escape the issue without
spending too much more time on it. I would of course like to understand the
problem so as to know what to avoid in the future, but as you point out
above, time is money and it is rapidly approaching the point where it isn't
worth any more effort.

Thanks for all your help.

Best Regards,
Brendan Hart

 



Re: Large discrepancy in reported disk usage on USR partition

2008-10-30 Thread Jeremy Chadwick
On Fri, Oct 31, 2008 at 11:15:15AM +1030, Brendan Hart wrote:
  What you showed tells me nothing about SMART, other than the remote
  possibility it's basing some of its decisions on the general SMART
  health status, which means jack squat.  I can explain why this is if
  need be, but it's not related to the problem you're having.
 
 Thanks for this additional information. I hadn't understood that there was
 far more information behind the simple SMART ok/not ok reported by the PERC
 controller.

Here's an example of some attributes:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   178   175   021    Pre-fail  Always       -       6066
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       50
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       11429
 10 Spin_Retry_Count        0x0012   100   253   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0012   100   253   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       48
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       33
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       50
194 Temperature_Celsius     0x0022   117   100   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   051    Old_age   Offline      -       0

You probably now understand why having access to this information is
useful.  :-)  It's very disappointing that so many RAID controllers
don't provide a way to get at this information; the ones which do I am
very thankful for!
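
For the controllers that do support it, smartctl can pass commands through
to the individual disks -- a sketch, assuming a 3ware card (the device name
and disk number depend on the setup):

smartctl -a -d 3ware,0 /dev/twa0   # SMART attributes of disk 0 behind the card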

  Either way, this is just one of many reasons to avoid hardware RAID
  controllers if given the choice.
 
 I have seen some mentions of using gvinum and/or gmirror to achieve the
 goal of protection against the single point of failure that a single disk
 represents, which I believe is the reason that most people, myself
 included, have specified hardware RAID in their servers. Is this what you
 mean by avoiding hardware RAID?

More or less.  Hardware RAID has some advantages (I can dig up a mail of
mine from long ago outlining what the advantages were), but a lot of the
time the controller acts as more of a hindrance than a benefit.  I
personally feel the negatives outweigh the positives, but each person has
different needs and requirements.  There are some controllers which work
very well and provide a great degree of insight (at a disk level) under
FreeBSD, and those are often what I recommend if someone wants to go that
route.

I make it sound like I'm the authoritative voice for what a person
should or should not buy -- I'm not.  I predominantly rely on Intel ICHx
on-board controllers with SATA disks, because ICHx works quite well
under FreeBSD (especially with AHCI).

I personally have no experience with gmirror or gvinum, but I do have
experience with ZFS.  (I'll have a little more experience with gmirror
once I have the time to test some reported problems with gmirror and
high interrupt counts when a disk is hot-swapped).

  I hope these are SCSI disks you're showing here, otherwise I'm not sure
  how the controller is able to get the primary defect count of a SATA or
  SAS disk.  So, assuming the numbers shown are accurate, then yes, I
  don't think there's any disk-level problem.

 Yes, they are SCSI disks. Not particularly relevant to this topic, but
 interesting: I would have thought that SAS would make the same information
 available as SCSI does, as it is a serial bus evolution of SCSI. Is this
 thinking incorrect?

I don't have any experience with SAS, so I can't comment on what
features are available on SAS.

Specifically with regards to SMART: historically, SCSI does not provide
the amount of granularity/detail with attributes that ATA/SATA does.  I do
not consider this a negative against SCSI (in fact, I very much like
SCSI).  SAS might provide these details, but I don't know, as I don't
have any SAS disks.

-- 
| Jeremy Chadwickjdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |

RE: Large discrepancy in reported disk usage on USR partition

2008-10-30 Thread Brendan Hart
 #: df -h
 Filesystem     Size    Used   Avail Capacity  Mounted on
 /dev/aacd0s1a  496M    163M    293M    36%    /
 devfs          1.0K    1.0K      0B   100%    /dev
 /dev/aacd0s1e  496M     15M    441M     3%    /tmp
 /dev/aacd0s1f   28G     25G    1.2G    96%    /usr
 /dev/aacd0s1d  1.9G    429M    1.3G    24%    /var

 Is this output untruncated? Is df really df or an alias to 'df -t nonfs'?

Yes, it really is the untruncated output of df -h. I also tried df -t
nonfs and it gives exactly the same output as df. What are you expecting
that is not present in the output?

 Is it possible that the NFS directory got written to /usr at some point in
 time?
 You would only notice this with du if the NFS directory is unmounted.
 Unmount it and ls -al /usr/mountpoint should only give you an empty dir.

Bingo!! That is exactly the problem. An NFS mount was hiding a 17G local dir
which had an old copy of the entire NFS-mounted dir. I guess it must have
been written incorrectly to this standby server by rsync before the NFS
mount was put in place. I will add an exclusion to rsync to make sure it
does not happen again even if the NFS dir is not mounted.
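
The exclusion would look something like this (a sketch; "live" is a
placeholder for the live server's hostname, and the exact exclude path
depends on how the nightly job invokes rsync):

# --exclude with a leading / is anchored at the transfer root (/usr here)
rsync -a --delete --exclude=/home/development/mount/ live:/usr/ /usr/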

Thank you for your help, you have saved me much time rebuilding this server.

Best Regards,
Brendan Hart

-
Brendan Hart, Development Manager
Strategic Ecommerce Division
Securepay Pty Ltd
Phone: 08-8274-4000
Fax: 08-8274-1400 
 



Re: Large discrepancy in reported disk usage on USR partition

2008-10-30 Thread Jeremy Chadwick
On Fri, Oct 31, 2008 at 11:50:39AM +1030, Brendan Hart wrote:
  #: df -h
  Filesystem     Size    Used   Avail Capacity  Mounted on
  /dev/aacd0s1a  496M    163M    293M    36%    /
  devfs          1.0K    1.0K      0B   100%    /dev
  /dev/aacd0s1e  496M     15M    441M     3%    /tmp
  /dev/aacd0s1f   28G     25G    1.2G    96%    /usr
  /dev/aacd0s1d  1.9G    429M    1.3G    24%    /var
 
  Is this output untruncated? Is df really df or an alias to 'df -t nonfs'?
 
 Yes, it really is the untruncated output of df -h. I also tried df -t
 nonfs and it gives exactly the same output as df. What are you expecting
 that is not present in the output?
 
  Is it possible that the NFS directory got written to /usr at some point in
  time?
  You would only notice this with du if the NFS directory is unmounted.
  Unmount it and ls -al /usr/mountpoint should only give you an empty dir.
 
 Bingo!! That is exactly the problem. An NFS mount was hiding a 17G local
 dir which had an old copy of the entire NFS-mounted dir. I guess it must
 have been written incorrectly to this standby server by rsync before the
 NFS mount was put in place. I will add an exclusion to rsync to make sure
 it does not happen again even if the NFS dir is not mounted.
 
 Thank you for your help, you have saved me much time rebuilding this server.

Can either of you outline what exactly happened here?  I'm trying to
figure out how an NFS mount was "hiding" a 17G local dir, when there are
no NFS mounts shown in the above df output.  This is purely an ignorant
question on my part, but I'm not able to piece together what happened.

-- 
| Jeremy Chadwickjdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: Large discrepancy in reported disk usage on USR partition

2008-10-30 Thread Kevin Kinsey

Jeremy Chadwick wrote:

On Fri, Oct 31, 2008 at 11:50:39AM +1030, Brendan Hart wrote:

#: df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/aacd0s1a  496M    163M    293M    36%    /
devfs          1.0K    1.0K      0B   100%    /dev
/dev/aacd0s1e  496M     15M    441M     3%    /tmp
/dev/aacd0s1f   28G     25G    1.2G    96%    /usr
/dev/aacd0s1d  1.9G    429M    1.3G    24%    /var

Is this output untruncated? Is df really df or an alias to 'df -t nonfs'?

Yes, it really is the untruncated output of df -h. I also tried df -t
nonfs and it gives exactly the same output as df. What are you expecting
that is not present in the output?


I would have to assume he's looking for an NFS mount ;-)


Is it possible that the NFS directory got written to /usr at some point in
time?
You would only notice this with du if the NFS directory is unmounted.
Unmount it and ls -al /usr/mountpoint should only give you an empty dir.



Bingo!! That is exactly the problem. An NFS mount was hiding a 17G local dir
which had an old copy of the entire NFS-mounted dir. I guess it must have
been written incorrectly to this standby server by rsync before the NFS
mount was put in place. I will add an exclusion to rsync to make sure it
does not happen again even if the NFS dir is not mounted.

Thank you for your help, you have saved me much time rebuilding this server.


Can either of you outline what exactly happened here?  I'm trying to
figure out how an NFS mount was "hiding" a 17G local dir, when there are
no NFS mounts shown in the above df output.  This is purely an ignorant
question on my part, but I'm not able to piece together what happened.


Well, it would appear that perhaps Mel also guessed right about df
being aliased?  Just my guess, but, as you mention, no nfs mounts
appear.  I may be mistaken, but I think it's also possible to get
into this sort of situation by mounting a local partition on a 
non-empty mountpoint---at least, it happened to me recently.
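
It's easy to reproduce, too (a sketch; the device name is a placeholder):

mkdir -p /mnt/demo
echo hello > /mnt/demo/hidden.txt
mount /dev/ad0s1e /mnt/demo   # hidden.txt vanishes from view but still uses space
umount /mnt/demo              # ...and reappears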


Kevin Kinsey
--
A triangle which has an angle of 135 degrees is called an obscene
triangle.


RE: Large discrepancy in reported disk usage on USR partition

2008-10-30 Thread Brendan Hart
Now that you mention it, it *is* strange that the NFS mount was not listed
in the df output.

Trying again after a fresh reboot:

#: df -h
Filesystem                      Size    Used   Avail Capacity  Mounted on
/dev/aacd0s1a                   496M    176M    280M    39%    /
devfs                           1.0K    1.0K      0B   100%    /dev
/dev/aacd0s1e                   496M     15M    441M     3%    /tmp
/dev/aacd0s1f                    28G    4.8G     21G    19%    /usr
/dev/aacd0s1d                   1.9G    430M    1.3G    24%    /var
server2:/storage/blah/foo/data  397G    103G    262G    28%    /usr/home/development/mount/foobar

I guess I must have missed the final line when copying the output when I
first posted to the mailing list. And then when I replied to Mel, I had
already unmounted the NFS dir when attempting the suggested fix, so it did
not show when I ran df again to double-check, and I did not realize what
had happened.

I apologise for any confusion caused.

Best Regards,
Brendan Hart

-
Brendan Hart, Development Manager
Strategic Ecommerce Division
Securepay Pty Ltd
Phone: 08-8274-4000
Fax: 08-8274-1400 


-Original Message-
From: Jeremy Chadwick [mailto:[EMAIL PROTECTED] 
Sent: Friday, 31 October 2008 12:02 PM
To: Brendan Hart
Cc: 'Mel'; freebsd-questions@freebsd.org
Subject: Re: Large discrepancy in reported disk usage on USR partition

On Fri, Oct 31, 2008 at 11:50:39AM +1030, Brendan Hart wrote:
  #: df -h
  Filesystem     Size    Used   Avail Capacity  Mounted on
  /dev/aacd0s1a  496M    163M    293M    36%    /
  devfs          1.0K    1.0K      0B   100%    /dev
  /dev/aacd0s1e  496M     15M    441M     3%    /tmp
  /dev/aacd0s1f   28G     25G    1.2G    96%    /usr
  /dev/aacd0s1d  1.9G    429M    1.3G    24%    /var
 
  Is this output untruncated? Is df really df or an alias to 'df -t nonfs'?
 
 Yes, it really is the untruncated output of df -h. I also tried df -t
 nonfs and it gives exactly the same output as df. What are you expecting
 that is not present in the output?
 
  Is it possible that the NFS directory got written to /usr at some point
  in time?
  You would only notice this with du if the NFS directory is unmounted.
  Unmount it and ls -al /usr/mountpoint should only give you an empty dir.
 
 Bingo!! That is exactly the problem. An NFS mount was hiding a 17G local
 dir which had an old copy of the entire NFS-mounted dir. I guess it must
 have been written incorrectly to this standby server by rsync before the
 NFS mount was put in place. I will add an exclusion to rsync to make sure
 it does not happen again even if the NFS dir is not mounted.

 Thank you for your help, you have saved me much time rebuilding this
 server.

Can either of you outline what exactly happened here?  I'm trying to figure
out how an NFS mount was "hiding" a 17G local dir, when there are no NFS
mounts shown in the above df output.  This is purely an ignorant question on
my part, but I'm not able to piece together what happened.

-- 
| Jeremy Chadwickjdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |





Large discrepancy in reported disk usage on USR partition

2008-10-29 Thread Brendan Hart
Hi,

I have inherited some servers running various releases of FreeBSD and I am
having some trouble with the /usr partition on one of these boxen.

The problem is that there appears to be far more space used on the USR
partition than there are actual files on the partition. The utility df -h
reports 25GB used (i.e. nearly the whole partition), but du -x /usr
reports only 7.6GB of files.

I have reviewed the FAQ, particularly item 9.24, "The du and df commands show
different amounts of disk space available. What is going on?". However, the
suggested cause of the discrepancy (large files already unlinked but still
held open by active processes) does not appear to be true in this case, as
the problem is present even after rebooting into single user mode.
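
(For the record, one way to hunt for such unlinked-but-still-open files is
lsof from ports -- a sketch; fstat(1) from the base system can give similar
information with more sifting:

lsof +L1 /usr   # open files on /usr whose link count is zero
)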

#: uname -a
FreeBSD ibisweb4spare.strategicecommerce.com.au 6.1-RELEASE FreeBSD
6.1-RELEASE #0: Sun May  7 04:42:56 UTC 2006
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/SMP  i386

#: df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/aacd0s1a  496M    163M    293M    36%    /
devfs          1.0K    1.0K      0B   100%    /dev
/dev/aacd0s1e  496M     15M    441M     3%    /tmp
/dev/aacd0s1f   28G     25G    1.2G    96%    /usr
/dev/aacd0s1d  1.9G    429M    1.3G    24%    /var

#: du -x -h /usr
2.0K    /usr/.snap
 24M    /usr/bin
[...snip...]
584M    /usr/ports
140K    /usr/lost+found
7.6G    /usr


The server is used as a standby machine, and a nightly cron job uses rsync
to make a copy of the /usr partition from a live server. Depending on how
recently the logs have been culled, the live server has approximately
7-10GB of data on the /usr partition, so I would expect the same amount of
data on the standby server.

This may be irrelevant, but the server also has an external 11GB data
directory mounted via NFS as a directory under the /usr partition.

Next, I began to suspect some sort of disk corruption (echoes of the old
MS-DOS days of lost cluster chains) and attempted to find disk issues by
running fsck, but no issues were reported and the problem was not remedied.
I also tried running fsck in single user mode; again, no improvement.

Can anyone suggest what I can try next?

Best Regards,
Brendan Hart

-
Brendan Hart, Development Manager
Strategic Ecommerce Division
Securepay Pty Ltd
Phone: 08-8274-4000
Fax: 08-8274-1400 

 



Re: Large discrepancy in reported disk usage on USR partition

2008-10-29 Thread Jeremy Chadwick
On Thu, Oct 30, 2008 at 11:12:32AM +1030, Brendan Hart wrote:
 I have inherited some servers running various releases of FreeBSD and I am
 having some trouble with the /usr partition on one of these boxen.
 
 The problem is that there appears to be far more space used on the USR
 partition than there are actual files on the partition. The utility df -h
 reports 25GB used (i.e. nearly the whole partition), but du -x /usr
 reports only 7.6GB of files.

Have you tried playing with tunefs(8)'s -m flag?

I can't reproduce this behaviour on any of our systems.

icarus# df -k /usr
Filesystem   1024-blocks    Used     Avail Capacity  Mounted on
/dev/ad12s1f   167879968 1973344 152476228     1%    /usr
icarus# du -sx /usr
1973344 /usr

eos# df -k /usr
Filesystem  1024-blocks    Used    Avail Capacity  Mounted on
/dev/ad0s1f    32494668 2261670 27633426     8%    /usr
eos# du -sx /usr
2261670 /usr

anubis# df -k /usr
Filesystem  1024-blocks    Used    Avail Capacity  Mounted on
/dev/ad4s1f    80010344 1809620 71799898     2%    /usr
anubis# du -sx /usr
1809620 /usr

horus# df -k /usr
Filesystem  1024-blocks    Used    Avail Capacity  Mounted on
/dev/ad4s1f    32494668 1608458 28286638     5%    /usr
horus# du -sx /usr
1608458 /usr
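
A quick way to spot such a mismatch across several mounts (a rough sh
sketch; adjust the mount list to taste):

#!/bin/sh
# compare df's used-block count against du's tally for each local filesystem
for m in / /tmp /usr /var; do
    printf '%s: df=%sK du=%sK\n' "$m" \
        "$(df -k "$m" | awk 'NR==2 {print $3}')" \
        "$(du -skx "$m" | awk '{print $1}')"
done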

-- 
| Jeremy Chadwickjdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



RE: Large discrepancy in reported disk usage on USR partition

2008-10-29 Thread Brendan Hart
Hi,

The space reserved as minfree does not appear to have been changed from the
default setting of 8%. Is your suggestion that I should change it to a
larger value? I don't understand how modifying it now could fix the
situation, but I could be missing something.

The output of tunefs -p /usr is as follows:

#: tunefs -p /usr
tunefs: ACLs: (-a) disabled
tunefs: MAC multilabel: (-l)   disabled
tunefs: soft updates: (-n) enabled
tunefs: maximum blocks per file in a cylinder group: (-e)  2048
tunefs: average file size: (-f)16384
tunefs: average number of files in a directory: (-s)   64
tunefs: minimum percentage of free space: (-m) 8%
tunefs: optimization preference: (-o)  time
tunefs: volume label: (-L)

I have not observed the problem on any of the other ~dozen FreeBSD servers
in our data centre. 

Could the missing space be an indication of hardware disk issues, i.e.
physical blocks marked as bad?

Is it possible on UFS2 for disk space to be allocated but hidden somehow?
(I have been running commands such as du -x as superuser.)
Similarly, is it possible on UFS2 for disk space to be allocated in lost
cluster chains?

Best Regards,
Brendan Hart

-Original Message-
From: Jeremy Chadwick [mailto:[EMAIL PROTECTED] 
Sent: Thursday, 30 October 2008 11:50 AM
To: Brendan Hart
Cc: freebsd-questions@freebsd.org
Subject: Re: Large discrepancy in reported disk usage on USR partition

On Thu, Oct 30, 2008 at 11:12:32AM +1030, Brendan Hart wrote:
 I have inherited some servers running various releases of FreeBSD and I am
 having some trouble with the /usr partition on one of these boxen.
 
 The problem is that there appears to be far more space used on the USR
 partition than there are actual files on the partition. The utility df -h
 reports 25GB used (i.e. nearly the whole partition), but du -x /usr
 reports only 7.6GB of files.

Have you tried playing with tunefs(8)'s -m flag?

I can't reproduce this behaviour on any of our systems.

icarus# df -k /usr
Filesystem   1024-blocks    Used     Avail Capacity  Mounted on
/dev/ad12s1f   167879968 1973344 152476228     1%    /usr
icarus# du -sx /usr
1973344 /usr

eos# df -k /usr
Filesystem  1024-blocks    Used    Avail Capacity  Mounted on
/dev/ad0s1f    32494668 2261670 27633426     8%    /usr
eos# du -sx /usr
2261670 /usr

anubis# df -k /usr
Filesystem  1024-blocks    Used    Avail Capacity  Mounted on
/dev/ad4s1f    80010344 1809620 71799898     2%    /usr
anubis# du -sx /usr
1809620 /usr

horus# df -k /usr
Filesystem  1024-blocks    Used    Avail Capacity  Mounted on
/dev/ad4s1f    32494668 1608458 28286638     5%    /usr
horus# du -sx /usr
1608458 /usr

-- 
| Jeremy Chadwickjdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |





Re: Large discrepancy in reported disk usage on USR partition

2008-10-29 Thread Jeremy Chadwick
On Thu, Oct 30, 2008 at 12:11:58PM +1030, Brendan Hart wrote:
 The space reserved as minfree does not appear to have been changed from the
 default setting of 8%.

Okay, then that's likely not the problem.

 Is your suggestion that I should change it to a larger value?

That would just make your problem worse.  :-)

 I don't understand how modifying it now could fix the situation, but I
 could be missing something.

Well, the feature I described isn't what's causing your problem, but to
clarify: if you change the percentage, it applies immediately.  I read
"I don't understand how modifying it now could fix ..." to mean "isn't
this option applied during newfs?"
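
i.e. something like this takes effect right away -- a sketch (tunefs
generally refuses to touch a filesystem that is mounted read-write):

umount /usr
tunefs -m 8 /dev/aacd0s1f   # set minfree; 8% is the default
mount /usr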

 I have not observed the problem on any of the other ~dozen FreeBSD servers
 in our data centre. 

Unless someone more clueful chimes in with better hints, the obvious
choice here is going to be "recreate the filesystem".  I'd tell you
something like "try using ffsinfo(8)", but I've never used the tool,
so very little of the output will make sense to me.

 Could the missing space be an indication of hardware disk issues, i.e.
 physical blocks marked as bad?

The simple answer is no, bad blocks would not cause what you're seeing.
smartctl -a /dev/disk will help you determine if there's evidence the
disk is in bad shape.  I can help you with reading SMART stats if need
be.

Since you booted single-user and presumably ran fsck -f /usr, and
nothing came back, I'm left to believe this isn't filesystem corruption.

 Is it possible on UFS2 for disk space to be allocated but hidden somehow?
 (I have been running commands such as du -x as superuser.)

That's exactly what the above tunefs parameter describes.

 Similarly, is it possible on UFS2 for disk space to be allocated in lost
 cluster chains?

I don't know what this means.  Someone more clueful will have to answer.

-- 
| Jeremy Chadwickjdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



RE: Large discrepancy in reported disk usage on USR partition

2008-10-29 Thread Brendan Hart
On Thu 30/10/2008 12:25 PM, Jeremy Chadwick wrote:
  Could the missing space be an indication of hardware disk issues, i.e.
  physical blocks marked as bad?

 The simple answer is no, bad blocks would not cause what you're seeing.
 smartctl -a /dev/disk will help you determine if there's evidence the disk
 is in bad shape.  I can help you with reading SMART stats if need be.

I took a look at using the smart tools as you suggested, but have now found
that the disk in question is a RAID1 set on a DELL PERC 3/Di controller and
smartctl does not appear to be the correct tool to access the SMART data for
the individual disks. After a little research, I have found the aaccli tool
and used it to get the following information:

AAC0> disk show smart
Executing: disk show smart

                 Smart    Method of          Enable
                 Capable  Informational      Exception  Performance  Error
B:ID:L  Device            Exceptions(MRIE)   Control    Enabled      Count
------  -------  -------  -----------------  ---------  -----------  -----
0:00:0           Y        6                  Y          N            0
0:01:0           Y        6                  Y          N            0

AAC0> disk show defects 00
Executing: disk show defects (ID=0)
Number of PRIMARY defects on drive: 285
Number of GROWN defects on drive: 0

AAC0> disk show defects 01
Executing: disk show defects (ID=1)
Number of PRIMARY defects on drive: 193
Number of GROWN defects on drive: 0


This output doesn't seem to indicate existing physical issues on the disks. 

 Since you booted single-user and presumably ran fsck -f /usr, and nothing
 came back, I'm left to believe this isn't filesystem corruption.

Yes, this is the command I tried when I went into the data centre yesterday,
and yes, nothing came back. 

I have done some additional digging and noticed that there is a /usr/.snap
folder present. ls -al shows no content, however. Some quick searching
shows this could possibly be part of a UFS snapshot... I wonder if partition
snapshots might be the cause of my major disk space loss. Some old message
group posts suggest that UFS snapshots were dangerously flakey on release
6.1, so I would hope that my predecessors were not using them, however... Do
you know anything about snapshots, and how I could see what (if any) space
is used by snapshots?

I also took a look to see if the issue could be something like running out
of inodes, but this doesn't seem to be the case:

#: df -ih /usr
Filesystem     Size    Used   Avail Capacity  iused    ifree %iused  Mounted on
/dev/aacd0s1f   28G     25G    1.1G    96%    708181  3107241   19%  /usr


BTW Jeremy, thanks for your help thus far.

I will wait and see if any other list member has any suggestions for me to
try, but I am now leaning toward scrubbing the system. Oh well.

Best Regards,
Brendan Hart

-
Brendan Hart, Development Manager
Strategic Ecommerce Division
Securepay Pty Ltd
Phone: 08-8274-4000
Fax: 08-8274-1400 
 



Re: Large discrepancy in reported disk usage on USR partition

2008-10-29 Thread Jeremy Chadwick
On Thu, Oct 30, 2008 at 02:04:36PM +1030, Brendan Hart wrote:
 On Thu 30/10/2008 12:25 PM, Jeremy Chadwick wrote:
  Could the missing space be an indication of hardware disk issues, i.e.
  physical blocks marked as bad?
 
 The simple answer is no, bad blocks would not cause what you're seeing.
 smartctl -a /dev/disk will help you determine if there's evidence the disk
 is in bad shape.  I can help you with reading SMART stats if need be.
 
 I took a look at using the smart tools as you suggested, but have now found
 that the disk in question is a RAID1 set on a DELL PERC 3/Di controller and
 smartctl does not appear to be the correct tool to access the SMART data for
 the individual disks.  After a little research, I have found the aaccli tool
 and used it to get the following information:

Sadly, that controller does not show you SMART attributes.  This is one
of the biggest problems with the majority (but not all) of hardware RAID
controllers -- they give you no access to disk-level things like SMART.
FreeBSD has support for such (using CAM's pass(4)), but the driver has
to support/use it, *and* the card firmware has to support it.  At
present, Areca, 3Ware, and Promise controllers support such; HighPoint
might, but I haven't confirmed it.  Adaptec does not.

What you showed tells me nothing about SMART, other than the remote
possibility it's basing some of its decisions on the general SMART
health status, which means jack squat.  I can explain why this is if
need be, but it's not related to the problem you're having.

Either way, this is just one of many reasons to avoid hardware RAID
controllers if given the choice.

 AAC0> disk show defects 00
 Executing: disk show defects (ID=0)
 Number of PRIMARY defects on drive: 285
 Number of GROWN defects on drive: 0

 AAC0> disk show defects 01
 Executing: disk show defects (ID=1)
 Number of PRIMARY defects on drive: 193
 Number of GROWN defects on drive: 0
 
 This output doesn't seem to indicate existing physical issues on the disks. 

I hope these are SCSI disks you're showing here, otherwise I'm not sure
how the controller is able to get the primary defect count of a SATA or
SAS disk.  So, assuming the numbers shown are accurate, then yes, I
don't think there's any disk-level problem.

 I have done some additional digging and noticed that there is a /usr/.snap
 folder present. ls -al shows no content, however. Some quick searching
 shows this could possibly be part of a UFS snapshot...

Correct; the .snap directory is used for UFS2 snapshots and
mksnap_ffs(8) (which is also the program dump -L uses).
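
For example, checking for and exercising one looks roughly like this (a
sketch from memory of the 6.x syntax):

ls -al /usr/.snap                     # existing snapshots show up as files here
mksnap_ffs /usr /usr/.snap/test_snap  # create a snapshot of /usr
rm /usr/.snap/test_snap               # delete it; its space is reclaimed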

 I wonder if partition snapshots might be the cause of my major disk
 space loss.

Your /usr/.snap directory is empty; there are no snapshots.  That said,
are you actually making filesystem snapshots using dump or mksnap_ffs?
If not, then you're barking up the wrong tree.  :-)

 I also took a look to see if the issue could be something like running out
 of inodes, but this doesn't seem to be the case:
 
 #: df -ih /usr
 Filesystem     Size    Used   Avail Capacity  iused    ifree %iused  Mounted on
 /dev/aacd0s1f   28G     25G    1.1G    96%    708181  3107241   19%  /usr

inodes != disk space, but I'm pretty sure you know that.

I understand at this point you're running around with your arms in the
air, but you've already confirmed one thing: none of your other systems
exhibit this problem.  If this is a production environment, step back a
moment and ask yourself: just how much time is this worth?  It might
be better to just newfs the filesystem and be done with it, especially
if this is a one-time-never-seen-before thing.
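
If it comes to that, the usual dance is roughly the following -- a sketch
from memory; the dump file needs to live somewhere with enough free space:

dump -0Laf /var/tmp/usr.dump /usr   # snapshot-backed level-0 dump of /usr
umount /usr
newfs -U /dev/aacd0s1f              # recreate the filesystem, soft updates on
mount /usr
cd /usr && restore -rf /var/tmp/usr.dump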

 I will wait and see if any other list member has any suggestions for me to
 try, but I am now leaning toward scrubbing the system. Oh well.

When you say "scrubbing", are you referring to actually formatting/wiping
the system, or are you referring to disk scrubbing?

-- 
| Jeremy Chadwickjdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |
