Re: filesystem full error with inumber

2006-07-27 Thread Feargal Reilly
On Wed, 26 Jul 2006 13:07:19 -0400
Sven Willenberger [EMAIL PROTECTED] wrote:

 
 Feargal Reilly presumably uttered the following on 07/24/06 11:48:

Looking again at dumpfs, it appears to say that this is
formatted with a block size of 8K, and a fragment size of
2K, but tuning(7) says:  [...]
Reading this makes me think that when this server was
installed, the block size was dropped from the 16K
default to 8K for performance reasons, but the fragment
size was not modified accordingly.

Would this be the root of my problem?
 
  I think a bsize/fsize ratio of 4/1 _should_ work, but it's
  not widely used, so there might be bugs hidden somewhere.
 
  
  Such as df not reporting the actual data usage, which is now
  my best working theory. I don't know what df bases its
  figures on; perhaps it either slowly got out of sync or,
  more likely, got things wrong once the disk filled up.
  

 One of my machines that I recently upgraded to 6.1
 (6.1-RELEASE-p3) is also exhibiting df reporting wrong data
 usage numbers. Notice the negative Used numbers below:
 
  df -h
 Filesystem     Size    Used   Avail Capacity  Mounted on
 /dev/da0s1a    496M     63M    393M    14%    /
 devfs          1.0K    1.0K      0B   100%    /dev
 /dev/da0s1e    989M   -132M    1.0G   -14%    /tmp
 /dev/da0s1f     15G    478M     14G     3%    /usr
 /dev/da0s1d     15G   -1.0G     14G    -8%    /var
 /dev/md0       496M    228K    456M     0%    /var/spool/MIMEDefang
 devfs          1.0K    1.0K      0B   100%    /var/named/dev
 
 Sven

For the record, my problems occurred with 5.4-PRERELEASE #1
which, for reasons beyond my control, I had not yet been able
to upgrade.

What bsize/fsize ratio are you using? Mine was 4/1 instead of
the more usual 8/1.
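
For comparison, the ratio can be read straight off the first lines
of dumpfs output, e.g. for my disk:

    dumpfs /dev/amrd0s1f | grep -E '^(bsize|fsize)'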

BTW, does anybody know what the best method for double-checking
df's figures would be? du?

-- 
Feargal Reilly.
PGP Key: 0x847DE4C8 (expires: 2006-11-30)
Web: http://www.helgrim.com/ | ICQ: 109837009 | YIM: ectoraige
Visit http://ie.bsd.net/ - BSDs presence in Ireland




Re: filesystem full error with inumber

2006-07-27 Thread Oliver Fromme
Sven Willenberger wrote:
  This was an upgrade from a 5.x system (UFS2); a full fsck did in fact fix the
  problem (for now).

Based on past experience, I recommend that you disable
background fsck (there is a switch for it in /etc/rc.conf).
There are failure scenarios with background fsck that can lead
to symptoms similar to what you have experienced.
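
For reference, the switch in question is the background_fsck
variable, and /etc/rc.conf is plain sh syntax, so disabling it is a
one-line change:

    # /etc/rc.conf
    background_fsck="NO"    # always run fsck in the foreground at boot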

Best regards
   Oliver

-- 
Oliver Fromme,  secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

C++ is the only current language making COBOL look good.
-- Bertrand Meyer


Re: filesystem full error with inumber

2006-07-27 Thread Oliver Fromme
Feargal Reilly [EMAIL PROTECTED] wrote:
  BTW, does anybody know what the best method for double-checking
  df's figures would be? du?

No, du(1) only sees files that have links (i.e. directory
entries).  It doesn't see deleted files that occupy space
as long as processes still have them open, which can make
quite a difference.  You can use the command lsof +L1 to
check for such files.  If there aren't any on the file
system in question, then the number from du(1) should be
pretty close to the number from df(1).

The df(1) tool just displays the summary records from the
file system.  The only safe way to verify those numbers is
to run fsck(8) manually on the file system (possibly twice).
It will fix the summary records if necessary.  Then run
df(1) again.
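
Sketched as commands, using the /data0 file system from the start
of this thread as the example (substitute your own device and mount
point, and note the manual fsck wants the file system unmounted):

    # 1. look for unlinked-but-still-open files that du(1) cannot see
    lsof +L1 | grep /data0

    # 2. compare the two views; they should be close if step 1 is empty
    du -shx /data0
    df -h /data0

    # 3. the authoritative check: fsck fixes stale summary records
    umount /data0
    fsck -f /data0
    mount /data0
    df -h /data0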

Best regards
   Oliver

-- 
Oliver Fromme,  secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

To this day, many C programmers believe that 'strong typing'
just means pounding extra hard on the keyboard.
-- Peter van der Linden


Re: filesystem full error with inumber

2006-07-26 Thread Sven Willenberger


Feargal Reilly presumably uttered the following on 07/24/06 11:48:
 On Mon, 24 Jul 2006 17:14:27 +0200 (CEST)
 Oliver Fromme [EMAIL PROTECTED] wrote:
 
 Nobody else has answered so far, so I'll give it a shot ...

 The "filesystem full" error can happen in three cases:
 1.  The file system is running out of data space.
 2.  The file system is running out of inodes.
 3.  The file system is running out of non-fragmented blocks.

 The third case can only happen on extremely fragmented
 file systems which happens very rarely, but maybe it's
 a possible cause of your problem.
 
 I rebooted that server, and df then reported that disk at 108%,
 so it appears that df was reporting incorrect figures prior to
 the reboot. Having cleaned up, it appears by my best
 calculations to be showing correct figures now.
 
   kern.maxfiles: 2
   kern.openfiles: 3582

 Those have nothing to do with filesystem full.

 
 Yeah, that's what I figured.
 
   Looking again at dumpfs, it appears to say that this is
   formatted with a block size of 8K, and a fragment size of
   2K, but tuning(7) says:  [...]
   Reading this makes me think that when this server was
   installed, the block size was dropped from the 16K default
   to 8K for performance reasons, but the fragment size was
   not modified accordingly.
   
   Would this be the root of my problem?

 I think a bsize/fsize ratio of 4/1 _should_ work, but it's
 not widely used, so there might be bugs hidden somewhere.

 
 Such as df not reporting the actual data usage, which is now my
 best working theory. I don't know what df bases its figures on;
 perhaps it either slowly got out of sync or, more likely, got
 things wrong once the disk filled up.
 
 I'll monitor it to see if this happens again, but hopefully
 won't keep that configuration around for too much longer anyway.
 
 Thanks,
 -fr.
 

One of my machines that I recently upgraded to 6.1 (6.1-RELEASE-p3) is
also exhibiting df reporting wrong data usage numbers. Notice the
negative Used numbers below:

 df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/da0s1a    496M     63M    393M    14%    /
devfs          1.0K    1.0K      0B   100%    /dev
/dev/da0s1e    989M   -132M    1.0G   -14%    /tmp
/dev/da0s1f     15G    478M     14G     3%    /usr
/dev/da0s1d     15G   -1.0G     14G    -8%    /var
/dev/md0       496M    228K    456M     0%    /var/spool/MIMEDefang
devfs          1.0K    1.0K      0B   100%    /var/named/dev

Sven


Re: filesystem full error with inumber

2006-07-26 Thread Peter Jeremy
On Wed, 2006-Jul-26 13:07:19 -0400, Sven Willenberger wrote:
One of my machines that I recently upgraded to 6.1 (6.1-RELEASE-p3) is also
exhibiting df reporting wrong data usage numbers.

What did you upgrade from?
Is this UFS1 or UFS2?
Does a full fsck fix the problem?

-- 
Peter Jeremy




Re: filesystem full error with inumber

2006-07-26 Thread Sven Willenberger


Peter Jeremy presumably uttered the following on 07/26/06 15:00:
 On Wed, 2006-Jul-26 13:07:19 -0400, Sven Willenberger wrote:
 One of my machines that I recently upgraded to 6.1 (6.1-RELEASE-p3) is also
 exhibiting df reporting wrong data usage numbers.
 
 What did you upgrade from?
 Is this UFS1 or UFS2?
 Does a full fsck fix the problem?
 

This was an upgrade from a 5.x system (UFS2); a full fsck did in fact fix the
problem (for now).

Thanks,

Sven


Re: filesystem full error with inumber

2006-07-26 Thread Julian H. Stacey
Sven Willenberger wrote:
 
 
 Feargal Reilly presumably uttered the following on 07/24/06 11:48:
  On Mon, 24 Jul 2006 17:14:27 +0200 (CEST)
  Oliver Fromme [EMAIL PROTECTED] wrote:
  
  Nobody else has answered so far, so I'll give it a shot ...
 
  The "filesystem full" error can happen in three cases:
  1.  The file system is running out of data space.
  2.  The file system is running out of inodes.
  3.  The file system is running out of non-fragmented blocks.
 
  The third case can only happen on extremely fragmented
  file systems which happens very rarely, but maybe it's
  a possible cause of your problem.
  
  I rebooted that server, and df then reported that disk at 108%,
  so it appears that df was reporting incorrect figures prior to
  the reboot. Having cleaned up, it appears by my best
  calculations to be showing correct figures now.
  
kern.maxfiles: 2
kern.openfiles: 3582
 
  Those have nothing to do with filesystem full.
 
  
  Yeah, that's what I figured.
  
Looking again at dumpfs, it appears to say that this is
formatted with a block size of 8K, and a fragment size of
2K, but tuning(7) says:  [...]
Reading this makes me think that when this server was
installed, the block size was dropped from the 16K default
to 8K for performance reasons, but the fragment size was
not modified accordingly.

Would this be the root of my problem?
 
  I think a bsize/fsize ratio of 4/1 _should_ work, but it's
  not widely used, so there might be bugs hidden somewhere.
 
  
  Such as df not reporting the actual data usage, which is now my
  best working theory. I don't know what df bases its figures on;
  perhaps it either slowly got out of sync or, more likely, got
  things wrong once the disk filled up.
  
  I'll monitor it to see if this happens again, but hopefully
  won't keep that configuration around for too much longer anyway.
  
  Thanks,
  -fr.
  
 
 One of my machines that I recently upgraded to 6.1 (6.1-RELEASE-p3) is also
 exhibiting df reporting wrong data usage numbers. Notice the negative Used 
 numbers
 below:

Negative isn't an example of a programming error; it just means the
system is now eating into the reserve that only root can use.

For insight, try for example:
    man tunefs
    reboot
    boot -s
    tunefs -m 2 /dev/da0s1e
then decide what level of -m you want; the default is 8 to 10, I
recall.
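
A sketch of what that looks like, using Sven's /tmp device from the
df output below ("tunefs -p" only prints the current tunables, so it
is safe to run first; the -m change itself wants single-user mode):

    tunefs -p /dev/da0s1e     # show current minfree and other knobs
    tunefs -m 2 /dev/da0s1e   # lower the root-only reserve to 2%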

 
  df -h
 Filesystem     Size    Used   Avail Capacity  Mounted on
 /dev/da0s1a    496M     63M    393M    14%    /
 devfs          1.0K    1.0K      0B   100%    /dev
 /dev/da0s1e    989M   -132M    1.0G   -14%    /tmp
 /dev/da0s1f     15G    478M     14G     3%    /usr
 /dev/da0s1d     15G   -1.0G     14G    -8%    /var
 /dev/md0       496M    228K    456M     0%    /var/spool/MIMEDefang
 devfs          1.0K    1.0K      0B   100%    /var/named/dev
 
 Sven

-- 
Julian Stacey.  Consultant Unix Net & Sys. Eng., Munich.  http://berklix.com
Mail in Ascii, HTML=spam. Ihr Rauch = mein allergischer Kopfschmerz.


Re: filesystem full error with inumber

2006-07-26 Thread Paul Allen
From Julian H. Stacey [EMAIL PROTECTED], Thu, Jul 27, 2006 at 01:45:16AM +0200:
 Negative isn't an example of a programming error; it just means the
 system is now eating into the reserve that only root can use.
 
 For insight, try for example:
     man tunefs
     reboot
     boot -s
     tunefs -m 2 /dev/da0s1e
 then decide what level of -m you want; the default is 8 to 10, I
 recall.
 
  
   df -h
  Filesystem     Size    Used   Avail Capacity  Mounted on
  /dev/da0s1a    496M     63M    393M    14%    /
  devfs          1.0K    1.0K      0B   100%    /dev
  /dev/da0s1e    989M   -132M    1.0G   -14%    /tmp
  /dev/da0s1f     15G    478M     14G     3%    /usr
  /dev/da0s1d     15G   -1.0G     14G    -8%    /var
  /dev/md0       496M    228K    456M     0%    /var/spool/MIMEDefang
  devfs          1.0K    1.0K      0B   100%    /var/named/dev
  
  Sven
Julian: if you look more closely you will see that the
negative numbers appear not in the Avail column but in
Used.  This has nothing to do with the root reserve.

It may have something to do with background fsck, though the
numbers are inconsistent with each other.  Size minus Used
should roughly equal Avail, but only one of the two lines
works out:

989M - (-132M)  =  1121M  ~= the reported 1.0G
15G  - (-1.0G)  =  16G    != the reported 14G



Re: filesystem full error with inumber

2006-07-24 Thread Feargal Reilly
On Mon, 24 Jul 2006 17:14:27 +0200 (CEST)
Oliver Fromme [EMAIL PROTECTED] wrote:

 Nobody else has answered so far, so I'll give it a shot ...
 
 The "filesystem full" error can happen in three cases:
 1.  The file system is running out of data space.
 2.  The file system is running out of inodes.
 3.  The file system is running out of non-fragmented blocks.
 
 The third case can only happen on extremely fragmented
 file systems which happens very rarely, but maybe it's
 a possible cause of your problem.

I rebooted that server, and df then reported that disk at 108%,
so it appears that df was reporting incorrect figures prior to
the reboot. Having cleaned up, it appears by my best
calculations to be showing correct figures now.

   kern.maxfiles: 2
   kern.openfiles: 3582
 
 Those have nothing to do with filesystem full.
 

Yeah, that's what I figured.

   Looking again at dumpfs, it appears to say that this is
   formatted with a block size of 8K, and a fragment size of
   2K, but tuning(7) says:  [...]
   Reading this makes me think that when this server was
   installed, the block size was dropped from the 16K default
   to 8K for performance reasons, but the fragment size was
   not modified accordingly.
   
   Would this be the root of my problem?
 
 I think a bsize/fsize ratio of 4/1 _should_ work, but it's
 not widely used, so there might be bugs hidden somewhere.
 

Such as df not reporting the actual data usage, which is now my
best working theory. I don't know what df bases its figures on;
perhaps it either slowly got out of sync or, more likely, got
things wrong once the disk filled up.

I'll monitor it to see if this happens again, but hopefully
won't keep that configuration around for too much longer anyway.

Thanks,
-fr.

-- 
Feargal Reilly.
PGP Key: 0x847DE4C8 (expires: 2006-11-30)
Web: http://www.helgrim.com/ | ICQ: 109837009 | YIM: ectoraige
Visit http://ie.bsd.net/ - BSDs presence in Ireland




filesystem full error with inumber

2006-07-21 Thread Feargal Reilly

The following error is being logged in /var/log/messages on
FreeBSD 5.4:

Jul 21 09:58:44 arwen kernel: pid 615 (postgres), uid 1001
inumber 6166128 on /data0: filesystem full

However, this does not appear to be a case of being out of disk
space, or running out of inodes:

ttyp2$ df -hi
Filesystem      Size   Used  Avail Capacity    iused   ifree %iused  Mounted on
/dev/amrd0s1f    54G    44G   5.4G    89%    4104458 3257972   56%   /data0

Nor does it appear to be a file limit:

ttyp2$ sysctl kern.maxfiles kern.openfiles
kern.maxfiles: 2
kern.openfiles: 3582

These readings were not taken at exactly the same time as the
error occurred, but close to it.

Here's the head of dumpfs:

magic   19540119 (UFS2) time    Fri Jul 21 09:38:40 2006
superblock location 65536       id      [ 42446884 99703062 ]
ncg     693     size    29360128        blocks  28434238
bsize   8192    shift   13      mask    0xffffe000
fsize   2048    shift   11      mask    0xfffff800
frag    4       shift   2       fsbtodb 2
minfree 8%      optim   time    symlinklen 120
maxbsize 8192   maxbpg  1024    maxcontig 16    contigsumsize 16
nbfree  563891  ndir    495168  nifree  3245588 nffree  19898
bpg     10597   fpg     42388   ipg     10624
nindir  1024    inopb   32      maxfilesize 8804691443711
sbsize  2048    cgsize  8192    csaddr  1372    cssize  12288
sblkno  36      cblkno  40      iblkno  44      dblkno  1372
cgrotor 322     fmod    0       ronly   0       clean   0
avgfpdir 64     avgfilesize 16384
flags   soft-updates
fsmnt   /data0
volname         swuid   0

Now, the server's main function in life is running postgres.
I first noticed this error during a maintenance run which
sequentially dumps and vacuums each individual database.
There are currently 117 databases, most of which are no more
than 20M in size, but there are a few outliers, the largest of
which is 792M. The bulk of that is stored in a single 500+M
file, so I can't see this consuming all my inodes, even if
soft-updates weren't cleaning up; but perhaps I'm wrong. It has
since been happening outside of those runs as well.

I have searched through various forums and list archives, and
while I have found a few references to this error, I have not
been able to find a cause and subsequent solution posted.

Looking through the source, the error is logged by ffs_fserr()
in sys/ufs/ffs/ffs_alloc.c. It is called by either ffs_alloc()
or ffs_realloccg() when either of the following conditions is
reached:

ffs_alloc {
...
retry:
    if (size == fs->fs_bsize && fs->fs_cstotal.cs_nbfree == 0)
        goto nospace;
    if (freespace(fs, fs->fs_minfree) - numfrags(fs, size) < 0)
        goto nospace;
...
nospace:
    if (fs->fs_pendingblocks > 0 && reclaimed == 0) {
        reclaimed = 1;
        softdep_request_cleanup(fs, ITOV(ip));
        goto retry;
    }
    ffs_fserr(fs, ip->i_number, "filesystem full");
}

My uninformed and uneducated reading of this is that it does not
think there are enough blocks free, yet that does not tally with
what df is telling me.

Looking again at dumpfs, it appears to say that this is formatted
with a block size of 8K, and a fragment size of 2K, but
tuning(7) says:

 FreeBSD performs best when using 8K or 16K file system
block sizes.  The default file system block size is 16K, which
provides best performance for most applications, with the
exception of those that perform random access on large files
(such as database server software).  Such applications tend to
perform better with a smaller block size, although modern disk
characteristics are such that the performance gain from using a
smaller block size may not be worth consideration.  Using a
block size larger than 16K can cause fragmentation of the buffer
cache and lead to lower performance.

 The defaults may be unsuitable for a file system that
requires a very large number of i-nodes or is intended to hold a
large number of very small files.  Such a file system should be
created with an 8K or 4K block size.  This also requires you to
specify a smaller fragment size.  We recommend always using a
fragment size that is 1/8 the block size (less testing has been
done on other fragment size factors).

Reading this makes me think that when this server was installed,
the block size was dropped from the 16K default to 8K for
performance reasons, but the fragment size was not modified
accordingly.

Would this be the root of my problem? If so, is my only option
to back everything up and newfs the disk, or is there something
else I can do that will minimise my downtime?
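
For the record, the dump/newfs/restore sequence I have in mind looks
roughly like this, with /usr/scratch standing in for wherever I can
find ~44G to hold the dump (the newfs flags follow tuning(7)'s advice
of a fragment size 1/8 the block size):

    # take a level-0 dump of the live file system (-L snapshots it)
    dump -0Laf /usr/scratch/data0.dump /data0

    # recreate the file system with 8K blocks and matching 1K fragments
    umount /data0
    newfs -b 8192 -f 1024 /dev/amrd0s1f

    # mount it again and pull everything back
    mount /data0
    cd /data0 && restore -rf /usr/scratch/data0.dump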

Any help and advice would be greatly appreciated.

-Feargal.

-- 
Feargal Reilly, Chief Techie, FBI.
PGP Key: 0x105D7168 (expires: 2006-11-30)
Web: http://www.fbi.ie/ | Tel: +353.14988588 | Fax: +353.14988489
Communications House, 11 Sallymount Avenue, Ranelagh, Dublin 6.

