Re: SU+J systems do not fsck themselves

2011-12-29 Thread David Thiel
On Wed, Dec 28, 2011 at 12:57:31AM -0700, Scott Long wrote:
 So, there's an assumption with SUJ+fsck that SU is keeping the filesystem 
 consistent.  Maybe that's a bad assumption, and I'm not trying to discredit 
 your report.  But the intention with SUJ is to eliminate the need for 
 anything more than a cursory check of the superblocks and a processing of the 
 SUJ intent log.  If either of these fails then fsck reverts to a traditional 
 scan.  In the same vein, ext3 and most other traditional journaling 
 filesystems assume that the journal is correct and is preserving consistency, 
 and don't do anything more than a cursory data structure scan and journal 
 replay as well, but then revert to a full scan if that fails (zfs seems to be 
 an exception here, with there being no actual fsck available for it).
 
 As for the 180 day forced scan on ext3, I have no public comment.  SU has 
 matured nicely over the last 10+ years, and I'm happy with the progress that 
 SUJ has made in the last 2-3 years.  If there are bugs, they need to be 
 exposed and addressed ASAP.
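
The fallback described in the quoted text can be sketched as a small decision function. This is only a toy model of the behaviour as explained in the thread, not code from fsck_ffs; the names `Action` and `choose_fsck_action` are hypothetical.

```python
from enum import Enum


class Action(Enum):
    JOURNAL_ONLY = "journal recovery only"
    FULL_SCAN = "traditional full scan"


def choose_fsck_action(superblock_ok: bool, journal_replay_ok: bool) -> Action:
    """Toy model of the SUJ fallback described above (hypothetical names)."""
    # A cursory superblock check plus processing of the intent log is
    # considered sufficient only when both steps succeed.
    if superblock_ok and journal_replay_ok:
        return Action.JOURNAL_ONLY
    # If either step fails, revert to the traditional scan.
    return Action.FULL_SCAN
```

In this model nothing triggers a full scan when the journal replay succeeds but the filesystem is still inconsistent, which is the case the thread is worried about.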

That clears things up somewhat - thank you for taking the time to 
explain all that. I've got results from two other users (Cc'd) with a 
fsck in single user mode using the journal and not using it. One has 
geli, one does not, and both were with clean shutdown/boot (correct me 
if I'm wrong, guys). Any thoughts?

=
Machine 1, with journal:
=

Script started on Thu Dec 29 11:26:29 2011
fsck /
** /dev/ada0.eli

USE JOURNAL? [yn] y

** SU+J Recovering /dev/ada0.eli
** Reading 33554432 byte journal from inode 4.

RECOVER? [yn] y

** Building recovery table.
** Resolving unreferenced inode list.
** Processing journal entries.

WRITE CHANGES? [yn] y

** 108 journal records in 49152 bytes for 7.03% utilization
** Freed 9 inodes (0 dirs) 0 blocks, and 1 frags.

* FILE SYSTEM MARKED CLEAN *

Script done on Thu Dec 29 11:26:39 2011

=
Machine 1, without journal:
=

Script started on Thu Dec 29 11:26:49 2011
fsck /
** /dev/ada0.eli

USE JOURNAL? [yn] n

** Skipping journal, falling through to full fsck

** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
INCORRECT BLOCK COUNT I=251177 (8 should be 0)
CORRECT? [yn] y

** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
220435 files, 3945055 used, 3666151 free (17503 frags, 456081 blocks, 0.2% fragmentation)

* FILE SYSTEM IS CLEAN *

* FILE SYSTEM WAS MODIFIED *

Script done on Thu Dec 29 11:27:08 2011


=
Machine 2, with journal:
=

** /dev/ada0s1a

USE JOURNAL? yes

** SU+J Recovering /dev/ada0s1a
** Reading 33554432 byte journal from inode 4.

RECOVER? yes

** Building recovery table.
** Resolving unreferenced inode list.
** Processing journal entries.

WRITE CHANGES? yes

** 131 journal records in 11776 bytes for 35.60% utilization
** Freed 0 inodes (0 dirs) 0 blocks, and 0 frags.

* FILE SYSTEM MARKED CLEAN *
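
A note on the "utilization" figure in the two journal runs: both percentages reproduce if each journal record is assumed to occupy a 32-byte slot. That slot size is inferred from the numbers in these logs, not stated anywhere in the thread, so treat the sketch below as arithmetic under that assumption.

```python
# Assumed size of one journal record slot, inferred from the two logs.
RECORD_SLOT_BYTES = 32


def journal_utilization(records: int, journal_bytes: int) -> float:
    """Percentage of record slots in the scanned bytes that hold a record."""
    slots = journal_bytes // RECORD_SLOT_BYTES
    return 100.0 * records / slots


# Machine 1: 108 records in 49152 bytes
print(f"{journal_utilization(108, 49152):.2f}%")  # 7.03%
# Machine 2: 131 records in 11776 bytes
print(f"{journal_utilization(131, 11776):.2f}%")  # 35.60%
```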

=
Machine 2, without journal:
=

** /dev/ada0s1a
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? [yn] 
SUMMARY INFORMATION BAD
SALVAGE? [yn] 
BLK(S) MISSING IN BIT MAPS
SALVAGE? [yn] 
670213 files, 19118534 used, 54535063 free (158431 frags, 6797079 blocks, 0.2% fragmentation)

* FILE SYSTEM MARKED CLEAN *

* FILE SYSTEM WAS MODIFIED *
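
For anyone collecting more data points, a rough way to compare transcripts is to scan the full-fsck output for lines that indicate a repair. The sketch below assumes the transcript is available as text (for example from script(1)); the marker list is taken only from the messages in the logs above and is not an exhaustive set of fsck messages, and the function name is hypothetical.

```python
import re
from typing import List

# Markers seen in the transcripts in this thread (illustrative only).
PROBLEM_PATTERNS = [
    r"FILE SYSTEM WAS MODIFIED",
    r"SALVAGE\?",
    r"CORRECT\?",
    r"INCORRECT BLOCK COUNT",
]


def full_fsck_repairs(transcript: str) -> List[str]:
    """Return the transcript lines that suggest the full fsck fixed something."""
    hits = []
    for line in transcript.splitlines():
        if any(re.search(pattern, line) for pattern in PROBLEM_PATTERNS):
            hits.append(line.strip())
    return hits


# Excerpt of the Machine 1 run without the journal.
machine1_full = """\
** Phase 1 - Check Blocks and Sizes
INCORRECT BLOCK COUNT I=251177 (8 should be 0)
CORRECT? [yn] y
** Phase 5 - Check Cyl groups
* FILE SYSTEM WAS MODIFIED *
"""

for hit in full_fsck_repairs(machine1_full):
    print(hit)
```

An empty result after a journal recovery would mean the full pass had nothing left to fix; any hits mean the journal pass marked the filesystem clean while the full pass still found work to do.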

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: SU+J systems do not fsck themselves

2011-12-29 Thread David Thiel
On Thu, Dec 29, 2011 at 03:02:14PM -0800, David Thiel wrote:
 =
 Machine 1, with journal:
 =
 
 Script started on Thu Dec 29 11:26:29 2011
 fsck /
 ** /dev/ada0.eli

Correction - machine 1 had an unclean shutdown. Will get additional logs 
soon.


SU+J systems do not fsck themselves

2011-12-27 Thread David Thiel
I've had multiple machines now (9.0-RC3, amd64, i386 and earlier 
9-CURRENT on ppc) running SU+J that have had unexplained panics and 
crashes start happening relating to disk I/O. When I end up running a 
full fsck, it keeps turning out that the disk is dirty and corrupted, 
but no mechanism is in place with SU+J to detect and fix this. A bgfsck 
never happens, but a manual fsck in single-user does indeed fix the 
crashing and weird behavior. Others have tested their SU+J volumes and 
found them to have errors as well. This makes me super nervous.

Basically, the way SU+J seems to operate is this:

http://redundancy.redundancy.org/fscklog2

Oh hey, I see you shut down uncleanly, let's check everything looks 
good, off you go, whee

Until I actually go and fsck, when I get:

http://redundancy.redundancy.org/fscklog1

So, I understand that journalling doesn't replace the need for a 
potential fsck (though I never had this problem with gjournal), but 
without a way for the system to detect that a fsck is necessary, this 
seems pretty much a guaranteed recipe for data corruption, and seems to 
offer little to no benefit over plain SU+fsck, or even just mounting 
async.

So: is everyone else seeing this? Am I misunderstanding how SU+J should 
be used? How should the error resolution process really happen? 

Thanks,
David


Re: SU+J systems do not fsck themselves

2011-12-27 Thread David Thiel
On Tue, Dec 27, 2011 at 02:29:03PM -0800, Xin LI wrote:
 I'm not sure if your experiments are right here, the second log shows
 you're running it read-only, which is likely caused by running it on
 live file system.  

Yes, this most recent instance is me running it on a live FS, because 
I'm using that machine to type this right now. :) However, I've had the 
issues fixed in single-user on other systems and had the problems go 
away. At least for a bit.

 - use journalled fsck;
 - use normal fsck to check if the journalled fsck did the right thing.

When you say use journalled fsck, what's the proper way to initiate 
that? I don't see any journal-related options in the man page.


Re: SU+J systems do not fsck themselves

2011-12-27 Thread David Thiel
On Tue, Dec 27, 2011 at 02:48:22PM -0800, Xin Li wrote:
  - use journalled fsck; - use normal fsck to check if the
  journalled fsck did the right thing.

Ok, here is the log of fsck with and without journal.

http://redundancy.redundancy.org/fscklog3

That was done the very next boot, after a clean shutdown. The errors 
from the previous live fsck aren't there (oddly), but apparently some 
corrections are still made. The next fsck still complains, but 
doesn't give any salvage prompts.

Here is jsa@'s, done on a live FS with SU+J:

http://redundancy.redundancy.org/fscklog4

I'm not actually looking to solve my particular problem per se. The 
issue is that almost everyone I've checked with that's running SU+J gets 
unref'd file and other errors when they check their filesystem (with the 
fs live). Unless I'm missing something, a running FS should never have 
those kinds of errors unless you deliberately disabled fsck.

This leaves only a couple options:

- SU+J and fsck do not work correctly together to fix corruption on 
  boot, i.e. bgfsck isn't getting run when it should
- Stuff is getting completely screwed up after boot
- fsck is giving incorrect results
- I'm completely clueless about how SU+J is supposed to behave or be 
  deployed

I'm pretty certain that the first is the issue here. It would be great 
if others could check their own SU+J filesystems so we could get a few 
more data points.



Re: SU+J systems do not fsck themselves

2011-12-27 Thread David Thiel
On Tue, Dec 27, 2011 at 11:54:20PM -0700, Scott Long wrote:
 The first run of fsck, using the journal, gives results that I would 
 expect.  The second run seems to imply that the fixes made on the 
 first run didn't actually get written to disk.  This is definitely an 
 oddity.  I see that you're using geli, maybe there's some strange 
 side-effect there.  No idea.  Report as a bug, this is definitely 
 undesired behavior.

Not impossible, but I was seeing similar issues on two non-geli systems 
as well, i.e. tons of errors fixed when doing a single-user 
non-journalled fsck, but journalled fsck not fixing stuff. I'll try to 
replicate on a test machine, as I already lost data on the last 
(non-geli) machine this happened to.

 For the love that is all good and holy, don't ever run fsck on a live 
 filesystem.  It's going to report these kinds of problems!  It's 
 normal; filesystem metadata updates stay cached in memory, and fsck 
 bypasses that cache.  

Ok. I expected fsck would be softupdate-aware in that way, but I 
understand it not doing so.
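
To make the caching point concrete, here is a toy model (not UFS code; every name in it is made up for illustration) of two on-disk structures that must agree, with updates held in memory and flushed at different times. A checker that reads the disk directly can see a mismatch that the running kernel does not have.

```python
class ToyFilesystem:
    """Toy model: metadata lives in memory first and reaches disk later."""

    def __init__(self):
        # Two on-disk structures that must agree: a bitmap of used
        # blocks and a summary count of free blocks.
        self.disk_bitmap = [False] * 8
        self.disk_free_count = 8
        # Cached copies that the running kernel works from.
        self.mem_bitmap = list(self.disk_bitmap)
        self.mem_free_count = self.disk_free_count

    def allocate(self, block: int) -> None:
        # Both cached structures are updated together.
        self.mem_bitmap[block] = True
        self.mem_free_count -= 1

    def flush_bitmap_only(self) -> None:
        # Only one of the two structures has reached the disk so far.
        self.disk_bitmap = list(self.mem_bitmap)

    def flush_all(self) -> None:
        # Everything cached is written out (as after a clean unmount).
        self.disk_bitmap = list(self.mem_bitmap)
        self.disk_free_count = self.mem_free_count


def raw_disk_check(fs: ToyFilesystem) -> bool:
    """Stand-in for a checker that reads the disk and bypasses the cache."""
    return fs.disk_bitmap.count(False) == fs.disk_free_count


fs = ToyFilesystem()
fs.allocate(3)
fs.flush_bitmap_only()
print(raw_disk_check(fs))  # False: disk is mid-update, memory is consistent
fs.flush_all()
print(raw_disk_check(fs))  # True
```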

  - SU+J and fsck do not work correctly together to fix corruption on 
  boot, i.e. bgfsck isn't getting run when it should
 
 The point of SUJ is to eliminate the need for bgfsck.  Effectively, 
 they are exclusive ideas.  

This is surprising to me. It is my impression that under Linux at least, 
ext3fs is checked against the journal, and gets a full e2fsck if it 
finds it's still dirty. Additionally, there's a periodic fsck once 180 
days have passed since the last check or after x number of mounts (see 
tune2fs -i and -c).  
Is SU+J somehow implemented in such a way that this is unnecessary? What 
does it do that the ext3fs people have missed?
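
As a rough sketch of the ext3-style policy described here: the function below is a hypothetical model, not e2fsck code. The 180-day default comes from the text above; the mount limit is left as a required argument because no value is given, and counting the interval from the last check follows what tune2fs -i documents.

```python
def needs_full_check(still_dirty_after_replay: bool,
                     days_since_last_check: int,
                     mounts_since_last_check: int,
                     max_mount_count: int,
                     max_interval_days: int = 180) -> bool:
    """Hypothetical model of when a full check would be forced at boot."""
    # Journal replay did not leave the filesystem clean.
    if still_dirty_after_replay:
        return True
    # Periodic check by elapsed time (compare tune2fs -i).
    if days_since_last_check >= max_interval_days:
        return True
    # Periodic check by number of mounts (compare tune2fs -c).
    if mounts_since_last_check >= max_mount_count:
        return True
    return False
```

The periodic branches are what the SU+J setup in this thread lacks: they force a full pass even when the journal replay reported success.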



Re: TR : IPFilter

2003-02-09 Thread David Thiel
On Sun, Feb 09, 2003 at 07:42:42PM +0100, Coercitas Temet'Nosce wrote:
 Hello all,
 
 I was just wondering something regarding IPFilter and new FreeBSD 5.0
 
 First, I was looking for IPF related functions in new Kernel building,
 didn't found them anywhere.maybe I did something wrong but not likely.
 Is it
 now a non kernel related application ?

The kernel options have moved.  Options that aren't platform specific
are in /usr/src/sys/conf/NOTES, and the IPFILTER options are there.

 Btw, I was looking for some docs on the FreeBSD website and didn't
 found anything interesting, only firewall that FreeBSD seems to
 support nowadays is the old IPFW, which is quite obsolete now
 imo. Why are documentation pages not dealing with IPF at all ?
 is there any reason ?

There's no real need for them.  Just compile the kernel with the
appropriate options and there's plenty of docs on IPF that can
tell you the rest.


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



new wi driver problems

2003-01-18 Thread David Thiel
A couple things regarding this new wireless driver - the
wepkey option to ifconfig no longer seems to work; I get a 
SIOCS80211: Invalid argument.  Secondly and more importantly,
even when the wepkey is set via wicontrol, I can't seem to get 
any connectivity at all anymore.

ifconfig wi0:

flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
inet6 fe80::202:2dff:fe0c:ec4b%wi0 prefixlen 64 scopeid 0x3
inet 10.0.0.2 netmask 0xff00 broadcast 10.0.0.255
ether 00:02:2d:0c:ec:4b
media: IEEE 802.11 Wireless Ethernet autoselect (DS/2Mbps)
status: associated
ssid myssid 1:myssid
stationname FreeBSD WaveLAN/IEEE node
channel 7 authmode OPEN powersavemode OFF powersavesleep 100
wepmode MIXED weptxkey 1
wepkey 1:128-bit

dmesg:

wi0: WaveLAN/IEEE at port 0x100-0x13f irq 11 function 0 config 1 on pccard0
wi0: 802.11 address: 00:02:2d:0c:ec:4b
wi0: using Lucent Technologies, WaveLAN/IEEE
wi0: Lucent Firmware: Station (7.52.1)
wi0: supported rates: 1Mbps 2Mbps 5.5Mbps 11Mbps

uname:

FreeBSD sartre.redundancy.org 5.0-CURRENT FreeBSD 5.0-CURRENT #5: Fri Jan 17 12:15:30 PST 2003  root@:/usr/obj/user/src/sys/SARTRE  i386

But I'm unable to ping my gateway, a -STABLE box with the same card.  I
did recompile with device wlan, and tried the generic kernel as well. 
Disabling WEP has no effect.

Could someone give me a pointer as to how to debug this?

Thanks,
David

