Re: SU+J systems do not fsck themselves

2011-12-29 Thread Scott Long

On Dec 29, 2011, at 4:02 PM, David Thiel wrote:

> On Wed, Dec 28, 2011 at 12:57:31AM -0700, Scott Long wrote:
>> So, there's an assumption with SUJ+fsck that SU is keeping the filesystem 
>> consistent.  Maybe that's a bad assumption, and I'm not trying to discredit 
>> your report.  But the intention with SUJ is to eliminate the need for 
>> anything more than a cursory check of the superblocks and a processing of 
>> the SUJ intent log.  If either of these fails then fsck reverts to a 
>> traditional scan.  In the same vein, ext3 and most other traditional 
>> journaling filesystems assume that the journal is correct and is preserving 
>> consistency, and don't do anything more than a cursory data structure scan 
>> and journal replay as well, but then revert to a full scan if that fails 
>> (zfs seems to be an exception here, with there being no actual fsck 
>> available for it).
>> 
>> As for the 180 day forced scan on ext3, I have no public comment.  SU has 
>> matured nicely over the last 10+ years, and I'm happy with the progress that 
>> SUJ has made in the last 2-3 years.  If there are bugs, they need to be 
>> exposed and addressed ASAP.
> 
> That clears things up somewhat - thank you for taking the time to 
> explain all that. I've got results from two other users (Cc'd) with a 
> fsck in single user mode using the journal and not using it. One has 
> geli, one does not, and both were with clean shutdown/boot (correct me 
> if I'm wrong, guys). Any thoughts?

Below is the transcript of my simple experiment with an intentional unclean 
shutdown with an unlinked file held open.  The machine was idle with nothing of 
any significance installed (it is a driver development box).  I created a file 
and opened it in vi, then deleted it from another vty and power-cycled the 
machine.  Everything looks as correct and normal as I would expect.  The 
/usr and /var filesystems also checked out normal.

My system sources are from mid-November, maybe earlier.

# fsck /
** /dev/ada0p2

USE JOURNAL? [yn] y

** SU+J Recovering /dev/ada0p2
** Reading 8388608 byte journal from inode 4.

RECOVER? [yn] y

** Building recovery table.
** Resolving unreferenced inode list.
** Processing journal entries.

WRITE CHANGES? [yn] y

** 5 journal records in 2048 bytes for 7.81% utilization
** Freed 1 inodes (0 dirs) 0 blocks, and 0 frags.

* FILE SYSTEM MARKED CLEAN *
# fsck /
** /dev/ada0p2

USE JOURNAL? [yn] n

** Skipping journal, falling through to full fsck

** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
676 files, 41820 used, 216107 free (787 frags, 26915 blocks, 0.3% fragmentation)

* FILE SYSTEM IS CLEAN *

* FILE SYSTEM WAS MODIFIED *
# 

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: SU+J systems do not fsck themselves

2011-12-29 Thread David Thiel
On Thu, Dec 29, 2011 at 03:02:14PM -0800, David Thiel wrote:
> =
> Machine 1, with journal:
> =
> 
> Script started on Thu Dec 29 11:26:29 2011
> fsck /
> ** /dev/ada0.eli

Correction - machine 1 had an unclean shutdown. Will get additional logs 
soon.


Re: SU+J systems do not fsck themselves

2011-12-29 Thread David Thiel
On Wed, Dec 28, 2011 at 12:57:31AM -0700, Scott Long wrote:
> So, there's an assumption with SUJ+fsck that SU is keeping the filesystem 
> consistent.  Maybe that's a bad assumption, and I'm not trying to discredit 
> your report.  But the intention with SUJ is to eliminate the need for 
> anything more than a cursory check of the superblocks and a processing of the 
> SUJ intent log.  If either of these fails then fsck reverts to a traditional 
> scan.  In the same vein, ext3 and most other traditional journaling 
> filesystems assume that the journal is correct and is preserving consistency, 
> and don't do anything more than a cursory data structure scan and journal 
> replay as well, but then revert to a full scan if that fails (zfs seems to be 
> an exception here, with there being no actual fsck available for it).
> 
> As for the 180 day forced scan on ext3, I have no public comment.  SU has 
> matured nicely over the last 10+ years, and I'm happy with the progress that 
> SUJ has made in the last 2-3 years.  If there are bugs, they need to be 
> exposed and addressed ASAP.

That clears things up somewhat - thank you for taking the time to 
explain all that. I've got results from two other users (Cc'd) with a 
fsck in single user mode using the journal and not using it. One has 
geli, one does not, and both were with clean shutdown/boot (correct me 
if I'm wrong, guys). Any thoughts?

=
Machine 1, with journal:
=

Script started on Thu Dec 29 11:26:29 2011
fsck /
** /dev/ada0.eli

USE JOURNAL? [yn] y

** SU+J Recovering /dev/ada0.eli
** Reading 33554432 byte journal from inode 4.

RECOVER? [yn] y

** Building recovery table.
** Resolving unreferenced inode list.
** Processing journal entries.

WRITE CHANGES? [yn] y

** 108 journal records in 49152 bytes for 7.03% utilization
** Freed 9 inodes (0 dirs) 0 blocks, and 1 frags.

* FILE SYSTEM MARKED CLEAN *

Script done on Thu Dec 29 11:26:39 2011

=
Machine 1, without journal:
=

Script started on Thu Dec 29 11:26:49 2011
fsck /
** /dev/ada0.eli

USE JOURNAL? [yn] n

** Skipping journal, falling through to full fsck

** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
INCORRECT BLOCK COUNT I=251177 (8 should be 0)
CORRECT? [yn] y

** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
220435 files, 3945055 used, 3666151 free (17503 frags, 456081 blocks, 0.2% fragmentation)

* FILE SYSTEM IS CLEAN *

* FILE SYSTEM WAS MODIFIED *

Script done on Thu Dec 29 11:27:08 2011


=
Machine 2, with journal:
=

** /dev/ada0s1a

USE JOURNAL? yes

** SU+J Recovering /dev/ada0s1a
** Reading 33554432 byte journal from inode 4.

RECOVER? yes

** Building recovery table.
** Resolving unreferenced inode list.
** Processing journal entries.

WRITE CHANGES? yes

** 131 journal records in 11776 bytes for 35.60% utilization
** Freed 0 inodes (0 dirs) 0 blocks, and 0 frags.

* FILE SYSTEM MARKED CLEAN *

=
Machine 2, without journal:
=

** /dev/ada0s1a
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? [yn] 
SUMMARY INFORMATION BAD
SALVAGE? [yn] 
BLK(S) MISSING IN BIT MAPS
SALVAGE? [yn] 
670213 files, 19118534 used, 54535063 free (158431 frags, 6797079 blocks, 0.2% fragmentation)

* FILE SYSTEM MARKED CLEAN *

* FILE SYSTEM WAS MODIFIED *
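
A note on reading the "utilization" figures in these transcripts: the percentages are consistent with each journal record occupying 32 bytes, so that utilization is records divided by the number of 32-byte slots in the bytes scanned. The 32-byte record size is an inference from the numbers in the logs and is not stated anywhere in this thread; the function name below is hypothetical. A minimal sketch of the arithmetic:

```python
# Assumption: each SU+J journal record occupies 32 bytes. This is inferred
# from the fsck output quoted in this thread, not stated by anyone in it.
ASSUMED_RECORD_SIZE = 32


def journal_utilization(records: int, journal_bytes: int) -> float:
    """Percentage of 32-byte slots in the scanned journal bytes that hold records."""
    slots = journal_bytes // ASSUMED_RECORD_SIZE
    return 100.0 * records / slots


# (records, bytes) pairs taken from the fsck transcripts in this thread.
for records, journal_bytes in [(5, 2048), (108, 49152), (131, 11776)]:
    pct = journal_utilization(records, journal_bytes)
    print(f"{records} records in {journal_bytes} bytes: {pct:.2f}%")
```

Under that assumption the three pairs reproduce the 7.81%, 7.03% and 35.60% that fsck printed, which suggests the figure describes how densely the scanned journal segments were filled rather than how much of the whole 8 MB or 32 MB journal was in use.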



Re: SU+J systems do not fsck themselves

2011-12-28 Thread Lev Serebryakov
Hello, Mdf.
You wrote on 28 December 2011, 23:14:19:


> Not required by SU as they use an explicit BIO_FLUSH which should be
> handled by the driver.
  No, they don't. It was discussed here about a month ago.

-- 
// Black Lion AKA Lev Serebryakov 



Re: SU+J systems do not fsck themselves

2011-12-28 Thread Kostik Belousov
On Wed, Dec 28, 2011 at 11:14:19AM -0800, m...@freebsd.org wrote:
> SU doesn't care about write ordering, as long as everything before a
> BIO_FLUSH is really flushed by the time the BIO_FLUSH is acknowledged.
No.

SU and SU+J require only that the write-completed notification be issued
once the geom/driver layer guarantees that the block content has been
written to stable storage.

SU does not depend on writes not being reordered in any way, nor does
it issue BIO_FLUSH.




Re: SU+J systems do not fsck themselves

2011-12-28 Thread mdf
On Wed, Dec 28, 2011 at 8:54 AM, Maxim Khitrov wrote:
> On Wed, Dec 28, 2011 at 11:42 AM, Matthias Andree wrote:
>> Am 27.12.2011 22:53, schrieb David Thiel:
>>> I've had multiple machines now (9.0-RC3, amd64, i386 and earlier
>>> 9-CURRENT on ppc) running SU+J that have had unexplained panics and
>>> crashes start happening relating to disk I/O. When I end up running a
>>> full fsck, it keeps turning out that the disk is dirty and corrupted,
>>> but no mechanism is in place with SU+J to detect and fix this. A bgfsck
>>> never happens, but a manual fsck in single-user does indeed fix the
>>> crashing and weird behavior. Others have tested their SU+J volumes and
>>> found them to have errors as well. This makes me super nervous.
>>
>> The one thing I figured is that in the light of power outages, or
>> crashing virtualization hosts, you really really really need to disable
>> disk write caches, and this affects softupdates, journalling, asynch
>> file systems, just about everything.
>>
>> The fact that makes matters worse is that journalling or softupdates
>> allow you to mount a silently-corrupted file system, whereas the
>> traditional UFS/UFS2 sync/asynch mounts will fsck themselves in the
>> foreground, so they get fixed before the FS panics.
>>
>> So can you be sure that:
>>
>> - your driver, chip set and hard disk execute ordered writes in order,

If they don't, it's a bug.  Not that there isn't buggy firmware out
there, but each layer of software does need to rely on the one below
actually doing what it's promised.

>> - your driver, chip set and hard disk actually write data to permanent
>> storage BEFORE acknowledging a successful write?

Not required by SU as they use an explicit BIO_FLUSH which should be
handled by the driver.

>> Whenever I fixed these issues, I had no more corruptions.
>>
>> For ata and sata, there are loader tunables you will want to set,
>> hw.ata.wc=0 and kern.cam.ada.write_cache=0.

This should not be necessary if the driver and firmware are not buggy.

>> If your drives are under ada, ad, or ahci related control, try these
>> settings.  For SCSI, use camcontrol to turn the write cache off.
>> softupdates is supposed to rectify most of the performance penalties
>> incurred.
>>
>> Note also that you needed to set ahci_load=YES and atapicam_load=YES in
>> 8.X, I've never bothered to check 7.X or 9.X WRT these settings.
>
> This is a bit off-topic, but I'm curious what the effect of NCQ is on
> softupdates? Since that too has the ability to reorder writes to disk,
> should it be disabled in addition to cache?

SU doesn't care about write ordering, as long as everything before a
BIO_FLUSH is really flushed by the time the BIO_FLUSH is acknowledged.

Cheers,
matthew

> Also, I would say that if you are using a hardware raid controller
> with a BBU, then allowing the use of controller's cache and write-back
> policy should be safe for use with softupdates. Any caching mechanism,
> for that matter, that has a separate power supply source should be ok.
> For example, the Intel 320 SSDs have a few on-board capacitors that
> are used to flush the cache in the event of a power loss.
>
> - Max


Re: SU+J systems do not fsck themselves

2011-12-28 Thread Maxim Khitrov
On Wed, Dec 28, 2011 at 11:42 AM, Matthias Andree wrote:
> Am 27.12.2011 22:53, schrieb David Thiel:
>> I've had multiple machines now (9.0-RC3, amd64, i386 and earlier
>> 9-CURRENT on ppc) running SU+J that have had unexplained panics and
>> crashes start happening relating to disk I/O. When I end up running a
>> full fsck, it keeps turning out that the disk is dirty and corrupted,
>> but no mechanism is in place with SU+J to detect and fix this. A bgfsck
>> never happens, but a manual fsck in single-user does indeed fix the
>> crashing and weird behavior. Others have tested their SU+J volumes and
>> found them to have errors as well. This makes me super nervous.
>
> The one thing I figured is that in the light of power outages, or
> crashing virtualization hosts, you really really really need to disable
> disk write caches, and this affects softupdates, journalling, asynch
> file systems, just about everything.
>
> The fact that makes matters worse is that journalling or softupdates
> allow you to mount a silently-corrupted file system, whereas the
> traditional UFS/UFS2 sync/asynch mounts will fsck themselves in the
> foreground, so they get fixed before the FS panics.
>
> So can you be sure that:
>
> - your driver, chip set and hard disk execute ordered writes in order,
>
> - your driver, chip set and hard disk actually write data to permanent
> storage BEFORE acknowledging a successful write?
>
> Whenever I fixed these issues, I had no more corruptions.
>
> For ata and sata, there are loader tunables you will want to set,
> hw.ata.wc=0 and kern.cam.ada.write_cache=0.
>
> If your drives are under ada, ad, or ahci related control, try these
> settings.  For SCSI, use camcontrol to turn the write cache off.
> softupdates is supposed to rectify most of the performance penalties
> incurred.
>
> Note also that you needed to set ahci_load=YES and atapicam_load=YES in
> 8.X, I've never bothered to check 7.X or 9.X WRT these settings.

This is a bit off-topic, but I'm curious what the effect of NCQ is on
softupdates? Since that too has the ability to reorder writes to disk,
should it be disabled in addition to cache?

Also, I would say that if you are using a hardware RAID controller
with a BBU, then allowing the use of the controller's cache and a
write-back policy should be safe with softupdates. For that matter, any
caching mechanism that has a separate power source should be OK.
For example, the Intel 320 SSDs have a few on-board capacitors that
are used to flush the cache in the event of a power loss.

- Max


Re: SU+J systems do not fsck themselves

2011-12-28 Thread Matthias Andree
Am 27.12.2011 22:53, schrieb David Thiel:
> I've had multiple machines now (9.0-RC3, amd64, i386 and earlier 
> 9-CURRENT on ppc) running SU+J that have had unexplained panics and 
> crashes start happening relating to disk I/O. When I end up running a 
> full fsck, it keeps turning out that the disk is dirty and corrupted, 
> but no mechanism is in place with SU+J to detect and fix this. A bgfsck 
> never happens, but a manual fsck in single-user does indeed fix the 
> crashing and weird behavior. Others have tested their SU+J volumes and 
> found them to have errors as well. This makes me super nervous.

The one thing I figured is that in the light of power outages, or
crashing virtualization hosts, you really really really need to disable
disk write caches, and this affects softupdates, journalling, asynch
file systems, just about everything.

The fact that makes matters worse is that journalling or softupdates
allow you to mount a silently-corrupted file system, whereas the
traditional UFS/UFS2 sync/asynch mounts will fsck themselves in the
foreground, so they get fixed before the FS panics.

So can you be sure that:

- your driver, chip set and hard disk execute ordered writes in order,

- your driver, chip set and hard disk actually write data to permanent
storage BEFORE acknowledging a successful write?

Whenever I fixed these issues, I had no more corruptions.

For ata and sata, there are loader tunables you will want to set,
hw.ata.wc=0 and kern.cam.ada.write_cache=0.

If your drives are under ada, ad, or ahci related control, try these
settings.  For SCSI, use camcontrol to turn the write cache off.
softupdates is supposed to rectify most of the performance penalties
incurred.

Note also that you needed to set ahci_load=YES and atapicam_load=YES in
8.X, I've never bothered to check 7.X or 9.X WRT these settings.


Re: SU+J systems do not fsck themselves

2011-12-27 Thread Scott Long

On Dec 28, 2011, at 12:34 AM, David Thiel wrote:

> On Tue, Dec 27, 2011 at 11:54:20PM -0700, Scott Long wrote:
>> The first run of fsck, using the journal, gives results that I would 
>> expect.  The second run seems to imply that the fixes made on the 
>> first run didn't actually get written to disk.  This is definitely an 
>> oddity.  I see that you're using geli, maybe there's some strange 
>> side-effect there.  No idea.  Report as a bug, this is definitely 
>> undesired behavior.
> 
> Not impossible, but I was seeing similar issues on two non-geli systems 
> as well, i.e. tons of errors fixed when doing a single-user 
> non-journalled fsck, but journalled fsck not fixing stuff. I'll try to 
> replicate on a test machine, as I already lost data on the last 
> (non-geli) machine this happened to.
> 
>> For the love that is all good and holy, don't ever run fsck on a live 
>> filesystem.  It's going to report these kinds of problems!  It's 
>> normal; filesystem metadata updates stay cached in memory, and fsck 
>> bypasses that cache.  
> 
> Ok. I expected fsck would be softupdate-aware in that way, but I 
> understand it not doing so.
> 
>>> - SU+J and fsck do not work correctly together to fix corruption on 
>>> boot, i.e. bgfsck isn't getting run when it should
>> 
>> The point of SUJ is to eliminate the need for bgfsck.  Effectively, 
>> they are exclusive ideas.  
> 
> This is surprising to me. It is my impression that under Linux at least, 
> ext3fs is checked against the journal, and gets a full e2fsck if it 
> finds it's still dirty. Additionally, there's a periodic fsck after 180 
> days continuous runtime or x number of mounts (see tune2fs -i and -c).  
> Is SU+J somehow implemented in such a way that this is unnecessary? What 
> does it do that the ext3fs people have missed?
> 

SUJ isn't like ext3 journaling, it doesn't do 100% metadata logging.  Instead, 
it's an extension of softupdates.  Softupdates (SU) is still responsible for 
ordering dependent writes to the disk to maintain consistency.  What SU can't 
handle is the Unix/POSIX idiom of unlinking a file from the namespace but 
keeping its inode active through refcounts.  When you have an unclean shutdown, 
you wind up with stale blocks allocated to orphaned inodes.  The point of 
bgfsck was to scan the filesystem for these allocations and free them, just 
like fsck does, but to do it in the background so that the boot could continue.  
SUJ is basically just an intent log for this case; it tells fsck where to find 
these allocations so that fsck doesn't have to do the lengthy scan.  FWIW, this 
problem is present in most any journaling implementation and is usually solved 
via the use of intent records in a journal, not unlike SUJ.
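
For readers unfamiliar with the unlink-while-open idiom described above, here is a minimal sketch of it from userland. It only illustrates the state a filesystem can be in at the moment of a crash; it uses a scratch file from `tempfile` and says nothing about how SU or SUJ are implemented.

```python
import os
import tempfile

# Create a scratch file, keep a descriptor open, then unlink its name.
fd, path = tempfile.mkstemp()
os.write(fd, b"still here\n")
os.unlink(path)                      # the name is gone from the namespace

name_exists = os.path.exists(path)   # False: no directory entry remains
link_count = os.fstat(fd).st_nlink   # 0 on typical POSIX filesystems

# The inode and its blocks stay allocated while the descriptor is open.
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 64)

print(name_exists, link_count, data)

# On a normal close the kernel frees the inode and its blocks. If the
# machine loses power before this point, that cleanup is left to fsck.
os.close(fd)
```

If power is lost while the descriptor is still open, the on-disk inode has no name pointing at it but still owns blocks. That is the orphaned allocation that the intent log is meant to let fsck find without a full scan.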

So, there's an assumption with SUJ+fsck that SU is keeping the filesystem 
consistent.  Maybe that's a bad assumption, and I'm not trying to discredit 
your report.  But the intention with SUJ is to eliminate the need for anything 
more than a cursory check of the superblocks and a processing of the SUJ intent 
log.  If either of these fails then fsck reverts to a traditional scan.  In the 
same vein, ext3 and most other traditional journaling filesystems assume that 
the journal is correct and is preserving consistency, and don't do anything 
more than a cursory data structure scan and journal replay as well, but then 
revert to a full scan if that fails (zfs seems to be an exception here, with 
there being no actual fsck available for it).
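
The control flow described in that paragraph can be sketched roughly as follows. This is only an illustration of the stated intent, not the actual fsck code; every function and parameter name here is hypothetical.

```python
from typing import Callable


def check_filesystem(
    superblocks_ok: Callable[[], bool],
    replay_intent_log: Callable[[], bool],
    full_scan: Callable[[], None],
) -> str:
    """Return which path was taken: 'journal' or 'full'."""
    # Fast path: cursory superblock check, then process the intent log.
    if superblocks_ok() and replay_intent_log():
        return "journal"
    # If either step fails, fall back to the traditional multi-phase scan.
    full_scan()
    return "full"


scans = []
print(check_filesystem(lambda: True, lambda: True, lambda: scans.append(1)))
print(check_filesystem(lambda: True, lambda: False, lambda: scans.append(1)))
```

Note what the fast path does not do: nothing in it verifies the rest of the metadata. That is the assumption being discussed here, namely that SU has already kept everything outside the journal consistent.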

As for the 180 day forced scan on ext3, I have no public comment.  SU has 
matured nicely over the last 10+ years, and I'm happy with the progress that 
SUJ has made in the last 2-3 years.  If there are bugs, they need to be exposed 
and addressed ASAP.

Scott



Re: SU+J systems do not fsck themselves

2011-12-27 Thread David Thiel
On Tue, Dec 27, 2011 at 11:54:20PM -0700, Scott Long wrote:
> The first run of fsck, using the journal, gives results that I would 
> expect.  The second run seems to imply that the fixes made on the 
> first run didn't actually get written to disk.  This is definitely an 
> oddity.  I see that you're using geli, maybe there's some strange 
> side-effect there.  No idea.  Report as a bug, this is definitely 
> undesired behavior.

Not impossible, but I was seeing similar issues on two non-geli systems 
as well, i.e. tons of errors fixed when doing a single-user 
non-journalled fsck, but journalled fsck not fixing stuff. I'll try to 
replicate on a test machine, as I already lost data on the last 
(non-geli) machine this happened to.

> For the love that is all good and holy, don't ever run fsck on a live 
> filesystem.  It's going to report these kinds of problems!  It's 
> normal; filesystem metadata updates stay cached in memory, and fsck 
> bypasses that cache.  

Ok. I expected fsck would be softupdate-aware in that way, but I 
understand it not doing so.

> > - SU+J and fsck do not work correctly together to fix corruption on 
> > boot, i.e. bgfsck isn't getting run when it should
> 
> The point of SUJ is to eliminate the need for bgfsck.  Effectively, 
> they are exclusive ideas.  

This is surprising to me. It is my impression that under Linux at least, 
ext3fs is checked against the journal, and gets a full e2fsck if it 
finds it's still dirty. Additionally, there's a periodic fsck after 180 
days continuous runtime or x number of mounts (see tune2fs -i and -c).  
Is SU+J somehow implemented in such a way that this is unnecessary? What 
does it do that the ext3fs people have missed?



Re: SU+J systems do not fsck themselves

2011-12-27 Thread Scott Long

On Dec 27, 2011, at 10:14 PM, David Thiel wrote:

> On Tue, Dec 27, 2011 at 02:48:22PM -0800, Xin Li wrote:
>>>> - use journalled fsck;
>>>> - use normal fsck to check if the journalled fsck did the right thing.
> 
> Ok, here is the log of fsck with and without journal.
> 
> http://redundancy.redundancy.org/fscklog3
> 

The first run of fsck, using the journal, gives results that I would expect.  
The second run seems to imply that the fixes made on the first run didn't 
actually get written to disk.  This is definitely an oddity.  I see that you're 
using geli, maybe there's some strange side-effect there.  No idea.  Report as 
a bug, this is definitely undesired behavior.

> That was done the very next boot, after a clean shutdown. The errors 
> from the previous live fsck aren't there (oddly), but there are still 
> apparently some corrections made. The next fsck still complains, but 
> doesn't give any salvage prompts.
> 
> Here is jsa@'s, done on a live FS with SU+J:
> 
> http://redundancy.redundancy.org/fscklog4
> 

For the love that is all good and holy, don't ever run fsck on a live 
filesystem.  It's going to report these kinds of problems!  It's normal; 
filesystem metadata updates stay cached in memory, and fsck bypasses that 
cache.  Also, what you see in your log is a file that has been unlinked but 
held open.  This is a common Unix idiom, and one that gets cleaned up by fsck 
on reboot, whether through the SUJ intent log processing or through a 
traditional fsck.

> I'm not actually looking to solve my particular problem per se. The 
> issue is that almost everyone I've checked with that's running SU+J gets 
> unref'd file and other errors when they check their filesystem (with the 
> fs live). Unless I'm missing something, a running FS should never have 
> those kinds of errors unless you deliberately disabled fsck.
> 

Nope, you are completely incorrect here.

> This leaves only a couple options:
> 
> - SU+J and fsck do not work correctly together to fix corruption on 
> boot, i.e. bgfsck isn't getting run when it should

The point of SUJ is to eliminate the need for bgfsck.  Effectively, they are 
exclusive ideas.  It's possible that there are still problems with SUJ and how 
fsck processes and commits the journal entries.  However, bgfsck has nothing to 
do with this, and I'd also like to know if your use of geli is complicating the 
problem.

> - Stuff is getting completely screwed up after boot

Possibly but unlikely

> - fsck is giving incorrect results

Very unlikely

> - I'm completely clueless about how SU+J is supposed to behave or be 
> deployed

No comment =-)

> 
> I'm pretty certain that the first is the issue here. It would be great 
> if others could check their own SU+J filesystems so we could get a few 
> more data points.
> 

Indeed, more data is needed.

Scott



Re: SU+J systems do not fsck themselves

2011-12-27 Thread David Thiel
On Tue, Dec 27, 2011 at 02:48:22PM -0800, Xin Li wrote:
> >> - use journalled fsck; - use normal fsck to check if the
> >> journalled fsck did the right thing.

Ok, here is the log of fsck with and without journal.

http://redundancy.redundancy.org/fscklog3

That was done the very next boot, after a clean shutdown. The errors 
from the previous live fsck aren't there (oddly), but there are still 
apparently some corrections made. The next fsck still complains, but 
doesn't give any salvage prompts.

Here is jsa@'s, done on a live FS with SU+J:

http://redundancy.redundancy.org/fscklog4

I'm not actually looking to solve my particular problem per se. The 
issue is that almost everyone I've checked with that's running SU+J gets 
unref'd file and other errors when they check their filesystem (with the 
fs live). Unless I'm missing something, a running FS should never have 
those kinds of errors unless you deliberately disabled fsck.

This leaves only a couple options:

- SU+J and fsck do not work correctly together to fix corruption on 
  boot, i.e. bgfsck isn't getting run when it should
- Stuff is getting completely screwed up after boot
- fsck is giving incorrect results
- I'm completely clueless about how SU+J is supposed to behave or be 
  deployed

I'm pretty certain that the first is the issue here. It would be great 
if others could check their own SU+J filesystems so we could get a few 
more data points.



Re: SU+J systems do not fsck themselves

2011-12-27 Thread Xin Li

On 12/27/11 14:36, David Thiel wrote:
> On Tue, Dec 27, 2011 at 02:29:03PM -0800, Xin LI wrote:
>> I'm not sure if your experiments are right here, the second log
>> shows you're running it read-only, which is likely caused by
>> running it on live file system.
> 
> Yes, this most recent instance is me running it on a live FS,
> because I'm using that machine to type this right now. :) However,
> I've had the issues fixed in single-user on other systems and had
> the problems go away. At least for a bit.
> 
>> - use journalled fsck; - use normal fsck to check if the
>> journalled fsck did the right thing.
> 
> When you say "use journalled fsck", what's the proper way to
> initiate that? I don't see any journal-related options in the man
> page.

fsck -p perhaps?  IIRC the fsck_ufs(8) would use journal if it's
available and up-to-date.

Cheers,
-- 
Xin LI https://www.delphij.net/
FreeBSD - The Power to Serve!   Live free or die


Re: SU+J systems do not fsck themselves

2011-12-27 Thread David Thiel
On Tue, Dec 27, 2011 at 02:29:03PM -0800, Xin LI wrote:
> I'm not sure if your experiments are right here, the second log shows
> you're running it read-only, which is likely caused by running it on
> live file system.  

Yes, this most recent instance is me running it on a live FS, because 
I'm using that machine to type this right now. :) However, I've had the 
issues fixed in single-user on other systems and had the problems go 
away. At least for a bit.

> - use journalled fsck;
> - use normal fsck to check if the journalled fsck did the right thing.

When you say "use journalled fsck", what's the proper way to initiate 
that? I don't see any journal-related options in the man page.


Re: SU+J systems do not fsck themselves

2011-12-27 Thread Xin LI
On Tue, Dec 27, 2011 at 1:53 PM, David Thiel wrote:
> I've had multiple machines now (9.0-RC3, amd64, i386 and earlier
> 9-CURRENT on ppc) running SU+J that have had unexplained panics and
> crashes start happening relating to disk I/O. When I end up running a
> full fsck, it keeps turning out that the disk is dirty and corrupted,
> but no mechanism is in place with SU+J to detect and fix this. A bgfsck
> never happens, but a manual fsck in single-user does indeed fix the
> crashing and weird behavior. Others have tested their SU+J volumes and
> found them to have errors as well. This makes me super nervous.
>
> Basically, the way SU+J seems to operate is this:
>
> http://redundancy.redundancy.org/fscklog2
>
> "Oh hey, I see you shut down uncleanly, let's check everything looks
> good, off you go, whee"
>
> Until I actually go and fsck, when I get:
>
> http://redundancy.redundancy.org/fscklog1
>
> So, I understand that journalling doesn't replace the need for a
> potential fsck (though I never had this problem with gjournal), but
> without a way for the system to detect that a fsck is necessary, this
> seems pretty much a guaranteed recipe for data corruption, and seems to
> offer little to no benefit over plain SU+fsck, or even just mounting
> async.
>
> So: is everyone else seeing this? Am I misunderstanding how SU+J should
> be used? How should the error resolution process really happen?

I'm not sure if your experiments are right here, the second log shows
you're running it read-only, which is likely caused by running it on
live file system.  What I would suggest to do is:

 - Reset the system while it's running;
 - Boot into single user mode;
 - 'dd' the disk to an image file;
 - Boot the system normally and:
    - use mdconfig -a -t vnode -f on a copy of the image;
    - use journalled fsck;
    - use normal fsck to check if the journalled fsck did the right thing.

This would rule out possible after-mount introduced changes, etc.  I
personally did not hit problems a few months ago but I didn't re-test
recently.

Cheers,
-- 
Xin LI  https://www.delphij.net/
FreeBSD - The Power to Serve! Live free or die


SU+J systems do not fsck themselves

2011-12-27 Thread David Thiel
I've had multiple machines now (9.0-RC3, amd64, i386 and earlier 
9-CURRENT on ppc) running SU+J that have had unexplained panics and 
crashes start happening relating to disk I/O. When I end up running a 
full fsck, it keeps turning out that the disk is dirty and corrupted, 
but no mechanism is in place with SU+J to detect and fix this. A bgfsck 
never happens, but a manual fsck in single-user does indeed fix the 
crashing and weird behavior. Others have tested their SU+J volumes and 
found them to have errors as well. This makes me super nervous.

Basically, the way SU+J seems to operate is this:

http://redundancy.redundancy.org/fscklog2

"Oh hey, I see you shut down uncleanly, let's check everything looks 
good, off you go, whee"

Until I actually go and fsck, when I get:

http://redundancy.redundancy.org/fscklog1

So, I understand that journalling doesn't replace the need for a 
potential fsck (though I never had this problem with gjournal), but 
without a way for the system to detect that a fsck is necessary, this 
seems pretty much a guaranteed recipe for data corruption, and seems to 
offer little to no benefit over plain SU+fsck, or even just mounting 
async.

So: is everyone else seeing this? Am I misunderstanding how SU+J should 
be used? How should the error resolution process really happen? 

Thanks,
David