Re: [RFC] ext3 freeze feature

2008-01-25 Thread Pekka Enberg
Hi,

 diff -uprN -X linux-2.6.24-rc8/Documentation/dontdiff 
 linux-2.6.24-rc8/include/linux/ext3_fs_sb.h 
 linux-2.6.24-rc8-freeze/include/linux/ext3_fs_sb.h
 --- linux-2.6.24-rc8/include/linux/ext3_fs_sb.h 2008-01-16 13:22:48.0 
 +0900
 +++ linux-2.6.24-rc8-freeze/include/linux/ext3_fs_sb.h  2008-01-22 
 18:20:33.0 +0900
 @@ -81,6 +81,8 @@ struct ext3_sb_info {
 char *s_qf_names[MAXQUOTAS];/* Names of quota files with 
 journalled quota */
 int s_jquota_fmt;   /* Format of quota to use */
  #endif
 +   /* Delayed work for freeze */
 +   struct delayed_work s_freeze_timeout;

Why not put this in struct super_block? Then you don't need this:

 +/**
 + * get_super_block - get super_block
 + * @s_fs_info  : filesystem dependent information
 + *   (super_block.s_fs_info)
 + *
 + * Get super_block which holds s_fs_info from super_blocks.
 + * get_super_block() returns a pointer of super block or
 + * %NULL if it have failed.
 + */
 +struct super_block *get_super_block(void *s_fs_info)
 +{

And these can be moved into generic code:

  /*
 + * ext3_add_freeze_timeout - Add timeout for ext3 freeze.
 + *
 + * @sbi: ext3 super block
 + * @timeout_msec   : timeout period
 + *
 + * Add the delayed work for ext3 freeze timeout
 + * to the delayed work queue.
 + */
 +void ext3_add_freeze_timeout(struct ext3_sb_info *sbi,
 +   long timeout_msec)
 +{
 +   s64 timeout_jiffies = msecs_to_jiffies(timeout_msec);
 +
 +   /*
 +* setup freeze timeout function
 +*/
 +   INIT_DELAYED_WORK(&sbi->s_freeze_timeout, ext3_freeze_timeout);
 +
 +   /* set delayed work queue */
 +   cancel_delayed_work(&sbi->s_freeze_timeout);
 +   schedule_delayed_work(&sbi->s_freeze_timeout, timeout_jiffies);
 +}
 +
 +/*
 + * ext3_del_freeze_timeout - Delete timeout for ext3 freeze.
 + *
 + * @sbi: ext3 super block
 + *
 + * Delete the delayed work for ext3 freeze timeout
 + * from the delayed work queue.
 + */
 +void ext3_del_freeze_timeout(struct ext3_sb_info *sbi)
 +{
 +   if (delayed_work_pending(&sbi->s_freeze_timeout))
 +   cancel_delayed_work(&sbi->s_freeze_timeout);
 +}

 +/*
 + * ext3_freeze_timeout - Thaw the filesystem.
 + *
 + * @work   : work queue (delayed_work.work)
 + *
 + * Called by the delayed work when elapsing the timeout period.
 + * Thaw the filesystem.
 + */
 +static void ext3_freeze_timeout(struct work_struct *work)
 +{
 +   struct ext3_sb_info *sbi = container_of(work,
 +   struct ext3_sb_info,
 +   s_freeze_timeout.work);
 +   struct super_block *sb = get_super_block(sbi);
 +
 +   BUG_ON(sb == NULL);
 +
 +   if (sb->s_frozen != SB_UNFROZEN)
 +   thaw_bdev(sb->s_bdev, sb);
 +}
 +

I am also wondering whether we should have system call(s) for these:

On Jan 25, 2008 12:59 PM, Takashi Sato [EMAIL PROTECTED] wrote:
 +   case EXT3_IOC_FREEZE: {

 +   case EXT3_IOC_THAW: {

And just convert XFS to use them too?

Pekka
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext3 freeze feature

2008-01-25 Thread Dmitri Monakhov
On 19:59 Fri 25 Jan , Takashi Sato wrote:
 Hi,
 
 Currently, ext3 doesn't have the freeze feature which suspends write
 requests.  So, we cannot get a backup which keeps the filesystem's
 consistency with the storage device's features (snapshot, replication)
 while it is mounted.
 In many cases, commercial filesystems (e.g. VxFS) have a freeze
 feature, which is used to get a consistent backup.
First of all, Linux already has at least one open-source (dm-snap)
and several commercial snapshot solutions. In fact, dm-snap is
not perfect:
a) bitmap loading is not supported (this is useful for freezing
   only used blocks), which causes a significant slowdown even for new writes
b) unpatched dm-snap code has a significant performance slowdown for all
   rewrite requests.
c) IMHO memory footprint is too big.

BUT, it works well for most file-systems.
 
 So I am planning on implementing the ioctl of the freeze feature for ext3.
 I think we can get the consistent backup with the following steps.
 1. Freeze the filesystem with ioctl.
So you plan to do it from userspace.. well good luck with it :)

 2. Separate the replication volume or get the snapshot
with the storage device's feature.
 3. Unfreeze the filesystem with ioctl.

You have to realize that the delay between stages 1-3 has to be minimal.
For example, dm-snap performs it only for explicit journal flushing.
From my experience, if the delay is more than 4-5 seconds the whole system
becomes unstable.
BTW: you always have to remember that while locking ext3 via freeze_bdev,
sb->ext3_write_super_lockfs() will be called, which is implemented as a simple
journal lock. This means that some bios may still reach the original device
even after the file system was locked (I've observed this in a real-life
situation).

 4. Get the backup from the separated replication volume
or the snapshot.
 
 The usage of the ioctl is as below.
  int ioctl(int fd, int cmd, long *timeval)
  fd: The file descriptor of the mountpoint.
  cmd: EXT3_IOC_FREEZE for the freeze or EXT3_IOC_THAW for the unfreeze.
  timeval: The timeout value expressed in seconds.
   If it's 0, the timeout isn't set.
  Return value: 0 if the operation succeeds. Otherwise, -1.
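
The freeze / snapshot / thaw sequence described above might look roughly
like this from userspace. This is only a sketch under stated assumptions:
the request codes are passed in as parameters because the numeric values of
EXT3_IOC_FREEZE and EXT3_IOC_THAW are defined by the patch and are not shown
here, and take_snapshot is a hypothetical callback standing in for the
storage-specific snapshot step.

```c
#include <sys/ioctl.h>

/*
 * Freeze the filesystem behind fd, run the caller's snapshot step, then
 * thaw again even if the snapshot step failed.
 *
 * freeze_cmd / thaw_cmd: request codes (EXT3_IOC_FREEZE / EXT3_IOC_THAW in
 *                        the proposed patch; values not assumed here).
 * timeout_sec:           auto-thaw timeout, 0 meaning "no timeout".
 * take_snapshot:         hypothetical callback, returns 0 on success.
 *
 * Returns 0 on success, -1 if any step failed.
 */
static int snapshot_while_frozen(int fd,
                                 unsigned long freeze_cmd,
                                 unsigned long thaw_cmd,
                                 long timeout_sec,
                                 int (*take_snapshot)(void *ctx),
                                 void *ctx)
{
    long unused = 0;
    int rc;

    /* Step 1: freeze. Nothing to undo if this fails. */
    if (ioctl(fd, freeze_cmd, &timeout_sec) == -1)
        return -1;

    /* Step 2: split the replication volume or take the snapshot. */
    rc = take_snapshot(ctx);

    /* Step 3: always thaw, whatever the snapshot step returned. */
    if (ioctl(fd, thaw_cmd, &unused) == -1)
        return -1;

    return rc == 0 ? 0 : -1;
}
```

The backup itself (step 4) would then be taken from the snapshot, outside
the frozen window, which keeps the time between freeze and thaw short.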
 
 I have made sure that write requests were suspended with the experimental
 patch for this feature and attached it in this mail.
 
 The main points of the implementation are as follows.
 - Add calls of the freeze function (freeze_bdev) and
   the unfreeze function (thaw_bdev) in ext3_ioctl().
 
 - ext3_freeze_timeout() which calls the unfreeze function (thaw_bdev)
   is registered to the delayed work queue to unfreeze the filesystem
   automatically after the lapse of the specified time.
 
 Any comments are very welcome.
 
 Signed-off-by: Takashi Sato [EMAIL PROTECTED]
 ---
 diff -uprN -X linux-2.6.24-rc8/Documentation/dontdiff 
 linux-2.6.24-rc8/fs/ext3/ioctl.c linux-2.6.24-rc8-freeze/fs/ext3/ioctl.c
 --- linux-2.6.24-rc8/fs/ext3/ioctl.c  2008-01-16 13:22:48.0 +0900
 +++ linux-2.6.24-rc8-freeze/fs/ext3/ioctl.c   2008-01-22 18:20:33.0 
 +0900
 @@ -254,6 +254,42 @@ flags_err:
   return err;
   }
  
 + case EXT3_IOC_FREEZE: {
 + long timeout_sec;
 + long timeout_msec;
 + if (!capable(CAP_SYS_ADMIN))
 + return -EPERM;
 + if (inode->i_sb->s_frozen != SB_UNFROZEN)

 + return -EINVAL;
WOW, extending the timeout is not supported!?
So you are saying that the caller has to set the timer to the maximal possible
timeout from the very beginning.
IMHO it is better to use a heartbeat timer approach, for example:
each second the caller extends its timeout for another two seconds. With this
approach, even if the caller is killed for any reason, its timeout will expire
within two seconds.
 
if (inode->i_sb->s_frozen == SB_FROZEN)
/* extending timeout */
.. 
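
The heartbeat idea can be modelled in a few lines of plain C. This is a
userspace sketch only, not kernel code: struct freeze_lease and both
functions are hypothetical names, and time is passed in explicitly so the
behaviour is easy to follow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a freeze whose auto-thaw deadline can be extended. */
struct freeze_lease {
    bool     frozen;       /* is the filesystem currently frozen?   */
    uint64_t deadline_ms;  /* time at which the auto-thaw will fire */
};

/* Freeze, or (if already frozen) push the deadline further out. */
static void lease_freeze_or_extend(struct freeze_lease *l,
                                   uint64_t now_ms, uint64_t timeout_ms)
{
    assert(timeout_ms > 0);
    l->frozen = true;
    l->deadline_ms = now_ms + timeout_ms;
}

/* Timer check: thaw if the deadline has passed. Returns true on thaw. */
static bool lease_tick(struct freeze_lease *l, uint64_t now_ms)
{
    if (l->frozen && now_ms >= l->deadline_ms) {
        l->frozen = false;
        return true;
    }
    return false;
}
```

A caller that renews every second with a two-second timeout keeps the
filesystem frozen for as long as it is alive; once it stops renewing, the
next tick past the deadline thaws the filesystem.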


 + /* arg(sec) to tick value */
 + get_user(timeout_sec, (long __user *) arg);
 + timeout_msec = timeout_sec * 1000;
 + if (timeout_msec < 0)
 + return -EINVAL;
 +
 + /* Freeze */
 + freeze_bdev(inode->i_sb->s_bdev);
 +
 + /* set up unfreeze timer */
 + if (timeout_msec > 0)
 + ext3_add_freeze_timeout(EXT3_SB(inode->i_sb),
 + timeout_msec);
 + return 0;
 + }
 + case EXT3_IOC_THAW: {
 + if (!capable(CAP_SYS_ADMIN))
 + return -EPERM;
 + if (inode->i_sb->s_frozen == SB_UNFROZEN)
 + return -EINVAL;
 +
 + /* delete unfreeze timer */
 + ext3_del_freeze_timeout(EXT3_SB(inode->i_sb));
 +
 + /* Unfreeze */
 + thaw_bdev(inode->i_sb->s_bdev, inode->i_sb);
 +
 + return 0;
 + }
  
   default:
   return -ENOTTY;
 diff -uprN -X 

Re: Integrating patches in SLES10 e2fsprogs

2008-01-25 Thread Matthias Koenig
Andreas Dilger [EMAIL PROTECTED] writes:

 I was looking through the SLES10 e2fsprogs patch set, and I wonder if some
 of them could be integrated upstream, and if any effort had been made in
 that direction in the past?  In particular, the addition of et_list_lock()
 and et_list_unlock() to libcom_err cause failures if e2fsprogs is updated
 to a non-SLES10 derived RPM.


 A list of patches and (my) descriptions are below:
 libcom_err-no-static-buffer.patch - avoids static buffer returned to caller
   by error_message() function
 libcom_err-no-init_error_table.patch - removes init_error_table() function
  (maybe because it isn't thread safe?),
  but I think this could be made thread
  safe by adding locking around use of
  _et_dynamic_list, or maybe it is
  obsoleted by add_error_table()?
 libcom_err-no-e2fsck.static.patch - can't build e2fsck.static because of
   -lpthread in libcom_err-mutex.patch, but
   nothing uses e2fsck.static anymore?
 libcom_err-mutex.patch - add et_list_{un,}lock() via pthread mutex

This addresses
https://bugzilla.novell.com/show_bug.cgi?id=66534

 e2fsprogs-blkid.diff - Adds documentation of BLKID_FILE environment variable.
This is actually implemented directly in libblkid in
  e2fsprogs-1.40.2 but no mention of it in the man pages.

https://bugzilla.novell.com/show_bug.cgi?id=50156

 e2fsprogs-mdraid.patch - allows skip of mdraid probing, not sure why?

https://bugzilla.novell.com/show_bug.cgi?id=100530

 e2fsprogs-probe_reiserfs-fpe.patch - fixes a legitimate bug in probe_reiserfs,
though it might be better to just return
an error if the blocksize is bad?

https://bugzilla.novell.com/show_bug.cgi?id=115827

 In addition to this, the SLES10 .spec file is completely different than that
 shipped with upstream e2fsprogs, and I'd like to reconcile that if possible.
 In particular it has libcom_err and libss in a separate RPM (libcom_err).
 I understand that FC8 (not sure about RHEl5) has also split out some of the
 libraries, but in a different way (e2fsprogs-libs) and that is a bit of a
 headache.  It might be possible to reconcile with suitable rpm-fu, but it
 would be desirable that SLES pick up these changes in the future...

We now have a clear policy at SuSE about packaging shared libraries:
http://en.opensuse.org/Packaging/Shared_Library_Packaging_Policy
This is pretty similar to what Debian has done for ages.
It might be possible to do this in one spec, so that it works for
FC and SuSE, but I don't see this being worth the effort.

 I don't want to spam the list with all of the patches yet, but if there is
 interest in merging these upstream then I can provide versions of these
 patches against the current e2fsprogs instead of 1.38 that is in SLES10.

Since the SLES10 patches are against e2fsprogs 1.38, a better base for
upstream work is openSUSE Factory, which has just been updated to 1.40.4.
Yes, there are still patches in there which I need to check for upstream
inclusion, and this is something I have wanted to do for some time. I just
haven't had the time recently. But your mail is a good opportunity to
start working on this.

Matthias


Re: [RFC] ext3 freeze feature

2008-01-25 Thread David Chinner
On Fri, Jan 25, 2008 at 09:42:30PM +0900, Takashi Sato wrote:
 I am also wondering whether we should have system call(s) for these:
 
 On Jan 25, 2008 12:59 PM, Takashi Sato [EMAIL PROTECTED] wrote:
 +   case EXT3_IOC_FREEZE: {
 
 +   case EXT3_IOC_THAW: {
 
 And just convert XFS to use them too?
 
 I think it is reasonable to implement it as the generic system call, as you
 said.  Do XFS folks think so?

Sure.

Note that we can't immediately remove the XFS ioctls, otherwise
we'd break userspace utilities that use them.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [RFC] ext3 freeze feature

2008-01-25 Thread David Chinner
On Sat, Jan 26, 2008 at 04:35:26PM +1100, David Chinner wrote:
 On Fri, Jan 25, 2008 at 07:59:38PM +0900, Takashi Sato wrote:
  The main points of the implementation are as follows.
  - Add calls of the freeze function (freeze_bdev) and
the unfreeze function (thaw_bdev) in ext3_ioctl().
  
  - ext3_freeze_timeout() which calls the unfreeze function (thaw_bdev)
is registered to the delayed work queue to unfreeze the filesystem
automatically after the lapse of the specified time.
 
 Seems like pointless complexity to me - what happens if a
 timeout occurs while the filesystem is still freezing?
 
 It's not uncommon for a freeze to take minutes if memory
 is full of dirty data that needs to be flushed out, esp. if
 dm-snap is doing COWs for every write issued.

Sorry, ignore this bit - I just realised the timer is set
up after the freeze has occurred.

Still, that makes it potentially dangerous to whatever is being
done while the filesystem is frozen.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [RFC] ext3 freeze feature

2008-01-25 Thread David Chinner
On Fri, Jan 25, 2008 at 07:59:38PM +0900, Takashi Sato wrote:
 The main points of the implementation are as follows.
 - Add calls of the freeze function (freeze_bdev) and
   the unfreeze function (thaw_bdev) in ext3_ioctl().
 
 - ext3_freeze_timeout() which calls the unfreeze function (thaw_bdev)
   is registered to the delayed work queue to unfreeze the filesystem
   automatically after the lapse of the specified time.

Seems like pointless complexity to me - what happens if a
timeout occurs while the filesystem is still freezing?

It's not uncommon for a freeze to take minutes if memory
is full of dirty data that needs to be flushed out, esp. if
dm-snap is doing COWs for every write issued.

 + case EXT3_IOC_FREEZE: {

 + if (inode->i_sb->s_frozen != SB_UNFROZEN)
 + return -EINVAL;

 + freeze_bdev(inode->i_sb->s_bdev);

 + case EXT3_IOC_THAW: {
 + if (!capable(CAP_SYS_ADMIN))
 + return -EPERM;
 + if (inode->i_sb->s_frozen == SB_UNFROZEN)
 + return -EINVAL;
.
 + /* Unfreeze */
 + thaw_bdev(inode->i_sb->s_bdev, inode->i_sb);

That's inherently unsafe - you can have multiple unfreezes
running in parallel, which seriously screws with the bdev semaphore
count that is used to lock the device, because you end up doing
multiple up()s for every down().

Your timeout thingy guarantees that at some point you will get
multiple up()s occurring, because the timer firing races with
a thaw ioctl.

If this interface is to be more widely exported, then it needs
a complete revamp of how the bdev is locked while it is frozen so
that there is no chance of a double up() ever occurring on the
bd_mount_sem due to racing thaws.
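
One way to picture the "no double up()" requirement is a thaw path where
only the caller that wins a frozen-to-unfrozen state transition is allowed
to release the semaphore. The following is a userspace model with C11
atomics, not kernel code: struct bdev_model and its functions are
hypothetical, sem_count merely stands in for the bd_mount_sem count, and
the modelled down() does not block.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { ST_UNFROZEN = 0, ST_FROZEN = 1 };

/* Hypothetical stand-in for the block device's freeze bookkeeping. */
struct bdev_model {
    atomic_int state;      /* ST_FROZEN or ST_UNFROZEN                   */
    atomic_int sem_count;  /* models the semaphore count, 1 = available  */
};

static void model_init(struct bdev_model *b)
{
    atomic_init(&b->state, ST_UNFROZEN);
    atomic_init(&b->sem_count, 1);
}

/* Freeze: only one caller can move UNFROZEN -> FROZEN and do the down(). */
static bool model_freeze(struct bdev_model *b)
{
    int expected = ST_UNFROZEN;

    if (!atomic_compare_exchange_strong(&b->state, &expected, ST_FROZEN))
        return false;                       /* already frozen */
    atomic_fetch_sub(&b->sem_count, 1);     /* models down()  */
    return true;
}

/*
 * Thaw: the timer and the ioctl may both call this, but only the caller
 * that wins the FROZEN -> UNFROZEN transition performs the up().
 */
static bool model_thaw(struct bdev_model *b)
{
    int expected = ST_FROZEN;
    int old;

    if (!atomic_compare_exchange_strong(&b->state, &expected, ST_UNFROZEN))
        return false;                       /* someone else already thawed */
    old = atomic_fetch_add(&b->sem_count, 1);   /* models up() */
    assert(old <= 0);                       /* count must not exceed 1 */
    return true;
}
```

With this shape, a timer-driven thaw racing with a thaw ioctl results in
exactly one up(); the loser simply sees that the device is no longer frozen.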

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [RFC] Parallelize IO for e2fsck

2008-01-25 Thread Bodo Eggert
On Fri, 25 Jan 2008, Bryan Henderson wrote:

  AIX basically did this with SIGDANGER (the signal is ignored by
  default), except there wasn't the ability for the process to tell the
  kernel at what level of memory pressure before it should start getting
  notified, and there was no way for the kernel to tell how bad the
  memory pressure actually was.  On the other hand, it was a relatively
  simple design.
 
 AIX does provide a system call to find out how much paging backing store 
 space is available and the thresholds set by the system administrator. 
 Running out of paging space is the only memory pressure AIX is concerned 
 about.  While I think having processes make memory usage decisions based 
 on that is a shoddy way to manage system resources, that's what it is 
 intended for.

If you start partitioning the system into virtual servers (or something
similar), being close to swapping may be somebody else's problem.
(They shouldn't have exceeded their guaranteed memory limit).


 Incidentally, some context for the AIX approach to the OOM problem: a 
 process may exclude itself from OOM vulnerability altogether.  It places 
 itself in early allocation mode, which means at the time it creates 
 virtual memory, it reserves enough backing store for the worst case.  The 
 memory manager does not send such a process the SIGDANGER signal or 
 terminate it when it runs out of paging space.  Before c. 2000, this was 
 the only mode.  Now the default is late allocation mode, which is similar 
 to Linux.

This is an interesting approach. It feels like some programs might be 
interested in choosing this mode instead of risking OOM. 
-- 
The programmer's National Anthem is ''


Re: [RFC] Parallelize IO for e2fsck

2008-01-25 Thread Andreas Dilger
On Jan 24, 2008  17:25 -0700, Zan Lynx wrote:
 Have y'all been following the /dev/mem_notify patches?
 http://article.gmane.org/gmane.linux.kernel/628653

Having the notification be via poll() is a very restrictive processing
model.  Having the notification be via a signal means that any kind of
process (and not just those that are event loop driven) can register
a callback at some arbitrary point in the code and be notified.  I
don't object to the poll() interface, but it would be good to have a
signal mechanism also.
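
As a rough illustration of the signal-based model, a process that has no
event loop can register a handler once and then check a flag at whatever
points are convenient. This is a sketch under assumptions: no
memory-pressure signal exists in the patches discussed, so the signal
number is a parameter (SIGUSR1 would do for experiments), and the function
names are hypothetical. It uses ISO C signal() for brevity; real code would
more likely use sigaction().

```c
#include <assert.h>
#include <signal.h>

/* Set from the handler, polled by the main code path. */
static volatile sig_atomic_t low_memory_seen;

/* Handler: only records that the notification arrived. */
static void on_low_memory(int signo)
{
    (void)signo;
    low_memory_seen = 1;
}

/* Register the handler for the given signal. Returns 0 on success. */
static int register_low_memory_handler(int signo)
{
    assert(signo > 0);
    return signal(signo, on_low_memory) == SIG_ERR ? -1 : 0;
}

/*
 * Called at arbitrary convenient points (e.g. between e2fsck passes).
 * Returns 1 once per notification, 0 otherwise.
 */
static int should_shrink_caches(void)
{
    if (!low_memory_seen)
        return 0;
    low_memory_seen = 0;
    return 1;
}
```

The point of the design is that the check can sit anywhere in a long
computation, without restructuring the program around poll().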

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



Re: [RFC] ext3 freeze feature

2008-01-25 Thread Theodore Tso
On Fri, Jan 25, 2008 at 03:18:51PM +0300, Dmitri Monakhov wrote:
 First of all, Linux already has at least one open-source (dm-snap)
 and several commercial snapshot solutions.

Yes, but it requires that the filesystem be stored under LVM.  Unlike
what EVMS v1 allowed us to do, we can't currently take a snapshot of a
bare block device.  This patch could potentially be useful for systems
which aren't using LVM, however.

 You have to realize that the delay between stages 1-3 has to be minimal.
 For example, dm-snap performs it only for explicit journal flushing.
 From my experience, if the delay is more than 4-5 seconds the whole system
 becomes unstable.

That's the problem.  You can't afford to freeze for very long.

What you *could* do is to start putting processes to sleep if they
attempt to write to the frozen filesystem, and then detect the
deadlock case where the process holding the file descriptor used to
freeze the filesystem gets frozen because it attempted to write to the
filesystem --- at which point it gets some kind of signal (which
defaults to killing the process), and the filesystem is unfrozen and
as part of the unfreeze you wake up all of the processes that were put
to sleep for touching the frozen filesystem.
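
The sleep-and-detect scheme can be modelled in userspace C as follows. This
is a sketch only: struct frozen_fs, the fixed-size sleeper list and the
function names are hypothetical, and "putting a process to sleep" and
"sending a signal" are reduced to bookkeeping.

```c
#include <stdbool.h>

#define MAX_SLEEPERS 16

/* Hypothetical model of a frozen filesystem and its blocked writers. */
struct frozen_fs {
    bool frozen;
    int  freezer_pid;             /* process that issued the freeze  */
    int  sleepers[MAX_SLEEPERS];  /* writers put to sleep meanwhile  */
    int  nr_sleepers;
};

enum write_result { WRITE_OK, WRITE_SLEEPING, WRITE_DEADLOCK_THAWED };

/* Thaw and wake everyone who was put to sleep; returns how many woke. */
static int fs_thaw_and_wake(struct frozen_fs *fs)
{
    int woken = fs->nr_sleepers;

    fs->frozen = false;
    fs->nr_sleepers = 0;
    return woken;
}

/* A process attempts a write while the filesystem may be frozen. */
static enum write_result fs_write_attempt(struct frozen_fs *fs, int pid)
{
    if (!fs->frozen)
        return WRITE_OK;

    if (pid == fs->freezer_pid) {
        /*
         * Deadlock: the freezer would block on its own freeze. In the
         * scheme above it would get a signal here; the model just thaws
         * and wakes the sleepers.
         */
        fs_thaw_and_wake(fs);
        return WRITE_DEADLOCK_THAWED;
    }

    if (fs->nr_sleepers < MAX_SLEEPERS)
        fs->sleepers[fs->nr_sleepers++] = pid;
    return WRITE_SLEEPING;
}
```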

The other approach would be to say, oh well, the freeze ioctl is
inherently dangerous, and root is allowed to shoot himself in the foot, so
who cares.  :-)

But it was this concern which is why ext3 never exported freeze
functionality to userspace, even though other commercial filesystems
do support this.  It wasn't that it wasn't considered, but the concern
about whether or not it was sufficiently safe to make available.

And I do agree that we probably should just implement this in
a filesystem-independent way, in which case all of the filesystems that
support this already have super_operations functions
write_super_lockfs() and unlockfs().

So if this is done using a new system call, there should be no
filesystem-specific changes needed, and all filesystems which support
those super_operations method functions would be able to provide this
functionality to the new system call.

 - Ted

P.S.  Oh yeah, it should be noted that freezing at the filesystem
layer does *not* guarantee that changes to the block device aren't
happening via mmap()'ed files.  The LVM needs to freeze writes at the
block device level if it wants to guarantee a completely stable
snapshot image.  So the proposed patch doesn't quite give you those
guarantees, if that was the intended goal.


Re: [PATCH e2fsprogs] UPDATED: ignore safe flag differences when fsck compares superblocks

2008-01-25 Thread Eric Sandeen
(updated for thinko: when proper flag *is* set on both primary & backup)

Recent e2fsprogs (1.40.3 and higher) fsck compares primary superblock to 
backups, and if things differ, it forces a full check.  However, the
kernel has a penchant for updating flags the first time a feature is
used - attributes, large files, etc.

However, it only updates these on the primary sb.  This then causes
the new e2fsck behavior to trigger a full check.  I think these flags
can be safely ignored on this check; having them set on the primary
but not the backups doesn't indicate corruption; if they're wrongly
set on the primary, really no damage is done, and if the backup is
used, but it doesn't have the flags set when it should, I'm pretty sure
e2fsck can cope with that.
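
The masked comparison described above can be shown in isolation. This is a
standalone sketch, not the e2fsck code: the flag values and the names
FEAT_LAZY_A, FEAT_LAZY_B and features_differ() are hypothetical and only
illustrate the technique of ignoring selected bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits that the kernel sets lazily on the primary. */
#define FEAT_LAZY_A  0x0002u
#define FEAT_LAZY_B  0x0020u
#define FEAT_IGNORE  (FEAT_LAZY_A | FEAT_LAZY_B)

/*
 * Compare two feature words, ignoring the bits in ignore_mask.
 * Returns true only if they differ in a bit that is *not* ignored.
 */
static bool features_differ(uint32_t primary, uint32_t backup,
                            uint32_t ignore_mask)
{
    return (primary & ~ignore_mask) != (backup & ~ignore_mask);
}
```

For example, a primary word of 0x0003 and a backup word of 0x0001 differ
only in FEAT_LAZY_A, so features_differ() reports no difference, while
0x0005 against 0x0001 still counts as a mismatch.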

I'll admit the patch below is not glamorous.  Any comments, either
on the style(sic) or the intent of the patch?

Thanks,

-Eric

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]

Index: e2fsprogs-1.40.4/e2fsck/super.c
===
--- e2fsprogs-1.40.4.orig/e2fsck/super.c
+++ e2fsprogs-1.40.4/e2fsck/super.c
@@ -814,10 +814,32 @@ int check_backup_super_block(e2fsck_t ct
continue;
}
 
-#define SUPER_DIFFERENT(x) (fs->super->x != tfs->super->x)
-   if (SUPER_DIFFERENT(s_feature_compat) ||
-   SUPER_DIFFERENT(s_feature_incompat) ||
-   SUPER_DIFFERENT(s_feature_ro_compat) ||
+   /*
+* A few flags are set on the fly by the kernel, but
+* only in the primary superblock.  They are safe
+* to copy even if they differ.
+*/ 
+
+#define FEATURE_COMPAT_IGNORE  (EXT2_FEATURE_COMPAT_EXT_ATTR)
+#define FEATURE_RO_COMPAT_IGNORE   (EXT2_FEATURE_RO_COMPAT_LARGE_FILE| \
+EXT4_FEATURE_RO_COMPAT_DIR_NLINK)
+#define FEATURE_INCOMPAT_IGNORE    (EXT3_FEATURE_INCOMPAT_EXTENTS)
+
+#define SUPER_COMPAT_DIFFERENT(x)  \
+   (( fs->super->x & ~FEATURE_COMPAT_IGNORE) !=    \
+    (tfs->super->x & ~FEATURE_COMPAT_IGNORE))
+#define SUPER_INCOMPAT_DIFFERENT(x)    \
+   (( fs->super->x & ~FEATURE_INCOMPAT_IGNORE) !=  \
+    (tfs->super->x & ~FEATURE_INCOMPAT_IGNORE))
+#define SUPER_RO_COMPAT_DIFFERENT(x)   \
+   (( fs->super->x & ~FEATURE_RO_COMPAT_IGNORE) != \
+    (tfs->super->x & ~FEATURE_RO_COMPAT_IGNORE))
+#define SUPER_DIFFERENT(x) \
+   (fs->super->x != tfs->super->x)
+
+   if (SUPER_COMPAT_DIFFERENT(s_feature_compat) ||
+   SUPER_INCOMPAT_DIFFERENT(s_feature_incompat) ||
+   SUPER_RO_COMPAT_DIFFERENT(s_feature_ro_compat) ||
SUPER_DIFFERENT(s_blocks_count) ||
SUPER_DIFFERENT(s_inodes_count) ||
memcmp(fs->super->s_uuid, tfs->super->s_uuid,




Re: [RFC] Parallelize IO for e2fsck

2008-01-25 Thread Bryan Henderson
 AIX basically did this with SIGDANGER (the signal is ignored by
 default), except there wasn't the ability for the process to tell the
 kernel at what level of memory pressure before it should start getting
 notified, and there was no way for the kernel to tell how bad the
 memory pressure actually was.  On the other hand, it was a relatively
 simple design.

AIX does provide a system call to find out how much paging backing store 
space is available and the thresholds set by the system administrator. 
Running out of paging space is the only memory pressure AIX is concerned 
about.  While I think having processes make memory usage decisions based 
on that is a shoddy way to manage system resources, that's what it is 
intended for.

Incidentally, some context for the AIX approach to the OOM problem: a 
process may exclude itself from OOM vulnerability altogether.  It places 
itself in early allocation mode, which means at the time it creates 
virtual memory, it reserves enough backing store for the worst case.  The 
memory manager does not send such a process the SIGDANGER signal or 
terminate it when it runs out of paging space.  Before c. 2000, this was 
the only mode.  Now the default is late allocation mode, which is similar 
to Linux.

--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems



Re: [RFC] ext3 freeze feature

2008-01-25 Thread Theodore Tso
On Fri, Jan 25, 2008 at 10:34:25AM -0600, Eric Sandeen wrote:
  But it was this concern which is why ext3 never exported freeze
  functionality to userspace, even though other commercial filesystems
  do support this.  It wasn't that it wasn't considered, but the concern
  about whether or not it was sufficiently safe to make available.
 
 What's the safety concern; that the admin will forget to unfreeze?

That the admin would manage to deadlock him/herself and wedge up the
whole system...

 I'm also not sure I see the point of the timeout in the original patch;
 either you are done snapshotting and ready to unfreeze, or you're not;
 1, or 2, or 3 seconds doesn't really matter.  When you're done, you're
 done, and you can only unfreeze then.  Shouldn't this be done
 programmatically, and not with some pre-determined timeout?

This is only a guess, but I suspect it was a fail-safe in case the
admin did manage to deadlock him/herself.  

I would think a better approach would be to make the filesystem
unfreeze if the file descriptor that was used to freeze the filesystem
is closed, and then have explicit deadlock detection that kills the
process doing the freeze, at which point the filesystem unlocks and
the system can recover.
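
The "thaw when the freezing descriptor is closed" part of this idea can be
modelled as a freeze that remembers its owner. This is a userspace sketch
with hypothetical names (struct fs_model, fs_release() and so on); in a
real implementation the release hook would be driven by the kernel's file
close path, which is not shown.

```c
#include <stdbool.h>

#define NO_OWNER (-1)

/* Hypothetical model: a freeze is owned by the descriptor that took it. */
struct fs_model {
    bool frozen;
    int  owner_fd;   /* descriptor that requested the freeze, or NO_OWNER */
};

/* Freeze on behalf of fd; fails if the filesystem is already frozen. */
static bool fs_freeze(struct fs_model *fs, int fd)
{
    if (fs->frozen)
        return false;
    fs->frozen = true;
    fs->owner_fd = fd;
    return true;
}

/* Explicit thaw; only the owning descriptor may thaw. */
static bool fs_thaw(struct fs_model *fs, int fd)
{
    if (!fs->frozen || fs->owner_fd != fd)
        return false;
    fs->frozen = false;
    fs->owner_fd = NO_OWNER;
    return true;
}

/*
 * Called when fd is closed, including when its process is killed:
 * if fd owns the freeze, the filesystem is thawed automatically.
 */
static void fs_release(struct fs_model *fs, int fd)
{
    (void)fs_thaw(fs, fd);
}
```

Because process exit closes all descriptors, killing the freezer is enough
to thaw the filesystem, which is what makes a timeout unnecessary in this
scheme.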

- Ted


Re: [RFC] ext3 freeze feature

2008-01-25 Thread Eric Sandeen
Theodore Tso wrote:
 The other approach would be to say, oh well, the freeze ioctl is
 inherently dangerous, and root is allowed to shoot himself in the foot, so
 who cares.  :-)

I tend to agree.  Either you need your fs frozen, or not, and if you do,
be prepared for the consequences.

 But it was this concern which is why ext3 never exported freeze
 functionality to userspace, even though other commercial filesystems
 do support this.  It wasn't that it wasn't considered, but the concern
 about whether or not it was sufficiently safe to make available.

What's the safety concern; that the admin will forget to unfreeze?

 And I do agree that we probably should just implement this in
 a filesystem-independent way, in which case all of the filesystems that
 support this already have super_operations functions
 write_super_lockfs() and unlockfs().

That's what I was thinking; can't the path to freeze_bdev just be
elevated out of dm-ioctl.c to fs/ioctl.c and exposed, such that any
filesystem which implements .write_super_lockfs can be frozen?  This is
essentially what the xfs_freeze userspace does via
xfs_ioctl/XFS_IOC_FREEZE - which, AFAIK, isn't used much now that the
lvm hooks are in place.

I'm also not sure I see the point of the timeout in the original patch;
either you are done snapshotting and ready to unfreeze, or you're not;
1, or 2, or 3 seconds doesn't really matter.  When you're done, you're
done, and you can only unfreeze then.  Shouldn't this be done
programmatically, and not with some pre-determined timeout?

-Eric