Re: [EXT4 set 3][PATCH 1/1] ext4 nanosecond timestamp

2007-07-03 Thread Aneesh Kumar K.V



Mingming Cao wrote:
> On Tue, 2007-07-03 at 15:58 +0530, Kalpak Shah wrote:
> > On Sun, 2007-07-01 at 03:36 -0400, Mingming Cao wrote:
> > > +
> > > +#define EXT4_INODE_GET_XTIME(xtime, inode, raw_inode) \
> > > +do { \
> > > +   (inode)->xtime.tv_sec = le32_to_cpu((raw_inode)->xtime); \
> > > +   if (EXT4_FITS_IN_INODE(raw_inode, EXT4_I(inode), xtime ## _extra)) \
> > > +       ext4_decode_extra_time(&(inode)->xtime, \
> > > +                              raw_inode->xtime ## _extra); \
> > > +} while (0)
> > > +
> > > +#define EXT4_EINODE_GET_XTIME(xtime, einode, raw_inode) \
> > > +do { \
> > > +   if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime)) \
> > > +       (einode)->xtime.tv_sec = le32_to_cpu((raw_inode)->xtime); \
> > > +   if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime ## _extra)) \
> > > +       ext4_decode_extra_time(&(einode)->xtime, \
> > > +                              raw_inode->xtime ## _extra); \
> > > +} while (0)
> > > +
> >
> > This nanosecond patch seems to be missing the fix below which is required for
> > http://bugzilla.kernel.org/show_bug.cgi?id=5079
> >
> > If the timestamp is set to before epoch i.e. a negative timestamp then the
> > file may have its date set into the future on 64-bit systems. So when the
> > timestamp is read it must be cast as signed.
>
> Missed this one.
> Thanks. Will update ext4 patch queue tonight with this fix.

IIRC in the conference call it was decided not to apply this patch. Andreas
may be able to give a better update.

-aneesh
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 3][PATCH 1/1] ext4 nanosecond timestamp

2007-07-03 Thread Mingming Cao
On Tue, 2007-07-03 at 15:58 +0530, Kalpak Shah wrote:
> On Sun, 2007-07-01 at 03:36 -0400, Mingming Cao wrote:
> > +
> > +#define EXT4_INODE_GET_XTIME(xtime, inode, raw_inode) \
> > +do { \
> > +   (inode)->xtime.tv_sec = le32_to_cpu((raw_inode)->xtime); \
> > +   if (EXT4_FITS_IN_INODE(raw_inode, EXT4_I(inode), xtime ## _extra)) \
> > +       ext4_decode_extra_time(&(inode)->xtime, \
> > +                              raw_inode->xtime ## _extra); \
> > +} while (0)
> > +
> > +#define EXT4_EINODE_GET_XTIME(xtime, einode, raw_inode) \
> > +do { \
> > +   if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime)) \
> > +       (einode)->xtime.tv_sec = le32_to_cpu((raw_inode)->xtime); \
> > +   if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime ## _extra)) \
> > +       ext4_decode_extra_time(&(einode)->xtime, \
> > +                              raw_inode->xtime ## _extra); \
> > +} while (0)
> > +
> 
> This nanosecond patch seems to be missing the fix below which is required for 
> http://bugzilla.kernel.org/show_bug.cgi?id=5079
> 
> If the timestamp is set to before epoch i.e. a negative timestamp then the 
> file may have its date set into the future on 64-bit systems. So when the 
> timestamp is read it must be cast as signed.

Missed this one.
Thanks. Will update ext4 patch queue tonight with this fix.

Mingming
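
As an illustration of the signedness issue Kalpak describes, the following
minimal C sketch decodes a 32-bit seconds field both ways. The helper names
are hypothetical and are not part of the ext4 patch. The sketch assumes the
value has already been converted to CPU byte order (the job of le32_to_cpu()
in the macros above) and that the platform uses two's complement.

```c
#include <stdint.h>

/* Hypothetical helper: treat the 32-bit field as signed before widening,
 * so that pre-1970 timestamps stay negative in a 64-bit seconds value. */
static int64_t seconds_from_disk_signed(uint32_t raw)
{
	return (int64_t)(int32_t)raw;
}

/* Hypothetical helper: widening without the signed cast zero-extends,
 * which turns a negative timestamp into a large positive one. */
static int64_t seconds_from_disk_unsigned(uint32_t raw)
{
	return (int64_t)raw;
}
```

With raw = 0xFFFFFFFF (one second before the epoch), the signed variant
yields -1, while the unsigned variant yields 4294967295, a date in the
year 2106.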



Re: [PATCH 4/7][TAKE5] support new modes in fallocate

2007-07-03 Thread Timothy Shimmin

Amit K. Arora wrote:

FA_FL_NO_MTIME  0x10 /* keep same mtime (default change on size, data change) */
FA_FL_NO_CTIME  0x20 /* keep same ctime (default change on size, data change) */

NACK to these as well.  If i_size changes, c/mtime need updates; if the size
doesn't change, they don't.  No need to add more flags for this.


This requirement was from the point of view of HSM applications. Hope
you saw Andreas' previous post and are keeping that in mind.


We use this capability in XFS at the moment.
I think this is mainly for DMF (HSM) but is done via the xfs handle interface
(xfs_open_by_handle) AFAICT.

This sets up a set of invisible operations (xfs_invis_file_operations).
xfs_file_ioctl_invis goes on to set IO_INVIS, which in turn sets ATTR_DMI.
ATTR_DMI is then tested in xfs_change_file_space() (which handles
XFS_IOC_RESVSP & friends) to decide whether
xfs_ichgtime(ip, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG) is called or not.

--Tim
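
The pattern Tim describes, where a per-operation flag decides whether
timestamps are touched, can be sketched in plain C roughly as follows. This
is only a toy model: OP_INVISIBLE, struct file_times and reserve_space() are
hypothetical names standing in for ATTR_DMI, the inode timestamps and
xfs_change_file_space(); none of this is XFS code.

```c
/* Hypothetical stand-in for the inode's modification and change times. */
struct file_times {
	long mtime;
	long ctime;
};

/* Hypothetical flag playing the role ATTR_DMI plays in the description. */
#define OP_INVISIBLE 0x1u

/* Reserve space for a file; update the timestamps unless the caller
 * (for example an HSM tool) asked for an invisible operation. */
static void reserve_space(struct file_times *t, unsigned int flags, long now)
{
	/* ... the actual space reservation would happen here ... */

	if (!(flags & OP_INVISIBLE)) {
		t->mtime = now;
		t->ctime = now;
	}
}
```

The design point is that the caller's identity (regular application versus
HSM) selects the behaviour, rather than a flag that any user of the
interface has to think about.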


Re: vm/fs meetup in september?

2007-07-03 Thread Dongjun Shin

I'd like to reference a paper titled "FASS : A Flash-Aware Swap System".
(http://kernel.kaist.ac.kr/~jinsoo/publication/iwssps05.pdf)

The paper describes a technique that uses NAND flash as a swap device
without FTL (Flash Translation Layer) or filesystem.

It is not related to XIP, however.

On 7/3/07, Jörn Engel <[EMAIL PROTECTED]> wrote:

On Mon, 2 July 2007 17:46:40 -0700, Jared Hulbert wrote:
>
> Right, the solution to the swap problem is identical to the rw XIP
> filesystem problem.  Jörn, that's why you're the self-appointed
> subject matter expert!

All right.  I'll try to make an important face whenever the subject
comes up.

Nick, do you have a problem if LogFS occupies two brainslots at the
meeting?




Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-03 Thread Andreas Dilger
On Jul 03, 2007  18:15 -0400, J. Bruce Fields wrote:
> How will nfsd tell whether it can rely on a given filesystem's
> i_version, or whether it should fall back on ctime?

Good question.

> > As to the performance concerns raised before: the inode version counter
> > update (at least for ext4) is done inside ext4_mark_inode_dirty(), so
> > there is no extra IO work to store this counter to disk.
> 
> So what's the motivation for the "noversion" mount option?

Lustre needs to be able to control the version number directly (version
number needs to be ordered between all inodes, is set by Lustre to be a
transaction number).  Instead of trying to incorporate this unused code
into ext4 we just turn off the ext4 version code and let Lustre control
this directly.  It may even be that NFSv4 will need to control the version
numbers itself...

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



[PATCH] dio: remove bogus refcounting BUG_ON

2007-07-03 Thread Zach Brown
Linus, Andrew, please apply the bug fix patch at the end of this reply
for .22.

> >>One of our perf. team ran into this while doing some runs.
> >>I didn't see anything obvious - it looks like we converted
> >>async IO to synchronous one. I didn't spend much time digging
> >>around.

OK, I think this BUG_ON() is just broken.  I wasn't able to find any
obvious bugs from reading the code which would cause the BUG_ON() to
fire.  If it's reproducible I'd love to hear what the recipe is.

I did notice that this BUG_ON() is evaluating dio after having dropped
its ref :/.  So it's not completely absurd to fear that it's a race
with the dio's memory being reused, but that'd be a pretty tight race.

Let's remove this stupid BUG_ON and see if that test box still has
trouble.  It might just hit the valid BUG_ON a few lines down, but this
unsafe BUG_ON needs to go.

---

dio: remove bogus refcounting BUG_ON

Badari Pulavarty reported a case of this BUG_ON triggering during
testing.  It's completely bogus and should be removed.

It's trying to notice if we left references to the dio hanging around in
the sync case.  They should have been dropped as IO completed while this
path was in dio_await_completion().  This condition will also be
checked, via some twisty logic, by the BUG_ON(ret != -EIOCBQUEUED) a few
lines lower.  So to start this BUG_ON() is redundant.

More fatally, it's dereferencing dio-> after having dropped its
reference.  It's only safe to dereference the dio after releasing the
lock if the final reference was just dropped.  Another CPU might free
the dio in bio completion and reuse the memory after this path drops the
dio lock but before the BUG_ON() is evaluated.

This patch passed aio+dio regression unit tests and aio-stress on ext3.

Signed-off-by: Zach Brown <[EMAIL PROTECTED]>
Cc: Badari Pulavarty <[EMAIL PROTECTED]>

diff -r 509ce354ae1b fs/direct-io.c
--- a/fs/direct-io.c	Sun Jul 01 22:00:49 2007 +
+++ b/fs/direct-io.c	Tue Jul 03 14:56:41 2007 -0700
@@ -1106,7 +1106,7 @@ direct_io_worker(int rw, struct kiocb *i
 	spin_lock_irqsave(&dio->bio_lock, flags);
 	ret2 = --dio->refcount;
 	spin_unlock_irqrestore(&dio->bio_lock, flags);
-	BUG_ON(!dio->is_async && ret2 != 0);
+
 	if (ret2 == 0) {
 		ret = dio_complete(dio, offset, ret);
 		kfree(dio);
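
The rule the commit message relies on (after dropping a reference, only the
holder that dropped the last one may touch the object) can be sketched in
userspace C roughly as follows. This is a hedged illustration, not the
direct-io code: struct request, request_new() and request_put() are
hypothetical, and C11 atomics stand in for the spinlock-protected counter.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object, loosely modelled on the dio. */
struct request {
	atomic_int refcount;
	bool is_async;
};

static struct request *request_new(bool is_async, int refs)
{
	struct request *req = malloc(sizeof(*req));

	if (req) {
		atomic_init(&req->refcount, refs);
		req->is_async = is_async;
	}
	return req;
}

/* Drop one reference. Returns true if this call dropped the final
 * reference and freed the object. */
static bool request_put(struct request *req)
{
	/* Capture the result of the decrement in a local variable. */
	int remaining = atomic_fetch_sub(&req->refcount, 1) - 1;

	/* From here on, req may only be dereferenced if remaining == 0:
	 * otherwise another holder could free it at any moment. A check
	 * such as "!req->is_async && remaining != 0" would read req in
	 * exactly the case where that is unsafe. */
	if (remaining == 0) {
		free(req);
		return true;
	}
	return false;
}
```

Any assertion about the object's fields would have to be evaluated before
the decrement, or restricted to the remaining == 0 branch.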


Re: [PATCH 1/6] locks: share more common lease code

2007-07-03 Thread J. Bruce Fields
On Sat, Jun 30, 2007 at 10:20:13AM +0100, Christoph Hellwig wrote:
> On Fri, Jun 29, 2007 at 03:21:25PM -0400, J. Bruce Fields wrote:
> > From: J. Bruce Fields <[EMAIL PROTECTED]>
> > 
> > Share more code between setlease (used by nfsd) and fcntl.
> > 
> > Also some minor cleanup.
> 
> Looks good.  Fine for mainline just after 2.6.23 opens.

Thanks.  (And, by the way, would it be helpful for me to translate this
kind of statement into an "acked-by: Christoph..." on the eventual
patch?)

--b.


Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-03 Thread J. Bruce Fields
On Mon, Jul 02, 2007 at 10:58:33AM -0400, Mingming Cao wrote:
> Trond or Bruce, can you please review this patch series and ack if you
> agree?

Thanks, looks like what we need!

How will nfsd tell whether it can rely on a given filesystem's
i_version, or whether it should fall back on ctime?

> As to the performance concerns raised before: the inode version counter
> update (at least for ext4) is done inside ext4_mark_inode_dirty(), so
> there is no extra IO work to store this counter to disk.

So what's the motivation for the "noversion" mount option?

--b.


[ANNOUNCE] util-linux-ng 2.13-rc1

2007-07-03 Thread Karel Zak


 The first util-linux-ng 2.13 release candidate is available at

ftp://ftp.kernel.org/pub/linux/utils/util-linux-ng/v2.13/


 Thanks to all who help with util-linux resuscitation:

H. Peter Anvin
Ian Kent

 and contribute to this project:

Arkadiusz Miskiewicz   Matthias Koenig
Cliff Wickman          Mike Frysinger
David Brownell         Pádraig Brady
David Miller           Radek Biba
Jason Vas Dias         Ram Pai
Kay Sievers            Stepan Kasal
Luciano Chavez         Steve Grubb
Marco d'Itri           Valerie Henson
Martin Schlemmer


 Feedback and bug reports, as always, are welcomed.

Karel



Util-linux-ng 2.13 Release Notes


Release highlights:
-------------------

 mount(8) doesn't include NFS client code anymore. Don't forget to
 install nfs-utils 1.1.0 or newer with /sbin/[u]mount.{nfs,nfs4}.

 mount(8) doesn't include filesystem detection code anymore. You
 have to compile --with-fsprobe={blkid,volume_id}, and libblkid
 (e2fsprogs) or libvolume_id (udev >= v110) is required.

 mount(8) supports new relatime, context, fscontext, and defcontext
 mount options.

 losetup(8) supports command line option "-a" to list all used loop
 devices, "-s" to print a device name if "-f" and a file argument
 are present, and "-r" to create a read-only loop device.

 fdisk(8) Sun label support has been improved. fdisk(8) is also able
 to warn about detected GPT (fdisk doesn't support GPT).

 taskset(1) no longer depends on a hardcoded NR_CPUS. chrt(1) supports
 the SCHED_BATCH scheduling policy.

 The package build system is now based on autotools. The build system
 supports separate CFLAGS and LDFLAGS for suid programs (SUID_CFLAGS,
 SUID_LDFLAGS). For more details see the README file.

 hwclock(8) supports command line option --rtc= and /dev/rtc0
 device. --systohc functionality has been improved, and it doesn't cause
 a 500ms inaccuracy each time it is used.

 Audit system support (--with-audit) has been added to hwclock(8) and
 login(1).

 SELinux support (--with-selinux) has been added to mkswap(8) and
 mount(8).

 The setarch(8) upstream has been merged with util-linux-ng.


Fixed security issues:
----------------------

 CVE-2007-0822 - mount(8) allows local users to trigger a NULL
 dereference and an application crash
 CVE-2006-7108 - login(1) omits PAM account validation when auth is
 skipped


Changelog:
----------

agetty:
add 'O' escape code to display domain name
check gethostname() return value
blockdev:
add BLKFRAGET/BLKFRASET ioctls
cleanup usage() and update man page
build-sys:
add AC_GNU_SOURCE
add Automake option dist-bzip2
add missing files
add SUID_CFLAGS
add SUID_LDFLAGS
add support for audit
amend .gitignore
call automake after autoconf
cleanup architecture conditionals
cleanup sys-utils/ rdev symlinks
configure.am selinux support cleanup
declare SUID_CFLAGS and SUID_LDFLAGS as precious
do not build convenience libraries in lib/
do not kick off AM_CFLAGS by SUID_CFLAGS
do not play with DEFS, use AM_CPPFLAGS
do not set with_foo twice
do not use internal Autoconf variables
do not use wildcards in EXTRA_DIST
factor out common parts from mount/Makefile.am
fix HAVE_NCURSES
fix ifdef ENABLE_WIDECHAR usage
fix linking when ncurses is built with --with-termlib=tinfo
fix README filenames and add missing files to EXTRA_DISTs
fix the example configure call in README
fix the final message of autogen.sh
in configure.ac, change "po" -> "$srcdir/po"
in the clean targets use "find ... | xargs rm -f"
let configure instantiate the misc-utils/*.pl scripts
make the getopt example directory relative to datadir
merge adjacent AC_CONFIG_HEADERS and AC_CONFIG_FUNCS calls
minor fixes in configure.in
mount/Makefile.am tiny cleanup
mount/Makefile.am tiny cleanup II
move -D flags to *_CPPFLAGS
move the optimization flags to AM_CFLAGS
--prefix defaults to /usr
remove aclocal.m4 from SCM
remove AC_PROG_RANLIB
remove config.h.in from VCS
remove config/include-Makefile.am from EXTRA_DIST
remove DEFAULT_INCLUDES workaround
remove -fomit-frame-pointer
remove generated autotools stuff from git
remove po/Makevars.template from EXTRA_DIST
remove swapargs.h, move the tests to main configure.ac
rename to -ng, change maintainer name
replace AC_TRY_* by AC_*_IFELSE
s/AC_HELP_STRING/AS_HELP_STRING/
set DISTCHECK_CONFIGURE_FLAGS in top-level makefile
simplify "clean" in tests/Makefile.am
update po/POTFILES.in
use dist_example_DATA
use dist_no

Re: how do versioning filesystems take snapshot of opened files?

2007-07-03 Thread Neil Brown
On Tuesday July 3, [EMAIL PROTECTED] wrote:
> 
> Getting a snapshot that is useful with respect to application data
> requires help from the application.

Certainly.

>  The app needs to be shutdown or
> paused prior to the snapshot and then started up again after the
> snapshot is taken.

Alternately, the app needs to be able to cope with unexpected system
shutdown (aka crash) and the same ability will allow it to cope with
an atomic snapshot.  It may be able to recover more efficiently from
an expected shutdown, so being able to tell the app about an impending
snapshot is probably a good idea, but it should be advisory only.

NeilBrown
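
One common way for an application to tolerate a crash at an arbitrary point,
and therefore an atomic snapshot, is to never modify a file in place: write a
complete new copy under a temporary name, flush it, then rename it over the
old file. The following is a minimal sketch of that idiom, assuming a POSIX
system; replace_file_atomically() is a hypothetical helper name, and the
sketch does not fsync the containing directory, which a careful
implementation may also want to do.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Replace the contents of `path` with `len` bytes from `buf`.
 * A reader (or a snapshot) sees either the old file or the new one,
 * never a partially written mixture. Returns 0 on success, -1 on error. */
static int replace_file_atomically(const char *path, const char *tmp_path,
				   const void *buf, size_t len)
{
	const char *p = buf;
	int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -1;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0)
			goto fail;
		p += n;
		len -= (size_t)n;
	}

	/* Make sure the new data is on disk before it becomes visible. */
	if (fsync(fd) != 0)
		goto fail;
	if (close(fd) != 0) {
		unlink(tmp_path);
		return -1;
	}

	/* rename() atomically switches the name to the new file. */
	if (rename(tmp_path, path) != 0) {
		unlink(tmp_path);
		return -1;
	}
	return 0;

fail:
	close(fd);
	unlink(tmp_path);
	return -1;
}
```

An application written this way needs no warning before a snapshot; an
advisory notification would only let it avoid leaving a stale temporary
file behind.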


Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-03 Thread Andreas Dilger
On Jul 03, 2007  10:24 -0400, Trond Myklebust wrote:
> It looks OK to me, but you might want to strip out the now redundant
> i_version updates in add_dirent_to_buf(), ext4_rmdir(), ext4_rename().

Agreed, and I thought we discussed that already on the ext4 list.

> I also have some questions about how this will affect the readdir code:
> unless I missed something, the filp->f_version is still unsigned long,
> so the comparisons and assignments in ext4_readdir()/ext4_dx_readdir()
> no longer make sense.

I don't see them as any worse than existing checks.  For 32-bit systems
we only ever had a 32-bit in-memory version anyway so using only the
low 32 bits of i_version in f_version is no more racy than in the past.
For 64-bit systems using the full on-disk i_version is possible.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
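
To make the point about the low 32 bits concrete, here is a small C sketch of
a readdir-style staleness check that compares a 32-bit cached value against a
64-bit version. The function name is hypothetical and this is not the
ext4_readdir() code; it only models the truncation being discussed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: has the directory changed since the 32-bit
 * cached version was recorded? Only the low 32 bits of the 64-bit
 * version take part in the comparison. */
static bool dir_changed_32(uint32_t cached_version, uint64_t i_version)
{
	return cached_version != (uint32_t)i_version;
}
```

The check misses a change only if the version has advanced by an exact
multiple of 2^32 since the cached value was taken, which is the same
wrap-around exposure a purely 32-bit in-memory counter already had.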



Re: [EXT4 set 4][PATCH 4/5] i_version:ext4 inode version update

2007-07-03 Thread Andreas Dilger
On Jul 03, 2007  12:19 +0530, Aneesh Kumar K.V wrote:
> Mingming Cao wrote:
> >Index: linux-2.6.22-rc4/fs/ext4/super.c
> >===
> >--- linux-2.6.22-rc4.orig/fs/ext4/super.c 2007-06-13 17:19:11.0 -0700
> >+++ linux-2.6.22-rc4/fs/ext4/super.c 2007-06-13 17:24:45.0 -0700
> >@@ -2846,8 +2846,8 @@ out:
> > i_size_write(inode, off+len-towrite);
> > EXT4_I(inode)->i_disksize = inode->i_size;
> > }
> >-inode->i_version++;
> > inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> >+inode->i_version = 1;
> > ext4_mark_inode_dirty(handle, inode);
> > mutex_unlock(&inode->i_mutex);
> > return len - towrite;
> 
> 
> Is this correct? Why do we set the quota file inode's version to 1
> during write?

Hmm, I thought we had previously fixed this?

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



Re: [EXT4 set 4][PATCH 2/5] i_version: Add hi 32 bit inode version on ext4 on-disk inode

2007-07-03 Thread J. Bruce Fields
On Sun, Jul 01, 2007 at 03:37:16AM -0400, Mingming Cao wrote:
> This patch adds a 32-bit i_version_hi field to ext4_inode, which can be used 
> for 64-bit inode versions. This field will store the higher 32 bits of the 
> version, while Jean Noel's patch has added support to store the lower 32-bits 
> in osd1.linux1.l_i_version.
> 

Sorry, I'm a little lost--where's that earlier patch, and exactly what
tree should this patch series apply to?

--b.

> Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
> Signed-off-by: Andreas Dilger <[EMAIL PROTECTED]>
> Signed-off-by: Kalpak Shah <[EMAIL PROTECTED]>
> ---
> Index: linux-2.6.21/include/linux/ext4_fs.h
> ===
> --- linux-2.6.21.orig/include/linux/ext4_fs.h
> +++ linux-2.6.21/include/linux/ext4_fs.h
> @@ -342,6 +342,7 @@ struct ext4_inode {
>   __le32  i_atime_extra;  /* extra Access time  (nsec << 2 | epoch) */
>   __le32  i_crtime;   /* File Creation time */
>   __le32  i_crtime_extra; /* extra FileCreationtime (nsec << 2 | epoch) */
> + __le32  i_version_hi;   /* high 32 bits for 64-bit version */
>  };
> 
>  #define i_size_high  i_dir_acl
> 
> 
> ___
> NFSv4 mailing list
> [EMAIL PROTECTED]
> http://linux-nfs.org/cgi-bin/mailman/listinfo/nfsv4
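
The split described in the patch, with the low 32 bits in one on-disk field
and the high 32 bits in the new i_version_hi field, amounts to the following
arithmetic. This is a sketch with hypothetical helper names; it ignores the
little-endian conversion that the real on-disk fields (__le32) require.

```c
#include <stdint.h>

/* Combine the two 32-bit halves into one 64-bit version. */
static uint64_t version_combine(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Split a 64-bit version into the two 32-bit halves to be stored. */
static void version_split(uint64_t version, uint32_t *lo, uint32_t *hi)
{
	*lo = (uint32_t)version;
	*hi = (uint32_t)(version >> 32);
}
```

A filesystem image without the extra field would then read as a version
whose high half is zero.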


Re: how do versioning filesystems take snapshot of opened files?

2007-07-03 Thread Xin Zhao

On 7/3/07, Bryan Henderson <[EMAIL PROTECTED]> wrote:

> >>we want an open/close consistency in snapshots.
> >
> >This depends on the transaction engine in your filesystem.  None of the
> >existing linux filesystems have a way to start a transaction when the
> >file opens and finish it when the file closes, or a way to roll back
> >individual operations that have happened inside a given transaction.
> >
> >It certainly could be done, but it would also introduce a great deal of
> >complexity to the FS.
>
> And I would be opposed as a matter of architecture to making open/close
> transactional.  People often read more into open/close than is there, but
> open is just about gaining access and close is just about releasing
> resources.  It isn't appropriate for close to _mean_ anything.
>
> There are filesystems that have transactions.  They use separate start
> transaction / end transaction system calls (not POSIX).
>
> >> Pausing apps itself
> >> does not solve this problem, because a file could be already opened
> >> and in the middle of write.
>
> Just to be clear: we're saying "pause," but we mean "quiesce."  I.e., tell
> the application to reach a point where it's not in the middle of anything
> and then tell you it's there.  Indeed, whether you use open/close or some
> other kind of transaction, just pausing the application doesn't help.  If
> you were to implement open/close transactions, the filesystem driver would
> just wait for the application to close and in the meantime block all new
> opens.


If we want to support open/close consistency, maybe we don't really
need help from the application. For example, the filesystem could be
implemented this way: when a file is opened for write, we copy the
metadata and create a CoW bitmap to keep track of what has been changed.
Before writing any new data to the file, we copy the old data and then
write the new data. As such, when we take a snapshot and encounter the
opened file, we can save the old data instead of the new data, since
the old data is in a consistent state. Of course, new file opens
should also be handled this way.

The filesystem driver cannot wait for the application to close, I think.
If the application is snapshot aware, the wait time could be
tolerable. But if the application does not provide a way to process
the quiesce request, the wait could be infinite.

What do you think?
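
The copy-before-write scheme proposed above can be sketched as a toy
in-memory model. All names and sizes here are hypothetical, the metadata copy
is not modelled, and the file is limited to eight fixed-size blocks so that a
single byte can serve as the CoW bitmap.

```c
#include <stdint.h>
#include <string.h>

#define NBLOCKS    8    /* blocks per file in this toy model */
#define BLOCK_SIZE 16   /* bytes per block in this toy model */

struct cow_file {
	uint8_t live[NBLOCKS][BLOCK_SIZE];   /* data as currently written */
	uint8_t saved[NBLOCKS][BLOCK_SIZE];  /* pre-open copies of changed blocks */
	uint8_t changed;                     /* CoW bitmap, one bit per block */
};

/* Opening for write starts with an empty bitmap. */
static void cow_open_for_write(struct cow_file *f)
{
	f->changed = 0;
}

/* Save the old contents the first time a block is overwritten,
 * then write the new data. Returns 0 on success, -1 on a bad index. */
static int cow_write_block(struct cow_file *f, unsigned int blk,
			   const uint8_t *data)
{
	if (blk >= NBLOCKS)
		return -1;
	if (!(f->changed & (1u << blk))) {
		memcpy(f->saved[blk], f->live[blk], BLOCK_SIZE);
		f->changed |= (uint8_t)(1u << blk);
	}
	memcpy(f->live[blk], data, BLOCK_SIZE);
	return 0;
}

/* A snapshot taken while the file is open reads the saved copy of
 * any block that has been modified since the open. */
static const uint8_t *cow_snapshot_block(const struct cow_file *f,
					 unsigned int blk)
{
	if (blk >= NBLOCKS)
		return NULL;
	return (f->changed & (1u << blk)) ? f->saved[blk] : f->live[blk];
}
```

Even in this toy form the cost is visible: the first write to each block
turns into a read plus two writes, which is the performance concern raised
elsewhere in the thread.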




> --
> Bryan Henderson IBM Almaden Research Center
> San Jose CA Filesystems






Re: how do versioning filesystems take snapshot of opened files?

2007-07-03 Thread Bryan Henderson
>>we want an open/close consistency in snapshots.
>
>This depends on the transaction engine in your filesystem.  None of the
>existing linux filesystems have a way to start a transaction when the
>file opens and finish it when the file closes, or a way to roll back
>individual operations that have happened inside a given transaction.
>
>It certainly could be done, but it would also introduce a great deal of
>complexity to the FS.

And I would be opposed as a matter of architecture to making open/close 
transactional.  People often read more into open/close than is there, but 
open is just about gaining access and close is just about releasing 
resources.  It isn't appropriate for close to _mean_ anything.

There are filesystems that have transactions.  They use separate start 
transaction / end transaction system calls (not POSIX).

>> Pausing apps itself
>> does not solve this problem, because a file could be already opened
>> and in the middle of write.

Just to be clear: we're saying "pause," but we mean "quiesce."  I.e., tell 
the application to reach a point where it's not in the middle of anything 
and then tell you it's there.  Indeed, whether you use open/close or some 
other kind of transaction, just pausing the application doesn't help.  If 
you were to implement open/close transactions, the filesystem driver would 
just wait for the application to close and in the meantime block all new 
opens.

--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
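
The difference between pausing and quiescing can be shown with a small
single-threaded model. Everything here is hypothetical (struct app_state,
app_run(), app_resume()); the point is only that the application checks for
the request between transactions, never in the middle of one, and then
reports that it has reached that point.

```c
#include <stdbool.h>

struct app_state {
	bool quiesce_requested;  /* set by the snapshot tool */
	bool quiesced;           /* set by the application when it is idle */
	int committed;           /* number of complete transactions */
};

/* Run up to max_txns transactions. A quiesce request is honoured only
 * at a transaction boundary. Returns the number of transactions run. */
static int app_run(struct app_state *s, int max_txns)
{
	int done = 0;

	while (done < max_txns) {
		if (s->quiesce_requested) {
			s->quiesced = true;  /* "I'm there": safe to snapshot */
			break;
		}
		/* ... a multi-step update would run here, uninterrupted ... */
		s->committed++;
		done++;
	}
	return done;
}

/* Called once the snapshot has been taken. */
static void app_resume(struct app_state *s)
{
	s->quiesce_requested = false;
	s->quiesced = false;
}
```

A snapshot tool following this protocol would set quiesce_requested, wait
for quiesced to become true, take the snapshot, and then call app_resume().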




Re: how do versioning filesystems take snapshot of opened files?

2007-07-03 Thread Chris Mason
On Tue, 3 Jul 2007 13:15:06 -0400
"Xin Zhao" <[EMAIL PROTECTED]> wrote:

> OK. From the discussion above, can we reach a conclusion: from the
> application perspective, it is very hard, if not impossible, to take a
> transactionally consistent snapshot without help from the applications?

You definitely need help from the applications.  They define what a
transaction is.

> 
> Chris, you mentioned that "Many different applications support some
> form of pausing in order to facilitate live backups. " Can you provide
> some examples? I mean popular apps.

Oracle, db2, mysql, ldap, postgres, sleepycat databases...just search
for online backup and most programs that involve something
transactional have a way to do it.

> 
> Finally, if we back up a little bit, say, we don't care about
> transaction-level consistency (a transaction that opens/closes many
> times), but we want an open/close consistency in snapshots. That is, a
> file in a snapshot must be in a single version, but it can be in a
> middle state of a transaction. Can we do that? Pausing apps itself
> does not solve this problem, because a file could be already opened
> and in the middle of a write. As I mentioned earlier, some systems can
> back up old data every time new data is written, but I suspect that
> this will impact the system performance quite a bit. Any idea about
> that?
> 

This depends on the transaction engine in your filesystem.  None of the
existing linux filesystems have a way to start a transaction when the
file opens and finish it when the file closes, or a way to roll back
individual operations that have happened inside a given transaction.

It certainly could be done, but it would also introduce a great deal of
complexity to the FS.

-chris


Re: how do versioning filesystems take snapshot of opened files?

2007-07-03 Thread Xin Zhao

OK. From the discussion above, can we reach a conclusion: from the
application perspective, it is very hard, if not impossible, to take a
transactionally consistent snapshot without help from the applications?

Chris, you mentioned that "Many different applications support some
form of pausing in order to facilitate live backups. " Can you provide
some examples? I mean popular apps.

Finally, if we back up a little bit, say, we don't care about
transaction-level consistency (a transaction that opens/closes many
times), but we want an open/close consistency in snapshots. That is, a
file in a snapshot must be in a single version, but it can be in a
middle state of a transaction. Can we do that? Pausing apps itself
does not solve this problem, because a file could be already opened
and in the middle of a write. As I mentioned earlier, some systems can
back up old data every time new data is written, but I suspect that
this will impact the system performance quite a bit. Any idea about
that?

Thanks.



On 7/3/07, Chris Mason <[EMAIL PROTECTED]> wrote:

> On Tue, 3 Jul 2007 12:31:49 -0400
> "Xin Zhao" <[EMAIL PROTECTED]> wrote:
>
> > That's a good point!
> >
> > But it sounds hopeless to take a truly consistent snapshot from the app's
> > perspective unless you shut down the computer. Right?
>
> Many different applications support some form of pausing in order
> to facilitate live backups.  You just have to keep it all in mind when
> designing the total backup solution.
>
> -chris




Re: how do versioning filesystems take snapshot of opened files?

2007-07-03 Thread Chris Mason
On Tue, 3 Jul 2007 12:31:49 -0400
"Xin Zhao" <[EMAIL PROTECTED]> wrote:

> That's a good point!
> 
> But it sounds hopeless to take a truly consistent snapshot from the app's
> perspective unless you shut down the computer. Right?

Many different applications support some form of pausing in order
to facilitate live backups.  You just have to keep it all in mind when
designing the total backup solution.

-chris


Re: how do versioning filesystems take snapshot of opened files?

2007-07-03 Thread Bryan Henderson
>But if you look around, you may find that many
>systems claim that they can take a snapshot without shutting down the
>application.

The claim is true, because you can just pause the application and not shut 
it down.  While this means you can't simply add snapshot capability and 
solve your copy consistency problem (you need new applications too), this 
is a huge advance over what there was before.  Without snapshots, you do 
have to shut down the application.  Often for hours, and during that time 
any service request to the application fails.  With snapshots, you simply 
pause the application for a few seconds.  During that time it delays 
processing of service requests, but every request ultimately goes through, 
with the requester probably not noticing any difference.

If a system claims that snapshot function in the filesystem alone gets you 
consistent backups, it's wrong.

--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems



Re: [AppArmor 00/44] AppArmor security module overview

2007-07-03 Thread Andreas Gruenbacher
On Monday 02 July 2007 22:15, Christoph Hellwig wrote:
> AA on the other hand just fucks up VFS layering [...]

Oh come on, this claim clearly isn't justified. How on earth is passing 
vfsmounts down the lsm hooks supposed to break vfs layering? We are not 
proposing to pass additional information down to file systems. There is no 
barrier between the vfs and lsm hooks for vfsmounts even today -- only look 
at the inode_getattr hook; it already gets a vfsmount.

Without vfsmount we cannot tell where in the namespace we are, but that 
information is essential for any kind of pathname based mechanism, AA or not, 
and even for plain reporting.

LSM as a framework is supposed to allow different security mechanisms to be 
plugged in. It isn't flexible enough for us right now, and so we are 
proposing to extend it. What can be wrong about that?

Andreas


Re: how do versioning filesystems take snapshot of opened files?

2007-07-03 Thread Xin Zhao

That's a good point!

But it sounds hopeless to take a truly consistent snapshot from the app's
perspective unless you shut down the computer. Right?

Thanks.


On 7/3/07, Bryan Henderson <[EMAIL PROTECTED]> wrote:

> > Consistent state means many different things.
>
> And, significantly, open/close has nothing to do with any of them
> (assuming we're talking about the system calls).  open/close does not
> identify a transaction; a program may open and close a file multiple times
> in the course of making a "single" update.  Also, data and metadata updates
> remain buffered at the kernel level after a close.  And don't forget that
> a single update may span multiple files.
>
> --
> Bryan Henderson IBM Almaden Research Center
> San Jose CA Filesystems





Re: how do versioning filesystems take snapshot of opened files?

2007-07-03 Thread Bryan Henderson
> Consistent state means many different things. 

And, significantly, open/close has nothing to do with any of them 
(assuming we're talking about the system calls).  open/close does not 
identify a transaction; a program may open and close a file multiple times
in the course of making a "single" update.  Also, data and metadata updates
remain buffered at the kernel level after a close.  And don't forget that 
a single update may span multiple files.

--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems



Re: how do versioning filesystems take snapshot of opened files?

2007-07-03 Thread Xin Zhao

Thanks for your reply.

It sounds like one has to stop or pause the applications to get a
consistent snapshot?  But if you look around, you will find that many
systems claim they can take a snapshot without shutting down the
application. Actually, I think it is impractical in a commercial
environment to require that the app be shut down before taking a snapshot.

Pausing apps is possible from the filesystem perspective. A simple
solution is for the filesystem to stop writing any data to disk from the
point that the snapshot command is received. But as we discussed
earlier, this is not sufficient to prevent a file from containing part
old data and part new data.

That's why I am so confused about how these systems can provide a
consistent snapshot capability without sacrificing much system performance.



On 7/3/07, Chris Mason <[EMAIL PROTECTED]> wrote:

On Tue, 3 Jul 2007 01:28:57 -0400
"Xin Zhao" <[EMAIL PROTECTED]> wrote:

> Hi,
>
>
> If a file is already opened when snapshot command is issued,  the file
> itself could be in an inconsistent state already. Before the file is
> closed, maybe part of the file contains old data, the rest contains
> new data.
> How does a versioning filesystem guarantee that the file snapshot is
> in a consistent state in this case?
>
> I googled it but didn't find any answer. Can someone explain it a
> little bit?

It's the same answer as in most filesystem related questions...it
depends ;)  Consistent state means many different things.  It may mean
that the metadata accurately reflects the space on disk allocated to
the file and that all data for the file is properly on disk (ie from an
fsync).

But, even this is less than useful because very few files on the
filesystem stand alone.  Applications spread their state across a
number of files and so consistent means something different to
every application.

Getting a snapshot that is useful with respect to application data
requires help from the application.  The app needs to be shutdown or
paused prior to the snapshot and then started up again after the
snapshot is taken.

-chris






Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-03 Thread Trond Myklebust
On Mon, 2007-07-02 at 10:58 -0400, Mingming Cao wrote:
> Trond or Bruce, can you please review this patch series and ack if you
> agree? Thanks.
> 
> As to the performance concerns raised before: the inode version counter
> update (at least for ext4) is done inside ext4_mark_inode_dirty(), so there
> is no extra IO work to store this counter to disk.

Hi Mingming,

It looks OK to me, but you might want to strip out the now redundant
i_version updates in add_dirent_to_buf(), ext4_rmdir(), ext4_rename().

I also have some questions about how this will affect the readdir code:
unless I missed something, the filp->f_version is still unsigned long,
so the comparisons and assignments in ext4_readdir()/ext4_dx_readdir()
no longer make sense.
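
A minimal user-space sketch of the width mismatch raised above, assuming a
32-bit `unsigned long` for the saved copy. The type and function names are
hypothetical stand-ins and are not the ext4 readdir code; they only show what
happens when a 64-bit counter is stored in, and later compared against, a
32-bit field.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for filp->f_version on a 32-bit system. */
typedef uint32_t sketch_f_version_t;

/* Saving the 64-bit inode version silently drops the upper 32 bits. */
static sketch_f_version_t save_version(uint64_t i_version)
{
	return (sketch_f_version_t)i_version;
}

/*
 * "Has the directory changed since we saved the version?"
 * The saved value is widened back to 64 bits for the comparison, so once
 * the counter exceeds 32 bits the two can never be equal again.
 */
static bool dir_unchanged(sketch_f_version_t saved, uint64_t i_version)
{
	return saved == i_version;
}
```

With a small counter the check behaves as intended, but for a counter such as
`(1ULL << 32) + 5` the saved copy is 5 and `dir_unchanged()` reports a change
even though nothing was modified.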

Cheers
  Trond



Re: vm/fs meetup in september?

2007-07-03 Thread Jörn Engel
On Mon, 2 July 2007 17:46:40 -0700, Jared Hulbert wrote:
> 
> Right, the solution to the swap problem is identical to the rw XIP
> filesystem problem.  Jörn, that's why you're the self-appointed
> subject matter expert!

All right.  I'll try to make an important face whenever the subject
comes up.

Nick, do you have a problem if LogFS occupies two brainslots at the
meeting?

Jörn

-- 
Eighty percent of success is showing up.
-- Woody Allen


Re: how do versioning filesystems take snapshot of opened files?

2007-07-03 Thread Chris Mason
On Tue, 3 Jul 2007 01:28:57 -0400
"Xin Zhao" <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> 
> If a file is already opened when snapshot command is issued,  the file
> itself could be in an inconsistent state already. Before the file is
> closed, maybe part of the file contains old data, the rest contains
> new data.
> How does a versioning filesystem guarantee that the file snapshot is
> in a consistent state in this case?
> 
> I googled it but didn't find any answer. Can someone explain it a
> little bit?

It's the same answer as in most filesystem related questions...it
depends ;)  Consistent state means many different things.  It may mean
that the metadata accurately reflects the space on disk allocated to
the file and that all data for the file is properly on disk (ie from an
fsync).
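
As a rough illustration of the "properly on disk (ie from an fsync)" case, the
sketch below writes a buffer and calls fsync() before close(). The helper name
`write_durably` is hypothetical. It only covers per-file durability, not the
cross-file, application-level consistency discussed in the rest of this
thread.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes to path and force them to stable
 * storage.  Returns 0 on success, -1 on any error.
 */
static int write_durably(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -1;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n <= 0) {		/* EINTR retry omitted for brevity */
			close(fd);
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}

	/* close() alone does not flush; data may still sit in kernel buffers. */
	if (fsync(fd) < 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}
```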

But, even this is less than useful because very few files on the
filesystem stand alone.  Applications spread their state across a
number of files and so consistent means something different to
every application.

Getting a snapshot that is useful with respect to application data
requires help from the application.  The app needs to be shutdown or
paused prior to the snapshot and then started up again after the
snapshot is taken.

-chris




Re: [PATCH 4/7][TAKE5] support new modes in fallocate

2007-07-03 Thread Amit K. Arora
On Tue, Jul 03, 2007 at 11:31:07AM +0100, Christoph Hellwig wrote:
> On Tue, Jul 03, 2007 at 03:38:48PM +0530, Amit K. Arora wrote:
> > > FA_FL_DEALLOC   0x01 /* deallocate unwritten extent (default allocate) */
> > > FA_FL_KEEP_SIZE 0x02 /* keep size for EOF {pre,de}alloc (default change size) */
> > > FA_FL_DEL_DATA  0x04 /* delete existing data in alloc range (default keep) */
> > 
> > We now have two sets of flags - 
> > 1) the above three, which I think no one has any issues with, and
> 
> Yes, I do.  FA_FL_DEL_DATA is plain stupid, a preallocation call should
> never delete data.  FA_FL_DEALLOC should probably be a separate syscall
> because it's very different functionality.

Well, if you look at the modes proposed using the above flags:

#define FA_ALLOCATE 0
#define FA_DEALLOCATE   FA_FL_DEALLOC
#define FA_RESV_SPACE   FA_FL_KEEP_SIZE
#define FA_UNRESV_SPACE (FA_FL_DEALLOC | FA_FL_KEEP_SIZE | FA_FL_DEL_DATA)

FA_FL_DEL_DATA is _not_ being used for preallocation. We have two modes
for preallocation, FA_ALLOCATE and FA_RESV_SPACE, which do not use this
flag. Hence preallocation will never delete data.
This flag is required only for FA_UNRESV_SPACE, which is a deallocation
mode, to support any existing XFS-aware applications/usage scenarios.
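
The claim above can be checked mechanically. The sketch below copies the flag
values and mode definitions quoted in this thread; the two helper functions
are hypothetical and exist only to illustrate the argument, they are not part
of the proposed patch.

```c
#include <stdbool.h>

/* Flag values as quoted earlier in this thread. */
#define FA_FL_DEALLOC   0x01
#define FA_FL_KEEP_SIZE 0x02
#define FA_FL_DEL_DATA  0x04

/* Modes as proposed in this message. */
#define FA_ALLOCATE     0
#define FA_DEALLOCATE   FA_FL_DEALLOC
#define FA_RESV_SPACE   FA_FL_KEEP_SIZE
#define FA_UNRESV_SPACE (FA_FL_DEALLOC | FA_FL_KEEP_SIZE | FA_FL_DEL_DATA)

/* Hypothetical helper: a mode preallocates if it does not deallocate. */
static bool fa_mode_is_prealloc(int mode)
{
	return (mode & FA_FL_DEALLOC) == 0;
}

/* Hypothetical helper: does this mode ask for existing data to be deleted? */
static bool fa_mode_deletes_data(int mode)
{
	return (mode & FA_FL_DEL_DATA) != 0;
}
```

Under these definitions, FA_ALLOCATE and FA_RESV_SPACE are the only
preallocation modes and neither carries FA_FL_DEL_DATA; only FA_UNRESV_SPACE
does.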

And, regarding FA_FL_DEALLOC being a separate syscall - I think then the
very purpose of the @mode argument is not justified. We have this mode so
that we can provide more features like this. That said, I am not saying that
we should make things very complicated; but at least we should provide
some basic features which we expect most of the applications wanting
preallocation to use. To start with, we need to cater to the already
existing applications/user base who use the XFS preallocation feature.

And further advanced features, like goal based preallocation, can be
implemented as a separate syscall.

> While we're at it I also dislike the FA_ prefix because it doesn't say
> anything and is far too generic.  FALLOC_ is much better.

Ok. This can be changed in the next take.
 
> > > FA_FL_ERR_FREE  0x08 /* free preallocation on error (default keep prealloc) */
> 
> NACK on this one.  We should have just one behaviour, and from the thread
> that is not freeing the allocation on error.

I agree on this one. 
 
> > > FA_FL_NO_MTIME  0x10 /* keep same mtime (default change on size, data change) */
> > > FA_FL_NO_CTIME  0x20 /* keep same ctime (default change on size, data change) */
> 
> NACK to these as well.  If i_size changes c/mtime need updates, if the size
> doesn't change they don't.  No need to add more flags for this.

This requirement was from the point of view of HSM applications. Hope
you saw Andreas' previous post and are keeping that in mind.

--
Regards,
Amit Arora


Re: [EXT4 set 3][PATCH 1/1] ext4 nanosecond timestamp

2007-07-03 Thread Kalpak Shah
On Sun, 2007-07-01 at 03:36 -0400, Mingming Cao wrote:
> +
> +#define EXT4_INODE_GET_XTIME(xtime, inode, raw_inode) \
> +do { \
> +	(inode)->xtime.tv_sec = le32_to_cpu((raw_inode)->xtime); \
> +	if (EXT4_FITS_IN_INODE(raw_inode, EXT4_I(inode), xtime ## _extra)) \
> +		ext4_decode_extra_time(&(inode)->xtime, \
> +				       raw_inode->xtime ## _extra); \
> +} while (0)
> +
> +#define EXT4_EINODE_GET_XTIME(xtime, einode, raw_inode) \
> +do { \
> +	if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime)) \
> +		(einode)->xtime.tv_sec = le32_to_cpu((raw_inode)->xtime); \
> +	if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime ## _extra)) \
> +		ext4_decode_extra_time(&(einode)->xtime, \
> +				       raw_inode->xtime ## _extra); \
> +} while (0)
> +

This nanosecond patch seems to be missing the fix below which is required for 
http://bugzilla.kernel.org/show_bug.cgi?id=5079

If the timestamp is set to before the epoch, i.e. a negative timestamp, then
the file may have its date set into the future on 64-bit systems. So when the
timestamp is read it must be cast to signed.
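
The effect can be reproduced in user space. The sketch below is an
illustration only: `sketch_time_t` stands in for a 64-bit `tv_sec`, the
function names are hypothetical, and the `le32_to_cpu()` byte swap is left out
because it does not affect signedness.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit tv_sec field. */
typedef int64_t sketch_time_t;

/*
 * Mimics the unpatched macro: the unsigned 32-bit on-disk value is
 * zero-extended when widened, so a pre-epoch time becomes a large
 * positive one.
 */
static sketch_time_t decode_unsigned(uint32_t raw)
{
	return raw;
}

/*
 * Mimics the patched macro: convert to a signed 32-bit value first, so
 * the widening sign-extends.  (The out-of-range conversion is
 * implementation-defined in C; common compilers wrap as two's complement.)
 */
static sketch_time_t decode_signed(uint32_t raw)
{
	return (int32_t)raw;
}
```

For the on-disk value 0xFFFFFFFF (one second before the epoch),
`decode_unsigned()` gives 4294967295, a date in the year 2106, while
`decode_signed()` gives -1.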

Index: linux-2.6.21/include/linux/ext4_fs.h
===
--- linux-2.6.21.orig/include/linux/ext4_fs.h
+++ linux-2.6.21/include/linux/ext4_fs.h
@@ -390,7 +390,7 @@ do { \

 #define EXT4_INODE_GET_XTIME(xtime, inode, raw_inode) \
 do { \
-	(inode)->xtime.tv_sec = le32_to_cpu((raw_inode)->xtime); \
+	(inode)->xtime.tv_sec = (signed)le32_to_cpu((raw_inode)->xtime); \
 	if (EXT4_FITS_IN_INODE(raw_inode, EXT4_I(inode), xtime ## _extra)) \
 		ext4_decode_extra_time(&(inode)->xtime, \
 				       raw_inode->xtime ## _extra); \
@@ -399,7 +399,8 @@ do { \
 #define EXT4_EINODE_GET_XTIME(xtime, einode, raw_inode) \
 do { \
 	if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime)) \
-		(einode)->xtime.tv_sec = le32_to_cpu((raw_inode)->xtime); \
+		(einode)->xtime.tv_sec = \
+			(signed)le32_to_cpu((raw_inode)->xtime); \
 	if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime ## _extra)) \
 		ext4_decode_extra_time(&(einode)->xtime, \
 				       raw_inode->xtime ## _extra); \


Thanks,
Kalpak.




Re: [PATCH 4/7][TAKE5] support new modes in fallocate

2007-07-03 Thread Christoph Hellwig
On Tue, Jul 03, 2007 at 03:38:48PM +0530, Amit K. Arora wrote:
> > FA_FL_DEALLOC   0x01 /* deallocate unwritten extent (default allocate) */
> > FA_FL_KEEP_SIZE 0x02 /* keep size for EOF {pre,de}alloc (default change size) */
> > FA_FL_DEL_DATA  0x04 /* delete existing data in alloc range (default keep) */
> 
> We now have two sets of flags - 
> 1) the above three, which I think no one has any issues with, and

Yes, I do.  FA_FL_DEL_DATA is plain stupid, a preallocation call should
never delete data.  FA_FL_DEALLOC should probably be a separate syscall
because it's very different functionality.

While we're at it I also dislike the FA_ prefix because it doesn't say
anything and is far too generic.  FALLOC_ is much better.

> > FA_FL_ERR_FREE  0x08 /* free preallocation on error (default keep prealloc) */

NACK on this one.  We should have just one behaviour, and from the thread
that is not freeing the allocation on error.

> > FA_FL_NO_MTIME  0x10 /* keep same mtime (default change on size, data change) */
> > FA_FL_NO_CTIME  0x20 /* keep same ctime (default change on size, data change) */

NACK to these as well.  If i_size changes c/mtime need updates, if the size
doesn't change they don't.  No need to add more flags for this.


Re: [PATCH 4/7][TAKE5] support new modes in fallocate

2007-07-03 Thread Amit K. Arora
On Sat, Jun 30, 2007 at 12:52:46PM -0400, Andreas Dilger wrote:
> The @mode flags that are currently under consideration are (AFAIK):
> 
> FA_FL_DEALLOC   0x01 /* deallocate unwritten extent (default allocate) */
> FA_FL_KEEP_SIZE 0x02 /* keep size for EOF {pre,de}alloc (default change size) */
> FA_FL_DEL_DATA  0x04 /* delete existing data in alloc range (default keep) */

We now have two sets of flags - 
1) the above three, which I think no one has any issues with, and
2) the ones below, which need some discussion before we finalize
them.

I would prefer fallocate going into mainline with the above three modes, and
the rest of the modes can be debated and discussed in parallel. And each
new mode/flag can be pushed as a separate patch. This way the fallocate
feature will not be held up indefinitely...

Please confirm if you find this approach OK. Otherwise, please object.
Thanks!

> FA_FL_ERR_FREE  0x08 /* free preallocation on error (default keep prealloc) */
> FA_FL_NO_MTIME  0x10 /* keep same mtime (default change on size, data change) */
> FA_FL_NO_CTIME  0x20 /* keep same ctime (default change on size, data change) */

--
Regards,
Amit Arora