Re: What's cooking in e2fsprogs.git (topics)

2008-02-25 Thread Theodore Tso
On Mon, Feb 25, 2008 at 11:32:34AM -0600, Eric Sandeen wrote:
> Are you sure?  This was her patch comment, from
> [PATCH] ext4: Don't set EXTENTS_FL flag for fast symlinks:

You're right, I confused myself.

> > But you do raise a good point that we need to support using the
> > extents format in order to support blocks > 2**32, so we can't just
> > arbitrary convert all symlinks to the old-style direct block maps.
> 
> ... so I think we really *should* be unconditionally storing *long*
> symlinks in extent format, on ext4... right?

Yes, I think so.  Being liberal in what you receive is probably a good
idea, but we should only store new symlinks in one standard format,
and the extents format makes sense since it will allow us to support
48-bit block numbers.

   - Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in e2fsprogs.git (topics)

2008-02-25 Thread Theodore Tso
On Sun, Feb 24, 2008 at 09:20:50PM -0700, Andreas Dilger wrote:
> On Feb 22, 2008  19:15 -0500, Theodore Ts'o wrote:
> > So before the recent patch were we actually creating long symlinks in
> > extents format?  Or were we just setting the flag but still treating
> > them as a block number?  If it was the latter, I guess we can put in
> > code into e2fsck to detect that case, and convert it back to a
> > singleton block number.  
> 
> Eric informed me that the long symlinks were actually stored in extent
> mapped blocks.  That is not harmful, because it can only be a single
> block and it will always fit into the inode.  The other thing to note
> is that extent mapping is REQUIRED for > 32-bit blocknumbers, so we
> may as well fix e2fsprogs to allow these symlinks to be handled normally.

Well, at least some kernel versions (as of sometime just before
2.6.25, iirc) were storing the long symlink as a single block in
i_block[0], despite EXTENTS_FL being set.  Valerie noticed this, and I
confirmed it, as it caused the mainline e2fsck extents support to core
dump.  Basically, what this means is that e2fsprogs can't trust
EXTENTS_FL for long symlinks.

But you do raise a good point that we need to support using the
extents format in order to support blocks > 2**32, so we can't just
arbitrary convert all symlinks to the old-style direct block maps.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in e2fsprogs.git (topics)

2008-02-22 Thread Theodore Tso
On Fri, Feb 22, 2008 at 04:14:34PM -0700, Andreas Dilger wrote:
> On Feb 21, 2008  10:40 -0600, Eric Sandeen wrote:
> > Ok, but my concern is what happens to those long symlinks when they
> > really truly are in extents format.  One option is to say "hey it was
> > ext4DEV, deal with it" and nuke the symlink, but if possible, converting
> > back to the proper format would be nice.
> 
> Is that actually the case though?  That should be pretty easy to massage
> into storing a block number.  The difficulty is if the long symlink block
> is beyond 32-bit blocknr, in which case it actually needs extents format.
> We may as well bite the bullet and fix the code to be the same as with
> htree fakeroot index block reading and use the proper mapping to find
> the symlink block.  See htree_blk_iter_cb() for how to do that.

So before the recent patch were we actually creating long symlinks in
extents format?  Or were we just setting the flag but still treating
them as a block number?  If it was the latter, I guess we can put in
code into e2fsck to detect that case, and convert it back to a
singleton block number.  

> > I too had assumed that 48 bits would be it for now; it should be
> > sufficient for a good while.  I guess what I'd like to see if a usable
> > ext4 out there in the near future, with stuff added on later as
> > necessary; delalloc, flex_bg (if that doesn't make 2.6.25...) etc.

Flex_bg is already in 2.6.25.  What's in the patch queue are changes
to the allocation algorithms to make them smarter if flex_bg is
enabled.

> At some point 32-bit logical block numbers will also be an issue, but
> the need for 16TB+ non-sparse single files is rare even in my world.

Yeah, I think the real issue is unless someone is willing to sign up
to support a new extent format (which will require significant code
revamp in the kernel, since right now it pretty much assumes only a
single extent format from a code structure point of view), it's
probably not going to happen in the near future.

> > Oh, speaking of all this - what do you think the criteria are for
> > dropping the "dev" from ext4dev?  How do we decide that it's cooked
> > enough?  :)
> 
> I'd say when e2fsprogs has an official release with extents support,
> and there are no show-stopping bugs in the existing code...  I don't
> think that is too far off anymore.

I guess I'd be a *bit* more cautious.  We still have some code patches
such as the delayed allocation and to a lesser extent the online
defrag patches which have the possibility of introducing bugs.  Once
all of those get merged and we have a full kernel release cycle to fix
the last remaining bugs, that's when I would drop the -dev from the
name.

Speaking of which, that's probably one of the things we should start
concentrating on the kernel side, which is preparing what's left in
the unstable part of the queue to be cleaned up and ready for
submission during the next merge window.  Yeah, it's a ways, off, but
some of the patches left in the unstable series probably need quite a
bit of work.  :-)

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: e2fsck dies with error "this should never happen!!!"

2008-02-22 Thread Theodore Tso
On Fri, Feb 22, 2008 at 08:31:13PM +0100, Bas van Schaik wrote:
> > Yeah, it could be quite big, given that you have a 3TB filesystem.  That's
> > why I suggested the "or given me login access to the system", although
> > understand there could be all sorts of privacy and security issues
> > involved with that request.
> 
> The dump is not ready yet...
> 
> Whatever we will do, it will have to wait for a week and a half. For now I
> really appreciate all the help.

Ok, when you get back, I should also have a patch for you that will
allow the "XXX should never happen" message to display more
information, so if the dump is too big for us to move over the
network, I'll have another way of doing some remote debugging.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: e2fsck dies with error "this should never happen!!!"

2008-02-22 Thread Theodore Tso
On Fri, Feb 22, 2008 at 07:06:10PM +0100, Bas van Schaik wrote:
> > Bas van Schaik wrote:
> >>> Bas van Schaik wrote:
> >>>
>  In a few hours I will hit the road towards France for a holiday, from
>  which I will return on the 3rd of March. I would be _really_ grateful
>  if
>  you could assist me solving this problem.
> >>> Providing the compressed e2image would let ted or others look into the
> >>> problem.
> >>>
> >>
> >> Should that be a raw image or a 'normal' one?
> >
> > Raw, I think; that way you can compress it as well.
> >
> 
> I'm working on it, but it seems that the image creation process is going
> to take a long time... If it will make it before my departure, I will
> publish it somewhere and post the link here. If it takes too long, I will
> publish it in about a week and a half.

Yeah, it could be quite big, given that you have a 3TB filesystem.  That's
why I suggested the "or given me login access to the system", although
understand there could be all sorts of privacy and security issues
involved with that request.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] e2fsprogs: error checking in blkid/devname.c

2008-02-22 Thread Theodore Tso
On Fri, Feb 22, 2008 at 01:10:40PM -0500, Philip Spencer wrote:
> You know what -- I went back and double-checked all the logs, and somehow
> or other I must have recorded a timestamp wrong as 3:19:21 instead of 
> 3:19:51.
>
> The segfault did in fact happen at 3:19:51 a.m. which is exactly the same 
> time as my backup script moved on to the next filesystem.
>
> So, it occurred during the unmount and lvremove of the snapshot volume.
> It is, then, entirely expected that the device-mapper routines would return 
> an error if the device no longer existed when the task was run.
>
> My apologies for mixing up the timestamps! And no bug in device-mapper, 
> just the one in e2fsprogs whch segfaulted in this circumstance instead of 
> dropping the device from its list. Having it fail outright, and not list 
> the device at all, is the correct behaviour for this situation -- just as 
> if the device had already been removed before the blkid routines were run.

OK, that's helpful, to know.  Thanks!!!

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] e2fsprogs: error checking in blkid/devname.c

2008-02-22 Thread Theodore Tso
On Fri, Feb 22, 2008 at 10:52:36AM -0600, Eric Sandeen wrote:
> Theodore Tso wrote:
> > On Fri, Feb 22, 2008 at 10:16:53AM -0600, Eric Sandeen wrote:
> >> From a quick chat with agk, it sounds like outright failure is
> >> appropriate.  Sounds like most of the calls fail for reasons like ENOMEM
> >> (but it might be nice if it returned that, eh?)
> > 
> > So the question then is why is it that Phillip was able to seeing
> > failures when he was creating and deleting snapshots?
> > 
> > I don't mind having blkid return a failure, but it may not fix
> > Phillip's scenario which he listed in BZ #433857; yeah, he won't have
> > a core dump, which is good, but it might mean that some or all of the
> > dm volumes disappear from the blkid results.
> 
> Maybe a device-mapper bug is in order :)

Yep, especially if it can be easily reproduced.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] e2fsprogs: error checking in blkid/devname.c

2008-02-22 Thread Theodore Tso
On Fri, Feb 22, 2008 at 10:16:53AM -0600, Eric Sandeen wrote:
> From a quick chat with agk, it sounds like outright failure is
> appropriate.  Sounds like most of the calls fail for reasons like ENOMEM
> (but it might be nice if it returned that, eh?)

So the question then is why is it that Phillip was able to seeing
failures when he was creating and deleting snapshots?

I don't mind having blkid return a failure, but it may not fix
Phillip's scenario which he listed in BZ #433857; yeah, he won't have
a core dump, which is good, but it might mean that some or all of the
dm volumes disappear from the blkid results.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] e2fsprogs: error checking in blkid/devname.c

2008-02-22 Thread Theodore Tso
On Fri, Feb 22, 2008 at 09:02:56AM -0600, Eric Sandeen wrote:
> Theodore Tso wrote:
> > This looks good, but I assume that the bug was caused by some race
> > condition where if you try to call dm_task_get_info() while some other
> > process is creating or removing a snapshot, dm_task_get_info() is
> > returning some kind of EAGAIN, or some other "Try again; we're busy"
> > error, right?
> > 
> > If that is the case, can you try to find out what error is being
> > returned?  It may be the right thing to do is to check to see if we
> > are getting a "resource is locked; try again in a sec" error message,
> > and retry the dm_task_get_info(), instead of just returning a failure.
> 
> well, dm_task_get_info just returns either 0 or 1; unless there is some
> other contextual piece of information to use, I don't know if we can
> differentiate between error types.  I'll ask agk...

Maybe the right thing is to try 3 times before giving up, maybe with a
nanosleep in between, or some such?  Hopefully agk can give us some
hints about what's the right way to handle errors from all of the
dm_task* calls.

Thanks!!

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] e2fsprogs: error checking in blkid/devname.c

2008-02-22 Thread Theodore Tso
On Thu, Feb 21, 2008 at 04:10:17PM -0600, Eric Sandeen wrote:
> This is for RH Bugzilla #433857: 
> rpc.mountd segfaults due to uninitialized value in e2fsprogs devname.c
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=433857
> 
> which did some very helpful analysis & provided a patch.
> 
> This patch is based on that, but checks all the devicemapper calls,
> and does some goto error handling / unwrapping, in the same style as
> the device-mapper lib code itself.

This looks good, but I assume that the bug was caused by some race
condition where if you try to call dm_task_get_info() while some other
process is creating or removing a snapshot, dm_task_get_info() is
returning some kind of EAGAIN, or some other "Try again; we're busy"
error, right?

If that is the case, can you try to find out what error is being
returned?  It may be the right thing to do is to check to see if we
are getting a "resource is locked; try again in a sec" error message,
and retry the dm_task_get_info(), instead of just returning a failure.

Thanks!!

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: e2fsck dies with error "this should never happen!!!"

2008-02-22 Thread Theodore Tso
On Fri, Feb 22, 2008 at 01:07:13PM +0100, Bas van Schaik wrote:
> Hi all,
> 
> Currently, I have a big problem: e2fsck refuses to fix my currupted
> filesystem...

The things that I would suggest first of all is upgrading to the
latest version of e2fsprogs.  If that doesn't fix it, I'm going to
need either a compressed e2image file (see the e2image man page) or
login access to see what is up.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in e2fsprogs.git (topics)

2008-02-21 Thread Theodore Tso
On Wed, Feb 20, 2008 at 12:46:57PM -0600, Eric Sandeen wrote:
> Theodore Tso wrote:
> > The big news here is that extents branch is sufficiently finished that
> > I've promoted it to the next branch.
> 
> Hey, that's great news.  :)
> 
> Out of curiosity; what is the plan for behavior when finding symlinks
> with the extents flag set?  Last I checked e2fsprogs-interim, they got
> clobbered.  Is that on the TODO before extents support goes "live?"

Right now on e2fpsrogs 'next' the extents flag being set on symlinks,
block/char devices (in general inodes for which
ext2fs_has_valid_blocks returns NULL) are ignored.  I need to make
sure this does the right thing with long symlinks --- and I'd argue
that given that long symlinks can only *ever* be one block long, it's
pointless to use the extents format, since it's just too complicated,
and I *think* that's what the kernel code is doing, but I need to
check to be sure.

Eventually I'll add code to mainline to clear EXTENTS_FL from inodes
that shouldn't have it, but the kernel patches need to hit mainline first.

At this point, I think the only thing the 'pu' branch is missing is
the DIR_NLINKS patch, and I'll get that pulled in fairly quickly.  At
that point I believe the 'pu' branch and e2fsprogs-interim should be
of roughly the same functionality, except of course that the patches
which compose e2fsprogs-interim have gotten a lot more testing thanks
to Clusterfs using it in with Lustre, and the fact that at the moment,
if there are blocks that are claimed by multiple inodes, the 'clone'
feature which clones the block so that both inodes get their own copy
is not supported.  The filesystem can be made consistent by deleting
one of the files, which is all UFS-style fsck's shipping with
professional Unix systems like Solaris support :-P, but obviously
that's not ideal.  :-)

So one question which Eric you'll want to consider is at what point
you want to switch the FC users from e2fsprogs-interim to the
bleeding-edge e2fsprogs mainline code, since eventually this is the
branch that will have blk64_t support.

One related question is whether we want to try to get support for full
64-bit physical block numbers into ext4.  I think there were some
rough draft patches floating about, but IIRC they didn't
simultaneously support the 48-bit extent format.  The e2fsprogs
mainline implementation of extents makes it fairly easy to support new
extents formats --- it's minimal code changes in a single file,
lib/ext2fs/extents.c, and made sure all of the infrastructure was
present to make this easy --- but I believe that trying to support
multiple formats in the kernel given the current ext4 code would be
fairly invasive.  Given the timeline, I'm assuming that our current
path is that we aren't planning on pushing full 64-bit physical block
support into the extents format for what we hope will make it into the
next round of enterprise kernels, but I thought I should throw out the
question so we make the decision one way or the other, explicitly.

Comments?

- Ted

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [E2FSPROGS, RFC] New mke2fs types parsing

2008-02-21 Thread Theodore Tso
On Thu, Feb 21, 2008 at 01:52:21AM -0700, Andreas Dilger wrote:
> On Feb 20, 2008  17:20 -0500, Theodore Ts'o wrote:
> > There are only three things which mke2fs will do, in my design:
> 
> This should all go into the mke2fs man page...

It will be documented; as I said, this was a request for comments
about the overall design, before I finish polishing it and adding man
page documentation, etc.  The patch was very much an interim patch,
including lots and lots of debugging printf's.  :-)

> 
> > [fs_types]
> > ext3 = {
> > features = has_journal
> > }
> > ext4 = {
> > features = extents,flex_bg
> > inode_size = 256
> > }
> 
> Presumably the ext4 feature should also have features = has_journal?
> If this is the default for ext4, why would it need to be given for ext3?
> 
> We should also add "dir_nlink,flexbg" while we are in there.

Yes, of course.  This was an example, not what I plan to check in.
(And it's flex_bg, not flexbg; we also need to add the uninit_groups
flags, etc.)

The other thing which I've been considering is some what to make the
feature list displayed by dumpe2fs a bit easier to understand.  One
thought is to bundle a number of features into things like std_ext2,
std_ext3, std_ext4, etc., with an option to display the full set for
someone who wants a more verbose/explicit description.  The one caveat
here is that once a bundle is defined, we don't ever want to change it
so that when someone e-mail's a dumpe2fs output as part of a bug
report, there is no question about what a feature bundle means; it
can't be e2fsprogs version dependent.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [E2FSPROGS, RFC] New mke2fs types parsing

2008-02-20 Thread Theodore Tso
On Wed, Feb 20, 2008 at 12:51:21PM -0600, Eric Sandeen wrote:
> Theodore Ts'o wrote:
> > The following patch is a work in progress, but I'm sending it out so
> > folks can take a look at it and comment on the general approach.
> > 
> > What this does is change how mke2fs -T works so it can take a comma
> > separated list, so you can do things like this:
> > 
> >   mke2fs -T ext4,small,news
> 
> Is there some hierarchy of what these options are and how they fit
> together?  i.e. small & news might go together (in bizarro world...) but
> "small,large" wouldn't make sense - nor would -T ext3,ext4.  Or, if
> somebody stores mail & news on the same fs, nad they say -T mail,news
> but the mail & news types have conflicting directives...
> 
> how will you define what an acceptable composite of subtypes will be?

There are only three things which mke2fs will do, in my design:

#1) If the first type is not one of "ext2", "ext3", "ext4", or
"ext4dev", mke2fs will determine a suitable default based on either
argv[0] if it is of the form mke2fs.*, or via defaults.fs_type in
/etc/mke2fs.conf.  If neither of these is available it will default to "ext2".

#2) If the second type is not one of "small", "floppy" or "default",
mke2fs will create a default type by checking the size of the
to-be-created filesystem.  If it is less than 3mb, it is "floppy", if
it is less than 512mb, it is "small", otherwise it is default.

#3) Once it has the list of types, i.e., "ext3,defaults,squid", or
"ext2,floppy,rescue", "ext4,defaults,largefile", etc. it uses this as
a search path when mke2fs needs to look up some parameter, such as
"base_features", or "inode_size", etc.  

So suppose we are looking up the inode_size and the fs_types list is
"ext3,defaults,squid".  The possible places where the inode_size
parameter cna be found are: defaults.inode_size,
fs_types.ext3.inode_size, fs_types.defaults.inode_size,
fs_types.squid.inode_size.  Mke2fs will look in the most specific
place (fs_types.squid.inode_size), and then successively more general,
all the way up to defaults.inode_size for the inode_size parameter in
/etc/mke2fs.conf.

(Oh, looks like I got things backwards for the string lookups; oops,
I'll fix that.)

So the basic idea is that it's as free form as possible, and it's all
based on defaults getting overridden in the config file.  So if the
mke2fs.conf file has something like this:

[defaults]
base_features = sparse_super,filetype,resize_inode,dir_index,ext_attr
blocksize = 4096
inode_size = 256
inode_ratio = 16384

[fs_types]
ext3 = {
features = has_journal
}
ext4 = {
features = extents,flex_bg
inode_size = 256
}
small = {
blocksize = 1024
inode_size = 128
inode_ratio = 4096
}
news = {
inode_ratio = 4096
}

And the user runs the command:  /sbin/mkfs.ext4 -T news /dev/hda5

(and let's assume for the same of argument that /dev/hda5 is 400
megabytes)

Then mke2fs will expand the fs_types field to be "ext4,small,news".
The user could have specified this explicitly as an argument to the -T
option, but 99% of the time, they won't.

So that means that when we look for the inode_ratio parameter, it will
come form fs_types.news.inode_ratio, and mke2fs will use the value
4096.  For the inode_size, the most specific version will be
fs_types.small.inode_size, or 128.   

In terms of handling features, things are a bit more complicated.  The
design is that we use base_features (first looked in [defaults],
[fs_types].ext4, et. al) to determine the base set of features to
initialize feature set flags.  Next, we look for fs_types.ext4.features, 
fs_types.small.features, fs_types.news.features, and if any of them
exist, they are used to modify the feature set.   Just as with tune2fs, 
the ^ character can be used to turn off a feature.  Finally, the argument 
to -O is used to further modify the feature set.

This are a little bit complicated because I wanted to preserve
backwards compatibility with the existing mke2fs.conf semantics, while
still making it possible to incrementally override portions of the
mke2fs configuration parameters in the simplest but yet most powerful
way possible.

In practice, I suspect most poeple will just say:

   mke2fs -T ext4 /dev/hda5

or
   mkfs.ext4 /dev/hda5

or

   mkfs.ext4 -T news /dev/hda5

Some people might also do:

   mke2fs -T ext4,news /dev/hda5

I doubt many people will do something as complicated as:

   mke2fs -T ext4,largefiles,squid,extra-resize-space /dev/hda5

and have a complicated mke2fs.conf file site on their system --- but
if they really want to do that, they can.

I could also imageine people using this to make it easier to create
different kinds of filesystems for creating test cases or benchmarks, so
/etc/mke2fs.conf could contain things like 

[fs_types]
benchmark_tpc_h =

Re: [PATCH] blkid detection for ZFS

2008-02-20 Thread Theodore Tso
On Thu, Feb 14, 2008 at 06:07:40PM -0700, Andreas Dilger wrote:
> Some input is welcome here also...  There is a UUID (GUID) for the whole
> "pool" (aggregation of devices that ZFS filesystems might live on), a
> UUID for the "virtual device" (vdev) (akin to MD RAID set) that a disk
> is part of and also a separate UUID for each device.  There is a LABEL
> (pool name) for the whole pool, but not one for an individual filesystem.

Are there devices for that are made available for the vdev and the
pool?  I assume not for the pool since that's a filesystem entity, but
what about the vdev?

In general, blkid is all about mapping the UUID of what lives on the
device to the device filename, for the benefit of programs like mount
and fsck.

I don't know enough about ZFS in terms of how you would mount a
filesystem which is part of a pool.  How is the filesystem specified
to the "mount" command?

> I'm thinking of making the blkid UUID be the GUID of the whole pool, as
> any device in the pool would be sufficient to locate all of the component
> devices.  This means all devices in the same pool will return the same
> UUID, but for identification that should be fine I think...  I haven't
> checked for pathologies in libblkid regarding that yet.

What I did with the freshly checked in code to identify LVM2 volumes
is that if /dev/sda1 is part of a physical volume, then blkid returns
the UUID for the PV for /dev/sda1.  The filesystem UUID's end up
getting associated via blkid entries for /dev/mapper/FOO, and we don't
bother trying to associate the UUID for the raidset with anything,
since it's not really associated with a physical block device.

So it would seem to me that it would be better to make the UUID be for
a particular ZFS physical disk be the UUID for that disk, and not for
the whole pool.  The question really, though, is what actually would
be most useful --- who is going to actually use blkid on a Solaris
system with ZFS?  It may be that the right answer is to put the pool
UUID as a separate tag; blkid supports more than just the standard
LABEL, UUID, TYPE, etc. tags.  You could easily stash the pool UUID in
a POOL_GUID tag, if it would be useful for some blkid callers.

> On a related note - on Solaris the ZFS filesystems always live in a GPT
> partition table, and I note that libblkid doesn't identify this.  Is that
> something we want to start adding to libblkid (e.g. GPT, DOS, LVM, etc)?

What do you mean by not identifying the GPT partition table?  At the
moment we haven't been identifying the whole disk partition tables,
mainly becuase there isn't much use for it especially for the DOS MBR
(no uuid or label to speak of).

I just checked in a patch from Eric to detect LVM2 PV's, because it
was useful for the Anaconda developers.  I wouldn't have any
objections accepting a patch which detected the whole-disk device and
returned the GPT label/UUID information, but I probably wouldn't code
it myself.  Still, if it someone thought it was *useful* and would use
it, and thus felt called to write a patch, I'd certainly accept it.

Regards,

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH e2fsprogs] - detect LVM2 PVs in libblkid

2008-02-19 Thread Theodore Tso
On Tue, Feb 19, 2008 at 09:02:30AM -0600, Eric Sandeen wrote:
> Theodore Tso wrote:
> 
> > The patch works for me, but it would seem the right thing would be
> > print the UUID without the hyphens, i.e.:
> > 
> > guOQGdcOE3IafCm0190XkPZTy5fCEanQ
> > 
> > instead of
> > 
> > guOQGd-cOE3-IafC-m019-0XkP-ZTy5-fCEanQ
> 
> But it already does print it without the hyphens.
> 
> Or did you mean to say "print the UUID *with* the hyphens?"

Sorry, I meant print it with the hyphens, like pvscan does and all of
the other filesystem UUID's returned by blkid.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: How were some of the lustre e2fsprogs test cases generated?

2008-02-19 Thread Theodore Tso
On Tue, Feb 19, 2008 at 04:40:32AM -0700, Andreas Dilger wrote:
> 
> No, it hasn't always been true that we cleared the _hi fields in the
> kernel code.  But, it has been a year or more since we found this bug,
> and all CFS e2fsprogs releases since then have cleared the _hi fields,
> and there has not been any other e2fsprogs that supports extents, so
> we expect that there are no filesystems left in the field with this
> issue, and even then the current code will prefer to clear the _hi
> bits instead of considering the whole extent corrupt.
> 

I checked again, and it looks like the interim code is indeed clearing
the _hi bits.  I managed to confuse myself into thinking it didn't for
index nodes, but I checked again and it seems to be doing the right
thing.

The reason why I asked is that the extents code in the 'next' branch
of e2fsprogs *does* consider the whole extent to be corrupt, since in
the long run once we start 64-bit block number extent blocks, if the
physical block number (including the high 16 bits) is greater than
s_blocks_count, simply masking off the high 16 bits of the 48 bit
extent block is probably not the right way of dealing with the
problem.

I think that's probably a safe thing to do since all of your customers
who might have had a filesystem with non-zero _hi fields have almost
certainly run e2fsck to clear the _hi bits at least once; do you
concur that is a safe assumption?  Or would you prefer that I add some
code that tries to clear just the _hi bits, perhaps controlled by a
configuration flag in e2fsck.conf?

Regards,

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH e2fsprogs] - detect LVM2 PVs in libblkid

2008-02-19 Thread Theodore Tso
On Wed, Jan 30, 2008 at 05:25:03PM -0600, Eric Sandeen wrote:
> The anaconda folks are now using blkid instead of hand-rolled
> tests for filesystem type at install time, but they had one
> more request:
> 
> Bugzilla Bug 409321: RFE: information on blkdevs "formatted" as PVs
> https://bugzilla.redhat.com/show_bug.cgi?id=409321
> 
> The attached patch does the right thing for me on my sample
> set of exactly 1 PV...
> 
> Any issues with reporting back something which is not actually
> a filesystem ("lvm2pv") ?
> 
> [EMAIL PROTECTED] misc/blkid -c /dev/null /dev/sda2
> /dev/sda2: UUID="guOQGdcOE3IafCm0190XkPZTy5fCEanQ" TYPE="lvm2pv" 
> [EMAIL PROTECTED] pvs -o pv_name,pv_uuid
>   PV PV UUID   
>   /dev/sda2  guOQGd-cOE3-IafC-m019-0XkP-ZTy5-fCEanQ

The patch works for me, but it would seem the right thing would be
print the UUID without the hyphens, i.e.:

guOQGdcOE3IafCm0190XkPZTy5fCEanQ

instead of

guOQGd-cOE3-IafC-m019-0XkP-ZTy5-fCEanQ

After all, pvscan prints it with the hyphens, and blkid returns uuid's
with the hyphens for other filesystems as well.  And removing the
hyphens aren't hard; "tr -d -" will do it.

It's not that hard to convert from one to the other; this will do it:

for (i=0, b=1, p=uuid, q=label->pv_uuid; i <= 32; i++, b = b << 1) {
if (b & 0x440)
*p++ = '-';
*p++ = *q++;
}

This shouldn't screw up the anaconda folks, since they just need to
know the filesystem type, right?  That's all vol_id seems to return in
any case.

- Ted

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in e2fsprogs.git (topics)

2008-02-18 Thread Theodore Tso
The big news here is that extents branch is sufficiently finished that
I've promoted it to the next branch.

- Ted

Here are the topics that have been cooking.  Commits prefixed
with '-' are only in 'pu' while commits prefixed with '+' are
in 'next'.  The topics list the commits in reverse chronological
order.

* tt/64bit-bitmaps (Mon Feb 18 23:54:53 2008 -0500) 1 commit
 - Initial design for 64-bit bitmaps
* next (Mon Feb 18 23:01:43 2008 -0500) 3 commits 
 + e2fsprogs-tests-f_ea_checks.patch
 + e2fsprogs-tests-f_unsorted_EAs.patch
 + Improve support for in-inode EA's
* tt/extents (Mon Feb 18 20:11:07 2008 -0500) 10 commits
 + Activate basic f_extents test case
 + e2fsck: Add support for extents
 + Add read/only support for extents to ext2fs_bmap()
 + Use BLOCK_FLAG_READ_ONLY flag in debugfs, e2image, and tune2fs
 + Add read-only extents support to ext2fs_block_iterate2()
 + Add support for extents to libext2fs
 + e2fsck: factor out code to clear an inode into
   e2fsck_clear_inode()
 + Don't byte swap extents information in the inode
 + Allow debugfs to be extended for use by test programs
 + debugfs: Fix error handling in strtoblk() and
   common_block_args_process()
* js/flex-bg (Wed Feb 13 20:47:50 2008 -0600) 1 commit
 - e2fsprogs: New bitmap and inode table allocation for FLEX_BG v2
* js/uninit (Sun Oct 21 21:04:24 2007 -0500) 13 commits
 - Add m_uninit test case.
 - Add new mm_lazy test case.
 - Fix test cases.
 - Update uninit block group documetation for some of the utilities.
 - Make e2fsck uninit block group aware.
 - Make debugfs uninit block group aware.
 - Make resize2fs uninit block group aware.
 - Make dumpe2fs uninit block group aware.
 - Make tune2fs uninit block group aware.
 - Add support for creating filesystems using uninit block group.
 - Rename feature name from gdt_checksum to uninit_groups.
 - Add uninit block group support on libe2fs.
 - Add initial checksum support.
* ak/undo-mgr (Mon Aug 13 15:56:26 2007 +0530) 6 commits
 - e2fsprogs: Add test case for undoe2fs
 - e2fsprogs: Fix the resize inode test case
 - e2fsprogs: Support for large inode migration.
 - e2fsprogs: Make mke2fs use undo I/O manager.
 - e2fsprogs: Add undoe2fs
 - e2fsprogs: Add undo I/O manager
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's in e2fsprogs.git (stable)

2008-02-18 Thread Theodore Tso
* The 'maint' branch has these fixes since the last announcement.

Christophe GRENIER (1):
  Add portability checks to support DJGPP

Eric Sandeen (1):
  print "mostly-printable" xattr strings in debugfs

Theodore Ts'o (9):
  Update Czech, Dutch, Polish, Sweedish, and Vietnamese translations
  libuuid: use fcntl locking instead of lockf
  uuidd: Fix pid file so it has the correct pid number
  configure.in: Don't use the dc command unless it is required
  Make fsck ignore mounted filesystems if given the -M option
  Update to latest samba tdb library before LGPLv3 change
  debugfs: Add new superblock fields to the set_super_value command
  Add support for setting RAID stride and strip-width via mke2fs and
  tune2fs
  libe2p: Make list_super2() print the RAID stride and stripe-width


* The 'master' branch has these since the last announcement
  in addition to the above.

Theodore Ts'o (1):
  libext2: Add BLOCK_FLAG_READ_ONLY flag to ext2fs_block_iterate2()


URL's for the e2fsprogs repository:
git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
http://www.kernel.org/pub/scm/fs/ext2/e2fsprogs.git

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH e2fsprogs] print "mostly-printable" xattr strings in debugfs

2008-02-18 Thread Theodore Tso
On Mon, Feb 18, 2008 at 10:35:37AM -0600, Eric Sandeen wrote:
> Eric Sandeen wrote:
> > Taking a cue from getfattr... if a string is "mostly"
> > printable characters, go ahead & print as a string,
> > and escape what's left over.
> 
> Ted, ping on this?  Since we use selinux a lot, it'd be nice to have
> something readable in debugfs output.

Thanks for reminding me.  It's been applied into the maint branch.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: How were some of the lustre e2fsprogs test cases generated?

2008-02-18 Thread Theodore Tso
On Mon, Feb 18, 2008 at 07:06:58PM -0500, Theodore Ts'o wrote:
> 
> The clusterfs e2fsprogs code doesn't notice this, because it apparently
> ignores ee_start_hi field entirely.  

One minor correction --- the clusterfs e2fsprogs extents code checks
to see if the ee_leaf_hi field is non-zero, and complains if so.
However, it ignores the ee_start_hi field for interior (non-leaf)
nodes in the extent tree, and a number of tests do have non-zero
ee_start_hi fields which cause my version of e2fsprogs to (rightly)
complain.

If you fix this, a whole bunch of tests will fail as a result, and not
exercise the code paths that the tests were apparently trying to
exercise.  Which is what is causing me a bit of worry and wonder about
how those test cases were originally generated

Regards,

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][7/28] e2fsprogs-extents.patch

2008-02-18 Thread Theodore Tso
On Mon, Feb 18, 2008 at 02:48:18PM -0600, Eric Sandeen wrote:
> > I think we need to get kernel patches into mainline ASAP not to set
> > the EXTENTS_FL 
> 
> You mean on devices/fifos/sockets ?  Ok.

Yes, sorry for not being explicit.

> 
> But today, with 2.6.25-rc1 and e2fsprogs-interim, long (non-fast)
> symlinks get clobbered by e2fsck, because:
> 
> Pass 1: Checking inodes, blocks, and sizes
> Inode 12 has EXTENT_FL set, but is not in extents format
> Fix? yes

Yeah, my current development branch of e2fsprogs does the right thing,
but e2fsprogs-interim doesn't.  We need to add a test to make sure
ext2fs_inode_has_valid_blocks(inode) before marking the inode bad and
asking the user if the inode should be cleared.

> and *poof* it's gone.  That one concerns me more...  This *should* be in
> extents format, right, even though it's limited to one block...

Well, for symlinks, they are only one block, so there is no reason for
it to be using the extent format.  So storing it as a single block
number makes a lot more sense.  It should just not be setting the
EXTENTS_FL flag. 

> well, if any filetypes are not supposed to have the extents flag set,
> and they're zero-length, I'd say go ahead & clear it, and even complain
> if you like - it's the design intent after all -  I wouldn't worry about
> the noise at this stage.  FWIW, I haven't seen a core dump.  :)

The current pu branch core dumps.  My development branch has at least
that problem fixed.  :-)

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][7/28] e2fsprogs-extents.patch

2008-02-18 Thread Theodore Tso
On Mon, Feb 18, 2008 at 11:56:53AM -0600, Eric Sandeen wrote:
> So this trips up on things like sockets, fifos, and block & char nodes.
> 
> Also this is unhappy:
> 
> > @@ -137,7 +141,7 @@ int e2fsck_pass1_check_device_inode(ext2
> >  * If the index flag is set, then this is a bogus
> >  * device/fifo/socket
> >  */
> > -   if (inode->i_flags & EXT2_INDEX_FL)
> > +   if (inode->i_flags & (EXT2_INDEX_FL | EXT4_EXTENTS_FL))
> > return 0;
> 
> Do we really care if these have the extents flag set?  IOW should we
> make sure the kernel doesn't set the flag, or should we make e2fsck not
> care...



I think we need to get kernel patches into mainline ASAP not to set
the EXTENTS_FL --- be conservative in what you send --- and at least
for now, e2fsck needs to accept (and not complain or core dump) if
EXTENTS_FL is set for files where ext2fs_inode_has_valid_blocks()
returns false --- be liberal in what you accept.  

Eventually, after the kernel patches hit mainline, we could change
e2fsck to automatically fix all of these in preen mode, just for
cleanliness sake.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [2.6 patch] fs/jbd/journal.c: cleanups

2008-02-18 Thread Theodore Tso
On Mon, Feb 18, 2008 at 02:31:40PM +0100, Ingo Molnar wrote:
> i guess this explains what static code metrics already suggest:

Am I right in assuming that code-quality is just a program which runs
checkpatch.pl and measures the number of warnings and calls them
errors?

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [2.6 patch] fs/jbd/journal.c: cleanups

2008-02-18 Thread Theodore Tso
> So please deal with it like most other subsystem maintainers do and stop 
> complaining about "code churn" - nobody but you changes the ext3 
> codebase, it's one of the codebases least affected by general kernel 
> flux, it's an ultimate "leaf" subsystem.

Right, sorry.  I misread the filename; I thought this was against
fs/jbd2, instead of fs/jbd.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [2.6 patch] fs/jbd/journal.c: cleanups

2008-02-18 Thread Theodore Tso
On Mon, Feb 18, 2008 at 03:12:09PM +0200, Adrian Bunk wrote:
> If me resending this old patch collides with something finally getting a 
> user this part of my patch shouldn't be applied now (but you might get 
> it again in 6 months if it's still unused...).
> 
> But generally such conflicts would become visible if "known development 
> trees that are intended for mainline" were in -mm.

It *has* been in -mm, except for periods when akpm has dropped it due
to conflicts due to the "must have an in-tree user" doctrinaire
attitude due to a conflict with the r/o bind patch.

Did you actually try to do a compile test, or only made sure the patch
would apply?  The patch won't collide at application time, but it
would when you compile it

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [2.6 patch] fs/jbd/journal.c: cleanups

2008-02-18 Thread Theodore Tso
On Mon, Feb 18, 2008 at 08:12:29AM +0100, Ingo Molnar wrote:
> > Nack.  I don't object to un-exporting journal_update_superblock(), 
> > because that is pretty internal, but the other functions are intended 
> > specifically for use by code outside of JBD.  For example, the journal 
> > checksum patch for ext3/4 uses journal_set_features() to turn on 
> > features in the JBD superblock.
> > 
> > Similarly, for 64-bit support in ext4 uses journal_set_features() to 
> > set a 64-bit feature flag in the journal superblock.
> 
> that's an invalid excuse for the benefit of out-of-tree forks: reality 
> is that you can export those functions in the "journal checksum patch" 
> just fine. So you cannot 'nack' a sensible patch on that ground and no 
> maintainer does it on that ground. Once you get your stuff upstream, you 
> can re-add the export.

I'm going to NACK it as well.  This kind of code churn where we make
symbols static only to make them non-static again in an existing ext4
tree is exactly the sort of needless code churn that makes patches
start to conflict and where we need different patches depending on
whether it is intended for -mm or linux-next or mainline.

I think we really have gotten WAY to doctrinaire on the if there are
no in-tree users, it MUST be static.  This is exactly the sort of
mindless rules that cause the patch conflicts that have been causing
us so much pain and grief.  In this case, it is an existing symbol
which is already non-static, and for which we have code in a
development tree that will be using it.  In the r/o bind case, it is
the insistence that you can't push an existing patch to expose a new
interface that must be used later in the r/o bind patchset and which
sweeps across all trees changing stuff that causes pain and grief.

In both cases, if we expand "in-tree" development users to include
known development trees that are intended for mainline, it makes all
of our lives MUCH easier.  

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/7] fs/ext{2,3,4}: Use BUG_ON

2008-02-17 Thread Theodore Tso
On Sun, Feb 17, 2008 at 06:55:06PM +0100, Julia Lawall wrote:
> From: Julia Lawall <[EMAIL PROTECTED]>
> 
> if (...) BUG(); should be replaced with BUG_ON(...) when the test has no
> side-effects to allow a definition of BUG_ON that drops the code completely.

Hi, in the future, please separate ext4 changes from ext2/3.  Thanks!!

   - Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] ext2: improve ext2_readdir() return value

2008-02-17 Thread Theodore Tso
On Sun, Feb 17, 2008 at 02:56:13PM +0900, Akinobu Mita wrote:
> This patch improves ext2_readdir() return value for ext2_get_page() failure
> by using the actual result of ext2_get_page().

> - return -EIO;
> + return PTR_ERR(page);

The patch is harmless, and maybe even a good thing from an abstract
point of view, but you realize that EIO is the only thing that
ext2_get_page() can return, right?

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] UPDATED: building e2fsprogs-interim

2008-02-15 Thread Theodore Tso
On Fri, Feb 15, 2008 at 01:01:28PM -0600, Eric Sandeen wrote:
> An update; running through the whole fedora build turned up another
> one (csum.o).  Out of curiosity do things like ext2_fs.h (which are not
> generated) really need to be listed as dependencies?

Well, in an RPM build you probably don't need them, no.  

But they are useful in an developer tree because if ext2_fs.h gets
modified, it's good to recompile all source files that include it ---
just to make sure any errors in the header file get caught.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext4: Set directory link count to zero if we fail to create the directory.

2008-02-15 Thread Theodore Tso
I've replaced this patch with the following (see attached).  The
changes to the patch from Aneesh's original one are:

1) Replace the patch description with one that describes what is going
   on at a higher level.  (Always write with a deep sympathy for the
   reader, who will be reading the description as part of "git log".
   So mention that the problem was in ext4_mkdir, and the
   circumstances in which it would occur.)

2) The error handling code was duplicated later in the function
   (although in a slightly more efficient fashion; since we *know* the
   newly created inode has an inode->i_nlink of 1, it's easier just to
   set i_nlink to zero)

3) Replace the inode->i_nlink = 0 with the higher level equivalent
   inline function, clear_nlink(inode).

- Ted

ext4: Don't leave behind a half-created inode if ext4_mkdir() fails

From: "Aneesh Kumar K.V" <[EMAIL PROTECTED]>

If ext4_mkdir() fails to allocate the initial block for the directory,
don't leave behind a half-created directory inode with the link count
left at one.  This was caused by an inappropriate call to ext4_dec_count().

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]>

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index a9347fb..7970e81 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1804,12 +1804,8 @@ retry:
inode->i_fop = &ext4_dir_operations;
inode->i_size = EXT4_I(inode)->i_disksize = inode->i_sb->s_blocksize;
dir_block = ext4_bread (handle, inode, 0, 1, &err);
-   if (!dir_block) {
-   ext4_dec_count(handle, inode); /* is this nlink == 0? */
-   ext4_mark_inode_dirty(handle, inode);
-   iput (inode);
-   goto out_stop;
-   }
+   if (!dir_block)
+   goto out_clear_inode;
BUFFER_TRACE(dir_block, "get_write_access");
ext4_journal_get_write_access(handle, dir_block);
de = (struct ext4_dir_entry_2 *) dir_block->b_data;
@@ -1832,7 +1828,8 @@ retry:
ext4_mark_inode_dirty(handle, inode);
err = ext4_add_entry (handle, dentry, inode);
if (err) {
-   inode->i_nlink = 0;
+   out_clear_inode:
+   clear_nlink(inode);
ext4_mark_inode_dirty(handle, inode);
iput (inode);
goto out_stop;
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext4: Fix circular locking dependency with fallocate and touch.

2008-02-15 Thread Theodore Tso
On Tue, Feb 12, 2008 at 02:45:35PM +0530, Aneesh Kumar K.V wrote:
> In order to prevent a circular locking dependency when an ext4_create
> operation is racing with an ext4_fallocate, we acquire and release
> i_data_sem for each multiblock request and use i_mutex to
> prevent writes and truncates during the complete fallocate operation.
> 
> 
> ===
> [ INFO: possible circular locking dependency detected ]
> 2.6.25-rc1 #4
> ---
> touch/2347 is trying to acquire lock:
>  (&ei->i_data_sem){}, at: [] ext4_get_blocks_wrap+0x21/0xca
> 
> but task is already holding lock:
>  (jbd2_handle){--..}, at: [] jbd2_journal_start+0xce/0xf0
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> ...

I replaced the patch description with a more succint and to-the-point
changelog message:

   ext4_fallocate() was trying to acquire i_data_sem outside of
   jbd2_start_transaction/jbd2_journal_stop, which violates ext4's locking
   hierarchy.  So we take i_mutex to prevent writes and truncates during
   the complete fallocate operation, and use ext4_get_block_wrap() which
   acquires and releases i_data_sem for each block allocation.

BTW, a good thing for us to do would be to document our current
locking hierarchy: what each lock protects, and in what order locks
should be taken.  

This makes it easier for people to notice problems by reviewing code
(as opposed to trusting that lockdep will turn up problems).  It also
is a good idea so that as we start to do performance testing and start
noticing inappropriate lock hold times, we can look at our locking
hierarchy design and see if it makes sense, and how to change it to
improve performance.

Regards,

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH] ext4: Request for journal write access early.

2008-02-15 Thread Theodore Tso
On Thu, Feb 14, 2008 at 04:00:52PM +0530, Aneesh Kumar K.V wrote:
> In ext4_ext_convert_to_initialized before we need to request for journal
> write access before we even modify the extent length.

This isn't a grammatically correct sentence, and it doesn't explain
what is going on.  I rewrote the patch description as follows:

   ext4: Get journal write access before modifying the extent tree

   When the user was writing into an unitialized extent,
   ext4_ext_convert_to_initialize() was not requesting journal write access
   before it started to modify the extent tree.   Fix this oversight.

  - Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in e2fsprogs.git (topics)

2008-02-10 Thread Theodore Tso
On Sun, Feb 10, 2008 at 11:08:11PM -0600, Eric Sandeen wrote:
> Theodore Tso wrote:
> > I'ts been a while since I've pushed one of these out.  This updates pu
> > against the latest "next" branch, so we can make sure all of the
> > changes that had been folded into "maint" is included on the pu
> > branch.  It also includes more work on the tt/extents patches.
> > Tune2fs and e2image are now functional with extents, and e2fsck has
> > more extents checking added.  Still a bit more work to be done there,
> > and I also have to add code to deal with symlinks that have the
> > EXTENTS_FL flag set.
> > 
> > I've dropped the js/flex-bg branch since it had conflicts with the
> > on-disk bg flags, and it had conflicts with the exsting patches that
> > caused me concern.  Jose, once you regenerate the patch and deal with
> > comments, I'll add it back in.
> > 
> > My TODO list:
> > 
> >finish e2fsck extents support
> >more flexible mke2fs.conf handling
> >EXT4 -- allow mke2fs 8TB without -f
> 
> ext3 should be fine as well, right?  And I think Josef sent a patch...

I think so, yes.  

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in e2fsprogs.git (topics)

2008-02-10 Thread Theodore Tso
I'ts been a while since I've pushed one of these out.  This updates pu
against the latest "next" branch, so we can make sure all of the
changes that had been folded into "maint" is included on the pu
branch.  It also includes more work on the tt/extents patches.
Tune2fs and e2image are now functional with extents, and e2fsck has
more extents checking added.  Still a bit more work to be done there,
and I also have to add code to deal with symlinks that have the
EXTENTS_FL flag set.

I've dropped the js/flex-bg branch since it had conflicts with the
on-disk bg flags, and it had conflicts with the exsting patches that
caused me concern.  Jose, once you regenerate the patch and deal with
comments, I'll add it back in.

My TODO list:

   finish e2fsck extents support
   more flexible mke2fs.conf handling
   EXT4 -- allow mke2fs 8TB without -f
   reserve a few extra resize blocks for small filesystems
   disable metabg/resize combination
   adjust big, medium, small, thresholds
 256 inode size
   flexbg patch from Jose

- Ted

Here are the topics that have been cooking.  Commits prefixed
with '-' are only in 'pu' while commits prefixed with '+' are
in 'next'.  The topics list the commits in reverse chronological
order.

* js/uninit (Sun Oct 21 21:04:24 2007 -0500) 14 commits
 - Add m_uninit test case.
 - Add new mm_lazy test case.
 - Fix test cases.
 - Update uninit block group documetation for some of the utilities.
 - Make e2fsck uninit block group aware.
 - Make debugfs uninit block group aware.
 - Make resize2fs uninit block group aware.
 - Make dumpe2fs uninit block group aware.
 - Make tune2fs uninit block group aware.
 - Add support for creating filesystems using uninit block group.
 - Rename feature name from gdt_checksum to uninit_groups.
 - Add uninit block group support on libe2fs.
 - Add initial checksum support.
* tt/64bit-bitmaps (Sun Oct 14 22:51:51 2007 -0400) 2 commits
 - Initial design for 64-bit bitmaps
* tt/extents (Mon Aug 20 21:31:11 2007 -0400) 8 commits
 - e2fsck: Add support for extents
 - Use BLOCK_FLAG_READ_ONLY flag in debugfs, e2image, and tune2fs
 - Add read-only extents support to ext2fs_block_iterate2()
 - Add support for extents to libext2fs
 - e2fsck: factor out code to clear an inode into
   e2fsck_clear_inode()
 - Don't byte swap extents information in the inode
 - Allow debugfs to be extended for use by test programs
* ak/undo-mgr (Mon Aug 13 15:56:26 2007 +0530) 7 commits
 - e2fsprogs: Add test case for undoe2fs
 - e2fsprogs: Fix the resize inode test case
 - e2fsprogs: Support for large inode migration.
 - e2fsprogs: Make mke2fs use undo I/O manager.
 - e2fsprogs: Add undoe2fs
 - e2fsprogs: Add undo I/O manager
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


What's in e2fsprogs.git (stable)

2008-02-10 Thread Theodore Tso
Most of the maint branch went into the e2fsprogs 1.40.6 release that
happened this weekend.

- Ted

* The 'maint' branch has these fixes since the last announcement.

Eric Sandeen (2):
  Fix the pathname of /var/lib/uuidd/uuidd.pid in uuidd man page.
  Fix up e2fsprogs.spec file to include a new uuidd package

Pixel (1):
  blkid: have ntfs's UUID compatible with vol_id

Theodore Ts'o (41):
  Add the Meta directory to the .gitignore file
  debian: use '$(MAKE)' instead of 'make' in debian/rules
  Fix profile, checker, and shared-library building on non-Linux
  platforms
  Fix Makefile race so that "make -j3 distclean" works correctly
  Use pkg-config to determine where to find the devmapper library
  debugfs: Change lsdel to use ext2fs_block_iterate2 it can find
  large files
  Add sample python bindings for the uuid library
  If --sbindir, et. al are specified to configure set $root_sbindir,
  et al.
  debian: Add "set -e" to postinst scripts
  debian: Add passwd dependency to libuuid1 and uuid-runtime
  Define helper functions ext2fs_set_i_{u,g}id_high() for MacOS
  compatibility
  resize2fs: Add sanity check for off_t overflow before truncating
  Explicitly check for ftruncate64() in configure.in
  Update Swedish translation file from the Translation Project
  Add support for the test_fs flag
  Teach the blkid library about ext4/ext4dev
  Ignore "safe" flag differences when e2fsck compares superblocks
  Fix minor typo in resize2fs man page
  blkid: Make sure the blocksize in reiserfs is sane
  blkid: Add support for HFS+ detection
  libuuid: Make sure execl() variadic function is properly terminated
  Use lseek() instead of llseek() of sizeof(long) == sizeof(long
  long)
  e2image: Fix potential overflow if the device name is too long
  debian: Don't use dietlibc on platforms that don't support it
  Don't build e2fsck statically by default anymore
  Create new filesystems with 256-byte inodes by default
  Create filesystems with the ext_attr feature by default
  Update release notes, version files for 1.40.5 release
  debian: Fix packaging problem caused by dpkg 1.14.16
  Fix bug in e2fsck which caused it to core dump if
  --enable-jbd-debug is used
  blkid: Automatically chose between ext4 and ext4dev as appropriate
  Allow tune2fs to set and clear the test_fs flag on ext4 filesystems
  blkid: Flush cached filesystem information on any error other than
  EPERM
  Don't try to create $DESTDIR/etc/init.d as part of make install
  blkid: Add support for returning labels for UDF filesystems
  Update e2fsprogs translation template and Vietnamese and Czech
  translations
  Document the BLKID_FILE environment variable in the libblkid man
  page
  libcom_err: Use thread local storage to fix reentrancy problems
  Update release notes, version files for 1.40.6 release
  Fix incorrect distribution name in Debian's Changelog
  Set the C locale in the tests/test_script driver


* The 'master' branch has these since the last announcement
  in addition to the above.

Andreas Dilger (1):
  mke2fs: Make lost+found always have at least 2 blocks

Theodore Ts'o (1):
  debugfs: Add -p option to ls subcommand.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] e2fsprogs: New bitmap and inode table allocation for FLEX_BG

2008-02-10 Thread Theodore Tso
On Fri, Feb 08, 2008 at 11:37:40AM -0600, Jose R. Santos wrote:
> > >  #define EXT2_BG_INODE_UNINIT 0x0001 /* Inode table/bitmap not 
> > > initialized */
> > >  #define EXT2_BG_BLOCK_UNINIT 0x0002 /* Block bitmap not initialized 
> > > */
> > > +#define EXT2_BG_FLEX_METADATA0x0004 /* FLEX_BG block group contains 
> > > meta-data */
> > 
> > Hrm, I thought I had reserved that value in the uninit_groups patch?
> > +#define EXT3_BG_INODE_ZEROED   0x0004  /* On-disk itable initialized to 
> > zero */
> 
> I may have been, I just based the patch on the next branch as Ted had
> ask for new e2fsprog patches.  The uninit group patch was not part of
> the next branch when I pulled.

Yes, but whenever you start reserving code points that impact the
on-disk format, you need to be careful and coordinate.  Exactly is the
purpose of this flag, and why is it here?

And I don't see any patch in the kernel patch queue that uses this
flag.  Is this intended for internal use inside e2fsprogs?  If so,
this might not be the best place for it.

 - Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][0/28] Lustre e2fsprogs patch series

2008-02-10 Thread Theodore Tso
On Sat, Feb 02, 2008 at 12:59:43AM -0700, Andreas Dilger wrote:
> The following series of emails will contain the large part of the
> e2fsprogs patch series that is used for Lustre.  It will not contain
> the regression tests for EXTENTS nor the DIR_NLINK features, as those
> are very large and were previously submitted.
> 
> A full tarball that includes the patches, series, and regression tests
> will be uploaded to ftp://ftp.lustre.org/pub/lustre/other/e2fsprogs/

Hey Andreas,

I've applied these patches to the tip of "maint", and exported it as
"e2fsprogs-interim" on the e2fsprogs git repository.  There quite a
few patch conflicts, mostly due to some changes that had happened on
the tip of maint, but also apparently because your patchset was
missing the flex bg changes.  I haven't applied them yet, but I'll
probably tack them at the end.

If you could sanity check to make sure they are sane, I would
appreciate it.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ext4_avoid_panic_in_case_of_corrupted_bitmap.patch

2008-02-10 Thread Theodore Tso
On Sat, Feb 09, 2008 at 07:35:03AM -0800, Aneesh Kumar wrote:
> I guess we pushed an old version of this patch. I had set in -ver2.
> with better comments.
> 
> http://article.gmane.org/gmane.comp.file-systems.ext4/4956
> 

Sorry, I saw your note, but I missed it when I was doing prepping the
branch to for the git pull request.

We can prepare a comments-only patch which we can merge either before
-rc2 if Linus is willing, or worst case keep it in the ext4 patch
queue until the next merge window.

Again, sorry for missing the updated patch!

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext4: move headers out of include/linux

2008-02-10 Thread Theodore Tso
On Sun, Feb 10, 2008 at 07:52:22AM +0100, Christoph Hellwig wrote:
> On Sat, Feb 09, 2008 at 10:39:33AM +0100, Christoph Hellwig wrote:
> > Move ext4 headers out of include/linux.  This is just the trivial move,
> > there's some more thing that could be done later.
> > 
> > Ted, is anything of these shared with e2fsprogs or can we rip out all
> > that #ifdef __KERNEL__ junk?

No, none of this is shared with e2fsprogs; e2fsprogs stopped using the
kernel header files about seven years ago. (May 2001, e2fsprogs 1.20).

> > Note that I plan to submit similar patches for ext2 and ext3 aswell,
> > so the diverging from them argument doesn't count.

There might be other programs like grub that may depend upon ext2_fs.h
or ext3_fs.h Nope, not grub.  So a few
things might break, but they are all programs that should have been
using the libraries shipped with e2fsprogs, and they wouldn't be 
critical programs.  So no problems that I know of.

> Looks like the patch is to big for vger.  Here's a link instead:
> 
> http://verein.lst.de/~hch/ext4-move-headers

Note Linus just accepted a pull from me (although it just missed the
-git21 snapshot window ---  it would be nice if that happend at
3am Pacific instead of midnight Pacific since very occasionally it's a
little after midnight before Linus pushes his last set of changes to
master.kernel.org) so this patch won't apply cleanly to the
Linus's tip.  I'll take the patch DTRT so it can be placed in the ext4
tree.

Also, on the git list, Linus mentioned the -rc1 merge window would be
closing soon, though, so I don't know if this will make 2.6.25.  If it
doesn't, would you mind terribly if we put this on hold and *not* have
this in the -mm tree until right before the next merge window opens.
It's mostly a mechnical change, doesn't need much testing --- and it
complicates patch management which I know has been making Andrew a bit
grumpy as of late.  (Some people submit patches against Linus's tip,
and they are sweeping changes that touch files in multiple maintainer
trees, like the iget() removal patch or the r/o bind patch, and life
gets hard.)

I think the general idea is good; it's just a matter of timing.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH, E2FSPROGS] blkid: Automatically chose between ext4 and ext4dev as appropriate

2008-02-08 Thread Theodore Tso
On Fri, Feb 08, 2008 at 04:50:37PM -0600, Eric Sandeen wrote:
> Theodore Ts'o wrote:
> > I plan to release this in an upcoming 1.40.6 release, along with code
> > that allows "tune2fs -E test_fs" to work on ext4 filesystems.  The idea
> > is that if I can get this into community distro's, it will make it
> > easier for them to experiment with ext4/ext4dev, with the appropriate
> > forward compatibility when sometime (hopefully 2.6.26 or 2.6.27?) we
> > rename ext4dev to ext4.
> > 
> > Any comments before I push this change out to the maint branch?
> 
> Seems logically correct to me... although it also seems like this has
> gotten fairly complex by now, what with blkid rummaging around in
> modules.dep, /proc files, etc...

True, but the good thing is that we only need to do this in one
location, and it does the right thing.  It also means that in the
future, even after ext4 is in production, if we want to do some
testing for some new optimizations or other ehancements in ext4, it's
a lot easier for us have both ext4 and ext4dev modules built at the
same time, where ext4dev would have our test code, and then once it's
stable and we're sure it's OK, we can commit it to mainline.  This
makes life much more convenient for an ext4 developer who wants to
both use it in production as well as having an alternate "crash and
burn" ext4dev code that he or she is trying out.

So one of the things that this does mean, is after we do the
ext4dev->ext4 transition, we have some way for users to give us
permission to disable the test_fs bit --- so that in the future, when
we have need of that bit, we can use it again and not have to worry
about old filesystems that are now in production but still have the
test_fs bit set.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] fsck: add support for the fsprobe library wrapper

2008-02-07 Thread Theodore Tso
On Wed, Feb 06, 2008 at 12:39:02PM +0100, Matthias Koenig wrote:
> Allows fsck to work with blkid or volume_id as defined
> by the configure invocation.
> 
> Signed-off-by: Matthias Koenig <[EMAIL PROTECTED]>

The big thing here is there needs to be a BIG FAT WARNING that if you
want ext4 to work, it's best to make sure fsck is built against the
very latest blkid library.

I was thinking about using opensuse for more of testing, based on
Markus's encouragement, but given that it's using vol_id, and I need
to make some very specific changes so that mount uses the right
filesystem for ext4 filesystems (especially since the kernel will be
changing the filesystem name from ext4dev to ext4 at some point in the
next few months), and opensuse uses vol_id for mount, that became a
show-stopper for me.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] Move fsck from e2fsprogs to util-linux-ng

2008-02-07 Thread Theodore Tso
On Thu, Feb 07, 2008 at 01:37:43AM +0100, Kay Sievers wrote:
> 
> Care to explain what ext4 development has to do with the generic
> fsck program?  I don''t see any convincing reason not to move that
> now.

Fsck and mount (and especially mount) needs to be linked against the
very latest blkid library in order to make sure things work, and when
we spread out packages across the various different packages, it means
updating things to assure correct functionality is that much harder.

It also means getting the right ext4 detection routines into vol_id,
and in the past I've had real problems getting patches that fix real
bugs in vol_id past you, and I'm just too busy right now to deal with
that right now.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] Move fsck from e2fsprogs to util-linux-ng

2008-02-06 Thread Theodore Tso
On Wed, Feb 06, 2008 at 01:56:51PM +0100, Karel Zak wrote:
> It would be nice to add a note
> 
> "fsck is deprecated in favor of fsck from util-linux-ng"
> 
> to RELEASE-NOTES in e2fsprogs.

I don't want to deprecate fsck quite yet from e2fsprogs, because there
are some ext4 issues with fsck (or more accurately, fsck's use of
vol_id instead of blkid), where it is more convenient if at least for
distros that are being aggressive with ext4 testing, if we don't try
to move fsck just now.

I'm working with Eric Sandeen from Red Hat to make sure that we can
get usable ext4 support for FC9 in rawhide, and making sure that
vol_id is doing the right thing with ext4 support is one more moving
part that I'd rather not have to coordinate with.  So it's going to be
much simpler for now if we keep fsck in e2fsprogs, and what I'm
telling people who want to play with ext4 is that life will be much
simpler if they use the latest e2fsprogs and build mount so that it
uses blkid, and NOT vol_id.

Kay and I have had philosophical differences about how hard to try to
make in-line detection of filesystems work, and when I've tried to
submit patches that fine-tune filesystem id detection, by rejiggering
the order in which various filesystems types are tried (based on how
thoroughly their mkfs commands scrub the disk to remove false
positives for previous filesystems on the disk), they've been rejected
by Kay.

So these days, while I'll take any patches to improve blkid's
filesystem detection, I've largely stopped trying to send patches in
the other direction.  And there are some rather delicate tests that
have been encoded into blkid to coordinate with ext4 and ext4dev
(since at some point when the code is production ready we will be
changing the filesystem name from ext4dev to ext4), it's just much
simpler if I tell people that in order to support ext4, the only
approach I've validated uses fsck from e2fsprogs, and mount compiled
to use libblkid.

I'd love to try to work on merging the functionality between vol_id
and blkid, and at some point that's great, but I just don't have the
time right now to try to coordinate yet another moving part.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Replace iget with iget_locked in defrag-free-space-fragementation

2008-02-05 Thread Theodore Tso
On Tue, Feb 05, 2008 at 04:55:52PM -0800, Mingming Cao wrote:
> On Sat, 2008-02-02 at 11:40 -0500, Theodore Ts'o wrote:
> > We also have an issue where the read-only bind patches from Dave Hansen
> > are making changes which conflict with the ext4 patch tree, which is
> > making akpm very grumpy.  I'll try to take a look at this later in the
> > weekend...
> > 
> > - Ted
> 
> BTW, Ted, could you clarify about this one? I am able to build the patch
> queue on akpm's 2.6.24-mm1 tree with Dave Hansen's read-only bind
> changes. Maybe I missed something.

That's because I've already removed the BKL patches that conflict.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Handle unusual results from find_tempdir.

2008-02-05 Thread Theodore Tso
On Tue, Feb 05, 2008 at 01:13:04PM +0100, Jim Meyering wrote:
> An alternative: make find_tempdir set tempdir to default_tempdir
> upon malloc failure.
> 
> * arch/um/os-Linux/mem.c (make_tempfile): Handle NULL tempdir.
> Don't let a long tempdir (e.g., via TMPDIR) provoke heap corruption.

This wasn't meant for the linux-ext4 list, was it?  I think maybe you
typo'ed the mailing list this patch was supposed to be sent to?

Regards,

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: - disable-ext4.patch removed from -mm tree

2008-02-04 Thread Theodore Tso
On Mon, Feb 04, 2008 at 12:11:31PM -0800, Andrew Morton wrote:
> 
> That patch series is kind of logjammed anyway because it breaks isofs. 
> Last time I discussed this with David he seemed to find this amusing rather
> than an urgent problem.  I'd drop the whole lot if there weren't lots of
> other patches dependent upon them.  Mayeb I can do a selective droppage,
> but I hate going out-of-order, and merging untested patch combinations.
> 

All of the patches which move from using iget() to iget_unlocked()
should in theory (if they are well written) be standalone and not have
any prerequisites, and a good thing to apply in and of themselves.  

So maybe the right thing to do is to push all of them right away, and
then if they all can go in safely w/o breaking filesystems, we can
actually do the deed about removing the iget() interface itself, and
if not, it can wait until 2.6.26.  It's not really that big of a deal
to actually nuke the interface itself, as long as we are gradually
reducing our usage of it  Or is there some patch which David is
trying to push that desperately depends on iget() going away?

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: - disable-ext4.patch removed from -mm tree

2008-02-04 Thread Theodore Tso
On Sun, Feb 03, 2008 at 07:15:40PM -0800, Andrew Morton wrote:
> On Sun, 3 Feb 2008 20:36:26 -0500 Theodore Tso <[EMAIL PROTECTED]> wrote:
> 
> > On Sun, Feb 03, 2008 at 12:25:51PM -0800, Andrew Morton wrote:
> > > When I merge David's iget coversion patches this will instead wreck the
> > > ext4 patchset.
> > 
> > That's ok, it shouldn't be hard for me to fix this up.  How quickly
> > will you be able to merge David's iget converstion patches?
> 
> They're about 1,000 patches back.

OK, if you're not planning on pushing David's changes to Linus right
away, what if I pull in David's

iget-stop-ext4-from-using-iget-and-read_inode-try.patch 

and push it plus some other ext4 bug fixes directly to Linus, and let
you know when that has happened so you can drop David's patch from
your queue?

David's changes to ext4 can be applied standalone without the rest of
his series, so it would be safe to push that to Linus independently
and in advance of the rest of his series.  That should also help
reduce the number of inter-patch queue dependencies.

Regards,

  - Ted


-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: - disable-ext4.patch removed from -mm tree

2008-02-03 Thread Theodore Tso
On Sun, Feb 03, 2008 at 12:25:51PM -0800, Andrew Morton wrote:
> When I merge David's iget coversion patches this will instead wreck the
> ext4 patchset.

That's ok, it shouldn't be hard for me to fix this up.  How quickly
will you be able to merge David's iget converstion patches?  Could
this get pushed to Linus, say, tomorrow?

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext4: Replace use of iget() with iget_locked()

2008-02-02 Thread Theodore Tso
On Sat, Feb 02, 2008 at 10:55:24AM -0500, Theodore Ts'o wrote:
> In the mm tree is a patch queued up to nuke iget().  So replace use of
> iget() with iget_locked().  I will be pushing this to Linus shortly.

Oops, wrong version of the patch; this is the correct one.

   - Ted

ext4: Replace use of iget() with iget_locked()

Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]>

---
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 575b521..d45fcaa 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -805,9 +805,17 @@ struct inode *ext4_orphan_get(struct super_block *sb, 
unsigned long ino)
 * is a valid orphan (no e2fsck run on fs).  Orphans also include
 * inodes that were being truncated, so we can't check i_nlink==0.
 */
-   if (!ext4_test_bit(bit, bitmap_bh->b_data) ||
-   !(inode = iget(sb, ino)) || is_bad_inode(inode) ||
-   NEXT_ORPHAN(inode) > max_ino) {
+   if (!ext4_test_bit(bit, bitmap_bh->b_data))
+   goto bad_orphan_inode;
+   inode = iget_locked(sb, ino);
+   if (!inode)
+   goto bad_orphan_inode;
+   if (inode->i_state & I_NEW) {
+   sb->s_op->read_inode(inode);
+   unlock_new_inode(inode);
+   }
+   if (is_bad_inode(inode) || NEXT_ORPHAN(inode) > max_ino) {
+   bad_orphan_inode:
ext4_warning(sb, __FUNCTION__,
 "bad orphan inode %lu!  e2fsck was run?", ino);
printk(KERN_NOTICE "ext4_test_bit(bit=%d, block=%llu) = %d\n",
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 67b6d8a..57dd8fb 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1041,11 +1041,16 @@ static struct dentry *ext4_lookup(struct inode * dir, 
struct dentry *dentry, str
   "bad inode number: %lu", ino);
inode = NULL;
} else
-   inode = iget(dir->i_sb, ino);
+   inode = iget_locked(dir->i_sb, ino);
 
if (!inode)
return ERR_PTR(-EACCES);
 
+   if (inode->i_state & I_NEW) {
+   inode->i_sb->s_op->read_inode(inode);
+   unlock_new_inode(inode);
+   }
+
if (is_bad_inode(inode)) {
iput(inode);
return ERR_PTR(-ENOENT);
@@ -1080,11 +1085,16 @@ struct dentry *ext4_get_parent(struct dentry *child)
   "bad inode number: %lu", ino);
inode = NULL;
} else
-   inode = iget(child->d_inode->i_sb, ino);
+   inode = iget_locked(child->d_inode->i_sb, ino);
 
if (!inode)
return ERR_PTR(-EACCES);
 
+   if (inode->i_state & I_NEW) {
+   inode->i_sb->s_op->read_inode(inode);
+   unlock_new_inode(inode);
+   }
+
if (is_bad_inode(inode)) {
iput(inode);
return ERR_PTR(-ENOENT);
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 4fbba60..ebdca31 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -779,7 +779,11 @@ int ext4_group_add(struct super_block *sb, struct 
ext4_new_group_data *input)
 "No reserved GDT blocks, can't resize");
return -EPERM;
}
-   inode = iget(sb, EXT4_RESIZE_INO);
+   inode = iget_locked(sb, EXT4_RESIZE_INO);
+   if (inode && (inode->i_state & I_NEW)) {
+   sb->s_op->read_inode(inode);
+   unlock_new_inode(inode);
+   }
if (!inode || is_bad_inode(inode)) {
ext4_warning(sb, __FUNCTION__,
 "Error opening resize inode");
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 055a0cd..1ef0359 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -777,9 +777,13 @@ static struct inode *ext4_nfs_get_inode(struct super_block 
*sb,
 * Currently we don't know the generation for parent directory, so
 * a generation of 0 means "accept any"
 */
-   inode = iget(sb, ino);
+   inode = iget_locked(sb, ino);
if (inode == NULL)
return ERR_PTR(-ENOMEM);
+   if (inode->i_state & I_NEW) {
+   sb->s_op->read_inode(inode);
+   unlock_new_inode(inode);
+   }
if (is_bad_inode(inode) ||
(generation && inode->i_generation != generation)) {
iput(inode);
@@ -2243,7 +2247,15 @@ static int ext4_fill_super (struct super_block *sb, void 
*data, int silent)
 * so we can safely mount the rest of the filesystem now.
 */
 
-   root = iget(sb, EXT4_ROOT_INO);
+   root = iget_locked(sb, EXT4_ROOT_INO);
+   if (!root) {
+   printk(KERN_ERR "EXT4-fs: iget_locked for root inode 

Re: [PATCH, RFC] Add new "development flag" to the ext4 filesystem

2008-01-30 Thread Theodore Tso
On Wed, Jan 30, 2008 at 11:26:20PM +0100, supersud501 wrote:
>
> how can i set this "flag" on my filesystem? i've never set any flags before 
> so i just removed the code from ext4-module to mount my filesystems, but 
> setting the flag once would make it easier i think :)

If you have e2fsprogs 1.40.5, you set at mke2fs time via "mke2fs -E
test_fs ...".  If you haven't mounted the filesystem using ext4 yet,
you can also convert an existing filesystem using "tune2fs -E test_fs
/dev/hdXXX".  At least with e2fsprogs 1.40.5, it won't work with a
filesystem with extents enabled because tune2fs will refuse to touch
such a filesystem.  (And you don't want to use e2fsprogs 1.40.5 with
ext4 anyway, since e2fsck in 1.40.5 doesn't understand extents yet.)

You can manually set the flag using debugfs:

# debugfs -w /dev/hdXX
debugfs: set_super_value s_flags 4
debugfs: quit

(Or, if you're lazy, you can also type the shorthand "ssv flags 4" to
debugfs.)

With e2fsprogs 1.40.5, if the test_fs flag is set, then blkid will
return a filesystem type ext4dev, such that if your mount is
configured to use blkid (instead of vol_id), it will automatically use
ext4dev to mount a filesystem that has (a) a journal (since ext4
currently requires a journal), and (b) the test_fs flag set.

This makes it easier for people who are testing ext4, since they will
just be able to do something like:

  # mke2fs -j -E test_fs /dev/mapper/lvmset-test
  # mount /dev/mapper/lvmset-test

.. and /dev/mapper/lvmset-test will automatically be mounted using
ext4.

 - Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] ext4 update

2008-01-29 Thread Theodore Tso
On Tue, Jan 29, 2008 at 10:54:03PM +0100, Jan Engelhardt wrote:
> 
> On Jan 29 2008 07:53, Theodore Tso wrote:
> >
> >>fwiw, diffstat is confused by git's diff output; you need to use
> >>'diffstat -p1'
> 
> I am seeing normal behavior:
>
> 22:52 sovereign:~/linux > git diff HEAD | diffstat

That's because you are doing a diff stat of changes that haven't been
checked in yet.  I was doing a "git log -p origin.. | diffstat -p1",
and in that incantation you definitely do need the -p1 to diffstat.

  - Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] allow in-inode EAs on ext4 root inode

2008-01-29 Thread Theodore Tso
Thanks, I've included this into the ext4 patch queue and into the
2.6.24-git6-ext4-1 tree.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [0/7] Drop BKL in ext[234] ioctls

2008-01-29 Thread Theodore Tso
On Mon, Jan 28, 2008 at 11:16:09AM +0100, Andi Kleen wrote:
> 
> Remove the BKL from the ext* ioctls.
> 
> This is a slightly updated version of the ext[2-4] patches that hit
> linux-fsdevel earlier.

Thanks, I've included these patches into the ext4 tree, and should
hopefully be pulled into the next -mm tree.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integrating patches in SLES10 e2fsprogs

2008-01-29 Thread Theodore Tso
On Tue, Jan 29, 2008 at 02:52:10PM +0100, Matthias Koenig wrote:
> > Are you running aclocal?  That may be causing the issues since I think
> > that may be causing it to ignore the autoconf macros I've established
> > in the top-level aclocal.m4 file.
> 
> Yes, someone here fixed the german po file (I already sent it to
> translation project). So we have to rerun the
> gettext related part, which results in the requirement to rerun aclocal.

Hmm?  You should be able to rebuild the .gmo file without needing to
re-run aclocal.  I do it all the time.  :-)

> > The problem is that the FHS does not guarantee that the subdirectories
> > stick around, and a number of distro's are using mounting tmpfs for
> > /var/run.  Makes sense, it can significantly speed up operations ---
> > but upon reboot, everything in /var/run is *gone*.
> 
> I see, then there is a problem of course. This is not the way it is
> done in Suse, so I did not see this problem.
> But, shouldn't the daemon then have an rc init script which checks for
> the needed subdirectory in /var/run and creates this if necessary?  If I
> look into /var/run I see a lot of daemons running with their own UID/GID
> and having their own directory. In this case there must be a mechanism
> to guarantee the existence of this directory, which would obviously be
> done in the init script (not sure if any init scripts currently do this
> in Suse).

Well, it either needs to be done in the init script or the daemons run
with root privs, create the directory, and then drop root privs.  So
yes, there are alternatives.  (a) Yet Another init script, (b) setuid
root, with a hard-coded user name that has to be looked up via
getpwent() and configured into /etc/passwd or the LDAP server, (c)
just put the darned directory in /var/lib.

It's all about tradeoffs, and for me, (c) was the easist, especially
since /var/lib/libuuid/clock.txt needs to be persistent across boots
anyway, and so needs to be in /var/lib instead of /var/run.  So yes, I
could have created some machinery to make sure /var/run/uuidd is
always present, but why not just reuse the already existing
/var/lib/libuuid directory, which has the correct ownership and which
has to be in /var/lib anyway?

> > and I still haven't figured out how to easily build my own
> > custom kernel RPM's on OpenSUSE, but it's quite nice.
> 
> unfortunately we don't have make-kpkg.

Yeah, and the RPM macros for generating kernel are incredibly twisted.
At least they were for RHEL4, when I was figuring out how to generate
a real-time kernel rpm.  Serious temptations to gouge out my eyeballs
with a spoon by the time I was done

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] ext4 update

2008-01-29 Thread Theodore Tso
>fwiw, diffstat is confused by git's diff output; you need to use
>'diffstat -p1'

Argh, I have *got* to create a script that does this automatically.

Revised diffstat -p1 output follows...

- Ted

 Documentation/filesystems/ext4.txt   |   20 
 Documentation/filesystems/proc.txt   |   39 
 fs/Kconfig   |1 
 fs/afs/dir.c |9 
 fs/afs/inode.c   |3 
 fs/buffer.c  |   44 
 fs/ext2/super.c  |   32 
 fs/ext3/super.c  |   32 
 fs/ext4/Makefile |4 
 fs/ext4/balloc.c |  251 +
 fs/ext4/dir.c|   14 
 fs/ext4/extents.c|  525 +--
 fs/ext4/file.c   |   23 
 fs/ext4/group.h  |8 
 fs/ext4/ialloc.c |  161 
 fs/ext4/inode.c  |  396 +-
 fs/ext4/ioctl.c  |7 
 fs/ext4/mballoc.c| 4552 +++
 fs/ext4/migrate.c|  570 +++
 fs/ext4/namei.c  |  135 
 fs/ext4/resize.c |   28 
 fs/ext4/super.c  |  389 +-
 fs/ext4/xattr.c  |4 
 fs/inode.c   |   39 
 fs/jbd2/checkpoint.c |   22 
 fs/jbd2/commit.c |  255 +
 fs/jbd2/journal.c|  368 ++
 fs/jbd2/recovery.c   |  151 
 fs/jbd2/revoke.c |6 
 fs/jbd2/transaction.c|   34 
 fs/read_write.c  |1 
 include/asm-arm/bitops.h |2 
 include/asm-generic/bitops/ext2-non-atomic.h |2 
 include/asm-generic/bitops/le.h  |4 
 include/asm-m68k/bitops.h|2 
 include/asm-m68knommu/bitops.h   |2 
 include/asm-powerpc/bitops.h |4 
 include/asm-s390/bitops.h|2 
 include/linux/buffer_head.h  |2 
 include/linux/ext4_fs.h  |  224 +
 include/linux/ext4_fs_extents.h  |   25 
 include/linux/ext4_fs_i.h|   25 
 include/linux/ext4_fs_sb.h   |   55 
 include/linux/fs.h   |   21 
 include/linux/jbd2.h |  135 
 lib/find_next_bit.c  |   43 
 46 files changed, 7773 insertions(+), 898 deletions(-)
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Oops, incorrect tag: v1.40.5 in e2fsprogs git repository

2008-01-28 Thread Theodore Tso
On Tue, Jan 29, 2008 at 01:43:31AM +0100, Petr Baudis wrote:
>   Done, though it should be doable from your side as well by something
> like
>   git push reporepo :refs/tags/v1.40.5
> 

Wow, I thought that only worked on branches, but you're right, that
works.  Thanks!!!

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Parallelize IO for e2fsck

2008-01-28 Thread Theodore Tso
On Mon, Jan 28, 2008 at 07:30:05PM +, Pavel Machek wrote:
> 
> As user pages are always in highmem, this should be easy to decide:
> only send SIGDANGER when highmem is full. (Yes, there are
> inodes/dentries/file descriptors in lowmem, but I doubt apps will
> respond to SIGDANGER by closing files).

Good point; for a system with at least (say) 2GB of memory, that
definitely makes sense.  For a system with less than 768 megs of
memory (how quaint, but it wasn't that long ago this was a lot of
memory :-), there wouldn't *be* any memory in highmem at all

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Oops, incorrect tag: v1.40.5 in e2fsprogs git repository

2008-01-28 Thread Theodore Tso
Hi,

I screwed up, and incorrectly tagged v1.40.5 in the e2fsprogs git
repository.  I accidentally tagged the "next" branch instead of the
"maint" branch, so the the wrong commit was tagged as v1.40.5.  Argh...

This has been fixed on git.kernel.org, but it's still busted on
repo.or.cz.  

If you've pulled from the e2fsprogs git repository incorrectly, you'll
need to run the command:

 git tag -d v1.40.5

... and then re-pull from the git repository to get the correct tag.

Petr, if you could run the same command on the e2fsprogs repository on
repo.or.cz, it would be much appreciated!!

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Oops, incorrect tag: v1.40.5 in e2fsprogs git repository

2008-01-28 Thread Theodore Tso
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi,

I screwed up, and incorrectly tagged v1.40.5 in the e2fsprogs git
repository.  I accidentally tagged the "next" branch instead of the
"maint" branch, so the the wrong commit was tagged as v1.40.5.  Argh...

This has been fixed on git.kernel.org, but it's still busted on
repo.or.cz.  

If you've pulled from the e2fsprogs git repository incorrectly, you'll
need to run the command:

 git tag -d v1.40.5

... and then re-pull from the git repository to get the correct tag.

Petr, if you could run the same command on the e2fsprogs repository on
repo.or.cz, it would be much appreciated!!

- Ted
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHnhyr7To545NnTEARAr3gAKD1ikMgB74Qw+65fdZRn0kaEXry8ACcDIva
bEVF2ZqTSaaf5hOE3hmfZyw=
=wRDe
-END PGP SIGNATURE-
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integrating patches in SLES10 e2fsprogs

2008-01-28 Thread Theodore Tso
On Mon, Jan 28, 2008 at 05:06:47PM +0100, Thierry Vignaud wrote:
> Worse, http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=summary hasn't
> any tag for neither 1.40.5 nor 1.40.6 in primary branch (only maint
> shows 1.40.5)

Sorry, I forget to use the --tags option when I pushed the patches to
the git tree.  This was fixed about an hour ago.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integrating patches in SLES10 e2fsprogs

2008-01-28 Thread Theodore Tso
On Mon, Jan 28, 2008 at 05:01:36PM +0100, Thierry Vignaud wrote:
> Theodore Tso <[EMAIL PROTECTED]> writes:
> 
> > In any case, I've just released e2fsprogs 1.40.6.

Oops, typo, that should have read 1.40.5.

> 
> which prompts me to note that it's neither on
> ftp://ftp.kernel.org/pub/linux/utils/util-linux-ng/v2.13 nor on
> http://e2fsprogs.sourceforge.net/

I don't know why people would think it's at
/pub/linux/utils/util-linux-ng, but it is at e2fsprogs.sf.net, and
I've just uploaded 1.40.5 to /pub/linux/kernel/people/tytso/e2fsprogs
on master.kernel.org, so it should be at ftp.kernel.org very shortly.
(Sorry, didn't get to that last night.)

> What is supposed to be the current primary source?

Well, for the absolutely *latest*, the git trees are at:

git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=summary
or
git://repo.or.cz/e2fsprogs.git
http://repo.or.cz/w/e2fsprogs.git

For the release tarballs, they can be downloaded at e2fsprogs's
SourceForge page, or at

ftp.kernel.org:/pub/linux/kernel/people/tytso/e2fsprogs

Regards,

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integrating patches in SLES10 e2fsprogs

2008-01-28 Thread Theodore Tso
On Mon, Jan 28, 2008 at 04:26:53PM +0100, Matthias Koenig wrote:
> > Patch6: e2fsprogs-mdraid.patch
> >
> > This apparently adds a new environment variable,
> > BLKID_SKIP_CHECK_MDRAID, which forces blkid to not detect mdraid
> > devices.  I'm not sure why.
> 
> Workaround for people having stale RAID signature on their disk:
> https://bugzilla.novell.com/show_bug.cgi?id=100530

Hmm... there's got to be a better way around this.

I'm not authorized to look at that BZ entry.  Is there any way you can
make my "tytso" account have access to any e2fsprogs related BZ
entries?  Thanks!

> > Patch12:e2fsprogs-mkinstalldirs.patch
> >
> > Why?
> 
> Is needed since we recreate the auto* files.
> But I agree that this patch should better set
> MKINSTALLDIRS = @MKDIR_P@
> not to literal "mkdir -p". The @MKINSTALLDIRS@ seems to be
> obsolete in newer gettext (which seems to pull this in).

Are you running aclocal?  That may be causing the issues since I think
that may be causing it to ignore the autoconf macros I've established
in the top-level aclocal.m4 file.

> > Patch22:e2fsprogs-1.40.4-uuidd_pid_path.patch
> >
> > The problem with this patch is that /var/run is cleared via rm -rf, so
> > it is highly problamtic to put the scratch directory for uuidd in
> > /var/run.
> 
> Are you really sure? My interpretation of FHS is, that files under
> /var/run/ have to be cleared or truncated, but the subdirectories do not
> get deleted.

The problem is that the FHS does not guarantee that the subdirectories
stick around, and a number of distro's are using mounting tmpfs for
/var/run.  Makes sense, it can significantly speed up operations ---
but upon reboot, everything in /var/run is *gone*.


Please let me know when you have a newer OpenSUSE factory RPM ready
for use, and I'll take a look at the remaining patches and see which
ones are ready for merging upstream.  Markus managed to convince me it
was worthwhile to install the latest OpenSUSE on a laptop so I could
see how things are shaping up, and I have to say I'm pretty impressed.
I haven't updated it to try tracking the latest OpenSUSE factory
RPM's, and I still haven't figured out how to easily build my own
custom kernel RPM's on OpenSUSE, but it's quite nice.

On my "one of these days" list is to get another cheap/used laptop so
I can try out the latest Fedora Core Rawhide without having to fire up
a huge (noisy) x86_64 box

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: e2fsprogs: pu branch

2008-01-28 Thread Theodore Tso
On Fri, Jan 25, 2008 at 05:32:59PM +0100, Valerie Clement wrote:
> Hi Ted,
> 
> Here are the problems I found when testing e2fsprogs built in the "pu"
> branch. I've checked that they haven't been fixed in the latest version.
> (git describe => v1.40.3-98-gb6fea68)
> 
> 1- "Corrupt extent header" report from e2fsck
>   ===
> 
> Steps to reproduce it:
> Copy a regular file on the ext4 filesystem and make a symbolic links to it:
> # ln -s foo lnk
> Unmount fs.
> 
> # e2fsck -f /dev/sdc0
> e2fsck 1.40.3 (05-Dec-2007)
> Pass 1: Checking inodes, blocks, and sizes
> Error while reading over extent tree in inode 49105: Corrupt extent header
> Clear inode

This is a kernel-level bug, actually.  The symlink is a "fast symlink"
where the target of the symlink is in inode itself.  The kernel level
code should *not* be setting the EXTENTS_FL flag.  Still, we'll have
to put in some code to work around it for people with current kernel
levels.

> 2- "EXT2 directory corrupted" report from debugfs
>   ===i===
> When trying to debug the previous problem using debugfs: 
> 
> debugfs:  ncheck 49105
> ncheck: EXT2 directory corrupted while calling ext2_dir_iterate
> ncheck: EXT2 directory corrupted while calling ext2_dir_iterate
> ncheck: EXT2 directory corrupted while calling ext2_dir_iterate

Not all parts of the ext2 library have been fixed to understand
extents (in particular, ext2_dir_iterate).  On the todo list...

> 3- Strange "FILE SYSTEM WAS MODIFIED" report from e2fsck.
>   ===
> 
> The test just creates an empty file so it is surprising that e2fsck modifies
> the filesystem. 
> I found the reason of this modification when reading the git logs, e2fsck
> backups the primary superblock to the backups when the feature sets are 
> different (EXT3_FEATURE_INCOMPAT_EXTENTS in that case).
> 
> It's not really a problem, it's just confusing. To see with other users,
> perhaps.

Yeah, we should add a explanatory message for that case.  Thanks for
pointing that out.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integrating patches in SLES10 e2fsprogs

2008-01-27 Thread Theodore Tso
On Sun, Jan 27, 2008 at 10:40:25PM -0600, Eric Sandeen wrote:
> 
> %build
> aclocal
> autoconf
> %configure --enable-elf-shlibs --enable-nls --disable-e2initrd-helper
> --enable-blkid-devmapper --enable-blkid-selinux --enable-dynamic-e2fsck
> make %{?_smp_mflags}
> 

It should be fine if you eliminate the "aclocal" call.  And I've never
understood why people like to regenerate configure scripts from
configure.in in the first place

In any case, I've just released e2fsprogs 1.40.6.  It contains some of
obviously correct patches from the Novell and Red Hat packages.  More
importantly, it includes the "mke2fs -E test_fs" feature which will be
needed once we push test_fs ext4 patches to mainline, which I plan to
push to Linus in the next day or two.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integrating patches in SLES10 e2fsprogs

2008-01-27 Thread Theodore Tso
On Sun, Jan 27, 2008 at 09:06:59AM -0600, Eric Sandeen wrote:
> 
> * Tue Nov 15 2005 [EMAIL PROTECTED]
> - added close.patch provided by Ted Tso (IBM) to fix bug #132708

Hmm, yes, I had forgotten about the changelog in the spec file.  I'm
used to the kernel patch convention of *detailed* comments in the
patch file itself.


> *grin* Maybe obsolete by now?  haven't looked closely.

I can't tell.  Looks like my SuSE bugzilla account was flushed,
probably when it was merged with some central Novell authentication
system.  I created a new account, but "You are not authorized to view
bug #132708".  I tried searching the IBM LTC bugzilla, and turned up
nothing there, as well as the SCM logs for e2fsprogs --- and it's my
general practice that when I fix bug reported through either Red Hat
or Novell, that I include the Red Hat/SuSE/LTC bugzilla numbers in the
changelog.

The patch makes no sense, in any case, since super_shadow gets set a
few line laters, so the patch as it stands is either a no-op (assuming
a smart enough GCC), or wastes a few bytes of text segment space.

> > Patch12:e2fsprogs-mkinstalldirs.patch
> > 
> > Why?
> > 
> 
> Probably same as why we have something similar; for one reason or other
> need to rerun autoconf, and e2fsprogs isn't compatible with latest
> autoconf.  (This is a patch I inherited, and haven't yet investigated
> all the details)

Define "latest autoconf"?  I'm using autoconf 2.61, which is
reasonably up-to-date.  Can you send me the output of config.status,
so I can see what it's setting @MKINSTALLDIRS@ to?

> > Patch22:e2fsprogs-1.40.4-uuidd_pid_path.patch
> > 
> > The problem with this patch is that /var/run is cleared via rm -rf, so
> > it is highly problamtic to put the scratch directory for uuidd in
> > /var/run.
> 
> Hm, I had similar issues with uuidd too - common theme here?

What issues?  I thought you agreed that using /var/lib was the best
approach for now.  The Novell patch moves it back to /var/run, which
will cause significant problems if uuidd is run setuid to a non-root
user.

> > Patch32:libcom_err-no-e2fsck.static.patch
> > 
> > This patch does two completely unrelated things.  One is to disable
> > the libcom_err regression test suite (probably because some of the
> > other changes made) and the other is to disable building the
> > e2fsck.static file.  Why these two are bundled into a single patch I'm
> > not sure.
> 
> And I have a patch to do the latter as well.  Interesting how we've
> arrived at similar needed changes, independently.  :)

Yeah, I'll check in a change so that e2fsck is built dynamically by
default, and e2fsck.static is only built if it is explicitly
requiested via --enable-static-e2fsck.

> and Patch99:e2fsprogs-no_cmd_hiding.patch
> 
> honestly I like that; I should whip up a nice patch to emulate kbuild,
> with V=1 or something, unless there is some other easy way to show full
> build commands already?

Yes, a way to do kbuild with V=1 would be nice.  The main thing that
makes this difficult is that I've tried to make e2fsprogs not rely on
any GNU make'isms, since it builds on a number of non-Linux platforms,
including *BSD, MacOS, Solaris, etc. 

Personally, it's not a big deal; whenever I need to see what is going
on, I just edit the makefile and quickly remove the '@' signs.  It's
really not that hard, and it's rare that I need to look at things.  Of
course, that could be because I'm more familiar with e2fsprog's build
system.  :-)

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH e2fsprogs] UPDATED: ignore "safe" flag differences when fsck compares superblocks

2008-01-26 Thread Theodore Tso
On Sat, Jan 26, 2008 at 10:51:10PM -0600, Eric Sandeen wrote:
> Theodore Tso wrote:
> > This is what I've checked in.  See the comment about why we can't
> > ignore a difference for the extended attribute feature.
> 
> Ok.  Unfortunately, that one really hurts on fedora, or any distro which
> writes xattrs during install.

The right fix is to simply force the extended attribute features on in
mke2fs.  We can do this in mke2fs.conf.  I plan to change it so that
by default it's enabled for ext3 filesystems.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integrating patches in SLES10 e2fsprogs

2008-01-26 Thread Theodore Tso
Wow.  You have a lot of patches in the SLES 10 e2fsprogs.  I'm not
sure why of them are there, though.  For example:

Patch0: elf.diff

I'm not sure what this one is for.

Patch1: e2fsprogs-1.35-libdir.diff

This one does two different things.  One is include AC_HEADER_TIME in
configure.in, and the other is to use $lib instead of "lib" when
defining root_libdir.  This seems to force root_libdir to /, which
makes no sense to me.


Patch4: e2fsprogs-blkid.diff

This patch causes fsck to check the BLKID_FILE environment variable
and passes it to the blkid library.  But the blkid library *already*
checks the BLKID_FILE environtment variable already.  So I'm not sure
why this is necessary at all.


Patch6: e2fsprogs-mdraid.patch

This apparently adds a new environment variable,
BLKID_SKIP_CHECK_MDRAID, which forces blkid to not detect mdraid
devices.  I'm not sure why.


Patch9: e2fsprogs-libvolume_id-support.patch

This one is going to cause problems for ext4 support, because
libvolume_id isn't going to have the right logic for ext4.  There are
also issues with volume_id library where I haven't been happy with the
choices Kay has made in terms of correctness.


Patch10:close.patch

I don't understand what this patch is trying to do.


Patch12:e2fsprogs-mkinstalldirs.patch

Why?


Patch22:e2fsprogs-1.40.4-uuidd_pid_path.patch

The problem with this patch is that /var/run is cleared via rm -rf, so
it is highly problamtic to put the scratch directory for uuidd in
/var/run.


Patch32:libcom_err-no-e2fsck.static.patch

This patch does two completely unrelated things.  One is to disable
the libcom_err regression test suite (probably because some of the
other changes made) and the other is to disable building the
e2fsck.static file.  Why these two are bundled into a single patch I'm
not sure.


Patch34:libcom_err-compile_et_permissions.patch

Why?

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH e2fsprogs] UPDATED: ignore "safe" flag differences when fsck compares superblocks

2008-01-26 Thread Theodore Tso
This is what I've checked in.  See the comment about why we can't
ignore a difference for the extended attribute feature.

   - Ted

commit a8cde73acbf6e0f9c0a3601e4f5fac2b01a27bd2
Author: Theodore Ts'o <[EMAIL PROTECTED]>
Date:   Sat Jan 26 23:17:50 2008 -0500

Ignore "safe" flag differences when e2fsck compares superblocks

Recent e2fsprogs (1.40.3 and higher) fsck compares primary superblock to
backups, and if things differ, it forces a full check.  However, the
kernel has a penchant for updating flags the first time a feature is
used - attributes, large files, etc.

This is a bad idea, and we should break the kernel of this habit,
especially for the ext4 feature flags.  But for now, let's make e2fsck
avoid forcing a full check and backup except when absolutely
necessary.

Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]>
Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]>

diff --git a/e2fsck/super.c b/e2fsck/super.c
index a4835f7..954783e 100644
--- a/e2fsck/super.c
+++ b/e2fsck/super.c
@@ -773,8 +773,26 @@ void check_super_block(e2fsck_t ctx)
 
 /*
  * Check to see if we should backup the master sb to the backup super
- * blocks.
+ * blocks.  Returns non-zero if the sb should be backed up.
  */
+
+/*
+ * A few flags are set on the fly by the kernel, but only in the
+ * primary superblock.  This is actually a bad thing, and we should
+ * try to discourage it in the future.  In particular, for the newer
+ * ext4 files, especially EXT4_FEATURE_RO_COMPAT_DIR_NLINK and
+ * EXT3_FEATURE_INCOMPAT_EXTENTS.  So some of these may go away in the
+ * future.
+ *
+ * The kernel will set EXT2_FEATURE_COMPAT_EXT_ATTR, but
+ * unfortunately, we shouldn't ignore it since if it's not set in the
+ * backup, the extended attributes in the filesystem will be stripped
+ * away.
+ */
+#define FEATURE_RO_COMPAT_IGNORE   (EXT2_FEATURE_RO_COMPAT_LARGE_FILE| \
+EXT4_FEATURE_RO_COMPAT_DIR_NLINK)
+#define FEATURE_INCOMPAT_IGNORE(EXT3_FEATURE_INCOMPAT_EXTENTS)
+
 int check_backup_super_block(e2fsck_t ctx)
 {
ext2_filsys fs = ctx->fs;
@@ -814,10 +832,18 @@ int check_backup_super_block(e2fsck_t ctx)
continue;
}
 
-#define SUPER_DIFFERENT(x) (fs->super->x != tfs->super->x)
+#define SUPER_INCOMPAT_DIFFERENT(x)\
+   (( fs->super->x & ~FEATURE_INCOMPAT_IGNORE) !=  \
+(tfs->super->x & ~FEATURE_INCOMPAT_IGNORE))
+#define SUPER_RO_COMPAT_DIFFERENT(x)   \
+   (( fs->super->x & ~FEATURE_RO_COMPAT_IGNORE) != \
+(tfs->super->x & ~FEATURE_RO_COMPAT_IGNORE))
+#define SUPER_DIFFERENT(x) \
+   (fs->super->x != tfs->super->x)
+
if (SUPER_DIFFERENT(s_feature_compat) ||
-   SUPER_DIFFERENT(s_feature_incompat) ||
-   SUPER_DIFFERENT(s_feature_ro_compat) ||
+   SUPER_INCOMPAT_DIFFERENT(s_feature_incompat) ||
+   SUPER_RO_COMPAT_DIFFERENT(s_feature_ro_compat) ||
SUPER_DIFFERENT(s_blocks_count) ||
SUPER_DIFFERENT(s_inodes_count) ||
memcmp(fs->super->s_uuid, tfs->super->s_uuid,
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH e2fsprogs] teach libblkid about ext4(dev)

2008-01-26 Thread Theodore Tso
On Tue, Jan 22, 2008 at 05:11:11PM -0600, Eric Sandeen wrote:
> Returns ext4dev for now; will need to change to ext4 at the appropriate
> time I guess.

This is what I ultimately ended up committing.

- Ted

commit 2921332fd88b843ffb828d9c18f05bfd171ace76
Author: Theodore Ts'o <[EMAIL PROTECTED]>
Date:   Sat Jan 26 22:25:50 2008 -0500

Teach the blkid library about ext4/ext4dev

Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]>
Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]>

diff --git a/lib/blkid/probe.c b/lib/blkid/probe.c
index 32d9f3f..f7bb491 100644
--- a/lib/blkid/probe.c
+++ b/lib/blkid/probe.c
@@ -131,7 +131,8 @@ static void set_uuid(blkid_dev dev, uuid_t uuid, char *tag)
}
 }
 
-static void get_ext2_info(blkid_dev dev, unsigned char *buf)
+static void get_ext2_info(blkid_dev dev, struct blkid_magic *id,
+ unsigned char *buf)
 {
struct ext2_super_block *es = (struct ext2_super_block *) buf;
const char *label = 0;
@@ -146,61 +147,123 @@ static void get_ext2_info(blkid_dev dev, unsigned char 
*buf)
blkid_set_tag(dev, "LABEL", label, sizeof(es->s_volume_name));
 
set_uuid(dev, es->s_uuid, 0);
+
+   if ((es->s_feature_compat & EXT3_FEATURE_COMPAT_HAS_JOURNAL) &&
+   !uuid_is_null(es->s_journal_uuid))
+   set_uuid(dev, es->s_journal_uuid, "EXT_JOURNAL");
+
+   if (strcmp(id->bim_type, "ext2") &&
+   ((blkid_le32(es->s_feature_incompat) &
+ EXT2_FEATURE_INCOMPAT_UNSUPPORTED) == 0))
+   blkid_set_tag(dev, "SEC_TYPE", "ext2", sizeof("ext2"));
 }
 
-static int probe_ext3(struct blkid_probe *probe, 
- struct blkid_magic *id __BLKID_ATTR((unused)), 
+static int probe_ext4dev(struct blkid_probe *probe,
+struct blkid_magic *id,
+unsigned char *buf)
+{
+   struct ext2_super_block *es;
+   es = (struct ext2_super_block *)buf;
+
+   /* Distinguish between ext4dev and other filesystems */
+   if ((blkid_le32(es->s_flags) & EXT2_FLAGS_TEST_FILESYS) == 0)
+   return -BLKID_ERR_PARAM;
+
+   /* Distinguish from jbd */
+   if (blkid_le32(es->s_feature_incompat) &
+   EXT3_FEATURE_INCOMPAT_JOURNAL_DEV)
+   return -BLKID_ERR_PARAM;
+
+   /* ext4dev requires a journal */
+   if (!(blkid_le32(es->s_feature_compat) &
+ EXT3_FEATURE_COMPAT_HAS_JOURNAL))
+   return -BLKID_ERR_PARAM;
+
+   get_ext2_info(probe->dev, id, buf);
+   return 0;
+}
+
+static int probe_ext4(struct blkid_probe *probe, struct blkid_magic *id,
  unsigned char *buf)
 {
struct ext2_super_block *es;
es = (struct ext2_super_block *)buf;
 
-   /* Distinguish between jbd and ext2/3 fs */
+   /* Distinguish from ext4dev */
+   if (blkid_le32(es->s_flags) & EXT2_FLAGS_TEST_FILESYS)
+   return -BLKID_ERR_PARAM;
+
+   /* Distinguish from jbd */
if (blkid_le32(es->s_feature_incompat) & 
EXT3_FEATURE_INCOMPAT_JOURNAL_DEV)
return -BLKID_ERR_PARAM;
 
-   /* Distinguish between ext3 and ext2 */
+   /* ext4 requires journal */
if (!(blkid_le32(es->s_feature_compat) &
  EXT3_FEATURE_COMPAT_HAS_JOURNAL))
return -BLKID_ERR_PARAM;
 
-   get_ext2_info(probe->dev, buf);
+   /* Ext4 has at least one feature which ext3 doesn't understand */
+   if (!(blkid_le32(es->s_feature_ro_compat) &
+ EXT3_FEATURE_RO_COMPAT_UNSUPPORTED) &&
+   !(blkid_le32(es->s_feature_incompat) &
+ EXT3_FEATURE_INCOMPAT_UNSUPPORTED))
+   return -BLKID_ERR_PARAM;
 
-   if ((es->s_feature_compat & EXT3_FEATURE_COMPAT_HAS_JOURNAL) &&
-   !uuid_is_null(es->s_journal_uuid))
-   set_uuid(probe->dev, es->s_journal_uuid, "EXT_JOURNAL");
+   get_ext2_info(probe->dev, id, buf);
+   return 0;
+}
+
+static int probe_ext3(struct blkid_probe *probe, struct blkid_magic *id,
+ unsigned char *buf)
+{
+   struct ext2_super_block *es;
+   es = (struct ext2_super_block *)buf;
+
+   /* Distinguish from ext4dev */
+   if (blkid_le32(es->s_flags) & EXT2_FLAGS_TEST_FILESYS)
+   return -BLKID_ERR_PARAM;
 
-   blkid_set_tag(probe->dev, "SEC_TYPE", "ext2", sizeof("ext2"));
+   /* ext3 requires journal */
+   if (!(blkid_le32(es->s_feature_compat) &
+ EXT3_FEATURE_COMPAT_HAS_JOURNAL))
+   return -BLKID_ERR_PARAM;
 
+   /* Any features which ext3 doesn't understand */
+   if ((blkid_le32(es->s_feature_ro_compat) &
+EXT3_FEATURE_RO_COMPAT_UNSUPPORTED) ||
+   (blkid_le32(es->s_feature_incompat) &
+EXT3_FEATURE_INCOMPAT_UNSUPPORTED))
+   return -BLKID_ERR_PARAM;
+
+   get_ext2_info(probe->dev, id, buf);
return 0;
 }

Re: Patch for 2.6.25 queue

2008-01-26 Thread Theodore Tso
On Sat, Jan 26, 2008 at 12:03:31PM -0500, Theodore Tso wrote:
> This patch hunk causes a failure due to a botched brelse->put_bh
> conversion, but removing it isn't the right fix.

Err, sorry, the above should read, "This patch hunk FIXES a failure"...

and if it wasn't clear, I was responsible for the initial botched
conversion.  :-)

  - Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch for 2.6.25 queue

2008-01-26 Thread Theodore Tso
On Sat, Jan 26, 2008 at 05:08:49PM +0530, Aneesh Kumar K.V wrote:
> This diff contain mballoc fixes and update for ext3-4 migrate patch.

I will fold these patches into the patch queue in the proper places
(and adjust other patches as necessary).

> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index a60672c..9de0cdf 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -4452,7 +4452,6 @@ do_more:
>   overflow = bit + count - EXT4_BLOCKS_PER_GROUP(sb);
>   count -= overflow;
>   }
> - put_bh(bitmap_bh);
>   bitmap_bh = read_block_bitmap(sb, block_group);
>   if (!bitmap_bh)
>   goto error_return;

This patch hunk causes a failure due to a botched brelse->put_bh
conversion, but removing it isn't the right fix.  It was there to
avoid a buffer leak on a goto.  So the right fix is to remove the
put_bh() above, but to add one here, circa line 4542 of mballoc.c:

if (overflow && !err) {
block += count;
count = overflow;
+   put_bh(bitmap_bh);
goto do_more;
}

And once we do that, we can drop the initialization of bitmap_bh to
NULL, circa line 4408:

int metadata, unsigned long *freed)
{
-   struct buffer_head *bitmap_bh = NULL;
+   struct buffer_head *bitmap_bh;
struct super_block *sb = inode->i_sb;
struct ext4_allocation_context ac;


- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 24/49] ext4: add block bitmap validation

2008-01-26 Thread Theodore Tso
On Wed, Jan 23, 2008 at 02:06:54PM -0800, Andrew Morton wrote:
> brelse() should only be used when the bh might be NULL - put_bh()
> can be used here.
> 
> Please review all ext4/jbd2 code for this trivial speedup.

I've reviewed all of the pending patches in the stable queue for this
speedup, and applied them where necessary; it was useful, since I
detected a buffer head leak in one of the patches while I was at it.

The ext4/jbd2 code as a whole still needs to be reviewed for this
speedup, but I don't want to fix this in the initial stable push, lest
I break something by accident.  I'll put it in the "TO DO" queue.

Regards,

 - Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Parallelize IO for e2fsck

2008-01-26 Thread Theodore Tso
On Fri, Jan 25, 2008 at 05:55:51PM -0800, Bryan Henderson wrote:
> I was surprised to see AIX do late allocation by default, because IBM's 
> traditional style is bulletproof systems.  A system where a process can be 
> killed at unpredictable times because of resource demands of unrelated 
> processes doesn't really fit that style.
> 
> It's really a fairly unusual application that benefits from late 
> allocation: one that creates a lot more virtual memory than it ever 
> touches.  For example, a sparse array.  Or am I missing something?

I guess it depends on how far you try to do "bulletproof".  OSF/1 used
to use "bulletproof" as its default --- and I had to turn it off on
tsx-11.mit.edu (the first North American ftp server for Linux :-),
because the difference was something like 50 ftp daemons versus over
500 on the same server.  It reserved VM space for the text segement of
every single process, since at least in theory, it's possible for
every single text page to get modified using ptrace if (for example) a
debugger were to set a break point on every single page of every
single text segement of every single ftp daemon.

You can also see potential problems for Java programs.  Suppose you
had some gigantic Java Application (say, Lotus Notes, or Websphere
Application Server) which is taking up many, many, MANY gigabytes of
VM space.  Now suppose the Java application needs to fork and exec
some trivial helper program.  For that tiny instant, between the fork
and exec, the VM requirements in "bulletproof" mode would double,
since while 99.% of the time programs will immediately discard the
VM upon the exec, there is always the possibility that the child
process will touch every single data page, forcing a copy on write,
and never do the exec.

There are of course different levels of "bulletproof" between the
extremes of "totally bulletproof" and "late binding" from an
algorithmic standpoint.  For example, you could ignore the needed
pages caused by ptrace(); more challenging would be to how to handle
the fork/exec semantics, although there could be kludges such as
strongly encouraging applications to use an old-fashed BSD-style
vfork() to guarantee that the child couldn't double VM requirements
between the vfork() and exec().  I certainly can't say for sure what
the AIX designers had in mind, and why they didn't choose one of the
more intermediate design choices.  

However, it is fair to say that "100% bulletproof" can require
reserving far more VM resources than you might first expect.  Even a
company which is highly incented to sell large amounts of hardware,
such as Digital, might not have wanted their OS to be only able to
support an embarassingly small number of simultaneous ftpd
connections.  I know this for sure because the OSF/1 documentation,
when discussing their VM tuning knobs, specifically talked about the
scenario that I ran into with tsx-11.mit.edu.

Regards,

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 36/49] ext4: Add EXT4_IOC_MIGRATE ioctl

2008-01-25 Thread Theodore Tso
On Thu, Jan 24, 2008 at 11:25:32AM +0530, Aneesh Kumar K.V wrote:
> +static int free_ext_idx(handle_t *handle, struct inode *inode,
> + struct ext4_extent_idx *ix)
> +{
> + int i, retval = 0;
> + ext4_fsblk_t block;
> + struct buffer_head *bh;
> + struct ext4_extent_header *eh;
> +
> + block = idx_pblock(ix);
> + bh = sb_bread(inode->i_sb, block);
> + if (!bh)
> + return -EIO;
> +
> + eh = (struct ext4_extent_header *)bh->b_data;
> + if (eh->eh_depth == 0) {
> + brelse(bh);
> + ext4_free_blocks(handle, inode, block, 1);
> + } else {
> + ix = EXT_FIRST_INDEX(eh);
> + for (i = 0; i < le16_to_cpu(eh->eh_entries); i++, ix++) {
> + retval = free_ext_idx(handle, inode, ix);
> + if (retval)
> + return retval;
> + }
> + }
> + return retval;
> +}

Aneesh, looks like if eh->eh_depth is != 0, bh gets leaked.  This is
how I plan to fix it up:

+static int free_ext_idx(handle_t *handle, struct inode *inode,
+   struct ext4_extent_idx *ix)
+{
+   int i, retval = 0;
+   ext4_fsblk_t block;
+   struct buffer_head *bh;
+   struct ext4_extent_header *eh;
+
+   block = idx_pblock(ix);
+   bh = sb_bread(inode->i_sb, block);
+   if (!bh)
+   return -EIO;
+
+   eh = (struct ext4_extent_header *)bh->b_data;
+   if (eh->eh_depth == 0)
+   ext4_free_blocks(handle, inode, block, 1);
+   else {
+   ix = EXT_FIRST_INDEX(eh);
+   for (i = 0; i < le16_to_cpu(eh->eh_entries); i++, ix++) {
+   retval = free_ext_idx(handle, inode, ix);
+   if (retval)
+   break;
+   }
+   }
+   put_bh(bh);
+   return retval;
+}

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext3 freeze feature

2008-01-25 Thread Theodore Tso
On Fri, Jan 25, 2008 at 10:34:25AM -0600, Eric Sandeen wrote:
> > But it was this concern which is why ext3 never exported freeze
> > functionality to userspace, even though other commercial filesystems
> > do support this.  It wasn't that it wasn't considered, but the concern
> > about whether or not it was sufficiently safe to make available.
> 
> What's the safety concern; that the admin will forget to unfreeze?

That the admin would manage to deadlock him/herself and wedge up the
whole system...

> I'm also not sure I see the point of the timeout in the original patch;
> either you are done snapshotting and ready to unfreeze, or you're not;
> 1, or 2, or 3 seconds doesn't really matter.  When you're done, you're
> done, and you can only unfreeze then.  Shouldn't this be done
> programmatically, and not with some pre-determined timeout?

This is only a guess, but I suspect it was a fail-safe in case the
admin did manage to deadlock him/herself.  

I would think a better approach would be to make the filesystem
unfreeze if the file descriptor that was used to freeze the filesystem
is closed, and then have explicit deadlock detection that kills the
process doing the freeze, at which point the filesystem unlocks and
the system can recover.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext3 freeze feature

2008-01-25 Thread Theodore Tso
On Fri, Jan 25, 2008 at 03:18:51PM +0300, Dmitri Monakhov wrote:
> First of all Linux already have at least one open-source(dm-snap),
> and several commercial snapshot solutions. 

Yes, but it requires that the filesystem be stored under LVM.  Unlike
what EVMS v1 allowed us to do, we can't currently take a snapshot of a
bare block device.  This patch could potentially be useful for systems
which aren't using LVM, however

> You have to realize what delay between 1-3 stages have to be minimal.
> for example dm-snap perform it only for explicit journal flushing.
> From my experience if delay is more than 4-5 seconds whole system becomes
> unstable.

That's the problem.  You can't afford to freeze for very long.

What you *could* do is to start putting processes to sleep if they
attempt to write to the frozen filesystem, and then detect the
deadlock case where the process holding the file descriptor used to
freeze the filesystem gets frozen because it attempted to write to the
filesystem --- at which point it gets some kind of signal (which
defaults to killing the process), and the filesystem is unfrozen and
as part of the unfreeze you wake up all of the processes that were put
to sleep for touching the frozen filesystem.

The other approach would be to say, "oh well, the freeze ioctl is
inherently dangerous, and root is allowed to himself in the foot, so
who cares".  :-)

But it was this concern which is why ext3 never exported freeze
functionality to userspace, even though other commercial filesystems
do support this.  It wasn't that it wasn't considered, but the concern
about whether or not it was sufficiently safe to make available.

And I do agree that we probably should just implement this in
filesystem independent way, in which case all of the filesystems that
support this already have super_operations functions
write_super_lockfs() and unlockfs().

So if this is done using a new system call, there should be no
filesystem-specific changes needed, and all filesystems which support
those super_operations method functions would be able to provide this
functionality to the new system call.

 - Ted

P.S.  Oh yeah, it should be noted that freezing at the filesystem
layer does *not* guarantee that changes to the block device aren't
happening via mmap()'ed files.  The LVM needs to freeze writes the
block device level if it wants to guarantee a completely stable
snapshot image.  So the proposed patch doens't quite give you those
guarantees, if that was the intended goal.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Parallelize IO for e2fsck

2008-01-24 Thread Theodore Tso
On Fri, Jan 25, 2008 at 01:08:09AM +0200, Adrian Bunk wrote:
> In practice, there is a small number of programs that are both the
> common memory hogs and should be able to reduce their memory consumption
> by 10% or 20% without big problems when requested (e.g. Java VMs,
> Firefox and databases come into my mind).

I agree, it's only a few processes where this makes sense.  But for
those that do, it would be useful if they could register with the
kernel that would like to know, (just before the system starts
ejecting cached data, just before swapping, etc.) and at what
frequency.  And presumably, if the kernel notices that a process is
responding to such requests with memory actually getting released back
to the system, that process could get "rewarded" by having the OOM
killer less likely to target that particular thread.

AIX basically did this with SIGDANGER (the signal is ignored by
default), except there wasn't the ability for the process to tell the
kernel at what level of memory pressure before it should start getting
notified, and there was no way for the kernel to tell how bad the
memory pressure actually was.  On the other hand, it was a relatively
simple design.

In practice very few processes would indeed pay attention to
SIGDANGER, so I think you're quite right there.

> And from a performance point of view letting applications voluntarily 
> free some memory is better even than starting to swap.

Absolutely.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH, RFC] Add new "development flag" to the ext4 filesystem

2008-01-23 Thread Theodore Tso
On Wed, Jan 23, 2008 at 11:04:35AM -0600, Eric Sandeen wrote:
> 
> I just think that ext4.ko running ext3 filesystems needs to be under
> explicit control, and not something that happens, occasionally,
> accidentally, without the user/administrator requesting it.  Least
> surprise, and all that...
> 

Well, most of the time that will happen, given that /etc/fstab
explicitly states which filesystem to use; and if the user doesn't
specify a filesystme type, mount will use blkid/vol_id, and if that
says ext3, then ext3 will be used.  The only time where it might not
happen without an explicit administrator request is on the root
filesystem automount and if you are using an initrd to mount the
filesystem (as all of the enterprise distros now do anyway), that
could be easily controlled form userspace as well.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH, RFC] Add new "development flag" to the ext4 filesystem

2008-01-23 Thread Theodore Tso
On Tue, Jan 22, 2008 at 09:55:37PM -0600, Eric Sandeen wrote:
> 
> Overall, seems ok.  One other question though, when ext4 is a
> fully-fledged production filesystem, and the flag requirement is gone,
> what stops ext3 filesystems from being silently mounted as ext4, just as
> happened with ext4dev today?  

Nothing will prevent that, but in the long term we don't want ext4 to
be automatically adding new features flags anyway.  The only reason
why we were doing that was to encourage more people to test out ext4,
and for the convenience of the ABAT auto-testing.

So I'm assuming that before we remove this test, we will also be
fixing some of the automatic enablement of extents, etc., because that
sort of thing will be moved into e2fsprogs as part of "tune2fs -O
ext4" or "mke2fs -O ext4" or "mkfs.ext4". 

If we do that, then the only downside of having ext3 filesystems run
under ext4 is the test matrix concern.  Since I'm still hoping that
some point in the future, fs/ext4 could subsume fs/ext3 so we don't
have to worry about bug fixes going into fs/ext2 AND fs/ext3 AND
fs/ext4, I have my own reasons for wanting that.  But I do understand
the concerns that maybe in the short term some distro's don't want to
do that.  So in that case I could see adding a "you must have extents"
test into ext4, if I distro has specific support concerns. But for
people who are running mainline kernel, I think it's actually a *good*
thing if fs/ext4 can mount and read and write to an ext3 filesystem
--- as long as it doesn't automatically turn on features behind the
user's back.

> + "\ttestfs\n"));
> 
> help text doesn't match reality here, missing a "_"

Oops, thanks for catching that.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ext4-online-defrag-free-space-fragmentation.patch uses do_fsync()

2008-01-23 Thread Theodore Tso
On Wed, Jan 23, 2008 at 08:10:20AM -0600, Dave Kleikamp wrote:
> I'd think you'd want to change the printk text as well.  "defrag: failed
> ext4_force_commit (%d)\n" maybe?

mmm, good catch, thanks.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH e2fsprogs] teach libblkid about ext4(dev)

2008-01-22 Thread Theodore Tso
On Tue, Jan 22, 2008 at 05:11:11PM -0600, Eric Sandeen wrote:
> Returns ext4dev for now; will need to change to ext4 at the appropriate
> time I guess.
> 
> Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]>

Hmm... see my test_fs patch.  If we decide that this is a good way to
go, then we can use the presence of that flag to cause blkid to return
"ext4dev", and absence of that flag for blkid to return "ext4".

When we rename the kernel filesystem type from ext4dev to ext4, we
also take out the requirement for EXT2_FLAGS_TEST_FILESYS, and simply
tell people to remove that flag from their ext4 filesytems.  Blkid
will then do the right thing without requiring us to synchronize
updating the kernel as well as e2fsprogs at the same time.

What do people think?  I think it's a pretty elegant solution.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH, RFC] Add new "development flag" to the ext4 filesystem

2008-01-22 Thread Theodore Tso
As discussed on RFC, this flag is simply a generic "this is a
crash/burn test filesystem" marker.  If it is set, then filesystem
code which is "in development" will be allowed to mount the
filesystem.  Filesystem code which is not considered ready for
prime-time will check for this flag, and if it is not set, it will
refuse to touch the filesystem.

As we start rolling ext4 out to distro's like Fedora, et. al, this
makes it less likely that a user might accidentally start using ext4
on a production filesystem; a bad thing, since that will essentially
make it be unfsckable until e2fsprogs catches up.

- Ted

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index cf2f612..a8c599a 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1991,6 +1991,17 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
 		printk(KERN_WARNING
 		   "EXT4-fs warning: feature flags set on rev 0 fs, "
 		   "running e2fsck is recommended\n");
+
+	/*
+	 * Since ext4 is still considered development code, we require
+	 * that the TEST_FILESYS flag in s->flags be set.
+	 */
+	if (!(le32_to_cpu(es->s_flags) & EXT2_FLAGS_TEST_FILESYS)) {
+		printk(KERN_WARNING, "EXT4-fs: %s: test filesystem flag "
+		   "not set.\n");
+		goto failed_mount;
+	}
+
 	/*
 	 * Check feature flags regardless of the revision level, since we
 	 * previously didn't change the revision level when setting the flags,
diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h
index 98d1c3c..d203543 100644
--- a/include/linux/ext4_fs.h
+++ b/include/linux/ext4_fs.h
@@ -547,6 +547,13 @@ do {	   \
 #define	EXT4_ORPHAN_FS			0x0004	/* Orphans being recovered */
 
 /*
+ * Misc. filesystem flags
+ */
+#define EXT2_FLAGS_SIGNED_HASH		0x0001  /* Signed dirhash in use */
+#define EXT2_FLAGS_UNSIGNED_HASH	0x0002  /* Unsigned dirhash in use */
+#define EXT2_FLAGS_TEST_FILESYS		0x0004	/* OK for use on development code */
+
+/*
  * Mount flags
  */
 #define EXT4_MOUNT_CHECK		0x1	/* Do mount-time checks */
diff --git a/lib/e2p/ls.c b/lib/e2p/ls.c
index b9ae14a..957ba5b 100644
--- a/lib/e2p/ls.c
+++ b/lib/e2p/ls.c
@@ -147,11 +147,15 @@ static void print_super_flags(struct ext2_super_block * s, FILE *f)
 
 	fputs("Filesystem flags: ", f);
 	if (s->s_flags & EXT2_FLAGS_SIGNED_HASH) {
-		fputs("signed directory hash ", f);
+		fputs("signed_directory_hash ", f);
 		flags_found++;
 	}
 	if (s->s_flags & EXT2_FLAGS_UNSIGNED_HASH) {
-		fputs("unsigned directory hash ", f);
+		fputs("unsigned_directory_hash ", f);
+		flags_found++;
+	}
+	if (s->s_flags & EXT2_FLAGS_TEST_FILESYS) {
+		fputs("test_filesystem ", f);
 		flags_found++;
 	}
 	if (flags_found)
diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
index e096577..dd5e495 100644
--- a/lib/ext2fs/ext2_fs.h
+++ b/lib/ext2fs/ext2_fs.h
@@ -467,12 +467,14 @@ struct ext2_inode_large {
  */
 #define EXT2_VALID_FS			0x0001	/* Unmounted cleanly */
 #define EXT2_ERROR_FS			0x0002	/* Errors detected */
+#define EXT4_ORPHAN_FS			0x0004	/* Orphans being recovered */
 
 /*
  * Misc. filesystem flags
  */
 #define EXT2_FLAGS_SIGNED_HASH		0x0001  /* Signed dirhash in use */
 #define EXT2_FLAGS_UNSIGNED_HASH	0x0002  /* Unsigned dirhash in use */
+#define EXT2_FLAGS_TEST_FILESYS		0x0004	/* OK for use on development code */
 
 /*
  * Mount flags
diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index 7a360ea..e754d6b 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -849,6 +849,8 @@ static void parse_extended_opts(struct ext2_super_block *param,
 
 param->s_reserved_gdt_blocks = rsv_gdb;
 			}
+		} else if (!strcmp(token, "test_fs")) {
+			param->s_flags |= EXT2_FLAGS_TEST_FILESYS;
 		} else
 			r_usage++;
 	}
@@ -859,7 +861,8 @@ static void parse_extended_opts(struct ext2_super_block *param,
 			"\tis set off by an equals ('=') sign.\n\n"
 			"Valid extended options are:\n"
 			"\tstride=\n"
-			"\tresize=\n\n"));
+			"\tresize=\n"
+			"\ttest_fs\n"));
 		free(buf);
 		exit(1);
 	}
@@ -1556,6 +1559,9 @@ int main (int argc, char *argv[])
 		exit(1);
 	}
 
+	if (fs_param.s_flags & EXT2_FLAGS_TEST_FILESYS)
+		fs->super->s_flags |= EXT2_FLAGS_TEST_FILESYS;
+
 	/*
 	 * Wipe out the old on-disk superblock
 	 */
diff --git a/misc/tune2fs.c b/misc/tune2fs.c
index 833b994..a714530 100644
--- a/misc/tune2fs.c
+++ b/misc/tune2fs.c
@@ -71,6 +71,7 @@ static unsigned short errors;
 static int open_flag;
 static char *features_cmd;
 static char *mntopts_cmd;
+static char *extended_cmd;
 
 int journal_size, journal_flags;
 char *journal_device;
@@ -505,7 +506,7 @@ static void parse_tune2fs_options(int argc, char **argv)
 	struct passwd * pw;
 
 	printf("tune2fs %s (%s)\n", E2FSPROGS_VERSION, E2FSPROGS_DATE);
-	while ((c = getopt(argc, argv, "c:e:fg:i:jlm:o:r:s:u:C:J:L:M:O:T:U:")) != EOF)
+	while ((c = getopt(argc, argv, "c:e:fg:i:jlm:o:r:s:u:C:E:J:L:M:O:T:U:")) != EOF)
 		switch (c)
 		{
 			case 'c':
@@ -548,6 +549,10 @@ static void parse_tune2fs_options

Re: [RFC] Parallelize IO for e2fsck

2008-01-22 Thread Theodore Tso
On Tue, Jan 22, 2008 at 12:00:50AM -0700, Andreas Dilger wrote:
> > AIX had SIGDANGER some 15 years ago.  Admittedly, that was sent when
> > the system was about to hit OOM, not when it was about to start swapping.
> 
> I'd tried to advocate SIGDANGER some years ago as well, but none of
> the kernel maintainers were interested.  It definitely makes sense
> to have some sort of mechanism like this.  At the time I first brought
> it up it was in conjunction with Netscape using too much cache on some
> system, but it would be just as useful for all kinds of other memory-
> hungry applications.

It's been discussed before, but I suspect the main reason why it was
never done is no one submitted a patch.  Also, the problem is actually
a pretty complex one.  There are a couple of different stages where
you might want to send an alert to processes:

* Data is starting to get ejected from page/buffer cache
* System is starting to swap
* System is starting to really struggle to find memory
* System is starting an out-of-memory killer

AIX's SIGDANGER really did the last two, where the OOM killer would
tend to avoid processes that had a SIGDANGER handler in favor of
processes that were SIGDANGER unaware.

Then there is the additional complexity in Linux that you have
multiple zones of memory, which at least on the historically more
popular x86 was highly, highly important.  You could say that whenever
there is sufficient memory pressure in any zone that you start
ejecting data from caches or start to swap that you start sending the
signals --- but on x86 systems with lowmem, that could happen quite
frequently, and since a user process has no idea whether its resources
are in lowmem or highmem, there's not much you can do about this.

Hopefully this is less of an issue today, since the 2.6 VM is much
more better behaved, and people are gradually moving over to x86_64
anyway.  (Sorry SGI and Intel, unfortunately they're not moving over
to the Itanic :-).   So maybe this would be better received now.

Bringing us back to the main topic at hand, one of the tradeoffs in
Val's current approach is that by relying on the kernel's buffer
cache, we don't have to worry about locking and coherency at the
userspace level.  OTOH, we give up low-level control about when memory
gets thrown out, and it also means that simply getting notified when
the system starts to swap isn't good enough.  We need to know much
earlier, when the system starts ejecting data from the buffer and page
caches.

Does this matter?  Well, there are a couple of use cases:

 * The restricted boot environment
 * The background "once a month" take a snapshot and check
 * The oh-my-gosh we-lost-a-filesystem -- repair it while the 
   IMAP server is still on-line serving data from the other 
   mounted filesystems.

It's the last case where things get really tricky

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext4: Fix the BUG_ON in jbd2_journal_commit_transaction

2008-01-21 Thread Theodore Tso
On Mon, Jan 21, 2008 at 11:26:06AM +0530, Aneesh Kumar K.V wrote:
> kunmap_atomic was passed the wrong argument.

Thanks for pointing this out!  I've applied your fix to
jbd2-journal-chksum.patch in the ext4 patch queue.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CALL FOR TESTING] Make Ext3 fsck way faster [2.6.24-rc6 -mm patch]

2008-01-20 Thread Theodore Tso
On Sat, Jan 19, 2008 at 08:10:20PM -0800, Daniel Phillips wrote:
> 
> I can see value in preemptively loading indirect blocks into the buffer 
> cache, but is building a second-order extent tree really worth the 
> effort?  Probing the buffer cache is very fast.

It's not that much effort, and for a big database (say, like a 50GB
database file), the indirect blocks would take up 50 megabytes of
memory.  Collapsing it into an extent tree would save that memory into
a few kilobytes.  I suppose a database server would probably have
5-10GB's of memory, so the grand scheme of things it's not a vast
amount of memory, but the trick is keeping the indirect blocks pinned
so they don't get pushed out by some vast, gigunndo Java application
running in the same server as the database.  If you have the indirect
blocks encoded into the extent tree, then you don't have to worry
about that.

   - Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CALL FOR TESTING] Make Ext3 fsck way faster [2.6.24-rc6 -mm patch]

2008-01-15 Thread Theodore Tso
On Tue, Jan 15, 2008 at 01:15:33PM +, Christoph Hellwig wrote:
> They won't fsck in planned downtimes.  They will have to use fsck when
> the shit hits the fan and they need to.   Not sure about ext3, but big
> XFS user with a close tie to the US goverment were concerned about this
> case for really big filesystems and have sponsored speedup including
> multithreading xfs_repair.  I'm pretty sure the same arguments apply
> to ext3, even if the filesystems are a few magnitudes smaller.

Agreed, 100%.  Even if you fsck snapshots during slow periods, it
still doesn't help you if the filesystem gets corrupted due to a
hardware or software error.  That's where this will matter the most.

Val Hensen has done a proof of concept patch that multi-threads e2fsck
(and she's working on one that would be long-term supportable) that
might reduce the value of this patch, but metaclustering should still
help.

> > In any decent environment, people will fsck their ext3 filesystems during
> > planned downtime, and the benefit of reducing that downtime from 6
> > hours/machine to 2 hours/machine is probably fairly small, given that there
> > is no service interruption.  (The same applies to desktops and laptops).
> > 
> > Sure, the benefit is not *zero*, but it's small.  Much less than it would
> > be with ext2.  I mean, the "avoid unplanned fscks" feature is the whole
> > reason why ext3 has journalling (and boy is that feature expensive during
> > normal operation).

Also, it's not just reducing fsck times, although that's the main one.
The last time this was suggested, the rationale was to speed up the
"rm dvd.iso" case.  Also, something which *could* be done, if Abhishek
wants to pursue it, would be to pull in all of the indirect blocks
when the file is opened, and create an in-memory extent tree that
would speed up access to the file.  It's rarely worth doing this
without metaclustering, since it doesn't help for sequential I/O, only
random I/O, but with metaclustering it would also be a win for
sequential I/O.  (This would also remove the minor performance
degradation for sequential I/O imposed by metaclustering, and in fact
improve it slightly for really big files.)

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH resend] Fix reading of bitmaps from filesystem image

2008-01-14 Thread Theodore Tso
On Wed, Jan 09, 2008 at 08:54:35PM +0100, Jan Kara wrote:
> 
> Reading of bitmaps from image file could never work with more than one
> group in a filesystem... Fix the loops so that we read appropriate number
> of blocks.

OK, so I'm probably being dense, but what's the problem?

You changed the loop from counting in bytes to inodes, but either
method should work.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH resend] mke2fs: Make lost+found always have at least 2 blocks

2008-01-14 Thread Theodore Tso
On Wed, Jan 09, 2008 at 08:59:47PM +0100, Jan Kara wrote:
>   Hi,
> 
>   this is a resend of Andreas' patch I've sent in the beginning of
> December. The patch modifies mke2fs to always create lost+found with at
> least two directory blocks. I think this could make sence not only for
> testing but as a sanity check that 64KB support works for general e2fsprogs
> if someone decides to use it...

Thanks, applied.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [Coding Style]: misc fixes for fs/ext{3,4}/acl.{c,h} from checkpatch.pl

2008-01-04 Thread Theodore Tso
On Sat, Jan 05, 2008 at 01:12:44AM +0100, Paolo Ciarrocchi wrote:
> Isn't it a timing problem?
> I mean, I guess that codying style fixes are OK if there is a good 
> coordination
> with the maintainer and patches are sent with the right timing in
> order to not cause
> problems in the process.

But because running some kind of mechanical script and fixing up the
problems is relatively mindless, it doesn't *add* anything.  Only the
maintainer knows when it is a reasonably convenient time to fix all of
the problems, or when to fix portions of the code that is being
reworked anyway --- and the maintainer can just run the script by
himself, for himself.  The patch doesn't actually save him much work,
and in some cases, is actually more work than simply regenerating the
fixes *after* the other patches waiting in his patch queue have been
applied (but of course, it screws up everyone else's patches that
haven't been submitted to the maintainer yet).

In some cases, it's worth it.  In other (most) cases, it really isn't.

   - Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [Coding Style]: misc fixes for fs/ext{3,4}/acl.{c,h} from checkpatch.pl

2008-01-04 Thread Theodore Tso
On Fri, Jan 04, 2008 at 05:30:00PM +0100, Andi Kleen wrote:
> 
> Exactly. And looking at the patch the old code was already perfectly
> readable anyways. Benefit about zero.

File this under the "checkpatch.pl" considered harmful category
The problem is not with the tool, but that at least *some* people seem
to think that making checkpatch.pl be completely silent is somehow
this holy grail that will make kernel code bug-free(tm).  (And of
course, people who want to encode nazi-like coding conventions and to
force all of the kernel to use a single coding convention as if that
somehow would improve the kernel's quality.  Some of that is OK, but
as long as the code is readable, do we really care about whether or
not the code is using exactly the same coding conventions everyplace?
Or, pressuring maintainers not to ignore cleanup patches lest they be
viewed as "bad" maintainers?)

Personally I find it annoying, but I'm willing to live with the
cleanup patches.  I don't think they add anything, though.  Maybe I
should be more cranky about such patches

> The recent flurry of cleanup code patches on l-k causes far more
> problems than it solves. I'm not even sure why people do this? Just
> because it is en vogue recently? 

I don't know, because people want to be able to say that they've
contributed fixes to the Linux kernel?

I will say that in past kernel summit program commitees, the
perception that someone _only_ submitted trivial patches (i.e.,
whitespace-only, spelling fixes in comments, etc.) has sometimes been
perceived a negative factor towards whether the program committee
might consider that person to be useful contributor to discussions at
the kernel summit  So people should be warned (I would have hoped
that it would be obvious), that submitting vast number of trivial
cleanup patches without contributing anything else will very likely
not work, and possibly backfire if that is your goal.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [Coding Style]: misc fixes for fs/ext{3,4}/acl.{c,h} from checkpatch.pl

2008-01-04 Thread Theodore Tso
On Fri, Jan 04, 2008 at 02:49:07PM +0100, Mathieu SEGAUD wrote:
> Vous m'avez dit récemment :
> 
> > Coding-style only changes tends to screw up our ability to merge
> > pending patches, but I'll take care of it, thanks
> 
> yep, I noticed that...
> would you rather me wait till 2.6.24 is out ?

I'll take your patches and merge all or most of it as I can into the
patch queue destined for 2.6.24-rc1.  Check back with me after -rc1;
if there are some left over we'll do them after that.

- Ted

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [Coding Style]: misc fixes for fs/ext{3,4}/acl.{c,h} from checkpatch.pl

2008-01-04 Thread Theodore Tso
Coding-style only changes tends to screw up our ability to merge
pending patches, but I'll take care of it, thanks.

   - Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] e2fsprogs: add minimal resize size option

2008-01-03 Thread Theodore Tso
On Wed, Jan 02, 2008 at 03:16:59PM -0500, Josef Bacik wrote:
> Hello,
> 
> People wishing to make live usb disks and such are looking for a way
> to get the minimum resize size for an ext fs in blocks so that they
> can just resize their image to that size and do with it what they
> will.  This patch adds that functionality, just pass -m option and
> it calculates the minimum number of blocks the fs can be resized to.

Three comments.  Instead of using a new option, why not simply let
resize2fs check to see if the optional parameter is something like
"min" or "0"?

Secondly, I'd suggest factoring out the calculation of the minimal
size into a separate function.

Finally, please don't fix whitespace issues, as it makes it harder for me
to merge between the "maint" and "master" development lines.  What I
plan to do is to do a massive fixup of trailing whitespace before the
next major (1.41) release of e2fsprogs.  Any new lines added by
patches should not introduce any new trailing whitespace, of course.

Thanks!

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


What's in e2fsprogs.git (stable)

2008-01-01 Thread Theodore Tso
Lots of old (and new) bugs that had been hanging around Sourcefroge
and Debian bug trackers have been fixed.  In addition, yesterday I
released 1.40.4 of e2fsprogs.

* The 'maint' branch has these fixes since the last announcement.

Theodore Ts'o (29):
  uuidd: Use /var/lib/libuuid instead of /var/run/uuidd
  libuuid: When starting uuidd, use waitpid() to reap the zombie
  process
  libuuid: Only try to start the uuidd daemon a limited number of
  times
  debian: Add a dependency on libuuid1 to the uuid-runtime package
  Add #define needed for Hurd ioctl definitions
  libuuid: Fix bug which caused uuidd to fail if sizeof(int) !=
  sizeof(int *)
  uuidd: Avoid race conditions to that only one uuidd is started
  Update Vietnamese translation from the Translation Project
  Convert use of ext2fs_get_mem to ext2fs_get_array for overflow
  detection
  Fix build failure on non-Linux/non-Hurd/non-Masix systems
  Add --disable-tls configure option
  Add --disable-uuidd configure option
  debian: Do not use TLS or uuidd when building the bootfloppy udeb's
  Test for sys/syscall.h in configure to fix dietlibc build problem
  Fix build error in blkid/tst_types.c when using diet libc
  debian: build the e2fsck-static package so it works on 2.4 kernels
  debian: Use useradd and groupadd in favor of adduser
  Update dependencies in lib/uuid/Makefile.in
  debian: Fix the document ID in comerr_dev.doc-base
  Expand discussion of the -D option in e2fsck's man page
  Update release notes, version files for 1.40.4 release
  libss: Remove unnecessary Makefile dependency for test_ss
  debian: Fix all postinst/prerm/postrm scripts to include debhelper
  additions
  uuidd: Add _GNU_SOURCE #define to pick up setres[ug]id() prototypes
  debugfs: Add #include  to pick up prototype for
  strcasecmp
  e2image: If there is an error while writing a block, call exit(1)
  e2fsck: When optimizing non-htree directories, sort by inode number
  debugfs: allow the undel command reallocate without linking the
  inode
  Fix build failure on non-Linux/non-Hurd/non-Masix systems


* The 'master' branch has these since the last announcement
  in addition to the above.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in e2fsprogs.git (topics)

2007-12-17 Thread Theodore Tso
On Mon, Dec 17, 2007 at 04:36:34PM -0700, Andreas Dilger wrote:
> > But the performance problems are starting to make me worry.  Do you
> > know how many tdb entries you had before tdb performance started going
> > really badly down the toilet?  I wonder if there are some tuning knobs
> > we could tweak to the performance numbers.
> 
> There is some test data at
> https://bugzilla.lustre.org/attachment.cgi?id=13924 which is a PDF
> file.  This shows 1000 items is reasonable, and 1 is not.

I did some research, and the problem is that tdb uses a fixed number
of buckets for its hash size.  By default it is 131 hash buckets, but
you can pass in a different hash size when you create the tdb table.
So with 10,000 items, you will have an average of 76 objects per hash
chain, and that doesn't work terribly well, obviously.  Berkdb's hash
method uses an extensible hashing system which increases number of
bits that are used in the hash, and breaks up the hash buckets as they
get too big, which is a much nicer self-tuning algorithm.  With tdb,
you need to know from the get-go how much stuff you're likely going to
be storing in the tdb system, and adjust your hash size accordingly.

> The majority of the time is taken looking up existing entries, and this
> is due to one unusual requirement of the Lustre usage to be notified
> of duplicate insertions to detect duplicate use of objects, so this may
> have been a major factor in the slowdown.  It isn't really practical to
> use a regular libext2fs bitmap for our case, since the collision space
> is a 64-bit integer, but maybe we could have done this with an RB tree
> or some other mechanism.

Well, if you only need an in-core data structure, and it doesn't need
to be stored on disk, have you looked at e2fsck/dict.c, which was
lifted from Kazlib?  It's a userspace, single file, in-memory only RB
tree implementation.

Regards,

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in e2fsprogs.git (topics)

2007-12-17 Thread Theodore Tso
On Mon, Dec 17, 2007 at 03:34:55PM -0700, Andreas Dilger wrote:
> Did you see Eric's report that using the undo manager for mke2fs caused the
> performance to completely tank?  There is already enough memory pressure
> caused by a regular mke2fs that having to save the blocks into tdb for a
> large filesystem makes it unbearably slow.

Yup, I saw that, and had come to the same consluion (only use it when
uninit_groups/lazy_bg is enabled).

> We had also wanted to move from using db4 to tdb for the Lustre lfsck data
> (collection of EA information for distributed fsck) but even at 1 files
> the tdb performance was growing exponentially slower than db4 and we gave up.
> I suspect the same problem hits undo manager when the number of blocks to
> save is very high.

Hm.  I was very concerned about using db4, mainly because of the ABI
and on-disk format compatibility nightmare, which is why I chose tdb.
But the performance problems are starting to make me worry.  Do you
know how many tdb entries you had before tdb performance started going
really badly down the toilet?  I wonder if there are some tuning knobs
we could tweak to the performance numbers.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's in e2fsprogs.git (stable)

2007-12-17 Thread Theodore Tso
On Mon, Dec 17, 2007 at 06:23:01PM +0100, Jan Kara wrote:
> > Theodore Ts'o (20):
> >   Fix errors in test_ss.c so it can be an example application and
> >   test case
> >   libext2fs: Fix a corner case bug in ext2fs_unlink
>   Not that it would matter much but I think this was actually my fix ;).

Yep, sorry.  You are listed as the first Signed-off-by: in the patch
header.  When the patches is in a form that can't be automatically
sucked in via "git am" (including appropriately formatted patch
comments), I end up applying the patch by hand, and I do try to
remember to manually set the Author field, but sometimes I forget.

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


What's cooking in e2fsprogs.git (topics)

2007-12-17 Thread Theodore Tso
Here are the topics that have been cooking.  Commits prefixed
with '-' are only in 'pu' while commits prefixed with '+' are
in 'next'.  The topics list the commits in reverse chronological
order.

* js/uninit (Sun Oct 21 21:04:24 2007 -0500) 14 commits
 - Add m_uninit test case.
 - Add new mm_lazy test case.
 - Fix test cases.
 - Update uninit block group documetation for some of the utilities.
 - Make e2fsck uninit block group aware.
 - Make debugfs uninit block group aware.
 - Make resize2fs uninit block group aware.
 - Make dumpe2fs uninit block group aware.
 - Make tune2fs uninit block group aware.
 - Add support for creating filesystems using uninit block group.
 - Rename feature name from gdt_checksum to uninit_groups.
 - Add uninit block group support on libe2fs.
 - Add initial checksum support.
 + Reorder some of the $(SRCS) in alphabetical order.

The uninit patch set is the next major set on my "to-merge"
list.  I've fixed this patch set so that "make check" works.

* tt/64bit-bitmaps (Sun Oct 14 22:51:51 2007 -0400) 1 commit
 - Initial design for 64-bit bitmaps

No change; still needs work, obviously.

* tt/extents (Mon Aug 20 21:31:11 2007 -0400) 5 commits
 - e2fsck: Add support for extents
 - Add support for extents to libext2fs
 - e2fsck: factor out code to clear an inode into
   e2fsck_clear_inode()
 - Don't byte swap extents information in the inode
 - Allow debugfs to be extended for use by test programs

I've started adding some additional checking into e2fsck, but
it still needs work.

* js/flex-bg (Mon Aug 13 23:33:14 2007 -0500) 1 commit
 - New bitmap and inode table allocation for FLEX_BG
* ak/undo-mgr (Mon Aug 13 15:56:26 2007 +0530) 6 commits
 - e2fsprogs: Add test case for undoe2fs
 - e2fsprogs: Fix the resize inode test case
 - e2fsprogs: Support for large inode migration.
 - e2fsprogs: Make mke2fs use undo I/O manager.
 - e2fsprogs: Add undoe2fs
 - e2fsprogs: Add undo I/O manager
* ad/extents-testcases (Thu Jul 12 11:20:14 2007 -0400) 19 commits
 . Add extent test: f_extents_shrd_blk
 . Add extent test: f_extents_unsorted
 . Add extent test: f_extents_res_blk
 . Add extent test: f_extents_overlap
 . Add extent test: f_extents_orphan_blks
 . Add extent test: f_extents_inrlevel-incons
 . Add extent test: f_extents_imbalanced_tree
 . Add extent test: f_extents_ei_leaf
 . Add extent test: f_extents_ei_block
 . Add extent test: f_extents_eh_max
 . Add extent test: f_extents_eh_magic
 . Add extent test: f_extents_eh_entries
 . Add extent test: f_extents_eh_depth
 . Add extent test: f_extents_ee_start
 . Add extent test: f_extents_ee_len
 . Add extent test: f_extents_ee_block
 . Add extent test: f_extents_bad_blk
 . Add extent test: f_extents
 . Add extent test: f_bad_disconnected_inode

These tests haven't been merged in yet, pending improvement of
the extents patches.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   3   4   >