Re: Versioning file system

2007-09-29 Thread Sorin Faibish

Interesting that you mention the multitude of file systems, because
I was very surprised to see NILFS being promoted in the latest Linux
Magazine with no mention of the other, more important file systems
currently in the works, like UnionFS, ChunkFS, or ext4, which are so
publicized. I can say I was disappointed by the article. I still haven't
seen any real proof that NILFS is the best file system since sliced bread.
Nor have I seen any comments on NILFS from Andrew and others, and
yet this is supposedly the best new file system coming to Linux. Maybe
I missed something that happened in Ottawa.

/Sorin


On Mon, 18 Jun 2007 05:45:24 -0400, Andreas Dilger <[EMAIL PROTECTED]> wrote:



On Jun 16, 2007  16:53 +0200, Jörn Engel wrote:

On Fri, 15 June 2007 15:51:07 -0700, alan wrote:
> >Thus, in the end it turns out that this stuff is better handled by
> >explicit version-control systems (which require explicit operations to
> >manage revisions) and atomic snapshots (for backup.)
>
> ZFS is the cool new thing in that space.  Too bad the license makes it
> hard to incorporate it into the kernel.

It may be the coolest, but there are others as well.  Btrfs looks good,
nilfs finally has a cleaner and may be worth a try, logfs will get
snapshots sooner or later.  Heck, even my crusty old cowlinks can be
viewed as snapshots.

If one has spare cycles to waste, working on one of those makes more
sense than implementing file versioning.


Too bad everyone is spending time on 10 similar-but-slightly-different
filesystems.  This will likely end up with a bunch of filesystems that
implement some easy subset of features, but will not get polished for
users or have a full set of features implemented (e.g. ACL, quota, fsck,
etc).  While I don't think there is a single answer to every question,
it does seem that the number of filesystem projects has climbed lately.

Maybe there should be a BOF at OLS to merge these filesystem projects
(btrfs, chunkfs, tilefs, logfs, etc.) into a single project with multiple
people working on getting it solid, scalable (parallel readers/writers on
lots of CPUs), robust (checksums, failure localization), recoverable, etc.
I thought Val's FS summits were designed to get developers to collaborate,
but it seems everyone has gone back to their corners to work on their own
filesystem?

Working on getting hooks into DM/MD so that the filesystem and RAID layers
can move beyond "ignorance is bliss" when talking to each other would be
great.  Not rebuilding empty parts of the fs, limiting parity resync to
parts of the fs that were in the previous transaction, using fs-supplied
checksums to verify that on-disk data is correct, using RAID geometry when
doing allocations, etc.
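
A rough sketch, purely hypothetical, of what such hooks might look like;
none of these structures or callbacks exist in DM/MD or in any filesystem
today, they only mirror the four ideas above:

/*
 * Hypothetical sketch only -- not an existing DM/MD interface.
 * Each callback mirrors one idea from the paragraph above: skip
 * rebuilding unused regions, narrow parity resync to recently
 * dirtied extents, verify on-disk data against fs-supplied
 * checksums, and expose the stripe geometry to the allocator.
 */
#include <stdint.h>

struct fs_raid_hints {
        /* fs reports block ranges with no live data (skip on rebuild) */
        int (*report_unused)(void *raid, uint64_t start_blk, uint64_t nr_blks);

        /* fs reports extents dirtied by the last transaction (limit resync) */
        int (*report_dirty)(void *raid, uint64_t start_blk, uint64_t nr_blks);

        /* RAID asks the fs for the checksum it stored for a block */
        int (*get_fs_checksum)(void *fs, uint64_t blk, uint32_t *csum);

        /* RAID exposes geometry so the fs can do stripe-aligned allocation */
        int (*get_geometry)(void *raid, uint32_t *stripe_blks, uint32_t *data_disks);
};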

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

--
Best Regards
Sorin Faibish
Senior Technologist
Senior Consulting Software Engineer Network Storage Group

   EMC²
where information lives

Phone: 508-435-1000 x 48545
Cellphone: 617-510-0422
Email : [EMAIL PROTECTED]


Re: [ANNOUNCE] DualFS: File System with Meta-data and Data Separation

2007-02-26 Thread Sorin Faibish

On Mon, 26 Feb 2007 06:49:05 -0500, Yakov Lerner <[EMAIL PROTECTED]> wrote:


On 2/14/07, sfaibish <[EMAIL PROTECTED]> wrote:
On Sat, 10 Feb 2007 22:06:37 -0500, Sorin Faibish <[EMAIL PROTECTED]> wrote:


> Introducing DualFS
>
> File system developers have played with the idea of separating
> meta-data from data in file systems for a while. The idea was
> recently revived by a small group of file system enthusiasts
> from Spain (from the little-known University of Murcia), and
> it is called DualFS. We believe that the separation idea
> will bring great value to Linux file systems.
>
> We see DualFS as a next-generation journaling file system which
> has the same consistency guarantees as traditional journaling
> file systems but better performance characteristics. The new file
> system puts data and meta-data on different devices (usually, two
> partitions on the same disk or on different disks).


Do you guys have an option of using just one partition, and dividing it
into two fixed parts at mkfs time, one part X% (for metadata) and the
2nd part at (100-X)% (for file blocks)?


From an ease-of-use perspective I agree that this could be an option,
but the point is to be able to manage the MD completely separately, for
cases when you want, for example, to extend the MD volumes or data
volumes independently. It is all about different address spaces and
fencing. I am not sure it is good for fsck tasks.


Would not this option
be an easier-to-administer choice for naive users?

From a code perspective it should be possible. We could put it on a TODO
list if it is of interest to many. It is not a matter of naive users
or not. We are all naive when we use a FS and are not experts in it.


Or do you not
view your FS as an option for naive users in the first place?

If we are all naive, of course we want DualFS to be useful for
all. We are open to proposed features and criticism from
the community. It is just a matter of resources.



Yakov
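
For illustration, a minimal sketch of the single-partition, mkfs-time split
discussed above; the 10% figure, the 4 KB block size and the 100 GiB device
are made-up example numbers, not DualFS parameters:

/* Hypothetical mkfs-time split of one partition into a metadata region
 * and a data region; all numbers are illustrative, not DualFS defaults. */
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 4096ULL

int main(void)
{
        uint64_t dev_bytes = 100ULL << 30;   /* example: 100 GiB partition */
        unsigned int md_pct = 10;            /* X% reserved for metadata   */

        uint64_t md_bytes = dev_bytes / 100 * md_pct;
        md_bytes -= md_bytes % BLOCK_SIZE;   /* align the boundary to a block */
        uint64_t data_bytes = dev_bytes - md_bytes;

        printf("metadata region: %llu blocks\n",
               (unsigned long long)(md_bytes / BLOCK_SIZE));
        printf("data region:     %llu blocks\n",
               (unsigned long long)(data_bytes / BLOCK_SIZE));
        return 0;
}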







--
Best Regards
 Sorin Faibish
Senior Technologist
Senior Consulting Software Engineer Network Storage Group

   EMC²
where information lives

Phone: 508-435-1000 x 48545
Cellphone: 617-510-0422
Email : [EMAIL PROTECTED]



Re: [ANNOUNCE] DualFS: File System with Meta-data and Data Separation

2007-02-24 Thread Sorin Faibish

Jörn,

I am very fond of all your comments and your positive attitude
toward DualFS. I also understand that you have much more experience
than us in regard to GC and "cleaners". The DualFS implementation
may be using old technology that can definitely be improved. Although
we understand the value of DualFS, we never had the intention to
solve all file system issues, but rather to address some that are
critical for Linux users. We would be very happy to improve the
DualFS cleaner based on your experience. So, if you are interested,
we can rewrite/rearchitect the cleaner using your ideas. After all,
we all want better FSs for Linux, and we believe that there is
goodness in DualFS.

On Fri, 23 Feb 2007 08:26:45 -0500, Jörn Engel <[EMAIL PROTECTED]> wrote:



On Thu, 22 February 2007 20:57:12 +0100, Juan Piernas Canovas wrote:


I do not agree with this picture, because it does not show that all the
indirect blocks which point to a direct block are stored along with it in
the same segment. That figure should look like:

Segment 1: [some data] [ DA D1' D2' ] [more data]
Segment 2: [some data] [ D0 D1' D2' ] [more data]
Segment 3: [some data] [ DB D1  D2  ] [more data]

where D0, DA, and DB are data blocks, D1 and D2 are indirect blocks which
point to the data blocks, and D1' and D2' are obsolete copies of those
indirect blocks. From this figure, it is clear that if you need to
move D0 to clean segment 2, you will need only one free segment at
most, and not more. You will get:

Segment 1: [some data] [ DA D1' D2' ] [more data]
Segment 2: [free]
Segment 3: [some data] [ DB D1' D2' ] [more data]
..
Segment n: [ D0 D1 D2 ] [ empty ]

That is, D0 needs the same space in the new segment that it needed in the
previous one.

The differences are subtle but important.


Ah, now I see.  Yes, that is deadlock-free.  If you are not accounting
the bytes of used space but the number of used segments, and you count
each partially used segment the same as a 100% used segment, there is no
deadlock.

Some people may consider this to be cheating, however.  It will cause
more than 50% wasted space.  All obsolete copies are garbage, after all.
With a maximum tree height of N, you can have up to (N-1) / N of your
filesystem occupied by garbage.
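
A toy model of this segment-counting argument; the segment size and tree
height below are arbitrary example numbers:

/* Toy model: every live data block is rewritten together with fresh
 * copies of its indirect blocks, so cleaning one victim segment never
 * needs more than one spare segment, while up to (N-1)/N of the blocks
 * written become garbage once the next rewrite supersedes them. */
#include <stdio.h>

#define SEG_BLOCKS   64   /* blocks per segment (example value)   */
#define TREE_HEIGHT   4   /* data block plus 3 indirect levels    */

int main(void)
{
        int live_groups  = SEG_BLOCKS / TREE_HEIGHT; /* groups in a full victim */
        int blocks_moved = live_groups * TREE_HEIGHT;

        /* Everything moved out of the victim fits into one spare segment,
         * because the victim itself was able to hold it: no deadlock. */
        printf("blocks moved: %d, spare segments needed: %d\n",
               blocks_moved, (blocks_moved + SEG_BLOCKS - 1) / SEG_BLOCKS);

        /* Worst-case share of the fs occupied by stale indirect copies. */
        printf("worst-case garbage fraction: %d/%d\n",
               TREE_HEIGHT - 1, TREE_HEIGHT);
        return 0;
}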

It also means that "df" will have unexpected output.  You cannot
estimate how much data can fit into the filesystem, as that depends on
how much garbage you will accumulate in the segments.  Admittedly this
is not a problem for DualFS, as the uncertainty only exists for
metadata, so "df" for DualFS still makes sense.

Another downside is that with large amounts of garbage between otherwise
useful data, your disk cache hit rate goes down.  Read performance
suffers.  But that may be a fair tradeoff and will only show up in
large metadata reads in the uncached (per Linux) case.  Seems fair.

Quite interesting, actually.  The costs of your design are disk space,
depending on the amount and depth of your metadata, and metadata read
performance.  Disk space is cheap and metadata reads tend to be slow for
most filesystems, in comparison to data reads.  You gain faster metadata
writes and lose the journal overhead.  I like the idea.

Jörn



/Sorin




Re: [ANNOUNCE] DualFS: File System with Meta-data and Data Separation

2007-02-17 Thread Sorin Faibish

On Sat, 17 Feb 2007 13:36:46 -0500, Jörn Engel <[EMAIL PROTECTED]> wrote:



On Sat, 17 February 2007 13:10:23 -0500, Bill Davidsen wrote:

> I missed that. Which corner case did you find triggers this in DualFS?


This is not specific to DualFS, it applies to any log-structured
filesystem.

Garbage collection always needs at least one spare segment to collect
valid data into.  Regular writes may require additional free segments,
so GC has to kick in and free those when space is getting tight.  (1)

GC frees a segment by writing all valid data in it into the spare
segment.  If there is remaining space in the spare segment, GC can move
more data from a further segment.  Nice and simple.

The requirement is that GC *always* frees more segments than it uses up
doing so.  If that requirement is not fulfilled, GC will simply use up
its last spare segment without freeing a new one.  We have a deadlock.

Now imagine your filesystem is 90% full and all data is spread perfectly
across all segments.  The best segment you could pick for GC is 90%
full.  One would imagine that GC would only need to copy those 90% into
a spare segment and have freed 100%, making overall progress.

But most log-structured filesystems maintain a tree of some sort on the
medium.  If you move data elsewhere, you also need to update the
indirect block pointing to it.  So that has to get written as well.  If
you have doubly or triply indirect blocks, those need to get written.
So you can end up writing 180% or more to free 100%.  Deadlock.
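
A toy model of that arithmetic; the segment size, occupancy and number of
indirect levels below are example numbers only:

/* Toy model: freeing one 90%-live segment also rewrites the indirect
 * blocks that point to the moved data, which live elsewhere, so the
 * blocks written can exceed the blocks freed. */
#include <stdio.h>

#define SEG_BLOCKS      100   /* blocks per segment (example)        */
#define LIVE_BLOCKS      90   /* victim segment is 90% live          */
#define INDIRECT_LEVELS   1   /* indirect blocks rewritten per move  */

int main(void)
{
        int freed   = SEG_BLOCKS;                          /* victim becomes free */
        int written = LIVE_BLOCKS * (1 + INDIRECT_LEVELS); /* data + index writes */

        printf("blocks freed: %d, blocks written: %d (%.0f%%)\n",
               freed, written, 100.0 * written / freed);
        if (written > freed)
                printf("GC consumes free space faster than it reclaims it: deadlock\n");
        return 0;
}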

And if you read the documentation of the original Sprite LFS or any
other of the newer log-structured filesystems, you usually won't see a
solution to this problem, or even an acknowledgement that the problem
exists in the first place.  But there is no shortage of log-structured
filesystem projects that were abandoned years ago and have "cleaner" or
"garbage collector" as their top item on the todo-list.  Coincidence?


(1) GC may also kick in earlier, but that is just an optimization and
doesn't change the worst case, so that bit is irrelevant here.


Btw, the deadlock problem is solvable and I definitely don't want to
discourage further work in this area.  DualFS does look interesting.
But my solution for this problem will likely eat up all the performance
DualFS has gained and more, as it isn't aimed at hard disks.  So someone
has to come up with a different idea.

DualFS can probably get around this corner case, as it is up to the user
to select the size of the MD device. If you want to prevent this
corner case you can always use an MD device bigger than 10% of the data
device, which is exaggerated for any FS unless the directory files are
very large (this is when you have billions of files with long names).
In general, the problem you mention is mainly due to data blocks
filling the file system. In the DualFS case you have the choice of selecting
different sizes for the MD and data volumes. When the data volume gets full
the GC will have a problem, but the MD device will not have a problem.
It is my understanding that most of the GC problem you mention is
due to the filling of the FS with data, and the result is an MD operation
being disrupted by the filling of the FS with data blocks. As for the
performance impact of solving this problem, since as you mentioned all
journaling FSs will have this problem, I am sure that the DualFS performance
impact will be less than others, at least due to using only one MD
write instead of two.



Jörn





--
Best Regards
 Sorin Faibish
Senior Technologist
Senior Consulting Software Engineer Network Storage Group

   EMC²
where information lives

Phone: 508-435-1000 x 48545
Cellphone: 617-510-0422
Email : [EMAIL PROTECTED]
