Wojciech Puchar writes:


Any tree-like structure produces a huge risk of losing much more data than was corrupted in the first place.

Not so sure about that statement, but well, let's agree we might disagree :)
Disagreement is the source of all good ideas. But you should explain why.

Well, arguing can be fun at times, but my free time is rather limited; I was
hoping this thread could die peacefully.

I think I made my points clear. As I'm far from qualified to discuss
these topics, I'll just add a bit, but I won't repeat my statements about
prerequisites: where the line must be drawn, and which running conditions are
expected from the FS's point of view.

My explanation is below.


You asked for a little documentation about its layout and workings; this may be a good fit: http://www.dragonflybsd.org/presentations/nycbsdcon08/
This is about an older HAMMER revision.

Matthew claimed some time ago that the new HAMMER is completely different.

But after reading it, I understood that everything is in the B-Tree, which is exactly what I call dangerous: the B-Tree is used to store everything, directory entries, inodes, etc.

I won't speak on behalf of Matt, but IIUC HAMMER2 will use structures other
than a B-Tree, with the goal of reducing complexity.
The presentation at the link I gave you is rather outdated and targets HAMMER
"1" (the first revisions of the FS in its first design).

B-Trees are dangerous if they are used as the only way to access data. A corrupted B-Tree means no access to anything below it!!
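
To make that concern concrete, here is a toy sketch in C (this is not HAMMER's actual on-disk format, just the general single-tree failure mode): one damaged interior node is enough to cut off every leaf below it, even though the leaf blocks themselves are intact.

#include <stdio.h>

#define ORDER 4

struct node {
    unsigned     crc;              /* checksum guarding the node contents   */
    int          is_leaf;
    int          nkeys;
    long         keys[ORDER];
    struct node *child[ORDER + 1]; /* used by interior nodes                */
    const char  *data[ORDER];      /* used by leaf nodes                    */
};

/* trivial stand-in for a real CRC over the node's guarded fields */
static unsigned
checksum(const struct node *n)
{
    unsigned c = (unsigned)n->is_leaf;
    for (int i = 0; i < n->nkeys; i++)
        c = c * 31 + (unsigned)n->keys[i];
    return c;
}

/* returns the value stored for 'key', or NULL if the path to it is broken */
static const char *
lookup(const struct node *n, long key)
{
    if (n == NULL || checksum(n) != n->crc)
        return NULL;               /* bad node: everything below is lost    */
    if (n->is_leaf) {
        for (int i = 0; i < n->nkeys; i++)
            if (n->keys[i] == key)
                return n->data[i];
        return NULL;
    }
    int i = 0;
    while (i < n->nkeys && key >= n->keys[i])
        i++;                       /* pick the child covering 'key'         */
    return lookup(n->child[i], key);
}

int
main(void)
{
    struct node leaf = { 0, 1, 2, { 10, 20 }, { NULL }, { "inode 10", "inode 20" } };
    struct node root = { 0, 0, 1, { 100 }, { &leaf, NULL }, { NULL } };
    leaf.crc = checksum(&leaf);
    root.crc = checksum(&root);

    printf("before corruption: %s\n", lookup(&root, 20));  /* "inode 20"    */

    root.keys[0] = 999;            /* damage one field of the interior node  */
    /* the leaf block is untouched, yet nothing below the root is reachable  */
    const char *v = lookup(&root, 20);
    printf("after corruption:  %s\n", v ? v : "(unreachable)");
    return 0;
}

Which is presumably why a last-resort recovery tool has to scan the raw media for records instead of walking the tree.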


What I see as the main differences between HAMMER and ZFS are:

1) Practical: HAMMER is very fast and doesn't use gigabytes of RAM or lots of CPU. Not that I did a lot of tests, but it seems like UFS speed, sometimes even more, rarely less.

It is actually USEFUL, which cannot be said of ZFS ;)

Sorry, I also just love ZFS for the business case I rely on it for. It has some
clearly nice features.

2) The basic way of storing data is similar; the details are different, and the danger is similar.

No, this is wrong. I won't make a digest of the papers on both of them for you.
Read about it.

3) HAMMER has a recovery program. It will need to read the whole media. Assume a 2 TB disk at 100 MB/s -> 20,000 seconds, roughly five and a half hours. ZFS doesn't have one; there are a few businesses that recover ZFS data for money. For sure they don't feel it's a crisis ;)

I never had to use that recovery program. If you search the archives, only a
handful of people really had a need for it. Don't anticipate you'll need to use
it routinely.
The truth is: whatever happens (crash, lost power supply, sick HDD), you'll
just mount it, maybe some transactions will complete/be rolled back, and that's
it. A matter of 10 seconds.

Running recover on a 2 TB medium will be slow, of course. But you're then trying
to recover a full filesystem, including as much history as was there before
the crash.

Assume I store my clients' data on a HAMMER filesystem and it crashed completely, but the disks are fine. Assume the crash happens Tuesday at 16:00, the last copy was made automatically Monday at 17:30, the failure is found at 17:00, and I am on site at 18:00.

I ask my client - what do you prefer:

- wait ~6 hours, with a good chance that most of your data will be recovered? If so, the little that is missing gets identified and restored from backup; if not, we start a restore from backup, which takes another 6 hours.

Moot point. See above.

- just wipe everything and start restoring from backup, so that everything is recovered for sure, exactly as it was yesterday after work?


the answer?

The answer using HAMMER: use mirror-stream and have your data on another
disk, connected to a different host, with a state "as of" 1 minute ago in the
worst case.
Dead hardware? Just swap them, switch the slave PFS to master, and you're done.
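
In practice that setup is roughly the following (the hostnames, paths and PFS layout here are made up; see hammer(8) for the real details):

  # on the production box: the client data lives in a master PFS
  hammer pfs-master /data/pfs/clients

  # on the backup box: create a slave PFS sharing the master's shared-uuid
  # (the uuid is shown by 'hammer pfs-status' on the master)
  hammer pfs-slave /backup/pfs/clients shared-uuid=<uuid-of-the-master>

  # on the production box: keep the slave in sync continuously, over ssh
  hammer mirror-stream /data/pfs/clients backup:/backup/pfs/clients

  # production hardware died: promote the slave and serve from the backup box
  hammer pfs-upgrade /backup/pfs/clients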

THE ANSWER:
---------------------------------------
1) Divide the disk space into a metadata space and a data space. The amount of metadata space is defined at filesystem creation, say 3% of the whole drive.

And then you're into the "gosh! I never thought it'd store so many small files!
I'm screwed" situation.

2) Data is reachable only through B-Tree leaves, and all B-Tree leaf blocks are stored in the "metadata space". A few critical filesystem blocks are stored there too, at predefined places.

3) Everything else is stored in the data space: B-Tree blocks excluding leaves, the UNDO log, and the actual data.


4) Everything else stays as it already is, with a modification to make sure every B-Tree leaf block carries data describing it properly: inodes have their inode number inside, and directories have their inode numbers inside too. AFAIK it is already like that.

5) hammer recover is modified to scan this 3% of space and then rebuild the B-Tree. It will work as fast as fsck_ffs or faster this way, despite being a "last resort" tool (see the arithmetic after this list).
---
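
(Working out point 5 with the same numbers as above: 3% of a 2 TB disk is about 60 GB, which at 100 MB/s scans in roughly 600 seconds, i.e. about 10 minutes instead of the ~5.5 hours needed to read the whole media.)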

THE RESULT: a fast and featureful filesystem that can always be quickly recovered, even in "last resort" cases.

I just don't follow what you meant, honestly. But well, show us the code if you
feel brave.

That'll be the last reply to this thread for me.

Good night,
--
Francis
