Re: Unable to mount hammer file system Undo failed

2012-07-20 Thread Wojciech Puchar

   People who use HAMMER also tend to backup their filesystems using
   the streaming mirroring feature.  You need a backup anyway, regardless.


Definitely. Backups are a different thing.

But I do not consider HAMMER's online mirroring a backup feature, but 
rather a more sophisticated form of mirroring.


As long as the backed-up data is available online for writing, I don't 
consider it a backup.


I use rsync for backing up everything, with the backup machine located in 
a different place and not accessible from the outside internet.



   the fact that the copies are all being managed from the same machine).

and that these copies are never regenerated.


   failures over the years, mostly blamed on failed writes to disks or
   people not having UPS's (since UFS was never designed to work with
   a disk synchronization command, crashes from e.g. power failures could


It seems you are quite out of date with FreeBSD. FreeBSD UFS does perform 
disk cache flushes at the right time.




Re: Unable to mount hammer file system Undo failed

2012-07-19 Thread Wojciech Puchar

HAMMER(ROOT) recovery check seqno=8ca97e62
HAMMER(ROOT) recovery range 36877528-36892fa0
HAMMER(ROOT) recovery nexto 36892fa0 endseqno=8ca98015
HAMMER(ROOT) recovery undo  36877528-36892fa0 (113272 bytes)(RW)
ad4: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR>
error=40<UNCORRECTABLE> LBA=483752928
HAMMER: UNDO record, cannot access buffer 203436e35ca8
HAMMER(ROOT) UNDO record at 36891a30 failed
HAMMER(ROOT) recovery complete
Failed to recover HAMMER filesystem on mount

This is an example of what I fear about HAMMER: one error results in an 
inability to mount.


If it is just a software bug in HAMMER, that's great. If it is the 
design... not great.


Re: Unable to mount hammer file system Undo failed

2012-07-19 Thread Wojciech Puchar

not great.


This is not a hammer problem but a problem with the underlying disk. It
couldn't read from the disk - that is pretty much a file-system
independent problem; UFS would fail equally miserably.

Not true.
It is a very unlikely case that you will not be able to mount at all; you 
will just not be able to read everything.


Copying to a new disk while skipping errors (dd conv=sync,noerror) and 
then running fsck_ffs would basically recover everything that can be 
recovered.
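
A minimal sketch of that procedure (the device and partition names ad4, 
ad6 and ad6s1a are hypothetical; adjust to the actual failing source and 
fresh target disks):

  # copy the failing disk to a fresh one; conv=noerror keeps going past
  # read errors and conv=sync pads the unreadable blocks with zeros
  # (bs=512 limits the loss per failed read to a single sector)
  dd if=/dev/ad4 of=/dev/ad6 bs=512 conv=sync,noerror

  # then repair the copied filesystem on the new disk
  fsck_ffs -y /dev/ad6s1a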


UFS uses a flat on-disk structure; inodes are at known places.

I don't know how HAMMER data is placed, but it seems everything is dynamic.

Any link to a description of HAMMER's on-disk layout?



Re: Unable to mount hammer file system Undo failed

2012-07-19 Thread Wojciech Puchar

UFS uses a flat on-disk structure; inodes are at known places.

I don't know how HAMMER data is placed, but it seems everything is dynamic.

Any link to a description of HAMMER's on-disk layout?


Please, read hammer(8) (at the subcommand recover).

Thank you very much.

While such recovery is painfully slow (it scans the entire image, not just 
selected predefined areas like fsck_ffs), it DOES exist.
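
For reference, that recovery scan is invoked roughly like this (the device 
path and target directory are made up; check hammer(8) for the exact 
syntax of the recover directive):

  # scan the raw HAMMER volume and salvage whatever B-Tree records can be
  # found into a directory on another, working filesystem
  hammer -f /dev/ad4s1d recover /backup/salvage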


It seems I have to do some more tests with intentionally broken hardware, 
which I don't have at the moment.




Re: Unable to mount hammer file system Undo failed

2012-07-19 Thread Wojciech Puchar

which I don't have at the moment.




just dd /dev/random and overwrite a few sectors?


Good, but... real failures are always worse than that.
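
For what it's worth, the suggested sector-corruption test could look 
roughly like this (destructive; only on a scratch disk, and the device 
name and offset are made up):

  # overwrite 8 sectors with random garbage at an arbitrary offset
  dd if=/dev/random of=/dev/ad6 bs=512 seek=1234567 count=8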

In my tests, ZFS for example (which for me is a plain example of bad 
design and bad implementation) failed within less than an hour, to the 
point where it was mountable but anything in the /usr subdirectory was 
unreadable, resulting in a crash.


I willingly used a machine with a failed chipset, resulting in bad data in 
RAM every now and then. When it hit userspace, the result was signal 11 or 
weird software behaviour.


When it hit the kernel, the result was what I described.

I saved an image of this HDD for later tests. Newer ZFS versions fixed the 
crash, replacing it with an error message, but still with no access to the 
data.


The memory corruption resulted in bad metadata being written, in all 
copies of course. ZFS is tree-structured, so the result was clear.


No offline fsck exists for ZFS because it is not needed.

I actually agree: it is not needed, and neither is ZFS ;)

As for me, what I really need is plain filesystem functionality.

I don't really need snapshots etc.

UFS is really fine for me, but it seems swapcache isn't well suited to 
caching it properly, as Matthew Dillon said.


What I will need within a month is a service doing lots of I/O to a 
not-that-huge part of the dataset, while keeping large files too.


swapcache is a perfect fit for that workload, if it works.

All my fears don't come from nowhere.
One should be picky about replacing something (UFS) that is close to 
perfect.


The fsck time is an overstated problem, compared to other problems.



On the same machine I was unable to destroy UFS after a whole day of trying.
Of course it crashed. Of course it produced SMALL data loss (a few files), 
but fsck_ffs always fixed things properly.


This was under FreeBSD but DragonFly UFS is no different.


I really appreciate the LOTS OF HARD WORK of Matthew Dillon and others, 
but simply dismissing trusted UFS, with over 20 years of history, by 
stating "just use HAMMER" isn't good IMHO.




Re: Unable to mount hammer file system Undo failed

2012-07-19 Thread Wojciech Puchar




My main problem had been with ffs_fsck. At one point my machine was
randomly crashing due to a bad power supply. Everytime I started up, did
an hour of work, then crash, then 30-40 minutes for fsck to run, and an


You may postpone fsck when using soft updates. It is clearly stated in the 
soft updates documents you can find (McKusick was one of the authors).

That's what I do.

Of course I've had hardware failures like that, and got quite a few 
crashes before I was certain that it was a hardware, not software, failure 
and requested new hardware (the old one was out of warranty).


I did fsck once a day, after work hours. With the new hardware I did fsck 
and also (just to be sure) rebuilt every index file, e.g. dovecot indexes.


But EVEN if I had to wait 30 minutes for fsck, I would prefer it over 
solutions that say fsck is not needed at all.


I will say more after some real stress tests of the HAMMER filesystem, 
including, at last, a real run of hammer rebuild.


As someone proposed doing tests by writing random disk blocks, I would 
rather write a program that flips a random memory bit every few minutes.




hour later do it all over again.

I'd rather use Linux/ext3 than any UFS ever again.


I take it that last sentence is a joke.

Linux ext-whatever, and Linux as a whole, is the most dangerous system I 
have ever used when it comes to filesystems.


More than once I had to recover everything from backup because of the 
amount of damage.


I wish you more such luck in the future, but remember that luck never 
lasts, even for you :)


Re: Unable to mount hammer file system Undo failed

2012-07-19 Thread Wojciech Puchar


You may postpone fsck when using soft updates. It is clearly stated in the 
soft updates documents you can find (McKusick was one of the authors).

That's what I do.


Then, you suffer a performance hit when fsck'ing in bg.

Once again - read more carefully :)

I am NOT talking about background fsck, which is implemented in FreeBSD 
and which I turn off.


I am talking about simply not doing fsck of every filesystem right after a 
crash, and doing it within the same day, but at a time when a pause is not 
a problem.


This is a legitimate method with UFS+softupdates.

As someone proposed doing tests by writing random disk blocks, I would 
rather write a program that flips a random memory bit every few minutes.


If you're assuming that even the computer itself ain't reliable, how the hell


Assuming hardware never fails is certainly wrong


could any FS be trustworthy then ??? IMHO, that's nonsense.

No, it isn't. Sorry if I wasn't clear enough in explaining it.



Re: Unable to mount hammer file system Undo failed

2012-07-19 Thread Wojciech Puchar


OK, understood now, i think: you agree with temporarily losing a bit of 
unreclaimed free-space on disk until time permits cleaning things up 
properly, afaiu softupdates (+journalling ? not really clear).


That's it. And that's how the original soft updates papers describe it.
You may run quite safely without fsck; just don't abuse that feature for 
too long!


No journalling. I am currently a FreeBSD user; FreeBSD 9 added softupdates 
journalling, but REALLY it doesn't change much except for extra writes to 
the disk.


I found that you actually have to run a full fsck now and then even with 
journalling. In theory it shouldn't find any inconsistencies; in practice 
it always finds minor ones.


To end that topic, my practices are:

- Do not make huge filesystems or create large RAID arrays:
two disks, one mirror made from them, one filesystem.
- It takes 30 minutes or less to fsck such a filesystem, and the same time 
for 10 of them, as the checks can run in parallel.


In case of a crash I do fsck manually, when a pause isn't a problem.
At reboot I check only the root filesystem and (if it's separate) /usr, so 
I can execute all other checks remotely, without rebooting.
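
A sketch of that setup (labels, device names and mount points are made up, 
and it assumes soft updates are enabled so the kernel accepts the unclean 
read-write mounts): only / and /usr get a fsck pass number in /etc/fstab, 
everything else has pass 0 and is checked by hand later.

  # /etc/fstab -- a pass number (last field) of 0 skips the check at boot
  /dev/mirror/gm0a  /       ufs  rw  1  1
  /dev/mirror/gm0d  /usr    ufs  rw  2  2
  /dev/mirror/gm1a  /data1  ufs  rw  2  0
  /dev/mirror/gm2a  /data2  ufs  rw  2  0

  # later, over ssh, at a convenient time:
  umount /data1 && fsck_ffs -y /dev/mirror/gm1a && mount /data1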



Assuming hardware never fails is certainly wrong


And there's no practical point assuming it *always* fails, is there ?


The fact that it fails sometimes is enough to assume it can.


could any FS be trustworthy then ??? IMHO, that's nonsense.

No, it isn't. Sorry if I wasn't clear enough in explaining it.


Well, if the thing that you try to defend against is plain hardware failure 
(memory bits flipping, CPU going mad, whatever), i just doubt that any kind 
of software layer could definitely solve it (checksums of checksums of? i/o


You are completely right.

What I am pointing out is that a flat data layout makes the chance of 
recovery far higher and the chance of severe destruction far lower.


Any tree-like structure carries a huge risk of losing much more data than 
was corrupted in the first place.



That rule has already proven true for the UFS filesystem, as well as for 
e.g. fixed-size database tables like the .DBF format, which I still use, 
unlike modern ones.


Using DBF files as an example: you have indexes in separate files, but the 
indexes are not crucial and can be rebuilt.


So if any tree-like structure (or hash, or whatever) were invented to 
speed up filesystem access - great. But only as an extra index, with the 
crucial data (INODES!!) written as a flat fixed-record table at a known 
place.


I don't say HAMMER is bad - contrary to ZFS, which is 100% PURE SHIT(R) - 
but I don't agree that it is (or will be) a total replacement for the 
older UFS.


HAMMER pseudo-filesystems, snapshots and online replication are useful 
features, but actually not that needed by everyone, and not without the 
cost of extra complexity. No matter how smart Matthew Dillon is, it will 
still be far more complex, and more risky.


That's why it's not good that swapcache doesn't support efficient caching 
of UFS, as there is no vfs.ufs.double_buffer feature like HAMMER has.
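
For HAMMER the relevant knobs look roughly like this (sysctl names taken 
from swapcache(8) and the naming pattern above; treat them as assumptions 
and verify against the man page):

  # let swapcache keep file data, not only meta-data, on the SSD
  sysctl vm.swapcache.data_enable=1
  sysctl vm.swapcache.meta_enable=1
  # double-buffer HAMMER file data through the block device so it can be
  # cached independently of how many vnodes the kernel keeps around
  sysctl vfs.hammer.double_buffer=1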




-
Disclaimer ;): None of my practices, my ideas about safe filesystems, 
mirroring or anything else is a replacement for a proper backup 
strategy!!! Please do not interpret anything I write as arguing against 
backups.


Re: Unable to mount hammer file system Undo failed

2012-07-19 Thread Wojciech Puchar


Any tree-like structure carries a huge risk of losing much more data than 
was corrupted in the first place.


Not so sure about that statement, but well, let's agree we might disagree :)

Disagreement is the source of all good ideas, but you should explain why.

My explanation is below.



You asked for a little documentation about its layout, workings; this may be 
a good fit: http://www.dragonflybsd.org/presentations/nycbsdcon08/

This is about an older HAMMER revision.

Matthew claimed some time ago that the new HAMMER is completely different.

But after reading it, I understood that everything is in the B-Tree - 
exactly what I call dangerous. The B-Tree is used to store everything: 
directory entries, inodes, etc.


B-Trees are dangerous if they are used as the only way to access data. 
A corrupted B-Tree means no access to anything below it!!



What I see as the main differences between HAMMER and ZFS are:

1) Practical - HAMMER is very fast and doesn't use gigabytes of RAM or 
lots of CPU. Not that I did a lot of tests, but it seems to be UFS speed, 
sometimes even more, rarely less.


It is actually USEFUL, which cannot be said of ZFS ;)

2) The basic way of storing data is similar; the details are different; 
the danger is similar.


3) HAMMER has a recovery program. It will need to read the whole medium. 
Assume a 2TB disk at 100MB/s - 20,000 seconds, close to 6 hours.
ZFS doesn't have one; there are a few businesses that recover ZFS data for 
money. For sure they don't feel it's a crisis ;)



Assume that I store my client's data on a HAMMER filesystem and it crashed 
completely, but the disks are fine. Assume it's Tuesday 16:00, the last 
copy was done automatically Monday 17:30, the failure is found at 17:00, 
and I am on site at 18:00.


I ask my client what they would prefer:

- Wait 6 hours, and there is a good chance that most of your data will be 
recovered. If so, the few missing pieces would be identified and restored 
from backup. If not, we will start recovery from backup, which will take 
another 6 hours?


- Just clear things out and start recovery from backup, so everything is 
recovered for sure, as it was yesterday after work?



The answer?


THE ANSWER:
---
1) Divide the disk space into a metadata space and a data space, with the 
amount of metadata space defined at filesystem creation, say 3% of the 
whole drive.


2) Records stored only in B-Tree leaves, and all B-Tree leaves stored in 
the metadata space. A few critical filesystem blocks stored there too, at 
a predefined place.


3) Everything else stored in the data space: B-Tree blocks excluding the 
leaves, the undo log, the actual data.



4) Everything else as it already is, with a modification to make sure 
every B-Tree leaf block has data describing it properly: inodes having the 
inode number inside, directories having their inode number inside too. 
AFAIK it is already like that.


5) hammer recover modified to scan only this 3% of space and then rebuild 
the B-Tree. This way it will work as fast as or faster than fsck_ffs, in 
spite of being a last-resort tool.
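
A rough back-of-the-envelope check of point 5, using the same 2TB disk at 
100MB/s as above: 3% of 2TB is about 60GB, which scans in roughly 600 
seconds, i.e. about 10 minutes instead of nearly 6 hours for the 
full-media scan.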

---

THE RESULT: a fast and featureful filesystem that can always be quickly 
recovered, even in last-resort cases.


Re: Unable to mount hammer file system Undo failed

2012-07-19 Thread Wojciech Puchar


Sorry, i also just love ZFS for the business case i rely on it for. It has 
some

clearly nice features.


Sorry, but if your reasoning about software is based on love, not logic, 
then it's a good idea to end the topic.


Probably your business is more about deploying as much as possible, and 
that's all.


Re: questions from FreeBSD user

2012-07-16 Thread Wojciech Puchar

   though I don't remember the exact reason I chose it originally.

   The practical limitation for swap is 4096GB (4TB) due to the use
   of 32 bit block numbers coupled with internal arithmetic overflows
   in the swap algorithms which eats another 2 bits.


This is definitely enough for me :)


   We do not want to increase the size of the radix tree element because
   the larger structure size would double the per-swap-block physical memory
   overhead, and physical memory overhead is already fairly significant...
   around  1MB of physical memory is needed per 1GB of swap.


This is right. I don't like more locked memory just because more swap may 
POSSIBLY be used.



   There are a maximum of 4 swap devices (w/512GB limit by default in total,
   with the per-device limit 1/4 of that).  Devices are automatically


This, too, is enough.

For now I am a FreeBSD user, but when I read what is being proposed by 
developers(!) for FreeBSD, I clearly understand that I will need something 
else.


And swapcache is nearly what I need for one use case where I have a mix of 
I/O-heavy files and large data; dividing it manually is hard.



   UFS ought to be cached by swapcache but there's no point using it
   on DragonFly.  You should use HAMMER.


Did you write something (even rough and preliminary) about how HAMMER's 
data is laid out on disk? Or, maybe better, HAMMER2, which you are working 
on now?


From my 15 years of experience with Unix (of which the first 5 were 
unfortunately Linux, so I know what filesystem loss is), it would be hard 
to impossible to convince me that a filesystem without an offline fsck is 
a good idea.


Even by as good a programmer as you. UFS is plainly indestructible, 
surviving many of my harsh tests that e.g. completely kill ZFS with full 
data loss ;)


Are inodes laid out in predefined places, or scattered around and accessed 
through a tree-like structure?


UFS does the former, so fsck ALWAYS knows where to find inodes. A 
trash-write at a random place would destroy a few files, but not the whole 
thing.



: 3) how about reboots? From my understanding reboot, even clean, means losing
: ALL cached data. am i right?

   All swapcache-cache data is lost on reboot.

This is quite a disadvantage. Of course a production system would not 
crash every day, but imagine that a crash happened for any reason (power 
spike, loss of power etc.), then I rebooted and it works, and now all 
users want to use the server, so we get the time of highest load and... 
swapcache is empty.


Warm-up will take some time, and the system will be slower during that time.

Still, the ability to manually decide which files are cached is plain 
great and exactly what I need.
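
If I remember swapcache(8) correctly, that manual selection is done with 
file flags, roughly as below (the sysctl and flag names are from memory 
and should be verified against the man page):

  # only consider files/directories explicitly marked for caching
  sysctl vm.swapcache.use_chflags=1
  # mark an I/O-heavy subtree as cacheable
  chflags -R cache /home/maildir
  # and exclude part of it again
  chflags -R noscache /home/maildir/archive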


And finally, losing the swapcache device means no data loss. I could risk 
using cheap flash media that has a warranty; if it fails, I just replace it.



: In spite of HAMMER being far far far better implementation of filesystem
: that ZFS, i don't want to use any of them for the same reasons.
:
: UFS is safe.

   A large, full UFS filesystem can take hours to fsck, meaning that a


35 minutes for the largest I use - 2TB. I ALWAYS(TM) do one filesystem per 
drive or per 2-drive mirror; the rest are checked in parallel.


This is safer: a double disk failure would mean losing a 2TB volume, not 
20TB.



   crash/reboot of the system could end up not coming back on line for
   a long, long time.  On 32-bit systems the UFS fsck can even run the
   system out of memory and not be able to complete.  On 64-bit systems
   this won't happen but the system can still end up paging heavily
   depending on how much ram it has.


Wrong. fsck has never needed more than 500MB of RAM per drive for me, and 
I always have more than 500MB of RAM per drive in a machine.


I would need to have tens of millions of files per disk for that. It 
doesn't happen; I have never had more than 3 million.



   In contrast, HAMMER is instant-up and has no significant physical
   memory limitations (very large HAMMER filesystems can run on systems
   with small amounts of memory).


This is true, and I have already tested it.

I could call HAMMER "ZFS done right", but it is still dangerous.

Until I UNDERSTAND that HAMMER is safe, I will not believe it.

HAMMER represents a really great deal of your work, but whether it is a 
good idea (as opposed to a good implementation) is something else.



: thanks

   With some work, people have had mixed results, but DragonFly is designed
   to run on actual hardware and not under virtualization.


It seems you missed my question. I DO NOT WANT to virtualize DragonFly, 
just as I don't want to virtualize FreeBSD now.
Today's "virtualize everything" trend is plain stupid, and people like 
stupid ideas. I don't.


But I run a few Windows sessions using VirtualBox UNDER FreeBSD. Without 
such an option I would need a separate machine for them.


For everything else I use jails, and DragonFly has working jails.


Re: questions from FreeBSD user

2012-07-16 Thread Wojciech Puchar



For now I am a FreeBSD user, but when I read what is being proposed by
developers(!) for FreeBSD, I clearly understand that I will need something else.


Which FreeBSD plans do you find worrisome?

More and more "user friendly" features that are being proposed, and 
confirmed, by developers. Read the freebsd-hackers mailing list from the 
last 2 months.




Re: questions from FreeBSD user

2012-07-16 Thread Wojciech Puchar



More and more "user friendly" features that are being proposed, and
confirmed, by developers. Read the freebsd-hackers mailing list from the last 2 months.


Found training wheels and replacing rc(8) threads.
Anything else?


This is off topic, so I recommend stopping here.
I would reply to you privately, but I don't correspond with users of 
world-scale corporations' free services like Gmail. Sorry.


Mail me if you want to and have a normal e-mail address.


questions from FreeBSD user

2012-07-15 Thread Wojciech Puchar
I have a few questions. I am currently using FreeBSD; DragonFly has only 
been tried out.


1) Why is swapcache said to be limited to 512GB on the amd64 platform? 
That may actually be a real limit on a larger setup with more than one SSD.


2) It is said that you are limited to caching about 40 inodes unless you 
use the sysctl setting vfs.hammer.doublebuffer or so.


At the same time it is said to be able to cache any filesystem.

Can UFS be cached efficiently with millions of files?

3) How about reboots? From my understanding a reboot, even a clean one, 
means losing ALL cached data. Am I right?



In spite of HAMMER being a far, far, far better filesystem implementation 
than ZFS, I don't want to use either of them, for the same reasons.


UFS is safe.

4) Will VirtualBox, qemu-kvm or a similar tool ever be ported to 
DragonFly? I am not a fan of virtualizing everything, which is pure 
marketing nonsense, but I do virtualize a few Windows sessions on a 
server.


thanks