HEADS UP: 802.11 ioctl structure size change

2008-01-18 Thread Sepherosa Ziehau
HEAD users who have checked out the following change need to do at least
quickworld/realquickworld and quickkernel

On Jan 19, 2008 3:34 PM, Sepherosa Ziehau <[EMAIL PROTECTED]> wrote:
> sephe   2008/01/18 23:34:13 PST
>
> DragonFly src repository
>
>   Modified files:
> sbin/ifconfig        ifieee80211.c
> sys/netproto/802_11  ieee80211_ioctl.h
> sys/netproto/802_11/wlan ieee80211_ioctl.c
>   Log:
>   - Capabilities information field is 2 bytes long.  Mark old sta info
> structure's capinfo as deprecated.  Add new field in sta info to
> deliver capabilities information to userland applications.
>   - Update ifconfig(8)
>
>   Revision  ChangesPath
>   1.20  +1 -1  src/sbin/ifconfig/ifieee80211.c
>   1.8   +2 -1  src/sys/netproto/802_11/ieee80211_ioctl.h
>   1.13  +2 -1  src/sys/netproto/802_11/wlan/ieee80211_ioctl.c
>
>
> http://www.dragonflybsd.org/cvsweb/src/sbin/ifconfig/ifieee80211.c.diff?r1=1.19&r2=1.20&f=u
> http://www.dragonflybsd.org/cvsweb/src/sys/netproto/802_11/ieee80211_ioctl.h.diff?r1=1.7&r2=1.8&f=u
> http://www.dragonflybsd.org/cvsweb/src/sys/netproto/802_11/wlan/ieee80211_ioctl.c.diff?r1=1.12&r2=1.13&f=u
>
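For anyone wondering why the rebuild is needed: the sta info structure
grows, so an old kernel and a new ifconfig (or vice versa) no longer agree
on the ioctl layout.  As a rough sketch of the shape of the change -- the
field names below are illustrative only, not the actual contents of
ieee80211_ioctl.h:

  #include <stdint.h>

  /* Hypothetical example; the real structure lives in
     sys/netproto/802_11/ieee80211_ioctl.h. */
  struct example_sta_info {
          uint16_t isi_len;          /* size of this entry */
          uint8_t  isi_capinfo_old;  /* deprecated: only one byte wide */
          /* ... other per-station fields ... */
          uint16_t isi_capinfo16;    /* new: carries the full 2-byte
                                        Capabilities Information field */
  };

Since ifconfig(8) and the kernel must agree on this layout, both need to
be rebuilt together, hence the quickworld + quickkernel advice above.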



-- 
Live Free or Die


Re: cvsup

2008-01-18 Thread Garance A Drosihn

At 9:16 AM + 1/18/08, Vincent Stemen wrote:


I realize that everything I read comparing cvsup to rsync indicates that
cvsup is faster at mirroring CVS repositories.  So I decided to run my
own tests this evening.  I thought everybody might be interested in the
results.

My results are not even close to what others are claiming.  Rsync was
vastly faster.  Granted, so far as I know, this was not right after
a large number of files have been tagged, but as you mentioned, that
does not happen very often.  If anybody wants to email me after that
does happen, I will try to make time to re-run the tests.


This is a very inadequate benchmark.  Certainly rsync works very well,
and the DragonFly repositories have enough capacity that they can
handle whatever the load is.  So, I realize that it is perfectly fine
to use rsync if that's what works for you.  And I realize that there
is the (unfortunate) headache of needing Modula-3 when it comes to
CVSup.  So, I'm not saying anyone has to use cvsup, and I am sure that
rsync will get the job done.  I'm just saying that this specific
benchmark is so limited that it is meaningless.

What was the load on the server?  How well does rsync scale when there
are thousands of people updating at the same time?  (in particular, how
well does the *server* handle that?).

How big of an update-interval were you testing with?  If I'm reading
your message right, the largest interval you tested was 2-days-worth
of updates.  For most larger open-source projects, many end-users are
going at least a week between sync's, and many of my friends update
their copy of the freebsd repository once every three weeks.  Some
update their copy only two or three times a year, or after some
significant security update is announced.  Note that this means the
server sees a huge spike right after security updates, because there
are connections from people who haven't sync'ed in months, and who
probably would not have sync'ed for a few more months if it wasn't
for the security update.

Tags occur rarely, but they do occur.  And in the case of dragonfly,
there are also the sliding tags that Matt likes to use.  So while he
doesn't create a new tag very often, he does move the tag in a group
of files.  (Admittedly, I have no clue as to how well cvsup does
with a moved tag, but it would be worthwhile to know when it comes
to benchmarking rsync-vs-cvsup for dragonfly.  It is quite possible
that cvsup will actually get confused by a moved-tag, and thus not
be able to optimize the transfer of those files)

The shorter the update-interval, the less likely that all the
CVS-specific optimizing code in cvsup will do any good.  Note, for
instance:

For a 1.5 hour old repository:
rsync total time: 34.846
cvsup total time: 3:40.77
=
cvsup took 6.33 times as long as rsync

For a 2 day old repository:
rsync total time: 2:03.07
cvsup total time: 9:14.73
=
cvsup took 4.5 times as long as rsync

Even with just two data points, we see that the larger the window, the
less well rsync does compared to cvsup.

In that 1.5 hour old repository, how many files were changed?  10?
100?  If there are only 100 files to do *anything* with, then there
isn't much for cvsup to optimize on.  It's pretty likely that rsync
is going to be faster than cvsup at "sync"ing a file which has zero
changes which need to be sync'ed.

If you have users who are regularly sync-ing their repository
every 1.5 hours, 24 hours a day, 7 days a week, then there are some
cvsup servers which would block that user's IP address for being such
an annoying pest.  The only people who need to sync *that* often are
people who themselves are running mirrors of the repository.  For all
other users, syncing that often is an undesirable and unwanted load
on the server.  The people running a sync-server wouldn't want to
optimize for behavior patterns which they don't want to see in the
first place.

I would say the *smallest* window that you should bother testing is
a six-hour window (which would be four updates per day), and that
the most interesting window to test would be a 1-week window.

It took more than a year to write cvsup, by someone who was working
basically full-time at it.  (that's what he told me, at least!  :-)
He wouldn't have put in all that work if there was no point to it,
and he would have based his work on a wide range of usage patterns.


Unless I am overlooking something obvious, I think I am going to
stick with updating our repository via rsync :-).


As I said earlier, rsync is certainly a reasonable solution.  I'm just
commenting on the "benchmark".  And I realize I haven't done *any*
benchmarks, so I can't claim much of anything either.  But you would
need a much more elaborate and tedious set of benchmarks before you
could draw any significant conclusions.

--
Garance Alistair Drosehn=   [EMAIL PROTECTED]
Senior 

Re: Futures - HAMMER comparison testing?

2008-01-18 Thread Michael Neumann

Bill Hacker wrote:

Michael Neumann wrote:

Bill Hacker wrote:
I'm guessing it will be a while yet before HAMMER is ready for this, 
but it seems to be moving fast - and cleanly - so...


Sorry to hijack this thread. Just wanna mention a little write-up of
mine about HammerFS features (and sometimes comparing it with ZFS):


http://www.ntecs.de/blog/articles/2008/01/17/zfs-vs-hammerfs

I can't wait to try it out for real!

Regards,

  Michael


Michael - that's a good start!

Such a good start, in fact, that I'd like to suggest the 'un-blog-like' course
of correcting the original, at least 'for a while yet', rather than
blogging-on the errata at the tail.


Yes, I fixed it a bit ;-)

I don't think your ZFS assessment is 100% accurate, so a bit of clean-up 
there could reduce flame-bait.


Done.

That may earn further alteration if/as/when Sun integrates 'lustre'
features. At present, HAMMERfs and ZFS have only partial overlap in
their Venn diagrams.


But I encourage you to keep updating and tracking the changes.

Maybe it should open with a date-stamped 'current state of ...'?


You mean using a "@@0x..." (HAMMER timestamp) in the URL ;-)
I wait until I port my blogging software natively over to HAMMER ;-)

Regards,

  Michael


Re: how to get dragonfly and freebsd source code

2008-01-18 Thread dark0s Optik
Indeed, I would like to understand how to program UltraSPARC
microprocessors. It is a personal interest of mine, but I want to use
DragonFly.

2008/1/18, Erik Wikström <[EMAIL PROTECTED]>:
> On 2008-01-18 18:54, dark0s Optik wrote:
> > 2008/1/14, Nicolas Thery <[EMAIL PROTECTED]>:
> >> 2008/1/14, dark0s Optik <[EMAIL PROTECTED]>:
> >> > How can I get the source code of DragonFly and FreeBSD?
> >>
> >> In addition, you can also browse the source with opengrok and lxr:
> >>
> >> http://opengrok.creo.hu/dragonfly/
> >>
> >> http://fxr.watson.org/
> >
> > Ok, this is the best solution!
> > Now I would like to analyze the FreeBSD code for the sparc64 architecture.
> > Can anyone suggest documents and people who could help me analyze
> > the above code?
>
> If you want help with FreeBSD then a FreeBSD mailing-list is probably
> the place to ask, for issues regarding the Sparc64 port you should
> probably use the freebsd-sparc64 list. For more info on mailing-lists:
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/eresources.html#ERESOURCES-MAIL
>
> --
> Erik Wikström
>


-- 
only the paranoid will survive


Re: how to get dragonfly and freebsd source code

2008-01-18 Thread Erik Wikström
On 2008-01-18 18:54, dark0s Optik wrote:
> 2008/1/14, Nicolas Thery <[EMAIL PROTECTED]>:
>> 2008/1/14, dark0s Optik <[EMAIL PROTECTED]>:
>> > How can I get the source code of DragonFly and FreeBSD?
>>
>> In addition, you can also browse the source with opengrok and lxr:
>>
>> http://opengrok.creo.hu/dragonfly/
>>
>> http://fxr.watson.org/
> 
> Ok, this is the best solution!
> Now I would like to analyze the FreeBSD code for the sparc64 architecture.
> Can anyone suggest documents and people who could help me analyze
> the above code?

If you want help with FreeBSD then a FreeBSD mailing-list is probably
the place to ask, for issues regarding the Sparc64 port you should
probably use the freebsd-sparc64 list. For more info on mailing-lists:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/eresources.html#ERESOURCES-MAIL

-- 
Erik Wikström


Re: Futures - HAMMER comparison testing?

2008-01-18 Thread Matthew Dillon

:> So it comes down to how much space you are willing to eat up to store
:> the history, and what kind of granularity you will want for the history.
:
:OK - so it WILL be a 'tunable', then.
:...
:HAMMER cannot protect against all forms of human error - BUT - if it 
:inherently rebuilds more intelligently than the least-intelligent of 
:RAID1, it can greatly reduce the opportunity for that sort of 'accident' 
:to occur.

One idea I had was to number the records as they were laid down on
disk, and validate the file or directory by determining that no records
were missing.  But that doesn't fly very well when things are deleted
and replaced.

Another idea, much easier to implement, is to have a way to guarantee that
all the bits and pieces of the file had been found by creating a record which
contains a CRC of the whole mess.   One could have a 'whole file' CRC, or
even a 'whole directory tree' CRC (as-of a particular timestamp).
Since HAMMER is record oriented, associating special records with inodes
is utterly trivial.

For archival storage one could then 'tag' a directory tree with such a
record and have a way of validating that the directory tree had not become
corrupted, or was recovered properly.

For encryption one could 'tag' a directory tree or a file with an
encryption label.

Not implemented yet but a definite possibility.  There are so many things
we can do with HAMMER due to its record oriented nature.
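
To make the idea concrete, such a validation tag could be little more than
one extra record type hung off the inode at the top of the tree.  Purely
an illustrative sketch -- not actual HAMMER code, and (as said) not
implemented yet; the names are hypothetical:

  #include <stdint.h>

  /* Hypothetical 'integrity tag' record associated with an inode. */
  struct example_integrity_tag {
          uint64_t tag_timestamp;   /* tree contents as-of this time */
          uint32_t tag_crc;         /* CRC over all records reachable from
                                       the tagged inode at that timestamp */
          uint32_t tag_flags;       /* e.g. whole-file vs whole-tree */
  };

  /* Validation would walk the tree as-of tag_timestamp, recompute the CRC
     over the records it finds, and compare the result against tag_crc. */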

:> Ultimately it will be extremely efficient simply by the fact that
:> there will be a balancer going through it and repacking it.
:> 
:"... constantly, and in the background..." (I presume)

In the background, for sure.  Probably not constantly, but taking a piece
at a time with a nightly cron job.  One thing I've learned over the
years is that it is a bad idea to just go randomly accessing the disk
at unexpected times.

The nice thing is that the balancing can occur on a cluster-by-cluster
basis, so one can do a bunch of clusters, then stop, then do a bunch
more, then stop, etc.

:Is variable-length still likely to have a payback if the data records 
:were to be fixed at 512B or 1024B or integer multiples thereof?

Not a good idea for HAMMER.  A HAMMER record is 96 bytes and a
HAMMER B-Tree element is 56 bytes.  That's 152 bytes of overhead
per record.  The smaller the data associated with each record,
the larger the overhead and the less efficient the filesystem
storage model.  Also, while accessing records is localized, you
only reap major benefits over a linear block storage scheme
if you can make those records reference a significant amount of
data.

So for large static files we definitely want to use a large
per-record data size, and for small static files we want to use
a small data size.  Theoretically the best-case storage for a tiny
file would be 96 + 56 + 128 (inode data) + 64 (data), or
344 bytes of disk space.  That's very, very good.  (In the current
incarnation the minimum disk space use per file is 96 + 56 + 128 + 16384).
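
To put numbers on that trade-off -- this little program is only an
illustration of the arithmetic, using the 96- and 56-byte figures quoted
above:

  #include <stdio.h>

  int main(void)
  {
          const int overhead = 96 + 56;   /* record + B-Tree element = 152 */
          const int sizes[] = { 64, 512, 16384, 1 << 20 };

          for (unsigned i = 0; i < sizeof(sizes) / sizeof(sizes[0]); ++i) {
                  double frac = (double)overhead / (overhead + sizes[i]);
                  printf("%8d data bytes per record -> %5.1f%% overhead\n",
                         sizes[i], 100.0 * frac);
          }
          return 0;
  }

At 64 bytes of data per record the 152 bytes of overhead dominates; at 1MB
per record it is noise, which is why large static files want a large
per-record data size.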

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>


Re: cvsup

2008-01-18 Thread Vincent Stemen
On 2008-01-18, Simon 'corecode' Schubert <[EMAIL PROTECTED]> wrote:
> Vincent Stemen wrote:
>> *** using cvsup
>> 
>> cvsup -L 3 ./DragonFly-cvs-supfile  155.08s user 69.40s system 40% cpu 
>> 9:14.73 total
>
> I for sure didn't use cvsup for a long time, but that seems quite long. 
> This can happen for various reasons (just guesses):
>
> - CVSup not knowing certain tags ("commitid") and thus falling back to a 
> less efficient mechanism
>
> - not using the list files which store file information.  Be sure to run 
> cvsup -s

Ok.  I was not aware of the -s switch.  That is a good suggestion.
I will see if I can make the time to re-run the comparison tests this
evening with "cvsup -s".

Although, technically, it was probably the fairest performance
comparison to rsync without the "-s", because rsync still works properly
if files have been locally modified.



Re: how to get dragonfly and freebsd source code

2008-01-18 Thread dark0s Optik
Ok, this is the best solution!
Now I would like to analyze the FreeBSD code for the sparc64 architecture.
Can anyone suggest documents and people who could help me analyze the above code?

Thanks,
dark0s

2008/1/14, Nicolas Thery <[EMAIL PROTECTED]>:
> 2008/1/14, dark0s Optik <[EMAIL PROTECTED]>:
> > How can I get the source code of DragonFly and FreeBSD?
>
> In addition, you can also browse the source with opengrok and lxr:
>
> http://opengrok.creo.hu/dragonfly/
>
> http://fxr.watson.org/
>


-- 
only the paranoid will survive


Re: cvsup

2008-01-18 Thread Bill Hacker

Vincent Stemen wrote:

*snip*



Unless I am overlooking something obvious,


It is not likely so many projects would be using cvsup for as long as 
they have if the rsync advantage was that great, or that simple [1].


Have you:

A) compared the loads and bandwidth as well as the time on BOTH end 
machines - host as well as client?


B) tested for the 'more common' case where cvsup/csup are applied to 
rather more sparse pulls of just a fraction of older, larger 
repositories (older *BSD's) - and by more users simultaneously?


Unless I am wrong, cvsup/csup places more of the load of determining 
what to pull on the client, less on the source server.


> I think I am going to stick with updating our repository via rsync :-).




It may be the right answer for now, and for what you are doing.

It may be less so for general end-user use - or even your own if/as/when 
mirror hosts are under heavier load.


Most older mirror hosts throttle each connection as well as limit the 
maximum number permitted simultaneously. The one you are using presently 
seems not to do so.


The key is to include measurement of host and bandwidth as well as 
client.  TANSTAAFL.



Bill

[1] Subversion, Mercurial, et al. alternatives are a different type of
issue.


Re: cvsup

2008-01-18 Thread Simon 'corecode' Schubert

Vincent Stemen wrote:

*** using cvsup

cvsup -L 3 ./DragonFly-cvs-supfile  155.08s user 69.40s system 40% cpu 9:14.73 
total


I for sure didn't use cvsup for a long time, but that seems quite long. 
This can happen for various reasons (just guesses):


- CVSup not knowing certain tags ("commitid") and thus falling back to a 
less efficient mechanism


- not using the list files which store file information.  Be sure to run 
cvsup -s


Unless I am overlooking something obvious, I think I am going to stick 
with updating our repository via rsync :-).


I do advise to stick to rsync, yes :)

cheers
  simon

--
Serve - BSD +++  RENT this banner advert  +++ASCII Ribbon   /"\
Work - Mac  +++  space for low €€€ NOW!1  +++  Campaign \ /
Party Enjoy Relax   |   http://dragonflybsd.org  Against  HTML   \
Dude 2c 2 the max   !   http://golden-apple.biz   Mail + News   / \



Re: Futures - HAMMER comparison testing?

2008-01-18 Thread Bill Hacker

Matthew Dillon wrote:
:But - at the end of the day - how much [extra?] on-disk space will be 
:needed to insure mount 'as-of' is 'good enough' for some realistic span 
:(a week?, a month?)? 'Forever' may be too much to ask.


The amount of disk needed is precisely the same as the amount of
historical data (different from current data) that must be retained,
plus record overhead.

So it comes down to how much space you are willing to eat up to store
the history, and what kind of granularity you will want for the history.


OK - so it WILL be a 'tunable', then.

FWIW - my yardsticks at the 'heavy' or most wasteful end are punch card 
& paper/mylar tape on low/no RAM systems, where 'backup' is essentially 
of 'infinite' granularity, moving through WORM storage to Plan9 Venti, 
et al.


AFAIK, none of the oldest 'write once' methods are in even 'virtualized' 
use - save possibly in the FAA or military fields, as few entities have 
any practical use for that sort of history.


At the other end, one of our projects involved storing the floor plans 
of 60,000 buildings on RAID1. A technician manually rebuilding a failed
array mirrored the empty HDD over the full one, and over 600 CDs had to
be manually reloaded.


In that case, there never had been risk of loss - anyone could buy the 
latest CDs from the government lands department.


What his error cost us was 'only' time and inconvenience.

HAMMER cannot protect against all forms of human error - BUT - if it 
inherently rebuilds more intelligently than the least-intelligent of 
RAID1, it can greatly reduce the opportunity for that sort of 'accident' 
to occur.




:How close are we to being able to start predicting that storage-space 
:efficiency relative to ${some_other_fs}?

:
:Bill

Ultimately it will be extremely efficient simply by the fact that
there will be a balancer going through it and repacking it.


"... constantly, and in the background..." (I presume)


".. and with tunable frequency and priority." (I wish, eventually).


For the moment (and through the alpha release) it will be fairly
inefficient because it is using fixed 16K data records, even for small
files.  The on-disk format doesn't care... records can reference 
variable-length data from around 1MB down to 64 bytes.  But supporting
variable-length data requires implementing some overwrite cases that
I don't want to do right now.


Is variable-length still likely to have a payback if the data records 
were to be fixed at 512B or 1024B or integer multiples thereof?


This only applies to regular files of course.  Directories store
directory entries as records, not as data, so directories are packed
really nicely.


e.g. if you have one record representing, say, 1MB of data, and you
write 64 bytes right smack in the middle of that, the write code will
have to take that one record, mark it as deleted, then create three
records to replace it (one pointing to the unchanged left portion of
the original data, one pointing to the 64 bytes of overwritten data,
and one pointing to the unchanged right portion of the original data).
The recovery and deletion code will also have to deal with that sort
of overlaid data situation.  I'm not going to be writing that
feature for a bit.  There are some quick hacks I can do too, for
small files, but it's not on my list prior to the alpha release.

Remember that HAMMER is designed for large filesystems which don't fill
up instantly.  Consequently it will operate under the assumption that
it can take its time to recover free space.  If one doesn't want to use
the history feature one can turn it off, of course, or use a very
granular retention policy.

My local backup system is currently using a 730GB UFS partition and it
is able to backup apollo, crater, and leaf with daily cpdups (using
the hardlink snapshot trick) going back about 3 months.  In fact, I
can only fill up that 730GB about half way because fsck runs out of
memory and fails once you get over around 50 million inodes (mostly
dependent on the number of directories you have)... on UFS that is.
I found that out the hard way.


... which reminds us what we will ALL soon face if we do NOT seek newer 
solutions!




It takes almost a day for fsck to
recover the filesystem even half full.  I'll be happy when I can throw
that old stuff away.

-Matt
	Matthew Dillon 
	<[EMAIL PROTECTED]>


... or just relegate it to what it still does faster/better. IF ...

I hope and trust that DragonFly BSD will earn a place as a 'broad 
spectrum' OS, competitive across the board with alternatives.


But - if not, or even just 'not at first'

- much as OpenBSD and NetBSD have long been seen as good choices for 
routers and firewalls, DragonFly should be able to carve out a viable 
niche as the better choice for centralized / c

Re: cvsup

2008-01-18 Thread Vincent Stemen
On 2008-01-16, Simon 'corecode' Schubert <[EMAIL PROTECTED]> wrote:
>
> You know that you can get a listing with rsync?
>
> sweatshorts % rsync chlamydia.fs.ei.tum.de:: 

No I did not know that.  I have been using rsync for transferring files
with ssh but had not studied the features for talking to an rsync daemon
on the other end.

Thank you very much Simon, Joerg, and Peter for the information.  It was
all useful and got me going.

>
> Considering the fact that you only changed one line per file, that's quite 
> a lot.  Plus it doesn't say how many round trips it was needing.
>
>> As you can see, it took 9 seconds for a full download, but only less than 1.5
>> seconds to update them.  That seems reasonably fast to me.  Is cvsup really
>> faster than that?  I am sceptical that it could be much faster.
>
> Yes, cvsup is way faster in such a case.  Maybe not necessarily for 8 
> files, but for 800 for sure.  It pipelines communication, so that there is 
> no need to wait for immediate replies, thus saving network roundtrip times.
>
> cheers
>simon
>

I realize that everything I read comparing cvsup to rsync indicates that
cvsup is faster at mirroring CVS repositories.  So I decided to run my
own tests this evening.  I thought everybody might be interested in the 
results.

My results are not even close to what others are claiming.  Rsync was
vastly faster.  Granted, so far as I know, this was not right after
a large number of files have been tagged, but as you mentioned, that
does not happen very often.  If anybody wants to email me after that
does happen, I will try to make time to re-run the tests.

Peter, I decided to take you up on your posting I found from the end of
2006 in a discussion about the same subject, where you said

From: Peter Avalos
Date: 2006-12-26 05:30:30

I'll also extend that if anyone wants to use theshell.com, go for
it.  It is well-connected, and I don't mind if people go to town on it.

I hope the offer still stands :-).

My tests were doing identical updates using both rsync and cvsup, back to
back from theshell.com.


Environment
===
DragonFly 1.10.1-RELEASE system with a 1.11.0-DEVELOPMENT kernel.
cvsup  version SNAP_16_1h
rsync  version 2.6.9

The updates included the CVSROOT, doc, and src directories.
"rsup" is the rsync script I used.  It contains:

  cd /home/dcvs || exit 1
  
  for d in CVSROOT doc src; do
  #   rsync -avHz --delete rsync.theshell.com::DragonFly/dcvs/$d .
  rsync -avH --delete rsync.theshell.com::DragonFly/dcvs/$d .
  done
  
I did one test with compression (-z) and one without.

For the cvsup tests, I used the standard DragonFly-cvs-supfile that
comes with DragonFly, with the following three file collections
uncommented.
dragonfly-cvs-root
dragonfly-cvs-src
dragonfly-cvs-doc


Here are the results


Updating about a 2 day old repository
With compression
cvsup verbose level 3 (-L 3)

Thu Jan 17 18:50:11 CST 2008

*** using rsync

rsup  8.29s user 13.31s system 17% cpu 2:03.07 total


Thu Jan 17 19:13:10 CST 2008

*** using cvsup

cvsup -L 3 ./DragonFly-cvs-supfile  155.08s user 69.40s system 40% cpu 9:14.73 
total


rsync total time: 2:03.07
cvsup total time: 9:14.73
=
cvsup took 4.5 times as long as rsync
=


Updating about a 1.5 hour old repository
Without compression  (Probably made no difference since there were
  apparently no updates)
cvsup default verbose level 1


Thu Jan 17 20:17:12 CST 2008

*** using cvsup

cvsup ./DragonFly-cvs-supfile  54.17s user 26.03s system 36% cpu 3:40.77 total


Thu Jan 17 20:29:40 CST 2008

*** using rsync

rsup  1.34s user 3.41s system 13% cpu 34.846 total


rsync total time: 34.846
cvsup total time: 3:40.77
=
cvsup took 6.33 times as long as rsync
=


I am seeing rsync perform from 4 to over 6 times faster than cvsup.
Not only that, from the output of "time" rsync took substantially less
user time, system time, and cpu percentage.

As long as the cvsup file collections are the same as the three
directories I rsync'd, this should be a fair one-to-one comparison.

Unless I am overlooking something obvious, I think I am going to stick 
with updating our repository via rsync :-).




Re: Futures - HAMMER comparison testing?

2008-01-18 Thread Matthew Dillon

:But - at the end of the day - how much [extra?] on-disk space will be 
:needed to insure mount 'as-of' is 'good enough' for some realistic span 
:(a week?, a month?)? 'Forever' may be too much to ask.

The amount of disk needed is precisely the same as the amount of
historical data (different from current data) that must be retained,
plus record overhead.

So it comes down to how much space you are willing to eat up to store
the history, and what kind of granularity you will want for the history.

:How close are we to being able to start predicting that storage-space 
:efficiency relative to ${some_other_fs}?
:
:Bill

Ultimately it will be extremely efficient simply by the fact that
there will be a balancer going through it and repacking it.

For the moment (and through the alpha release) it will be fairly
inefficient because it is using fixed 16K data records, even for small
files.  The on-disk format doesn't care... records can reference 
variable-length data from around 1MB down to 64 bytes.  But supporting
variable-length data requires implementing some overwrite cases that
I don't want to do right now.  This only applies to regular files
of course.  Directories store directory entries as records, not as data,
so directories are packed really nicely. 

e.g. if you have one record representing, say, 1MB of data, and you
write 64 bytes right smack in the middle of that, the write code will
have to take that one record, mark it as deleted, then create three
records to replace it (one pointing to the unchanged left portion of
the original data, one pointing to the 64 bytes of overwritten data,
and one pointing to the unchanged right portion of the original data).
The recovery and deletion code will also have to deal with that sort
of overlaid data situation.  I'm not going to be writing that
feature for a bit.  There are some quick hacks I can do too, for
small files, but it's not on my list prior to the alpha release.
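
Sketched out, the bookkeeping for that middle-of-the-record overwrite looks
roughly like the following.  This is purely an illustration of the split
described above -- hypothetical structure and function names, not HAMMER
code, and it assumes the write falls entirely inside the old record's range:

  #include <stdint.h>

  /* Hypothetical logical extent: the byte range of a file that one
     record's data covers, and where that data lives on disk. */
  struct example_extent {
          uint64_t file_off;   /* starting file offset */
          uint64_t len;        /* number of bytes referenced */
          uint64_t data_off;   /* on-disk location of those bytes */
  };

  /*
   * Overwriting [wr_off, wr_off + wr_len) inside an existing extent:
   * the caller marks the old record deleted (so the historical data
   * stays reachable) and emits up to three replacement extents.
   * Returns how many replacements were produced (1 to 3).
   */
  static int
  split_for_overwrite(const struct example_extent *old,
                      uint64_t wr_off, uint64_t wr_len,
                      uint64_t new_data_off,
                      struct example_extent out[3])
  {
          int n = 0;

          /* left piece: unchanged head of the original data */
          if (wr_off > old->file_off) {
                  out[n].file_off = old->file_off;
                  out[n].len      = wr_off - old->file_off;
                  out[n].data_off = old->data_off;
                  ++n;
          }
          /* middle piece: the newly written bytes */
          out[n].file_off = wr_off;
          out[n].len      = wr_len;
          out[n].data_off = new_data_off;
          ++n;
          /* right piece: unchanged tail of the original data */
          if (wr_off + wr_len < old->file_off + old->len) {
                  out[n].file_off = wr_off + wr_len;
                  out[n].len      = (old->file_off + old->len) -
                                    (wr_off + wr_len);
                  out[n].data_off = old->data_off +
                                    (wr_off + wr_len - old->file_off);
                  ++n;
          }
          return n;
  }

The recovery and deletion paths would then have to understand that the three
new records shadow the deleted one for current reads, while the old record
stays valid for historical (as-of) reads.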

Remember that HAMMER is designed for large filesystems which don't fill
up instantly.  Consequently it will operate under the assumption that
it can take its time to recover free space.  If one doesn't want to use
the history feature one can turn it off, of course, or use a very
granular retention policy.

My local backup system is currently using a 730GB UFS partition and it
is able to backup apollo, crater, and leaf with daily cpdups (using
the hardlink snapshot trick) going back about 3 months.  In fact, I
can only fill up that 730GB about half way because fsck runs out of
memory and fails once you get over around 50 million inodes (mostly
dependent on the number of directories you have)... on UFS that is.
I found that out the hard way.  It takes almost a day for fsck to
recover the filesystem even half full.  I'll be happy when I can throw
that old stuff away.

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>