Re: Plans for 1.8+ (2.0?)

2007-02-21 Thread Matthew Dillon

:Hey Matt,
:Thanks for your answers, but I have one more question for you. Will the 
:new file system be capable of ACLs?
:
:Cheers,
:Petr

It will be capable of storing meta-data records for certain, so yes.
However, the OS needs an ACL implementation to use the meta-data store
the filesystem makes available.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]


Re: Plans for 1.8+ (2.0?)

2007-02-20 Thread B. Estrade

On 2/19/07, Matthew Dillon [EMAIL PROTECTED] wrote:

I've been letting the conversation run to see what people have to say.
I am going to select this posting by Brett to answer, as well as
provide some more information.

:I am not sure I understand the potential aim of the new file system -
:is it to allow all nodes on the SSI (I purposefully avoid terms like
:grid) to have all local data actually on their hard drive or is it
:more like each node is aware of all data on the SSI, but the data may
:be scattered about all of the nodes on the SSI?

I have many requirements that need to be fulfilled by the new
filesystem.  I have just completed the basic design work and I feel
quite confident that I can have the basics working by our Summer
release.

- On-demand filesystem check and recovery.  No need to scan the entire
  filesystem before going live after a reboot.

- Infinite snapshots (e.g. on 30 second sync), ability to collapse
  snapshots for any time interval as a means of recovering space.

- Multi-master replication at the logical layer (not the physical layer),
  including the ability to self-heal a corrupted filesystem by accessing
  replicated data.  Multi-master means that each replicated store can
  act as a master and new filesystem ops can run independently on any
  of the replicated stores and will flow to the others.

- Infinite log Replication.  No requirement to keep a log of changes
  for the purposes of replication, meaning that replication targets can
  be offline for 'days' without affecting performance or operation.
  ('mobile' computing, but also replication over slow links for backup
  purposes and other things).

- 64 bit file space, 64 bit filesystem space.  No space restrictions
  whatsoever.

- Reliably handle data storage for huge multi-hundred-terabyte
  filesystems without fear of unrecoverable corruption.

- Cluster operation - ability to commit data to locally replicated
  store independently of other nodes, access governed by cache
  coherency protocols.

:So, in effect, is it similar in concept to the notion of storing bits
:of files across many places using some unified knowledge of where the
:bits are? This of course implies redundancy and creates synchronization
:problems to handle (assuming no global clock), but I certainly think
:it is a good goal.  In reality, how redundant will the data be?  In a
:practical sense, I think the principle of locality applies here -
:the pieces that make up large files will all be located very close to
:one another (aka, clustered around some single location).

Generally speaking the topology is up to the end-user.  The main
issue for me is that the type of replication being done here is
logical layer replication, not physical replication.  You can
think of it as running a totally independent filesystem for each
replication target, but the filesystems cooperate with each other
and cooperate in a clustered environment to provide a unified,
coherent result.

:From my experiences, one of the largest issues related to large scale
:computing is the movement of large files, but with the trend moving
:towards huge local disks and many-core architectures (which I agree
:with), I see the grid concept of geographically diverse machines
:connected as a single system being put to rest in favor of local
:clusters of many-core machines.
:...
:With that, the approach that DfBSD is taking is vital wrt distributed
:computing, but any hard requirement of moving huge files long
:distances (even if done so in parallel) might not be so great.  What
:is required is native parallel I/O that is able to handle locally
:distributed situations - because it is within a close proximity that
:many processes would be writing to a single file.  Reducing the
:scale of the problem may provide some clues into how it may be used
:and how it should handle the various situations effectively.
:...
:Additionally, the concept of large files somewhat disappears when you
:are talking about shipping off virtual processes to execute on some
:other processor or core because they are not shipped off with a whole
:lot of data to work on.  I know this is not necessarily a SSI concept,
:but one that DfBSD will have people wanting to do.
:
:Cheers,
:Brett

There are two major issues here:  (1) Where the large files reside
and (2) How much of that data running programs need to access.

For example, let's say you had a 16 gigabyte database.  There is a
big difference between scanning the entire 16 gigabytes of data
and doing a query that only has to access a few hundred kilobytes
of the data.

No matter what, you can't avoid reading data that a program insists
on reading.  If the data is not cacheable or the amount of data
being read is huge, the cluster has the choice of moving the program's
running context closer to the storage, or transferring the data over
the network.

mailing list etiquette (was: Re: Plans for 1.8+ (2.0?))

2007-02-20 Thread Simon 'corecode' Schubert

B. Estrade wrote:

On 2/19/07, Matthew Dillon [EMAIL PROTECTED] wrote:

[complete quote of matt's mail]


Thanks for that, Matt.


could people please try to follow common netiquette standards, especially 
trimming extensive quotes and avoiding top posting (obviously didn't happen in 
this case).  thanks.

--
Serve - BSD +++  RENT this banner advert  +++ASCII Ribbon   /\
Work - Mac  +++  space for low €€€ NOW!1  +++  Campaign \ /
Party Enjoy Relax   |   http://dragonflybsd.org  Against  HTML   \
Dude 2c 2 the max   !   http://golden-apple.biz   Mail + News   / \





Re: Plans for 1.8+ (2.0?)

2007-02-20 Thread Petr Janda
As in the design spec. It works on paper. I haven't started coding 
anything yet.

-Matt
	Matthew Dillon 
	[EMAIL PROTECTED]


  

Hey Matt,
Thanks for your answers, but I have one more question for you. Will the 
new file system be capable of ACLs?


Cheers,
Petr


Re: Plans for 1.8+ (2.0?)

2007-02-19 Thread Bill Hacker

Robert Luciani wrote:

*snip* (clustering discussion..)



Jokingly: I think the notion of functional individual computers
helping each other out sounds a bit like neourons in a brain. The
technological singularity is coming, nothing can stop it!


Oddly, they *are* - but not in the way theoreticians around the time 
'Metropolis' or even 'Brave New World' were written might have envisaged, i.e. 
neither centrally controlled nor even 'close coupled'.


The 'net, e-mail, file & media exchange, IRC... one could go on... seem to JFDI 
w/r all manner of pragmatic 'sharing' with one of the essential characteristics 
of their human users:


- For the most part, a little 'latency' is not unwelcome.

IOW 'Mañana' - response / gratification in minutes, if not hours, is usually as 
good as we could absorb anyway w/o becoming totally 'time-slaved' to the 
machinery that was intended to be servant - not master.


Specialized scientific applications are just that - specialized, and often 
warranting from-the-ground-up bespoke software - OS included.


Not to put too fine a point on it, but most of these need a very different core 
than (any of the) Unix anyway. Real-Time Exec's, self-monitoring, 
soft-fault-tolerance, etc.


Bill


Re: Plans for 1.8+ (2.0?)

2007-02-19 Thread Peter Serwe

Michel Talon wrote:

Of course it is none of my business, but i have always wondered about the
real usefulness of a clustering OS in the context of free systems, and your
post allows me to explain why. People who have the money to buy machines by
the thousands, run them, pay the electricity bill, etc. should also have
the money to pay $$$ to IBM, and not count on the generosity of unpaid
developers. Small installations are the natural target of free systems, and
in this context i remain convinced that the clustering ideas have a
utility next to null. And frankly, i doubt they have any utility for big
systems if you don't use high speed, low latency connects which are far
more expensive than the machines themselves. And even with this highly
expensive hardware, if you don't have high brain programmers able to really
make use of concurrency.
On the contrary, the disks of Joe User are becoming bigger and bigger, his
processor is getting more and more cores, so there is clearly a need for
file systems appropriate for big disks and sufficiently reliable ( ZFS
being an example ) and operating systems able to use multicores
efficiently.
  
Open source software in a business context is about business applications
that a small company can fire up, scale up, and run for the long haul.  A
lot of the 'generosity of unpaid developers' you refer to is actually
funded by the companies they work for, where working hours are left
available for the people to work on pieces of 'free' code while maintaining
a place to live, food in their stomachs, and a lifestyle of their choosing.

For very small companies, who might have a core team of innovative 
people, and don't want
VC financing to dictate the use of Micro$oft or other, shall we say, 
typically less functional,
less customizable, and less friendly platforms on the basis of 
'protecting the investment' or
funneling money into another investment, 'free' software is critical.  
The less software my
company has to use that is proprietary, closed source, and licen$ed at 
often ludicrous fees
to pay for a marketing arm and the CEO's Porsche, Ferrari, Bentley, or 
whatever, the more
money we have to hire people and pay $alaries.  Which gives us more time 
to innovate.


Lastly, where there's an application that a business needs, there's 
resources to develop it.


I can't count the number of projects that benefit from a given company 
doing customization
work, and then releasing the non-proprietary bits out via some sort of 
'free to use' license.


If it weren't for some notably large companies using open source 
software, open source
software wouldn't be nearly as far along as it is.  Academia can only 
take it so far.


Peter

--
Peter Serwe peter at infostreet dot com

http://www.infostreet.com

The only true sports are bullfighting, mountain climbing and auto racing. 
-Ernest Hemingway

Because everything else requires only one ball. -Unknown

Do you wanna go fast or suck? -Mike Kojima

There are two things no man will admit he cannot do well: drive and make 
love. -Sir Stirling Moss



Re: Plans for 1.8+ (2.0?)

2007-02-19 Thread Matthew Dillon
I've been letting the conversation run to see what people have to say.
I am going to select this posting by Brett to answer, as well as 
provide some more information.

:I am not sure I understand the potential aim of the new file system -
:is it to allow all nodes on the SSI (I purposefully avoid terms like
:grid) to have all local data actually on their hard drive or is it
:more like each node is aware of all data on the SSI, but the data may
:be scattered about all of the nodes on the SSI?

I have many requirements that need to be fulfilled by the new
filesystem.  I have just completed the basic design work and I feel
quite confident that I can have the basics working by our Summer
release.

- On-demand filesystem check and recovery.  No need to scan the entire
  filesystem before going live after a reboot.

- Infinite snapshots (e.g. on 30 second sync), ability to collapse 
  snapshots for any time interval as a means of recovering space.

- Multi-master replication at the logical layer (not the physical layer),
  including the ability to self-heal a corrupted filesystem by accessing
  replicated data.  Multi-master means that each replicated store can
  act as a master and new filesystem ops can run independently on any
  of the replicated stores and will flow to the others.

- Infinite log Replication.  No requirement to keep a log of changes
  for the purposes of replication, meaning that replication targets can
  be offline for 'days' without affecting performance or operation.
  ('mobile' computing, but also replication over slow links for backup
  purposes and other things).

- 64 bit file space, 64 bit filesystem space.  No space restrictions
  whatsoever.

- Reliably handle data storage for huge multi-hundred-terabyte
  filesystems without fear of unrecoverable corruption.

- Cluster operation - ability to commit data to locally replicated
  store independently of other nodes, access governed by cache
  coherency protocols.

:So, in effect, is it similar in concept to the notion of storing bits
:of files across many places using some unified knowledge of where the
:bits are? This of course implies redundancy and creates synchronization
:problems to handle (assuming no global clock), but I certainly think
:it is a good goal.  In reality, how redundant will the data be?  In a
:practical sense, I think the principle of locality applies here -
:the pieces that make up large files will all be located very close to
:one another (aka, clustered around some single location).

Generally speaking the topology is up to the end-user.  The main
issue for me is that the type of replication being done here is
logical layer replication, not physical replication.  You can
think of it as running a totally independent filesystem for each
replication target, but the filesystems cooperate with each other
and cooperate in a clustered environment to provide a unified,
coherent result.

:From my experiences, one of the largest issues related to large scale
:computing is the movement of large files, but with the trend moving
:towards huge local disks and many-core architectures (which I agree
:with), I see the grid concept of geographically diverse machines
:connected as a single system being put to rest in favor of local
:clusters of many-core machines.
:...
:With that, the approach that DfBSD is taking is vital wrt distributed
:computing, but any hard requirement of moving huge files long
:distances (even if done so in parallel) might not be so great.  What
:is required is native parallel I/O that is able to handle locally
:distributed situations - because it is within a close proximity that
:many processes would be writing to a single file.  Reducing the
:scale of the problem may provide some clues into how it may be used
:and how it should handle the various situations effectively.
:...
:Additionally, the concept of large files somewhat disappears when you
:are talking about shipping off virtual processes to execute on some
:other processor or core because they are not shipped off with a whole
:lot of data to work on.  I know this is not necessarily a SSI concept,
:but one that DfBSD will have people wanting to do.
:
:Cheers,
:Brett

There are two major issues here:  (1) Where the large files reside
and (2) How much of that data running programs need to access.

For example, let's say you had a 16 gigabyte database.  There is a
big difference between scanning the entire 16 gigabytes of data
and doing a query that only has to access a few hundred kilobytes
of the data.

No matter what, you can't avoid reading data that a program insists
on reading.  If the data is not cacheable or the amount of data
being read is huge, the cluster has the choice of moving the program's
running context closer to the storage, or transferring the data over
the network.
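
A rough way to picture that trade-off, purely as an illustration (the
struct, field names, and the naive size comparison below are hypothetical,
not anything from the actual design):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Toy cost model for the choice described above: either migrate the
 * process context to the node holding the data, or pull the data over
 * the network.  Real systems would also weigh latency, link speed and
 * cache residency.
 */
struct job {
    uint64_t bytes_needed;   /* data the program will actually read  */
    uint64_t context_size;   /* rough size of its runtime state      */
    bool     data_cacheable; /* can the reads be served from cache?  */
};

/* True if shipping the execution context looks cheaper than shipping
 * the data it wants to read. */
static bool
should_migrate_context(const struct job *j)
{
    if (j->data_cacheable)
        return false;        /* just read it remotely and let it cache */
    return j->bytes_needed > j->context_size;
}

int
main(void)
{
    /* The 16 gigabyte database example: full scan vs. small query. */
    struct job full_scan   = { 16ULL << 30, 64ULL << 20, false };
    struct job small_query = { 300ULL << 10, 64ULL << 20, true };

    printf("full scan:   migrate context = %d\n",
           should_migrate_context(&full_scan));
    printf("small query: migrate context = %d\n",
           should_migrate_context(&small_query));
    return 0;
}

Under that toy model the 16GB scan favors moving the execution context to
the data, while the few-hundred-kilobyte query does not.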
  

Re: Plans for 1.8+ (2.0?)

2007-02-19 Thread Matthew Dillon
:I think it's important to ask oneself these questions since it's a shame
:to waste time on something that nobody can ever appreciate. On the other
:hand, in Matt's and many others' vision clustered computing will perhaps
:be an integral part of the future, just like many-core processors will be
:in all personal computers of the future. Because of this, only now are
:people scrambling and trying to figure out how they can squeeze more juice
:out of their programs and operating systems in SMP environments. 

 Believe me, I think about this all the time.  I frankly have no idea
 whether 'DragonFly The OS' itself will survive the test of time, 
 but I guarantee you that everything we develop for 'DragonFly The OS',
 especially more portable entities such as filesystems and cluster
 protocols, *WILL*.

 The moment you leave the context of the operating system codebase and
 enter the context of userspace, you guarantee code survivability.

 This is, ultimately, what the DragonFly operating system is intended to
 support... the SYSLINK clustering protocol will allow all the major
 pieces to be moved into userland.  And, I will add, that the execution
 context piece can *ALREADY* be controlled by userland, with only a modest
 number of new system calls (DragonFly's VMSPACE_*() system calls)...
 our virtual kernel is proof of that.

:Jokingly: I think the notion of functional individual computers
:helping each other out sounds a bit like neurons in a brain. The
:technological singularity is coming, nothing can stop it!

Well, we can dream.  Unless the world self destructs, AI in all its
Sci-fi glory will become a reality.  It will happen in the next 80-200
years, most likely.  However, I won't be leading that particular project.
Hehe.  I'm more an infrastructure guy.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]


Re: Plans for 1.8+ (2.0?)

2007-02-19 Thread Justin C. Sherrill
On Mon, February 19, 2007 5:37 pm, Matthew Dillon wrote:

 I have many requirements that need to be fullfilled by the new
 filesystem.  I have just completed the basic design work and I feel
 quite confident that I can have the basics working by our Summer
 release.

How much is the basics? i.e. generally usable as the filesystem, or
available only as committed code?



Re: Plans for 1.8+ (2.0?)

2007-02-19 Thread Petr Janda


Hey Matt,
1) Does your filesystem plan include the ability to grow and shrink a 
partition/volume? i.e. /home is running out of space, so we could run 
shrinkfs ... on /usr, which has a lot of space, and growfs ... on /home.

2) Are you going to do away with the disklabel stuff and replace it with 
something better/easier to use?

3) Is vinum finally gonna die with the new filesystem? i.e. will the volume 
manager be integrated into the new file system, like ZFS?


Cheers,

Petr


Re: Plans for 1.8+ (2.0?)

2007-02-19 Thread Dmitri Nikulin

On 2/20/07, Petr Janda [EMAIL PROTECTED] wrote:

Hey Matt,
1) Does your filesystem plan include the ability to grow and shrink a
partition/volume? i.e. /home is running out of space, so we could run
shrinkfs ... on /usr, which has a lot of space, and growfs ... on /home.

2) Are you going to do away with the disklabel stuff and replace it with
something better/easier to use?

3) Is vinum finally gonna die with the new filesystem? i.e. will the volume
manager be integrated into the new file system, like ZFS?


Seems all of this is handled in one fell swoop with the ZFS design.
Neat 'disklabels' which are dynamically sizable and can be configured
over multiple volumes. And with very little limitation all around.

It's entirely possible to use the same foundations without using the
same file system itself, and this would be good for DragonFly even
before it supports the full ZFS.

---
Dmitri Nikulin

Centre for Synchrotron Science
Monash University
Victoria 3800, Australia


Re: Plans for 1.8+ (2.0?)

2007-02-19 Thread Rahul Siddharthan
Matthew Dillon wrote:
 Believe me, I think about this all the time.  I frankly have no idea
 whether 'DragonFly The OS' itself will survive the test of time,
 but I guarentee you that everything we develop for 'DragonFly The OS',
 especially more portable entities such as filesystems and cluster
 protocols, *WILL*.

 The moment you leave the context of the operating system codebase and
 enter the context of userspace, you guarentee code survivability.

But isn't there a lot of kernel infrastructure in DragonFly that you
have done, to allow this stuff to run in userspace?  So won't any
other operating system need to have that infrastructure too?  Or will
it be fairly straightforward to, say, run MattFS under FUSE on Linux?

There would certainly be great interest in the Linux world in a robust
filesystem with the features you describe and a BSD licence.  Uptake
of ZFS has been slow because its licence conflicts with the GPL, so it
can't be put in the kernel.  The other filesystems on Linux don't do
anything revolutionary.

I've been running Linux for a while now, since a sane distro
(Debian/Ubuntu) lets me focus on work rather than struggling with
ports/pkgsrc every time I want to install a package, but I seriously
want to install DragonFly on my next computer some weeks/months from
now... perhaps dual-booting with FreeBSD or NetBSD, so that I can
share my /home partition.

Rahul


Re: Plans for 1.8+ (2.0?)

2007-02-19 Thread Matthew Dillon

:
:On Mon, February 19, 2007 5:37 pm, Matthew Dillon wrote:
:
: I have many requirements that need to be fullfilled by the new
: filesystem.  I have just completed the basic design work and I feel
: quite confident that I can have the basics working by our Summer
: release.
:
:How much is the basics? i.e. generally usable as the filesystem, or
:available only as committed code?

As in the design spec.  It works on paper.  I haven't started coding
anything yet.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]


Re: Plans for 1.8+ (2.0?)

2007-02-19 Thread Matthew Dillon

:Hey Matt,
:1) Does your filesystem plan include the ability to grow and shrink a 
:partition/volume? i.e. /home is running out of space, so we could run 
:shrinkfs ... on /usr, which has a lot of space, and growfs ... on /home.

The filesystem's backing store will be segmented.  Segment size can 
range from 1MB to 4GB (ish).  A 'partition' would be able to hold
multiple segments from the filesystem's point of view, but the main
purpose of the segmentation is to create large, near-independent 
blocks of data which can be dealt with on a segment-by-segment basis
(e.g. for recovery, fsck/check, replication, growing, and shrinking
purposes).

Segmentation also means the filesystem's backing store is not 
restricted to a single block device but can be glued together
with several block devices, or even mixed-and-matched between
separate replicated data stores for recovery purposes.

So, yes, it will be possible to grow or shrink the filesystem on
a segment-by-segment basis.

:2) Are you going to do away with the disklabel stuff and replace it with 
:something better/easier to use?

Probably not in 2.0.  The disklabel still serves a purpose with
regards to mixing and matching different filesystems.

However, within the context of the new filesystem itself each
'segment' will be completely identified in its header so segments
belonging to different filesystems could commingle within one
disklabel partition.  The disklabel would simply say that the
storage is associated with the new filesystem but would not imply
that a particular partition would be associated with a mount 1:1
like they are currently.

This would effectively remove the partitioning requirement.  You would
just say how many segments you wanted each 'filesystem' to use, 
dynamically.  Growing is easy.  Shrinking would require a background
scan or temporary relocation of the affected segments but would
also be easy.

Since segments will self-identify in their header, the actual physical
location of a segment becomes irrelevant.

If you had 1TB of storage and 4GB segments the kernel would have to
do only 256 I/O's (reading the segment headers) to self-identify all
the segments and associate them with their filesystems.  Just as an
example.  Such a list would be cached, of course, but the point is
that for recovery purposes the OS would be able to regenerate the
list from scratch, given only access to the physical storage, with 
minimal delay.
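
As a rough sketch of that recovery scan (the header layout and names here
are invented for illustration -- the real on-disk format did not exist yet
-- but it shows how one read per segment is enough to rebuild the
segment-to-filesystem map):

#include <stdio.h>
#include <stdint.h>
#include <sys/types.h>

#define SEG_MAGIC   0x53454731U         /* hypothetical magic number */

/* Hypothetical self-identifying segment header. */
struct seg_header {
    uint32_t magic;                     /* marks a valid segment        */
    uint32_t seg_size_mb;               /* 1MB .. ~4GB, per the design  */
    uint64_t fs_uuid[2];                /* which filesystem owns it     */
    uint64_t seg_index;                 /* logical position in that fs  */
};

/*
 * Rebuild the segment -> filesystem association by reading one header
 * per segment.  With 4GB segments on 1TB of storage this loop issues
 * 256 reads, as in the example above.
 */
static int
scan_segments(FILE *disk, uint64_t disk_bytes, uint64_t seg_bytes)
{
    struct seg_header hdr;
    int found = 0;

    for (uint64_t off = 0; off + seg_bytes <= disk_bytes; off += seg_bytes) {
        if (fseeko(disk, (off_t)off, SEEK_SET) != 0)
            break;
        if (fread(&hdr, sizeof(hdr), 1, disk) != 1)
            break;
        if (hdr.magic != SEG_MAGIC)
            continue;                   /* unused or foreign space */
        /* record (hdr.fs_uuid, hdr.seg_index) -> offset 'off' here */
        found++;
    }
    return found;
}

Because each header names its owning filesystem, the physical position of
a segment never matters; growing, shrinking, or commingling segments from
several filesystems in one partition only changes which headers the scan
finds.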

:3) Is vinum finally gonna die with the new filesystem? i.e. will the volume 
:manager be integrated into the new file system, like ZFS?
:
:Cheers,
:
:Petr

I personally have never used vinum.  I never trusted the code enough
to use it... not so much the original code, but the fact that it has
gone unmaintained for so long a period of time.

But, yes, the new filesystem will have its own volume manager based
on the principle of self-identifying disk segments.

Note that I am not talking about RAID-5 here.  I'm talking about
replication topologies only.  I have no intention of supporting RAID-5
or other physical abstractions beyond pure replication at the logical
level.  This isn't to say that RAID-5 would not be supportable, only
that it would have to be implemented at the block level or the device
level rather than at the filesystem level.  The replication on the
other hand will be fully integrated into the filesystem.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]


Re: Plans for 1.8+ (2.0?)

2007-02-18 Thread Rupert Pigott
On Thu, 01 Feb 2007 09:39:30 -0500, Justin C. Sherrill wrote:

 Sort of.  I'm saying that if Matt rolls his own filesystem instead of
 using ZFS, that new filesystem is either:
 
 1: not going to have the variety of tools available with zfs for handling
 things like disk pooling/snapshots/data scrubbing/insert zfs term here.

Of course writing these things takes time, but from what I understand
of Matt's approach to this problem I think it will be possible to
leverage existing tools for most of the essential housekeeping operations.
This is a good thing, it means that people don't have to learn new stuff
to use the system. 

 
 2: going to have those features, which means Matt's time is going to be
 eaten up reimplementing features already present in other filesystems.
 

True, but Matt has explained that ZFS doesn't provide the functionality
that DragonFlyBSD needs for cluster computing.

ZFS solves the problem of building a bigger fileserver, but it
doesn't help you distribute that file system across hundreds or thousands
of grid nodes. ZFS doesn't address the issue of high-latency
comms links between nodes, and NFS just curls up and dies when you try to
run it across the Atlantic with 100+ms of latency.

I don't know if IBM's GridFS does any better with the latency, but it
certainly scales a lot better but the barrier for adoption is $$$. It
costs $$$ and it costs a lot more $$$ to train up and hire the SAs to run
it. There are other options like AFS too, but people tend to be put off by
the learning curve and the fact it's an extra rather than something that
is packaged with the OS.

Then there is plan C, the Grid Cache... Quite a few people are buying
stuff like Tangosol and Gemfire to solve the inadequacies of NFS/CIFS and
avoid paying for GridFS, or learning to use something like AFS. Not a
great option in my opinion because these Grid Caches have to provide all
the FS functionality AND they can't be managed with the existing tools
(eg: dd, ls, find, cp, mv, dump, restore etc).

I think that there is a need for a distributed FS suitable for Grids & 
Clusters that doesn't require $$$ of retraining to use. Matt's approach to
DragonFlyBSD seems to be aiming to fill that hole, and about time too ! :)

 It's a moot point until Matt can evaluate modifying existing filesystems
 vs building a new one, though.  I don't want NIH-ism to get in the way
 of having something neat, though

Port ZFS yourself, it still won't solve the problem of distributing
persistent data across several hundred nodes.

I looked at writing a BSD licensed clone for OpenBSD, but I realised that
it just won't help solve the networking problems posed by Grids & 
Clusters. I think that a filesystem that is built from the ground up to
work with SYSLINK will though. :)

I think that the real work lies in SYSLINK... How do you deal with node
failure, how to recover etc... What happens if the cluster gets split down
the middle ... Lots of tricky problems there that will probably take up
10x as much of Matt's time as writing a FS. :)

Cheers,
Rupert


Re: Plans for 1.8+ (2.0?)

2007-02-18 Thread Michael Neumann
Rupert Pigott wrote:

 On Thu, 01 Feb 2007 09:39:30 -0500, Justin C. Sherrill wrote:
 
 Sort of.  I'm saying that if Matt rolls his own filesystem instead of
 using ZFS, that new filesystem is either:
 
 1: not going to have the variety of tools available with zfs for handling
 things like disk pooling/snapshots/data scrubbing/insert zfs term here.
 
 Of course writing these things takes time, but from what I understand
 of Matt's approach to this problem I think it will be possible to
 leverage existing tools for most of the essential housekeeping operations.
 This is a good thing, it means that people don't have to learn new stuff
 to use the system.

I like the idea that Matt is writing a new filesystem (even if it takes a
lot of time to complete). As I don't plan to run a big cluster in the near
future (at least not on my laptop :), will this new filesystem be usable to
run on a single machine? Will it have advantages over ufs, e.g. dynamic
space allocation (as found in zfs)? Or would I need to use ufs for that and
the new one in a cluster environment?

There's another thing this new filesystem could solve: Easy incremental
backups (especially of mobile computers). Imagine you go out with your
laptop, work, and come back home. As your laptop could get stolen the next
time you go out, or it might get destroyed etc., it would be nice to just
plug in at home and run an incremental backup of the filesystem (well it
can be done with cpdup/rsync as well, but you'd lose history of changes).

I think it should already be possible with the journaling stuff Matt added,
just by buffering the journal on the hard disk.


Regards,

  Michael


Re: Plans for 1.8+ (2.0?)

2007-02-18 Thread Simon 'corecode' Schubert

Rupert Pigott wrote:

I don't know if IBM's GridFS does any better with the latency, but it
certainly scales a lot better but the barrier for adoption is $$$. It
costs $$$ and it costs a lot more $$$ to train up and hire the SAs to run
it. There are other options like AFS too, but people tend to be put off by
the learning curve and the fact it's an extra rather than something that
is packaged with the OS.


Do you happen to have links to GridFS and other systems you mentioned?

cheers
 simon

--
Serve - BSD +++  RENT this banner advert  +++ASCII Ribbon   /\
Work - Mac  +++  space for low €€€ NOW!1  +++  Campaign \ /
Party Enjoy Relax   |   http://dragonflybsd.org  Against  HTML   \
Dude 2c 2 the max   !   http://golden-apple.biz   Mail + News   / \





Re: Plans for 1.8+ (2.0?)

2007-02-18 Thread Michel Talon
Rupert Pigott wrote:

 On Thu, 01 Feb 2007 09:39:30 -0500, Justin C. Sherrill wrote:

 True, but Matt has explained that ZFS doesn't provide the functionality
 that DragonFlyBSD needs for cluster computing.
 
 ZFS solves the problem of building a bigger fileserver, but it
 doesn't help you distribute that file system across hundreds or thousands
 of grid nodes. ZFS doesn't address the issue of high-latency
 comms links between nodes, and NFS just curls up and dies when you try to
 run it across the Atlantic with 100+ms of latency.
 
 I don't know if IBM's GridFS does any better with the latency, but it
 certainly scales a lot better but the barrier for adoption is $$$. It
 costs $$$ and it costs a lot more $$$ to train up and hire the SAs to run
 it. There are other options like AFS too, but people tend to be put off by
 the learning curve and the fact it's an extra rather than something that
 is packaged with the OS.

Of course it is none of my business, but i have always wondered about the
real usefulness of a clustering OS in the context of free systems, and your
post allows me to explain why. People who have the money to buy machines by
the thousands, run them, pay the electricity bill, etc. should also have
the money to pay $$$ to IBM, and not count on the generosity of unpaid
developers. Small installations are the natural target of free systems, and
in this context i remain convinced that the clustering ideas have a
utility next to null. And frankly, i doubt they have any utility for big
systems if you don't use high speed, low latency connects which are far
more expensive than the machines themselves. And even with this highly
expensive hardware, if you don't have high brain programmers able to really
make use of concurrency.
On the contrary, the disks of Joe User are becoming bigger and bigger, his
processor is getting more and more cores, so there is clearly a need for
file systems appropriate for big disks and sufficiently reliable ( ZFS
being an example ) and operating systems able to use multicores
efficiently.



Re: Plans for 1.8+ (2.0?)

2007-02-18 Thread B. Estrade

On 2/18/07, Michel Talon [EMAIL PROTECTED] wrote:

Rupert Pigott wrote:

 On Thu, 01 Feb 2007 09:39:30 -0500, Justin C. Sherrill wrote:

 True, but Matt has explained that ZFS doesn't provide the functionality
 that DragonFlyBSD needs for cluster computing.

 ZFS solves the problem of building a bigger fileserver, but it
 doesn't help you distribute that file system across hundreds or thousands
 of grid nodes. ZFS doesn't address the issue of high-latency
 comms links between nodes, and NFS just curls up and dies when you try to
 run it across the Atlantic with 100+ms of latency.

 I don't know if IBM's GridFS does any better with the latency, but it
 certainly scales a lot better but the barrier for adoption is $$$. It
 costs $$$ and it costs a lot more $$$ to train up and hire the SAs to run
 it. There are other options like AFS too, but people tend to be put off by
 the learning curve and the fact it's an extra rather than something that
 is packaged with the OS.

Of course it is none of my business, but i have always wondered about the
real usefulness of a clustering OS in the context of free systems, and your
post allows me to explain why. People who have the money to buy machines by
the thousands, run them, pay the electricity bill, etc. should also have
the money to pay $$$ to IBM, and not count on the generosity of unpaid
developers. Small installations are the natural target of free systems, and
in this context i remain convinced that the clustering ideas have a
utility next to null. And frankly, i doubt they have any utility for big
systems if you don't use high speed, low latency connects which are far
more expensive than the machines themselves. And even with this highly
expensive hardware, if you don't have high brain programmers able to really
make use of concurrency.
On the contrary, the disks of Joe User are becoming bigger and bigger, his
processor is getting more and more cores, so there is clearly a need for
file systems appropriate for big disks and sufficiently reliable ( ZFS
being an example ) and operating systems able to use multicores
efficiently.



Sorry for the dup post if the other came through - sent it from the
wrong addy ... yada yada (won't happen again, I hope ;)...

I am not sure I understand the potential aim of the new file system -
is it to allow all nodes on the SSI (I purposefully avoid terms like
grid) to have all local data actually on their hard drive or is it
more like each node is aware of all data on the SSI, but the data may
be scattered about all of the nodes on the SSI?

So, in effect, is it similar in concept to the notion of storing bits
of files across many places using some unified knowledge of where the
bits are? This of course implies redundancy and creates synchronization
problems to handle (assuming no global clock), but I certainly think
it is a good goal.  In reality, how redundant will the data be?  In a
practical sense, I think the principle of locality applies here -
the pieces that make up large files will all be located very close to
one another (aka, clustered around some single location).


From my experiences, one of the largest issues related to large scale
computing is the movement of large files, but with the trend moving
towards huge local disks and many-core architectures (which I agree
with), I see the grid concept of geographically diverse machines
connected as a single system being put to rest in favor of local
clusters of many-core machines.

With that, the approach that DfBSD is taking is vital wrt distributed
computing, but any hard requirement of moving huge files long
distances (even if done so in parallel) might not be so great.  What
is required is native parallel I/O that is able to handle locally
distributed situations - because it is within a close proximity that
many processes would be writing to a single file.  Reducing the
scale of the problem may provide some clues into how it may be used
and how it should handle the various situations effectively.

Additionally, the concept of large files somewhat disappears when you
are talking about shipping off virtual processes to execute on some
other processor or core because they are not shipped off with a whole
lot of data to work on.  I know this is not necessarily a SSI concept,
but one that DfBSD will have people wanting to do.

Cheers,
Brett
--
AIM: bz743
Desk @ LONI/HPC:
225.578.1920


Re: Plans for 1.8+ (2.0?)

2007-02-18 Thread Steve O'Hara-Smith
On Sun, 18 Feb 2007 14:25:57 +0100
Michel Talon [EMAIL PROTECTED] wrote:

 Of course it is none of my business, but i have always wondered about the
 real usefulness of a clustering OS in the context of free systems,

snip

 Small installations are the natural target of free
 systems, and in this context i remain convinced that the clustering ideas
 have a utility next to null.

Well personally I can see uses for a clustering OS mostly for the
purpose of minimising the impact of hardware failures so that instead of
running each service and environment I need on a different machine I can
run them on different clusters supported by redundant resources on several
machines. Right now I mirror data among machines to prevent loss but if one
machine is down then everything I do on that machine is unavailable until
I either replace that machine or bring its services and so forth up on
another machine. With clusters providing an abstraction between machines
and computing environments and bringing some degree of redundancy I hope to
be able to do better.

Spreading one big environment over a huge number of machines is
indeed something that those with money to burn can do, spreading several
small environments over a few machines is something that would be nice to
be able to do.

-- 
C:WIN  |   Directable Mirror Arrays
The computer obeys and wins.| A better way to focus the sun
You lose and Bill collects. |licences available see
|http://www.sohara.org/


Re: Plans for 1.8+ (2.0?)

2007-02-18 Thread Bill Hacker

Simon 'corecode' Schubert wrote:

Rupert Pigott wrote:

I don't know if IBM's GridFS does any better with the latency, but it
certainly scales a lot better but the barrier for adoption is $$$. It
costs $$$ and it costs a lot more $$$ to train up and hire the SAs to run
it. There are other options like AFS too, but people tend to be put 
off by

the learning curve and the fact it's an extra rather than something that
is packaged with the OS.


Do you happen to have links to GridFS and other systems you mentioned?

cheers
 simon



Google, surprisingly, did not find Big Blue's 'RedBook' fingerprints so much as 
University work, some with Government funding (US, Chinese, Brazilian, French, et 
al):


Overview with more links:

http://www.scl.ameslab.gov/Projects/Infrastructure/gridafs.html

a (free) .pdf:

http://www.slac.stanford.edu/econf/C0303241/proc/papers/THAT005.PDF



Standards bodies (fee):

http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?isnumber=34198&arnumber=1630912&count=73&index=25

http://doi.ieeecomputersociety.org/10.1109/CCGRID.2006.141



Other (fee) publications

http://www.springerlink.com/content/5xpry6xu0nnwrwcd/

http://www.springerlink.com/content/rdrdbu6pgeanxgqu/


- most citations indicate implementations that appear to rely heavily on AFS 
legacy.


Also of interest 'Distributed Shared Memory':

http://perso.ens-lyon.fr/laurent.lefevre/dsm2006/


But a brief scan of those that were 'free' brings up the question:

'Just who is it that actually NEEDS this anyway?'

Bill Hacker


Re: Plans for 1.8+ (2.0?)

2007-02-18 Thread Rupert Pigott
On Sun, 18 Feb 2007 13:23:38 +0100, Simon 'corecode' Schubert wrote:

 Rupert Pigott wrote:
 I don't know if IBM's GridFS does any better with the latency, but it
 certainly scales a lot better but the barrier for adoption is $$$. It
 costs $$$ and it costs a lot more $$$ to train up and hire the SAs to run
 it. There are other options like AFS too, but people tend to be put off by
 the learning curve and the fact it's an extra rather than something that
 is packaged with the OS.
 
 Do you happen to have links to GridFS and other systems you mentioned?

IBM call it GPFS... the General Parallel File System

http://www-03.ibm.com/systems/clusters/software/gpfs.html

Regards,
Rupert


Re: Plans for 1.8+ (2.0?)

2007-02-18 Thread Rupert Pigott
On Sun, 18 Feb 2007 14:25:57 +0100, Michel Talon wrote:

 Rupert Pigott wrote:
 
 On Thu, 01 Feb 2007 09:39:30 -0500, Justin C. Sherrill wrote:
 
 True, but Matt has explained that ZFS doesn't provide the functionality
 that DragonFlyBSD needs for cluster computing.
 
 ZFS solves the problem of building a bigger fileserver, but it
 doesn't help you distribute that file system across hundreds or thousands
 of grid nodes. ZFS doesn't address the issue of high-latency
 comms links between nodes, and NFS just curls up and dies when you try to
 run it across the Atlantic with 100+ms of latency.
 
 I don't know if IBM's GridFS does any better with the latency, but it
 certainly scales a lot better but the barrier for adoption is $$$. It
 costs $$$ and it costs a lot more $$$ to train up and hire the SAs to run
 it. There are other options like AFS too, but people tend to be put off by
 the learning curve and the fact it's an extra rather than something that
 is packaged with the OS.
 
 Of course it is none of my business, but i have always wondered about the
 real usefulness of a clustering OS in the context of free systems, and your
 post allows me to explain why. 

Here's one reason : Free project wants to crack a particular problem that
needs massive amounts of cycles and/or IO bandwidth but no individual
can afford to run a datacentre. A distributed compile farm would fit
that bill.

 People who have the money to buy machines by
 the thousands, run them, pay the electricity bill, etc. should also have
 the money to pay $$$ to IBM, and not count on the generosity of unpaid
 developers. Small installations are the natural target of free systems,

Doesn't always happen that way. Quite frequently internal politics, short
sightedness, NIH, budget battles etc get in the way.

 and in this context i remain convinced that the clustering ideas have a
 utility next to null. And frankly, i doubt they have any utility for big

This would be an enabling technology. Big business doesn't innovate, the
little guys do the innovation. I didn't see big business amongst the early
adopters of Ethernet, TCP/IP, UNIX etc..

 systems if you don't use high speed, low latency connects which are far
 more expensive than the machines themselves. And even with this highly

Tell that to the folks who crack codes. Low latency is highly desirable
but it isn't essential for all problems... Render farms and compile
farms are good examples.

 expensive hardware, if you don't have high brain programmers able to
 really make use of concurrency.

They help, but they aren't essential. There are a surprising number of
problems out there that can be cracked in a dumb way. :)

 On the contrary, the disks of Joe User are becoming bigger and bigger,
 his processor is getting more and more cores, so there is clearly a need
 for file systems appropriate for big disks and sufficiently reliable (
 ZFS being an example ) and operating systems able to use multicores
 efficiently.

I suspect that smaller slower cores are on the agenda for the great
unwashed masses. I am one of those people who thinks the days of the foot
warming tower case are numbered. Laptops, PDAs and Game Consoles already
out-ship desktops by a few orders of magnitude, I don't see that trend
swinging back the other way anytime soon.

I think you have also missed a point here. Applications like SETI just
weren't possible without the Grid concept (funding) - and people
really do want to do that kind of stuff. Sure you and I might question the
utility of it, but the fact is it gave those guys a shot at doing
something way beyond their budget *without* having to resort to exotic
hardware or software.

For the record I cut my Parallel Processing teeth on OCCAM & Transputers.
This Grid stuff is neanderthal by comparison, but I have seen people
get real work out of it, and I can see a bunch of folks out there who
could also find it useful... Perhaps in the future you could contribute
your unused cycles & storage to web-serving & compiling for the DFly
project. I wouldn't mind that. :)

Cheers,
Rupert


Re: Plans for 1.8+ (2.0?)

2007-02-18 Thread talon
Bill Hacker wrote:

 Simon 'corecode' Schubert wrote:
 Rupert Pigott wrote:
 
 
 But a brief scan of those that were 'free' brings up the question:
 
 'Just who is it that actually NEEDS this anyway?'
 
 Bill Hacker

Well Rupert Pigott gave some pretty convincing explanations of the
usefulness of the concept. Personnally i see the problem from the other
side, i am a physicist in a lab which has a cluster, the lab below us also
has a cluster. Besides pissing contests, who has the biggest, i have hard
time finding the real usefulness of this stuff. Basically the cluster is
used by people as a collection of independent computers running independent
computations. Probably few of these guys if any have any notion of
concurrent programming. On the other hand i know an american physicist who
has a cluster and does real clustered computations on it, in fact QCD
computations on the lattice. This is the work of his life, he has learnt
MPI and other hard stuff to exploit parallelism. People like that are so
rare, and can get funding both for the hardware and software, that this
doesn't justify, in my opinion, lots of efforts from free software
developers. Of course developers develop what they like, i have absolutely
nothing against that. People see advantages in distributed filesystems.
Now AFS is 30 years old, its rejuvenated version ARLA is 10 years old, who
is really using that in the real world? Of course i don't know much but i
don't know a single installation using it. This says a lot about the
usefulness or the necessity of these concepts.


-- 
Michel Talon


Re: Plans for 1.8+ (2.0?)

2007-02-01 Thread Dmitri Nikulin

On 2/1/07, ricardo [EMAIL PROTECTED] wrote:

On Wed, 31 Jan 2007 21:35:42 -0500 (EST)
Justin C. Sherrill [EMAIL PROTECTED] wrote:

 On Wed, January 31, 2007 3:18 pm, Matthew Dillon wrote:
 

  I am seriously considering our options with regards to ZFS or a
  ZFS-like filesystem.  We clearly need something to replace UFS,
  but I am a bit worried that porting ZFS would be as much work
  as simply designing a new filesystem from scratch.

 One of the reasons people are so excited about ZFS is because it
 solves the problem of managing space.  Disk management is and has
 always been a pain in the rear, and ZFS goes a long way toward
 reducing that.

 While constructing a new filesystem will help your goals, it will also
 mean that DragonFly users miss out on having all the other advantages
 that come with ZFS.  Put another way, we're going to lose planned
 functionality.

  You're implying that ZFS=God, in other words, you're implying that
there could be no better FS than ZFS. A very obnoxious statement!


That's not his point. He means that ZFS, while very good at what it
is, would not be optimal for transparent clustering. And a file system
which is designed for clustering won't necessarily be as good as ZFS
on single machines. Either way, some use cases become sub-optimal,
and it's a choice of what's more important to do first.

ZFS is optimized all the way down to avoiding byte swapping with a
simple but adequate endian adaptiveness technique, and being as new
as it is, it still has a few years worth of optimization potential.
It's definitely not going to perform as well on DragonFly as it does
on Solaris for a long time, but it could still be better than UFS by
design alone. Any optimization over that is just a bonus.
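
For readers unfamiliar with the trick, the general idea -- write data in
the writer's native byte order, tag it, and swap on read only when the
reader's order differs -- can be sketched roughly as follows.  This is a
simplified illustration with invented names, not ZFS code:

#include <stdint.h>

enum byteorder { BO_LITTLE = 0, BO_BIG = 1 };

/* Detect the byte order this host uses. */
static enum byteorder
native_order(void)
{
    const uint16_t probe = 1;
    return *(const uint8_t *)&probe ? BO_LITTLE : BO_BIG;
}

/* Plain 64-bit byte swap. */
static uint64_t
bswap64(uint64_t v)
{
    v = ((v & 0x00ff00ff00ff00ffULL) << 8)  | ((v >> 8)  & 0x00ff00ff00ff00ffULL);
    v = ((v & 0x0000ffff0000ffffULL) << 16) | ((v >> 16) & 0x0000ffff0000ffffULL);
    return (v << 32) | (v >> 32);
}

struct tagged_block {
    uint8_t  order;      /* byte order the writer used  */
    uint64_t payload;    /* stored as-is, in that order */
};

/* Swap only when the reader's byte order differs from the writer's. */
static uint64_t
read_payload(const struct tagged_block *b)
{
    return (b->order == native_order()) ? b->payload : bswap64(b->payload);
}

The common case, where reader and writer share a byte order, never pays
for a swap; only cross-endian imports do.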

On the other hand, I'm not convinced there's a need to make a new
filesystem just for clustering, not just yet anyway. How about 9P?
It's not like clustering is a brand-new problem; it's had decades of
research applied and there is no shortage of work to reference until
it's practical to attempt to do better.

---
Dmitri Nikulin

Centre for Synchrotron Science
Monash University
Victoria 3800, Australia


Re: Plans for 1.8+ (2.0?)

2007-02-01 Thread Bill Hacker

Matthew Dillon wrote:

:Besides the finalization of vkernel, what else can we expect into 2.0? There 
:are many long-awaited (not only by me) features and additions:
:- ZFS

I am seriously considering our options with regards to ZFS or a
ZFS-like filesystem.  We clearly need something to replace UFS,
but I am a bit worried that porting ZFS would be as much work
as simply designing a new filesystem from scratch.

One big advantage of a from-scratch design is that I would be
able to address the requirements of a clustered operating system
in addition to the requirements of multi-terabyte storage media.


Tilt.

I thot that was one of the design goals of ZFS?

Would it not make sense also to look again at the pioneering work done in 
Plan 9?

'Clustering' per se is not new - only the DFLY approach to same.

Bill



Re: Plans for 1.8+ (2.0?)

2007-02-01 Thread Justin C. Sherrill
On Thu, February 1, 2007 3:20 am, Dmitri Nikulin wrote:

 That's not his point. He means that ZFS, while very good at what it
 is, would not be optimal for transparent clustering. And a file system
 which is designed for clustering won't necessarily be as good as ZFS
 on single machines. Either way, some use cases becomes sub-optimal,
 and it's a choice of what's more important to do first.

Sort of.  I'm saying that if Matt rolls his own filesystem instead of
using ZFS, that new filesystem is either:

1: not going to have the variety of tools available with zfs for handling
things like disk pooling/snapshots/data scrubbing/insert zfs term here.

2: going to have those features, which means Matt's time is going to be
eaten up reimplementing features already present in other filesystems.

It's a moot point until Matt can evaluate modifying existing filesystems
vs building a new one, though.  I don't want NIH-ism to get in the way of
having something neat, though




Re: Plans for 1.8+ (2.0?)

2007-02-01 Thread Simon 'corecode' Schubert

Justin C. Sherrill wrote:

It's a moot point until Matt can evaluate modifying existing filesystems
vs building a new one, though.  I don't want NIH-ism to get in the way of
having something neat, though


Yah.  I think porting ZFS and possibly inventing a new FS or pimping up ZFS can 
run in parallel and thus ZFS _should_ be done.

cheers
 simon

--
Serve - BSD +++  RENT this banner advert  +++ASCII Ribbon   /\
Work - Mac  +++  space for low €€€ NOW!1  +++  Campaign \ /
Party Enjoy Relax   |   http://dragonflybsd.org  Against  HTML   \
Dude 2c 2 the max   !   http://golden-apple.biz   Mail + News   / \





Re: Plans for 1.8+ (2.0?)

2007-02-01 Thread Matthieu Guéguen

On 2/1/07, Simon 'corecode' Schubert [EMAIL PROTECTED] wrote:

Yah.  I think porting ZFS and possibly inventing a new FS or pimping up ZFS can 
run in parallel and thus ZFS _should_ be done.

cheers
  simon



Yes, I second that. Maybe ZFS could be improved to handle the problems
Matt listed. And I think it could be easier than creating a whole new
filesystem (and testing this new FS, tuning it, writing tools for it...).

By the way, congrats to all DragonFly developers for this 1.8 release!


Re: Plans for 1.8+ (2.0?)

2007-02-01 Thread Chris Csanady

2007/1/31, Matthew Dillon [EMAIL PROTECTED]:


I am seriously considering our options with regards to ZFS or a
ZFS-like filesystem.  We clearly need something to replace UFS,
but I am a bit worried that porting ZFS would be as much work
as simply designing a new filesystem from scratch.


It is worth noting that Sun is looking at extending ZFS to be a
cluster-aware filesystem.  If you dig through their mailing list
archives, you will see that it is a topic that pops up every now and
then.

In any case, I feel that it would be best to port ZFS, even if you
intend to create a new filesystem.  It is a great local filesystem,
and it will offer compatibility with Solaris, MacOS, and FreeBSD (and
probably Linux, once it is relicensed).  It seems a waste not to take
advantage of Sun's efforts, especially since the code is so
portable--in fact, almost all of the OS-dependent bits are in a single
file.

Pawel Jakub Dawidek made very rapid progress on the FreeBSD port.
Considering that DragonFly now has a virtual kernel and much simpler
VFS, the project should be vastly easier.  If you were to work on it,
I wouldn't be surprised if you could finish the core of the work in a
weekend.  Probably the most time-consuming part will be interfacing
with the device layer; things like supporting EFI labels,
automatically discovering disks, and so forth.

They even have a porting guide if you are interested:

 http://www.opensolaris.org/os/community/zfs/porting


One big advantage of a from-scratch design is that I would be
able to address the requirements of a clustered operating system
in addition to the requirements of multi-terabyte storage media.


Even with a from-scratch design, ZFS is well worth careful
examination.  There are many things it does very well, and
re-implementing even a fraction of its features would be very
time-consuming.  In the meantime, it would be good to have ZFS.

The one part of it that I think could be handled better is the
inflexibility of the redundancy.  It would be nice to specify
redundancy per-dataset, and not be tied to the underlying static vdev
redundancy.  RAIDZ is also a bit inflexible itself; it would be great
to throw arbitrarily sized disks into a pool and not have to worry
about the layout at all, and to distribute data blocks and recovery
blocks (much like par2 does) across machines.  Full 3-way mirroring is
quite expensive, but would be necessary over a WAN.  The current
limitations, though, seem to be the result of a compromise, considering
that this is a very difficult problem.
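
To make the recovery-block idea concrete, here is a minimal sketch of my
own (not how par2 or ZFS actually implement it): plain XOR parity, which
tolerates a single lost block; par2 layers Reed-Solomon coding on the
same idea so that any k lost blocks can be rebuilt from k recovery
blocks.

    #include <stddef.h>
    #include <string.h>

    /*
     * One "recovery block" computed as XOR parity over N equal-sized
     * data blocks.  Any single lost data block can be rebuilt by
     * XORing the parity block with the surviving data blocks.
     */
    static void
    make_parity(unsigned char *parity, unsigned char *const data[],
                size_t nblocks, size_t blksize)
    {
        size_t i, j;

        memset(parity, 0, blksize);
        for (i = 0; i < nblocks; ++i)
            for (j = 0; j < blksize; ++j)
                parity[j] ^= data[i][j];
    }

    /* Rebuild one lost block from the parity block and the survivors. */
    static void
    rebuild_lost(unsigned char *lost, const unsigned char *parity,
                 unsigned char *const survivors[], size_t nsurvivors,
                 size_t blksize)
    {
        size_t i, j;

        memcpy(lost, parity, blksize);
        for (i = 0; i < nsurvivors; ++i)
            for (j = 0; j < blksize; ++j)
                lost[j] ^= survivors[i][j];
    }

Scatter the data blocks and the recovery block(s) across machines and you
get exactly the sort of tunable, per-dataset redundancy I am wishing for
above.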

Finally, I think that the network filesystem is the single largest
gaping hole in modern operating systems.  All of the commonly
available systems are absolutely awful, and I have been anticipating
DragonFly's CCMS.  It seems that with this and the VFS messaging work,
it should be almost trivial to create a fast and solid remote
filesystem.  That said, the very paradigm of the network filesystem
should probably be tossed in favor of the clusterable filesystem which
I imagine you have in mind.

Chris


Re: Plans for 1.8+ (2.0?)

2007-02-01 Thread Bill Hacker

Chris Csanady wrote:

Very well-thought-out post in re ZFS. Thanks!

I'd only add that porting one or more 'foreign' fs in general seems to be a good 
idea - it is bound to show up things not yet covered well.


In all of the published comparison tests, I have never seen a single 'always 
best' fs anyway.


Pre-ZFS, though, JFS and XFS were consistently 'nearly always in the top 3' - IOW 
pretty good all-around *compromises*.


And therein lies the rub:  One man's needs for optimization differ from the next's.

But one thing is *for sure*: NFS and SMBFS and others (TVFS, Andrew...) have 
'problems' of one sort or another.


So *anything* that makes for better shared storage - even if it must rely on 
nothing slower than gig-E or 10-Gig-E to be at its best - is a plus.


...shared multi-host SCSI RAID controllers being rudely rare and expensive...

;-)

Bill



Re: Plans for 1.8+ (2.0?)

2007-01-31 Thread Matthew Dillon

:Besides the finalization of vkernel, what else can we expect in 2.0? There 
are many long-awaited (not only by me) features and additions:
:- ZFS

I am seriously considering our options with regards to ZFS or a
ZFS-like filesystem.  We clearly need something to replace UFS,
but I am a bit worried that porting ZFS would be as much work
as simply designing a new filesystem from scratch.

One big advantage of a from-scratch design is that I would be
able to address the requirements of a clustered operating system
in addition to the requirements of multi-terabyte storage media.

With vertical recording, hard drives are set to exceed 1TB a platter
either this year or next year.

:- updated installer: fixed web based installation etc.
:- updated PF
:- getting the network stack (and others) out of the BGL
:- AMD64 port.

I am hoping other developers take up the ball on these items.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]

:Note that this is not meant to be demanding at all; I am just interested, 
because there is no clear roadmap. I would just like to see the developers' 
plans.
:
:And don't forget: I am here to test. ;-)
:
:-- 
:Gergo Szakal [EMAIL PROTECTED]
:University Of Szeged, HU
:Faculty Of General Medicine



Re: Plans for 1.8+ (2.0?)

2007-01-31 Thread Noah yan

Hi Matt,

First, congratulations on your 1.8 release - solid and tremendous
progress.  Some questions (hopefully not too far from the scope :) ...

On 1/31/07, Matthew Dillon [EMAIL PROTECTED] wrote:


:Besides the finalization of vkernel, what else can we expect in 2.0? There 
are many long-awaited (not only by me) features and additions:
:- ZFS

I am seriously considering our options with regards to ZFS or a
ZFS-like filesystem.  We clearly need something to replace UFS,
but I am a bit worried that porting ZFS would be as much work
as simply designing a new filesystem from scratch.

One big advantage of a from-scratch design is that I would be
able to address the requirements of a clustered operating system
in addition to the requirements of multi-terabyte storage media.

Is a parallel file system within this scope, or is it a requirement you
have for the cluster OS?  What would be the advantage of designing a new
one compared to current and existing ones, considering the advantages
the dfly kernel has over other kernels for an SSI OS?  Put another way,
from what we can predict now, would a new fs that takes advantage of
those kernel strengths be superior to the others?

Maybe too many questions - I hope they are all relevant.

Noah





With vertical recording, hard drives are set to exceed 1TB a platter
either this year or next year.

:- updated installer: fixed web based installation etc.
:- updated PF
:- getting the network stack (and others) out of the BGL
:- AMD64 port.

I am hoping other developers take up the ball on these items.

-Matt
Matthew Dillon
[EMAIL PROTECTED]

:Note that this is not meant to be demanding at all; I am just interested, 
because there is no clear roadmap. I would just like to see the developers' 
plans.
:
:And don't forget: I am here to test. ;-)
:
:--
:Gergo Szakal [EMAIL PROTECTED]
:University Of Szeged, HU
:Faculty Of General Medicine




Re: Plans for 1.8+ (2.0?)

2007-01-31 Thread Justin C. Sherrill
On Wed, January 31, 2007 3:18 pm, Matthew Dillon wrote:


 I am seriously considering our options with regards to ZFS or a
 ZFS-like filesystem.  We clearly need something to replace UFS,
 but I am a bit worried that porting ZFS would be as much work
 as simply designing a new filesystem from scratch.

One of the reasons people are so excited about ZFS is because it solves
the problem of managing space.  Disk management is and has always been a
pain in the rear, and ZFS goes a long way toward reducing that.

While constructing a new filesystem will help your goals, it will also
mean that DragonFly users miss out on having all the other advantages that
come with ZFS.  Put another way, we're going to lose planned
functionality.



Re: Plans for 1.8+ (2.0?)

2007-01-31 Thread ricardo
On Wed, 31 Jan 2007 21:35:42 -0500 (EST)
Justin C. Sherrill [EMAIL PROTECTED] wrote:

 On Wed, January 31, 2007 3:18 pm, Matthew Dillon wrote:
 
 
  I am seriously considering our options with regards to ZFS or a
  ZFS-like filesystem.  We clearly need something to replace UFS,
  but I am a bit worried that porting ZFS would be as much work
  as simply designing a new filesystem from scratch.
 
 One of the reasons people are so excited about ZFS is because it
 solves the problem of managing space.  Disk management is and has
 always been a pain in the rear, and ZFS goes a long way toward
 reducing that.
 
 While constructing a new filesystem will help your goals, it will also
 mean that DragonFly users miss out on having all the other advantages
 that come with ZFS.  Put another way, we're going to lose planned
 functionality.

  You're implying that ZFS=God; in other words, you're implying that
there could be no better FS than ZFS. A very obnoxious statement!

 
 


-- 
[EMAIL PROTECTED]


Was: Plans for 1.8+ (2.0?) Now: Filesystem support?

2007-01-31 Thread Peter Serwe
There's a huge niche that desperately needs to be filled for systems 
that have huge
numbers of small files.  ReiserFS went some of the way towards doing 
that, but at
this point has pretty much officially flopped, and still has huge 
issues, not the least
of which are Hans' personal ones.  XFS and JFS don't have that as a 
sweet spot,
and have varying qualities of implementation under various *nix OS's, 
and ext3's
tuning/performance options once you fill a TB with the previously 
mentioned files leave quite a bit to be desired.

I'm pretty much unfamiliar with ZFS, but for my own personal, selfish 
needs, I need
a filesystem that can handle double-digit TB capacities, store a 
bazillion ~4k files,
and deliver huge throughput to/from tons of TCP/IP clients. 
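
Just to put rough numbers on that - a back-of-envelope of my own,
assuming a ~4k average file and something like 256-byte inodes:

    10 TB / 4 KB per file          ~= 2.5 billion files
    2.5e9 inodes x 256 bytes each  ~= 640 GB of inode metadata alone

so metadata layout and directory indexing, not raw disk bandwidth, are
what make or break this kind of workload.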

As an aside, a good replacement for NFS or architecting around NFS's 
weak points
would also be a plus. 


Peter


Justin C. Sherrill wrote:

On Wed, January 31, 2007 3:18 pm, Matthew Dillon wrote:
  

  

I am seriously considering our options with regards to ZFS or a
ZFS-like filesystem.  We clearly need something to replace UFS,
but I am a bit worried that porting ZFS would be as much work
as simply designing a new filesystem from scratch.



One of the reasons people are so excited about ZFS is because it solves
the problem of managing space.  Disk management is and has always been a
pain in the rear, and ZFS goes a long way toward reducing that.

While constructing a new filesystem will help your goals, it will also
mean that DragonFly users miss out on having all the other advantages that
come with ZFS.  Put another way, we're going to lose planned
functionality.

  



--
Peter Serwe peter at infostreet dot com

http://www.infostreet.com

The only true sports are bullfighting, mountain climbing and auto racing. 
-Ernest Hemingway

Because everything else requires only one ball. -Unknown

Do you wanna go fast or suck? -Mike Kojima

There are two things no man will admit he cannot do well: drive and make 
love. -Sir Stirling Moss



Plans for 1.8+ (2.0?)

2007-01-30 Thread Gergo Szakal
Besides the finalization of vkernel, what else can we expect in 2.0? There 
are many long-awaited (not only by me) features and additions:
- ZFS
- updated installer: fixed web based installation etc.
- updated PF
- getting the network stack (and others) out of the BGL
- AMD64 port.

Note that this is not meant to be demanding at all; I am just interested, 
because there is no clear roadmap. I would just like to see the developers' 
plans.

And don't forget: I am here to test. ;-)

-- 
Gergo Szakal [EMAIL PROTECTED]
University Of Szeged, HU
Faculty Of General Medicine

/* Please do not CC me with replies, thank you. */