Re: The future of NetBSD by Charles M. Hannum

2006-09-02 Thread Bill Hacker

Jonathon McKitrick wrote:


I'm starting to imagine the size of the Lisp image I could run on a cluster
like the kind being discussed ;-)

Jonathon McKitrick
--
My other computer is your Windows box.


Go and wath out your mouth with thoap!

;-)

Bill



Re: The future of NetBSD by Charles M. Hannum

2006-09-02 Thread Jonathon McKitrick
On Sat, Sep 02, 2006 at 05:54:14PM +0800, Bill Hacker wrote:
: Jonathon McKitrick wrote:
: 
: I'm starting to imagine the size of the Lisp image I could run on a cluster
: like the kind being discussed ;-)
: 
: Jonathon McKitrick
: --
: My other computer is your Windows box.
: 
: Go and wath out your mouth with thoap!

Sorry, but I'm never coming back after discovering Lisp.  ;-P



Jonathon McKitrick
--
My other computer is your Windows box.


Re: The future of NetBSD by Charles M. Hannum

2006-09-01 Thread Martin P. Hellwig

Jonathon McKitrick wrote:

On Thu, Aug 31, 2006 at 09:58:59AM -0700, Matthew Dillon wrote:
: that 75% of the interest in our project has nothing to do with my
: project goals but instead is directly associated with work being done
: by our relatively small community.  I truly appreciate that effort
: because it allows me to focus on the part that is most near and dear
: to my own heart.

Big question: after all the work that will go into the clustering, other than
scientific research, what will the average user be able to use such advanced
capability for?

Jonathon McKitrick
--
My other computer is your Windows box.


Well, I for one would be thrilled if high availability were as simple
as putting another box in my cluster somewhere else on the internet,
with backups made easy by snapshots like those implemented in ZFS.
It would be real peace of mind to know that a box is expected to fail
at some point and that I wouldn't need to figure out what went wrong;
I'd just tell the administrator across the ocean to put another box in
the cluster and remove the bad one when he has the time.


No more 24-hour administration, no more emergency calls because of bad
hardware; I could finally do the more important stuff in my job, like
drinking coffee, socializing with that cute secretary, and recreating
solutions that are just perfect for problems I'll never have.
Or porting that platform-independent Python program that some braindead
developer found a way to make Linux-only.


my € 0,02

--
mph


Re: The future of NetBSD by Charles M. Hannum

2006-09-01 Thread Matthew Dillon
:On Thu, Aug 31, 2006 at 09:58:59AM -0700, Matthew Dillon wrote:
:: that 75% of the interest in our project has nothing to do with my
:: project goals but instead is directly associated with work being done
:: by our relatively small community.  I truly appreciate that effort
:: because it allows me to focus on the part that is most near and dear
:: to my own heart.
:
:Big question: after all the work that will go into the clustering, other than
:scientific research, what will the average user be able to use such advanced
:capability for?
:
:Jonathon McKitrick

I held off answering because I became quite interested in what others
thought the clustering would be used for.

Let's take a big, big step back and look at what the clustering means
from a practical standpoint.

There are really two situations involved here.  First, we certainly
can allow you to say 'hey, I am going to take down machine A for
maintenance', giving the kernel the time to migrate all
resources off of machine A.

But being able to flip the power switch on machine A without warning,
or otherwise have a machine fail unexpectedly, is another ball of wax
entirely.  There are only a few ways to cope with such an event:

(1) Processes with inaccessible data are killed.  High level programs
such as 'make' would have to be made aware of this possibility,
process the correct error code, and restart the killed children
(e.g. compiles and such).

In this scenario, only a few programs would have to be made aware
of this type of failure in order to reap large benefits from a
big cluster, such as the ability to do massively parallel
compiles or graphics or other restartable things (see the sketch
just after this list).

(2) You take a snapshot every once in a while and if a process fails
on one machine you recover an earlier version of it on another
(including rolling back any file modifications that were made).

(3) You run the cpu context in tandem on multiple machines so if one
machine fails another can take over without a break.  This is
really an extension of the rollback mechanism, but with additional
requirements and it is particularly difficult to accomplish with
a threaded program where there may be direct memory interactions
between threads.

Tandem operation is possible with non-threaded programs but all 
I/O interactions would have to be synchronization points (and thus
performance would suffer).  Threaded programs would have to be
aware of the tandem operation, or else we make writing to memory
a synchronization point too (and even then I am not convinced it
is possible to keep two wholly duplicate copies of the program
operating in tandem).
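
To make (1) concrete, here is a minimal sketch of the restart loop a
supervisor like 'make' would need.  It runs a child job and, if the
job is killed out from under it rather than exiting on its own, it
simply starts the job again.  The command and retry limit are made up
for illustration, and a real cluster kernel would presumably report a
distinct 'node lost' condition rather than the plain signal used here.

    /*
     * Scenario (1) sketch: restart a job that was killed (e.g.
     * because the node running it went away) instead of treating
     * the kill as a fatal build error.  Illustrative only.
     */
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        char *const job[] = { "cc", "-c", "hello.c", NULL };

        for (int attempts = 0; attempts < 5; attempts++) {
            pid_t pid = fork();
            if (pid == 0) {
                execvp(job[0], job);
                _exit(127);                 /* exec itself failed */
            }
            int status;
            if (pid < 0 || waitpid(pid, &status, 0) < 0) {
                perror("fork/waitpid");
                return 1;
            }
            if (WIFEXITED(status))          /* job ran to completion, */
                return WEXITSTATUS(status); /* pass its result up     */

            /* Killed without exiting: restart it (on another node). */
            fprintf(stderr, "job killed by signal %d, restarting\n",
                    WTERMSIG(status));
        }
        fprintf(stderr, "giving up after repeated failures\n");
        return 1;
    }

The point is how little the supervisor has to know: one extra branch
in its wait loop, and any work the job had in flight is simply redone.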

Needless to say, a fully redundant system is very, very complex.   My
2-year goal is NOT to achieve #3.  It is to achieve #1 and also have the
ability to say 'hey, I'm taking machine BLAH down for maintenance,
migrate all the running contexts and related resources off of it please'.

Achieving #2 or #3 in a fully transparent fashion is more like a
5-year project, and you would take a very large performance hit in
order to achieve it.
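
A non-transparent approximation of #2 is much cheaper when the
application cooperates by snapshotting its own state.  Here is a toy
sketch, assuming the entire process state is a single counter and
using a made-up snapshot file name; rolling back file modifications,
which is where the real difficulty lies, is exactly what this dodges.

    /*
     * Scenario (2) sketch, application-assisted: periodically
     * snapshot the state, and on restart resume from the last
     * snapshot instead of from the beginning.  Illustrative only.
     */
    #include <stdio.h>

    #define SNAPSHOT "job.ckpt"             /* made-up file name */

    int main(void)
    {
        long i = 0;
        FILE *f = fopen(SNAPSHOT, "r");
        if (f) {                            /* resume from snapshot */
            if (fscanf(f, "%ld", &i) != 1)
                i = 0;
            fclose(f);
        }

        for (; i < 1000000; i++) {
            /* ... one restartable unit of work ... */

            if (i % 10000 == 0) {           /* snapshot periodically */
                f = fopen(SNAPSHOT, "w");
                if (f) {
                    fprintf(f, "%ld\n", i);
                    fclose(f);
                }
            }
        }
        printf("done\n");
        return 0;
    }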

But let's consider #1... consider the things you actually might want to
accomplish with a cluster.  Large simulations, huge builds, or simply
providing resources to other projects that want to do large simulations
or huge builds.

Only a few programs like 'make' or the window manager have to actually
be aware of the failure case in order to be able to restart the killed
programs and make a cluster useful for a very large class of work.
Even programs like sendmail and other services can operate fairly well
in such an environment.

So what can the average user do?

* The average user can support a third party project by providing
  cpu, memory, and storage resources to that project.

  (clearly there are security issues involved, but even so there is
  a large class of problems that can be addressed).

* The average user wants to leverage the cpu and memory resources 
  of all his networked machines for things like builds (buildworld,
  pkg builds, etc)... batch operations which can be restarted if a
  failure occurs.

  So, consider: the average user has his desktop, and most processes
  are running locally, but he also has other machines, and they tie
  into a named cluster based on the desktop.  The cluster would
  'see' the desktop's filesystems but otherwise operate as a separate
  system.  The average user would then be able to log in to the
  'cluster' and run things that take advantage of all the machines'
  resources.

* The average user might be part of a large project that has access to
  a cluster.  

 

Re: The future of NetBSD by Charles M. Hannum

2006-09-01 Thread Justin C. Sherrill
On Fri, September 1, 2006 12:45 pm, Matthew Dillon wrote:

 So what can the average user do ?

 * The average user can support a third party project by providing
   cpu, memory, and storage resources to that project.

   (clearly there are security issues involved, but even so there is
   a large class of problems that can be addressed).

It would be neat, in terms of both speed and community, if we could have
binary builds of pkgsrc for DragonFly accomplished by *everyone*.



Re: The future of NetBSD by Charles M. Hannum

2006-09-01 Thread Jonathon McKitrick

I'm starting to imagine the size of the Lisp image I could run on a cluster
like the kind being discussed ;-)

Jonathon McKitrick
--
My other computer is your Windows box.


Re: The future of NetBSD by Charles M. Hannum

2006-09-01 Thread Steve O'Hara-Smith
On Fri, 1 Sep 2006 09:45:32 -0700 (PDT)
Matthew Dillon [EMAIL PROTECTED] wrote:

 :On Thu, Aug 31, 2006 at 09:58:59AM -0700, Matthew Dillon wrote:
 :: that 75% of the interest in our project has nothing to do with my
 :: project goals but instead is directly associated with work being done
 :: by our relatively small community.  I truly appreciate that effort
 :: because it allows me to focus on the part that is most near and dear
 :: to my own heart.
 :
 :Big question: after all the work that will go into the clustering, other
 :than scientific research, what will the average user be able to use such
 :advanced capability for?
 :
 :Jonathon McKitrick
 
 I held off answering because I became quite interested in what others
 thought the clustering would be used for.
 
 Let's take a big, big step back and look at what the clustering means
 from a practical standpoint.
 
 There are really two situations involved here.  First, we certainly
 can allow you to say 'hey, I am going to take down machine A for
 maintenance', giving the kernel the time to migrate all
 resources off of machine A.
 
 But being able to flip the power switch on machine A without warning,
 or otherwise have a machine fail unexpectedly, is another ball of wax
 entirely.  There are only a few ways to cope with such an event:
 
 (1) Processes with inaccessible data are killed.  High level programs
   such as 'make' would have to be made aware of this possibility,
   process the correct error code, and restart the killed children
   (e.g. compiles and such).
 
   In this scenario, only a few programs would have to be made aware
   of this type of failure in order to reap large benefits from a
   big cluster, such as the ability to do massively parallel 
   compiles or graphics or other restartable things.


This is also quite good enough from my point of view.  I think my
post may have given the impression that I was expecting #3 to appear;
I certainly was not, since I know how hard that is.  In fact #1 is more
than I was hoping for: having the make fail and a few windows close,
but being able to reopen them and restart the make by hand, would be
orders of magnitude better than I can achieve now with periodic rsync
and a fair amount of fiddling around to get environments running on a
backup machine when I have a hardware failure.

-- 
C:>WIN                      |   Directable Mirror Arrays
The computer obeys and wins.| A better way to focus the sun
You lose and Bill collects. |   licences available see
                            |   http://www.sohara.org/


Re: The future of NetBSD by Charles M. Hannum

2006-08-31 Thread Matthew Dillon

:Hello,
:
:I found this message on the NetBSD mailing list and it
:can be quite interesting for reading. It says about
:negative stuff in the NetBSD project and manners for
:fixing the problems of the project.
:
:I hope it can be useful for read to others, for me is
:quite interesting. He mentions DragonFly, so I think
:is worth mentioning it here because of that ;)
:
:http://mail-index.netbsd.org/netbsd-users/2006/08/30/0016.html
:
:Regards,
:timofonic

It's very interesting and should serve as a caution both that no
open-source project lasts forever, and that no open-source project ever
truly dies, either.  What happens is that people move on, and others
fill the gaps and, eventually, even if it winds up being 20 years later,
the best pieces of the project morph into something else entirely.

For my part, I have a very clear set of personal goals that I want
to achieve with DragonFly, but regardless of my own goals the concept
of 'getting behind' in various areas is one that we, facing similarly
low numbers of developers, have to deal with every day.  In many
respects, the interest in the DragonFly project is supported as much
by the work that everyone is doing to keep the project
up to date as it is by my lofty clustering goals.  In fact, I would say
that 75% of the interest in our project has nothing to do with my
project goals but instead is directly associated with work being done
by our relatively small community.  I truly appreciate that effort
because it allows me to focus on the part that is most near and dear
to my own heart.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]


Re: The future of NetBSD by Charles M. Hannum

2006-08-31 Thread Jonathon McKitrick
On Thu, Aug 31, 2006 at 09:58:59AM -0700, Matthew Dillon wrote:
: that 75% of the interest in our project has nothing to do with my
: project goals but instead is directly associated with work being done
: by our relatively small community.  I truly appreciate that effort
: because it allows me to focus on the part that is most near and dear
: to my own heart.

Big question: after all the work that will go into the clustering, other than
scientific research, what will the average user be able to use such advanced
capability for?

Jonathon McKitrick
--
My other computer is your Windows box.


Re: The future of NetBSD by Charles M. Hannum

2006-08-31 Thread walt
Jonathon McKitrick wrote:
 On Thu, Aug 31, 2006 at 09:58:59AM -0700, Matthew Dillon wrote:
 : that 75% of the interest in our project has nothing to do with my
 : project goals but instead is directly associated with work being done
 : by our relatively small community.  I truly appreciate that effort
 : because it allows me to focus on the part that is most near and dear
 : to my own heart.
 
 Big question: after all the work that will go into the clustering, other than
 scientific research, what will the average user be able to use such advanced
 capability for?

Heh.  I'm probably not an average user (I'm just an amateur OS geek),
but the day I can compile OpenOffice in under an hour on my cluster of
five DragonFly PCs will be the day I die and go straight to Heaven.

Yes, I once hoped to achieve World Peace, but I'm older now and my goals
have become slightly less ambitious.


Re: The future of NetBSD by Charles M. Hannum

2006-08-31 Thread Justin C. Sherrill
On Thu, August 31, 2006 3:42 pm, Jonathon McKitrick wrote:

 Big question: after all the work that will go into the clustering, other
 than scientific research, what will the average user be able to use such
 advanced capability for?

Lots.  To get to a single system image, the operating system has to be
made less obfuscated.  A cleaner system means less pain for the
developers and fewer bugs due to obscurity.

Making the system multiprocessor-safe generally improves uniprocessor
speed.  And in any case, we are quickly reaching the point where all
processors are dual-core.

New ideas, like Kip Macy's checkpointing work, can be tried out and
judged on nothing but their technical merits.  While Matt concentrates
on his work, other people are free to add to the system.  For instance,
we haven't removed any commit bits because of disagreements over
project direction.  :)

Keep in mind that, as Matt pointed out, a lot of what makes DragonFly
interesting is the other work being done.  Clustering is one part among
many.