Re: [9fans] Plan9 - the next 20 years

2009-04-24 Thread ron minnich
On Thu, Apr 23, 2009 at 9:56 AM,  tlaro...@polynum.com wrote:

 clustermatic: not much left from lanl

This is a long story and the situation is less a comment on the
software than on the organization. I only say this because, almost 5
years after our last release,
there are still people out there using it.


 beowulf: seems to have stalled a little since 2007

eh? beowulf was never a product. It's a marketing term that means
"linux cluster". The top 2 machines on the top 500 are beowulf
systems.


Not sure what you're getting at here, but you've barely scratched the surface.

ron



Re: [9fans] Plan9 - the next 20 years

2009-04-24 Thread tlaronde
On Fri, Apr 24, 2009 at 08:33:59AM -0700, ron minnich wrote:
 
 [snipped clarifications about some of my notes]

 Not sure what you're getting at here, but you've barely scratched the surface.

The fact that I'm not a native English speaker does not help, and my
wording may be blunt.
This was not intended to say that clusters have no use or whatever;
that would be ridiculous. And I do expect that solutions that are
genuinely hard to get right go on being used, rather than being chosen
or thrown away depending on the mood.

Simply that, for somebody like me, who does not work in this area but
wants at least a rough picture (if not to write programs that run
efficiently out of the box on such beasts, then at least to avoid
design mistakes that make a program a nightmare to use with such
tools, or that leave the solution mired in spaghetti code), following
what appears on the surface, especially in open source, is
disappointing: projects appear and disappear, and the hype around a
solution is not always a clear indicator of its real value, or it
emphasizes something that is not the crux. In my area, watershed
computation on a grid (raster) for geographical information is heavy
processing, and handling huge data calls for solutions both on the
algorithmic side (including the implementation) and on the
processing-power side. So even if I'm not a specialist, and don't plan
to become one (assuming I could understand the basics), I feel
compelled to have at least some ideas about the problems.

For this kind of work, the Plan 9 organization has given me at least
some principles and some hard facts and tools: separate the
representation (the terminal) from the processing. Remember that
processing is about data that may not be served by the same instance
of the OS, i.e. that the locking of data during processing is, on some
OSes and depending on the fileserver or filesystem, only advisory. So
perhaps think differently about permissions and locking. And no, this
cannot work in just any environment, or with just any filesystem and
fileserver. And setting Plan 9 beside POSIX, and seeing the
differences, is a great help in organizing the sources, for example
between what is guaranteed by C and what is system dependent.

After that, my only guidelines are that if some limited,
computation-intensive sub-tasks can be done in parallel but the whole
is interdependent, one can think about multiple threads sharing the
same address space.

But if I can design my data formats to allow independent processing of
chunks (locality in geometrical work is rather obvious; and one can
finally sew all the chunks together afterwards, even with some
processing on the edges of the chunks), I can imagine processes
(tasks) distributed among distinct CPUs. In this case, an OS can also
launch the tasks on a single CPU with multiple cores. At the moment, I
think more in terms of multiple tasks than threads.
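
For what it's worth, the chunk-then-sew scheme described above can be sketched in Go (purely illustrative; the flat grid, the stand-in per-chunk work, and the trivial edge-sewing step are all hypothetical, not anything from this thread):

```go
package main

import (
	"fmt"
	"sync"
)

// processChunk stands in for the heavy per-chunk computation
// (e.g. a local watershed pass); here it just doubles each cell.
func processChunk(chunk []int) {
	for i := range chunk {
		chunk[i] *= 2
	}
}

// processGrid cuts the grid into independent chunks, processes them
// in parallel as separate tasks, then "sews" the chunk boundaries
// with a final serial pass over the edges.
func processGrid(grid []int, chunkSize int) {
	var wg sync.WaitGroup
	for lo := 0; lo < len(grid); lo += chunkSize {
		hi := lo + chunkSize
		if hi > len(grid) {
			hi = len(grid)
		}
		wg.Add(1)
		go func(c []int) { // non-overlapping subslices: no shared cells
			defer wg.Done()
			processChunk(c)
		}(grid[lo:hi])
	}
	wg.Wait()
	// sewing step: reconcile values on chunk edges (trivial stand-in).
	for lo := chunkSize; lo < len(grid); lo += chunkSize {
		if grid[lo-1] > grid[lo] {
			grid[lo] = grid[lo-1]
		}
	}
}

func main() {
	grid := []int{1, 2, 3, 4, 5, 6, 7, 8}
	processGrid(grid, 4)
	fmt.Println(grid)
}
```

The same structure works whether the "tasks" are goroutines on one multicore CPU or processes on distinct machines; only the transport under the chunks changes.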

But that's vague. I know what to avoid doing, but I'm not sure that
what I do won't end up on the list of "don't do that" things.
-- 
Thierry Laronde (Alceste) tlaronde +AT+ polynum +dot+ com
 http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C



Re: [9fans] Plan9 - the next 20 years

2009-04-23 Thread erik quanstrom
 Not to beat a (potentially) dead horse (even further) to death, but if we
 had some way of knowing that files were actually data (i.e. not ctl files;
 cf. QTDECENT) we could do more prefetching in a proxy -- e.g. cfs could be
 modified to read entire files into its cache (presumably it would have to
 Tstat afterwards to ensure that it's got a stable snapshot of the file).
 Adding cache journal callbacks would further allow us to avoid the RTT of
 Tstat on open and would bring us a bit closer to a fully coherent store.
 Achieving actual coherency could be an interesting future direction (IMHO).

you're right.  i think cfs is becoming a pretty dead horse.  if you're
running at a significant rtt from your main system, and you don't
have one of your own, the simple solution is to install 9vx.  that
way your system is local and only the shared files get shared.

does cfs even work with import?  if it doesn't then using it implies
that all your data are free for public viewing on whatever network
you're using, since direct fs connections aren't encrypted.

regarding the QTDECENT, i think russ is still on point
http://9fans.net/archive/2007/10/562

- erik



Re: [9fans] Plan9 - the next 20 years

2009-04-23 Thread tlaronde
On Sat, Apr 18, 2009 at 08:05:50AM -0700, ron minnich wrote:
   
 For cluster work that was done in the OS, see any clustermatic
 publication from minnich, hendriks, or watson, ca. 2000-2005.

FWIW, I haven't found much left, and finally purchased your (et al.)
article about HARE: The Right-Weight-Kernel... Since it considers, among
others, Plan 9, it's of much interest to me ;)

What strikes me most, from a rapid tour, is the state of the open
source cluster solutions:

clustermatic: not much left from lanl
openmosix: closed
beowulf: seems to have stalled a little since 2007
kerrighed: After 8 years of research it is considered a proof of
concept, but for obtaining a stable system, we have disabled some
features.

I hope that the last [from memory] quotes do not imply that the old way
of disproving an assertion by a counter-example has been replaced by
considering an assertion proved by advertising a limited example that
does not crash too fast.
-- 
Thierry Laronde (Alceste) tlaronde +AT+ polynum +dot+ com
 http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C



Re: [9fans] Plan9 - the next 20 years

2009-04-22 Thread sqweek
2009/4/21 erik quanstrom quans...@coraid.com:
 http://moderator.appspot.com/#15/e=c9&t=2d

 You must have JavaScript enabled in order to use this feature.

 cruel irony.

No silver bullet, unfortunately :)

Ken Thompson wrote:
| HTTP is really TCP/IP - a reliable stream transport. 9P is a
| filesystem protocol - a higher level protocol that can run over
| any of several transfer protocols, including TCP. 9P is more
| NFS-like than HTTP.
|
| HTTP gets its speed mostly by being one way - download. Most
| browsers will speed up loading by asking for many pages at once.
|
| Now for the question. 9P could probably be speeded up, for large
| reads and writes, by issuing many smaller reads in parallel rather
| than serially. Another try would be to allow the client of the
| filesystem to issue asynchronous requests and at some point
| synchronize. Because 9P is really implementing a filesystem, it
| will be very hard to get any more parallelism with multiple outstanding
| requests.

 I followed it up with a more focused question (awkwardly worded to
fit within the draconian 250 character limit), but no response yet:
Hi Ken, thanks for your 9p/HTTP response. I guess my real question
is: can we achieve such parallelism transparently, given that most
code calls read() with 4k/8k blocks. The syscall implies
synchronisation... do we need new primitives? h8 chr limit

-sqweek




Re: [9fans] Plan9 - the next 20 years

2009-04-22 Thread Nathaniel W Filardo
On Thu, Apr 23, 2009 at 01:07:58PM +0800, sqweek wrote:
 Ken Thompson wrote:
 | Now for the question. 9P could probably be speeded up, for large
 | reads and writes, by issuing many smaller reads in parallel rather
 | than serially. Another try would be to allow the client of the
 | filesystem to issue asynchronous requests and at some point
 | synchronize. Because 9P is really implementing a filesystem, it
 | will be very hard to get any more parallelism with multiple outstanding
 | requests.
 
  I followed it up with a more focused question (awkwardly worded to
 fit within the draconian 250 character limit), but no response yet:
 Hi Ken, thanks for your 9p/HTTP response. I guess my real question
 is: can we achieve such parallelism transparently, given that most
 code calls read() with 4k/8k blocks. The syscall implies
 synchronisation... do we need new primitives? h8 chr limit

Not to beat a (potentially) dead horse (even further) to death, but if we
had some way of knowing that files were actually data (i.e. not ctl files;
cf. QTDECENT) we could do more prefetching in a proxy -- e.g. cfs could be
modified to read entire files into its cache (presumably it would have to
Tstat afterwards to ensure that it's got a stable snapshot of the file).
Adding cache journal callbacks would further allow us to avoid the RTT of
Tstat on open and would bring us a bit closer to a fully coherent store.
Achieving actual coherency could be an interesting future direction (IMHO).
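
The read-then-Tstat idea can be reduced to a small sketch (Go; the fileServer interface, the memfs toy, and the version numbers are invented for illustration, not real 9P client code; a real cfs would compare the qid version returned by Rstat against the one seen while reading):

```go
package main

import (
	"errors"
	"fmt"
)

// fileServer abstracts the two operations the sketch needs:
// reading whole-file contents and a Tstat-like version probe.
type fileServer interface {
	ReadAll(path string) ([]byte, uint32, error) // data + version seen at read time
	Version(path string) (uint32, error)         // Tstat: current qid.version
}

// prefetch reads a whole file into the cache, then re-stats it;
// if the version changed mid-read, the snapshot is not stable and
// the caller should retry or fall back to uncached reads.
func prefetch(fs fileServer, path string) ([]byte, error) {
	data, ver, err := fs.ReadAll(path)
	if err != nil {
		return nil, err
	}
	now, err := fs.Version(path)
	if err != nil {
		return nil, err
	}
	if now != ver {
		return nil, errors.New("file changed during prefetch")
	}
	return data, nil
}

// memfs is a toy in-memory server for demonstration only.
type memfs struct {
	data map[string][]byte
	ver  map[string]uint32
}

func (m *memfs) ReadAll(p string) ([]byte, uint32, error) { return m.data[p], m.ver[p], nil }
func (m *memfs) Version(p string) (uint32, error)         { return m.ver[p], nil }

func main() {
	fs := &memfs{
		data: map[string][]byte{"/adm/users": []byte("glenda")},
		ver:  map[string]uint32{"/adm/users": 7},
	}
	data, err := prefetch(fs, "/adm/users")
	fmt.Println(string(data), err)
}
```

The extra Version call is exactly the Tstat round trip the message says a cache-journal callback could eliminate.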

--nwf;




Re: [9fans] Plan9 - the next 20 years

2009-04-21 Thread Bakul Shah
On Mon, 20 Apr 2009 16:33:41 EDT erik quanstrom quans...@coraid.com  wrote:
 let's take the path /sys/src/9/pc/sdata.c.  for http, getting
 this path takes one request (with the prefix http://$server)
 with 9p, this takes a number of walks, an open.  then you
 can start with the reads.  only the reads may be done in
 parallel.
 
 given network latency worth worrying about, the total latency
 to read this file will be worse for 9p than for http.

Perhaps one can optimize for the common case by extending 9p
a bit: use special values for certain parameters to allow
sending consecutive Twalk, (Topen|Tcreate), (Tread|Twrite)
without waiting for intermediate R messages.  This makes
sense since the time to prepare/process a handful of
messages is much shorter than roundtrip latency.

A different performance problem arises when lots of data has
to be fetched.  You can pipeline data requests by having
multiple outstanding requests.  A further refinement would be to
use something like RDMA -- in essence the receiver tells the
sender where exactly it wants the data delivered (thus
minimizing copying & processing).  You can very easily extend
the model to have data chunks delivered to different machines
in a cluster.  This is like separating a very high speed
data plane (with little or no processing) from a low speed
control plane (with lots of processing) in a modern
switch/router.
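
As a toy model of the first idea (Go; the channels stand in for the transport, and a real server would have to cope with a failed walk invalidating the open and read queued behind it), sending Twalk/Topen/Tread back-to-back and only then collecting replies looks like this:

```go
package main

import "fmt"

// msg is a toy stand-in for a 9P message.
type msg struct{ kind, arg string }

// server consumes requests in order and replies in order,
// mimicking a server that accepts back-to-back requests.
func server(req <-chan msg, rep chan<- msg) {
	for m := range req {
		rep <- msg{"R" + m.kind[1:], m.arg} // Twalk -> Rwalk, etc.
	}
	close(rep)
}

// fetch pipelines walk/open/read: all three requests go out
// before any reply is awaited, so only one round trip is paid
// instead of three.
func fetch(path string) []string {
	req := make(chan msg, 3)
	rep := make(chan msg, 3)
	go server(req, rep)
	req <- msg{"Twalk", path}
	req <- msg{"Topen", path}
	req <- msg{"Tread", path}
	close(req)
	var got []string
	for m := range rep {
		got = append(got, m.kind)
	}
	return got
}

func main() {
	fmt.Println(fetch("/sys/src/9/pc/sdata.c"))
}
```

The "special values for certain parameters" in the message would be what lets the Topen and Tread name results of the not-yet-answered Twalk; that binding is the part this sketch glosses over.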



Re: [9fans] Plan9 - the next 20 years

2009-04-21 Thread roger peppe
2009/4/20 andrey mirtchovski mirtchov...@gmail.com:
 with 9p, this takes a number of walks...

 shouldn't that be just one walk?

 % ramfs -D
 ...
 % mkdir -p /tmp/one/two/three/four/five/six
 ...
 % cd /tmp/one/two/three/four/five/six
 ramfs 640160:<-Twalk tag 18 fid 1110 newfid 548 nwname 6 0:one 1:two
 2:three 3:four 4:five 5:six
 ramfs 640160:->Rwalk tag 18 nwqid 6 0:(0001 0 d)
 1:(0002 0 d) 2:(0003 0 d) 3:(0004
 0 d) 4:(0005 0 d) 5:(0006 0 d)

that depends if it's been gated through exportfs or not
(exportfs only walks one step at a time, regardless of
the incoming walk)

i'm sure something like this has been discussed before,
and this idea is somewhat half-baked, but one could get
quite a long way by allowing the notion of a sequence
of related 9p actions - if one action fails, then all subsequent
actions are discarded.

one difficulty with using multiple concurrent requests
with 9p as it stands is that there's no way to force
the server to process them sequentially. fcp works
because the reads it sends can execute out of order
without changing the semantics, but this only works
on conventional files.

suppose all 9p Tmsgs were given an sid (sequence id)
field. a new 9p message, Tsequence, would start
a sequence; subsequent messages with the same sid
would be added to a server-side queue for that sequence
rather than being executed immediately.

the server would move sequentially through the queue,
executing actions and sending each reply when complete.
the sequence would abort when one of:
a) an Rerror is sent
b) a write returned less than the number of bytes written
c) a read returned less than the number of bytes requested.

this mechanism would allow a client to program a set of
actions to perform sequentially on the server without
having to wait for each reply in turn, i.e. avoiding the
usual 9p latency.

some use cases:

the currently rather complex definition of Twalk could
be replaced by clone and walk1 instead, as
in the original 9p: {Tclone, Twalk, Twalk, ...}

{Twrite, Tread} gives a RPC-style request - no need
for venti to use its own protocol (which i assume was invented
largely because of the latency inherent in doing two
separate 9p requests where one would do).

streaming - send several speculative requests, and keep
adding a request to the sequence when a reply arrives.
still probably not as good as straight streaming TCP,
but easier than fcp and more general.

there are probably lots of reasons why this couldn't
work, but i can't think of any right now...



Re: [9fans] Plan9 - the next 20 years

2009-04-21 Thread roger peppe
2009/4/21 maht mattmob...@proweb.co.uk:
 Tag 3 could conceivably arrive at the server before Tag 2

that's not true, otherwise the flush semantics wouldn't
work correctly. 9p *does* require in-order delivery.



Re: [9fans] Plan9 - the next 20 years

2009-04-21 Thread roger peppe
i wrote:
 the currently rather complex definition of Twalk could
 be replaced by clone and walk1 instead, as
 in the original 9p: {Tclone, Twalk, Twalk, ...}

i've just realised that the replacement would be
somewhat less efficient than the current Twalk, as the
cloned fid would still have to be clunked on a failed
walk. this is a common case when using paths
that traverse a mount point, but perhaps it's less
common that directories are mounted on external filesystems,
and hence not such an important issue (he says,
rationalising hastily :-) )



Re: [9fans] Plan9 - the next 20 years

2009-04-21 Thread erik quanstrom
On Tue Apr 21 06:25:49 EDT 2009, rogpe...@gmail.com wrote:
 2009/4/21 maht mattmob...@proweb.co.uk:
  Tag 3 could conceivably arrive at the server before Tag 2
 
 that's not true, otherwise the flush semantics wouldn't
 work correctly. 9p *does* require in-order delivery.

i have never needed to do anything important with flush.
so i'm asking from ignorance.

what is the important use case of flush and why is this
so important that it drives the design?

- erik



Re: [9fans] Plan9 - the next 20 years

2009-04-21 Thread maht

one issue with multiple 9p requests is that tags are not order enforced

consider the contrived directory tree

1/a/a
1/a/b
1/b/a
1/b/b

Twalk 1 fid 1
Twalk 2 fid a
Twalk 3 fid b

Tag 3 could conceivably arrive at the server before Tag 2







Re: [9fans] Plan9 - the next 20 years

2009-04-21 Thread erik quanstrom
On Tue Apr 21 10:05:43 EDT 2009, rogpe...@gmail.com wrote:
 2009/4/21 erik quanstrom quans...@quanstro.net:
  what is the important use case of flush and why is this
  so important that it drives the design?
 
[...]
 The 9P protocol must run above a reliable transport protocol with
 delimited messages. [...]
 UDP [RFC768] does not provide reliable in-order delivery.
 (is this the canonical reference for this requirement? the man page
 doesn't seem to say it)

great post, but i still don't understand why the
protocol is designed around flush semantics.

all your examples have to do with the interaction
between flush and something else.  why is flush
so important?  what if we just ignored the response
we don't want instead?

- erik



Re: [9fans] Plan9 - the next 20 years

2009-04-21 Thread roger peppe
2009/4/21 erik quanstrom quans...@quanstro.net:
 what is the important use case of flush and why is this
 so important that it drives the design?

actually the in-order delivery is most important
for Rmessages, but it's important for Tmessages too.
consider this exchange (C=client, S=server), where
the Tflush is sent almost immediately after the Twalk:

C->S Twalk tag=5 fid=22 newfid=24
C->S Tflush tag=6 oldtag=5
S->C Rflush tag=6

if outgoing tags 5 and 6 were swapped, we could get
this possible exchange:

C->S Tflush tag=6 oldtag=5
S->C Rflush tag=6
C->S Twalk tag=5 fid=22 newfid=24
S->C Rwalk tag=5

thus the flush is incorrectly ignored.
this won't break the protocol though, but
consider this example, where Rmsgs
can be delivered out-of-order:

here, the server replies to the Twalk message
before it receives the Tflush. the clone succeeds:

C->S Twalk tag=4 fid=22 newfid=23
C->S Tflush tag=5 oldtag=4
S->C Rwalk tag=4
S->C Rflush tag=5

here the two reply messages are switched (erroneously):

C->S Twalk tag=4 fid=22 newfid=23
C->S Tflush tag=5 oldtag=4
S->C Rflush tag=5
S->C Rwalk tag=4

the Rflush signals to the client that the Twalk
was successfully flushed, so the client
considers that the clone failed, whereas
it actually succeeded. the Rwalk
is considered a spurious message (it may
even interfere destructively with subsequent Tmsg).
result: death and destruction.

anyway, this is moot - from the original plan 9 paper:
The 9P protocol must run above a reliable transport protocol with
delimited messages. [...]
UDP [RFC768] does not provide reliable in-order delivery.
(is this the canonical reference for this requirement? the man page
doesn't seem to say it)

the protocol doesn't guarantee that requests are *processed*
in order, but that's a different thing entirely, and something
my half-baked proposal seeks to get around.
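
The ordering argument reduces to one client-side rule, sketched here in Go (invented names, not real lib9p code): because the transport delivers replies in order, a flushed request is treated as cancelled only if the Rflush arrives before the original reply.

```go
package main

import "fmt"

// flushed reports whether a flushed request should be treated as
// cancelled, given the replies in arrival order. Rflush before the
// original reply means the request never took effect; the original
// reply arriving first means it did, and the flush came too late.
func flushed(replies []string, oldtagReply string) bool {
	for _, r := range replies {
		if r == oldtagReply {
			return false // original reply arrived first: it took effect
		}
		if r == "Rflush" {
			return true // flushed before the request completed
		}
	}
	return true // neither seen yet: treat as flushed
}

func main() {
	fmt.Println(flushed([]string{"Rflush"}, "Rwalk"))          // walk cancelled
	fmt.Println(flushed([]string{"Rwalk", "Rflush"}, "Rwalk")) // walk happened
}
```

Swap the delivery order, as in the erroneous exchange above, and this rule gives the wrong answer; that is exactly why 9P needs in-order delivery.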



Re: [9fans] Plan9 - the next 20 years

2009-04-21 Thread Fco. J. Ballesteros
Well, if you don't have flush, your server is going to keep a request
for each process that dies/aborts. If requests always complete quite
soon it's not a problem, AFAIK, but your server may be keeping the
request to reply when something happens. Also, there's the issue that
the flushed request may have allocated a fid or some other resource.
If you don't agree that the thing is flushed you get out of sync with the
client.

What I mean is that as soon as you get concurrent requests you really
need to implement flush. Again, AFAIK.

  From: quans...@quanstro.net
  To: 9fans@9fans.net
  Reply-To: 9fans@9fans.net
  Date: Tue Apr 21 16:11:28 CET 2009
  Subject: Re: [9fans] Plan9 - the next 20 years
  
  On Tue Apr 21 10:05:43 EDT 2009, rogpe...@gmail.com wrote:
   2009/4/21 erik quanstrom quans...@quanstro.net:
what is the important use case of flush and why is this
so important that it drives the design?
   
  [...]
   The 9P protocol must run above a reliable transport protocol with
   delimited messages. [...]
   UDP [RFC768] does not provide reliable in-order delivery.
   (is this the canonical reference for this requirement? the man page
   doesn't seem to say it)
  
  great post, but i still don't understand why the
  protocol is designed around flush semantics.
  
  all your examples have to do with the interaction
  between flush and something else. why is flush
  so important? what if we just ignored the response
  we don't want instead?
  
  - erik



Re: [9fans] Plan9 - the next 20 years

2009-04-21 Thread erik quanstrom
On Tue Apr 21 10:34:34 EDT 2009, n...@lsub.org wrote:
 Well, if you don't have flush, your server is going to keep a request
 for each process that dies/aborts. If requests always complete quite
 soon it's not a problem, AFAIK, but your server may be keeping the
 request to reply when something happens. Also, there's the issue that
 the flushed request may have allocated a fid or some other resource.
 If you don't agree that the thing is flushed you get out of sync with the
 client.
 
 What I mean is that as soon as you get concurrent requests you really
 need to implement flush. Again, AFAIK.

isn't the tag space per fid?  a variation on the tagged queuing flush
cache would be to force the client to make sure that reordered
flush tags aren't a problem.  it would not be very hard to ensure that
tag overlap does not happen.

if the problem with 9p is latency, then here's a decision that could be
revisited.  it would be a complication, but it seems to me better than
a http-like protocol, bundling requests together or moving to a
storage-oriented protocol.

- erik



Re: [9fans] Plan9 - the next 20 years

2009-04-21 Thread roger peppe
2009/4/21 erik quanstrom quans...@quanstro.net:
 isn't the tag space per fid?

no, otherwise every reply message (and Tflush) would include a fid too;
moreover Tversion doesn't use a fid (although it probably doesn't
actually need a tag)

 a variation on the tagged queuing flush
 cache would be to force the client to make sure that reordered
 flush tags aren't a problem.  it would not be very hard to ensure that
 tag overlap does not happen.

the problem is not in tag overlap, but in the fact that
the server may or may not already have serviced the
request when it receives a Tflush.
the client can't know this - the only way
it knows if the transaction actually took place is if the
reply to the request arrives before the reply to the flush.

this race is, i think, an inherent part of allowing
requests to be aborted. the flush protocol is probably the most
complex and the most delicate part of 9p, but it's also one
of the most useful, because reinventing it correctly is
hard and it solves an oft-found problem - how do i tear
down a request that i've already started?

plan 9 and inferno rely quite heavily on having flush,
and it's sometimes notable when servers don't implement it.
for instance, inferno's file2chan provides no facility
for flush notification, and wm/sh uses file2chan; thus if you
kill a process that's reading from wm/sh's /dev/cons,
the read goes ahead anyway, and a line of input is lost
(you might have seen this if you ever used os(1)).

that's aside from the issue of resource-leakage that
nemo points out.

the idea with my proposal is to have an extension that
changes as few of the semantics of 9p as possible:

C->S Tsequence tag=1 sid=1
C->S Topen tag=2 sid=1 fid=20 mode=0
C->S Tread tag=3 sid=1 fid=20 count=8192
C->S Tclunk tag=4 sid=1
S->C Rsequence tag=1
S->C Ropen tag=2 qid=...
S->C Rread tag=3 data=...
S->C Rclunk tag=4

would be exactly equivalent to:

C->S Topen tag=2 fid=20 mode=0
S->C Ropen tag=2 qid=...
C->S Tread tag=3 fid=20 count=8192
S->C Rread tag=3 data=...
C->S Tclunk tag=4
S->C Rclunk tag=4

and the client-side interface could be designed so
that the client code is the same regardless of whether
the server implements Tsequence or not (for instance,
in-kernel drivers need not implement it).

thus most of the code base could remain unchanged,
but everywhere gets the benefit of latency reduction from a few core
code changes (e.g. namec).



Re: [9fans] Plan9 - the next 20 years

2009-04-21 Thread Bakul Shah
On Tue, 21 Apr 2009 10:50:18 EDT erik quanstrom quans...@quanstro.net  wrote:
 On Tue Apr 21 10:34:34 EDT 2009, n...@lsub.org wrote:
  Well, if you don't have flush, your server is going to keep a request
  for each process that dies/aborts. 

If a process crashes, who sends the Tflush?  The server must
clean up without Tflush if a connection closes unexpectedly.

I thought the whole point of Tflush was to cancel a
potentially expensive operation (ie when the user hits the
interrupt key). You still have to cleanup.

 If requests always complete quite
  soon it's not a problem, AFAIK, but your server may be keeping the
  request to reply when something happens. Also, there's the issue that
  the flushed request may have allocated a fid or some other resource.
  If you don't agree that the thing is flushed you get out of sync with the
  client.
  
  What I mean is that as soon as you get concurrent requests you really
  need to implement flush. Again, AFAIK.
 
 isn't the tag space per fid?  a variation on the tagged queuing flush
 cache would be to force the client to make sure that reordered
 flush tags aren't a problem.  it would not be very hard to ensure that
 tag overlap does not happen.

Why does it matter?

 if the problem with 9p is latency, then here's a decision that could be
 revisited.  it would be a complication, but it seems to me better than
 a http-like protocol, bundling requests together or moving to a
 storage-oriented protocol.

Can you explain why is it better than bundling requests
together?  Bundling requests can cut out a few roundtrip
delays, which can make a big difference for small files.
What you are talking about seems useful for large files [if I
understand you correctly].  Second, 9p doesn't seem to
restrict any replies other than Rflushes to be sent in order.
That means the server can still send Rreads in any order but
if a Tflush is seen, it must clean up properly.  The
situation is analogous to what happens in an OoO processor
(where results must be discarded in case of exceptions and
mis-prediction on branches).



Re: [9fans] Plan9 - the next 20 years

2009-04-21 Thread erik quanstrom
 plan 9 and inferno rely quite heavily on having flush,
 and it's sometimes notable when servers don't implement it.
 for instance, inferno's file2chan provides no facility
 for flush notification, and wm/sh uses file2chan; thus if you
 kill a process that's reading from wm/sh's /dev/cons,
 the read goes ahead anyway, and a line of input is lost
 (you might have seen this if you ever used os(1)).

isn't the race still there, just with a smaller window of
opportunity?

- erik



Re: [9fans] Plan9 - the next 20 years

2009-04-21 Thread erik quanstrom
  if the problem with 9p is latency, then here's a decision that could be
  revisited.  it would be a complication, but it seems to me better than
  a http-like protocol, bundling requests together or moving to a
  storage-oriented protocol.
 
 Can you explain why is it better than bundling requests
 together?  Bundling requests can cut out a few roundtrip
 delays, which can make a big difference for small files.
 What you are talking about seems useful for large files [if I
 understand you correctly].  Second, 9p doesn't seem to
 restrict any replies other than Rflushes to be sent in order.
 That means the server can still send Rreads in any order but
 if a Tflush is seen, it must clean up properly.  The
 situation is analogous to what happens in an OoO processor
 (where results must be discarded in case of exceptions and
 mis-prediction on branches).

bundling is equivalent to running the original sequence on
the remote machine and shipping only the result back.  some
rtt latency is eliminated but i think things will still be largely
in-order because walks will act like fences.  i think the lots-
of-small-files case will still suffer.  maybe i'm not quite following
along.

bundling will also require an additional agent on the server to
marshal the bundled requests.

- erik



Re: [9fans] Plan9 - the next 20 years

2009-04-21 Thread Bakul Shah
On Tue, 21 Apr 2009 17:03:07 BST roger peppe rogpe...@gmail.com  wrote:
 the idea with my proposal is to have an extension that
 changes as few of the semantics of 9p as possible:
 
 C->S Tsequence tag=1 sid=1
 C->S Topen tag=2 sid=1 fid=20 mode=0
 C->S Tread tag=3 sid=1 fid=20 count=8192
 C->S Tclunk tag=4 sid=1
 S->C Rsequence tag=1
 S->C Ropen tag=2 qid=...
 S->C Rread tag=3 data=...
 S->C Rclunk tag=4
 
 would be exactly equivalent to:
 
 C->S Topen tag=2 fid=20 mode=0
 S->C Ropen tag=2 qid=...
 C->S Tread tag=3 fid=20 count=8192
 S->C Rread tag=3 data=...
 C->S Tclunk tag=4
 S->C Rclunk tag=4
 
 and the client-side interface could be designed so
 that the client code is the same regardless of whether
 the server implements Tsequence or not (for instance,
 in-kernel drivers need not implement it).

Do you really need a Tsequence? Seems to me this should
already work.  Let me illustrate with a timing diagram:

Strict request/response:
 1 2 3 4 5
   012345678901234567890123456789012345678901234567890
C: Topen   Tread   Tclunk  |
S: Ropen   Rread   Rclunk

Pipelined case:
 1 2 3 4 5
   012345678901234567890123456789012345678901234567890
C: Topen Tread Tclunk  |
S: Ropen Rread Rclunk

Here latency is 8 time units (one column = 1 time unit). In
the first case it takes 48 time units from Topen to Rclunk
received by server. In the second case it takes 28 time
units.

In the pipelined case, from a server's perspective, client's
requests just get to it faster (and may already be waiting!).
It doesn't have to do anything special.  What am I missing?
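
The arithmetic behind the two diagrams, as a small sketch (Go; the 6-unit spacing between back-to-back pipelined requests is read off the diagram, not stated in the text):

```go
package main

import "fmt"

// strict models n serial RPCs, each paying a full round trip of
// 2*lat; pipelined models one round trip plus the spacing between
// the n-1 follow-on requests sent without waiting.
func strict(n, lat int) int         { return n * 2 * lat }
func pipelined(n, lat, gap int) int { return 2*lat + (n-1)*gap }

func main() {
	// the diagrams' numbers: 3 RPCs (open/read/clunk),
	// 8-unit one-way latency, requests issued 6 units apart.
	fmt.Println(strict(3, 8), pipelined(3, 8, 6)) // 48 28
}
```

As latency grows with n RPCs held fixed, strict cost scales with n while pipelined cost approaches a single round trip, which is the whole argument for pipelining.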



Re: [9fans] Plan9 - the next 20 years

2009-04-21 Thread David Leimbach
On Tue, Apr 21, 2009 at 2:52 AM, maht mattmob...@proweb.co.uk wrote:

 one issue with multiple 9p requests is that tags are not order enforced

 consider the contrived directory tree

 1/a/a
 1/a/b
 1/b/a
 1/b/b

 Twalk 1 fid 1
 Twalk 2 fid a
 Twalk 3 fid b

 Tag 3 could conceivably arrive at the server before Tag 2


This would be transport dependent I presume?


Re: [9fans] Plan9 - the next 20 years

2009-04-21 Thread David Leimbach
On Tue, Apr 21, 2009 at 1:19 AM, roger peppe rogpe...@gmail.com wrote:

 2009/4/20 andrey mirtchovski mirtchov...@gmail.com:
  with 9p, this takes a number of walks...
 
  shouldn't that be just one walk?
 
  % ramfs -D
  ...
  % mkdir -p /tmp/one/two/three/four/five/six
  ...
  % cd /tmp/one/two/three/four/five/six
  ramfs 640160:<-Twalk tag 18 fid 1110 newfid 548 nwname 6 0:one 1:two
  2:three 3:four 4:five 5:six
  ramfs 640160:->Rwalk tag 18 nwqid 6 0:(0001 0 d)
  1:(0002 0 d) 2:(0003 0 d) 3:(0004
  0 d) 4:(0005 0 d) 5:(0006 0 d)

 that depends if it's been gated through exportfs or not
 (exportfs only walks one step at a time, regardless of
 the incoming walk)

 i'm sure something like this has been discussed before,
 and this idea somewhat half-baked, but one could get
 quite a long way by allowing the notion of a sequence
 of related 9p actions - if one action fails, then all subsequent
 actions are discarded.

 one difficulty with using multiple concurrent requests
 with 9p as it stands is that there's no way to force
 the server to process them sequentially. fcp works
 because the reads it sends can execute out of order
 without changing the semantics, but this only works
 on conventional files.

 suppose all 9p Tmsgs were given an sid (sequence id)
 field. a new 9p message, Tsequence, would start
 a sequence; subsequent messages with the same sid
 would be added to a server-side queue for that sequence
 rather than being executed immediately.

 the server would move sequentially through the queue,
 executing actions and sending each reply when complete.
 the sequence would abort when one of:
 a) an Rerror is sent
 b) a write returned less than the number of bytes written
 c) a read returned less than the number of bytes requested.

 this mechanism would allow a client to program a set of
 actions to perform sequentially on the server without
 having to wait for each reply in turn, i.e. avoiding the
 usual 9p latency.

 some use cases:

 the currently rather complex definition of Twalk could
 be replaced by clone and walk1 instead, as
 in the original 9p: {Tclone, Twalk, Twalk, ...}

 {Twrite, Tread} gives a RPC-style request - no need
 for venti to use its own protocol (which i assume was invented
 largely because of the latency inherent in doing two
 separate 9p requests where one would do).

 streaming - send several speculative requests, and keep
 adding a request to the sequence when a reply arrives.
 still probably not as good as straight streaming TCP,
 but easier than fcp and more general.

 there are probably lots of reasons why this couldn't
 work, but i can't think of any right now...
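the abort rule roger describes (execute the queued actions in order; discard everything after the first failure) can be sketched as a toy model. All names below are illustrative, not part of any real 9P implementation:

```python
# Toy model of the proposed Tsequence semantics: requests sharing a
# sequence id are executed strictly in order, and the first failure
# (an Rerror, a short write, or a short read) aborts the remainder.

def run_sequence(actions):
    """Each action returns (ok, reply).  Execute in order; actions
    after the first failure are never run."""
    replies = []
    for act in actions:
        ok, reply = act()
        replies.append(reply)
        if not ok:
            break              # sequence aborted; rest discarded
    return replies

executed = []

def walk():
    executed.append("walk")
    return True, "Rwalk"

def open_():
    executed.append("open")
    return False, "Rerror"     # e.g. permission denied

def read():
    executed.append("read")    # never reached: sequence aborted
    return True, "Rread"

replies = run_sequence([walk, open_, read])
```

a client could thus fire {Twalk, Topen, Tread} down the wire in one burst and rely on the server to stop at the failed open, instead of waiting one round trip per message.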


Roger... this sounds pretty promising.  10p?  I'd hate to call it 9p++.


Re: [9fans] Plan9 - the next 20 years

2009-04-21 Thread roger peppe
2009/4/21 Bakul Shah bakul+pl...@bitblocks.com:
 In the pipelined case, from a server's perspective, client's
 requests just get to it faster (and may already be waiting!).
 It doesn't have to do anything special.  What am I missing?

you're missing the fact that without the sequence operator, the
second request can arrive before the first request
has completed, thus potentially making it invalid
(e.g. it's invalid to read from a file that hasn't been opened).

also, in many current server implementations, each
request gets serviced in its
own process - there's no guarantee that replies will
come back in the same order as the requests,
even if all the requests are serviced immediately.

it would be possible to do a similar kind of thing
by giving the same tag to the operations in a sequence,
but i'm quite attached to the fact that

a) the operations are otherwise identical to operations in the original protocol
b) a small bit of extra redundancy is useful for debugging.
c) you can get Tendsequence (which is necessary, i now realise)
by flushing the Tsequence with a request within the sequence itself.



Re: [9fans] Plan9 - the next 20 years

2009-04-21 Thread David Leimbach
On Tue, Apr 21, 2009 at 9:25 AM, erik quanstrom quans...@quanstro.net wrote:

   if the problem with 9p is latency, then here's a decision that could be
   revisited.  it would be a complication, but it seems to me better than
   an http-like protocol, bundling requests together or moving to a storage-
   oriented protocol.
 
  Can you explain why is it better than bundling requests
  together?  Bundling requests can cut out a few roundtrip
  delays, which can make a big difference for small files.
  What you are talking about seems useful for large files [if I
  understand you correctly].  Second, 9p doesn't seem to
  restrict any replies other than Rflushes to be sent in order.
  That means the server can still send Rreads in any order but
  if a Tflush is seen, it must clean up properly.  The
  situation is analogous to what happens in an OoO processor
  (where results must be discarded in case of exceptions and
  mis-prediction on branches).

 bundling is equivalent to running the original sequence on
 the remote machine and shipping only the result back.  some
 rtt latency is eliminated but i think things will still be largely
 in-order because walks will act like fences.  i think the lots-
 of-small-files case will still suffer.  maybe i'm not quite following
 along.


Perhaps you don't want to use this technique for lots of smaller files.
 There's nothing in the protocol Roger suggested preventing us from using
different sequence id's and getting the old behavior back right?

It's a bit complex... but worth thinking about.

Dave




 bundling will also require additional agent on the server to
 marshal the bundled requests.

 - erik




Re: [9fans] Plan9 - the next 20 years

2009-04-21 Thread roger peppe
2009/4/21 David Leimbach leim...@gmail.com:
 Roger... this sounds pretty promising.

i dunno, there are always hidden dragons in this area,
and forsyth, rsc and others are better at seeing them than i.

  10p?  I'd hate to call it 9p++.

9p2010, based on how soon it would be likely to be implemented...



Re: [9fans] Plan9 - the next 20 years

2009-04-21 Thread David Leimbach
On Tue, Apr 21, 2009 at 10:06 AM, roger peppe rogpe...@gmail.com wrote:

 2009/4/21 David Leimbach leim...@gmail.com:
  Roger... this sounds pretty promising.

 i dunno, there are always hidden dragons in this area,
 and forsyth, rsc and others are better at seeing them than i.


Perhaps... but this discussion, and trying is better than not.  :-)



   10p?  I'd hate to call it 9p++.

 9p2010, based on how soon it would be likely to be implemented...


True.


Re: [9fans] Plan9 - the next 20 years

2009-04-21 Thread roger peppe
2009/4/21 erik quanstrom quans...@quanstro.net:
 plan 9 and inferno rely quite heavily on having flush,
 and it's sometimes notable when servers don't implement it.
 for instance, inferno's file2chan provides no facility
 for flush notification, and wm/sh uses file2chan; thus if you
 kill a process that's reading from wm/sh's /dev/cons,
 the read goes ahead anyway, and a line of input is lost
 (you might have seen this if you ever used os(1)).

 isn't the race still there, just with a smaller window of
 opportunity?

sure, there's always a race if you allow flush.
(but at least the results are well defined regardless
of the winner).

i was trying to point out that if you try to
ignore the issue by removing flush from the
protocol, you'll get a system that doesn't work so smoothly.



Re: [9fans] Plan9 - the next 20 years

2009-04-21 Thread roger peppe
2009/4/21 erik quanstrom quans...@quanstro.net:
 bundling is equivalent to running the original sequence on
 the remote machine and shipping only the result back.  some
 rtt latency is eliminated but i think things will still be largely
 in-order because walks will act like fences.  i think the lots-
 of-small-files case will still suffer.  maybe i'm not quite following
 along.

i agree that the lots-of-small-files case will still suffer
(mainly because the non-hierarchical mount table
means we can't know what's mounted below a particular
node without asking the server).

but this still gives the opportunity to considerably speed
up many common actions (e.g. {walk, open, read, close})
without adding too much (i think) complexity to the protocol.

also, as david leimbach points out, this case is still amenable to the
"send several requests concurrently" approach.



Re: [9fans] Plan9 - the next 20 years

2009-04-21 Thread erik quanstrom
 i was trying to point out that if you try to
 ignore the issue by removing flush from the
 protocol, you'll get a system that doesn't work so smoothly.

your failure cases seem to rely on poorly chosen tags.
i wasn't suggesting that flush be eliminated.  i was
thinking of ways of keeping flush self-contained.

if the tag space were enlarged so that we could require
that tags never be reused, flushes that do not flush anything
could be remembered.  the problem with this is that it could
require remembering a large number of unprocessed flushes.  there
are lots of potential solutions to that problem.
one could allow the server to stall if too many martian
flushes were hanging about, and allow clients to declare
that they will reuse part of the tag space, asserting that
nothing within the reused portion is outstanding.  you could
then keep the current definition of tag, though 16 bits
seems a bit small to me.

- erik
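erik's never-reuse-tags idea can be modelled roughly as follows. The names are hypothetical, and a real 9P server answers every Tflush with an Rflush regardless; this only sketches the bookkeeping for "martian" flushes:

```python
# Sketch of a tag space large enough that tags are never reused: a
# flush for a tag the server hasn't yet seen ("martian") is remembered,
# and the matching request is dropped if it arrives later.  The memory
# cost of the martian set is exactly the problem noted in the post.

class TagTracker:
    def __init__(self):
        self.pending = set()        # tags with a request in flight
        self.martian = set()        # flushes seen before their request

    def arrive(self, tag):
        """A T-message with a never-before-used tag arrives."""
        if tag in self.martian:     # it was flushed before it got here
            self.martian.remove(tag)
            return "dropped"
        self.pending.add(tag)
        return "queued"

    def flush(self, oldtag):
        if oldtag in self.pending:
            self.pending.remove(oldtag)
            return "flushed"        # aborted an in-flight request
        self.martian.add(oldtag)    # remember it; this is the memory cost
        return "remembered"

t = TagTracker()
```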



Re: [9fans] Plan9 - the next 20 years

2009-04-21 Thread roger peppe
2009/4/21 erik quanstrom quans...@quanstro.net:
 i was trying to point out that if you try to
 ignore the issue by removing flush from the
 protocol, you'll get a system that doesn't work so smoothly.

 your failure cases seem to rely on poorly chosen tags.
 i wasn't suggesting that flush be eliminated.  i was
 thinking of ways of keeping flush self-contained.

my failure cases were based around supposing
that 9p messages could be re-ordered in transit,
as a response to maht's post.

they can't, so there's no problem, as far as i can see.

my proposal barely affects the flush semantics
(there'd be an additional rule that flushing a Tsequence
aborts the entire sequence, but i think that's it)

the race case that i was talking about cannot
be avoided by remembering old flushes. the
problem is that an action (which could involve any
number of irreversible side-effects) might have been
performed at the instant that you tell it to abort.
the semantics of flush let you know if you got there
in time to stop it.



Re: [9fans] Plan9 - the next 20 years

2009-04-20 Thread Uriel
On Mon, Apr 20, 2009 at 4:14 AM, Skip Tavakkolian 9...@9netics.com wrote:
 ericvh stated it better in the FAWN thread.  choosing the abstraction
 that makes the resulting environments have required attributes
 (reliable, consistent, easy, etc.) will be the trick.  i believe with
 the current state of the Internet -- e.g.  lack of speed and security
 -- service abstraction is the right level of distributedness.
 presenting the services as file hierarchy makes sense; 9p is efficient

9p is efficient as long as your latency is under 30ms

uriel


 and so the plan9 approach still feels like the right path to cloud
 computing.

 On Sun, Apr 19, 2009 at 12:12 AM, Skip Tavakkolian 9...@9netics.com wrote:

  Well, in the octopus you have a fixed part, the pc, but all other
  machines come and go. The feeling is very much that your stuff is in
  the cloud.

 i was going to mention this.  to me the current view of cloud
 computing as evidence by papers like this[1] are basically hardware
 infrastructure capable of running vm pools each of which would do
 exactly what a dedicated server would do.  the main benefits being low
 administration cost and elasticity.  networking, authentication and
 authorization remain as they are now.  they are still not addressing
 what octopus and rangboom are trying to address: how to seamlessly and
 automatically make resources accessible.  if you read what ken said it
 appears to be this view of cloud computing; he said some framework to
 allow many loosely-coupled Plan9 systems to emulate a single system
 that would be larger and more reliable.  in all virtualization
 systems i've seen the vm has to be smaller than the environment it
 runs on.  if vmware or xen were ever to give you a vm that was larger
 than any given real machine it ran on, they'd have to solve the same
 problem.


 I'm not sure a single system image is any better in the long run than
 Distributed Shared Memory.  Both have issues of locality, where the
 abstraction that gives you the view of a single machine hurts your ability
 to account for the lack of locality.

 In other words, I think applications should show a single system image but
 maybe not programming models.  I'm not 100% sure what I mean by that
 actually, but it's sort of an intuitive feeling.




 [1] http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf









Re: [9fans] Plan9 - the next 20 years

2009-04-20 Thread maht




9p is efficient as long as your latency is under 30ms
  

What kind of latency?

At the speed of light in optical fibre, 30ms is about 8000km (New York to San 
Francisco and back)


in that 30ms a 3.2GHz P4 could do 292 million instructions

There's an interesting article about it in ACM Queue:
"Fighting Physics: A Tough Battle"


http://www.maht0x0r.net/library/computing/acm/queue20090203-dl.pdf
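as a rough sanity check of the figures above (the fibre speed is my assumed constant, ~c/1.5, not a number from the post):

```python
# Back-of-envelope check of the numbers above.  The post's figure of
# 292 million instructions implies roughly 3 instructions per cycle,
# plausible for a superscalar P4; the cycle count itself is below.

C_FIBRE_KM_S = 200_000      # assumed speed of light in fibre, km/s (~c/1.5)
RTT_MS = 30                 # the 30 ms round trip under discussion
CLOCK_HZ = 3_200_000_000    # 3.2 GHz P4

path_km = C_FIBRE_KM_S * RTT_MS // 1000   # fibre path covered in 30 ms
cycles = CLOCK_HZ * RTT_MS // 1000        # clock ticks in 30 ms
```

so 30 ms buys about 6000 km of fibre at that assumed speed; a New York to San Francisco round trip (roughly 8000 km of path) is of the same order once routing overhead is included.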








Re: [9fans] Plan9 - the next 20 years

2009-04-20 Thread Skip Tavakkolian
 9p is efficient as long as your latency is under 30ms

check out ken's answer to a question by sqweek.  the question
starts: With cross-continental round trip times, 9p has a hard time
competing (in terms of throughput) against less general protocols like
HTTP.  ...

http://moderator.appspot.com/#15/e=c9t=2d




Re: [9fans] Plan9 - the next 20 years

2009-04-20 Thread erik quanstrom
 http://moderator.appspot.com/#15/e=c9t=2d

You must have JavaScript enabled in order to use this feature.

cruel irony.

- erik




Re: [9fans] Plan9 - the next 20 years

2009-04-20 Thread J.R. Mauro

 What kind of latency?

 For speed of light in fibre optic 30ms is about 8000km (New York to San
 Francisco and back)

Assuming you have a direct fiber connection with no routers in
between. I would say that is somewhat rare.



Re: [9fans] Plan9 - the next 20 years

2009-04-20 Thread David Leimbach
On Mon, Apr 20, 2009 at 11:03 AM, Skip Tavakkolian 9...@9netics.com wrote:

  9p is efficient as long as your latency is under 30ms

 check out ken's answer to a question by sqweek.  the question
 starts: With cross-continental round trip times, 9p has a hard time
 competing (in terms of throughput) against less general protocols like
 HTTP.  ...

 http://moderator.appspot.com/#15/e=c9t=2d



I thought 9p had tagged requests so you could put many requests in flight at
once, then synchronize on them when the server replied.

Maybe i misunderstand the application of the tag field in the protocol then?

Tread tag fid offset count

Rread tag count data


Re: [9fans] Plan9 - the next 20 years

2009-04-20 Thread erik quanstrom
 I thought 9p had tagged requests so you could put many requests in flight at
 once, then synchronize on them when the server replied.
 
 Maybe i misunderstand the application of the tag field in the protocol then?
 
 Tread tag fid offset count
 
 Rread tag count data

without having the benefit of reading ken's thoughts ...

you can have 1 fd being read by 2 procs at the same time.
the only way to do this is by having multiple outstanding tags.

i think the complaint about 9p boils down to ordering.
if i want to do something like
cd /sys/src/9/pc/ ; cat sdata.c
that's a bunch of walks and then an open and then a read.
these are done serially, and each one takes 1rtt.

- erik



Re: [9fans] Plan9 - the next 20 years

2009-04-20 Thread Francisco J Ballesteros
I did the experiment, for the o/live, of issuing multiple (9p) RPCs
in parallel, without waiting for answers.

In general it was not enough, because in the end the client had to block
and wait for the file to come before looking at it to issue further rpcs.



On Mon, Apr 20, 2009 at 8:03 PM, Skip Tavakkolian 9...@9netics.com wrote:
 9p is efficient as long as your latency is under 30ms

 check out ken's answer to a question by sqweek.  the question
 starts: With cross-continental round trip times, 9p has a hard time
 competing (in terms of throughput) against less general protocols like
 HTTP.  ...

 http://moderator.appspot.com/#15/e=c9t=2d






Re: [9fans] Plan9 - the next 20 years

2009-04-20 Thread Charles Forsyth
For speed of light in fibre optic 30ms is about 8000km (New York to San 
Francisco and back)

in that 30ms a 3.2Ghz P4 could do 292 million instructions

i think that's just enough to get to dbus and back.



Re: [9fans] Plan9 - the next 20 years

2009-04-20 Thread maht

J.R. Mauro wrote:

What kind of latency?

For speed of light in fibre optic 30ms is about 8000km (New York to San
Francisco and back)



Assuming you have a direct fiber connection with no routers in
between. I would say that is somewhat rare.


  
The author found that the hop from klondike.cis.upenn.edu to cs.stanford.edu 
added about 50ms to the round trip






Re: [9fans] Plan9 - the next 20 years

2009-04-20 Thread David Leimbach
On Mon, Apr 20, 2009 at 11:35 AM, erik quanstrom quans...@coraid.com wrote:

  I thought 9p had tagged requests so you could put many requests in flight
 at
  once, then synchronize on them when the server replied.
 
  Maybe i misunderstand the application of the tag field in the protocol
 then?
 
  Tread tag fid offset count
 
  Rread tag count data

 without having the benefit of reading ken's thoughts ...

 you can have 1 fd being read by 2 procs at the same time.
 the only way to do this is by having multiple outstanding tags.


I thought the tag was assigned by the client, not the server (since it shows
up as a field in the T message), and that this meant it's possible for one
client to put many of its own locally tagged requests into the server, and
wait for them in any order it chooses.

It would not make sense to me to have to have a global pool of tags for all
possible connecting clients.

Again, this may just be my ignorance, and the fact that I've never
implemented a 9p client or server myself.  (haven't had a need to yet!)




 i think the complaint about 9p boils down to ordering.
 if i want to do something like
cd /sys/src/9/pc/ ; cat sdata.c
 that's a bunch of walks and then an open and then a read.
 these are done serially, and each one takes 1rtt.


Some higher operations probably require an ordering.  But there's no reason
you couldn't do two different sequences of walks and a read concurrently, is
there?




 - erik




Re: [9fans] Plan9 - the next 20 years

2009-04-20 Thread erik quanstrom
   Tread tag fid offset count
  
   Rread tag count data
 
  without having the benefit of reading ken's thoughts ...
 
  you can have 1 fd being read by 2 procs at the same time.
  the only way to do this is by having multiple outstanding tags.
 
 
 I thought the tag was assigned by the client, not the server (since it shows
 up as a field in the T message), and that this meant it's possible for one
 client to put many of it's own locally tagged requests into the server, and
 wait for them in any order it chooses.

that's what i thought i said.  (from the perspective of pread and pwrite
not (T R)^(read write).)

  i think the complaint about 9p boils down to ordering.
  if i want to do something like
 cd /sys/src/9/pc/ ; cat sdata.c
  that's a bunch of walks and then an open and then a read.
  these are done serially, and each one takes 1rtt.
 
 
 Some higher operations probably require an ordering.  But there's no reason
 you could do two different sequences of walks, and a read concurrently is
 there?

not that i can think of.  but that addresses throughput, not latency.

- erik



Re: [9fans] Plan9 - the next 20 years

2009-04-20 Thread Steve Simon
 I thought 9p had tagged requests so you could put many requests in flight
 at
 once, then synchronize on them when the server replied.

This is exactly what fcp(1) does, which is used by replica.

If you want to read a virtual file however, these often
don't support seeks or implement them in unexpected ways
(returning one line per read rather than a buffer full).

Thus running multiple reads (on the same file) only really
works for files which operate as read disks - e.g. real disks,
ram disks etc.

-Steve
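the fcp(1) behaviour Steve describes can be sketched like this: keep several reads at fixed offsets in flight at once and reassemble the replies by offset. The sketch assumes plain pread semantics, which is exactly what virtual files break:

```python
# Sketch of the fcp approach: issue reads at fixed offsets concurrently
# and reassemble by offset.  This only works when read(offset, count)
# behaves like pread on a disk-like file; a virtual file that returns
# "one line per read" regardless of offset would reassemble into garbage.

def parallel_read(pread, size, chunk):
    """pread(offset, count) -> bytes.  Replies may complete in any
    order, so key them by offset and join in offset order."""
    replies = {off: pread(off, chunk) for off in range(0, size, chunk)}
    return b"".join(replies[off] for off in sorted(replies))

data = bytes(i % 256 for i in range(100))   # a 100-byte "disk file"
out = parallel_read(lambda o, n: data[o:o + n], len(data), 16)
```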



Re: [9fans] Plan9 - the next 20 years

2009-04-20 Thread erik quanstrom
 Thus running multiple reads (on the same file) only really
 works for files which operate as read disks - e.g. real disks,
 ram disks etc.

at which point, you have reinvented aoe. :-)

- erik



Re: [9fans] Plan9 - the next 20 years

2009-04-20 Thread David Leimbach
On Mon, Apr 20, 2009 at 12:03 PM, erik quanstrom quans...@coraid.com wrote:

Tread tag fid offset count
   
Rread tag count data
  
   without having the benefit of reading ken's thoughts ...
  
   you can have 1 fd being read by 2 procs at the same time.
   the only way to do this is by having multiple outstanding tags.
 
 
  I thought the tag was assigned by the client, not the server (since it
 shows
  up as a field in the T message), and that this meant it's possible for
 one
  client to put many of it's own locally tagged requests into the server,
 and
  wait for them in any order it chooses.

 that's what i thought i said.  (from the perspective of pread and pwrite
 not (T R)^(read write).)


Ah that's what I didn't understand :-).




   i think the complaint about 9p boils down to ordering.
   if i want to do something like
  cd /sys/src/9/pc/ ; cat sdata.c
   that's a bunch of walks and then an open and then a read.
   these are done serially, and each one takes 1rtt.
  
 
  Some higher operations probably require an ordering.  But there's no
 reason
  you could do two different sequences of walks, and a read concurrently is
  there?

 not that i can think of.  but that addresses throughput, but not latency.


Right, but with better throughput overall, you can hide latency in some
applications.  That's what HTTP does with this AJAX fun right?

Show some of the page, load the rest over time, and people feel better
about stuff.

I had an application for SNMP in Erlang that did too much serially, and by
increasing the number of outstanding requests, I got the overall job done
sooner, despite the latency issues.  This improved the user experience by
about 10 seconds less wait time.  Tagged requests was actually how I
implemented it :-)

9p can't fix the latency problems, but applications over 9p can be designed
to try to hide some of it, depending on usage.




 - erik




Re: [9fans] Plan9 - the next 20 years

2009-04-20 Thread erik quanstrom
  not that i can think of.  but that addresses throughput, but not latency.
 
 
 Right, but with better throughput overall, you can hide latency in some
 applications.  That's what HTTP does with this AJAX fun right?
 
 Show some of the page, load the rest over time, and people feel better
 about stuff.
 
 I had an application for SNMP in Erlang that did too much serially, and by
 increasing the number of outstanding requests, I got the overall job done
 sooner, despite the latency issues.  This improved the user experience by
 about 10 seconds less wait time.  Tagged requests was actually how I
 implemented it :-)
 
 9p can't fix the latency problems, but applications over 9p can be designed
 to try to hide some of it, depending on usage.

let's take the path /sys/src/9/pc/sdata.c.  for http, getting
this path takes one request (with the prefix http://$server)
with 9p, this takes a number of walks, an open.  then you
can start with the reads.  only the reads may be done in
parallel.

given network latency worth worrying about, the total latency
to read this file will be worse for 9p than for http.

- erik
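counting round trips for erik's example makes the comparison concrete. The assumptions here are mine, not measurements: a client that packs the whole path into a single Twalk (as andrey's ramfs trace elsewhere in the thread shows is possible) and pipelines its reads:

```python
# Rough round-trip count for fetching /sys/src/9/pc/sdata.c over 9P
# versus one HTTP GET.  Modelling assumptions: the client sends one
# Twalk carrying all path elements, and reads at known offsets are
# kept in flight together (so the reads cost about one round trip).

def ninep_rtts():
    walk = 1     # one Twalk with all five path elements
    open_ = 1    # Topen must wait for the walk to succeed
    reads = 1    # pipelined reads complete in roughly one round trip
    return walk + open_ + reads

HTTP_RTTS = 1    # a single GET returns the whole body

p9 = ninep_rtts()
```

even in this friendly case 9P pays about three round trips against one for HTTP, which is erik's point; a sequence or bundling mechanism would collapse the walk, open, and first read into the same trip.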



Re: [9fans] Plan9 - the next 20 years

2009-04-20 Thread David Leimbach
On Mon, Apr 20, 2009 at 1:33 PM, erik quanstrom quans...@coraid.com wrote:

   not that i can think of.  but that addresses throughput, but not
 latency.
 
 
  Right, but with better throughput overall, you can hide latency in some
  applications.  That's what HTTP does with this AJAX fun right?
 
  Show some of the page, load the rest over time, and people feel better
  about stuff.
 
  I had an application for SNMP in Erlang that did too much serially, and
 by
  increasing the number of outstanding requests, I got the overall job done
  sooner, despite the latency issues.  This improved the user experience by
  about 10 seconds less wait time.  Tagged requests was actually how I
  implemented it :-)
 
  9p can't fix the latency problems, but applications over 9p can be
 designed
  to try to hide some of it, depending on usage.

 let's take the path /sys/src/9/pc/sdata.c.  for http, getting
 this path takes one request (with the prefix http://$server)
 with 9p, this takes a number of walks, an open.  then you
 can start with the reads.  only the reads may be done in
 parallel.


 given network latency worth worring about, the total latency
 to read this file will be worse for 9p than for http.


Yeah, I guess due to my lack of having written a 9p client by hand, I've
forgotten that a new 9p client session is stateful, and at the root of the
hierarchy presented by the server.  No choice but to walk, even if you know
the path to the named resource in advance.

This seems to give techniques like REST a bit of an advantage over 9p.
 (yikes)

Would we want a less stateful 9p then?  Does that end up being HTTP or IMAP
or some other thing that already exists?

Dave


 - erik




Re: [9fans] Plan9 - the next 20 years

2009-04-20 Thread andrey mirtchovski
 with 9p, this takes a number of walks...

shouldn't that be just one walk?

% ramfs -D
...
% mkdir -p /tmp/one/two/three/four/five/six
...
% cd /tmp/one/two/three/four/five/six
ramfs 640160:-Twalk tag 18 fid 1110 newfid 548 nwname 6 0:one 1:two
2:three 3:four 4:five 5:six
ramfs 640160:-Rwalk tag 18 nwqid 6 0:(0001 0 d)
1:(0002 0 d) 2:(0003 0 d) 3:(0004
0 d) 4:(0005 0 d) 5:(0006 0 d)



Re: [9fans] Plan9 - the next 20 years

2009-04-19 Thread Skip Tavakkolian
 Well, in the octopus you have a fixed part, the pc, but all other  
 machines come and go. The feeling is very much that your stuff is in  
 the cloud.

i was going to mention this.  to me the current view of cloud
computing, as evidenced by papers like this[1], is basically hardware
infrastructure capable of running vm pools, each of which would do
exactly what a dedicated server would do.  the main benefits being low
administration cost and elasticity.  networking, authentication and
authorization remain as they are now.  they are still not addressing
what octopus and rangboom are trying to address: how to seamlessly and
automatically make resources accessible.  if you read what ken said it
appears to be this view of cloud computing; he said some framework to
allow many loosely-coupled Plan9 systems to emulate a single system
that would be larger and more reliable.  in all virtualization
systems i've seen the vm has to be smaller than the environment it
runs on.  if vmware or xen were ever to give you a vm that was larger
than any given real machine it ran on, they'd have to solve the same
problem.

[1] http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf




Re: [9fans] Plan9 - the next 20 years

2009-04-19 Thread David Leimbach
On Sun, Apr 19, 2009 at 12:12 AM, Skip Tavakkolian 9...@9netics.com wrote:

  Well, in the octopus you have a fixed part, the pc, but all other
  machines come and go. The feeling is very much that your stuff is in
  the cloud.

 i was going to mention this.  to me the current view of cloud
 computing as evidence by papers like this[1] are basically hardware
 infrastructure capable of running vm pools each of which would do
 exactly what a dedicated server would do.  the main benefits being low
 administration cost and elasticity.  networking, authentication and
 authorization remain as they are now.  they are still not addressing
 what octopus and rangboom are trying to address: how to seamlessly and
 automatically make resources accessible.  if you read what ken said it
 appears to be this view of cloud computing; he said some framework to
 allow many loosely-coupled Plan9 systems to emulate a single system
 that would be larger and more reliable.  in all virtualization
 systems i've seen the vm has to be smaller than the environment it
 runs on.  if vmware or xen were ever to give you a vm that was larger
 than any given real machine it ran on, they'd have to solve the same
 problem.


I'm not sure a single system image is any better in the long run than
Distributed Shared Memory.  Both have issues of locality, where the
abstraction that gives you the view of a single machine hurts your ability
to account for the lack of locality.

In other words, I think applications should show a single system image but
maybe not programming models.  I'm not 100% sure what I mean by that
actually, but it's sort of an intuitive feeling.




 [1] http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf





Re: [9fans] Plan9 - the next 20 years

2009-04-19 Thread Enrico Weigelt
* Latchesar Ionkov lu...@ionkov.net wrote:

Hi,

 I talked with a guy that's is doing parallel filesystem work, and
 according to him 80% of all filesystem operations when running an HPC
 job are for checkpointing (not that much restart). I just don't see
 how checkpointing can scale knowing how bad the parallel fs are.

We need a clustered venti and a cluster-aware fossil ;-P

I'm currently in the process of designing a clustered storage system, 
inspired by venti and git, which also supports removing files,
on-demand synchronization, etc. I'll let you know when I've 
got something to present.


cu
-- 
--
 Enrico Weigelt, metux IT service -- http://www.metux.de/

 cellphone: +49 174 7066481   email: i...@metux.de   skype: nekrad666
--
 Embedded-Linux / Portierung / Opensource-QM / Verteilte Systeme
--



Re: [9fans] Plan9 - the next 20 years

2009-04-19 Thread ron minnich
On Sun, Apr 19, 2009 at 12:34 PM, Enrico Weigelt weig...@metux.de wrote:

 I'm currently in the process of designing an clustered storage,
 inspired by venti and git, which also supports removing files,
 on-demand sychronization, etc. I'll let you know when I've
 got something to present.

The only presentation with any value is code.

code-code is better than jaw-jaw.

ron



Re: [9fans] Plan9 - the next 20 years

2009-04-19 Thread Skip Tavakkolian
ericvh stated it better in the FAWN thread.  choosing the abstraction
that makes the resulting environments have required attributes
(reliable, consistent, easy, etc.) will be the trick.  i believe with
the current state of the Internet -- e.g.  lack of speed and security
-- service abstraction is the right level of distributedness.
presenting the services as file hierarchy makes sense; 9p is efficient
and so the plan9 approach still feels like the right path to cloud
computing.

 On Sun, Apr 19, 2009 at 12:12 AM, Skip Tavakkolian 9...@9netics.com wrote:
 
  Well, in the octopus you have a fixed part, the pc, but all other
  machines come and go. The feeling is very much that your stuff is in
  the cloud.

 i was going to mention this.  to me the current view of cloud
 computing as evidence by papers like this[1] are basically hardware
 infrastructure capable of running vm pools each of which would do
 exactly what a dedicated server would do.  the main benefits being low
 administration cost and elasticity.  networking, authentication and
 authorization remain as they are now.  they are still not addressing
 what octopus and rangboom are trying to address: how to seamlessly and
 automatically make resources accessible.  if you read what ken said it
 appears to be this view of cloud computing; he said some framework to
 allow many loosely-coupled Plan9 systems to emulate a single system
 that would be larger and more reliable.  in all virtualization
 systems i've seen the vm has to be smaller than the environment it
 runs on.  if vmware or xen were ever to give you a vm that was larger
 than any given real machine it ran on, they'd have to solve the same
 problem.
 
 
 I'm not sure a single system image is any better in the long run than
 Distributed Shared Memory.  Both have issues of locality, where the
 abstraction that gives you the view of a single machine hurts your ability
 to account for the lack of locality.
 
 In other words, I think applications should show a single system image but
 maybe not programming models.  I'm not 100% sure what I mean by that
 actually, but it's sort of an intuitive feeling.
 
 


 [1] http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf







Re: [9fans] Plan9 - the next 20 years

2009-04-18 Thread lucio
 the original condor just forwarded system calls back to the node it
 was started from. Thus all system calls were done in the context of
 the originating node and user.

Not much good if you're migrating because the node's gone down.  What
happens then?  Sorry to ask, RTFM seems a bit beyond my ken, right
now.

Also, that gives you a single level of checkpoint, in fact, I'd call
that strictly migration, C/R should allow a history of checkpoints,
each of which can be restarted.  Might not be possible, of course,
unless one defines the underlying platform with new properties
specially designed for C/R (and migration).  Hence my insistence that
Plan 9 is a better platform than other OSes in common use.  The
hardware may need to be adjusted too, but that's where Inferno EMU
will come to the rescue :-)

++L




Re: [9fans] Plan9 - the next 20 years

2009-04-18 Thread J.R. Mauro
On Sat, Apr 18, 2009 at 12:16 AM, erik quanstrom quans...@quanstro.net wrote:
 On Fri, Apr 17, 2009 at 11:37 PM, erik quanstrom quans...@quanstro.net 
 wrote:
  I can imagine a lot of problems stemming from open files could be
  resolved by first attempting to import the process's namespace at the
  time of checkpoint and, upon that failing, using cached copies of the
  file made at the time of checkpoint, which could be merged later.
 
  there's no guarantee to a process running in a conventional
  environment that files won't change underfoot.  why would
  condor extend a new guarantee?
 
  maybe i'm suffering from lack of vision, but i would think that
  to get to 100% one would need to think in terms of transactions
  and have a fully transactional operating system.
 
  - erik
 

 There's a much lower chance of files changing out from under you in a
 conventional environment. If the goal is to make the unconventional
 environment look and act like the conventional one, it will probably
 have to try to do some of these things to be useful.

 * you can get the same effect by increasing the scale of your system.

 * the reason conventional systems work is not, in my opinion, because
 the collision window is small, but because one typically doesn't do
 conflicting edits to the same file.

 * saying that something isn't likely in an unquantifiable way is
 not a recipe for success in computer science, in my experience.

 - erik


I don't see how any of that relates to having to do more work to
ensure that C/R and process migration across nodes works and keeps
things as consistent as possible.



Re: [9fans] Plan9 - the next 20 years

2009-04-18 Thread lucio
 there's no guarantee to a process running in a conventional
 environment that files won't change underfoot.  why would
 condor extend a new guarantee?

Because you have to migrate standard applications, not only
applications that allow for migration.  Consider a word processing
session, for example.  Of course, one can design applications
differently and I'm sure cloud computing will demand a different
paradigm, but that is not what's being discussed here.

++L




Re: [9fans] Plan9 - the next 20 years

2009-04-18 Thread tlaronde
On Fri, Apr 17, 2009 at 03:15:25PM -0700, ron minnich wrote:
 if you want to look at checkpointing, it's worth going back to look at
 Condor, because they made it really work. There are a few interesting
 issues that you need to get right. You can't make it 50% of the way
 there; that's not useful. You have to hit all the bits -- open /tmp
 files, sockets, all of it. It's easy to get about 90% of it but the
 last bits are a real headache. Nothing that's come along since has
 really done the job (although various efforts claim to, you have to
 read the fine print).

My only knowledge about this area is through papers and books, so it is
very abstract.

But my gut feeling, after reading about Mach or reading A. Tanenbaum
(whom I find poor---but he is A. Tanenbaum, I'm only T. Laronde),
is that a cluster is above the OS (a collection of CPUs), but a
NUMA machine is, for the OS, an atom, i.e. below the OS, a kind of
processor, a single CPU (so NUMA without a strong hardware specificity
is something I don't understand).

In all the mathematical or computer work I have done, defining the
element, the atom (that is, the unit whose insides I don't have to know
or deal with) has always given the best results.

Not related to what you wrote, but to the impression made by what can be
read about this cloud computing in the open sewer:

A NUMA machine made of totally heterogeneous hardware, with users plugging
or unplugging a CPU component at will; or a start-up (end-down) providing
cloud computing whose only means are the users' connected hardware. That
is perhaps a WEB 3.0 or more, a 4th-millennium idea, etc., but for me it
is at best an error, at worst a swindle.
-- 
Thierry Laronde (Alceste) tlaronde +AT+ polynum +dot+ com
 http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C



Re: [9fans] Plan9 - the next 20 years

2009-04-18 Thread Steve Simon
I assumed cloud computing means you can log into any node
that you are authorised to and your data and code will migrate
to you as needed.

The idea being that the sam -r split is not only dynamic but on demand:
you may connect to the cloud from your phone just to read your email,
so the editor session stays attached to your home server, as it has not
requested many graphics events over the slow link.

anecdote
When I first read the London UKUUG Plan9 paper in the early nineties
I was managing a HPC workstation cluster.

I misunderstood the plan9 cpu(1) command and thought it, by default, connected
to the least loaded cpu server available. I was sufficiently inspired to write
a cpu(1) command for my HP/UX cluster which did exactly that.

Funny old world.
/anecdote

-Steve



Re: [9fans] Plan9 - the next 20 years

2009-04-18 Thread erik quanstrom
  * you can get the same effect by increasing the scale of your system.
 
  * the reason conventional systems work is not, in my opinion, because
  the collision window is small, but because one typically doesn't do
  conflicting edits to the same file.
 
  * saying that something isn't likely in an unquantifiable way is
  not a recipe for success in computer science, in my experience.
 
  - erik
 
 
 I don't see how any of that relates to having to do more work to
 ensure that C/R and process migration across nodes works and keeps
 things as consistent as possible.

that's a fine and sensible goal.  but for the reasons above, i don't buy this
line of reasoning.

in a plan 9 system, the only files that i can think of which many processes
have open at the same time are log files, append-only files.  just reopening
the log file would solve the problem.

what is a specific case of contention you are thinking of?

i'm not sure why the editor is the case that's being bandied about.  two users
don't usually edit the same file at the same time.  that case already
does not work.  and i'm not sure why one would snapshot an editing
session, edit the file by other means, and expect things to just work out.
(and finally, acme, for example, does not keep the original file open.
if open files are what get snapshotted, there would be no difference.)

- erik
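[Editor's note: erik's observation that append-only logs migrate trivially
can be made concrete. A sketch, with all names invented: the checkpoint
for such a file is just its pathname, and restore is a reopen in append
mode, since append mode always writes at the current end of the file.]

```python
# Append-only log whose "checkpoint" carries no fd state at all.
import tempfile

class Log:
    def __init__(self, path):
        self.path = path
        self.f = open(path, "a")

    def write(self, line):
        self.f.write(line + "\n")
        self.f.flush()

    def checkpoint(self):
        return {"path": self.path}        # no offset, no descriptor state

    @classmethod
    def restore(cls, image):
        return cls(image["path"])         # reopening is the whole restore

tmp = tempfile.NamedTemporaryFile(suffix=".log", delete=False)
tmp.close()
log = Log(tmp.name)
log.write("before migration")
image = log.checkpoint()
log.f.close()                             # the old node goes away
log = Log.restore(image)                  # ...and the new node reopens
log.write("after migration")
```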



Re: [9fans] Plan9 - the next 20 years

2009-04-18 Thread tlaronde
[I reply to myself because I was replying half on two distinct threads]

On Sat, Apr 18, 2009 at 01:59:03PM +0200, tlaro...@polynum.com wrote:
 
 But my gut feeling, after reading about Mach or reading A. Tanenbaum
 (whom I find poor---but he is A. Tanenbaum, I'm only T. Laronde),
 is that a cluster is above the OS (a collection of CPUs), but a
 NUMA machine is, for the OS, an atom, i.e. below the OS, a kind of
 processor, a single CPU (so NUMA without a strong hardware specificity
 is something I don't understand).
 
 In all the mathematical or computer work I have done, defining the
 element, the atom (that is, the unit whose insides I don't have to know
 or deal with) has always given the best results.

The link between this and process migration is that, IMHO or in my
limited mind, one allocates a node, depending on resources available at
the moment, once and for the duration of the process.  This is OS
business: allocating resources from a cluster of CPUs.

The task doesn't migrate between nodes; it can migrate inside
the node, from core to core in a CPU with a tightly coupled memory
space (a mainframe, whether NUMA or not) that handles failover etc.  But
this is infra-OS, hardware stuff, and as far as the OS is concerned
nothing has changed, since the node is a unit, an atom.  And trying
to solve the problem by breaking the border (going inside the atom)
is something I don't feel comfortable with.

-- 
Thierry Laronde (Alceste) tlaronde +AT+ polynum +dot+ com
 http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C



Re: [9fans] Plan9 - the next 20 years

2009-04-18 Thread ron minnich
On Sat, Apr 18, 2009 at 4:59 AM,  tlaro...@polynum.com wrote:

 But my gut feeling, after reading about Mach or reading A. Tanenbaum
 (whom I find poor---but he is A. Tanenbaum, I'm only T. Laronde),
 is that a cluster is above the OS (a collection of CPUs), but a
 NUMA machine is, for the OS, an atom, i.e. below the OS, a kind of
 processor, a single CPU (so NUMA without a strong hardware specificity
 is something I don't understand).

A cluster is above the OS because most cluster people don't know how
to do OS work. Hence most cluster software follows basic patterns
first set in 1991. That is no exaggeration.

For cluster work that was done in the OS, see any clustermatic
publication from minnich, hendriks, or watson, ca. 2000-2005.

ron



Re: [9fans] Plan9 - the next 20 years

2009-04-18 Thread ron minnich
On Sat, Apr 18, 2009 at 6:50 AM, erik quanstrom quans...@quanstro.net wrote:

 in a plan 9 system, the only files that i can think of which many processes
 have open at the same time are log files, append-only files.  just reopening
 log file would solve the problem.

you're not thinking in terms of parallel applications if you make this
statement.

ron



Re: [9fans] Plan9 - the next 20 years

2009-04-18 Thread Latchesar Ionkov
I talked with a guy who is doing parallel filesystem work, and
according to him 80% of all filesystem operations when running an HPC
job are for checkpointing (not so much restart). I just don't see
how checkpointing can scale, knowing how bad the parallel filesystems are.

Lucho

On Fri, Apr 17, 2009 at 4:15 PM, ron minnich rminn...@gmail.com wrote:
 if you want to look at checkpointing, it's worth going back to look at
 Condor, because they made it really work. There are a few interesting
 issues that you need to get right. You can't make it 50% of the way
 there; that's not useful. You have to hit all the bits -- open /tmp
 files, sockets, all of it. It's easy to get about 90% of it but the
 last bits are a real headache. Nothing that's come along since has
 really done the job (although various efforts claim to, you have to
 read the fine print).

 ron





Re: [9fans] Plan9 - the next 20 years

2009-04-18 Thread J.R. Mauro
On Sat, Apr 18, 2009 at 9:50 AM, erik quanstrom quans...@quanstro.net wrote:
  * you can get the same effect by increasing the scale of your system.
 
  * the reason conventional systems work is not, in my opinion, because
  the collision window is small, but because one typically doesn't do
  conflicting edits to the same file.
 
  * saying that something isn't likely in an unquantifiable way is
  not a recipe for success in computer science, in my experience.
 
  - erik
 

 I don't see how any of that relates to having to do more work to
 ensure that C/R and process migration across nodes works and keeps
 things as consistent as possible.

 that's a fine and sensible goal.  but for the reasons above, i don't buy this
 line of reasoning.

 in a plan 9 system, the only files that i can think of which many processes
 have open at the same time are log files, append-only files.  just reopening
 the log file would solve the problem.

 what is a specific case of contention you are thinking of?

 i'm not sure why the editor is the case that's being bandied about.  two users
 don't usually edit the same file at the same time.  that case already
 does not work.  and i'm not sure why one would snapshot an editing
 session, edit the file by other means, and expect things to just work out.
 (and finally, acme, for example, does not keep the original file open.
 if open files are what get snapshotted, there would be no difference.)

 - erik



Ron mentioned a bunch before, like /etc/hosts or a pipe to another
process, and I would also suggest that things in /net and databases
could be a serious problem. If you migrate a process, how do you
ensure that the process is in a sane state on the new node?

I agree that generally only one process will be accessing a normal
file at once. I think an editor is not a good example, as you say.



Re: [9fans] Plan9 - the next 20 years

2009-04-18 Thread ron minnich
On Sat, Apr 18, 2009 at 9:10 AM, J.R. Mauro jrm8...@gmail.com wrote:

 I agree that generally only one process will be accessing a normal
 file at once. I think an editor is not a good example, as you say.


I'll say it again. It does not matter what we think. It matters what
apps do. And some apps have multiple processes accessing one file.

As to the wisdom of such access, there are many opinions :-)

You really can not just rule things out because reasonable people
don't do them. Unreasonable people write apps too.

ron



Re: [9fans] Plan9 - the next 20 years

2009-04-18 Thread erik quanstrom
On Sat Apr 18 12:21:49 EDT 2009, rminn...@gmail.com wrote:
 On Sat, Apr 18, 2009 at 9:10 AM, J.R. Mauro jrm8...@gmail.com wrote:
 
  I agree that generally only one process will be accessing a normal
  file at once. I think an editor is not a good example, as you say.
 
 
 I'll say it again. It does not matter what we think. It matters what
 apps do. And some apps have multiple processes accessing one file.
 
 As to the wisdom of such access, there are many opinions :-)
 
 You really can not just rule things out because reasonable people
 don't do them. Unreasonable people write apps too.

do you think plan 9 could have been written with consideration
of how people used x windows at the time?  and still have the qualities
that we love about it?

- erik



Re: [9fans] Plan9 - the next 20 years

2009-04-18 Thread J.R. Mauro
On Sat, Apr 18, 2009 at 12:20 PM, ron minnich rminn...@gmail.com wrote:
 On Sat, Apr 18, 2009 at 9:10 AM, J.R. Mauro jrm8...@gmail.com wrote:

 I agree that generally only one process will be accessing a normal
 file at once. I think an editor is not a good example, as you say.


 I'll say it again. It does not matter what we think. It matters what
 apps do. And some apps have multiple processes accessing one file.

 As to the wisdom of such access, there are many opinions :-)

 You really can not just rule things out because reasonable people
 don't do them. Unreasonable people write apps too.

 ron


I just meant it was a bad example, not that the case of an editor
doing something can or should be ruled out.



Re: [9fans] Plan9 - the next 20 years

2009-04-18 Thread tlaronde
On Sat, Apr 18, 2009 at 12:20 PM, ron minnich rminn...@gmail.com wrote:

 I'll say it again. It does not matter what we think. It matters what
 apps do. And some apps have multiple processes accessing one file.

 As to the wisdom of such access, there are many opinions :-)

 You really can not just rule things out because reasonable people
 don't do them. Unreasonable people write apps too.

There are, from time to time, lists of Worst IT jobs ever.

I _do_ think yours should come first! Having to say yes to a user...

Br... 
-- 
Thierry Laronde (Alceste) tlaronde +AT+ polynum +dot+ com
 http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C



Re: [9fans] Plan9 - the next 20 years

2009-04-18 Thread J.R. Mauro

 I _do_ think yours should come first! Having to say yes to a user...

If you don't say 'yes' at some point, you won't have a system anyone
will want to use. Remember all those quotes about why Unix doesn't
prevent you from doing stupid things?



Re: [9fans] Plan9 - the next 20 years

2009-04-18 Thread ron minnich
A checkpoint/restart package:

https://ftg.lbl.gov/CheckpointRestart/CheckpointRestart.shtml



Re: [9fans] Plan9 - the next 20 years

2009-04-18 Thread Charles Forsyth
this discussion of checkpoint/restart reminds me of
a hint i was given years ago: if you wanted to break into a system,
attack through the checkpoint/restart system. i won a jug of
beer for my subsequent successful attack which involved patching
the disc offset for an open file in a copy of the Slave Service Area saved
by the checkpoint; with the offset patched to zero, the newly restored process
could read the file and dump the users and passwords conveniently stored
in the clear at the start of the system area of the system disc.  the
hard bit was writing the code to dump the data in a tidy way.
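[Editor's note: the lesson in Charles's story is that a restore path must
treat the checkpoint image as hostile input. A sketch, all names invented:
the saved offset is interpreted relative to the named file being reopened,
not the raw disc, and is bounds-checked against it, so a patched image is
rejected instead of replayed.]

```python
# Restore an (invented) checkpointed file descriptor, validating the image.
import os, tempfile

def restore_fd(image):
    size = os.path.getsize(image["path"])
    if not 0 <= image["offset"] <= size:
        raise ValueError("checkpoint image: offset outside the saved file")
    f = open(image["path"], "rb")
    f.seek(image["offset"])         # seek within the file, never the raw disc
    return f

tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b"hello, world")
tmp.close()

good = restore_fd({"path": tmp.name, "offset": 7})   # legitimate image
```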



Re: [9fans] Plan9 - the next 20 years

2009-04-18 Thread J.R. Mauro
On Sat, Apr 18, 2009 at 7:31 PM, Charles Forsyth fors...@terzarima.net wrote:
 this discussion of checkpoint/restart reminds me of
 a hint i was given years ago: if you wanted to break into a system,
 attack through the checkpoint/restart system. i won a jug of
 beer for my subsequent successful attack which involved patching
 the disc offset for an open file in a copy of the Slave Service Area saved
 by the checkpoint; with the offset patched to zero, the newly restored process
 could read the file and dump the users and passwords conveniently stored in 
 the clear at
 the start of the system area of the system disc.  the hard bit was
 writing the code to dump the data in a tidy way.



Unfortunately, in the rush to build the Next Cool Thing people often
leave security issues to the very end, at which point shoehorning
fixes in gets ugly.



[9fans] Plan9 - the next 20 years

2009-04-17 Thread Steve Simon
I cannot find the reference (sorry), but I read an interview with Ken
(Thompson) a while ago.

He was asked what he would change if he were working on plan9 now,
and his reply was something like I would add support for cloud computing.

I admit I am not clear exactly what he meant by this.

-Steve



Re: [9fans] Plan9 - the next 20 years

2009-04-17 Thread tlaronde
On Fri, Apr 17, 2009 at 08:16:40PM +0100, Steve Simon wrote:
 I cannot find the reference (sorry), but I read an interview with Ken
 (Thompson) a while ago.
 
 He was asked what he would change if he were working on plan9 now,
 and his reply was something like I would add support for cloud computing.

 I admit I am not clear exactly what he meant by this.

My interpretation of cloud computing is precisely the split done by
plan9 with terminal/CPU/FileServer: a UI running on a Terminal, with
actual computing done somewhere, on data stored somewhere else.

Perhaps tools for migrating tasks or managing the thing. But I have the
impression that the Plan 9 framework is the best for such a scheme.
-- 
Thierry Laronde (Alceste) tlaronde +AT+ polynum +dot+ com
 http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C



Re: [9fans] Plan9 - the next 20 years

2009-04-17 Thread J.R. Mauro
On Fri, Apr 17, 2009 at 3:43 PM,  tlaro...@polynum.com wrote:
 On Fri, Apr 17, 2009 at 08:16:40PM +0100, Steve Simon wrote:
 I cannot find the reference (sorry), but I read an interview with Ken
 (Thompson) a while ago.

 He was asked what he would change if he were working on plan9 now,
 and his reply was something like I would add support for cloud computing.

 I admit I am not clear exactly what he meant by this.

 My interpretation of cloud computing is precisely the split done by
 plan9 with terminal/CPU/FileServer: a UI running on a Terminal, with
 actual computing done somewhere, on data stored somewhere else.

The problem is that the CPU and Fileservers can't be assumed to be
static. Things can and will go down, move about, and become
temporarily unusable over time.


 Perhaps tools for migrating tasks or managing the thing. But I have the
 impression that the Plan 9 framework is the best for such a scheme.
 --
 Thierry Laronde (Alceste) tlaronde +AT+ polynum +dot+ com
                 http://www.kergis.com/
 Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C





Re: [9fans] Plan9 - the next 20 years

2009-04-17 Thread Eric Van Hensbergen
On Fri, Apr 17, 2009 at 2:43 PM,  tlaro...@polynum.com wrote:
 On Fri, Apr 17, 2009 at 08:16:40PM +0100, Steve Simon wrote:
 I cannot find the reference (sorry), but I read an interview with Ken
 (Thompson) a while ago.


 My interpretation of cloud computing is precisely the split done by
 plan9 with terminal/CPU/FileServer: a UI running on a Terminal, with
 actual computing done somewhere, on data stored somewhere else.


That misses the dynamic nature which clouds could enable -- something
we lack as well with our hardcoded /lib/ndb files -- there are no
provisions for cluster resources coming and going (or failing) and no
control facilities given for provisioning (or deprovisioning) those
resources in a dynamic fashion.  Lucho's kvmfs (and to a certain
extent xcpu) seem like steps in the right direction -- but IMHO more
fundamental changes need to occur in the way we think about things.  I
believe the file system interfaces While not focused on cloud
computing in particular, the work we are doing under HARE aims to
explore these directions further (both in the context of Plan
9/Inferno as well as broader themes involving other platforms).

For hints/ideas/whatnot you can check the current pubs (more coming
soon): http://www.research.ibm.com/hare

  -eric



Re: [9fans] Plan9 - the next 20 years

2009-04-17 Thread John Barham
Steve Simon wrote:
 I cannot find the reference (sorry), but I read an interview with Ken
 (Thompson) a while ago.

 He was asked what he would change if he were working on plan9 now,
 and his reply was something like I would add support for cloud computing.

Perhaps you were thinking of his Ask a Google engineer answers at
http://moderator.appspot.com/#15/e=c9t=2d, specifically the question
If you could redesign Plan 9 now (and expect similar uptake to UNIX),
what would you do differently?



Re: [9fans] Plan9 - the next 20 years

2009-04-17 Thread Benjamin Huntsman
Speaking of NUMA and such though, is there even any support for it in the
kernel?  I know we have a 10gb Ethernet driver, but what about cluster
interconnects such as InfiniBand, Quadrics, or Myrinet?  Are such things
even desired in Plan 9?

I'm glad to see process migration has been mentioned.

Re: [9fans] Plan9 - the next 20 years

2009-04-17 Thread J.R. Mauro
On Fri, Apr 17, 2009 at 4:14 PM, Eric Van Hensbergen eri...@gmail.com wrote:
 On Fri, Apr 17, 2009 at 2:43 PM,  tlaro...@polynum.com wrote:
 On Fri, Apr 17, 2009 at 08:16:40PM +0100, Steve Simon wrote:
 I cannot find the reference (sorry), but I read an interview with Ken
 (Thompson) a while ago.


 My interpretation of cloud computing is precisely the split done by
 plan9 with terminal/CPU/FileServer: a UI running on a Terminal, with
 actual computing done somewhere, on data stored somewhere else.


 That misses the dynamic nature which clouds could enable -- something
 we lack as well with our hardcoded /lib/ndb files -- there are no
 provisions for cluster resources coming and going (or failing) and no
 control facilities given for provisioning (or deprovisioning) those
 resources in a dynamic fashion.  Lucho's kvmfs (and to a certain
 extent xcpu) seem like steps in the right direction -- but IMHO more
 fundamental changes need to occur in the way we think about things.  I
 believe the file system interfaces While not focused on cloud
 computing in particular, the work we are doing under HARE aims to
 explore these directions further (both in the context of Plan
 9/Inferno as well as broader themes involving other platforms).

Vidi also seems to be an attempt to make Venti work in such a dynamic
environment. IMHO, the assumption that computers are always connected
to the network was a fundamental mistake in Plan 9.


 For hints/ideas/whatnot you can check the current pubs (more coming
 soon): http://www.research.ibm.com/hare

      -eric





Re: [9fans] Plan9 - the next 20 years

2009-04-17 Thread Francisco J Ballesteros
Well, in the octopus you have a fixed part, the pc, but all other  
machines come and go. The feeling is very much that your stuff is in  
the cloud.


I mean, not everything has to be dynamic.

On 17/04/2009, at 22:17, eri...@gmail.com wrote:


On Fri, Apr 17, 2009 at 2:43 PM, tlaro...@polynum.com wrote:

On Fri, Apr 17, 2009 at 08:16:40PM +0100, Steve Simon wrote:
I cannot find the reference (sorry), but I read an interview with  
Ken

(Thompson) a while ago.



My interpretation of cloud computing is precisely the split done by
plan9 with terminal/CPU/FileServer: a UI running on a Terminal, with
actual computing done somewhere, on data stored somewhere else.



That misses the dynamic nature which clouds could enable -- something
we lack as well with our hardcoded /lib/ndb files -- there are no
provisions for cluster resources coming and going (or failing) and no
control facilities given for provisioning (or deprovisioning) those
resources in a dynamic fashion. Lucho's kvmfs (and to a certain
extent xcpu) seem like steps in the right direction -- but IMHO more
fundamental changes need to occur in the way we think about things. I
believe the file system interfaces While not focused on cloud
computing in particular, the work we are doing under HARE aims to
explore these directions further (both in the context of Plan
9/Inferno as well as broader themes involving other platforms).

For hints/ideas/whatnot you can check the current pubs (more coming
soon): http://www.research.ibm.com/hare

-eric





Re: [9fans] Plan9 - the next 20 years

2009-04-17 Thread ron minnich
if you want to look at checkpointing, it's worth going back to look at
Condor, because they made it really work. There are a few interesting
issues that you need to get right. You can't make it 50% of the way
there; that's not useful. You have to hit all the bits -- open /tmp
files, sockets, all of it. It's easy to get about 90% of it but the
last bits are a real headache. Nothing that's come along since has
really done the job (although various efforts claim to, you have to
read the fine print).

ron



Re: [9fans] Plan9 - the next 20 years

2009-04-17 Thread J.R. Mauro
On Fri, Apr 17, 2009 at 6:15 PM, ron minnich rminn...@gmail.com wrote:
 if you want to look at checkpointing, it's worth going back to look at
 Condor, because they made it really work. There are a few interesting
 issues that you need to get right. You can't make it 50% of the way
 there; that's not useful. You have to hit all the bits -- open /tmp
 files, sockets, all of it. It's easy to get about 90% of it but the
 last bits are a real headache. Nothing that's come along since has
 really done the job (although various efforts claim to, you have to
 read the fine print).

 ron



Amen. Linux is currently having a seriously hard time getting C/R
working properly, just because of the issues you mention. The second
you mix in non-local resources, things get pear-shaped.

Unfortunately, even if it does work, it will probably not have the
kind of nice Plan 9-ish semantics I can envision it having.



Re: [9fans] Plan9 - the next 20 years

2009-04-17 Thread ron minnich
On Fri, Apr 17, 2009 at 3:35 PM, J.R. Mauro jrm8...@gmail.com wrote:

 Amen. Linux is currently having a seriously hard time getting C/R
 working properly, just because of the issues you mention. The second
 you mix in non-local resources, things get pear-shaped.

it's not just non-local. It's local too.

you are on a node. you open /etc/hosts. You C/R to another node with
/etc/hosts open. What's that mean?

You are on a node. you open a file in a ramdisk. Other programs have
it open too. You are watching each other's writes. You C/R to another
node with the file open. What's that mean?

You are on a node. You have a pipe to a process on that node. You C/R
to another node. Are you still talking at the end?

And on and on. It's quite easy to get this stuff wrong. But true C/R
requires that you get it right. The only system that would get this
stuff mostly right that I ever used was Condor. (and, well the Apollo
I think got it too, but that was a ways back).

ron
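[Editor's note: ron's local corner cases can be made concrete. A sketch,
all names invented: a naive checkpointer that records a descriptor as
(path, offset) round-trips an ordinary file, but a pipe end has no
pathname to reopen on the new node -- its peer stays behind -- so the
naive scheme cannot even represent it.]

```python
# Naive (path, offset) checkpoint of a file descriptor, and where it breaks.
import os, tempfile

def checkpoint_fd(f):
    name = getattr(f, "name", None)
    if not isinstance(name, str):    # pipes, sockets: no stable pathname
        raise ValueError("no pathname: cannot checkpoint this descriptor")
    return {"path": name, "offset": f.tell()}

def restore_fd(image):
    f = open(image["path"], "rb")
    f.seek(image["offset"])
    return f

tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b"hello world")
tmp.close()

f = open(tmp.name, "rb")
f.read(6)                            # the process has consumed "hello "
image = checkpoint_fd(f)             # works: plain file, stable name
f.close()

r, w = os.pipe()                     # a pipe to a process on this node
pipe_end = os.fdopen(r, "rb")        # its .name is the fd number, not a path
```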



Re: [9fans] Plan9 - the next 20 years

2009-04-17 Thread J.R. Mauro
On Fri, Apr 17, 2009 at 7:01 PM, ron minnich rminn...@gmail.com wrote:
 On Fri, Apr 17, 2009 at 3:35 PM, J.R. Mauro jrm8...@gmail.com wrote:

 Amen. Linux is currently having a seriously hard time getting C/R
 working properly, just because of the issues you mention. The second
 you mix in non-local resources, things get pear-shaped.

 it's not just non-local. It's local too.

 you are on a node. you open /etc/hosts. You C/R to another node with
 /etc/hosts open. What's that mean?

 You are on a node. you open a file in a ramdisk. Other programs have
 it open too. You are watching each other's writes. You C/R to another
 node with the file open. What's that mean?

 You are on a node. You have a pipe to a process on that node. You C/R
 to another node. Are you still talking at the end?

 And on and on. It's quite easy to get this stuff wrong. But true C/R
 requires that you get it right. The only system that would get this
 stuff mostly right that I ever used was Condor. (and, well the Apollo
 I think got it too, but that was a ways back).

 ron



Yeah, the problem's bigger than I thought (not surprising since I
didn't think much about it). I'm having a hard time figuring out how
Condor handles these issues. All I can see from the documentation is
that it gives you warnings.

I can imagine a lot of problems stemming from open files could be
resolved by first attempting to import the process's namespace at the
time of checkpoint and, upon that failing, using cached copies of the
file made at the time of checkpoint, which could be merged later.

But this still has the 90% problem you mentioned.
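[Editor's note: the fallback proposed above can be sketched. All names are
invented: at restart, first try to reach the file through its original
path (the imported namespace); failing that, fall back to a copy cached
at checkpoint time and flag it for a later merge.]

```python
# Namespace-first restore with a cached-copy fallback.
import os, shutil, tempfile

def checkpoint_file(path, cache_dir):
    cached = os.path.join(cache_dir, os.path.basename(path))
    shutil.copyfile(path, cached)          # snapshot taken at checkpoint time
    return {"path": path, "cached": cached}

def restore_file(image):
    if os.path.exists(image["path"]):      # namespace import succeeded
        return open(image["path"], "rb"), False
    # Fall back to the cached snapshot; True flags that a merge is needed.
    return open(image["cached"], "rb"), True

cache = tempfile.mkdtemp()
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b"data")
tmp.close()

image = checkpoint_file(tmp.name, cache)
os.remove(tmp.name)                        # original unreachable on the new node
f, needs_merge = restore_file(image)
```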



Re: [9fans] Plan9 - the next 20 years

2009-04-17 Thread ron minnich
On Fri, Apr 17, 2009 at 7:06 PM, J.R. Mauro jrm8...@gmail.com wrote:

 Yeah, the problem's bigger than I thought (not surprising since I
 didn't think much about it). I'm having a hard time figuring out how
 Condor handles these issues. All I can see from the documentation is
 that it gives you warnings.

the original condor just forwarded system calls back to the node it
was started from. Thus all system calls were done in the context of
the originating node and user.


 But this still has the 90% problem you mentioned.

it's just plain harder than it looks ...

ron
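[Editor's note: the original Condor scheme ron describes can be sketched.
All names are invented, and a direct method call stands in for the network
hop: the migrated process does no I/O where it runs -- every "system call"
is shipped back to a shadow on the originating node and executed there,
in that node's context.]

```python
# Syscall forwarding: a shadow on the home node services a migrated process.
import tempfile

class Shadow:
    """Home node: executes forwarded syscalls against local files."""
    def __init__(self):
        self.fds, self.next_fd = {}, 3

    def syscall(self, name, *args):
        if name == "open":
            fd, self.next_fd = self.next_fd, self.next_fd + 1
            self.fds[fd] = open(args[0], args[1])
            return fd
        if name == "read":
            return self.fds[args[0]].read(args[1])
        if name == "close":
            self.fds.pop(args[0]).close()
            return 0
        raise NotImplementedError(name)

class MigratedProcess:
    """Execute node: owns no files; forwards everything home."""
    def __init__(self, shadow):
        self.shadow = shadow

    def open(self, path, mode="rb"):
        return self.shadow.syscall("open", path, mode)

    def read(self, fd, n):
        return self.shadow.syscall("read", fd, n)

    def close(self, fd):
        return self.shadow.syscall("close", fd)

tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b"home-node data")
tmp.close()

proc = MigratedProcess(Shadow())
fd = proc.open(tmp.name)
```

Because the shadow resolves the path, the file opens in the home node's
context even though the caller is "elsewhere" -- which is also why this
scheme offers no help once the home node itself goes down.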



Re: [9fans] Plan9 - the next 20 years

2009-04-17 Thread J.R. Mauro
On Fri, Apr 17, 2009 at 10:39 PM, ron minnich rminn...@gmail.com wrote:
 On Fri, Apr 17, 2009 at 7:06 PM, J.R. Mauro jrm8...@gmail.com wrote:

 Yeah, the problem's bigger than I thought (not surprising since I
 didn't think much about it). I'm having a hard time figuring out how
 Condor handles these issues. All I can see from the documentation is
 that it gives you warnings.

 the original condor just forwarded system calls back to the node it
 was started from. Thus all system calls were done in the context of
 the originating node and user.

Best effort is a good place to start.



 But this still has the 90% problem you mentioned.

 it's just plain harder than it looks ...

Yeah. Every time I think of a way to address the corner cases, new ones crop up.



Re: [9fans] Plan9 - the next 20 years

2009-04-17 Thread erik quanstrom
 I can imagine a lot of problems stemming from open files could be
 resolved by first attempting to import the process's namespace at the
 time of checkpoint and, upon that failing, using cached copies of the
 file made at the time of checkpoint, which could be merged later.

there's no guarantee to a process running in a conventional
environment that files won't change underfoot.  why would
condor extend a new guarantee?

maybe i'm suffering from lack of vision, but i would think that
to get to 100% one would need to think in terms of transactions
and have a fully transactional operating system.

- erik



Re: [9fans] Plan9 - the next 20 years

2009-04-17 Thread erik quanstrom
 Vidi also seems to be an attempt to make Venti work in such a dynamic
 environment. IMHO, the assumption that computers are always connected
 to the network was a fundamental mistake in Plan 9

on the other hand, without this assumption, we would not have 9p.
it was a real innovation to dispense with underpowered workstations
with full administrative burdens.

i think it is anachronistic to consider the type of mobile devices we
have today.  in 1990 i knew exactly 0 people with a cell phone.  i had
a toshiba orange screen laptop from work, but in those days a 9600
baud vt100 was still a step up.

ah, the good old days.

none of this is to detract from the obviously good idea of being
able to carry around a working set and sync up with the main server
later without some revision control junk.  in fact, i was excited to
learn about fossil — i was under the impression from reading the
paper that that's how it worked.

speaking of vidi, do the vidi authors have an update on their work?
i'd really like to hear how it is working out.

- erik



Re: [9fans] Plan9 - the next 20 years

2009-04-17 Thread J.R. Mauro
On Fri, Apr 17, 2009 at 11:37 PM, erik quanstrom quans...@quanstro.net wrote:
 I can imagine a lot of problems stemming from open files could be
 resolved by first attempting to import the process's namespace at the
 time of checkpoint and, upon that failing, using cached copies of the
 file made at the time of checkpoint, which could be merged later.

 there's no guarantee to a process running in a conventional
 environment that files won't change underfoot.  why would
 condor extend a new guarantee?

 maybe i'm suffering from lack of vision, but i would think that
 to get to 100% one would need to think in terms of transactions
 and have a fully transactional operating system.

 - erik


There's a much lower chance of files changing out from under you in a
conventional environment. If the goal is to make the unconventional
environment look and act like the conventional one, it will probably
have to try to do some of these things to be useful.



Re: [9fans] Plan9 - the next 20 years

2009-04-17 Thread J.R. Mauro
On Fri, Apr 17, 2009 at 11:56 PM, erik quanstrom quans...@quanstro.net wrote:
 Vidi also seems to be an attempt to make Venti work in such a dynamic
 environment. IMHO, the assumption that computers are always connected
 to the network was a fundamental mistake in Plan 9

 on the other hand, without this assumption, we would not have 9p.
 it was a real innovation to dispense with underpowered workstations
 with full administrative burdens.

 i think it is anachronistic to consider the type of mobile devices we
 have today.  in 1990 i knew exactly 0 people with a cell phone.  i had
 a toshiba orange screen laptop from work, but in those days a 9600
 baud vt100 was still a step up.

 ah, the good old days.

Of course it's easy to blame people for lack of vision 25 years later,
but given the rate at which computing moves in general, it should have
been foreseeable that cell phones as powerful as workstations were on
their way within the authors' lifetimes.

That said, Plan 9 was designed to furnish the needs of an environment
that might not ever have had iPhones and eeePCs attached to it even if
such things existed at the time it was made.

But I'll say that if anyone tries to solve these problems today, they
should not fall into the same trap, and look to the future. I hope
they'll consider how well their solution scales to computers so small
they're running through someone's bloodstream and so far away that
communication in one direction will take several light-minutes and be
subject to massive delay and loss.

It's not that ridiculous... teams are testing DTN, which hopes to
spread the internet to outer space, not only across this solar system,
but also to nearby stars. Now there's thinking forward!


 none of this is to detract from the obviously good idea of being
 able to carry around a working set and sync up with the main server
 later without some revision control junk.  in fact, i was excited to
 learn about fossil — i was under the impression from reading the
 paper that that's how it worked.

 speaking of vidi, do the vidi authors have an update on their work?
 i'd really like to hear how it is working out.

 - erik





Re: [9fans] Plan9 - the next 20 years

2009-04-17 Thread erik quanstrom
 But I'll say that if anyone tries to solve these problems today, they
 should not fall into the same trap,  [...]

yes.  forward thinking was just the thing that made multics
what it is today.

it is equally a trap to try to prognosticate too far in advance.
one increases the likelihood of failure and the chances of being
dead wrong.

- erik



Re: [9fans] Plan9 - the next 20 years

2009-04-17 Thread erik quanstrom
 On Fri, Apr 17, 2009 at 11:37 PM, erik quanstrom quans...@quanstro.net 
 wrote:
  I can imagine a lot of problems stemming from open files could be
  resolved by first attempting to import the process's namespace at the
  time of checkpoint and, upon that failing, using cached copies of the
  file made at the time of checkpoint, which could be merged later.
 
  there's no guarantee to a process running in a conventional
  environment that files won't change underfoot.  why would
  condor extend a new guarantee?
 
  maybe i'm suffering from lack of vision, but i would think that
  to get to 100% one would need to think in terms of transactions
  and have a fully transactional operating system.
 
  - erik
 
 
 There's a much lower chance of files changing out from under you in a
 conventional environment. If the goal is to make the unconventional
 environment look and act like the conventional one, it will probably
 have to try to do some of these things to be useful.

* you can get the same effect by increasing the scale of your system.

* the reason conventional systems work is not, in my opinion, because
the collision window is small, but because one typically doesn't do
conflicting edits to the same file.

* saying that something isn't likely in an unquantifiable way is
not a recipe for success in computer science, in my experience.

- erik
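[editor's illustration] the first point — that scale alone recreates the problem — can be made quantitative with a birthday-bound sketch (hypothetical numbers, Python): even if each writer picks one of m files uniformly at random, the chance that two writers collide grows quickly with the number of writers.

```python
def collision_probability(writers, files):
    """Probability that at least two independent writers pick the
    same file, assuming uniform random choice (birthday bound)."""
    p_no_collision = 1.0
    for k in range(writers):
        p_no_collision *= (files - k) / files
    return 1.0 - p_no_collision

# With 1000 files, 10 writers almost never collide, but 100 writers
# collide almost surely -- "unlikely" does not survive scale.
```

this is why the small collision window of a conventional system is an artifact of small scale, not a property one can rely on.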



Re: [9fans] Plan9 - the next 20 years

2009-04-17 Thread erik quanstrom
 Speaking of NUMA and such though, is there even any support for it in the 
 kernel?
 I know we have a 10gb Ethernet driver, but what about cluster interconnects 
 such as InfiniBand, Quadrics, or Myrinet?  Are such things even desired in 
 Plan 9?

there is no explicit numa support in the pc kernel.
however it runs just fine on standard x86-64 numa
architectures like intel nehalem and amd opteron.

we have two 10gbe ethernet drivers: the myricom driver
and the intel 82598 driver.  the blue gene folks have
support for a number of blue-gene-specific networks.

i don't know too much about myrinet, infiniband or
quadrics.  i have nothing against any of them, but
10gbe has been a much better fit for the things i've
wanted to do.

- erik



Re: [9fans] Plan9 - the next 20 years

2009-04-17 Thread J.R. Mauro
On Sat, Apr 18, 2009 at 12:16 AM, erik quanstrom quans...@quanstro.net wrote:
 But I'll say that if anyone tries to solve these problems today, they
 should not fall into the same trap,  [...]

 yes.  forward thinking was just the thing that made multics
 what it is today.

 it is equally a trap to try to prognosticate too far in advance.
 one increases the likelihood of failure and the chances of being
 dead wrong.

 - erik



I don't think what I outlined is too far ahead, and the issues
presented are all doable as long as a small bit of extra consideration
is made.

Keeping your eye only on the here and now was just the thing that
gave Unix a bunch of tumorous growths like sockets and X11, and made
Windows the wonderful piece of hackery it is.

I'm not suggesting we consider how to solve the problems we'll face
when we're flying through space and time in the TARDIS and shrinking
ourselves and our bioships down to molecular sizes to cure someone's
brain cancer. I'm talking about making something scale across
distances and magnitudes that we will become accustomed to in the next
five decades.