Re: [9fans] Plan9 - the next 20 years
On Thu, Apr 23, 2009 at 9:56 AM, tlaro...@polynum.com wrote:

> clustermatic: not much left from lanl

This is a long story and the situation is less a comment on the software than on the organization. I only say this because, almost 5 years after our last release, there are still people out there using it.

> beowulf: seems to have stalled a little since 2007

eh? beowulf was never a product. It's a marketing term that means "linux cluster". The top 2 machines on the top 500 are beowulf systems.

Not sure what you're getting at here, but you've barely scratched the surface.

ron
Re: [9fans] Plan9 - the next 20 years
On Fri, Apr 24, 2009 at 08:33:59AM -0700, ron minnich wrote:

[snipped clarifications about some of my notes]

> Not sure what you're getting at here, but you've barely scratched the surface.

The fact that I'm not a native English speaker does not help, and my wording may have been rude. I did not mean to say that clusters have no use; that would be ridiculous. And I do expect that solutions which are genuinely hard to get right stay in use, rather than being chosen or thrown away on a whim.

My point was simply this: for somebody like me, who does not work in this area but wants at least a rough idea of it, if not to write programs that run efficiently out of the box on such beasts, then at least to avoid the mistakes that make a program a nightmare to use with such tools (or that force the solution into spaghetti code), following what appears on the surface is disappointing. Projects appear and disappear, and the hype around a solution is not always a clear indicator of its real value, or it emphasizes something that is not the crux.

In my area, watershed computation on a grid (raster) of geographical information is heavy processing, and handling huge data sets calls for work both on the algorithmic side (including the implementation) and on the processing-power side. So even if I'm not a specialist, and don't plan to become one (assuming I could understand the basics), I feel compelled to have at least some idea of the problems.

For this kind of work, the Plan 9 organization has given me at least some principles, some hard facts, and some tools: separate the representation (the terminal) from the processing. Remember that processing deals with data that may not be served by the same instance of the OS, i.e. that the locking of data assumed during processing is, on some OSes and depending on the fileserver or filesystem, only advisory.

So perhaps think differently about rights and locking. And no, this cannot work in just any environment, or with just any filesystem and fileserver. And supporting Plan 9 alongside POSIX, which shows up the differences, is a great help in organizing the sources, for example into what is guaranteed by C and what is system dependent.

After that, my only guidelines are these: if some limited, computation-intensive sub-tasks can be done in parallel but the whole is interdependent, one can think about multiple threads sharing the same address space. But if I can design my data formats to allow independent processing of chunks (locality in geometrical data is rather obvious), finally sewing all the chunks together afterwards, even with some processing on the edges of the chunks, then I can imagine processes (tasks) distributed among distinct CPUs. In this case an OS can also launch the tasks on a single CPU with multiple cores. At the moment I think more in terms of multiple tasks than threads. But that's vague: I know what to avoid doing, but I'm not sure that what I do shouldn't itself be added to the list of "don't do that" things.

-- 
Thierry Laronde (Alceste) tlaronde +AT+ polynum +dot+ com
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
Re: [9fans] Plan9 - the next 20 years
> Not to beat a (potentially) dead horse (even further) to death, but if we had some way of knowing that files were actually data (i.e. not ctl files; cf. QTDECENT) we could do more prefetching in a proxy -- e.g. cfs could be modified to read entire files into its cache (presumably it would have to Tstat afterwards to ensure that it's got a stable snapshot of the file). Adding cache journal callbacks would further allow us to avoid the RTT of Tstat on open and would bring us a bit closer to a fully coherent store. Achieving actual coherency could be an interesting future direction (IMHO).

you're right. i think cfs is becoming a pretty dead horse. if you're running at a significant rtt from your main system, and you don't have one of your own, the simple solution is to install 9vx. that way your system is local and only the shared files get shared.

does cfs even work with import? if it doesn't, then using it implies that all your data are free for public viewing on whatever network you're using, since direct fs connections aren't encrypted.

regarding QTDECENT, i think russ is still on point: http://9fans.net/archive/2007/10/562

- erik
Re: [9fans] Plan9 - the next 20 years
On Sat, Apr 18, 2009 at 08:05:50AM -0700, ron minnich wrote:

> For cluster work that was done in the OS, see any clustermatic publication from minnich, hendriks, or watson, ca. 2000-2005.

FWIW, I haven't found much left, and finally purchased your (et al.) article about HARE: "The Right-Weight Kernel..." Since it considers, among others, Plan 9, it's of much interest for me ;)

What impresses me is the state of the open source cluster solutions, from a rapid tour:

clustermatic: not much left from lanl
openmosix: closed
beowulf: seems to have stalled a little since 2007
kerrighed: "After 8 years of research it is considered a proof of concept, but for obtaining a stable system, we have disabled some features."

I hope that the last quotes [from memory] do not imply that the old way of disproving an assertion by a counter-example has been replaced by considering an assertion proved by advertising a limited example that does not crash too fast.

-- 
Thierry Laronde (Alceste) tlaronde +AT+ polynum +dot+ com
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
Re: [9fans] Plan9 - the next 20 years
2009/4/21 erik quanstrom quans...@coraid.com:

> http://moderator.appspot.com/#15/e=c9t=2d

"You must have JavaScript enabled in order to use this feature." cruel irony. No silver bullet, unfortunately :) Ken Thompson wrote:

| HTTP is really TCP/IP - a reliable stream transport. 9P is a
| filesystem protocol - a higher level protocol that can run over
| any of several transfer protocols, including TCP. 9P is more
| NFS-like than HTTP.
|
| HTTP gets its speed mostly by being one way - download. Most
| browsers will speed up loading by asking for many pages at once.
|
| Now for the question. 9P could probably be speeded up, for large
| reads and writes, by issuing many smaller reads in parallel rather
| than serially. Another try would be to allow the client of the
| filesystem to issue asynchronous requests and at some point
| synchronize. Because 9P is really implementing a filesystem, it
| will be very hard to get any more parallelism with multiple
| outstanding requests.

I followed it up with a more focused question (awkwardly worded to fit within the draconian 250 character limit), but no response yet:

> Hi Ken, thanks for your 9p/HTTP response. I guess my real question
> is: can we achieve such parallelism transparently, given that most
> code calls read() with 4k/8k blocks. The syscall implies
> synchronisation... do we need new primitives? h8 chr limit

-sqweek
Re: [9fans] Plan9 - the next 20 years
On Thu, Apr 23, 2009 at 01:07:58PM +0800, sqweek wrote:

> Ken Thompson wrote:
> | Now for the question. 9P could probably be speeded up, for large
> | reads and writes, by issuing many smaller reads in parallel rather
> | than serially. Another try would be to allow the client of the
> | filesystem to issue asynchronous requests and at some point
> | synchronize. Because 9P is really implementing a filesystem, it
> | will be very hard to get any more parallelism with multiple
> | outstanding requests.
>
> I followed it up with a more focused question (awkwardly worded to fit within the draconian 250 character limit), but no response yet:
>
> Hi Ken, thanks for your 9p/HTTP response. I guess my real question is: can we achieve such parallelism transparently, given that most code calls read() with 4k/8k blocks. The syscall implies synchronisation... do we need new primitives? h8 chr limit

Not to beat a (potentially) dead horse (even further) to death, but if we had some way of knowing that files were actually data (i.e. not ctl files; cf. QTDECENT) we could do more prefetching in a proxy -- e.g. cfs could be modified to read entire files into its cache (presumably it would have to Tstat afterwards to ensure that it's got a stable snapshot of the file). Adding cache journal callbacks would further allow us to avoid the RTT of Tstat on open and would bring us a bit closer to a fully coherent store. Achieving actual coherency could be an interesting future direction (IMHO).

--nwf;
Re: [9fans] Plan9 - the next 20 years
On Mon, 20 Apr 2009 16:33:41 EDT erik quanstrom quans...@coraid.com wrote:

> let's take the path /sys/src/9/pc/sdata.c. for http, getting this path takes one request (with the prefix http://$server). with 9p, this takes a number of walks, an open. then you can start with the reads. only the reads may be done in parallel. given network latency worth worrying about, the total latency to read this file will be worse for 9p than for http.

Perhaps one can optimize for the common case by extending 9p a bit: use special values for certain parameters to allow sending consecutive Twalk, (Topen|Tcreate), (Tread|Twrite) without waiting for intermediate R messages. This makes sense since the time to prepare/process a handful of messages is much shorter than the roundtrip latency.

A different performance problem arises when lots of data has to be fetched. You can pipeline data requests by having multiple outstanding requests. A further refinement would be to use something like RDMA: in essence the receiver tells the sender where exactly it wants the data delivered (thus minimizing copying and processing). You can very easily extend the model to have data chunks delivered to different machines in a cluster. This is like separating a very high speed data plane (with little or no processing) from a low speed control plane (with lots of processing) in a modern switch/router.
Re: [9fans] Plan9 - the next 20 years
2009/4/20 andrey mirtchovski mirtchov...@gmail.com:

> with 9p, this takes a number of walks...

shouldn't that be just one walk?

% ramfs -D
...
% mkdir -p /tmp/one/two/three/four/five/six
...
% cd /tmp/one/two/three/four/five/six
ramfs 640160:-Twalk tag 18 fid 1110 newfid 548 nwname 6 0:one 1:two 2:three 3:four 4:five 5:six
ramfs 640160:-Rwalk tag 18 nwqid 6 0:(0001 0 d) 1:(0002 0 d) 2:(0003 0 d) 3:(0004 0 d) 4:(0005 0 d) 5:(0006 0 d)

that depends on whether it's been gated through exportfs or not (exportfs only walks one step at a time, regardless of the incoming walk).

i'm sure something like this has been discussed before, and this idea is somewhat half-baked, but one could get quite a long way by allowing the notion of a sequence of related 9p actions - if one action fails, then all subsequent actions are discarded.

one difficulty with using multiple concurrent requests with 9p as it stands is that there's no way to force the server to process them sequentially. fcp works because the reads it sends can execute out of order without changing the semantics, but this only works on conventional files.

suppose all 9p Tmsgs were given an sid (sequence id) field. a new 9p message, Tsequence, would start a sequence; subsequent messages with the same sid would be added to a server-side queue for that sequence rather than being executed immediately. the server would move sequentially through the queue, executing actions and sending each reply when complete. the sequence would abort when one of:

a) an Rerror is sent
b) a write returned less than the number of bytes written
c) a read returned less than the number of bytes requested

this mechanism would allow a client to program a set of actions to perform sequentially on the server without having to wait for each reply in turn, i.e. avoiding the usual 9p latency.

some use cases:

the currently rather complex definition of Twalk could be replaced by clone and walk1 instead, as in the original 9p: {Tclone, Twalk, Twalk, ...}

{Twrite, Tread} gives an RPC-style request - no need for venti to use its own protocol (which i assume was invented largely because of the latency inherent in doing two separate 9p requests where one would do).

streaming - send several speculative requests, and keep adding a request to the sequence when a reply arrives. still probably not as good as straight streaming TCP, but easier than fcp and more general.

there are probably lots of reasons why this couldn't work, but i can't think of any right now...
Re: [9fans] Plan9 - the next 20 years
2009/4/21 maht mattmob...@proweb.co.uk:

> Tag 3 could conceivably arrive at the server before Tag 2

that's not true, otherwise the flush semantics wouldn't work correctly. 9p *does* require in-order delivery.
Re: [9fans] Plan9 - the next 20 years
i wrote:

> the currently rather complex definition of Twalk could be replaced by clone and walk1 instead, as in the original 9p: {Tclone, Twalk, Twalk, ...}

i've just realised that the replacement would be somewhat less efficient than the current Twalk, as the cloned fid would still have to be clunked on a failed walk. this is a common case when using paths that traverse a mount point, but perhaps it's less common that directories are mounted on external filesystems, and hence not such an important issue (he says, rationalising hastily :-) )
Re: [9fans] Plan9 - the next 20 years
On Tue Apr 21 06:25:49 EDT 2009, rogpe...@gmail.com wrote:

> 2009/4/21 maht mattmob...@proweb.co.uk:
> > Tag 3 could conceivably arrive at the server before Tag 2
> that's not true, otherwise the flush semantics wouldn't work correctly. 9p *does* require in-order delivery.

i have never needed to do anything important with flush, so i'm asking from ignorance. what is the important use case of flush, and why is it so important that it drives the design?

- erik
Re: [9fans] Plan9 - the next 20 years
one issue with multiple 9p requests is that tags are not order enforced. consider the contrived directory tree:

1/a/a
1/a/b
1/b/a
1/b/b

Twalk 1 fid 1
Twalk 2 fid a
Twalk 3 fid b

Tag 3 could conceivably arrive at the server before Tag 2
Re: [9fans] Plan9 - the next 20 years
On Tue Apr 21 10:05:43 EDT 2009, rogpe...@gmail.com wrote:

> 2009/4/21 erik quanstrom quans...@quanstro.net:
> > what is the important use case of flush and why is this so important that it drives the design?
> [...] The 9P protocol must run above a reliable transport protocol with delimited messages. [...] UDP [RFC768] does not provide reliable in-order delivery.
> (is this the canonical reference for this requirement? the man page doesn't seem to say it)

great post, but i still don't understand why the protocol is designed around flush semantics. all your examples have to do with the interaction between flush and something else. why is flush so important? what if we just ignored the response we don't want instead?

- erik
Re: [9fans] Plan9 - the next 20 years
2009/4/21 erik quanstrom quans...@quanstro.net:

> what is the important use case of flush and why is this so important that it drives the design?

actually the in-order delivery is most important for Rmessages, but it's important for Tmessages too. consider this exchange (C=client, S=server), where the Tflush is sent almost immediately after the Twalk:

C-S Twalk tag=5 fid=22 newfid=24
C-S Tflush tag=6 oldtag=5
S-C Rflush tag=6

if outgoing tags 5 and 6 were swapped, we could get this possible exchange:

C-S Tflush tag=6 oldtag=5
S-C Rflush tag=6
C-S Twalk tag=5 fid=22 newfid=24
S-C Rwalk tag=5

thus the flush is incorrectly ignored. this won't break the protocol though. but consider this example, where Rmsgs can be delivered out of order. here, the server replies to the Twalk message before it receives the Tflush, and the clone succeeds:

C-S Twalk tag=4 fid=22 newfid=23
C-S Tflush tag=5 oldtag=4
S-C Rwalk tag=4
S-C Rflush tag=5

here the two reply messages are switched (erroneously):

C-S Twalk tag=4 fid=22 newfid=23
C-S Tflush tag=5 oldtag=4
S-C Rflush tag=5
S-C Rwalk tag=4

the Rflush signals to the client that the Twalk was successfully flushed, so the client considers that the clone failed, whereas it actually succeeded. the Rwalk is considered a spurious message (it may even interfere destructively with a subsequent Tmsg). result: death and destruction.

anyway, this is moot - from the original plan 9 paper:

> The 9P protocol must run above a reliable transport protocol with delimited messages. [...] UDP [RFC768] does not provide reliable in-order delivery.

(is this the canonical reference for this requirement? the man page doesn't seem to say it)

the protocol doesn't guarantee that requests are *processed* in order, but that's a different thing entirely, and something my half-baked proposal seeks to get around.
Re: [9fans] Plan9 - the next 20 years
Well, if you don't have flush, your server is going to keep a request around for each process that dies/aborts. If requests always complete quite soon it's not a problem, AFAIK, but your server may be keeping the request to reply when something happens.

Also, there's the issue that the flushed request may have allocated a fid or some other resource. If you don't agree that the thing is flushed, you get out of sync with the client.

What I mean is that as soon as you get concurrent requests, you really need to implement flush. Again, AFAIK.

> great post, but i still don't understand why the protocol is designed around flush semantics. all your examples have to do with the interaction between flush and something else. why is flush so important? what if we just ignored the response we don't want instead?
> - erik
Re: [9fans] Plan9 - the next 20 years
On Tue Apr 21 10:34:34 EDT 2009, n...@lsub.org wrote:

> Well, if you don't have flush, your server is going to keep a request for each process that dies/aborts. [...] What I mean is that as soon as you get concurrent requests you really need to implement flush. Again, AFAIK.

isn't the tag space per fid? a variation on the tagged-queuing flush cache would be to force the client to make sure that reordered flush tags aren't a problem. it would not be very hard to ensure that tag overlap does not happen.

if the problem with 9p is latency, then here's a decision that could be revisited. it would be a complication, but it seems to me better than an http-like protocol, bundling requests together, or moving to a storage-oriented protocol.

- erik
Re: [9fans] Plan9 - the next 20 years
2009/4/21 erik quanstrom quans...@quanstro.net:

> isn't the tag space per fid?

no, otherwise every reply message (and Tflush) would include a fid too; moreover Tversion doesn't use a fid (although it probably doesn't actually need a tag).

> a variation on the tagged queuing flush cache would be to force the client to make sure that reordered flush tags aren't a problem. it would not be very hard to ensure that tag overlap does not happen.

the problem is not in tag overlap, but in the fact that the server may or may not already have serviced the request when it receives a Tflush. the client can't know this - the only way it knows whether the transaction actually took place is if the reply to the request arrives before the reply to the flush. this race is, i think, an inherent part of allowing requests to be aborted.

the flush protocol is probably the most complex and the most delicate part of 9p, but it's also one of the most useful, because reinventing it correctly is hard and it solves an oft-found problem: how do i tear down a request that i've already started?

plan 9 and inferno rely quite heavily on having flush, and it's sometimes notable when servers don't implement it. for instance, inferno's file2chan provides no facility for flush notification, and wm/sh uses file2chan; thus if you kill a process that's reading from wm/sh's /dev/cons, the read goes ahead anyway, and a line of input is lost (you might have seen this if you ever used os(1)). that's aside from the issue of resource leakage that nemo points out.

the idea with my proposal is to have an extension that changes as few of the semantics of 9p as possible:

C-S Tsequence tag=1 sid=1
C-S Topen tag=2 sid=1 fid=20 mode=0
C-S Tread tag=3 sid=1 fid=20 count=8192
C-S Tclunk tag=4 sid=1
S-C Rsequence tag=1
S-C Ropen tag=2 qid=...
S-C Rread tag=3 data=...
S-C Rclunk tag=4

would be exactly equivalent to:

C-S Topen tag=2 fid=20 mode=0
S-C Ropen tag=2 qid=...
C-S Tread tag=3 fid=20 count=8192
S-C Rread tag=3 data=...
C-S Tclunk tag=4
S-C Rclunk tag=4

and the client-side interface could be designed so that the client code is the same regardless of whether the server implements Tsequence or not (for instance, in-kernel drivers need not implement it). thus most of the code base could remain unchanged, but everywhere gets the benefit of latency reduction from a few core code changes (e.g. namec).
Re: [9fans] Plan9 - the next 20 years
On Tue, 21 Apr 2009 10:50:18 EDT erik quanstrom quans...@quanstro.net wrote:

> On Tue Apr 21 10:34:34 EDT 2009, n...@lsub.org wrote:
> > Well, if you don't have flush, your server is going to keep a request for each process that dies/aborts.

If a process crashes, who sends the Tflush? The server must clean up without Tflush if a connection closes unexpectedly. I thought the whole point of Tflush was to cancel a potentially expensive operation (i.e. when the user hits the interrupt key). You still have to clean up.

> > If requests always complete quite soon it's not a problem, AFAIK, but your server may be keeping the request to reply when something happens. Also, there's the issue that the flushed request may have allocated a fid or some other resource. If you don't agree that the thing is flushed you get out of sync with the client. What I mean is that as soon as you get concurrent requests you really need to implement flush. Again, AFAIK.
> isn't the tag space per fid? a variation on the tagged queuing flush cache would be to force the client to make sure that reordered flush tags aren't a problem. it would not be very hard to ensure that tag overlap does not happen.

Why does it matter?

> if the problem with 9p is latency, then here's a decision that could be revisited. it would be a complication, but it seems to me better than a http-like protocol, bundling requests together or moving to a storage-oriented protocol.

Can you explain why it is better than bundling requests together? Bundling requests can cut out a few roundtrip delays, which can make a big difference for small files. What you are talking about seems useful for large files [if I understand you correctly].

Second, 9p doesn't seem to restrict any replies other than Rflushes to be sent in order. That means the server can still send Rreads in any order, but if a Tflush is seen, it must clean up properly.

The situation is analogous to what happens in an OoO processor (where results must be discarded in case of exceptions and mispredicted branches).
Re: [9fans] Plan9 - the next 20 years
> plan 9 and inferno rely quite heavily on having flush, and it's sometimes notable when servers don't implement it. for instance, inferno's file2chan provides no facility for flush notification, and wm/sh uses file2chan; thus if you kill a process that's reading from wm/sh's /dev/cons, the read goes ahead anyway, and a line of input is lost (you might have seen this if you ever used os(1)).

isn't the race still there, just with a smaller window of opportunity?

- erik
Re: [9fans] Plan9 - the next 20 years
> > if the problem with 9p is latency, then here's a decision that could be revisited. it would be a complication, but it seems to me better than a http-like protocol, bundling requests together or moving to a storage-oriented protocol.
> Can you explain why it is better than bundling requests together? Bundling requests can cut out a few roundtrip delays, which can make a big difference for small files. What you are talking about seems useful for large files [if I understand you correctly].
> Second, 9p doesn't seem to restrict any replies other than Rflushes to be sent in order. That means the server can still send Rreads in any order but if a Tflush is seen, it must clean up properly. The situation is analogous to what happens in an OoO processor (where results must be discarded in case of exceptions and mispredicted branches).

bundling is equivalent to running the original sequence on the remote machine and shipping only the result back. some rtt latency is eliminated, but i think things will still be largely in-order because walks will act like fences. i think the lots-of-small-files case will still suffer. maybe i'm not quite following along.

bundling will also require an additional agent on the server to marshal the bundled requests.

- erik
Re: [9fans] Plan9 - the next 20 years
On Tue, 21 Apr 2009 17:03:07 BST roger peppe rogpe...@gmail.com wrote:

> the idea with my proposal is to have an extension that changes as few of the semantics of 9p as possible:
>
> C-S Tsequence tag=1 sid=1
> C-S Topen tag=2 sid=1 fid=20 mode=0
> C-S Tread tag=3 sid=1 fid=20 count=8192
> C-S Tclunk tag=4 sid=1
> S-C Rsequence tag=1
> S-C Ropen tag=2 qid=...
> S-C Rread tag=3 data=...
> S-C Rclunk tag=4
>
> would be exactly equivalent to:
>
> C-S Topen tag=2 fid=20 mode=0
> S-C Ropen tag=2 qid=...
> C-S Tread tag=3 fid=20 count=8192
> S-C Rread tag=3 data=...
> C-S Tclunk tag=4
> S-C Rclunk tag=4
>
> and the client-side interface could be designed so that the client code is the same regardless of whether the server implements Tsequence or not (for instance, in-kernel drivers need not implement it).

Do you really need a Tsequence? Seems to me this should already work. Let me illustrate with a timing diagram (one column = 1 time unit; one-way latency is 8 time units):

Strict request/response:

             1         2         3         4         5
   012345678901234567890123456789012345678901234567890
C: Topen           Tread           Tclunk
S:         Ropen           Rread           Rclunk

Pipelined case:

             1         2         3         4         5
   012345678901234567890123456789012345678901234567890
C: Topen Tread Tclunk
S:         Ropen Rread Rclunk

In the first case it takes 48 time units from Topen sent to Rclunk received. In the second case it takes 28 time units. In the pipelined case, from the server's perspective, the client's requests just get to it faster (and may already be waiting!). It doesn't have to do anything special. What am I missing?
Re: [9fans] Plan9 - the next 20 years
On Tue, Apr 21, 2009 at 2:52 AM, maht mattmob...@proweb.co.uk wrote:

> one issue with multiple 9p requests is that tags are not order enforced [...] Tag 3 could conceivably arrive at the server before Tag 2

This would be transport dependent, I presume?
Re: [9fans] Plan9 - the next 20 years
On Tue, Apr 21, 2009 at 1:19 AM, roger peppe rogpe...@gmail.com wrote:

> suppose all 9p Tmsgs were given an sid (sequence id) field. a new 9p message, Tsequence, would start a sequence; subsequent messages with the same sid would be added to a server-side queue for that sequence rather than being executed immediately. [...]
> there are probably lots of reasons why this couldn't work, but i can't think of any right now...

Roger... this sounds pretty promising. 10p? I'd hate to call it 9p++.
Re: [9fans] Plan9 - the next 20 years
2009/4/21 Bakul Shah bakul+pl...@bitblocks.com:

> In the pipelined case, from a server's perspective, client's requests just get to it faster (and may already be waiting!). It doesn't have to do anything special. What am I missing?

you're missing the fact that without the sequence operator, the second request can arrive before the first request has completed, thus potentially making it invalid (e.g. it's invalid to read from a file that hasn't been opened). also, in many current server implementations, each request gets serviced in its own process - there's no guarantee that replies will come back in the same order as the requests, even if all the requests are serviced immediately.

it would be possible to do a similar kind of thing by giving the same tag to the operations in a sequence, but i'm quite attached to the facts that:

a) the operations are otherwise identical to operations in the original protocol
b) a small bit of extra redundancy is useful for debugging
c) you can get Tendsequence (which is necessary, i now realise) by flushing the Tsequence with a request within the sequence itself
Re: [9fans] Plan9 - the next 20 years
On Tue, Apr 21, 2009 at 9:25 AM, erik quanstrom quans...@quanstro.net wrote: if the problem with 9p is latency, then here's a decision that could be revisited. it would be a complication, but it seems to me better than a http-like protocol, bundling requests together or moving to a storage-oriented protocol. Can you explain why it is better than bundling requests together? Bundling requests can cut out a few roundtrip delays, which can make a big difference for small files. What you are talking about seems useful for large files [if I understand you correctly]. Second, 9p doesn't seem to restrict any replies other than Rflushes to be sent in order. That means the server can still send Rreads in any order, but if a Tflush is seen, it must clean up properly. The situation is analogous to what happens in an OoO processor (where results must be discarded in case of exceptions and mispredicted branches). bundling is equivalent to running the original sequence on the remote machine and shipping only the result back. some rtt latency is eliminated but i think things will still be largely in-order because walks will act like fences. i think the lots-of-small-files case will still suffer. maybe i'm not quite following along. Perhaps you don't want to use this technique for lots of smaller files. There's nothing in the protocol Roger suggested preventing us from using different sequence ids and getting the old behavior back, right? It's a bit complex... but worth thinking about. Dave bundling will also require an additional agent on the server to marshal the bundled requests. - erik
Re: [9fans] Plan9 - the next 20 years
2009/4/21 David Leimbach leim...@gmail.com: Roger... this sounds pretty promising. i dunno, there are always hidden dragons in this area, and forsyth, rsc and others are better at seeing them than i. 10p? I'd hate to call it 9p++. 9p2010, based on how soon it would be likely to be implemented...
Re: [9fans] Plan9 - the next 20 years
On Tue, Apr 21, 2009 at 10:06 AM, roger peppe rogpe...@gmail.com wrote: 2009/4/21 David Leimbach leim...@gmail.com: Roger... this sounds pretty promising. i dunno, there are always hidden dragons in this area, and forsyth, rsc and others are better at seeing them than i. Perhaps... but this discussion, and trying is better than not. :-) 10p? I'd hate to call it 9p++. 9p2010, based on how soon it would be likely to be implemented... True.
Re: [9fans] Plan9 - the next 20 years
2009/4/21 erik quanstrom quans...@quanstro.net: plan 9 and inferno rely quite heavily on having flush, and it's sometimes notable when servers don't implement it. for instance, inferno's file2chan provides no facility for flush notification, and wm/sh uses file2chan; thus if you kill a process that's reading from wm/sh's /dev/cons, the read goes ahead anyway, and a line of input is lost (you might have seen this if you ever used os(1)). isn't the race still there, just with a smaller window of opportunity? sure, there's always a race if you allow flush. (but at least the results are well defined regardless of the winner). i was trying to point out that if you try to ignore the issue by removing flush from the protocol, you'll get a system that doesn't work so smoothly.
Re: [9fans] Plan9 - the next 20 years
2009/4/21 erik quanstrom quans...@quanstro.net: bundling is equivalent to running the original sequence on the remote machine and shipping only the result back. some rtt latency is eliminated but i think things will still be largely in-order because walks will act like fences. i think the lots-of-small-files case will still suffer. maybe i'm not quite following along. i agree that the lots-of-small-files case will still suffer (mainly because the non-hierarchical mount table means we can't know what's mounted below a particular node without asking the server). but this still gives the opportunity to considerably speed up many common actions (e.g. {walk, open, read, close}) without adding too much (i think) complexity to the protocol. also, as david leimbach points out, this case is still amenable to the "send several requests concurrently" approach.
Re: [9fans] Plan9 - the next 20 years
i was trying to point out that if you try to ignore the issue by removing flush from the protocol, you'll get a system that doesn't work so smoothly. your failure cases seem to rely on poorly chosen tags. i wasn't suggesting that flush be eliminated. i was thinking of ways of keeping flush self-contained. if the tag space were enlarged so that we could require that tags never be reused, flushes that do not flush anything could be remembered. the problem with this is it could require a large memory of unprocessed flushes. there are lots of potential solutions to that problem. one could allow the server to stall if too many martian flushes were hanging about, and allow clients to declare they will reuse part of the tag space, asserting that nothing within the reused portion is outstanding. you could then keep the current definition of tag, though 16 bits seems a bit small to me. - erik
Re: [9fans] Plan9 - the next 20 years
2009/4/21 erik quanstrom quans...@quanstro.net: i was trying to point out that if you try to ignore the issue by removing flush from the protocol, you'll get a system that doesn't work so smoothly. your failure cases seem to rely on poorly chosen tags. i wasn't suggesting that flush be eliminated. i was thinking of ways of keeping flush self-contained. my failure cases were based around supposing that 9p messages could be re-ordered in transit, as a response to maht's post. they can't, so there's no problem, as far as i can see. my proposal barely affects the flush semantics (there'd be an additional rule that flushing a Tsequence aborts the entire sequence, but i think that's it) the race case that i was talking about cannot be avoided by remembering old flushes. the problem is that an action (which could involve any number of irreversible side-effects) might have been performed at the instant that you tell it to abort. the semantics of flush let you know if you got there in time to stop it.
Re: [9fans] Plan9 - the next 20 years
On Mon, Apr 20, 2009 at 4:14 AM, Skip Tavakkolian 9...@9netics.com wrote: ericvh stated it better in the FAWN thread. choosing the abstraction that makes the resulting environments have required attributes (reliable, consistent, easy, etc.) will be the trick. i believe with the current state of the Internet -- e.g. lack of speed and security -- service abstraction is the right level of distributedness. presenting the services as a file hierarchy makes sense; 9p is efficient 9p is efficient as long as your latency is < 30ms uriel and so the plan9 approach still feels like the right path to cloud computing. On Sun, Apr 19, 2009 at 12:12 AM, Skip Tavakkolian 9...@9netics.com wrote: Well, in the octopus you have a fixed part, the pc, but all other machines come and go. The feeling is very much that your stuff is in the cloud. i was going to mention this. to me the current view of cloud computing as evidenced by papers like this[1] is basically hardware infrastructure capable of running vm pools each of which would do exactly what a dedicated server would do. the main benefits being low administration cost and elasticity. networking, authentication and authorization remain as they are now. they are still not addressing what octopus and rangboom are trying to address: how to seamlessly and automatically make resources accessible. if you read what ken said it appears to be this view of cloud computing; he said some framework to allow many loosely-coupled Plan9 systems to emulate a single system that would be larger and more reliable. in all virtualization systems i've seen the vm has to be smaller than the environment it runs on. if vmware or xen were ever to give you a vm that was larger than any given real machine it ran on, they'd have to solve the same problem. I'm not sure a single system image is any better in the long run than Distributed Shared Memory. 
Both have issues of locality, where the abstraction that gives you the view of a single machine hurts your ability to account for the lack of locality. In other words, I think applications should show a single system image but maybe not programming models. I'm not 100% sure what I mean by that actually, but it's sort of an intuitive feeling. [1] http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf
Re: [9fans] Plan9 - the next 20 years
9p is efficient as long as your latency is < 30ms What kind of latency? For speed of light in fibre optic, 30ms is about 8000km (New York to San Francisco and back) in that 30ms a 3.2GHz P4 could do 292 million instructions There's an interesting article about it in ACM Queue: Fighting Physics: A Tough Battle http://www.maht0x0r.net/library/computing/acm/queue20090203-dl.pdf
Re: [9fans] Plan9 - the next 20 years
9p is efficient as long as your latency is < 30ms check out ken's answer to a question by sqweek. the question starts: With cross-continental round trip times, 9p has a hard time competing (in terms of throughput) against less general protocols like HTTP. ... http://moderator.appspot.com/#15/e=c9t=2d
Re: [9fans] Plan9 - the next 20 years
http://moderator.appspot.com/#15/e=c9t=2d You must have JavaScript enabled in order to use this feature. cruel irony. - erik
Re: [9fans] Plan9 - the next 20 years
What kind of latency? For speed of light in fibre optic 30ms is about 8000km (New York to San Francisco and back) Assuming you have a direct fiber connection with no routers in between. I would say that is somewhat rare.
Re: [9fans] Plan9 - the next 20 years
On Mon, Apr 20, 2009 at 11:03 AM, Skip Tavakkolian 9...@9netics.com wrote: 9p is efficient as long as your latency is < 30ms check out ken's answer to a question by sqweek. the question starts: With cross-continental round trip times, 9p has a hard time competing (in terms of throughput) against less general protocols like HTTP. ... http://moderator.appspot.com/#15/e=c9t=2d I thought 9p had tagged requests so you could put many requests in flight at once, then synchronize on them when the server replied. Maybe i misunderstand the application of the tag field in the protocol then? Tread tag fid offset count Rread tag count data
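the tag mechanism being asked about can be sketched as a small client-side table (a simplified model of my own, not a full 9P implementation): the client chooses each tag, remembers the outstanding request under it, and matches replies by tag in whatever order they come back.

```python
class TagTable:
    """Toy model of client-side 9P tag bookkeeping."""

    def __init__(self):
        self.next_tag = 0
        self.outstanding = {}   # tag -> the request sent under that tag

    def send(self, request):
        # the client assigns the tag and records the request;
        # the tag would go out on the wire in the T-message
        tag = self.next_tag
        self.next_tag += 1
        self.outstanding[tag] = request
        return tag

    def receive(self, tag, reply):
        # replies may arrive in any order; match by tag
        request = self.outstanding.pop(tag)   # KeyError on an unknown tag
        return request, reply
```

because each connection has its own tag space, no global pool across clients is needed, which is the point raised in the message above.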
Re: [9fans] Plan9 - the next 20 years
I thought 9p had tagged requests so you could put many requests in flight at once, then synchronize on them when the server replied. Maybe i misunderstand the application of the tag field in the protocol then? Tread tag fid offset count Rread tag count data without having the benefit of reading ken's thoughts ... you can have 1 fd being read by 2 procs at the same time. the only way to do this is by having multiple outstanding tags. i think the complaint about 9p boils down to ordering. if i want to do something like cd /sys/src/9/pc/ ; cat sdata.c that's a bunch of walks and then an open and then a read. these are done serially, and each one takes 1rtt. - erik
Re: [9fans] Plan9 - the next 20 years
I did the experiment, for the o/live, of issuing multiple (9p) RPCs in parallel, without waiting for answers. In general it was not enough, because in the end the client had to block and wait for the file to come before looking at it to issue further rpcs. On Mon, Apr 20, 2009 at 8:03 PM, Skip Tavakkolian 9...@9netics.com wrote: 9p is efficient as long as your latency is < 30ms check out ken's answer to a question by sqweek. the question starts: With cross-continental round trip times, 9p has a hard time competing (in terms of throughput) against less general protocols like HTTP. ... http://moderator.appspot.com/#15/e=c9t=2d
Re: [9fans] Plan9 - the next 20 years
For speed of light in fibre optic 30ms is about 8000km (New York to San Francisco and back) in that 30ms a 3.2Ghz P4 could do 292 million instructions i think that's just enough to get to dbus and back.
Re: [9fans] Plan9 - the next 20 years
J.R. Mauro wrote: What kind of latency? For speed of light in fibre optic, 30ms is about 8000km (New York to San Francisco and back) Assuming you have a direct fiber connection with no routers in between. I would say that is somewhat rare. The author found that going from klondike.cis.upenn.edu to cs.stanford.edu added about 50ms to the round trip
Re: [9fans] Plan9 - the next 20 years
On Mon, Apr 20, 2009 at 11:35 AM, erik quanstrom quans...@coraid.com wrote: I thought 9p had tagged requests so you could put many requests in flight at once, then synchronize on them when the server replied. Maybe i misunderstand the application of the tag field in the protocol then? Tread tag fid offset count Rread tag count data without having the benefit of reading ken's thoughts ... you can have 1 fd being read by 2 procs at the same time. the only way to do this is by having multiple outstanding tags. I thought the tag was assigned by the client, not the server (since it shows up as a field in the T message), and that this meant it's possible for one client to put many of its own locally tagged requests into the server, and wait for them in any order it chooses. It would not make sense to me to have a global pool of tags for all possible connecting clients. Again, this may just be my ignorance, and the fact that I've never implemented a 9p client or server myself. (haven't had a need to yet!) i think the complaint about 9p boils down to ordering. if i want to do something like cd /sys/src/9/pc/ ; cat sdata.c that's a bunch of walks and then an open and then a read. these are done serially, and each one takes 1rtt. Some higher operations probably require an ordering. But there's no reason you couldn't do two different sequences of walks and a read concurrently, is there? - erik
Re: [9fans] Plan9 - the next 20 years
Tread tag fid offset count Rread tag count data without having the benefit of reading ken's thoughts ... you can have 1 fd being read by 2 procs at the same time. the only way to do this is by having multiple outstanding tags. I thought the tag was assigned by the client, not the server (since it shows up as a field in the T message), and that this meant it's possible for one client to put many of its own locally tagged requests into the server, and wait for them in any order it chooses. that's what i thought i said. (from the perspective of pread and pwrite not (T R)^(read write).) i think the complaint about 9p boils down to ordering. if i want to do something like cd /sys/src/9/pc/ ; cat sdata.c that's a bunch of walks and then an open and then a read. these are done serially, and each one takes 1rtt. Some higher operations probably require an ordering. But there's no reason you couldn't do two different sequences of walks and a read concurrently, is there? not that i can think of. but that addresses throughput, not latency. - erik
Re: [9fans] Plan9 - the next 20 years
I thought 9p had tagged requests so you could put many requests in flight at once, then synchronize on them when the server replied. This is exactly what fcp(1) does, which is used by replica. If you want to read a virtual file however, these often don't support seeks or implement them in unexpected ways (returning one line per read rather than a buffer full). Thus running multiple reads (on the same file) only really works for files which behave like disks - e.g. real disks, ram disks, etc. -Steve
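the fcp(1) trick only needs reads at independent offsets, so a sketch of the idea is straightforward (a hypothetical helper of my own, assuming a disk-like file where a read at any offset returns the bytes at that offset): issue several reads at once and reassemble the chunks by offset, regardless of which reply comes back first.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_read(readat, size, chunk, workers=4):
    """readat(offset, count) -> bytes; fetch chunks concurrently and
    reassemble the whole file by offset."""
    offsets = range(0, size, chunk)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda off: (off, readat(off, chunk)), offsets)
        # sorting by offset makes reassembly independent of reply order
        return b''.join(data for _, data in sorted(parts))
```

a virtual file that returns "one line per read" breaks the assumption that offset and count mean what they do on a disk, which is exactly why this only works on conventional files.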
Re: [9fans] Plan9 - the next 20 years
Thus running multiple reads (on the same file) only really works for files which operate as read disks - e.g. real disks, ram disks etc. at which point, you have reinvented aoe. :-) - erik
Re: [9fans] Plan9 - the next 20 years
On Mon, Apr 20, 2009 at 12:03 PM, erik quanstrom quans...@coraid.com wrote: Tread tag fid offset count Rread tag count data without having the benefit of reading ken's thoughts ... you can have 1 fd being read by 2 procs at the same time. the only way to do this is by having multiple outstanding tags. I thought the tag was assigned by the client, not the server (since it shows up as a field in the T message), and that this meant it's possible for one client to put many of its own locally tagged requests into the server, and wait for them in any order it chooses. that's what i thought i said. (from the perspective of pread and pwrite not (T R)^(read write).) Ah that's what I didn't understand :-). i think the complaint about 9p boils down to ordering. if i want to do something like cd /sys/src/9/pc/ ; cat sdata.c that's a bunch of walks and then an open and then a read. these are done serially, and each one takes 1rtt. Some higher operations probably require an ordering. But there's no reason you couldn't do two different sequences of walks and a read concurrently, is there? not that i can think of. but that addresses throughput, not latency. Right, but with better throughput overall, you can hide latency in some applications. That's what HTTP does with this AJAX fun, right? Show some of the page, load the rest over time, and people feel better about stuff. I had an application for SNMP in Erlang that did too much serially, and by increasing the number of outstanding requests, I got the overall job done sooner, despite the latency issues. This improved the user experience by about 10 seconds less wait time. Tagged requests were actually how I implemented it :-) 9p can't fix the latency problems, but applications over 9p can be designed to try to hide some of it, depending on usage. - erik
Re: [9fans] Plan9 - the next 20 years
not that i can think of. but that addresses throughput, not latency. Right, but with better throughput overall, you can hide latency in some applications. That's what HTTP does with this AJAX fun, right? Show some of the page, load the rest over time, and people feel better about stuff. I had an application for SNMP in Erlang that did too much serially, and by increasing the number of outstanding requests, I got the overall job done sooner, despite the latency issues. This improved the user experience by about 10 seconds less wait time. Tagged requests were actually how I implemented it :-) 9p can't fix the latency problems, but applications over 9p can be designed to try to hide some of it, depending on usage. let's take the path /sys/src/9/pc/sdata.c. for http, getting this path takes one request (with the prefix http://$server). with 9p, this takes a number of walks, an open. then you can start with the reads. only the reads may be done in parallel. given network latency worth worrying about, the total latency to read this file will be worse for 9p than for http. - erik
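erik's comparison can be put into a rough back-of-envelope model (a simplification of my own; it ignores TCP setup, version/attach, and msize negotiation, and assumes one round trip per 9p message): http pays one round trip for the whole path, while 9p pays one for the walk, one for the open, and at least one for the reads.

```python
def ninep_latency(path_depth, file_size, msize, rtt, parallel_reads=True):
    """Rough total latency to fetch a file over 9p, in the same units as rtt."""
    walks = -(-path_depth // 16)       # one Twalk carries up to 16 path elements
    opens = 1
    nreads = -(-file_size // msize)    # ceiling division: number of Treads
    read_cost = rtt if parallel_reads else nreads * rtt
    return (walks + opens) * rtt + read_cost

def http_latency(rtt):
    return rtt                         # a single GET round trip
```

with a 30ms rtt, a ~100KB /sys/src/9/pc/sdata.c costs one rtt over http but at least three over 9p (walk, open, then the overlapped reads), and far more if the reads are serial.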
Re: [9fans] Plan9 - the next 20 years
On Mon, Apr 20, 2009 at 1:33 PM, erik quanstrom quans...@coraid.com wrote: not that i can think of. but that addresses throughput, not latency. Right, but with better throughput overall, you can hide latency in some applications. That's what HTTP does with this AJAX fun, right? Show some of the page, load the rest over time, and people feel better about stuff. I had an application for SNMP in Erlang that did too much serially, and by increasing the number of outstanding requests, I got the overall job done sooner, despite the latency issues. This improved the user experience by about 10 seconds less wait time. Tagged requests were actually how I implemented it :-) 9p can't fix the latency problems, but applications over 9p can be designed to try to hide some of it, depending on usage. let's take the path /sys/src/9/pc/sdata.c. for http, getting this path takes one request (with the prefix http://$server). with 9p, this takes a number of walks, an open. then you can start with the reads. only the reads may be done in parallel. given network latency worth worrying about, the total latency to read this file will be worse for 9p than for http. Yeah, I guess due to my lack of having written a 9p client by hand, I've forgotten that a new 9p client session is stateful, and at the root of the hierarchy presented by the server. No choice but to walk, even if you know the path to the named resource in advance. This seems to give techniques like REST a bit of an advantage over 9p. (yikes) Would we want a less stateful 9p then? Does that end up being HTTP or IMAP or some other thing that already exists? Dave - erik
Re: [9fans] Plan9 - the next 20 years
with 9p, this takes a number of walks... shouldn't that be just one walk? % ramfs -D ... % mkdir -p /tmp/one/two/three/four/five/six ... % cd /tmp/one/two/three/four/five/six ramfs 640160:-Twalk tag 18 fid 1110 newfid 548 nwname 6 0:one 1:two 2:three 3:four 4:five 5:six ramfs 640160:-Rwalk tag 18 nwqid 6 0:(0001 0 d) 1:(0002 0 d) 2:(0003 0 d) 3:(0004 0 d) 4:(0005 0 d) 5:(0006 0 d)
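as the trace shows, one Twalk can carry several names at once; the protocol caps a single walk at sixteen elements (MAXWELEM), so only a deeper path needs to be split into successive walks. a sketch of the client-side batching rule (the function name is my own):

```python
MAXWELEM = 16   # maximum path elements in one Twalk, per the 9P spec

def walk_messages(path):
    """Split a path into the per-Twalk batches of names a client would send."""
    names = [n for n in path.split('/') if n]
    return [names[i:i + MAXWELEM] for i in range(0, len(names), MAXWELEM)]
```

so the six-deep path in the trace really is a single Twalk/Rwalk round trip, as the Rwalk with nwqid 6 confirms.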
Re: [9fans] Plan9 - the next 20 years
Well, in the octopus you have a fixed part, the pc, but all other machines come and go. The feeling is very much that your stuff is in the cloud. i was going to mention this. to me the current view of cloud computing as evidenced by papers like this[1] is basically hardware infrastructure capable of running vm pools each of which would do exactly what a dedicated server would do. the main benefits being low administration cost and elasticity. networking, authentication and authorization remain as they are now. they are still not addressing what octopus and rangboom are trying to address: how to seamlessly and automatically make resources accessible. if you read what ken said it appears to be this view of cloud computing; he said some framework to allow many loosely-coupled Plan9 systems to emulate a single system that would be larger and more reliable. in all virtualization systems i've seen the vm has to be smaller than the environment it runs on. if vmware or xen were ever to give you a vm that was larger than any given real machine it ran on, they'd have to solve the same problem. [1] http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf
Re: [9fans] Plan9 - the next 20 years
On Sun, Apr 19, 2009 at 12:12 AM, Skip Tavakkolian 9...@9netics.com wrote: Well, in the octopus you have a fixed part, the pc, but all other machines come and go. The feeling is very much that your stuff is in the cloud. i was going to mention this. to me the current view of cloud computing as evidenced by papers like this[1] is basically hardware infrastructure capable of running vm pools each of which would do exactly what a dedicated server would do. the main benefits being low administration cost and elasticity. networking, authentication and authorization remain as they are now. they are still not addressing what octopus and rangboom are trying to address: how to seamlessly and automatically make resources accessible. if you read what ken said it appears to be this view of cloud computing; he said some framework to allow many loosely-coupled Plan9 systems to emulate a single system that would be larger and more reliable. in all virtualization systems i've seen the vm has to be smaller than the environment it runs on. if vmware or xen were ever to give you a vm that was larger than any given real machine it ran on, they'd have to solve the same problem. I'm not sure a single system image is any better in the long run than Distributed Shared Memory. Both have issues of locality, where the abstraction that gives you the view of a single machine hurts your ability to account for the lack of locality. In other words, I think applications should show a single system image but maybe not programming models. I'm not 100% sure what I mean by that actually, but it's sort of an intuitive feeling. [1] http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf
Re: [9fans] Plan9 - the next 20 years
* Latchesar Ionkov lu...@ionkov.net wrote: Hi, I talked with a guy who's doing parallel filesystem work, and according to him 80% of all filesystem operations when running an HPC job are for checkpointing (not that much restart). I just don't see how checkpointing can scale knowing how bad the parallel fs are. We need a clustered venti and a cluster-aware fossil ;-P I'm currently in the process of designing a clustered storage system, inspired by venti and git, which also supports removing files, on-demand synchronization, etc. I'll let you know when I've got something to present. cu -- -- Enrico Weigelt, metux IT service -- http://www.metux.de/ cellphone: +49 174 7066481 email: i...@metux.de skype: nekrad666 -- Embedded-Linux / Portierung / Opensource-QM / Verteilte Systeme --
Re: [9fans] Plan9 - the next 20 years
On Sun, Apr 19, 2009 at 12:34 PM, Enrico Weigelt weig...@metux.de wrote: I'm currently in the process of designing a clustered storage system, inspired by venti and git, which also supports removing files, on-demand synchronization, etc. I'll let you know when I've got something to present. The only presentation with any value is code. code-code is better than jaw-jaw. ron
Re: [9fans] Plan9 - the next 20 years
ericvh stated it better in the FAWN thread. choosing the abstraction that makes the resulting environments have required attributes (reliable, consistent, easy, etc.) will be the trick. i believe with the current state of the Internet -- e.g. lack of speed and security -- service abstraction is the right level of distributedness. presenting the services as a file hierarchy makes sense; 9p is efficient and so the plan9 approach still feels like the right path to cloud computing. On Sun, Apr 19, 2009 at 12:12 AM, Skip Tavakkolian 9...@9netics.com wrote: Well, in the octopus you have a fixed part, the pc, but all other machines come and go. The feeling is very much that your stuff is in the cloud. i was going to mention this. to me the current view of cloud computing as evidenced by papers like this[1] is basically hardware infrastructure capable of running vm pools each of which would do exactly what a dedicated server would do. the main benefits being low administration cost and elasticity. networking, authentication and authorization remain as they are now. they are still not addressing what octopus and rangboom are trying to address: how to seamlessly and automatically make resources accessible. if you read what ken said it appears to be this view of cloud computing; he said some framework to allow many loosely-coupled Plan9 systems to emulate a single system that would be larger and more reliable. in all virtualization systems i've seen the vm has to be smaller than the environment it runs on. if vmware or xen were ever to give you a vm that was larger than any given real machine it ran on, they'd have to solve the same problem. I'm not sure a single system image is any better in the long run than Distributed Shared Memory. Both have issues of locality, where the abstraction that gives you the view of a single machine hurts your ability to account for the lack of locality. 
In other words, I think applications should show a single system image but maybe not programming models. I'm not 100% sure what I mean by that actually, but it's sort of an intuitive feeling. [1] http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf
Re: [9fans] Plan9 - the next 20 years
the original condor just forwarded system calls back to the node it was started from. Thus all system calls were done in the context of the originating node and user. Not much good if you're migrating because the node's gone down. What happens then? Sorry to ask, RTFM seems a bit beyond my ken, right now. Also, that gives you a single level of checkpoint, in fact, I'd call that strictly migration, C/R should allow a history of checkpoints, each of which can be restarted. Might not be possible, of course, unless one defines the underlying platform with new properties specially designed for C/R (and migration). Hence my insistence that Plan 9 is a better platform than other OSes in common use. The hardware may need to be adjusted too, but that's where Inferno EMU will come to the rescue :-) ++L
Re: [9fans] Plan9 - the next 20 years
On Sat, Apr 18, 2009 at 12:16 AM, erik quanstrom quans...@quanstro.net wrote: On Fri, Apr 17, 2009 at 11:37 PM, erik quanstrom quans...@quanstro.net wrote: I can imagine a lot of problems stemming from open files could be resolved by first attempting to import the process's namespace at the time of checkpoint and, upon that failing, using cached copies of the file made at the time of checkpoint, which could be merged later. there's no guarantee to a process running in a conventional environment that files won't change underfoot. why would condor extend a new guarantee? maybe i'm suffering from lack of vision, but i would think that to get to 100% one would need to think in terms of transactions and have a fully transactional operating system. - erik There's a much lower chance of files changing out from under you in a conventional environment. If the goal is to make the unconventional environment look and act like the conventional one, it will probably have to try to do some of these things to be useful. * you can get the same effect by increasing the scale of your system. * the reason conventional systems work is not, in my opinion, because the collision window is small, but because one typically doesn't do conflicting edits to the same file. * saying that something isn't likely in an unquantifiable way is not a recipe for success in computer science, in my experience. - erik I don't see how any of that relates to having to do more work to ensure that C/R and process migration across nodes works and keeps things as consistent as possible.
Re: [9fans] Plan9 - the next 20 years
there's no guarantee to a process running in a conventional environment that files won't change underfoot. why would condor extend a new guarantee? Because you have to migrate standard applications, not only applications that allow for migration. Consider a word processing session, for example. Of course, one can design applications differently and I'm sure cloud computing will demand a different paradigm, but that is not what's being discussed here. ++L
Re: [9fans] Plan9 - the next 20 years
On Fri, Apr 17, 2009 at 03:15:25PM -0700, ron minnich wrote: if you want to look at checkpointing, it's worth going back to look at Condor, because they made it really work. There are a few interesting issues that you need to get right. You can't make it 50% of the way there; that's not useful. You have to hit all the bits -- open /tmp files, sockets, all of it. It's easy to get about 90% of it but the last bits are a real headache. Nothing that's come along since has really done the job (although various efforts claim to, you have to read the fine print). My only knowledge about this area is through papers and books, so it is very abstract. But my gut feeling, after reading about Mach or reading A. Tanenbaum (whom I find poor, but he is A. Tanenbaum and I'm only T. Laronde), is that a cluster is above the OS (a collection of CPUs), but a NUMA is for the OS an atom, i.e. it is below the OS, a kind of processor, a single CPU (so NUMA without a strong hardware specificity is something I don't understand). In all the mathematical or computer work I have done, defining the element, the atom (that is, the unit whose insides I don't have to know or deal with) has always given the best results. Not related to what you wrote, but here is the impression made by what can be read about cloud computing in the open sewer: a NUMA made of totally heterogeneous hardware, with users plugging or unplugging a CPU component at will. Or a start-up (end-down) providing cloud computing whose only means is the users' connected hardware. That is perhaps Web 3.0 or more, a 4th-millennium idea, etc., but for me it is at best an error, at worst a swindle. -- Thierry Laronde (Alceste) tlaronde +AT+ polynum +dot+ com http://www.kergis.com/ Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
Re: [9fans] Plan9 - the next 20 years
I assumed cloud computing means you can log into any node that you are authorised to, and your data and code will migrate to you as needed. The idea being that the sam -r split is not only dynamic but on demand: you may connect to the cloud from your phone just to read your email, so the editor session stays attached to your home server as it has not requested many graphics events over the slow link. anecdote When I first read the London UKUUG Plan9 paper in the early nineties I was managing an HPC workstation cluster. I misunderstood the plan9 cpu(1) command and thought it, by default, connected to the least loaded cpu server available. I was sufficiently inspired to write a cpu(1) command for my HP/UX cluster which did exactly that. Funny old world. /anecdote -Steve
Re: [9fans] Plan9 - the next 20 years
* you can get the same effect by increasing the scale of your system. * the reason conventional systems work is not, in my opinion, because the collision window is small, but because one typically doesn't do conflicting edits to the same file. * saying that something isn't likely in an unquantifiable way is not a recipe for success in computer science, in my experience. - erik I don't see how any of that relates to having to do more work to ensure that C/R and process migration across nodes works and keeps things as consistent as possible. that's a fine and sensible goal. but for the reasons above, i don't buy this line of reasoning. in a plan 9 system, the only files that i can think of which many processes have open at the same time are log files, append-only files. just reopening the log file would solve the problem. what is a specific case of contention you are thinking of? i'm not sure why an editor is the case that's being bandied about. two users don't usually edit the same file at the same time. that case already does not work. and i'm not sure why one would snapshot an editing session, edit the file by other means, and expect things to just work out. (and finally, acme, for example, does not keep the original file open. if open files are what get snapshotted, there would be no difference.) - erik
Re: [9fans] Plan9 - the next 20 years
[I reply to myself because I was replying half on two distinct threads] On Sat, Apr 18, 2009 at 01:59:03PM +0200, tlaro...@polynum.com wrote: But my gut feeling, after reading about Mach or reading A. Tanenbaum (that I find poor---but he is A. Tanenbaum, I'm only T. Laronde), is that a cluster is above the OS (a collection of CPUs), but a NUMA is for the OS an atom, i.e. is below the OS, a kind of processor, a single CPU (so NUMA without a strong hardware specificity is something I don't understand). In all the mathematical or computer work I have done, defining the element, the atom (that is, the unit whose insides I don't have to know or deal with) has always given the best results. The link between this and process migration is that, IMHO or in my limited mind, one allocates a node once, for the duration of the process, depending on the resources available at the moment. This is OS business: allocating resources from a cluster of CPUs. The task doesn't migrate between nodes; it can migrate inside the node, from core to core in a tightly memory-coupled CPU (a mainframe, whether NUMA or not) that handles failover etc. But this is infra-OS, hardware stuff, and as far as the OS is concerned nothing has changed, since the node is a unit, an atom. And trying to solve the problem by breaking the border (going inside the atom) is something I don't feel comfortable with.
Re: [9fans] Plan9 - the next 20 years
On Sat, Apr 18, 2009 at 4:59 AM, tlaro...@polynum.com wrote: But my gut feeling, after reading about Mach or reading A. Tanenbaum (that I find poor---but he is A. Tanenbaum, I'm only T. Laronde), is that a cluster is above the OS (a collection of CPUs), but a NUMA is for the OS an atom, i.e. is below the OS, a kind of processor, a single CPU (so NUMA without a strong hardware specifity is something I don't understand). A cluster is above the OS because most cluster people don't know how to do OS work. Hence most cluster software follows basic patterns first set in 1991. That is no exaggeration. For cluster work that was done in the OS, see any clustermatic publication from minnich, hendriks, or watson, ca. 2000-2005. ron
Re: [9fans] Plan9 - the next 20 years
On Sat, Apr 18, 2009 at 6:50 AM, erik quanstrom quans...@quanstro.net wrote: in a plan 9 system, the only files that i can think of which many processes have open at the same time are log files, append-only files. just reopening log file would solve the problem. you're not thinking in terms of parallel applications if you make this statement. ron
Re: [9fans] Plan9 - the next 20 years
I talked with a guy who is doing parallel filesystem work, and according to him 80% of all filesystem operations when running an HPC job are for checkpointing (not so much restart). I just don't see how checkpointing can scale, knowing how bad the parallel filesystems are. Lucho On Fri, Apr 17, 2009 at 4:15 PM, ron minnich rminn...@gmail.com wrote: if you want to look at checkpointing, it's worth going back to look at Condor, because they made it really work. There are a few interesting issues that you need to get right. You can't make it 50% of the way there; that's not useful. You have to hit all the bits -- open /tmp files, sockets, all of it. It's easy to get about 90% of it but the last bits are a real headache. Nothing that's come along since has really done the job (although various efforts claim to, you have to read the fine print). ron
Re: [9fans] Plan9 - the next 20 years
On Sat, Apr 18, 2009 at 9:50 AM, erik quanstrom quans...@quanstro.net wrote: * you can get the same effect by increasing the scale of your system. * the reason conventional systems work is not, in my opinion, because the collision window is small, but because one typically doesn't do conflicting edits to the same file. * saying that something isn't likely in an unquantifiable way is not a recipe for success in computer science, in my experience. - erik I don't see how any of that relates to having to do more work to ensure that C/R and process migration across nodes works and keeps things as consistent as possible. that's a fine and sensible goal. but for the reasons above, i don't buy this line of reasoning. in a plan 9 system, the only files that i can think of which many processes have open at the same time are log files, append-only files. just reopening the log file would solve the problem. what is a specific case of contention you are thinking of? i'm not sure why an editor is the case that's being bandied about. two users don't usually edit the same file at the same time. that case already does not work. and i'm not sure why one would snapshot an editing session, edit the file by other means, and expect things to just work out. (and finally, acme, for example, does not keep the original file open. if open files are what get snapshotted, there would be no difference.) - erik Ron mentioned a bunch before, like /etc/hosts or a pipe to another process, and I would also suggest that things in /net and databases could be a serious problem. If you migrate a process, how do you ensure that the process is in a sane state on the new node? I agree that generally only one process will be accessing a normal file at once. I think an editor is not a good example, as you say.
Re: [9fans] Plan9 - the next 20 years
On Sat, Apr 18, 2009 at 9:10 AM, J.R. Mauro jrm8...@gmail.com wrote: I agree that generally only one process will be accessing a normal file at once. I think an editor is not a good example, as you say. I'll say it again. It does not matter what we think. It matters what apps do. And some apps have multiple processes accessing one file. As to the wisdom of such access, there are many opinions :-) You really can not just rule things out because reasonable people don't do them. Unreasonable people write apps too. ron
Re: [9fans] Plan9 - the next 20 years
On Sat Apr 18 12:21:49 EDT 2009, rminn...@gmail.com wrote: On Sat, Apr 18, 2009 at 9:10 AM, J.R. Mauro jrm8...@gmail.com wrote: I agree that generally only one process will be accessing a normal file at once. I think an editor is not a good example, as you say. I'll say it again. It does not matter what we think. It matters what apps do. And some apps have multiple processes accessing one file. As to the wisdom of such access, there are many opinions :-) You really can not just rule things out because reasonable people don't do them. Unreasonable people write apps too. do you think plan 9 could have been written with consideration of how people used x windows at the time? and still have the qualities that we love about it? - erik
Re: [9fans] Plan9 - the next 20 years
On Sat, Apr 18, 2009 at 12:20 PM, ron minnich rminn...@gmail.com wrote: On Sat, Apr 18, 2009 at 9:10 AM, J.R. Mauro jrm8...@gmail.com wrote: I agree that generally only one process will be accessing a normal file at once. I think an editor is not a good example, as you say. I'll say it again. It does not matter what we think. It matters what apps do. And some apps have multiple processes accessing one file. As to the wisdom of such access, there are many opinions :-) You really can not just rule things out because reasonable people don't do them. Unreasonable people write apps too. ron I just meant it was a bad example, not that the case of an editor doing something can or should be ruled out.
Re: [9fans] Plan9 - the next 20 years
On Sat, Apr 18, 2009 at 12:20 PM, ron minnich rminn...@gmail.com wrote: I'll say it again. It does not matter what we think. It matters what apps do. And some apps have multiple processes accessing one file. As to the wisdom of such access, there are many opinions :-) You really can not just rule things out because reasonable people don't do them. Unreasonable people write apps too. There are, from time to time, lists of the Worst IT jobs ever. I _do_ think yours should come first! Having to say yes to a user... Br...
Re: [9fans] Plan9 - the next 20 years
I _do_ think yours should come first! Having to say: yes to an user... If you don't say 'yes' at some point, you won't have a system anyone will want to use. Remember all those quotes about why Unix doesn't prevent you from doing stupid things?
Re: [9fans] Plan9 - the next 20 years
A checkpoint restart package. https://ftg.lbl.gov/CheckpointRestart/CheckpointRestart.shtml
Re: [9fans] Plan9 - the next 20 years
this discussion of checkpoint/restart reminds me of a hint i was given years ago: if you wanted to break into a system, attack through the checkpoint/restart system. i won a jug of beer for my subsequent successful attack which involved patching the disc offset for an open file in a copy of the Slave Service Area saved by the checkpoint; with the offset patched to zero, the newly restored process could read the file and dump the users and passwords conveniently stored in the clear at the start of the system area of the system disc. the hard bit was writing the code to dump the data in a tidy way.
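Forsyth's attack generalizes: a checkpoint image is attacker-writable data describing kernel state, so a restore that trusts it blindly will recreate whatever state the attacker patched in. A toy illustration in Python (the record format, `naive_restore`, and `safer_restore` are invented for this sketch; they stand in for the saved Slave Service Area and the restore path, not any real system's format):

```python
import os

# Toy checkpoint record for one open file: path plus byte offset.
# The point of the anecdote: at restore time this record is
# untrusted input, like anything else the user can edit on disc.

def checkpoint(f):
    return {"path": f.name, "offset": f.tell()}

def naive_restore(rec):
    # Blindly reopen and seek to whatever the image says -- this is
    # the hole: whoever can patch the image picks the file position
    # (or, on the old system, the raw disc offset).
    f = open(rec["path"], "rb")
    f.seek(rec["offset"])
    return f

def safer_restore(rec, allowed_prefix):
    # Minimal validation: confine the path and bound the offset.
    path = os.path.realpath(rec["path"])
    if not path.startswith(allowed_prefix):
        raise PermissionError("checkpoint escapes sandbox: " + path)
    off = rec["offset"]
    if not 0 <= off <= os.path.getsize(path):
        raise ValueError("offset out of range")
    f = open(path, "rb")
    f.seek(off)
    return f
```

Even the "safer" version here only gestures at the problem; a real restore path has to validate every piece of saved kernel state, which is exactly why it makes such a good attack surface when done late.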
Re: [9fans] Plan9 - the next 20 years
On Sat, Apr 18, 2009 at 7:31 PM, Charles Forsyth fors...@terzarima.net wrote: this discussion of checkpoint/restart reminds me of a hint i was given years ago: if you wanted to break into a system, attack through the checkpoint/restart system. i won a jug of beer for my subsequent successful attack which involved patching the disc offset for an open file in a copy of the Slave Service Area saved by the checkpoint; with the offset patched to zero, the newly restored process could read the file and dump the users and passwords conveniently stored in the clear at the start of the system area of the system disc. the hard bit was writing the code to dump the data in a tidy way. Unfortunately, in the rush to build the Next Cool Thing people often leave security issues to the very end, at which point shoehorning fixes in gets ugly.
[9fans] Plan9 - the next 20 years
I cannot find the reference (sorry), but I read an interview with Ken (Thompson) a while ago. He was asked what he would change if he were working on plan9 now, and his reply was something like I would add support for cloud computing. I admit I am not clear exactly what he meant by this. -Steve
Re: [9fans] Plan9 - the next 20 years
On Fri, Apr 17, 2009 at 08:16:40PM +0100, Steve Simon wrote: I cannot find the reference (sorry), but I read an interview with Ken (Thompson) a while ago. He was asked what he would change if he were working on plan9 now, and his reply was something like I would add support for cloud computing. I admit I am not clear exactly what he meant by this. My interpretation of cloud computing is precisely the split done by plan9 with terminal/CPU/FileServer: a UI running on this Terminal, with actual computing done somewhere, on data stored somewhere. Perhaps tools for migrating tasks or managing the thing. But I have the impression that the Plan 9 framework is the best for such a scheme.
Re: [9fans] Plan9 - the next 20 years
On Fri, Apr 17, 2009 at 3:43 PM, tlaro...@polynum.com wrote: On Fri, Apr 17, 2009 at 08:16:40PM +0100, Steve Simon wrote: I cannot find the reference (sorry), but I read an interview with Ken (Thompson) a while ago. He was asked what he would change if he were working on plan9 now, and his reply was something like I would add support for cloud computing. I admit I am not clear exactly what he meant by this. My interpretation of cloud computing is precisely the split done by plan9 with terminal/CPU/FileServer: a UI running on this Terminal, with actual computing done somewhere, on data stored somewhere. The problem is that the CPU and Fileservers can't be assumed to be static. Things can and will go down, move about, and become temporarily unusable over time. Perhaps tools for migrating tasks or managing the thing. But I have the impression that the Plan 9 framework is the best for such a scheme.
Re: [9fans] Plan9 - the next 20 years
On Fri, Apr 17, 2009 at 2:43 PM, tlaro...@polynum.com wrote: On Fri, Apr 17, 2009 at 08:16:40PM +0100, Steve Simon wrote: I cannot find the reference (sorry), but I read an interview with Ken (Thompson) a while ago. My interpretation of cloud computing is precisely the split done by plan9 with terminal/CPU/FileServer: a UI running on this Terminal, with actual computing done somewhere, on data stored somewhere. That misses the dynamic nature which clouds could enable -- something we lack as well with our hardcoded /lib/ndb files -- there are no provisions for cluster resources coming and going (or failing) and no control facilities given for provisioning (or deprovisioning) those resources in a dynamic fashion. Lucho's kvmfs (and to a certain extent xcpu) seem like steps in the right direction -- but IMHO more fundamental changes need to occur in the way we think about things. I believe the file system interfaces While not focused on cloud computing in particular, the work we are doing under HARE aims to explore these directions further (both in the context of Plan 9/Inferno as well as broader themes involving other platforms). For hints/ideas/whatnot you can check the current pubs (more coming soon): http://www.research.ibm.com/hare -eric
Re: [9fans] Plan9 - the next 20 years
Steve Simon wrote: I cannot find the reference (sorry), but I read an interview with Ken (Thompson) a while ago. He was asked what he would change if he where working on plan9 now, and his reply was somthing like I would add support for cloud computing. Perhaps you were thinking of his Ask a Google engineer answers at http://moderator.appspot.com/#15/e=c9t=2d, specifically the question If you could redesign Plan 9 now (and expect similar uptake to UNIX), what would you do differently?
Re: [9fans] Plan9 - the next 20 years
Speaking of NUMA and such though, is there even any support for it in the kernel? I know we have a 10gb Ethernet driver, but what about cluster interconnects such as InfiniBand, Quadrics, or Myrinet? Are such things even desired in Plan 9? I'm glad to see process migration has been mentioned.
Re: [9fans] Plan9 - the next 20 years
On Fri, Apr 17, 2009 at 4:14 PM, Eric Van Hensbergen eri...@gmail.com wrote: On Fri, Apr 17, 2009 at 2:43 PM, tlaro...@polynum.com wrote: On Fri, Apr 17, 2009 at 08:16:40PM +0100, Steve Simon wrote: I cannot find the reference (sorry), but I read an interview with Ken (Thompson) a while ago. My interpretation of cloud computing is precisely the split done by plan9 with terminal/CPU/FileServer: a UI running on this Terminal, with actual computing done somewhere, on data stored somewhere. That misses the dynamic nature which clouds could enable -- something we lack as well with our hardcoded /lib/ndb files -- there are no provisions for cluster resources coming and going (or failing) and no control facilities given for provisioning (or deprovisioning) those resources in a dynamic fashion. Lucho's kvmfs (and to a certain extent xcpu) seem like steps in the right direction -- but IMHO more fundamental changes need to occur in the way we think about things. I believe the file system interfaces While not focused on cloud computing in particular, the work we are doing under HARE aims to explore these directions further (both in the context of Plan 9/Inferno as well as broader themes involving other platforms). Vidi also seems to be an attempt to make Venti work in such a dynamic environment. IMHO, the assumption that computers are always connected to the network was a fundamental mistake in Plan 9. For hints/ideas/whatnot you can check the current pubs (more coming soon): http://www.research.ibm.com/hare -eric
Re: [9fans] Plan9 - the next 20 years
Well, in the octopus you have a fixed part, the pc, but all other machines come and go. The feeling is very much that your stuff is in the cloud. I mean, not everything has to be dynamic. On 17/04/2009, at 22:17, eri...@gmail.com wrote: On Fri, Apr 17, 2009 at 2:43 PM, tlaro...@polynum.com wrote: On Fri, Apr 17, 2009 at 08:16:40PM +0100, Steve Simon wrote: I cannot find the reference (sorry), but I read an interview with Ken (Thompson) a while ago. My interpretation of cloud computing is precisely the split done by plan9 with terminal/CPU/FileServer: a UI running on this Terminal, with actual computing done somewhere, on data stored somewhere. That misses the dynamic nature which clouds could enable -- something we lack as well with our hardcoded /lib/ndb files -- there are no provisions for cluster resources coming and going (or failing) and no control facilities given for provisioning (or deprovisioning) those resources in a dynamic fashion. Lucho's kvmfs (and to a certain extent xcpu) seem like steps in the right direction -- but IMHO more fundamental changes need to occur in the way we think about things. I believe the file system interfaces While not focused on cloud computing in particular, the work we are doing under HARE aims to explore these directions further (both in the context of Plan 9/Inferno as well as broader themes involving other platforms). For hints/ideas/whatnot you can check the current pubs (more coming soon): http://www.research.ibm.com/hare -eric
Re: [9fans] Plan9 - the next 20 years
if you want to look at checkpointing, it's worth going back to look at Condor, because they made it really work. There are a few interesting issues that you need to get right. You can't make it 50% of the way there; that's not useful. You have to hit all the bits -- open /tmp files, sockets, all of it. It's easy to get about 90% of it but the last bits are a real headache. Nothing that's come along since has really done the job (although various efforts claim to, you have to read the fine print). ron
Re: [9fans] Plan9 - the next 20 years
On Fri, Apr 17, 2009 at 6:15 PM, ron minnich rminn...@gmail.com wrote: if you want to look at checkpointing, it's worth going back to look at Condor, because they made it really work. There are a few interesting issues that you need to get right. You can't make it 50% of the way there; that's not useful. You have to hit all the bits -- open /tmp files, sockets, all of it. It's easy to get about 90% of it but the last bits are a real headache. Nothing that's come along since has really done the job (although various efforts claim to, you have to read the fine print). ron Amen. Linux is currently having a seriously hard time getting C/R working properly, just because of the issues you mention. The second you mix in non-local resources, things get pear-shaped. Unfortunately, even if it does work, it will probably not have the kind of nice Plan 9-ish semantics I can envision it having.
Re: [9fans] Plan9 - the next 20 years
On Fri, Apr 17, 2009 at 3:35 PM, J.R. Mauro jrm8...@gmail.com wrote: Amen. Linux is currently having a seriously hard time getting C/R working properly, just because of the issues you mention. The second you mix in non-local resources, things get pear-shaped. it's not just non-local. It's local too. you are on a node. you open /etc/hosts. You C/R to another node with /etc/hosts open. What's that mean? You are on a node. you open a file in a ramdisk. Other programs have it open too. You are watching each other's writes. You C/R to another node with the file open. What's that mean? You are on a node. You have a pipe to a process on that node. You C/R to another node. Are you still talking at the end? And on and on. It's quite easy to get this stuff wrong. But true C/R requires that you get it right. The only system that would get this stuff mostly right that I ever used was Condor. (and, well the Apollo I think got it too, but that was a ways back). ron
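Ron's examples are exactly the cases where the obvious recipe for checkpointing an open file, record (path, offset) and reopen-and-seek on restart, falls apart. A sketch of that recipe with the failure mode made explicit (`checkpoint_fd` and `restore_fd` are invented names; real C/R systems must also handle the shared-writer and cross-node cases he lists, which this sketch does not):

```python
import os

# Naive file checkpointing: save (path, offset), reopen and seek on
# restart. The guard makes the limitation explicit: a pipe (or a
# socket) has no path to reopen, so this recipe cannot describe it
# at all -- and for a shared ramdisk file, reopening says nothing
# about the peers still writing to it on the old node.

def checkpoint_fd(f):
    path = getattr(f, "name", None)
    if not isinstance(path, str) or not os.path.exists(path):
        raise ValueError("no stable path: cannot checkpoint this fd")
    return {"path": path, "offset": f.tell()}

def restore_fd(rec):
    f = open(rec["path"], "rb")
    f.seek(rec["offset"])
    return f
```

This covers the easy 90%; the pipes, sockets, and shared files that raise here are the last bits that Ron says are the real headache.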
Re: [9fans] Plan9 - the next 20 years
On Fri, Apr 17, 2009 at 7:01 PM, ron minnich rminn...@gmail.com wrote: On Fri, Apr 17, 2009 at 3:35 PM, J.R. Mauro jrm8...@gmail.com wrote: Amen. Linux is currently having a seriously hard time getting C/R working properly, just because of the issues you mention. The second you mix in non-local resources, things get pear-shaped. it's not just non-local. It's local too. you are on a node. you open /etc/hosts. You C/R to another node with /etc/hosts open. What's that mean? You are on a node. you open a file in a ramdisk. Other programs have it open too. You are watching each other's writes. You C/R to another node with the file open. What's that mean? You are on a node. You have a pipe to a process on that node. You C/R to another node. Are you still talking at the end? And on and on. It's quite easy to get this stuff wrong. But true C/R requires that you get it right. The only system that would get this stuff mostly right that I ever used was Condor. (and, well the Apollo I think got it too, but that was a ways back). ron Yeah, the problem's bigger than I thought (not surprising since I didn't think much about it). I'm having a hard time figuring out how Condor handles these issues. All I can see from the documentation is that it gives you warnings. I can imagine a lot of problems stemming from open files could be resolved by first attempting to import the process's namespace at the time of checkpoint and, upon that failing, using cached copies of the file made at the time of checkpoint, which could be merged later. But this still has the 90% problem you mentioned.
Re: [9fans] Plan9 - the next 20 years
On Fri, Apr 17, 2009 at 7:06 PM, J.R. Mauro jrm8...@gmail.com wrote: Yeah, the problem's bigger than I thought (not surprising since I didn't think much about it). I'm having a hard time figuring out how Condor handles these issues. All I can see from the documentation is that it gives you warnings. the original condor just forwarded system calls back to the node it was started from. Thus all system calls were done in the context of the originating node and user. But this still has the 90% problem you mentioned. it's just plain harder than it looks ... ron
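That forwarding scheme can be sketched as a shim pair: the migrated process sees a file-like object, but every operation is shipped back to the home node and executed there, in the originating context. The wire protocol and class names below are invented for illustration; they are not Condor's actual remote system call library:

```python
import json
import socket
import threading

# Condor-style remote system calls, sketched: the execute node never
# touches its local filesystem; each call is forwarded to a shim on
# the home node and performed there.

def home_shim(conn):
    """Run on the home node: execute forwarded calls on real files."""
    files = {}
    rfile = conn.makefile("r")
    while True:
        line = rfile.readline()
        if not line:                      # peer hung up
            break
        req = json.loads(line)
        if req["op"] == "open":
            files[req["id"]] = open(req["path"], "r")
            reply = {"data": None}
        elif req["op"] == "read":
            reply = {"data": files[req["id"]].read(req["n"])}
        else:                             # "close"
            files.pop(req["id"]).close()
            reply = {"data": None}
        conn.sendall((json.dumps(reply) + "\n").encode())

class RemoteFile:
    """Run on the execute node: looks like a file, forwards everything."""
    def __init__(self, sock, path, fid=1):
        self.sock = sock
        self.rfile = sock.makefile("r")
        self.fid = fid
        self._call({"op": "open", "id": fid, "path": path})

    def _call(self, req):
        self.sock.sendall((json.dumps(req) + "\n").encode())
        return json.loads(self.rfile.readline())["data"]

    def read(self, n):
        return self._call({"op": "read", "id": self.fid, "n": n})

    def close(self):
        return self._call({"op": "close", "id": self.fid})
```

Wired together with socket.socketpair() and a thread for the shim, a RemoteFile reads the home node's files transparently. The hard last 10% is everything this ignores: restoring positions across restart, signals, mmap, and the rest of the fine print.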
Re: [9fans] Plan9 - the next 20 years
On Fri, Apr 17, 2009 at 10:39 PM, ron minnich rminn...@gmail.com wrote: On Fri, Apr 17, 2009 at 7:06 PM, J.R. Mauro jrm8...@gmail.com wrote: Yeah, the problem's bigger than I thought (not surprising since I didn't think much about it). I'm having a hard time figuring out how Condor handles these issues. All I can see from the documentation is that it gives you warnings. the original condor just forwarded system calls back to the node it was started from. Thus all system calls were done in the context of the originating node and user. Best effort is a good place to start. But this still has the 90% problem you mentioned. it's just plain harder than it looks ... Yeah. Every time I think of a way to address the corner cases, new ones crop up.
Re: [9fans] Plan9 - the next 20 years
I can imagine a lot of problems stemming from open files could be resolved by first attempting to import the process's namespace at the time of checkpoint and, upon that failing, using cached copies of the file made at the time of checkpoint, which could be merged later. there's no guarantee to a process running in a conventional environment that files won't change underfoot. why would condor extend a new guarantee? maybe i'm suffering from lack of vision, but i would think that to get to 100% one would need to think in terms of transactions and have a fully transactional operating system. - erik
Re: [9fans] Plan9 - the next 20 years
Vidi also seems to be an attempt to make Venti work in such a dynamic environment. IMHO, the assumption that computers are always connected to the network was a fundamental mistake in Plan 9 on the other hand, without this assumption, we would not have 9p. it was a real innovation to dispense with underpowered workstations with full administrative burdens. i think it is anachronistic to consider the type of mobile devices we have today. in 1990 i knew exactly 0 people with a cell phone. i had a toshiba orange screen laptop from work, but in those days a 9600 baud vt100 was still a step up. ah, the good old days. none of this is to detract from the obviously good idea of being able to carry around a working set and sync up with the main server later without some revision control junk. in fact, i was excited to learn about fossil — i was under the impression from reading the paper that that's how it worked. speaking of vidi, do the vidi authors have an update on their work? i'd really like to hear how it is working out. - erik
Re: [9fans] Plan9 - the next 20 years
On Fri, Apr 17, 2009 at 11:37 PM, erik quanstrom quans...@quanstro.net wrote: I can imagine a lot of problems stemming from open files could be resolved by first attempting to import the process's namespace at the time of checkpoint and, upon that failing, using cached copies of the file made at the time of checkpoint, which could be merged later. there's no guarantee to a process running in a conventional environment that files won't change underfoot. why would condor extend a new guarantee? maybe i'm suffering from lack of vision, but i would think that to get to 100% one would need to think in terms of transactions and have a fully transactional operating system. - erik There's a much lower chance of files changing out from under you in a conventional environment. If the goal is to make the unconventional environment look and act like the conventional one, it will probably have to try to do some of these things to be useful.
Re: [9fans] Plan9 - the next 20 years
On Fri, Apr 17, 2009 at 11:56 PM, erik quanstrom quans...@quanstro.net wrote: Vidi also seems to be an attempt to make Venti work in such a dynamic environment. IMHO, the assumption that computers are always connected to the network was a fundamental mistake in Plan 9 on the other hand, without this assumption, we would not have 9p. it was a real innovation to dispense with underpowered workstations with full administrative burdens. i think it is anachronistic to consider the type of mobile devices we have today. in 1990 i knew exactly 0 people with a cell phone. i had a toshiba orange screen laptop from work, but in those days a 9600 baud vt100 was still a step up. ah, the good old days. Of course it's easy to blame people for lack of vision 25 years later, but with the rate at which computing moves in general, cell phones as powerful as workstations should have been seen to be on their way within the authors' lifetimes. That said, Plan 9 was designed to furnish the needs of an environment that might not ever have had iPhones and eeePCs attached to it even if such things existed at the time it was made. But I'll say that if anyone tries to solve these problems today, they should not fall into the same trap, and look to the future. I hope they'll consider how well their solution scales to computers so small they're running through someone's bloodstream and so far away that communication in one direction will take several light-minutes and be subject to massive delay and loss. It's not that ridiculous... teams are testing DTN, which hopes to spread the internet to outer space, not only across this solar system, but also to nearby stars. Now there's thinking forward! none of this is to detract from the obviously good idea of being able to carry around a working set and sync up with the main server later without some revision control junk. in fact, i was excited to learn about fossil — i was under the impression from reading the paper that that's how it worked.
speaking of vidi, do the vidi authors have an update on their work? i'd really like to hear how it is working out. - erik
Re: [9fans] Plan9 - the next 20 years
But I'll say that if anyone tries to solve these problems today, they should not fall into the same trap, [...] yes. forward thinking was just the thing that made multics what it is today. it is equally a trap to try to prognosticate too far in advance. one increases the likelihood of failure and the chances of being dead wrong. - erik
Re: [9fans] Plan9 - the next 20 years
On Fri, Apr 17, 2009 at 11:37 PM, erik quanstrom quans...@quanstro.net wrote: I can imagine a lot of problems stemming from open files could be resolved by first attempting to import the process's namespace at the time of checkpoint and, upon that failing, using cached copies of the file made at the time of checkpoint, which could be merged later. there's no guarantee to a process running in a conventional environment that files won't change underfoot. why would condor extend a new guarantee? maybe i'm suffering from lack of vision, but i would think that to get to 100% one would need to think in terms of transactions and have a fully transactional operating system. - erik There's a much lower chance of files changing out from under you in a conventional environment. If the goal is to make the unconventional environment look and act like the conventional one, it will probably have to try to do some of these things to be useful. * you can get the same effect by increasing the scale of your system. * the reason conventional systems work is not, in my opinion, because the collision window is small, but because one typically doesn't do conflicting edits to the same file. * saying that something isn't likely in an unquantifiable way is not a recipe for success in computer science, in my experience. - erik
Re: [9fans] Plan9 - the next 20 years
Speaking of NUMA and such though, is there even any support for it in the kernel? I know we have a 10gb Ethernet driver, but what about cluster interconnects such as InfiniBand, Quadrics, or Myrinet? Are such things even desired in Plan 9? there is no explicit numa support in the pc kernel. however it runs just fine on standard x86-64 numa architectures like intel nehalem and amd opteron. we have two 10gbe ethernet drivers: the myricom driver and the intel 82598 driver. the blue gene folks have support for a number of blue-gene-specific networks. i don't know too much about myrinet, infiniband or quadrics. i have nothing against any of them, but 10gbe has been a much better fit for the things i've wanted to do. - erik
Re: [9fans] Plan9 - the next 20 years
On Sat, Apr 18, 2009 at 12:16 AM, erik quanstrom quans...@quanstro.net wrote: But I'll say that if anyone tries to solve these problems today, they should not fall into the same trap, [...] yes. forward thinking was just the thing that made multics what it is today. it is equally a trap to try to prognosticate too far in advance. one increases the likelihood of failure and the chances of being dead wrong. - erik I don't think what I outlined is too far ahead, and the issues presented are all doable as long as a small bit of extra consideration is made. Keeping your eye only on the here and now was just the thing that gave Unix a bunch of tumorous growths like sockets and X11, and made Windows the wonderful piece of hackery it is. I'm not suggesting we consider how to solve the problems we'll face when we're flying through space and time in the TARDIS and shrinking ourselves and our bioships down to molecular sizes to cure someone's brain cancer. I'm talking about making something scale across distances and magnitudes that we will become accustomed to in the next five decades.