[ Possibly overlengthy reply. However, given the multiple sets of cans of
worms... ]

On Friday 07 October 2005 07:25, Josiah Carlson wrote:
> One thing I notice is absent from the Kamaelia page is benchmarks.
That's largely for one simple reason: we haven't done any yet. At least
not anything I'd call a benchmark. "There's lies, damn lies, statistics
and then there's benchmarks."

//Theoretically// I suspect that the system /could/ perform as well as
traditional approaches to dealing with concurrent problems single
threaded (and multi-thread/process). This is based on the recognition of
two things:

   * Event systems (often implementing state machine type behaviour,
     though not always) often have intermediate buffers between states &
     operations. Some systems divide a problem into multiple reactors
     and stages and have communication between them, though this can
     sometimes be hidden. All we've done is make this much more
     explicit.

   * Event systems (and state machine based approaches) can often be
     used to effectively say "I want to stop and wait here, come back to
     me later" or simply "I'm doing something processor intensive, but
     I'm being nice and letting something else have a go". The use of
     generators here simply makes that particular behaviour more
     explicit. This is a nice bonus of python.

[ Neither is a negative really, just different. The first bullet has
implicit buffers in the system, the latter has a more implicit state
machine in the system. ICBVW here of course. ]

However, COULD is not IS, and whilst I say "in theory", I am painfully
aware that theory and practice often have a big gulf between them. Also,
I'm certain that at present our performance is nowhere near optimal.
We've focussed on trying to find what works from a few perspectives
rather than on performance (one possible definition of correctness here,
but certainly not the only one). Along the way we've made compromises in
favour of clarity as to what's going on, rather than performance.

For example, one area we know we can optimise is the handling of message
delivery. The mini-axon tutorial represents delivery between active
components as being performed by an independent party - a postman. This
is precisely what happens in the current system. That can be optimised,
for example, by collapsing outboxes into inboxes (ie removing one of the
lists when a linkage is made and changing the reference), at which point
you have a single intermediate buffer (much like an event/state system
communicating between subsystems). We haven't done this yet. Whilst it
would partly simplify things, it makes other areas more complex, and
seems like premature optimisation. (A small sketch of what this collapse
looks like follows below.)

However, I have performed an //informal comparison// between the use of
a Kamaelia-type approach and a traditional approach not using any
framework at all, for implementing a trivial game. (Cats bouncing around
the screen scaling, rotating, etc, controlled by a user.) The reason I
say Kamaelia-type approach is because it was a mini-axon based
experiment using collapsed outboxes to inboxes (as above).

The measure I used was simply framerate. This is a fair real value and
has a real use - if it drops too low, the system is simply unusable. I
measured the framerate before transforming the simplistic game to work
well in the framework, and after transforming it. The differences were:

   * a 5% drop in performance/framerate
   * the ability to reuse much of the code in other systems and
     environments.

From that perspective it seems acceptable (for now). This *isn't*, as
you would probably say, a rigorous or trustable benchmark, but it was a
useful "smoke test" if you like of the approach.
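To make the postman and collapsed-linkage point concrete, here's a
minimal mini-axon flavoured sketch. (Illustrative only: the names and
API here are made up for this example; this is not the real Axon code.)
Components are generators that yield control, and link() collapses a
sender's outbox into the receiver's inbox, so both share one buffer and
no postman has to copy messages between them:

class Component(object):
    def __init__(self):
        self.boxes = {"inbox": [], "outbox": []}   # plain list buffers

    def send(self, msg, boxname="outbox"):
        self.boxes[boxname].append(msg)

    def recv(self, boxname="inbox"):
        return self.boxes[boxname].pop(0)

    def dataReady(self, boxname="inbox"):
        return len(self.boxes[boxname])

def link(source, sink):
    # The collapse: after linking, the source's outbox *is* the sink's
    # inbox - one shared list, so delivery requires no copying at all.
    source.boxes["outbox"] = sink.boxes["inbox"]

class Producer(Component):
    def main(self):
        for i in range(5):
            self.send(i)   # do a little work...
            yield 1        # ...then let something else have a go

class Printer(Component):
    def main(self):
        while True:
            while self.dataReady("inbox"):
                print("received:", self.recv("inbox"))
            yield 1        # nothing to do; come back to me later

# A deliberately dumb round robin scheduler, run for a fixed number of
# time slices for the purposes of the demo.
producer, printer = Producer(), Printer()
link(producer, printer)
runnables = [producer.main(), printer.main()]
for _ in range(10):
    for gen in runnables[:]:
        try:
            next(gen)
        except StopIteration:
            runnables.remove(gen)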
From a *pragmatic* perspective, currently the system is fast enough for
simple games (say one hundred, two hundred, maybe more sprites active at
once), for interactive applications, video players, realtime audio
mixing and a variety of other things, so currently we're leaving that
aside.

Also, from an even more pragmatic perspective, if you're after
performance and throughput then I'd say use Twisted, since it's a proven
technology. **If** our stuff turns out to be useful, we'd like to find a
way of making our stuff available inside Twisted - if they'd like it (*)
- since we're not the least bit interested in competing with anyone :-)
So far *we're* finding it useful, which is all I'd personally claim, and
I hope that it's useful to others.

(*) The all too brief conversation I had with Tommi Virtanen at
Europython suggested that he at least thought the pipeline/graphline
idea was worth taking - so I'd like to do that at some point, even if it
sidelines our work to date.

Once we've validated the model though (which I expect to take some time
- you only learn if it's validated by building things IMO), then we'll
look at optimisation. (If the model is validated :-)

All that said, I'm open to suggestions as to what sort of benchmark
you'd like to see. I'm more interested in benchmarks that actually mean
something, rather than saying X is better than Y, though.

Summarising then: no benchmarks, yet. If you're after speed, I'm certain
you can find that elsewhere. If you're after an easy way of dealing with
a concurrent problem, that's where we're starting from, and then
optimising. We're very open to suggestions for improvement though, both
on usability/learnability and on keeping doors open/opening doors to
performance. I'd hate to have to rewrite everything in another language
later simply due to poor design decisions.

[ Network controlled Networked Audio Mixing Matrix ]

> Very neat. How much data? What kind of throughput? What kinds of
> latencies?

For the test system we tested with 3 raw PCM audio data streams. That's
3 x 44.1kHz, 16 bit stereo - each stream is 44100 samples/s x 16 bits x
2 channels, ie ~1.4Mbit/s - which is around 4.2Mbit/s of data from the
network being processed realtime and output back to the network (as a
single mixed stream) at 1.4Mbit/s. So, not huge numbers, but not
insignificant amounts of data either.

I suppose one thing I can take more time with now is to look at the
specific latency of the mixer. It didn't *appear* to be large, however.
(There appeared to be similar latency in the system with or without the
mixer.)

[[ The aim of the rapid prototyping session was to see what could be
done rather than to measure the results. The total time taken for coding
the mixing matrix was 2.5 days. About 1/2 a day was spent on finding an
issue we had with network resends regarding non-blocking sockets, and a
day with me totally misunderstanding how mixing raw audio byte streams
works (a sketch of the core operation follows below). The backplane was
written during that 3 day time period. The control protocol for
switching on/off mixes and querying the system, though, was ~1.5 hours
from start to finish, including testing.

To experiment with what dataflow architecture might work, I knocked up a
command line controlled dynamic graph viewer (add nodes, link nodes,
delete nodes) in about 5 minutes and then experimented with what the
system would look like if done naively. The backplane idea became clear
as useful here because we wanted to allow multiple mixers. ]]
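For anyone wondering what mixing raw audio byte streams boils down to,
here's a rough sketch of the core operation using the standard library's
audioop module. (This illustrates the principle; it is not our actual
mixer code.) Mixing is essentially summing corresponding 16 bit samples
from each stream, clamping any samples that overflow:

import audioop

def mix_chunks(chunks):
    """Mix equal-length chunks of 16 bit signed PCM (byte strings)."""
    mixed = chunks[0]
    for chunk in chunks[1:]:
        # audioop.add sums two fragments sample by sample; the final
        # argument is the sample width in bytes (2 for 16 bit audio).
        # Samples that overflow are clamped rather than wrapped.
        mixed = audioop.add(mixed, chunk, 2)
    return mixed

# e.g. one output chunk mixed from three input streams' chunks:
#     out = mix_chunks([chunk_a, chunk_b, chunk_c])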
A more interesting effect we found was dealing with mouse movement in
pygame, where we found that *huge* numbers of messages being sent one at
a time and processed one at a time (with yields after each) became a
huge bottleneck. It made more sense to batch the events and pass them to
client surfaces (a rough sketch of the idea is in the P.S. below).

(If that makes no sense: we allow pygame components to act as if they
have control of the display by giving them a surface from a pygame
display service. This acts essentially as a simplistic window manager.
That means pygame events need to be passed through quickly and cleanly.)

The reason I like using pygame for these things is because a) it's
relatively raw and fast, b) games are another often /naturally/
concurrent system. Also it normally allows other senses beyond reading
numbers/graphs to kick in when evaluating changes: "that looks
better/worse", "there's something wrong there".

> I have two recent posts about the performance and features of a (hacked
> together) tuple space system

Great :-) I'll have a dig around.

> The only thing that it is missing is a prioritization mechanism (fifo,
> numeric priority, etc.), which would get us a job scheduling kernel. Not
> bad for a "message passing"/"tuple space"/"IPC" library.

Sounds interesting. I'll try and find some time to have a look and have
a play. FWIW, we're also missing a prioritisation mechanism right now.
Though currently I have Simon Wittber's latest release of Nanothreads on
my stack of things to look at. I do have a soft spot for Linda type
approaches though :-)

Best Regards,


Michael.
-- 
Michael Sparks, Senior R&D Engineer, Digital Media Group
[EMAIL PROTECTED], http://kamaelia.sourceforge.net/
British Broadcasting Corporation, Research and Development
Kingswood Warren, Surrey KT20 6NP

This e-mail may contain personal views which are not the views of the
BBC.
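P.S. In case the event batching idea above is hard to picture, here's a
rough sketch. (Hypothetical code: client.send() and the "events" inbox
name stand in for posting a message to a component, and are not the
actual display service API.)

import pygame

def deliver_events(clients):
    # Drain the entire pygame event queue in one call, then hand the
    # whole batch to each client surface as a single message, rather
    # than one message (and one yield) per mouse movement event.
    # (Assumes pygame.init() and a display have already been set up.)
    batch = pygame.event.get()
    if batch:
        for client in clients:
            client.send(batch, "events")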