Re: [fonc] Linus Chews Up Kernel Maintainer For Introducing Userspace Bug - Slashdot
On 12/31/2012 10:47 PM, Marcus G. Daniels wrote:

On 12/31/12 8:30 PM, Paul D. Fernhout wrote: So, I guess another meta-level bug in the Linux kernel is that it is written in C, which does not support certain complexity management features, and there is no clear upgrade path from that because C++ has always had serious linking problems.

But the ABIs aren't specified in terms of language interfaces; they are architecture-specific. POSIX kernel interfaces don't need C++ link-level compatibility, or even extern "C" compatibility interfaces. Similarly on the device side, that's packing command blocks and such, byte by byte. Until a few years ago, GCC was the only compiler ever used (or able) to compile the Linux kernel. It is a feature that it all can be compiled with one open source toolchain. Every aspect can be improved.

granted. typically, the actual call into kernel-land is a target-specific glob of ASM code, which may then be wrapped up to make all the various system calls.

as for ABIs, a few things could help:
- if the C++ ABI was defined *along with* the C ABI for a given target;
- if the C++ compilers would use said ABI, rather than each rolling their own;
- if the ABI were sufficiently general to be more useful to multiple languages (besides just C and C++);
- ...

in this case, the C ABI could be considered a formal subset of the C++ ABI.

admittedly, if I could have my say, I would make some changes to the way struct/class passing and returning is handled in SysV / AMD64. namely, make it less complicated/evil: say, the struct is either passed in a single register, or passed as a reference (no decomposition and passing via multiple registers). more so, probably also provide spill space for arguments passed in registers (more like in Win64).

granted, this itself may illustrate part of the problem: with many of these ABIs, not everyone is happy, so there is a lot of temptation for compiler vendors to go their own way (making mixing and matching code compiled by different compilers, or sometimes with different compiler options, unsafe...). it may usually work, but sometimes fail, due to minor ABI differences.

From that thread I read that those in the Linus camp are fine with abstraction, but it has to be their abstraction on their terms. And later in the thread, Theodore Ts'o gave an example of opacity in the programming model:

    a = b + "/share/" + c + serial_num;

arguing that you can have absolutely no idea how many memory allocations are done, due to type coercions and overloaded operators.

Well, I'd say just write the code in concise notation. If there are memory allocations, they'll show up in valgrind runs, for example. Then disassemble that function and understand what the memory allocations actually are. If there is a better way to do it, then either change abstractions, or improve the compiler to do it more efficiently. Yes, there can be an investment in a lot of stuff. But just defining any programming model with a non-obvious performance model as a bad programming model is shortsighted advice, especially for developers outside of the world of operating systems. That something is non-obvious is not necessarily a bad thing. It just means a bit more depth-first investigation. At least one can _learn_ something from the diversion.

yep. some of this is also a bit of a problem for many VM-based languages, which may, behind the scenes, chew through memory, while giving little control over any of this to the programmer.
in my case, I have been left fighting performance in many areas with my own language, admittedly because its present core VM design isn't particularly high-performance in some areas.

though, one can still be left looking at a sort of ugly wall: the wall separating static and dynamic types. dynamic typing is a land of relative ease, but not particularly good performance. static typing is a land of pain and implementation complexity, but also better performance.

well, there is also the fixnum issue, where a fixnum may be just slightly smaller than the analogous native type (it is the curse of the 28-30 bit fixnum, or the 60-62 bit long-fixnum...). this issue is annoying specifically because it gets in the way of having an efficient fixnum type that also maps to a sensible native type (like int) while keeping the usual definition intact that int is exactly 32 bits and/or that long is exactly 64 bits.

but, as a recent attempt at trying to switch to untagged value types revealed, even with an interpreter core that is mostly statically typed, making this switch may still open a big can of worms in some other cases (because there are still holes in the static type system). I have been left considering the possibility of instead making a compromise: int, float, and double can be represented directly; long, however, would (still) be handled as a boxed value. this
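For concreteness, here is a minimal sketch of the kind of 2-bit low-tag fixnum scheme being described (illustrative C++, not the actual VM code; the tag layout and values are assumptions):

    #include <cstdint>
    #include <cassert>

    // Hypothetical 2-bit low tag: 00 = fixnum, other values = pointers/special.
    // A fixnum then carries only 62 bits of signed payload, not the full 64 --
    // the "curse of the 60-62 bit long-fixnum" described above.
    constexpr uint64_t TAG_MASK   = 0x3;
    constexpr uint64_t TAG_FIXNUM = 0x0;

    constexpr int64_t FIXNUM_MAX = (INT64_C(1) << 61) - 1;
    constexpr int64_t FIXNUM_MIN = -(INT64_C(1) << 61);

    inline bool is_fixnum(uint64_t ref) { return (ref & TAG_MASK) == TAG_FIXNUM; }

    inline uint64_t make_fixnum(int64_t v) {
        assert(v >= FIXNUM_MIN && v <= FIXNUM_MAX); // out-of-range longs need boxing
        return static_cast<uint64_t>(v) << 2;       // low tag bits are already 00
    }

    inline int64_t fixnum_value(uint64_t ref) {
        return static_cast<int64_t>(ref) >> 2;      // arithmetic shift restores the sign
    }

The overflow check in make_fixnum is exactly where the "slightly smaller than the native type" pain shows up: a full 64-bit long cannot round-trip through this representation.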
[fonc] Current topics
The most recent discussions get at a number of important issues whose pernicious snares need to be handled better.

In an analogy to sending messages most of the time successfully through noisy channels -- where the noise also affects whatever we add to the messages to help (and we may have imperfect models of the noise) -- we have to ask: what kinds and rates of error would be acceptable?

We humans are a noisy species, and on both ends of the transmissions. So a message that can be proved perfectly received as sent can still be interpreted poorly by a human directly, or by software written by humans. A wonderful specification language that produces runnable code good enough to make a prototype is still going to require debugging, because it is hard to get the spec-specs right (even with a machine version of human-level AI to help with larger goals comprehension).

As humans, we are used to being sloppy about message creation and sending, and rely on negotiation and good will after the fact to deal with errors. We've not done a good job of dealing with these tendencies within programming -- we are still sloppy, and we tend not to create negotiation processes to deal with various kinds of errors. However, we do see something that is actual engineering -- with both care in message sending *and* negotiation -- where eventual failure is not tolerated: mostly in hardware, and in a few vital low-level systems which have to scale essentially error-free, such as the Ethernet and Internet.

My prejudices have always liked dynamic approaches to problems, with error detection and improvements (if possible). Dan Ingalls was (and is) a master at getting a whole system going in such a way that it has enough integrity to exhibit its failures and allow many of them to be addressed in the context of what is actually going on, even with very low-level failures.

It is interesting to note the contributions from what you can say statically (the higher the level of the language the better) -- what can be done with meta (the more dynamic and deep the integrity, the more powerful and safe meta becomes) -- and the tradeoffs of modularization (hard to sum up, but as humans we don't give all modules the same care and love when designing and building them). Mix in real human beings and a world-wide system, and what should be done? (I don't know; this is a question to the group.)

There are two systems I look at all the time. The first is lawyers contrasted with engineers. The second is human systems contrasted with biological systems.

There are about 1.2 million lawyers in the US, and about 1.5 million engineers (some of them in computing). The current estimates of programmers in the US are about 1.3 million (US Dept of Labor, counting programmers and developers). Also, the Internet and multinational corporations, etc., internationalize the impact of programming, so we need an estimate of the programmers world-wide -- probably another million or two? Add in the ad hoc programmers, etc.? The populations are similar enough in size to make the contrasts in methods and results quite striking. Looking for analogies, to my eye what is happening with programming is more similar to what has happened with law than with classical engineering.
Everyone will have an opinion on this, but I think it is partly because nature is a tougher critic on human-built structures than humans are on each other's opinions, and part of the impact of this is amplified by the simpler, shorter-term liabilities of imperfect structures on human safety than of imperfect laws (one could argue that the latter are much more of a disaster in the long run).

And, in trying to tease useful analogies from Biology, one I get is that the largest gap in complexity of atomic structures is the one from polymers to the simplest living cells. (One of my two favorite organisms is Pelagibacter ubique, which is the smallest non-parasitic standalone organism. Discovered just 10 years ago, it is the most numerous known bacterium in the world, and accounts for 25% of all of the plankton in the oceans. Still, it has about 1300+ genes, etc.)

What's interesting (to me) about cell biology is just how much stuff is organized to maintain the integrity of life. Craig Venter thinks that a minimal hand-crafted genome for a cell would still require about 300 genes (and the tiniest whole organism still winds up with a lot of components).

Analogies should be suspect -- both the one to the law, and the one here, should be scrutinized -- but this one harmonizes with one of Butler Lampson's conclusions/prejudices: that you are much better off making -- with great care -- a few kinds of relatively big modules as basic building blocks than to have zillions of different modules being constructed by vanilla programmers. One of my favorite examples of this was the Beings master's thesis by Doug Lenat at Stanford in the 70s. And this
Re: [fonc] Incentives and Metrics for Infrastructure vs. Functionality
On Mon, Dec 31, 2012 at 04:36:09PM -0700, Marcus G. Daniels wrote: On 12/31/12 2:58 PM, Paul D. Fernhout wrote: 2. The programmer has a belief or preference that the code is easier to work with if it isn't abstracted. […]

I have evidence for this poisonous belief. Here is some production C++ code I saw:

    if (condition1) {
        if (condition2) {
            // some code
        }
    }

instead of

    if (condition1 && condition2) {
        // some code
    }

- void latin1_to_utf8(std::string &s);

instead of

    std::string utf8_of_latin1(std::string s)

or

    std::string utf8_of_latin1(const std::string &s)

- (this one is more controversial)

    Foo foo;
    if (condition) foo = bar;
    else foo = baz;

instead of

    Foo foo = condition ? bar : baz;

I think the root cause of those three examples can be called step-by-step thinking. Some people just can't deal with abstractions at all, not even functions. They can only make procedures, which do their thing step by step, and rely on global state. (Yes, global state, though they do have the courtesy to fool themselves by putting it in a long-lived object instead of the toplevel.) The result is effectively a monster of mostly linear code, which is cut at obvious boundaries whenever `main()` becomes too long (too long generally being a couple hundred lines). Each line of such code _is_ highly legible, I'll give them that. The whole, however, would frighten even Cthulhu.

Loup.
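For concreteness, a self-contained sketch of the value-returning conversion Loup prefers (the original post gives only the signature; the body and test values here are illustrative):

    #include <iostream>
    #include <string>

    // Value-returning style: no in-place mutation, no global state.
    std::string utf8_of_latin1(const std::string &s) {
        std::string out;
        out.reserve(s.size());
        for (unsigned char c : s) {
            if (c < 0x80) {
                out += static_cast<char>(c);
            } else {                    // Latin-1 bytes 128..255 -> 2-byte UTF-8
                out += static_cast<char>(0xC0 | (c >> 6));
                out += static_cast<char>(0x80 | (c & 0x3F));
            }
        }
        return out;
    }

    int main() {
        std::string latin1 = "caf\xE9";       // "cafe" with e-acute, in Latin-1
        std::cout << utf8_of_latin1(latin1) << "\n"; // UTF-8 on a UTF-8 terminal
    }

Note that the caller can still convert in place with `s = utf8_of_latin1(s);`, so nothing is lost relative to the mutating signature.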
Re: [fonc] Incentives and Metrics for Infrastructure vs. Functionality
On 1/1/2013 2:12 PM, Loup Vaillant-David wrote: On Mon, Dec 31, 2012 at 04:36:09PM -0700, Marcus G. Daniels wrote: On 12/31/12 2:58 PM, Paul D. Fernhout wrote: 2. The programmer has a belief or preference that the code is easier to work with if it isn't abstracted. […] I have evidence for this poisonous belief. Here is some production C++ code I saw: [code snips] I think the root cause of those three examples can be called step-by-step thinking. […] The whole, however, would frighten even Cthulhu.

part of the issue may be a tradeoff: does the programmer think in terms of abstractions and high-level overviews? or, does the programmer mostly think in terms of step-by-step operations and make use of their ability to keep large chunks of information in memory? it is a question, maybe, of whether the programmer sees the forest or the trees.

these sorts of things may well have an impact on the types of code a person writes, and what sorts of things the programmer finds more readable. like, for a person who can mentally more easily deal with step-by-step thinking, but can keep much of the code in their mind at once, and quickly walk around and explore the various possibilities and scenarios, this kind of bulky low-abstraction code may be preferable, since when they walk the graph in their mind, they don't really have to stop and think too much about what sorts of items they encounter along the way. in their mind's eye, it may well look like a debugger stepping at a rate of roughly 5-10 statements per second or so. they may or may not be fully aware of how their mind does it, but they can vaguely see the traces along the call-stack, ghosts of intermediate values, and the sudden jump of attention to somewhere where a crash has occurred or an exception has been thrown.

actually, I had before compared it to ants: it is like one's mind has ants in it, which walk along trails, either stepping code, or trying out various possibilities, ... once something interesting comes up, it starts attracting more of these mental ants, until it has a whole swarm, and then a clearer image of the scenario or idea may emerge in one's mind. but, abstractions and difficult concepts are like oil to these ants: if ants encounter something they don't like (like oil), they will back up and try to walk around it (and individual ants aren't particularly smart).

and, probably, other people use other methods of reasoning about code...
Re: [fonc] Incentives and Metrics for Infrastructure vs. Functionality
On Tue, Jan 01, 2013 at 03:02:09PM -0600, BGB wrote: On 1/1/2013 2:12 PM, Loup Vaillant-David wrote: On Mon, Dec 31, 2012 at 04:36:09PM -0700, Marcus G. Daniels wrote: On 12/31/12 2:58 PM, Paul D. Fernhout wrote: 2. The programmer has a belief or preference that the code is easier to work with if it isn't abstracted. […] I have evidence for this poisonous belief. Here is some production C++ code I saw: [code snips] I think the root cause of those three examples can be called step-by-step thinking. […] part of the issue may be a tradeoff: does the programmer think in terms of abstractions and high-level overviews? or, does the programmer mostly think in terms of step-by-step operations and make use of their ability to keep large chunks of information in memory? it is a question, maybe, of whether the programmer sees the forest or the trees. these sorts of things may well have an impact on the types of code a person writes, and what sorts of things the programmer finds more readable.

Well, that could be tested. Let's write some code in a procedural way, and in a functional way. Show it to a bunch of programmers, and see if they understand it, spot the bugs, can extend it, etc. I'm not sure what to expect from such tests. One could think most people would deal more easily with the procedural program, but on the other hand, I expect the procedural version will be significantly more complex, especially if it abides by step-by-step aesthetics.

Loup.
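As a concrete seed for the kind of test proposed above, here is the same small task written both ways (names and task are illustrative, not from the thread):

    #include <cstddef>
    #include <numeric>
    #include <vector>

    // Procedural, step-by-step version: explicit index, mutable accumulator.
    int sum_of_squares_proc(const std::vector<int> &xs) {
        int total = 0;
        for (std::size_t i = 0; i < xs.size(); ++i) {
            int sq = xs[i] * xs[i];
            total += sq;
        }
        return total;
    }

    // Functional-style version: one expression, no visible loop state.
    int sum_of_squares_fn(const std::vector<int> &xs) {
        return std::accumulate(xs.begin(), xs.end(), 0,
                               [](int acc, int x) { return acc + x * x; });
    }

A real experiment would of course need larger, subtly buggy programs in both styles; this pair only illustrates the stylistic axis being measured.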
Re: [fonc] Incentives and Metrics for Infrastructure vs. Functionality
On Tue, Jan 01, 2013 at 09:12:07PM +0100, Loup Vaillant-David wrote: On Mon, Dec 31, 2012 at 04:36:09PM -0700, Marcus G. Daniels wrote: On 12/31/12 2:58 PM, Paul D. Fernhout wrote: 2. The programmer has a belief or preference that the code is easier to work with if it isn't abstracted. […]

This depends a lot on context. On one end you have a pile of copy-pasted Visual Basic code that could easily be refactored into a tenth of its size. On the opposite end of the spectrum you have a piece of Haskell code where everything is abstracted and each abstraction is wrong in some way or another. The main reason for the latter is functional fixedness: a Haskell programmer will see a structure as a monad but then does not see more appropriate abstractions. This is mainly problematic when there are portions of code that are very similar, but only by chance, and each requires different treatment. You merge them into one function, and after some time this function ends up with ten parameters.

I have evidence for this poisonous belief. Here is some production C++ code I saw:

    if (condition1) {
        if (condition2) {
            // some code
        }
    }

instead of

    if (condition1 && condition2) {
        // some code
    }

- void latin1_to_utf8(std::string &s);

Let me guess: they do it to save the cycles caused by allocating a new string.

instead of

    std::string utf8_of_latin1(std::string s)

or

    std::string utf8_of_latin1(const std::string &s)

- (this one is more controversial)

    Foo foo;
    if (condition) foo = bar;
    else foo = baz;

instead of

    Foo foo = condition ? bar : baz;

I think the root cause of those three examples can be called step-by-step thinking. Some people just can't deal with abstractions at all, not even functions. They can only make procedures, which do their thing step by step, and rely on global state. (Yes, global state, though they do have the courtesy to fool themselves by putting it in a long-lived object instead of the toplevel.) The result is effectively a monster of mostly linear code, which is cut at obvious boundaries whenever `main()` becomes too long (too long generally being a couple hundred lines). Each line of such code _is_ highly legible, I'll give them that. The whole, however, would frighten even Cthulhu.

Loup.

-- The electricity substation in the car park blew up.
[fonc] SubScript website gone live: programming with Process Algebra
Please allow me to blurb the following, which is related to several discussions at FONC: our web site http://subscript-lang.org went officially live last Saturday.

SubScript is a way to extend common programming languages, aimed to ease event handling and concurrency. Typical application areas are GUI controllers, text processing applications, and discrete event simulations. SubScript is based on a mathematical concurrency theory named Algebra of Communicating Processes (ACP). ACP is a 30-year-old branch of mathematics, as solid as numeric algebra and boolean algebra. In fact, you can regard ACP as an extension of boolean algebra with 'things that can happen'. These items are glued together with operations such as alternative, sequential, and parallel composition. This way ACP combines the essence of compiler-compilers and notions of parallelism.

Adding ACP to a common programming language yields a lightweight alternative to threading for concurrency. It also brings the 50-year-old but still magical expressiveness of languages for parser generators and compiler-compilers, so that SubScript suits language processing. The nondeterministic style combined with concurrency support happens to be very useful for programming GUI controllers. Surprisingly, ACP with a few extras even enables data-flow-style programming, like you have with pipes in the Unix shell language. For instance, to program a GUI controller for a simple search application takes about 15 lines of code in Java or Scala, if you do the threading well. In SubScript it is only 5 lines; see http://subscript-lang.org/examples/a-simple-gui-application/

At the moment SubScript is being implemented as an extension to the programming language Scala; other languages, such as C, C++, C#, Java and JavaScript, would be possible too. The current state of the implementation is mature enough for experimentation by language researchers, but not yet for real application development. If you have the Eclipse environment with the Scala plugin installed, it is easy to get SubScript running with the example applications from our Google Code project.

We hope this announcement will raise interest from programming language researchers, and that some developers will get aboard on the project. In the second half of February 2013 we will very probably give a presentation and a hands-on workshop at EPFL in Lausanne, the place where Scala is developed. We hope to have a SubScript compiler ready then, branched from the Scala compiler scalac. A more detailed announcement will follow by the end of January on our site.
[fonc] Wrapping object references in NaN IEEE floats for performance (was Re: Linus...)
On 1/1/13 3:43 AM, BGB wrote: here is mostly that this still allows for type-tags in the references, but would likely involve a partial switch to the use of 64-bit tagged references within some core parts of the VM (as a partial switch away from magic pointers). I am currently leaning towards putting the tag in the high-order bits (to help reduce 64-bit arithmetic ops on x86).

One idea I heard somewhere (probably on some Squeak-related list several years ago) is to have all objects stored as floating-point NaN instances (NaN == Not a Number). The biggest bottleneck in practice for many applications that need computer power these days (like graphical simulations) usually seems to be floating-point math, especially with arrays of floating-point numbers. Generally when you do most other things, you're already paying some other overhead somewhere. But multiplying arrays of floats efficiently is what makes or breaks many interesting applications. So, by wrapping all other objects as instances of floating-point numbers using the NaN approach, you are optimizing for the typically most CPU-intensive case of many user applications. Granted, there are going to be tradeoffs, like integer math and looping might then be a bit slower? Perhaps there is some research paper already out there about the tradeoffs for this sort of approach?

For more background, see: http://en.wikipedia.org/wiki/NaN

For example, a bit-wise example of an IEEE floating-point standard single-precision (32-bit) NaN would be:

    s111 1111 1axx xxxx xxxx xxxx xxxx xxxx

where s is the sign (most often ignored in applications), a determines the type of NaN, and x is an extra payload (most often ignored in applications). So, information about other types of objects would start in that extra payload part. There may be some inconsistency in how hardware interprets some of these bits, so you'd have to think about whether that could be worked around if you want to be platform-independent.

See also: http://en.wikipedia.org/wiki/IEEE_floating_point

You might want to just go with 64-bit floats, which would support wrapping 32-bit integers (including as pointers to an object table if you wanted, even up to probably around 52-bit integer pointers); see: IEEE 754 double-precision binary floating-point format: binary64 http://en.wikipedia.org/wiki/Binary64

does sometimes seem like I am going in circles at times though...

I know that feeling myself, as I've been working on semantic-related generally-triple-based stuff for going on 30 years, and I still feel like the basics could be improved. :-) Meanwhile I'm going to think about Alan Kay's latest comments...

--Paul Fernhout http://www.pdfernhout.net/ The biggest challenge of the 21st century is the irony of technologies of abundance in the hands of those thinking in terms of scarcity.
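A minimal sketch of this NaN-boxing idea for 64-bit doubles (assumptions: quiet-NaN encoding, 48-bit x86-64 pointers, and a made-up tag bit; this is not the Squeak or Lua implementation):

    #include <cstdint>
    #include <cstring>
    #include <cassert>

    // Any double with exponent bits all 1 and a nonzero mantissa is a NaN;
    // we hide pointers in that mantissa. QNAN_BASE sets the exponent plus the
    // quiet bit, leaving ~51 payload bits -- room for a 48-bit pointer + tag.
    constexpr uint64_t QNAN_BASE = 0x7FF8000000000000ULL;
    constexpr uint64_t TAG_PTR   = 0x0001000000000000ULL; // hypothetical tag, bit 48
    constexpr uint64_t PAYLOAD   = 0x0000FFFFFFFFFFFFULL; // low 48 bits

    inline uint64_t bits_of(double d)    { uint64_t b; std::memcpy(&b, &d, 8); return b; }
    inline double   double_of(uint64_t b){ double d; std::memcpy(&d, &b, 8); return d; }

    inline bool is_boxed(double d) {
        // Ordinary doubles, and even the default qNaN from real arithmetic
        // (0x7FF8...), do not carry our tag bit, so they are not "boxed".
        return (bits_of(d) & ~PAYLOAD) == (QNAN_BASE | TAG_PTR);
    }
    inline double box_ptr(void *p) {
        uint64_t u = reinterpret_cast<uintptr_t>(p);
        assert((u & ~PAYLOAD) == 0);         // requires <= 48-bit pointers
        return double_of(QNAN_BASE | TAG_PTR | u);
    }
    inline void *unbox_ptr(double d) {
        return reinterpret_cast<void *>(bits_of(d) & PAYLOAD);
    }

Plain doubles cost nothing under this scheme, which is the whole point; the costs land on pointer/integer operations and on the GC's ability to recognize pointers, exactly the problems BGB reports in the reply further down.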
Re: [fonc] Incentives and Metrics for Infrastructure vs. Functionality (eye tracking)
On 1/1/13 4:29 PM, Loup Vaillant-David wrote: On Tue, Jan 01, 2013 at 03:02:09PM -0600, BGB wrote: it is a question maybe of whether the programmer sees the forest or the trees. these sorts of things may well have an impact on the types of code a person writes, and what sorts of things the programmer finds more readable. Well, that could be tested. Let's write some code in a procedural way, and in a functional way. Show it to a bunch of programmers, and see if they understand it, spot the bugs, can extend it etc. I'm not sure what to expect from such tests. One could think most people would deal more easily with the procedural program, but on the other hand, I expect the procedural version will be significantly more complex, especially if it abides by step-by-step aesthetics.

This sounds like a great idea, and there are probably some PhDs to be had doing that (if it has not been done a lot already?). At least such research is starting. Here is a related article about research using eye-tracking software to find differences between how experts and novices look at code, with links to videos of eye movements: http://developers.slashdot.org/story/12/12/19/1711225/how-experienced-and-novice-programmers-see-code

Here is a direct link to the blog of Michael Hansen, a PhD student doing related research: http://synesthesiam.com/?p=218 "As my fellow Ph.D. student Eric Holk talked about recently in his blog, I’ve been running eye-tracking experiments with programmers of different experience levels. In the experiment, a programmer is tasked with predicting the output of 10 short Python programs. A Tobii TX300 eye tracker keeps track of their eyes at 300 Hz, allowing me to see where they’re spending their time."

I imagine the same approach could be useful to look into this issue, as a new way to quantify what different programmers are doing in different situations. Here is a link to a question on eye-tracking code, and from that I see that a search on "eye tracking software opencv" turns up a bunch of stuff, where the basic theory is that the pupils change shape depending on where the eyes are looking: http://stackoverflow.com/questions/8959423/opencv-eye-tracking-on-android

In theory, the more real data we have on how people actually use software, the better we can make designs for new computing. Eye tracking is one way to collect a lot of that fairly quickly, and it can do so in a way that is much better than just recording what a user clicks on.

I'm getting a Samsung Galaxy Note 10.1 tablet as a next step towards a Dynabook. I chose that one mostly because it comes with a pressure-sensitive pen as an input device. Apparently, that tablet still is not as good as a fifteen-year-old Apple Newton in some ways, and so is yet another example of technology regressing for reasons of strong copyright and generational turn-over. Related: http://myapplenewton.blogspot.com/2012/10/apple-newton-still-beats-samsung-galaxy.html http://myapplenewton.blogspot.com/2012/12/apple-newton-replacement-candidate.html

But I mention that tablet because one other feature is that it supposedly uses the built-in front-facing camera to check whether the user's pupils are visible. If the tablet can't see the user's pupils, it goes into power-saving mode. Of course, that is probably one of the first features I'll turn off, in case it got hacked. :-)
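As a rough illustration of the OpenCV route mentioned above, here is a toy sketch that merely detects eye regions from a webcam using OpenCV's stock Haar cascade (real gaze tracking needs far more than this; the power-saving check at the end is a hypothetical tie-in, not how the tablet actually works):

    #include <opencv2/objdetect.hpp>
    #include <opencv2/imgproc.hpp>
    #include <opencv2/videoio.hpp>
    #include <iostream>

    int main() {
        // haarcascade_eye.xml ships in OpenCV's data/haarcascades directory.
        cv::CascadeClassifier eyes;
        if (!eyes.load("haarcascade_eye.xml")) return 1;

        cv::VideoCapture cam(0);               // default webcam
        cv::Mat frame, gray;
        while (cam.read(frame)) {
            cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
            cv::equalizeHist(gray, gray);      // boost contrast for the cascade

            std::vector<cv::Rect> found;
            eyes.detectMultiScale(gray, found, 1.1, 3, 0, cv::Size(30, 30));
            std::cout << found.size() << " eye region(s) detected\n";
            // A "pupils visible?" power-saving heuristic like the tablet
            // feature described above could simply test found.empty() here.
        }
    }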
Still, there is a lot of possible promise there, where perhaps the tablet could provide information or functionality more effectively somehow using that information about where I was looking. I've also heard you can make a fairly cheap eye tracker with a pair of glasses and some infrared LEDs and receivers, which sounds less intrusive than using cameras.

Anyway, I feel it is quite possible that what would be found from that sort of eye-tracking research on programmers is that, as in other domains of life, people have various characteristics, preferences, habits, skills, and so on at some particular time in their life, and those can be strengths or weaknesses depending on the context. A big part of whether a programmer is productive probably has to do with whether they are in flow, which in turn depends on how their current abilities relate to the current task (which is why many game developers make levels of their games progressively harder as players get better at them). So, even if we find that some programmers look at code differently than others based on experience or aptitude, that still does not mean that there is likely to be one type of programmer who is going to solve all the world's programming problems, nor would it mean that there is one kind of IDE that would satisfy all programmers at all stages of their careers. (Again, DrScheme/PLTScheme/Racket's language levels are a step towards this.) Many of those programmers best at abstraction and with years of experience might probably just
Re: [fonc] Current topics
My thinking has been going the other way for some time now. I see the problem as the need to build bigger systems than any individual can currently imagine. The real value from computers isn't just collecting the input from a single person, but rather 'combining' the inputs from huge groups of people. It's that ability to unify and harmonize our collective knowledge that gives us a leg up on being able to rationalize our rather over-complicated world.

The problem I see with components, particularly a small set of large ones, is that as the size of a formal system increases, the possible variations explode. That is, even for a nearly trivial small set of primitives, there are several different possible decompositions. As the size of the system grows, the number of decompositions grows, probably exponentially or better. Thus as we walk up the levels of abstraction to something higher, there becomes a much larger set of possibilities. If what we desire is beyond any individual's comprehension, and there is a huge variance in the pieces that will get created, then we'll run into considerable problems when we try to bring all of these pieces together. That, I think, is essentially where we are currently.

My sense of the problem is to go the other way: to make the pieces so trivial that they can be combined easily. It may sound labour-intensive to bring it all together, but then we do have the ability of computers themselves to spend endless hours doing mundane chores for us. The trick then would be to engage as many people as possible in constructing these little pieces, then bring them all together. In a design sense, this is not substantially different from the Internet, or Wikipedia. These both grew organically out of relatively small pieces with minimal organization, yet somehow converged on an end-product that is considerably larger than any individual's single effort.

Paul.
Re: [fonc] Current topics
Read this guy! On Tue, Jan 1, 2013 at 7:53 AM, Alan Kay alan.n...@yahoo.com wrote: […]
Re: [fonc] Wrapping object references in NaN IEEE floats for performance (was Re: Linus...)
On 1/1/2013 6:36 PM, Paul D. Fernhout wrote: On 1/1/13 3:43 AM, BGB wrote: here is mostly that this still allows for type-tags in the references, but would likely involve a partial switch to the use of 64-bit tagged references within some core parts of the VM (as a partial switch away from magic pointers). I am currently leaning towards putting the tag in the high-order bits (to help reduce 64-bit arithmetic ops on x86).

One idea I heard somewhere (probably on some Squeak-related list several years ago) is to have all objects stored as floating-point NaN instances (NaN == Not a Number). […] So, by wrapping all other objects as instances of floating-point numbers using the NaN approach, you are optimizing for the typically most CPU-intensive case of many user applications. Granted, there are going to be tradeoffs, like integer math and looping might then be a bit slower? Perhaps there is some research paper already out there about the tradeoffs for this sort of approach?

I actually tried this already... I had borrowed the idea originally off of Lua (a paper I was reading talking about it mentioned it as having been used in Lua). the problems were, primarily on 64-bit targets:
- my other code assumed value-ranges which didn't fit nicely in the 52-bit mantissa;
- being a NaN obscured the pointers from the GC;
- it added a fair bit of cost to pointer and integer operations;
- ...

granted, you only really need 48 bits for current pointers on x86-64; the problem was that other code had already been assuming a 56-bit tagged space when using pointers (spaces), leaving a bit of a problem: 56 > 52. so, everything was crammed into the mantissa somewhat inelegantly, and the costs regarding integer and pointer operations made it not really an attractive option. all this was less of an issue with 32-bit x86, as I could essentially just shove the whole pointer into the mantissa (spaces and all), and the GC wouldn't be confused by the value.

basically, what spaces is: a part of the address space is divided up into a number of regions for various dynamically typed values (the larger ones being for fixnum and flonum). on 32-bit targets, spaces is 30 bits, located between the 3GB and 4GB address mark (which the OS generally reserves for itself). on x86-64, currently it is a 56-bit space located at 0x7F00_.

For more background, see: http://en.wikipedia.org/wiki/NaN For example, a bit-wise example of an IEEE floating-point standard single-precision (32-bit) NaN would be: s111 1111 1axx xxxx xxxx xxxx xxxx xxxx, where s is the sign (most often ignored in applications), a determines the type of NaN, and x is an extra payload (most often ignored in applications). So, information about other types of objects would start in that extra payload part. There may be some inconsistency in how hardware interprets some of these bits, so you'd have to think about whether that could be worked around if you want to be platform-independent.
See also: http://en.wikipedia.org/wiki/IEEE_floating_point You might want to just go with 64-bit floats, which would support wrapping 32-bit integers (including as pointers to an object table if you wanted, even up to probably around 52-bit integer pointers); see: IEEE 754 double-precision binary floating-point format: binary64 http://en.wikipedia.org/wiki/Binary64

yep... my current tagging scheme partly incorporates parts of double, in the sense that some tags were chosen such that a certain range of doubles could be passed through unmodified and with full precision. the drawback is that 0 is special, and I haven't yet thought up a good way around this issue. admittedly I am not entirely happy with the handling of fixnums either (more arithmetic and conditionals than I would like). here is what I currently have: http://cr88192.dyndns.org:8080/wiki/index.php/Tagged_references

does sometimes seem like I am going in circles at times though...

I know that feeling myself, as I've been working on semantic-related generally-triple-based stuff for going on 30 years, and I still feel like the basics could be improved. :-)

yes. well, in this case, it is that I have bounced back and forth between tagged references and magic pointers multiple times over the years. granted, this would be the first time I am doing so using fixed 64-bit tagged references. granted, on x86-64, I will probably end up later merging a lot of this back into the
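For illustration, a rough sketch of high-order-bit tagging as mentioned at the top of this thread (the tag values and the 8/56-bit split are assumptions, not the scheme at the URL above):

    #include <cstdint>

    // Hypothetical high-bit tagging: the top 8 bits select a type, the low
    // 56 bits carry the payload. Keeping the tag high means extracting a
    // 32-bit int is a plain truncation -- no 64-bit mask/shift on x86.
    constexpr uint64_t TAG_SHIFT = 56;
    constexpr uint64_t TAG_INT   = 0x01; // illustrative tag values
    constexpr uint64_t TAG_PTR   = 0x02;

    inline uint64_t tag_of(uint64_t ref) { return ref >> TAG_SHIFT; }

    inline uint64_t make_int(int32_t v) {
        // zero-extend the 32-bit payload; the sign is recovered on unboxing
        return (TAG_INT << TAG_SHIFT) | static_cast<uint32_t>(v);
    }
    inline int32_t int_value(uint64_t ref) {
        return static_cast<int32_t>(ref); // truncation: a single mov on x86
    }

    inline uint64_t make_ptr(void *p) {
        // assumes pointers fit in 56 bits (true on current x86-64 hardware)
        return (TAG_PTR << TAG_SHIFT) | reinterpret_cast<uintptr_t>(p);
    }
    inline void *ptr_value(uint64_t ref) {
        return reinterpret_cast<void *>(ref & ((UINT64_C(1) << TAG_SHIFT) - 1));
    }

Note this layout cannot pass doubles through unmodified (a double's exponent occupies the high bits), which is the tradeoff the "certain range of doubles" remark above is negotiating.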
Re: [fonc] Current topics
Inline. On Tue, Jan 1, 2013 at 7:53 AM, Alan Kay alan.n...@yahoo.com wrote:

The most recent discussions get at a number of important issues whose pernicious snares need to be handled better. In an analogy to sending messages most of the time successfully through noisy channels -- where the noise also affects whatever we add to the messages to help (and we may have imperfect models of the noise) -- we have to ask: what kinds and rates of error would be acceptable?

Depends on the context, I'm sure. When I'm solving a Project Euler problem in Squeak, my heart isn't broken if I manage to bork the image, because I'm writing the code to throw it away, and nothing depends on it but my own flights of fancy. A missile guidance system, eh, well now that's something I'd like a bit more well tested. Etc.

We humans are a noisy species. And on both ends of the transmissions. So a message that can be proved perfectly received as sent can still be interpreted poorly by a human directly, or by software written by humans. A wonderful specification language that produces runnable code good enough to make a prototype is still going to require debugging, because it is hard to get the spec-specs right (even with a machine version of human-level AI to help with larger goals comprehension).

Makes me think of debugging grammars. The grammar is quite specified, but the specification is deceptively complex, recursive. Hard to hold in the lobes all at once.

As humans, we are used to being sloppy about message creation and sending, and rely on negotiation and good will after the fact to deal with errors. We've not done a good job of dealing with these tendencies within programming -- we are still sloppy, and we tend not to create negotiation processes to deal with various kinds of errors.

Contracts. I think I might grok how we arrived upon the law metaphor.

However, we do see something that is actual engineering -- with both care in message sending *and* negotiation -- where eventual failure is not tolerated: mostly in hardware, and in a few vital low-level systems which have to scale essentially error-free, such as the Ethernet and Internet.

I had a manager once who said, "The reason what we do isn't engineering is people aren't dying from it often enough." Bridge collapses with people on it: career over. Kernel panic? Tell them to reboot, and ship a hot fix as soon as possible.

My prejudices have always liked dynamic approaches to problems, with error detection and improvements (if possible). Dan Ingalls was (and is) a master at getting a whole system going in such a way that it has enough integrity to exhibit its failures and allow many of them to be addressed in the context of what is actually going on, even with very low-level failures. It is interesting to note the contributions from what you can say statically (the higher the level of the language the better) -- what can be done with meta (the more dynamic and deep the integrity, the more powerful and safe meta becomes) -- and the tradeoffs of modularization (hard to sum up, but as humans we don't give all modules the same care and love when designing and building them).

Right. Again, the missile guidance system (ironically?) gets more love than my solutions to Project Euler problems.

Mix in real human beings and a world-wide system, and what should be done? (I don't know; this is a question to the group.)

Don't panic. :)

There are two systems I look at all the time. The first is lawyers contrasted with engineers.
The second is human systems contrasted with biological systems. There are about 1.2 million lawyers in the US, and about 1.5 million engineers (some of them in computing). The current estimates of programmers in the US are about 1.3 million (US Dept of Labor, counting programmers and developers). Also, the Internet and multinational corporations, etc., internationalize the impact of programming, so we need an estimate of the programmers world-wide -- probably another million or two? Add in the *ad hoc* programmers, etc.? The populations are similar enough in size to make the contrasts in methods and results quite striking. Looking for analogies, to my eye what is happening with programming is more similar to what has happened with law than with classical engineering. Everyone will have an opinion on this, but I think it is partly because nature is a tougher critic on human-built structures than humans are on each other's opinions, and part of the impact of this is amplified by the simpler, shorter-term liabilities of imperfect structures on human safety than of imperfect laws (one could argue that the latter are much more of a disaster in the long run).

Yeah, the short-term liabilities, and the yelling executives interfering with the process. Also, being able to retroactively fix DOA systems remotely produces weird effects that are hard to think about naturally, e.g., working