Arne Babenhauserheide:
> I've been pondering why using ! for indentation, \\ for empty and $
> for sublist feels so alien to me, and today I finally found an answer,
> when I looked into letter distributions[^1].

An interesting approach!

> # First off: The problem of brackets in lisp.
>
> This has mostly been discussed to death, so I'll just write what I can
> add to the discussion.
>
> 1. Brackets are rare characters in normal writing: about 0.2% of the
> characters in normal text are brackets. So the code feels strange,
> because the distribution of letters is too different from normal
> text.

I don't think that "normal writing" is the best model for comparison. Instead, I would use as the model "typical programs in widely-used programming languages". Also, I think word (not character) analysis would be the better comparison. I wonder if your analysis would be different in that case.

That said, I'm always delighted to see quantitative analysis, so let's discuss given the numbers available here.

> 2. In lisp brackets are the first letter, but people tend to remember
> words by their first letter.

Agreed. Again, I think the better model is "typical programs in widely-used programming languages", or at least mathematics (since people spend 10-20 years using math notation in school, and decades afterwards using it).

> The second is fixed by neoteric expressions: You can now use the first
> letter of the function you call as the first non-whitespace letter of
> the line of code.

:-).

> To fix the first, the use of brackets has to be reduced - or replaced
> by letters which are more common in normal text.
>
> # Common letters
>
> I now took my list of letters and looked at the distribution of
> letters which are suitable as control characters (not used in normal
> words). Then I grouped them such that the groups differ by roughly a
> factor of two in occurrence frequency in normal text.

...

> To make this quantitative: Take the syntax needed to express
> something. Divide the occurrence frequency of each letter in the syntax
> by the frequency of a normal letter. Then multiply the results. If
> this is higher than the syntax we want to replace, then we win
> something.
>
> The mean occurrence frequency for the 20 most frequent characters is about 5%:
> import numpy
> numpy.mean(numpy.array(numbers[:20])/ sum(numbers))
>
> () is (0.2% / 5%)**2 = 0.0016
> $ is 6.64e-5 / 0.05 = 0.0013
>
> So by using a sublist with $ instead of a (b c), we actually make our
> code less similar to normal text.
>
> \\ is even worse: (1.05e-8 / 0.05)**2 = (2.1e-7)**2 = 4.4e-14
>
> We replace (()) with it, which is 0.0016**2 = 2.56e-06, but it is
> still a huge net loss of similarity to normal text.

Interesting approach, but there's another constraint that I'm trying to deal with and that this analysis omits: existing programs. If we use a marker that's already in significant use in existing programs, then it's harder to transition to the notation and easier to make mistakes (because now you have to escape the existing uses).

So we actually have conflicting requirements: We want "familiar" markers, preferably markers with the same semantics used in other programming languages, but NOT markers already in significant use in existing Scheme code.

The argument for the spelling "$" is actually really straightforward: it's already used, with that meaning, in Haskell. Anybody who's learned Haskell already knows what that symbol means, and that's a good argument for its spelling.

Similarly, many people avoid using "\\" because of slashification issues... which means that it works fine for us, because it's unlikely to be in existing code. Also, the "\\" has a visual "slant down to the right" that makes it useful for GROUP, and its similarity to "\" of other languages makes it useful for SPLIT.

That said, we can certainly consider other markers.

> ! for indentation on the other hand is 0.00036 / 0.05 = 0.007 which is
> a net win of a factor of 4.6 over 2 brackets.
>
> Without the ! it would be over 1/0.0016, though (but since it is
> optional, I can live with it - just don't use it in code-examples
> where it is not needed).

...

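For anyone who wants to replay the arithmetic quoted above, here is a minimal Python sketch of the scoring rule as I understand it: divide each character's frequency by the mean frequency of a common letter, then multiply the results. The frequency table only copies the figures cited in this thread (it is not recomputed from a corpus), and the names FREQ, MEAN_COMMON and similarity are made up for illustration.

```python
from math import prod

# Relative character frequencies in "normal text", copied from the quoted
# figures in this thread. Assumption: each bracket is counted at 0.2%.
FREQ = {
    "(": 0.002,
    ")": 0.002,
    "$": 6.64e-5,
    "\\": 1.05e-8,
    "!": 0.00036,
}

# Mean frequency of the 20 most frequent characters (about 5%, per the quote).
MEAN_COMMON = 0.05


def similarity(syntax: str) -> float:
    """Product over the syntax of (character frequency / MEAN_COMMON)."""
    return prod(FREQ[ch] / MEAN_COMMON for ch in syntax)


# Score each marker discussed above; "\\\\" is the two-character marker \\.
for marker in ["()", "$", "\\\\", "(())", "!"]:
    print(f"{marker!r:10} {similarity(marker):.3g}")
```

With these inputs, "$" scores slightly below "()" and "!" scores above it, which is the comparison the quoted text makes. Note that the score says nothing about whether a marker is already in use in existing code.
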
> Essentially this all boils down to only using the most common special
> characters when trying to improve the readability of lisp:
>
> \n
> .,
> -()":'
> /*=
> <>?!\;
> %
>
> So, please rethink using \\ and $ for groups and sublist. Actually the most
> common available letters for that are:
>
> ,':
>
> I left out the . for groups, because I found a use case which it would
> break: (let (((genvariable) name))).

Handling "." is already complicated; you REALLY don't want to overload it any more.

> Also I left out the *, because it is needed in curly infix.

Well, {...} disables these markers, so that's not a problem given the current semantics. But I wouldn't want to use "*" because it's useful to be able to multiply a bunch of complex constructs by doing this:

  *
  ! calculate1 ...
  ! calculate2 ...

...

> But , ' and : on their own, not at the beginning of a line and
> surrounded by whitespace have no meaning in lisp (to the best of my
> knowledge).

Two of them do have problems. The ' is quote, and , is unquote. These abbreviations already have specific meanings when followed by text, and in fact, we handle them specially at the beginning of a line, to copy SRFI-49. SRFI-49 has the clever idea that, at the beginning of a line, "abbreviation hspace" applies the abbreviation to the entire construct. This makes it easy to use abbreviations without adding new vertical lines (vertical lines are precious, because screens are typically wider than they are tall, as measured in characters). And if the abbreviation is not at the beginning of a line, the whitespace after it should be ignored, because that's how abbreviations are interpreted inside a list (we want to minimize surprises).

A lone ":" could, as far as I know, be used as a marker. I kind-of hate to use it that way, though, because ":" is a pretty plausible user symbol for various operations. While "$" already has a history for the meaning currently given, ":" does not.

Spelling the GROUP_SPLIT operator as ":" instead of "\\" does have some appeal; in particular, it does look reasonable as the SPLIT operator. But I worry about making another single-character symbol unavailable to users; it's pretty unlikely that they're using \\, and one-character symbols are really handy.

--- David A. Wheeler

_______________________________________________
Readable-discuss mailing list
Readable-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/readable-discuss