[Readable-discuss] I-expressions issues

David A. Wheeler Wed, 26 Dec 2007 19:28:05 -0800

Scheme SRFI-49 ( http://srfi.schemers.org/srfi-49/srfi-49.html ) defines 
I-expressions, invented by Egil Möller.  It's a great idea, but as I noted 
earlier, I've found some problems while using it (in the spec, the sample 
implementation, or both).  Now that Egil Möller is on the list, he can 
perhaps explain why they aren't problems (just misunderstandings or 
disagreements), or if they ARE, help fix them.



* Multiple blank lines become () in the implementation, and I believe that's 
per spec. E.G.:
x
<Blank line>
<Blank line>
<Blank line>
y
maps to x, (), (), (), y.
This is surprising and ugly.  I propose solving this by ignoring leading blank 
lines (which MAY have comments) at the top level, so this becomes x, followed 
later by y.
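
To make the proposal concrete, here is a minimal Python sketch of the intended 
behavior. It is a model only, not the SRFI-49 sample implementation (which is 
in Scheme). The names is_blank and top_level_chunks are hypothetical, and the 
sketch ASSUMES that a blank or comment-only line also ends the expression in 
progress, which the proposal above does not itself settle.

```python
def is_blank(line):
    """True for a line that is empty, whitespace-only, or comment-only."""
    stripped = line.strip()
    return stripped == "" or stripped.startswith(";")


def top_level_chunks(text):
    """Split text into top-level chunks of lines.

    Blank lines before a chunk are skipped and never yield a chunk of
    their own. ASSUMPTION: a blank line also ends the chunk in progress.
    """
    chunks = []
    current = []
    for line in text.splitlines():
        if is_blank(line):
            if current:
                chunks.append(current)
                current = []
            continue
        # A non-indented line starts a new top-level chunk.
        if current and not line[0].isspace():
            chunks.append(current)
            current = []
        current.append(line)
    if current:
        chunks.append(current)
    return chunks


print(top_level_chunks("x\n\n\n\ny\n"))  # [['x'], ['y']]
```

With this rule the three blank lines between x and y produce nothing, rather 
than three () values.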


* Sample implementation doesn't always handle inline comments correctly. E.G.:
a ; hi
 b ; there
q

Both with and without a blank line before "q", this produces (a b q) instead 
of the expected (a b) followed later by q.
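
As a reference point for the expected result, here is a small Python model of 
an indentation reader that strips inline comments before looking at 
indentation. It is a sketch under assumptions, not a port of the sample 
implementation: the function names are hypothetical, comment stripping is 
naive (it would wrongly cut a ';' inside a string or character literal), and 
blank lines are simply dropped.

```python
def strip_comment(line):
    """Remove a trailing ';' comment (naive: ignores strings and char literals)."""
    return line.split(";", 1)[0].rstrip()


def parse_block(lines, pos, indent):
    """Parse sibling expressions at `indent` or deeper; return (exprs, next_pos)."""
    exprs = []
    while pos < len(lines):
        cur_indent, tokens = lines[pos]
        if cur_indent < indent:
            break
        pos += 1
        children = []
        # More deeply indented lines that follow become children of this line.
        if pos < len(lines) and lines[pos][0] > cur_indent:
            children, pos = parse_block(lines, pos, lines[pos][0])
        items = tokens + children
        # A lone token with no children stays an atom, not a one-element list.
        exprs.append(items[0] if len(items) == 1 else items)
    return exprs, pos


def read_all(text):
    """Return the list of top-level expressions in `text`."""
    lines = []
    for raw in text.splitlines():
        code = strip_comment(raw)
        if code.strip():
            indent = len(code) - len(code.lstrip(" \t"))
            lines.append((indent, code.split()))
    exprs, _ = parse_block(lines, 0, 0)
    return exprs


print(read_all("a ; hi\n b ; there\nq\n"))  # [['a', 'b'], 'q']
```

Here ['a', 'b'] stands for (a b) and 'q' for the atom q, which is the pair of 
results I would expect from the example above.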


* Sample implementation has various other bugs.
I've fixed the following bugs in the sample implementation:
; * 2007-10-15 David A. Wheeler <dwheeler at dwheeler dot com>
;   - Changed "t" to "#t" (t is Common Lisp, #t is Scheme)
; * 2006-06-10 David A. Wheeler <dwheeler at dwheeler dot com>
;   - Changed eq? to eqv? when comparing characters.
;     The "eq?" operator is not guaranteed to work
;     for comparing characters, or in comparing with end-of-file,
;     in the R5RS specification.
;   - sugar-load now calls sugar-read, not read;
;     that way, even if sugar has not been enabled, sugar-load will
;     correctly use sugar-read to read the contents.


* Implementation doesn't work on Guile 1.8.1.
I haven't tracked this down; some change in Guile causes the sample code to 
fail.


* Spec unclear about indentation issues of INITIAL line.  The spec hides 
indentation issues in the indentation tokens, glossing over some details, 
which can lead to some surprises.  The sample implementation maps this:
    a b
    c d
into (a b (c d)), and NOT into the possibly-expected pair of lists (a b) 
followed later by (c d).

The implementation appears to skip any spaces/tabs on the first non-empty line. 
 Then, any following lines that are indented AT ALL are considered indented - 
so it looks at the NEXT line to see how much indentation is used for the 
first-level indent.

This is actually justifiable.  The implementation could do a LITTLE better in 
theory - it could carefully count out the whitespace of the INITIAL expression, 
and use THAT.  But unless the reader maintains per-port hidden data (a bad 
idea), this information is lost when the object is returned to the caller, 
so if the next line isn't on the left edge and there wasn't an intervening 
blank line, the reader cannot use it anyway.

I think this is actually reasonable for the implementation, but I suspect that 
few people will understand this just by reading the current spec.  It should be 
made clearer.
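
The difference between the two behaviors can be modeled in a few lines of 
Python. This is a simplified sketch, not the sample implementation: read_one 
and its track_initial_indent flag are hypothetical, each following indented 
line becomes one flat sublist, and deeper nesting is not modeled.

```python
def indent_of(line):
    """Number of leading space/tab characters on `line`."""
    return len(line) - len(line.lstrip(" \t"))


def read_one(lines, track_initial_indent):
    """Read one top-level datum from `lines`; return (datum, remaining_lines).

    With track_initial_indent=False the first line's own indentation is
    discarded and treated as column 0, so ANY indented line that follows
    counts as a child (the behavior described above). With True, the first
    line's indentation is kept as the baseline.
    """
    first, rest = lines[0], lines[1:]
    base = indent_of(first) if track_initial_indent else 0
    datum = first.split()
    while rest and indent_of(rest[0]) > base:
        datum.append(rest[0].split())
        rest = rest[1:]
    return datum, rest


source = ["    a b", "    c d"]
print(read_one(source, False))  # (['a', 'b', ['c', 'd']], [])
print(read_one(source, True))   # (['a', 'b'], ['    c d'])
```

The first call corresponds to (a b (c d)); the second leaves "c d" for a later 
read, which is the possibly-expected (a b) followed by (c d).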

Interestingly, indentation of a topmost entry is an error in Python. I'm not 
sure that would be a good idea here, since doing so would make an 
indentation-reader less capable of reading existing files.  It COULD notice 
indentation of the initial line and turn off indentation processing when it 
occurs; that would make it more backwards-compatible, but with a serious risk 
of making it easy to insert mistakes through unintended indents. A special case 
like "if the initial line is indented and its first non-whitespace character 
is '(', indentation processing is disabled" could reduce that risk somewhat, 
but that's obviously more complex and any rule that complicated has its own 
problems.  I note this so we can brainstorm further.

* The spec has loops.  In particular, it has these productions:
expr -> head
 and
head -> expr
which in theory could spin forever without consuming anything.
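
A check for this kind of loop is easy to automate. The Python sketch below 
looks only at unit productions (a single nonterminal on the right-hand side) 
and reports nonterminals that can derive themselves through such productions 
alone. The function name and the list-of-pairs grammar encoding are 
assumptions for illustration, and only the two productions quoted above are 
used; it does not consider empty derivations or other forms of left recursion.

```python
def unit_production_cycles(productions):
    """Return the sorted nonterminals that reach themselves via unit productions.

    `productions` is a list of (lhs, rhs) pairs, where rhs is a list of symbols.
    """
    nonterminals = {lhs for lhs, _ in productions}
    edges = {nt: set() for nt in nonterminals}
    for lhs, rhs in productions:
        if len(rhs) == 1 and rhs[0] in nonterminals:
            edges[lhs].add(rhs[0])

    def reachable(start):
        # Depth-first walk over unit-production edges, excluding the start itself
        # unless some path leads back to it.
        seen, stack = set(), list(edges[start])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(edges[node])
        return seen

    return sorted(nt for nt in nonterminals if nt in reachable(nt))


grammar = [("expr", ["head"]), ("head", ["expr"])]
print(unit_production_cycles(grammar))  # ['expr', 'head']
```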


* In general, is there a way to simplify the spec and/or implementation?
I'd like the spec AND the implementation to be "obviously correct", so that 
people will be willing to trust it.


* Is there any way to make it even more compatible with existing files or to 
make it easier to use immediately on the command line?
Currently, on the command line it is easy to get "out of sync" (you press Enter 
and see the results from a PREVIOUS command).  Adding support for blank lines 
(Enter Enter) would help, but additional improvements in either attribute would 
be great.
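
One way to picture the "Enter Enter" idea is the Python sketch below. It is 
an assumption-laden illustration, not part of the spec or the sample 
implementation: read_interactive is a hypothetical name, and it only collects 
the raw lines of one expression. The point is that a blank line lets the 
reader return without consuming the first line of the NEXT expression, which 
is the lookahead that causes the "out of sync" effect.

```python
def read_interactive(line_source):
    """Collect the lines of one expression from an iterator of input lines.

    Stops at the first blank line after some content has been seen, so no
    line belonging to the following expression is consumed.
    """
    collected = []
    for line in line_source:
        if line.strip() == "":
            if collected:
                return collected
            continue  # ignore blank lines before any content
        collected.append(line.rstrip("\n"))
    return collected  # end of input


typed = iter(["a\n", " b\n", "\n", "q\n", "\n"])
print(read_interactive(typed))  # ['a', ' b']
print(read_interactive(typed))  # ['q']
```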

* Are there other potential issues with I-expressions?

I think I-expressions are a great idea, but I think they can be improved on.  
So let's discuss what can be done.

--- David A. Wheeler
