New Layout Rule

2006-12-08 Thread John Meacham
Motivated by some recent discussion, I thought I would explore the
possibilty of formalizing the haskell layout rule without the dreaded
parse-error clause, as in, one that can be completly handled by the
lexer.

motivated by that I have written a little program that takes a haskell
file with layout on stdin and spits out one without layout on stdout.

it can be gotten here:
darcs get http://repetae.net/repos/getlaid/

the code is designed to make the layout algorithm completly transparent,
so that we might experiment with it. The function layout in 'Layout.hs'
is the single and complete layout algorithm and the only thing that need
be modified by experimentors.

I have come up with a simple improvement to the algorithm given in the
paper that seems to catch a very large number of layouts. basically,
whenever it comes across something that must come in matched pairs (, ), case
of, if then. it pushes a special context onto the stack, when it comes
across the closing token, it pops every layout context down to the
special context.

there is a special case for in that causes it to pop only up to the
last context created with a let, but not further.

here is the complete algorithm (with my modification, sans the
parse-error rule):
 
 data Token = Token String | TokenVLCurly String !Int | TokenNL !Int 
 deriving(Show)
 
 data Context = NoLayout | Layout String !Int

 -- the string on 'Layout' and 'TokenVLCurly' is the token that
 -- created the layout, always one of where, let, do, or of
 
 layout :: [Token] - [Context] - [Token]
 layout (TokenNL n:rs) (Layout h n':ls)
 | n == n' = semi:layout rs (Layout h n':ls)
 | n  n' = layout rs (Layout h n':ls)
 | n  n' = rbrace:layout (TokenNL n:rs) ls
 layout (TokenNL _:rs) ls = layout rs ls
 layout (TokenVLCurly h n:rs) (Layout h' n':ls)
 | n = n' = lbrace:layout rs (Layout h n:Layout h' n':ls)
 | otherwise = error inner layout can't be shorter than outer one
 layout (TokenVLCurly h n:rs) ls = lbrace:layout rs (Layout h n:ls)
 layout (t@(Token s):rs) ls | s `elem` fsts layoutBrackets = t:layout rs 
 (NoLayout:ls)
 layout (t@(Token s):rs) ls | s `elem` snds layoutBrackets = case ls of
 Layout _ _:ls - rbrace:layout (t:rs) ls
 NoLayout:ls - t:layout rs ls
 [] - error $ unexpected  ++ show s
 layout (t@(Token in):rs) ls = case ls of
 Layout let n:ls - rbrace:t:layout rs ls
 Layout _ _:ls - rbrace:layout (t:rs) ls
 ls - t:layout rs ls
 layout (t:rs) ls = t:layout rs ls
 layout [] (Layout _ n:ls) = rbrace:layout [] ls
 layout [] [] = []

 layoutBrackets = [ (case,of), (if,then), 
((,)), ([,]), ({,}) ]

now. there are a few cases it doesn't catch. the hanging case at the end
of a guard for instance, I believe this can be solved easily by treating 

'|' and '='  as opening and closing pairs in lets and wheres
'|' and '-' as opening and closing pairs in case bodies.

it is easy to see which one you are in by looking at the context stack.


commas are trickier and are the only other case I think we need to
consider.

I welcome people to experiment and send patches or brainstorm ideas, I
have what I believe is a full solution percolating in my head, but am
unhappy with it, I am going to sleep on it and see if it crystalizes by
morning. In the meantime, perhaps someone can come up with something
more elegant for dealing with the remaining cases. or at least find some
real programs that this code breaks down on!

(bug fixes for the lexer and everything are very much welcome. it will
probably choke on some ghc extensions that would be trivial to add to
the alex grammar)

John

-- 
John Meacham - ⑆repetae.net⑆john⑈
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: New Layout Rule

2006-12-08 Thread John Meacham
On Fri, Dec 08, 2006 at 03:26:30PM +, Ian Lynagh wrote:
 On Fri, Dec 08, 2006 at 02:33:47AM -0800, John Meacham wrote:
  Motivated by some recent discussion, I thought I would explore the
  possibilty of formalizing the haskell layout rule without the dreaded
  parse-error clause, as in, one that can be completly handled by the
  lexer.
 
 There was some discussion about that a while ago on this list, e.g.
 http://www.haskell.org/pipermail/haskell-prime/2006-March/000915.html
 and other subthreads in that thread.
 
 I'd still love to see a replacement which can be a separate phase
 between lexing and parsing, even if it means we need to lay some things
 out differently or tweak other bits of the syntax.

let isn't an issue (at least not for the reason specified in that
mail). It is taken care of properly in the version I posted. the trick
is to annotate each layout context with what caused it to occur. when
you reach an in rather than popping up to the most recent NoLayout
(as you would with a bracket) you pop up to the most recent layout
context that was started with a let. (if such a context doesn't exist,
it is a syntax error)

John

-- 
John Meacham - ⑆repetae.net⑆john⑈
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime