Re: The dreaded layout rule
Simon Marlow wrote: Does it mean that the following expressions would be illegal? if cond then do proc1; proc2 else do proc3; proc4 (case e of Just x - x 0; Nothing - False) Unfortunately, yes. Now one can forget about {} and use layout everywhere. He would no longer be able to forget or he would have to split some expressions into indented lines, even when they are unambiguous in one line. You could just enumerate all keywords that allow/enforce insertion of }. A suitable list for Haskell 98 might be: in where ) ] module type data newtype class instance default In fact I think that this would be the cleanest and simplest rule. (At least that is how I once implemented layout similar to Haskell's, because I couldn't get Yacc's error productions to work properly in all cases). For Haskell 2(000) I would suggest removing all but the first 4 tokens from the list above. - Andreas -- Andreas Rossberg, [EMAIL PROTECTED] :: be declarative. be functional. just be. ::
Re: The dreaded layout rule
I wrote: lexeme - qvarid | qconid | qvarsym | qconsym | literal | special | reservedop | reservedid Now we could replace qvarsym and qconsym by qop, and have both examples parse in the same way. However, unlike the other change in lexeme's definition, I don't suggest this, I only want to point out that there is a (formally) simple way out of the present somewhat inconsistent state. I changed my mind about this issue, I do suggest to change it as proposed, for if `elem` were three lexemes, any whitespace between them would be allowed. This might even be considered a typo, as I think no one intended to allow expressions like x ` {- look ma -} elem -- comments inside! ` l All the best, Christian Sievers
Re: The dreaded layout rule
As an author of an Haskell Emacs mode that deals with the layout rule (described in Journal of Functional Programming 8(5) 493-502), I strongly agree that the "parse-error condition" is really a bad idea. For example, in Emacs, no full Haskell parse is done. After all, layout should be there to indicate clearly to a user what section of code depends on which other; the user should not have to parse and deal with some local fixity declarations. I know this suggestion would break a few Haskell programs but perhaps it should be interesting to come back to the first functional language that implemented the layout rule, Miranda (tm) where the rule was much more simply stated Syntactic objects obey Landin's offside rule. This requires that every token of the object lie directly or to the right of its first token. A token which breaks this rule is said to offside with respect to that object. And that's it... no need to have three pages of explanations and an appendix. One can find many examples where Haskell rules and Miranda differ and some times one is better than the other, but you would be surprised to see that in the majority of the cases, the indentation that people normally produce are very similar under both rules. Guy Lapalme Université de Montréal PS: as a Quiz, can you guess how in Haskell the following is interpreted? f x = 1 + x g y = 1 + y
RE: The dreaded layout rule
Simon Marlow [EMAIL PROTECTED] wrote, Does anybody disagree with my interpretation of the standard? Are there any implementations that actually follow the standard here? (Maybe the standard should be changed to follow the implementations in this area.) Phew. Well spotted. Of course, none of the existing Haskell implementations are in conformance here. I think this has just about convinced me that the parse-error condition is a really bad idea. It definitely is. The main reason for its inclusion was to allow things like let f x = x in ... and also to automatically insert the final '}' before the end of file. Perhaps the layout rule should be restricted to these two cases? Proposal: - replace t by 'in' in the parse-error rule. EOF is already handled by the last clause in the layout spec. My guess is that this would break very few programs. The problem with this fix is that layout for let-in and other grouping constructs would be handled differently, which is not very intuitive. The problem with the `do'-notation also appears for `case'-constructs. A simpler rule might involve automatically inserting '}' before 'in' during lexical analysis iff (a) we're in a layout context and (b) the close brace hasn't already been inserted by the layout rule. This would decouple the parser and lexer which is a Good Thing. Still not intuitive, but from a syntactical point of view much more preferable. But you want to change (b) to ``the last lexeme was no close brace'' (it may have been explicitly inserted by the programmer. IMHO, in the long run (= next version of Haskell) this rule should completely vanish. So, who is collecting the proposals for the next Haskell standard? Cheers, Manuel
RE: The dreaded layout rule
Does it mean that the following expressions would be illegal? if cond then do proc1; proc2 else do proc3; proc4 (case e of Just x - x 0; Nothing - False) Unfortunately, yes. Now one can forget about {} and use layout everywhere. He would no longer be able to forget or he would have to split some expressions into indented lines, even when they are unambiguous in one line. Hmm, the `do x == y == z' case is a real trouble. Would it be not too ugly to formalize the current common behavior as something like "for the purposes of layout resolution, the syntax does not care about fixity declarations"? I guess that treating them in any way at this stage, as long as they don't reject non-associative operators, would yield the same result... Ugly but practical. One other possible solution is to remove the fixity resolution from the grammar itself and describe it as a separate process post-parsing. This is probably a good thing anyway: it matches the way most implementations work and it would clean up the grammar. Cheers, Simon
Re: The dreaded layout rule
Simon Peyton-Jones [EMAIL PROTECTED] writes: In other words, it is a bug (and GHC and Hugs don't do it right - see my previous message; from your comment, I presume HBC also doesn't follow the definition). I think, the only Right Thing is to remove this awful rule (unless somebody comes up with a rule that can be decided locally). Maybe so. But (H98 editors hat on) this is more than a "typo". I am surprised! ;-) It's a Haskell 2 issue. Perhaps there will be no fully conforming H98 compilers! Perhaps it would be a reasonable Haskell 1.6 issue? Wolfram
RE: The dreaded layout rule
In other words, it is a bug (and GHC and Hugs don't do it right - see my previous message; from your comment, I presume HBC also doesn't follow the definition). I think, the only Right Thing is to remove this awful rule (unless somebody comes up with a rule that can be decided locally). Maybe so. But (H98 editors hat on) this is more than a "typo". It's a Haskell 2 issue. Perhaps there will be no fully conforming H98 compilers! Simon
Re: The dreaded layout rule
| How about the Carl Witty's | | do a == b == c | | does NHC handle this correctly? It matches ghc and Hugs, reporting Error when renaming: Infix operator at 2:21 is non-associative. Note that this is reported one stage *after* parsing. Because parsing of infix operators is difficult, all implementations (to my knowledge) leave resolution of fixity and associativity until later. Indeed, the Haskell 98 standard recognises this (in an oblique way) by permitting infix decls to appear *after* the first use. Hence, it is now impossible to resolve fix/assoc in a single pass anyway. Regards, Malcolm
Re: The dreaded layout rule
Malcolm Wallace wrote: Because parsing of infix operators is difficult, all implementations (to my knowledge) leave resolution of fixity and associativity until later. Indeed, the Haskell 98 standard recognises this (in an oblique way) by permitting infix decls to appear *after* the first use. Hbc does the resolution while parsing, which means it cannot be Haskell 98 compliant. I really dislike the new rules about where infix can occur since they don't really buy us anything, but they do force a more complicated implementation. (Just as an aside. Local infix declarations, which was just added to Haskell 98, was very high on the list of design mistakes in SML and will be removed in ML2000.) -- -- Lennart
RE: The dreaded layout rule
Does anybody disagree with my interpretation of the standard? Are there any implementations that actually follow the standard here? (Maybe the standard should be changed to follow the implementations in this area.) Phew. Well spotted. Of course, none of the existing Haskell implementations are in conformance here. I think this has just about convinced me that the parse-error condition is a really bad idea. The main reason for its inclusion was to allow things like let f x = x in ... and also to automatically insert the final '}' before the end of file. Perhaps the layout rule should be restricted to these two cases? Proposal: - replace t by 'in' in the parse-error rule. EOF is already handled by the last clause in the layout spec. My guess is that this would break very few programs. A simpler rule might involve automatically inserting '}' before 'in' during lexical analysis iff (a) we're in a layout context and (b) the close brace hasn't already been inserted by the layout rule. This would decouple the parser and lexer which is a Good Thing. Cheers, Simon
Re: The dreaded layout rule
Fri, 30 Jul 1999 05:12:51 -0700, Simon Marlow [EMAIL PROTECTED] pisze: The main reason for its inclusion was to allow things like let f x = x in ... and also to automatically insert the final '}' before the end of file. Perhaps the layout rule should be restricted to these two cases? Does it mean that the following expressions would be illegal? if cond then do proc1; proc2 else do proc3; proc4 (case e of Just x - x 0; Nothing - False) Now one can forget about {} and use layout everywhere. He would no longer be able to forget or he would have to split some expressions into indented lines, even when they are unambiguous in one line. Hmm, the `do x == y == z' case is a real trouble. Would it be not too ugly to formalize the current common behavior as something like "for the purposes of layout resolution, the syntax does not care about fixity declarations"? I guess that treating them in any way at this stage, as long as they don't reject non-associative operators, would yield the same result... Ugly but practical. -- __("Marcin Kowalczyk * [EMAIL PROTECTED] http://kki.net.pl/qrczak/ \__/ GCS/M d- s+:-- a22 C++$ UL++$ P+++ L++$ E- ^^W++ N+++ o? K? w(---) O? M- V? PS-- PE++ Y? PGP-+ t QRCZAK 5? X- R tv-- b+++ DI D- G+ e h! r--%++ y-
RE: The dreaded layout rule
Malcolm Wallace [EMAIL PROTECTED] wrote, [...] Simon Marlow replies: GHC and Hugs both make use of yacc-style error recovery, albeit in a very limited form. And nhc uses parser combinators, which give you backtracking on error conditions for free. We actually do almost all layout processing at the lexical stage, but where the parser expects a } and doesn't get one, we just insert the }, and re-lex the remaining input. I suppose having to re-lex is a bit of a chore, but laziness comes to the rescue somewhat. How about the Carl Witty's do a == b == c does NHC handle this correctly? Manuel
RE: The dreaded layout rule
Manuel Chakravarty writes: What kind of implementation did the originators of this clause envision? If the layout rule is really implemented as a filter between the scanner and the parser, it seems extremely awkward to add a dependency on the error condition of the parser - in particular, it makes a functional, ie, side-effect free implementation rather hard and a true two phase implementation impossible. So, I guess (I hope!!) there is a nifty trick that lets you achieve the same effect by using only conditions depending on local information (either during layout processing or by letting the parser insert the missing braces). GHC and Hugs both make use of yacc-style error recovery, albeit in a very limited form. The idea is to have a production in your grammar like this: close_brace : '}' | error where the '}' token is assumed to have been inserted by the lexical analyser as a result of layout (i.e. a token was found to be less indented than the current layout context). The error case fires if any other token is encountered, and the semantic action for this production will probably pop the current layout context and carry on (in practice you also have to tell yacc not to continue with error recovery, otherwise all sorts of strange things happen). Take a look at GHC's parser for the details. I believe you're right in that a true two-phase implementation of the Haskell grammar is impossible. This is consistent with Haskell's policy of making life easy for programmers and hard for compiler writers :) Cheers, Simon
The dreaded layout rule
One of our students just pointed out an IMHO rather problematic clause in the layout rule. In Section 2.7 of the Haskell 98 Report it says, A close brace is also inserted whenever the syntactic category containing the layout list ends; that is, if an illegal lexeme is encountered at a point where a close brace would be legal, a close brace is inserted. And in B.3, we have in the first equation of the definition of `L', L (t:ts) (m:ms) = } : (L (t:ts) ms) if parse-error(t) (Note 1) where Note 1 says, The side condition parse-error(t) is to be interpreted as follows: if the tokens generated so far by L together with the next token t represent an invalid prefix of the Haskell grammar, and the tokens generated so far by L followed by the token } represent a valid prefix of the Haskell grammar, then parse-error(t) is true. What kind of implementation did the originators of this clause envision? If the layout rule is really implemented as a filter between the scanner and the parser, it seems extremely awkward to add a dependency on the error condition of the parser - in particular, it makes a functional, ie, side-effect free implementation rather hard and a true two phase implementation impossible. So, I guess (I hope!!) there is a nifty trick that lets you achieve the same effect by using only conditions depending on local information (either during layout processing or by letting the parser insert the missing braces). Cheers, Manuel
Re: The dreaded layout rule
This is a multi-part message in MIME format. --F93F7E72348E2F23CC7D1D40 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Manuel says One of our students just pointed out an IMHO rather problematic clause in the layout rule ... So, I guess (I hope!!) there is a nifty trick that lets you achieve the same effect by using only conditions depending on local information ... Attached is Haskell code which handles the layout rule reasonably well as a separate pass between scanning and parsing (though it is Haskell 1.4 rather than 98 and imperfect). -- Ian[EMAIL PROTECTED], http://www.cs.bris.ac.uk/~ian --F93F7E72348E2F23CC7D1D40 Content-Type: text/plain; charset=us-ascii; name="Layout.hs" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="Layout.hs" {-- LAYOUT ANALYSIS The layout function deals with the layout conventions of Haskell 1.4, inserting extra "{", ";" and "}" tokens to represent implicit blocks. The inserted tokens are marked as implicit, and are inserted as early as possible in the token stream, in order to promote well-phrased and well-positioned error messages in case of trouble. The layout function never fails; it is left up to a parser to detect errors. The Haskell standard says that a block (or layout list) is terminated "whenever the syntactic category containing the layout list ends, that is, if an illegal lexeme is encountered at a point where a close brace would be legal". This can only be implememented easily if layout processing is combined with parsing. Here, layout processing is done separately, so an approximation to the standard is achieved by keeping track of brackets. See the end of this file for examples where the layout function deviates from the standard. ISSUES TO BE RESOLVED -- "case" terminated by "where" may be common enough to make a special case -- "case" terminated by "," may be worth dealing with -- check that "let"s inside "do" (which don't have "in") are handled OK -- check explicit blocks, and their interaction with implicit ones Ian Holyer @(#) $Id: Layout.hs,v 1.2 1998/10/26 15:18:39 ian Exp $ --} module Layout (layout) where import Haskell import Lex -- Start layout processing. If the source does not begin with "module" or "{", -- then there is an implicit surrounding block. Here and elsewhere, a -- lookahead past possible comments is done so that a token can be inserted -- before the comments if necessary; also, the end-of-file token makes it -- unnecessary to check for an empty token stream. layout :: [Token] - [Token] layout ts = if s == "module" then comments ++ scanExplicit [] (tok:rest) else openBlock [] (Tok "}" 1 0 Implicit) ts where comments = takeWhile isComment ts tok @ (Tok s r c k) : rest = dropWhile isComment ts -- A stack of tokens is used to keep track of the surrounding blocks. For each -- block, its opening "{" token is pushed onto the stack. In an implicit -- block, the brackets "(",")" and "[","]" and "case","of" and "let","in" and -- "if","then","else" are tracked by putting the opening bracket on the stack -- until the matching closing bracket is found. Each opening bracket is stored -- on the stack with the indent for the current block in place of its actual -- column. type Stack = [Token] -- Scan the source tokens while in an explicit block (or while not in any -- blocks) when layout is inactive. Look for an explicit close block token, or -- a keyword which indicates the beginning of a new block. Treat field selector -- as an explicit block. scanExplicit :: Stack - [Token] - [Token] scanExplicit stack [] = [] scanExplicit stack (t@(Tok s r c k) : ts1) = case s of "}" - t : closeBlock stack t ts1 "where" - t : openBlock stack t ts1 "let" - t : openBlock stack t ts1 "do" - t : openBlock stack t ts1 "of" - t : openBlock stack t ts1 "{" - openBlock stack undefined (t:ts1) _ - t : scanExplicit stack ts1 -- Scan the source tokens while in an implicit block, with layout active. The -- parameters are the stack, the last token dealt with, and the remaining -- tokens. The block is terminated by indenting or by a suitable closing -- bracket. Treat field selector as an explicit block. scanImplicit :: Stack - Token - [Token] - [Token] scanImplicit stack@(Tok bs br bc bk : stack1) last@(Tok ls lr lc lk) ts = if c bc || k == EndToken then close else if c == bc r lr then newline else case s of "where" - open "let" - pushopen "do" - open "of" - popopen "case" "in" - pop "let" "(" - push ")" - pop "(" "[" - push "]" - pop "[" "case" - push "if" - push "then" - poppush "if" "else" - pop "then" "}" - close "{" -
RE: The dreaded layout rule
Manuel Chakravarty writes: What kind of implementation did the originators of this clause envision? If the layout rule is really implemented as a filter between the scanner and the parser, it seems extremely awkward to add a dependency on the error condition of the parser - in particular, it makes a functional, ie, side-effect free implementation rather hard and a true two phase implementation impossible. Simon Marlow replies: GHC and Hugs both make use of yacc-style error recovery, albeit in a very limited form. And nhc uses parser combinators, which give you backtracking on error conditions for free. We actually do almost all layout processing at the lexical stage, but where the parser expects a } and doesn't get one, we just insert the }, and re-lex the remaining input. I suppose having to re-lex is a bit of a chore, but laziness comes to the rescue somewhat. Regards, Malcolm
Re: The dreaded layout rule
If the scanning stage pairs the tokens it returns with their positions, then scanning can be done once before parsing begins. I've done this with a parser implemented with parser combinators, these combinators then decide whether or not to accept a token based on which token it is and how far it is indented. I think this means the grammar being parsed is a context sensitive one, since the state of the parser is represented by more than just a single stack. We need a stack telling us what to do next, and a stack of indentation levels, although the way in which these stacks grow and shrink is related they could not be replaced by a single stack, so the grammar is not context free. Now that I write this, I think that we could combine these stacks as a stack of stacks, although this isn't how I did it. I don't think this satisfies the requirements for a context free grammar (CFG) but I don't have a definition to hand at the moment. Mike Simon Marlow wrote: I believe you're right in that a true two-phase implementation of the Haskell grammar is impossible. This is consistent with Haskell's policy of making life easy for programmers and hard for compiler writers :)