[Haskell-cafe] Design of extremely usable programming language libraries

2013-05-28 Thread Andrey Chudnov
Dear Cafe,
I'm exploring the design space of programming language
libraries with enhanced usability and I'd your help and comments.
I'll start with a few short questions, but offer a detailed discussion
of the motivations
and the problems I'm facing below. So, if you have interest in
the subject or feel you can offer some insight, please, do read on.

* Does any generic traversal/transformation (uniplate-style) library
  support GADTs?
* What is the best choice, performance- and memory-wise, for a parser
  combinator  library with support for arbitrary look-ahead? Parsec is
  considered slow by some [1], but is it only in comparison with
  attoparsec (which, unfortunately, doesn't support arbitrary
  look-ahead)? Is there any parser library that performs better than
  Parsec while still supporting arbitrary look-ahead.
* Any multi-mode pretty printer libraries? By multi-mode I mean
  writing code once and being able to generate, say, both pretty and
  minified text representations of a program by changing just one
  parameter. Also, what's the most efficient pretty-printing library
  nowadays? Blaze?

I've been using Haskell for quite a while now, primarily, for
programming-language applications: program analysis, transformation
and compilation. I'm sure many would agree that PL work is where
Haskell shines. In the recent years new language features and
libraries --namely, GADTs, Template Haskell, quasi-quotation and
generic programming--- have appeared that could make working with
languages even easier than before. That's why it's sad to see that
none of the PL libraries seem to make good use of these features
(however, I might be starting to understand why). So, I'm currently
exploring
the design space for a library that uses these advanced Haskell features for
delivering better usability, and I'm having problems with implementing some
of them. I welcome comments on both the motivations, overall design and
the more technical aspects. I've omitted a few details because it's a long
e-mail as is. If something is not clear or doesn't make sense, please,
let me know.

I'll start by listing the features that an ideal PL library should
have, and that I've come to cherish as both a heavy user and a
developer of such. The basic features (pretty much every library has
them) include a parser (text-AST (abstract syntax tree)) and a
pretty-printer (AST-text), as well as a Haskell representation of the
AST that is somewhat easy to use. Pretty much every library has that
--- although some might debate the ease of use of the AST
representations.


However, there are other features that, in my opinion, are essential
to a PL library. The features are motivated by three requirements:
static safety (as few run-time errors as possible), minimal code
duplication (DRY) and ease of use and inspection of the code.

1) the pretty-printer should be multi-mode. One should be able to write
code once and be able to generate different textual representations of
the AST:
 - the pretty which is nice to the eye with white spaces,
   indentations etc.
 - minified, with minimum white space (while still being valid)
 - debuggable which inserts comments based on AST annotations
 - source-map generation
 - being able to generate colored LaTeX/HTML code would be
   nice, but non-essential

2) ASTs should be statically safe: you should only be able construct
values that represent valid programs, or get a typechecker/compiler
error otherwise. Languages that have syntactic productions that can
appear in one context but not in another need GADTs with type witnesses to
achieve that. In fact, such languages are often used to motivate GADTs
in the first place [2]. And while the problem in [2] could have been
solved by splitting the Expr datatype into two (IntExpr and BoolExpr),
in some languages this can't be done (or produces awkward syntax
trees).

3) a quasi-quoter with support for anti-quotation and quoted
patterns. This also saves a lot of typing *and* makes your code less
error-prone and easier to read. What is better (to both write and read)?
  [js|#x# = (function (a, b) {return {t1: a + b, t2: a*b};})(#x#, #y#);|]
or
 ExprStmt def $ AssignExpr def x (CallExpr def (FuncExpr def Nothing
 [Id def a, Id def b] $ ReturnStmt def $ ObjectLit [(PropId def $
 Id def t1, InfixExpr def OpAdd (VarRef def $ Id def a) (VarRef def $
 Id def b)), PropId def $
 Id def t2, InfixExpr def OpMul (VarRef def $ Id def a) (VarRef def $
 Id def b)])) [x, y]

The caveat here is that, to help ensure correctness,
the quasi-quoter and the parser should share code as much code as
possible. Ideally, there should be just one parser that has a switch
for recognizing normal and quasi-quoted programs. However, that would
require adding additional constructors representing anti-quotations to
our AST. And with that the user might be able to generate invalid
AST's and cause a run-time error. The solution to the last problem 

Re: [Haskell-cafe] Design of extremely usable programming language libraries

2013-05-28 Thread Roman Cheplyaka
* Andrey Chudnov achud...@gmail.com [2013-05-28 16:36:14-0400]
 * Does any generic traversal/transformation (uniplate-style) library
   support GADTs?

Any syb-style library works with GADTs, by the virtue of dealing with
value representations instead of type representations.

 * What is the best choice, performance- and memory-wise, for a parser
   combinator  library with support for arbitrary look-ahead? Parsec is
   considered slow by some [1], but is it only in comparison with
   attoparsec (which, unfortunately, doesn't support arbitrary
   look-ahead)? Is there any parser library that performs better than
   Parsec while still supporting arbitrary look-ahead.

Not sure what you mean here — attoparsec does support unlimited
lookahead, in the sense that a parser may fail arbitrarily late in the
input stream, and backtrack to any previous state.
Although attoparsec is a poor choice for programming language parsing,
primarily because of the error messages.

Roman

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Design of extremely usable programming language libraries

2013-05-28 Thread Andrey Chudnov
Thanks for a prompt reply, Roman.

On 05/28/2013 04:52 PM, Roman Cheplyaka wrote:
 Any syb-style library works with GADTs, by the virtue of dealing with
 value representations instead of type representations. 
I tried to use syb, but the following code fails to typecheck for me.
What am I doing wrong?
 {-# LANGUAGE GADTs, EmptyDataDecls, MultiParamTypeClasses,
TypeFamilies #-}
 {-# LANGUAGE DeriveDataTypeable, StandaloneDeriving #-}

 data HasHoles
 data Complete
 deriving instance Typeable HasHoles
 deriving instance Data HasHoles
 deriving instance Typeable Complete
 deriving instance Data Complete
 type family Holes a b :: *
 canHaveHolesT :: a - b - Holes a b
 canHaveHolesT _ _ = undefined
 type instance Holes HasHoles Complete = HasHoles
 type instance Holes Complete HasHoles = HasHoles
 type instance Holes HasHoles HasHoles = HasHoles
 type instance Holes Complete Complete = HasHoles

 data Expression k a where
   EQuote  :: a - String - Expression HasHoles a
   IntLit  :: a - Int - Expression Complete a
   EArith  :: a - ArithOp - Expression k1 a - Expression k2 a -
  Expression (Holes k1 k2) a
 deriving instance Typeable2 (Expression)
 deriving instance Data (Expression k a)
 data ArithOp = OpAdd
   | OpSub
   | OpMul
   | OpDiv
deriving (Data, Typeable)

Fails with:
 Couldn't match type `Complete' with `HasHoles'
 Expected type: a - String - Expression k a
   Actual type: a - String - Expression HasHoles a
 In the first argument of `z', namely `EQuote'
 In the first argument of `k', namely `z EQuote'
 When typechecking the code for  `Data.Data.gunfold'
   in a standalone derived instance for `Data (Expression k a)':
   To see the code I am typechecking, use -ddump-deriv


 Not sure what you mean here — attoparsec does support unlimited
 lookahead, in the sense that a parser may fail arbitrarily late in the
 input stream, and backtrack to any previous state. Although attoparsec
 is a poor choice for programming language parsing, primarily because
 of the error messages. 
I guess I have an outdated notion of attoparsec. But yes, error messages
seem to be the weak point of attoparsec. Also, the fact that it only
accepts bytestrings makes it harder (but no impossible, since we can
convert Strings to ByteStrings) to reuse the parser as a QuasiQuoter.
So, I'll rephrase my question. What's the best choice for a library for
parsing programming languages nowadays?

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Design of extremely usable programming language libraries

2013-05-28 Thread John Lato
  Not sure what you mean here — attoparsec does support unlimited
  lookahead, in the sense that a parser may fail arbitrarily late in the
  input stream, and backtrack to any previous state. Although attoparsec
  is a poor choice for programming language parsing, primarily because
  of the error messages.
 I guess I have an outdated notion of attoparsec. But yes, error messages
 seem to be the weak point of attoparsec. Also, the fact that it only
 accepts bytestrings makes it harder (but no impossible, since we can
 convert Strings to ByteStrings) to reuse the parser as a QuasiQuoter.
 So, I'll rephrase my question. What's the best choice for a library for
 parsing programming languages nowadays?


Parsec is still widely popular since it's part of the HP, but I use
uu-parsinglib as my first-choice parser.  It comes with a lot of examples,
good documentation, and many features I like (good error messages and auto
error correction).  I don't know how performance compares with parsec or
attoparsec, but it's always been good enough for me.

John L.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Design of extremely usable programming language libraries

2013-05-28 Thread Roman Cheplyaka
Unfortunately you can only do traversals, not unfolds, with GADTs.

That's because in an unfold, the return type is determined by the value
itself and can vary among the produced results, whereas in a traversal
it is determined by the input type.

This means also that you cannot simply derive Data, because the derived
instance will contain a gunfold function, which then will fail to
typecheck.

You can copy-paste the generated instance (-ddump-deriv) and simply
remove the code for gunfold (or write your own deriver). The following
compiles for me:
https://gist.github.com/feuerbach/5668198

Roman

* Andrey Chudnov achud...@gmail.com [2013-05-28 17:29:10-0400]
 Thanks for a prompt reply, Roman.
 
 On 05/28/2013 04:52 PM, Roman Cheplyaka wrote:
  Any syb-style library works with GADTs, by the virtue of dealing with
  value representations instead of type representations. 
 I tried to use syb, but the following code fails to typecheck for me.
 What am I doing wrong?
  {-# LANGUAGE GADTs, EmptyDataDecls, MultiParamTypeClasses,
 TypeFamilies #-}
  {-# LANGUAGE DeriveDataTypeable, StandaloneDeriving #-}
 
  data HasHoles
  data Complete
  deriving instance Typeable HasHoles
  deriving instance Data HasHoles
  deriving instance Typeable Complete
  deriving instance Data Complete
  type family Holes a b :: *
  canHaveHolesT :: a - b - Holes a b
  canHaveHolesT _ _ = undefined
  type instance Holes HasHoles Complete = HasHoles
  type instance Holes Complete HasHoles = HasHoles
  type instance Holes HasHoles HasHoles = HasHoles
  type instance Holes Complete Complete = HasHoles
 
  data Expression k a where
EQuote  :: a - String - Expression HasHoles a
IntLit  :: a - Int - Expression Complete a
EArith  :: a - ArithOp - Expression k1 a - Expression k2 a -
   Expression (Holes k1 k2) a
  deriving instance Typeable2 (Expression)
  deriving instance Data (Expression k a)
  data ArithOp = OpAdd
| OpSub
| OpMul
| OpDiv
 deriving (Data, Typeable)
 
 Fails with:
  Couldn't match type `Complete' with `HasHoles'
  Expected type: a - String - Expression k a
Actual type: a - String - Expression HasHoles a
  In the first argument of `z', namely `EQuote'
  In the first argument of `k', namely `z EQuote'
  When typechecking the code for  `Data.Data.gunfold'
in a standalone derived instance for `Data (Expression k a)':
To see the code I am typechecking, use -ddump-deriv
 
 
  Not sure what you mean here — attoparsec does support unlimited
  lookahead, in the sense that a parser may fail arbitrarily late in the
  input stream, and backtrack to any previous state. Although attoparsec
  is a poor choice for programming language parsing, primarily because
  of the error messages. 
 I guess I have an outdated notion of attoparsec. But yes, error messages
 seem to be the weak point of attoparsec. Also, the fact that it only
 accepts bytestrings makes it harder (but no impossible, since we can
 convert Strings to ByteStrings) to reuse the parser as a QuasiQuoter.
 So, I'll rephrase my question. What's the best choice for a library for
 parsing programming languages nowadays?
 
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe