Re: [Haskell-cafe] Re: What *is* a DSL?
On Sat, 2009-10-10 at 20:12 +0200, Ben Franksen wrote: Since 'some' is defined recursively, this creates an infinite production for numbers that you can neither print nor otherwise analyse in finite time. Yes, sorry, I should have been more careful there. One has to be careful to handle EDSLs that have potentially infinite syntax properly. I can see at least two solutions: One is to parameterize everything over the type of terminals, too. The second solution (which I followed) is to break the recursion by adding another nonterminal to the NT type: A third solution is to add the Kleene star to the grammar DSL, so the representation of productions becomes: data Production nt a = Stopa | TerminalChar (Production nt a) | forall b. NonTerminal (nt b) (Production nt (b - a)) | forall b. Star(Production nt b) (Production nt ([b] - a)) You also need to add the necessary parts for Alternative to the Production type too, because they may be nested inside Star constructors: | Zero | Choose (Production nt a) (Production nt a) The type Production nt is now directly an Applicative and an Alternative and also has the combinator: star :: Production nt a - Production nt [a] star p = Star p $ Stop id The type of grammars is changed to (with the additional of the starting nonterminal, as you point out): type Grammar nt = forall a. nt a - Production nt a It is probably also possible to write a function that converts grammars with “Star”s in to ones without by introducing new non-terminals in the way you did. Bob -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: What *is* a DSL?
On Sun, 2009-10-11 at 21:54 +0200, Ben Franksen wrote: Ben Franksen wrote: Next thing I'll try is to transform such a grammar into an actual parser... Which I also managed to get working. However, this exposed yet another problem I am not sure how to solve. Another option is to not use a recursive descent parser, and switch to a parsing algorithm for any context-free such as CYK or Earley's algorithm. A little test implementation of a well-typed version of the CYK algorithm seems to work and seems to be as efficient as the normal imperative one if enough memoisation is used. I'm trying to see if I can get Earley's algorithm to work nicely in the well-typed setting. Bob -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] What *is* a DSL?
On Tue, 2009-10-13 at 13:28 +0100, Nils Anders Danielsson wrote: On 2009-10-07 17:29, Robert Atkey wrote: A deep embedding of a parsing DSL (really a context-sensitive grammar DSL) would look something like the following. I think I saw something like this in the Agda2 code somewhere, but I stumbled across it when I was trying to work out what free applicative functors were. The Agda code you saw may have been due to Ulf Norell and me. There is a note about it on my web page: http://www.cs.nott.ac.uk/~nad/publications/danielsson-norell-parser-combinators.html Yes, it might have been that, OTOH I'm sure I saw it in some Haskell code. Maybe I was imagining it. Note that these grammars are strictly less powerful than the ones that can be expressed using Parsec because we only have a fixed range of possibilities for each rule, rather than allowing previously parsed input to determine what the parser will accept in the future. Previously parsed input /can/ determine what the parser will accept in the future (as pointed out by Peter Ljunglöf in his licentiate thesis). Consider the following grammar for the context-sensitive language {aⁿbⁿcⁿ| n ∈ ℕ}: Yes, sorry, I was sloppy in what I said there. Do you know of a characterisation of what languages having a possibly infinite amount of nonterminals gives you. Is it all context-sensitive languages or a subset? And a general definition for parsing single-digit numbers. This works for any set of non-terminals, so it is a reusable component that works for any grammar: Things become more complicated if the reusable component is defined using non-terminals which take rules (defined using an arbitrary non-terminal type) as arguments. Exercise: Define a reusable variant of the Kleene star, without using grammars of infinite depth. I see that you have an answer in the paper you linked to above. Another possible answer is to consider open sets of rules in a grammar: data OpenRuleSet inp exp = forall hidden. OpenRuleSet (forall a. (exp :+: hidden) a - Rule (exp :+: hidden :+: inp) a) data (f :+: g) a = Left2 (f a) | Right2 (g a) So OpenRuleSet inp exp, exports definitions of the nonterminals in 'exp', imports definitions of nonterminals in 'inp' (and has a collection of hidden nonterminals). It is then possible to combine them with a function of type: combineG :: (inp1 := exp1 :+: inp) - (inp2 := exp2 :+: inp) - OpenRuleSet inp1 exp1 - OpenRuleSet inp2 exp2 - OpenRuleSet inp (exp1 :+: exp2) One can then give a reusable Kleene star by stating it as an open rule set: star :: forall a nt. Rule nt a - OpenRuleSet nt (Equal [a]) where Equal is the usual equality GADT. Obviously, this would be a bit clunky to use in practice, but maybe more specialised versions combineG could be given. Bob -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] What *is* a DSL?
On Mon, 2009-10-12 at 15:49 +0200, Sjoerd Visscher wrote: Hi Bob, I tried to understand this by applying what you said here to your deep embedding of a parsing DSL. But I can't figure out how to do that. What things become the type class T? Here's the API version of the grammar DSL: class GrammarDSL grammar where type Rule grammar :: (* - *) - * - * pure:: a - Rule grammar nt a (*) :: Rule grammar nt (a - b) - Rule grammar nt a - Rule grammar nt b empty :: Rule grammar nt a (|) :: Rule grammar nt a - Rule grammar nt a - Rule grammar nt a char:: Char - Rule grammar nt () nt :: nt a - Rule grammar nt a grammar :: forall nt a. nt a - (forall a. nt a - Rule grammar nt a) - grammar nt a The language of typed-grammars-with-actions is composed of: * two sorts: grammars and rules * rules support the applicative and alternative interfaces, and also two special operators for incorporating terminals and nonterminals into rules. * grammars support a single operation: taking a nonterminal-indexed collection of rules, and a starting non-terminal (as Ben Franksen pointed out), producing a grammar. Basically, the idea is to think 1) what are the syntactic categories of my DSL?, these become the sorts; 2) what are the basic syntactic constructions of my DSL?, these become the operations of the type class. Because we are embedded in Haskell, we can have infinite syntax, as demonstrated by the grammar operation. WRT the recipe for getting deep embeddings in my previous post, it isn't quite true that the type Grammar nt a = forall grammar. GrammarDSL grammar = grammar nt a is isomorphic to the deep embedding I posted before, because it doesn't guarantee that the applicative functor or alternative laws hold, while the deep embedding does (and it also ensures that * and | distribute). It isn't hard to come up with a deep embedding that is initial for the completely free version though. The deep embedding from the previous post is an instance of this type class. So is, as Ben Franksen showed, a Parsec parser. I ended up having to inline the applicative and alternative interfaces into the class definition above. I wanted to write: class (Applicative (Rule grammar nt), Alternative (Rule grammar nt)) = Grammar grammar where ... but GHC wouldn't let me, complaining that 'nt' wasn't bound. Is there any reason this couldn't be made to work? Bob -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Applicative do?
On Fri, 2009-10-09 at 18:06 +0100, Philippa Cowderoy wrote: This leads us to the bikeshed topic: what's the concrete syntax? I implemented a simple Camlp4 syntax extension for Ocaml to do this. I chose the syntax: applicatively let x = foo let y = bar in pure stuff I quite like the word applicatively. Your overloading suggestion sounds to me like it would require the desugaring process to know something about types, but I'm not sure. Bob -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] What *is* a DSL?
On Wed, 2009-10-07 at 11:32 -0400, Joe Fredette wrote: So, if I understand this: Parsec is a DSL, I'm going to venture it's a Deep embedding -- I don't understand the internals, but if I were to build something like Parsec, I would probably build up a Parser datastructure and then apply optimizations to it, then run it with another function. Am I on the right track here? Parsec, like most other parser combinator libraries, is a shallowly embedded DSL. The Parser a type is a Haskell function that does parsing, i.e. a function of type String - Maybe (String, a). (Obviously, the real Parsec library allows more than strings, and has better error reporting than this type, but this is the basic idea). You can't analyse it further---you can't transform it into another grammar to optimise it or print it out---because the information about what things it accepts has been locked up into a non-analysable Haskell function. The only thing you can do with it is feed it input and see what happens. A deep embedding of a parsing DSL (really a context-sensitive grammar DSL) would look something like the following. I think I saw something like this in the Agda2 code somewhere, but I stumbled across it when I was trying to work out what free applicative functors were. First we define what a production with a semantic action is, parameterised by the type of non-terminals in our grammar and the result type: data Production nt a = Stopa | TerminalChar (Production nt a) | forall b. NonTerminal (nt b) (Production nt (b - a)) You can think of a production as a list of either terminals or non-terminals, terminated by the value of that production. The non-regular nested type argument in NonTerminal means that the final value can depend on the values that will be returned when parsing the strings that match other non-terminals. Productions are functors: instance Functor (Production nt) where fmap f (Stop a) = Stop (f a) fmap f (Terminal c p) = Expect c (fmap f p) fmap f (NonTerminal nt p) = NonTerminal nt (fmap (fmap f) p) They are also applicative functors: instance Applicative (Production nt) where pure = Stop (Stop f) * a = fmap f a (Terminal c t) * a = Terminal c (t * a) (NonTerminal nt t) * a = NonTerminal nt (fmap flip t * a) A rule in one of our grammars is just a list of alternative productions: newtype Rule nt a = Rule [Production nt a] Since lists are (applicative) functors and (applicative) functors compose, Rule nt is also a Functor and Applicative functor: instance Functor (Rule nt) where fmap f (Rule l) = Rule (fmap (fmap f) l) instance Applicative (Rule nt) where pure x = Rule $ pure (pure x) (Rule lf) * (Rule la) = Rule $ (*) $ lf * la It is also an instance of Alternative, because we composed with lists: instance Alternative (Rule nt) where empty = Rule [] (Rule r1) | (Rule r2) = Rule $ r1 | r2 A grammar is a map from nonterminals to rules, which are lists of alternative productions, which may themselves refer back to nonterminals in the grammar: type Grammar nt = forall a. nt a - Rule nt a Given a value of type Grammar nt, and a starting nonterminal in nt a for some a, one can easily write a function that translates it into a Parsec grammar to do actual parsing, or implement a different parsing strategy using memoisation or something similar. The translation to a traditional parser combinator library is actually a (indexed-)homomorphism of applicative functors + extra operations, which is pretty cool. If you also know some extra facts about the nt type (e.g. that it is finite), then it should be possible implement an CYK or Earley parser using this, or to print out the grammar (for documentation purposes, or for telling another node in a distributed network what things you accept, for instance). Note that these grammars are strictly less powerful than the ones that can be expressed using Parsec because we only have a fixed range of possibilities for each rule, rather than allowing previously parsed input to determine what the parser will accept in the future. This is the fundamental reason for using the applicative functor interface rather than the monad interface here. I'll give an example grammar for parsing expressions modelled by the following data type: data Expr = ENum Int | ESum Expr Expr | EProduct Expr Expr deriving Show To define a grammar in this formalism, one first has to define the set of nonterminals that one wants to use: data NT a where Value :: NT Expr Product :: NT Expr Sum :: NT Expr Now, a grammar is simply a function from members of this type to productions. We use the applicative/alternative functor interface to build up the productions. Conor's SHE would make this look a lot nicer, using idiom brackets. myGrm :: Grammar NT
Re: [Haskell-cafe] What *is* a DSL?
What is a DSL? How about this as a formal-ish definition, for at least a pretty big class of DSLs: A DSL is an algebraic theory in the sense of universal algebra. I.e. it is an API of a specific form, which consists of: a) a collection of abstract types, the carriers. Need not all be of kind *. b) a collection of operations, of type t1 - t2 - ... - tn where tn must be one of the carrier types from (a), but the others can be any types you like. c) (Optional) a collection of properties about the operations (e.g. equations that must hold) Haskell has a nice way of specifying such things (except part (c)): type classes. Examples of type classes that fit this schema include Monad, Applicative and Alternative. Ones that don't include Eq, Ord and Show. The Num type class would be, if it didn't specify Eq and Show as superclasses. An implementation of a DSL is just an implementation of corresponding type class. Shallowly embedded DSLs dispense with the type class step and just give a single implementation. Deeply embedded implementations are *initial* implementations: there is a unique function from the deep embedding to any of the other implementations that preserves all the operations. The good thing about this definition is that anything we do to the deep embedding, we can do to any of the other implementations via the unique map. Thanks to Church and Reynolds, we can always get a deep embedding for free (free as in Theorems for Free). If our DSL is defined by some type class T, then the deep embedding is: type DeepT = forall a. T a = a (and so on, for multiple carrier types, possibly with type parameterisation). Of course, there is often an easier and more efficient way of representing the initial algebra using algebraic data types. Conor McBride often goes on about how the initial algebra (i.e. the deep embedding) of a given specification is the one you should be worrying about, because it often has a nice concrete representation and gives you all you need to reason about any of the other implementations. Bob -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Finally tagless - stuck with implementation of lam
On Mon, 2009-10-05 at 19:22 -0400, David Menendez wrote: The two obvious options are call-by-name and call-by-value. I wonder how easily one can provide both, like in Algol. Fairly easy, you can either do a language that has an explicit monad (a bit like Haskell with only the IO monad), or you can go the whole hog and embed Paul Levy's Call-by-push-value. This requires a bit more cleverness, and an explicit representation of the embedded types in order to define the algebra maps on the computation types. Obviously, doing a deterministic version is simpler. You can probably get away with representing values as simple self-evaluating thunks. data Thunk s a = STRef s (Either a (ST s a)) evalThunk :: Thunk s a - ST s a evalThunk r = readSTRef r = either return update where update m = m = \x - writeSTRef r (Left x) return x I had a go at this and it seems to work. Happily, it is a tiny tweak to the CBV implementation I gave: you just replace return with your evalThunk (or something like it) and wrap the argument in the 'app' case with a function that creates a thunk. Bob -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Finally tagless - stuck with implementation of?lam
On Mon, 2009-10-05 at 22:06 -0400, Chung-chieh Shan wrote: Robert Atkey bob.at...@ed.ac.uk wrote in article 1254778973.3675.42.ca...@bismuth in gmane.comp.lang.haskell.cafe: To implement the translation of embedded language types to Haskell types in Haskell we use type families. This type-to-type translation is indeed the crux of the trickiness. By the way, Section 4.3 of our (JFP) paper describes how to follow such a translation without type families. If I were to avoid type families in Haskell, I would make Symantics into a multiparameter type class Yes, this is another way to do it. I prefer having an explicit representation of the types of the embedded language though. For one, the multi-parameter type classes get a bit unwieldy if you have lots of types in your embedded language. Also, sometimes you might want to do special things with the denotations of each type. For instance, I did an embedding of Levy's CBPV, where the type system is split into value types and computation types. For the computation types X there is always an algebra map of type m [[ X ]] - [[ X ]]. To define this I needed a term level representative of the type, and also the type family to give the semantics of the embedded type. I don't know if this is easily done using multi-parameter type classes. The underlying problem with the implementation of 'lam' is that you have to pick an evaluation order for the side effects you want in the semantics of your embedded language. The two obvious options are call-by-name and call-by-value. (Section 5 of our (JFP) paper addresses both CBN and CBV.) Yes, thanks for reminding me. I vaguely remembered when I went to bed after posting that you had done something via Plotkin's CPS transformations. I rather like the direct approach though; sometimes it is nice not to have to solve every problem by hitting it with the continuations hammer ;). Bob -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Finally tagless - stuck with implementation of lam
On Mon, 2009-10-05 at 22:42 +0100, Robert Atkey wrote: There is a difference in the syntax between CBN and CBV that is not always obvious from the usual paper presentations. There is a split between pieces of syntax that are values and those that are computations. Values do not have side-effects, while computations may. In this presentation, I have chosen that the only values are variables, while everything else is a computation. I should correct this: it is possible to give the CBV semantics to the original syntax definition (with the single parameter in the type class), you just have to move the return of the monad into the lam case. I implied in the paragraph above that the split between values and computations was required for CBV case; this is not true. Bob -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Finally tagless - stuck with implementation of lam
Hi Günther, The underlying problem with the implementation of 'lam' is that you have to pick an evaluation order for the side effects you want in the semantics of your embedded language. The two obvious options are call-by-name and call-by-value. The tricky point is that the types of the embedded language cannot be interpreted directly as Haskell types wrapped in a monad. You need to either take the call-by-name or the call-by-value interpretation of types: For CBN: [[ Int]] = IO Int [[ A - B ]] = [[ A ]] - [[ B ]] For CBV: [[ Int]] = Int [[ A - B ]] = [[ A ]] - IO [[ B ]] (obviously, there is nothing special about the IO monad here, or in the rest of this email: we could replace it with any other monad). To implement the translation of embedded language types to Haskell types in Haskell we use type families. Here is the code for call-by-name: module CBN where First of all, we define some dummy types so that we have a different representation of types in the embedded language than those in the meta-language, and we don't get confused: data IntT data a :- b infixr 5 :- Now we give the definition of the higher-order abstract syntax of our embedded language: class HOAS exp where lam :: (exp a - exp b) - exp (a :- b) app :: exp (a :- b) - exp a - exp b int :: Int - exp IntT add :: exp IntT - exp IntT - exp IntT This is as in your definition, except I have used the different type names to highlight the distinction between embedded and meta-language types. Next, we give the CBN semantics for the types, using the definition above, with a type family. type family Sem a :: * type instance Sem IntT = IO Int type instance Sem (a :- b) = Sem a - Sem b Now we can give an instance of the syntax's type class, that translates a piece of embedded code into its denotational semantics. newtype S a = S { unS :: Sem a } instance HOAS S where int = S . return add (S x) (S y) = S $ do a - x b - y putStrLn Adding return (a + b) lam f = S (unS . f . S) app (S x) (S y) = S (x y) As an example, we can give a term and see what happens when we evaluate it: let_ :: HOAS exp = exp a - (exp a - exp b) - exp b let_ x y = (lam y) `app` x t :: HOAS exp = exp IntT t = (lam $ \x - let_ (x `add` x) $ \y - y `add` y) `app` int 10 Evaluating 'unS t' in GHCi gives the output: Adding Adding Adding 40 Note that the add operation is invoked three times: the x `add` x that is let-bound is evaluated twice. The second option is call-by-value. This is a bit more interesting because we have to change the syntax of the embedded language to deal with the different substitution rules for CBV. module CBV where Again, we use a different representation for embedded-language types: data IntT data a :- b infixr 5 :- There is a difference in the syntax between CBN and CBV that is not always obvious from the usual paper presentations. There is a split between pieces of syntax that are values and those that are computations. Values do not have side-effects, while computations may. In this presentation, I have chosen that the only values are variables, while everything else is a computation. To represent this split, we use a multi-parameter type class. Note that the technique of using multi-parameter type classes is generally useful for representing abstract syntaxes in the HOAS style with multiple mutually defined syntactic categories. class HOAS exp val | exp - val, val - exp where val :: val a - exp a lam :: (val a - exp b) - exp (a :- b) app :: exp (a :- b) - exp a - exp b int :: Int - exp IntT add :: exp IntT - exp IntT - exp IntT The 'val' operation injects values into computations: this will be interpreted by the 'return' of the monad. We now give the semantics of types for CBV, following the definition above: type family Sem a :: * type instance Sem IntT = Int type instance Sem (a :- b) = Sem a - IO (Sem b) We need two newtypes for the two different syntactic categories. Values are interpeted by 'SV', and are just pure values; computations are interpreted by wrapping them in the monad. newtype SV a = SV { unSV :: Sem a } newtype SC a = SC { unSC :: IO (Sem a) } Now we can define the semantics of the syntax in terms of these types: instance HOAS SC SV where val = SC . return . unSV lam f = SC $ return (unSC . f . SV) app (SC x) (SC y) = SC $ do f - x; a - y; f a int i = SC $ return i add (SC x) (SC y) = SC $ do a - x b - y putStrLn Adding return (a + b) The same example term is expressed in CBV as follows: let_ :: HOAS exp val = exp a - (val a - exp b) - exp b let_ x y = (lam y) `app` x t :: HOAS exp val = exp IntT t =