This message demonstrates SYB3-like generic term processing for transformations and selections from (HS)XML-like documents. Our running example is an HSXML document that includes Haskell sample code, which is to be formatted specially. Moreover, we wish to guarantee that the sample code is syntactically and type-correct. Furthermore, we wish to evaluate expressions in the context of the code -- and include the result in the rendered HTML document. The present message is inspired by the example posted by [EMAIL PROTECTED] on the Haskell list, Mar 25, 2006, and benefited from discussions with [EMAIL PROTECTED]
We show a quite non-trivial generic term processing: extracting textual data from a particular subterm. The subterm in question is identified by its type; as there may be several occurrences of subterms of a given type, we further identify the desired subterm by its content (i.e., a string label). Some of the extracted strings must be concatenated, while the others must be kept distinct -- depending on their occurrences within terms of a particular type. Thus our extraction has both static aspects (i.e., type-directed) as well as dynamic ones (i.e., guards). We should point out that in HSXML, the notions of the element type (in the sense of the XML Recommendation) and of the Haskell data type coincide. Our generic term processing is like SYB3 in spirit, but different in implementation. In particular, we do not require a GHC extension of recursive instances, although this point is moot here as HSXML so far has had no recursive datatypes. Like SYB3, we do need overlapping instances to specify generic term processing, which the user can later override for particular types of terms. Our HSXML transformations are like SXSLT, SXML pre-post-order transformations described in http://pobox.com/~oleg/ftp/Scheme/xml.html#SXSLT only more modular. The complete code discussed in the present message is in the file sample-code.hs included in the (updated) archive http://pobox.com/~oleg/ftp/Haskell/HSXML.tar.gz Here's our running example: > test_haskell = > (document > (head > [meta_tag [description "Haskell"]]) > (body > [h1 "Haskell examples"] > [h2 "The first sample code"] > [hcode "sample1" > "module Main where" > "fact n = product [1..n]" > ] > [hcode "sample2" > [[thide "module Main where"]] > [[thcode "fix f = f $" [[em "fix f"]]]] > "fact self n = if n <= 0 then 1 else n * (self (n-1))" > ] > [p "the result:" [[thrunh test_haskell "sample2" > "show $ fix fact 5"]]] > )) It includes two pieces of Haskell code, gabelled "sample1" and "sample2". The sample code is comprised of lines, which may include formatting annotations such as 'em' to emphasize certain parts of code. Some other lines of code, like module declarations, can be hidden in rendering for the sake of clarity. We may wish to extract a particular piece of code from the HSXML document, with formatting annotations removed. For example, we may wish to type-check that code, and evaluate some expressions in its context. We may want to do that while we render the HSXML document into HTML. For instance, [[thrunh test_haskell "sample2" "show $ fix fact 5"]], when rendered on a particular formatter, will extract "sample2", evaluate the expression "show $ fix fact 5" in its context, and insert the result in the rendered HTML document: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> <html> <head> <meta name="description" content="Haskell"></head> <body bgcolor="#FFFFFF"> <h1>Haskell examples</h1> <h2>The first sample code</h2> <pre> > module Main where > fact n = product [1..n]</pre> <pre> > fix f = f $ <em>fix f</em> > fact self n = if n <= 0 then 1 else n * (self (n-1))</pre> <p>the result: <code>show $ fix fact 5</code> is 120</p> </body> </html> To accomplish evaluation during rendering, we do invoke a Haskell system (runghc). To render Haskell code in a particular way, we define a new HSXML context, CT_hcode, and provide for interpretation of the text data and elements in that context: > -- Rendering of Strings in the HCODE context. > instance Render (HW CT_hcode String) where > render (HW str) = emit "\n> " >> emit str > > -- `Transparent' grouping of inline elements into one > -- hcode element > -- This is an administrative element > newtype THCODE a = THCODE a deriving Show > thcode x = build (as_hcode . HW . THCODE . rev'apppend HNil) nil_inline x > > -- An THCODE element is rendered as one string > instance RenderInline b => Render (HW CT_hcode (THCODE b)) where > render (HW (THCODE b)) = emit "\n> " >> (render_inline False b > >> return ()) Of a particular interest is SYB3-like generic processing for extracting sections of HSXML document. Just like in SYB3, the generic term processing code naturally splits into three distinct modules. In sample-code.hs file however, all three `modules' are placed within one file, for convenience. Again, our processing is SYB3 is spirit only -- but not in the implementation. The first part of the library defines the class Dat and its instances for term types of interest. This part of the library depends exclusively on the structure of terms -- irrespective of operations one may wish to perform on terms. > class Dat gctx a where > gmapq :: ([gctx] ->w) -> a -> w We also define a class of generic functions, identified by labels. We don't provide any instances of this class here. > class FN label a where > fn :: a -> label Unlike SYB3, Dat is not a subclass of any class (cf. the SAT class in SYB3). Thus the dictionary for Dat does not depend on other dictionaries. Furthermore, the function gmapq has _no_ higher-rank type. The type gctx is the label that identifies a particular generic function to apply. Within this part of the library, we treat gctx as an (abstract) type variable. Here are a few instances of the Dat class > instance Dat gctx (HW ctx String) where > gmapq u _ = u [] > instance Dat gctx HNil where > gmapq u _ = u [] > instance (FN gctx a, FN gctx b) => Dat gctx (HCons a b) where > gmapq u (HCons a b) = u [fn a, fn b] > -- Instances of Dat for more specific containers > instance (FN gctx h, FN gctx b) => Dat gctx (HW ctx (Document h b)) where > gmapq u (HW (Document h b)) = u [fn h, fn b] > > instance (FN gctx h) => Dat gctx (HW ctx (Head h)) where > gmapq u (HW (Head h)) = u [fn h] As we can see, all the instances are quite regular and could well be automatically derived (e.g., via DrIFT). The second part of the generic term traversal library defines generic functions, irrespective of the structure of the types to which they may apply. Here we define a traversal function to extract a part of the term according to some criterion (to be specified later by the user of the library) > class ExtractD a r where > extractd :: a -> r > newtype EXD r = EXD{unEXD :: r} We use EXD in two ways. First, we use its type as a label to select the proper instance of FN. Second, we use EXD to encapsulate the result type. So, we can select the proper instance of generic function based on the argument type and the result type. The following `registers' ExtractD to be a generic `function'. > instance ExtractD a r => FN (EXD r) a where fn = EXD . extractd That's all there is to the second part of the library. Incidentally, none of the previous parts needed overlapping instances. Finally, we wish to instantiate the ExtractD function to extract all strings from a part of the term that has the type (HW CT_block (HCode b)) and which has the desired label (given as a string): First we define a generic extraction, which applies to any term type: > instance Dat (EXD (String->[String])) a > => ExtractD a (String->[String]) where > extractd a label = gmapq (concatMap (\x -> unEXD x $ label)) a This traversal rule obviously does nothing but propagating accumulated strings. To get any result we need to define specific extraction rules. In our case, we have to find the HCode sub-term of the desired label and extract all strings from it: > instance ExtractC b => > ExtractD (HW CT_block (HCode b)) (String->[String]) where > extractd (HW (HCode label b)) label' = > if label == label' > then extractc b > else [] -- otherwise, ignore the whole block Here extractc is yet another generic function, which extracts the string content from terms and returns the array of lines. The following is the entire definition. A three-line boiler-plate: > class ExtractC a where extractc :: a -> [String] > newtype EXC = EXC{unEXC :: [String]} > instance ExtractC a => FN EXC a where fn = EXC . extractc General extraction instance, which only propagates what other instances might have produced: > instance Dat EXC a => ExtractC a where > extractc a = gmapq (concatMap unEXC) a Specific instances: > instance ExtractC (HW ctx String) where > extractc (HW str) = [str] > instance ExtractC b => ExtractC (HW CT_hcode (THIDE b)) where > extractc (HW (THIDE b)) = [concat $ extractc b] > instance ExtractC b => ExtractC (HW CT_hcode (THCODE b)) where > extractc (HW (THCODE b)) = [concat $ intersperse " " (extractc b) ] _______________________________________________ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell