Re: Parsing Clojure with instaparse: how to handle special forms?
That answers my question pretty well, thanks.

On Thursday, February 6, 2014 11:20:42 PM UTC-8, Reid McKenzie wrote:

Okay. So there's one big thing you're doing wrong here, just from reading your grammars: you are complecting the data structures and valid _tokens_ which make up the Clojure language with the _meaning_ the language associates with them. If you discard such things as destructuring as part of the grammar and instead just provide the parse grammars for basic data structures like symbols, maps, keywords, sets and so forth, it's trivial to produce a grammar which can _parse_ valid Clojure code. _Reading_ Clojure code from such a parse tree is, and should be, an entirely separate concern, implemented as a pass over the generated parse structure.

- Reid

On Thursday, February 6, 2014 9:28:08 PM UTC-6, Travis Moy wrote:

I'm trying to use instaparse to parse Clojure code so that I can reformat it, but I'm having an issue with how to handle special forms. Should I attempt to parse special forms such as let and defn into their own rules, or should I rely instead on the actual content of the terminal to determine what lists should be treated as special forms?

For example, let's say I want to write a function which takes the parse tree returned by instaparse and arranges all the let bindings as recommended by the Clojure style guide (https://github.com/bbatsov/clojure-style-guide#source-code-layout--organization). There are two approaches I could take:

1) Build the recognition into the grammar itself:

    S = Form*
    Form = !SpecialForm List | ReaderMacro | Literal | Vector | Map | SpecialForm | !SpecialForm Symbol
    List = '(' Form* ')'
    ...
    SpecialForm = defn | let | try | JavaMemberAccess | JavaConstructor
    defn = '(' "defn" Symbol String? MapMetadata? VectorDestructuring Form* ')'
    Destructuring = VectorDestructuring | MapDestructuring
    VectorDestructuring = '[' (Symbol | Destructuring)* ('&' (Symbol | Destructuring))? ']'
    MapDestructuring = Map

2) Don't try to detect the let bindings in the grammar. Instead, search the resulting parse tree for lists with let content.

Which of these is a better approach? I sadly didn't take compilers in college so I'm kind of playing this by ear; I'm sure if I had I'd have a better idea of what the best practice is here.

Thanks!

(Full code for my project is at https://github.com/MoyTW/clojure-toys/tree/master/formatter if needed)

--
You received this message because you are subscribed to the Google Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
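Reid's suggestion in the message above (parse only the plain data structures, then find special forms in a separate pass over the tree) could look roughly like the sketch below. It assumes instaparse's default hiccup-style output, where each node is a vector of the form [:Tag child ...] and matched strings are leaves. The tree is written by hand rather than produced by a real grammar, and the tag names (:S, :List, :Vector, :Symbol, :Number) and the helpers head-symbol and special-form-lists are hypothetical, modelled on the grammar in this thread.

```clojure
;; Hand-written stand-in for a parse of:
;;   (defn f [x] (let [y 1] (+ x y)))
;; Tag names are assumptions; a real grammar may label nodes differently.
(def tree
  [:S
   [:List "("
    [:Symbol "defn"] [:Symbol "f"] [:Vector "[" [:Symbol "x"] "]"]
    [:List "("
     [:Symbol "let"]
     [:Vector "[" [:Symbol "y"] [:Number "1"] "]"]
     [:List "(" [:Symbol "+"] [:Symbol "x"] [:Symbol "y"] ")"]
     ")"]
    ")"]])

(defn head-symbol
  "Returns the text of the symbol in head position of a :List node,
  or nil when the first child form is not a :Symbol."
  [[_tag & children]]
  (let [[tag text] (first (filter vector? children))]
    (when (= tag :Symbol) text)))

(defn special-form-lists
  "Second pass over the parse tree: returns every :List node whose
  head symbol has the name `sym`."
  [tree sym]
  (->> (tree-seq vector? rest tree)   ; visit every node and leaf
       (filter vector?)               ; drop the string leaves
       (filter #(= :List (first %)))
       (filter #(= sym (head-symbol %)))))

;; All let forms in the tree; a formatter could rewrite their binding
;; vectors here without the grammar knowing anything about let.
(def let-forms (special-form-lists tree "let"))
```

With this split, the grammar stays small and supporting another special form is a matter of adding another pass, not another grammar rule.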
Re: unconditional append to end
You should use a vector, but it's also possible to use concat. For example, (concat '(1 2 3) [4]) will give you (1 2 3 4).

This made me curious as to the best way to get a collection into a vector, so I played around with it some:

    user=> (def r 10)
    #'user/r
    user=> (def coll (range 1))
    #'user/coll
    user=> (def coll-v (into [] coll))
    #'user/coll-v
    user=> (time (dotimes [_ r] (conj (into [] coll) :a)))
    "Elapsed time: 14074.018464 msecs"
    nil
    user=> (time (dotimes [_ r] (conj (apply vector coll) :a)))
    "Elapsed time: 22565.594515 msecs"
    nil
    user=> (time (dotimes [_ r] (conj (vec coll) :a)))
    "Elapsed time: 22424.174719 msecs"
    nil
    user=> (time (dotimes [_ r] (concat coll '(:a))))
    "Elapsed time: 5.366059 msecs"
    nil
    user=> (time (dotimes [_ r] (concat coll-v '(:a))))
    "Elapsed time: 5.56465 msecs"
    nil
    user=> (time (dotimes [_ r] (conj coll-v :a)))
    "Elapsed time: 10.65771 msecs"
    nil
    user=> (time (dotimes [_ r] (concat coll coll)))
    "Elapsed time: 6.048041 msecs"
    nil
    user=> (time (dotimes [_ r] (apply conj coll-v coll-v)))
    "Elapsed time: 72414.847105 msecs"
    nil

Surprisingly it looks like (concat coll '(:a)) is faster than (conj coll-v :a). That's not really what I would expect; does anybody have a good explanation for this? Did I just bork the test somehow, or - I mean, obviously concat's pretty fast but I was expecting conj to be on the level. In fact, if you convert and then conj it's significantly slower than using concat.

...not that it'd really matter, in basically all cases, since (into [] ...) is definitely still in the fast enough category. Still, if you're building a sequence, what's the reasoning against using (concat coll ...) instead of (conj (into [] ...) ...)? Is it a matter of elegance, or is there a specific practical reason?

On Friday, February 7, 2014 8:06:20 PM UTC-8, Armando Blancas wrote:

For efficient appends at the end you need a vector. Using the sequence library can be tricky while you're putting together your data structures because it's likely that you're not done yet with type-specific functions. You'll need to re-create your vector after using map/filter/etc. to be able to keep adding at the end.

On Friday, February 7, 2014 4:20:09 PM UTC-8, t x wrote:

Consider the following:

    (cons 1 '(2 3 4)) ==> (1 2 3 4)
    (cons 1 [2 3 4]) ==> (1 2 3 4)
    (conj '(a b c) 1) ==> (1 a b c)
    (conj '[a b c] 1) ==> [a b c 1]

Now, I would like something that _always_
  * appends to the end

cons is almost what I want, except it always appends to front.

conj is not what I want -- in fact, I'm afraid of conj. Often times, I'll run map/filter on something, and suddenly, instead of a vector, I now have a list -- and conj changes the order of the item added.

Thus, my question: is there a builtin to _unconditionally_ append to the end of a list/sequence/vector?

Thanks!
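The two answers in this message (convert to a vector and conj, or use concat) can each be wrapped in a small helper. This is only a sketch: append and append-lazy are hypothetical names, not functions in clojure.core, and converting a non-vector with vec costs time proportional to the size of the collection.

```clojure
(defn append
  "Returns a vector with x added after the last element of coll,
  whether coll is a list, a vector, a lazy seq or nil.
  Hypothetical helper, not part of clojure.core."
  [coll x]
  (conj (vec coll) x))

(defn append-lazy
  "Returns a lazy seq of the elements of coll followed by x.
  Nothing is copied or walked until the result is consumed."
  [coll x]
  (concat coll [x]))

(append '(1 2 3) 4)             ; => [1 2 3 4]
(append [1 2 3] 4)              ; => [1 2 3 4]
(append (map inc [0 1 2]) 4)    ; => [1 2 3 4]
(append nil 4)                  ; => [4]
(append-lazy '(1 2 3) 4)        ; => (1 2 3 4)
```

The vector version keeps further appends cheap because the result is always a vector; the lazy version avoids the conversion but hands back a seq, so a later conj on it would add at the front again.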
Re: unconditional append to end
Ah! That makes more sense. Yeah, after I forced it to realize the sequence, it turned out that concat was a lot slower than sticking it into a vector:

    #'user/r
    user=> (def coll (range 1))
    #'user/coll
    user=> (def coll-v (into [] coll))
    #'user/coll-v
    user=> (time (dotimes [_ r] (count (concat coll '(:a)))))
    "Elapsed time: 55803.147526 msecs"
    nil
    user=> (time (dotimes [_ r] (count (conj coll-v :a))))
    "Elapsed time: 18.591737 msecs"
    nil
    user=> (time (dotimes [_ r] (count (conj (into [] coll) :a))))
    "Elapsed time: 16224.79319 msecs"
    nil

On Friday, February 7, 2014 9:26:05 PM UTC-8, puzzler wrote:

On Fri, Feb 7, 2014 at 9:08 PM, Travis Moy <moyt...@gmail.com> wrote:

Surprisingly it looks like (concat coll '(:a)) is faster than (conj coll-v :a). That's not really what I would expect; does anybody have a good explanation for this? Did I just bork the test somehow, or - I mean, obviously concat's pretty fast but I was expecting conj to be on the level. In fact, if you convert and then conj it's significantly slower than using concat.

concat is lazy, so it's not really doing any work until you try to realize the sequence -- that's why it is so fast.
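puzzler's point about laziness can be checked directly at the REPL. The sketch below uses realized? to show that concat has done no work until something consumes its result, and then forces realization inside the timing loop with doall. The collection size and iteration count are arbitrary choices for this illustration, not the values used in the thread, so no timings are shown.

```clojure
(def coll (range 1000))                     ; size is arbitrary for this sketch

;; concat only builds a lazy seq; no element of coll is visited here.
(def appended (concat coll '(:a)))
(def realized-before? (realized? appended)) ; false

;; count has to walk every element, which forces the work.
(def n (count appended))                    ; 1001
(def realized-after? (realized? appended))  ; true

;; When benchmarking, force realization inside the loop, otherwise
;; the loop only measures the cost of creating the lazy seq.
(time (dotimes [_ 100] (doall (concat coll '(:a)))))
(time (dotimes [_ 100] (conj (vec coll) :a)))
```

count, as used in the follow-up measurements above, forces the sequence just as doall does.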
Parsing Clojure with instaparse: how to handle special forms?
I'm trying to use instaparse to parse Clojure code so that I can reformat it, but I'm having an issue with how to handle special forms. Should I attempt to parse special forms such as let and defn into their own rules, or should I rely instead on the actual content of the terminal to determine what lists should be treated as special forms?

For example, let's say I want to write a function which takes the parse tree returned by instaparse and arranges all the let bindings as recommended by the Clojure style guide (https://github.com/bbatsov/clojure-style-guide#source-code-layout--organization). There are two approaches I could take:

1) Build the recognition into the grammar itself:

    S = Form*
    Form = !SpecialForm List | ReaderMacro | Literal | Vector | Map | SpecialForm | !SpecialForm Symbol
    List = '(' Form* ')'
    ...
    SpecialForm = defn | let | try | JavaMemberAccess | JavaConstructor
    defn = '(' "defn" Symbol String? MapMetadata? VectorDestructuring Form* ')'
    Destructuring = VectorDestructuring | MapDestructuring
    VectorDestructuring = '[' (Symbol | Destructuring)* ('&' (Symbol | Destructuring))? ']'
    MapDestructuring = Map

2) Don't try to detect the let bindings in the grammar. Instead, search the resulting parse tree for lists with let content.

Which of these is a better approach? I sadly didn't take compilers in college so I'm kind of playing this by ear; I'm sure if I had I'd have a better idea of what the best practice is here.

Thanks!

(Full code for my project is at https://github.com/MoyTW/clojure-toys/tree/master/formatter if needed)

S = Form*
Form = List | ReaderMacro | Literal | Vector | Map | SpecialForm | Symbol
List = '(' Form* ')'
ReaderMacro = Quote | SyntaxQuote | Var | Dispatch | Comment | Metadata | QuotedInternal (*TODO - Slash*)
Quote = "'" Form
SyntaxQuote = '`' Form
Dispatch = '#' DispatchMacro
DispatchMacro = Set | Var | Regex | AnonFuncLit (*TODO - IgnoreForm*)
Set = '{' Form* '}'
Var = "'" Form
Regex = String
AnonFuncLit = '(' Form* ')'
Comment = ';' #'[^\n]*'
Metadata = SymbolMetadata | KeywordMetadata | StringMetadata | MapMetadata
SymbolMetadata = "^" Symbol Form
KeywordMetadata = "^" Keyword Form
StringMetadata = "^" String Form
MapMetadata = "^" Map Form
QuotedInternal = Unquote | UnquoteSplice | GenSym
Unquote = '~' Form (*TODO - This should ONLY be used INSIDE a quoted form!*)
UnquoteSplice = '~@' Form (*TODO - This should ONLY be used INSIDE a quoted form!*)
GenSym = Symbol '#' (*TODO - This should ONLY be used INSIDE a quoted form!*)
Symbol = Division | Custom
Division = '/'
Custom = #'[a-zA-Z\*\+\!\-\_\?\=%][a-zA-Z0-9\*\+\!\-\_\?\=\.%]*/?[a-zA-Z0-9\*\+\!\-\_\?\=\.%]*'
Literal = String | Number | Character | Boolean | Keyword | NilLiteral
String = '"' #'(\\\"|[^"])*' '"' (*Matches \\\" or any char not \"*)
Number = Integer | Float | Ratio (* TODO - add in support for hex/oct forms*)
Integer = #'[+-]?[0-9]+r?[0-9]*' (*The r is so you can do 8r52 - 8 radix 52*)
Float = #'[+-]?([0-9]*\.[0-9]+|[0-9]+\.[0-9]*)' | (*Decimal form*)
        #'[+-]?[0-9]+\.?[0-9]*e[+-]?[0-9]+' (*Exponent form*)
Ratio = #'[+-]?[0-9]+/[0-9]+'
Character = #'\\.' | '\\newline' | '\\space' | '\\tab' | '\\formfeed' | '\\backspace' | '\\return' (* TODO - add in support for unicode character representations!*)
Boolean = 'true' | 'false'
Keyword = #'::?[a-zA-Z0-9\*\+\!\-\_\?]*'
NilLiteral = 'nil'
Vector = '[' Form* ']'
Map = '{' (Form Form)* '}'

S = Form*
Destructuring = VectorDestructuring | MapDestructuring
VectorDestructuring = '[' (Symbol | Destructuring)* ('&' (Symbol | Destructuring))? ']'
MapDestructuring = Map
Form = !SpecialForm List | ReaderMacro | Literal | Vector | Map | SpecialForm | !SpecialForm Symbol
List = '(' Form* ')'
ReaderMacro = Quote | SyntaxQuote | Var | Dispatch | Comment |