Re: Parsing Clojure with instaparse: how to handle special forms?

2014-02-07 Thread Travis Moy
That answers my question pretty well, thanks.

On Thursday, February 6, 2014 11:20:42 PM UTC-8, Reid McKenzie wrote:

 Okay. So there's one big thing you're doing wrong here just from reading 
 your grammars: you are complecting the datastructures and valid _tokens_ 
 which make up the Clojure language with the _meaning_ associated therewith 
 by the language. If you discard such things as destructuring as part of 
 the grammar and instead just provide the parse grammars for basic 
 datastructures like symbols, maps, keywords, sets and so forth, it's trivial 
 to produce a grammar which can _parse_ valid Clojure code. _Reading_ 
 Clojure code from such a parse tree is and should be an entirely separate 
 concern, implemented as a pass over the generated parse structure.
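
A minimal sketch of such a pass, assuming the parser returns hiccup-style vectors (instaparse's default output format). The tree below is hand-written for illustration, and the node tags (:List, :Symbol, :Vector, :Number) and the helper names special-form? and find-forms are hypothetical, not taken from the project:

```clojure
;; Hypothetical hiccup-style parse tree for (let [x 1] (inc x)).
;; The tags depend on the rule names in the grammar; these are assumptions.
(def tree
  [:S
   [:List "("
    [:Symbol "let"]
    [:Vector "[" [:Symbol "x"] [:Number "1"] "]"]
    [:List "(" [:Symbol "inc"] [:Symbol "x"] ")"]
    ")"]])

(defn special-form?
  "True when node is a :List whose first child node is the symbol named sym."
  [node sym]
  (and (vector? node)
       (= :List (first node))
       (= [:Symbol sym] (first (filter vector? (rest node))))))

(defn find-forms
  "Returns every :List node in tree whose head symbol is named sym."
  [tree sym]
  ;; tree-seq walks all nodes; rest skips each node's tag keyword.
  (filter #(special-form? % sym)
          (tree-seq vector? rest tree)))

(count (find-forms tree "let"))   ; => 1
(count (find-forms tree "defn"))  ; => 0
```

The grammar only has to recognise lists and symbols; deciding that a list is a let form happens afterwards, by inspecting its first symbol.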

 - Reid

 On Thursday, February 6, 2014 9:28:08 PM UTC-6, Travis Moy wrote:

 I'm trying to use instaparse to parse Clojure code so that I can reformat 
 it, but I'm having an issue with how to handle special forms. Should I 
 attempt to parse special forms such as let and defn into their own rules, 
 or should I rely instead on the actual content of the terminal to determine 
 what lists should be treated as special forms?

 For example, let's say I want to write a function which takes the parse 
 tree returned by instaparse and arranges all the let bindings as 
 recommended by the Clojure style guide (
 https://github.com/bbatsov/clojure-style-guide#source-code-layout--organization).
  
 There are two approaches I could take:

 1) Build the recognition into the grammar itself:

 S = Form*

 Form = !SpecialForm List | ReaderMacro | Literal | Vector | Map | 
  SpecialForm | !SpecialForm Symbol
 
 List = '(' Form* ')'

...

 SpecialForm = defn | let | try | JavaMemberAccess | JavaConstructor
 defn = '(' "defn" Symbol String? MapMetadata? 
 VectorDestructuring Form* ')'

 Destructuring = VectorDestructuring | MapDestructuring
 VectorDestructuring = '[' (Symbol | Destructuring)* ('&' (Symbol | 
 Destructuring))? ']'
 MapDestructuring = Map


 2) Don't try to detect the let bindings in the grammar. Instead, search 
 the resulting parse tree for lists with let content.

 Which of these is a better approach? I sadly didn't take compilers in 
 college so I'm kind of playing this by ear; I'm sure if I had I'd have a 
 better idea of what the best practice is here.

 Thanks!

 (Full code for my project is at 
 https://github.com/MoyTW/clojure-toys/tree/master/formatter if needed)



-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: unconditional append to end

2014-02-07 Thread Travis Moy
You should use a vector, but it's also possible to use concat. For example, 
(concat '(1 2 3) [4]) will give you (1 2 3 4).
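
As a rough sketch, the concat approach can be wrapped in a small helper. The name append is hypothetical (it is not a clojure.core function), and the result is always a lazy seq, whatever the type of the input:

```clojure
(defn append
  "Returns a lazy seq of the items in coll followed by x."
  [coll x]
  (concat coll [x]))

(append '(1 2 3) 4)            ; => (1 2 3 4)
(append [1 2 3] 4)             ; => (1 2 3 4), a seq rather than a vector
(append (map inc [0 1 2]) 4)   ; => (1 2 3 4)
```

Because the result is a seq, a later conj on it adds at the front again; wrap it in vec if vector behaviour is needed afterwards.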

This made me curious as to the best way to get a collection into a vector, so 
I played around with it some:

user=> (def r 10)
#'user/r
user=> (def coll (range 1))
#'user/coll
user=> (def coll-v (into [] coll))
#'user/coll-v
user=> (time (dotimes [_ r] (conj (into [] coll) :a)))
"Elapsed time: 14074.018464 msecs"
nil
user=> (time (dotimes [_ r] (conj (apply vector coll) :a)))
"Elapsed time: 22565.594515 msecs"
nil
user=> (time (dotimes [_ r] (conj (vec coll) :a)))
"Elapsed time: 22424.174719 msecs"
nil
user=> (time (dotimes [_ r] (concat coll '(:a))))
"Elapsed time: 5.366059 msecs"
nil
user=> (time (dotimes [_ r] (concat coll-v '(:a))))
"Elapsed time: 5.56465 msecs"
nil
user=> (time (dotimes [_ r] (conj coll-v :a)))
"Elapsed time: 10.65771 msecs"
nil
user=> (time (dotimes [_ r] (concat coll coll)))
"Elapsed time: 6.048041 msecs"
nil
user=> (time (dotimes [_ r] (apply conj coll-v coll-v)))
"Elapsed time: 72414.847105 msecs"
nil


Surprisingly it looks like (concat coll '(:a)) is faster than (conj coll-v 
:a). That's not really what I would expect; does anybody have a good 
explanation for this? Did I just bork the test somehow, or - I mean, 
obviously concat's pretty fast but I was expecting conj to be on the level. 
In fact, if you convert and then conj it's significantly slower than using 
concat.

...not that it'd really matter, in basically all cases, since (into [] ...) 
is definitely still in the fast enough category. Still, if you're 
building a sequence, what's the reasoning against using (concat coll ...) 
instead of (conj (into [] ...) ...)? Is it a matter of elegance, or is 
there a specific practical reason?

On Friday, February 7, 2014 8:06:20 PM UTC-8, Armando Blancas wrote:

 For efficient appends at the end you need a vector. Using the sequence 
 library can be tricky while you're putting together your data structures 
 because it's likely that you're not done yet with type-specific functions. 
 You'll need to re-create your vector after using map/filter/etc to be able 
 to keep adding at the end. 
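
A short sketch of what re-creating the vector looks like in practice, using only clojure.core functions (mapv and filterv require Clojure 1.4 or later):

```clojure
(def v [1 2 3 4])

;; map and filter return lazy seqs, so conj adds at the front.
(conj (map inc v) :a)              ; => (:a 2 3 4 5)

;; mapv and filterv return vectors, so conj still adds at the end.
(conj (mapv inc v) :a)             ; => [2 3 4 5 :a]
(conj (filterv even? v) :a)        ; => [2 4 :a]

;; vec (or into []) re-creates a vector from any seq.
(conj (vec (remove even? v)) :a)   ; => [1 3 :a]
```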

 On Friday, February 7, 2014 4:20:09 PM UTC-8, t x wrote:

 Consider the following: 

 (cons 1 '(2 3 4)) == (1 2 3 4) 
 (cons 1 [2 3 4])  == (1 2 3 4) 

 (conj '(a b c) 1) == (1 a b c) 
 (conj '[a b c] 1) == [a b c 1] 


  

 Now, I would like something that _always_ 
   * appends to the end 

 cons is almost what I want, except it always appends to front. 

 conj is not what I want -- in fact, I'm afraid of conj. Often times, 
 I'll run map/filter on something, and suddenly, instead of a vector, I 
 now have a list -- and conj changes the order of the item added. 

 Thus, my question: is there a builtin to _unconditionally_ append to 
 the end of a list/sequence/vector? 

 Thanks! 





Re: unconditional append to end

2014-02-07 Thread Travis Moy
Ah! That makes more sense. Yeah, after I forced it to realize the sequence, 
it turned out that concat was a lot slower than sticking it into a vector:

#'user/r
user=> (def coll (range 1))
#'user/coll
user=> (def coll-v (into [] coll))
#'user/coll-v
user=> (time (dotimes [_ r] (count (concat coll '(:a)))))
"Elapsed time: 55803.147526 msecs"
nil
user=> (time (dotimes [_ r] (count (conj coll-v :a))))
"Elapsed time: 18.591737 msecs"
nil
user=> (time (dotimes [_ r] (count (conj (into [] coll) :a))))
"Elapsed time: 16224.79319 msecs"
nil


On Friday, February 7, 2014 9:26:05 PM UTC-8, puzzler wrote:

 On Fri, Feb 7, 2014 at 9:08 PM, Travis Moy <moyt...@gmail.com> wrote:

 Surprisingly it looks like (concat coll '(:a)) is faster than (conj 
 coll-v :a). That's not really what I would expect; does anybody have a good 
 explanation for this? Did I just bork the test somehow, or - I mean, 
 obviously concat's pretty fast but I was expecting conj to be on the level. 
 In fact, if you convert and then conj it's significantly slower than using 
 concat.


 concat is lazy, so it's not really doing any work until you try to realize 
 the sequence -- that's why it is so fast. 
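
The laziness can be observed directly with realized?. This is only an illustrative sketch; the collection size is arbitrary and not taken from the benchmark above:

```clojure
(def coll (range 1000))

;; concat only builds a lazy seq; no elements are walked yet.
(def lazy-result (concat coll '(:a)))
(realized? lazy-result)   ; => false

;; count forces the whole sequence, which is where the work happens.
(count lazy-result)       ; => 1001
(realized? lazy-result)   ; => true

;; To time the real cost of concat, force it inside the timed expression:
(time (dorun (concat coll '(:a))))
```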




Parsing Clojure with instaparse: how to handle special forms?

2014-02-06 Thread Travis Moy
I'm trying to use instaparse to parse Clojure code so that I can reformat 
it, but I'm having an issue with how to handle special forms. Should I 
attempt to parse special forms such as let and defn into their own rules, 
or should I rely instead on the actual content of the terminal to determine 
what lists should be treated as special forms?

For example, let's say I want to write a function which takes the parse 
tree returned by instaparse and arranges all the let bindings as 
recommended by the Clojure style guide 
(https://github.com/bbatsov/clojure-style-guide#source-code-layout--organization).
 
There are two approaches I could take:

1) Build the recognition into the grammar itself:

S = Form*

 Form = !SpecialForm List | ReaderMacro | Literal | Vector | Map | 
  SpecialForm | !SpecialForm Symbol
 
 List = '(' Form* ')'

...

 SpecialForm = defn | let | try | JavaMemberAccess | JavaConstructor
 defn = '(' "defn" Symbol String? MapMetadata? VectorDestructuring 
 Form* ')'

 Destructuring = VectorDestructuring | MapDestructuring
 VectorDestructuring = '[' (Symbol | Destructuring)* ('&' (Symbol | 
 Destructuring))? ']'
 MapDestructuring = Map


2) Don't try to detect the let bindings in the grammar. Instead, search the 
resulting parse tree for lists with let content.

Which of these is a better approach? I sadly didn't take compilers in 
college so I'm kind of playing this by ear; I'm sure if I had I'd have a 
better idea of what the best practice is here.

Thanks!

(Full code for my project is at 
https://github.com/MoyTW/clojure-toys/tree/master/formatter if needed)

S = Form*

Form = List | ReaderMacro | Literal | Vector | Map | 
 SpecialForm | Symbol

List = '(' Form* ')'

ReaderMacro = Quote | SyntaxQuote | Var | Dispatch | Comment | Metadata | 
QuotedInternal (*TODO - Slash*)
Quote = "'" Form
SyntaxQuote = '`' Form
Dispatch = '#' DispatchMacro
DispatchMacro = Set | Var | Regex | AnonFuncLit (*TODO - 
IgnoreForm*)
Set = '{' Form* '}'
Var = "'" Form
Regex = String
AnonFuncLit = '(' Form* ')'
Comment = ';' #'[^\n]*'
Metadata = SymbolMetadata | KeywordMetadata | StringMetadata | 
MapMetadata
SymbolMetadata = "^" Symbol Form
KeywordMetadata = "^" Keyword Form
StringMetadata = "^" String Form
MapMetadata = "^" Map Form
QuotedInternal = Unquote | UnquoteSplice | GenSym
Unquote = '~' Form (*TODO - This should ONLY be used INSIDE a 
quoted form!*)
UnquoteSplice = '~@' Form (*TODO - This should ONLY be used INSIDE 
a quoted form!*)
GenSym = Symbol '#' (*TODO - This should ONLY be used INSIDE a 
quoted form!*)

Symbol = Division | Custom
Division = '/'
Custom = 
#'[a-zA-Z\*\+\!\-\_\?\=%][a-zA-Z0-9\*\+\!\-\_\?\=\.%]*/?[a-zA-Z0-9\*\+\!\-\_\?\=\.%]*'

Literal = String | Number | Character | Boolean | Keyword | NilLiteral
String = '"' #'(\\\"|[^"])*' '"' (*Matches \\\" or any char not \"*)
Number = Integer | Float | Ratio (* TODO - add in support for hex/oct 
forms*)
Integer = #'[+-]?[0-9]+r?[0-9]*' (*The r is so you can do 8r52 - 8 
radix 52*)
Float = #'[+-]?([0-9]*\.[0-9]+|[0-9]+\.[0-9]*)' | (*Decimal form*)
#'[+-]?[0-9]+\.?[0-9]*e[+-]?[0-9]+' (*Exponent form*)
Ratio = #'[+-]?[0-9]+/[0-9]+'
Character = #'\\.' | '\\newline' | '\\space' | '\\tab' | '\\formfeed' |
'\\backspace' | '\\return'
(* TODO - add in support for unicode character 
representations!*)
Boolean = 'true' | 'false'
Keyword = #'::?[a-zA-Z0-9\*\+\!\-\_\?]*'
NilLiteral = 'nil'

Vector = '[' Form* ']'
Map = '{' (Form Form)* '}'

S = Form*

Destructuring = VectorDestructuring | MapDestructuring
VectorDestructuring = '[' (Symbol | Destructuring)* ('&' (Symbol | 
Destructuring))? ']'
MapDestructuring = Map

Form = !SpecialForm List | ReaderMacro | Literal | Vector | Map | 
 SpecialForm | !SpecialForm Symbol

List = '(' Form* ')'

ReaderMacro = Quote | SyntaxQuote | Var | Dispatch | Comment |