Re: Parsing Clojure with instaparse: how to handle special forms?

2014-02-07 Thread Reid McKenzie
Okay. So there's one big thing you're doing wrong here, just from reading 
your grammars: you are complecting the data structures and valid _tokens_ 
which make up the Clojure language with the _meaning_ the language 
associates with them. If you drop things like destructuring from the 
grammar and instead provide parse rules only for the basic data structures 
(symbols, maps, keywords, sets and so forth), it's trivial to produce a 
grammar which can _parse_ valid Clojure code. _Reading_ Clojure code from 
such a parse tree is, and should be, an entirely separate concern, 
implemented as a pass over the generated parse structure.

- Reid
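
A minimal sketch of the kind of pass described above, not taken from anyone's 
actual code. It assumes the parser returns hiccup-style vectors (instaparse's 
default output format) and, for brevity, a simplified tree in which lists are 
[:List child ...] and symbols are [:Symbol "name"]; the grammars in this 
thread would also emit wrapper nodes such as :Form. The names special-heads, 
tag-special and annotate are hypothetical.

```clojure
(require '[clojure.walk :as walk])

;; Hypothetical set of head symbols the formatter wants to treat specially.
(def special-heads #{"let" "defn" "try"})

(defn tag-special
  "Retags [:List [:Symbol head] ...] as [:SpecialForm ...] when head is in
  special-heads. Any other value is returned unchanged."
  [node]
  (if (and (vector? node)
           (= :List (first node))
           (let [head (second node)]
             (and (vector? head)
                  (= :Symbol (first head))
                  (contains? special-heads (second head)))))
    (assoc node 0 :SpecialForm)
    node))

(defn annotate
  "Separate pass over an already-parsed tree; the grammar itself stays generic."
  [tree]
  (walk/postwalk tag-special tree))

;; Assumed tree for: (defn f [x] (let [y 1] (+ x y)))
(def example-tree
  [:S
   [:List [:Symbol "defn"] [:Symbol "f"] [:Vector [:Symbol "x"]]
    [:List [:Symbol "let"] [:Vector [:Symbol "y"] [:Number "1"]]
     [:List [:Symbol "+"] [:Symbol "x"] [:Symbol "y"]]]]])

(annotate example-tree)
;; The defn and let lists come back tagged :SpecialForm; (+ x y) stays :List.
```

With this split, supporting another special form means adding a string to the 
set rather than changing the grammar.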

On Thursday, February 6, 2014 9:28:08 PM UTC-6, Travis Moy wrote:

 I'm trying to use instaparse to parse Clojure code so that I can reformat 
 it, but I'm having an issue with how to handle special forms. Should I 
 attempt to parse special forms such as let and defn into their own rules, 
 or should I rely instead on the actual content of the terminal to determine 
 what lists should be treated as special forms?

 For example, let's say I want to write a function which takes the parse 
 tree returned by instaparse and arranges all the let bindings as 
 recommended by the Clojure style guide (
 https://github.com/bbatsov/clojure-style-guide#source-code-layout--organization).
  
 There are two approaches I could take:

 1) Build the recognition into the grammar itself:

 S = Form*

 Form = !SpecialForm List | ReaderMacro | Literal | Vector | Map | 
  SpecialForm | !SpecialForm Symbol
 
 List = '(' Form* ')'

...

 SpecialForm = defn | let | try | JavaMemberAccess | JavaConstructor
 defn = '(' "defn" Symbol String? MapMetadata? VectorDestructuring 
 Form* ')'

 Destructuring = VectorDestructuring | MapDestructuring
 VectorDestructuring = '[' (Symbol | Destructuring)* ('&' (Symbol | 
 Destructuring))? ']'
 MapDestructuring = Map


 2) Don't try to detect the let bindings in the grammar. Instead, search 
 the resulting parse tree for lists with "let" content.

 Which of these is a better approach? I sadly didn't take compilers in 
 college so I'm kind of playing this by ear; I'm sure if I had I'd have a 
 better idea of what the best practice is here.

 Thanks!

 (Full code for my project is at 
 https://github.com/MoyTW/clojure-toys/tree/master/formatter if needed)



-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Parsing Clojure with instaparse: how to handle special forms?

2014-02-07 Thread Travis Moy
That answers my question pretty well, thanks.

On Thursday, February 6, 2014 11:20:42 PM UTC-8, Reid McKenzie wrote:

 Okay. So there's one big thing you're doing wrong here, just from reading 
 your grammars: you are complecting the data structures and valid _tokens_ 
 which make up the Clojure language with the _meaning_ the language 
 associates with them. If you drop things like destructuring from the 
 grammar and instead provide parse rules only for the basic data structures 
 (symbols, maps, keywords, sets and so forth), it's trivial to produce a 
 grammar which can _parse_ valid Clojure code. _Reading_ Clojure code from 
 such a parse tree is, and should be, an entirely separate concern, 
 implemented as a pass over the generated parse structure.

 - Reid



Parsing Clojure with instaparse: how to handle special forms?

2014-02-06 Thread Travis Moy
I'm trying to use instaparse to parse Clojure code so that I can reformat 
it, but I'm having an issue with how to handle special forms. Should I 
attempt to parse special forms such as let and defn into their own rules, 
or should I rely instead on the actual content of the terminal to determine 
what lists should be treated as special forms?

For example, let's say I want to write a function which takes the parse 
tree returned by instaparse and arranges all the let bindings as 
recommended by the Clojure style guide 
(https://github.com/bbatsov/clojure-style-guide#source-code-layout--organization).
 
There are two approaches I could take:

1) Build the recognition into the grammar itself:

S = Form*

 Form = !SpecialForm List | ReaderMacro | Literal | Vector | Map | 
  SpecialForm | !SpecialForm Symbol
 
 List = '(' Form* ')'

...

 SpecialForm = defn | let | try | JavaMemberAccess | JavaConstructor
 defn = '(' "defn" Symbol String? MapMetadata? VectorDestructuring 
 Form* ')'

 Destructuring = VectorDestructuring | MapDestructuring
 VectorDestructuring = '[' (Symbol | Destructuring)* ('&' (Symbol | 
 Destructuring))? ']'
 MapDestructuring = Map


2) Don't try to detect the let bindings in the grammar. Instead, search the 
resulting parse tree for lists with "let" content.
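
Approach 2 might look roughly like the sketch below. It is illustrative only 
and assumes a simplified hiccup-style tree (instaparse's default output 
format) in which lists are [:List child ...], vectors are [:Vector child ...] 
and symbols are [:Symbol "name"]; a tree produced by the grammar above would 
carry extra wrapper nodes. The function names are hypothetical.

```clojure
(defn let-node?
  "True when node is a :List whose first child is the symbol let."
  [node]
  (and (vector? node)
       (= :List (first node))
       (= [:Symbol "let"] (second node))))

(defn let-bindings
  "Returns the binding pairs of a let node as [[name-node value-node] ...],
  or nil when the node has no binding vector."
  [let-node]
  (let [[_ _ [tag & forms]] let-node]
    (when (= :Vector tag)
      (mapv vec (partition 2 forms)))))

(defn all-let-bindings
  "Searches the whole tree, nested forms included, for let nodes."
  [tree]
  (->> (tree-seq vector? rest tree)
       (filter let-node?)
       (map let-bindings)))

;; Assumed tree for: (let [a 1 bb 2] (+ a bb))
(def example-let
  [:List [:Symbol "let"]
   [:Vector [:Symbol "a"] [:Number "1"] [:Symbol "bb"] [:Number "2"]]
   [:List [:Symbol "+"] [:Symbol "a"] [:Symbol "bb"]]])

(all-let-bindings example-let)
;; => ([[[:Symbol "a"] [:Number "1"]] [[:Symbol "bb"] [:Number "2"]]])
```

A formatter could then lay out each pair on its own line without the grammar 
needing to know anything about let.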

Which of these is a better approach? I sadly didn't take compilers in 
college so I'm kind of playing this by ear; I'm sure if I had I'd have a 
better idea of what the best practice is here.

Thanks!

(Full code for my project is at 
https://github.com/MoyTW/clojure-toys/tree/master/formatter if needed)

S = Form*

Form = List | ReaderMacro | Literal | Vector | Map | 
 SpecialForm | Symbol

List = '(' Form* ')'

ReaderMacro = Quote | SyntaxQuote | Var | Dispatch | Comment | Metadata | 
QuotedInternal (*TODO - Slash*)
Quote = "'" Form
SyntaxQuote = '`' Form
Dispatch = '#' DispatchMacro
DispatchMacro = Set | Var | Regex | AnonFuncLit (*TODO - 
IgnoreForm*)
Set = '{' Form* '}'
Var = "'" Form
Regex = String
AnonFuncLit = '(' Form* ')'
Comment = ';' #'[^\n]*'
Metadata = SymbolMetadata | KeywordMetadata | StringMetadata | 
MapMetadata
SymbolMetadata = "^" Symbol Form
KeywordMetadata = "^" Keyword Form
StringMetadata = "^" String Form
MapMetadata = "^" Map Form
QuotedInternal = Unquote | UnquoteSplice | GenSym
Unquote = '~' Form (*TODO - This should ONLY be used INSIDE a 
quoted form!*)
UnquoteSplice = '~@' Form (*TODO - This should ONLY be used INSIDE 
a quoted form!*)
GenSym = Symbol '#' (*TODO - This should ONLY be used INSIDE a 
quoted form!*)

Symbol = Division | Custom
Division = '/'
Custom = 
#'[a-zA-Z\*\+\!\-\_\?\=%][a-zA-Z0-9\*\+\!\-\_\?\=\.%]*/?[a-zA-Z0-9\*\+\!\-\_\?\=\.%]*'

Literal = String | Number | Character | Boolean | Keyword | NilLiteral
String = '"' #'(\\\"|[^"])*' '"' (*Matches \\\" or any char not \"*)
Number = Integer | Float | Ratio (* TODO - add in support for hex/oct 
forms*)
Integer = #'[+-]?[0-9]+r?[0-9]*' (*The r is so you can do 8r52 - 8 
radix 52*)
Float = #'[+-]?([0-9]*\.[0-9]+|[0-9]+\.[0-9]*)' | (*Decimal form*)
#'[+-]?[0-9]+\.?[0-9]*e[+-]?[0-9]+' (*Exponent form*)
Ratio = #'[+-]?[0-9]+/[0-9]+'
Character = #'\\.' | '\\newline' | '\\space' | '\\tab' | '\\formfeed' |
'\\backspace' | '\\return'
(* TODO - add in support for unicode character 
representations!*)
Boolean = 'true' | 'false'
Keyword = #'::?[a-zA-Z0-9\*\+\!\-\_\?]*'
NilLiteral = 'nil'

Vector = '[' Form* ']'
Map = '{' (Form Form)* '}'

S = Form*

Destructuring = VectorDestructuring | MapDestructuring
VectorDestructuring = '[' (Symbol | Destructuring)* ('&' (Symbol | 
Destructuring))? ']'
MapDestructuring = Map

Form = !SpecialForm List | ReaderMacro | Literal | Vector | Map | 
 SpecialForm | !SpecialForm Symbol

List = '(' Form* ')'

ReaderMacro = Quote | SyntaxQuote | Var | Dispatch | Comment |