Re: Java 8 Parser

2017-05-01 Thread Durand Jean-Damien

Hello,

With Marpa::R2 it is possible to do exclusions at the lexeme level using 
user-defined character classes.

Such an implementation was used in ECMAScript, as Jeffrey indeed mentioned; 
cf. 
https://github.com/jddurand/MarpaX-Languages-ECMAScript-AST/blob/master/lib/MarpaX/Languages/ECMAScript/AST/Grammar/CharacterClasses.pm

(which I admit is a bit hard to understand stand-alone without the grammar 
itself, but these are the lexeme implementations with... exclusions).
For example:

sub IsSourceCharacterButNotStarOrLineTerminator { return <<END;
+MarpaX::Languages::ECMAScript::AST::Grammar::CharacterClasses::IsSourceCharacter
-MarpaX::Languages::ECMAScript::AST::Grammar::CharacterClasses::IsStar
-MarpaX::Languages::ECMAScript::AST::Grammar::CharacterClasses::IsLineTerminator
END
}
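
The +/- lines above are Perl's user-defined character property syntax, so the exclusion can be tried out in plain Perl without Marpa. What follows is only a minimal sketch under stated assumptions: the property names are hypothetical and live in package main, IsAnyChar is a stand-in for IsSourceCharacter, and the line-terminator set is the ECMAScript one (LF, CR, LS, PS).

```perl
use strict;
use warnings;

# Hypothetical properties for illustration; names must start with "Is" or "In".
sub IsAnyChar        { return "0000\t10FFFF\n" }           # stand-in for IsSourceCharacter
sub IsStar           { return "002A\n" }                   # '*'
sub IsLineTerminator { return "000A\n000D\n2028\t2029\n" } # LF, CR, LS, PS

# Same shape as the sub quoted above: start from one class, subtract two others.
sub IsAnyButStarOrLineTerminator { return <<END;
+main::IsAnyChar
-main::IsStar
-main::IsLineTerminator
END
}

for my $char ('a', '*', "\n", '/') {
    my $verdict = $char =~ /\A\p{IsAnyButStarOrLineTerminator}\z/
        ? 'accepted'
        : 'excluded';
    printf "U+%04X %s\n", ord($char), $verdict;
}
```

Here 'a' and '/' are accepted while '*' and the line feed are excluded. In the linked module the same kind of property is what the lexeme rules refer to.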

Regards, Jean-Damien.

On Thursday, 13 October 2016 at 17:00:29 UTC+2, Harry wrote:
>
> Hello,
>
> I'm very new to Marpa but, from its description, it looks extremely 
> awesome. 
>
> I'm also done playing with the beginner's example of the expression 
> calculator; was also able to make small changes to it. So far, so good.
>
> However, now, I'm trying to write a Java 8 Parser using the grammar 
> published here:
> https://docs.oracle.com/javase/specs/jls/se8/html/jls-19.html
>
> While I think I'm able to map the above Oracle grammar spec to the G1 
> rules (if I stub out some of the lexemes referenced by the G1 rules) and 
> create an instance of Marpa::R2::Scanless::G, I'm having a hard time 
> writing the L0 lexer rules in SLIF for the Lexer grammar 
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html>.  Some 
> issues that I will need to (but don't know how to) deal with are:
>
> 1. Keyword vs Identifier: 
>
>   The Java spec defines Identifier 
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.8> 
> thus:
> Identifier:
> IdentifierChars 
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-IdentifierChars>
>  but not a Keyword 
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-Keyword>
>  or BooleanLiteral 
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-BooleanLiteral>
>  or NullLiteral 
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-NullLiteral>
> IdentifierChars:
> JavaLetter 
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-JavaLetter>
>  {JavaLetterOrDigit 
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-JavaLetterOrDigit>
> }
> JavaLetter:
> any Unicode character that is a "Java letter"
> JavaLetterOrDigit:
> any Unicode character that is a "Java letter-or-digit"
>   *So, how do I do* the "not a Keyword or BooleanLiteral or NullLiteral" 
> part? In Perl regex, one could do a negative lookahead assertion like so...
> 
> if (m/ (?! $Keyword | $BooleanLiteral | $NullLiteral ) $IdentifierChars /x) {
>     # this is an Identifier
> }
>
>
> ... but only if Marpa allowed such a rich, Perl regex syntax. Which it 
> doesn't, apparently, in SLIF.
>
> 2. Comment (single- and multi-line versions)
> I could write a bunch of G1 rules to handle the multi-line Java comment, 
> but I'm seeing it becoming very verbose. Is there an easier way to handle 
> stuff like this in SLIF?
>  
> 3. Since Marpa is Perl-based, is it possible to tap the full power of Perl 
> regex engine, especially for lexing? 
>
> 4. Notice that Java 8 spec for recognizing tokens is in the form of a Lexer 
> grammar <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html>... 
> that is written in BNF style instead of a 'flat', regex style. If I were to 
> mechanically replicate the Lexer grammar using G1 rules (instead of L0 
> rules), would it entail a performance and space overhead by creating 
> unnecessary tree nodes for what would otherwise be a flat lexeme in 
> bison/flex?
>
> 5. Would Marpa experts recommend using SLIF (internal scanner) for Java 8, 
> or should I abandon it in favor of a custom / external lexer?
>
>
> Regards,
> /Harry
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to marpa-parser+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Java 8 Parser

2016-10-17 Thread Ruslan Shvedov

Great, thanks.

BTW, s/How to I/How do I/ on both.

I'd file a PR, but couldn't find those entries at
https://github.com/ronsavage/marpa.faq -- missing something perhaps.

On Tue, Oct 18, 2016 at 3:12 AM, Ron Savage  wrote:

> Using this material, I've added 2 new questions to the FAQ:
> http://savage.net.au/Perl-modules/html/marpa.faq/faq.html. Nos 144 and
> 145.
>

Re: Java 8 Parser

2016-10-17 Thread Ron Savage

Using this material, I've added 2 new questions to the 
FAQ: http://savage.net.au/Perl-modules/html/marpa.faq/faq.html. Nos 144 and 
145.

Re: Java 8 Parser

2016-10-17 Thread Ruslan Shvedov

On Thu, Oct 13, 2016 at 6:00 PM, Harry  wrote:

>
> 1. Keyword vs Identifier:
>
http://stackoverflow.com/questions/27109840/marpa-can-i-explicitly-disallow-keywords-as-identifiers

https://gist.github.com/rns/d19b40ffc5523659dec9 -- events can be used to
analyze the string using Perl regexes and read the context-defined lexeme.
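
Once the candidate text is in Perl's hands, the "but not a Keyword or BooleanLiteral or NullLiteral" rule can be a hash lookup rather than a lookahead. This is a rough sketch in plain Perl, independent of Marpa and of the gist above. The names %RESERVED and classify_word are hypothetical, the reserved-word list is abbreviated, and the character classes are an ASCII-only approximation of "Java letter" and "Java letter-or-digit".

```perl
use strict;
use warnings;

# Abbreviated, illustrative list; the JLS defines the full set of keywords.
my %RESERVED = map { $_ => 1 } qw(class if else while return true false null);

# ASCII-only approximation of JavaLetter {JavaLetterOrDigit}.
my $IDENTIFIER_CHARS = qr/[A-Za-z_\$][A-Za-z0-9_\$]*/;

# Hypothetical helper: classifies a whole word.
sub classify_word {
    my ($word) = @_;
    return 'NotAWord' unless $word =~ /\A$IDENTIFIER_CHARS\z/;
    return $RESERVED{$word} ? 'Reserved' : 'Identifier';
}

for my $word (qw(if iffy null nullable 9lives)) {
    printf "%-8s %s\n", $word, classify_word($word);
}
```

Matching the whole word first and then consulting the hash keeps "iffy" and "nullable" as identifiers, which a bare prefix lookahead on "if" or "null" would wrongly reject.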


> 2. Comment (single- and multi-line versions)
>
perhaps you can use this https://gist.github.com/jeffreykegler/5015057 by
Jeffrey

Hope this helps.

Re: Java 8 Parser

2016-10-15 Thread Paul Bennett

On Oct 15, 2016 13:01, "Jeffrey Kegler" 
wrote:
>
> Re #4, why not implement Perl regexes?  A full syntax of Perl regexes is
> gruesomely complex, and much of it is symptoms rather than features.

Somewhere deep within perldoc there's a howto on making your own \p{} named
properties, which AFAICT are acceptable to Marpa's regex engine. IIRC, I
once had some progress that way.

--
P/PW/PWBENNETT

Re: Java 8 Parser

2016-10-15 Thread Jeffrey Kegler

And here's another tutorial with a short example of external lexing:
http://jeffreykegler.github.io/Ocean-of-Awareness-blog/individual/2013/06/mixing-procedural.html

On Sat, Oct 15, 2016 at 4:12 AM, Harry  wrote:

> Thanks, Jeffrey, for your responses.
>
> I've gone through the documentation of Marpa::R2::Scanless::R but
> it's not becoming fully clear how to 'connect' the external lexing routine
> of mine to Marpa's built-in G1 parser. I've looked at some random Marpa
> code on the Net but that code is looking way too complicated as far as
> illustrating just the "external lexing" part goes.
>
> My expectation was (and, is) that of a bison/flex type of interface where
> yyparse() calls yylex() to get the next token following which things
> automatically work. With Marpa, it seems, you have to do (much?) more than
> that (sorry, if I'm being inaccurate here).
>
> What I already understand: When doing external lexing, I assume I'll have
> to create the equivalent of yylex() myself - in my case this function will,
> e.g., be making heavy use of Perl Regex's.
>
> However, how would I pass the string value returned by my yylex() to Marpa
> parser?
>
> Could someone please share a simple, "hello world" type of example, or if
> not that, at least some pseudocode?
>
>
> On Thursday, October 13, 2016 at 10:03:41 PM UTC+5:30, Jeffrey Kegler
> wrote:
>>
>> 4.) Not sure this answers your question, but L0 rules allow full Marpa
>> syntax.
>>
>
> May I ask, why the full Perl Regex syntax is not supported in L0 rules? If
> it were supported, I could've read a Java multi-line comment simply with
> just this one-liner:
> MultilineComment ~ ('/*' .*? '*/')   # parentheses being optional
>
>
> Regards,
> /Harry
>
>

Re: Java 8 Parser

2016-10-15 Thread Jeffrey Kegler

Re #4, why not implement Perl regexes?  The full syntax of Perl regexes is
gruesomely complex, and much of it is symptoms rather than features.

But some of the features *are* useful, including eager matching, which is
what your example depends on.  The obstacle is that Perl regexes are
deterministic, while Marpa is non-deterministic.  Determinism imposes
severe limits on regexes -- they're committed to the limits and
inefficiencies of a deterministic approach.  *But* there is a partially
compensating advantage -- deterministic thinking can be easier,
particularly if you get used to its limits.  So, if you are proceeding
deterministically, an instruction to "accept the shortest match" is easy to
implement.
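
To illustrate the "accept the shortest match" behaviour on the Perl side (this is an ordinary Perl regex, not SLIF syntax), here is a small sketch. The sample input is made up for the example.

```perl
use strict;
use warnings;

# Made-up sample input with two comments, one of them spanning two lines.
my $source = "int a; /* first * comment */ int b; /* second\nline */";

# .*? stops at the first following '*/'; the /s flag lets '.' cross newlines.
my @comments = $source =~ m{(/\*.*?\*/)}gs;

print scalar(@comments), " comments\n";
print "[$_]\n" for @comments;
```

With a greedy `.*` instead of `.*?`, the single match would run from the first `/*` to the last `*/` and swallow `int b;` along the way.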

I hope to add eager matching to Marpa::R3.  For now, in Marpa::R2, you have
to re-express the idea in BNF, even in cases where the deterministic
approach is easier and more natural.  Sorry.

Hopefully, the tutorial in my previous answer also shows you how to switch
to lexing in Perl, so you can have the best of both worlds.

Hope this helps! -- jeffrey

On Sat, Oct 15, 2016 at 4:12 AM, Harry  wrote:

> Thanks, Jeffrey, for your responses.
>
> I've gone through the documentation of Marpa::R2::Scanless::R but
> it's not becoming fully clear how to 'connect' the external lexing routine
> of mine to Marpa's built-in G1 parser. I've looked at some random Marpa
> code on the Net but that code is looking way too complicated as far as
> illustrating just the "external lexing" part goes.
>
> My expectation was (and, is) that of a bison/flex type of interface where
> yyparse() calls yylex() to get the next token following which things
> automatically work. With Marpa, it seems, you have to do (much?) more than
> that (sorry, if I'm being inaccurate here).
>
> What I already understand: When doing external lexing, I assume I'll have
> to create the equivalent of yylex() myself - in my case this function will,
> e.g., be making heavy use of Perl Regex's.
>
> However, how would I pass the string value returned by my yylex() to Marpa
> parser?
>
> Could someone please share a simple, "hello world" type of example, or if
> not that, at least some pseudocode?
>
>
> On Thursday, October 13, 2016 at 10:03:41 PM UTC+5:30, Jeffrey Kegler
> wrote:
>>
>> 4.) Not sure this answers your question, but L0 rules allow full Marpa
>> syntax.
>>
>
> May I ask, why the full Perl Regex syntax is not supported in L0 rules? If
> it were supported, I could've read a Java multi-line comment simply with
> just this one-liner:
> MultilineComment ~ ('/*' .*? '*/')   # parentheses being optional
>
>
> Regards,
> /Harry
>
>

Re: Java 8 Parser

2016-10-15 Thread Harry

Thanks, Jeffrey, for your responses.

I've gone through the documentation of Marpa::R2::Scanless::R but 
it's not becoming fully clear how to 'connect' the external lexing routine 
of mine to Marpa's built-in G1 parser. I've looked at some random Marpa 
code on the Net but that code is looking way too complicated as far as 
illustrating just the "external lexing" part goes.

My expectation was (and, is) that of a bison/flex type of interface where 
yyparse() calls yylex() to get the next token following which things 
automatically work. With Marpa, it seems, you have to do (much?) more than 
that (sorry, if I'm being inaccurate here). 

What I already understand: When doing external lexing, I assume I'll have 
to create the equivalent of yylex() myself - in my case this function will, 
e.g., be making heavy use of Perl Regex's. 

However, how would I pass the string value returned by my yylex() to Marpa 
parser?

Could someone please share a simple, "hello world" type of example, or if 
not that, at least some pseudocode?

On Thursday, October 13, 2016 at 10:03:41 PM UTC+5:30, Jeffrey Kegler wrote:
>
> 4.) Not sure this answers your question, but L0 rules allow full Marpa 
> syntax.
>

May I ask, why the full Perl Regex syntax is not supported in L0 rules? If 
it were supported, I could've read a Java multi-line comment simply with 
just this one-liner:
MultilineComment ~ ('/*' .*? '*/')   # parentheses being optional

Regards,
/Harry

Re: Java 8 Parser

2016-10-13 Thread Jeffrey Kegler

Your specific questions, off the top of my head:

1.) You may want to look at lexeme priorities.  If not, yes, external
lexing may be what you need.

2.) There are several examples of ways to write multi-line comments.  One
is in the FAQ:
http://savage.net.au/Perl-modules/html/marpa.faq/faq.html#q110

3.) Yes, but only via external lexing.

4.) Not sure this answers your question, but L0 rules allow full Marpa
syntax.

5.) For a large language, this can be a very hard call.  Note that you
*can* switch back and forth -- you can use the SLIF for some lexemes, and
use events to switch to external processing for others.

Quick answers, but I hope they help, jeffrey

On Thu, Oct 13, 2016 at 9:24 AM, Jeffrey Kegler <
jeffreykeg...@jeffreykegler.com> wrote:

> JavaScript is not Java, I know, but Jean-Damien Durand has written several
> full language parsers, including ECMAScript:
> https://github.com/jddurand/MarpaX-Languages-ECMAScript-AST
>
> On Thu, Oct 13, 2016 at 8:00 AM, Harry <simonsha...@gmail.com> wrote:
>
>> Hello,
>>
>> I'm very new to Marpa but, from its description, it looks extremely
>> awesome.
>>
>> I'm also done playing with the beginner's example of the expression
>> calculator; was also able to make small changes to it. So far, so good.
>>
>> However, now, I'm trying to write a Java 8 Parser using the grammar
>> published here:
>> https://docs.oracle.com/javase/specs/jls/se8/html/jls-19.html
>>
>> While I think I'm able to map the above Oracle grammar spec to the G1
>> rules (if I stub out some of the lexemes referenced by the G1 rules) and
>> create an instance of Marpa::R2::Scanless::G, I'm having a hard time
>> writing the L0 lexer rules in SLIF for the Lexer grammar
>> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html>.  Some
>> issues that I will need to (but don't know how to) deal with are:
>>
>> 1. Keyword vs Identifier:
>>
>>   The Java spec defines Identifier
>> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.8>
>> thus:
>> Identifier:
>> IdentifierChars
>> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-IdentifierChars>
>>  but not a Keyword
>> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-Keyword>
>>  or BooleanLiteral
>> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-BooleanLiteral>
>>  or NullLiteral
>> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-NullLiteral>
>> IdentifierChars:
>> JavaLetter
>> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-JavaLetter>
>>  {JavaLetterOrDigit
>> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-JavaLetterOrDigit>
>> }
>> JavaLetter:
>> any Unicode character that is a "Java letter"
>> JavaLetterOrDigit:
>> any Unicode character that is a "Java letter-or-digit"
>>   *So, how do I do* the "not a Keyword or BooleanLiteral or NullLiteral"
>> part? In Perl regex, one could do a negative lookahead assertion like so...
>>
>> if (m/ (?! $Keyword | $BooleanLiteral | $NullLiteral ) $IdentifierChars /x) {
>>     # this is an Identifier
>> }
>>
>>
>> ... but only if Marpa allowed such a rich, Perl regex syntax. Which it
>> doesn't, apparently, in SLIF.
>>
>> 2. Comment (single- and multi-line versions)
>> I could write a bunch of G1 rules to handle the multi-line Java comment,
>> but I'm seeing it becoming very verbose. Is there an easier way to handle
>> stuff like this in SLIF?
>>
>> 3. Since Marpa is Perl-based, is it possible to tap the full power of
>> Perl regex engine, especially for lexing?
>>
>> 4. Notice that Java 8 spec for recognizing tokens is in the form of a Lexer
>> grammar <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html>...
>> that is written in BNF style instead of a 'flat', regex style. If I were to
>> mechanically replicate the Lexer grammar using G1 rules (instead of L0
>> rules), would it entail a performance and space overhead by creating
>> unnecessary tree nodes for what would otherwise be a flat lexeme in
>> bison/flex?
>>
>> 5. Would Marpa experts recommend using SLIF (internal scanner) for Java
>> 8, or should I abandon it in favor of a custom / external lexer?
>>
>>
>> Regards,
>> /Harry
>>
>
>

Re: Java 8 Parser

2016-10-13 Thread Jeffrey Kegler

JavaScript is not Java, I know, but Jean-Damien Durand has written several
full language parsers, including ECMAScript:
https://github.com/jddurand/MarpaX-Languages-ECMAScript-AST

On Thu, Oct 13, 2016 at 8:00 AM, Harry <simonsha...@gmail.com> wrote:

> Hello,
>
> I'm very new to Marpa but, from its description, it looks extremely
> awesome.
>
> I'm also done playing with the beginner's example of the expression
> calculator; was also able to make small changes to it. So far, so good.
>
> However, now, I'm trying to write a Java 8 Parser using the grammar
> published here:
> https://docs.oracle.com/javase/specs/jls/se8/html/jls-19.html
>
> While I think I'm able to map the above Oracle grammar spec to the G1
> rules (if I stub out some of the lexemes referenced by the G1 rules) and
> create an instance of Marpa::R2::Scanless::G, I'm having a hard time
> writing the L0 lexer rules in SLIF for the Lexer grammar
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html>.  Some
> issues that I will need to (but don't know how to) deal with are:
>
> 1. Keyword vs Identifier:
>
>   The Java spec defines Identifier
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.8>
> thus:
> Identifier:
> IdentifierChars
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-IdentifierChars>
>  but not a Keyword
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-Keyword>
>  or BooleanLiteral
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-BooleanLiteral>
>  or NullLiteral
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-NullLiteral>
> IdentifierChars:
> JavaLetter
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-JavaLetter>
>  {JavaLetterOrDigit
> <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-JavaLetterOrDigit>
> }
> JavaLetter:
> any Unicode character that is a "Java letter"
> JavaLetterOrDigit:
> any Unicode character that is a "Java letter-or-digit"
>   *So, how do I do* the "not a Keyword or BooleanLiteral or NullLiteral"
> part? In Perl regex, one could do a negative lookahead assertion like so...
>
> if (m/ (?! $Keyword | $BooleanLiteral | $NullLiteral ) $IdentifierChars /x) {
>     # this is an Identifier
> }
>
>
> ... but only if Marpa allowed such a rich, Perl regex syntax. Which it
> doesn't, apparently, in SLIF.
>
> 2. Comment (single- and multi-line versions)
> I could write a bunch of G1 rules to handle the multi-line Java comment,
> but I'm seeing it becoming very verbose. Is there an easier way to handle
> stuff like this in SLIF?
>
> 3. Since Marpa is Perl-based, is it possible to tap the full power of Perl
> regex engine, especially for lexing?
>
> 4. Notice that Java 8 spec for recognizing tokens is in the form of a Lexer
> grammar <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html>...
> that is written in BNF style instead of a 'flat', regex style. If I were to
> mechanically replicate the Lexer grammar using G1 rules (instead of L0
> rules), would it entail a performance and space overhead by creating
> unnecessary tree nodes for what would otherwise be a flat lexeme in
> bison/flex?
>
> 5. Would Marpa experts recommend using SLIF (internal scanner) for Java 8,
> or should I abandon it in favor of a custom / external lexer?
>
>
> Regards,
> /Harry
>

Java 8 Parser

2016-10-13 Thread Harry

Hello,

I'm very new to Marpa but, from its description, it looks extremely 
awesome. 

I'm also done playing with the beginner's example of the expression 
calculator; was also able to make small changes to it. So far, so good.

However, now, I'm trying to write a Java 8 Parser using the grammar 
published here:
https://docs.oracle.com/javase/specs/jls/se8/html/jls-19.html

While I think I'm able to map the above Oracle grammar spec to the G1 rules 
(if I stub out some of the lexemes referenced by the G1 rules) and create an 
instance of Marpa::R2::Scanless::G, I'm having a hard time writing the L0 
lexer rules in SLIF for the Lexer grammar 
<https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html>.  Some 
issues that I will need to (but don't know how to) deal with are:

1. Keyword vs Identifier: 

  The Java spec defines Identifier 
<https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.8> thus:
Identifier:
IdentifierChars 
<https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-IdentifierChars>
 but not a Keyword 
<https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-Keyword> 
or BooleanLiteral 
<https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-BooleanLiteral>
 or NullLiteral 
<https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-NullLiteral>
IdentifierChars:
JavaLetter 
<https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-JavaLetter>
 {JavaLetterOrDigit 
<https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-JavaLetterOrDigit>
}
JavaLetter:
any Unicode character that is a "Java letter"
JavaLetterOrDigit:
any Unicode character that is a "Java letter-or-digit"
  *So, how do I do* the "not a Keyword or BooleanLiteral or NullLiteral" 
part? In Perl regex, one could do a negative lookahead assertion like so...

if (m/ (?! $Keyword | $BooleanLiteral | $NullLiteral ) $IdentifierChars /x) {
    # this is an Identifier
}


... but only if Marpa allowed such a rich, Perl regex syntax. Which it 
doesn't, apparently, in SLIF.

2. Comment (single- and multi-line versions)
I could write a bunch of G1 rules to handle the multi-line Java comment, 
but I'm seeing it becoming very verbose. Is there an easier way to handle 
stuff like this in SLIF?
 
3. Since Marpa is Perl-based, is it possible to tap the full power of Perl 
regex engine, especially for lexing? 

4. Notice that Java 8 spec for recognizing tokens is in the form of a Lexer 
grammar <https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html>... 
that is written in BNF style instead of a 'flat', regex style. If I were to 
mechanically replicate the Lexer grammar using G1 rules (instead of L0 
rules), would it entail a performance and space overhead by creating 
unnecessary tree nodes for what would otherwise be a flat lexeme in 
bison/flex?

5. Would Marpa experts recommend using SLIF (internal scanner) for Java 8, 
or should I abandon it in favor of a custom / external lexer?


Regards,
/Harry
