RE: switching to different parser in Pig

2009-08-25 Thread Olga Natkovich
To answer Santhosh's question: I think the plan is to move to JFlex and CUP,
but when that happens is a matter of priorities and resources, which are not
clear at this point. We do welcome contributions ;).

Olga

Re: switching to different parser in Pig

2009-08-25 Thread Thejas Nair
JFlex is covered by the GPL, but the code it generates is not. Only the code
that is generated by JFlex goes into pig.jar.
We can't check JFlex.jar into svn; ivy will be set up to download it from a
Maven repository.
-Thejas



RE: switching to different parser in Pig

2009-08-25 Thread Olga Natkovich
We don't need to package it - we only use it at compile time. There are other
Apache projects, such as Lucene, that use JFlex.

Olga


Re: switching to different parser in Pig

2009-08-25 Thread Dmitriy Ryaboy
Santhosh,
Am I missing something about JFlex licensing? I thought that, it being
GPL, we can't package it with Apache-licensed software, which prevents
it from being a viable option (regardless of technical merits).
-Dmitriy


RE: switching to different parser in Pig

2009-08-25 Thread Santhosh Srinivasan
It's been 6 months since this topic was discussed, but we don't have
closure on it.

For SQL on top of Pig, we are using JFlex and CUP
(https://issues.apache.org/jira/browse/PIG-824). If we have decided on
the right parser, can we have a plan to move the other parsers in Pig to
the same technology?

Thanks,
Santhosh

PS: I am assuming we are not moving to Antlr.



Re: switching to different parser in Pig

2009-02-24 Thread Ted Dunning
Yes.

And one thing I should have mentioned was Chris W's thoughts along the lines
that it would be very nice to expose the logical plan to something like
Cascading so that a global restructuring could be done across more than just
Pig programs.  It works the other way as well, with it becoming possible for
Pig to execute programs expressed (conceivably) in Cascading form.

On Tue, Feb 24, 2009 at 1:27 AM, pi song  wrote:

> I think what Ted mentioned is more embedding Pig in
> other languages and use those languages to do loops.
>



-- 
Ted Dunning, CTO
DeepDyve

111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
www.deepdyve.com
408-773-0110 ext. 738
858-414-0013 (m)
408-773-0220 (fax)


Re: switching to different parser in Pig

2009-02-24 Thread Alan Gates
Sorry, after I sent that email yesterday I realized I was not very  
clear.  I did not mean to imply that antlr didn't have good  
documentation or good error handling.  What I wanted to say was we  
want all three of those things, and it didn't appear that antlr  
provided all three, since it doesn't separate out scanner and parser.   
Also, from my viewpoint, I prefer bottom-up LALR(1) parsers like yacc
to top-down parsers like javacc.  My understanding is that antlr is
top-down like javacc.  My reasoning for this preference is that parser
books and classes have used those for decades, so there are a large  
number of engineers out there (including me :) ) who know how to work  
with them.  But maybe antlr is close enough to what we need.  I'll  
take a deeper look at it before I vote officially on which way we  
should go.
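
As a rough illustration of what keeping the scanner separate from the
parser can look like, here is a minimal Python sketch. Everything in it is
an assumption made for illustration: the toy arithmetic grammar, the token
kinds NUM and OP, and the names tokenize, Parser and evaluate are
hypothetical and are not taken from Pig's grammar or from any parser
generator. The parser is a small hand-written top-down one, not LALR(1);
the only point is that the scanner can be exercised on its own.

```python
import re
from typing import Iterator, Tuple

Token = Tuple[str, str]  # (kind, text)

# Scanner: knows about characters only, nothing about grammar rules.
TOKEN_RE = re.compile(r"(?P<NUM>\d+)|(?P<OP>[+*()])")


def tokenize(text: str) -> Iterator[Token]:
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind = m.lastgroup
        yield (kind, m.group(kind))
        pos = m.end()


class Parser:
    """Parser: sees tokens only, never raw characters.

    Toy grammar (hypothetical):
        expr   -> term ('+' term)*
        term   -> factor ('*' factor)*
        factor -> NUM | '(' expr ')'
    """

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.i = 0

    def peek(self) -> Token:
        return self.tokens[self.i] if self.i < len(self.tokens) else ("EOF", "")

    def take(self, kind: str, text: str = None) -> Token:
        tok = self.peek()
        if tok[0] != kind or (text is not None and tok[1] != text):
            raise SyntaxError(f"expected {text or kind}, got {tok[1] or tok[0]!r}")
        self.i += 1
        return tok

    def expr(self) -> int:
        value = self.term()
        while self.peek() == ("OP", "+"):
            self.take("OP", "+")
            value += self.term()
        return value

    def term(self) -> int:
        value = self.factor()
        while self.peek() == ("OP", "*"):
            self.take("OP", "*")
            value *= self.factor()
        return value

    def factor(self) -> int:
        if self.peek() == ("OP", "("):
            self.take("OP", "(")
            value = self.expr()
            self.take("OP", ")")
            return value
        return int(self.take("NUM")[1])


def evaluate(text: str) -> int:
    parser = Parser(tokenize(text))
    value = parser.expr()
    parser.take("EOF")  # reject trailing tokens
    return value
```

Because tokenize is an ordinary function, its output can be checked
directly without running the parser, and the parser can be fed a
hand-built token list without touching any character-level rules.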


As for loops and branches, I'm not saying we need those in Pig Latin.
We need them somehow.  Whether it's better to put them in Pig Latin or
embed Pig in an existing scripting language is an ongoing debate.  I
don't want to make a decision now that effectively ends that debate
without buy-in from those who feel strongly that Pig Latin should
include those constructs.


I agree with you that we should modify the logical plan to support  
this rather than add another layer.  As for active development, the  
only thing I'm aware of is we hope to start working on a more robust  
optimizer for pig soon, and that will require some additional  
functionality out of the logical operators, but it shouldn't cause any  
fundamental architectural changes.


Alan.


Re: switching to different parser in Pig

2009-02-24 Thread pi song
(1) Lack of good documentation, which makes it hard and time consuming
to learn javacc and make changes to the Pig grammar
<== ANTLR is very very well documented.
http://www.pragprog.com/titles/tpantlr/the-definitive-antlr-reference
http://media.pragprog.com/titles/tpantlr/toc.pdf
http://www.antlr.org/wiki/display/ANTLR3/ANTLR+3+Wiki+Home

(2) No easy way to customize error handling and error messages
<== ANTLR has very extensive error handling support
http://media.pragprog.com/titles/tpantlr/errors.pdf

(3) Single path that performs both tokenizing and parsing
<== What is the advantage of decoupling tokenizer and parsing ?

In addition, "Composite Grammar" is very useful for keeping the parser
modular. Things that can be treated as sub-languages such as bag schema
definition can be done and unit tested separately.
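
To give a feel for the "sub-language tested separately" idea, here is a
small sketch in plain Python rather than ANTLR. It is only an analogy for
a composite grammar: the schema syntax is a simplified stand-in, not Pig's
actual schema grammar, and parse_schema and parse_load are hypothetical
names invented for this example.

```python
import re
from typing import Dict, List, Tuple

FIELD_RE = re.compile(r"\s*([A-Za-z_]\w*)\s*:\s*([A-Za-z_]\w*)\s*$")


def parse_schema(text: str) -> List[Tuple[str, str]]:
    """Sub-language: '(name:type, name:type, ...)' -> [(name, type), ...]."""
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("schema must be wrapped in parentheses")
    body = text[1:-1].strip()
    if not body:
        return []
    fields = []
    for part in body.split(","):
        m = FIELD_RE.match(part)
        if m is None:
            raise ValueError(f"bad field: {part.strip()!r}")
        fields.append((m.group(1), m.group(2)))
    return fields


def parse_load(statement: str) -> Dict[str, object]:
    """Outer language: delegates the AS (...) clause to parse_schema."""
    m = re.match(r"\s*LOAD\s+'([^']*)'\s+AS\s+(\(.*\))\s*;\s*$", statement)
    if m is None:
        raise ValueError("not a LOAD ... AS (...) statement")
    return {"path": m.group(1), "schema": parse_schema(m.group(2))}
```

The schema parser can be unit tested with inputs such as
"(a:int, b:chararray)" without involving the statement-level code at all,
which is the modularity benefit being described.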

ANTLRWorks (http://www.antlr.org/works/index.html) also makes grammar
development very efficient. Think of an IDE that helps you debug your
code (which here is the grammar).

One question, is there any use case for branching and loops? The current Pig
is more like a query (declarative) language. I don't really see how loop
constructs would fit. I think what Ted mentioned is more embedding Pig in
other languages and use those languages to do loops.
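
A minimal sketch of the embedding idea, assuming Python as the host
language: the loop and the decision about the next pass live in the host
language, and each pass is an ordinary Pig Latin script. The file paths,
field names and the functions iteration_script and build_scripts are
hypothetical, and actually submitting the scripts to Pig is deliberately
left out because no particular Pig API is assumed here.

```python
from typing import List


def iteration_script(step: int, threshold: int) -> str:
    """Return Pig Latin text for one pass (paths and fields are made up)."""
    src = f"data/pass_{step}"
    dst = f"data/pass_{step + 1}"
    return "\n".join([
        f"A = LOAD '{src}' AS (id:int, score:int);",
        f"B = FILTER A BY score > {threshold};",
        f"STORE B INTO '{dst}';",
    ])


def build_scripts(passes: int, start_threshold: int) -> List[str]:
    """The loop is host-language code; Pig Latin itself stays loop-free."""
    scripts = []
    threshold = start_threshold
    for step in range(passes):
        scripts.append(iteration_script(step, threshold))
        threshold *= 2  # host-language logic picks the next pass's parameter
    return scripts
```

Each generated script stays declarative, which is why this approach does
not require adding loop constructs to Pig Latin.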

We should think about how the logical plan layer can be made simpler for
external use so we don't have to introduce a new layer. Is there any major
active development on it? Currently I have more spare time and should be
able to help out. (BTW, I'm slow because this is just my hobby. I don't want
to hold you guys back.)

Pi Song

On Tue, Feb 24, 2009 at 6:23 AM, nitesh bhatia wrote:

> Hi
> I got this info from javacc mailing lists. This may prove helpful:
>
>
> 
> -Original Message- From: Ken Beesley
> [mailto:ken@xrce.xerox.com] Sent: Wednesday, August 18, 2004 2:56
> PM To: javacc Subject: [JavaCC] Alternatives to JavaCC (was Hello All)
>
> Vicas wrote:
>
> Hello All
>
> Kindly let me know other parsers available which does the same job as
> javacc.
>
> It would be very nice of you if you can send me some documentation
> related to this.
>
> Thanks Vikas
>
> (Correction and clarifications to the following would be _very_
> welcome. I'm very likely out of date.)
>
> Of course, no two software tools are likely to do _exactly_ the same
> job. Someone already pointed you to ANTLR, which is probably the
> best-known alternative to JavaCC. Another possibility is SableCC.
> http://sablecc.org
>
> The criteria include stability, documentation, language of the parser
> generated, and abstract-syntax-tree building.
>
> When I last looked (a couple of years ago) at ANTLR, SableCC and
> JavaCC, I chose JavaCC for the following reasons:
>
> 1. ANTLR could not handle Unicode input. Things change, of course, so
> ANTLR might now be more Unicode-friendly. Unicode was important to me,
> so this was a big factor in my decision.
>
> On the plus side for ANTLR, it has better abstract-syntax-tree
> building capabilities (in my opinion) than JJTree/JavaCC. You can
> learn to use JJTree commands, but it's not easy for most people.
>
> And ANTLR can generate either a Java or a C++ parser. JavaCC generates
> only Java parsers.
>
> Another concern about ANTLR was that it was reputed to change a lot as
> the guru, Terence Parr, experimented with new syntax and
> functionality. JavaCC, at least at the time, was reputed to be more
> stable, perhaps stable to a fault. I wanted stability and reliability.
>
> 2. SableCC is much like JavaCC; it generates a Java parser from a
> grammar description; but it had, in my opinion, less flexible
> abstract-syntax-tree building than JJTree/JavaCC. In SableCC (when I
> looked at it), the AST it built was always a direct reflection of your
> grammar, generating one tree node for each grammar expansion involved
> in a parse, much like using JavaCC with Java Tree Builder (JTB
> http://www.cs.purdue.edu/jtb/). When using JavaCC, JTB is the
> alternative to using JJTree.
>
> Using SableCC, or the combination JavaCC/JTB, should be _very_ similar
> indeed.
>
> In my opinion, SableCC and JavaCC/JTB have made a conscious choice to
> simplify AST building--you get trees that reflect the expansions in
> your grammar. Period. But often these default trees will be big, full
> of extraneous nodes that reflect precedence hierarchies in the
> recursive-descent parsing. If you want to have more control over AST
> building, to get more compact and tailored ASTs, you need to pay the
> price of learning JJTree.
>
> Assuming that you need to build ASTs, with JavaCC you have the choice
> between JJTree and JTB. With SableCC, when I last looked at it, you
> only get the JTB-like option.
>
> ***
>
> (Again, corrections and expansions would be much appreciated.)
>
> Ken Bee

Re: switching to different parser in Pig

2009-02-23 Thread nitesh bhatia
Hi
I got this info from javacc mailing lists. This may prove helpful:


-Original Message-
From: Ken Beesley [mailto:ken@xrce.xerox.com]
Sent: Wednesday, August 18, 2004 2:56 PM
To: javacc
Subject: [JavaCC] Alternatives to JavaCC (was Hello All)

Vicas wrote:

> Hello All
>
> Kindly let me know other parsers available which does the same job as javacc.
>
> It would be very nice of you if you can send me some documentation
> related to this.
>
> Thanks Vikas

(Correction and clarifications to the following would be _very_
welcome. I'm very likely out of date.)

Of course, no two software tools are likely to do _exactly_ the same
job. Someone already pointed you to ANTLR, which is probably the
best-known alternative to JavaCC. Another possibility is SableCC.
http://sablecc.org

The criteria include stability, documentation, language of the parser
generated, and abstract-syntax-tree building.

When I last looked (a couple of years ago) at ANTLR, SableCC and
JavaCC, I chose JavaCC for the following reasons:

1. ANTLR could not handle Unicode input. Things change, of course, so
ANTLR might now be more Unicode-friendly. Unicode was important to me,
so this was a big factor in my decision.

On the plus side for ANTLR, it has better abstract-syntax-tree
building capabilities (in my opinion) than JJTree/JavaCC. You can
learn to use JJTree commands, but it's not easy for most people.

And ANTLR can generate either a Java or a C++ parser. JavaCC generates
only Java parsers.

Another concern about ANTLR was that it was reputed to change a lot as
the guru, Terence Parr, experimented with new syntax and
functionality. JavaCC, at least at the time, was reputed to be more
stable, perhaps stable to a fault. I wanted stability and reliability.

2. SableCC is much like JavaCC; it generates a Java parser from a
grammar description; but it had, in my opinion, less flexible
abstract-syntax-tree building than JJTree/JavaCC. In SableCC (when I
looked at it), the AST it built was always a direct reflection of your
grammar, generating one tree node for each grammar expansion involved
in a parse, much like using JavaCC with Java Tree Builder (JTB
http://www.cs.purdue.edu/jtb/). When using JavaCC, JTB is the
alternative to using JJTree.

Using SableCC, or the combination JavaCC/JTB, should be _very_ similar indeed.

In my opinion, SableCC and JavaCC/JTB have made a conscious choice to
simplify AST building--you get trees that reflect the expansions in
your grammar. Period. But often these default trees will be big, full
of extraneous nodes that reflect precedence hierarchies in the
recursive-descent parsing. If you want to have more control over AST
building, to get more compact and tailored ASTs, you need to pay the
price of learning JJTree.

Assuming that you need to build ASTs, with JavaCC you have the choice
between JJTree and JTB. With SableCC, when I last looked at it, you
only get the JTB-like option.

***

(Again, corrections and expansions would be much appreciated.)

Ken Beesley
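
A minimal sketch, not taken from the thread, of the trade-off described
above: a tree with one node per grammar expansion versus a compact,
tailored AST. The toy grammar (expr/term/factor) and the names Parser,
compact and count are assumptions made for illustration; this is not the
output format of SableCC, JTB or JJTree.

```python
import re


def tokenize(text):
    # Scanner: split the input into integer and operator tokens.
    return re.findall(r"\d+|[+*()]", text)


class Parser:
    """Recursive-descent parser that emits one node per grammar expansion."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expr(self):  # expr := term ('+' term)*
        children = [self.term()]
        while self.peek() == "+":
            self.take()
            children.append(self.term())
        return ("expr", children)

    def term(self):  # term := factor ('*' factor)*
        children = [self.factor()]
        while self.peek() == "*":
            self.take()
            children.append(self.factor())
        return ("term", children)

    def factor(self):  # factor := NUMBER | '(' expr ')'
        tok = self.take()
        if tok == "(":
            inner = self.expr()
            self.take()  # consume ')'
            return ("factor", [inner])
        return ("factor", [("num", int(tok))])


def compact(node):
    # Drop nodes that exist only because of the precedence hierarchy.
    kind, payload = node
    if kind == "num":
        return node
    children = [compact(child) for child in payload]
    if len(children) == 1:
        return children[0]
    return ("add" if kind == "expr" else "mul", children)


def count(node):
    # Number of nodes in a tree, leaves included.
    kind, payload = node
    if kind == "num":
        return 1
    return 1 + sum(count(child) for child in payload)


full = Parser(tokenize("1 + 2 * 3")).expr()
small = compact(full)
print(count(full), count(small))
```

The grammar-shaped tree keeps every expr, term and factor wrapper, while
the compacted tree keeps only the operators and operands. In this sketch
that tailoring step is written by hand after parsing.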





Re: switching to different parser in Pig

2009-02-23 Thread Alan Gates
We looked into antlr.  It appears to be very similar to javacc, with  
the added feature that the java code it generates is humanly  
readable.  That isn't why we want to switch off of javacc.  Olga  
listed the 3 things we want out of a parser that javacc isn't giving
us (good docs, easy customization of error handling, decoupling of
scanning and parsing).  So antlr doesn't look viable.


In response to Pi's suggestion that we could use the logical plan, I  
hope we could use something close to it.  Whatever we choose we want  
it to be flexible enough to represent richer language constructs (like  
branch and loop).  I'm not sure our current logical plan can do that.   
At the same time, we don't need another layer of translation (we  
already have logical -> physical -> mapreduce).  I would like to find  
a representation that could handle expressing the syntax and what is  
currently the logical plan.


Alan.

On Feb 20, 2009, at 5:15 PM, pi song wrote:

> Should be pretty close but we may need to cleanup the interface a bit. Then
> the new parser module can be switched in easily.
> BTW, have we already got the solution for the new parser generator?
>
> Pi
>
> On Fri, Feb 20, 2009 at 9:03 PM, Ted Dunning wrote:
>
>> Probably nearly the same effect as you suggest.  Are the concepts at the
>> logical plan layer similar to those expressed in pig latin?  Or has a
>> significant transformation occurred by then?
>>
>> On Fri, Feb 20, 2009 at 1:59 AM, pi song wrote:
>>
>>> Sounds good but how about exposing the logical plan layer instead?
>>> Wouldn't that yield the same effect?  From python for example you still
>>> can construct a logical plan and give to Pig to execute.
>>
>> --
>> Ted Dunning, CTO
>> DeepDyve

Re: switching to different parser in Pig

2009-02-20 Thread pi song
Should be pretty close but we may need to cleanup the interface a bit. Then
the new parser  module can be switched in easily.
BTW, have we already got the solution for the new parser generator?

Pi


On Fri, Feb 20, 2009 at 9:03 PM, Ted Dunning  wrote:

>
> Probably nearly the same effect as you suggest.  Are the concepts at the
> logical plan layer similar to those expressed in pig latin?  Or has a
> significant transformation occurred by then?
>
>
> On Fri, Feb 20, 2009 at 1:59 AM, pi song  wrote:
>
>> Sounds good but how about exposing the logical plan layer instead?
>> Wouldn't
>> that yield the same effect?  From python for example you still can
>> construct
>> a logical plan and give to Pig to execute.
>>
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
>


Re: switching to different parser in Pig

2009-02-20 Thread Ted Dunning
Probably nearly the same effect as you suggest.  Are the concepts at the
logical plan layer similar to those expressed in pig latin?  Or has a
significant transformation occurred by then?

On Fri, Feb 20, 2009 at 1:59 AM, pi song  wrote:

> Sounds good but how about exposing the logical plan layer instead? Wouldn't
> that yield the same effect?  From python for example you still can
> construct
> a logical plan and give to Pig to execute.
>



-- 
Ted Dunning, CTO
DeepDyve


Re: switching to different parser in Pig

2009-02-20 Thread pi song
Sounds good but how about exposing the logical plan layer instead? Wouldn't
that yield the same effect?  From python for example you still can construct
a logical plan and give to Pig to execute.

On Wed, Feb 18, 2009 at 10:07 AM, Ted Dunning  wrote:

> 2009/2/17 Alan Gates 
>
> > [not commenting on the switch, only on the exposure of AST's] Is that
> > correct?
> >
>
> Nearly so.
>
>
> > So whether we switch parsing technologies or not is not of interest to
> > you, only the interfaces we expose?
> >
>
> I would think that switching parsing technologies would encourage creation
> of a better AST interface layer which furthers my goal of getting to the
> AST's for other purposes.  I also think that exposing the AST layer would
> further your goal of switching parser technology by allowing outsiders to
> contribute parsers that you might ultimately like better.
>
> So I do see a linkage and do support switching.
>
> +1 to switching parsers (and thus making switching easier)
>


Re: switching to different parser in Pig

2009-02-17 Thread Ted Dunning
2009/2/17 Alan Gates 

> [not commenting on the switch, only on the exposure of AST's] Is that
> correct?
>

Nearly so.


> So whether we switch parsing technologies or not is not of interest to you,
> only the interfaces we expose?
>

I would think that switching parsing technologies would encourage creation
of a better AST interface layer which furthers my goal of getting to the
AST's for other purposes.  I also think that exposing the AST layer would
further your goal of switching parser technology by allowing outsiders to
contribute parsers that you might ultimately like better.

So I do see a linkage and do support switching.

+1 to switching parsers (and thus making switching easier)


Re: switching to different parser in Pig

2009-02-17 Thread Alan Gates

Ted,

If I understand your comments correctly you aren't chiming in on whether
we should switch parsers, just that you would like there to be a  
published interface of what pig latin syntax trees look like so you  
could generate them in other tools and then feed them into pig.  Is  
that correct?  So whether we switch parsing technologies or not is not  
of interest to you, only the interfaces we expose?


Alan.

On Feb 12, 2009, at 4:42 PM, Ted Dunning wrote:

> In general, it would be really, really nice if it were easy to build
> abstract Pig syntax trees outside of the normal parser.
>
> For instance, I find the fact that pig is not a full scale scripting
> language incredibly confining.  I would love to be able to build a DSL in
> groovy that let me use groovy for scripting, but still execute pig jobs
> easily.  If I could build Pig syntax trees easily, then I would be, as they
> say, in pig heaven.
>
> That would also let the switch to a different parsing technology happen
> gradually rather than all at once.  Two different grunt interpreters could
> coexist for a short time while the new one is proved out.
>
> On Thu, Feb 12, 2009 at 3:58 PM, Olga Natkovich wrote:
>
>> Pig Developers,
>>
>> Pig currently uses javacc for parsing pig commands. We have found
>> several shortcomings with using javacc. In particular,
>>
>> (1) Lack of good documentation which makes it hard to and time consuming
>> to learn javacc and make changes to Pig grammar
>> (2) No easy way to customize error handling and error messages
>> (3) Single path that performs both tokenizing and parsing
>>
>> We are considering to use JFlex and Cup which are Java versions of Lex
>> and Bison instead. The main advantage of this transition is proven, well
>> known and well understood technology and input format. In addition, it
>> addresses the issues stated above.
>>
>> One problem with the transition is that JFlex and Cup have GPL license
>> that is not compatible with Apache license. The workaround could be that
>> we don't commit the tools into SVN and instead developers who need to
>> update grammar would install them on their own. Note, that we can commit
>> the input grammar as well as the output of the grammar into SVN which
>> means that for developers just compiling code or making non-parser
>> changes, there will be no impact.
>>
>> Please, comment on whether you think this is a reasonable change.
>>
>> Thanks,
>>
>> Olga
>
> --
> Ted Dunning, CTO
> DeepDyve
> 4600 Bohannon Drive, Suite 220
> Menlo Park, CA 94025
> www.deepdyve.com
> 650-324-0110, ext. 738
> 858-414-0013 (m)

Re: switching to different parser in Pig

2009-02-15 Thread Jeff Hammerbacher
Hey,

Just chiming in to say that Hive uses ANTLR; Ashish (athu...@facebook.com)
can provide more detailed feedback on their experiences with ANTLR.

Later,
Jeff

On Sat, Feb 14, 2009 at 5:16 AM, Ted Dunning  wrote:

> Not even close.
>
> Take, for example,
>
> 1) the problem of using the output of a Pig query as the list of files used
> as input.
>
> 2) Or running some query in an iterative fashion until convergence is
> reached.
>
> 3) Or running a pig query, doing a matrix computation on the result and
> then
> running another pig query on the output of the matrix computation.
>
> You can do (1) by running a pig query using an external script and then
> downloading the output and expanding that into a pig using a template
> expansion and then executing that expanded template.
>
> You can do (2) by having an external script that runs a pig program over
> and
> over again, downloading the results and checking for convergence.
>
> You can do (3) by running one program, then downloading results, computing,
> uploading results and running another program.
>
> All of these are maintenance nightmares which would be greatly eased if
> pig's semantics could be glued nicely into a good scripting language.
> Having access to the AST's would make that pretty easy.
>
> On Sat, Feb 14, 2009 at 4:10 AM, pi song  wrote:
>
> > Due to my limited knowledge, I don't quite understand why building ast
> > from outside Pig would be helpful. Isn't Pig Latin already good enough to
> > interface to the world?
> >
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
> 4600 Bohannon Drive, Suite 220
> Menlo Park, CA 94025
> www.deepdyve.com
> 650-324-0110, ext. 738
> 858-414-0013 (m)
>


Re: switching to different parser in Pig

2009-02-14 Thread Ted Dunning
Not even close.

Take, for example,

1) the problem of using the output of a Pig query as the list of files used
as input.

2) Or running some query in an iterative fashion until convergence is
reached.

3) Or running a pig query, doing a matrix computation on the result and then
running another pig query on the output of the matrix computation.

You can do (1) by running a pig query using an external script and then
downloading the output and expanding that into a pig script using a template
expansion and then executing that expanded template.

You can do (2) by having an external script that runs a pig program over and
over again, downloading the results and checking for convergence.

You can do (3) by running one program, then downloading results, computing,
uploading results and running another program.

All of these are maintenance nightmares which would be greatly eased if
pig's semantics could be glued nicely into a good scripting language.
Having access to the AST's would make that pretty easy.

On Sat, Feb 14, 2009 at 4:10 AM, pi song  wrote:

> Due to my limited knowledge, I don't quite understand why building ast from
> outside Pig would be helpful. Isn't Pig Latin already good enough to
> interface to the world?
>



-- 
Ted Dunning, CTO
DeepDyve
4600 Bohannon Drive, Suite 220
Menlo Park, CA 94025
www.deepdyve.com
650-324-0110, ext. 738
858-414-0013 (m)
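
A minimal sketch, not taken from the thread, of the control flow that
case (2) asks a host scripting language to provide: run a step repeatedly
until the result stops changing. The names iterate_until_converged and
fake_job are hypothetical; fake_job is a plain numeric function standing
in for "submit a Pig job and read back its result", and no Pig API is
called or implied.

```python
def iterate_until_converged(step, state, tol=1e-9, max_iters=100):
    """Apply step repeatedly until successive results differ by at most tol."""
    for iteration in range(1, max_iters + 1):
        new_state = step(state)
        if abs(new_state - state) <= tol:
            return new_state, iteration
        state = new_state
    raise RuntimeError("did not converge within max_iters iterations")


def fake_job(x):
    # Stand-in for one job run: a Newton step towards the square root of 2.
    return 0.5 * (x + 2.0 / x)


result, iterations = iterate_until_converged(fake_job, 1.0)
print(result, iterations)
```

With an embeddable plan or AST interface, the body of the step would
build and run a query in process, instead of an external script
downloading results between runs.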


Re: switching to different parser in Pig

2009-02-14 Thread pi song
Due to my limited knowledge, I don't quite understand why building ast from
outside Pig would be helpful. Isn't Pig Latin already good enough to
interface to the world?

In terms of parser generator, has anyone considered ANTLR? I had spent a few
weeks on it a while ago. It is quite well-documented and the tools are
GREAT!! (see http://www.antlr.org/works/index.html) Its license is BSD which
is the same as JavaCC anyway. The only ugly thing is that you'll have
antlr.jar in your distribution.

Pi

On Fri, Feb 13, 2009 at 6:34 PM, Mridul Muralidharan
wrote:

>
> This sounds like a great idea !
> Would be great if other means of generating ast's for pig was possible.
>
> Regards,
> Mridul
>
>
> Ted Dunning wrote:
>
>> In general, it would be really, really nice if it were easy to build
>> abstract Pig syntax trees outside of the normal parser.
>>
>> For instance, I find the fact that pig is not a full scale scripting
>> language incredibly confining.  I would love to be able to build a DSL in
>> groovy that let me use groovy for scripting, but still execute pig jobs
>> easily.  If I could build Pig syntax trees easily, then I would be, as
>> they
>> say, in pig heaven.
>>
>> That would also let the switch to a different parsing technology happen
>> gradually rather than all at once.  Two different grunt interpreters could
>> coexist for a short time while the new one is proved out.
>>
>> On Thu, Feb 12, 2009 at 3:58 PM, Olga Natkovich 
>> wrote:
>>
>>> Pig Developers,
>>>
>>> Pig currently uses javacc for parsing pig commands. We have found
>>> several shortcomings with using javacc. In particular,
>>>
>>> (1) Lack of good documentation which makes it hard to and time consuming
>>> to learn javacc and make changes to Pig grammar
>>> (2) No easy way to customize error handling and error messages
>>> (3) Single path that performs both tokenizing and parsing
>>>
>>> We are considering to use JFlex and Cup which are Java versions of Lex
>>> and Bison instead. The main advantage of this transition is proven, well
>>> known and well understood technology and input format. In addition, it
>>> addresses the issues stated above.
>>>
>>> One problem with the transition is that JFlex and Cup have GPL license
>>> that is not compatible with Apache license. The workaround could be that
>>> we don't commit the tools into SVN and instead developers who need to
>>> update grammar would install them on their own. Note, that we can commit
>>> the input grammar as well as the output of the grammar into SVN which
>>> means that for developers just compiling code or making non-parser
>>> changes, there will be no impact.
>>>
>>> Please, comment on whether you think this is a reasonable change.
>>>
>>> Thanks,
>>>
>>> Olga
>>>
>>>
>>
>>
>>
>


Re: switching to different parser in Pig

2009-02-12 Thread Mridul Muralidharan


This sounds like a great idea !
Would be great if other means of generating ast's for pig was possible.

Regards,
Mridul

Ted Dunning wrote:

> In general, it would be really, really nice if it were easy to build
> abstract Pig syntax trees outside of the normal parser.
>
> For instance, I find the fact that pig is not a full scale scripting
> language incredibly confining.  I would love to be able to build a DSL in
> groovy that let me use groovy for scripting, but still execute pig jobs
> easily.  If I could build Pig syntax trees easily, then I would be, as they
> say, in pig heaven.
>
> That would also let the switch to a different parsing technology happen
> gradually rather than all at once.  Two different grunt interpreters could
> coexist for a short time while the new one is proved out.
>
> On Thu, Feb 12, 2009 at 3:58 PM, Olga Natkovich wrote:
>
>> Pig Developers,
>>
>> Pig currently uses javacc for parsing pig commands. We have found
>> several shortcomings with using javacc. In particular,
>>
>> (1) Lack of good documentation which makes it hard to and time consuming
>> to learn javacc and make changes to Pig grammar
>> (2) No easy way to customize error handling and error messages
>> (3) Single path that performs both tokenizing and parsing
>>
>> We are considering to use JFlex and Cup which are Java versions of Lex
>> and Bison instead. The main advantage of this transition is proven, well
>> known and well understood technology and input format. In addition, it
>> addresses the issues stated above.
>>
>> One problem with the transition is that JFlex and Cup have GPL license
>> that is not compatible with Apache license. The workaround could be that
>> we don't commit the tools into SVN and instead developers who need to
>> update grammar would install them on their own. Note, that we can commit
>> the input grammar as well as the output of the grammar into SVN which
>> means that for developers just compiling code or making non-parser
>> changes, there will be no impact.
>>
>> Please, comment on whether you think this is a reasonable change.
>>
>> Thanks,
>>
>> Olga

Re: switching to different parser in Pig

2009-02-12 Thread Ted Dunning
In general, it would be really, really nice if it were easy to build
abstract Pig syntax trees outside of the normal parser.

For instance, I find the fact that pig is not a full scale scripting
language incredibly confining.  I would love to be able to build a DSL in
groovy that let me use groovy for scripting, but still execute pig jobs
easily.  If I could build Pig syntax trees easily, then I would be, as they
say, in pig heaven.

That would also let the switch to a different parsing technology happen
gradually rather than all at once.  Two different grunt interpreters could
coexist for a short time while the new one is proved out.

On Thu, Feb 12, 2009 at 3:58 PM, Olga Natkovich  wrote:

> Pig Developers,
>
> Pig currently uses javacc for parsing pig commands. We have found
> several shortcomings with using javacc. In particular,
>
> (1) Lack of good documentation which makes it hard to and time consuming
> to learn javacc and make changes to Pig grammar
> (2) No easy way to customize error handling and error messages
> (3) Single path that performs both tokenizing and parsing
>
> We are considering to use JFlex and Cup which are Java versions of Lex
> and Bison instead. The main advantage of this transition is proven, well
> known and well understood technology and input format. In addition, it
> addresses the issues stated above.
>
> One problem with the transition is that JFlex and Cup have GPL license
> that is not compatible with Apache license. The workaround could be that
> we don't commit the tools into SVN and instead developers who need to
> update grammar would install them on their own. Note, that we can commit
> the input grammar as well as the output of the grammar into SVN which
> means that for developers just compiling code or making non-parser
> changes, there will be no impact.
>
> Please, comment on whether you think this is a reasonable change.
>
> Thanks,
>
> Olga
>



-- 
Ted Dunning, CTO
DeepDyve
4600 Bohannon Drive, Suite 220
Menlo Park, CA 94025
www.deepdyve.com
650-324-0110, ext. 738
858-414-0013 (m)
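
A minimal sketch, not taken from the thread, of what points (2) and (3)
in the quoted proposal amount to: a scanner that only produces tokens, a
parser that only consumes them, and error messages written by hand. The
one-statement grammar (alias = LOAD 'path';) is a toy assumption and not
Pig's actual grammar; Token, scan and parse_load are hypothetical names,
and the code does not use JFlex or CUP.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str
    text: str
    pos: int


# Scanner rules, tried in order; LOAD must come before IDENT.
TOKEN_SPEC = [
    ("LOAD", r"LOAD\b"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("STRING", r"'[^']*'"),
    ("EQ", r"="),
    ("SEMI", r";"),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))


def scan(text):
    """Scanner: turn characters into tokens, knowing nothing about grammar."""
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "WS":
            yield Token(match.lastgroup, match.group(), pos)
        pos = match.end()


def parse_load(tokens):
    """Parser: check the token sequence, knowing nothing about characters."""
    tokens = list(tokens)
    expected = ["IDENT", "EQ", "LOAD", "STRING", "SEMI"]
    for index, kind in enumerate(expected):
        if index >= len(tokens):
            raise SyntaxError(f"expected {kind} but the input ended")
        tok = tokens[index]
        if tok.kind != kind:
            raise SyntaxError(f"expected {kind} at offset {tok.pos}, found {tok.text!r}")
    if len(tokens) > len(expected):
        extra = tokens[len(expected)]
        raise SyntaxError(f"unexpected {extra.text!r} at offset {extra.pos}")
    return {"alias": tokens[0].text, "path": tokens[3].text.strip("'")}


statement = parse_load(scan("A = LOAD 'input.txt';"))
print(statement)
```

Because the two stages meet only at the token stream, either one can be
replaced or tested on its own, and each stage reports errors in its own
terms (characters for the scanner, token kinds for the parser).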