We don't need to package it - we only use it at compile time. There are other Apache projects such as Lucene that use JFlex.
Olga

-----Original Message-----
From: Dmitriy Ryaboy [mailto:dvrya...@cloudera.com]
Sent: Tuesday, August 25, 2009 11:58 AM
To: pig-dev@hadoop.apache.org
Cc: pi.so...@gmail.com
Subject: Re: switching to different parser in Pig

Santhosh,
Am I missing something about JFlex licensing? I thought that it being GPL, we can't package it with apache-licensed software, which prevents it from being a viable option (regardless of technical merits)

-Dmitriy

On Tue, Aug 25, 2009 at 1:58 PM, Santhosh Srinivasan<s...@yahoo-inc.com> wrote:
> It's been 6 months since this topic was discussed but we don't have
> closure on it.
> For SQL on top of Pig, we are using JFlex and CUP
> (https://issues.apache.org/jira/browse/PIG-824). If we have decided on
> the right parser, can we have a plan to move the other parsers in Pig to
> the same technology?
>
> Thanks,
> Santhosh
>
> PS: I am assuming we are not moving to Antlr.
>
>
> -----Original Message-----
> From: Alan Gates [mailto:ga...@yahoo-inc.com]
> Sent: Tuesday, February 24, 2009 10:17 AM
> To: pig-dev@hadoop.apache.org; pi.so...@gmail.com
> Subject: Re: switching to different parser in Pig
>
> Sorry, after I sent that email yesterday I realized I was not very
> clear. I did not mean to imply that antlr didn't have good
> documentation or good error handling. What I wanted to say was we
> want all three of those things, and it didn't appear that antlr
> provided all three, since it doesn't separate out scanner and parser.
> Also, from my viewpoint, I prefer bottom up LALR(1) parsers like yacc
> to top down parsers like javacc. My understanding is that antlr is
> top down like javacc. My reasoning for this preference is that parser
> books and classes have used those for decades, so there are a large
> number of engineers out there (including me :) ) who know how to work
> with them. But maybe antlr is close enough to what we need. I'll
> take a deeper look at it before I vote officially on which way we
> should go.
>
> As for loops and branches, I'm not saying we need those in Pig Latin.
> We need them somehow. Whether it's better to put them in Pig Latin or
> embed pig in an existing script language is an ongoing debate. I don't
> want to make a decision now that effectively ends that debate without
> buy in from those who feel strongly that Pig Latin should include
> those constructs.
>
> I agree with you that we should modify the logical plan to support
> this rather than add another layer. As for active development, the
> only thing I'm aware of is we hope to start working on a more robust
> optimizer for pig soon, and that will require some additional
> functionality out of the logical operators, but it shouldn't cause any
> fundamental architectural changes.
>
> Alan.
>
>
> On Feb 24, 2009, at 1:27 AM, pi song wrote:
>
>> (1) Lack of good documentation which makes it hard and time-consuming
>> to learn javacc and make changes to Pig grammar
>> <== ANTLR is very very well documented.
>> http://www.pragprog.com/titles/tpantlr/the-definitive-antlr-reference
>> http://media.pragprog.com/titles/tpantlr/toc.pdf
>> http://www.antlr.org/wiki/display/ANTLR3/ANTLR+3+Wiki+Home
>>
>> (2) No easy way to customize error handling and error messages
>> <== ANTLR has very extensive error handling support
>> http://media.pragprog.com/titles/tpantlr/errors.pdf
>>
>> (3) Single path that performs both tokenizing and parsing
>> <== What is the advantage of decoupling tokenizer and parsing ?
>>
>> In addition, "Composite Grammar" is very useful for keeping the parser
>> modular. Things that can be treated as sub-languages such as bag
>> schema definition can be done and unit tested separately.
>>
>> ANTLRWorks <http://www.antlr.org/works/index.html> also makes grammar
>> development very efficient. Think about IDE that helps you debug your
>> code (which is grammar).
>>
>> One question, is there any use case for branching and loops?
>> The current Pig is more like a query (declarative) language. I don't
>> really see how loop constructs would fit. I think what Ted mentioned
>> is more embedding Pig in other languages and use those languages to
>> do loops.
>>
>> We should think about how the logical plan layer can be made simpler
>> for external use so we don't have to introduce a new layer. Is there
>> any major active development on it? Currently I have more spare time
>> and should be able to help out. (BTW, I'm slow because this is just
>> my hobby. I don't want to drag you guys)
>>
>> Pi Song
>>
>> On Tue, Feb 24, 2009 at 6:23 AM, nitesh bhatia
>> <niteshbhatia...@gmail.com> wrote:
>>
>>> Hi
>>> I got this info from javacc mailing lists. This may prove helpful:
>>>
>>> ------------------------------------------------------------------------
>>> -----Original Message-----
>>> From: Ken Beesley [mailto:ken....@xrce.xerox.com]
>>> Sent: Wednesday, August 18, 2004 2:56 PM
>>> To: javacc
>>> Subject: [JavaCC] Alternatives to JavaCC (was Hello All)
>>>
>>> Vicas wrote:
>>>
>>> Hello All
>>>
>>> Kindly let me know other parsers available which does the same job as
>>> javacc.
>>>
>>> It would be very nice of you if you can send me some documentation
>>> related to this.
>>>
>>> Thanks Vikas
>>>
>>> (Correction and clarifications to the following would be _very_
>>> welcome. I'm very likely out of date.)
>>>
>>> Of course, no two software tools are likely to do _exactly_ the same
>>> job. Someone already pointed you to ANTLR, which is probably the
>>> best-known alternative to JavaCC. Another possibility is SableCC.
>>> http://sablecc.org
>>>
>>> The criteria include stability, documentation, language of the parser
>>> generated, and abstract-syntax-tree building.
>>>
>>> When I last looked (a couple of years ago) at ANTLR, SableCC and
>>> JavaCC, I chose JavaCC for the following reasons:
>>>
>>> 1. ANTLR could not handle Unicode input. Things change, of course, so
>>> ANTLR might now be more Unicode-friendly. Unicode was important to
>>> me, so this was a big factor in my decision.
>>>
>>> On the plus side for ANTLR, it has better abstract-syntax-tree
>>> building capabilities (in my opinion) than JJTree/JavaCC. You can
>>> learn to use JJTree commands, but it's not easy for most people.
>>>
>>> And ANTLR can generate either a Java or a C++ parser. JavaCC
>>> generates only Java parsers.
>>>
>>> Another concern about ANTLR was that it was reputed to change a lot
>>> as the guru, Terence Parr, experimented with new syntax and
>>> functionality. JavaCC, at least at the time, was reputed to be more
>>> stable, perhaps stable to a fault. I wanted stability and
>>> reliability.
>>>
>>> 2. SableCC is much like JavaCC; it generates a Java parser from a
>>> grammar description; but it had, in my opinion, less flexible
>>> abstract-syntax-tree building than JJTree/JavaCC. In SableCC (when I
>>> looked at it), the AST it built was always a direct reflection of
>>> your grammar, generating one tree node for each grammar expansion
>>> involved in a parse, much like using JavaCC with Java Tree Builder
>>> (JTB http://www.cs.purdue.edu/jtb/). When using JavaCC, JTB is the
>>> alternative to using JJTree.
>>>
>>> Using SableCC, or the combination JavaCC/JTB, should be _very_
>>> similar indeed.
>>>
>>> In my opinion, SableCC and JavaCC/JTB have made a conscious choice to
>>> simplify AST building--you get trees that reflect the expansions in
>>> your grammar. Period. But often these default trees will be big, full
>>> of extraneous nodes that reflect precedence hierarchies in the
>>> recursive-descent parsing.
>>> If you want to have more control over AST
>>> building, to get more compact and tailored ASTs, you need to pay the
>>> price of learning JJTree.
>>>
>>> Assuming that you need to build ASTs, with JavaCC you have the choice
>>> between JJTree and JTB. With SableCC, when I last looked at it, you
>>> only get the JTB-like option.
>>>
>>> *******
>>>
>>> (Again, corrections and expansions would be much appreciated.)
>>>
>>> Ken Beesley
>>>
>>> ------------------------------------------------------------------------
>>>
>>> On Mon, Feb 23, 2009 at 10:06 PM, Alan Gates <ga...@yahoo-inc.com>
>>> wrote:
>>>> We looked into antlr. It appears to be very similar to javacc,
>>>> with the added feature that the java code it generates is humanly
>>>> readable. That isn't why we want to switch off of javacc. Olga
>>>> listed the 3 things we want out of a parser that javacc isn't
>>>> giving us (lack of docs, no easy customization of error handling,
>>>> decoupling of scanning and parsing). So antlr doesn't look viable.
>>>>
>>>> In response to Pi's suggestion that we could use the logical plan,
>>>> I hope we could use something close to it.
>>>> Whatever we choose we want it to be flexible enough to represent
>>>> richer language constructs (like branch and loop). I'm not sure our
>>>> current logical plan can do that. At the same time, we don't need
>>>> another layer of translation (we already have logical -> physical ->
>>>> mapreduce). I would like to find a representation that could handle
>>>> expressing the syntax and what is currently the logical plan.
>>>>
>>>> Alan.
>>>>
>>>> On Feb 20, 2009, at 5:15 PM, pi song wrote:
>>>>
>>>>> Should be pretty close but we may need to cleanup the interface a
>>>>> bit. Then the new parser module can be switched in easily.
>>>>> BTW, have we already got the solution for the new parser generator?
>>>>>
>>>>> Pi
>>>>>
>>>>> On Fri, Feb 20, 2009 at 9:03 PM, Ted Dunning
>>>>> <ted.dunn...@gmail.com> wrote:
>>>>>
>>>>>> Probably nearly the same effect as you suggest. Are the concepts
>>>>>> at the logical plan layer similar to those expressed in pig
>>>>>> latin? Or has a significant transformation occurred by then?
>>>>>>
>>>>>> On Fri, Feb 20, 2009 at 1:59 AM, pi song <pi.so...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Sounds good but how about exposing the logical plan layer
>>>>>>> instead? Wouldn't that yield the same effect? From python for
>>>>>>> example you still can construct a logical plan and give to Pig
>>>>>>> to execute.
>>>>>>
>>>>>> --
>>>>>> Ted Dunning, CTO
>>>>>> DeepDyve
>>>
>>> --
>>> Nitesh Bhatia
>>> Dhirubhai Ambani Institute of Information & Communication Technology
>>> Gandhinagar
>>> Gujarat
>>>
>>> "Life is never perfect. It just depends where you draw the line."
>>>
>>> visit:
>>> http://www.awaaaz.com - connecting through music
>>> http://www.volstreet.com - lets volunteer for better tomorrow
>>> http://www.instibuzz.com - Voice opinions, Transact easily, Have fun