Thank you all for your rapid responses; some really useful information and pointers in there.
We'll keep the list updated with our progress. On 18 February 2018 at 19:00, Dharmesh Kakadia <dhkaka...@gmail.com> wrote: > +1 for using ParseDriver for this. I also have used it to intercept and > augment query AST. > > Also, I would echo others sentiment that its quite ugly. It would be great > if we can refactor/standardize this. That will make integrating other > system a lot easier. > > Thanks, > Dharmesh > > On Sat, Feb 17, 2018 at 12:07 AM, Furcy Pin <pin.fu...@gmail.com> wrote: > >> Hi Elliot, >> >> Actually, I have done quite similar work regarding Hive custom Parsing, >> you should have a look at my project: https://github.com/flaminem/flamy >> >> The Hive parsing related stuff is here: https://github.com/flami >> nem/flamy/tree/master/src/main/scala/com/flaminem/flamy/parsing/hive >> A good starting point to see how to parse queries is here: >> https://github.com/flaminem/flamy/blob/master/src/main/scala >> /com/flaminem/flamy/parsing/hive/PopulateParserInfo.scala#L492 >> >> >> Basically, all you need is to use a org.apache.hadoop.hive.ql.pars >> e.ParseDriver. >> >> val pd: ParseDriver = new ParseDriver >> val tree: ASTNode = pd.parse(query, hiveContext) >> >> You then get the ASTNode, that you can freely parse and change. >> Also, I must say that it is quite ugly to manipulate, and the Presto >> Parser seems to be much better designed (but it is not the same syntax, >> unfortunately), >> I recommend to look at it to get better design ideas. >> >> >> If you want to enrich your Hive syntax like I did (I wanted to be able to >> parse ${VARS} in queries), >> you will not be able to use the HiveParser without some workaround. >> What I did was replacing these ${VARS} by strings "${VARS}" that the >> HiveParser would agree to parse, >> and that I could recognize afterwards... >> >> Also, if you are familiar with Scala, I recommend using it, it helps a >> lot... >> >> For instance, I have this class that transforms an AST back into a string >> query: >> https://github.com/flaminem/flamy/blob/master/src/main/scala >> /com/flaminem/flamy/parsing/hive/ast/SQLFormatter.scala >> I could never have done something that good looking in Java... >> >> Finally this method helps a lot to understand how the hell the AST works: >> https://github.com/flaminem/flamy/blob/master/src/main/scala >> /com/flaminem/flamy/parsing/hive/HiveParserUtils.scala#L593 >> >> Make sure to write *tons* of unit tests too, you'll need them. >> >> Hope this helps, >> >> Furcy >> >> >> >> On 16 February 2018 at 21:20, Gopal Vijayaraghavan <gop...@apache.org> >> wrote: >> >>> >>> > However, ideally we wish to manipulate the original query as delivered >>> by the user (or as close to it as possible), and we’re finding that the >>> tree has been modified significantly by the time it hits the hook >>> >>> That's CBO. It takes the Query - > AST -> Calcite Tree -> AST -> hook - >>> the bushy join conversion is already done by the time the hook gets called. >>> >>> We need a Parser hook to hook it ahead of CBO, not a Semantic Analyzer >>> hook. >>> >>> > Additionally we wish to track back ASTNodes to the character sequences >>> in the source HQL that were their origin (where sensible), and ultimately >>> hope to be able regenerate the query text from the AST. >>> >>> I started work on a Hive-unparser a while back based on this class, but >>> it a world of verbose coding. >>> >>> https://github.com/apache/hive/blob/master/ql/src/java/org/a >>> pache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java#L850 >>> >>> If you're doing active work on this, I'd like to help, because I need >>> the AST -> query to debug CBO. >>> >>> > The use case, if you are interested, is a mutation testing framework >>> for HQL. The testing of mutants is operational, but now we need to report >>> on survivors, hence the need to track back from specific query elements to >>> character sequences in the original query string. >>> >>> This sounds a lot like the fuzzing random-query-gen used in Cloudera to >>> have Impala vs Hive bug-for-bug compat. >>> >>> https://cwiki.apache.org/confluence/download/attachments/273 >>> 62054/Random%20Query%20Gen-%20Hive%20Meetup.pptx >>> >>> Cheers, >>> Gopal >>> >>> >>> >> >