Re: HQL parser internals

Dharmesh Kakadia Sun, 18 Feb 2018 11:01:14 -0800

+1 for using ParseDriver for this. I also have used it to intercept and
augment query AST.


Also, I would echo others sentiment that its quite ugly. It would be great
if we can refactor/standardize this. That will make integrating other
system a lot easier.

Thanks,
Dharmesh

On Sat, Feb 17, 2018 at 12:07 AM, Furcy Pin <pin.fu...@gmail.com> wrote:

> Hi Elliot,
>
> Actually, I have done quite similar work regarding Hive custom Parsing,
> you should have a look at my project: https://github.com/flaminem/flamy
>
> The Hive parsing related stuff is here: https://github.com/
> flaminem/flamy/tree/master/src/main/scala/com/flaminem/flamy/parsing/hive
> A good starting point to see how to parse queries is here:
> https://github.com/flaminem/flamy/blob/master/src/main/
> scala/com/flaminem/flamy/parsing/hive/PopulateParserInfo.scala#L492
>
>
> Basically, all you need is to use a org.apache.hadoop.hive.ql.
> parse.ParseDriver.
>
> val pd: ParseDriver = new ParseDriver
> val tree: ASTNode = pd.parse(query, hiveContext)
>
> You then get the ASTNode, that you can freely parse and change.
> Also, I must say that it is quite ugly to manipulate, and the Presto
> Parser seems to be much better designed (but it is not the same syntax,
> unfortunately),
> I recommend to look at it to get better design ideas.
>
>
> If you want to enrich your Hive syntax like I did (I wanted to be able to
> parse ${VARS} in queries),
> you will not be able to use the HiveParser without some workaround.
> What I did was replacing these ${VARS} by strings "${VARS}" that the
> HiveParser would agree to parse,
> and that I could recognize afterwards...
>
> Also, if you are familiar with Scala, I recommend using it, it helps a
> lot...
>
> For instance, I have this class that transforms an AST back into a string
> query:
> https://github.com/flaminem/flamy/blob/master/src/main/
> scala/com/flaminem/flamy/parsing/hive/ast/SQLFormatter.scala
> I could never have done something that good looking in Java...
>
> Finally this method helps a lot to understand how the hell the AST works:
> https://github.com/flaminem/flamy/blob/master/src/main/
> scala/com/flaminem/flamy/parsing/hive/HiveParserUtils.scala#L593
>
> Make sure to write *tons* of unit tests too, you'll need them.
>
> Hope this helps,
>
> Furcy
>
>
>
> On 16 February 2018 at 21:20, Gopal Vijayaraghavan <gop...@apache.org>
> wrote:
>
>>
>> > However, ideally we wish to manipulate the original query as delivered
>> by the user (or as close to it as possible), and we’re finding that the
>> tree has been modified significantly by the time it hits the hook
>>
>> That's CBO. It takes the Query - > AST -> Calcite Tree -> AST -> hook -
>> the bushy join conversion is already done by the time the hook gets called.
>>
>> We need a Parser hook to hook it ahead of CBO, not a Semantic Analyzer
>> hook.
>>
>> > Additionally we wish to track back ASTNodes to the character sequences
>> in the source HQL that were their origin (where sensible), and ultimately
>> hope to be able regenerate the query text from the AST.
>>
>> I started work on a Hive-unparser a while back based on this class, but
>> it a world of verbose coding.
>>
>> https://github.com/apache/hive/blob/master/ql/src/java/org/
>> apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java#L850
>>
>> If you're doing active work on this, I'd like to help, because I need the
>> AST -> query to debug CBO.
>>
>> > The use case, if you are interested, is a mutation testing framework
>> for HQL. The testing of mutants is operational, but now we need to report
>> on survivors, hence the need to track back from specific query elements to
>> character sequences in the original query string.
>>
>> This sounds a lot like the fuzzing random-query-gen used in Cloudera to
>> have Impala vs Hive bug-for-bug compat.
>>
>> https://cwiki.apache.org/confluence/download/attachments/
>> 27362054/Random%20Query%20Gen-%20Hive%20Meetup.pptx
>>
>> Cheers,
>> Gopal
>>
>>
>>
>

Re: HQL parser internals

Reply via email to