Re: [DISCUSS] Hyperspace + Hudi

2020-07-28 Thread Vinoth Chandar
Very informative. Thanks! On Mon, Jul 27, 2020 at 5:09 PM nishith agarwal wrote: > Yes. > > SparkSession has a reference to something called a SessionState here -> > > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L152 > > Each

Re: [DISCUSS] Hyperspace + Hudi

2020-07-27 Thread nishith agarwal
Yes. SparkSession has a reference to something called a SessionState here -> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L152 Each SessionState allows for a bunch of experimentalMethods for specific optimizations that you can plug
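To illustrate the SessionState hook described above, here is a toy model. In Spark itself (Scala), `SparkSession.experimental` exposes an `ExperimentalMethods` object. Its `extraOptimizations` (a `Seq[Rule[LogicalPlan]]`) and `extraStrategies` fields accept user-supplied planner rules. This sketch is a plain-Python stand-in, not Spark's API. The plan node classes, `transform_up`, `make_index_rule`, and the index-name mapping are all hypothetical. They only show how an appended rule could swap a filtered scan for an index lookup at planning time.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical toy logical-plan nodes (NOT Spark classes).
@dataclass(frozen=True)
class Scan:
    table: str

@dataclass(frozen=True)
class Filter:
    column: str
    value: object
    child: object

@dataclass(frozen=True)
class IndexScan:
    index: str
    column: str
    value: object

Rule = Callable[[object], object]

@dataclass
class ExperimentalMethods:
    # Loosely mirrors the idea of Spark's extraOptimizations: user-supplied rules.
    extra_optimizations: List[Rule] = field(default_factory=list)

def transform_up(plan: object, rule: Rule) -> object:
    """Apply `rule` to children first, then to the node itself (bottom-up)."""
    if isinstance(plan, Filter):
        plan = Filter(plan.column, plan.value, transform_up(plan.child, rule))
    return rule(plan)

@dataclass
class SessionState:
    experimental: ExperimentalMethods = field(default_factory=ExperimentalMethods)

    def optimize(self, plan: object) -> object:
        # Run every plugged-in rule over the whole plan, in registration order.
        for rule in self.experimental.extra_optimizations:
            plan = transform_up(plan, rule)
        return plan

def make_index_rule(indexes: Dict[Tuple[str, str], str]) -> Rule:
    """Hypothetical rule: replace Filter(col = v) over Scan(table) with an
    index lookup when an index exists on (table, col)."""
    def rule(plan: object) -> object:
        if isinstance(plan, Filter) and isinstance(plan.child, Scan):
            name = indexes.get((plan.child.table, plan.column))
            if name is not None:
                return IndexScan(name, plan.column, plan.value)
        return plan
    return rule

if __name__ == "__main__":
    state = SessionState()
    query = Filter("key", 42, Scan("hudi_table"))
    print(state.optimize(query))  # no rules registered: plan unchanged
    state.experimental.extra_optimizations.append(
        make_index_rule({("hudi_table", "key"): "key_idx"})
    )
    print(state.optimize(query))  # IndexScan(index='key_idx', column='key', value=42)
```

The extra rules live on the session state, not on any query. Registering one therefore changes planning for every later query in that session without rewriting the queries themselves. That is roughly the property that lets an index be picked up automatically once it exists.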

Re: [DISCUSS] Hyperspace + Hudi

2020-07-27 Thread Vinoth Chandar
Thanks Nishith! >>Plugs in at the time of spark query planning to allow for automatic indexing optimizations based on the created index This is very interesting. Could you expand more? One day, I'd love to support point(ish) lookups on Hudi tables :) On Mon, Jul 27, 2020 at 8:29 AM nishith agarwal

Re: [DISCUSS] Hyperspace + Hudi

2020-07-27 Thread nishith agarwal
Thanks Vinoth for kicking off this thread. I have also been looking into Hyperspace, and it is definitely an interesting project. On exploring the project, I found the following in addition to what you mentioned - Super easy to use, has a simple API to integrate into a Spark-based application -

[DISCUSS] Hyperspace + Hudi

2020-07-26 Thread Vinoth Chandar
Hello all, In case you have not followed it, Hyperspace is a new indexing subsystem for Spark from Microsoft. It seemed like a very interesting project, and I tried to explore whether it could help us with an indexing option inside Hudi. TL;DR : - Was exploring whether Hyperspace can be used as an alternative