Re: Unbundling "spark-avro" dependency

2022-03-08 Thread Sivabalan
I too second that we should keep the same behavior for existing users. But I would like to get some clarity on the path towards unbundling spark-avro. Or are we always going to have only bundled artifacts (Hudi Spark bundle with spark-avro) in Maven and, for the unbundled version, we are going to

Re: Unbundling "spark-avro" dependency

2022-03-08 Thread Y Ethan Guo
Thanks for raising the discussion. I agree that, from a user-facing usability standpoint, we should keep the same expectations regarding "--packages" for Spark and the reliance on bundled spark-avro for the utilities bundle in this release. Given that there are Spark API changes between 3.2.0 and

Re: Unbundling "spark-avro" dependency

2022-03-08 Thread Vinoth Chandar
Thanks Alexey. This has actually been the case for a while now, I think. From what I can see, our quickstart for Spark still suggests passing spark-avro in via --packages, but the utilities bundle examples rely on the fact that it is pre-bundled. I do acknowledge that with recent Spark

Unbundling "spark-avro" dependency

2022-03-08 Thread Alexey Kudinkin
Hello, everyone! While working on HUDI-3549, we were surprised to discover that Hudi actually bundles the "spark-avro" dependency *by default*. This is problematic because "spark-avro" is tightly coupled with some of the other Spark components making up