[GitHub] [incubator-hudi] bvaradar commented on issue #915: [HUDI-268] Shade and relocate Avro dependency in hadoop-mr-bundle
bvaradar commented on issue #915: [HUDI-268] Shade and relocate Avro dependency in hadoop-mr-bundle URL: https://github.com/apache/incubator-hudi/pull/915#issuecomment-541204842 @umehrot2 : Trying to understand why this was closed. Is this no longer neeeded ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on issue #915: [HUDI-268] Shade and relocate Avro dependency in hadoop-mr-bundle
bvaradar commented on issue #915: [HUDI-268] Shade and relocate Avro dependency in hadoop-mr-bundle URL: https://github.com/apache/incubator-hudi/pull/915#issuecomment-537163855 > > @vinothchandar @bvaradar updated the PR > > > Just few nits.. But the integration test seems to fail? ideas? > > @vinothchandar I noticed an issue with this approach today, and probably thats why the integration tests are failing. > > When we will build with the profile `shade-avro` things would be file, but when we build without this profile it can cause other things to break. Because we have always added an `` for the avro dependency as well as `` for avro and are just trying to turn shading on or off using `scope`. However, when the scope is `provided`, due to our inclusion of `relocation` section all references to `org.apache.avro` in our code/jar would be replaced with `org.apache.hudi.org.apache.avro`. However, it will not actually find that relocated dependency because it has not actually been shaded/relocated because of the scope. This will cause runtime errors. > > Essentially, now I am not able to find a good way how to activate/deactivate the shading. Only way now I can think of is adding the whole `shade plugin` with all its contents and the `avro` inside the profile. But that seems like a bad approach to me. > > Have u guys achieved anything like this in the past ? @umehrot2 : This can be fixed by adding another mvn property. I have explained in the comments. Can you please try and see if it works. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on issue #915: [HUDI-268] Shade and relocate Avro dependency in hadoop-mr-bundle
bvaradar commented on issue #915: [HUDI-268] Shade and relocate Avro dependency in hadoop-mr-bundle URL: https://github.com/apache/incubator-hudi/pull/915#issuecomment-534552878 @umehrot2 : Unfortunately, the examples that I mentioned is not in the open source world. If you want to reproduce it, a high level direction would be 1. Implement HoodieRecordPayload class in a separate jar. (For testing : simply extend OverwriteWithLatestAvroPayload.java) 2. If you want to understand, how this is used in reader : See here : https://github.com/apache/incubator-hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordScanner.java#L109 3. This is a quick and dirty way to repro this : You can simply use the docker based demo steps as described in https://hudi.incubator.apache.org/docker_demo.html but have the following changes: (a) Pass --payload-class for all delta-streamer invocations with the class that you created in (1) (b) After you ingest the data once using deltastreamer, you can simply add an entry to hoodie.properties file in the dataset .hoodie folder -> Add "hoodie.compaction.payload.class = (c) Continue the steps. (d) In step (6a) where you query Realtime table, you should see the class getting instantiated This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on issue #915: [HUDI-268] Shade and relocate Avro dependency in hadoop-mr-bundle
bvaradar commented on issue #915: [HUDI-268] Shade and relocate Avro dependency in hadoop-mr-bundle URL: https://github.com/apache/incubator-hudi/pull/915#issuecomment-534322614 @umehrot2 Shading Avro will cause some Realtime Table use-cases to break. This was one of the reasons why we ended up not using this approach. Hudi allows for pluggable record payload implementations. HoodieRecordPayload (a public facing interface) exposes Avro GenericRecord types as part of its interface. We have some deployments where custom implementations of this interface (which resides in different jar) is used to perform on-the-fly merges for Realtime Table reading. If we make shading avro as mandatory for hoodie-hadoop-mr-table, then these plugins also needs to shade avro in the same way. I think keeping the bundling optional (default = not bundle) would be better. @vinothchandar : Assuming this is a one-off case, If it makes things easier for everyone, would it make sense to publish a new jar type (hudi-hadoop-mr-bundle-with-avro-shaded) along with hudi-hadoop-mr-bundle ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services