[GitHub] [incubator-hudi] bvaradar commented on issue #915: [HUDI-268] Shade and relocate Avro dependency in hadoop-mr-bundle

2019-10-11 Thread GitBox
bvaradar commented on issue #915: [HUDI-268] Shade and relocate Avro dependency 
in hadoop-mr-bundle
URL: https://github.com/apache/incubator-hudi/pull/915#issuecomment-541204842
 
 
   @umehrot2 : Trying to understand why this was closed. Is this no longer 
neeeded ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] bvaradar commented on issue #915: [HUDI-268] Shade and relocate Avro dependency in hadoop-mr-bundle

2019-10-01 Thread GitBox
bvaradar commented on issue #915: [HUDI-268] Shade and relocate Avro dependency 
in hadoop-mr-bundle
URL: https://github.com/apache/incubator-hudi/pull/915#issuecomment-537163855
 
 
   > > @vinothchandar @bvaradar updated the PR
   > 
   > > Just few nits.. But the integration test seems to fail? ideas?
   > 
   > @vinothchandar I noticed an issue with this approach today, and probably 
thats why the integration tests are failing.
   > 
   > When we will build with the profile `shade-avro` things would be file, but 
when we build without this profile it can cause other things to break. Because 
we have always added an `` for the avro dependency as well as 
`` for avro and are just trying to turn shading on or off using 
`scope`. However, when the scope is `provided`, due to our inclusion of 
`relocation` section all references to `org.apache.avro` in our code/jar would 
be replaced with `org.apache.hudi.org.apache.avro`. However, it will not 
actually find that relocated dependency because it has not actually been 
shaded/relocated because of the scope. This will cause runtime errors.
   > 
   > Essentially, now I am not able to find a good way how to 
activate/deactivate the shading. Only way now I can think of is adding the 
whole `shade plugin` with all its contents and the `avro` inside the profile. 
But that seems like a bad approach to me.
   > 
   > Have u guys achieved anything like this in the past ?
   
   @umehrot2 : This can be fixed by adding another mvn property. I have 
explained in the comments. Can you please try and see if it works.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] bvaradar commented on issue #915: [HUDI-268] Shade and relocate Avro dependency in hadoop-mr-bundle

2019-09-24 Thread GitBox
bvaradar commented on issue #915: [HUDI-268] Shade and relocate Avro dependency 
in hadoop-mr-bundle
URL: https://github.com/apache/incubator-hudi/pull/915#issuecomment-534552878
 
 
   
   
   @umehrot2 : Unfortunately, the examples that I mentioned is not in the open 
source world. If you want to reproduce it, a high level direction would be
   
   1. Implement HoodieRecordPayload class in a separate jar. (For testing : 
simply extend OverwriteWithLatestAvroPayload.java)
   2. If you want to understand, how this is used in reader : See here : 
https://github.com/apache/incubator-hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordScanner.java#L109
   3. This is a quick and dirty way to repro this : You can simply use the 
docker based demo steps as described in 
https://hudi.incubator.apache.org/docker_demo.html but have the following 
changes:
 (a) Pass --payload-class  for all delta-streamer invocations with the 
class that you created in (1)
 (b) After you ingest the data once using deltastreamer, you can simply add 
an entry to hoodie.properties file in the dataset .hoodie folder -> Add 
"hoodie.compaction.payload.class = 

 (c) Continue the steps.
 (d) In step (6a) where you query Realtime table, you should see the class 
getting instantiated
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] bvaradar commented on issue #915: [HUDI-268] Shade and relocate Avro dependency in hadoop-mr-bundle

2019-09-23 Thread GitBox
bvaradar commented on issue #915: [HUDI-268] Shade and relocate Avro dependency 
in hadoop-mr-bundle
URL: https://github.com/apache/incubator-hudi/pull/915#issuecomment-534322614
 
 
   @umehrot2  Shading Avro will cause some Realtime Table use-cases to break. 
This was one of the reasons why we ended up not using this approach. 
   
   Hudi allows for pluggable record payload implementations. 
HoodieRecordPayload (a public facing interface) exposes Avro GenericRecord 
types as part of its interface. 
   
   We have some deployments where custom implementations of this interface 
(which resides in different jar) is used to perform on-the-fly merges for 
Realtime Table reading.  If we make shading avro as mandatory for 
hoodie-hadoop-mr-table, then these plugins also needs to shade avro in the same 
way. 
   
   I think keeping the bundling optional (default = not bundle) would be 
better.  
   
   @vinothchandar : Assuming this is a one-off case, If it makes things easier 
for everyone, would it make sense to publish a new jar type 
(hudi-hadoop-mr-bundle-with-avro-shaded) along with hudi-hadoop-mr-bundle ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services