eigakow opened a new issue #1398: [SUPPORT] DeltaStreamer - NoClassDefFoundError for HiveDriver
URL: https://github.com/apache/incubator-hudi/issues/1398

**Describe the problem you faced**

Using DeltaStreamer with `--enable-hive-sync` throws a `java.lang.NoClassDefFoundError: org/apache/hive/jdbc/HiveDriver` error. Should I change something in the default compilation process to include this class?

**To Reproduce**

Steps to reproduce the behavior:

1. Properties file:
   ```
   hoodie.datasource.write.recordkey.field=ts
   hoodie.datasource.write.partitionpath.field=ts
   hoodie.deltastreamer.schemaprovider.source.schema.file=file:///home/director/me/hudi-0.5.1-incubating/schema.avro
   hoodie.deltastreamer.schemaprovider.target.schema.file=file:///home/director/me/hudi-0.5.1-incubating/schema.avro
   source-class=FR24JsonKafkaSource
   bootstrap.servers=streaming-kafka-broker-1:9092,streaming-kafka-broker-2:9092,streaming-kafka-broker-3:9092
   group.id=hudi_testing
   hoodie.deltastreamer.source.kafka.topic=fr-bru
   enable.auto.commit=false
   schemaprovider-class=org.apache.hudi.utilities.schema.FilebasedSchemaProvider
   auto.offset.reset=earliest
   hoodie.datasource.hive_sync.database=fr24raw
   hoodie.datasource.hive_sync.table=test_hudi
   hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://master-1.bigdatapoc.local:10000/default;principal=hive/master-1.bigdatapoc.local@BIGDATAPOC.LOCAL
   hoodie.datasource.hive_sync.assume_date_partitioning=true
   hoodie.datasource.hive_sync.useJdbc=false
   ```
2. Launch spark-submit with HoodieDeltaStreamer:
   ```
   spark-submit --master yarn \
     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
     --jars $(pwd)/../my-app-1-jar-with-dependencies.jar \
     $(pwd)/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.5.1-incubating.jar \
     --props hdfs:///tmp/hudi-fr24.properties \
     --target-base-path adl://XXX.azuredatalakestore.net/test-hudi \
     --table-type MERGE_ON_READ \
     --target-table test_hudi \
     --source-class FR24JsonKafkaSource \
     --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
     --enable-hive-sync \
     --continuous \
     --source-limit 100
   ```

**Expected behavior**

Sync to Hive works.

**Environment Description**

* Hudi version : hudi-0.5.1-incubating
* Spark version : 2.4.0-cdh6.1.0
* Hive version : 2.1.1-cdh6.1.0
* Hadoop version : 3.0.0-cdh6.1.0
* Storage (HDFS/S3/GCS..) : ADLS
* Running on Docker? (yes/no) : no

**Stacktrace**

```
0/03/11 16:04:47 INFO cluster.YarnScheduler: Removed TaskSet 37.0, whose tasks have all completed, from pool
20/03/11 16:04:47 INFO scheduler.DAGScheduler: ResultStage 37 (collect at HoodieMergeOnReadTableCompactor.java:208) finished in 0.679 s
20/03/11 16:04:47 INFO scheduler.DAGScheduler: Job 12 finished: collect at HoodieMergeOnReadTableCompactor.java:208, took 0.680344 s
20/03/11 16:04:47 INFO compact.HoodieMergeOnReadTableCompactor: Total of 0 compactions are retrieved
20/03/11 16:04:47 INFO compact.HoodieMergeOnReadTableCompactor: Total number of latest files slices 4
20/03/11 16:04:47 INFO compact.HoodieMergeOnReadTableCompactor: Total number of log files 0
20/03/11 16:04:47 INFO compact.HoodieMergeOnReadTableCompactor: Total number of file slices 4
20/03/11 16:04:47 WARN compact.HoodieMergeOnReadTableCompactor: After filtering, Nothing to compact for adl://ecintpocdl.azuredatalakestore.net/FlightRadar24/test-hudi3
20/03/11 16:04:47 INFO deltastreamer.DeltaSync: Syncing target hoodie table with hive table(test_hudi). Hive metastore URL :jdbc:hive2://master-1.bigdatapoc.local:10000/default;principal=hive/master-1.bigdatapoc.local@BIGDATAPOC.LOCAL, basePath :adl://XXX.azuredatalakestore.net/test-hudi
20/03/11 16:04:47 INFO deltastreamer.HoodieDeltaStreamer: Delta Sync shutdown. Error ?false
20/03/11 16:04:47 WARN deltastreamer.HoodieDeltaStreamer: Gracefully shutting down compactor
20/03/11 16:05:00 INFO deltastreamer.HoodieDeltaStreamer: Compactor shutting down properly!!
20/03/11 16:05:00 ERROR deltastreamer.AbstractDeltaStreamerService: Service shutdown with error
java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: org/apache/hive/jdbc/HiveDriver
	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
	at org.apache.hudi.utilities.deltastreamer.AbstractDeltaStreamerService.waitForShutdown(AbstractDeltaStreamerService.java:72)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:117)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:295)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoClassDefFoundError: org/apache/hive/jdbc/HiveDriver
	at org.apache.hudi.hive.HoodieHiveClient.<clinit>(HoodieHiveClient.java:80)
	at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:66)
	at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncHive(DeltaSync.java:481)
	at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:423)
	at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:238)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:393)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.hive.jdbc.HiveDriver
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 10 more
```
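The innermost `Caused by: java.lang.ClassNotFoundException` shows that the driver's class loader could not find `org.apache.hive.jdbc.HiveDriver`, and the error is raised in `HoodieHiveClient`'s static initializer (`<clinit>`) while `HiveSyncTool` is being constructed. Because a static initializer runs before any instance code, it appears that `hoodie.datasource.hive_sync.useJdbc=false` cannot prevent this load. To check whether a particular set of jars exposes the class, a small standalone probe can help. This is a minimal sketch: `HiveClasspathCheck` is a hypothetical helper, not part of Hudi or Spark. It calls `Class.forName` with `initialize=false`, so it only locates and loads the class and does not run its static initializer.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical diagnostic helper (not part of Hudi or Spark): reports whether
 * each given class name can be loaded by this program's class loader.
 */
public class HiveClasspathCheck {

    /** Returns true if the class can be located and loaded, without running its static initializer. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: load the class but do not execute its <clinit>
            Class.forName(className, false, HiveClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: the class is not on the classpath at all.
            // LinkageError (e.g. NoClassDefFoundError): the class file was found,
            // but a class it depends on (such as its superclass) is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the driver class named in the stack trace; other names may be passed as arguments.
        List<String> names = args.length > 0
                ? Arrays.asList(args)
                : Arrays.asList("org.apache.hive.jdbc.HiveDriver");
        for (String name : names) {
            System.out.println(name + " -> " + (isLoadable(name) ? "found" : "NOT found"));
        }
    }
}
```

One way to use it, assuming the same jars as the spark-submit command above: `javac HiveClasspathCheck.java`, then `java -cp .:<jars passed to --jars>:<hudi-utilities-bundle jar> HiveClasspathCheck`. This only approximates the driver classpath, because spark-submit also adds Spark's and the cluster's own jars. A "NOT found" result is therefore a hint, not proof.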