hikiyoung opened a new issue #1499: [SUPPORT] DeltaStreamer - NoClassDefFoundError for HiveDriver
URL: https://github.com/apache/incubator-hudi/issues/1499

**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)?
- Join the mailing list to engage in conversations and get faster support at dev-subscr...@hudi.apache.org.
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.

**Describe the problem you faced**

Running DeltaStreamer with `--enable-hive-sync` throws `NoSuchMethodError: org.apache.hadoop.hive.ql.metadata.Hive.get(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/metadata/Hive;`. Should I change something in the default compilation process to include this class?

**To Reproduce**

Steps to reproduce the behavior:

1. Properties file

```
include=base.properties
hoodie.datasource.write.recordkey.field=ORDERNUMBER
hoodie.datasource.write.partitionpath.field=PARTITIONPATH
hoodie.datasource.hive_sync.assume_date_partitioning=false
hoodie.deltastreamer.schemaprovider.source.schema.file=file:///home/hadoop/hudi/config/orders_hudi_schema.avro
hoodie.deltastreamer.schemaprovider.target.schema.file=file:///home/hadoop/hudi/config/orders_hudi_schema.avro
hoodie.deltastreamer.source.kafka.topic=orders_hudi_v1
bootstrap.servers=kafka-broker-1:9092
auto.offset.reset=smallest
hoodie.datasource.hive_sync.database=hudi
hoodie.datasource.hive_sync.table=orders_hudi_cow
hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://localhost:10000
hoodie.datasource.hive_sync.username=hive
hoodie.datasource.hive_sync.password=hive
hoodie.datasource.hive_sync.partition_fields=PARTITIONPATH
ionValueExtractor
```

2. Launch script with HoodieDeltaStreamer

```
TARGET_DATABASE="hudi"
TRAGET_TABLE="orders_hudi"
HUDI_UTILITIES_BUNDLE="file:///usr/lib/hudi/hudi-utilities-bundle.jar"
TARGET_BASE_PATH="s3://data-store/$TARGET_DATABASE/$TRAGET_TABLE"
PROPS="file:///home/hadoop/hudi/config/kafka-source.properties"
CHECKPOINT_BASE_PATH="s3://data-store/checkpoint/$TARGET_DATABASE/$TRAGET_TABLE"

spark-submit \
  --conf 'spark.jars=/usr/lib/hudi/hudi-hadoop-mr-bundle.jar,/usr/lib/hudi/hudi-hive-bundle.jar,/usr/lib/hudi/hudi-presto-bundle.jar,/usr/lib/hudi/hudi-spark-bundle.jar,/usr/lib/hudi/hudi-timeline-server-bundle.jar' \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --master yarn \
  --deploy-mode client \
  --jars /usr/lib/spark/jars/httpclient-4.5.9.jar,/usr/lib/hudi/hudi-spark-bundle.jar,/usr/lib/spark/external/lib/spark-avro.jar \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
  --storage-type MERGE_ON_READ \
  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
  --target-base-path $TARGET_BASE_PATH \
  --target-table "$TARGET_DATABASE.$TRAGET_TABLE" \
  --source-ordering-field UPDATEDATE \
  --enable-hive-sync \
  --continuous \
  --props $PROPS \
  --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
```

**Expected behavior**

The table is synced to Hive.

**Environment Description**

EMR 2.59.0

* Hudi version : 0.5.0-inc
* Spark version : 2.4.4
* Hive version : 2.3.6
* Hadoop version : 2.8.5
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no

**Additional context**

Add any other context about the problem here.

**Stacktrace**

```
20/04/08 17:54:22 INFO YarnScheduler: Removed TaskSet 39.0, whose tasks have all completed, from pool
20/04/08 17:54:22 INFO DAGScheduler: ResultStage 39 (collect at HoodieRealtimeTableCompactor.java:200) finished in 3.432 s
20/04/08 17:54:22 INFO DAGScheduler: Job 13 finished: collect at HoodieRealtimeTableCompactor.java:200, took 3.436397 s
20/04/08 17:54:22 INFO MultipartUploadOutputStream: close closed:false s3://data-store/hudi/orders_hudi_cow/.hoodie/.aux/20200408175418.compaction.requested
20/04/08 17:54:22 INFO MultipartUploadOutputStream: close closed:false s3://data-store/hudi/orders_hudi_cow/.hoodie/.aux/20200408175418.compaction.requested
20/04/08 17:54:22 INFO MultipartUploadOutputStream: close closed:false s3://data-store/hudi/orders_hudi_cow/.hoodie/20200408175418.compaction.requested
20/04/08 17:54:22 INFO MultipartUploadOutputStream: close closed:false s3://data-store/hudi/orders_hudi_cow/.hoodie/20200408175418.compaction.requested
20/04/08 17:54:22 INFO S3NativeFileSystem: Opening 's3://data-store/hudi/orders_hudi_cow/.hoodie/hoodie.properties' for reading
20/04/08 17:49:45 INFO Utils: Supplied authorities: localhost:10000
20/04/08 17:49:45 INFO Utils: Resolved authority: localhost:10000
20/04/08 17:49:45 INFO HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://localhost:10000
20/04/08 17:49:46 WARN AbstractDeltaStreamerService: Gracefully shutting down compactor
20/04/08 17:50:27 ERROR AbstractDeltaStreamerService: Service shutdown with error
java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.metadata.Hive.get(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/metadata/Hive;
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
    at org.apache.hudi.utilities.deltastreamer.AbstractDeltaStreamerService.waitForShutdown(AbstractDeltaStreamerService.java:70)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:116)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:292)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.metadata.Hive.get(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/metadata/Hive;
    at org.apache.hudi.hive.HoodieHiveClient.<init>(HoodieHiveClient.java:111)
    at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:60)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncHive(DeltaSync.java:440)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:382)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:226)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:390)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
20/04/08 17:50:27 ERROR AbstractDeltaStreamerService: Monitor noticed one or more threads failed. Requesting graceful shutdown of other threads
java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.metadata.Hive.get(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/metadata/Hive;
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
    at org.apache.hudi.utilities.deltastreamer.AbstractDeltaStreamerService.lambda$monitorThreads$0(AbstractDeltaStreamerService.java:134)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.metadata.Hive.get(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/metadata/Hive;
    at org.apache.hudi.hive.HoodieHiveClient.<init>(HoodieHiveClient.java:111)
    at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:60)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncHive(DeltaSync.java:440)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:382)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:226)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:390)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    ... 3 more
20/04/08 17:50:27 INFO Javalin: Stopping Javalin ...
20/04/08 17:50:27 INFO SparkUI: Stopped Spark web UI at http://xxx.yyy.compute.internal:4040
20/04/08 17:50:27 INFO Javalin: Javalin has stopped
20/04/08 17:50:27 INFO YarnClientSchedulerBackend: Interrupting monitor thread
20/04/08 17:50:27 INFO YarnClientSchedulerBackend: Shutting down all executors
20/04/08 17:50:27 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
20/04/08 17:50:27 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices (serviceOption=None, services=List(), started=false)
20/04/08 17:50:27 INFO YarnClientSchedulerBackend: Stopped
20/04/08 17:50:27 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/04/08 17:50:27 INFO MemoryStore: MemoryStore cleared
20/04/08 17:50:27 INFO BlockManager: BlockManager stopped
20/04/08 17:50:27 INFO BlockManagerMaster: BlockManagerMaster stopped
20/04/08 17:50:27 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/04/08 17:50:27 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.metadata.Hive.get(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/metadata/Hive;
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
    at org.apache.hudi.utilities.deltastreamer.AbstractDeltaStreamerService.waitForShutdown(AbstractDeltaStreamerService.java:70)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:116)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:292)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.metadata.Hive.get(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/metadata/Hive;
    at org.apache.hudi.hive.HoodieHiveClient.<init>(HoodieHiveClient.java:111)
    at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:60)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncHive(DeltaSync.java:440)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:382)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:226)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:390)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```
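
**Reading the method descriptor**

The `NoSuchMethodError` message uses the JVM's internal descriptor syntax, which is hard to read at a glance. The sketch below decodes it into ordinary class names. `DescriptorDecoder` is a hypothetical helper written for this report, not part of Hudi or Hive. For the descriptor in the stacktrace, the parameter type decodes to a `HiveConf` under an `org.apache.hudi.` prefix, while the `Hive` class is in the unprefixed `org.apache.hadoop.hive` package. That might point to a relocated (shaded) `HiveConf` being passed to an unshaded `Hive` class, but this is an assumption and has not been confirmed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Hudi or Hive): decodes JVM method descriptors. */
final class DescriptorDecoder {

    private DescriptorDecoder() {}

    /** Returns the parameter types of a method descriptor as Java-style type names. */
    static List<String> parameterTypes(String descriptor) {
        int close = descriptor.indexOf(')');
        if (!descriptor.startsWith("(") || close < 0) {
            throw new IllegalArgumentException("Not a method descriptor: " + descriptor);
        }
        List<String> types = new ArrayList<>();
        int i = 1;
        while (i < close) {
            int start = i;
            while (descriptor.charAt(i) == '[') {
                i++; // skip array dimensions
            }
            if (descriptor.charAt(i) == 'L') {
                int semi = descriptor.indexOf(';', i);
                if (semi < 0 || semi > close) {
                    throw new IllegalArgumentException("Unterminated class name: " + descriptor);
                }
                i = semi + 1; // object type runs up to and including ';'
            } else {
                i++; // primitive types are a single character
            }
            types.add(toJavaName(descriptor.substring(start, i)));
        }
        return types;
    }

    /** Returns the return type of a method descriptor as a Java-style type name. */
    static String returnType(String descriptor) {
        return toJavaName(descriptor.substring(descriptor.indexOf(')') + 1));
    }

    /** Converts one field descriptor, e.g. "[I" or "Ljava/lang/String;", to a Java-style name. */
    static String toJavaName(String field) {
        int dims = 0;
        while (dims < field.length() && field.charAt(dims) == '[') {
            dims++;
        }
        String base = field.substring(dims);
        if (base.isEmpty()) {
            throw new IllegalArgumentException("Empty type: " + field);
        }
        String name;
        switch (base.charAt(0)) {
            case 'L': name = base.substring(1, base.length() - 1).replace('/', '.'); break;
            case 'Z': name = "boolean"; break;
            case 'B': name = "byte"; break;
            case 'C': name = "char"; break;
            case 'S': name = "short"; break;
            case 'I': name = "int"; break;
            case 'J': name = "long"; break;
            case 'F': name = "float"; break;
            case 'D': name = "double"; break;
            case 'V': name = "void"; break;
            default: throw new IllegalArgumentException("Unknown type: " + field);
        }
        StringBuilder sb = new StringBuilder(name);
        for (int d = 0; d < dims; d++) {
            sb.append("[]");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Descriptor copied from the NoSuchMethodError in the stacktrace above.
        String d = "(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)"
                + "Lorg/apache/hadoop/hive/ql/metadata/Hive;";
        System.out.println("params:  " + parameterTypes(d));
        System.out.println("returns: " + returnType(d));
    }
}
```

Comparing the decoded parameter type with the signatures actually present on the classpath (for example with `javap -cp <hive-exec jar> org.apache.hadoop.hive.ql.metadata.Hive`) would show whether a matching `get` overload exists.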