PutHiveStreaming is really the only thing that should cause this, because you're bypassing Hive and writing directly to the file system. The JDBC driver itself isn't supposed to have external dependencies beyond the basic Hive ones. I've used the default Hive processors to connect to AWS EMR Hive, EMR Spark, and AWS Athena without any additional jars on 1.10.0.
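(Shawn's suggestion further down the thread, running the same statements through beeline, is a quick way to confirm whether the failure comes from the HiveServer2 side rather than from NiFi. A minimal sketch; the JDBC URL is a placeholder, not taken from this thread:)

```shell
# Sanity check outside NiFi: run a statement through beeline against the same
# HiveServer2. Replace the placeholder URL with your real host/port/database.
HS2_URL="jdbc:hive2://hiveserver2-host:10000/default"

if command -v beeline >/dev/null 2>&1; then
  # If this fails with the same ClassNotFoundException, the problem is in the
  # Hive/Spark side configuration, not in NiFi's PutHiveQL.
  beeline -u "$HS2_URL" -e "SHOW TABLES;" \
    && MSG="beeline connected to $HS2_URL" \
    || MSG="beeline present but could not connect to $HS2_URL"
else
  MSG="beeline not on PATH; run this from a node with the Hive client installed"
fi
echo "$MSG"
```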
Thanks,
Shawn

From: Matt Burgess <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, January 23, 2020 at 4:44 PM
To: "[email protected]" <[email protected]>
Subject: Re: ClassNotFound exception

That's a good point Shawn, I'd seen similar issues (which is where the Jira came from) for PutHive3Streaming, which doesn't use the JDBC driver. Juan, you might want to send this to the Hive users list as well; perhaps they have more insight as to why filesystem-specific stuff is happening on the Hive JDBC client side.

Thanks,
Matt

On Thu, Jan 23, 2020 at 5:39 PM Shawn Weeks <[email protected]> wrote:

I'm pretty sure that exception is coming from Hive and not NiFi. I'm really struggling to see why the Hive JDBC driver needs any understanding of storage when it's just sending Thrift messages to HiveServer2. Are you able to run these queries through beeline?

Thanks

From: Matt Burgess <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, January 23, 2020 at 2:33 PM
To: "[email protected]" <[email protected]>
Subject: Re: ClassNotFound exception

Juan,

I'm not sure if NIFI-6912 [1] will help or not, but the imminent 1.11.0 release will have extra JARs (ADLS, Azure, AWS, etc.) in the Hive 3 bundle. Since you're using the Hive 1 bundle, to include such dependencies you'd have to add them to nifi-hive-nar/pom.xml and build a custom NAR manually.

Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-6912

On Thu, Jan 23, 2020 at 11:58 AM FABIAN Juan-antonio <[email protected]> wrote:

Hi,

I've migrated one flow from using HDFS to using Minio as the storage layer. Basically I deleted the PutHDFS processors, and now I rely only on PutS3Object. After writing to Minio, I got stuck with PutHiveQL.
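(For reference, Matt's "add them to nifi-hive-nar/pom.xml" route would mean declaring the ADLS client jars as Maven dependencies before rebuilding the NAR. The coordinates and versions below are illustrative guesses taken from the jar names later in the thread, and the exact module that should carry them may differ; align versions with your Hadoop distribution:)

```xml
<!-- Sketch of extra dependencies for the NiFi Hive 1 bundle (nifi-hive-nar/pom.xml).
     groupId/artifactId/version values are illustrative, not from NiFi's actual pom. -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-azure-datalake</artifactId>
    <version>2.7.3.2.6.5.10-2</version>
</dependency>
<dependency>
    <groupId>com.microsoft.azure</groupId>
    <artifactId>azure-data-lake-store-sdk</artifactId>
    <version>2.2.5</version>
</dependency>
```

After editing, rebuilding with `mvn clean install` under the bundle directory and replacing the NAR in NiFi's lib directory should pick the jars up.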
To make it work, I had to set up a HiveConnectionPool. I'm using the provided hive-site.xml so that NiFi is aware of our Hive metastore. Just so you know, I started a NiFi cluster somewhere else and configured git as the flow provider, so I edited the "old" flow inside the "new" NiFi deployment. When deploying NiFi, I manually copied the hive-site.xml to a known location, and it's apparently working fine. Even though there's no PutHDFS in the "new" NiFi flow, I'm getting this error from the PutHiveQL processor:

2020-01-23 16:46:09,402 ERROR [Timer-Driven Process Thread-7] o.apache.nifi.processors.hive.PutHiveQL PutHiveQL[id=bff6add0-cdbc-3c2f-b79f-b3a2438139c1] Failed to process session due to org.apache.nifi.processor.exception.ProcessException: Failed to process StandardFlowFileRecord[uuid=3a089972-3b10-46bc-a880-bbf9cb322a6b,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1579621741830-937, container=default, section=937], offset=852885, length=176],offset=0,name=3441009295635454,size=176] due to java.sql.SQLException: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.adl.AdlFileSystem not found);:
org.apache.nifi.processor.exception.ProcessException: Failed to process StandardFlowFileRecord[uuid=3a089972-3b10-46bc-a880-bbf9cb322a6b,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1579621741830-937, container=default, section=937], offset=852885, length=176],offset=0,name=3441009295635454,size=176] due to java.sql.SQLException: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.adl.AdlFileSystem not found);
org.apache.nifi.processor.exception.ProcessException: Failed to process StandardFlowFileRecord[uuid=3a089972-3b10-46bc-a880-bbf9cb322a6b,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1579621741830-937, container=default, section=937], offset=852885, length=176],offset=0,name=3441009295635454,size=176] due to java.sql.SQLException: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.adl.AdlFileSystem not found);
    at org.apache.nifi.processor.util.pattern.ExceptionHandler.lambda$createOnGroupError$2(ExceptionHandler.java:226)
    at org.apache.nifi.processor.util.pattern.ExceptionHandler.lambda$createOnError$1(ExceptionHandler.java:179)
    at org.apache.nifi.processor.util.pattern.ExceptionHandler$OnError.lambda$andThen$0(ExceptionHandler.java:54)
    at org.apache.nifi.processor.util.pattern.ExceptionHandler$OnError.lambda$andThen$0(ExceptionHandler.java:54)
    at org.apache.nifi.processor.util.pattern.ExceptionHandler.execute(ExceptionHandler.java:148)
    at org.apache.nifi.processors.hive.PutHiveQL.lambda$new$4(PutHiveQL.java:223)
    at org.apache.nifi.processor.util.pattern.Put.putFlowFiles(Put.java:59)
    at org.apache.nifi.processor.util.pattern.Put.onTrigger(Put.java:102)
    at org.apache.nifi.processors.hive.PutHiveQL.lambda$onTrigger$6(PutHiveQL.java:289)
    at org.apache.nifi.processor.util.pattern.PartialFunctions.onTrigger(PartialFunctions.java:114)
    at org.apache.nifi.processor.util.pattern.RollbackOnFailure.onTrigger(RollbackOnFailure.java:184)
    at org.apache.nifi.processors.hive.PutHiveQL.onTrigger(PutHiveQL.java:289)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
    at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.SQLException: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.adl.AdlFileSystem not found);
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:296)
    at org.apache.hive.jdbc.HivePreparedStatement.execute(HivePreparedStatement.java:98)
    at org.apache.commons.dbcp.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:172)
    at org.apache.commons.dbcp.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:172)
    at org.apache.nifi.processors.hive.PutHiveQL.lambda$null$3(PutHiveQL.java:251)
    at org.apache.nifi.processor.util.pattern.ExceptionHandler.execute(ExceptionHandler.java:127)
    ... 17 common frames omitted

Somewhere on the Internet I read about some jars being needed, so I copied them to /usr/lib/hdinsight-datalake. Since none of my processors need them, I think that was not necessary (I'm not manually referencing them). Either way, the flow is still not working. The jars I copied are:

adls2-oauth2-token-provider-1.0.jar
hadoop-azure-datalake-2.7.3.2.6.5.10-2.jar
okhttp-2.7.5.jar
azure-data-lake-store-sdk-2.2.5.jar
jackson-core-2.7.8.jar
okio-1.6.0.jar

Any help or insight is really appreciated; I'm just starting my NiFi journey.

Best,

Juan A. Fabián Simón
Data Engineer
Alstom
Calle Martínez Villergas 49, ed.
V - 28027 Madrid - Spain
Office: +34 91 384 89 00
Email: [email protected]
www.alstom.com
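(One possible answer to Shawn's puzzle of why org.apache.hadoop.fs.adl.AdlFileSystem gets resolved at all: if the hive-site.xml carried over from the old environment, a core-site.xml it pulls in, or a table/database LOCATION registered in the metastore still points at an adl:// URI, Hadoop's FileSystem loader will try to instantiate that class and fail when the connector jar is absent. Whether such leftover ADLS wiring is actually present in this setup is an assumption, but the standard Hadoop keys involved would look like this, and removing stale entries may be simpler than shipping extra jars:)

```xml
<!-- Hypothetical leftover ADLS configuration (core-site.xml / hive-site.xml).
     These are the standard Hadoop keys mapping the adl:// scheme to
     org.apache.hadoop.fs.adl.AdlFileSystem; their presence here is a guess. -->
<property>
  <name>fs.adl.impl</name>
  <value>org.apache.hadoop.fs.adl.AdlFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.adl.impl</name>
  <value>org.apache.hadoop.fs.adl.Adl</value>
</property>
```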
