Thanks Wei-Chiu. Your clue helped me find the root cause: the Spark 2.3.1 jars from Maven are built against Hadoop 2. I solved the problem by using Spark 2.3.1 (built with Hadoop 3) grabbed from the HDP3 cluster.
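
In case anyone else hits the same mismatch, here is a minimal sketch (an illustration, not the exact check I ran) of how to confirm which Hadoop version the jars on Spark's classpath were compiled against. It assumes the hadoop-common jar bundled with the Spark build is on the classpath:

    // Minimal sketch: print the Hadoop version baked into the jars on the classpath.
    // A Spark distribution built against Hadoop 2 reports 2.x here even when the
    // cluster itself runs Hadoop 3, which is exactly the mismatch described above.
    import org.apache.hadoop.util.VersionInfo

    object HadoopVersionCheck {
      def main(args: Array[String]): Unit = {
        println(s"Hadoop version on the classpath: ${VersionInfo.getVersion}")
        println(s"Compiled by ${VersionInfo.getUser} on ${VersionInfo.getDate}")
      }
    }

The same check can be pasted straight into spark-shell as a one-liner: println(org.apache.hadoop.util.VersionInfo.getVersion).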
On Thu, Aug 30, 2018 at 9:44 AM Wei-Chiu Chuang <weic...@cloudera.com> wrote:

> Hi Lian, I don't know much about Spark structured streaming, but judging
> from the stack trace, your application was trying to access
> HftpFileSystem, which is removed in Apache Hadoop 3. Most likely it is
> removed in HDP3.0 too (Hortonworks folks can confirm).
> This is documented in the CDH6.0 release note:
> https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_600_incompatible_changes.html#hadoop_600_ic
>
> Please use webhdfs or httpfs instead.
>
> On Thu, Aug 30, 2018 at 9:36 AM Lian Jiang <jiangok2...@gmail.com> wrote:
>
>> I am using HDP3.0, which uses Hadoop 3.1.0 and Spark 2.3.1. My Spark
>> streaming jobs, which run fine in HDP2.6.4 (Hadoop 2.7.3, Spark 2.2.0),
>> fail in HDP3:
>>
>> java.lang.IllegalAccessError: class
>> org.apache.hadoop.hdfs.web.HftpFileSystem cannot access its superinterface
>> org.apache.hadoop.hdfs.web.TokenAspect$TokenManagementDelegator
>>   at java.lang.ClassLoader.defineClass1(Native Method)
>>   at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
>>   at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>   at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
>>   at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
>>   at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
>>   at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
>>   at java.security.AccessController.doPrivileged(Native Method)
>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>   at java.lang.Class.forName0(Native Method)
>>   at java.lang.Class.forName(Class.java:348)
>>   at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:370)
>>   at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
>>   at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
>>   at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:3268)
>>   at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3313)
>>   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3352)
>>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
>>   at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3403)
>>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3371)
>>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:477)
>>   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
>>   at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:85)
>>   at org.apache.spark.sql.execution.datasources.HadoopFileLinesReader.<init>(HadoopFileLinesReader.scala:46)
>>   at org.apache.spark.sql.execution.datasources.json.TextInputJsonDataSource$.readFile(JsonDataSource.scala:125)
>>   at org.apache.spark.sql.execution.datasources.json.JsonFileFormat$$anonfun$buildReader$2.apply(JsonFileFormat.scala:132)
>>   at org.apache.spark.sql.execution.datasources.json.JsonFileFormat$$anonfun$buildReader$2.apply(JsonFileFormat.scala:130)
>>   at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:148)
>>   at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:132)
>>   at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:128)
>>   at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:182)
>>   at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
>>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
>>   at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>>   at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
>>   at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:216)
>>   at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:108)
>>   at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:101)
>>   at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
>>   at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
>>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>>   at org.apache.spark.scheduler.Task.run(Task.scala:109)
>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>   at java.lang.Thread.run(Thread.java:745)
>>
>> Any idea? Thanks.
>>
>> I sent the same question to the spark user group. Sorry if you got it twice,
>> but this is a little urgent.
>>
>
> --
> A very happy Clouderan
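
P.S. For completeness, Wei-Chiu's suggestion (quoted above) to replace hftp with webhdfs would look roughly like the sketch below. The namenode host, port 9870 (Hadoop 3's default NameNode HTTP port) and the file path are hypothetical placeholders, not values from my cluster:

    // Sketch only: open a file through WebHDFS rather than the removed hftp:// scheme.
    // "namenode-host", port 9870, and "/tmp/example.json" are placeholders.
    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object WebHdfsExample {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()
        val fs   = FileSystem.get(new URI("webhdfs://namenode-host:9870"), conf)
        val in   = fs.open(new Path("/tmp/example.json"))
        try {
          // Read a few bytes just to prove the connection works.
          val buf = new Array[Byte](128)
          val n   = in.read(buf)
          println(s"Read $n bytes over webhdfs")
        } finally {
          in.close()
          fs.close()
        }
      }
    }

In a Spark job this usually amounts to nothing more than pointing the input path at a webhdfs:// (or plain hdfs://) URI instead of hftp://.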