malanb5 opened a new issue #1487: [SUPPORT] Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
URL: https://github.com/apache/incubator-hudi/issues/1487

Receiving the following exception when querying, from a SparkSession, a Hive table that was set up via the steps outlined in the demo: https://hudi.apache.org/docs/docker_demo.html

**Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs**

**To Reproduce**

Steps to reproduce the behavior:

1. Ingest data following the steps outlined in the demo.
2. Start a SparkSession using the following:

```
private static final String HUDI_SPARK_BUNDLE_JAR_FP =
        "/var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-spark-bundle.jar";
private static final String HADOOP_CONF_DIR = "/etc/hadoop";
private static final String STOCK_TICKS_COW = "stock_ticks_cow";

SparkSession spark = SparkSession
        .builder()
        .appName("FeatureExtractor")
        .config("spark.master", "local")
        .config("spark.jars", FeatureExtractor.HUDI_SPARK_BUNDLE_JAR_FP)
        .config("spark.driver.extraClassPath", FeatureExtractor.HADOOP_CONF_DIR)
        .config("spark.sql.hive.convertMetastoreParquet", false)
        .config("spark.submit.deployMode", "client")
        .config("spark.jars.packages", "org.apache.spark:spark-avro_2.11:2.4.4")
        .config("spark.sql.warehouse.dir", "/user/hive/warehouse") // on HDFS
        .config("hive.metastore.uris", "thrift://hivemetastore:9083") // Hive metastore URI
        .enableHiveSupport()
        .getOrCreate();
```

3. Run a query and try to display the results:

```
Dataset<Row> all_stock_cow = spark.sql(String.format("select * from %s", STOCK_TICKS_COW));
all_stock_cow.show();
```

**Expected behavior**

The contents of the table are rendered; the filesystem layer recognizes the `hdfs` scheme.

**Environment Description**

* Hudi version : 0.5.2-incubating
* Spark version : 2.4.5
* Hive version : 2.4.1
* Hadoop version : 2.7.3
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : yes

**Additional context**

Maven pom file (relevant excerpt):

```
<build>
  <plugins>
    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <archive>
          <manifest>
            <addClasspath>true</addClasspath>
            <mainClass>com.mlpipelines.FeatureExtractor</mainClass>
          </manifest>
        </archive>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
    </plugin>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-jar-plugin</artifactId>
    </plugin>
  </plugins>
</build>
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.7.3</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.4.5</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.11</artifactId>
    <version>2.4.5</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.4.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hudi</groupId>
    <artifactId>hudi-hive</artifactId>
    <version>0.5.2-incubating</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hudi</groupId>
    <artifactId>hudi-spark_2.12</artifactId>
    <version>0.5.2-incubating</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hudi</groupId>
    <artifactId>hudi</artifactId>
    <version>0.5.2-incubating</version>
    <type>pom</type>
  </dependency>
</dependencies>
```

**Stacktrace**

```
Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
	at org.apache.hudi.hadoop.InputPathHandler.parseInputPaths(InputPathHandler.java:98)
	at org.apache.hudi.hadoop.InputPathHandler.<init>(InputPathHandler.java:58)
	at org.apache.hudi.hadoop.HoodieParquetInputFormat.listStatus(HoodieParquetInputFormat.java:73)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:204)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
	at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:84)
	at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:84)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.immutable.List.map(List.scala:285)
	at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:84)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
	at org.apache.spark.sql.execution.SQLExecutionRDD.getPartitions(SQLExecutionRDD.scala:44)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
	at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1.apply(RDD.scala:1184)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
	at org.apache.spark.rdd.RDD.treeAggregate(RDD.scala:1182)
	at org.apache.spark.mllib.feature.IDF.fit(IDF.scala:54)
	at org.apache.spark.ml.feature.IDF.fit(IDF.scala:92)
	at com.github.malanb5.mlpipelines.FeatureExtractor.main(Fe
```
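For readers hitting the same error: `No FileSystem for scheme: hdfs` means Hadoop could not find a `FileSystem` implementation registered for the `hdfs` scheme on the driver's classpath. With a `jar-with-dependencies` assembly this is commonly caused by the `META-INF/services/org.apache.hadoop.fs.FileSystem` files from `hadoop-common` and `hadoop-hdfs` overwriting each other during the merge; in the pom above, `hadoop-hdfs` is additionally `provided`-scoped, so `DistributedFileSystem` may not be packaged at all. One common workaround (a sketch, not a confirmed fix for this report, and assuming `hadoop-hdfs` is on the runtime classpath) is to bypass the service-file lookup by pinning the implementations through `spark.hadoop.*` configuration, which Spark copies into the Hadoop `Configuration`:

```java
// Sketch (untested against this setup): fs.hdfs.impl / fs.file.impl pin the
// FileSystem classes explicitly, so the merged service files no longer matter.
SparkSession spark = SparkSession
        .builder()
        .appName("FeatureExtractor")
        .config("spark.master", "local")
        // requires hadoop-hdfs on the runtime classpath (i.e. not provided-scoped)
        .config("spark.hadoop.fs.hdfs.impl",
                "org.apache.hadoop.hdfs.DistributedFileSystem")
        .config("spark.hadoop.fs.file.impl",
                "org.apache.hadoop.fs.LocalFileSystem")
        .enableHiveSupport()
        .getOrCreate();
```

Two related things worth checking in the pom: dropping `<scope>provided</scope>` from `hadoop-hdfs` (a `local`-mode driver has no cluster-provided Hadoop jars), and the Scala version mismatch between the `_2.11` Spark artifacts and `hudi-spark_2.12`, which can produce separate runtime errors.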
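If the service-file merge is the culprit, another common approach is to replace the `jar-with-dependencies` assembly with the Maven Shade plugin and its `ServicesResourceTransformer`, which concatenates `META-INF/services` entries from all dependencies instead of letting one jar's file overwrite another's. A sketch (the plugin version is an assumption, not taken from this report):

```
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.1</version> <!-- assumed version -->
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- Concatenates META-INF/services files from all jars, so the
               hdfs FileSystem registration survives the fat-jar merge. -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

This replaces the `maven-assembly-plugin` entry in the `<build>` section above; the shaded jar then becomes the artifact submitted to Spark.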