[jira] [Updated] (SPARK-6330) newParquetRelation gets incorrect FileSystem
[ https://issues.apache.org/jira/browse/SPARK-6330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yin Huai updated SPARK-6330:
----------------------------
    Priority: Blocker  (was: Major)

> newParquetRelation gets incorrect FileSystem
> --------------------------------------------
>
>                 Key: SPARK-6330
>                 URL: https://issues.apache.org/jira/browse/SPARK-6330
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0
>            Reporter: Volodymyr Lyubinets
>            Assignee: Volodymyr Lyubinets
>            Priority: Blocker
>             Fix For: 1.3.1, 1.4.0
>
> Here's a snippet from newParquet.scala:
>
>     def refresh(): Unit = {
>       val fs = FileSystem.get(sparkContext.hadoopConfiguration)
>
>       // Support either reading a collection of raw Parquet part-files, or a collection of folders
>       // containing Parquet files (e.g. partitioned Parquet table).
>       val baseStatuses = paths.distinct.map { p =>
>         val qualified = fs.makeQualified(new Path(p))
>
>         if (!fs.exists(qualified) && maybeSchema.isDefined) {
>           fs.mkdirs(qualified)
>           prepareMetadata(qualified, maybeSchema.get, sparkContext.hadoopConfiguration)
>         }
>
>         fs.getFileStatus(qualified)
>       }.toArray
>
> If we are running this locally and a path points to S3, fs would be incorrect. A fix is to construct fs for each file separately.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
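The fix the reporter describes can be sketched roughly as follows. This is a non-authoritative sketch, not the committed patch: it assumes `paths` and a Hadoop `Configuration` are available as in the snippet above, and uses `Path.getFileSystem`, which resolves a `FileSystem` from each path's own scheme (e.g. an S3 URI) rather than from the configured default.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, Path}

// Sketch: resolve a FileSystem per path instead of reusing one FileSystem
// obtained from the default scheme, so local and S3 paths can be mixed.
def baseStatuses(paths: Seq[String], conf: Configuration): Array[FileStatus] =
  paths.distinct.map { p =>
    val rawPath = new Path(p)
    val fs = rawPath.getFileSystem(conf) // per-path FileSystem, keyed by scheme
    val qualified = fs.makeQualified(rawPath)
    fs.getFileStatus(qualified)
  }.toArray
```

With `FileSystem.get(conf)` the scheme of `fs` is fixed once (typically the local or default HDFS filesystem), so qualifying an `s3n://...` path against it fails or resolves to the wrong store; resolving per path avoids that.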
[jira] [Updated] (SPARK-6330) newParquetRelation gets incorrect FileSystem
[ https://issues.apache.org/jira/browse/SPARK-6330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust updated SPARK-6330:
------------------------------------
    Target Version/s: 1.3.1
[jira] [Updated] (SPARK-6330) newParquetRelation gets incorrect FileSystem
[ https://issues.apache.org/jira/browse/SPARK-6330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Davidson updated SPARK-6330:
----------------------------------
    Fix Version/s: 1.4.0