Re: SparkSQL 1.3.0 cannot read parquet files from different file system
Looks like this is already solved in https://issues.apache.org/jira/browse/SPARK-6330

On Mon, Mar 16, 2015 at 6:43 PM, Cheng Lian wrote:
> Yeah, it's a valid bug.
Re: SparkSQL 1.3.0 cannot read parquet files from different file system
Oh sorry, I misread your question. I thought you were trying something like parquetFile("s3n://file1,hdfs://file2"). Yeah, it's a valid bug. Thanks for opening the JIRA ticket and the PR!

Cheng

On 3/16/15 6:39 PM, Cheng Lian wrote:
> We intentionally disallowed passing multiple comma-separated paths in 1.3.0.
Re: SparkSQL 1.3.0 cannot read parquet files from different file system
Hi Pei-Lun,

We intentionally disallowed passing multiple comma-separated paths in 1.3.0. One of the reasons is that users reported that this fails when a file path contains an actual comma. In your case, you may do something like this:

    // Assumes a Spark 1.3 SQLContext named sqlContext (as in spark-shell).
    val s3nDF = sqlContext.parquetFile("s3n://...")
    val hdfsDF = sqlContext.parquetFile("hdfs://...")
    // Combine both DataFrames; in Spark 1.3 the DataFrame method is unionAll.
    val finalDF = s3nDF.unionAll(hdfsDF)

Cheng

On 3/16/15 4:03 PM, Pei-Lun Lee wrote:
> I am using Spark 1.3.0, where I cannot load parquet files from more than one file system, say one s3n://... and another hdfs://...
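To illustrate the comma problem Cheng describes, here is a small pure-Scala sketch that needs no Spark. `CommaPaths.naiveSplit` is a hypothetical helper, not Spark's actual path-parsing code; it simply treats one string as a comma-separated list of paths, which is what accepting "s3n://file1,hdfs://file2" would require.

```scala
object CommaPaths {
  // Hypothetical helper: treat one string as a comma-separated list of paths.
  def naiveSplit(paths: String): List[String] =
    paths.split(",").toList

  def main(args: Array[String]): Unit = {
    // Two separate paths on different file systems split as intended.
    println(naiveSplit("s3n://bucket/table,hdfs://namenode/table"))

    // A single path whose file name contains a comma is wrongly split in two.
    println(naiveSplit("hdfs://namenode/logs/a,b.parquet"))
  }
}
```

The second call yields "hdfs://namenode/logs/a" and "b.parquet", two paths that were never meant to exist, so a comma cannot safely act as a separator once paths themselves may contain commas.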
SparkSQL 1.3.0 cannot read parquet files from different file system
Hi,

I am using Spark 1.3.0, and I cannot load Parquet files from more than one file system, say one s3n://... path and another hdfs://... path. This worked in older versions, and it still works in 1.3 if I set spark.sql.parquet.useDataSourceApi=false.

One way to fix this is, instead of getting a single FileSystem from the default configuration in ParquetRelation2, to call Path.getFileSystem for each path.

Here's the JIRA link and pull request:
https://issues.apache.org/jira/browse/SPARK-6351
https://github.com/apache/spark/pull/5039

Thanks,
--
Pei-Lun
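The difference between the two approaches can be modeled without Spark or Hadoop. Hadoop's `Path.getFileSystem(conf)` chooses a file system from the path's URI scheme and authority, falling back to the configured default file system when the path has no scheme. The sketch below imitates only that lookup using `java.net.URI`; `FsKey`, `defaultFs`, `resolveWithDefault` and `resolvePerPath` are hypothetical names for illustration, not Spark or Hadoop APIs, and this is not the code in the pull request.

```scala
import java.net.URI

object PerPathFs {
  // Hypothetical stand-in for a FileSystem instance, identified by scheme + authority.
  final case class FsKey(scheme: String, authority: String)

  // Hypothetical default file system, playing the role of the configured default.
  val defaultFs: FsKey = FsKey("hdfs", "namenode:8020")

  // Modeled old behaviour: every path is served by the single default file system.
  def resolveWithDefault(paths: Seq[String]): Map[String, FsKey] =
    paths.map(p => p -> defaultFs).toMap

  // Modeled proposed behaviour: resolve each path from its own URI,
  // using the default only when the path carries no scheme.
  def resolvePerPath(paths: Seq[String]): Map[String, FsKey] =
    paths.map { p =>
      val uri = new URI(p)
      val fs =
        if (uri.getScheme == null) defaultFs
        else FsKey(uri.getScheme, Option(uri.getAuthority).getOrElse(""))
      p -> fs
    }.toMap

  def main(args: Array[String]): Unit = {
    val paths = Seq("s3n://my-bucket/table", "hdfs://namenode:8020/table", "/user/me/table")
    resolveWithDefault(paths).foreach(println)
    resolvePerPath(paths).foreach(println)
  }
}
```

With a real Hadoop `Configuration`, the per-path lookup corresponds to `new org.apache.hadoop.fs.Path(p).getFileSystem(conf)`; the model only shows why a single default file system cannot serve both an s3n:// path and an hdfs:// path.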