[jira] [Commented] (SPARK-19340) Opening a file in CSV format will result in an exception if the filename contains special characters

2017-02-19 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873960#comment-15873960
 ] 

Apache Spark commented on SPARK-19340:
--

User 'lxsmnv' has created a pull request for this issue:
https://github.com/apache/spark/pull/16995

> Opening a file in CSV format will result in an exception if the filename 
> contains special characters
> 
>
> Key: SPARK-19340
> URL: https://issues.apache.org/jira/browse/SPARK-19340
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.1.0, 2.2.0
>Reporter: Reza Safi
>Priority: Minor
>
> If you try to open a file whose name matches a pattern like  {noformat} "*{*}*.*" 
> {noformat} or {noformat} "*[*]*.*" {noformat} using the CSV format, you will get 
> "org.apache.spark.sql.AnalysisException: Path does not exist", whether the 
> file is a local file or on HDFS.
> This bug can be reproduced on master and all other Spark 2 branches.
> To reproduce:
> # Create a file named "test{00-1}.txt" in a local directory (for example 
> /Users/reza/test/test{00-1}.txt)
> # Run spark-shell
> # Execute this command:
> {noformat}
> val df=spark.read.option("header","false").csv("/Users/reza/test/*.txt")
> {noformat}
> You will see the following stack trace:
> {noformat}
> org.apache.spark.sql.AnalysisException: Path does not exist: 
> file:/Users/reza/test/test\{00-01\}.txt;
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:367)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:360)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
>   at scala.collection.immutable.List.flatMap(List.scala:344)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:360)
>   at 
> org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.readText(CSVFileFormat.scala:208)
>   at 
> org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.inferSchema(CSVFileFormat.scala:63)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:174)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:174)
>   at scala.Option.orElse(Option.scala:289)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:173)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:377)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:158)
>   at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:423)
>   at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:360)
>   ... 48 elided
> {noformat}
> If you put the file on HDFS (for example under /user/root) and run the 
> following:
> {noformat}
> val df=spark.read.option("header", false).csv("/user/root/*.txt")
> {noformat}
>  
> You will get the following exception:
> {noformat}
> org.apache.hadoop.mapred.InvalidInputException: Input Pattern 
> hdfs://hosturl/user/root/test\{00-01\}.txt matches 0 files
>   at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
>   at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
>   at 

[jira] [Commented] (SPARK-19340) Opening a file in CSV format will result in an exception if the filename contains special characters

2017-02-11 Thread Alex S (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862532#comment-15862532
 ] 

Alex S commented on SPARK-19340:


Looks like the error happens when Spark tries to infer a schema from the file, 
which is what happens in the CSV case. If you supply a user-defined schema, it 
should work. I haven't tried it with HDFS, though.

{code}
spark.read.option("header","false").schema(customSchema).csv("/test*.txt")
{code}

customSchema is the schema you define for your CSV file.
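A minimal sketch of this workaround, assuming a two-column CSV file (the column names and types below are illustrative, not from the issue):

{code}
import org.apache.spark.sql.types.{StructType, StructField, StringType}

// Hypothetical schema for a two-column CSV file.
val customSchema = StructType(Seq(
  StructField("col1", StringType, nullable = true),
  StructField("col2", StringType, nullable = true)
))

// With an explicit schema, the reader skips the schema-inference pass
// that lists the input files and trips over the braces in the filename.
val df = spark.read.option("header", "false").schema(customSchema).csv("/test*.txt")
{code}

This sketch assumes a running spark-shell, where `spark` is the ambient SparkSession.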


[jira] [Commented] (SPARK-19340) Opening a file in CSV format will result in an exception if the filename contains special characters

2017-01-25 Thread Reza Safi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837424#comment-15837424
 ] 

Reza Safi commented on SPARK-19340:
---

As I mentioned in an earlier comment, the exception only occurs if we open the 
file as CSV. If we open it as text, there is no exception and the data loads 
successfully.
We can also put a file named test{00-1}.txt on HDFS. If the file is in the 
local file system, for example under /tmp/spark, use something like this:
sudo -u hdfs hadoop fs -put /tmp/spark/test%7B00-01%7D.txt /user/root
Instead of the curly brackets, use their percent-encoded equivalents.
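For reference, the percent-encoded name above can be derived mechanically with the JDK's URLEncoder (plain Scala, no Hadoop needed):

{code}
// Percent-encode the braces so the name can be passed to "hadoop fs -put".
// URLEncoder leaves letters, digits, '.', '-' and '_' alone and turns
// '{' into %7B and '}' into %7D.
val encoded = java.net.URLEncoder.encode("test{00-01}.txt", "UTF-8")
// encoded == "test%7B00-01%7D.txt"
{code}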


[jira] [Commented] (SPARK-19340) Opening a file in CSV format will result in an exception if the filename contains special characters

2017-01-25 Thread Song Jun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837407#comment-15837407
 ] 

Song Jun commented on SPARK-19340:
--

The reason is that Spark SQL treats test{00-1}.txt as a glob path. We also 
cannot put a file named test{00-1}.txt on HDFS as-is; that throws an 
exception.

I don't think this is a bug.
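To see why the literal name never matches, here is a small sketch using java.nio's glob matcher, whose brace handling is similar (though not identical) to Hadoop's glob support:

{code}
import java.nio.file.{FileSystems, Paths}

// In glob syntax, {...} is a group of alternatives, so the braces are
// consumed by the pattern rather than matched literally.
val m = FileSystems.getDefault.getPathMatcher("glob:test{00-1}.txt")

m.matches(Paths.get("test00-1.txt"))    // matches: braces act as grouping
m.matches(Paths.get("test{00-1}.txt"))  // does not match the literal name
{code}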


[jira] [Commented] (SPARK-19340) Opening a file in CSV format will result in an exception if the filename contains special characters

2017-01-24 Thread Reza Safi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836629#comment-15836629
 ] 

Reza Safi commented on SPARK-19340:
---

[~jayadevan.m] You can have those file names in Hadoop. In fact, the document 
you referred to doesn't say anything about filenames with brackets in them. 
You need to use a URL encoder to put such files on HDFS.


[jira] [Commented] (SPARK-19340) Opening a file in CSV format will result in an exception if the filename contains special characters

2017-01-24 Thread Reza Safi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836489#comment-15836489
 ] 

Reza Safi commented on SPARK-19340:
---

[~hyukjin.kwon] I updated the description and the way to reproduce. If you run:
{noformat}
val df=spark.read.option("header","false").csv("/Users/reza/test/test{00-1}.txt")
{noformat}
you are right that the issue is not just about CSV. But if you run:
{noformat}
val df=spark.read.option("header","false").csv("/Users/reza/test/*.txt")
{noformat}
you will get the exception mentioned in the updated description. However, 
running this succeeds:
{noformat}
val df=spark.read.option("header","false").text("/Users/reza/test/*.txt")
{noformat}





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org