[jira] [Assigned] (SPARK-32069) Improve error message on reading unexpected directory which is not a table partition

2020-10-18 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-32069:

Assignee: angerszhu

> Improve error message on reading unexpected directory which is not a table 
> partition
> 
>
> Key: SPARK-32069
> URL: https://issues.apache.org/jira/browse/SPARK-32069
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Assignee: angerszhu
>Priority: Minor
>  Labels: starter
> Fix For: 3.1.0
>
>
> To reproduce:
> {code:sql}
> spark-sql> create table test(i long);
> spark-sql> insert into test values(1);
> {code}
> {code:bash}
> bash $ mkdir ./spark-warehouse/test/data
> {code}
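> The read step is what actually trips over the new directory. For completeness, a minimal Scala sketch of the same scenario; it assumes a Hive-enabled local build (the trace below goes through HadoopRDD, i.e. a Hive text table), and the session setup is illustrative:
> {code:scala}
> import java.nio.file.{Files, Paths}
> import org.apache.spark.sql.SparkSession
>
> // Illustrative local session; requires a Spark build with Hive support
> // so that a plain `create table` yields a Hive text table, matching the
> // HadoopRDD code path in the stack trace below.
> val spark = SparkSession.builder()
>   .master("local[*]")
>   .enableHiveSupport()
>   .getOrCreate()
>
> spark.sql("create table test(i long)")
> spark.sql("insert into test values(1)")
>
> // Simulate the stray directory inside the non-partitioned table location.
> Files.createDirectory(Paths.get("spark-warehouse/test/data"))
>
> // The next scan fails with java.io.IOException: Not a file: .../test/data
> spark.sql("select * from test").show()
> {code}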
> Either way, the next read of the table fails with an error that names the path but not the actual problem:
> {code:java}
> java.io.IOException: Not a file: file:/Users/gengliang.wang/projects/spark/spark-warehouse/test/data
>   at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:322)
>   at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:205)
>   at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300)
>   at scala.Option.getOrElse(Option.scala:189)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:296)
>   at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
>   at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300)
>   at scala.Option.getOrElse(Option.scala:189)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:296)
>   at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
>   at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300)
>   at scala.Option.getOrElse(Option.scala:189)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:296)
>   at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
>   at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300)
>   at scala.Option.getOrElse(Option.scala:189)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:296)
>   at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
>   at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300)
>   at scala.Option.getOrElse(Option.scala:189)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:296)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2173)
>   at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1030)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
>   at org.apache.spark.rdd.RDD.collect(RDD.scala:1029)
>   at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:385)
>   at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:412)
>   at org.apache.spark.sql.execution.HiveResult$.hiveResultString(HiveResult.scala:76)
>   at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.$anonfun$run$1(SparkSQLDriver.scala:65)
>   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
>   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
>   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
>   at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:65)
>   at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:377)
>   at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:496)
>   at scala.collection.Iterator.foreach(Iterator.scala:941)
>   at scala.collection.Iterator.foreach$(Iterator.scala:941)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
>   at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>   at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>   at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:490)
>   at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:282)
>   at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
> {code}
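The "Not a file" exception surfaces deep inside Hadoop's FileInputFormat, which by default does not recurse into subdirectories and rejects any listing entry that is not a plain file; the message names the offending path but says nothing about the actual cause (a stray directory inside a non-partitioned table's location) or a remedy. Below is a minimal sketch of the kind of pre-check this ticket asks for. It is illustrative only, not the patch that was merged: the helper name, error wording, and use of a plain IOException are assumptions.

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical pre-check (not the merged fix): before handing a
// non-partitioned table's location to FileInputFormat, scan it for
// subdirectories and fail with a message that explains the situation.
def assertNoStrayDirectories(location: String, hadoopConf: Configuration): Unit = {
  val root = new Path(location)
  val fs = root.getFileSystem(hadoopConf)  // resolves the FileSystem for the URI scheme
  val dirs = fs.listStatus(root).filter(_.isDirectory)
  if (dirs.nonEmpty) {
    throw new java.io.IOException(
      s"Found directory ${dirs.head.getPath} under table location $location, " +
      "but the table is not partitioned, so only plain files are expected. " +
      "Remove the directory, or set " +
      "mapreduce.input.fileinputformat.input.dir.recursive=true if the " +
      "nested layout is intentional.")
  }
}
{code}

Compared with the raw "Not a file" IOException above, a message along these lines tells the user what Spark found, why it is a problem for a non-partitioned table, and two plausible remedies.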

[jira] [Assigned] (SPARK-32069) Improve error message on reading unexpected directory which is not a table partition

2020-10-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32069:


Assignee: (was: Apache Spark)

> Improve error message on reading unexpected directory which is not a table 
> partition
> 
>
> Key: SPARK-32069
> URL: https://issues.apache.org/jira/browse/SPARK-32069
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Priority: Minor
>  Labels: starter
>

[jira] [Assigned] (SPARK-32069) Improve error message on reading unexpected directory which is not a table partition

2020-10-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32069:


Assignee: Apache Spark

> Improve error message on reading unexpected directory which is not a table 
> partition
> 
>
> Key: SPARK-32069
> URL: https://issues.apache.org/jira/browse/SPARK-32069
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Minor
>  Labels: starter
>