GitHub user windpiger opened a pull request:

    https://github.com/apache/spark/pull/16910

    [SPARK-19575][SQL] Reading from or writing to a hive serde table with a non pre-existing location should succeed

    ## What changes were proposed in this pull request?
    
    This PR is a follow-up to [SPARK-19329](https://issues.apache.org/jira/browse/SPARK-19329), which unified the behavior when reading from or writing to a datasource table with a non pre-existing location; this PR applies the same unification to hive serde tables.
    
    Currently, selecting from a hive serde table whose location does not exist throws an exception:
    ```
    Input path does not exist: file:/tmp/spark-37caa4e6-5a6a-4361-a905-06cc56afb274
    org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/tmp/spark-37caa4e6-5a6a-4361-a905-06cc56afb274
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:194)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2080)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
    at org.apache.spark.sql.QueryTest$.checkAnswer(QueryTest.scala:258)
    ```
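
    For illustration, here is a minimal reproduction sketch of the scenario above (the table name `t`, the `TEXTFILE` format, and the temp-directory handling are assumptions for this sketch, not taken from this PR):

    ```scala
    // Hypothetical reproduction; assumes a SparkSession with Hive support enabled.
    import java.nio.file.Files
    import org.apache.spark.sql.SparkSession

    object MissingLocationRepro {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-serde-missing-location")
          .enableHiveSupport()
          .getOrCreate()

        // Create a hive serde table whose location exists at creation time.
        val dir = Files.createTempDirectory("spark-serde-loc").toFile
        spark.sql(
          s"""CREATE TABLE t (a INT)
             |STORED AS TEXTFILE
             |LOCATION '${dir.toURI}'""".stripMargin)

        // Remove the directory so the table's location no longer pre-exists.
        dir.delete()

        // Before this patch: org.apache.hadoop.mapred.InvalidInputException
        // ("Input path does not exist"), as in the stack trace above.
        // After this patch: an empty result, matching datasource table behavior.
        spark.sql("SELECT * FROM t").show()

        spark.stop()
      }
    }
    ```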
    
    ## How was this patch tested?
    Unit tests added.
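
    A minimal sketch of the kind of test this could cover (the suite name, table name, and test body below are assumptions written in the style of Spark's existing test helpers such as `withTable`, `withTempDir`, and `checkAnswer`, not the patch's actual tests):

    ```scala
    import org.apache.spark.sql.{QueryTest, Row}
    import org.apache.spark.sql.hive.test.TestHiveSingleton
    import org.apache.spark.sql.test.SQLTestUtils

    // Sketch only; assumes Spark's Hive test harness is available on the test classpath.
    class HiveSerdeMissingLocationSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {

      test("read/write a hive serde table with a non pre-existing location") {
        withTable("t") {
          withTempDir { dir =>
            spark.sql(
              s"""CREATE TABLE t (a INT)
                 |STORED AS TEXTFILE
                 |LOCATION '${dir.toURI}'""".stripMargin)

            // Drop the location directory so it no longer exists.
            dir.delete()

            // Reading should return an empty result instead of throwing.
            checkAnswer(spark.sql("SELECT * FROM t"), Nil)

            // Writing should recreate the location and succeed.
            spark.sql("INSERT INTO t SELECT 1")
            checkAnswer(spark.sql("SELECT * FROM t"), Row(1))
          }
        }
      }
    }
    ```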

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/windpiger/spark selectHiveFromNotExistLocation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16910.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16910
    
----
commit cb983756f7fb270c545f90a98d03e0db3ccc0bd9
Author: windpiger <song...@outlook.com>
Date:   2017-02-13T07:50:55Z

    [SPARK-19575][SQL]Reading from or writing to a hive serde table with a non pre-existing location should succeed

----

