GitHub user chenghao-intel opened a pull request:
https://github.com/apache/spark/pull/4356
[SPARK-5068] [SQL] Fix bug query data when path doesn't exist for HiveContext
This is a follow-up to #3907 and #3891.
Hive actually supports nonexistent paths (either table or partition paths)
by returning an empty result, but Spark SQL throws an exception instead.
Ideally, path existence would be checked while each partition is processed.
However, the `InputFormat` always computes the file splits before that point,
so an exception is raised if the specified path doesn't exist.
This PR goes back to the approach of #3891 and checks whether the
partition/table paths exist during Spark plan generation. That logic could of
course be moved into `HadoopRDD` if it supports nonexistent paths in the future.
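The plan-generation check described above boils down to "drop locations that do not exist before any `InputFormat` computes splits over them". The sketch below illustrates only that idea, under stated assumptions: it uses `java.nio.file` as a stand-in for Hadoop's `FileSystem` (which the PR works against, judging by the "Move the FileSystem variable as class member" commit), and `PartitionPathFilter` and `existingPaths` are hypothetical names, not part of Spark.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch: keep only table/partition locations that exist, so a
 * missing path contributes no rows instead of failing split computation.
 * java.nio stands in for Hadoop's FileSystem here (assumption).
 */
public class PartitionPathFilter {

    /** Returns the existing locations; an empty list means "scan nothing". */
    static List<Path> existingPaths(List<Path> locations) {
        return locations.stream()
                .filter(Files::exists) // nonexistent path is skipped, not an error
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        Path missing = tmp.resolve("missing-partition-" + System.nanoTime());
        List<Path> kept = existingPaths(List.of(tmp, missing));
        System.out.println(kept.size() + " of 2 partition paths kept");
    }
}
```

Filtering before the scan (rather than catching the exception afterwards) mirrors the reasoning in the PR: the error is raised while splits are computed, so the nonexistent paths have to be removed before that step.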
@jeanlyn, @marmbrus, @srowen
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/chenghao-intel/spark partition
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/4356.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #4356
----
commit 0033ed2ef2013e3f4abb56f9bc47989575589518
Author: jeanlyn <[email protected]>
Date: 2015-01-04T16:26:14Z
SPARK-5068: fix bug query data when path doesn't exists
commit 1a65548ee2d1ddc8d0a65cae2b421307d9b6b252
Author: jeanlyn <[email protected]>
Date: 2015-01-28T03:29:21Z
add the Licensed
commit 76df33f443dcdaff97a0b8511a0fc656fef81fe2
Author: jeanlyn <[email protected]>
Date: 2015-02-02T11:49:46Z
fix code style
commit 6958312bbd9d29c042294960d95ec0aaadaead9c
Author: Cheng Hao <[email protected]>
Date: 2015-02-04T06:01:36Z
Return empty row when table / partition path doesn't exist
commit 1f033cd8901bd97c8a4677e284847a2e975c6987
Author: Cheng Hao <[email protected]>
Date: 2015-02-04T06:26:41Z
Move the FileSystem variable as class member
----