GitHub user mn-mikke opened a pull request:
https://github.com/apache/spark/pull/21215
[SPARK-24148][SQL] Overloading array function to support typed empty arrays
## What changes were proposed in this pull request?
The PR proposes to overload the `array` function so that users can specify the
element type of empty arrays. Currently, empty arrays produced by the `array`
function have elements of `StringType`, and there is no way to cast them to a
different type.
A good example of the use case is
`when(cond, trueExp).otherwise(falseExp)`, which expects `trueExp` and
`falseExp` to be of the same type. When one of the two branches should produce
an empty array, there is currently no other way than creating a `UDF`.
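For illustration, the UDF workaround mentioned above could look roughly like the sketch below. It uses only existing Spark SQL functions (`when`, `otherwise`, `udf`); the object name `EmptyArraySketch`, the helper `withFallback`, and the column names `id`, `values`, and `filtered` are made up for this example and are not part of the patch.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf, when}

object EmptyArraySketch {
  // Zero-argument UDF that returns an empty array<int>.
  // This stands in for a typed empty-array literal.
  private val emptyIntArray = udf(() => Seq.empty[Int])

  // Hypothetical helper: keeps `values` when id > 1, otherwise yields
  // an empty array of the same element type, so both branches agree.
  def withFallback(df: DataFrame): DataFrame =
    df.withColumn(
      "filtered",
      when(col("id") > 1, col("values")).otherwise(emptyIntArray()))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("empty-array-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up input data, for demonstration only.
    val df = Seq((1, Seq(1, 2)), (2, Seq(3))).toDF("id", "values")

    withFallback(df).printSchema()
    spark.stop()
  }
}
```

The drawback of this approach is that a UDF is opaque to the optimizer and has to be defined once per element type, which is what motivates a built-in way to create typed empty arrays.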
## How was this patch tested?
Added test cases to `DataFrameComplexTypeSuite`.
## Note
Eventually, I will add a wrapper for PySpark, but would like to discuss the
idea first.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/AbsaOSS/spark feature/array-api-empty-array-to-master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21215.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21215
----
commit 44b18520dcf8e3e3639756cd8a12f75ea1080bee
Author: Marek Novotny <mn.mikke@...>
Date: 2018-05-02T13:42:42Z
[SPARK-24148][SQL] Overloading array function to support typed empty arrays.
----