GitHub user viirya opened a pull request:
https://github.com/apache/spark/pull/10391
[SPARK-12439][SQL] Fix toCatalystArray and MapObjects
JIRA: https://issues.apache.org/jira/browse/SPARK-12439
In toCatalystArray, we should look at the data type returned by dataTypeFor
instead of silentSchemaFor to determine whether the element is a native type.
An obvious problem arises when the element type is Option[Int]: silentSchemaFor
will return Int, so we wrongly recognize the element as a native type.
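To illustrate the first problem, here is a minimal sketch in Scala built on a hypothetical, simplified type model. The names ElemType, schemaTypeFor, runtimeTypeFor and the "int"/"object" labels are made up for illustration and are not Spark's ScalaReflection API. The sketch assumes that a schema-style lookup describes Option[T] by the schema of T, while a data-type-style lookup describes the object actually stored, so only the latter keeps Option[Int] off the native path.

```scala
// Hypothetical simplified model; none of these names are Spark APIs.
sealed trait ElemType
case object IntType extends ElemType
case object StringType extends ElemType
final case class OptionOf(inner: ElemType) extends ElemType

object NativeCheckSketch {
  // Schema-style lookup (assumed behavior): Option[T] is described by the
  // schema of T, so the Option wrapper disappears.
  def schemaTypeFor(t: ElemType): ElemType = t match {
    case OptionOf(inner) => schemaTypeFor(inner)
    case other           => other
  }

  // Data-type-style lookup (assumed behavior): describes the JVM value that
  // is actually stored, so Option[Int] is a boxed object, not a primitive int.
  def runtimeTypeFor(t: ElemType): String = t match {
    case IntType     => "int"
    case StringType  => "object"
    case OptionOf(_) => "object"
  }

  // Mirrors the bug: Option[Int] looks like Int and is treated as native.
  def isNativeBySchema(t: ElemType): Boolean = schemaTypeFor(t) == IntType

  // Mirrors the fix: only a real primitive int is treated as native.
  def isNativeByDataType(t: ElemType): Boolean = runtimeTypeFor(t) == "int"

  def main(args: Array[String]): Unit = {
    val elem = OptionOf(IntType)
    println(s"schema-based:    native = ${isNativeBySchema(elem)}")
    println(s"data-type-based: native = ${isNativeByDataType(elem)}")
  }
}
```

The point of the model is only that the decision must be made on the representation of the element, not on its nullable schema.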
There is another problem when using Option as an array element. When we encode
data like Seq(Some(1), Some(2), None) with an encoder, MapObjects is later used
to construct an array for it. But MapObjects does not check whether the return
value of lambdaFunction is null. As a result, the decoded data for
Seq(Some(1), Some(2), None) would be Seq(1, 2, -1) instead of
Seq(1, 2, null).
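The second problem can be modeled with a small Scala sketch of a MapObjects-style loop. This is a hypothetical illustration, not Spark's generated code: it assumes each lambda result is a primitive slot plus an isNull flag, and it uses -1 as the placeholder written into the slot for a null result, matching the symptom described above. Skipping the isNull check leaks the placeholder into the output; checking it stores a real null.

```scala
// Hypothetical model of a MapObjects-style loop; not Spark's generated code.
object MapObjectsNullSketch {
  // Result of evaluating the per-element lambda: a primitive value slot
  // plus a flag saying whether the result is actually null.
  final case class Evaluated(value: Int, isNull: Boolean)

  // Placeholder left in the primitive slot when the result is null (assumed).
  val DefaultInt: Int = -1

  // Stand-in for the lambda function that unwraps each Option element.
  def unwrap(elem: Option[Int]): Evaluated = elem match {
    case Some(v) => Evaluated(v, isNull = false)
    case None    => Evaluated(DefaultInt, isNull = true)
  }

  // Buggy version: ignores isNull and copies the placeholder value.
  def buildWithoutNullCheck(input: Seq[Option[Int]]): Seq[Any] =
    input.map(e => unwrap(e).value)

  // Fixed version: consults isNull and stores a real null.
  def buildWithNullCheck(input: Seq[Option[Int]]): Seq[Any] =
    input.map { e =>
      val r = unwrap(e)
      if (r.isNull) null else r.value
    }

  def main(args: Array[String]): Unit = {
    val data = Seq(Some(1), Some(2), None)
    println(buildWithoutNullCheck(data))
    println(buildWithNullCheck(data))
  }
}
```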
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/viirya/spark-1 fix-catalystarray
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/10391.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #10391
----
commit 788a7c6b2156d43be9c389d6a67b70e8ff9bbbb2
Author: Liang-Chi Hsieh
Date: 2015-12-19T12:45:48Z
Fix toCatalystArray and MapObjects.
----