GitHub user JoshRosen opened a pull request:
https://github.com/apache/spark/pull/6222
[SPARK-7691] [WIP] Refactor CatalystTypeConverter to use type-specific row
accessors
This patch significantly refactors CatalystTypeConverters to both clean up
the code and enable these conversions to work with future Project Tungsten
features.
At a high level, I've reorganized the code so that all functions dealing
with the same type are grouped together into type-specific subclasses of
`CatalystTypeConverter`. In addition, I've added new methods that allow the
Catalyst Row -> Scala Row conversions to access the Catalyst row's fields
through type-specific `getTYPE()` methods rather than the generic `get()` /
`Row.apply` methods. This refactoring is a prerequisite for unit testing the
new operators that I'm developing as part of Project Tungsten, since those
operators may output `UnsafeRow` instances, which don't support the generic
`get()`.
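As a rough illustration of the idea (a minimal sketch, not the code in this
patch), the Scala below shows one converter per type reading a field through a
type-specific accessor. `SimpleRow`, `GenericSimpleRow`, `PackedRow`,
`Converter`, `IntConverter` and `DoubleConverter` are hypothetical stand-ins
invented for this example; they are not Spark classes, and `PackedRow` only
loosely imitates a row format that cannot serve the generic `get()`.

```scala
// Hypothetical stand-in for a Catalyst row; not Spark's actual Row API.
trait SimpleRow {
  def get(i: Int): Any          // generic accessor, returns a boxed value
  def getInt(i: Int): Int       // type-specific accessors
  def getDouble(i: Int): Double
}

// Row backed by an Array[Any]: supports both access styles.
final class GenericSimpleRow(values: Array[Any]) extends SimpleRow {
  def get(i: Int): Any = values(i)
  def getInt(i: Int): Int = values(i).asInstanceOf[Int]
  def getDouble(i: Int): Double = values(i).asInstanceOf[Double]
}

// Row backed by packed 8-byte words (assumption: an UnsafeRow-like layout).
// It can only answer type-specific reads, so the generic get() fails.
final class PackedRow(words: Array[Long]) extends SimpleRow {
  def get(i: Int): Any =
    throw new UnsupportedOperationException("generic get() is not supported")
  def getInt(i: Int): Int = words(i).toInt
  def getDouble(i: Int): Double = java.lang.Double.longBitsToDouble(words(i))
}

// All logic for one type lives in one converter subclass.
abstract class Converter[T] {
  def toScala(row: SimpleRow, column: Int): T
}

object IntConverter extends Converter[Int] {
  def toScala(row: SimpleRow, column: Int): Int = row.getInt(column)
}

object DoubleConverter extends Converter[Double] {
  def toScala(row: SimpleRow, column: Int): Double = row.getDouble(column)
}
```

Because each converter calls only `getInt` or `getDouble`, the same converter
works for both row implementations in this sketch, whereas a converter built on
`get()` would throw on `PackedRow`.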
The stricter use of types here has uncovered some bugs in other
parts of Spark SQL:
- #6217: DescribeCommand is assigned wrong output attributes in
SparkStrategies
- #6218: DataFrame.describe() should cast all aggregates to doubles
Once those PRs are merged, I'll rebase this patch and remove the `[WIP]`
tag.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/JoshRosen/spark catalyst-converters-refactoring
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/6222.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #6222
----
commit a4eec9f0b1d909636846ec88946768d79bc18205
Author: Josh Rosen <[email protected]>
Date: 2015-05-17T08:20:39Z
Add failing regression test for SPARK-7686
commit 953a344ae56b0f6a5677ebe295348d0829feb070
Author: Josh Rosen <[email protected]>
Date: 2015-05-17T08:21:10Z
Fix SPARK-7686 with a simple change in SparkStrategies.
commit bedb4909a1abd88c390d7cbd902a25c8691f38f3
Author: Josh Rosen <[email protected]>
Date: 2015-05-17T08:54:12Z
Add failing regression test for SPARK-7687
commit 696206cb7cad45b414c2b54294fa3400fa176ef2
Author: Josh Rosen <[email protected]>
Date: 2015-05-17T08:54:27Z
Cast to double to fix SPARK-7687
commit aa46bcc617d030e6a611e2e8a7b23640a5b87637
Author: Josh Rosen <[email protected]>
Date: 2015-05-17T18:44:35Z
Merge branch 'SPARK-7687' into catalyst-converters-refactoring
commit 856aa5e655e401dfcf2435738b3a0ed80777092c
Author: Josh Rosen <[email protected]>
Date: 2015-05-17T18:44:42Z
Merge branch 'SPARK-7686' into catalyst-converters-refactoring
commit 8ac5c861bd5322c87affe9ec5ee47cd71ff7bfb0
Author: Josh Rosen <[email protected]>
Date: 2015-05-17T18:45:12Z
WIP refactoring of CatalystTypeConverters
commit 93314c51124e357d230b95e7e82faa7989bcae28
Author: Josh Rosen <[email protected]>
Date: 2015-05-17T23:14:10Z
Comments and cleanup
----