GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/6222

    [SPARK-7691] [WIP] Refactor CatalystTypeConverter to use type-specific row accessors

    This patch significantly refactors CatalystTypeConverters to both clean up 
the code and enable these conversions to work with future Project Tungsten 
features.
    
    At a high level, I've reorganized the code so that all functions dealing 
with the same type are grouped together into type-specific subclasses of 
`CatalystTypeConverter`.  In addition, I've added new methods that allow the 
Catalyst Row -> Scala Row conversions to access the Catalyst row's fields 
through type-specific `getTYPE()` methods rather than the generic `get()` / 
`Row.apply` methods.  This refactoring is a blocker to being able to unit test 
new operators that I'm developing as part of Project Tungsten, since those 
operators may output `UnsafeRow` instances which don't support the generic 
`get()`.
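
    To illustrate the idea, here is a minimal, self-contained Scala sketch. It is not the code from this patch: `SketchRow`, `TypedOnlyRow`, `SketchConverter`, and `convertRow` are hypothetical names invented for the example, and `TypedOnlyRow` only imitates a row that rejects the generic `get()`. It shows per-type converters reading fields through type-specific accessors, so the conversion never calls `get()`.

```scala
// Hypothetical sketch: none of these names are Spark's real classes.
object TypedAccessorSketch {

  // A row exposing type-specific accessors alongside a generic one.
  trait SketchRow {
    def getInt(ordinal: Int): Int
    def getDouble(ordinal: Int): Double
    def get(ordinal: Int): Any
  }

  // Stand-in for a row that supports typed reads but not the generic get().
  final class TypedOnlyRow(ints: Map[Int, Int], doubles: Map[Int, Double])
      extends SketchRow {
    def getInt(ordinal: Int): Int = ints(ordinal)
    def getDouble(ordinal: Int): Double = doubles(ordinal)
    def get(ordinal: Int): Any =
      throw new UnsupportedOperationException("generic get() is not supported")
  }

  // One converter per type keeps all logic for that type in one place.
  sealed trait SketchConverter[T] {
    def toScala(row: SketchRow, ordinal: Int): T
  }

  object IntSketchConverter extends SketchConverter[Int] {
    def toScala(row: SketchRow, ordinal: Int): Int = row.getInt(ordinal)
  }

  object DoubleSketchConverter extends SketchConverter[Double] {
    def toScala(row: SketchRow, ordinal: Int): Double = row.getDouble(ordinal)
  }

  // Convert each field with the converter chosen for its column position.
  def convertRow(row: SketchRow, converters: Seq[SketchConverter[_]]): Seq[Any] =
    converters.zipWithIndex.map { case (converter, ordinal) =>
      converter.toScala(row, ordinal)
    }
}
```

    With this shape, calling `convertRow` on a `TypedOnlyRow` succeeds because each converter uses only its own typed accessor, whereas any code path that still calls `get()` on such a row fails immediately.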
    
    The stricter use of types here has uncovered some bugs in other 
parts of Spark SQL:
    
    - #6217: DescribeCommand is assigned wrong output attributes in SparkStrategies
    - #6218: DataFrame.describe() should cast all aggregates to doubles
    
    Once those PRs are merged, I'll rebase this patch and remove the `[WIP]` 
tag.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark catalyst-converters-refactoring

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/6222.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6222
    
----
commit a4eec9f0b1d909636846ec88946768d79bc18205
Author: Josh Rosen <[email protected]>
Date:   2015-05-17T08:20:39Z

    Add failing regression test for SPARK-7686

commit 953a344ae56b0f6a5677ebe295348d0829feb070
Author: Josh Rosen <[email protected]>
Date:   2015-05-17T08:21:10Z

    Fix SPARK-7686 with a simple change in SparkStrategies.

commit bedb4909a1abd88c390d7cbd902a25c8691f38f3
Author: Josh Rosen <[email protected]>
Date:   2015-05-17T08:54:12Z

    Add failing regression test for SPARK-7687

commit 696206cb7cad45b414c2b54294fa3400fa176ef2
Author: Josh Rosen <[email protected]>
Date:   2015-05-17T08:54:27Z

    Cast to double to fix SPARK-7687

commit aa46bcc617d030e6a611e2e8a7b23640a5b87637
Author: Josh Rosen <[email protected]>
Date:   2015-05-17T18:44:35Z

    Merge branch 'SPARK-7687' into catalyst-converters-refactoring

commit 856aa5e655e401dfcf2435738b3a0ed80777092c
Author: Josh Rosen <[email protected]>
Date:   2015-05-17T18:44:42Z

    Merge branch 'SPARK-7686' into catalyst-converters-refactoring

commit 8ac5c861bd5322c87affe9ec5ee47cd71ff7bfb0
Author: Josh Rosen <[email protected]>
Date:   2015-05-17T18:45:12Z

    WIP refactoring of CatalystTypeConverters

commit 93314c51124e357d230b95e7e82faa7989bcae28
Author: Josh Rosen <[email protected]>
Date:   2015-05-17T23:14:10Z

    Comments and cleanup

----


