GitHub user viirya reopened a pull request:
https://github.com/apache/spark/pull/22749
[SPARK-25746][SQL] Refactoring ExpressionEncoder to get rid of flat flag
## What changes were proposed in this pull request?
This was inspired by the work on #21732. Currently `ScalaReflection`
needs to take into account how `ExpressionEncoder` uses the generated serializers and
deserializers, and `ExpressionEncoder` has an awkward `flat` flag. After
discussion with @cloud-fan, it seems better to refactor
`ExpressionEncoder`. This should also make SPARK-24762 easier to do.
To summarize the proposed changes:
1. `serializerFor` and `deserializerFor` return expressions for
serializing/deserializing an input expression for a given type. They are
private and should not be called directly.
2. `serializerForType` and `deserializerForType` return an expression for
serializing/deserializing an object of type T to/from its Spark SQL
representation. They assume the input object/Spark SQL representation is located
at ordinal 0 of a row.
In other words, `serializerForType` and `deserializerForType` return
expressions for atomically serializing/deserializing a JVM object to/from a Spark
SQL value.
A serializer returned by `serializerForType` will serialize an object at
`row(0)` to a corresponding Spark SQL representation, e.g. primitive type,
array, map, struct.
A deserializer returned by `deserializerForType` will deserialize an input
field at `row(0)` to an object of the given type.
3. The constructor of `ExpressionEncoder` takes a pair of serializer and
deserializer for type `T`. It uses them to create the serializer and deserializer
for `T` <-> row serialization. Now `ExpressionEncoder` doesn't need to remember
whether the serializer is flat or not. When we need to construct a new `ExpressionEncoder`
based on existing ones, we only need to change the input location in the atomic
serializer and deserializer.
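To illustrate the idea in point 3, here is a minimal toy sketch in plain Scala. It does not use Spark: `Expr`, `BoundRef`, `Transform`, `Rebind.moveInput` and `Demo.intSerializer` are hypothetical stand-ins invented for this example, not Catalyst classes or the code in this PR. It only shows why an atomic serializer that reads from ordinal 0 can be reused by changing its input location, with no `flat` flag needed.

```scala
// Toy model only: hypothetical stand-ins, NOT Spark's Catalyst classes.
sealed trait Expr {
  def eval(row: Vector[Any]): Any
}

// Reads the value stored at the given ordinal of the input row.
final case class BoundRef(ordinal: Int) extends Expr {
  def eval(row: Vector[Any]): Any = row(ordinal)
}

// Applies a conversion function to the result of a child expression.
final case class Transform(child: Expr, f: Any => Any) extends Expr {
  def eval(row: Vector[Any]): Any = f(child.eval(row))
}

object Rebind {
  // Rewrites every reference to ordinal 0 so that it reads from newOrdinal.
  // This mimics "changing the input location" of an atomic serializer.
  def moveInput(e: Expr, newOrdinal: Int): Expr = e match {
    case BoundRef(0)         => BoundRef(newOrdinal)
    case b: BoundRef         => b
    case Transform(child, f) => Transform(moveInput(child, newOrdinal), f)
  }
}

object Demo {
  // An "atomic" serializer for Int: takes the object at row(0) and widens it
  // to Long, which plays the role of the Spark SQL value in this toy.
  val intSerializer: Expr =
    Transform(BoundRef(0), { case i: Int => i.toLong })

  // Reuse the same serializer for the second field of a two-field input
  // by moving its input from ordinal 0 to ordinal 1.
  val secondFieldSerializer: Expr = Rebind.moveInput(intSerializer, 1)
}
```

For example, `Demo.intSerializer.eval(Vector(7))` reads from `row(0)`, while `Demo.secondFieldSerializer.eval(Vector("ignored", 41))` applies the same conversion to the value at `row(1)`. The actual rewriting in Spark works on Catalyst expressions and is more involved; this is only an analogy.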
## How was this patch tested?
Existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/viirya/spark-1 SPARK-24762-refactor
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22749.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22749
----
commit e1b5deebe715479125c8878f0c90a55dc9ab3e85
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-07-09T03:42:04Z
Aggregator should be able to use Option of Product encoder.
commit 80506f4e98184ccd66dbaac14ec52d69c358020d
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-07-13T04:40:55Z
Enable top-level Option of Product encoders.
commit ed3d5cb697b10af2e2cf4c78ab521d4d0b2f3c9b
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-08-24T04:26:28Z
Remove topLevel parameter.
commit 9fc3f6165156051142a8366a32726badaaa16bb7
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-08-24T04:37:39Z
Merge remote-tracking branch 'upstream/master' into SPARK-24762
commit 5f95bd0cf1bd308c7df55c41caef7a9f19368f5d
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-08-24T04:42:33Z
Remove useless change.
commit a4f04055b2ba22f371663565710328791942855a
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-08-24T14:38:16Z
Add more tests.
commit c1f798f7e9cba0d04223eed06f1b1f547ec29dc5
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-08-25T01:52:01Z
Add test.
commit 80e11d289d7775863cb9c28b2c1d4364292048a4
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-10-06T04:06:57Z
Merge remote-tracking branch 'upstream/master' into SPARK-24762
commit 0f029b0a28700334dc6334f1ad89b3124f235a51
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-10-06T04:40:07Z
Improve code comments.
commit 84f3ce07f2f6a9236bd27f927fbb877e937f6917
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-10-15T09:55:03Z
Refactoring ExpressionEncoder.
commit 6a6fa454e22728cc2ad8e5515cd587fe0be84b26
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-10-17T02:07:40Z
Fix Malformed class name.
commit 25a616286075ca4f0a7d528095b387172b05c6c3
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-10-17T05:11:10Z
Fix error message.
commit 295ecde8103c26dda169d931f939f8a2fe641c4c
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-10-18T15:58:03Z
Fix test.
commit 85a91220ec4eb00bd9d5020ecf980eac0301f716
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-10-18T16:05:22Z
Merge remote-tracking branch 'upstream/master' into SPARK-24762-refactor
commit 35700f4a0f36fb397ac028a68011a2753c5c2c75
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-10-19T00:07:29Z
Fix rebase error.
commit b211ed069dceb33c45cf6caf12c19527334d4ad8
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-10-19T00:16:24Z
Fix unintentional style change.
commit 0c78b73e5abce2a51763c860e43aab214c8634d9
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-10-19T00:51:52Z
Address comments.
commit 5b9abb67907dfdb0c0c64751db3525564f832422
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-10-20T02:26:07Z
Address ComplexTypeMergingExpression issue.
commit 7432344143fb4889ed3d5cbde21872c8fdd6d3f1
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-10-20T12:47:37Z
Try more reasonable solution.
commit 400f87817183640006140e2db1839f8d78a13856
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-10-22T02:56:20Z
Address comment.
----