GitHub user hvanhovell opened a pull request:
https://github.com/apache/spark/pull/15718
[SPARK-16839][SQL] Simplify Struct creation code path
## What changes were proposed in this pull request?
Simplify struct creation, especially the aspect of `CleanupAliases` which
missed some aliases when handling trees created by `CreateStruct`.
This PR includes:
1. A failing test (create struct with nested aliases, some of the aliases
survive `CleanupAliases`).
2. A fix that transforms `CreateStruct` into a `CreateNamedStruct`
constructor, effectively eliminating `CreateStruct` from all expression trees.
3. A `NamePlaceHolder` used by `CreateStruct` when column names cannot be
extracted from unresolved `NamedExpression`.
4. A new Analyzer rule that resolves `NamePlaceHolder` into a string
literal once the `NamedExpression` is resolved.
5. `CleanupAliases` code was simplified as it no longer has to deal with
`CreateStruct`'s top level columns.
## How was this patch tested?
Running all tests-suits in package org.apache.spark.sql, especially
including the analysis suite, making sure added test initially fails, after
applying suggested fix rerun the entire analysis package successfully.
Modified few tests that expected `CreateStruct` which is now transformed
into `CreateNamedStruct`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hvanhovell/spark SPARK-16839-2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15718.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15718
----
commit 313d584532a4435e02e39e574070354cdef240ea
Author: eyal farago <eyal farago>
Date: 2016-08-01T21:51:45Z
SPARK-16839_redundant_aliases_after_cleanupAliases: failing test.
commit 0ac4a4d4d75c0ceb0c32619d8658d40b7136af46
Author: eyal farago <eyal farago>
Date: 2016-08-01T23:01:43Z
SPARK-16839_redundant_aliases_after_cleanupAliases: trnsform CreateStruct
into CreateNamedStruct, this maintains field names and simplifies Alias removal.
commit 8920c12a2a363c90b441dd7f29400c951ce9cd2f
Author: eyal farago <eyal farago>
Date: 2016-08-01T23:03:31Z
SPARK-16839_redundant_aliases_after_cleanupAliases: fix tests, all tests in
analysis package are now passing.
commit ae4ec34f8c6148e33c4bb87bc0df2e4c764d09a9
Author: eyal farago <eyal farago>
Date: 2016-08-03T18:05:06Z
SPARK-16839_redundant_aliases_after_cleanupAliases: styling fixes.
commit 0bcbab7566ade873b746157eba432684dd0d1240
Author: eyal farago <eyal farago>
Date: 2016-08-03T18:28:19Z
SPARK-16839_redundant_aliases_after_cleanupAliases: moved the
struct=>namedStruct loginc into its own Analyzer rule.
commit c0f601cbe8dab6c50d8a9da5b390d1db8a4b92f8
Author: eyal farago <eyal farago>
Date: 2016-08-04T20:05:47Z
SPARK-16839_redundant_aliases_after_cleanupAliases__unevaluable_CreateStruct:
CreateStruct is now unevaluable, direct test is marked as ignored.
commit 26565474a67b4bc2b35d72445e1a133a393ad034
Author: eyal farago <eyal farago>
Date: 2016-08-04T20:06:36Z
SPARK-16839_redundant_aliases_after_cleanupAliases__unevaluable_CreateStruct:
reorg analyzer rules according to cloud-fan's comment.
commit a0f491b5c276071c5080a85e5483170e548ce35c
Author: eyal farago <eyal farago>
Date: 2016-08-04T20:35:51Z
(SPARK-16839_redundant_aliases_after_cleanupAliases__unevaluable_CreateStruct:
ExpressionEncoder used CreateStruct directly, fixed.
commit c701c01ab954c2413f5a757fe39bb097e15fe91b
Author: eyal farago <eyal farago>
Date: 2016-08-04T21:29:53Z
SPARK-16839_redundant_aliases_after_cleanupAliases__unevaluable_CreateStruct:
refactored ResolveCreateStruct, move the conversion logic to CrateStruct and
CreateStructUnsafe, introduce a helper mixin.
commit a1d240deb6bb6de75eb3eae1366b76e0a4d1b82d
Author: eyal farago <eyal farago>
Date: 2016-08-04T21:30:57Z
SPARK-16839_redundant_aliases_after_cleanupAliases__unevaluable_CreateStruct:
few tester fixes, few tests are still failing.
commit 2d377f914253849646efe15b990321e3bb866da2
Author: eyal farago <eyal farago>
Date: 2016-08-04T22:13:47Z
(SPARK-16839_redundant_aliases_after_cleanupAliases__unevaluable_CreateStruct:
it turns out we're calling toCreateNameStruct during AST construction (and some
tests), in these cases the tree is not resolved yet which causes the dataType
method to explode. instead of relying on the dataType method, we examine the
children names directly, as a manner of code reuse dataType is now implemented
in terms of toCreateNamedStruct.
commit 3fc24b72317f1e15953d3b24a0b2e4370b619c01
Author: eyal farago <eyal farago>
Date: 2016-08-21T20:04:38Z
fix a mintority that prevented decent debugging on windows (default
resulted with an invalid URI, URI considered file:c:/... as a relative path,
modified to file:c:/...).
commit 62d060c7678aeb2c2155d46463ded7f505388b9c
Author: eyal farago <eyal farago>
Date: 2016-08-21T21:03:05Z
SPARK-16839_redundant_aliases_after_cleanupAliases: make sure not to miss
any expressions 'hiding' in projections or other corners of the logical plan.
commit ffc521ec1202b22b160af8d11bb4172791f58837
Author: eyal farago <eyal farago>
Date: 2016-08-21T21:58:01Z
Merge remote-tracking branch 'origin/master' into
SPARK-16839_redundant_aliases_after_cleanupAliases
commit 27c8157afd9d44a2ca4b32e3894425f8762d8bbe
Author: eyal farago <eyal farago>
Date: 2016-08-21T21:59:50Z
SPARK-16839_redundant_aliases_after_cleanupAliases: fix after merge,
ast-builder no longer 'resolves' createStruct into createNamedStruct.
commit 7f1be3449f7ecd10ce97744c1df5db4e5946aa98
Author: eyal farago <eyal farago>
Date: 2016-08-29T17:48:36Z
SPARK-16839_redundant_aliases_after_cleanupAliases: CreateStruct is now a
'constructor' of CreateNamedStruct.
commit 7ed5144b6f234c36209f68732908958f8d2c4eac
Author: eyal farago <eyal farago>
Date: 2016-08-29T19:07:23Z
SPARK-16839_redundant_aliases_after_cleanupAliases: address some review
comments.
commit 48ed3a9283c986e998a812dd6c18c9d88374ac49
Author: eyal farago <eyal farago>
Date: 2016-08-29T19:15:14Z
SPARK-16839_redundant_aliases_after_cleanupAliases: restore two ignored
tests.
commit fa1ae24a7fe81ddaec905a6d5ad31326465ad9c1
Author: eyal farago <eyal farago>
Date: 2016-08-29T19:26:09Z
SPARK-16839_redundant_aliases_after_cleanupAliases: fix formatting.
commit 76dd6a4417be14c9293132dcdf6ab966907ddfa2
Author: eyal farago <eyal farago>
Date: 2016-09-05T17:58:05Z
SPARK-16839_redundant_aliases_after_cleanupAliases: simplify CreateStruct
and CreateStructUnsafe. restructure test to meet project's standards.
commit 8a34fb25b246092841d94fed33de31f4a2995aa6
Author: eyal farago <eyal farago>
Date: 2016-09-05T19:01:51Z
SPARK-16839_redundant_aliases_after_cleanupAliases: CreateStructUnsafe is
no more.
commit 3f7b63f6fb087f04b9d0c9fd0c7861d65a5ef3b2
Author: eyal farago <eyal farago>
Date: 2016-09-05T20:47:41Z
SPARK-16839_redundant_aliases_after_cleanupAliases: remove redundant
comments.
commit d6fbe2aa5f9bc59d00611c9e5fe02a7831f571ce
Author: eyal farago <eyal farago>
Date: 2016-09-07T20:11:50Z
SPARK-16839_redundant_aliases_after_cleanupAliases: fix reviewer's comments
commit 5012a47ac723034b77f34eb5cb7c7775d2f9a570
Author: eyal farago <eyal farago>
Date: 2016-09-08T18:32:33Z
SPARK-16839_redundant_aliases_after_cleanupAliases: remove redundant
annotation.
commit a77ec20b695f624a3a163b04bdf8621b5b99beb1
Author: eyal farago <eyal farago>
Date: 2016-09-12T21:52:33Z
SPARK-16839_redundant_aliases_after_cleanupAliases: remove duplicate test.
commit 53308ea3c4a1d2c08ccf621d6a0ab8ec7e7cfc4c
Author: eyal farago <eyal farago>
Date: 2016-09-12T21:54:25Z
SPARK-16839_redundant_aliases_after_cleanupAliases: updated hive golden
file for a test affected by CreateStruct becoming a CreateNamedStruct.
commit 4cbb9a52e532a40449bc3038cd6bf4c4c3bad82d
Author: Eyal Farago <[email protected]>
Date: 2016-09-13T07:28:32Z
Merge branch 'master' into
SPARK-16839_redundant_aliases_after_cleanupAliases
Conflicts:
sql/hive/src/test/resources/sqlgen/subquery_in_having_2.sql
commit c1d5b2034b9d38d578cc97f625d73246de82bad8
Author: Hyukjin Kwon <[email protected]>
Date: 2016-09-14T11:26:31Z
Fix R tests to pass by using alias for struct functions (#1)
fix R tests that relied on the generated alias for CreateStruct expressions
(thanks to @HyukjinKwon)
commit f4f3de40ca3a4a0f381a746e7b23b535c43c04a7
Author: eyal farago <eyal farago>
Date: 2016-09-14T17:47:30Z
SPARK-16839_redundant_aliases_after_cleanupAliases: fix quotes type in
expected output.
commit 60a49c355c4da647991cbc94954b0336f0b597cd
Author: Eyal Farago <[email protected]>
Date: 2016-09-15T13:53:32Z
Merge branch 'master' into
SPARK-16839_redundant_aliases_after_cleanupAliases
Conflicts:
sql/hive/src/test/resources/sqlgen/subquery_in_having_2.sql
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]