GitHub user hvanhovell opened a pull request:

    https://github.com/apache/spark/pull/15718

    [SPARK-16839][SQL] Simplify Struct creation code path

    ## What changes were proposed in this pull request?
    
    Simplify struct creation, especially the aspect of `CleanupAliases` which 
missed some aliases when handling trees created by `CreateStruct`.
    
    This PR includes:
    
    1. A failing test (create struct with nested aliases, some of the aliases 
survive `CleanupAliases`).
    2. A fix that transforms `CreateStruct` into a `CreateNamedStruct` 
constructor, effectively eliminating `CreateStruct` from all expression trees.
    3. A `NamePlaceHolder` used by `CreateStruct` when column names cannot be 
extracted from unresolved `NamedExpression`.
    4. A new Analyzer rule that resolves `NamePlaceHolder` into a string 
literal once the `NamedExpression` is resolved.
    5. `CleanupAliases` code was simplified as it no longer has to deal with 
`CreateStruct`'s top level columns.
    
    ## How was this patch tested?
    Running all tests-suits in package org.apache.spark.sql, especially 
including the analysis suite, making sure added test initially fails, after 
applying suggested fix rerun the entire analysis package successfully.
    
    Modified few tests that expected `CreateStruct` which is now transformed 
into `CreateNamedStruct`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hvanhovell/spark SPARK-16839-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15718.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15718
    
----
commit 313d584532a4435e02e39e574070354cdef240ea
Author: eyal farago <eyal farago>
Date:   2016-08-01T21:51:45Z

    SPARK-16839_redundant_aliases_after_cleanupAliases: failing test.

commit 0ac4a4d4d75c0ceb0c32619d8658d40b7136af46
Author: eyal farago <eyal farago>
Date:   2016-08-01T23:01:43Z

    SPARK-16839_redundant_aliases_after_cleanupAliases: trnsform CreateStruct 
into CreateNamedStruct, this maintains field names and simplifies Alias removal.

commit 8920c12a2a363c90b441dd7f29400c951ce9cd2f
Author: eyal farago <eyal farago>
Date:   2016-08-01T23:03:31Z

    SPARK-16839_redundant_aliases_after_cleanupAliases: fix tests, all tests in 
analysis package are now passing.

commit ae4ec34f8c6148e33c4bb87bc0df2e4c764d09a9
Author: eyal farago <eyal farago>
Date:   2016-08-03T18:05:06Z

    SPARK-16839_redundant_aliases_after_cleanupAliases: styling fixes.

commit 0bcbab7566ade873b746157eba432684dd0d1240
Author: eyal farago <eyal farago>
Date:   2016-08-03T18:28:19Z

    SPARK-16839_redundant_aliases_after_cleanupAliases: moved the 
struct=>namedStruct loginc into its own Analyzer rule.

commit c0f601cbe8dab6c50d8a9da5b390d1db8a4b92f8
Author: eyal farago <eyal farago>
Date:   2016-08-04T20:05:47Z

    
SPARK-16839_redundant_aliases_after_cleanupAliases__unevaluable_CreateStruct: 
CreateStruct is now unevaluable, direct test is marked as ignored.

commit 26565474a67b4bc2b35d72445e1a133a393ad034
Author: eyal farago <eyal farago>
Date:   2016-08-04T20:06:36Z

    
SPARK-16839_redundant_aliases_after_cleanupAliases__unevaluable_CreateStruct: 
reorg analyzer rules according to cloud-fan's comment.

commit a0f491b5c276071c5080a85e5483170e548ce35c
Author: eyal farago <eyal farago>
Date:   2016-08-04T20:35:51Z

    
(SPARK-16839_redundant_aliases_after_cleanupAliases__unevaluable_CreateStruct: 
ExpressionEncoder used CreateStruct directly, fixed.

commit c701c01ab954c2413f5a757fe39bb097e15fe91b
Author: eyal farago <eyal farago>
Date:   2016-08-04T21:29:53Z

    
SPARK-16839_redundant_aliases_after_cleanupAliases__unevaluable_CreateStruct: 
refactored ResolveCreateStruct, move the conversion logic to CrateStruct and 
CreateStructUnsafe, introduce a helper mixin.

commit a1d240deb6bb6de75eb3eae1366b76e0a4d1b82d
Author: eyal farago <eyal farago>
Date:   2016-08-04T21:30:57Z

    
SPARK-16839_redundant_aliases_after_cleanupAliases__unevaluable_CreateStruct: 
few tester fixes, few tests are still failing.

commit 2d377f914253849646efe15b990321e3bb866da2
Author: eyal farago <eyal farago>
Date:   2016-08-04T22:13:47Z

    
(SPARK-16839_redundant_aliases_after_cleanupAliases__unevaluable_CreateStruct: 
it turns out we're calling toCreateNameStruct during AST construction (and some 
tests), in these cases the tree is not resolved yet which causes the dataType 
method to explode. instead of relying on the dataType method, we examine the 
children names directly, as a manner of code reuse dataType is now implemented 
in terms of toCreateNamedStruct.

commit 3fc24b72317f1e15953d3b24a0b2e4370b619c01
Author: eyal farago <eyal farago>
Date:   2016-08-21T20:04:38Z

    fix a mintority that prevented decent debugging on windows (default 
resulted with an invalid URI, URI considered file:c:/... as a relative path, 
modified to file:c:/...).

commit 62d060c7678aeb2c2155d46463ded7f505388b9c
Author: eyal farago <eyal farago>
Date:   2016-08-21T21:03:05Z

    SPARK-16839_redundant_aliases_after_cleanupAliases: make sure not to miss 
any expressions 'hiding' in projections or other corners of the logical plan.

commit ffc521ec1202b22b160af8d11bb4172791f58837
Author: eyal farago <eyal farago>
Date:   2016-08-21T21:58:01Z

    Merge remote-tracking branch 'origin/master' into 
SPARK-16839_redundant_aliases_after_cleanupAliases

commit 27c8157afd9d44a2ca4b32e3894425f8762d8bbe
Author: eyal farago <eyal farago>
Date:   2016-08-21T21:59:50Z

    SPARK-16839_redundant_aliases_after_cleanupAliases: fix after merge, 
ast-builder no longer 'resolves' createStruct into createNamedStruct.

commit 7f1be3449f7ecd10ce97744c1df5db4e5946aa98
Author: eyal farago <eyal farago>
Date:   2016-08-29T17:48:36Z

    SPARK-16839_redundant_aliases_after_cleanupAliases: CreateStruct is now a 
'constructor' of CreateNamedStruct.

commit 7ed5144b6f234c36209f68732908958f8d2c4eac
Author: eyal farago <eyal farago>
Date:   2016-08-29T19:07:23Z

    SPARK-16839_redundant_aliases_after_cleanupAliases: address some review 
comments.

commit 48ed3a9283c986e998a812dd6c18c9d88374ac49
Author: eyal farago <eyal farago>
Date:   2016-08-29T19:15:14Z

    SPARK-16839_redundant_aliases_after_cleanupAliases: restore two ignored 
tests.

commit fa1ae24a7fe81ddaec905a6d5ad31326465ad9c1
Author: eyal farago <eyal farago>
Date:   2016-08-29T19:26:09Z

    SPARK-16839_redundant_aliases_after_cleanupAliases: fix formatting.

commit 76dd6a4417be14c9293132dcdf6ab966907ddfa2
Author: eyal farago <eyal farago>
Date:   2016-09-05T17:58:05Z

    SPARK-16839_redundant_aliases_after_cleanupAliases: simplify CreateStruct 
and CreateStructUnsafe. restructure test to meet project's standards.

commit 8a34fb25b246092841d94fed33de31f4a2995aa6
Author: eyal farago <eyal farago>
Date:   2016-09-05T19:01:51Z

    SPARK-16839_redundant_aliases_after_cleanupAliases: CreateStructUnsafe is 
no more.

commit 3f7b63f6fb087f04b9d0c9fd0c7861d65a5ef3b2
Author: eyal farago <eyal farago>
Date:   2016-09-05T20:47:41Z

    SPARK-16839_redundant_aliases_after_cleanupAliases: remove redundant 
comments.

commit d6fbe2aa5f9bc59d00611c9e5fe02a7831f571ce
Author: eyal farago <eyal farago>
Date:   2016-09-07T20:11:50Z

    SPARK-16839_redundant_aliases_after_cleanupAliases: fix reviewer's comments

commit 5012a47ac723034b77f34eb5cb7c7775d2f9a570
Author: eyal farago <eyal farago>
Date:   2016-09-08T18:32:33Z

    SPARK-16839_redundant_aliases_after_cleanupAliases: remove redundant 
annotation.

commit a77ec20b695f624a3a163b04bdf8621b5b99beb1
Author: eyal farago <eyal farago>
Date:   2016-09-12T21:52:33Z

    SPARK-16839_redundant_aliases_after_cleanupAliases: remove duplicate test.

commit 53308ea3c4a1d2c08ccf621d6a0ab8ec7e7cfc4c
Author: eyal farago <eyal farago>
Date:   2016-09-12T21:54:25Z

    SPARK-16839_redundant_aliases_after_cleanupAliases: updated hive golden 
file for a test affected by CreateStruct becoming a CreateNamedStruct.

commit 4cbb9a52e532a40449bc3038cd6bf4c4c3bad82d
Author: Eyal Farago <[email protected]>
Date:   2016-09-13T07:28:32Z

    Merge branch 'master' into 
SPARK-16839_redundant_aliases_after_cleanupAliases
    
    Conflicts:
        sql/hive/src/test/resources/sqlgen/subquery_in_having_2.sql

commit c1d5b2034b9d38d578cc97f625d73246de82bad8
Author: Hyukjin Kwon <[email protected]>
Date:   2016-09-14T11:26:31Z

    Fix R tests to pass by using alias for struct functions (#1)
    
    fix R tests that relied on the generated alias for CreateStruct expressions 
(thanks to @HyukjinKwon)

commit f4f3de40ca3a4a0f381a746e7b23b535c43c04a7
Author: eyal farago <eyal farago>
Date:   2016-09-14T17:47:30Z

    SPARK-16839_redundant_aliases_after_cleanupAliases: fix quotes type in 
expected output.

commit 60a49c355c4da647991cbc94954b0336f0b597cd
Author: Eyal Farago <[email protected]>
Date:   2016-09-15T13:53:32Z

    Merge branch 'master' into 
SPARK-16839_redundant_aliases_after_cleanupAliases
    
    Conflicts:
        sql/hive/src/test/resources/sqlgen/subquery_in_having_2.sql

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to