[jira] [Commented] (SPARK-13721) Add support for LATERAL VIEW OUTER explode()
[ https://issues.apache.org/jira/browse/SPARK-13721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878111#comment-15878111 ]

Apache Spark commented on SPARK-13721:

User 'bogdanrdc' has created a pull request for this issue:
https://github.com/apache/spark/pull/17026

> Add support for LATERAL VIEW OUTER explode()
>
>                 Key: SPARK-13721
>                 URL: https://issues.apache.org/jira/browse/SPARK-13721
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Ian Hellstrom
>            Assignee: Bogdan Raducanu
>             Fix For: 2.2.0
>
> Hive supports the [LATERAL VIEW OUTER|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView#LanguageManualLateralView-OuterLateralViews] syntax to make sure that when an array is empty, the content from the outer table is still returned.
> Within Spark, this is currently only possible by executing HiveQL statements within the HiveContext. It would be nice if the standard explode() DataFrame method allowed the same. A possible signature would be:
> {code:scala}
> explode[A, B](inputColumn: String, outputColumn: String, outer: Boolean = false)
> {code}

--
This message was sent by Atlassian JIRA (v6.3.15#6346)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13721) Add support for LATERAL VIEW OUTER explode()
[ https://issues.apache.org/jira/browse/SPARK-13721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870106#comment-15870106 ]

Apache Spark commented on SPARK-13721:

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/16958
[jira] [Commented] (SPARK-13721) Add support for LATERAL VIEW OUTER explode()
[ https://issues.apache.org/jira/browse/SPARK-13721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824308#comment-15824308 ]

Apache Spark commented on SPARK-13721:

User 'bogdanrdc' has created a pull request for this issue:
https://github.com/apache/spark/pull/16608
[jira] [Commented] (SPARK-13721) Add support for LATERAL VIEW OUTER explode()
[ https://issues.apache.org/jira/browse/SPARK-13721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464427#comment-15464427 ]

Ewan Leith commented on SPARK-13721:

Assuming Don's use case is the same as ours, we have to write odd-looking queries like the pseudo-code below to get the full set of entries when using explode on records where the nested array is not always populated (the .filter calls are there to make explicit what's happening):

    import org.apache.spark.sql.functions.{col, explode, lit}

    // Rows with a populated array: one output row per element.
    val df1 = df
      .filter("column.nested_array is not null")
      .withColumn("element", explode(col("column.nested_array")))
      .select("other_column", "element")

    // Rows without an array: keep them with a placeholder element.
    val df2 = df
      .filter("column.nested_array is null")
      .select(col("other_column"), lit("") as "element")

    df1.unionAll(df2)
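The filter-and-union workaround above can be modelled in plain Scala collections, without Spark. This is only a sketch: `Entry`, `exploded`, `placeholders` and `outerExplode` are hypothetical names, and a nullable array is modelled as `Option[Seq[String]]`. One assumption differs from the pseudo-code: the placeholder branch here also catches empty arrays, because a plain explode emits no rows for them either.

```scala
object UnionWorkaround {
  // Hypothetical record: a key plus a nullable array (None stands in for null).
  final case class Entry(other: String, nested: Option[Seq[String]])

  // First half of the workaround: one output row per array element.
  def exploded(rs: Seq[Entry]): Seq[(String, String)] =
    for { r <- rs; arr <- r.nested.toSeq; e <- arr } yield (r.other, e)

  // Second half: records whose array is null or empty get one placeholder row.
  def placeholders(rs: Seq[Entry]): Seq[(String, String)] =
    rs.filter(r => r.nested.forall(_.isEmpty)).map(r => (r.other, ""))

  // The union of both halves keeps every input record at least once.
  def outerExplode(rs: Seq[Entry]): Seq[(String, String)] =
    exploded(rs) ++ placeholders(rs)
}
```

As with the DataFrame version, the union puts all placeholder rows after the exploded rows, so input order is not preserved.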
[jira] [Commented] (SPARK-13721) Add support for LATERAL VIEW OUTER explode()
[ https://issues.apache.org/jira/browse/SPARK-13721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15459539#comment-15459539 ]

Herman van Hovell commented on SPARK-13721:

Could you explain what this would look like? I am asking because adding `outer` to `explode()` is a bit weird, since outer is a property of the generate process and not of the generator.
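The distinction between the generator and the generate process can be sketched in plain Scala. This is an illustrative model only, not Spark's internal code: `Generator` and `generate` are hypothetical names, and `None` stands in for the null that an outer generate would emit.

```scala
object GenerateModel {
  // A generator only maps one input row to zero or more output values.
  // It knows nothing about inner versus outer.
  type Generator[A, B] = A => Seq[B]

  // The generate step pairs each input row with its generated values.
  // The outer flag lives here: it decides what happens to a row for
  // which the generator produced nothing.
  def generate[A, B](rows: Seq[A], gen: Generator[A, B], outer: Boolean): Seq[(A, Option[B])] =
    rows.flatMap { row =>
      val out = gen(row)
      if (out.isEmpty && outer) Seq((row, Option.empty[B])) // keep the row, pad with None
      else out.map(v => (row, Option(v)))
    }
}
```

In this model the same generator is reused unchanged for both modes; only the caller of `generate` chooses whether rows with no output survive.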
[jira] [Commented] (SPARK-13721) Add support for LATERAL VIEW OUTER explode()
[ https://issues.apache.org/jira/browse/SPARK-13721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15459490#comment-15459490 ]

Don Drake commented on SPARK-13721:

My nested structures aren't simple types; they are structs (case classes), so this existing method works great for me.

This ticket is about modifying the explode() call to support outer, not adding outer to the data frame API.
[jira] [Commented] (SPARK-13721) Add support for LATERAL VIEW OUTER explode()
[ https://issues.apache.org/jira/browse/SPARK-13721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15452926#comment-15452926 ]

Herman van Hovell commented on SPARK-13721:

You can follow the suggestion in the deprecation warning and do this:
{noformat}
scala> val df = spark.range(1000).select($"id", array($"id" % 2, $"id" % 3).as("values"))

scala> df.select($"id", explode($"values")).show
+---+---+
| id|col|
+---+---+
|  0|  0|
|  0|  0|
|  1|  1|
|  1|  1|
|  2|  0|
|  2|  2|
|  3|  1|
|  3|  0|
|  4|  0|
|  4|  1|
|  5|  1|
|  5|  2|
|  6|  0|
|  6|  0|
|  7|  1|
|  7|  1|
|  8|  0|
|  8|  2|
|  9|  1|
|  9|  0|
+---+---+
only showing top 20 rows
{noformat}

This is not what the ticket is about; that would be for adding `outer` to the data frame API.
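To see why the ticket asks for more than this, the example above can be modelled with plain Scala collections. This is a sketch under assumptions, not Spark code: `Rec`, `explode` and `sample` are hypothetical names, and only the first three ids are generated.

```scala
object ExplodeModel {
  // Hypothetical stand-in for a DataFrame row with an array column.
  final case class Rec(id: Int, values: Seq[Int])

  // Plain explode: one output row per array element, so a record whose
  // array is empty contributes no output rows at all.
  def explode(rs: Seq[Rec]): Seq[(Int, Int)] =
    for { r <- rs; v <- r.values } yield (r.id, v)

  // Same shape as the example above: values = [id % 2, id % 3].
  val sample: Seq[Rec] = (0 to 2).map(id => Rec(id, Seq(id % 2, id % 3)))
}
```

Every record in `sample` has two elements, so each id appears twice, matching the first rows of the table. Appending a record such as `Rec(3, Seq.empty)` leaves the result unchanged: id 3 never shows up, which is the gap an outer explode is meant to close.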
[jira] [Commented] (SPARK-13721) Add support for LATERAL VIEW OUTER explode()
[ https://issues.apache.org/jira/browse/SPARK-13721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15452846#comment-15452846 ]

Don Drake commented on SPARK-13721:

Spark 2.0 has deprecated this function. What workarounds are suggested?
[jira] [Commented] (SPARK-13721) Add support for LATERAL VIEW OUTER explode()
[ https://issues.apache.org/jira/browse/SPARK-13721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184213#comment-15184213 ]

Xiao Li commented on SPARK-13721:

That sounds reasonable. Maybe we can wait until the DataFrame and Dataset APIs are combined.