[ https://issues.apache.org/jira/browse/SPARK-39292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enrico Minack updated SPARK-39292:
----------------------------------
    Description:
In SPARK-38864, the melt function was added to Dataset.

It would be nice if fields of struct fields could be used as id and value columns. This would allow for the following:

Given a Dataset with following schema:
{code:java}
root
 |-- an: struct (nullable = false)
 |    |-- id: integer (nullable = false)
 |-- str: struct (nullable = false)
 |    |-- one: string (nullable = true)
 |    |-- two: string (nullable = true)
{code}
For example:
{code:java}
+---+-------------+
| an|          str|
+---+-------------+
|{1}|   {one, One}|
|{2}|  {two, null}|
|{3}|{null, three}|
|{4}| {null, null}|
+---+-------------+
{code}
Melting with value columns {{Seq("str.one", "str.two")}} on id columns {{Seq("an.id")}} would result in
{code:java}
+--+--------+-----+
|an|variable|value|
+--+--------+-----+
| 1| str.one|  one|
| 1| str.two|  One|
| 2| str.one|  two|
| 2| str.two| null|
| 3| str.one| null|
| 3| str.two|three|
| 4| str.one| null|
| 4| str.two| null|
+--+--------+-----+
{code}
See test in {{org.apache.spark.sql.MeltSuite}}:
{code:java}
test("SPARK-39292: melt with struct fields") {
  val df = meltWideDataDs.select(
    struct($"id").as("an"),
    struct(
      $"str1".as("one"),
      $"str2".as("two")
    ).as("str")
  )

  checkAnswer(
    Melt.of(df, Seq("an.id"), Seq("str.one", "str.two"), false, "variable", "value"),
    meltedWideDataRows.map(row => Row(
      row.getInt(0),
      row.getString(1) match {
        case "str1" => "str.one"
        case "str2" => "str.two"
      },
      row.getString(2)
    ))
  )
}
{code}

  was:
In SPARK-38864, the melt function was added to Dataset.

It would be nice if fields of struct fields could be used as id and value columns.
This would allow for the following:

Given a Dataset with following schema:
{code:java}
root
 |-- an: struct (nullable = false)
 |    |-- id: integer (nullable = false)
 |-- str: struct (nullable = false)
 |    |-- one: string (nullable = true)
 |    |-- two: string (nullable = true)
{code}
For example:
{code:java}
+---+-------------+
| an|          str|
+---+-------------+
|{1}|   {one, One}|
|{2}|  {two, null}|
|{3}|{null, three}|
|{4}| {null, null}|
+---+-------------+
{code}
Melting with value columns {{Seq("str.one", "str.two")}} on id columns {{Seq("an.id")}} would result in
{code:java}
+--+--------+-----+
|an|variable|value|
+--+--------+-----+
| 1| str.one|  one|
| 1| str.two|  One|
| 2| str.one|  two|
| 2| str.two| null|
| 3| str.one| null|
| 3| str.two|three|
| 4| str.one| null|
| 4| str.two| null|
+--+--------+-----+
{code}
See test in {{org.apache.spark.sql.MeltSuite}}:
{code:java}
test("melt with struct fields") {
  val df = meltWideDataDs.select(
    struct($"id").as("an"),
    struct(
      $"str1".as("one"),
      $"str2".as("two")
    ).as("str")
  )

  checkAnswer(
    Melt.of(df, Seq("an.id"), Seq("str.one", "str.two")),
    meltedWideDataRows.map(row => Row(
      row.getInt(0),
      row.getString(1) match {
        case "str1" => "str.one"
        case "str2" => "str.two"
      },
      row.getString(2)
    ))
  )
}
{code}


> Make Dataset.melt work with struct fields
> -----------------------------------------
>
>                 Key: SPARK-39292
>                 URL: https://issues.apache.org/jira/browse/SPARK-39292
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Enrico Minack
>            Priority: Major
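
The melt semantics requested in this issue, where id and value columns are addressed by dotted names such as {{an.id}} and {{str.one}}, can be modeled in plain Scala without Spark. The following is only a sketch of the intended behavior, not the Spark implementation: the names {{MeltStructSketch}}, {{Struct}}, {{Row}}, {{resolve}} and {{melt}} are hypothetical, structs are modeled as maps, only one level of nesting is handled, and the naming of the output columns is not modeled.

```scala
// Plain-Scala model of melting over struct fields (illustrative only, not Spark code).
object MeltStructSketch {
  // A struct is modeled as a map from field name to value; null marks a missing value.
  type Struct = Map[String, Any]
  // A row maps each top-level column name to its struct.
  type Row = Map[String, Struct]

  // Resolve a dotted name such as "str.one" against a row (one nesting level only).
  def resolve(row: Row, path: String): Any = {
    val Array(column, field) = path.split('.')
    row(column)(field)
  }

  // Emit one output row per (input row, value column):
  // the id values, then the value column's name, then its value.
  def melt(rows: Seq[Row], ids: Seq[String], values: Seq[String]): Seq[Seq[Any]] =
    for {
      row <- rows
      value <- values
    } yield ids.map(id => resolve(row, id)) ++ Seq(value, resolve(row, value))

  // Helper that builds a row with the schema used in the issue description.
  def row(id: Int, one: String, two: String): Row =
    Map[String, Struct](
      "an" -> Map[String, Any]("id" -> id),
      "str" -> Map[String, Any]("one" -> one, "two" -> two)
    )

  // The example data from the issue description.
  val rows: Seq[Row] = Seq(
    row(1, "one", "One"),
    row(2, "two", null),
    row(3, null, "three"),
    row(4, null, null)
  )

  def main(args: Array[String]): Unit =
    melt(rows, Seq("an.id"), Seq("str.one", "str.two"))
      .foreach(r => println(r.mkString(" | ")))
}
```

Each input row yields one output row per value column, so the four example rows produce eight melted rows, in the same order as the result table in the description.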