Use explode_outer() when rows have null values. tor. 16. feb. 2023 kl. 16:48 skrev Navneet <kr.navneetve...@gmail.com>:
> I am not expert, may be try if this works: > In order to achieve the desired output using the explode() method in > Java, you can create a User-Defined Function (UDF) that zips the lists > in each row and returns the resulting list. Here's an example > implementation: > > typescript > Copy code > import org.apache.spark.sql.Row; > import org.apache.spark.sql.api.java.UDF1; > import org.apache.spark.sql.types.DataTypes; > > public class ZipRows implements UDF1<Row, Row> { > @Override > public Row call(Row row) { > List<String> list1 = row.getList(0); > List<String> list2 = row.getList(1); > List<String> list3 = row.getList(2); > List<List<String>> zipped = new ArrayList<>(); > for (int i = 0; i < list1.size(); i++) { > List<String> sublist = new ArrayList<>(); > sublist.add(list1.get(i)); > sublist.add(list2.get(i)); > sublist.add(list3.get(i)); > zipped.add(sublist); > } > return RowFactory.create(zipped); > } > } > This UDF takes a Row as input, which contains the three lists in each > row of the original DataFrame. It then zips these lists using a loop > that creates a new sublist for each element in the lists. Finally, it > returns a new Row that contains the zipped list. > > You can then use this UDF in combination with explode() to achieve the > desired output: > > javascript > Copy code > import org.apache.spark.sql.Dataset; > import org.apache.spark.sql.Row; > import static org.apache.spark.sql.functions.*; > > // assuming you have a Dataset<Row> called "df" > df.withColumn("zipped", callUDF(new ZipRows(), > > DataTypes.createArrayType(DataTypes.createArrayType(DataTypes.StringType))), > "col1", "col2", "col3") > .selectExpr("explode(zipped) as zipped") > .selectExpr("zipped[0] as col1", "zipped[1] as col2", "zipped[2] as col3") > .show(); > This code first adds a new column called "zipped" to the DataFrame > using the callUDF() function, which applies the ZipRows UDF to the > "col1", "col2", and "col3" columns. It then uses explode() to explode > the "zipped" column, and finally selects the three sub-elements of the > zipped list as separate columns using selectExpr(). The output should > be the desired DataFrame. > > > > Regards, > Navneet Kr > > > On Thu, 16 Feb 2023 at 00:07, Enrico Minack <i...@enrico.minack.dev> > wrote: > > > > You have to take each row and zip the lists, each element of the result > becomes one new row. > > > > So turn write a method that turns > > Row(List("A","B","null"), List("C","D","null"), > List("E","null","null")) > > into > > List(List("A","C","E"), List("B","D","null"), > List("null","null","null")) > > and use flatmap with that method. > > > > In Scala, this would read: > > > > df.flatMap { row => (row.getSeq[String](0), row.getSeq[String](1), > row.getSeq[String](2)).zipped.toIterable }.show() > > > > Enrico > > > > > > Am 14.02.23 um 22:54 schrieb sam smith: > > > > Hello guys, > > > > I have the following dataframe: > > > > col1 > > > > col2 > > > > col3 > > > > ["A","B","null"] > > > > ["C","D","null"] > > > > ["E","null","null"] > > > > > > > > I want to explode it to the following dataframe: > > > > col1 > > > > col2 > > > > col3 > > > > "A" > > > > "C" > > > > "E" > > > > "B" > > > > "D" > > > > "null" > > > > "null" > > > > "null" > > > > "null" > > > > > > How to do that (preferably in Java) using the explode() method ? knowing > that something like the following won't yield correct output: > > > > for (String colName: dataset.columns()) > > dataset=dataset.withColumn(colName,explode(dataset.col(colName))); > > > > > > > > --------------------------------------------------------------------- > To unsubscribe e-mail: user-unsubscr...@spark.apache.org > > -- Bjørn Jørgensen Vestre Aspehaug 4, 6010 Ålesund Norge +47 480 94 297