MGHawes commented on a change in pull request #25575: [SPARK-28818][SQL] Respect source column nullability in the arrays created by `freqItems()`
URL: https://github.com/apache/spark/pull/25575#discussion_r318092473
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/stat/FrequentItems.scala
 ##########
 @@ -117,10 +112,16 @@ object FrequentItems extends Logging {
     )
     val justItems = freqItems.map(m => m.baseMap.keys.toArray)
     val resultRow = Row(justItems : _*)
-    // append frequent Items to the column name for easy debugging
-    val outputCols = colInfo.map { v =>
-      StructField(v._1 + "_freqItems", ArrayType(v._2, false))
-    }
+
+    val originalSchema = df.schema
+    val outputCols = cols.map { name =>
+      val index = originalSchema.fieldIndex(name)
+      val originalField = originalSchema.fields(index)
+
+      // append frequent Items to the column name for easy debugging
+      StructField(name + "_freqItems", ArrayType(originalField.dataType, originalField.nullable))
+    }.toArray
+
 
 Review comment:
   In the interests of leaving the code in a better state than I found it 😄 
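
To make the effect of the change concrete, here is a minimal sketch. It assumes `spark-catalyst` (pulled in by `spark-sql`) is on the classpath. `FreqItemsSchemaSketch` and `freqItemsSchema` are hypothetical names, not Spark APIs. The function reproduces only the schema-building step from the diff, working on a plain `StructType` so that no `SparkSession` is needed. The removed code always passed `false` as `containsNull`. Here, a nullable source column yields `containsNull = true`, and a non-nullable one keeps `false`.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper (not part of Spark) mirroring the schema logic in the diff,
// pulled out of FrequentItems so it can run without a SparkSession.
object FreqItemsSchemaSketch {

  // For each requested column, emit "<name>_freqItems" as an array whose element
  // type and containsNull flag are copied from the source field.
  def freqItemsSchema(input: StructType, cols: Seq[String]): StructType = {
    val outputCols = cols.map { name =>
      // fieldIndex throws if the column name does not exist in the schema
      val originalField = input.fields(input.fieldIndex(name))
      StructField(name + "_freqItems",
        ArrayType(originalField.dataType, containsNull = originalField.nullable))
    }.toArray
    StructType(outputCols)
  }

  def main(args: Array[String]): Unit = {
    val input = StructType(Seq(
      StructField("id", IntegerType, nullable = false),
      StructField("name", StringType, nullable = true)))

    // Print each output field's name and full data type, including containsNull
    freqItemsSchema(input, Seq("id", "name")).fields.foreach { f =>
      println(s"${f.name}: ${f.dataType}")
    }
  }
}
```

Looking the field up by name in the original schema, rather than carrying `(name, dataType)` tuples around, keeps the nullability information available at the point where the output `ArrayType` is built.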

----------------------------------------------------------------
This is an automated message from the Apache Git Service.