Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r232485612
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/WindowInPandasExec.scala ---
@@ -27,17 +27,62 @@ import org.apache.spark.api.python.{ChainedPythonFunctions, PythonEvalType}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions._
-import org.apache.spark.sql.catalyst.plans.physical._
-import org.apache.spark.sql.execution.{GroupedIterator, SparkPlan, UnaryExecNode}
+import org.apache.spark.sql.catalyst.plans.physical.{AllTuples, ClusteredDistribution, Distribution, Partitioning}
+import org.apache.spark.sql.execution.{ExternalAppendOnlyUnsafeRowArray, SparkPlan}
import org.apache.spark.sql.execution.arrow.ArrowUtils
-import org.apache.spark.sql.types.{DataType, StructField, StructType}
+import org.apache.spark.sql.execution.window._
+import org.apache.spark.sql.types._
import org.apache.spark.util.Utils
+/**
+ * This class calculates and outputs windowed aggregates over the rows in a single partition.
+ *
+ * It is very similar to [[WindowExec]] and has similar logic. The main difference is that this
+ * node doesn't not compute any window aggregation values. Instead, it computes the lower and
+ * upper bound for each window (i.e. window bounds) and pass the data and indices to python work
+ * to do the actual window aggregation.
+ *
+ * It currently materialize all data associated with the same partition key and pass them to
--- End diff --
tiny typo: `materialize` -> `materializes` and `pass` -> `passes`.
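
For context on the design the quoted Scaladoc describes, here is a minimal sketch (not taken from this PR) of how a window-style physical node can require that all rows sharing a partition key be co-located before a whole partition is materialized and handed to the Python worker. It uses the `AllTuples` / `ClusteredDistribution` classes visible in the new imports; `partitionSpec` is an illustrative name, not necessarily what the PR uses.

```scala
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.plans.physical.{AllTuples, ClusteredDistribution, Distribution}

// Hypothetical sketch: declare how child output must be distributed so that each task
// sees complete window partitions before sending them to the Python worker.
trait WindowLikeRequirements {
  // Illustrative name: the PARTITION BY expressions of the window spec.
  def partitionSpec: Seq[Expression]

  def requiredChildDistribution: Seq[Distribution] = {
    if (partitionSpec.isEmpty) {
      // No PARTITION BY clause: every row belongs to one global window partition,
      // so all tuples must end up in a single partition.
      AllTuples :: Nil
    } else {
      // Co-locate rows with equal partition keys so each task holds whole partitions.
      ClusteredDistribution(partitionSpec) :: Nil
    }
  }
}
```

This mirrors the general pattern used by [[WindowExec]]-style nodes; the bounds computation and Arrow transfer described in the quoted comment sit on top of this distribution requirement.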